Serverless AI jobs run container images as one-off or scheduled batch workloads. They are suitable for training, fine-tuning, and data processing, where you want to use compute resources only while a task runs and release them when the task is done. Each job runs in a container over a Compute virtual machine that is billed only while the job is running.

How to create a job

To run a container image as a batch workload for training, fine-tuning, or data processing, create a job by running the following command:
nebius ai job create \
   --name <job_name> \
   --image <image:tag> \
   --registry-username <username> \
   --registry-password <password> \
   --container-command "<command>" \
   --args <arguments> \
   --env <key=value> \
   --working-dir <absolute_path> \
   --timeout <duration> \
   --platform <platform_ID> \
   --preset <preset> \
   --disk-size <size> \
   --volume <source:container_path|source:container_path:mode> \
   --shm-size <size> \
   --ssh-key "<SSH_public_key>"
For example:
nebius ai job create \
  --name training-job \
  --image pytorch/pytorch:latest \
  --container-command "python train.py --epochs 5" \
  --args "--lr 0.01" \
  --env EPOCHS=5 \
  --platform gpu-h100-sxm \
  --preset 1gpu-16vcpu-200gb \
  --disk-size 200Gi \
  --shm-size 16Gi \
  --timeout 3h \
  --subnet-id vpcsubnet-e*** \
  --ssh-key "$(cat ~/.ssh/id_rsa.pub)"
In the command, specify the following parameters:
  • Job settings:
    • --name: Application (job) name.
    • --image: Container image in image:tag format. Use a public registry or provide --registry-username and --registry-password for private registries.
    • --registry-username (optional): Username for private container registry.
    • --registry-password (optional): Password for private container registry.
    • --container-command (optional): Entrypoint command for the job container.
    • --args (optional): Override container arguments passed to the entrypoint.
    • --env (optional): Environment variables in key=value format. Repeat for multiple variables.
    • --working-dir (optional): Working directory (absolute path).
    • --timeout (optional): Job timeout (e.g., 2h30m10s, 24h). Minimum: 1h, maximum: 168h. Default: 24h.
  • Computing resources:
    • --platform (optional): Platform of compute resources (e.g., gpu-h100-sxm, gpu-l40s-d). Default: gpu-h100-sxm in eu-north1, gpu-h200-sxm elsewhere. See Types of virtual machines and GPUs.
    • --preset (optional): Preset for the platform (e.g., 1gpu-16vcpu-200gb). Default: minimum available preset. See Presets for GPU platforms.
  • Storage:
    • --disk-size (optional): Disk size (e.g., 100Gi, 500Gi, 1Ti). Default: 250Gi. See how disk performance depends on disk size.
    • --volume (optional): Volume mount in source:container_path or source:container_path:mode format. Repeat for multiple volumes. For example:
      --volume 'computefilesystem-e***:/input:ro' \
      --volume 'storagebucket-e***:/output:rw'
      
      Use for job results and checkpoints. Volumes persist if the job is recreated after a maintenance event.
    • --shm-size (optional): Size of /dev/shm (e.g., 64Mi, 128Mi, 1Gi). Default: 16Gi.
  • Access:
    • --ssh-key (optional): Comma-separated list of SSH public keys for accessing the container over VM via SSH. When you add an SSH key, a public dynamic IP address is assigned to the job. Before you add the key, check the quota on the number of public IP addresses in the web console.
  • Other parameters:
    • --parent-id (optional): Project ID. If omitted, taken from the CLI profile.
    • --subnet-id (optional): Subnet ID. Required if the project has multiple subnets.
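Putting the storage flags together, a job that reads a dataset from a read-only filesystem and writes checkpoints to a bucket might look like the sketch below. The volume IDs are placeholders, and the command is only printed (dry run) rather than executed:

```shell
# Sketch of a job with mounted volumes, built from the flags documented
# above. All IDs are placeholders -- substitute your own.
JOB_NAME='finetune-job'
IMAGE='pytorch/pytorch:latest'
INPUT_FS='computefilesystem-e***'    # read-only dataset
OUTPUT_BUCKET='storagebucket-e***'   # checkpoints and results

CREATE_CMD="nebius ai job create \
  --name $JOB_NAME \
  --image $IMAGE \
  --container-command 'python train.py' \
  --volume $INPUT_FS:/input:ro \
  --volume $OUTPUT_BUCKET:/output:rw \
  --timeout 6h"

# Dry run: print the command instead of executing it.
echo "$CREATE_CMD"
```

Because the output volume is mounted with the rw mode, checkpoints written to /output survive both job completion and recreation after a maintenance event.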
Job creation usually takes a few minutes. A job runs until its workload finishes. When the job completes successfully or fails, the container over VM is deleted automatically. Mounted volumes are retained; delete them manually when you no longer need them.

How to check job logs

To view logs from a running or completed job, run:
nebius ai job logs <job_ID>
You can add the following options to control the output:
  • --follow or -f: Stream logs in real time.
  • --since <value>: Show logs starting from the specified time. For example, 1h (from 1 hour ago), 30m (from 30 minutes ago) or 2024-01-01 (from that date).
  • --tail <value>: Number of recent lines to show in the output.
  • --timestamps: Include timestamps in the output.
  • --until <value>: Show logs up to the specified time. For example, 1h (up to 1 hour ago), 30m (up to 30 minutes ago) or 2024-01-01 (up to that date).
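The options above can be combined. As a sketch, the command below requests the last 200 lines from the past hour with timestamps; the job ID is a placeholder, and the command is printed rather than run:

```shell
# Placeholder job ID; take a real one from `nebius ai job list`.
JOB_ID='<job_ID>'

# Last 200 lines from the past hour, with timestamps:
LOGS_CMD="nebius ai job logs $JOB_ID --since 1h --tail 200 --timestamps"

# Dry run: print the command instead of executing it.
echo "$LOGS_CMD"
```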

How to cancel a job

If you no longer need a job to continue running, you can cancel it. Jobs that finish with the COMPLETED status do not need to be canceled: they stop automatically.
  1. List jobs:
    nebius ai job list
    
    In the output, copy the ID of the required job.
  2. To cancel a job, run:
    nebius ai job cancel <job_ID>
    
Canceling a job immediately stops the container over VM and deletes the container disk. The job remains in the job list, and mounted volumes are retained; you can remove them manually. See the guides on deleting a filesystem and deleting a bucket.
To remove the job from the job list entirely, delete the job instead of canceling it.
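The two steps above can be sketched as a dry run; the job ID is a placeholder, and the commands are printed, not executed:

```shell
# Step 1: list jobs to find the ID of the one to cancel.
LIST_CMD="nebius ai job list"

# Step 2: cancel the chosen job; its record stays in the job list,
# and mounted volumes are retained.
JOB_ID='<job_ID>'   # placeholder copied from the list output
CANCEL_CMD="nebius ai job cancel $JOB_ID"

# Dry run: print the commands instead of executing them.
echo "$LIST_CMD"
echo "$CANCEL_CMD"
```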

How to delete a job

To remove a job and its record from the job list, use the delete command.
  1. List jobs:
    nebius ai job list
    
    In the output, copy the ID of the required job.
  2. Delete the job:
    nebius ai job delete <job_ID>
    
When the job is deleted, it disappears from the job list. If the job is still running, deletion cancels it first. Volumes mounted to the job are not deleted with it; you can remove them manually. See the guides on deleting a filesystem and deleting a bucket.
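A full cleanup can be sketched as a dry run; the job ID is a placeholder, and the volume-deletion steps are left as comments because their commands live in the linked guides:

```shell
JOB_ID='<job_ID>'   # placeholder; copy a real ID from `nebius ai job list`

# 1. Remove the job record (a running job is canceled first):
DELETE_CMD="nebius ai job delete $JOB_ID"

# Dry run: print the command instead of executing it.
echo "$DELETE_CMD"

# 2. Mounted volumes are NOT deleted with the job. Remove them
#    separately, following the guides on deleting a filesystem
#    and deleting a bucket.
```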