You can use SkyPilot as a single, declarative job surface that runs your AI workloads across one or more Managed Service for Kubernetes clusters. SkyPilot picks the best cluster for each job based on hardware availability and the constraints in the task definition, and fails over to other clusters when the preferred one is short of capacity. SkyPilot's placement logic combines constraint matching with the following policies:
- Capability match: SkyPilot considers only the clusters that provide the requested hardware and features, such as the GPU model, the number of GPUs per node, InfiniBand™ or a shared filesystem.
- Capacity chasing: when the preferred cluster has insufficient capacity, SkyPilot looks for capacity in other clusters or regions.
- Failover and retries: SkyPilot handles provisioning failures, such as preemptions or insufficient capacity, by automatically retrying on other matching clusters.
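The three policies can be pictured as a filter followed by an ordered retry loop. The following Python sketch is a simplified model for illustration only, not SkyPilot's actual scheduler: the Cluster and Request types, their fields and the try_provision callback are hypothetical stand-ins for what SkyPilot learns from the clusters and from the task definition.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Cluster:
    """Hypothetical view of one Managed Kubernetes cluster (context)."""
    context: str
    gpu_model: str
    gpus_per_node: int
    infiniband: bool = False
    shared_filesystem: bool = False


@dataclass(frozen=True)
class Request:
    """Hypothetical hardware constraints taken from a task definition."""
    gpu_model: str
    gpus_per_node: int
    needs_infiniband: bool = False
    needs_shared_filesystem: bool = False


def capability_match(cluster: Cluster, req: Request) -> bool:
    # Keep only clusters whose hardware and features satisfy the request.
    return (
        cluster.gpu_model == req.gpu_model
        and cluster.gpus_per_node >= req.gpus_per_node
        and (cluster.infiniband or not req.needs_infiniband)
        and (cluster.shared_filesystem or not req.needs_shared_filesystem)
    )


def place(
    req: Request,
    clusters: Iterable[Cluster],
    try_provision: Callable[[Cluster], bool],
) -> Optional[str]:
    """Return the context the job lands on, or None if every candidate failed."""
    candidates = [c for c in clusters if capability_match(c, req)]
    # Walk the matching clusters in preference order. A failed attempt (for
    # example, insufficient capacity) moves on to the next matching cluster.
    for cluster in candidates:
        if try_provision(cluster):
            return cluster.context
    return None


if __name__ == "__main__":
    clusters = [
        Cluster("cluster-a", "H100", 8, infiniband=True),
        Cluster("cluster-b", "L40S", 4),
        Cluster("cluster-c", "H100", 8, infiniband=True),
    ]
    full = {"cluster-a"}  # pretend the preferred cluster has no free capacity
    req = Request("H100", 8, needs_infiniband=True)
    print(place(req, clusters, lambda c: c.context not in full))  # cluster-c
```

In this model, cluster-b is never tried because it fails the capability match, and cluster-c is reached only because provisioning on cluster-a failed.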
Costs
Nebius AI Cloud charges you for the following billing items:
- Managed SkyPilot API Server (standalone application)
- Managed Kubernetes nodes
Steps
Install dependencies
- Make sure you have Python 3.10 or higher installed.
- Install SkyPilot with Kubernetes and Nebius support:
Prepare infrastructure
- Deploy the Managed SkyPilot API Server:
  - In the Nebius AI Cloud console, go to AI Services → SkyPilot.
  - Enter a name for the application or keep the default one.
  - Select a Platform and a Preset (vCPUs and RAM) for the API server virtual machine.
  - Click Deploy application.
- Connect to the SkyPilot API server. On the application page in the web console, click How to connect and copy the sky api login command. Then run the command in your terminal:
- Check that SkyPilot can reach your project:
  At this stage, no Managed Kubernetes clusters have been added yet, so the output looks similar to the following:
  You will run the same command again later to confirm that the contexts are picked up.
- Create at least one Managed Kubernetes cluster with a GPU node group. To demonstrate cross-cluster failover, create two or more clusters.
Add Managed Kubernetes clusters to SkyPilot
The Managed SkyPilot API Server auto-discovers all Managed Kubernetes clusters in the same project. You do not need to add a local kubeconfig or configure a service account.
- Open the SkyPilot dashboard. On the application page in the web console, click How to connect and then click the public endpoint URL.
- On the dashboard, go to the Infra tab and click Refresh. The dashboard lists the Managed Kubernetes clusters available to SkyPilot.
- Verify that SkyPilot can access the clusters:
  The output lists the enabled contexts:
- (Optional) For detailed per-cluster and per-node GPU availability, run:
  The output shows the available GPUs and per-node availability:
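Per-node availability matters because each SkyPilot node runs as a Kubernetes pod, and a pod's GPU request must be satisfied by a single Kubernetes node. A cluster with 8 free H100 GPUs split across two nodes therefore cannot host a task that requests H100:8. The Python sketch below illustrates that check; free_gpus_by_node is a hypothetical stand-in for the per-node numbers in the command output, and parse_gpus assumes the MODEL:COUNT form of the --gpus value with a default count of 1.

```python
def parse_gpus(spec: str) -> tuple[str, int]:
    """Split a GPU spec such as "H100:8" into (model, count); count defaults to 1."""
    model, _, count = spec.partition(":")
    return model, (int(count) if count else 1)


def nodes_that_fit(spec: str, free_gpus_by_node: dict[str, dict[str, int]]) -> list[str]:
    """Return the nodes that, on their own, have enough free GPUs of the requested model."""
    model, count = parse_gpus(spec)
    return [
        node
        for node, free in free_gpus_by_node.items()
        if free.get(model, 0) >= count
    ]


if __name__ == "__main__":
    # Hypothetical cluster: 8 free H100 GPUs in total, split 4 + 4 across two nodes.
    free = {"node-1": {"H100": 4}, "node-2": {"H100": 4}}
    print(sum(f.get("H100", 0) for f in free.values()))  # 8 free in total...
    print(nodes_that_fit("H100:8", free))  # ...but no single node can host H100:8: []
    print(nodes_that_fit("H100:4", free))  # ['node-1', 'node-2']
```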
(Optional) Limit the clusters that SkyPilot uses
By default, SkyPilot can place jobs on any Managed Kubernetes cluster it discovers. To restrict SkyPilot to a subset of clusters for every user of this Managed SkyPilot API Server, set kubernetes.allowed_contexts in the dashboard:
- In the SkyPilot dashboard, click Configuration.
- In the Edit SkyPilot API Server Configuration textbox, paste the following YAML, listing the contexts in the order in which SkyPilot should evaluate them:
- Click Apply.
To confirm that SkyPilot now uses only the listed contexts, run sky check kubernetes again.
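In the SkyPilot configuration, allowed_contexts is nested under the kubernetes key. The Python sketch below renders that fragment from a list of context names and models the effect of the setting in a simplified way: discovered contexts that are not listed are ignored, and the listed ones are evaluated in the given order. The context names are hypothetical placeholders; use the names that sky check kubernetes reports for your clusters. Names containing YAML-special characters would need quoting, which this sketch does not do.

```python
def render_allowed_contexts(contexts: list[str]) -> str:
    """Render the kubernetes.allowed_contexts fragment for the configuration textbox."""
    lines = ["kubernetes:", "  allowed_contexts:"]
    lines += [f"    - {ctx}" for ctx in contexts]
    return "\n".join(lines) + "\n"


def evaluation_order(discovered: list[str], allowed: list[str]) -> list[str]:
    """Simplified model: keep the allowed contexts that were discovered, in allowed order."""
    found = set(discovered)
    return [ctx for ctx in allowed if ctx in found]


if __name__ == "__main__":
    discovered = ["mk8s-a", "mk8s-b", "mk8s-c"]  # hypothetical context names
    allowed = ["mk8s-c", "mk8s-a"]
    print(render_allowed_contexts(allowed), end="")
    print(evaluation_order(discovered, allowed))  # ['mk8s-c', 'mk8s-a']
```

Listing mk8s-c first in this example makes it the preferred target, while mk8s-b is excluded entirely.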
Run a job
Decide how SkyPilot should choose the target Managed Kubernetes cluster:
- To let SkyPilot fail over across clusters, run sky launch without specifying a cluster:
  SkyPilot picks the first context that satisfies the request and submits the job:
- To target a specific Managed Kubernetes cluster, set --infra k8s/<context>:
  If the targeted cluster does not have the requested resources, SkyPilot returns an error:
In the --gpus parameter, specify the GPU model of your node group platform, such as H100, B300 or L40S.
Both examples run a bash command as the entrypoint. You can also pass a YAML task definition instead. For examples, see the SkyPilot quickstart.
(Optional) Monitor jobs
To list all SkyPilot jobs created during this tutorial and their statuses, run:
How to delete the created resources
Some of the created resources are chargeable. If you do not need them, delete them so that Nebius AI Cloud does not charge you for them:
- Delete all SkyPilot jobs created during this tutorial:
- Delete the Managed Kubernetes clusters.
- If you no longer need the Managed SkyPilot API Server, delete it in the Nebius AI Cloud console: go to AI Services → SkyPilot, open the application, go to the Settings tab and click Delete application.
InfiniBand and InfiniBand Trade Association are registered trademarks of the InfiniBand Trade Association.