How to add nodes with GPUs to a cluster
When creating a node group in a Managed Service for Kubernetes cluster, specify a virtual machine platform that supports GPUs.
Web console
In the node group creation form (Compute → Kubernetes → your cluster → Node groups → Create node group), under Computing resources:
- Select With GPU.
- Select a platform and a preset. For available platforms and presets, see Types of virtual machines and GPUs in Nebius AI Cloud and How to find out platforms and presets available in a project.
- Under GPU settings, keep the Install NVIDIA GPU drivers and other components option enabled.
- Under Drivers, select a CUDA driver version. For available driver versions, see GPU drivers and other components.
- Under Operating system, select an OS. Available operating systems depend on the selected driver.
If you need to modify the NVIDIA device plugin (for example, to enable Multi-Instance GPU), disable the Install NVIDIA GPU drivers and other components option. Then install the NVIDIA GPU Operator manually.
GPU drivers and other components
For node groups with GPUs, Managed Kubernetes offers boot disk images with the GPU drivers and other components that GPUs require. You can specify a Managed Kubernetes GPU image with --template-gpu-settings-drivers-preset. The preset determines the CUDA toolkit and NVIDIA driver series. Each preset has a default operating system (OS), which you can optionally override with --template-os.
| Driver preset | cuda12.8 | cuda13.0 | cuda12.4 |
|---|---|---|---|
| NVIDIA Data Center GPU Driver | 570.x | 580.x | 550.x |
| OS | ubuntu24.04 | ubuntu24.04 | ubuntu22.04 |
To make sure that the drivers_preset and os values are supported for your platform and Kubernetes version, check the compatibility matrix.
Use the drivers_preset and os values to select the driver branch and, optionally, an operating system (OS) in a node group configuration. For instructions on how to specify these parameters when creating a node group, see How to add nodes with GPUs to a cluster.
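When node groups are created from scripts, it can help to check a preset name before passing it to the CLI. The following is a minimal sketch that encodes the preset table from this section as a shell lookup. The function name `preset_info` is hypothetical, and the values must be kept in sync with the table by hand.

```shell
# Look up the NVIDIA driver branch and the default OS for a driver preset.
# The values are copied from the preset table in this section.
preset_info() {
  case "$1" in
    cuda12.4) echo "550.x ubuntu22.04" ;;
    cuda12.8) echo "570.x ubuntu24.04" ;;
    cuda13.0) echo "580.x ubuntu24.04" ;;
    *) echo "unknown preset: $1" >&2; return 1 ;;
  esac
}

preset_info cuda12.8   # prints: 570.x ubuntu24.04
```

An unknown preset makes the function return a non-zero status, so a calling script can stop before it reaches the node group command.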
How to change the driver preset
To change the driver preset for an existing node group, run:
How to install the drivers and components on existing node groups
You can create a node group without the boot disk image. For example, you may opt not to use the Install NVIDIA GPU drivers and other components option when you create the node group in the web console. In this case, choose one of the following options to install the drivers and components:
- Create a new node group with the image and migrate your workloads to it (recommended)
For instructions, see Moving workload from the existing node group.
- Modify the node group to use the image
How to modify the node group
- CLI
Run the nebius mk8s node-group update command:
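As a hedged sketch of this step, the command can be wrapped in a small function. The `nebius mk8s node-group update` command and the `--template-gpu-settings-drivers-preset` parameter come from this page. The `--id` flag, the function name, and the placeholder values are assumptions, so confirm them with `nebius mk8s node-group update --help` before use.

```shell
# Switch an existing node group to a Managed Kubernetes GPU image.
# $1: node group ID, $2: driver preset (for example, cuda12.8).
# Assumption: the node group is selected with --id.
update_node_group_preset() {
  nebius mk8s node-group update \
    --id "$1" \
    --template-gpu-settings-drivers-preset "$2"
}

# Example call (placeholder ID, left commented out on purpose):
# update_node_group_preset "<node_group_ID>" cuda12.8
```

Keeping the call in a function makes it easy to review the exact flags before anything is changed in the cluster.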
- Manually install NVIDIA operators
You can install Kubernetes operators from NVIDIA that manage components required for GPUs and their networking:
- NVIDIA Network Operator
Installing NVIDIA Network Operator is required when at least one node group in the cluster does not use the boot disk image offered by Managed Kubernetes and satisfies any of the following conditions:
- The node group uses NVIDIA B200 GPUs.
- The node group is added to a GPU cluster for InfiniBand interconnection.
- NVIDIA GPU Operator
Any cluster with at least one node group that has GPUs and does not use the boot disk image offered by Managed Kubernetes must have NVIDIA GPU Operator installed.
With InfiniBand: GPU and network operators
- Prepare your environment:
  - Configure kubectl, the Kubernetes CLI, to work with your cluster:
  For more details, see How to connect to Managed Service for Kubernetes® clusters using kubectl.
  - Install Helm, the package manager for Kubernetes that we will use to install the operator:
  For more ways to install, see the Helm documentation.
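The preparation steps above can be sketched as three shell functions. This is an illustration under stated assumptions, not the exact commands from the product documentation: the `get-credentials` subcommand with `--id` and `--external` is assumed to match your Nebius CLI version, the function names are hypothetical, and the Helm installer URL is the script published by the Helm project.

```shell
# Fetch cluster credentials into kubeconfig. $1: cluster ID.
# Assumption: `get-credentials --id ... --external` exists in your CLI version.
configure_kubectl() {
  nebius mk8s cluster get-credentials --id "$1" --external
}

# Install Helm 3 with the installer script from the Helm project.
install_helm() {
  curl -fsSL -o get_helm.sh \
    https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
  chmod 700 get_helm.sh
  ./get_helm.sh
}

# Print every tool from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example: list which of the required tools still need to be installed.
# missing_tools kubectl helm nebius
```

Nothing runs when the block is sourced. Call `configure_kubectl "<cluster_ID>"` and `install_helm` once you have checked that the flags match your CLI.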
- Install the NVIDIA Network Operator from the Nebius AI Cloud chart repository:
- Verify that the NVIDIA Network Operator installed its components correctly. Get the NICClusterPolicy instance status:
The output example is the following:
While state-OFED is notReady, you can check the driver installation logs:
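One way to watch the state-OFED entry is to poll it until it reports ready. The sketch below rests on several assumptions that you should check against your installation: the resource name `nicclusterpolicies.mellanox.com`, the instance name `nic-cluster-policy`, and a `status.appliedStates` list with `name` and `state` fields. The function names are hypothetical.

```shell
# Print the state that the NIC cluster policy reports for the OFED driver.
# Assumptions: resource name, instance name, and status field layout.
ofed_state() {
  kubectl get nicclusterpolicies.mellanox.com nic-cluster-policy \
    -o jsonpath='{.status.appliedStates[?(@.name=="state-OFED")].state}'
}

# Poll until the OFED state is "ready" or the attempts run out.
# $1: number of attempts (default 30), $2: delay in seconds (default 10).
wait_for_ofed() {
  attempts="${1:-30}"
  delay="${2:-10}"
  while [ "$attempts" -gt 0 ]; do
    [ "$(ofed_state)" = "ready" ] && return 0
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_ofed 60 10` would wait for up to ten minutes and return a non-zero status if the driver is still not ready.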
- Install the NVIDIA GPU Operator from the Nebius AI Cloud chart repository:
GPUDirect RDMA is enabled by default and uses the recommended DMA-BUF Linux kernel module. For more command options, see the NVIDIA GPU Operator documentation.
- Verify that the GPU driver is installed correctly.
Get the last log line from each DaemonSet that installs the driver:
If the last line of each log is "Done, now waiting for signal", the driver should work correctly.
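The log check can be scripted as follows. This is a sketch that works per pod instead of per DaemonSet. The default namespace `gpu-operator`, the label selector `app=nvidia-driver-daemonset`, and the function names are assumptions that depend on how the operator was installed.

```shell
# Succeed only if the given log line is the expected final message.
driver_ready() {
  [ "$1" = "Done, now waiting for signal" ]
}

# Print the readiness of every driver pod based on its last log line.
# $1: namespace of the GPU operator (assumed default: gpu-operator).
check_driver_pods() {
  namespace="${1:-gpu-operator}"
  for pod in $(kubectl get pods -n "$namespace" \
      -l app=nvidia-driver-daemonset -o name); do
    last_line="$(kubectl logs -n "$namespace" "$pod" --tail=1)"
    if driver_ready "$last_line"; then
      echo "$pod: ready"
    else
      echo "$pod: not ready ($last_line)"
    fi
  done
}
```

If a driver pod runs more than one container, `kubectl logs` reads the default container, so you may need to add a `-c` option with the driver container name.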
Without InfiniBand: GPU operator
- Prepare your environment:
  - Configure kubectl, the Kubernetes CLI, to work with your cluster:
  For more details, see How to connect to Managed Service for Kubernetes® clusters using kubectl.
  - Install Helm, the package manager for Kubernetes that we will use to install the operator:
  For more ways to install, see the Helm documentation.
- Install the NVIDIA GPU Operator from the Nebius AI Cloud chart repository:
For more options, see the NVIDIA GPU Operator documentation.
- Verify that the GPU driver is installed correctly.
Get the last log line from each DaemonSet that installs the driver:
If the last line of each log is "Done, now waiting for signal", the driver should work correctly.
Example: Using CUDA for vector addition
To test CUDA support in a cluster whose GPU nodes have drivers installed, you can run a small CUDA application that adds two vectors:
- Connect to the cluster using kubectl.
- Follow instructions in the NVIDIA GPU Operator documentation.
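A minimal sketch of such a test, modeled on the sample Pod from NVIDIA's examples: the block below only writes a manifest file and does not touch the cluster. The image tag, the Pod name, and the file name are assumptions. Choose an image that is compatible with your driver preset.

```shell
# Write a Pod manifest that runs NVIDIA's vectorAdd CUDA sample once.
# Assumption: the image tag below is available and fits your driver version.
cat > cuda-vectoradd.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vectoradd
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
      resources:
        limits:
          nvidia.com/gpu: 1   # request exactly one GPU
EOF
```

To run the test, apply the manifest with `kubectl apply -f cuda-vectoradd.yaml` and read the result with `kubectl logs pod/cuda-vectoradd`. The sample prints `Test PASSED` when the vector addition succeeds.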
See also
- Interconnecting GPUs in a Managed Kubernetes cluster using InfiniBand
- Tutorial: Running NCCL tests in a cluster with InfiniBand-connected GPUs
- Creating and modifying node groups
InfiniBand and InfiniBand Trade Association are registered trademarks of the InfiniBand Trade Association.