You can group your virtual machines with GPUs into a GPU cluster to accelerate high-performance computing (HPC) tasks such as training and inference, which require more processing power than a single VM can provide. GPU clusters are built on secure, high-speed InfiniBand networking: each GPU in a VM is connected through a network interface card (NIC) that provides 400 Gbps of bandwidth. Since a compute VM for GPU clusters contains 8 GPUs, the total bandwidth of a node is 3.2 Tbps. Nebius AI Cloud uses GPUDirect RDMA, an NVIDIA remote direct memory access (RDMA) technology that lets data flow directly between each GPU and its NIC, bypassing the CPU and thus speeding up data exchange.
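
As a quick sanity check, the per-node figure follows directly from the NIC count and the per-NIC rate quoted above (a minimal illustration, not an API call):

```python
# Aggregate InfiniBand bandwidth of one GPU cluster node:
# each of the 8 GPUs has a dedicated 400 Gbps NIC.
nic_rate_gbps = 400   # per-NIC InfiniBand bandwidth, Gbps
gpus_per_node = 8     # GPUs (and thus NICs) per compute VM

node_bandwidth_tbps = nic_rate_gbps * gpus_per_node / 1000
print(f"{node_bandwidth_tbps} Tbps per node")  # 3.2 Tbps
```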

InfiniBand fabrics

Each GPU cluster is created in one of the physical InfiniBand fabrics, where GPUs interconnected over InfiniBand are located. Each fabric has limited GPU capacity. When creating a GPU cluster, select an InfiniBand fabric for it, taking into account the type of GPUs you are going to use. For example, if you select fabric-7, you can only add VMs on the NVIDIA® H200 NVLink with Intel Sapphire Rapids platform to the cluster. Available fabrics and their corresponding regions (private regions are marked with *):
| Fabric | GPU platform | Region |
| --- | --- | --- |
| fabric-2 | NVIDIA® H100 NVLink with Intel Sapphire Rapids (gpu-h100-sxm) | eu-north1 |
| fabric-3 | NVIDIA® H100 NVLink with Intel Sapphire Rapids (gpu-h100-sxm) | eu-north1 |
| fabric-4 | NVIDIA® H100 NVLink with Intel Sapphire Rapids (gpu-h100-sxm) | eu-north1 |
| fabric-5 | NVIDIA® H200 NVLink with Intel Sapphire Rapids (gpu-h200-sxm) | eu-west1 |
| fabric-6 | NVIDIA® H100 NVLink with Intel Sapphire Rapids (gpu-h100-sxm) | eu-north1 |
| fabric-7 | NVIDIA® H200 NVLink with Intel Sapphire Rapids (gpu-h200-sxm) | eu-north1 |
| eu-north2-a | NVIDIA® H200 NVLink with Intel Sapphire Rapids (gpu-h200-sxm) | eu-north2 |
| me-west1-a | NVIDIA® B200 NVLink with Intel Emerald Rapids (gpu-b200-sxm-a) | me-west1 |
| uk-south1-a | NVIDIA® B300 NVLink with Intel Granite Rapids (gpu-b300-sxm) | uk-south1 |
| us-central1-a | NVIDIA® H200 NVLink with Intel Sapphire Rapids (gpu-h200-sxm) | us-central1 |
| us-central1-b | NVIDIA® B200 NVLink with Intel Emerald Rapids (gpu-b200-sxm) | us-central1 |
In most cases, you do not need to change the preselected fabric. We recommend creating a GPU cluster in a different fabric only if you need a GPU platform that your current fabric does not offer, or if you experience capacity issues with an existing GPU cluster.
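
If you provision clusters from scripts, it can help to encode the fabric-to-platform mapping from the table so that a fabric is always paired with a compatible GPU platform. A minimal sketch (the dictionary mirrors the table above; the helper function is hypothetical, not part of any SDK):

```python
# Fabric -> (GPU platform, region), mirroring the table above.
FABRICS = {
    "fabric-2": ("gpu-h100-sxm", "eu-north1"),
    "fabric-3": ("gpu-h100-sxm", "eu-north1"),
    "fabric-4": ("gpu-h100-sxm", "eu-north1"),
    "fabric-5": ("gpu-h200-sxm", "eu-west1"),
    "fabric-6": ("gpu-h100-sxm", "eu-north1"),
    "fabric-7": ("gpu-h200-sxm", "eu-north1"),
    "eu-north2-a": ("gpu-h200-sxm", "eu-north2"),
    "me-west1-a": ("gpu-b200-sxm-a", "me-west1"),
    "uk-south1-a": ("gpu-b300-sxm", "uk-south1"),
    "us-central1-a": ("gpu-h200-sxm", "us-central1"),
    "us-central1-b": ("gpu-b200-sxm", "us-central1"),
}

def fabrics_for_platform(platform: str) -> list[str]:
    """Return the fabrics that can host the given GPU platform."""
    return [name for name, (p, _region) in FABRICS.items() if p == platform]

print(fabrics_for_platform("gpu-h200-sxm"))
# ['fabric-5', 'fabric-7', 'eu-north2-a', 'us-central1-a']
```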

How to enable InfiniBand for VMs with GPUs

  1. Create a GPU cluster:
    1. In the sidebar, go to Compute → GPU clusters.
    2. Click Create GPU cluster.
    3. On the page that opens, specify the cluster name. The name must be 3 to 63 characters long and may contain lowercase letters, numbers, and hyphens.
    4. Select the InfiniBand fabric.
    5. Click Create GPU cluster.
  2. Add VMs to the cluster. You can only do this when creating the VMs:
    All virtual machines added to the GPU cluster, including Managed Service for Kubernetes® nodes, must be in the same project.
    1. In the sidebar, go to Compute → Virtual machines.
    2. Click Create virtual machine.
    3. On the page that opens, specify the VM’s details and select the GPU cluster name in the GPU cluster list.
You can also create a GPU cluster while creating the first VM in it:
  1. In the Computing resources section of the VM creation form:
    1. Select With GPU.
    2. Select a platform and a preset compatible with GPU clusters. The compatible platforms and presets:
      | Platform | Presets | Regions |
      | --- | --- | --- |
      | NVIDIA® B300 NVLink with Intel Granite Rapids (gpu-b300-sxm) | 8gpu-192vcpu-2768gb | uk-south1 |
      | NVIDIA® B200 NVLink with Intel Emerald Rapids (gpu-b200-sxm) | 8gpu-160vcpu-1792gb | us-central1 |
      | NVIDIA® B200 NVLink with Intel Emerald Rapids (gpu-b200-sxm-a) | 8gpu-160vcpu-1792gb | me-west1 |
      | NVIDIA® H200 NVLink with Intel Sapphire Rapids (gpu-h200-sxm) | 8gpu-128vcpu-1600gb | eu-north1, eu-north2, eu-west1, us-central1 |
      | NVIDIA® H100 NVLink with Intel Sapphire Rapids (gpu-h100-sxm) | 8gpu-128vcpu-1600gb | eu-north1 |
  2. In the Boot disk section of the VM creation form, select the boot disk for NVIDIA GPUs. For details, see Boot disk images for Compute virtual machines.
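
After a VM from the cluster boots, you can check from inside the guest that its GPUs and InfiniBand NICs are visible. A minimal sketch, assuming the standard NVIDIA driver tools and the libibverbs utilities (`nvidia-smi`, `ibv_devinfo`) are present on the boot image:

```python
import subprocess

def run(cmd: list[str]) -> str:
    """Run a diagnostic command and return its stdout."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# List the GPUs the driver sees; a GPU cluster node should report 8.
gpus = run(["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"]).splitlines()
print(f"{len(gpus)} GPUs visible")

# List the InfiniBand devices reported by the verbs stack; with one
# 400 Gbps NIC per GPU, expect one device per GPU.
print(run(["ibv_devinfo", "-l"]))
```

If the verbs stack lists no devices, double-check that the VM was added to a GPU cluster at creation time, since this cannot be changed afterwards.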

How to test the connection with the NCCL tests

To test InfiniBand performance in a GPU cluster, you can run the NVIDIA NCCL tests in it. For instructions, see our tutorial on running distributed jobs with mpirun, which uses the NCCL tests as an example.
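
As a rough illustration of what such a run looks like, the sketch below launches the `all_reduce_perf` binary from NVIDIA's nccl-tests across two 8-GPU nodes with Open MPI. The binary path, hostfile, and rank count are assumptions for illustration; follow the tutorial for the authoritative setup:

```python
import subprocess

# Hypothetical paths; adjust to your environment.
NCCL_TEST = "/opt/nccl-tests/build/all_reduce_perf"  # assumed nccl-tests build location
HOSTFILE = "hostfile"  # one line per node, e.g. "node-0 slots=8"

cmd = [
    "mpirun",
    "--hostfile", HOSTFILE,
    "-np", "16",              # 2 nodes x 8 GPUs, one MPI rank per GPU
    "-x", "NCCL_DEBUG=INFO",  # print NCCL's transport selection
    NCCL_TEST,
    "-b", "512M",             # starting message size
    "-e", "8G",               # ending message size
    "-f", "2",                # double the message size each step
    "-g", "1",                # one GPU per rank
]
subprocess.run(cmd, check=True)
```

In the `NCCL_DEBUG=INFO` output, lines mentioning `NET/IB` confirm that inter-node traffic is going over InfiniBand rather than falling back to TCP.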

InfiniBand and InfiniBand Trade Association are registered trademarks of the InfiniBand Trade Association.