Knowing the InfiniBand topology of a GPU cluster helps you increase the performance of multi-VM jobs.

How to get InfiniBand topology of a GPU cluster

After you create a GPU cluster and add VMs to it, you can view the InfiniBand topology of this cluster and its VMs.
To get the topology for all VMs in a GPU cluster, run the following command:
nebius compute gpu-cluster get --id computegpucluster-***
In the --id parameter, specify the GPU cluster ID. The output looks like this:
...
status:
  infiniband_topology_path:
    instances:
      - instance_id: computeinstance-***rnqz
        path:
          - ***27bf
          - ***bb9b
          - ***b7ad
      - instance_id: computeinstance-***pepp
        path:
          - ***27bf
          - ***bb9b
          - ***e1ff
      - ...
The path parameter shows components of the InfiniBand network layers:
path:
  - ***27bf # 1st network layer (InfiniBand fabric)
  - ***bb9b # 2nd network layer (point of delivery, POD)
  - ***e1ff # 3rd network layer (scalable unit, SU)
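Because VMs that share lower network layers are closer to each other in the fabric, you can group them by the last element of path before scheduling tightly coupled jobs. The sketch below is a minimal, hypothetical example: the instance IDs and switch names are placeholders, not real CLI output.

```python
# Sketch: group GPU cluster VMs by their leaf InfiniBand network layer.
# The instance IDs and switch names are hypothetical placeholders, not
# real output of `nebius compute gpu-cluster get`.
from collections import defaultdict

# (instance_id, path) pairs, as in status.infiniband_topology_path.instances
instances = [
    ("computeinstance-aaa", ["fabric-1", "pod-1", "su-1"]),
    ("computeinstance-bbb", ["fabric-1", "pod-1", "su-1"]),
    ("computeinstance-ccc", ["fabric-1", "pod-1", "su-2"]),
]

# Instances that share the last (3rd) layer sit in the same scalable unit,
# so communication between them crosses the fewest switches.
leaf_groups = defaultdict(list)
for instance_id, path in instances:
    leaf_groups[path[-1]].append(instance_id)

for leaf, members in sorted(leaf_groups.items()):
    print(f"{leaf}: {', '.join(members)}")
```

In this example, computeinstance-aaa and computeinstance-bbb end up in the same group (su-1), so traffic between them stays within one scalable unit.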

How to create the Slurm topology configuration

To set up topology-aware scheduling in Slurm and run jobs for ML workloads, you need the topology.conf file. This file describes the network hierarchy: how nodes are interconnected, which layers they belong to and which switches are used. The topology.conf file represents the InfiniBand topology, and you can create it from the information about a given GPU cluster. To create the file, run one of the scripts from the Nebius AI Cloud solution library on GitHub.
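For illustration, a topology.conf built from the three-layer paths above could look like the following sketch. The switch and node names are hypothetical; leaf switches (scalable units) list nodes, and higher layers list the switches below them:

```
# Hypothetical topology.conf derived from a three-layer InfiniBand topology.
# Leaf switches (3rd layer, SU) connect nodes; upper layers connect switches.
SwitchName=su-1 Nodes=worker-[0-31]
SwitchName=su-2 Nodes=worker-[32-63]
SwitchName=pod-1 Switches=su-[1-2]
SwitchName=fabric-1 Switches=pod-1
```

With this file in place and the tree topology plugin enabled, Slurm can place a job's tasks on nodes that share the lowest possible network layer.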
InfiniBand and InfiniBand Trade Association are registered trademarks of the InfiniBand Trade Association.