Soperator is an open-source solution from Nebius that lets you consolidate Slurm and Kubernetes into a single infrastructure. You can manage nodes by using standard Kubernetes resources and run machine learning experiments by using Slurm. You can use Soperator in the following environments:

InfiniBand and InfiniBand Trade Association are registered trademarks of the InfiniBand Trade Association.

About Slurm and Soperator

Get acquainted with key Soperator features

Deploying Soperator clusters

Choose a deployment method that fits your use case best

Creating clusters in Managed Service for Soperator

Get started with Slurm and Soperator in Nebius AI Cloud with minimum effort

Connecting to Slurm nodes

Connect to login and worker nodes, so you can start managing machine learning workloads

Running batch jobs

Define, configure and launch your workloads in Slurm

Managing jobs and queue

View and control current and historical jobs

Running the all-reduce NCCL test

Check the NVLink and InfiniBand™ performance between GPUs on one or multiple nodes

Managing users

Create users, so they can connect to Slurm nodes

Monitoring job and node statuses

Get up-to-date information about Slurm jobs and nodes