You can create and manage Soperator clusters in Managed Service for Soperator in the web console. A cluster includes login, controller and worker nodes, and provides a full Slurm environment for batch and interactive workloads.

Prerequisites

  • If you need worker nodes with GPUs, make sure that you have capacity block groups that reserve GPUs.
  • Make sure you are in a group that has at least the editor role within your tenant; for example, the default editors group. You can check this in the Administration → IAM section of the web console.
  • Generate at least one SSH key pair to connect to Slurm login nodes as the default root user:
    If you do not have an SSH key pair, generate it on your local machine:
    1. In the terminal, go to the ~/.ssh directory:
      cd ~/.ssh
      
    2. Create an SSH key pair:
      ssh-keygen -t ed25519 -C "<comment>"
      
      -C "<comment>" is optional but it helps distinguish the key from others.
    3. At the prompt that appears, enter the following information:
      • Name of the file where the key should be stored.
      • Passphrase for the key. Press Enter if you do not want to use a passphrase.
    4. Get the contents of the generated public key:
      cat <file_name>.pub
      
      Use the file name that you specified during the key pair creation.
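    The steps above can also be run non-interactively in one pass. A sketch using example values (the file name soperator_key and the comment are illustrative, not required):

    ```shell
    # Create an ed25519 key pair in one command:
    #   -f sets the output file, -N "" sets an empty passphrase
    # (run interactively, ssh-keygen prompts for both instead).
    ssh-keygen -t ed25519 -C "soperator-login" -f ~/.ssh/soperator_key -N ""

    # Print the public key (ssh-ed25519 AAA...) to paste into
    # the cluster configuration later:
    cat ~/.ssh/soperator_key.pub
    ```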

How to create a cluster

  1. In the sidebar, go to Compute → Soperator.
  2. Click Create cluster.
  3. In the Overview, configure the cluster’s general parameters:
    1. Enter the cluster name.
    2. Add one or more SSH public keys (ssh-ed25519 AAA***) to access the login node.
  4. Configure node sets. A cluster must have a login node set and at least one worker node set.
    1. For the login node set, specify the number of nodes.
    2. For each worker node set, specify its configuration. If you already have a worker node set and want to create another one with the same configuration, click ⋮ → Clone node set next to the existing node set.
    Each cluster also contains service nodes: controller nodes, accounting nodes and Soperator system nodes. They are subject to billing and quotas in the same way as login and worker nodes.
  5. Add visible and, optionally, hidden partitions. You can use the default partitions or define your own. Partitions group nodes into logical (and possibly overlapping) sets and define how workloads are scheduled on those node sets. See details in the Slurm quickstart on partitions. In Managed Soperator, hidden partitions are not listed by Slurm CLI tools (per the Hidden parameter in slurm.conf), but you can create and manage them in the web console and other Nebius AI Cloud interfaces. For each partition, specify:
    • PartitionName: A unique partition name that you will use when submitting jobs.
    • Nodes: The worker node sets that the partition can schedule jobs on. A partition can include one or more node sets, and a node set can belong to more than one partition.
    • PriorityTier: Determines how Soperator prioritizes partitions when resources are limited. When jobs from different partitions compete for the same resources, jobs from the partition with the higher priority tier are favored.
    • DefaultTime: The default time limit for jobs submitted to the partition, in HH:MM:SS format. Jobs inherit this limit unless they have a different time in their submission settings.
    • DefMemPerNode: The default amount of memory available to each node for jobs scheduled in the partition. Must not exceed the node capacity.
    • PreemptMode: Preemption mode controls what happens to currently running jobs when higher priority jobs require resources.
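    The partition parameters above set defaults that individual jobs can override at submission time. A minimal job script sketch, assuming a visible partition named main exists in your cluster (the partition name and resource values are examples, not required values):

    ```shell
    #!/bin/bash
    #SBATCH --partition=main        # PartitionName to submit to (hypothetical name)
    #SBATCH --time=00:30:00         # overrides the partition's DefaultTime (HH:MM:SS)
    #SBATCH --mem=8G                # overrides DefMemPerNode for this job

    hostname
    ```

    Submit it with `sbatch job.sh`. Jobs that omit `--time` or `--mem` inherit the partition's DefaultTime and DefMemPerNode; jobs that omit `--partition` go to the default partition.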
  6. Add volumes. A cluster can include cluster, shared, local and memory volumes.
    • Cluster volumes are created per cluster and are available to all node sets.
    • Shared volumes are created per project and are available to all node sets.
    • Local volumes are created per node and store temporary or runtime data.
    • Memory volumes store data in RAM. You cannot change them.
    For more information about cluster and shared volumes, see Types of storage volumes in Compute.
  7. Review the configuration on the Review page and click Create cluster.
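Once the cluster is created, you can connect to a login node with the SSH key you added. A sketch of the connection flow; `<login-node-address>` is a placeholder for the address shown on the cluster's page in the console, and the key path assumes the example file name from the prerequisites:

```shell
# Connect to a Slurm login node as root (the default user per this guide):
ssh -i ~/.ssh/soperator_key root@<login-node-address>

# Once connected, verify the Slurm environment, e.g.:
sinfo     # lists partitions and node states
squeue    # lists queued and running jobs
```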

What’s next

How to delete a cluster

When you delete a cluster, all data stored on its nodes and volumes is permanently removed. If you want to stop using the cluster temporarily and save costs, stop it instead of deleting it.
  1. In the sidebar, go to Managed Soperator.
  2. In the list of clusters, find the one that you want to delete.
  3. Next to the cluster, click ⋮ → Delete.
  4. Enter the cluster name to confirm and click Delete cluster.
