Prerequisites
- Reserve a capacity block group that your Soperator worker nodes and inference node group share.
- Make sure your Soperator cluster runs version 3.0 or later and has ephemeral nodes enabled. Nebius enables ephemeral nodes when it provisions the cluster, so when you request the cluster, ask your Nebius manager or technical support to enable them on the relevant worker node sets. If scontrol power commands do not work, ask support to upgrade the cluster or enable ephemeral nodes.
- Set up an inference node group that uses the same capacity block group:
  - Separate clusters: Create a Managed Service for Kubernetes cluster for inference workloads and add a node group to it.
  - Same cluster: Add a node group for inference workloads to the cluster that already runs Soperator.
- Make sure you are in a group that has at least the editor role within your tenant or project, for example, the default editors group. You can check this in the Administration → IAM section of the web console.
- Generate an SSH key pair and set up access to a login node in the Soperator cluster.
How to move capacity from training to inference
When training nodes are idle but inference needs more GPUs, release nodes from Soperator and add them to the inference node group.
- Connect to a login node in the Soperator cluster.
- List worker nodes and their states.
  Choose nodes that are idle or that you are ready to drain. See node states for details.
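The listing step above can be sketched with the standard Slurm sinfo client on the login node. This is a minimal sketch, not the command from the original guide: the function names are hypothetical. It also assumes sinfo's compact state column (%t), where a powered-down idle node shows as idle~, so only an exact idle match counts as a node ready to release.

```shell
# Hypothetical helpers around the standard Slurm sinfo client.

# One line per node (-N), no header (-h): node name, partition, compact state.
list_node_states() {
  sinfo -N -h -o "%N %P %t"
}

# Names of powered-on idle nodes, one per line. Powered-down nodes show
# "idle~" in the compact state, so the exact match on "idle" skips them.
# sort -u removes duplicates for nodes that belong to several partitions.
list_idle_nodes() {
  sinfo -N -h -o "%N %t" | awk '$2 == "idle" { print $1 }' | sort -u
}

# The same idle nodes joined with commas, usable as <node_list>.
idle_node_list() {
  list_idle_nodes | paste -sd, -
}
```

A comma-separated list such as worker-10,worker-11 is already a valid Slurm hostlist, so you can pass the output of idle_node_list directly to the release commands.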
- Release the chosen nodes from the Soperator cluster. Use one of the following commands:
  - To deprovision nodes when they are idle, run plain scontrol power down <node_list>. Without the asap parameter, power down has lower priority than starting new jobs from the queue, so Slurm may run queued jobs on a node before powering it down.
  - To stop scheduling new jobs on the nodes and deprovision them as soon as the current job (if any) finishes, run scontrol power down asap <node_list>.
  - To power down nodes immediately and cancel running jobs, run scontrol power down force <node_list>.
  Replace <node_list> in these commands with a Slurm hostlist of worker node names, for example worker-10,worker-11 or its equivalent worker-[10,11], or worker-[0-3,5-8,13],worker-cpu-18.
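The three release options map onto Slurm's scontrol power down command with no modifier, with asap, and with force. Here is a minimal sketch with hypothetical wrapper names. Each function takes a single Slurm hostlist argument.

```shell
# Hypothetical wrappers for the three ways to release nodes from Soperator.
# Each takes one Slurm hostlist argument, for example "worker-[10,11]".

# Power down once the nodes are idle; queued jobs may still run there first.
release_when_idle() {
  scontrol power down "$1"
}

# Stop scheduling new jobs, let the running job finish, then power down.
release_after_running_jobs() {
  scontrol power down asap "$1"
}

# Power down right away and cancel any jobs running on the nodes.
release_now() {
  scontrol power down force "$1"
}

# Usage:
#   release_after_running_jobs "worker-[10,11]"
```

Quoting the hostlist keeps the shell from treating the square brackets as a filename pattern.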
- To prevent the nodes from powering back up automatically when queued jobs target them, drain them by setting their state to DRAIN with scontrol update. Use the same Slurm hostlist as in the power down command. For details, see Automatic node provisioning.
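Here is a minimal sketch of the drain step using scontrol update. Slurm expects a reason when a node is drained. The function name and the default reason text are hypothetical.

```shell
# Hypothetical wrapper: set the DRAIN state so queued jobs that target the
# nodes do not power them back up. Slurm expects a Reason for DRAIN.
drain_nodes() {
  # $1: the same hostlist passed to scontrol power down
  # $2: optional reason; defaults to a hypothetical label
  scontrol update NodeName="$1" State=DRAIN Reason="${2:-moved to inference}"
}

# Usage:
#   drain_nodes "worker-[10,11]"
```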
- Wait until the nodes are powered down. Confirm their state by listing the nodes again.
Powered-down ephemeral nodes remain in the node list with a powered-down cloud state. They no longer run worker Pods.
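The wait can be scripted as a polling loop. This sketch assumes sinfo's compact state column, where a ~ suffix marks a node that is powered off. The function names and the 30-second interval are hypothetical choices.

```shell
# Hypothetical check: succeed only if sinfo reports every node in the
# hostlist with a compact state ending in "~" (powered off).
all_powered_down() {
  sinfo -N -h -n "$1" -o "%N %t" | awk '
    { total++ }
    $2 ~ /~$/ { down++ }
    END { exit !(total > 0 && down == total) }'
}

# Poll every 30 seconds until all released nodes are powered down.
wait_powered_down() {
  until all_powered_down "$1"; do
    sleep 30
  done
}

# Usage:
#   wait_powered_down "worker-[10,11]"
```

The check fails when sinfo returns no lines. A mistyped hostlist therefore makes the loop keep waiting instead of reporting success.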
- In the Managed Service for Kubernetes cluster that hosts your inference workloads, increase the inference node group size by the same number of nodes you released from Soperator. Use a node group that draws GPUs from the same capacity block group.
How to move capacity from inference to training
When inference traffic drops and you want to run training jobs on idle GPUs, scale down the inference node group and power worker nodes back on in Soperator.
- In the Managed Service for Kubernetes cluster that hosts your inference workloads, reduce the inference node group size by the number of nodes you want to move to training. Wait until the nodes are removed and their GPUs are released to the capacity block.
- Connect to a login node in the Soperator cluster.
- If you drained the nodes when you released them, resume them by setting their state to RESUME with scontrol update.
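The resume step can be written as a counterpart to the drain sketch. This is a minimal example, and the function name is hypothetical.

```shell
# Hypothetical wrapper: clear the DRAIN state set while releasing the nodes,
# so Slurm can schedule jobs on them again once they are powered on.
resume_nodes() {
  scontrol update NodeName="$1" State=RESUME
}

# Usage:
#   resume_nodes "worker-[10,11]"
```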
- Power on the worker nodes in Soperator with scontrol power up <node_list>.
  Replace <node_list> with a Slurm hostlist of the worker node names to bring back, for example worker-10,worker-11 or its equivalent worker-[10,11], or worker-[0-3,5-8,13],worker-cpu-18. Soperator creates worker Pods for the requested nodes if enough free GPUs remain in the capacity block. Alternatively, submit a job with srun or sbatch that needs those nodes; Slurm may then power them on automatically. For details, see Automatic node provisioning.
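Here is a minimal sketch of the power-on step with scontrol power up. It also uses scontrol show hostnames, which expands a hostlist into individual node names so you can check which nodes an expression covers before you use it. The function names are hypothetical.

```shell
# Hypothetical wrapper: ask Slurm to power the nodes back on; Soperator then
# creates worker Pods if the capacity block has enough free GPUs.
power_up_nodes() {
  scontrol power up "$1"
}

# Print one node name per line for a hostlist expression.
expand_hostlist() {
  scontrol show hostnames "$1"
}

# Usage:
#   expand_hostlist "worker-[0-3,5-8,13],worker-cpu-18"
#   power_up_nodes "worker-[10,11]"
```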
- Confirm that the nodes are available for scheduling by listing their states again. The powered-on nodes should reach the idle state once they are ready for new jobs.
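To script this confirmation, poll until every node reports the compact state idle. The sketch assumes sinfo's power suffixes (~ powered off, # powering up, % powering down) are appended to the compact state. The function names and the interval are hypothetical.

```shell
# Hypothetical check: succeed only if every node in the hostlist has the
# compact state "idle" (powered on, no ~, #, or % power suffix).
all_idle() {
  sinfo -N -h -n "$1" -o "%N %t" | awk '
    { total++ }
    $2 == "idle" { ready++ }
    END { exit !(total > 0 && ready == total) }'
}

# Poll every 30 seconds until the powered-on nodes are idle.
wait_until_idle() {
  until all_idle "$1"; do
    sleep 30
  done
}

# Usage:
#   wait_until_idle "worker-[10,11]"
```

If a queued job starts on a node before the loop sees it idle, that node reports alloc or mix instead. The loop would then keep waiting, so treat it as a convenience rather than a readiness guarantee.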