- Reuse reserved GPU capacity across training and inference instead of buying pay-as-you-go capacity when demand shifts.
- Release or add specific worker nodes without stopping the entire Soperator cluster.
- Choose which nodes to deprovision, starting with idle nodes when possible.
- Move capacity on your own schedule, without waiting for support to resize the cluster.
Ephemeral nodes are available in Soperator clusters running version 3.0 or later. Nebius must also enable this feature in your cluster.
How ephemeral nodes work
Ephemeral nodes rely on a shared capacity block group. A capacity block reserves a fixed number of GPUs in a region. Workloads that use the same capacity block draw GPUs from that pool. In the typical setup, you have two Managed Service for Kubernetes clusters that share one capacity block:
- A training cluster that runs Managed Service for Soperator.
- An inference cluster that runs inference workloads in one or more node groups.
Both clusters must use the same capacity block group and GPU platform. If you are not sure how your capacity is set up, check the Capacity block groups tab on the Limits page in the web console or contact your Nebius manager.
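
The shared-pool behavior described above can be illustrated with a toy model. This is only a sketch for reasoning about capacity, not a Nebius API: the `CapacityBlock` class, the consumer names and the GPU counts are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CapacityBlock:
    """Toy model of a capacity block: a fixed pool of reserved GPUs (hypothetical)."""

    total_gpus: int
    in_use: dict = field(default_factory=dict)  # consumer name -> GPUs currently held

    def free_gpus(self) -> int:
        """GPUs that no consumer currently holds."""
        return self.total_gpus - sum(self.in_use.values())

    def acquire(self, consumer: str, gpus: int) -> bool:
        """Hand GPUs to a consumer only if the pool has enough free capacity."""
        if gpus > self.free_gpus():
            return False
        self.in_use[consumer] = self.in_use.get(consumer, 0) + gpus
        return True

    def release(self, consumer: str, gpus: int) -> None:
        """Return GPUs to the pool so another consumer can take them."""
        held = self.in_use.get(consumer, 0)
        if gpus > held:
            raise ValueError(f"{consumer} holds only {held} GPUs")
        self.in_use[consumer] = held - gpus


# Illustrative numbers only: training holds the whole block at first.
block = CapacityBlock(total_gpus=64)
block.acquire("training", 64)
print(block.acquire("inference", 16))  # no free GPUs yet, so this fails
block.release("training", 16)          # deprovision training workers first
print(block.acquire("inference", 16))  # now the inference node group can grow
```

The point of the model is the ordering: capacity has to be released by one side before the other side can consume it.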
Deployment scenarios
How you move capacity depends on how your training and inference workloads are deployed. In both cases, GPUs move through the shared capacity block: worker nodes are released from Soperator, then consumed by an inference node group, or the other way around.
Training and inference in separate Kubernetes clusters
In this scenario, one Managed Service for Kubernetes cluster hosts Soperator for training and a separate cluster hosts inference workloads in one or more node groups. Both clusters use the same capacity block group. To move capacity in this layout, see Moving capacity between training and inference workloads.
Training and inference in the same Kubernetes cluster
Soperator and inference can also run in different node groups within a single Managed Service for Kubernetes cluster. You still release worker nodes from the Soperator node group and adjust the inference node group size in the same cluster. To move capacity in this layout, see Moving capacity between training and inference workloads.
Slurm power management commands
In a Soperator cluster with ephemeral nodes enabled, you can provision and deprovision worker nodes with standard Slurm power management commands. Run them from a login node after you connect to the cluster.

| Command | Behavior |
|---|---|
| scontrol power up <node_list> | Provisions the specified nodes in the Soperator cluster if free GPUs are available in the capacity block. |
| scontrol power down <node_list> | For each node, deprovisions the node when it is idle. Without the asap flag, power down has lower priority than starting new jobs from the queue, so Slurm may run queued jobs on the node before powering it down. |
| scontrol power down asap <node_list> | Same as power down, but also drains the nodes so no new jobs are scheduled on them. Slurm waits only for the current job to finish (if any), then deprovisions the node. |
| scontrol power down force <node_list> | Deprovisions the nodes immediately and cancels all jobs running on them. |
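
If you script these commands, a small wrapper can help avoid typos in the mode or node list. The following is a minimal sketch, assuming `scontrol` is on the `PATH` of a login node. The function names and the mode labels (`up`, `down`, `down_asap`, `down_force`) are hypothetical and exist only in this example; the `scontrol` arguments are the ones from the table.

```python
import shlex
import subprocess

# Hypothetical mode labels mapped to the scontrol arguments from the table.
_POWER_ARGS = {
    "up": ["power", "up"],
    "down": ["power", "down"],
    "down_asap": ["power", "down", "asap"],
    "down_force": ["power", "down", "force"],
}


def power_command(mode: str, node_list: str) -> list:
    """Build the argument list for one scontrol power command."""
    if mode not in _POWER_ARGS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(_POWER_ARGS)}")
    if not node_list.strip():
        raise ValueError("node_list must not be empty")
    return ["scontrol", *_POWER_ARGS[mode], node_list]


def run_power_command(mode: str, node_list: str, dry_run: bool = True) -> str:
    """Return the shell-quoted command; execute it only when dry_run is False."""
    cmd = power_command(mode, node_list)
    if not dry_run:
        # Raises CalledProcessError if scontrol exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


# Dry run: prints the command without touching the cluster.
print(run_power_command("down_asap", "worker-[10,11]"))
```

The dry-run default is deliberate: `power down force` cancels running jobs, so it is safer to inspect the generated command before executing it.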
Replace <node_list> in the commands above with a Slurm hostlist of worker node names, for example worker-10,worker-11, worker-[10,11] or worker-[0-3,5-8,13],worker-cpu-18. To check node names and states before you run a command, see How to monitor job and node statuses in a Soperator cluster.
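
Before you power nodes down, it can help to see exactly which nodes a hostlist covers. On a login node, `scontrol show hostnames <node_list>` prints the expanded names. The sketch below reproduces that expansion offline for the simple forms used in the examples above. It is an illustration, not Slurm's own parser: it assumes at most one bracket group per name, and the function names are hypothetical.

```python
import re

# One optional prefix, one bracket group, one optional suffix.
_BRACKET = re.compile(r"^(?P<prefix>[^\[\]]*)\[(?P<body>[^\[\]]+)\](?P<suffix>[^\[\]]*)$")


def split_top_level(hostlist: str) -> list:
    """Split on commas that are not inside square brackets."""
    parts, depth, current = [], 0, []
    for ch in hostlist:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p.strip() for p in parts if p.strip()]


def expand_hostlist(hostlist: str) -> list:
    """Expand a simple Slurm hostlist into individual node names."""
    names = []
    for part in split_top_level(hostlist):
        match = _BRACKET.match(part)
        if match is None:
            if "[" in part or "]" in part:
                raise ValueError(f"unsupported hostlist syntax: {part!r}")
            names.append(part)  # plain node name such as worker-cpu-18
            continue
        prefix, body, suffix = match.group("prefix", "body", "suffix")
        for item in body.split(","):
            low, _, high = item.partition("-")
            high = high or low   # a single index such as "13"
            width = len(low)     # keep zero padding such as 001-010
            for n in range(int(low), int(high) + 1):
                names.append(f"{prefix}{n:0{width}d}{suffix}")
    return names


print(expand_hostlist("worker-[0-3,5-8,13],worker-cpu-18"))
```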
If scontrol power commands do not work, contact technical support and ask them to upgrade the cluster to Soperator 3.0 or later and enable ephemeral nodes on the relevant worker node sets.
Automatic node provisioning
Powered-down ephemeral nodes are automatically provisioned again when queued jobs need them. You can also trigger provisioning by submitting work with srun or sbatch: for ephemeral CLOUD nodes, these commands run ResumeProgram, and Soperator creates the corresponding worker Pods if capacity is available.
If you want to release nodes and keep them powered down even when queued jobs target them, also drain the nodes:
power down command.
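
As a sketch of the drain step, the helper below builds a drain command with the standard Slurm `scontrol update NodeName=... State=DRAIN Reason=...` syntax. This is an assumption based on general Slurm usage, not necessarily the exact command this page intends. The function name and the example reason text are hypothetical.

```python
import shlex


def drain_command(node_list: str, reason: str) -> list:
    """Build a standard Slurm drain command (assumed syntax, see the note above)."""
    if not node_list.strip():
        raise ValueError("node_list must not be empty")
    if not reason.strip():
        # Slurm expects a reason when an administrator drains a node.
        raise ValueError("reason must not be empty")
    return [
        "scontrol",
        "update",
        f"NodeName={node_list}",
        "State=DRAIN",
        f"Reason={reason}",
    ]


# Print a shell-safe command line to review before running it on a login node.
print(shlex.join(drain_command("worker-[10,11]", "moved to inference")))
```

Printing the command instead of executing it keeps the sketch side-effect free. Run the resulting command yourself once you have confirmed the node list.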