# Soperator cluster architecture

Soperator deploys Slurm to Kubernetes® clusters. In a Soperator cluster, Slurm nodes, storage and other components are Kubernetes resources: Pods, PersistentVolumes, etc.

The diagram below outlines the architecture of a Soperator cluster:

<Frame>
  <img src="https://mintcdn.com/nebius-ai-cloud/1Ha0sWR6e1mnIaHS/_assets/slurm-soperator/soperator-arch.svg?fit=max&auto=format&n=1Ha0sWR6e1mnIaHS&q=85&s=3e8b01d9a537b075dd8b2c0393d47f4e" alt="soperator-architecture" width="1232" height="633" data-path="_assets/slurm-soperator/soperator-arch.svg" />
</Frame>

## Cluster specification and Slurm configuration

When Soperator is installed in a Kubernetes cluster, it adds the SlurmCluster [custom resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) to that cluster. This resource contains the specification of the Slurm cluster deployed in the Kubernetes cluster. The Slurm operator itself is a Pod that uses the SlurmCluster specification to create and reconcile the Kubernetes resources in the Slurm cluster, such as login, worker and controller nodes, and storage resources.

The configuration files of Slurm itself (`slurm.conf`, `gres.conf`, `cgroup.conf`, `plugstack.conf`, etc.) are [Kubernetes ConfigMaps](https://kubernetes.io/docs/concepts/configuration/configmap/) controlled by the Slurm operator.
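The create-and-reconcile behavior described above can be illustrated with a minimal Python sketch. It is a conceptual model only and not Soperator's implementation: the `plan_reconcile` function, the dictionary layout and the replica counts are hypothetical, and a real operator works against the Kubernetes API rather than plain dictionaries.

```python
from typing import Dict, List, Tuple


def plan_reconcile(
    desired: Dict[str, int], observed: Dict[str, int]
) -> List[Tuple[str, str, int]]:
    """Return (action, role, count) steps that move observed towards desired."""
    steps = []
    for role in sorted(set(desired) | set(observed)):
        want = desired.get(role, 0)
        have = observed.get(role, 0)
        if want > have:
            steps.append(("create", role, want - have))
        elif want < have:
            steps.append(("delete", role, have - want))
    return steps


# Hypothetical specification (what the custom resource asks for)
desired = {"controller": 1, "login": 2, "worker": 4}
# Hypothetical current state (what actually runs in the cluster)
observed = {"controller": 1, "login": 1, "worker": 5}

plan = plan_reconcile(desired, observed)
print(plan)
```

When the observed state already matches the specification, the plan is empty, which is the steady state that a reconcile loop converges to.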

## Nodes

In Soperator clusters, all Slurm nodes are Kubernetes Pods. The main [types of Slurm nodes](https://slurm.schedmd.com/quickstart_admin.html#nodes) in Soperator clusters are the following:

* [Login nodes](/slurm-soperator/overview/architecture#login-node) provide users with access to the cluster.
* [Worker nodes](/slurm-soperator/overview/architecture#worker-node) execute Slurm jobs.
* [Controller nodes](/slurm-soperator/overview/architecture#controller-node) manage scheduling and orchestration.

For simplicity, the diagram in this article omits some nodes, for example, [DBD (database daemon) nodes](https://slurm.schedmd.com/quickstart_admin.html#dbd) for accounting, nodes that export metrics, and nodes that other Kubernetes operators manage for backups and auto-healing.

### Login nodes

To work with a Slurm cluster (submit jobs, check their status, write `sbatch` scripts and prepare data for them, etc.), users connect to its *login nodes*. The `sshd` daemon runs on every login node.

Soperator balances load between login nodes: each time a user connects to the cluster via SSH, they are directed to a random login node.
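The effect of random selection can be shown with a short Python simulation. This is a sketch, not the mechanism Soperator uses: the `pick_login_node` function and the node names are hypothetical, and it assumes that every login node is equally likely to be chosen.

```python
import random
from collections import Counter


def pick_login_node(login_nodes, rng=random):
    """Pick one login node uniformly at random for a new SSH connection."""
    if not login_nodes:
        raise ValueError("no login nodes available")
    return rng.choice(login_nodes)


login_nodes = ["login-0", "login-1", "login-2"]  # hypothetical node names
rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Simulate 3000 independent SSH connections and count where they land.
connections = Counter(pick_login_node(login_nodes, rng) for _ in range(3000))
print(connections)
```

Over many connections, each login node receives roughly the same share, although consecutive connections from the same user can land on different nodes.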

### Worker nodes

*Worker nodes*, also known as *compute nodes*, perform computations for Slurm jobs. The [slurmd](https://slurm.schedmd.com/slurmd.html) daemon runs on every worker node. It monitors, launches and terminates jobs.

For more information on how to work with login and worker nodes, see [Connecting to login and worker nodes](/slurm-soperator/clusters/connect).

### Controller nodes

*Controller nodes* orchestrate Slurm activities, such as job queuing, monitoring node states and allocating resources. The central management daemon, [slurmctld](https://slurm.schedmd.com/slurmctld.html), runs on all controller nodes.

## Persistent storage

Soperator's main storage feature is its *shared root filesystem*. It is mounted on all login and worker nodes so that you see it as the root directory (`/`) in your SSH sessions and Slurm jobs. This preserves the traditional Slurm user experience, where you work with the entire root filesystem on each node.

The filesystem is shared, which means you do not need to keep it identical across nodes manually. When you make changes to the filesystem on one node, these changes automatically show up on other nodes.

The shared root filesystem is implemented as a Kubernetes [PersistentVolume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) (PV) that ensures data is preserved when nodes restart.

Soperator also uses PVs for system needs, such as storing cluster and controller states.

## Ephemeral storage

[Local SSD disks](/compute/storage/types#local-ssd-disks) are available on [supported platforms, presets and regions](/compute/storage/local-disks#availability). Unlike the shared root filesystem, local SSD disks are added to an individual node and are not shared across the cluster.

Use local SSD disks for data that can be recreated and benefits from high performance and low latency, such as scratch space, caches and intermediate files created by Slurm jobs. For durable or shared data, consider using persistent storage.
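The following Python sketch illustrates this split between scratch and durable data. The directory names are hypothetical stand-ins created in a throwaway location; in a real job, you would point `scratch_root` at the local SSD mount point and `durable_dir` at a directory in persistent storage.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_scratch(scratch_root: Path, durable_dir: Path) -> Path:
    """Do intermediate work in scratch space and keep only the final result."""
    durable_dir.mkdir(parents=True, exist_ok=True)
    # The temporary directory and its contents are deleted when the block exits.
    with tempfile.TemporaryDirectory(dir=scratch_root) as tmp:
        intermediate = Path(tmp) / "partial.txt"
        intermediate.write_text("intermediate data\n")
        result = Path(tmp) / "result.txt"
        result.write_text(intermediate.read_text().upper())
        # Copy the result to durable storage before the scratch data is removed.
        return Path(shutil.copy2(result, durable_dir / "result.txt"))


# Demo with throwaway directories standing in for the real mount points.
with tempfile.TemporaryDirectory() as demo:
    scratch_root = Path(demo) / "local-ssd"      # hypothetical local SSD mount
    durable_dir = Path(demo) / "shared-results"  # hypothetical persistent directory
    scratch_root.mkdir()
    saved = run_with_scratch(scratch_root, durable_dir)
    saved_text = saved.read_text()
    leftover = list(scratch_root.iterdir())
    print(saved_text, leftover)
```

Only the final result is copied out; the intermediate file lives and dies in scratch space, so losing the local disk costs nothing that cannot be recreated.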

## See also

For more information about the Soperator cluster architecture, see:

* [Architecture](https://github.com/nebius/soperator/blob/dev/docs/architecture.md) in Soperator's GitHub repository.
* [Explaining Soperator, Nebius' open-source Kubernetes operator for Slurm](https://nebius.com/blog/posts/soperator-in-open-source-explained) in the Nebius blog.
