Cluster specification and Slurm configuration
When Soperator is installed in a Kubernetes cluster, it adds the SlurmCluster custom resource to it. This resource contains the specification of the Slurm cluster deployed in the Kubernetes cluster. The Slurm operator itself is a Pod that uses the SlurmCluster specification to create and reconcile the Kubernetes resources that make up the Slurm cluster, such as login, worker and controller nodes, and storage resources. The configuration files of Slurm itself (slurm.conf, gres.conf, cgroup.conf, plugstack.conf, etc.) are stored in Kubernetes ConfigMaps managed by the Slurm operator.
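As an illustration, you can inspect these objects with standard kubectl commands. This is a sketch: the exact resource names and the `soperator` namespace are assumptions and may differ in your installation.

```shell
# List the SlurmCluster custom resources managed by the operator
# (the namespace "soperator" is an assumption; adjust to your setup).
kubectl get slurmclusters -n soperator

# View the full specification of one cluster.
kubectl describe slurmcluster <cluster-name> -n soperator

# The Slurm configuration files live in ConfigMaps controlled by the operator.
kubectl get configmaps -n soperator
```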
Nodes
In Soperator clusters, all Slurm nodes are Kubernetes Pods. The main types of Slurm nodes in Soperator clusters are the following:
- Login nodes provide users with access to the cluster.
- Worker nodes execute Slurm jobs.
- Controller nodes manage scheduling and orchestration.
Login nodes
To work with a Slurm cluster (submit jobs, check their status, write sbatch scripts and prepare data for them, etc.), users connect to its login nodes. The sshd daemon runs on every login node.
Soperator balances load between login nodes — each time a user connects to the cluster via SSH, they are directed to a random login node.
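A typical session might look like the following. This is an illustrative sketch: the address placeholder, key path, and user name depend on how your cluster was provisioned.

```shell
# Connect to the cluster; the load balancer directs you to a random login node.
# <login-endpoint> is a placeholder for your cluster's SSH endpoint.
ssh -i ~/.ssh/id_rsa root@<login-endpoint>

# Once connected, the standard Slurm commands are available:
sinfo            # view partitions and node states
sbatch job.sh    # submit a batch script
squeue --me      # check your queued and running jobs
```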
Worker nodes
Worker nodes, also known as compute nodes, perform computations for Slurm jobs. The slurmd daemon runs on every worker node. It launches, monitors and terminates jobs. For more information on how to work with login and worker nodes, see Connecting to login and worker nodes.
Controller nodes
Controller nodes orchestrate Slurm activities, such as job queuing, monitoring node states and allocating resources. The central management daemon, slurmctld, runs on all controller nodes.
Persistent storage
Soperator’s main storage feature is its shared root filesystem. It is mounted to all login and worker nodes in a special way — you see it as the root directory (/) in your SSH sessions and Slurm jobs. This helps maintain the traditional Slurm user experience where you work with the entire root filesystem on each node.
The filesystem is shared, which means you do not need to keep it identical across nodes manually. When you make changes to the filesystem on one node, these changes automatically show up on other nodes.
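For example, software installed on one node becomes available to jobs on every other node. The snippet below is a sketch that assumes a Debian-based image with apt available:

```shell
# Install a tool once, from any login node (assumes a Debian-based image).
apt-get update && apt-get install -y htop

# Because the root filesystem is shared, the tool is immediately
# available in jobs running on the worker nodes:
srun --nodes=2 which htop
```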
The shared root filesystem is implemented as a Kubernetes PersistentVolume (PV) that ensures data is preserved when nodes restart.
Soperator also uses PVs for system needs, such as storing cluster and controller states.
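You can see these volumes with kubectl. The namespace name below is an assumption; adjust it to your installation.

```shell
# List the PersistentVolumes backing the cluster, including
# the shared root filesystem and the system-state volumes.
kubectl get pv

# And the claims bound to them (namespace "soperator" is an assumption).
kubectl get pvc -n soperator
```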
See also
For more information about the Soperator cluster architecture, see:
- Architecture in Soperator’s GitHub repository.
- Explaining Soperator, Nebius’ open-source Kubernetes operator for Slurm in the Nebius blog.