Soperator clusters let you run jobs in containers by using Apptainer, a secure and portable container runtime that is compatible with Slurm and designed for high-performance computing (HPC) and scientific computing. Apptainer was formerly known as Singularity and supports the same .sif container image format.
To run a containerized job by using Apptainer:
- Connect to a login node of your Soperator cluster.
- Install Apptainer:
sudo add-apt-repository -y ppa:apptainer/ppa
sudo apt update
sudo apt install -y apptainer
- Verify that the installation was successful by checking the Apptainer version:
apptainer --version
The command prints the installed Apptainer version.
- Use srun to pull a container image and convert it to the .sif format:
srun apptainer pull cuda_image.sif docker://nvidia/cuda:12.4.1-cudnn-devel-rockylinux8
The pull command downloads a container image from the specified URI and, if needed, converts it to the .sif format. For example, you can pull an image from Docker Hub or another container registry. For more information, see the Apptainer documentation.
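When you repeat this procedure, the .sif file may already be in place, and pulling it again is usually unnecessary. The following is a minimal sketch of a guard around the pull command above. The function name pull_if_missing is hypothetical and not part of Apptainer or Soperator.

```shell
#!/bin/bash
# Hypothetical helper (not part of Apptainer or Soperator):
# pull an image only if the target .sif file does not exist yet.
pull_if_missing() {
    local sif_file="$1"    # for example, cuda_image.sif
    local source_uri="$2"  # for example, docker://nvidia/cuda:12.4.1-cudnn-devel-rockylinux8

    if [ -f "$sif_file" ]; then
        echo "Skipping pull: $sif_file already exists"
        return 0
    fi

    # Same command as in the step above, run through Slurm.
    srun apptainer pull "$sif_file" "$source_uri"
}

# Usage on a cluster (commented out so that the snippet only defines the function):
# pull_if_missing cuda_image.sif docker://nvidia/cuda:12.4.1-cudnn-devel-rockylinux8
```

The guard only checks that the file exists. It does not verify that the existing file matches the requested image tag.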
- Create the apptainer_job.sh script with the following contents:
#!/bin/bash
#SBATCH --job-name=apptainer_job
#SBATCH --gres=gpu:8
#SBATCH --output=output.log
#SBATCH --error=error.log
apptainer exec --nv cuda_image.sif nvidia-smi
This script uses the following parameters:
--gres=gpu:8 requests 8 GPUs for the job.
--nv enables NVIDIA GPU support inside the container.
The script runs nvidia-smi, the NVIDIA monitoring utility, to show which GPUs are visible inside the container.
To run a custom workload, replace cuda_image.sif with your container image and nvidia-smi with the application or command that you want to run. For example:
apptainer exec --nv my_custom_image.sif python train.py
Ensure that my_custom_image.sif contains Python and all other dependencies of train.py.
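If you switch between several images or commands, you can generate the batch script instead of editing it by hand. The sketch below assumes the same #SBATCH settings as the script above. The function name write_apptainer_job is hypothetical, and the sketch assumes that the image path and the command arguments contain no spaces.

```shell
#!/bin/bash
# Hypothetical helper: write a batch script that runs a command inside an image.
# Assumption: the image path and command arguments contain no spaces.
write_apptainer_job() {
    local script_path="$1"  # where to write the batch script
    local sif_file="$2"     # container image to run
    shift 2                 # the remaining arguments form the command

    # Unquoted heredoc delimiter, so $sif_file and $* are expanded here.
    cat > "$script_path" <<EOF
#!/bin/bash
#SBATCH --job-name=apptainer_job
#SBATCH --gres=gpu:8
#SBATCH --output=output.log
#SBATCH --error=error.log
apptainer exec --nv $sif_file $*
EOF
}

# Usage (commented out so that the snippet only defines the function):
# write_apptainer_job apptainer_job.sh my_custom_image.sif python train.py
```

The generated file has the same structure as apptainer_job.sh above, with only the image and the command changed.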
- Run the job:
sbatch apptainer_job.sh
The output contains the following confirmation:
Submitted batch job <number>
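The number in the confirmation is the Slurm job ID, which you can use to track the job. The following sketch extracts the ID from the confirmation line printed by sbatch. The function name job_id_from_confirmation is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read the sbatch confirmation on stdin and print the job ID.
job_id_from_confirmation() {
    # The confirmation has the form "Submitted batch job <number>",
    # so the ID is the fourth whitespace-separated field.
    awk '/^Submitted batch job [0-9]+/ { print $4 }'
}

# Usage on a cluster (commented out because it needs Slurm):
# job_id=$(sbatch apptainer_job.sh | job_id_from_confirmation)
# squeue -j "$job_id"   # shows the job while it is pending or running
```

Lines that do not match the confirmation format produce no output, so an empty result indicates that the submission did not return a job ID.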
- When the job completes, check the logs. The output.log file lists all 8 GPUs that are available for use inside the container:
cat output.log
Wed Apr 16 13:55:48 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:8D:00.0 Off | 0 |
| N/A 29C P0 73W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA H100 80GB HBM3 On | 00000000:91:00.0 Off | 0 |
| N/A 27C P0 67W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA H100 80GB HBM3 On | 00000000:95:00.0 Off | 0 |
| N/A 29C P0 67W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA H100 80GB HBM3 On | 00000000:99:00.0 Off | 0 |
| N/A 27C P0 71W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA H100 80GB HBM3 On | 00000000:AB:00.0 Off | 0 |
| N/A 29C P0 70W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA H100 80GB HBM3 On | 00000000:AF:00.0 Off | 0 |
| N/A 27C P0 70W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA H100 80GB HBM3 On | 00000000:B3:00.0 Off | 0 |
| N/A 29C P0 69W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA H100 80GB HBM3 On | 00000000:B7:00.0 Off | 0 |
| N/A 26C P0 68W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
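Instead of reading the table by eye, you can count the GPU rows in the log. The sketch below is a rough check that assumes every GPU row starts with the GPU index followed by a device name beginning with "NVIDIA", as in the log above. The function name count_gpus_in_log is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count GPU rows in a saved nvidia-smi table.
count_gpus_in_log() {
    local log_file="$1"
    # A GPU row starts with "|", then the GPU index, then the device name.
    # Assumption: device names start with "NVIDIA", as in the log above.
    grep -cE '^\|[[:space:]]*[0-9]+[[:space:]]+NVIDIA' "$log_file"
}

# Usage (commented out so that the snippet only defines the function):
# [ "$(count_gpus_in_log output.log)" -eq 8 ] && echo "All 8 GPUs are visible"
```

If the count is lower than the number requested with --gres, check error.log for messages from the job.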