Soperator clusters let you run jobs in containers by using Apptainer, a secure and portable container runtime that is compatible with Slurm and designed for high-performance computing (HPC) and scientific workloads. Apptainer was formerly known as Singularity and supports the same .sif container image format.
To run a containerized job by using Apptainer:
  1. Connect to a login node of your Soperator cluster.
  2. Install Apptainer:
    sudo add-apt-repository -y ppa:apptainer/ppa
    sudo apt update
    sudo apt install -y apptainer
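
    If you run the installation from a setup script more than once, you may want to skip it when Apptainer is already present. The following is a minimal sketch, not part of the official procedure; has_command and install_apptainer_if_missing are hypothetical helper names:

    ```shell
    # Return success if the given command is available on PATH.
    has_command() {
      command -v "$1" >/dev/null 2>&1
    }

    # Hypothetical helper: install Apptainer from the PPA only when it is missing.
    install_apptainer_if_missing() {
      if has_command apptainer; then
        echo "apptainer is already installed"
        return 0
      fi
      # Same commands as in this step, chained so that a failure stops the install.
      sudo add-apt-repository -y ppa:apptainer/ppa &&
        sudo apt update &&
        sudo apt install -y apptainer
    }
    ```

    After defining the functions, call install_apptainer_if_missing on the login node.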
    
  3. Verify that the installation was successful by checking the Apptainer version:
    apptainer --version
    
    Example output (your version number may differ):
    apptainer version 1.3.4
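
    To check the version in a script rather than by eye, you can parse this output and compare it with a minimum version. The sketch below assumes the output format shown above and the version sort option (sort -V) from GNU coreutils; the function names and the 1.3.0 threshold in the usage note are illustrative assumptions, not requirements of Soperator:

    ```shell
    # Extract the version number from a line such as "apptainer version 1.3.4".
    parse_apptainer_version() {
      printf '%s\n' "$1" | awk '{ print $NF }'
    }

    # Return success if version $1 is greater than or equal to version $2.
    # Sorts both versions and checks that the smaller one is $2.
    version_at_least() {
      [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
    }
    ```

    For example: version_at_least "$(parse_apptainer_version "$(apptainer --version)")" 1.3.0 && echo "version OK"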
    
  4. Use srun to pull a container image and convert it to the .sif format:
    srun apptainer pull cuda_image.sif docker://nvidia/cuda:12.4.1-cudnn-devel-rockylinux8
    
    The pull command downloads a container image from the specified URI and, if needed, converts it to the .sif format. For example, you can pull an image from Docker Hub or another container registry. For more information, see the Apptainer documentation.
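    Pulling a large image can take a while, so a script may want to skip the pull when the .sif file is already on disk. A minimal sketch, assuming the same srun and apptainer pull invocation as in this step; pull_if_missing is a hypothetical helper name:

    ```shell
    # Hypothetical helper: pull an image through srun unless the .sif file exists.
    # $1 is the target .sif file, $2 is the image URI (for example, docker://...).
    pull_if_missing() {
      sif_file=$1
      image_uri=$2
      if [ -f "$sif_file" ]; then
        echo "$sif_file already exists, skipping pull"
        return 0
      fi
      srun apptainer pull "$sif_file" "$image_uri"
    }
    ```

    For example: pull_if_missing cuda_image.sif docker://nvidia/cuda:12.4.1-cudnn-devel-rockylinux8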
  5. Create the apptainer_job.sh script with the following contents:
    #!/bin/bash
    #SBATCH --job-name=apptainer_job
    #SBATCH --gres=gpu:8
    #SBATCH --output=output.log
    #SBATCH --error=error.log
    apptainer exec --nv cuda_image.sif nvidia-smi
    
    This script uses the following parameters:
    • --gres=gpu:8 requests 8 GPUs for the job.
    • --nv enables NVIDIA GPU support inside the container.
    This script runs nvidia-smi, NVIDIA's monitoring utility, to show which GPUs are visible inside the container. To run a custom workload, replace cuda_image.sif with your container image and nvidia-smi with the required application or command. For example:
    apptainer exec --nv my_custom_image.sif python train.py
    
    Ensure that my_custom_image.sif contains Python and all other dependencies of train.py.
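
    If the image file is missing when the job starts, the failure only shows up in error.log. One way to make that failure explicit is to wrap the apptainer exec call in a small function inside the job script. This is a sketch under that assumption; run_in_container is a hypothetical name:

    ```shell
    # Hypothetical helper for a job script: run a command in a container with
    # NVIDIA GPU support, failing early if the image file is missing.
    # $1 is the .sif image; the remaining arguments are the command to run.
    run_in_container() {
      image=$1
      shift
      if [ ! -f "$image" ]; then
        echo "Image not found: $image" >&2
        return 1
      fi
      apptainer exec --nv "$image" "$@"
    }
    ```

    For example: run_in_container my_custom_image.sif python train.py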
  6. Run the job:
    sbatch apptainer_job.sh
    
    The output contains the following confirmation:
    Submitted batch job <number>
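
    To track the job from a script, you can extract the job number from this confirmation line. A minimal sketch, assuming the output format shown above; parse_job_id is a hypothetical helper name:

    ```shell
    # Extract the job ID from sbatch output such as "Submitted batch job 12345".
    # Prints nothing if the line does not match that format.
    parse_job_id() {
      printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
    }
    ```

    For example, job_id=$(parse_job_id "$(sbatch apptainer_job.sh)") stores the number, and squeue -j "$job_id" then shows the job while it is pending or running. As an alternative, sbatch --parsable prints the job ID without the surrounding text.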
    
  7. When the job completes, check the logs. The output.log file lists all 8 GPUs that are available inside the container:
    cat output.log
    Wed Apr 16 13:55:48 2025
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA H100 80GB HBM3          On  |   00000000:8D:00.0 Off |                    0 |
    | N/A   29C    P0             73W /  700W |       0MiB /  81559MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+
    |   1  NVIDIA H100 80GB HBM3          On  |   00000000:91:00.0 Off |                    0 |
    | N/A   27C    P0             67W /  700W |       0MiB /  81559MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+
    |   2  NVIDIA H100 80GB HBM3          On  |   00000000:95:00.0 Off |                    0 |
    | N/A   29C    P0             67W /  700W |       0MiB /  81559MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+
    |   3  NVIDIA H100 80GB HBM3          On  |   00000000:99:00.0 Off |                    0 |
    | N/A   27C    P0             71W /  700W |       0MiB /  81559MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+
    |   4  NVIDIA H100 80GB HBM3          On  |   00000000:AB:00.0 Off |                    0 |
    | N/A   29C    P0             70W /  700W |       0MiB /  81559MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+
    |   5  NVIDIA H100 80GB HBM3          On  |   00000000:AF:00.0 Off |                    0 |
    | N/A   27C    P0             70W /  700W |       0MiB /  81559MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+
    |   6  NVIDIA H100 80GB HBM3          On  |   00000000:B3:00.0 Off |                    0 |
    | N/A   29C    P0             69W /  700W |       0MiB /  81559MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+
    |   7  NVIDIA H100 80GB HBM3          On  |   00000000:B7:00.0 Off |                    0 |
    | N/A   26C    P0             68W /  700W |       0MiB /  81559MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+
    
    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |  No running processes found                                                             |
    +-----------------------------------------------------------------------------------------+
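
    Instead of reading the table by eye, you can count the GPU rows in the log. The sketch below assumes the nvidia-smi table layout shown above, where each GPU row starts with the GPU index followed by a device name beginning with "NVIDIA"; count_gpus_in_log is a hypothetical helper name:

    ```shell
    # Count GPU rows in nvidia-smi output saved to a log file.
    # A GPU row starts with "|", then the GPU index, then the device name.
    # Header, status, and process-table rows do not match this pattern.
    count_gpus_in_log() {
      grep -cE '^[[:space:]]*\|[[:space:]]+[0-9]+[[:space:]]+NVIDIA ' "$1"
    }
    ```

    For example: [ "$(count_gpus_in_log output.log)" -eq 8 ] && echo "All 8 GPUs are visible"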