- Enroot is a simple container runtime. It was created by NVIDIA specifically for machine learning and high-performance computing. Enroot supports Docker images and can execute the same containers, but works better with Slurm. It allows you to pull images from container registries, such as Docker Hub, NVIDIA NGC (`nvcr.io`) or Container Registry by Nebius.
- Pyxis is a plugin for Slurm that uses Enroot to let cluster users run containerized jobs by using the `srun` command with additional `--container-*` parameters.
How to run a job for a container registry image
- Create the following job called `test.sbatch`:

  This job pulls a TensorFlow image from the NVIDIA container registry, starts a container and executes a simple Python script within it.

  Use the `--container-image="<your.container.registry#repository/container:tag>"` parameter of the `srun` command to specify a container image.

  In Soperator clusters, a container image is first pulled from the registry, then saved to the cluster’s shared filesystem. All worker nodes can then use this image to start the container, without downloading the same data from the registry again. To disable this default behavior, add the `--container-image-save=""` parameter with an empty value to the `srun` command. You can also use this parameter to set the path where the image is stored in the filesystem: `--container-image-save="<path-to-my-images>"`.

  For more information about other parameters available for `srun`, see Pyxis documentation.
- Run the job:
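The two steps above might look like the following. This is a minimal sketch, not the exact job from the original page: the job name, the resource directives, the output file pattern, the image tag `24.01-tf2-py3` and the one-line Python script are all illustrative assumptions. Only the `--container-image` parameter and its `registry#repository/container:tag` format come from the text above.

```shell
# Step 1: write the batch script. Everything inside it except the
# --container-image parameter is an illustrative assumption.
cat > test.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=container-test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --output=container-test-%j.out

# "#" separates the registry host from the repository path.
# The image tag below is an assumed example; pick the one you need.
srun --container-image="nvcr.io#nvidia/tensorflow:24.01-tf2-py3" \
    python -c 'import tensorflow as tf; print(tf.__version__)'
EOF

# Step 2: submit the job. The guard only keeps this sketch harmless
# on machines without Slurm; on a cluster, `sbatch test.sbatch` is enough.
if command -v sbatch >/dev/null 2>&1; then
    sbatch test.sbatch
else
    echo "sbatch not found: run this on a Slurm login node" >&2
fi
```

With the assumed `--output` pattern, the job's output would end up in `container-test-<job-id>.out` in the directory the job was submitted from.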
How to authenticate in a container registry
Docker Hub and NVIDIA NGC container registries are configured by default, and you do not need to authenticate to pull public container images from them. If you need to pull images from another registry, or to pull private container images, configure credentials in the `~/.config/enroot/.credentials` file:
- To pull a private container image, the password is required. For more information about the login and password, consult the documentation of the selected container registry.
- To pull a public container image from a registry other than Docker Hub or NVIDIA NGC, you can use an arbitrary string instead of a password.
For Container Registry by Nebius, use the registry hostname of your region:
- `cr.eu-north1.nebius.cloud`: For the `eu-north1` region.
- `cr.eu-west1.nebius.cloud`: For the `eu-west1` region.
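As a sketch of what a credentials entry could look like: Enroot reads this file in netrc format, one `machine <hostname> login <login> password <password>` entry per registry. The example below appends an entry for the `eu-north1` registry; `<login>` and `<password>` are placeholders to replace with the values your registry documents, and tightening the file permissions is a general precaution rather than a step taken from this guide.

```shell
# Make sure the Enroot configuration directory exists.
mkdir -p "$HOME/.config/enroot"

# Append a netrc-style entry. <login> and <password> are placeholders;
# for a public image from another registry, any string works as the password.
cat >> "$HOME/.config/enroot/.credentials" <<'EOF'
machine cr.eu-north1.nebius.cloud login <login> password <password>
EOF

# The file holds secrets, so make it readable by the owner only.
chmod 600 "$HOME/.config/enroot/.credentials"
```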
How to run a job for a local image
To run a job in a container with a local image, do the following:
- Create the following job called `test.sbatch`:

  This job starts a container with a local image, then executes a simple Python script within the container.

  Use the `--container-image="<full-path-to-image>"` parameter of the `srun` command to specify the container image.

  For more information about other parameters available for `srun`, see Pyxis documentation.
- Run the job:
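The local-image variant could be sketched as follows. The job name, resource directives and Python one-liner are illustrative assumptions, and `<full-path-to-image>` is left as a placeholder for the path to your image file (typically a squashfs file such as one produced by `enroot import`, though that detail is an assumption and not stated in this guide).

```shell
# Step 1: write the batch script. Replace <full-path-to-image> with the
# real path to the image before submitting.
cat > test.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=local-image-test
#SBATCH --nodes=1
#SBATCH --ntasks=1

# A filesystem path makes Pyxis start the container from the local
# image without contacting a registry.
srun --container-image="<full-path-to-image>" \
    python -c 'print("Hello from the container")'
EOF

# Step 2: submit the job. The guard only keeps this sketch harmless
# on machines without Slurm.
if command -v sbatch >/dev/null 2>&1; then
    sbatch test.sbatch
else
    echo "sbatch not found: run this on a Slurm login node" >&2
fi
```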