In Nebius AI Cloud, you can run applications in containers over virtual machines (VMs). A container over VM lets you launch a VM with a pre-installed container image, such as Jupyter Notebook, or with a custom Docker image from a public registry. Containers over VMs are useful when you want to quickly deploy an application environment without manually configuring the VM or installing dependencies. This tutorial demonstrates two alternative ways to run containers over VMs:
  • Run a container over VM with a pre-installed application image (Jupyter Notebook).
  • Run a container over VM with a custom Docker image from Docker Hub.

Costs

The tutorial includes the following chargeable resources:
  • Compute VMs.
  • Boot disks of the VMs.

Prerequisites

Create an SSH key pair for the virtual machine:
  1. Run the following command:
    ssh-keygen -t ed25519 -C "For my virtual machine"
    
  2. Enter the file name in which to store the key pair.
  3. (Optional) Enter a passphrase.
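If you prefer to skip the interactive prompts, the same steps can be done in one non-interactive command. The file path below is an example; use any path you like:

```shell
# Generate the key pair non-interactively:
# -f sets the file name (~/.ssh/nebius_vm is an example path),
# -N "" sets an empty passphrase
ssh-keygen -t ed25519 -C "For my virtual machine" -f ~/.ssh/nebius_vm -N ""

# Print the public key; you will paste this value into the
# Public key field when creating the container over VM
cat ~/.ssh/nebius_vm.pub
```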

Steps

Run a container over VM with a pre-installed application image

Create a single-GPU VM with Jupyter Notebook

  1. In the web console, go to Compute → Containers over VMs.
  2. Click Create container over VM.
  3. Specify the VM name.
  4. Select the Jupyter Notebook container image.
  5. Copy and save the token that appears. You will need this token later to access the JupyterLab web interface.
  6. In Computing resources, configure the VM with one GPU. For example, select NVIDIA® L40S PCIe with Intel Ice Lake and keep the predefined preset with eight CPUs.
  7. In Local storage, specify the disk size.
  8. In Access, add new credentials or select existing ones. To add new credentials:
    1. Specify the username. Do not use the root or admin usernames. They are reserved for internal needs and cannot be used for SSH access.
    2. Copy the contents of the .pub file generated earlier and paste it into the Public key field.
    3. Click Add credentials.
  9. Click Create container over VM.

Launch Jupyter Notebook

When the container over VM is running, connect to the application:
  1. In Containers over VMs, open the page of the created VM.
  2. Click Go to Web UI at the top of the VM page.
  3. When prompted, paste the token into the authentication field and click Log in.
    If you did not save the Jupyter token earlier, you can copy it from the Container parameters section on the container over VM page.
  4. In JupyterLab, create a new notebook.
  5. Run the following code. It shows information about available GPUs:
    import torch
    if torch.cuda.is_available():
        print("CUDA is available. PyTorch can use your GPU.")
        print(f"Number of GPUs available: {torch.cuda.device_count()}")
        print(f"GPU Name: {torch.cuda.get_device_name()}")
    else:
        print("CUDA is not available. PyTorch will run on CPU.")
    
    Example output:
    CUDA is available. PyTorch can use your GPU.
    Number of GPUs available: 1
    GPU Name: NVIDIA L40S
    

Benchmark a VM with one GPU

Run a simple benchmark to measure how long the GPU takes to multiply large matrices. This test multiplies two 30,000×30,000 tensors several times and measures the total execution time. Later in the tutorial, you will repeat the same benchmark on a VM with eight GPUs and compare the results. Run the following code:
import torch
import time

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
matrix_size = 30000

a = torch.randn(matrix_size, matrix_size, device=device)
b = torch.randn(matrix_size, matrix_size, device=device)

num_runs = 10

# Warm-up runs to exclude one-time initialization costs from the timing
for _ in range(3):
    torch.matmul(a, b)
    torch.cuda.synchronize()

start_time = time.time()

for _ in range(num_runs):
    torch.matmul(a, b)
    torch.cuda.synchronize()

end_time = time.time()

total_time = end_time - start_time
print(f"Time for {num_runs} matrix multiplications ({matrix_size}x{matrix_size}): {total_time:.4f} seconds")
Example output:
Time for 10 matrix multiplications (30000x30000): 16.7096 seconds
In this benchmark, CUDA kernel launches are asynchronous, so torch.cuda.synchronize() makes the CPU wait until each multiplication has actually finished on the GPU. Without it, the timer would only measure how long it takes to queue the operations, not to execute them.
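To put the measured time in perspective, you can convert it into an effective throughput figure. This is a rough back-of-the-envelope sketch using the example output above (your own timing will differ); it assumes the standard estimate of about 2n³ floating-point operations per n×n matrix multiplication:

```python
# Rough effective-throughput estimate from the benchmark above.
# One n x n matrix multiplication takes about 2 * n**3 floating-point
# operations (n**3 multiplications plus roughly n**3 additions).
matrix_size = 30000
num_runs = 10
elapsed_seconds = 16.7096  # example output shown above; yours will differ

total_flops = 2 * matrix_size**3 * num_runs
tflops = total_flops / elapsed_seconds / 1e12
print(f"Effective throughput: {tflops:.1f} TFLOPS")
```

The resulting figure is only indicative: the actual throughput depends on the data type the matmul runs in (for example, TF32 versus FP32) and on synchronization overhead included in the timed interval.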

Replace the VM with an 8-GPU VM while preserving data

To scale from one GPU to eight GPUs and keep your notebooks:
  1. Delete the current VM. When deleting the VM, select the option to keep the boot disk.
  2. Create a new container over VM as described in the Create a single-GPU VM with Jupyter Notebook section, but:
    • In Computing resources, choose a configuration with eight GPUs.
    • Attach the existing disk that contains your data as an additional disk.

Benchmark a VM with eight GPUs

Run the benchmark test again on the VM with eight GPUs to measure how the workload performs after scaling.
  1. Go to Jupyter Notebook and open your existing notebook.
  2. Replace the benchmark code with:
    import torch
    import time
    
    num_gpus = torch.cuda.device_count()
    matrix_size = 30000
    num_runs = 10
    chunk_size = matrix_size // num_gpus
    
    # Replicate B on every GPU and split A row-wise across the GPUs,
    # so each GPU multiplies its slice of A by the full matrix B
    B_cpu = torch.randn(matrix_size, matrix_size)
    B_chunks = [B_cpu.to(f"cuda:{i}") for i in range(num_gpus)]
    A_chunks = [torch.randn(chunk_size, matrix_size, device=f"cuda:{i}")
                for i in range(num_gpus)]
    
    start_time = time.time()
    
    for _ in range(num_runs):
        C_chunks = []
        for i in range(num_gpus):
            C = A_chunks[i] @ B_chunks[i]
            C_chunks.append(C)
        # Wait for every GPU to finish before the timer advances
        for i in range(num_gpus):
            torch.cuda.synchronize(i)
    
    end_time = time.time()
    
    print(f"Time for {num_runs} multi-GPU matrix multiplications ({matrix_size}x{matrix_size}): {(end_time - start_time):.4f} seconds")
    
    Example output:
    Time for 10 multi-GPU matrix multiplications (30000x30000): 1.3283 seconds
    
    This demonstrates the performance improvement when scaling from one GPU to eight GPUs.
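As a quick sanity check on the scaling, you can compute the speedup from the two example timings above (your own measurements will differ):

```python
# Speedup computed from the example timings in this tutorial
single_gpu_seconds = 16.7096  # example output of the 1-GPU benchmark
eight_gpu_seconds = 1.3283    # example output of the 8-GPU benchmark

speedup = single_gpu_seconds / eight_gpu_seconds
print(f"Speedup with 8 GPUs: {speedup:.1f}x")
```

Note that a speedup above 8x with eight GPUs suggests the two runs are not perfectly comparable (for example, the multi-GPU code has no warm-up loop, and different matrix shapes can reach different hardware efficiency), so treat the ratio as indicative rather than exact.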

Run a container over VM with a custom Docker image

In the previous section, you deployed a container over VM by using a pre-installed application image with Jupyter Notebook. You can also deploy containers with custom Docker images from public registries. In this section, you will create a container over VM by using a Docker image from Docker Hub and access the application running inside the container. The TensorFlow Jupyter image is used as an example. This image includes both TensorFlow and Jupyter Notebook, so you can run TensorFlow workloads directly in a notebook environment.

Create a container over VM with a custom Docker image

  1. In the web console, go to Compute → Containers over VMs.
  2. Click Create container over VM.
  3. Specify the VM name.
  4. Select Custom Image.
  5. In Docker Image, enter tensorflow/tensorflow:nightly-gpu-jupyter.
  6. In Docker run arguments, specify --restart=always --gpus all --shm-size=16GB -p 8888:8888. These arguments restart the container automatically if it stops, enable GPU access, allocate 16 GB of shared memory, and expose port 8888 for Jupyter Notebook.
  7. In Computing resources, use at least one GPU.
  8. In Local storage, specify the disk size.
  9. In Access, select the previously created credentials.
  10. Click Create container over VM.

Connect to the VM

When the container over VM is running, connect to the application:
  1. In the Containers over VMs section, open the page of the VM with the custom Docker image installed.
  2. In the Network section, copy the Public IPv4 address.
  3. Connect to the VM by using SSH:
    ssh <username>@<public_IP_address>
    
  4. List the running containers and copy the name of the TensorFlow container:
    sudo docker ps
    
  5. Get the Jupyter token:
    sudo docker logs <container_name>
    
  6. Open http://<public_IP_address>:8888/?token=<jupyter_token> in your browser.
  7. In JupyterLab, create a new notebook and run the following code to verify that TensorFlow works in the container:
    import tensorflow as tf
    import time
    
    matrix_size = 10000
    
    a = tf.random.normal([matrix_size, matrix_size])
    b = tf.random.normal([matrix_size, matrix_size])
    
    start = time.time()
    c = tf.matmul(a, b)
    _ = c.numpy()  # Force execution and copy the result to host memory
    end = time.time()
    
    print(f"Matrix multiplication ({matrix_size}x{matrix_size}) took {end - start:.4f} seconds")
    
    Example output:
    Matrix multiplication (10000x10000) took 1.9316 seconds
    
    This example performs large matrix multiplication by using TensorFlow and prints the execution time.
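As with the PyTorch benchmark, you can turn this timing into a rough throughput figure. Note that the measured interval here also includes operation dispatch and the device-to-host copy triggered by c.numpy(), so the estimate understates raw GPU throughput:

```python
# Rough effective-throughput estimate from the TensorFlow example above,
# again assuming about 2 * n**3 FLOPs for one n x n matrix multiplication
matrix_size = 10000
elapsed_seconds = 1.9316  # example output shown above; yours will differ

total_flops = 2 * matrix_size**3
tflops = total_flops / elapsed_seconds / 1e12
print(f"Effective throughput: {tflops:.2f} TFLOPS")
```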

How to delete the created resources

The created Compute VMs and their boot disks are chargeable. If you do not need them, delete the resources created during this tutorial:
  1. Go to Compute → Containers over VMs.
  2. Open the VM page.
  3. Switch to Settings.
  4. Click Delete virtual machine.
  5. In the window that opens, select Delete the boot disk.
  6. Confirm the deletion.
  7. Repeat these steps for any other VMs created during this tutorial.