You can create a virtual machine (VM) in Nebius AI Cloud, deploy the Qwen/Qwen2.5-72B-Instruct large language model (LLM) on the VM and then use Open WebUI to provide access to the model in a browser.
Before you start
Meet the following prerequisites, depending on the preferred interface:
- Web console:
  - Create a key pair for SSH access to the VM and save the key pair to the default location.
- CLI:
  - Install and configure the Nebius AI Cloud CLI.
  - To extract JSON data from the CLI output, install jq.
  - Create a key pair for SSH access to the VM and save the key pair to the default location.
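The key-pair prerequisite can be met with ssh-keygen. A minimal sketch; the scratch directory is this example's choice so the command is safe to re-run, but for the VM you should accept the default path (~/.ssh/id_ed25519), which the CLI steps below read with cat ~/.ssh/id_ed25519.pub:

```shell
# Sketch: generate an Ed25519 key pair for SSH access to the VM.
# A scratch directory is used here so the example is safe to re-run;
# for real use, accept the default path (~/.ssh/id_ed25519) instead.
KEY_DIR=$(mktemp -d)
ssh-keygen -t ed25519 -N '' -q -f "$KEY_DIR/id_ed25519"

# The .pub file is the public key you select during VM creation.
cat "$KEY_DIR/id_ed25519.pub"
```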
Create the VM
- Go to the web console, click Create resource and then select Virtual machine.
- On the VM creation page that opens, set the following parameters:
  - Platform: NVIDIA® H100 NVLink with Intel Sapphire Rapids.
  - Preset: 1 GPU - 16 CPUs - 200 GiB RAM.
  - Boot disk image: Ubuntu 22.04 LTS for NVIDIA® GPUs (CUDA® 12). For details about boot disk images, see Boot disk images for Compute virtual machines.
  - Boot disk size: 300 GiB SSD.
  - Network: Select the Public IP address: Auto assign static IP option.
  - Username and SSH key: Select the public key that you created earlier. Do not use the root or admin usernames in this field; they are reserved for internal needs and cannot be used to connect to the VM over SSH.
- Click Create VM.

This example assumes that you work with a VM that has a public address, so you can later connect to this VM by SSH. However, if you need an isolated VM, do not assign a public address. To access the VM, you can set up a WireGuard jump server later. This approach enhances security and still provides access to the VM within the same subnet.

For more information about creating VMs and managing their network parameters, see How to create a virtual machine in Nebius AI Cloud.
- Create a boot disk and save its ID to an environment variable:

```shell
export BOOT_DISK_ID=$(nebius compute disk create \
  --name openwebui-disk-1 \
  --size-gibibytes 300 \
  --type network_ssd \
  --source-image-family-image-family ubuntu22.04-cuda12 \
  --block-size-bytes 4096 \
  --format json | jq -r ".metadata.id")
```

The command creates a 300 GiB SSD disk with a 4 KiB block size from the Ubuntu boot image with pre-installed NVIDIA® GPU drivers. For details about boot disk images, see Boot disk images for Compute virtual machines.
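The trailing jq -r ".metadata.id" filter is what extracts the disk ID from the CLI's JSON response. A quick way to see what it does, using a mocked-up response (the field layout matches the filter above; the ID value is made up):

```shell
# Mock of the JSON shape the filter expects; "disk-e0example" is a fake ID.
MOCK_RESPONSE='{"metadata": {"id": "disk-e0example", "name": "openwebui-disk-1"}}'

# -r prints the raw string without surrounding quotes.
echo "$MOCK_RESPONSE" | jq -r ".metadata.id"   # prints disk-e0example
```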
- Get the default subnet ID and save it to an environment variable:

```shell
export SUBNET_ID=$(nebius vpc subnet list \
  --format json \
  | jq -r ".items[0].metadata.id")
```
- Create the VM with one GPU:

```shell
export USER_DATA=$(jq -Rs '.' <<EOF
users:
- name: <username>
  sudo: ALL=(ALL) NOPASSWD:ALL
  shell: /bin/bash
  ssh_authorized_keys:
  - $(cat ~/.ssh/id_ed25519.pub)
EOF
)
export VM_ID=$(nebius compute instance create \
  --format json \
  - <<EOF | jq -r ".metadata.id"
{
  "metadata": {
    "name": "openwebui"
  },
  "spec": {
    "stopped": false,
    "cloud_init_user_data": $USER_DATA,
    "resources": {
      "platform": "gpu-h100-sxm",
      "preset": "1gpu-16vcpu-200gb"
    },
    "boot_disk": {
      "attach_mode": "READ_WRITE",
      "existing_disk": {
        "id": "$BOOT_DISK_ID"
      }
    },
    "network_interfaces": [
      {
        "name": "default-subnet",
        "subnet_id": "$SUBNET_ID",
        "ip_address": {},
        "public_ip_address": {}
      }
    ]
  }
}
EOF
)
```
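The jq -Rs '.' step is what lets the multi-line cloud-init YAML be embedded in the JSON spec: -R reads raw text, -s slurps the whole input into one value, and '.' emits it as a single JSON-encoded string. A small demonstration with a two-line stand-in for the user data:

```shell
# -R: raw input, -s: slurp all lines into one string, '.': output as JSON.
# The YAML here is a minimal stand-in for the cloud-init user data above.
printf 'users:\n- name: demo\n' | jq -Rs '.'
# prints "users:\n- name: demo\n" (one JSON string, newlines escaped)
```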
This example assumes that you work with a VM that has a public address, so you can later connect to this VM by SSH. However, if you need an isolated VM without a public address, remove the "public_ip_address": {} line from the VM configuration. To access the VM, you can set up a WireGuard jump server later. This approach enhances security and still provides access to the VM within the same subnet.

For more information about creating VMs and managing their network parameters, see How to create a virtual machine in Nebius AI Cloud.
Connect to the VM
- Get the public IP address of the VM. Do one of the following:
  - In the web console, open the VM page and, in the Network block, copy the Public IPv4 value.
  - In the CLI, run the following command:

```shell
export PUBLIC_IP_ADDRESS=$(nebius compute instance get-by-name \
  --name openwebui \
  --format json \
  | jq -r '.status.network_interfaces[0].public_ip_address.address | split("/")[0]')
```
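The split("/")[0] part of the filter is there because the address may come back in CIDR form (for example, with a /32 suffix). A mocked response shows the effect; 203.0.113.10 is a documentation-range placeholder, not a real VM address:

```shell
# Mocked instance status; 203.0.113.10 is a documentation-only address.
MOCK='{"status": {"network_interfaces": [{"public_ip_address": {"address": "203.0.113.10/32"}}]}}'

# split("/")[0] drops the prefix length, leaving the bare IP.
echo "$MOCK" | jq -r '.status.network_interfaces[0].public_ip_address.address | split("/")[0]'
# prints 203.0.113.10
```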
- Connect to the VM:

```shell
ssh <username>@<public_ip_address>
```

Specify the public IP address that you received and the username that you set during the VM creation.
Create a virtual environment and install the necessary packages
To work with Open WebUI, you need a dedicated virtual environment. It lets you set up and run the Open WebUI server in isolation from other software on the VM. To create the virtual environment, use Miniconda.
To configure the environment:
- Download and install the latest Miniconda version:

```shell
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
```
- Initialize Miniconda:

```shell
source ~/miniconda3/bin/activate
```

On initialization, Miniconda activates its base environment.
- Create an OpenWebUI environment with Python 3.11:

```shell
conda create -n OpenWebUI python=3.11
conda init bash
echo "conda activate OpenWebUI" >> ~/.bashrc
source ~/.bashrc
```

These commands create the environment and configure it to activate in every new shell session.
- Install Ollama, which provides access to the model:

```shell
curl -fsSL https://ollama.com/install.sh | sh
```
- Install Open WebUI from PyPI:

```shell
pip install open-webui
```
Start the Open WebUI server
- Start the server:

```shell
open-webui serve
```

- Open the Open WebUI interface in the browser. To do this, enter the http://<public_ip_address>:8080 address in the address bar.
- In the Open WebUI interface, create an account to work with LLMs locally within the VM. For details on working with Open WebUI, see their documentation.

If you need to restart the server, use the same command: open-webui serve.
Download the Qwen/Qwen2.5-72B-Instruct model
- In Open WebUI, click Select a model.
- Paste qwen2.5:72b into the search bar.
- Click Pull “qwen2.5:72b” from Ollama and wait for the download to finish.
- Click Select a model again and then choose Qwen/Qwen2.5-72B-Instruct.
Now you can chat with the model in the browser.
Make Open WebUI start automatically
With the current configuration, you need to manually start the Open WebUI server every time you connect to your VM. Alternatively, you can configure the server to start up whenever the VM starts. To do this:
- Create a systemd service file for Open WebUI and open the file in an editor:

```shell
sudo nano /etc/systemd/system/openwebui.service
```
- Paste the following contents into the file and save it. Specify the username that you set during the VM creation:

```ini
[Unit]
Description=OpenWebUI Server
After=network.target

[Service]
User=<username>
WorkingDirectory=/home/<username>/
ExecStart=/home/<username>/miniconda3/envs/OpenWebUI/bin/open-webui serve
Restart=always

[Install]
WantedBy=multi-user.target
```
- To make the new service file recognizable, reload systemd:

```shell
sudo systemctl daemon-reload
```
- To start the service now and have it launch automatically at boot, enable and start it:

```shell
sudo systemctl enable openwebui.service
sudo systemctl start openwebui.service
```
- Verify that the service is running:

```shell
sudo systemctl status openwebui.service
```
Whenever you start up your VM in the web console, Open WebUI now launches automatically in the background. You can access it directly in the browser at http://<public_ip_address>:8080 and work with the Qwen/Qwen2.5-72B-Instruct model.