Costs
Nebius AI Cloud charges you for the following billing items:- Compute virtual machines (for JupyterLab deployed as a container over VM and for a Serverless AI endpoint)
- Managed Service for PostgreSQL cluster
Prerequisites
- Make sure that you are in a group that has at least the
editorrole within your tenant; for example, the defaulteditorsgroup. You can check this in the Administration → IAM section of the web console. - Create resources in a project in one of the following regions:
eu-north1,eu-west1,us-central1or privateeu-north2. - In the web console, go to Administration → Limits and make sure that you have at least one NVIDIA® Hopper® H200 for regular VMs without reservations and one virtual machine under Compute, and one allocation under Virtual Private Cloud in the region that you use. Increase quotas if needed.
Steps
Create a Managed Service for PostgreSQL cluster
- In the web console, go to Managed Service for PostgreSQL.
-
Click Create cluster and configure the cluster:
- Enter a name.
- In Access, select Public and private so that JupyterLab can connect to the cluster from the internet.
- In Resources, choose a preset (for example,
4 vCPUs – 16 GiB RAM) and set the storage size. - In Database, set the database name, username and password. For example: database
rag-example, userrag_example_userand a strong password.
- Click Create cluster and wait until it is ready.
- Open the cluster page. On Cluster overview, click Copy endpoint URL and select Public RW endpoint URL to copy the connection host. In the General block you can copy Bootstrap database and Username if needed. You will need the endpoint host, database name, username and password in the notebook.
Deploy the JupyterLab application
-
Deploy JupyterLab on a VM:
- In the web console, go to Applications.
- Find JupyterLab by searching for it or browsing by category, then open the application page.
- Click Deploy on VM. The creation page for a container over VM opens.
- Securely save the generated token. You will need the token to access the application UI.
- Set computing resources (for example,
Non-GPU Intel Ice Lakeand4 vCPUs – 16 GiB RAM) and local storage size. - In Access, add a username and SSH key.
- Click Create container over VM and wait until the VM is running. It takes about five minutes.
-
Open the application UI:
- In the sidebar, go to Compute → Containers over VMs.
- Open the VM page and then click the web UI link to connect to JupyterLab. Open a new notebook there. You will run all the code below in this notebook.
Prepare the knowledge base in JupyterLab
-
In JupyterLab, open a terminal and install dependencies:
-
Create three Markdown files in the notebook’s working directory:
laptop_catalog.md,headphones_catalog.md, andreturn_policy.md. Use sections separated by##headers. Full example contents:laptop_catalog.md
headphones_catalog.md
return_policy.md
-
In the notebook, run the following in order:
-
Connect to the database and create the table.
Replace
<host>,<dbname>,<user>and<password>with your Managed Service for PostgreSQL cluster connection details: use Public RW endpoint URL as the hostname and the Bootstrap database, Username and password from the General block. Theall-MiniLM-L6-v2model produces 384-dimensional vectors. If you use another model, changevector(384)in the table definition. -
Load the embedding model and define helpers.
-
Ingest the documents:
-
Define retrieval and prompt building:
-
Connect to the database and create the table.
Replace
Deploy a serverless LLM endpoint
The endpoint exposes an OpenAI-compatible API that you will call from your JupyterLab notebook.- In the web console, go to AI Services → Endpoints and click Create endpoint.
-
Under Endpoint settings → Image path, enter
vllm/vllm-openai:latest. -
Under Ports, set the container port to
8000. -
Expand Advanced settings:
- In Entrypoint command, enter
python3 -m vllm.entrypoints.openai.api_server. - In Arguments, enter
--model Qwen/Qwen3-0.6B --host 0.0.0.0 --port 8000.
- In Entrypoint command, enter
-
Under Computing resources, choose With GPU, then select a platform (for example,
NVIDIA® H200 NVLink) and a preset (for example,1 GPU — 16 vCPUs — 200 GiB RAM). -
Click Create and wait until the endpoint status is
Running. It may take five minutes. -
Open the endpoint. In the Network section, copy the value of a public endpoint in the
http://<IP_address>:8000format.
Call the LLM from the notebook
-
Add the following code to your JupyterLab notebook to call the deployed endpoint. Replace
<ENDPOINT_IP>with the endpoint address from Deploy a serverless LLM endpoint. -
Run a RAG query:
Example
Example
Your question: What laptop and headphones should I buy with a budget of 1000 USD?Answer: The laptop to buy is the Sony WH-1000XM5, and the headphones are the JBL Tune 510BT. Both products are within the 1000 USD budget.
How to delete the created resources
Some of the created resources are chargeable. If you don’t need them, delete these resources so Nebius AI Cloud doesn’t charge for them:- JupyterLab VM: In the web console, go to Compute → Containers over VMs, open the VM and delete it.
- Serverless AI endpoint: In the web console, go to AI Services → Endpoints. Open the endpoint, stop it if it is running, and then delete it.
- Managed Service for PostgreSQL cluster: Follow the instructions.
Postgres, PostgreSQL and the Slonik Logo are trademarks or registered trademarks of the PostgreSQL Community Association of Canada, and used with their permission.