Serverless AI lets you deploy and manage endpoints without handling infrastructure yourself. With endpoints, you can create an OpenAI-compatible model backend in a few minutes. This tutorial shows how to prepare your environment, create your first endpoint with an open-source large language model (LLM), and send a chat request. The endpoint is based on the vllm/vllm-openai image (this tutorial uses the v0.18.0-cu130 tag). vLLM automatically downloads the model from Hugging Face when the endpoint starts. The container exposes an OpenAI-compatible /v1/chat/completions API.

Costs

Nebius AI Cloud charges you for Compute virtual machines.

Prerequisites

  • Make sure that you are in a group that has the admin role within your tenant; for example, the default admins group.
  • On the Administration → Limits → Quotas page of the web console, check that you have quotas on the following resources in the region you use:
    • Under Compute, NVIDIA® L40S for regular VMs without reservations: at least one GPU available.
    • Under Compute, Number of virtual machines: at least one VM available.
    • Under Virtual Private Cloud, Total number of allocations: at least one allocation available.
    Increase quotas if needed.

Steps

Create an endpoint

  1. In the sidebar, go to AI Services → Endpoints.
  2. Click Create endpoint.
  3. On the page that opens, specify the following endpoint settings:
    • Image path: vllm/vllm-openai:v0.18.0-cu130.
    • Ports: 8000.
    • Advanced settings → Entrypoint command: python3 -m vllm.entrypoints.openai.api_server.
    • Advanced settings → Arguments: --model Qwen/Qwen3-0.6B --host 0.0.0.0 --port 8000.
    • Advanced settings → Authentication: Token authentication. Copy and save the generated token.
    • Computing resources: With GPU.
    • Available platform: NVIDIA® L40S PCIe with Intel Ice Lake.
    • Preset: 1GPU — 8 CPUs — 32 GiB RAM.
    • Network: Public static IP.
  4. Click Create.
The endpoint creation takes approximately five minutes.
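
The Entrypoint command and Arguments settings together form the command line that starts the server. The snippet below is a minimal sketch of that mapping, assuming the endpoint runs the entrypoint command followed by the arguments, as container runtimes usually do. The variable and function names are illustrative and are not part of Nebius AI Cloud or vLLM.

```shell
# Values copied from the endpoint settings above.
ENTRYPOINT="python3 -m vllm.entrypoints.openai.api_server"
ARGUMENTS="--model Qwen/Qwen3-0.6B --host 0.0.0.0 --port 8000"

# Assumption: the container executes the entrypoint followed by the arguments.
# This hypothetical helper only prints the resulting command line.
container_command() {
  printf '%s %s\n' "$ENTRYPOINT" "$ARGUMENTS"
}
```

Calling container_command prints the assembled command. In it, --host 0.0.0.0 makes the server listen on all network interfaces, and --port 8000 matches the value in the Ports setting.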

Check the endpoint status

Wait until the endpoint status is Running. You can check the status on the endpoint page.

Test the endpoint

  1. In the sidebar, go to AI Services → Endpoints.
  2. Open the page of the required endpoint.
  3. In the Network section, copy the IP address from the Public endpoints or Private endpoints field.
Test the endpoint by listing available models:
curl "http://<endpoint_IP_address>/v1/models" \
  -H "Authorization: Bearer <token>" | jq
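
Because vLLM downloads the model when the endpoint starts, the API may not answer on the first attempt. The following is a minimal sketch of a retry loop around the models request. It assumes that the shell variables ENDPOINT_IP and TOKEN hold your endpoint address and token. retry_until_ok and probe_models are hypothetical helper names, not part of any Nebius or vLLM tooling.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry_until_ok <max_attempts> <delay_seconds> <command> [args...]
retry_until_ok() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Wait after each failed attempt.
    sleep "$delay"
  done
  return 1
}

# Probe: succeeds only if /v1/models answers without an HTTP error.
# ENDPOINT_IP and TOKEN are placeholders that you set yourself.
probe_models() {
  curl --silent --fail --output /dev/null --max-time 10 \
    -H "Authorization: Bearer ${TOKEN}" \
    "http://${ENDPOINT_IP}/v1/models"
}
```

For example, retry_until_ok 30 10 probe_models tries up to 30 times with a 10-second pause between attempts, and returns a non-zero exit code if the endpoint never answers.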
Send a chat request to the model:
curl "http://<endpoint_IP_address>/v1/chat/completions" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [
      {"role": "user", "content": "Say '\''Hello Nebius AI'\'' and nothing else"}
    ]
  }' | jq -r '.choices[0].message.content'
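
In the request above, each '\'' sequence closes the single-quoted string, adds a literal single quote, and reopens the string. As an alternative sketch, a here-document with a quoted delimiter passes the JSON through unchanged, so the quotes need no escaping. The function names chat_payload and send_chat are hypothetical, and ENDPOINT_IP and TOKEN are placeholder variables that you set yourself.

```shell
# Print the request body. The quoted 'EOF' delimiter disables shell expansion,
# so single quotes inside the JSON are passed through as written.
chat_payload() {
  cat <<'EOF'
{
  "model": "Qwen/Qwen3-0.6B",
  "messages": [
    {"role": "user", "content": "Say 'Hello Nebius AI' and nothing else"}
  ]
}
EOF
}

# Send the body to the chat completions API; curl reads it from stdin (@-).
send_chat() {
  chat_payload | curl --silent --fail \
    -H "Authorization: Bearer ${TOKEN}" \
    -H "Content-Type: application/json" \
    --data @- \
    "http://${ENDPOINT_IP}/v1/chat/completions"
}
```

To print only the model's reply, pipe the result through the same filter as above: send_chat | jq -r '.choices[0].message.content'.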

How to delete the created resources

The endpoint and its computing resources are chargeable. If you no longer need the endpoint, delete it so that Nebius AI Cloud stops charging for it:
  1. In the sidebar, go to AI Services → Endpoints.
  2. Locate the endpoint and then click ⋮ → Delete.
  3. In the window that opens, confirm the deletion.