How to create an endpoint

To deploy an AI model, create an endpoint. Serverless AI endpoints are based on containers that run on virtual machines (VMs) in Compute. Your model runs in a container on a VM, and you access the model through the endpoint.
  1. In the sidebar, go to AI Services → Endpoints.
  2. Click Create endpoint.
  3. Specify the endpoint name.
  4. In the Endpoint settings section, specify the path to the container image. If you use a private registry, click Add registry and provide the details for your registry.
  5. Set the container ports for the endpoint. You can add multiple ports.
  6. (Optional) Configure advanced settings:
    • Entrypoint command: Specify an entrypoint command for the container.
    • Arguments: Override container arguments that are passed to the entrypoint.
    • Environment variables: Specify environment variables in key-value pairs.
    • SSH key: Add an SSH key for the VM’s user so you can connect to the VM.
    • Authentication: If the endpoint serves production traffic, enable token authentication. The system generates a token. Copy and save the token securely before proceeding. If you are prototyping or testing, you can leave authentication disabled.
  7. (Optional) Configure the Computing resources section:
    1. Select whether the VM should have GPUs.
    2. Specify the VM type: regular or preemptible. VMs without GPUs only support the regular type.
    3. Select the platform and preset.
  8. (Optional) Configure Storage settings.
  9. Configure the Network section:
    • Select a subnet or create a new one.
    • Select the IP address type: Public static IP or Private IP. If you want to connect to the endpoint from the internet, select Public static IP.
  10. Click Create.
Endpoint creation takes approximately five minutes.
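
Because creation takes several minutes, a script that calls the endpoint right after you create it may need to wait until the container starts accepting connections. The following is a minimal sketch of such a wait, not part of the Serverless AI tooling: the helper name `wait_for_port`, the timeout values, and the idea of probing a container port over TCP are all assumptions made for illustration. An open port only shows that something is listening; it does not prove that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 600.0,
                  interval_s: float = 10.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout.

    Hypothetical helper: host is the endpoint IP address and port is one of
    the container ports that you set when you created the endpoint.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=5.0):
                return True
        except OSError:
            # Connection refused, unreachable, or timed out: retry later.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval_s, remaining))
```

For example, `wait_for_port("<endpoint_IP_address>", 80)` blocks for up to ten minutes and returns `True` as soon as the port is reachable.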

How to call an endpoint

Call an endpoint to interact with the AI model that it hosts, for example, to chat with the model. To call the endpoint:
  1. Get the endpoint IP address:
    1. In the sidebar, go to AI Services → Endpoints.
    2. Open the page of the required endpoint.
    3. In the Network section, copy the IP address from the Public endpoints or Private endpoints field.
  2. Call the endpoint by using an HTTP client. For example, with curl:
    curl "http://<endpoint_IP_address>/v1/chat/completions" \
       -H "Authorization: Bearer <token>" \
       -H "Content-Type: application/json" \
       -d '{
          "model": "llama-3-70b",
          "messages": [
             {"role": "user", "content": "Tell a joke about AI."}
          ]
       }'
    
    In the command, specify the following parameters:
    • <endpoint_IP_address>: IP address that you copied earlier.
    • <token>: Authentication token that was generated when you created the endpoint. If you didn’t enable authentication, omit the Authorization HTTP header.
    • model: AI model that is hosted in the endpoint and that you chat with.
    • content: Message that you want to send to the model.
    The response looks like the following:
    {
      "choices": [
        {
          "message": {
            "role": "assistant",
            "content": "Why did the AI cross the road? Because it learned the optimal path after 10,000 epochs."
          }
        }
      ]
    }
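
The same call can be made from any HTTP client. As a rough sketch, the Python code below builds the request from the curl example using only the standard library and reads the reply from the `choices[0].message.content` field shown in the sample response. The function names `build_chat_request`, `extract_reply`, and `chat` are hypothetical, and the sketch assumes that the container serves the same `/v1/chat/completions` path over plain HTTP on the default port, as in the curl command.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(ip: str, model: str, content: str,
                       token: Optional[str] = None) -> urllib.request.Request:
    """Build the POST request from the curl example (hypothetical helper)."""
    url = f"http://{ip}/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    headers = {"Content-Type": "application/json"}
    # Send the Authorization header only if the endpoint uses a token.
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def extract_reply(response_body: str) -> str:
    """Return the assistant message text from a response like the sample."""
    return json.loads(response_body)["choices"][0]["message"]["content"]


def chat(ip: str, model: str, content: str, token: Optional[str] = None,
         timeout_s: float = 60.0) -> str:
    """Send one user message to the endpoint and return the model's reply."""
    request = build_chat_request(ip, model, content, token)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return extract_reply(response.read().decode("utf-8"))
```

For example, `chat("<endpoint_IP_address>", "llama-3-70b", "Tell a joke about AI.", token="<token>")` returns the text of the model's answer. Leave `token` unset if authentication is disabled for the endpoint.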
    

How to stop or start an endpoint

If you don’t currently need your endpoint but you want to preserve it, you can stop the endpoint and then start it later. You aren’t charged for the computing resources of a stopped endpoint. However, if you mounted a volume to the endpoint, you are charged for the volume even while the endpoint is stopped.
  1. In the sidebar, go to AI Services → Endpoints.
  2. Locate the endpoint and then click ⋮ → Stop or Start.
  3. In the window that opens, confirm the action.

How to delete an endpoint

When you delete an endpoint, Serverless AI automatically deletes its VM and container (boot) disk. If you no longer need the endpoint, delete it:
  1. In the sidebar, go to AI Services → Endpoints.
  2. Locate the endpoint and then click ⋮ → Delete.
  3. In the window that opens, confirm the deletion.