How to create an endpoint

To deploy an AI model, create an endpoint. Serverless AI endpoints are based on containers over virtual machines (VMs) in Compute: your model runs in a container over a VM, and you access the model through the endpoint.
To create an endpoint, run the following command:
nebius ai endpoint create \
  --name <endpoint_name> \
  --image <image_path> \
  --registry-username <username> \
  --registry-password <password> \
  --container-command "<command>" \
  --container-port <port> \
  --args "<arguments>" \
  --env "<key>=<value>" \
  --auth token \
  --token <token> \
  --volume "<source:container_path|source:container_path:mode>" \
  --subnet-id <subnet_ID> \
  --platform <platform_ID> \
  --preset <preset> \
  --disk-size <size> \
  --shm-size <size> \
  --ssh-key "$(cat <path_to_SSH_public_key>)" \
  --public
For example:
nebius ai endpoint create \
  --name my-chatbot \
  --image vllm/vllm-openai:latest \
  --container-command "python3 -m vllm.entrypoints.openai.api_server" \
  --container-port 8000 \
  --env "MODEL=llama-3-70b" \
  --auth token \
  --token df88e*** \
  --volume "storagebucket-e***:/output:rw" \
  --subnet-id vpcsubnet-e*** \
  --platform gpu-h100-sxm \
  --preset 1gpu-16vcpu-200gb \
  --disk-size 250Gi \
  --shm-size 128Mi \
  --ssh-key "$(cat ~/.ssh/id_ed25519.pub)" \
  --public
In the command, specify the following parameters:
  • Endpoint characteristics:
    • --name: Endpoint name.
    • --image: Container image in the <registry>/<image>:<tag> format. Use an image from a public registry or your authenticated private registry.
    • --registry-username, --registry-password (optional): If you pull an image from a private registry, specify your credentials for this registry:
      • --registry-username: Username.
      • --registry-password: Personal access token, password, or API key, depending on where your registry is hosted: Docker Hub, Microsoft Azure, GitHub, NVIDIA, or a custom registry.
      If you pull an image from a public registry or from Container Registry in the same project, you don’t need to specify the credentials.
    • --container-command (optional): Entrypoint command for the container.
    • --args (optional): Arguments to pass to the entrypoint command, as with docker run.
    • --env (optional): Environment variables for the container. Set them in the key=value format, where key is the variable name and value is its value. To set several variables, list the key=value pairs separated by commas.
    • --container-port (optional): Port that the endpoint exposes.
    • --auth (optional): Authentication method. If the parameter isn’t set (default), no authentication is required, which is useful for creating and testing an endpoint prototype. If you set --auth token, authentication is enabled, which is recommended for production. When you call the endpoint, pass the token in the "Authorization: Bearer <token>" HTTP header.
    • --token (optional): Token for authentication. To get the token, run openssl rand -hex 32.
    • --volume (optional): Bucket or shared filesystem to mount into the endpoint container. You can use volumes to store model files and other endpoint artifacts. Specify the value in the source:container_path or source:container_path:mode format. The supported modes are ro (read-only) and rw (read-write, the default). Repeat the parameter for each additional volume. For example:
      --volume 'computefilesystem-e***:/input:ro' \
      --volume 'storagebucket-e***:/output:rw'
      
    In source, you can specify either the volume ID or its name.
  • Underlying container over VM characteristics:
    • --subnet-id: Subnet ID. Required if the project has multiple subnets.
    • --platform: VM platform. See available platforms in Types of virtual machines and GPUs in Nebius AI Cloud.
    • --preset: Number of GPUs, vCPUs and RAM allocated to the container. The preset must match the selected platform. See available presets in Presets for GPU platforms.
    • --disk-size: Disk size of the container over VM. Specify a value such as 100Gi, 500Gi or 1Ti. The default value is 250Gi. See how disk performance depends on disk size.
    • --shm-size (optional): Shared memory size of /dev/shm. Specify a value such as 64Mi, 128Mi or 1Gi. The default value is 16Gi.
    • --ssh-key (optional): SSH key to access the container over VM by SSH. When you add an SSH key, a public dynamic IP address is assigned. Before you add the key, check the quota on the number of public IP addresses in the web console.
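As noted for --token, you can generate the token with openssl before creating the endpoint and keep a copy for later calls. A minimal sketch (the endpoint-token.txt file name is just an example, not part of the CLI):

```shell
# Generate a random 64-character hex token for --token and keep a local
# copy: you need the same value in the "Authorization: Bearer" header later.
TOKEN=$(openssl rand -hex 32)
printf '%s\n' "$TOKEN" > endpoint-token.txt   # example file name
echo "Token length: ${#TOKEN} characters"
```

Pass "$TOKEN" as the --token value when running nebius ai endpoint create.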
Endpoint creation takes approximately five minutes.

How to call an endpoint

You can call an endpoint when you want to interact with the AI model it hosts, for example, to chat with the model. To call the endpoint, do the following:
  1. To get the endpoint ID, list all endpoints:
    nebius ai endpoint list
    
    In the output, copy the ID of the required endpoint.
  2. Get the endpoint IP address:
    nebius ai endpoint get <endpoint_ID> \
      --format json | jq -r '.status.instances[0].public_ip'
    
  3. Call the endpoint:
    curl "http://<endpoint_IP_address>/v1/chat/completions" \
       -H "Authorization: Bearer <token>" \
       -H "Content-Type: application/json" \
       -d '{
          "model": "llama-3-70b",
          "messages": [
             {"role": "user", "content": "Tell a joke about AI."}
          ]
       }'
    
    In the command, specify the following parameters:
    • <endpoint_IP_address>: IP address that you obtained in the previous step.
    • <token>: Authentication token that you specified when you created the endpoint. If you didn’t specify any token, don’t use the Authorization HTTP header.
    • model: AI model that is hosted in the endpoint and that you chat with.
    • content: Message that you want to send to the model.
    The response looks like the following:
    {
      "choices": [
        {
          "message": {
            "role": "assistant",
            "content": "Why did the AI cross the road? Because it learned the optimal path after 10,000 epochs."
          }
        }
      ]
    }
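    To pull only the model’s reply out of a response like the one above, you can filter it with jq, which step 2 already uses. A sketch, assuming the response body has been saved to a file named response.json:

    ```shell
    # Write a response like the one above to a file (inline here for
    # illustration), then extract the assistant's message text with jq.
    cat > response.json <<'EOF'
    {
      "choices": [
        {
          "message": {
            "role": "assistant",
            "content": "Why did the AI cross the road? Because it learned the optimal path after 10,000 epochs."
          }
        }
      ]
    }
    EOF
    jq -r '.choices[0].message.content' response.json
    ```

    In a pipeline, you can instead pipe the curl output directly into jq -r '.choices[0].message.content'.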
    

How to stop or start an endpoint

If you don’t currently need your endpoint but want to preserve it, you can stop the endpoint and start it again later. You aren’t charged for the computing resources of a stopped endpoint. However, if you mounted a volume to the endpoint, you are still charged for the volume while the endpoint is stopped.
  1. To get the endpoint ID, list all endpoints:
    nebius ai endpoint list
    
    In the output, copy the ID of the required endpoint.
  2. To stop an endpoint, run the following command:
    nebius ai endpoint stop --id <endpoint_ID>
    
  3. To start an endpoint, run the following command:
    nebius ai endpoint start --id <endpoint_ID>
    

How to delete an endpoint

If you no longer need the endpoint, delete it:
  1. To get the endpoint ID, list all endpoints:
    nebius ai endpoint list
    
    In the output, copy the ID of the required endpoint.
  2. Delete the endpoint:
    nebius ai endpoint delete --id <endpoint_ID>
    
    If a static IP address is assigned to the endpoint, you are prompted to confirm releasing the address. Enter y to confirm or n to keep the address allocated in the pool.