
# Managing endpoints in Serverless AI

## How to create an endpoint

To deploy an AI model, create an endpoint. Serverless AI endpoints are based on [containers over virtual machines](/compute/virtual-machines/containers) (VMs) in Compute. Your model runs in a container over a VM, and you access the model through the endpoint.

<Tabs group="interfaces">
  <Tab title="Web console">
    1. In the sidebar, go to <Icon icon="https://mintcdn.com/nebius-ai-cloud/1Ha0sWR6e1mnIaHS/_assets/sidebar/ai-services.svg?fit=max&auto=format&n=1Ha0sWR6e1mnIaHS&q=85&s=ab4ff229f7690c99deb1dc52d3daf987" width="16" height="16" data-path="_assets/sidebar/ai-services.svg" /> **AI Services** → **Endpoints**.

    2. Click <Icon icon="https://mintcdn.com/nebius-ai-cloud/1Ha0sWR6e1mnIaHS/_assets/plus.svg?fit=max&auto=format&n=1Ha0sWR6e1mnIaHS&q=85&s=7c9efc69d65fc58db0eb73702fd81aa1" width="16" height="16" data-path="_assets/plus.svg" /> **Create endpoint**.

    3. Specify the endpoint name.

    4. Configure **Endpoint settings**:

       1. In **Image path**, set the path to the container image.

       2. If you use a private registry, click <Icon icon="https://mintcdn.com/nebius-ai-cloud/1Ha0sWR6e1mnIaHS/_assets/plus.svg?fit=max&auto=format&n=1Ha0sWR6e1mnIaHS&q=85&s=7c9efc69d65fc58db0eb73702fd81aa1" width="16" height="16" data-path="_assets/plus.svg" /> **Add registry** and provide the details for your registry.

       3. In **Ports**, set the container ports for the endpoint. You can add multiple ports.

       4. (Optional) In **Entrypoint command**, specify the entrypoint command for the container.

          If you need to pass container arguments, specify them in this field as well.

       5. (Optional) In **Environment variables**, specify environment variables in key-value pairs.

       6. (Optional) In **Authentication**, enable token authentication if the endpoint serves production traffic.

          The system generates a token. Copy and save the token securely before proceeding.

          If you are prototyping or testing, you can leave authentication disabled.

    5. (Optional) Configure the **Computing resources** section:

       1. Select whether the VM should have GPUs.

       2. Specify the VM type: regular or [preemptible](/compute/virtual-machines/preemptible).

          VMs without GPUs only support the regular type.

       3. Select the [platform and preset](/compute/virtual-machines/types).

    6. (Optional) Configure **Storage** settings:

       * Set the container disk size. See how [disk performance depends on disk size](/compute/storage/types#disk-performance).
       * Attach a bucket or a filesystem to provide storage. You can create a bucket or filesystem, or use an existing one. To create a new bucket, see [Bucket parameters](/object-storage/buckets/manage#bucket-parameters). To create a new filesystem, see [Volume parameters](/compute/storage/manage#volume-parameters).

    7. (Optional) In the **Access** section, add an SSH key for the VM's user so you can [connect to the VM](/compute/virtual-machines/connect#connect-to-the-vm-by-using-ssh).

       You can add new credentials or select existing ones. If you decide to use an existing credential, make sure that the SSH key is stored for the **nebius** username.

    8. Configure the **Network** section:
       * Select a subnet or create a new one.
       * Select the IP address type: **Public static IP** or **Private IP**. If you want to connect to the resource from the internet, select **Public static IP**.

    9. Click **Create**.
  </Tab>

  <Tab title="CLI">
    To create an endpoint, run the following command:

    ```bash theme={null}
    nebius ai endpoint create \
      --name <endpoint_name> \
      --image <image_path> \
      --registry-username <username> \
      --registry-password <password> \
      --container-command "<command>" \
      --container-port <port> \
      --args "<arguments>" \
      --env "<key=value>" \
      --auth token \
      --token <token> \
      --volume "<source:container_path[:mode]|s3://bucket:/container_path[:mode[:profile]]>" \
      --subnet-id <subnet_ID> \
      --platform <platform_ID> \
      --preset <preset> \
      --disk-size <size> \
      --shm-size <size> \
      --ssh-key "$(cat <path_to_SSH_public_key>)" \
      --public
    ```

    <Accordion title="Endpoint creation example">
      ```bash theme={null}
      nebius ai endpoint create \
        --name my-chatbot \
        --image vllm/vllm-openai:latest \
        --container-command "python3 -m vllm.entrypoints.openai.api_server" \
        --container-port 8000 \
        --env MODEL=llama-3-70b \
        --auth token \
        --token df88e*** \
        --volume "storagebucket-e***:/output:rw" \
        --subnet-id vpcsubnet-e*** \
        --platform gpu-h100-sxm \
        --preset 1gpu-16vcpu-200gb \
        --disk-size 250Gi \
        --shm-size 128Mi \
        --ssh-key "$(cat ~/.ssh/id_ed25519.pub)" \
        --public
      ```
    </Accordion>

    In the command, specify the following parameters:

    * Endpoint characteristics:

      * `--name`: Endpoint name.

      - `--image`: Container image reference in the `registry/path:tag` or `registry/path@digest` format. Use an image from a public registry or your authenticated private registry.

      - `--registry-username`, `--registry-password` (optional): Credentials to authenticate if you pull an image from a private registry. Alternatively, use `--registry-secret` for credentials stored in [MysteryBox](/mysterybox/).

        * `--registry-username`: Username.
        * `--registry-password`: Personal access token, password or API key, depending on where your registry is hosted: for example, Docker Hub, Microsoft Azure, GitHub, NVIDIA or a custom registry.

        If you pull an image from a public registry or from [Container Registry](/container-registry) in the same project, you don't need to specify credentials.

      - `--registry-secret` (optional): [MysteryBox secret](/mysterybox/overview#secrets-and-versions) selector with `REGISTRY_USERNAME` and `REGISTRY_PASSWORD` payload keys. You can specify a secret name, secret ID, version ID or a combined secret/version selector such as `mbsec-e00***@mbsecver-e00***`.

      - `--container-command` (optional): Entrypoint command for the container.

      - `--args` (optional): Arguments that are passed to the entrypoint command, as with `docker run`.

      - `--env` (optional): Environment variables for the container. Set them in the `key=value` format where the `key` is the environment variable and the `value` is the value of this variable. If you need to set several variables, list the `key=value` pairs separated by commas.

      - `--env-secret` (optional): Environment variables loaded from a [MysteryBox secret](/mysterybox/overview#secrets-and-versions) in the `key=value` format. The value can be a secret name, secret ID, version ID or a combined secret/version selector such as `mbsec-e00***@mbsecver-e00***`. If you need to set several variables, list the pairs separated by commas.

      * `--container-port` (optional): Port that the endpoint exposes.
      * `--auth` (optional): Authentication method.

        If the parameter isn't set (default), no authentication is required. This is useful when you prototype and test an endpoint.

        If you set `--auth token`, token authentication is enabled. This is useful for production. When you [call the endpoint](#how-to-call-an-endpoint), specify the token in the `"Authorization: Bearer <token>"` HTTP header. Use `--token` or `--token-secret` to configure the token. If you don't provide either, the CLI generates a random token.

        * `--token` (optional): Token for authentication. To generate one manually, run `openssl rand -hex 32`. If you don't provide `--token`, the CLI generates one for you.
        * `--token-secret` (optional): [MysteryBox](/mysterybox/overview) secret selector with the `AUTH_TOKEN` payload key. You can specify a secret name, secret ID, version ID or a combined secret/version selector such as `mbsec-e00***@mbsecver-e00***`.
      * `--volume` (optional): [Bucket](/object-storage/overview#buckets) or [shared filesystem](/compute/storage/types#shared-filesystems) to mount to the endpoint container. You can use volumes to store model files and other endpoint artifacts.

        Specify the value in either format:

        * `source:container_path[:mode]` for mounting Nebius shared filesystems and existing bucket or volume resources by ID or name.
        * `s3://bucket:/container_path[:mode[:profile]]` for mounting an Object Storage bucket with AWS profile credentials or S3 credentials stored in MysteryBox. The `profile` is the AWS credentials profile to use. If you manage your credentials with [MysteryBox](/mysterybox/overview), use `profile@<secret_selector>`, where `<secret_selector>` is a secret name, secret ID, version ID or a combined secret/version selector such as `mbsec-e00***@mbsecver-e00***`.

        The supported modes are `ro` (read-only) and `rw` (read-write, default). To mount multiple volumes, repeat the parameter. For example:

        ```bash theme={null}
        --volume 'computefilesystem-e***:/input:ro' \
        --volume 'storagebucket-e***:/output:rw' \
        --volume 's3://training-results:/output:rw:default'
        ```

        In `source`, you can specify either the volume ID or its name.

    * Underlying container over VM characteristics:

      * `--subnet-id`: [Subnet ID](/vpc/networking/resources#how-to-get-a-subnet-id). Required if the project has multiple subnets.

      * `--platform`: VM platform. See available platforms in [Types of virtual machines and GPUs in Nebius AI Cloud](/compute/virtual-machines/types).

      * `--preset`: Number of GPUs, vCPUs and RAM allocated to the container. The preset must match the selected platform. See available presets in [Presets for GPU platforms](/compute/virtual-machines/types#presets-for-gpu-platforms).

      * `--disk-size`: Disk size of the container over VM. Specify the value such as `100Gi`, `500Gi` or `1Ti`. The default value is `250Gi`.

        See how [disk performance depends on disk size](/compute/storage/types#disk-performance).

      * `--shm-size` (optional): Shared memory size of `/dev/shm`. Specify the value such as `64Mi`, `128Mi` or `1Gi`. The default value is `16Gi`.

      * `--ssh-key` (optional): SSH key to access the container over VM by SSH. When you add an SSH key, a public dynamic IP address is assigned. Before you add the key, check the quota on the number of public IP addresses in the [web console](https://console.nebius.com/quota).

      - `--public` (optional): Assigns a [public IP address](/compute/virtual-machines/network#public-ip-addresses) to the container over VM. Required if you want to connect to the endpoint from the internet.
  </Tab>
</Tabs>
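
The following sketch shows one way to assemble the `--env` and `--volume` values described in the CLI parameters above before you run the command. It joins several `key=value` pairs with commas and repeats `--volume` once per mount. The `LOG_LEVEL` variable and the resource IDs are placeholders for illustration, and the script only prints the resulting command instead of running it.

```bash
# Environment variables for the container, one key=value pair per element.
# LOG_LEVEL is a made-up variable for illustration.
env_pairs=("MODEL=llama-3-70b" "LOG_LEVEL=info")

# Volumes in the source:container_path[:mode] format (IDs are placeholders).
volumes=("computefilesystem-e***:/input:ro" "storagebucket-e***:/output:rw")

# Join the pairs with commas, as --env expects for several variables.
env_arg="$(IFS=,; printf '%s' "${env_pairs[*]}")"

# Build the command as an array so that every value stays a single argument.
cmd=(nebius ai endpoint create
  --name my-chatbot
  --image vllm/vllm-openai:latest
  --env "$env_arg")

# Repeat --volume once per mount.
for volume in "${volumes[@]}"; do
  cmd+=(--volume "$volume")
done

# Print the command for review instead of running it.
printf '%q ' "${cmd[@]}"
printf '\n'
```

To create the endpoint, add the remaining parameters that you need to the `cmd` array and run `"${cmd[@]}"`.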

The endpoint creation takes approximately five minutes.
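
If you enable token authentication from the CLI, you may want to generate the token yourself so that you can store it before you create the endpoint. A minimal sketch, assuming that `openssl` is installed. The `TOKEN_FILE` path is a placeholder, and for production a MysteryBox secret passed with `--token-secret` may suit you better than a local file.

```bash
# Generate 32 random bytes encoded as 64 hexadecimal characters.
ENDPOINT_TOKEN="$(openssl rand -hex 32)"

# Write the token to a file that only the current user can read.
# The file name is a placeholder chosen for this sketch.
TOKEN_FILE="${TOKEN_FILE:-./endpoint-token.txt}"
(umask 077 && printf '%s\n' "$ENDPOINT_TOKEN" > "$TOKEN_FILE")

# Tighten permissions in case the file already existed with wider access.
chmod 600 "$TOKEN_FILE"

echo "Token saved to ${TOKEN_FILE}"
```

You can then pass the token when you create the endpoint, for example with `--auth token --token "$(cat ./endpoint-token.txt)"`.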

## How to call an endpoint

Call an endpoint to interact with the AI model that it hosts; for example, to chat with the model.

To call the endpoint:

1. Get the endpoint IP address:

   <Tabs group="interfaces">
     <Tab title="Web console">
       1. In the sidebar, go to <Icon icon="https://mintcdn.com/nebius-ai-cloud/1Ha0sWR6e1mnIaHS/_assets/sidebar/ai-services.svg?fit=max&auto=format&n=1Ha0sWR6e1mnIaHS&q=85&s=ab4ff229f7690c99deb1dc52d3daf987" width="16" height="16" data-path="_assets/sidebar/ai-services.svg" /> **AI Services** → **Endpoints**.
       2. Open the page of the required endpoint.
       3. In the **Network** section, copy the IP address from the **Public endpoints** or **Private endpoints** field.
     </Tab>

     <Tab title="CLI">
       1. To get the endpoint ID, list all endpoints:
          ```bash theme={null}
          nebius ai endpoint list
          ```
          In the output, copy the ID of the required endpoint.
       2. Get the endpoint IP address:

          ```bash theme={null}
          nebius ai endpoint get <endpoint_ID> \
            --format json | jq -r '.status.instances[0].public_ip'
          ```
     </Tab>
   </Tabs>

2. Call the endpoint by using an HTTP client. For example, with `curl`:

   ```bash theme={null}
   curl "http://<endpoint_IP_address>/v1/chat/completions" \
      -H "Authorization: Bearer <token>" \
      -H "Content-Type: application/json" \
      -d '{
         "model": "llama-3-70b",
         "messages": [
            {"role": "user", "content": "Tell a joke about AI."}
         ]
      }'
   ```

   In the command, specify the following parameters:

   * `<endpoint_IP_address>`: IP address that you copied earlier.
   * `<token>`: Authentication token that you specified, or that was generated, when you created the endpoint. If authentication isn't enabled for the endpoint, omit the `Authorization` HTTP header.
   * `model`: AI model that is hosted in the endpoint and that you chat with.
   * `content`: Message that you want to send to the model.

   The response looks like the following:

   ```json theme={null}
   {
     "choices": [
       {
         "message": {
           "role": "assistant",
           "content": "Why did the AI cross the road? Because it learned the optimal path after 10,000 epochs."
         }
       }
     ]
   }
   ```
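
The call above can be wrapped in small shell helpers, for example to omit the `Authorization` header for endpoints without authentication and to print only the model's reply. This is a sketch rather than part of the Nebius CLI: the function names and the `ENDPOINT_TOKEN` variable are made up for illustration, and the `jq` filter assumes that the response has the same shape as the sample above. It requires `curl` and `jq`.

```bash
# Send one user message to the chat completions route of an endpoint.
# The Authorization header is added only when ENDPOINT_TOKEN is non-empty.
call_endpoint() {
  local address="$1" model="$2" message="$3"
  local -a headers=(-H "Content-Type: application/json")

  if [ -n "${ENDPOINT_TOKEN:-}" ]; then
    headers+=(-H "Authorization: Bearer ${ENDPOINT_TOKEN}")
  fi

  # jq builds the request body so that quotes in the message are escaped.
  local body
  body="$(jq -n --arg model "$model" --arg content "$message" \
    '{model: $model, messages: [{role: "user", content: $content}]}')"

  curl -sS "http://${address}/v1/chat/completions" "${headers[@]}" -d "$body"
}

# Read a response from standard input and print only the assistant's text.
extract_reply() {
  jq -r '.choices[0].message.content'
}

# Example usage (replace the placeholders in angle brackets):
#   ENDPOINT_TOKEN="<token>" call_endpoint "<endpoint_IP_address>" \
#     "llama-3-70b" "Tell a joke about AI." | extract_reply
```

Keeping the token in an environment variable means that the same helper works for endpoints with and without authentication: leave `ENDPOINT_TOKEN` unset to send the request without the header.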

## How to stop or start an endpoint

If you don't need your endpoint right now but want to keep it, you can stop the endpoint and start it again later. You aren't charged for the computing resources of a stopped endpoint. However, if you mounted a volume to the endpoint, you are still charged for the volume while the endpoint is stopped.

<Tabs group="interfaces">
  <Tab title="Web console">
    1. In the sidebar, go to <Icon icon="https://mintcdn.com/nebius-ai-cloud/1Ha0sWR6e1mnIaHS/_assets/sidebar/ai-services.svg?fit=max&auto=format&n=1Ha0sWR6e1mnIaHS&q=85&s=ab4ff229f7690c99deb1dc52d3daf987" width="16" height="16" data-path="_assets/sidebar/ai-services.svg" /> **AI Services** → **Endpoints**.
    2. Locate the endpoint and then click <Icon icon="https://mintcdn.com/nebius-ai-cloud/1Ha0sWR6e1mnIaHS/_assets/button-vellipsis.svg?fit=max&auto=format&n=1Ha0sWR6e1mnIaHS&q=85&s=e80b8e57c43bfd117679262e6a1334ad" width="12" height="24" data-path="_assets/button-vellipsis.svg" /> → **Stop** or **Start**.
    3. In the window that opens, confirm the action.
  </Tab>

  <Tab title="CLI">
    1. To get the endpoint ID, list all endpoints:
       ```bash theme={null}
       nebius ai endpoint list
       ```
       In the output, copy the ID of the required endpoint.

    2. To stop an endpoint, run the following command:

       ```bash theme={null}
       nebius ai endpoint stop --id <endpoint_ID>
       ```

    3. To start an endpoint, run the following command:

       ```bash theme={null}
       nebius ai endpoint start --id <endpoint_ID>
       ```
  </Tab>
</Tabs>
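
If you stop and start endpoints from scripts, a small wrapper can guard against typos in the action name. The `endpoint_power` function below is a hypothetical helper, not a CLI command; it only forwards to the `stop` and `start` commands shown above.

```bash
# Hypothetical wrapper: accept only "stop" or "start" and a non-empty ID,
# then run the matching CLI command.
endpoint_power() {
  local action="${1:-}" endpoint_id="${2:-}"

  if [ -z "$endpoint_id" ]; then
    echo "Usage: endpoint_power stop|start <endpoint_ID>" >&2
    return 2
  fi

  case "$action" in
    stop|start)
      nebius ai endpoint "$action" --id "$endpoint_id"
      ;;
    *)
      echo "Unknown action: ${action}. Use stop or start." >&2
      return 2
      ;;
  esac
}

# Example usage (replace the placeholder in angle brackets):
#   endpoint_power stop <endpoint_ID>
```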

## How to delete an endpoint

When you delete an endpoint, Serverless AI automatically deletes its VM and container (boot) disk.

If you no longer need the endpoint, delete it:

<Tabs group="interfaces">
  <Tab title="Web console">
    1. In the sidebar, go to <Icon icon="https://mintcdn.com/nebius-ai-cloud/1Ha0sWR6e1mnIaHS/_assets/sidebar/ai-services.svg?fit=max&auto=format&n=1Ha0sWR6e1mnIaHS&q=85&s=ab4ff229f7690c99deb1dc52d3daf987" width="16" height="16" data-path="_assets/sidebar/ai-services.svg" /> **AI Services** → **Endpoints**.
    2. Locate the endpoint and then click <Icon icon="https://mintcdn.com/nebius-ai-cloud/1Ha0sWR6e1mnIaHS/_assets/button-vellipsis.svg?fit=max&auto=format&n=1Ha0sWR6e1mnIaHS&q=85&s=e80b8e57c43bfd117679262e6a1334ad" width="12" height="24" data-path="_assets/button-vellipsis.svg" /> → **Delete**.
    3. In the window that opens, confirm the deletion.
  </Tab>

  <Tab title="CLI">
    1. To get the endpoint ID, list all endpoints:
       ```bash theme={null}
       nebius ai endpoint list
       ```
       In the output, copy the ID of the required endpoint.
    2. Delete the endpoint:

       ```bash theme={null}
       nebius ai endpoint delete --id <endpoint_ID>
       ```

       If a static IP address is assigned to the endpoint, the CLI prompts you to confirm the release of the address. Enter `y` to confirm or `n` to keep the address allocated in a pool.
  </Tab>
</Tabs>
