To work with an NVIDIA NIM microservice in Nebius AI Cloud, you can deploy it as an application in the web console.

Prerequisites

Make sure you are in a group that has at least the editor role within your tenant; for example, the default editors group. You can check this in the Administration → IAM section of the web console.

How to deploy

  1. In the web console, go to Applications.
  2. Find the NVIDIA NIM microservice model that you want to deploy by searching for it or browsing by category, then open the application page. Nebius AI Cloud offers a range of models as NVIDIA NIM microservices.
  3. Click Deploy application to deploy using the Standalone option.
  4. Configure the application:
    1. Enter the application name.
    2. Select the network where the application should be located. This network will allocate private and public IP addresses to the application. The Network field is only shown if your project has more than one network.
    3. Configure the credentials that you will use to log in:
      • Enter the username. It must be between 1 and 63 characters long, and consist of Latin letters, digits, hyphens and underscores. It cannot start with a hyphen.
      • Copy the generated password, save it securely and confirm that you saved it. To get another password, click Generate.
    4. Under Resources, configure the application’s computing and storage resources:
      1. Select a platform and preset. Available platforms and presets depend on the region where you deploy the application.
      2. Specify the disk size.
      The application's computing resources determine how much you pay for it and are subject to quotas. For more details, see Standalone Applications pricing in Nebius AI Cloud and Standalone Applications quotas in Nebius AI Cloud.
  5. Click Deploy application.
  6. Wait until the application status changes to Running. This takes 15–20 minutes.
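If you generate usernames programmatically, the constraints from the credentials step can be checked before deployment. This is a minimal sketch; the regular expression below is an assumption derived from the stated rules (1–63 characters, Latin letters, digits, hyphens and underscores, no leading hyphen), not an official validator:

```python
import re

# Assumed from the console's stated rules: first character may be a letter,
# digit or underscore (a hyphen is not allowed as the first character),
# followed by up to 62 more characters from the full allowed set.
USERNAME_RE = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9_-]{0,62}$")

def is_valid_username(name: str) -> bool:
    """Return True if the name satisfies the application username rules."""
    return bool(USERNAME_RE.fullmatch(name))
```

For example, `is_valid_username("nim-user")` passes, while `is_valid_username("-nim")` fails because of the leading hyphen.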

What’s next

When the application is running, you can send inference requests to it.
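NVIDIA NIM microservices expose an OpenAI-compatible chat completions API. The sketch below shows how a request to a deployed application might be assembled, assuming HTTP Basic authentication with the credentials configured during deployment; the endpoint path, model name, and placeholder values are assumptions to be replaced with your application's actual details:

```python
import base64
import json

# Placeholders -- substitute your application's public IP address, the model
# you deployed, and the username/password configured in the web console.
ENDPOINT = "https://<application-public-ip>/v1/chat/completions"
USERNAME = "nim-user"
PASSWORD = "<generated-password>"

# OpenAI-compatible chat completions payload.
payload = {
    "model": "<model-name>",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

# HTTP Basic auth header built from the application credentials.
token = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
headers = {
    "Authorization": f"Basic {token}",
    "Content-Type": "application/json",
}

body = json.dumps(payload)
# To send the request, e.g.:
#   import urllib.request
#   req = urllib.request.Request(ENDPOINT, data=body.encode(), headers=headers)
#   print(urllib.request.urlopen(req).read().decode())
```

The request is only assembled here, not sent, so the snippet runs without a live deployment; uncomment the last lines once the application is Running.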