MLflow helps you track and manage your machine learning experiments. With clusters for MLflow in Nebius AI Cloud, you can deploy MLflow in the cloud in just a couple of clicks and connect to it easily from the machines that run your workloads.
This guide covers how to set up your environment to work with Managed Service for MLflow, create your first cluster, run a simple experiment and access its artifacts in Object Storage.
Prepare your environment
Meet the following prerequisites, depending on the preferred interface:
To work in the web console:
Install the MLflow Python package that provides an API to run and manage experiments in your code.
Make sure that you (if you are not your tenant’s owner) are in a group that has the admin role within your tenant; for example, the default admins group. You can check this in the Administration → IAM section of the web console.
To work in the CLI, install the following tools:
- MLflow: Python package that provides an API to run and manage experiments in your code.
- jq: Parses JSON output from the Nebius AI Cloud CLI and extracts resource IDs for other commands (see How to install jq).
- Nebius AI Cloud CLI: Manages all Nebius AI Cloud resources (see How to install the CLI).
Here are all the installation commands in a single copy-and-paste block:
-
For Ubuntu:
pip install mlflow
sudo apt-get install unzip jq
curl -sSL https://storage.eu-north1.nebius.cloud/cli/install.sh | bash
nebius profile create
-
For macOS:
pip install mlflow
brew install jq
curl -sSL https://storage.eu-north1.nebius.cloud/cli/install.sh | bash
nebius profile create
The last command, nebius profile create, will open the Nebius AI Cloud web console sign-in screen in your browser. Sign in to the web console to complete the initialization. After that, save your project ID in the CLI configuration:
-
Copy your project ID from the web console:
-
Go to the web console and then expand the top list of projects.
-
Next to the project’s name, click
→ Copy project ID.
-
Add the project ID to the CLI configuration:
nebius config set parent-id <project_ID>
Make sure that you (if you are not your tenant’s owner), or the service account that you use to manage clusters, are in a group that has the admin role within your tenant; for example, the default admins group.
Create a cluster
-
In the sidebar, go to
ML tools → MLflow.
-
Click Create cluster.
-
On the cluster creation page that opens, set the following parameters:
-
Name: Enter a name for your cluster.
-
Service account: Select the
mlflow-sa service account. It was created for Managed MLflow by default after you signed up for Nebius AI Cloud.
-
Access: Select the Public and private option.
In this case, the cluster has both public and private tracking endpoints. You can access the public endpoint from the internet and the private endpoint from the network where the cluster is located (for example, via a virtual machine in this network).
-
Cluster size: Select Medium.
-
Admin credentials: Enter a username and a strong password.
-
Click Create cluster.
-
Create a password for administrator access to the cluster and save it to an environment variable:
export PASSWORD=<password>
The password must be between 8 and 64 characters long and must contain at least the following:
- One uppercase letter
- One lowercase letter
- One digit
- One special character from **-!@#$^&*_=+:;’”\|/?,.~\§\±()[]{}<>\`**.
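The password rules above can be checked locally before you pass the password to the service. The following is a minimal sketch, not the validation the service itself runs: the function name password_problems is hypothetical, and SPECIAL_CHARS is an assumed subset of the special characters listed above.

```python
# Assumption: a subset of the special characters listed in this guide;
# extend it if you rely on characters that are not included here.
SPECIAL_CHARS = set("-!@#$^&*_=+:;|/?,.~()[]{}<>")


def password_problems(password: str) -> list[str]:
    """Return the rules the password breaks; an empty list means it passes."""
    problems = []
    if not 8 <= len(password) <= 64:
        problems.append("length must be between 8 and 64 characters")
    if not any(c.isupper() for c in password):
        problems.append("needs an uppercase letter")
    if not any(c.islower() for c in password):
        problems.append("needs a lowercase letter")
    if not any(c.isdigit() for c in password):
        problems.append("needs a digit")
    if not any(c in SPECIAL_CHARS for c in password):
        problems.append("needs a special character")
    return problems


if __name__ == "__main__":
    for candidate in ("Str0ng-Pass", "Weak-pass"):
        print(candidate, password_problems(candidate) or "OK")
```

Returning a list of problems rather than a single boolean makes it easier to see which rule to fix.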
-
Get the ID of the
mlflow-sa service account. It was created by default after you signed up for Nebius AI Cloud. Save the ID to an environment variable:
export SA_ID=$(nebius iam service-account get-by-name \
--name mlflow-sa \
--format json | jq -r '.metadata.id')
-
Get the ID of the default network and save it to an environment variable:
export NETWORK_ID=$(nebius vpc network get-by-name \
--name default-network \
--format json | jq -r '.metadata.id')
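The jq filter .metadata.id used in these commands reads one nested field from the CLI's JSON output. If you prefer to do the same extraction in a Python script, a sketch could look like this. The sample payload is made up for illustration; its shape is assumed only from the jq path used in this guide, and real CLI output contains more fields.

```python
import json

# Hypothetical, trimmed-down CLI output for illustration only.
sample_output = '{"metadata": {"id": "example-id-123", "name": "mlflow-sa"}}'


def resource_id(cli_json: str) -> str:
    """Python equivalent of: jq -r '.metadata.id'."""
    return json.loads(cli_json)["metadata"]["id"]


print(resource_id(sample_output))
```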
-
Create a test cluster:
nebius msp mlflow v1alpha1 cluster create \
--name mlflow-test-cluster \
--service-account-id "$SA_ID" \
--network-id "$NETWORK_ID" \
--admin-username admin \
--admin-password "$PASSWORD" \
--public-access=true
Together with the cluster, Managed Service for MLflow creates a bucket in Object Storage to store the experiment logs and metrics. This bucket is not deleted automatically after you delete the cluster. You are still charged for the bucket until you delete it.
-
Get the cluster tracking endpoint and save it to an environment variable:
In the web console, go to the cluster page and copy the Public Tracking URI value. Prepend https:// to it and save the resulting URL to the MLFLOW_TRACKING_URI environment variable.
Alternatively, run the following CLI command:
export MLFLOW_TRACKING_URI=https://$(nebius msp mlflow v1alpha1 cluster get-by-name \
--name mlflow-test-cluster \
--format json | jq -r ".status.tracking_endpoint")
-
Set up other environment variables used by MLflow Tracking:
export MLFLOW_TRACKING_USERNAME=<admin_username>
export MLFLOW_TRACKING_PASSWORD=<admin_password>
export MLFLOW_EXPERIMENT_NAME=<default_experiment_name>
For more details on secure connection to the tracking server, see MLflow documentation.
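Before running an experiment, it can help to confirm that the variables above are visible to Python. The sketch below is one possible pre-flight check; the helper names missing_tracking_vars and with_https are hypothetical. MLFLOW_EXPERIMENT_NAME is treated as optional here, on the assumption that MLflow falls back to its default experiment when it is not set.

```python
import os

# Variables from this guide that the MLflow client needs to reach the cluster.
REQUIRED_VARS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def missing_tracking_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def with_https(endpoint: str) -> str:
    """Prefix a bare host name with https://; leave full URLs untouched."""
    if endpoint.startswith(("http://", "https://")):
        return endpoint
    return "https://" + endpoint


if __name__ == "__main__":
    missing = missing_tracking_vars()
    if missing:
        print("Missing variables:", ", ".join(missing))
    else:
        print("Tracking URI:", with_https(os.environ["MLFLOW_TRACKING_URI"]))
```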
Run an experiment and check its results
-
Run an experiment:
-
Create a Python script,
nebius_mlflow_test.py, that trains a linear regression model on a simple, randomly populated dataset and uses MLflow autologging to log the training:
import mlflow
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
np.random.seed(42)
X = np.random.rand(100, 1)
y = 3.5 * X.squeeze() + np.random.randn(100) * 0.5
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42)
mlflow.autolog()
model = LinearRegression()
model.fit(X_train, y_train)
-
Run the script:
python3 nebius_mlflow_test.py
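To know roughly what to expect from the logged model, note that the script's dataset follows y ≈ 3.5x plus noise, so the fitted coefficient should land near 3.5. The sketch below illustrates this with a plain least-squares fit written with the standard library only. It is not the script's code: it uses Python's random module instead of NumPy and fits on all 100 points rather than the 80-point training split, so the exact numbers will differ from what MLflow logs. The name fit_line is hypothetical.

```python
import random

rng = random.Random(42)  # stdlib generator, so values differ from NumPy's

# Same shape of data as in the script: y = 3.5 * x + Gaussian noise (sd 0.5).
xs = [rng.random() for _ in range(100)]
ys = [3.5 * x + rng.gauss(0, 0.5) for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

If the coefficient shown for your run in the MLflow UI is far from 3.5, the run likely did not train on the data you expected.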
-
Check the experiment artifacts:
Go to the cluster page and click Go to web UI. Enter the admin credentials you specified when creating the cluster.
In the left pane, you can see a list of experiments. Select the experiment you just ran and review its results.
Alternatively, get the public tracking endpoint URL of the cluster with the CLI:
nebius msp mlflow v1alpha1 cluster get-by-name \
--name mlflow-test-cluster \
--format json | jq -r ".status.tracking_endpoint"
Open the URL in your browser and enter the admin credentials you specified when creating the cluster. Then select the experiment you just ran in the left pane and review its results.
If the client where you work requires a certificate, use the certificate that your machine’s OS provides. For example, on macOS, check the /etc/ssl/ folder.