To work with metrics in Prometheus, connect Prometheus to Observability Metrics and query the data by using PromQL.
Prerequisites
- Install and configure Nebius AI Cloud CLI.
- If you don’t have a service account for observability services, create one.
- Make sure that the service account is in a group that has at least the viewer role within your tenant; for example, the default viewers group. You can check this in the Administration → IAM section of the web console. If the service account is not in the required group, click → Add to group, and select viewers.
- Issue a static key for the service account using the following command:
  Copy the value of the static key from the token parameter of the response. You will need it in later steps.
How to connect Prometheus
Prometheus can only show a limited amount of monitoring data. If you have a large infrastructure, consider connecting a data source in Grafana® instead.
- Download the latest release of Prometheus for your platform.
- Extract the contents and switch to the folder with Prometheus:
- Create the prometheus.yml configuration file that configures Prometheus to retrieve the metrics. Use one of the following configurations depending on your Prometheus version:
  In this file, change the following parameters:
  - bearer_token: enter the static key that you got earlier.
  - metrics_path: specify your project ID in the URL. Optionally, add a service in the path in the following format:
    The following services are available:
    - compute: metrics related to Compute virtual machines.
    - gpu: GPU-related metrics.
    - nbs: metrics related to Compute volumes.
    - sp_storage: metrics related to Object Storage.
    - msp: metrics related to Managed Service for PostgreSQL® and Managed Service for MLflow.
  - match[]: optionally, specify which data Prometheus collects by filtering for labels or metric names. For example, to collect only metrics with the disk prefix, set the following value:
  - scrape_interval: you can change the interval, but the recommended interval is no less than 15 seconds.
- Start Prometheus:
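The parameters described in the steps above could fit together as in the following minimal sketch of prometheus.yml. It is an illustration, not the exact configuration from the Nebius documentation. The job name is hypothetical. The angle-bracket values are placeholders for your own static key, metrics path and endpoint host. The match[] value assumes that metrics with the disk prefix can be selected through the __name__ label.

```yaml
# prometheus.yml: a minimal sketch under the assumptions stated above.
global:
  scrape_interval: 15s   # recommended minimum; a longer interval sends fewer requests

scrape_configs:
  - job_name: 'nebius-observability'   # hypothetical name, choose your own
    scheme: https
    # Placeholder: the path that contains your project ID and, optionally,
    # a service such as compute or gpu.
    metrics_path: '<path_with_project_ID>'
    # Placeholder: the static key copied from the token parameter.
    bearer_token: '<static_key>'
    # Newer Prometheus versions also accept the equivalent form:
    # authorization:
    #   type: Bearer
    #   credentials: '<static_key>'
    params:
      'match[]':
        # Assumed selector: keeps only series whose metric name starts with "disk".
        - '{__name__=~"disk.*"}'
    static_configs:
      - targets:
          - '<metrics_endpoint_host>'   # placeholder for the Observability Metrics host
```

After you replace the placeholders, you can validate the file with promtool check config prometheus.yml. From the Prometheus folder, the usual way to start Prometheus with a specific configuration file is ./prometheus --config.file=prometheus.yml.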
How to shard large scraping jobs
If a scraping job needs to return a large amount of data, shard (split) it into several jobs. Use sharding when any of the following is true:
- Prometheus takes too long to retrieve metrics because one job requests too many time series.
- A large scraping job intermittently times out or becomes unreliable.
- You expect your cluster to grow significantly and want to avoid reworking the Prometheus configuration later.
To shard a job, create several scrape_configs entries that use the same metrics_path but different match[] selectors. Make the selectors non-overlapping so that the same metric is not collected more than once.
Choose one sharding strategy and use it consistently. For example, split requests by service, by metric name prefix or by a stable label that clearly partitions your infrastructure.
For example, when you collect only GPU metrics, split the requests by the uuid label:
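As a sketch of a uuid-based split, the two jobs below use the same metrics_path and complementary match[] selectors. The sketch assumes that uuid values are lowercase hexadecimal UUIDs, so their first character is always a digit or a letter from a to f. The job names, the regular expressions and the angle-bracket placeholders are illustrative and do not come from the Nebius documentation.

```yaml
scrape_configs:
  # Shard 1: series whose uuid label starts with 0-7.
  - job_name: 'nebius-gpu-shard-1'      # hypothetical name
    scheme: https
    metrics_path: '<path_with_project_ID_and_gpu_service>'   # placeholder
    bearer_token: '<static_key>'                             # placeholder
    params:
      'match[]':
        - '{uuid=~"[0-7].*"}'
    static_configs:
      - targets: ['<metrics_endpoint_host>']                 # placeholder

  # Shard 2: series whose uuid label starts with 8, 9 or a-f.
  - job_name: 'nebius-gpu-shard-2'      # hypothetical name
    scheme: https
    metrics_path: '<path_with_project_ID_and_gpu_service>'   # same path as shard 1
    bearer_token: '<static_key>'
    params:
      'match[]':
        - '{uuid=~"[89a-f].*"}'
    static_configs:
      - targets: ['<metrics_endpoint_host>']
```

Prometheus anchors regular expressions in label matchers to the whole label value, and the two character classes are disjoint, so a series cannot fall into both shards. Series without a uuid label match neither selector. If you need them, collect them in a separate job.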
How to explore and manage metrics
Open http://localhost:9090 in your browser and explore the metrics by using PromQL queries. For example, to get all metrics related to Compute virtual machines, enter the following query:

The Grafana Labs Marks are trademarks of Grafana Labs, and are used with Grafana Labs’ permission. We are not affiliated with, endorsed or sponsored by Grafana Labs or its affiliates.
Postgres, PostgreSQL and the Slonik Logo are trademarks or registered trademarks of the PostgreSQL Community Association of Canada, and used with their permission.