How the agent works
Compute installs two monitoring components on all new virtual machines:
- nebius-observability-agent collects the resource's metrics. For Compute virtual machines, these include GPU, InfiniBand™, operating system and other metrics.
- nebius-observability-agent-updater updates the agent and delivers new features.
Metrics are handled in the following stages:
- The agent collects the resource's metrics.
- The metrics are kept in safe storage in case the metrics collection endpoint becomes unavailable.
- Labels that identify the resource and the project are added to the metrics.
- The enriched metrics are visualized on the web console dashboards.
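The stages above can be illustrated with a small sketch. It is not the agent's implementation: the names `enrich`, `BufferedSender`, `resource_id` and `project_id` are hypothetical, and an in-memory queue stands in for whatever storage the real agent uses. It only shows the general pattern of labelling metrics and holding them back while the endpoint is unreachable.

```python
from collections import deque
from typing import Callable

# Hypothetical sketch: none of these names come from the real agent.
Metric = dict  # for example {"name": "gpu_util", "value": 0.93}


def enrich(metric: Metric, resource_id: str, project_id: str) -> Metric:
    """Return a copy of the metric with identifying labels attached."""
    labels = {
        **metric.get("labels", {}),
        "resource_id": resource_id,
        "project_id": project_id,
    }
    return {**metric, "labels": labels}


class BufferedSender:
    """Sends metrics in order and keeps unsent ones in a bounded buffer."""

    def __init__(self, send: Callable[[Metric], bool], max_buffered: int = 1000):
        # `send` returns True on success and False if the endpoint is unavailable.
        self._send = send
        # When the buffer is full, the oldest entries are dropped first.
        self._buffer = deque(maxlen=max_buffered)

    def submit(self, metric: Metric) -> None:
        self._buffer.append(metric)
        self.flush()

    def flush(self) -> None:
        # Stop at the first failure; the remaining metrics wait for the next flush.
        while self._buffer:
            if not self._send(self._buffer[0]):
                return
            self._buffer.popleft()

    @property
    def pending(self) -> int:
        return len(self._buffer)
```

In this sketch a metric submitted while the endpoint is down stays in the buffer and is delivered, in its original order, by the next successful `submit` or `flush` call.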
How to manage the agent
To manage the monitoring agent (for example, to disable its updates or roll back to an older version), connect to the VM over SSH and follow the instructions in the sections below.
Stop automatic agent updates
You can keep a particular version of the agent. The agent still collects metrics, but it stops updating and does not collect metrics for features released later. To stop the agent from auto-updating, uninstall the agent updater (nebius-observability-agent-updater).
Roll back to a previous agent revision
The Nebius team monitors the agent's state and updates it if failures occur. However, if you find that the current version of the agent is not working as intended, you may prefer to work with a previous agent revision. To roll the agent back to a previous version:
- Connect to the VM over SSH.
- On the VM, list every version of the nebius-observability-agent package available from your configured APT sources.
- Select a version from the second column of the output (for example, 0.1.139). This version is referred to below as <agent_version>.
- Install the package at <agent_version>.
- Check that the installed package version matches <agent_version> (for example, 0.1.139). If it doesn't, check /etc/apt/preferences.d/agent.conf and repeat the installation step.
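The version selection in this procedure can be sketched in code. The sketch makes several assumptions: the listing uses the `package | version | source` layout that `apt-cache madison` prints (the original listing command is not shown here), the repository URL and the version 0.1.140 in the sample are invented, and the install and check commands are ordinary `apt-get` and `dpkg-query` invocations, not necessarily the ones Nebius documents. The functions only build the commands and do not run them.

```python
PACKAGE = "nebius-observability-agent"

# Invented sample in the "package | version | source" layout (assumed format).
SAMPLE_LISTING = """\
nebius-observability-agent | 0.1.140 | https://example.invalid/apt stable/main amd64 Packages
nebius-observability-agent | 0.1.139 | https://example.invalid/apt stable/main amd64 Packages
"""


def available_versions(listing: str) -> list[str]:
    """Return the second '|'-separated column of each line, without duplicates."""
    versions = []
    for line in listing.splitlines():
        columns = [column.strip() for column in line.split("|")]
        if len(columns) >= 2 and columns[1] and columns[1] not in versions:
            versions.append(columns[1])
    return versions


def install_command(version: str) -> list[str]:
    """Build an apt-get call that installs one exact version of the package."""
    # --allow-downgrades lets apt-get replace a newer installed version.
    return ["sudo", "apt-get", "install", "--yes", "--allow-downgrades",
            f"{PACKAGE}={version}"]


def installed_version_command() -> list[str]:
    """Build a dpkg-query call that prints only the installed version."""
    return ["dpkg-query", "--show", "--showformat=${Version}", PACKAGE]
```

For example, `available_versions(SAMPLE_LISTING)` returns `["0.1.140", "0.1.139"]`, and `install_command("0.1.139")` ends with the argument `nebius-observability-agent=0.1.139`. Comparing the output of the `dpkg-query` call with the selected version corresponds to the final check in the list.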
Uninstall the agent
If you no longer need the metrics to monitor and troubleshoot your resources, connect to each resource and uninstall the agent.
Note that after you uninstall the agent, the Nebius support team will have no new data to investigate problems with your Nebius AI Cloud resources. Instead of uninstalling the agent, consider stopping its automatic updates.
Reinstall the agent
If you accidentally uninstall the agent, you can reinstall it on the resource:
- Install the agent.
- (Optional) Enable automatic agent updates by installing the agent updater again.
Data retention and deletion
Collected metrics are stored at full resolution (one datapoint every 15 seconds for most metrics) for 1 month and afterwards at reduced resolution (one datapoint every 5 minutes) for 1 year. If you want to delete all your metrics, contact support. Note that after your metrics are deleted, the Nebius support team will have no data to investigate problems with your Nebius AI Cloud resources.
InfiniBand and InfiniBand Trade Association are registered trademarks of the InfiniBand Trade Association.
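As a rough illustration of the retention policy in this section, the number of datapoints kept per metric series in each tier can be estimated with simple arithmetic. The sketch assumes a 30-day month and a 365-day year, treats the two tiers independently, and uses the 15-second interval that applies to most (not all) metrics. The function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def datapoints(retention_days: int, interval_seconds: int) -> int:
    """Number of datapoints one series produces over the retention period."""
    return retention_days * SECONDS_PER_DAY // interval_seconds


# Full resolution: one datapoint every 15 seconds, kept for 1 month (assumed 30 days).
full_resolution = datapoints(retention_days=30, interval_seconds=15)

# Reduced resolution: one datapoint every 5 minutes, kept for 1 year (assumed 365 days).
reduced_resolution = datapoints(retention_days=365, interval_seconds=5 * 60)

print(full_resolution)     # 172800
print(reduced_resolution)  # 105120
```

Under these assumptions, a year of reduced-resolution data holds fewer datapoints per series than a month of full-resolution data.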