Managed Service for Kubernetes nodes are Compute virtual machines (VMs). Diagnostic logs from Managed Kubernetes nodes help you troubleshoot issues with VM operations, networking, and workloads. The procedure for collecting diagnostic logs depends on the GPU and access settings you configured when you created the node group. We strongly recommend collecting logs while the issue is still occurring, because they capture more information about the broken state than logs collected after the issue has been resolved. Determine which of the following cases applies to your environment, and follow the relevant procedure:
- Nodes that have one or more GPUs, without SSH configuration: connect to the cluster with kubectl to start a debug session.
- Nodes that have one or more GPUs, with SSH configuration: connect to the node with SSH to collect logs.
- Nodes without GPUs: contact our support team.
Types of logs
This guide describes how to collect the following types of logs for troubleshooting:
- GPU logs: nvidia-bug-report.sh.
- General system logs, including more context about system services and package versions: sos report.
- NVIDIA® Mellanox® adapter (InfiniBand™/NVSwitch/Ethernet) logs: sysinfo-snapshot.
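Before collecting logs, it can help to check which of these tools are present on the node. The following is a minimal sketch and not part of the official procedure. The check_tools function is a hypothetical helper, and it assumes the tools are on the PATH under the names used in this guide (sysinfo-snapshot in particular may be installed under a different name, such as sysinfo-snapshot.py).

```shell
# Report which of the log collection tools are available on this node.
# Assumption: the tools are on PATH under the names used in this guide.
check_tools() {
  for tool in nvidia-bug-report.sh sos sysinfo-snapshot; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: not found"
    fi
  done
}

check_tools
```

A tool reported as not found only means it is not on the PATH of the current shell. Inside a debug container, run the check after switching to the host filesystem.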
How to collect logs by using kubectl
If your nodes have GPUs and you have kubectl access to the cluster, but no SSH access to the nodes, do the following to collect the logs:
- Connect to the cluster with kubectl.
- Start a debugging session for the required node and open an interactive shell in the debug container. In the kubectl debug command, specify:
  - node_ID: The node to debug. To get the node IDs, list the nodes in the cluster with kubectl. Alternatively, in the web console, go to Compute → Virtual machines and click the copy button next to the node ID to copy it.
  - --image: Container image to use for the debug container. We recommend setting it to ubuntu to start a temporary debug container.
  - --profile: Set to sysadmin to use the built-in debugging profile. Refer to the Kubernetes documentation for more information.
  - -it: Starts an interactive terminal session in the debug container.
  - bash: Starts the Bash shell in the debug container.
- Switch to the host filesystem from inside the debug container.
- Generate GPU logs with nvidia-bug-report.sh. This command usually runs for about five minutes and generates nvidia-bug-report.log.gz in the current working directory. If the command stops responding, run it in safe mode.
- If you need more system information, generate general system logs with sos report. This command generates an archive in the following format: /tmp/sosreport-<node_ID>-<date>-<random_ID>.tar.gz.
- If you are troubleshooting Mellanox adapter issues, generate Mellanox adapter logs with sysinfo-snapshot. This command generates an archive in the following format: /tmp/sysinfo-snapshot-<node_ID>-<date>-<random_ID>.tgz.
- From your local shell, copy the generated log file(s) from the debug Pod. In the copy command, specify:
  - debug_Pod_name: The name of the temporary debug Pod created when you ran kubectl debug.
  - generated_file_path: The path to the generated log file on the node, for example, /tmp/sosreport-*.tar.gz.
  - local_file_name: The name to save the file as on your local machine, for example, /tmp/sosreport.tar.gz.
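As a rough sketch, the kubectl side of this procedure might look like the following. The variable names (NODE_ID, DEBUG_POD, REMOTE_PATH, LOCAL_FILE) and the run helper are hypothetical. The use of kubectl cp for the copy step is an assumption, and so is the /host prefix, which relies on kubectl debug mounting the node's root filesystem at /host in the debug container. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the kubectl-side commands. Prints each command; executes it
# only when DRY_RUN=0. All variable names below are illustrative.
DRY_RUN="${DRY_RUN:-1}"

run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

NODE_ID="${NODE_ID:-<node_ID>}"
DEBUG_POD="${DEBUG_POD:-<debug_Pod_name>}"
REMOTE_PATH="${REMOTE_PATH:-<generated_file_path>}"   # for example, /tmp/...
LOCAL_FILE="${LOCAL_FILE:-<local_file_name>}"

# List the nodes in the cluster to find the node ID.
run kubectl get nodes

# Start an interactive debug container on the node.
run kubectl debug "node/${NODE_ID}" -it --image=ubuntu --profile=sysadmin -- bash

# From a second local terminal: copy one generated file from the debug Pod.
# Assumption: the node's root filesystem is mounted at /host in the container.
# kubectl cp does not expand wildcards, so pass the exact file name.
run kubectl cp "${DEBUG_POD}:/host${REMOTE_PATH}" "${LOCAL_FILE}"
```

Treat this as a command reference rather than a script to run end to end: the debug session is interactive, and the copy step has to run from a second terminal while the debug Pod is still running.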
How to collect logs by using SSH
If your nodes have GPUs and you have configured SSH access, do the following to collect the logs:
- Connect to the node over SSH. Nodes are Compute VMs, so you connect to them the same way you would connect to any other VM over SSH.
- Generate the logs as described in How to collect logs.
- Retrieve the generated log files as described in How to get generated log files.
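The log generation commands are the same whether you reach the node over SSH or through a debug container. A minimal sketch of the node-side commands is shown below. The run helper is hypothetical, the --batch option for sos report is an assumption about running it without interactive prompts, and the sysinfo-snapshot command name is taken from this guide (the tool may be installed under a different name on your nodes). By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the node-side log generation commands. Prints each command;
# executes it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Only inside a kubectl debug container (assumption: host root is at /host):
#   chroot /host

# GPU logs: writes nvidia-bug-report.log.gz to the current working directory.
run nvidia-bug-report.sh

# Fallback if the command above stops responding:
#   nvidia-bug-report.sh --safe-mode

# General system logs (assumption: --batch to skip interactive prompts).
run sos report --batch

# Mellanox adapter logs (assumption: command name as used in this guide).
run sysinfo-snapshot
```

These commands typically need root privileges, so over SSH you may have to prefix them with sudo.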