
How to enable health checks

Health checks are enabled by default in Managed Service for Kubernetes clusters created on December 1, 2025, or later. If you created your cluster before this date and want to enable health checks, contact technical support. Alternatively, create a new cluster and move your workloads to it.

How to disable health checks

The command for disabling a health check depends on the type of issue that the check detects.

How to stop processing I/O issues of a Network SSD NRD boot disk

To prevent Managed Kubernetes from deleting a node when its Network SSD Non-replicated boot disk reports input/output (I/O) issues, disable the corresponding health check:
nebius mk8s node-group update --id <node_group_ID> \
   --auto-repair-conditions '[{"type":"NebiusBootDiskIOError","status":"TRUE","disabled":true}]'

How to stop processing a False or Unknown status of a node

To prevent Managed Kubernetes from deleting a node when it reports a False or Unknown status, disable the health check for each status:
  1. Disable the health check for the Unknown status of the node:
    nebius mk8s node-group update --id <node_group_ID> \
       --auto-repair-conditions '[{"type":"NodeReady","status":"UNKNOWN","disabled":true}]'
    
  2. Disable the health check for the False status of the node:
    nebius mk8s node-group update --id <node_group_ID> \
       --auto-repair-conditions '[{"type":"NodeReady","status":"FALSE","disabled":true}]'
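
The two commands above differ only in the status value. Because --auto-repair-conditions takes a JSON array, it may be possible to pass both entries in a single call; this page does not confirm that, so treat it as an assumption. The sketch below only builds the array and prints the resulting command for review instead of running it. The build_conditions helper and the NODE_GROUP_ID variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: build a JSON array of disabled conditions
# for one condition type and one or more statuses.
build_conditions() {
  cond_type="$1"
  shift
  out=""
  for status in "$@"; do
    entry=$(printf '{"type":"%s","status":"%s","disabled":true}' "$cond_type" "$status")
    if [ -z "$out" ]; then
      out="$entry"
    else
      out="$out,$entry"
    fi
  done
  printf '[%s]' "$out"
}

# Assumption: NODE_GROUP_ID holds your node group ID; a placeholder is used otherwise.
NODE_GROUP_ID="${NODE_GROUP_ID:-<node_group_ID>}"
conditions=$(build_conditions NodeReady UNKNOWN FALSE)

# Print the command instead of executing it, so it can be reviewed first.
printf "nebius mk8s node-group update --id %s --auto-repair-conditions '%s'\n" \
  "$NODE_GROUP_ID" "$conditions"
```

If a combined array turns out not to be accepted, the two separate commands in the steps above remain the documented path.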
    

How to stop processing issues with GPUs on a node

To prevent Managed Kubernetes from cordoning, draining, and stopping a node when the service detects issues with GPUs on it, disable the corresponding health check:
nebius mk8s node-group update --id <node_group_ID> \
   --auto-repair-conditions '[{"type":"NebiusGPUError","status":"TRUE","disabled":true}]'
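
If several node groups need the same change, the command above can be wrapped in a loop. The following is a minimal sketch, not an official procedure: the NODE_GROUP_IDS and DRY_RUN variables and the disable_gpu_check function are hypothetical names, and the script defaults to a dry run that only prints each command.

```shell
#!/bin/sh
# Hypothetical space-separated list of node group IDs; replace with your own.
NODE_GROUP_IDS="${NODE_GROUP_IDS:-<node_group_ID_1> <node_group_ID_2>}"
# Dry run by default: set DRY_RUN=0 to actually call the CLI.
DRY_RUN="${DRY_RUN:-1}"
# Same condition payload as in the command above.
CONDITIONS='[{"type":"NebiusGPUError","status":"TRUE","disabled":true}]'

disable_gpu_check() {
  if [ "$DRY_RUN" = "1" ]; then
    # Only show what would be executed.
    echo "nebius mk8s node-group update --id $1 --auto-repair-conditions '$CONDITIONS'"
  else
    nebius mk8s node-group update --id "$1" --auto-repair-conditions "$CONDITIONS"
  fi
}

for id in $NODE_GROUP_IDS; do
  disable_gpu_check "$id"
done
```

Reviewing the printed commands before setting DRY_RUN=0 helps avoid disabling the check on the wrong node group.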