In Managed Service for Kubernetes, the cluster autoscaler integrates with the underlying infrastructure to monitor the node groups in your cluster and seamlessly add or remove nodes as needed. It makes scaling decisions based on the following principles:
  • If there are unschedulable pods in the cluster due to resource constraints, the cluster autoscaler adds new nodes to accommodate these pods.
  • If nodes in the cluster are underutilized, the cluster autoscaler removes these nodes to optimize resource usage and reduce costs.
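To see which pods are currently unschedulable, and therefore eligible to trigger a scale-up, you can inspect pending pods and scheduling events with standard kubectl commands (this check is generic Kubernetes, not specific to the managed autoscaler):

```shell
# List pods stuck in the Pending state across all namespaces
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# Show FailedScheduling events, which indicate pods the scheduler
# could not place due to resource constraints
kubectl get events --all-namespaces --field-selector reason=FailedScheduling
```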
If you have a GPU node group with autoscaling enabled, add a CPU node group with at least two nodes (or with autoscaling) to the cluster. This way, when there are no GPU workloads to run, the CoreDNS and Cilium networking add-ons can run on the CPU nodes, so the GPU node group can scale down and reduce your costs.
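To check that the system add-ons have landed on CPU nodes, you can list their pods with the node column shown. The label selectors below are common defaults for CoreDNS and Cilium and may differ in your cluster:

```shell
# Show which nodes the CoreDNS pods are scheduled on
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide

# Show which nodes the Cilium agent pods are scheduled on
kubectl -n kube-system get pods -l k8s-app=cilium -o wide
```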

Set up autoscaling for new node groups

You can set up autoscaling when creating a new node group:
  1. Under Size, toggle Enable autoscaling.
  2. Specify the Min. nodes and Max. nodes values for the group.
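If you prefer the CLI, a node group can also be created with autoscaling enabled in one step. The sketch below assumes that nebius mk8s node-group create accepts the same autoscaling flags as update, and the other flags shown are illustrative; check nebius mk8s node-group create --help for the exact parameters:

```shell
# Create a node group that scales between 1 and 3 nodes
# (autoscaling flag names assumed to match the update command)
nebius mk8s node-group create \
  --parent-id <cluster_ID> \
  --name cpu-workers \
  --autoscaling-min-node-count 1 \
  --autoscaling-max-node-count 3
```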

Set up autoscaling for existing node groups

You can manage autoscaling for existing node groups only by using the Nebius AI Cloud CLI. To enable autoscaling for an existing node group, add the following parameters to the nebius mk8s node-group update command:
nebius mk8s node-group update --id <node_group_ID> \
  --autoscaling-min-node-count <minimum_number_of_nodes> \
  --autoscaling-max-node-count <maximum_number_of_nodes>
For example, to set autoscaling from 2 to 4 nodes, add --autoscaling-min-node-count 2 --autoscaling-max-node-count 4 to the nebius mk8s node-group update command.
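After running the update, you can confirm that the new limits took effect by fetching the node group. This assumes the CLI provides a get subcommand for node groups; check nebius mk8s node-group --help if it is named differently:

```shell
# Inspect the node group and check its autoscaling settings
# (the `get` subcommand name is an assumption)
nebius mk8s node-group get --id <node_group_ID>
```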

Configure autoscaling parameters

You can configure autoscaling parameters for existing node groups only by using the Nebius AI Cloud CLI. To change the minimum and maximum numbers of nodes for autoscaling, add the following parameters to the nebius mk8s node-group update command:
nebius mk8s node-group update --id <node_group_ID> \
  --autoscaling-min-node-count <minimum_number_of_nodes> \
  --autoscaling-max-node-count <maximum_number_of_nodes>
For example, to set autoscaling from 2 to 4 nodes, add --autoscaling-min-node-count 2 --autoscaling-max-node-count 4 to the nebius mk8s node-group update command.
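Once the limits are in place, you can watch the autoscaler add or remove nodes as load changes. The first command is standard kubectl; the second relies on the Kubernetes Autoscaler's status ConfigMap convention, which may not be exposed in a managed cluster:

```shell
# Watch nodes joining and leaving the cluster in real time
kubectl get nodes --watch

# Inspect the autoscaler status ConfigMap, if the managed
# service exposes it (this is an assumption)
kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml
```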

Troubleshooting

More GPU nodes than required

  • Issue: When a Managed Kubernetes cluster has the NVIDIA GPU Operator and the NVIDIA Network Operator installed, and workloads on a GPU node group are run with autoscaling, the cluster autoscaler can create more nodes in the group than the workloads require.
  • Possible reason: A bug in Kubernetes Autoscaler that causes inconsistency in how nodes are considered ready or not ready for pods. For more information, see Excess multiGPU nodes when using GPU + network operators in the Kubernetes Autoscaler repository on GitHub.
  • Solution:
    1. Uninstall the NVIDIA operators.
    2. Create a GPU node group and migrate your workloads to it. A node group created this way uses the GPU-adapted boot disk image offered by Managed Kubernetes, which solves the issue because the NVIDIA operators are no longer required.

See also