
# Creating and modifying Managed Service for Kubernetes® node groups

Clusters in Managed Service for Kubernetes use Compute virtual machines as nodes to run applications. This guide explains how to create node groups in clusters, modify them and delete them.

To learn how to manage clusters outside of their node groups, see [How to create and modify Managed Service for Kubernetes® clusters](../clusters/manage).

## Prerequisites

<Tip>
  You do not need to complete any prerequisites if you create or modify node groups in the web console.
</Tip>

<Tabs>
  <Tab title="CLI">
    1. [Install](/cli/install) and [configure](/cli/configure) the Nebius AI Cloud CLI.

    2. [Create a cluster](../clusters/manage#how-to-create-clusters) and save its ID to an environment variable:

       ```bash theme={null}
       export K8S_CLUSTER_ID=$(nebius mk8s cluster get-by-name \
         --name <cluster_name> --format json | jq -r '.metadata.id')
       ```
  </Tab>

  <Tab title="Terraform">
    1. [Install and configure](/terraform-provider/quickstart) the Nebius AI Cloud provider for Terraform.

    2. [Create a cluster](../clusters/manage#how-to-create-clusters).
  </Tab>
</Tabs>

## How to create node groups

Node groups define the characteristics of the virtual machines (VMs) that run your workloads. Each node group includes identical nodes created with the same template.

You can create different types of node groups depending on your performance, cost and availability requirements. For example, you can choose high-performance GPUs for compute-intensive workloads or preemptible VMs to reduce costs for interruptible tasks.

### Regular node groups

<Tabs>
  <Tab title="Web console">
    1. In the sidebar, go to <Icon icon="https://mintcdn.com/nebius-ai-cloud/1Ha0sWR6e1mnIaHS/_assets/sidebar/compute.svg?fit=max&auto=format&n=1Ha0sWR6e1mnIaHS&q=85&s=b91340217b08a1456d88ae0347f281d1" width="16" height="16" data-path="_assets/sidebar/compute.svg" /> **Compute** → **Kubernetes**.

    2. Open the page of the cluster where you want to create a node group.

    3. Switch to the **Node groups** tab.

    4. Click <Icon icon="https://mintcdn.com/nebius-ai-cloud/1Ha0sWR6e1mnIaHS/_assets/plus.svg?fit=max&auto=format&n=1Ha0sWR6e1mnIaHS&q=85&s=7c9efc69d65fc58db0eb73702fd81aa1" width="16" height="16" data-path="_assets/plus.svg" /> **Create node group**.

    5. On the page that opens, specify a name for the node group (for example, `mk8s-node-group-test`).

    6. (Optional) Enable the **Assign public IPv4 addresses** option if you want the nodes to be accessible from the internet.

    7. Under **Size**, specify the initial **Number of nodes**. If you want to let the node group scale up or down depending on the workload, enable autoscaling. After that, specify the minimum and maximum number of nodes that the group can have.

    8. Configure the **Computing resources** section:

       1. Select whether the node group should have GPUs.

       2. Select a regular VM type.

          VMs without GPUs only support the regular type.

          For information about creating preemptible node groups, see [instructions below](#preemptible-node-groups).

       3. (Optional) For a regular VM with GPUs, select **Reservation usage**. Specify whether Managed Kubernetes should allocate resources for the node group from [reservations](./reservations).

          The **Reservation usage** field is only displayed if you have [capacity block groups](/overview/limits/capacity-block-groups).

              <Accordion title="More information about reservation usage">
                * **With reservations**: The resources are allocated from reservations ([capacity block groups](/overview/limits/capacity-block-groups)). For example, if a Nebius manager has created a capacity block group for you, Managed Kubernetes allocates GPUs for the node group from this capacity block group. This ensures that resources are always available, even if VMs in the node group are stopped (for example, by you or a [maintenance event](../maintenance/index)).

                  You can use one of the following reservation types:

                  * **Any** (default): You do not need to select reservations. The service uses the reservations that are most suitable for the configuration of your VM.
                  * **Specific**: Select specific reservations. Make sure to select reservations that have enough capacity and that are not due to expire within the next few days.

                  If there are no reservations available during the VM lifecycle, you can run your VM without reservations. Resources for it will be taken from the common pool. To configure this behavior, enable the **Start without a reservation when reservation capacity is exhausted** option.

                * **Without reservations**: The resources are allocated from a common pool, and no reservations are used for the node group.
              </Accordion>

       4. Select an available [platform and a preset](/compute/virtual-machines/types) (a combination of GPUs, vCPUs and RAM) that fits your workload requirements.

       5. (Optional) If you create a node group with 8 GPUs (for example, for training models), use a GPU cluster for the node group. InfiniBand™ in the cluster allows you to accelerate tasks that require high-performance computing (HPC) power. A single node group without InfiniBand cannot perform these tasks as quickly.

          To use a GPU cluster, select an existing one or create a new cluster:

          1. Click <Icon icon="https://mintcdn.com/nebius-ai-cloud/1Ha0sWR6e1mnIaHS/_assets/plus.svg?fit=max&auto=format&n=1Ha0sWR6e1mnIaHS&q=85&s=7c9efc69d65fc58db0eb73702fd81aa1" width="16" height="16" data-path="_assets/plus.svg" /> **Create** in the **GPU cluster** field.
          2. In the window that opens, specify the cluster name and InfiniBand fabric. To select the fabric, see [InfiniBand fabrics](/compute/clusters/gpu/index#infiniband-fabrics).
          3. Click **Create**.

       6. (Optional) Enable or disable **GPU settings**. They are enabled by default, and they allow Managed Kubernetes to pre-install NVIDIA drivers and the [Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/index.html). You can also select a specific NVIDIA CUDA driver version.

          Disable **GPU settings** only if you need to [install specific driver versions manually](../gpu/set-up#how-to-install-the-drivers-and-components-on-existing-node-groups) or use a custom operator. Disabling is not recommended.

       7. Select an operating system for the nodes (for example, `Ubuntu 24.04 LTS`).

    9. Under **Node storage**, select the disk type and specify the size in <Tooltip tip={<>Nebius uses binary units. For example, a <i>gibibyte</i> (GiB) is 2<sup>30</sup> (1024<sup>3</sup>) bytes.</>}>GiB</Tooltip>. Supported [disk types](/compute/storage/types#disk-types) are the following:

       * **SSD**: Standard solid-state drive for general-purpose workloads.
       * **SSD NRD**: Network-replicated SSD providing higher reliability through data duplication across the network.
       * **SSD IO**: High-performance SSD optimized for I/O-intensive operations with lower latency.

    10. (Optional) If you want to attach a filesystem to your node group, in the **Shared filesystems** section, specify the settings of this filesystem:

        1. Click <Icon icon="https://mintcdn.com/nebius-ai-cloud/1Ha0sWR6e1mnIaHS/_assets/plus.svg?fit=max&auto=format&n=1Ha0sWR6e1mnIaHS&q=85&s=7c9efc69d65fc58db0eb73702fd81aa1" width="16" height="16" data-path="_assets/plus.svg" /> **Attach shared filesystem**.

        2. In the window that opens, select an existing filesystem or create a new one.

        3. If you create a new filesystem, specify its name, size and the block size.

        4. Click **Attach filesystem** or **Create and attach filesystem**.

        5. After the window is closed, specify a mount tag for mounting the filesystem to the VM.

           Create your own tag, such as `my-filesystem`. Make sure that it is unique within the VM.

        6. To mount the filesystem to the node group automatically, keep the **Auto mount** option enabled.

    11. (Optional) In the **Username and SSH key** field, add credentials, so you can [connect to the node group](/compute/virtual-machines/connect):

        1. Generate an [SSH key pair](/compute/virtual-machines/ssh-keys).

        2. In the **Username and SSH key** field, click <Icon icon="https://mintcdn.com/nebius-ai-cloud/1Ha0sWR6e1mnIaHS/_assets/chevron-down.svg?fit=max&auto=format&n=1Ha0sWR6e1mnIaHS&q=85&s=e3ec12ce62b3c2e03427533b780997f0" width="16" height="16" data-path="_assets/chevron-down.svg" />.

        3. If you added an SSH key earlier and you want to reuse it, select the key from the drop-down list.

           If you want to add a new key, click <Icon icon="https://mintcdn.com/nebius-ai-cloud/1Ha0sWR6e1mnIaHS/_assets/plus.svg?fit=max&auto=format&n=1Ha0sWR6e1mnIaHS&q=85&s=7c9efc69d65fc58db0eb73702fd81aa1" width="16" height="16" data-path="_assets/plus.svg" /> **Add credentials**.

        4. In the window that opens, specify the username of the node group user, a public key of your SSH key pair and the credentials name to recognize the key in the list.

        5. Click **Add credentials**.

    12. (Optional) Under **Additional**, select or create a [service account](/iam/overview) that will perform actions on behalf of the nodes.

    13. Click **Create node group**.
  </Tab>

  <Tab title="CLI">
    Create a node group:

    ```bash theme={null}
    nebius mk8s node-group create \
      --parent-id $K8S_CLUSTER_ID \
      --name <node_group_name> \
      --fixed-node-count <number_of_nodes> \
      --template-resources-platform <platform_ID> \
      --template-resources-preset <preset_name> \
      --template-gpu-settings-drivers-preset <driver_preset>
    ```

    For a description of the node group parameters, see [Node group parameters](#node-group-parameters).

    <Note>
      If you need to modify the NVIDIA device plug-in (for example, to enable multi-instance GPU), don't add the `--template-gpu-settings-drivers-preset` parameter to the command. Instead, [manually install the GPU operator](../gpu/set-up#how-to-install-the-drivers-and-components-on-existing-node-groups).
    </Note>

    For more details about GPUs in node groups, see [Working with GPUs in the Managed Service for Kubernetes®](../gpu/set-up) and [Interconnecting GPUs in Managed Service for Kubernetes® clusters using InfiniBand™](../gpu/clusters).
  </Tab>

  <Tab title="Terraform">
    1. Create a node group configuration file:

       ```hcl theme={null}
       resource "nebius_mk8s_v1_node_group" "<node_group_name>" {
         name = "<node_group_name>"
         parent_id = "<cluster_ID>"
         fixed_node_count = <number_of_nodes>

         template = {
           resources = {
             platform = "<platform_ID>"
             preset = "<preset_name>"
           }

           gpu_settings = {
             drivers_preset = "<driver_preset>"
           }
         }
       }
       ```

       For a description of the node group parameters, see [Node group parameters](#node-group-parameters).

    2. Check that the configuration is correct:
       ```bash theme={null}
       terraform validate
       ```

    3. Apply the changes:
       ```bash theme={null}
       terraform apply
       ```
  </Tab>
</Tabs>

### Preemptible node groups

Preemptible nodes use virtual machines that Nebius AI Cloud can stop at any time. These VMs are more cost-efficient than regular ones and suit workloads that tolerate interruptions, such as batch processing or training ML models.

For more information about how preemptible VMs work, see [Preemptible virtual machines](../../compute/virtual-machines/preemptible).

<Tabs group="interfaces">
  <Tab title="Web console">
    1. In the sidebar, go to <Icon icon="https://mintcdn.com/nebius-ai-cloud/1Ha0sWR6e1mnIaHS/_assets/sidebar/compute.svg?fit=max&auto=format&n=1Ha0sWR6e1mnIaHS&q=85&s=b91340217b08a1456d88ae0347f281d1" width="16" height="16" data-path="_assets/sidebar/compute.svg" /> **Compute** → **Kubernetes**.
    2. [Create a cluster](../clusters/manage#how-to-create-clusters) or choose an existing one.
    3. On the cluster page, switch to the **Node groups** tab.
    4. Click <Icon icon="https://mintcdn.com/nebius-ai-cloud/1Ha0sWR6e1mnIaHS/_assets/plus.svg?fit=max&auto=format&n=1Ha0sWR6e1mnIaHS&q=85&s=7c9efc69d65fc58db0eb73702fd81aa1" width="16" height="16" data-path="_assets/plus.svg" /> **Create node group**.
    5. When creating a node group, under **Computing resources**, select:

       * **With GPU**
       * **Preemptible** VM type

    For information about other node group parameters, see [instructions about creating regular node groups](#regular-node-groups).
  </Tab>

  <Tab title="CLI">
    Run the Nebius AI Cloud CLI command [nebius mk8s node-group create](/cli/reference/mk8s/node-group/create) with the `--template-preemptible` parameter:

    ```bash theme={null}
    nebius mk8s node-group create \
      ... \
      --template-preemptible
    ```
  </Tab>

  <Tab title="Terraform">
    Create a node group configuration file and set the `.template.preemptible` block to enable preemptibility:

    ```hcl theme={null}
    resource "nebius_mk8s_v1_node_group" "example" {
      name     = "preemptible-ng"
      ...

      template = {
        preemptible = {}
        ...
      }
    }
    ```
  </Tab>
</Tabs>

## How to modify node groups

Modifying the node group template (the GPU cluster, GPU settings and boot disk) triggers a [rolling update](#deployment-strategy-and-quotas). Managed Kubernetes replaces each node with a new one that has the new configuration.

If you modify other parameters, Managed Kubernetes does not replace the nodes; they remain unchanged.

During the node group update, by default, no node is unavailable, and the group size can increase by one node. This is based on the default values of the deployment strategy [parameters](#node-group-parameters): `--strategy-max-unavailable-count 0` and `--strategy-max-surge-count 1`. You can change them when you modify a node group by using the CLI.
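
As a rough sketch of changing these values from the CLI, the wrapper below first rejects the one combination that the parameter reference rules out (both limits set to 0) and only then calls `nebius mk8s node-group update`. The function names `strategy_is_valid` and `update_strategy` are hypothetical helpers, not part of the CLI, and `$K8S_NODE_GROUP_ID` is assumed to be set as described in the CLI tab below.

```shell
# Hypothetical helper: succeeds unless both limits are 0,
# the combination that the strategy parameters do not allow.
strategy_is_valid() {
  max_unavailable=$1
  max_surge=$2
  [ "$max_unavailable" -gt 0 ] || [ "$max_surge" -gt 0 ]
}

# Hypothetical wrapper: validates the counts, then updates the node group.
# Assumes K8S_NODE_GROUP_ID is already exported.
update_strategy() {
  if ! strategy_is_valid "$1" "$2"; then
    echo "max unavailable and max surge cannot both be 0" >&2
    return 1
  fi
  nebius mk8s node-group update \
    --id "$K8S_NODE_GROUP_ID" \
    --strategy-max-unavailable-count "$1" \
    --strategy-max-surge-count "$2"
}

# Example: keep every node available and allow two extra nodes during updates.
# update_strategy 0 2
```

The example call is commented out so that defining the helpers does not change any node group by itself.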

<Tabs>
  <Tab title="Web console">
    To modify a node group:

    1. In the sidebar, go to <Icon icon="https://mintcdn.com/nebius-ai-cloud/1Ha0sWR6e1mnIaHS/_assets/sidebar/compute.svg?fit=max&auto=format&n=1Ha0sWR6e1mnIaHS&q=85&s=b91340217b08a1456d88ae0347f281d1" width="16" height="16" data-path="_assets/sidebar/compute.svg" /> **Compute** → **Kubernetes**.

    2. Open the page of the required cluster and then go to the **Node groups** tab.

    3. Open the page of the node group that you wish to change.

    4. Switch to the **Settings** tab and then modify the required parameters.

       Parameters available for editing:

       * **Name**: Name of the node group.

       * **Size**:

         * **Number of nodes**: Target and fixed number of nodes (if autoscaling is disabled). The maximum number is 100.
         * **Enable autoscaling**: Allows you to set the range of nodes within which the [cluster autoscaler](./autoscaling) adds or removes nodes as needed.

       * **Computing resources**: Select whether the node group should have GPUs, and then specify the hardware configuration:

         * **VM type**:

           * **Regular**: Standard VMs for high-availability production workloads.
           * **Preemptible**: Lower-cost VMs that may be terminated by the platform at any time.

         * **Available platform** and **Preset**: Combination of GPUs, vCPUs and RAM that fits your workload requirements. For more information, see [Types of virtual machines and GPUs in Nebius AI Cloud](../../compute/virtual-machines/types).

         * **GPU cluster**: GPU cluster with InfiniBand. Allows you to accelerate tasks that require HPC power.

           Available only if the node group contains 8 GPUs.

         * **GPU settings**: If enabled, the system pre-installs NVIDIA drivers and the [Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/index.html). You can also select a specific NVIDIA CUDA driver version.

           Disable **GPU settings** only if you need to [install specific driver versions manually](../gpu/set-up#how-to-install-the-drivers-and-components-on-existing-node-groups) or use a custom operator.

         * **Drivers**: CUDA driver version based on enabled **GPU settings**.

         * **Operating system**: OS for the nodes, for example, `Ubuntu 24.04 LTS`.

       * **Node storage**:

         * **Disk type**: [Type of the boot disk](../../compute/storage/types#disk-types).
         * **Size**: Size of the boot disk in GiB.

    5. Click **Save changes**.

    The status of the node group changes to **Updating** while the new configuration is being applied.
  </Tab>

  <Tab title="CLI">
    1. Get the node group ID and save it to an environment variable:

       ```bash theme={null}
       export K8S_NODE_GROUP_ID=$(nebius mk8s node-group get-by-name \
         --parent-id $K8S_CLUSTER_ID \
         --name <node_group_name> --format json | jq -r '.metadata.id')
       ```

    2. Update the node group:

       ```bash theme={null}
       nebius mk8s node-group update \
         --id $K8S_NODE_GROUP_ID \
         <parameters>
       ```

       By default, only the parameters that you specify in the command are changed. To do a full update instead, add `--full` to the command: all parameters are then set to the values specified in the command or, if omitted, to their default values. For more information, see [Specifying parameters](#specifying-parameters).
  </Tab>

  <Tab title="Terraform">
    1. In the node group configuration file, update the [parameters](#node-group-parameters) of the `nebius_mk8s_v1_node_group` resource.
    2. Check that the configuration is correct:
       ```bash theme={null}
       terraform validate
       ```
    3. Apply the changes:
       ```bash theme={null}
       terraform apply
       ```
  </Tab>
</Tabs>

## Deployment strategy and quotas

When you modify a node group's Kubernetes version or node template, Managed Kubernetes performs a *rolling update*. For each node in the group, it does the following:

1. Creates a replacement node.
2. Cordons the existing node (marks it as unschedulable).
3. Drains the existing node (evicts all pods from it).
4. Deletes the existing node.

Managed Kubernetes uses the node group's *deployment strategy* to determine how, in what order and to how many nodes at a time it performs the listed steps. You can configure the deployment strategy using the corresponding [parameters](#node-group-parameters).
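
When `--strategy-max-unavailable-percent` is used, the resulting number of nodes is rounded down (see the parameter descriptions below). The arithmetic can be sketched in shell; `nodes_from_percent` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: converts a percentage of the target size
# into a whole number of nodes, rounding down.
nodes_from_percent() {
  target_size=$1
  percent=$2
  # Shell integer division truncates, which equals rounding down
  # for non-negative values.
  echo $(( target_size * percent / 100 ))
}

nodes_from_percent 3 40    # prints 1
nodes_from_percent 10 25   # prints 2
```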

To modify a node group, make sure that your [quotas on underlying Compute resources](../../compute/resources/quotas-limits) allow for at least one additional node that can be used for a rolling update. If there is no quota available for any of the required resources, the update fails. You can check your remaining quotas on the [Administration → Limits → Quotas](https://console.nebius.com/quota) page of the web console.

> For example, suppose that each node uses 8 GPUs, 128 vCPUs, 1600 GiB RAM and a public IP address. You have 3 nodes in the node group and a deployment strategy with `--strategy-max-surge-count 2`. During the update, you need quotas for the following additional resources:
>
> * 16 GPUs (2 × 8)
> * 256 vCPUs (2 × 128)
> * 3200 GiB RAM (2 × 1600)
> * 2 public IP addresses

If your quotas allow for only one extra node, the update is still performed using the default `--strategy-max-surge-count 1`. In this case, nodes are updated one by one: while one node is being replaced, update attempts for the others may temporarily fail but will eventually complete.
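
The extra quota needed during an update scales with the surge count. A minimal sketch of the arithmetic from the example above follows; the function name `surge_quota` and its argument order are illustrative assumptions.

```shell
# Hypothetical helper: prints the additional quota needed while
# surge nodes exist alongside the original ones.
# Arguments: surge count, then per-node GPUs, vCPUs, RAM (GiB) and public IPs.
surge_quota() {
  surge=$1
  echo "GPUs: $(( surge * $2 ))"
  echo "vCPUs: $(( surge * $3 ))"
  echo "RAM GiB: $(( surge * $4 ))"
  echo "Public IPs: $(( surge * $5 ))"
}

# Values from the example above: 2 surge nodes, each with
# 8 GPUs, 128 vCPUs, 1600 GiB RAM and 1 public IP address.
surge_quota 2 8 128 1600 1
```

Compare the printed values with your remaining quotas before you start the update.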

When you or the [autoscaler](./autoscaling) scales a node group up or down, Managed Kubernetes does not recreate any nodes.

## Node group parameters

<Tabs>
  <Tab title="CLI">
    The `nebius mk8s node-group create` and `nebius mk8s node-group update` commands support the following parameters:

    * **Metadata**

      * `--name`: Node group name. Must be unique within the tenant. Cannot be changed after creation.

    * **Kubernetes version on nodes**

      * `--version`: Kubernetes version in `<major>.<minor>` format. Recommended version is 1.33. For more information, see [Kubernetes versions in Managed Service for Kubernetes](../versions).

    * **Node group size**

      * `--fixed-node-count`: Number of nodes per group. The maximum is 100.
      * `--autoscaling-min-node-count`, `--autoscaling-max-node-count`: Allow you to set the range of nodes within which the [cluster autoscaler](./autoscaling) adds or removes nodes as needed.

    * **Node template**

      All nodes in a group are identical and are created based on a *node template*. A node template is similar to a virtual machine specification in Compute.

      The node template has the following parameters:

      * `--template-taints`: Array of Kubernetes [taints](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) (rules that repel pods from nodes) for all nodes in the group.

      * `--template-resources-platform`: A platform with GPUs, see [Interconnecting GPUs in Managed Service for Kubernetes® clusters using InfiniBand™](../gpu/clusters).

      * `--template-resources-preset`: A compatible preset (number of GPUs and vCPUs, RAM size), see [Types of virtual machines and GPUs in Nebius AI Cloud](/compute/virtual-machines/types).

      * `--template-gpu-settings-drivers-preset`: GPU drivers preset, see [GPU drivers and other components](../gpu/set-up#gpu-drivers-and-other-components).

      * `--template-gpu-cluster-id`: GPU cluster ID.

      * `--template-service-account-id`: Service account ID. You can add a service account, for example, to [pull images from Container Registry](../workloads/images-container-registry).

      * `--template-network-interfaces`: Network interface configuration (for example, subnet ID, see [How to use a non-default subnet for Managed Service for Kubernetes® clusters and node groups](../networking/non-default-subnet)).

      * `--template-filesystems`: Filesystem for nodes, see [How to attach volumes to VMs](/compute/storage/use#how-to-attach-volumes-to-vms).

            <Warning>
              The filesystem that you are adding to a node group must be located in the same project as the node group's parent cluster. For more details about projects and resource hierarchy in Nebius AI Cloud, see [How resources, identities and access are managed in Nebius AI Cloud](/iam/overview).
            </Warning>

      * `--template-reservation-policy-policy`: Policy for reservation usage. You can use [reservations of capacity resources](./reservations) and run your node group based on them. As a result, the node group resources are reserved and always available.

      * `--template-reservation-policy-reservation-ids`: IDs of specific reservations. These are capacity block groups that a Nebius manager has created.

        For information about how to configure `--template-reservation-policy-policy` and `--template-reservation-policy-reservation-ids`, see [How to add reservations to node groups](./reservations#how-to-add-reservations-to-node-groups).

    * **Deployment strategy**

      The *deployment strategy* of a node group defines how it is updated when necessary — for example, when you modify the group's node template or Kubernetes version, or when nodes fail and need to be replaced. For more details, see [Deployment strategy and quotas](#deployment-strategy-and-quotas).

      The following parameters specify the deployment strategy:

      * `--strategy-max-unavailable-percent`, `--strategy-max-unavailable-count`: The maximum number of nodes in a group that can be unavailable at any time during an update, set as a percentage of the group's target size or a number of nodes. When a percentage is used, the number of nodes is calculated by rounding down.

        > For example, if the value of `--strategy-max-unavailable-percent` is 40 and the group's target size is 3, at most ⌊3 × 40%⌋ = ⌊1.2⌋ = 1 node can be unavailable at any time during the update. As a result, nodes are replaced one at a time; a running node is not stopped or deleted until the previous one has been replaced by a new running node.

        The default value is 0. Cannot be set to 0 if `--strategy-max-surge-count` or `--strategy-max-surge-percent` is 0.

      * `--strategy-max-surge-percent`, `--strategy-max-surge-count`: The maximum number of nodes in a group that can exceed the group's target size at any time during an update, set as a percentage of the target size or as a number of nodes.

        > For example, if the value of `--strategy-max-surge-count` is 2 and the group's target size is 3, then the group can have up to 3 + 2 = 5 nodes at any time during the update.

        The default value is `--strategy-max-surge-count 1`. Cannot be set to 0 if `--strategy-max-unavailable-count` or `--strategy-max-unavailable-percent` is 0.

      * `--strategy-drain-timeout`: The maximum amount of time it can take to drain a node during the update. If the timeout is set, a node in the updated group is deleted when it reaches the timeout, even if its draining is not complete.

            <Warning>
              The timeout is not set by default and nodes are deleted only after the draining is complete.
            </Warning>
  </Tab>

  <Tab title="Terraform">
    The `nebius_mk8s_v1_node_group` resource supports the following parameters:

    * **Metadata**

      * `parent_id`: Cluster ID.
      * `name`: Node group name. Must be unique within the tenant. Cannot be changed after creation.

    * **Kubernetes version on nodes**

      * `version`: Kubernetes version in `<major>.<minor>` format. Recommended version is 1.33. For more information, see [Kubernetes versions in Managed Service for Kubernetes](../versions).

    * **Node group size**

      * `fixed_node_count`: Number of nodes per group. The maximum is 100. Cannot be set together with `autoscaling`.
      * `autoscaling.min_node_count`, `autoscaling.max_node_count`: Allow you to set the range of nodes within which the [cluster autoscaler](./autoscaling) adds or removes nodes as needed. Cannot be set together with `fixed_node_count`.

    * **Node template**

      All nodes in a group are identical and are created based on a *node template*. A node template is similar to a virtual machine specification in Compute.

      The node template is configured in the `template` block and supports the following parameters:

      * `template.taints`: Array of Kubernetes [taints](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) (rules that repel pods from nodes) for all nodes in the group.

      * `template.resources.platform`: A platform with GPUs, see [Interconnecting GPUs in Managed Service for Kubernetes® clusters using InfiniBand™](../gpu/clusters).

      * `template.resources.preset`: A compatible preset (number of GPUs and vCPUs, RAM size), see [Types of virtual machines and GPUs in Nebius AI Cloud](/compute/virtual-machines/types).

      * `template.gpu_settings.drivers_preset`: GPU drivers preset. For more information, see [GPU drivers and other components](../gpu/set-up#gpu-drivers-and-other-components).

      * `template.gpu_cluster.id`: GPU cluster ID.

      * `template.service_account_id`: Service account ID. You can add a service account, for example, to [pull images from Container Registry](../workloads/images-container-registry).

      * `template.network_interfaces`: Network interface configuration (for example, subnet ID, see [How to use a non-default subnet for Managed Service for Kubernetes® clusters and node groups](../networking/non-default-subnet)).

      * `template.filesystems`: Filesystem for nodes, see [How to attach volumes to VMs](/compute/storage/use#how-to-attach-volumes-to-vms).

            <Warning>
              The filesystem that you are adding to a node group must be located in the same project as the node group's parent cluster. For more details about projects and resource hierarchy in Nebius AI Cloud, see [How resources, identities and access are managed in Nebius AI Cloud](/iam/overview).
            </Warning>

      * `template.reservation_policy.policy`: Policy for reservation usage. You can use [reservations of capacity resources](./reservations) and run your node group based on them. As a result, the node group resources are reserved and always available.

      * `template.reservation_policy.reservation_ids`: IDs of specific reservations. These are capacity block groups that a Nebius manager has created.

        For information about how to configure `template.reservation_policy.policy` and `template.reservation_policy.reservation_ids`, see [How to add reservations to node groups](./reservations#how-to-add-reservations-to-node-groups).

    * **Deployment strategy**

      The *deployment strategy* of a node group defines how it is updated when necessary — for example, when you modify the group's node template or Kubernetes version, or when nodes fail and need to be replaced. For more details, see [Deployment strategy and quotas](#deployment-strategy-and-quotas).

      The deployment strategy is configured in the `strategy` block:

      * `strategy.max_unavailable.percent`, `strategy.max_unavailable.count`: The maximum number of nodes in a group that can be unavailable at any time during an update, set as a percentage of the group's target size or a number of nodes. When a percentage is used, the number of nodes is calculated by rounding down.

        > For example, if `strategy.max_unavailable.percent = 40` and the group's target size is 3, at most ⌊3 × 40%⌋ = ⌊1.2⌋ = 1 node can be unavailable at any time during the update. As a result, nodes are replaced one at a time; a running node is not stopped or deleted until the previous one has been replaced by a new running node.

        The default value is 0. Cannot be set to 0 if `strategy.max_surge.count` or `strategy.max_surge.percent` is set to 0.

      * `strategy.max_surge.percent`, `strategy.max_surge.count`: The maximum number of nodes in a group that can exceed the group's target size at any time during an update, set as a percentage of the target size or as a number of nodes.

        > For example, if `strategy.max_surge.count = 2` and the group's target size is 3, then the group can have up to 3 + 2 = 5 nodes at any time during the update.

        The default value is `strategy.max_surge.count = 1`. Cannot be set to 0 if `strategy.max_unavailable.count` or `strategy.max_unavailable.percent` is set to 0.

      * `strategy.drain_timeout`: The maximum amount of time it can take to drain a node during the update. If the timeout is set, a node in the updated group is deleted when it reaches the timeout, even if its draining is not complete.

            <Warning>
              The timeout is not set by default and nodes are deleted only after the draining is complete.
            </Warning>
  </Tab>
</Tabs>

## How to delete node groups

<Tabs>
  <Tab title="Web console">
    1. In the sidebar, go to <Icon icon="https://mintcdn.com/nebius-ai-cloud/1Ha0sWR6e1mnIaHS/_assets/sidebar/compute.svg?fit=max&auto=format&n=1Ha0sWR6e1mnIaHS&q=85&s=b91340217b08a1456d88ae0347f281d1" width="16" height="16" data-path="_assets/sidebar/compute.svg" /> **Compute** → **Kubernetes**.
    2. Open the cluster page and then go to the **Node groups** tab.
    3. Open the page of the node group that you want to remove.
    4. Switch to the **Settings** tab.
    5. Click **Delete node group**.
    6. Confirm the deletion.
  </Tab>

  <Tab title="CLI">
    To delete a node group, get its ID as shown in [How to modify node groups](#how-to-modify-node-groups) and run the following command:

    ```bash theme={null}
    nebius mk8s node-group delete --id $K8S_NODE_GROUP_ID
    ```
  </Tab>

  <Tab title="Terraform">
    1. Remove the corresponding `nebius_mk8s_v1_node_group` resource from the node group configuration file.
    2. Check that the configuration is correct:
       ```bash theme={null}
       terraform validate
       ```
    3. Apply the changes:
       ```bash theme={null}
       terraform apply
       ```
  </Tab>
</Tabs>

## Examples

<Tabs>
  <Tab title="CLI">
    * Creating a node group with two nodes, each with 8 NVIDIA H100 GPUs, 128 vCPUs, 1600 GiB of RAM, a 100 GiB Network SSD disk and the Kubernetes version 1.33:

      ```bash theme={null}
      export SUBNET_ID=$(nebius vpc subnet list --format json \
        | jq -r '.items[0].metadata.id')
      nebius mk8s node-group create \
        --parent-id $K8S_CLUSTER_ID \
        --name node-group-example \
        --version 1.33 \
        --fixed-node-count 2 \
        --template-resources-platform gpu-h100-sxm \
        --template-resources-preset 8gpu-128vcpu-1600gb \
        --template-gpu-settings-drivers-preset cuda12.8 \
        --template-boot-disk-type NETWORK_SSD \
        --template-boot-disk-size-gibibytes 100 \
        --template-network-interfaces "[{\"subnet_id\": \"$SUBNET_ID\"}]"
      ```

    * Modifying the node group from the previous example (ID `$K8S_NODE_GROUP_ID`) to add a node and enable public IP addresses for all nodes:

      ```bash theme={null}
      nebius mk8s node-group update \
        --id $K8S_NODE_GROUP_ID \
        --fixed-node-count 3 \
        --template-network-interfaces "[{\"subnet_id\": \"$SUBNET_ID\", \"public_ip_address\": {}}]"
      ```
  </Tab>

  <Tab title="Terraform">
    Creating a node group with two nodes, each with 8 NVIDIA H100 GPUs, 128 vCPUs, 1600 GiB of RAM, a 100 GiB Network SSD disk and the Kubernetes version 1.33:

    ```hcl theme={null}
    resource "nebius_mk8s_v1_node_group" "node-group-example" {
      name = "node-group-example"
      parent_id = "<cluster_ID>"
      version = "1.33"
      fixed_node_count = 2

      template = {
        resources = {
          platform = "gpu-h100-sxm"
          preset = "8gpu-128vcpu-1600gb"
        }

        gpu_settings = {
          drivers_preset = "cuda12.8"
        }

        boot_disk = {
          type = "NETWORK_SSD"
          size_gibibytes = 100
        }
      }
    }
    ```
  </Tab>
</Tabs>
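
The examples above use a fixed node count. For an autoscaled group, a sketch with the `--autoscaling-min-node-count` and `--autoscaling-max-node-count` parameters might look like the following. The group name `node-group-autoscaled` and the range of 1 to 4 nodes are placeholder assumptions, and `print_autoscaling_create` is a hypothetical helper that only prints the command so that you can review it before running it.

```shell
# Hypothetical helper: checks the range and prints a create command
# for an autoscaled node group instead of running it.
print_autoscaling_create() {
  min_nodes=$1
  max_nodes=$2
  if [ "$min_nodes" -gt "$max_nodes" ]; then
    echo "minimum node count cannot exceed the maximum" >&2
    return 1
  fi
  # The autoscaling range replaces --fixed-node-count (assumed, by analogy
  # with the Terraform parameters, to be mutually exclusive with it).
  cat <<EOF
nebius mk8s node-group create \\
  --parent-id \$K8S_CLUSTER_ID \\
  --name node-group-autoscaled \\
  --autoscaling-min-node-count $min_nodes \\
  --autoscaling-max-node-count $max_nodes \\
  --template-resources-platform gpu-h100-sxm \\
  --template-resources-preset 8gpu-128vcpu-1600gb
EOF
}

print_autoscaling_create 1 4
```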

***

*InfiniBand and InfiniBand Trade Association are registered trademarks of the InfiniBand Trade Association.*
