A NodeGroup represents a Kubernetes node pool: a set of worker machines sharing the same configuration. Each node is a Nebius Compute Instance created in the Cluster.metadata.parent_id container; it runs a kubelet that registers with the Kubernetes API, producing a Node object.
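A minimal node group with a fixed node count could be declared as in the following sketch. The resource type name `nebius_mk8s_v1_node_group`, the cluster reference and the preset name are assumptions; check your provider version for exact names.

```terraform
resource "nebius_mk8s_v1_node_group" "workers" {
  # parent_id is the ID of the Cluster this group belongs to (assumed reference)
  parent_id = nebius_mk8s_v1_cluster.main.id
  name      = "workers"

  # Mutually exclusive with `autoscaling`
  fixed_node_count = 3

  template = {
    resources = {
      platform = "cpu-e2"     # see the `os` field below for supported platforms
      preset   = "4vcpu-16gb" # preset name is an assumption
    }
  }
}
```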

Schema

Required

  • parent_id (String) Identifier of the parent resource to which the resource belongs.
  • template (Attributes) : Parameters for the Kubernetes Node object and the Nebius Compute Instance. Unless stated otherwise next to a NodeTemplate field, updating it triggers a NodeGroup roll-out according to the NodeGroupDeploymentStrategy. (see below for nested schema)

Optional

  • auto_repair (Attributes) Parameters for node auto-repair. (see below for nested schema)
  • autoscaling (Attributes) : Enables Kubernetes Cluster Autoscaler for that NodeGroup, and defines autoscaling parameters. Cannot be set alongside fixed_node_count. (see below for nested schema)
  • fixed_node_count (Number) : Number of nodes in the group. Can be changed manually at any time. Cannot be set alongside autoscaling.
  • labels (Map of String) : Labels associated with the resource.
  • metadata (Attributes) : Common resource metadata. The parent_id is the ID of the Cluster. (see below for nested schema)
  • name (String) Human-readable name for the resource.
  • strategy (Attributes) : Defines the deployment strategy: how nodes are re-created during a configuration change. Lets you trade off roll-out speed, extra resource consumption and workload disruption. (see below for nested schema)
  • version (String) : Desired Kubernetes version of the node group. For now the only accepted format is <major>.<minor>, e.g. “1.31”. An option for patch version updates will be added later. Defaults to the cluster control plane <major>.<minor> version.

Read-Only

  • created_at (String) : Timestamp indicating when the resource was created. A string representing a timestamp in ISO 8601 format: YYYY-MM-DDTHH:MM:SSZ or YYYY-MM-DDTHH:MM:SS.SSS±HH:MM
  • id (String) Identifier for the resource, unique for its resource type.
  • resource_version (Number) : Version of the resource for safe concurrent modifications and consistent reads. Positive; monotonically increases on each resource spec change (but not on each change of the resource’s container(s) or status). The service accepts either zero or the current value.
  • status (Attributes) (see below for nested schema)
  • updated_at (String) : Timestamp indicating when the resource was last updated. A string representing a timestamp in ISO 8601 format: YYYY-MM-DDTHH:MM:SSZ or YYYY-MM-DDTHH:MM:SS.SSS±HH:MM

Nested Schema for template

Required:
  • resources (Attributes) Compute resources of the Nebius Compute Instance on which the node’s kubelet will run. (see below for nested schema)
Optional:
  • boot_disk (Attributes) Parameters of a Node Nebius Compute Instance boot disk. (see below for nested schema)
  • cloud_init_user_data (String, Sensitive) : cloud-init user-data. Should contain at least one SSH key.
  • filesystems (Attributes List) : Static attachments of Compute Filesystem. Can be used as a workaround until CSI for Compute Disk and Filesystem becomes available. (see below for nested schema)
  • gpu_cluster (Attributes) Nebius Compute GPUCluster that the node will be attached to. (see below for nested schema)
  • gpu_settings (Attributes) : GPU-related settings. (see below for nested schema)
  • local_disks (Attributes) : local_disks enables the provisioning of fast local drives. This type of storage is strictly ephemeral: on node restart, all data is erased, similar to RAM. (see below for nested schema)
  • metadata (Attributes) (see below for nested schema)
  • network_interfaces (Attributes List) (see below for nested schema)
  • os (String) : OS version used to create the boot disk of Compute Instances in the NodeGroup. Supported platform / Kubernetes version / OS / driver preset combinations:
    • gpu-l40s-a, gpu-l40s-d, gpu-h100-sxm, gpu-h200-sxm, cpu-e1, cpu-e2, cpu-d3:
      • drivers_preset: ""
        • version: 1.30 → "ubuntu22.04"
        • version: 1.31 → "ubuntu22.04" (default), "ubuntu24.04"
    • gpu-l40s-a, gpu-l40s-d, gpu-h100-sxm, gpu-h200-sxm:
      • drivers_preset: "cuda12" (CUDA 12.4)
        • version: 1.30, 1.31 → "ubuntu22.04"
      • drivers_preset: "cuda12.4"
        • version: 1.31 → "ubuntu22.04"
      • drivers_preset: "cuda12.8"
        • version: 1.31 → "ubuntu24.04"
    • gpu-b200-sxm:
      • drivers_preset: ""
        • version: 1.30, 1.31 → "ubuntu24.04"
      • drivers_preset: "cuda12" (CUDA 12.8)
        • version: 1.30, 1.31 → "ubuntu24.04"
      • drivers_preset: "cuda12.8"
        • version: 1.31 → "ubuntu24.04"
    • gpu-b200-sxm-a:
      • drivers_preset: ""
        • version: 1.31 → "ubuntu24.04"
      • drivers_preset: "cuda12.8"
        • version: 1.31 → "ubuntu24.04"
  • preemptible (Attributes) : Configures whether the nodes in the group are preemptible. Set to an empty value to enable preemptible nodes. (see below for nested schema)
  • reservation_policy (Attributes) : Interface to the “capacity block” (or “capacity block group”) mechanism of Nebius Compute. ReservationPolicy is copied as-is from the Nebius API compute/v1/instance.proto. (see below for nested schema)
  • service_account_id (String) : The Nebius service account whose credentials will be available on the nodes of the group. With these credentials it is possible to make Nebius CLI or public API requests from the nodes without extra authentication. This service account is also used for requests to the container registry. The resource.serviceaccount.issueAccessToken permission is required to use this field.
  • taints (Attributes List) : Kubernetes Node taints. See https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/. For now, changes are not propagated to existing nodes and apply only to Kubernetes Nodes created after the field change; this behaviour may change later, so until then set taints on existing nodes manually if needed. A change to this field will NOT trigger a NodeGroup roll-out. (see below for nested schema)
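Several of the optional template fields above combine as in this sketch (the subnet ID and SSH key are placeholders, and attribute shapes are assumed from the schema):

```terraform
template = {
  metadata = {
    labels = { "workload" = "batch" } # Kubernetes Node labels
  }
  boot_disk = {
    type           = "NETWORK_SSD"
    size_gibibytes = 128
  }
  # cloud-init user-data; must contain at least one SSH key
  cloud_init_user_data = <<-EOT
    #cloud-config
    users:
      - name: admin
        ssh_authorized_keys:
          - ssh-ed25519 AAAA... user@host
  EOT
  network_interfaces = [
    {
      subnet_id         = "vpcsubnet-placeholder" # defaults to the control plane subnet
      public_ip_address = {}                      # empty value enables a public IPv4
    },
  ]
  preemptible = {} # empty value enables preemptible nodes
  resources = {
    platform = "cpu-e2"
    preset   = "4vcpu-16gb" # assumed preset name
  }
}
```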

Nested Schema for template.resources

Required:
  • platform (String)
Optional:
  • preset (String)

Nested Schema for template.boot_disk

Optional:
  • block_size_bytes (Number)
  • size_bytes (Number) Cannot be set alongside size_kibibytes, size_mebibytes or size_gibibytes.
  • size_gibibytes (Number) Cannot be set alongside size_bytes, size_kibibytes or size_mebibytes.
  • size_kibibytes (Number) Cannot be set alongside size_bytes, size_mebibytes or size_gibibytes.
  • size_mebibytes (Number) Cannot be set alongside size_bytes, size_kibibytes or size_gibibytes.
  • type (String) : Possible values:
    • UNSPECIFIED
    • NETWORK_SSD
    • NETWORK_HDD
    • NETWORK_SSD_IO_M3
    • NETWORK_SSD_NON_REPLICATED
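Exactly one of the size attributes may be set; for example (values illustrative):

```terraform
boot_disk = {
  type           = "NETWORK_SSD"
  size_gibibytes = 200 # set only one of size_bytes / size_kibibytes / size_mebibytes / size_gibibytes
}
```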

Nested Schema for template.filesystems

Required:
  • attach_mode (String) : Possible values:
    • UNSPECIFIED
    • READ_ONLY
    • READ_WRITE
  • mount_tag (String) User-defined identifier that can be used as the device in a mount command.
Optional:
  • existing_filesystem (Attributes) (see below for nested schema)

Nested Schema for template.filesystems.existing_filesystem

Required:
  • id (String)

Nested Schema for template.gpu_cluster

Optional:
  • id (String)

Nested Schema for template.gpu_settings

Required:
  • drivers_preset (String) : Identifier of the predefined set of drivers included in the ComputeImage deployed on ComputeInstances that are part of the NodeGroup. Supported presets for different platform / Kubernetes version combinations:
    • gpu-l40s-a, gpu-l40s-d, gpu-h100-sxm, gpu-h200-sxm:
      • version: 1.30 → "cuda12" (CUDA 12.4)
      • version: 1.31 → "cuda12" (CUDA 12.4), "cuda12.4", "cuda12.8"
    • gpu-b200-sxm:
      • version: 1.31 → "cuda12" (CUDA 12.8), "cuda12.8"
    • gpu-b200-sxm-a:
      • version: 1.31 → "cuda12.8"
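For a GPU node group, `resources.platform`, `gpu_settings.drivers_preset` and the cluster’s Kubernetes version must match one of the combinations above; a sketch for an H100 group on Kubernetes 1.31 (the preset name and GPU cluster reference are assumptions):

```terraform
template = {
  resources = {
    platform = "gpu-h100-sxm"
    preset   = "8gpu-128vcpu-1600gb" # assumed preset name
  }
  gpu_settings = {
    drivers_preset = "cuda12.4" # valid for 1.31 on this platform per the table above
  }
  gpu_cluster = {
    id = nebius_compute_v1_gpu_cluster.main.id # assumed resource reference
  }
}
```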

Nested Schema for template.local_disks

Required:
  • config (Attributes) : Defines actions that the managed Kubernetes service performs on mounted local disks to expose them inside the Kubernetes cluster with a convenient interface. (see below for nested schema)
Optional:
  • passthrough_group (Attributes) : Requests passthrough local disks from the host. Topology of the provided disks is preserved during stop and start for every instance of a specific platform and preset in the region. (see below for nested schema)

Nested Schema for template.local_disks.config

Optional:
  • none (Boolean) none: do nothing; local disks are provisioned as on a regular compute instance.

Nested Schema for template.local_disks.passthrough_group

Optional:
  • requested (Boolean) : Passthrough local disks from the underlying host. Devices are expected to appear in the guest as NVMe devices (nvme0, nvme1, …), but the exact number depends on the preset. Enabled only when this field is explicitly set.

Nested Schema for template.metadata

Optional:
  • labels (Map of String) : Kubernetes Node labels. Keys and values must follow Kubernetes label syntax: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/. For now, changes are not propagated to existing nodes and apply only to Kubernetes Nodes created after the field change; this behavior may change later, so until then set labels on existing nodes manually if needed. System labels containing “kubernetes.io” and “k8s.io” are ignored. A change to this field will NOT trigger a NodeGroup roll-out.

Nested Schema for template.network_interfaces

Optional:
  • public_ip_address (Attributes) : Parameters for the public IPv4 address associated with the interface. Set to an empty value to enable it. (see below for nested schema)
  • subnet_id (String) : Nebius VPC Subnet ID attached to the node’s cloud instance network interface. Defaults to the Cluster control plane subnet_id. The subnet must be located in the same network as the control plane.

Nested Schema for template.network_interfaces.public_ip_address

Nested Schema for template.preemptible

Nested Schema for template.reservation_policy

Optional:
  • policy (String) : Possible values:
    • AUTO:
      1. Tries to launch the instance in any of reservation_ids, if provided.
      2. Tries to launch the instance in any available capacity block.
      3. Falls back to on-demand (PAYG) if 1 and 2 are not satisfied.
    • FORBID: The instance is launched only using on-demand (PAYG) capacity. No attempt is made to find or use a capacity block. It is an error to provide reservation_ids with policy = FORBID.
    • STRICT:
      1. Tries to launch the instance in capacity blocks from reservation_ids, if provided.
      2. If reservation_ids is not provided, tries to launch the instance in a suitable and available capacity block.
      3. Fails otherwise.
  • reservation_ids (List of String) Capacity block groups; order matters.
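A STRICT policy pinned to specific capacity block groups might look like this (IDs are placeholders):

```terraform
reservation_policy = {
  policy          = "STRICT"
  reservation_ids = ["reservation-a", "reservation-b"] # tried in order
}
```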

Nested Schema for template.taints

Required:
  • effect (String) : Possible values:
    • EFFECT_UNSPECIFIED
    • NO_EXECUTE
    • NO_SCHEDULE
    • PREFER_NO_SCHEDULE
  • key (String)
  • value (String)
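For example, to reserve nodes for GPU workloads (only pods with a matching toleration will schedule on them):

```terraform
taints = [
  {
    key    = "dedicated"
    value  = "gpu"
    effect = "NO_SCHEDULE"
  },
]
```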

Nested Schema for auto_repair

Optional:
  • conditions (Attributes List) Conditions that determine whether a node should be auto repaired. (see below for nested schema)

Nested Schema for auto_repair.conditions

Required:
  • type (String) Node condition type.
Optional:
  • disabled (Boolean) : When true, disables the default auto-repair condition rules. Cannot be set alongside timeout.
  • status (String) : Node condition status. Possible values:
    • CONDITION_STATUS_UNSPECIFIED
    • TRUE
    • FALSE
    • UNKNOWN
  • timeout (String) : The duration after which the node is automatically repaired if the condition remains in the specified status. Duration as a string: possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as 300ms, -1.5h or 2h45m. Valid time units are ns, us (or µs), ms, s, m, h, d. Cannot be set alongside disabled.
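For example, to repair a node that stays NotReady for 15 minutes (the condition type and shape are assumed from the schema above):

```terraform
auto_repair = {
  conditions = [
    {
      type    = "Ready"
      status  = "FALSE"
      timeout = "15m" # repair if the Ready condition stays FALSE for 15 minutes
    },
  ]
}
```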

Nested Schema for autoscaling

Optional:
  • max_node_count (Number)
  • min_node_count (Number)
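An autoscaled group replaces `fixed_node_count` with bounds for the Cluster Autoscaler:

```terraform
autoscaling = {
  min_node_count = 1
  max_node_count = 10
}
# fixed_node_count must be omitted when autoscaling is set
```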

Nested Schema for metadata

Nested Schema for strategy

Optional:
  • drain_timeout (String) : Maximum amount of time the service will spend attempting to gracefully drain a node (evicting its pods) before falling back to pod deletion. By default, a node can drain for an unlimited time. An important consequence: if a PodDisruptionBudget does not allow a pod to be evicted, a NodeGroup update with node re-creation will hang on that pod’s eviction. Note that this differs from kubectl drain --timeout. Duration as a string: possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as 300ms, -1.5h or 2h45m. Valid time units are ns, us (or µs), ms, s, m, h, d.
  • max_surge (Attributes) : The maximum number of additional nodes that can be provisioned above the desired number of nodes during the update process. This value can be specified either as an absolute number (for example 3) or as a percentage of the desired number of nodes (for example 5%). When specified as a percentage, the actual number is calculated by rounding up to the nearest whole number. This value cannot be 0 if max_unavailable is also 0. Defaults to 1. Example: if set to 25%, the node group can scale up by an additional 25% during the update, allowing new nodes to be added before old nodes are removed, which helps minimize workload disruption. NOTE: it is the user’s responsibility to ensure there is enough quota to provision nodes above the desired number; available quota effectively limits max_surge. If there is not enough quota for even one extra node, the update operation will hang with a quota-exhausted error, visible in Operation.progress_data. (see below for nested schema)
  • max_unavailable (Attributes) : The maximum number of nodes that can be simultaneously unavailable during the update process. This value can be specified either as an absolute number (for example 3) or as a percentage of the desired number of nodes (for example 5%). When specified as a percentage, the actual number is calculated by rounding down to the nearest whole number. This value cannot be 0 if max_surge is also set to 0. Defaults to 0. Example: If set to 20%, up to 20% of the nodes can be taken offline at once during the update, ensuring that at least 80% of the desired nodes remain operational. (see below for nested schema)

Nested Schema for strategy.max_surge

Optional:
  • count (Number) Cannot be set alongside percent.
  • percent (Number) Cannot be set alongside count.

Nested Schema for strategy.max_unavailable

Optional:
  • count (Number) Cannot be set alongside percent.
  • percent (Number) Cannot be set alongside count.
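A strategy that adds replacement nodes before removing old ones could be sketched as:

```terraform
strategy = {
  max_surge = {
    percent = 25 # up to 25% extra nodes during roll-out (rounded up)
  }
  max_unavailable = {
    count = 0 # do not take existing nodes offline before replacements are ready
  }
  drain_timeout = "30m" # fall back to pod deletion after 30 minutes of draining
}
```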

Nested Schema for status

Read-Only:
  • events (Attributes List) : A resource event that has occurred (more or less in the same way) multiple times across a service-defined aggregation interval. (see below for nested schema)
  • node_count (Number) : Total number of nodes that are currently in the node group. Both ready and not ready nodes are counted.
  • outdated_node_count (Number) : Total number of nodes that have an outdated node configuration. These nodes will be replaced by new nodes with up-to-date configuration.
  • ready_node_count (Number) : Total number of nodes that successfully joined the cluster and are ready to serve workloads. Both outdated and up-to-date nodes are counted.
  • reconciling (Boolean) Shows that changes are in flight.
  • state (String) : Possible values:
    • STATE_UNSPECIFIED
    • PROVISIONING
    • RUNNING
    • DELETING
  • target_node_count (Number) : Desired total number of nodes that should be in the node group. It is either NodeGroupSpec.fixed_node_count or arbitrary number between NodeGroupAutoscalingSpec.min_node_count and NodeGroupAutoscalingSpec.max_node_count decided by autoscaler.
  • version (String) : Actual version of the NodeGroup. Has the format <major>.<minor>.<patch>-nebius-node.<infra_version>, e.g. “1.30.0-nebius-node.10”, where <major>.<minor>.<patch> is the Kubernetes version and <infra_version> is the version of the node infrastructure and configuration, whose updates may include bug fixes, security updates and new features depending on worker node configuration.

Nested Schema for status.events

Read-Only:
  • first_occurred_at (String) : Time of the first occurrence of a recurrent event. A string representing a timestamp in ISO 8601 format: YYYY-MM-DDTHH:MM:SSZ or YYYY-MM-DDTHH:MM:SS.SSS±HH:MM
  • last_occurrence (Attributes) : Last occurrence of a recurrent event. Represents an API resource-related event which is potentially important to the end user; what exactly constitutes a reportable event is service-dependent. (see below for nested schema)
  • occurrence_count (Number) The number of times this event has occurred between first_occurred_at and last_occurrence.occurred_at. Must be > 0

Nested Schema for status.events.last_occurrence

Read-Only:
  • code (String) Event code (unique within the API service), in UpperCamelCase, e.g. "DiskAttached"
  • level (String) : Severity level for the event. Possible values:
    • UNSPECIFIED - Unspecified event severity level
    • DEBUG - A debug event providing detailed insight. Such events are used to debug problems with specific resource(s) and process(es)
    • INFO - A normal event or state change. Informs what is happening with the API resource. Does not require user attention or interaction
    • WARN - Warning event. Indicates a potential or minor problem with the API resource and/or the corresponding processes. Needs user attention, but requires no immediate action (yet)
    • ERROR - Error event. Indicates a serious problem with the API resource and/or the corresponding processes. Requires immediate user action
  • message (String) : A human-readable message describing what has happened (and suggested actions for the user, if this is a WARN or ERROR level event).
  • occurred_at (String) : Time at which the event occurred. A string representing a timestamp in ISO 8601 format: YYYY-MM-DDTHH:MM:SSZ or YYYY-MM-DDTHH:MM:SS.SSS±HH:MM