- Disks are block storage volumes. One disk can be used on one VM at a time. Each VM has a boot disk, and you can add more disks to a VM for other purposes.
- Shared filesystems are file storage volumes. A filesystem can be shared by multiple VMs.
Disks
Disks provide block storage for Compute VMs. Data that you put on disks is divided into blocks that are stored efficiently and reliably on underlying physical drives. One Compute disk can be attached to only one VM at a time. A disk can be created as a boot disk for a VM, from one of the boot disk images provided by Nebius AI Cloud, or as an empty additional (non-boot) disk.
Disk types
- Network SSD (network_ssd): Reliable disks backed by solid-state drives (SSDs). Best used for infrastructure purposes: as VM boot disks, disks on Slurm controller VMs, etc. Reliability of Network SSD disks is ensured by means of erasure coding: a disk will tolerate up to two concurrent hardware failures in the Nebius AI Cloud region.
- Network SSD Non-replicated (Network SSD NRD, network_ssd_non_replicated): High-performance disks backed by SSDs. Best used as temporary storage without strict reliability requirements, e.g., boot disks for nodes in Managed Service for Kubernetes® clusters, where redundancy is less important. Data blocks of Network SSD Non-replicated disks are stored in NVMe namespaces on the underlying SSDs. The size of each namespace is 93 GiB, so the disk size must be a multiple of 93 GiB.
- Network SSD IO M3 (network_ssd_io_m3): High-performance and reliable disks backed by SSDs. Best used in performance-critical storage solutions, e.g., as storage disks in GlusterFS clusters. In addition to performance levels similar to Network SSD Non-replicated disks, Network SSD IO M3 disks gain reliability through replication: each disk’s data is mirrored to three physical drives. In the disk type name, “IO” stands for “input/output (optimized)” and “M3” stands for “mirrors 3”.
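The 93 GiB multiple requirement for Network SSD Non-replicated disks can be expressed as a quick sanity check in Python. This is a standalone sketch: `nrd_disk_size` is an illustrative helper, not part of any Nebius SDK.

```python
import math

NAMESPACE_GIB = 93  # NVMe namespace size backing Network SSD Non-replicated disks


def nrd_disk_size(requested_gib: int) -> int:
    """Round a requested size up to the nearest multiple of 93 GiB,
    since NRD disk sizes must be whole NVMe namespaces."""
    return math.ceil(requested_gib / NAMESPACE_GIB) * NAMESPACE_GIB


print(nrd_disk_size(100))  # 186: two 93 GiB namespaces
print(nrd_disk_size(93))   # 93: already a valid size
```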
Disk types comparison
Disk types are differentiated by several characteristics:
- Performance (bandwidth and IOPS) for the same disk size: SSD IO M3 = SSD Non-replicated > SSD
- Reliability: SSD IO M3 > SSD > SSD Non-replicated
- Price: SSD Non-replicated < SSD < SSD IO M3
| Disk type | SSD | SSD NRD | SSD IO M3 |
|---|---|---|---|
| Capacity | 1–8192 GiB (= 8 TiB) | 93–262,074 GiB (~ 256 TiB), multiples of 93 GiB | 93–262,074 GiB (~ 256 TiB), multiples of 93 GiB |
| Maximum read/write bandwidth | 450 MiB/s | 1 GiB/s | 1 GiB/s |
| Maximum read IOPS | 20,000 | 75,000 | 75,000 |
| Maximum write IOPS | 40,000 | 75,000 | 75,000 |
| Reliability features | Erasure coding – tolerates two concurrent hardware failures | None | Replication – data mirrored to three drives |
| Price per 1 GiB per month¹ | $0.071 | $0.053 | $0.118 |
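To compare costs across disk types, the per-GiB prices from the table above can be turned into a small estimator. This is a hypothetical sketch for illustration only (`monthly_cost` is not a Nebius API; always check the current pricing page):

```python
# Prices from the comparison table above, in $ per GiB per month.
PRICE_PER_GIB_MONTH = {
    "network_ssd": 0.071,
    "network_ssd_non_replicated": 0.053,
    "network_ssd_io_m3": 0.118,
}


def monthly_cost(disk_type: str, size_gib: int) -> float:
    """Estimated monthly cost in USD for a disk of the given type and size."""
    return round(size_gib * PRICE_PER_GIB_MONTH[disk_type], 2)


print(monthly_cost("network_ssd", 1024))            # 72.7
print(monthly_cost("network_ssd_non_replicated", 930))  # 49.29
```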
Encryption of disks
To store personal and other sensitive data securely, and to reduce the risk of unauthorized access, you can enable data encryption. To do so, create a disk with data encryption enabled. For SSD disks, encryption is enabled by default and cannot be disabled. For SSD NRD and SSD IO M3 disks, encryption is optional. Encryption is available for both boot disks and secondary disks. For more information, see Encryption in Nebius AI Cloud.
Disk performance
Disk performance depends on both the disk type and its size: it increases as you add more allocation units. Every allocation unit contributes equally to the overall bandwidth and IOPS until the disk-level maximums are reached. While the disk has fewer allocation units than needed to reach those maximums, its performance equals the sum of each unit’s performance. Once the sum reaches the maximum for bandwidth or IOPS, adding more capacity does not increase that metric any further. Disk performance metrics per allocation unit are shown in the following table:
| Disk type | SSD | SSD NRD | SSD IO M3 |
|---|---|---|---|
| Allocation unit size | 32 GiB | 93 GiB | 93 GiB |
| Maximum read bandwidth per unit | 15 MiB/s | 110 MiB/s | 110 MiB/s |
| Maximum write bandwidth per unit | 15 MiB/s | 82 MiB/s | 82 MiB/s |
| Maximum read IOPS per unit | 1,000 | 28,000 | 28,000 |
| Maximum write IOPS per unit | 1,000 | 5,600 | 5,600 |
For example, for an SSD disk, here is how to reach the maximum for each performance metric:
- Read/write bandwidth:
- Maximum overall bandwidth is 450 MiB/s.
- Number of allocation units needed to achieve it = Maximum overall read/write bandwidth ÷ Maximum read/write bandwidth per unit = 450 MiB/s ÷ 15 MiB/s = 30 units.
- Minimum disk size = 30 × 32 GiB = 960 GiB.
- Read IOPS:
- Maximum overall read IOPS is 20,000.
- Number of allocation units needed to achieve it = Maximum overall read IOPS ÷ Maximum read IOPS per unit = 20,000 ÷ 1,000 = 20 units.
- Minimum disk size = 20 × 32 GiB = 640 GiB.
- Write IOPS:
- Maximum overall write IOPS is 40,000.
- Number of allocation units needed to achieve it = Maximum overall write IOPS ÷ Maximum write IOPS per unit = 40,000 ÷ 1,000 = 40 units.
- Minimum disk size = 40 × 32 GiB = 1280 GiB.
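The arithmetic above generalizes to one formula: units needed = disk maximum ÷ per-unit value (rounded up), and minimum size = units × 32 GiB. A minimal Python sketch for SSD disks (`min_size_for_max` is an illustrative helper, not a Nebius tool):

```python
import math

# SSD per-unit figures and disk-level maximums from the tables above.
UNIT_GIB = 32
PER_UNIT = {"bandwidth_mib_s": 15, "read_iops": 1_000, "write_iops": 1_000}
DISK_MAX = {"bandwidth_mib_s": 450, "read_iops": 20_000, "write_iops": 40_000}


def min_size_for_max(metric: str) -> int:
    """Smallest SSD disk size (GiB) that reaches the disk-level
    maximum for the given metric."""
    units = math.ceil(DISK_MAX[metric] / PER_UNIT[metric])
    return units * UNIT_GIB


print(min_size_for_max("bandwidth_mib_s"))  # 960
print(min_size_for_max("read_iops"))        # 640
print(min_size_for_max("write_iops"))       # 1280
```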
Shared filesystems
Shared filesystems provide file storage for Compute VMs. When VMs work with file storage, they work with a hierarchy of folders and files, as opposed to blocks when disks are involved. One shared filesystem can be attached to multiple VMs at once. All VMs that the filesystem is attached to must belong to the same project: sharing a filesystem across projects is not supported, even if the projects are in the same region. To use a shared filesystem on a VM that it is attached to, mount it as a virtiofs device. Filesystems encrypt data by default, and encryption cannot be disabled. Encryption allows you to store personal and other sensitive data on filesystems securely and reduces the risk of unauthorized access.
Filesystem specifications
Nebius provides shared filesystems backed by solid-state drives (the network_ssd type in developer tools such as the Nebius AI Cloud CLI or the Nebius provider for Terraform).
Shared filesystems have the following specifications:
- Capacity: 1–5,242,880 GiB (= 5120 TiB = 5 PiB)
- Maximum read bandwidth per client: 12 GiB/s
- Maximum write bandwidth per client: 8 GiB/s
- Maximum aggregate read bandwidth: 940 GiB/s
- Maximum aggregate write bandwidth: 480 GiB/s
- Maximum file size: 512 GiB × filesystem’s block size in KiB (for example, with the default 4 KiB block size, 512 GiB × 4 = 2 TiB)
- Maximum number of inodes:
  - For filesystems up to 256 GiB: 4,194,304 (4 × 2²⁰)
  - For filesystems larger than 256 GiB: filesystem’s size ÷ 64 KiB
- Reliability features: Erasure coding – tolerates two concurrent hardware failures
- Price per 1 GiB per month¹: $0.07
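The maximum file size and inode limits above can be computed from a filesystem's parameters. A standalone sketch (both helper names are illustrative, not part of any Nebius SDK):

```python
def max_file_size_gib(block_size_kib: int = 4) -> int:
    """Maximum file size = 512 GiB x block size in KiB.
    Assumes the default block size is 4 KiB, as implied by the docs."""
    return 512 * block_size_kib


def max_inodes(fs_size_gib: int) -> int:
    """Inode limit: fixed 4 x 2**20 up to 256 GiB, then size / 64 KiB."""
    if fs_size_gib <= 256:
        return 4 * 2**20
    return fs_size_gib * 2**30 // (64 * 2**10)  # size in bytes / 64 KiB


print(max_file_size_gib())  # 2048 (= 2 TiB at the default block size)
print(max_inodes(1024))     # 16777216
```

Note that the two inode rules agree at the 256 GiB boundary: 256 GiB ÷ 64 KiB is also 4,194,304.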
Filesystem performance
Filesystem performance depends on its size: performance increases with each 4 TiB of filesystem capacity. Every 4 TiB contributes equally to the overall bandwidth until the aggregate maximums for the filesystem are reached. Each 4 TiB of an SSD filesystem improves the performance metrics by the following values:
- Maximum read bandwidth: by 3.70 GiB/s.
- Maximum write bandwidth: by 1.89 GiB/s.
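This scaling rule can be sketched in Python, capping at the aggregate maximums from the specifications above (`aggregate_bandwidth_gib_s` is an illustrative helper, not a Nebius API):

```python
# Per-4-TiB increments and aggregate caps from the sections above.
READ_PER_4TIB_GIB_S = 3.70
WRITE_PER_4TIB_GIB_S = 1.89
MAX_AGG_READ_GIB_S = 940
MAX_AGG_WRITE_GIB_S = 480


def aggregate_bandwidth_gib_s(fs_size_tib: float) -> tuple[float, float]:
    """Aggregate (read, write) bandwidth for an SSD shared filesystem:
    each 4 TiB adds a fixed increment, up to the aggregate caps."""
    units = fs_size_tib / 4
    read = min(units * READ_PER_4TIB_GIB_S, MAX_AGG_READ_GIB_S)
    write = min(units * WRITE_PER_4TIB_GIB_S, MAX_AGG_WRITE_GIB_S)
    return round(read, 2), round(write, 2)


print(aggregate_bandwidth_gib_s(40))  # (37.0, 18.9): 10 increments of 4 TiB
```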
Use cases and suggestions
Here is an overview of the stages of ML/AI workloads, the storage options that Compute and other Nebius AI Cloud services provide for each stage, and general suggestions for using Compute volumes.
Infrastructure
- VM boot disks: Network SSD disks. For OS and system data on your VMs, the main storage requirement is reliability, so the suggested disk type is Network SSD.
- VM storage disks: Network SSD IO M3 disks. If you are building storage solutions on VMs, e.g., GlusterFS clusters, use Network SSD IO M3 disks for speed and reliability.
- Managed Kubernetes node storage: Network SSD Non-replicated disks. For Kubernetes worker nodes where data reliability isn’t a top priority (as the data is often transient), Network SSD Non-replicated disks provide high performance and low latency.
- Database hosts: Network SSD IO M3 disks. Database workloads often demand consistent and high input/output operations per second (IOPS), and Network SSD IO M3 disks offer the best combination of reliability and speed.
Data preparation
- Storing and preprocessing datasets: Object Storage buckets. Object Storage is ideal for handling large, unstructured datasets and provides a scalable solution for preprocessing tasks like data normalization or augmentation.
Training
- Streaming datasets to workers: SSD shared filesystems or Object Storage buckets. In most cases, SSD shared filesystems ensure fast access to datasets during training, without bottlenecks. For exceptionally large datasets (1 PiB+) or distributed training across external workers, Object Storage buckets provide a scalable solution.
- Sharing code between workers: SSD shared filesystems. By using SSD shared filesystems, multiple workers can efficiently access and synchronize code during distributed training, ensuring consistency and minimizing latency.
- Checkpoints: SSD shared filesystems, then Object Storage buckets. During training, SSD shared filesystems allow for quickly saving and loading checkpoints, ensuring minimal disruption. Once training is completed, asynchronous transfer to Object Storage reduces costs while maintaining accessibility for future use.
Inference
- Autoscaling, sharing weights between GPUs: SSD shared filesystems. When inference workloads require scaling across multiple VMs with GPUs, SSD shared filesystems allow for fast sharing of model weights, ensuring consistent performance across nodes during autoscaling.
- Sharing results: Object Storage buckets. Once inference tasks are completed, Object Storage is ideal for sharing outputs such as logs, predictions, or reports with other users or systems, due to its scalable and cost-efficient nature.
General suggestions
- For maximum IOPS, issue reads and writes with I/O sizes close to the volume’s block size.
- For maximum bandwidth, issue reads and writes in 4 MiB chunks.
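As an illustration of the bandwidth suggestion, here is a minimal Python sketch that copies a file on a volume using 4 MiB reads and writes. It is a generic example, not a Nebius tool; any real workload would use its own I/O path.

```python
CHUNK = 4 * 1024 * 1024  # 4 MiB: the chunk size suggested for maximum bandwidth


def copy_in_chunks(src_path: str, dst_path: str) -> int:
    """Copy a file using 4 MiB reads and writes; returns bytes copied."""
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while chunk := src.read(CHUNK):
            dst.write(chunk)
            total += len(chunk)
    return total
```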