- Disks:
  - Network SSD disks: managed block storage volumes.
  - Local SSD disks: ephemeral host-local block storage.
- Shared filesystems are file storage volumes. A filesystem can be shared by multiple VMs.
Disks
Disks provide block storage for Compute VMs. Data that you put on disks is divided into blocks that are stored efficiently and reliably on underlying physical drives. Compute offers two kinds of disks:
- Network SSD disks can be created as a boot disk for a VM, from one of the boot disk images provided by Nebius AI Cloud, or as an empty additional (non-boot) disk. One Compute network SSD disk can only be added to one running VM at a time.
- Local SSD disks are Non-Volatile Memory Express (NVMe) drives physically attached to the compute host that runs a virtual machine (VM). They are not created or managed as separate storage resources, and you can only add them when creating a VM.
Network SSD disks
- Network SSD (network_ssd): Reliable disks backed by solid state drives (SSDs). Best used for infrastructure purposes: as VM boot disks, disks on Slurm controller VMs, etc. Reliability of Network SSD disks is ensured by means of erasure coding. A disk will tolerate up to two concurrent hardware failures in the Nebius AI Cloud region.
- Network SSD Non-replicated (Network SSD NRD, network_ssd_non_replicated): High-performance disks backed by SSDs. Best used as temporary storage without strict reliability requirements, e.g., a boot disk for nodes in Managed Service for Kubernetes® clusters where redundancy is less important. Data blocks of Network SSD Non-replicated disks are stored in NVMe namespaces on underlying SSDs. The size of each namespace is 93 GiB, so the disk size must be a multiple of 93 GiB.
- Network SSD IO M3 (network_ssd_io_m3): High-performance and reliable disks backed by SSDs. Best used in performance-critical storage solutions, e.g., as storage disks in GlusterFS clusters. In addition to performance levels similar to Network SSD Non-replicated disks, Network SSD IO M3 disks are reliable through replication, with each disk’s data mirrored to three physical drives. In the disk type name, “IO” stands for “input/output (optimized)” and “M3” stands for “mirrors 3”.
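
The 93 GiB sizing rule above can be sketched as a small helper that rounds a requested capacity up to the next valid size. This is only an illustration: the function name `round_up_to_namespace` is hypothetical and is not part of any Nebius tool or API.

```python
NAMESPACE_GIB = 93  # NVMe namespace size stated in the text above


def round_up_to_namespace(requested_gib: int) -> int:
    """Return the smallest multiple of 93 GiB that is >= requested_gib.

    Hypothetical helper for illustration; not a Nebius API.
    """
    if requested_gib <= 0:
        raise ValueError("requested size must be positive")
    # Integer ceiling division avoids floating-point rounding issues.
    namespaces = (requested_gib + NAMESPACE_GIB - 1) // NAMESPACE_GIB
    return namespaces * NAMESPACE_GIB


print(round_up_to_namespace(500))  # 558, i.e. 6 namespaces of 93 GiB
```

According to the capacity figures in the comparison table, the same multiple-of-93 rule also applies to Network SSD IO M3 disks.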
Local SSD disks
Consider local SSD disks if your workload needs high-performance, low-latency storage for data that can be recreated. Local SSD disks are ephemeral, meaning temporary: their data is erased when the VM is stopped or deleted. Local SSD disks are available only on some platforms, presets and regions. For details, see Availability.
Local SSD disks are useful for:
- Scratch space for files and intermediate outputs that can be recreated.
- Training or inference workloads that require low latency.
- Read or write caches for data that can be recreated.
Disk types comparison
Disk types are differentiated by several characteristics:
- Performance (bandwidth and IOPS): Local SSD > SSD IO M3 = SSD Non-replicated > SSD
- Reliability: SSD IO M3 > SSD > SSD Non-replicated = Local SSD
- Price per 1 GiB per month: SSD Non-replicated < Local SSD < SSD < SSD IO M3
| Disk type | Network SSD | Network SSD NRD | Network SSD IO M3 | Local SSD disk¹ |
|---|---|---|---|---|
| Capacity | 1–8192 GiB (= 8 TiB) | 93–262,074 GiB (~ 256 TiB), multiples of 93 GiB | 93–262,074 GiB (~ 256 TiB), multiples of 93 GiB | 3.5 TiB |
| Read/write bandwidth | 450 MiB/s | 2 GiB/s | 2 GiB/s | Read: 6.8 GB/s; write: 2.6 GB/s |
| Read IOPS | 20,000 | 75,000 | 75,000 | 510,000 |
| Write IOPS | 20,000 | 75,000 | 75,000 | 350,000 |
| Reliability features | Erasure coding – tolerates two concurrent hardware failures | None | Replication – data mirrored to three drives | None |
| Price per 1 GiB per month² | $0.071 | $0.053 | $0.118 | $0.065 |
Encryption of disks
To store personal and other sensitive data securely, and to reduce the risk of unauthorized access, you can enable data encryption. To do so, create a disk with data encryption enabled.
For Network SSD disks, encryption is enabled by default and cannot be disabled. For Network SSD NRD and Network SSD IO M3 disks, encryption is optional and you can enable it. Encryption is available for both boot disks and secondary disks.
Local SSD disks are not encrypted and no other data protection method is applied to them. Ensuring the protection of data stored on these devices is your responsibility. We don’t recommend using local SSD disks for data that can’t be recreated.
For more information, see Encryption in Nebius AI Cloud.
Disk performance
Disk performance depends on both the disk type and its size: it increases as you add more allocation units. Every allocation unit contributes equally to the overall bandwidth and IOPS until you reach the upper bounds for the disk. While you have fewer allocation units than needed to reach the upper bounds, the disk performance equals the sum of each unit’s performance. Once the sum reaches the upper bound for bandwidth or IOPS, adding more capacity does not increase performance any further.
Disk performance metrics per allocation unit are shown in the following table. The bandwidth and IOPS values below are upper bounds; actual performance depends on workload.
| Disk type | Network SSD | Network SSD NRD | Network SSD IO M3 | Local SSD disk¹ |
|---|---|---|---|---|
| Allocation unit size | 32 GiB | 93 GiB | 93 GiB | 3.5 TiB |
| Read bandwidth per unit | 15 MiB/s | 110 MiB/s | 110 MiB/s | 6.8 GB/s |
| Write bandwidth per unit | 15 MiB/s | 82 MiB/s | 82 MiB/s | 2.6 GB/s |
| Read IOPS per unit | 1,000 | 28,000 | 28,000 | 510,000 |
| Write IOPS per unit | 1,000 | 5,600 | 5,600 | 350,000 |
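
The scaling rule described above (per-unit performance summed up to a per-disk cap) can be sketched as follows. The figures are copied from the two tables in this section. The function `estimate_performance` and the `SPECS` dictionary are hypothetical names, and the code assumes that only whole allocation units count, with a minimum of one unit; the source does not say how partial units are treated.

```python
# Per-unit and per-disk maximum values copied from the tables in this section.
# Bandwidth is expressed in MiB/s (2 GiB/s = 2048 MiB/s).
SPECS = {
    "network_ssd": {
        "unit_gib": 32,
        "read_mibps": 15, "write_mibps": 15,
        "read_iops": 1_000, "write_iops": 1_000,
        "max_mibps": 450, "max_read_iops": 20_000, "max_write_iops": 20_000,
    },
    "network_ssd_non_replicated": {
        "unit_gib": 93,
        "read_mibps": 110, "write_mibps": 82,
        "read_iops": 28_000, "write_iops": 5_600,
        "max_mibps": 2048, "max_read_iops": 75_000, "max_write_iops": 75_000,
    },
}
# The tables list identical figures for Network SSD IO M3.
SPECS["network_ssd_io_m3"] = SPECS["network_ssd_non_replicated"]


def estimate_performance(disk_type: str, size_gib: int) -> dict:
    """Estimate upper-bound performance for a disk of the given size.

    Assumption (not from the source): only whole allocation units count,
    and every disk has at least one unit.
    """
    spec = SPECS[disk_type]
    units = max(1, size_gib // spec["unit_gib"])
    return {
        "read_mibps": min(units * spec["read_mibps"], spec["max_mibps"]),
        "write_mibps": min(units * spec["write_mibps"], spec["max_mibps"]),
        "read_iops": min(units * spec["read_iops"], spec["max_read_iops"]),
        "write_iops": min(units * spec["write_iops"], spec["max_write_iops"]),
    }


print(estimate_performance("network_ssd", 320))
print(estimate_performance("network_ssd_non_replicated", 279))
```

Local SSD disks are left out because their capacity is fixed, so there is nothing to scale. As the text notes, these are upper bounds and actual performance depends on the workload.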
How to reach the maximum for each performance metric on a Network SSD disk:
- Read/write bandwidth:
  - Maximum overall bandwidth is 450 MiB/s.
  - Number of allocation units needed to achieve it = Maximum overall read/write bandwidth ÷ Maximum read/write bandwidth per unit = 450 MiB/s ÷ 15 MiB/s = 30 units.
  - Minimum disk size = 30 × 32 GiB = 960 GiB.
- Read IOPS:
  - Maximum overall read IOPS is 20,000.
  - Number of allocation units needed to achieve it = Maximum overall read IOPS ÷ Maximum read IOPS per unit = 20,000 ÷ 1,000 = 20 units.
  - Minimum disk size = 20 × 32 GiB = 640 GiB.
- Write IOPS:
  - Maximum overall write IOPS is 20,000.
  - Number of allocation units needed to achieve it = Maximum overall write IOPS ÷ Maximum write IOPS per unit = 20,000 ÷ 1,000 = 20 units.
  - Minimum disk size = 20 × 32 GiB = 640 GiB.
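
The calculation above can be expressed as a short sketch that divides a per-disk maximum by the per-unit value and rounds up. The function name `min_size_for_max` is hypothetical. The Network SSD results match the worked example; the Network SSD NRD / IO M3 results are derived here from the table values (with 2 GiB/s taken as 2048 MiB/s) and are not stated in the source.

```python
def min_size_for_max(max_value: int, per_unit: int, unit_gib: int) -> tuple[int, int]:
    """Return (allocation units, minimum disk size in GiB) needed to reach max_value."""
    units = -(-max_value // per_unit)  # integer ceiling division
    return units, units * unit_gib


# Network SSD: 32 GiB units (matches the worked example above).
print(min_size_for_max(450, 15, 32))        # (30, 960)  read/write bandwidth
print(min_size_for_max(20_000, 1_000, 32))  # (20, 640)  read or write IOPS

# Network SSD NRD / IO M3: 93 GiB units (derived from the tables, not from the source text).
print(min_size_for_max(2048, 110, 93))      # (19, 1767) read bandwidth
print(min_size_for_max(2048, 82, 93))       # (25, 2325) write bandwidth
print(min_size_for_max(75_000, 28_000, 93)) # (3, 279)   read IOPS
print(min_size_for_max(75_000, 5_600, 93))  # (14, 1302) write IOPS
```

Rounding up matters for the 93 GiB disk types because the maximum is rarely an exact multiple of the per-unit value.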
Shared filesystems
Shared filesystems provide file storage for Compute VMs. When VMs work with file storage, they work with a hierarchy of folders and files, as opposed to blocks when disks are involved.
One shared filesystem can be attached to multiple VMs at once. All VMs that the filesystem is attached to must belong to the same project. Sharing a filesystem across projects is not supported, even if the projects are in the same region. To use a shared filesystem on a VM that it is attached to, you must mount it as a virtiofs device.
Filesystems support data encryption by default; you cannot disable it. Encryption allows you to store personal and other sensitive data on filesystems securely, and reduce the risk of unauthorized access.
Filesystem specifications
Nebius provides shared filesystems that are backed by solid state drives (network_ssd in developer tools such as the Nebius AI Cloud CLI or the provider for Terraform).
Shared filesystems have the following specifications:
- Capacity: 1–5,242,880 GiB (= 5120 TiB = 5 PiB)
- Read bandwidth per client: up to 12 GiB/s
- Write bandwidth per client: up to 8 GiB/s
- Aggregate read bandwidth: up to 940 GiB/s
- Aggregate write bandwidth: up to 475 GiB/s
- Maximum file size: 512 GiB × filesystem’s block size in KiB (for example, for the default block size of 4 KiB, it is 512 GiB × 4 = 2 TiB)
- Maximum number of inodes:
  - For filesystems up to 256 GiB: 4,194,304 (4 × 2^20)
  - For filesystems larger than 256 GiB: filesystem’s size ÷ 64 KiB
- Reliability features: Erasure coding – tolerates two concurrent hardware failures
- Price per 1 GiB per month¹: $0.07
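
The file size and inode limits in the list above are simple formulas, sketched below for illustration. The function names are hypothetical. Note that the two inode rules agree at the boundary: 256 GiB ÷ 64 KiB = 4,194,304.

```python
KIB = 1024
GIB = 1024 ** 3


def max_file_size_gib(block_size_kib: int = 4) -> int:
    """Maximum file size in GiB: 512 GiB x block size in KiB (4 KiB is the default)."""
    return 512 * block_size_kib


def max_inodes(filesystem_size_gib: int) -> int:
    """Maximum number of inodes for a filesystem of the given size."""
    if filesystem_size_gib <= 256:
        return 4 * 2 ** 20  # fixed limit for filesystems up to 256 GiB
    return filesystem_size_gib * GIB // (64 * KIB)  # size / 64 KiB


print(max_file_size_gib())  # 2048 GiB = 2 TiB
print(max_inodes(100))      # 4194304
print(max_inodes(1024))     # 16777216
```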
Filesystem performance
Filesystem performance depends on its size: performance increases with each 4 TiB of the filesystem size. Every 4 TiB contributes equally to the overall bandwidth until you reach the upper bounds for the filesystem. Each 4 TiB of SSD filesystem improves the performance metrics by the following values:
- Read bandwidth: by up to 3.70 GiB/s.
- Write bandwidth: by up to 1.89 GiB/s.
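
As a rough sketch of this scaling, the function below multiplies the number of 4 TiB increments by the per-increment values and caps the result. It rests on two assumptions that the source does not confirm: that the upper bounds are the aggregate bandwidth figures from the specifications (940 GiB/s read, 475 GiB/s write), and that only complete 4 TiB increments count. Behaviour below 4 TiB is not described, so the sketch rejects such sizes. The name `estimate_fs_bandwidth` is hypothetical.

```python
INCREMENT_TIB = 4
READ_GIBPS_PER_INCREMENT = 3.70
WRITE_GIBPS_PER_INCREMENT = 1.89
MAX_READ_GIBPS = 940.0   # assumed cap: aggregate read bandwidth
MAX_WRITE_GIBPS = 475.0  # assumed cap: aggregate write bandwidth


def estimate_fs_bandwidth(size_tib: int) -> tuple[float, float]:
    """Return (read, write) upper-bound aggregate bandwidth in GiB/s."""
    if size_tib < INCREMENT_TIB:
        # The source does not describe performance below one full increment.
        raise ValueError("sizes below 4 TiB are not covered by this sketch")
    increments = size_tib // INCREMENT_TIB  # assumption: whole increments only
    read = min(increments * READ_GIBPS_PER_INCREMENT, MAX_READ_GIBPS)
    write = min(increments * WRITE_GIBPS_PER_INCREMENT, MAX_WRITE_GIBPS)
    return read, write


read, write = estimate_fs_bandwidth(40)
print(f"40 TiB: read up to {read:.2f} GiB/s, write up to {write:.2f} GiB/s")
```

Per-client limits (12 GiB/s read, 8 GiB/s write) still apply to each VM regardless of the aggregate figure.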
Storage types comparison
| Characteristic | Local SSD disks | Network disks | Shared filesystems |
|---|---|---|---|
| Durability | Ephemeral | Durable | Durable |
| Attachment model | Tied to the VM run and physical host | Independent volume attached to one running VM at a time | Independent volume that can be shared by multiple VMs |
| Capacity model | Preconfigured, all or none | User-defined | User-defined |
Use cases and suggestions
Here is an overview of various stages of ML/AI workloads and storage options that Compute and other Nebius AI Cloud services provide for these stages, as well as general suggestions for using Compute storage.
Infrastructure
- VM boot disks: Network SSD disks. For OS and system data on your VMs, the main storage requirement is reliability, so the suggested disk type is Network SSD.
- VM storage disks: Network SSD IO M3 disks. If you are building storage solutions on VMs, e.g., GlusterFS clusters, use Network SSD IO M3 disks for speed and reliability.
- Managed Kubernetes node storage: Network SSD Non-replicated disks. For Kubernetes worker nodes where data reliability isn’t a top priority (as the data is often transient), Network SSD Non-replicated disks provide high performance and low latency.
- Database hosts: Network SSD IO M3 disks. Database workloads often demand consistent and high input/output operations per second (IOPS), and Network SSD IO M3 disks offer the best combination of reliability and speed.
Data preparation
- Storing and preprocessing datasets: Object Storage buckets. Object Storage is ideal for handling large, unstructured datasets and provides a scalable solution for preprocessing tasks like data normalization or augmentation.
Training
- Streaming datasets to workers: SSD shared filesystems or Object Storage buckets. In most cases, SSD shared filesystems ensure fast access to datasets during training, without bottlenecks. For exceptionally large datasets (1 PiB+) or distributed training across external workers, Object Storage buckets provide a scalable solution.
- Sharing code between workers: SSD shared filesystems. By using SSD shared filesystems, multiple workers can efficiently access and synchronize code during distributed training, ensuring consistency and minimizing latency.
- Checkpoints: SSD shared filesystems, then Object Storage buckets. During training, SSD shared filesystems allow for quickly saving and loading checkpoints, thus ensuring minimal disruption. Once training is completed, asynchronous transfer to Object Storage reduces costs while maintaining accessibility for future use.
- Scratch space: local SSD disks. On supported platforms, presets and regions, local SSD disks provide host-local NVMe for data that can be recreated and benefits from high performance and low latency. Treat them as ephemeral and keep important data on durable storage.
Inference
- Autoscaling, sharing weights between GPUs: SSD shared filesystems. When inference workloads require scaling across multiple VMs with GPUs, SSD shared filesystems allow for fast sharing of model weights, thus ensuring consistent performance across nodes during autoscaling.
- Sharing results: Object Storage buckets. Once inference tasks are completed, Object Storage is ideal for sharing outputs such as logs, predictions or reports with other users or systems, due to its scalable and cost-efficient nature.
General suggestions
- For maximum IOPS, reads and writes to a volume should be close to its block size.
- For maximum bandwidth, reads and writes to a volume should be in 4 MiB chunks.
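
As an illustration of the bandwidth suggestion, the sketch below reads a file sequentially in 4 MiB requests using only the Python standard library. The helper name `read_in_chunks` is hypothetical, and the demo uses a temporary file rather than a real volume. Whether this reaches the stated maximums depends on the workload and on how the operating system handles the requests.

```python
import os
import tempfile

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB, as suggested above for maximum bandwidth


def read_in_chunks(path: str, chunk_size: int = CHUNK_SIZE):
    """Yield the contents of a file in requests of up to chunk_size bytes."""
    # buffering=0 disables Python's own buffer so that each read() call
    # is passed to the operating system with the requested size.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


if __name__ == "__main__":
    # Demo on a temporary 9 MiB file; replace the path with a file on your volume.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(9 * 1024 * 1024))
    total = sum(len(chunk) for chunk in read_in_chunks(tmp.name))
    print(f"read {total} bytes")
    os.remove(tmp.name)
```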