Use cases and limitations
Mountpoint for Amazon S3 is optimized for specific use cases, such as machine learning training, that involve reading large datasets with high throughput. For example, it is a good fit for data lake applications that read large objects without using other file system features like locking or POSIX permissions, or that write objects sequentially from a single node. Mountpoint for Amazon S3 achieves this by parallelizing requests, both to a single file and across multiple files. Tests in Nebius AI Cloud show that Mountpoint for Amazon S3 can deliver performance close to that of clients that natively implement S3 APIs.

See Mountpoint file system behavior for a detailed description of Mountpoint for Amazon S3's behavior and POSIX support and how they could affect your application. To troubleshoot file operations that may not be supported by Mountpoint for Amazon S3, see the troubleshooting documentation.

Installing and mounting buckets
Prerequisites
- Create a service account and add it to a group that grants the required level of access, for example, the default `viewers` or `editors` group.
- Create an access key pair for the service account and save the key ID and the secret key.
Example
- CLI
The following commands create a service account, add it to the default `viewers` group, and create an access key for it. Specify your tenant and project IDs in the commands:

- `TENANT_ID`: Tenant ID.
- `PROJECT_ID`: Project ID.
Local and virtual machines
Mountpoint for Amazon S3 is only available for Linux operating systems.
- Install Mountpoint for Amazon S3:

For more installation instructions and details, see Getting started and Installing Mountpoint for Amazon S3 in the Mountpoint for Amazon S3 documentation.
- RPM-based (Fedora, CentOS, RHEL, etc.)
- DEB-based (Ubuntu, Debian)
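The install commands for both package families can be sketched as follows. The download URLs follow the pattern published in the Mountpoint for Amazon S3 documentation for x86_64 hosts; verify them against the current release before running:

```shell
# RPM-based (Fedora, CentOS, RHEL, etc.)
wget https://s3.amazonaws.com/mountpoint-s3-release/latest/x86_64/mount-s3.rpm
sudo yum install ./mount-s3.rpm

# DEB-based (Ubuntu, Debian)
wget https://s3.amazonaws.com/mountpoint-s3-release/latest/x86_64/mount-s3.deb
sudo apt-get install -y ./mount-s3.deb

# Check that the binary is available
mount-s3 --version
```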
- Create a credentials file, `~/.aws/credentials`, with your access key pair. This ensures that the credentials are persistent between shell sessions. If you have the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables defined, they override `~/.aws/credentials`. For more details, see AWS credentials in the Mountpoint for Amazon S3 documentation.
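The file uses the standard AWS credentials format; the `default` profile name and the angle-bracket values are placeholders for your own key pair:

```ini
[default]
aws_access_key_id = <key_id>
aws_secret_access_key = <secret_key>
```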
- Mount the bucket:
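A sketch of the mount command with the recommended performance flags; all angle-bracket values are placeholders:

```shell
mount-s3 <bucket_name> <directory> \
  --region <region> \
  --endpoint-url https://storage.<region>.nebius.cloud:443 \
  --maximum-throughput-gbps 10000 \
  --max-threads 64
```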
Replace the following values:

- `<bucket_name>`: Name of your bucket.
- `<directory>`: Path to the directory on your machine where the bucket should be mounted.
- `--region`: Nebius AI Cloud region where the parent project of your bucket is located. To get the region of a project, go to the web console and expand the top-left list of tenants; the region, for example `eu-north1`, is displayed next to the project's name.
- `--endpoint-url`: Object Storage endpoint in the region. All endpoints have the `https://storage.<region>.nebius.cloud:443` format. For example, for buckets in the `eu-north1` region, the endpoint is `https://storage.eu-north1.nebius.cloud:443`.
`--maximum-throughput-gbps 10000` and `--max-threads 64` are recommended performance settings. For more details, see Performance. You can add more parameters to the command, for example, `--foreground` to run Mountpoint for Amazon S3 in the foreground instead of the background. For more details, see Configuring Mountpoint for Amazon S3.
Kubernetes clusters
- Install kubectl and configure it to work with your cluster. For Managed Kubernetes clusters in Nebius AI Cloud, see How to connect to Managed Service for Kubernetes® clusters using kubectl.
- Create a Kubernetes Secret with your access key pair:
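One way to create the Secret is with `kubectl create secret`. This sketch assumes the secret name `aws-secret` and the `kube-system` namespace, which is what the Mountpoint for Amazon S3 CSI Driver's Helm chart expects by default; adjust both if your installation differs:

```shell
kubectl create secret generic aws-secret \
  --namespace kube-system \
  --from-literal "key_id=<access_key_id>" \
  --from-literal "access_key=<secret_access_key>"
```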
- Install Helm.
- Install the Mountpoint for Amazon S3 CSI Driver:
For more installation details and instructions, see Installing Mountpoint for Amazon S3 CSI Driver.
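A typical Helm installation looks like the following; the repository URL and chart name come from the driver's own documentation, so check them against the current release:

```shell
helm repo add aws-mountpoint-s3-csi-driver https://awslabs.github.io/mountpoint-s3-csi-driver
helm repo update
helm upgrade --install aws-mountpoint-s3-csi-driver \
  --namespace kube-system \
  aws-mountpoint-s3-csi-driver/aws-mountpoint-s3-csi-driver
```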
- Create a PersistentVolume (PV):
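A minimal PV manifest can be sketched as follows. The names `s3-pv` and `s3-csi-driver-volume` are illustrative, and the `storage` capacity is a required field that the driver does not enforce:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv                            # illustrative name
spec:
  capacity:
    storage: 1200Gi                      # required by Kubernetes, not enforced by the driver
  accessModes:
    - ReadOnlyMany
  mountOptions:
    - region <REGION>
    - endpoint-url https://storage.<REGION>.nebius.cloud:443
    - maximum-throughput-gbps 10000
    - max-threads 64
  csi:
    driver: s3.csi.aws.com
    volumeHandle: s3-csi-driver-volume   # must be unique per volume in the cluster
    volumeAttributes:
      bucketName: <bucket_name>
```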
Replace the following values:
- `REGION`: Nebius AI Cloud region where the parent project of your bucket is located. To get the region of a project, go to the web console and expand the top-left list of tenants; the region, for example `eu-north1`, is displayed next to the project's name.
- `.spec.accessModes`: List of access modes supported by the PV. Supported modes are `ReadOnlyMany` (multiple nodes can mount the PV as read-only) and `ReadWriteMany` (multiple nodes can mount the PV as read-write). If your application requires the `ReadWriteMany` mode, contact support before creating the PV.
- `.spec.csi.volumeHandle`: String that identifies the PV. The volume handle must be unique for each volume in your cluster.
- `.spec.csi.volumeAttributes.bucketName`: Name of your bucket.

`.spec.mountOptions.maximum-throughput-gbps` and `.spec.mountOptions.max-threads` are recommended performance settings. For more details, see Performance.
- Create a PersistentVolumeClaim (PVC):
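A matching PVC for static provisioning might look like this; the name `s3-pvc` is illustrative, and `s3-pv` must match the name of the PV created above:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: s3-pvc                  # illustrative name
spec:
  accessModes:
    - ReadOnlyMany              # must match the PV's access modes
  storageClassName: ""          # empty string selects static provisioning
  resources:
    requests:
      storage: 1200Gi           # required field, not enforced by the driver
  volumeName: s3-pv             # must match the PV's name
```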
`.spec.accessModes` and `.spec.volumeName` must match the access modes and the name of your PV, respectively.

- Use the PVC to mount the PV to your Pod:
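A sketch of a Pod that mounts the bucket through the PVC; the Pod name, image, and mount path are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: s3-app                      # illustrative name
spec:
  containers:
    - name: app
      image: ubuntu
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /data          # bucket contents appear here
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: s3-pvc           # must match the PVC's name
```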
Performance
To optimize performance when mounting buckets, you can configure the following parameters:

maximum-throughput-gbps
The `--maximum-throughput-gbps` parameter raises the maximum throughput limit. The default limit is 1.25 GB/s (10 Gbps). For example, to set it to 10000 Gbps, pass the flag when mounting, or add it to the `mountOptions` of your PersistentVolume in Kubernetes:
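The command-line form can be sketched as (placeholders for your bucket and mount directory):

```shell
mount-s3 <bucket_name> <directory> --maximum-throughput-gbps 10000
```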
max-threads
The `--max-threads` parameter raises the maximum number of threads. The default is 16. For example, to set it to 64, pass the flag when mounting, or add it to the `mountOptions` of your PersistentVolume in Kubernetes:
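As a command-line sketch:

```shell
mount-s3 <bucket_name> <directory> --max-threads 64
```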
metadata-ttl
The `--metadata-ttl` parameter controls how long metadata is cached, in seconds. Consider setting it to 120 seconds for better performance. For example:
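As a command-line sketch:

```shell
mount-s3 <bucket_name> <directory> --metadata-ttl 120
```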
UNSTABLE_MOUNTPOINT_MAX_PREFETCH_WINDOW_SIZE
If you need to maximize the throughput within a single file, you can set the `UNSTABLE_MOUNTPOINT_MAX_PREFETCH_WINDOW_SIZE` environment variable. For example:
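As a sketch, the variable can be set for a single mount invocation; the value shown (2 GiB, assumed to be in bytes) is illustrative, so consult the documentation for a value suited to your workload:

```shell
UNSTABLE_MOUNTPOINT_MAX_PREFETCH_WINDOW_SIZE=2147483648 mount-s3 <bucket_name> <directory>
```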