When performing operations on your Object Storage buckets and objects, you can use the optimizations and approaches described in this article to improve performance and reduce costs.

List objects without a delimiter

Using the ListObjects S3 API action with a delimiter is much slower than flat listing without one, especially in buckets with many objects. Consider organizing your object keys without multiple levels of nested directories, so that you can use flat listing without delimiters in most or all cases. If your bucket does have directories, you can get a flat list of objects in a leaf directory (that is, a directory without subdirectories) more efficiently by specifying a prefix instead of a delimiter in ListObjects.
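As a sketch, assuming a bucket with keys organized under a prefix such as logs/2024/ (the bucket name and prefix here are placeholders), a flat prefix-based listing looks like this:

```shell
# Flat listing of a leaf "directory" by prefix only: every key starting
# with the prefix is returned in a single flat list, with no grouping.
aws s3api list-objects-v2 \
  --bucket <bucket_name> \
  --prefix "logs/2024/"
```

Adding --delimiter "/" to the same command would make the service group keys into CommonPrefixes, which is the slower, hierarchical mode this section recommends avoiding.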

Use object versioning only if required

For lower latency and higher throughput in high-load scenarios, we recommend using buckets with versioning disabled.
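If you do not need versioning, you can check a bucket's current state and suspend it via the S3 API; a sketch with <bucket_name> as a placeholder:

```shell
# Show the current versioning state (empty output means versioning was never enabled).
aws s3api get-bucket-versioning --bucket <bucket_name>

# Suspend versioning. Note: this stops the creation of new versions,
# but does not delete versions that already exist.
aws s3api put-bucket-versioning \
  --bucket <bucket_name> \
  --versioning-configuration Status=Suspended
```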

Use concurrency

The S3 API, as well as many interfaces and SDKs built on it such as the AWS CLI, supports concurrent requests, which can bring a significant performance boost when used correctly. This applies even to GET operations on a single object; for large objects, the effect is stronger if you use multipart uploads and properly align the sizes of upload parts and download ranges (see below). If you experience performance issues, try increasing concurrency to fully utilize your bandwidth. To balance performance against client machine utilization, increase the number of concurrent requests gradually and stop as soon as there is no further gain. Going beyond several hundred concurrent requests per client machine is not recommended. To set the number of concurrent requests in the AWS CLI (the default is 10), run:
aws configure set s3.max_concurrent_requests 20
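One way to find the sweet spot is to time the same transfer at increasing concurrency levels and stop when the gain flattens out. A rough sketch, where the bucket and object names are placeholders:

```shell
# Time the same download at increasing concurrency levels
# and compare the wall-clock times.
for c in 5 10 20 40; do
  aws configure set s3.max_concurrent_requests "$c"
  echo "concurrency: $c"
  time aws s3 cp "s3://<bucket_name>/<object_key>" ./concurrency_test_download
done
rm -f ./concurrency_test_download
```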

Avoid overwriting the same key concurrently

Sending multiple concurrent PUT requests with the same key is supported, but it performs poorly and leads to occasional timeout errors, especially in versioned buckets. Where possible, write to distinct keys or serialize writes to the same key.

Optimize the sizes of upload parts and download ranges

While the S3 API allows parts of up to 5 GiB, smaller parts are usually more efficient. Start with smaller parts, like 10–50 MiB, and use larger ones only if you need to upload very large objects or reduce the number of upload requests. A good way to choose the part size is to treat each part as a unit of download concurrency: ideally, each download range should cover one or more upload parts without overlaps.
For example, if you have a 1000 MiB object, and you want to download it using 10 threads, your part size should be at most 100 MiB. But it is often more practical to go even smaller, for example, 10 MiB. Make sure that every thread downloads a range that consists of 10 parts: 0–100 MiB (parts 1–10), 100–200 MiB (parts 11–20) and so on. In that case, you can easily increase concurrency and keep it efficient without needing to re-upload the object with more parts.
To set the upload part size (also known as the chunk size) in the AWS CLI, run the following command:
aws configure set s3.multipart_chunksize 50MB
Using bigger part sizes may limit your upload concurrency. For example, a 100 MiB file uploaded with a 50 MiB part size consists of only two parts, so it cannot benefit from a concurrency of 3 or more.
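The arithmetic above can be checked with a short script. The sizes below are hypothetical example values, all in MiB:

```shell
# All sizes in MiB (hypothetical example values).
object_size=1000
part_size=10
threads=10

# Number of upload parts, rounded up.
parts=$(( (object_size + part_size - 1) / part_size ))

# Range covered by each download thread, and how many whole parts it spans.
range_size=$(( object_size / threads ))
parts_per_range=$(( range_size / part_size ))

echo "parts=$parts range_size=${range_size}MiB parts_per_range=$parts_per_range"

# With a 50 MiB part size, a 100 MiB object has only two parts,
# so upload concurrency above 2 cannot help.
max_upload_concurrency=$(( (100 + 50 - 1) / 50 ))
echo "max_upload_concurrency=$max_upload_concurrency"
```

Because each 100 MiB range spans exactly ten 10 MiB parts, every download thread reads whole parts with no overlap, which is the alignment this section recommends.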

Clean up incomplete multipart uploads

A multipart upload of a very large object may be interrupted at any time (due to a network issue, a machine restart, or simply because it was canceled manually). In that case, Object Storage does not automatically delete the upload and its already uploaded parts; it waits for you to resume the upload. Until these parts are deleted, you are charged for storing them. To clean up automatically, you can configure a lifecycle rule that aborts incomplete multipart uploads in your bucket. For example, the following command creates a rule that deletes incomplete uploads 7 days after initiation:
cat <<EOF > uploads_lifecycle.json
{
  "Rules": [
    {
      "ID": "AbortMultipartUploads",
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 7
      }
    }
  ]
}
EOF

aws s3api put-bucket-lifecycle-configuration \
  --bucket <bucket_name> \
  --lifecycle-configuration file://uploads_lifecycle.json
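Before and after applying the rule, you can inspect the bucket's state; a sketch using the same <bucket_name> placeholder:

```shell
# List multipart uploads that were started but never completed or aborted.
aws s3api list-multipart-uploads --bucket <bucket_name>

# Confirm that the lifecycle rule was applied.
aws s3api get-bucket-lifecycle-configuration --bucket <bucket_name>
```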