Uploading and copying objects using multipart upload - Amazon Simple Storage Service

Uploading and copying objects using multipart upload

Multipart upload allows you to upload a single object as a set of parts. Each part is a contiguous portion of the object's data. You can upload these object parts independently and in any order. If transmission of any part fails, you can retransmit that part without affecting other parts. After all parts of your object are uploaded, Amazon S3 assembles these parts and creates the object. In general, when your object size reaches 100 MB, you should consider using multipart uploads instead of uploading the object in a single operation.

Using multipart upload provides the following advantages:

  • Improved throughput - You can upload parts in parallel to improve throughput.

  • Quick recovery from any network issues - Smaller part size minimizes the impact of restarting a failed upload due to a network error.

  • Pause and resume object uploads - You can upload object parts over time. After you initiate a multipart upload, there is no expiry; you must explicitly complete or stop the multipart upload.

  • Begin an upload before you know the final object size - You can upload an object as you are creating it.

We recommend that you use multipart upload in the following ways:

  • If you're uploading large objects over a stable high-bandwidth network, use multipart upload to maximize the use of your available bandwidth by uploading object parts in parallel for multi-threaded performance.

  • If you're uploading over a spotty network, use multipart upload to increase resiliency to network errors by avoiding upload restarts. When using multipart upload, you need to retry uploading only parts that are interrupted during the upload. You don't need to restart uploading your object from the beginning.

Multipart upload process

Multipart upload is a three-step process: You initiate the upload, you upload the object parts, and after you have uploaded all the parts, you complete the multipart upload. Upon receiving the complete multipart upload request, Amazon S3 constructs the object from the uploaded parts, and you can then access the object just as you would any other object in your bucket.

You can list all of your in-progress multipart uploads or get a list of the parts that you have uploaded for a specific multipart upload. Each of these operations is explained in this section.

Multipart upload initiation

When you send a request to initiate a multipart upload, Amazon S3 returns a response with an upload ID, which is a unique identifier for your multipart upload. You must include this upload ID whenever you upload parts, list the parts, complete an upload, or stop an upload. If you want to provide any metadata describing the object being uploaded, you must provide it in the request to initiate multipart upload.

Parts upload

When uploading a part, in addition to the upload ID, you must specify a part number. You can choose any part number between 1 and 10,000. A part number uniquely identifies a part and its position in the object you are uploading. The part number that you choose doesn’t need to be in a consecutive sequence (for example, it can be 1, 5, and 14). If you upload a new part using the same part number as a previously uploaded part, the previously uploaded part is overwritten.

Whenever you upload a part, Amazon S3 returns an ETag header in its response. For each part upload, you must record the part number and the ETag value. You must include these values in the subsequent request to complete the multipart upload.

Note

After you initiate a multipart upload and upload one or more parts, you must either complete or stop the multipart upload in order to stop getting charged for storage of the uploaded parts. Only after you either complete or stop a multipart upload will Amazon S3 free up the parts storage and stop charging you for the parts storage.

Multipart upload completion

When you complete a multipart upload, Amazon S3 creates an object by concatenating the parts in ascending order based on the part number. If any object metadata was provided in the initiate multipart upload request, Amazon S3 associates that metadata with the object. After a successful complete request, the parts no longer exist.

Your complete multipart upload request must include the upload ID and a list of both part numbers and corresponding ETag values. The Amazon S3 response includes an ETag that uniquely identifies the combined object data. This ETag is not necessarily an MD5 hash of the object data.

You can optionally stop the multipart upload. After stopping a multipart upload, you cannot upload any part using that upload ID again. All storage from any part of the canceled multipart upload is then freed. If any part uploads were in-progress, they can still succeed or fail even after you stop. To free all storage consumed by all parts, you must stop a multipart upload only after all part uploads have completed.

Multipart upload listings

You can list the parts of a specific multipart upload or all in-progress multipart uploads. The list parts operation returns the parts information that you have uploaded for a specific multipart upload. For each list parts request, Amazon S3 returns the parts information for the specified multipart upload, up to a maximum of 1,000 parts. If there are more than 1,000 parts in the multipart upload, you must send a series of list part requests to retrieve all the parts. Note that the returned list of parts doesn't include parts that haven't completed uploading. Using the list multipart uploads operation, you can obtain a list of multipart uploads in progress.

An in-progress multipart upload is an upload that you have initiated, but have not yet completed or stopped. Each request returns at most 1,000 multipart uploads. If there are more than 1,000 multipart uploads in progress, you need to send additional requests to retrieve the remaining multipart uploads. Only use the returned listing for verification. You should not use the result of this listing when sending a complete multipart upload request. Instead, maintain your own list of the part numbers you specified when uploading parts and the corresponding ETag values that Amazon S3 returns

Concurrent multipart upload operations

In a distributed development environment, it is possible for your application to initiate several updates on the same object at the same time. Your application might initiate several multipart uploads using the same object key. For each of these uploads, your application can then upload parts and send a complete upload request to Amazon S3 to create the object. When the buckets have versioning enabled, completing a multipart upload always creates a new version. For buckets that don't have versioning enabled, it is possible that some other request received between the time when a multipart upload is initiated and when it is completed might take precedence.

Note

It is possible for some other request received between the time you initiated a multipart upload and completed it to take precedence. For example, if another operation deletes a key after you initiate a multipart upload with that key, but before you complete it, the complete multipart upload response might indicate a successful object creation without you ever seeing the object.

Multipart upload and pricing

After you initiate a multipart upload, Amazon S3 retains all the parts until you either complete or stop the upload. Throughout its lifetime, you are billed for all storage, bandwidth, and requests for this multipart upload and its associated parts. If you stop the multipart upload, Amazon S3 deletes upload artifacts and any parts that you have uploaded, and you are no longer billed for them. For more information about pricing, see Amazon S3 pricing.

API support for multipart upload

These libraries provide a high-level abstraction that makes uploading multipart objects easy. However, if your application requires, you can use the REST API directly. The following sections in the Amazon Simple Storage Service API Reference describe the REST API for multipart upload.

AWS Command Line Interface support for multipart upload

The following topics in the AWS Command Line Interface describe the operations for multipart upload.

AWS SDK support for multipart upload

You can use an AWS SDKs to upload an object in parts. For a list of AWS SDKs supported by API action see:

Multipart upload API and permissions

You must have the necessary permissions to use the multipart upload operations. You can use access control lists (ACLs), the bucket policy, or the user policy to grant individuals permissions to perform these operations. The following table lists the required permissions for various multipart upload operations when using ACLs, a bucket policy, or a user policy.

Action Required permissions

Create Multipart Upload

You must be allowed to perform the s3:PutObject action on an object to create multipart upload.

The bucket owner can allow other principals to perform the s3:PutObject action.

Initiate Multipart Upload

You must be allowed to perform the s3:PutObject action on an object to initiate multipart upload.

The bucket owner can allow other principals to perform the s3:PutObject action.

Initiator

Container element that identifies who initiated the multipart upload. If the initiator is an AWS account, this element provides the same information as the Owner element. If the initiator is an IAM User, this element provides the user ARN and display name.

Upload Part

You must be allowed to perform the s3:PutObject action on an object to upload a part.

The bucket owner must allow the initiator to perform the s3:PutObject action on an object in order for the initiator to upload a part for that object.

Upload Part (Copy)

You must be allowed to perform the s3:PutObject action on an object to upload a part. Because you are uploading a part from an existing object, you must be allowed s3:GetObject on the source object.

For the initiator to upload a part for an object, the owner of the bucket must allow the initiator to perform the s3:PutObject action on the object.

Complete Multipart Upload

You must be allowed to perform the s3:PutObject action on an object to complete a multipart upload.

The bucket owner must allow the initiator to perform the s3:PutObject action on an object in order for the initiator to complete a multipart upload for that object.

Stop Multipart Upload

You must be allowed to perform the s3:AbortMultipartUpload action to stop a multipart upload.

By default, the bucket owner and the initiator of the multipart upload are allowed to perform this action. If the initiator is an IAM user, that user's AWS account is also allowed to stop that multipart upload.

In addition to these defaults, the bucket owner can allow other principals to perform the s3:AbortMultipartUpload action on an object. The bucket owner can deny any principal the ability to perform the s3:AbortMultipartUpload action.

List Parts

You must be allowed to perform the s3:ListMultipartUploadParts action to list parts in a multipart upload.

By default, the bucket owner has permission to list parts for any multipart upload to the bucket. The initiator of the multipart upload has the permission to list parts of the specific multipart upload. If the multipart upload initiator is an IAM user, the AWS account controlling that IAM user also has permission to list parts of that upload.

In addition to these defaults, the bucket owner can allow other principals to perform the s3:ListMultipartUploadParts action on an object. The bucket owner can also deny any principal the ability to perform the s3:ListMultipartUploadParts action.

List Multipart Uploads

You must be allowed to perform the s3:ListBucketMultipartUploads action on a bucket to list multipart uploads in progress to that bucket.

In addition to the default, the bucket owner can allow other principals to perform the s3:ListBucketMultipartUploads action on the bucket.

AWS KMS Encrypt and Decrypt related permissions

To perform a multipart upload with encryption using an AWS Key Management Service (AWS KMS) KMS key, the requester must have permission to the kms:Decrypt and kms:GenerateDataKey* actions on the key. These permissions are required because Amazon S3 must decrypt and read data from the encrypted file parts before it completes the multipart upload.

For more information, see Uploading a large file to Amazon S3 with encryption using an AWS KMS key in the AWS Knowledge Center.

If your IAM user or role is in the same AWS account as the KMS key, then you must have these permissions on the key policy. If your IAM user or role belongs to a different account than the KMS key, then you must have the permissions on both the key policy and your IAM user or role.

For information on the relationship between ACL permissions and permissions in access policies, see Mapping of ACL permissions and access policy permissions. For information on IAM users, go to Working with Users and Groups.