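Using multipart uploads with directory buckets
You can use the multipart upload process to upload a single object as a set of parts. Each part is a contiguous portion of the object's data. You can upload these object parts independently and in any order. If transmission of any part fails, you can retransmit that part without affecting the other parts. After all parts of your object are uploaded, Amazon S3 assembles them and creates the object. In general, when your object size reaches 100 MB, you should consider using multipart uploads instead of uploading the object in a single operation.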
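Using multipart upload provides the following advantages:
- Improved throughput: You can upload parts in parallel to improve throughput.
- Quick recovery from network issues: A smaller part size minimizes the impact of restarting a failed upload after a network error.
- Pause and resume object uploads: You can upload object parts over time. After you initiate a multipart upload, there is no expiration date. You must explicitly complete or abort the multipart upload.
- Begin an upload before you know the final object size: You can upload an object as you are creating it.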
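As a rough illustration of the size guidance above, the following sketch decides whether an object is large enough to warrant multipart upload and picks a part size. The 100 MB threshold comes from this topic. The 5 MiB and 5 GiB part-size bounds come from the Python examples later in this topic. The 10,000-part ceiling and the 8 MiB preferred size are assumptions (the first based on general Amazon S3 limits, the second arbitrary), and the helper names are hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

MULTIPART_THRESHOLD = 100 * MIB  # guidance from this topic
MIN_PART_SIZE = 5 * MIB          # minimum for every part except the last
MAX_PART_SIZE = 5 * GIB          # maximum size of a single part
MAX_PARTS = 10_000               # assumption: general Amazon S3 part-count limit


def should_use_multipart(object_size):
    """Return True when the object is at or above the suggested threshold."""
    return object_size >= MULTIPART_THRESHOLD


def choose_part_size(object_size, preferred=8 * MIB):
    """Pick a part size that keeps the upload within MAX_PARTS parts."""
    if object_size <= 0:
        raise ValueError('object_size must be positive')
    # Integer ceiling division: the smallest part size that fits in MAX_PARTS.
    smallest_allowed = -(-object_size // MAX_PARTS)
    part_size = max(preferred, smallest_allowed, MIN_PART_SIZE)
    if part_size > MAX_PART_SIZE:
        raise ValueError('object is too large for a single multipart upload')
    return part_size
```

For small and medium objects the preferred size is returned unchanged. The size grows only when the object would otherwise need more parts than the assumed limit allows.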
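We recommend that you use multipart uploads in the following ways:
When you use multipart uploads to upload objects to the Amazon S3 Express One Zone storage class in directory buckets, the multipart upload process is similar to the process of using multipart upload to upload objects to general purpose buckets. However, there are some notable differences.
For more information about using multipart uploads to upload objects to S3 Express One Zone, see the following topics.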
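The multipart upload process
A multipart upload is a three-step process:
- You initiate the upload.
- You upload the object parts.
- After you have uploaded all of the parts, you complete the multipart upload.
After it receives the complete multipart upload request, Amazon S3 constructs the object from the uploaded parts. You can then access the object as you would any other object in your bucket.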
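The three steps above can be sketched as one small function. This is a minimal illustration rather than a production uploader: it takes any client object that exposes the boto3 S3 client methods `create_multipart_upload`, `upload_part`, `complete_multipart_upload`, and `abort_multipart_upload`, it holds the whole payload in memory, and the function name `upload_in_parts` is hypothetical. Aborting on failure is a design choice made here so that no uploaded parts are left behind.

```python
def upload_in_parts(s3_client, bucket_name, key_name, data, part_size):
    """Upload `data` (bytes) as a multipart upload and return the upload ID."""
    if not data:
        raise ValueError('data must not be empty')

    # Step 1: initiate the upload and remember the upload ID.
    response = s3_client.create_multipart_upload(Bucket=bucket_name, Key=key_name)
    upload_id = response['UploadId']

    parts = []
    try:
        # Step 2: upload the parts with consecutive part numbers, starting at 1.
        for offset in range(0, len(data), part_size):
            part_number = offset // part_size + 1
            part = s3_client.upload_part(
                Bucket=bucket_name,
                Key=key_name,
                UploadId=upload_id,
                PartNumber=part_number,
                Body=data[offset:offset + part_size],
            )
            parts.append({'PartNumber': part_number, 'ETag': part['ETag']})

        # Step 3: complete the upload with the recorded part numbers and ETags.
        s3_client.complete_multipart_upload(
            Bucket=bucket_name,
            Key=key_name,
            UploadId=upload_id,
            MultipartUpload={'Parts': parts},
        )
    except Exception:
        # Abort so that the uploaded parts do not keep accruing storage charges.
        s3_client.abort_multipart_upload(
            Bucket=bucket_name, Key=key_name, UploadId=upload_id
        )
        raise
    return upload_id
```

With a real client you would call it as `upload_in_parts(boto3.client('s3'), bucket_name, key_name, data, part_size)`, where every part except the last must meet the minimum part size.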
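Multipart upload initiation
When you send a request to initiate a multipart upload, Amazon S3 returns a response with an upload ID, which is a unique identifier for your multipart upload. You must include this upload ID whenever you upload parts, list the parts, complete an upload, or abort an upload.
Parts upload
When uploading a part, in addition to the upload ID, you must specify a part number. When you use a multipart upload with S3 Express One Zone, the part numbers must be consecutive. If you try to complete a multipart upload request with nonconsecutive part numbers, Amazon S3 generates an HTTP 400 Bad Request (Invalid Part Order) error.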
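Because a gap in the part numbers is rejected only when you try to complete the upload, it can be useful to check the numbers on the client first. The sketch below reports which part numbers are missing between the lowest and highest number recorded. The helper name `find_part_number_gaps` is hypothetical, and the check covers gaps only. It makes no assumption about which number the sequence must start with.

```python
def find_part_number_gaps(part_numbers):
    """Return the part numbers missing between the lowest and highest number."""
    present = set(part_numbers)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]


# Example: parts 3, 5, and 6 were never uploaded.
print(find_part_number_gaps([1, 2, 4, 7]))  # prints [3, 5, 6]
```

An empty result means that the recorded part numbers form one unbroken run.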
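A part number uniquely identifies a part and its position in the object that you are uploading. If you upload a new part by using the same part number as a previously uploaded part, the previously uploaded part is overwritten.
Whenever you upload a part, Amazon S3 returns an entity tag (ETag) header in its response. For each part upload, you must record the part number and the ETag value. The ETag values for all object part uploads remain the same, but each part is assigned a different part number. You must include these values in the subsequent request to complete the multipart upload.
Amazon S3 automatically encrypts all new objects that are uploaded to an S3 bucket. When you perform a multipart upload, if you don't specify encryption information in your request, the encryption setting of the uploaded parts is set to the default encryption configuration of the destination bucket. The default encryption configuration of an Amazon S3 bucket is always enabled and is at a minimum set to server-side encryption with Amazon S3 managed keys (SSE-S3). For directory buckets, SSE-S3 and server-side encryption with AWS KMS keys (SSE-KMS) are supported. For more information, see Data protection and encryption in S3 Express One Zone.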
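Multipart upload completion
When you complete a multipart upload, Amazon S3 creates the object by concatenating the parts in ascending order based on the part number. After a successful complete request, the parts no longer exist.
Your complete multipart upload request must include the upload ID and a list of the part numbers with their corresponding ETag values. The Amazon S3 response includes an ETag that uniquely identifies the combined object data. This ETag is not an MD5 hash of the object data.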
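Multipart upload listings
You can list the parts of a specific multipart upload, or list all multipart uploads that are in progress. The list parts operation returns information about the parts that you have uploaded for a specific multipart upload. For each list parts request, Amazon S3 returns the parts information for the specified multipart upload, up to a maximum of 1,000 parts. If a multipart upload contains more than 1,000 parts, you must use pagination to retrieve all of the parts.
The returned list of parts doesn't include parts that haven't finished uploading. By using the list multipart uploads operation, you can obtain a list of the multipart uploads that are in progress.
An in-progress multipart upload is an upload that you have initiated but haven't yet completed or aborted. Each request returns at most 1,000 multipart uploads. If more than 1,000 multipart uploads are in progress, you must send additional requests to retrieve the remaining ones. Use the returned listing only for verification. Don't use the result of this listing when you send a complete multipart upload request. Instead, maintain your own list of the part numbers that you specified when uploading parts and the corresponding ETag values that Amazon S3 returned.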
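One way to keep your own record, as recommended above, is a small ledger that stores the ETag returned for each part number as the parts are uploaded. The class below is a hypothetical sketch, not part of any SDK. Recording the same part number twice keeps only the latest ETag, which mirrors how re-uploading a part overwrites the earlier one.

```python
class PartLedger:
    """Client-side record of uploaded part numbers and their ETags."""

    def __init__(self):
        self._etags = {}

    def record(self, part_number, etag):
        # A later upload with the same part number replaces the earlier entry.
        self._etags[part_number] = etag

    def as_parts(self):
        # Parts in ascending part-number order, in the shape that the
        # complete multipart upload request expects for its 'Parts' list.
        return [
            {'PartNumber': number, 'ETag': self._etags[number]}
            for number in sorted(self._etags)
        ]
```

After each successful part upload, call `record` with the part number and the `ETag` from the response, then pass `as_parts()` as the `Parts` list when you complete the upload.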
For more information about multipart upload listings, see ListParts in the Amazon Simple Storage Service API Reference.
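Checksums with multipart upload operations
When you upload an object, you can specify a checksum algorithm to check the integrity of the object. MD5 isn't supported for directory buckets. You can specify one of the following Secure Hash Algorithm (SHA) or Cyclic Redundancy Check (CRC) data-integrity check algorithms:
- CRC32
- CRC32C
- SHA-1
- SHA-256
You can use the Amazon S3 REST API or the AWS SDKs to retrieve the checksum value for individual parts by using GetObject or HeadObject. If you want to retrieve the checksum values for individual parts of a multipart upload that is still in progress, you can use ListParts.
When you use the preceding checksum algorithms, the part numbers must be consecutive. If you try to complete a multipart upload request with nonconsecutive part numbers, Amazon S3 generates an HTTP 400 Bad Request (Invalid Part Order) error.
For more information about how checksums work with multipart upload objects, see Checking object integrity.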
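To make the checksum options more concrete, the sketch below computes CRC32 and SHA-256 values for the bytes of one part by using only the Python standard library. It assumes that checksum values are exchanged as base64-encoded strings and that the CRC32 value is encoded from its 4-byte big-endian form. Verify both assumptions against the API reference before relying on them. The function names are hypothetical.

```python
import base64
import hashlib
import zlib


def crc32_checksum(data):
    """Base64-encode the 4-byte big-endian CRC32 of `data`."""
    crc = zlib.crc32(data)
    return base64.b64encode(crc.to_bytes(4, 'big')).decode('ascii')


def sha256_checksum(data):
    """Base64-encode the SHA-256 digest of `data`."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode('ascii')
```

Computing the value locally lets you compare it with the checksum that the service reports for the same part.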
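Concurrent multipart upload operations
In a distributed development environment, your application can initiate several updates on the same object at the same time. For example, your application might initiate several multipart uploads by using the same object key. For each of these uploads, your application can then upload parts and send a complete upload request to Amazon S3 to create the object. For S3 Express One Zone, the object creation time is the completion date of the multipart upload.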
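Multipart uploads and pricing
After you initiate a multipart upload, Amazon S3 retains all of the parts until you either complete or abort the upload. Throughout its lifetime, you are billed for all storage, bandwidth, and requests for this multipart upload and its associated parts. If you abort the multipart upload, Amazon S3 deletes the upload artifacts and any parts that you have uploaded, and you are no longer billed for them. There are no early delete charges for deleting incomplete multipart uploads, regardless of the storage class specified. For more information about pricing, see Amazon S3 pricing.
If the complete multipart upload request isn't sent successfully, the object parts aren't assembled and an object isn't created. You are billed for all storage associated with the uploaded parts. It's important that you either complete the multipart upload to create the object or abort the multipart upload to remove any uploaded parts.
Before you can delete a directory bucket, you must complete or abort all in-progress multipart uploads. Directory buckets don't support S3 Lifecycle configurations. If needed, you can list your in-progress multipart uploads, then abort the uploads, and then delete your bucket.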
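The cleanup described above (list the in-progress uploads, then abort each one) can be sketched in Python as follows. The function accepts any client that exposes the boto3 S3 client methods `list_multipart_uploads` and `abort_multipart_upload`. The function name is hypothetical, and for brevity the sketch handles only the first page of results, so a bucket with more in-progress uploads than one response returns would need additional requests.

```python
def abort_in_progress_uploads(s3_client, bucket_name):
    """Abort the in-progress multipart uploads returned by one list request.

    Returns a list of (key, upload_id) tuples for the uploads that were aborted.
    """
    response = s3_client.list_multipart_uploads(Bucket=bucket_name)
    aborted = []
    # The 'Uploads' element is absent when nothing is in progress.
    for upload in response.get('Uploads', []):
        s3_client.abort_multipart_upload(
            Bucket=bucket_name,
            Key=upload['Key'],
            UploadId=upload['UploadId'],
        )
        aborted.append((upload['Key'], upload['UploadId']))
    return aborted
```

An empty return value means that the list request reported no in-progress uploads, so nothing on that page blocks deleting the bucket.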
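Multipart upload API operations and permissions
To allow access to object management API operations on a directory bucket, you grant the s3express:CreateSession permission in a bucket policy or an AWS Identity and Access Management (IAM) identity-based policy.
You must have the necessary permissions to use the multipart upload operations. You can use bucket policies or IAM identity-based policies to grant IAM principals permissions to perform these operations. The following table lists the required permissions for various multipart upload operations.
You can identify the initiator of a multipart upload through the Initiator element. If the initiator is an AWS account, this element provides the same information as the Owner element. If the initiator is an IAM user, this element provides the user ARN and display name.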
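Action | Required permissions
Create a multipart upload | To create the multipart upload, you must be allowed to perform the s3express:CreateSession action on the directory bucket.
Initiate a multipart upload | To initiate the multipart upload, you must be allowed to perform the s3express:CreateSession action on the directory bucket.
Upload a part | To upload a part, you must be allowed to perform the s3express:CreateSession action on the directory bucket. For the initiator to upload a part, the bucket owner must allow the initiator to perform the s3express:CreateSession action on the directory bucket.
Upload a part (copy) | To upload a part, you must be allowed to perform the s3express:CreateSession action on the directory bucket. For the initiator to upload a part for an object, the bucket owner must allow the initiator to perform the s3express:CreateSession action on the object.
Complete a multipart upload | To complete a multipart upload, you must be allowed to perform the s3express:CreateSession action on the directory bucket. For the initiator to complete a multipart upload, the bucket owner must allow the initiator to perform the s3express:CreateSession action on the object.
Abort a multipart upload | To abort a multipart upload, you must be allowed to perform the s3express:CreateSession action. For the initiator to abort a multipart upload, the initiator must be granted explicit allow access to perform the s3express:CreateSession action.
List parts | To list the parts in a multipart upload, you must be allowed to perform the s3express:CreateSession action on the directory bucket.
List in-progress multipart uploads | To list the in-progress multipart uploads to a bucket, you must be allowed to perform the s3:ListBucketMultipartUploads action on the bucket.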
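As an illustration of the permission that most rows in the table require, the following sketch builds an identity-based policy document that allows s3express:CreateSession on one directory bucket. The ARN layout used for the Resource element, the account ID, and the function name are assumptions for illustration. Confirm the exact resource format in the IAM documentation for S3 Express One Zone. The policy doesn't cover s3:ListBucketMultipartUploads, which the last table row calls for.

```python
import json


def create_session_policy(region, account_id, bucket_name):
    """Build an identity-based policy that allows s3express:CreateSession."""
    return {
        'Version': '2012-10-17',
        'Statement': [
            {
                'Effect': 'Allow',
                'Action': 's3express:CreateSession',
                # Assumed ARN layout for a directory bucket.
                'Resource': f'arn:aws:s3express:{region}:{account_id}:bucket/{bucket_name}',
            }
        ],
    }


policy = create_session_policy(
    'us-west-2', '111122223333', 'bucket-base-name--usw2-az1--x-s3'
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be attached to an IAM user or role as an identity-based policy. A bucket policy would additionally need a Principal element.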
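API operation support for multipart uploads
The following sections in the Amazon Simple Storage Service API Reference describe the Amazon S3 REST API operations for multipart uploads.
Examples
To use a multipart upload to upload an object to S3 Express One Zone in a directory bucket, see the following examples.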
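Creating a multipart upload
For directory buckets, when you perform a CreateMultipartUpload operation and an UploadPartCopy operation, the bucket's default encryption must use the desired encryption configuration, and the request headers that you provide in the CreateMultipartUpload request must match the default encryption configuration of the destination bucket.
The following examples show how to create a multipart upload.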
- SDK for Java 2.x
-
// Requires: software.amazon.awssdk.services.s3.S3Client and
// software.amazon.awssdk.services.s3.model.*

/**
 * This method creates a multipart upload request that generates a unique upload ID that is used to track
 * all the upload parts.
 *
 * @param s3         the S3 client
 * @param bucketName the directory bucket name - for example, 'doc-example-bucket--use1-az4--x-s3'
 * @param key        the object key
 * @return the upload ID of the new multipart upload
 */
private static String createMultipartUpload(S3Client s3, String bucketName, String key) {
    CreateMultipartUploadRequest createMultipartUploadRequest = CreateMultipartUploadRequest.builder()
            .bucket(bucketName)
            .key(key)
            .build();
    String uploadId = null;
    try {
        CreateMultipartUploadResponse response = s3.createMultipartUpload(createMultipartUploadRequest);
        uploadId = response.uploadId();
    }
    catch (S3Exception e) {
        System.err.println(e.awsErrorDetails().errorMessage());
        System.exit(1);
    }
    return uploadId;
}
- SDK for Python
-
import logging
import boto3
from botocore.exceptions import ClientError

def create_multipart_upload(s3_client, bucket_name, key_name):
    '''
    Create a multipart upload to a directory bucket

    :param s3_client: boto3 S3 client
    :param bucket_name: The destination bucket for the multipart upload
    :param key_name: The key name for the object to be uploaded
    :return: The UploadId for the multipart upload if created successfully, else None
    '''
    try:
        mpu = s3_client.create_multipart_upload(Bucket = bucket_name, Key = key_name)
        return mpu['UploadId']
    except ClientError as e:
        logging.error(e)
        return None
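This example shows how to create a multipart upload to a directory bucket by using the AWS CLI. This command starts a multipart upload to the directory bucket bucket-base-name--azid--x-s3 for the object KEY_NAME. To use the command, replace the user input placeholders with your own information.
aws s3api create-multipart-upload --bucket bucket-base-name--azid--x-s3 --key KEY_NAME
For more information, see create-multipart-upload in the AWS Command Line Interface.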
Uploading the parts of a multipart upload
The following examples show how to upload the parts of a multipart upload.
- SDK for Java 2.x
-
The following example shows how to break a single object into parts and then upload those parts to a directory bucket by using the SDK for Java 2.x.
// Requires: java.io.IOException, java.io.RandomAccessFile, java.nio.ByteBuffer,
// java.util.ArrayList, java.util.List, software.amazon.awssdk.core.sync.RequestBody,
// software.amazon.awssdk.services.s3.S3Client, and software.amazon.awssdk.services.s3.model.*

/**
 * This method creates part requests and uploads individual parts to S3 and then returns all the completed parts
 *
 * @param s3         the S3 client
 * @param bucketName the directory bucket name
 * @param key        the object key
 * @param uploadId   the upload ID returned when the multipart upload was created
 * @param filePath   the path of the local file to upload
 * @throws IOException if the local file can't be read
 */
private static List<CompletedPart> multipartUpload(S3Client s3, String bucketName, String key, String uploadId, String filePath) throws IOException {
    int partNumber = 1;
    List<CompletedPart> completedParts = new ArrayList<>();
    ByteBuffer bb = ByteBuffer.allocate(1024 * 1024 * 5); // 5 MB byte buffer

    // Read the local file, break it down into chunks, and upload each chunk as a part.
    try (RandomAccessFile file = new RandomAccessFile(filePath, "r")) {
        long fileSize = file.length();
        long position = 0; // long, so that files larger than 2 GB don't overflow
        while (position < fileSize) {
            file.seek(position);
            int read = file.getChannel().read(bb);

            bb.flip(); // Switch the buffer from being written to being read.
            UploadPartRequest uploadPartRequest = UploadPartRequest.builder()
                    .bucket(bucketName)
                    .key(key)
                    .uploadId(uploadId)
                    .partNumber(partNumber)
                    .build();

            UploadPartResponse partResponse = s3.uploadPart(
                    uploadPartRequest,
                    RequestBody.fromByteBuffer(bb));

            // Record the part number and ETag; both are needed to complete the upload.
            CompletedPart part = CompletedPart.builder()
                    .partNumber(partNumber)
                    .eTag(partResponse.eTag())
                    .build();
            completedParts.add(part);

            bb.clear();
            position += read;
            partNumber++;
        }
    }
    return completedParts;
}
- SDK for Python
-
The following example shows how to break a single object into parts and then upload those parts to a directory bucket by using the SDK for Python.
import logging
import boto3
from botocore.exceptions import ClientError

def multipart_upload(s3_client, bucket_name, key_name, mpu_id, part_size):
    '''
    Break up a file into multiple parts and upload those parts to a directory bucket

    :param s3_client: boto3 S3 client
    :param bucket_name: Destination bucket for the multipart upload
    :param key_name: Key name for object to be uploaded and for the local file that's being uploaded
    :param mpu_id: The UploadId returned from the create_multipart_upload call
    :param part_size: The size parts that the object will be broken into, in bytes.
                      Minimum 5 MiB, Maximum 5 GiB. There is no minimum size for the last part of your multipart upload.
    :return: part_list for the multipart upload if all parts are uploaded successfully, else None
    '''
    part_list = []
    try:
        with open(key_name, 'rb') as file:
            part_counter = 1
            while True:
                file_part = file.read(part_size)
                if not len(file_part):
                    break
                upload_part = s3_client.upload_part(
                    Bucket = bucket_name,
                    Key = key_name,
                    UploadId = mpu_id,
                    Body = file_part,
                    PartNumber = part_counter
                )
                # Record the part number and ETag; both are needed to complete the upload.
                part_list.append({'PartNumber': part_counter, 'ETag': upload_part['ETag']})
                part_counter += 1
    except ClientError as e:
        logging.error(e)
        return None
    return part_list
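This example shows how to upload the parts of a single object to a directory bucket by using the AWS CLI. The command uploads one part (part number 1) from a local file. Run it once for each part, with the next consecutive part number. To use the command, replace the user input placeholders with your own information.
aws s3api upload-part --bucket bucket-base-name--azid--x-s3 --key KEY_NAME --part-number 1 --body LOCAL_FILE_NAME --upload-id "AS_mgt9RaQE9GEaifATue15dAAAAAAAAAAEMAAAAAAAAADQwNzI4MDU0MjUyMBYAAAAAAAAAAA0AAAAAAAAAAAH2AfYAAAAAAAAEBSD0WBKMAQAAAABneY9yBVsK89iFkvWdQhRCcXohE8RbYtc9QvBOG8tNpA"
For more information, see upload-part in the AWS Command Line Interface.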
Completing a multipart upload
The following examples show how to complete a multipart upload.
- SDK for Java 2.x
-
The following example shows how to complete a multipart upload by using the SDK for Java 2.x.
/**
 * This method completes the multipart upload request by collating all the upload parts
 *
 * @param s3          the S3 client
 * @param bucketName  the directory bucket name - for example, 'doc-example-bucket--usw2-az1--x-s3'
 * @param key         the object key
 * @param uploadId    the upload ID returned when the multipart upload was created
 * @param uploadParts the completed parts, each with its part number and ETag
 */
private static void completeMultipartUpload(S3Client s3, String bucketName, String key, String uploadId, List<CompletedPart> uploadParts) {
    CompletedMultipartUpload completedMultipartUpload = CompletedMultipartUpload.builder()
            .parts(uploadParts)
            .build();
    CompleteMultipartUploadRequest completeMultipartUploadRequest =
            CompleteMultipartUploadRequest.builder()
                    .bucket(bucketName)
                    .key(key)
                    .uploadId(uploadId)
                    .multipartUpload(completedMultipartUpload)
                    .build();
    s3.completeMultipartUpload(completeMultipartUploadRequest);
}

// Runs the three steps in order: create the upload, upload the parts, complete the upload.
public static void multipartUploadTest(S3Client s3, String bucketName, String key, String localFilePath) {
    System.out.println("Starting multipart upload for: " + key);
    try {
        String uploadId = createMultipartUpload(s3, bucketName, key);
        System.out.println(uploadId);
        List<CompletedPart> parts = multipartUpload(s3, bucketName, key, uploadId, localFilePath);
        completeMultipartUpload(s3, bucketName, key, uploadId, parts);
        System.out.println("Multipart upload completed for: " + key);
    }
    catch (Exception e) {
        System.err.println(e.getMessage());
        System.exit(1);
    }
}
- SDK for Python
-
The following example shows how to complete a multipart upload by using the SDK for Python.
import logging
import boto3
from botocore.exceptions import ClientError

def complete_multipart_upload(s3_client, bucket_name, key_name, mpu_id, part_list):
    '''
    Completes a multipart upload to a directory bucket

    :param s3_client: boto3 S3 client
    :param bucket_name: The destination bucket for the multipart upload
    :param key_name: The key name for the object to be uploaded
    :param mpu_id: The UploadId returned from the create_multipart_upload call
    :param part_list: The list of uploaded part numbers with their associated ETags
    :return: True if the multipart upload was completed successfully, else False
    '''
    try:
        s3_client.complete_multipart_upload(
            Bucket = bucket_name,
            Key = key_name,
            UploadId = mpu_id,
            MultipartUpload = {
                'Parts': part_list
            }
        )
    except ClientError as e:
        logging.error(e)
        return False
    return True

if __name__ == '__main__':
    # create_multipart_upload and multipart_upload are the functions from the preceding examples.
    MB = 1024 ** 2
    region = 'us-west-2'
    bucket_name = 'BUCKET_NAME'
    key_name = 'OBJECT_NAME'
    part_size = 10 * MB
    s3_client = boto3.client('s3', region_name = region)
    mpu_id = create_multipart_upload(s3_client, bucket_name, key_name)
    if mpu_id is not None:
        part_list = multipart_upload(s3_client, bucket_name, key_name, mpu_id, part_size)
        if part_list is not None:
            if complete_multipart_upload(s3_client, bucket_name, key_name, mpu_id, part_list):
                print (f'{key_name} successfully uploaded through a multipart upload to {bucket_name}')
            else:
                print (f'Could not upload {key_name} through a multipart upload to {bucket_name}')
This example shows how to complete a multipart upload for a directory bucket by using the AWS CLI. To use the command, replace the user input placeholders with your own information.
aws s3api complete-multipart-upload --bucket bucket-base-name--azid--x-s3 --key KEY_NAME --upload-id "AS_mgt9RaQE9GEaifATue15dAAAAAAAAAAEMAAAAAAAAADQwNzI4MDU0MjUyMBYAAAAAAAAAAA0AAAAAAAAAAAH2AfYAAAAAAAAEBSD0WBKMAQAAAABneY9yBVsK89iFkvWdQhRCcXohE8RbYtc9QvBOG8tNpA" --multipart-upload file://parts.json
This example takes a JSON structure that describes the parts of the multipart upload that should be reassembled into the complete file. In this example, the file:// prefix is used to load the JSON structure from a file named parts.json in the local folder.
parts.json:
{
    "Parts": [
        {
            "ETag": "6b78c4a64dd641a58dac8d9258b88147",
            "PartNumber": 1
        }
    ]
}
For more information, see complete-multipart-upload in the AWS Command Line Interface.
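If you record part numbers and ETags while uploading, you can generate the parts.json structure instead of writing it by hand. The following sketch does that with the Python standard library only. The function name `build_parts_document` is hypothetical, and the input is assumed to be a list of dictionaries with 'PartNumber' and 'ETag' keys, which is the shape used by the Python examples in this topic.

```python
import json


def build_parts_document(parts):
    """Return the JSON text for a file such as parts.json."""
    # List the parts in ascending part-number order.
    ordered = sorted(parts, key=lambda part: part['PartNumber'])
    document = {
        'Parts': [
            {'ETag': part['ETag'], 'PartNumber': part['PartNumber']}
            for part in ordered
        ]
    }
    return json.dumps(document, indent=4)


parts = [{'PartNumber': 1, 'ETag': '6b78c4a64dd641a58dac8d9258b88147'}]
print(build_parts_document(parts))
```

Writing the returned text to a file named parts.json produces input that the --multipart-upload file://parts.json option can load.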
Aborting a multipart upload
The following examples show how to abort a multipart upload.
- SDK for Java 2.x
-
The following example shows how to abort a multipart upload by using the SDK for Java 2.x.
// Aborts every in-progress multipart upload that the list request returns for the bucket.
public static void abortMultiPartUploads( S3Client s3, String bucketName ) {
    try {
        ListMultipartUploadsRequest listMultipartUploadsRequest = ListMultipartUploadsRequest.builder()
                .bucket(bucketName)
                .build();
        ListMultipartUploadsResponse response = s3.listMultipartUploads(listMultipartUploadsRequest);
        List<MultipartUpload> uploads = response.uploads();

        AbortMultipartUploadRequest abortMultipartUploadRequest;
        for (MultipartUpload upload: uploads) {
            abortMultipartUploadRequest = AbortMultipartUploadRequest.builder()
                    .bucket(bucketName)
                    .key(upload.key())
                    .uploadId(upload.uploadId())
                    .build();
            s3.abortMultipartUpload(abortMultipartUploadRequest);
        }
    }
    catch (S3Exception e) {
        System.err.println(e.getMessage());
        System.exit(1);
    }
}
- SDK for Python
-
The following example shows how to abort a multipart upload by using the SDK for Python.
import logging
import boto3
from botocore.exceptions import ClientError

def abort_multipart_upload(s3_client, bucket_name, key_name, upload_id):
    '''
    Aborts a partial multipart upload in a directory bucket.

    :param s3_client: boto3 S3 client
    :param bucket_name: Bucket where the multipart upload was initiated - for example, 'doc-example-bucket--usw2-az1--x-s3'
    :param key_name: Name of the object for which the multipart upload needs to be aborted
    :param upload_id: Multipart upload ID for the multipart upload to be aborted
    :return: True if the multipart upload was successfully aborted, False if not
    '''
    try:
        s3_client.abort_multipart_upload(
            Bucket = bucket_name,
            Key = key_name,
            UploadId = upload_id
        )
    except ClientError as e:
        logging.error(e)
        return False
    return True

if __name__ == '__main__':
    region = 'us-west-2'
    bucket_name = 'BUCKET_NAME'
    key_name = 'KEY_NAME'
    upload_id = 'UPLOAD_ID'
    s3_client = boto3.client('s3', region_name = region)
    if abort_multipart_upload(s3_client, bucket_name, key_name, upload_id):
        print (f'Multipart upload for object {key_name} in {bucket_name} bucket has been aborted')
    else:
        print (f'Unable to abort multipart upload for object {key_name} in {bucket_name} bucket')
The following example shows how to abort a multipart upload by using the AWS CLI. To use the command, replace the user input placeholders with your own information.
aws s3api abort-multipart-upload --bucket bucket-base-name--azid--x-s3 --key KEY_NAME --upload-id "AS_mgt9RaQE9GEaifATue15dAAAAAAAAAAEMAAAAAAAAADQwNzI4MDU0MjUyMBYAAAAAAAAAAA0AAAAAAAAAAAH2AfYAAAAAAAAEAX5hFw-MAQAAAAB0OxUFeA7LTbWWFS8WYwhrxDxTIDN-pdEEq_agIHqsbg"
For more information, see abort-multipart-upload in the AWS Command Line Interface.
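Creating a multipart upload copy operation
To encrypt new object part copies in a directory bucket with SSE-KMS, you must specify SSE-KMS with a KMS key (specifically, a customer managed key) as the directory bucket's default encryption configuration. The AWS managed key (aws/s3) isn't supported. Your SSE-KMS configuration can support only one customer managed key per directory bucket for the lifetime of the bucket. After you specify a customer managed key for SSE-KMS, you can't override the customer managed key for the bucket's SSE-KMS configuration. You can't specify server-side encryption settings for new object part copies with SSE-KMS in the UploadPartCopy request headers. Also, the request headers that you provide in the CreateMultipartUpload request must match the default encryption configuration of the destination bucket.
S3 Bucket Keys aren't supported when you copy SSE-KMS encrypted objects from general purpose buckets to directory buckets, from directory buckets to general purpose buckets, or between directory buckets through UploadPartCopy. In this case, Amazon S3 makes a call to AWS KMS every time a copy request is made for a KMS-encrypted object.
The following examples show how to copy objects from one bucket to another by using a multipart upload.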
- SDK for Java 2.x
-
The following example shows how to use a multipart upload to programmatically copy an object from one bucket to another by using the SDK for Java 2.x.
/**
 * This method creates a multipart upload request that generates a unique upload ID that is used to track
 * all the upload parts.
 *
 * @param s3         the S3 client
 * @param bucketName the destination bucket name
 * @param key        the destination object key
 * @return the upload ID of the new multipart upload
 */
private static String createMultipartUpload(S3Client s3, String bucketName, String key) {
    CreateMultipartUploadRequest createMultipartUploadRequest = CreateMultipartUploadRequest.builder()
            .bucket(bucketName)
            .key(key)
            .build();
    String uploadId = null;
    try {
        CreateMultipartUploadResponse response = s3.createMultipartUpload(createMultipartUploadRequest);
        uploadId = response.uploadId();
    } catch (S3Exception e) {
        System.err.println(e.awsErrorDetails().errorMessage());
        System.exit(1);
    }
    return uploadId;
}

/**
 * Creates copy parts based on source object size and copies over individual parts
 *
 * @param s3           the S3 client
 * @param sourceBucket the bucket that contains the source object
 * @param sourceKey    the key of the source object
 * @param destnBucket  the destination bucket
 * @param destnKey     the key of the destination object
 * @param uploadId     the upload ID returned when the multipart upload was created
 * @return the completed parts, each with its part number and ETag
 * @throws IOException
 */
public static List<CompletedPart> multipartUploadCopy(S3Client s3, String sourceBucket, String sourceKey, String destnBucket, String destnKey, String uploadId) throws IOException {

    // Get the object size to track the end of the copy operation.
    HeadObjectRequest headObjectRequest = HeadObjectRequest
            .builder()
            .bucket(sourceBucket)
            .key(sourceKey)
            .build();
    HeadObjectResponse response = s3.headObject(headObjectRequest);
    Long objectSize = response.contentLength();

    System.out.println("Source Object size: " + objectSize);

    // Copy the object using 20 MB parts.
    long partSize = 20 * 1024 * 1024;
    long bytePosition = 0;
    int partNum = 1;
    List<CompletedPart> completedParts = new ArrayList<>();
    while (bytePosition < objectSize) {
        // The last part might be smaller than partSize, so check to make sure
        // that lastByte isn't beyond the end of the object.
        long lastByte = Math.min(bytePosition + partSize - 1, objectSize - 1);

        System.out.println("part no: " + partNum + ", bytePosition: " + bytePosition + ", lastByte: " + lastByte);

        // Copy this part.
        UploadPartCopyRequest req = UploadPartCopyRequest.builder()
                .uploadId(uploadId)
                .sourceBucket(sourceBucket)
                .sourceKey(sourceKey)
                .destinationBucket(destnBucket)
                .destinationKey(destnKey)
                .copySourceRange("bytes="+bytePosition+"-"+lastByte)
                .partNumber(partNum)
                .build();
        UploadPartCopyResponse res = s3.uploadPartCopy(req);
        CompletedPart part = CompletedPart.builder()
                .partNumber(partNum)
                .eTag(res.copyPartResult().eTag())
                .build();
        completedParts.add(part);
        partNum++;
        bytePosition += partSize;
    }
    return completedParts;
}

// completeMultipartUpload is the method shown in the example for completing a multipart upload.
public static void multipartCopyUploadTest(S3Client s3, String srcBucket, String srcKey, String destnBucket, String destnKey) {
    System.out.println("Starting multipart copy for: " + srcKey);
    try {
        String uploadId = createMultipartUpload(s3, destnBucket, destnKey);
        System.out.println(uploadId);
        List<CompletedPart> parts = multipartUploadCopy(s3, srcBucket, srcKey, destnBucket, destnKey, uploadId);
        completeMultipartUpload(s3, destnBucket, destnKey, uploadId, parts);
        System.out.println("Multipart copy completed for: " + srcKey);
    } catch (Exception e) {
        System.err.println(e.getMessage());
        System.exit(1);
    }
}
- SDK for Python
-
The following example shows how to use a multipart upload to programmatically copy an object from one bucket to another by using the SDK for Python.
import logging
import boto3
from botocore.exceptions import ClientError

def head_object(s3_client, bucket_name, key_name):
    '''
    Returns metadata for an object in a directory bucket

    :param s3_client: boto3 S3 client
    :param bucket_name: Bucket that contains the object to query for metadata
    :param key_name: Key name to query for metadata
    :return: Metadata for the specified object if successful, else None
    '''
    try:
        response = s3_client.head_object(
            Bucket = bucket_name,
            Key = key_name
        )
        return response
    except ClientError as e:
        logging.error(e)
        return None

def create_multipart_upload(s3_client, bucket_name, key_name):
    '''
    Create a multipart upload to a directory bucket

    :param s3_client: boto3 S3 client
    :param bucket_name: Destination bucket for the multipart upload
    :param key_name: Key name of the object to be uploaded
    :return: UploadId for the multipart upload if created successfully, else None
    '''
    try:
        mpu = s3_client.create_multipart_upload(Bucket = bucket_name, Key = key_name)
        return mpu['UploadId']
    except ClientError as e:
        logging.error(e)
        return None

def multipart_copy_upload(s3_client, source_bucket_name, key_name, target_bucket_name, mpu_id, part_size):
    '''
    Copy an object in a directory bucket to another bucket in multiple parts of a specified size

    :param s3_client: boto3 S3 client
    :param source_bucket_name: Bucket where the source object exists
    :param key_name: Key name of the object to be copied
    :param target_bucket_name: Destination bucket for copied object
    :param mpu_id: The UploadId returned from the create_multipart_upload call
    :param part_size: The size parts that the object will be broken into, in bytes.
                      Minimum 5 MiB, Maximum 5 GiB. There is no minimum size for the last part of your multipart upload.
    :return: part_list for the multipart copy if all parts are copied successfully, else None
    '''
    part_list = []
    copy_source = {
        'Bucket': source_bucket_name,
        'Key': key_name
    }
    try:
        metadata = head_object(s3_client, source_bucket_name, key_name)
        if metadata is None:
            # The source object could not be read, so there is nothing to copy.
            return None
        object_size = metadata['ContentLength']
        part_counter = 1
        while (part_counter - 1) * part_size < object_size:
            bytes_start = (part_counter - 1) * part_size
            # Clamp the last range so that it doesn't extend past the end of the object.
            bytes_end = min(part_counter * part_size, object_size) - 1
            upload_copy_part = s3_client.upload_part_copy (
                Bucket = target_bucket_name,
                CopySource = copy_source,
                CopySourceRange = f'bytes={bytes_start}-{bytes_end}',
                Key = key_name,
                PartNumber = part_counter,
                UploadId = mpu_id
            )
            part_list.append({'PartNumber': part_counter, 'ETag': upload_copy_part['CopyPartResult']['ETag']})
            part_counter += 1
    except ClientError as e:
        logging.error(e)
        return None
    return part_list

def complete_multipart_upload(s3_client, bucket_name, key_name, mpu_id, part_list):
    '''
    Completes a multipart upload to a directory bucket

    :param s3_client: boto3 S3 client
    :param bucket_name: Destination bucket for the multipart upload
    :param key_name: Key name of the object to be uploaded
    :param mpu_id: The UploadId returned from the create_multipart_upload call
    :param part_list: List of uploaded part numbers with associated ETags
    :return: True if the multipart upload was completed successfully, else False
    '''
    try:
        s3_client.complete_multipart_upload(
            Bucket = bucket_name,
            Key = key_name,
            UploadId = mpu_id,
            MultipartUpload = {
                'Parts': part_list
            }
        )
    except ClientError as e:
        logging.error(e)
        return False
    return True

if __name__ == '__main__':
    MB = 1024 ** 2
    region = 'us-west-2'
    source_bucket_name = 'SOURCE_BUCKET_NAME'
    target_bucket_name = 'TARGET_BUCKET_NAME'
    key_name = 'KEY_NAME'
    part_size = 10 * MB
    s3_client = boto3.client('s3', region_name = region)
    mpu_id = create_multipart_upload(s3_client, target_bucket_name, key_name)
    if mpu_id is not None:
        part_list = multipart_copy_upload(s3_client, source_bucket_name, key_name, target_bucket_name, mpu_id, part_size)
        if part_list is not None:
            if complete_multipart_upload(s3_client, target_bucket_name, key_name, mpu_id, part_list):
                print (f'{key_name} successfully copied through multipart copy from {source_bucket_name} to {target_bucket_name}')
            else:
                print (f'Could not copy {key_name} through multipart copy from {source_bucket_name} to {target_bucket_name}')
The following example shows how to use a multipart upload to copy an object from one bucket to a directory bucket by using the AWS CLI. To use the command, replace the user input placeholders with your own information.
aws s3api upload-part-copy --bucket bucket-base-name--azid--x-s3 --key TARGET_KEY_NAME --copy-source SOURCE_BUCKET_NAME/SOURCE_KEY_NAME --part-number 1 --upload-id "AS_mgt9RaQE9GEaifATue15dAAAAAAAAAAEMAAAAAAAAADQwNzI4MDU0MjUyMBYAAAAAAAAAAA0AAAAAAAAAAAH2AfYAAAAAAAAEBnJ4cxKMAQAAAABiNXpOFVZJ1tZcKWib9YKE1C565_hCkDJ_4AfCap2svg"
For more information, see upload-part-copy in the AWS Command Line Interface.
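The SDK copy examples in this section split the source object into byte ranges, one per part. The arithmetic is easy to get wrong at the last part, so the following sketch isolates it. The function name `copy_source_ranges` is hypothetical. It returns each part number together with a range string in the `bytes=first-last` form that those examples pass as the copy source range, with the final range clamped to the end of the object.

```python
def copy_source_ranges(object_size, part_size):
    """Return (part_number, range_string) pairs that cover the whole object."""
    ranges = []
    for part_number, start in enumerate(range(0, object_size, part_size), start=1):
        # Byte ranges are inclusive, and the last one stops at the final byte.
        end = min(start + part_size, object_size) - 1
        ranges.append((part_number, f'bytes={start}-{end}'))
    return ranges


# A 25-byte object copied in 10-byte parts needs three ranges.
print(copy_source_ranges(25, 10))
# prints [(1, 'bytes=0-9'), (2, 'bytes=10-19'), (3, 'bytes=20-24')]
```

The tiny sizes are for illustration only. Real part sizes must respect the minimum and maximum noted in the Python examples.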
Listing in-progress multipart uploads
To list the in-progress multipart uploads to a directory bucket, you can use the AWS SDKs or the AWS CLI.
- SDK for Java 2.x
-
The following example shows how to list in-progress (incomplete) multipart uploads by using the SDK for Java 2.x.
public static void listMultiPartUploads( S3Client s3, String bucketName) {
    try {
        ListMultipartUploadsRequest listMultipartUploadsRequest = ListMultipartUploadsRequest.builder()
                .bucket(bucketName)
                .build();
        ListMultipartUploadsResponse response = s3.listMultipartUploads(listMultipartUploadsRequest);
        List<MultipartUpload> uploads = response.uploads();
        for (MultipartUpload upload: uploads) {
            System.out.println("Upload in progress: Key = \"" + upload.key() + "\", id = " + upload.uploadId());
        }
    }
    catch (S3Exception e) {
        System.err.println(e.getMessage());
        System.exit(1);
    }
}
- SDK for Python
-
The following example shows how to list in-progress (incomplete) multipart uploads by using the SDK for Python.
import logging
import boto3
from botocore.exceptions import ClientError

def list_multipart_uploads(s3_client, bucket_name):
    '''
    List any incomplete multipart uploads in a directory bucket in the specified Region

    :param s3_client: boto3 S3 client
    :param bucket_name: Bucket to check for incomplete multipart uploads
    :return: List of incomplete multipart uploads if there are any, None if not
    '''
    try:
        response = s3_client.list_multipart_uploads(Bucket = bucket_name)
        if 'Uploads' in response.keys():
            return response['Uploads']
        else:
            return None
    except ClientError as e:
        logging.error(e)
        return None

if __name__ == '__main__':
    bucket_name = 'BUCKET_NAME'
    region = 'us-west-2'
    s3_client = boto3.client('s3', region_name = region)
    multipart_uploads = list_multipart_uploads(s3_client, bucket_name)
    if multipart_uploads is not None:
        print (f'There are {len(multipart_uploads)} incomplete multipart uploads for {bucket_name}')
    else:
        print (f'There are no incomplete multipart uploads for {bucket_name}')
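The following example shows how to list in-progress (incomplete) multipart uploads by using the AWS CLI. To use the command, replace the user input placeholders with your own information.
aws s3api list-multipart-uploads --bucket bucket-base-name--azid--x-s3
For more information, see list-multipart-uploads in the AWS Command Line Interface.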
Listing the parts of a multipart upload
The following examples show how to list the parts of a multipart upload to a directory bucket.
- SDK for Java 2.x
-
The following example shows how to list the parts of a multipart upload to a directory bucket by using the SDK for Java 2.x.
public static void listMultiPartUploadsParts( S3Client s3, String bucketName, String objKey, String uploadID) {
    try {
        ListPartsRequest listPartsRequest = ListPartsRequest.builder()
                .bucket(bucketName)
                .uploadId(uploadID)
                .key(objKey)
                .build();
        ListPartsResponse response = s3.listParts(listPartsRequest);
        List<Part> parts = response.parts();
        for (Part part: parts) {
            System.out.println("Upload in progress: Part number = \"" + part.partNumber() + "\", etag = " + part.eTag());
        }
    }
    catch (S3Exception e) {
        System.err.println(e.getMessage());
        System.exit(1);
    }
}
- SDK for Python
-
The following example shows how to list the parts of a multipart upload to a directory bucket by using the SDK for Python.
import logging
import boto3
from botocore.exceptions import ClientError

def list_parts(s3_client, bucket_name, key_name, upload_id):
    '''
    Lists the parts that have been uploaded for a specific multipart upload to a directory bucket.

    :param s3_client: boto3 S3 client
    :param bucket_name: Bucket that multipart uploads parts have been uploaded to
    :param key_name: Name of the object that has parts uploaded
    :param upload_id: Multipart upload ID that the parts are associated with
    :return: List of parts associated with the specified multipart upload, None if the request fails
    '''
    parts_list = []
    next_part_marker = ''
    continuation_flag = True
    try:
        while continuation_flag:
            if next_part_marker == '':
                response = s3_client.list_parts(
                    Bucket = bucket_name,
                    Key = key_name,
                    UploadId = upload_id
                )
            else:
                # Continue from the marker that the previous page returned.
                response = s3_client.list_parts(
                    Bucket = bucket_name,
                    Key = key_name,
                    UploadId = upload_id,
                    PartNumberMarker = next_part_marker
                )
            if 'Parts' in response:
                for part in response['Parts']:
                    parts_list.append(part)
                if response['IsTruncated']:
                    next_part_marker = response['NextPartNumberMarker']
                else:
                    continuation_flag = False
            else:
                continuation_flag = False
        return parts_list
    except ClientError as e:
        logging.error(e)
        return None

if __name__ == '__main__':
    region = 'us-west-2'
    bucket_name = 'BUCKET_NAME'
    key_name = 'KEY_NAME'
    upload_id = 'UPLOAD_ID'
    s3_client = boto3.client('s3', region_name = region)
    parts_list = list_parts(s3_client, bucket_name, key_name, upload_id)
    if parts_list is not None:
        print (f'{key_name} has {len(parts_list)} parts uploaded to {bucket_name}')
    else:
        print (f'There are no multipart uploads with that upload ID for {bucket_name} bucket')
The following example shows how to list the parts of a multipart upload to a directory bucket by using the AWS CLI. To use the command, replace the user input placeholders with your own information.
aws s3api list-parts --bucket bucket-base-name--azid--x-s3 --key KEY_NAME --upload-id "AS_mgt9RaQE9GEaifATue15dAAAAAAAAAAEMAAAAAAAAADQwNzI4MDU0MjUyMBYAAAAAAAAAAA0AAAAAAAAAAAH2AfYAAAAAAAAEBSD0WBKMAQAAAABneY9yBVsK89iFkvWdQhRCcXohE8RbYtc9QvBOG8tNpA"
For more information, see list-parts in the AWS Command Line Interface.