Creating an S3 Batch Operations job
With Amazon S3 Batch Operations, you can perform large-scale batch operations on a list of
specific Amazon S3 objects. This section describes the information that you need to create an
S3 Batch Operations job and the results of a CreateJob
request. It also provides
instructions for creating a Batch Operations job by using the Amazon S3 console, AWS Command Line Interface (AWS CLI), and
AWS SDK for Java.
When you create an S3 Batch Operations job, you can request a completion report for all tasks or only for failed tasks. As long as at least one task has been invoked successfully, S3 Batch Operations generates a report for jobs that have been completed, have failed, or have been canceled. For more information, see Examples: S3 Batch Operations completion reports.
Batch Operations job request elements
To create an S3 Batch Operations job, you must provide the following information:
- Operation
-
Specify the operation that you want S3 Batch Operations to run against the objects in the manifest. Each operation type accepts parameters that are specific to that operation. With Batch Operations, you can perform an operation in bulk, with the same results as if you performed that operation one-by-one on each object.
- Manifest
-
The manifest is a list of all of the objects that you want S3 Batch Operations to run the specified operation on. You can use the following methods to specify a manifest for a Batch Operations job:
-
Manually create your own customized, CSV-formatted object list.
-
Choose an existing CSV-formatted Amazon S3 Inventory report. For more information, see Cataloging and analyzing your data with S3 Inventory.
-
Direct Batch Operations to generate a manifest automatically based on object filter criteria that you specify when you create your job. This option is available for batch replication jobs that you create in the Amazon S3 console, or for any job type that you create by using the AWS CLI, AWS SDKs, or Amazon S3 REST API.
Note
-
Regardless of how you specify your manifest, the list itself must be stored in a general purpose bucket. Batch Operations can't import existing manifests from, or save generated manifests to directory buckets. Objects described within the manifest, however, can be stored in directory buckets. For more information, see Directory buckets.
-
If the objects in your manifest are in a versioned bucket, specifying the version IDs for the objects directs Batch Operations to perform the operation on a specific version. If no version IDs are specified, Batch Operations performs the operation on the latest version of the objects. If your manifest includes a version ID field, you must provide a version ID for all objects in the manifest.
For more information, see Specifying a manifest.
- Priority
-
Use job priorities to indicate the relative priority of this job to others running in your account. A higher number indicates higher priority.
Job priorities only have meaning relative to the priorities that are set for other jobs in the same account and Region. You can choose whatever numbering system works for you. For example, you might want to assign all Restore (RestoreObject) jobs a priority of 1, all Copy (CopyObject) jobs a priority of 2, and all Replace access control lists (ACLs) (PutObjectAcl) jobs a priority of 3.
S3 Batch Operations prioritizes jobs according to priority numbers, but strict ordering isn't guaranteed. Therefore, don't use job priorities to ensure that any one job starts or finishes before any other job. If you must ensure strict ordering, wait until one job has finished before starting the next.
- RoleArn
-
Specify an AWS Identity and Access Management (IAM) role to run the job. The IAM role that you use must have sufficient permissions to perform the operation that is specified in the job. For example, to run a CopyObject job, the IAM role must have the s3:GetObject permission for the source bucket and the s3:PutObject permission for the destination bucket. The role also needs permissions to read the manifest and write the job-completion report.
For more information about IAM roles, see IAM roles in the IAM User Guide. For more information about Amazon S3 permissions, see Policy actions for Amazon S3.
Note
Batch Operations jobs that perform actions on directory buckets require specific permissions. For more information, see AWS Identity and Access Management (IAM) for S3 Express One Zone.
- Report
-
Specify whether you want S3 Batch Operations to generate a completion report. If you request a job-completion report, you must also provide the parameters for the report in this element. The necessary information includes:
-
The bucket where you want to store the report
Note
The report must be stored in a general purpose bucket. Batch Operations can't save reports to directory buckets. For more information, see Directory buckets.
-
The format of the report
-
Whether you want the report to include the details of all tasks or only failed tasks
-
An optional prefix string
Note
Completion reports are always encrypted with server-side encryption with Amazon S3 managed keys (SSE-S3).
- Tags (optional)
-
You can label and control access to your S3 Batch Operations jobs by adding tags. You can use tags to identify who is responsible for a Batch Operations job, or to control how users interact with Batch Operations jobs. The presence of job tags can grant or limit a user's ability to cancel a job, activate a job in the confirmation state, or change a job's priority level. For example, you could grant a user permission to invoke the CreateJob operation, provided that the job is created with the tag "Department=Finance".
You can create jobs with tags attached to them, and you can add tags to jobs after you create them.
For more information, see Controlling access and labeling jobs using tags.
- Description (optional)
-
To track and monitor your job, you can also provide a description of up to 256 characters. Amazon S3 includes this description whenever it returns information about a job or displays job details on the Amazon S3 console. You can then easily sort and filter jobs according to the descriptions that you assigned. Descriptions don't need to be unique, so you can use descriptions as categories (for example, "Weekly Log Copy Jobs") to help you track groups of similar jobs.
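The following is a minimal sketch of how these request elements might map onto a CreateJob request by using the AWS SDK for Java 2.x. It isn't the official SDK example: the operation shown (object tagging), the bucket names, manifest ETag, account ID, and role ARN are placeholder assumptions that you would replace with your own values.

// Sketch: create a Batch Operations job from an existing CSV manifest.
// All names, ARNs, and IDs below are placeholders.
import software.amazon.awssdk.services.s3control.S3ControlClient;
import software.amazon.awssdk.services.s3control.model.*;

public class CreateBatchOperationsJob {
    public static void main(String[] args) {
        try (S3ControlClient s3Control = S3ControlClient.create()) {

            // Operation: apply a tag to every object listed in the manifest.
            JobOperation operation = JobOperation.builder()
                    .s3PutObjectTagging(S3SetObjectTaggingOperation.builder()
                            .tagSet(S3Tag.builder().key("department").value("finance").build())
                            .build())
                    .build();

            // Manifest: an existing CSV list of bucket,key rows stored in a general purpose bucket.
            JobManifest manifest = JobManifest.builder()
                    .spec(JobManifestSpec.builder()
                            .format(JobManifestFormat.S3_BATCH_OPERATIONS_CSV_20180820)
                            .fields(JobManifestFieldName.BUCKET, JobManifestFieldName.KEY)
                            .build())
                    .location(JobManifestLocation.builder()
                            .objectArn("arn:aws:s3:::amzn-s3-demo-bucket1/manifest.csv")
                            .eTag("example-manifest-etag")
                            .build())
                    .build();

            // Report: write a completion report for failed tasks only.
            JobReport report = JobReport.builder()
                    .enabled(true)
                    .bucket("arn:aws:s3:::amzn-s3-demo-bucket1")
                    .prefix("batch-op-reports")
                    .format(JobReportFormat.REPORT_CSV_20180820)
                    .reportScope(JobReportScope.FAILED_TASKS_ONLY)
                    .build();

            CreateJobResponse response = s3Control.createJob(CreateJobRequest.builder()
                    .accountId("111122223333")
                    .operation(operation)
                    .manifest(manifest)
                    .report(report)
                    .priority(10)
                    .roleArn("arn:aws:iam::111122223333:role/batch-operations-role")
                    .description("Weekly tagging job")
                    .confirmationRequired(false)
                    .build());

            System.out.println("Created job: " + response.jobId());
        }
    }
}

Each operation type accepts its own parameter structure on JobOperation, so substitute the builder that matches the operation that you want to run.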
Specifying a manifest
A manifest is an Amazon S3 object that contains the object keys that you want Amazon S3 to act upon. You can supply a manifest in one of the following ways:
-
Create a new manifest file manually.
-
Use an existing manifest.
-
Direct Batch Operations to generate a manifest automatically based on object filter criteria that you specify when you create your job. This option is available for batch replication jobs that you create in the Amazon S3 console, or for any job type that you create by using the AWS CLI, AWS SDKs, or Amazon S3 REST API.
Note
Amazon S3 Batch Operations does not support cross-region manifest generation.
Regardless of how you specify your manifest, the list itself must be stored in a general purpose bucket. Batch Operations can't import existing manifests from, or save generated manifests to directory buckets. Objects described within the manifest, however, can be stored in directory buckets. For more information, see Directory buckets.
Creating a manifest file
To create a manifest file manually, you specify the manifest object key, ETag (entity tag), and optional version ID in a CSV-formatted list. The contents of the manifest must be URL-encoded.
By default, Amazon S3 automatically uses server-side encryption with Amazon S3 managed keys (SSE-S3) to encrypt a manifest that's uploaded to an Amazon S3 bucket. Manifests that use server-side encryption with customer-provided keys (SSE-C) are not supported. Manifests that use server-side encryption with AWS Key Management Service (AWS KMS) keys (SSE-KMS) are supported only when you're using CSV-formatted inventory reports. Using a manually created manifest with AWS KMS is not supported.
Your manifest must contain the bucket name, object key, and optionally, the object version for each object. Any other fields in the manifest are not used by S3 Batch Operations.
Note
If the objects in your manifest are in a versioned bucket, specifying the version IDs for the objects directs Batch Operations to perform the operation on a specific version. If no version IDs are specified, Batch Operations performs the operation on the latest version of the objects. If your manifest includes a version ID field, you must provide a version ID for all objects in the manifest.
The following is an example manifest in CSV format without version IDs.
amzn-s3-demo-bucket1,objectkey1
amzn-s3-demo-bucket1,objectkey2
amzn-s3-demo-bucket1,objectkey3
amzn-s3-demo-bucket1,photos/jpgs/objectkey4
amzn-s3-demo-bucket1,photos/jpgs/newjersey/objectkey5
amzn-s3-demo-bucket1,object%20key%20with%20spaces
The following is an example manifest in CSV format that includes version IDs.
amzn-s3-demo-bucket1,objectkey1,PZ9ibn9D5lP6p298B7S9_ceqx1n5EJ0p
amzn-s3-demo-bucket1,objectkey2,YY_ouuAJByNW1LRBfFMfxMge7XQWxMBF
amzn-s3-demo-bucket1,objectkey3,jbo9_jhdPEyB4RrmOxWS0kU0EoNrU_oI
amzn-s3-demo-bucket1,photos/jpgs/objectkey4,6EqlikJJxLTsHsnbZbSRffn24_eh5Ny4
amzn-s3-demo-bucket1,photos/jpgs/newjersey/objectkey5,imHf3FAiRsvBW_EHB8GOu.NHunHO1gVs
amzn-s3-demo-bucket1,object%20key%20with%20spaces,9HkPvDaZY5MVbMhn6TMn1YTb5ArQAo3w
Specifying an existing manifest file
You can specify a manifest file for a create job request by using one of the following two formats:
-
Amazon S3 Inventory report – Must be a CSV-formatted Amazon S3 Inventory report. You must specify the manifest.json file that is associated with the inventory report. For more information about inventory reports, see Cataloging and analyzing your data with S3 Inventory. If the inventory report includes version IDs, S3 Batch Operations operates on the specific object versions.
Note
-
S3 Batch Operations supports CSV inventory reports that are encrypted with SSE-KMS.
-
If you submit an inventory report manifest that's encrypted with SSE-KMS, your IAM policy must include the permissions "kms:Decrypt" and "kms:GenerateDataKey" for the manifest.json object and all associated CSV data files.
-
CSV file – Each row in the file must include the bucket name, object key, and optionally, the object version. Object keys must be URL-encoded, as shown in the following examples. The manifest must either include version IDs for all objects or omit version IDs for all objects. For more information about the CSV manifest format, see JobManifestSpec in the Amazon Simple Storage Service API Reference.
Note
S3 Batch Operations doesn't support CSV manifest files that are encrypted with SSE-KMS.
Important
When you're using a manually created manifest and a versioned bucket, we recommend that you specify the version IDs for the objects. When you create a job, S3 Batch Operations parses the entire manifest before running the job. However, it doesn't take a "snapshot" of the state of the bucket.
Because manifests can contain billions of objects, jobs might take a long time to run, which can affect which version of an object that the job acts upon. Suppose that you overwrite an object with a new version while a job is running and you didn't specify a version ID for that object. In this case, Amazon S3 performs the operation on the latest version of the object, not on the version that existed when you created the job. The only way to avoid this behavior is to specify version IDs for the objects that are listed in the manifest.
Generating a manifest automatically
You can direct Amazon S3 to generate a manifest automatically based on object filter criteria that you specify when you create your job. This option is available for batch replication jobs that you create in the Amazon S3 console, or for any job type that you create by using the AWS CLI, AWS SDKs, or Amazon S3 REST API. For more information about Batch Replication, see Replicating existing objects with Batch Replication.
To generate a manifest automatically, you specify the following elements as part of your job creation request:
-
Information about the bucket that contains your source objects, including the bucket owner and Amazon Resource Name (ARN)
-
Information about the manifest output, including a flag to create a manifest file, the output bucket owner, the ARN, the prefix, the file format, and the encryption type
-
Optional criteria to filter objects by their creation date, key name, size, and storage class. In the case of replication jobs, you can also use tags to filter objects.
Object filter criteria
To filter the list of objects to be included in an automatically generated manifest, you can specify the following criteria. For more information, see JobManifestGeneratorFilter in the Amazon S3 API Reference.
- CreatedAfter
-
If provided, the generated manifest includes only source bucket objects that were created after this time.
- CreatedBefore
-
If provided, the generated manifest includes only source bucket objects that were created before this time.
- EligibleForReplication
-
If provided, the generated manifest includes objects only if they are eligible for replication according to the replication configuration on the source bucket.
- KeyNameConstraint
-
If provided, the generated manifest includes only source bucket objects whose object keys match the string constraints specified for MatchAnySubstring, MatchAnyPrefix, and MatchAnySuffix.
MatchAnySubstring – If provided, the generated manifest includes objects if the specified string appears anywhere within the object key string.
MatchAnyPrefix – If provided, the generated manifest includes objects if the specified string appears at the start of the object key string.
MatchAnySuffix – If provided, the generated manifest includes objects if the specified string appears at the end of the object key string.
- MatchAnyStorageClass
-
If provided, the generated manifest includes only source bucket objects that are stored with the specified storage class.
- ObjectReplicationStatuses
-
If provided, the generated manifest includes only source bucket objects that have one of the specified replication statuses.
- ObjectSizeGreaterThanBytes
-
If provided, the generated manifest includes only source bucket objects whose file size is greater than the specified number of bytes.
- ObjectSizeLessThanBytes
-
If provided, the generated manifest includes only source bucket objects whose file size is less than the specified number of bytes.
Note
You can't clone most jobs that have automatically generated manifests. Batch replication jobs can be cloned, except when they use the KeyNameConstraint, MatchAnyStorageClass, ObjectSizeGreaterThanBytes, or ObjectSizeLessThanBytes manifest filter criteria.
The syntax for specifying manifest criteria varies depending on the method that you use to create your job. For examples, see Creating a job.
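If you're creating the job programmatically, the manifest generator and its filter are specified as part of the CreateJob request. The following is a rough sketch, assuming the AWS SDK for Java 2.x model classes; the source bucket, owning account, and filter values are placeholders.

// Sketch: a manifest generator that filters the source bucket by creation date,
// object size, and key name. All ARNs, IDs, and thresholds are placeholders.
import java.time.Instant;
import software.amazon.awssdk.services.s3control.model.*;

public class ManifestGeneratorExample {
    public static JobManifestGenerator buildGenerator() {
        JobManifestGeneratorFilter filter = JobManifestGeneratorFilter.builder()
                .createdAfter(Instant.parse("2024-01-01T00:00:00Z"))
                .objectSizeGreaterThanBytes(1024L)
                .keyNameConstraint(KeyNameConstraint.builder()
                        .matchAnyPrefix("photos/")
                        .matchAnySuffix(".jpg")
                        .build())
                .build();

        return JobManifestGenerator.builder()
                .s3JobManifestGenerator(S3JobManifestGenerator.builder()
                        .expectedBucketOwner("111122223333")
                        .sourceBucket("arn:aws:s3:::amzn-s3-demo-bucket1")
                        .enableManifestOutput(false)
                        .filter(filter)
                        .build())
                .build();
    }
}

You pass the resulting generator to CreateJob in place of a manifest, as shown in the job-creation sketch later in this section.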
Creating a job
You can create S3 Batch Operations jobs by using the Amazon S3 console, AWS CLI, AWS SDKs, or Amazon S3 REST API.
For more information about creating a job request, see Batch Operations job request elements.
Prerequisites
Before you create a Batch Operations job, confirm that you have configured the relevant permissions. For more information, see Granting permissions for Batch Operations.
To create a batch job
-
Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
-
In the navigation bar on the top of the page, choose the name of the currently displayed AWS Region. Next, choose the Region in which you want to create your job.
Note
For copy operations, you must create the job in the same Region as the destination bucket. For all other operations, you must create the job in the same Region as the objects in the manifest.
-
Choose Batch Operations on the left navigation pane of the Amazon S3 console.
-
Choose Create job.
-
Verify the AWS Region where you want to create your job.
-
Under Manifest format, choose the type of manifest object to use.
-
If you choose S3 inventory report, enter the path to the manifest.json object that Amazon S3 generated as part of the CSV-formatted Inventory report, and optionally the version ID for the manifest object if you want to use a version other than the most recent.
-
If you choose CSV, enter the path to a CSV-formatted manifest object. The manifest object must follow the format described in the console. You can optionally include the version ID for the manifest object if you want to use a version other than the most recent.
Note
The Amazon S3 console supports automatic manifest generation for batch replication jobs only. For all other job types, if you want Amazon S3 to generate a manifest automatically based on filter criteria that you specify, you must configure your job using the AWS CLI, AWS SDKs, or Amazon S3 REST API.
-
Choose Next.
-
Under Operation, choose the operation that you want to perform on all objects listed in the manifest. Fill out the information for the operation you chose and then choose Next.
-
Fill out the information for Configure additional options and then choose Next.
-
For Review, verify the settings. If you need to make changes, choose Previous. Otherwise, choose Create job.
To create your Batch Operations job with the AWS CLI, use the create-job command. The request elements are the same whether you specify an existing manifest or generate a manifest automatically.
To create your Batch Operations job with the AWS SDK for Java, build and submit a CreateJob request with either an existing manifest or an automatically generated manifest, as in the sketch that follows.
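The following is a rough sketch of the SDK for Java path when you generate the manifest automatically. It isn't the official SDK example; it reuses the hypothetical buildGenerator() helper from the earlier manifest-generator sketch, and the account ID and role ARN are placeholders.

// Sketch: create a job whose manifest is generated automatically from filter criteria.
import software.amazon.awssdk.services.s3control.S3ControlClient;
import software.amazon.awssdk.services.s3control.model.*;

public class CreateJobWithGeneratedManifest {
    public static void main(String[] args) {
        try (S3ControlClient s3Control = S3ControlClient.create()) {
            CreateJobResponse response = s3Control.createJob(CreateJobRequest.builder()
                    .accountId("111122223333")
                    .operation(JobOperation.builder()
                            .s3PutObjectTagging(S3SetObjectTaggingOperation.builder()
                                    .tagSet(S3Tag.builder().key("reviewed").value("true").build())
                                    .build())
                            .build())
                    // The manifest generator takes the place of the Manifest element.
                    .manifestGenerator(ManifestGeneratorExample.buildGenerator())
                    .priority(5)
                    .roleArn("arn:aws:iam::111122223333:role/batch-operations-role")
                    .description("Job with an automatically generated manifest")
                    .confirmationRequired(false)
                    .build());

            System.out.println("Created job: " + response.jobId());
        }
    }
}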
You can use the REST API to create a Batch Operations job. For more information, see CreateJob in the Amazon Simple Storage Service API Reference.
Job responses
If the CreateJob
request succeeds, Amazon S3 returns a job ID. The job ID is a
unique identifier that Amazon S3 generates automatically so that you can identify your Batch Operations
job and monitor its status.
When you create a job through the AWS CLI, AWS SDKs, or REST API, you can set S3 Batch Operations to begin processing the job automatically. The job runs as soon as it's ready instead of waiting behind higher-priority jobs.
When you create a job through the Amazon S3 console, you must review the job details and confirm that you want to run the job before Batch Operations can begin to process it. If a job remains in the suspended state for over 30 days, it will fail.
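If a job is waiting in the suspended state, you can also check its status and confirm it programmatically with the DescribeJob and UpdateJobStatus operations. The following is a small sketch, assuming the AWS SDK for Java 2.x; the account ID and job ID are placeholders.

// Sketch: check a job's status and, if it's suspended awaiting confirmation,
// request that it be made ready to run. IDs below are placeholders.
import software.amazon.awssdk.services.s3control.S3ControlClient;
import software.amazon.awssdk.services.s3control.model.*;

public class ConfirmBatchJob {
    public static void main(String[] args) {
        String accountId = "111122223333";
        String jobId = "example-job-id";

        try (S3ControlClient s3Control = S3ControlClient.create()) {
            DescribeJobResponse described = s3Control.describeJob(DescribeJobRequest.builder()
                    .accountId(accountId)
                    .jobId(jobId)
                    .build());
            System.out.println("Job status: " + described.job().status());

            if (described.job().status() == JobStatus.SUSPENDED) {
                s3Control.updateJobStatus(UpdateJobStatusRequest.builder()
                        .accountId(accountId)
                        .jobId(jobId)
                        .requestedJobStatus(RequestedJobStatus.READY)
                        .build());
            }
        }
    }
}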