AWS Database Migration Service
User Guide (API Version 2016-01-01)

Using Amazon S3 as a Target for AWS Database Migration Service

You can migrate data to Amazon S3 using AWS DMS from any of the supported database sources. When using Amazon S3 as a target in an AWS DMS task, both full load and change data capture (CDC) data is written in comma-separated value (.csv) format by default. For more compact storage and faster queries, you can instead have the data written in Apache Parquet (.parquet) format.

AWS DMS names files created during a full load using an incremental hexadecimal counter, for example LOAD00001.csv, LOAD00002.csv, and so on through LOAD00009.csv, LOAD0000A.csv, and so on for .csv files. AWS DMS names CDC files using timestamps, for example 20141029-1134010000.csv. For each source table, AWS DMS creates a folder under the specified target folder. AWS DMS writes all full load and CDC files to the specified Amazon S3 bucket.

The parameter bucketFolder contains the location where the .csv or .parquet files are stored before being uploaded to the S3 bucket. With .csv files, table data is stored in the following format in the S3 bucket.

<schema_name>/<table_name>/LOAD00000001.csv
<schema_name>/<table_name>/LOAD00000002.csv
...
<schema_name>/<table_name>/LOAD00000009.csv
<schema_name>/<table_name>/LOAD0000000A.csv
<schema_name>/<table_name>/LOAD0000000B.csv
...
<schema_name>/<table_name>/LOAD0000000F.csv
<schema_name>/<table_name>/LOAD00000010.csv
...

You can specify the column delimiter, row delimiter, and other parameters using the extra connection attributes. For more information on the extra connection attributes, see Extra Connection Attributes When Using Amazon S3 as a Target for AWS DMS at the end of this section.
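
For example, you might pass a string like the following in the --extra-connection-attributes option of the create-endpoint AWS CLI command. This is a sketch only; the pipe (|) column delimiter is an illustrative choice, and each attribute is described in detail at the end of this section.

csvDelimiter=|;csvRowDelimiter=\n;addColumnName=true;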

When you use AWS DMS to replicate data changes, the first column of the .csv or .parquet output file indicates how the data was changed, as shown in the following .csv file example.

I,101,Smith,Bob,4-Jun-14,New York
U,101,Smith,Bob,8-Oct-15,Los Angeles
U,101,Smith,Bob,13-Mar-17,Dallas
D,101,Smith,Bob,13-Mar-17,Dallas

For this example, suppose that there is an EMPLOYEE table in the source database. AWS DMS writes data to the .csv or .parquet file, in response to the following events:

  • A new employee (Bob Smith, employee ID 101) is hired on 4-Jun-14 at the New York office. In the .csv or .parquet file, the I in the first column indicates that a new row was INSERTed into the EMPLOYEE table at the source database.

  • On 8-Oct-15, Bob transfers to the Los Angeles office. In the .csv or .parquet file, the U indicates that the corresponding row in the EMPLOYEE table was UPDATEd to reflect Bob's new office location. The rest of the line reflects the row in the EMPLOYEE table as it appears after the UPDATE.

  • On 13-Mar-17, Bob transfers again to the Dallas office. In the .csv or .parquet file, the U indicates that this row was UPDATEd again. The rest of the line reflects the row in the EMPLOYEE table as it appears after the UPDATE.

  • After some time working in Dallas, Bob leaves the company. In the .csv or .parquet file, the D indicates that the row was DELETEd in the source table. The rest of the line reflects how the row in the EMPLOYEE table appeared before it was deleted.

Prerequisites for Using Amazon S3 as a Target

Before using Amazon S3 as a target, check that the following are true:

  • The S3 bucket that you're using as a target is in the same region as the DMS replication instance you are using to migrate your data.

  • The AWS account that you use for the migration has an IAM role with write and delete access to the S3 bucket you are using as a target.

  • This role has tagging access so you can tag any S3 objects written to the target bucket.

To set up this account access, ensure that the role assigned to the user account used to create the migration task has the following set of permissions.

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:PutObject", "s3:DeleteObject", "s3:PutObjectTagging" ], "Resource": [ "arn:aws:s3:::buckettest2*" ] }, { "Effect": "Allow", "Action": [ "s3:ListBucket" ], "Resource": [ "arn:aws:s3:::buckettest2*" ] } ] }

Limitations to Using Amazon S3 as a Target

The following limitations apply to a file in Amazon S3 that you are using as a target:

  • Only the following data definition language (DDL) commands are supported: TRUNCATE TABLE, DROP TABLE, and CREATE TABLE.

  • Full LOB mode is not supported.

  • Changes to the source table structure during full load are not supported. Changes to the data are supported during full load.

  • Multiple tasks that replicate data from the same source table to the same target S3 endpoint bucket write to the same file. We recommend that you specify different target endpoints (buckets) for tasks that replicate from the same source table.

Security

To use Amazon S3 as a target, the account used for the migration must have write and delete access to the Amazon S3 bucket that is used as the target. Specify the Amazon Resource Name (ARN) of an IAM role that has the permissions required to access Amazon S3.

AWS DMS supports a set of predefined grants for Amazon S3, known as canned access control lists (ACLs). Each canned ACL has a set of grantees and permissions that you can use to set permissions for the Amazon S3 bucket. You can specify a canned ACL using the cannedAclForObjects extra connection attribute for your S3 target endpoint. For more information about using the extra connection attribute cannedAclForObjects, see Extra Connection Attributes When Using Amazon S3 as a Target for AWS DMS. For more information about Amazon S3 canned ACLs, see Canned ACL.

The IAM role that you use for the migration must be able to perform the s3:PutObjectAcl API operation.
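
For example, you might extend the IAM policy shown in the prerequisites with a statement like the following sketch (the bucket resource is the same placeholder used earlier).

{
    "Effect": "Allow",
    "Action": [
        "s3:PutObjectAcl"
    ],
    "Resource": [
        "arn:aws:s3:::buckettest2*"
    ]
}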

Using Apache Parquet to Store Amazon S3 Objects

The comma-separated value (.csv) format is the default storage format for Amazon S3 target objects. For more compact storage and faster queries, you can instead use Apache Parquet (.parquet) as the storage format.

Apache Parquet is an open-source file storage format originally designed for Hadoop. For more information on Apache Parquet, see https://parquet.apache.org/.

To set .parquet as the storage format for your migrated S3 target objects, you can use either of the following mechanisms:

  • Endpoint settings that you provide as a JSON object when you create the endpoint using the AWS CLI. For more information, see Endpoint Settings When Using Amazon S3 as a Target for AWS DMS.

  • Extra connection attributes that you provide as a semicolon-separated list when you create the endpoint. For more information, see Extra Connection Attributes When Using Amazon S3 as a Target for AWS DMS.

Amazon S3 Object Tagging

You can tag Amazon S3 objects that a replication instance creates by specifying appropriate JSON objects as part of task-table mapping rules. For more information about requirements and options for S3 object tagging, including valid tag names, see Object Tagging in the Amazon Simple Storage Service Developer Guide. For more information about table mapping using JSON, see Specifying Table Selection and Transformations by Table Mapping Using JSON.

You tag S3 objects created for specified tables and schemas by using one or more JSON objects of the selection rule type. You then follow this selection object (or objects) by one or more JSON objects of the post-processing rule type with add-tag action. These post-processing rules identify the S3 objects that you want to tag and specify the names and values of the tags that you want to add to these S3 objects.
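
The following minimal sketch shows the overall shape of such a table-mapping document. The schema, table, and tag names and values are placeholders; complete, working examples follow later in this section.

{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {
                "schema-name": "your-schema",
                "table-name": "your-table"
            },
            "rule-action": "include"
        },
        {
            "rule-type": "post-processing",
            "rule-id": "2",
            "rule-name": "2",
            "rule-action": "add-tag",
            "object-locator": {
                "schema-name": "your-schema",
                "table-name": "your-table"
            },
            "tag-set": [
                {
                    "key": "your-tag-name",
                    "value": "your-tag-value"
                }
            ]
        }
    ]
}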

You can find the parameters to specify in JSON objects of the post-processing rule type in the following table.

Parameter: rule-type
Possible values: post-processing
Description: A value that applies post-processing actions to the generated target objects. You can specify one or more post-processing rules to tag selected S3 objects.

Parameter: rule-id
Possible values: A numeric value.
Description: A unique numeric value to identify the rule.

Parameter: rule-name
Possible values: An alphanumeric value.
Description: A unique name to identify the rule.

Parameter: rule-action
Possible values: add-tag
Description: The post-processing action that you want to apply to the S3 object. You can add one or more tags using a single JSON post-processing object for the add-tag action.

Parameter: object-locator
Possible values: schema-name – The name of the table schema. table-name – The name of the table.
Description: The name of each schema and table to which the rule applies. You can use the "%" percent sign as a wildcard for all or part of the value of each object-locator parameter. Thus, you can match these items:

  • A single table in a single schema

  • A single table in some or all schemas

  • Some or all tables in a single schema

  • Some or all tables in some or all schemas

Parameter: tag-set
Possible values: key – Any valid name for a single tag. value – Any valid JSON value for this tag.
Description: The names and values for one or more tags that you want to set on each created S3 object that matches the specified object-locator. You can specify up to 10 key-value pairs in a single tag-set parameter object. For more information on S3 object tagging, see Object Tagging in the Amazon Simple Storage Service Developer Guide.

You can also specify a dynamic value for all or part of the value for both the key and value parameters of a tag using ${dyn-value}. Here, ${dyn-value} can be either ${schema-name} or ${table-name}. Thus, you can insert the name of the currently selected schema or table as the whole or any part of the parameter value.

Note

If you insert a dynamic value for the key parameter, you can generate tags with duplicate names for an S3 object, depending on how you use it. In this case, only one of the duplicate tag settings is added to the object.

When you specify multiple post-processing rule types to tag a selection of S3 objects, each S3 object is tagged using only one tag-set object from one post-processing rule. The particular tag set used to tag a given S3 object is the one from the post-processing rule whose associated object locator best matches that S3 object.

For example, suppose that two post-processing rules identify the same S3 object. Suppose also that the object locator from one rule uses wildcards and the object locator from the other rule uses an exact match to identify the S3 object (without wildcards). In this case, the tag set associated with the post-processing rule with the exact match is used to tag the S3 object. If multiple post-processing rules match a given S3 object equally well, the tag set associated with the first such post-processing rule is used to tag the object.

Example Adding Static Tags to an S3 Object Created for a Single Table and Schema

The following selection and post-processing rules add three tags (tag_1, tag_2, and tag_3 with corresponding static values value_1, value_2, and value_3) to a created S3 object. This S3 object corresponds to a single table in the source named STOCK with a schema named aat2.

{ "rules": [ { "rule-type": "selection", "rule-id": "5", "rule-name": "5", "object-locator": { "schema-name": "aat2", "table-name": "STOCK" }, "rule-action": "include" }, { "rule-type": "post-processing", "rule-id": "41", "rule-name": "41", "rule-action": "add-tag", "object-locator": { "schema-name": "aat2", "table-name": "STOCK", }, "tag-set": [ { "key": "tag_1", "value": "value_1" }, { "key": "tag_2", "value": "value_2" }, { "key": "tag_3", "value": "value_3" } ] } ] }

Example Adding Static and Dynamic Tags to S3 Objects Created for Multiple Tables and Schemas

The following example has one selection and two post-processing rules, where input from the source includes all tables and all of their schemas.

{ "rules": [ { "rule-type": "selection", "rule-id": "1", "rule-name": "1", "object-locator": { "schema-name": "%", "table-name": "%" }, "rule-action": "include" }, { "rule-type": "post-processing", "rule-id": "21", "rule-name": "21", "rule-action": "add-tag", "object-locator": { "schema-name": "%", "table-name": "%", }, "tag-set": [ { "key": "dw-schema-name", "value":"${schema-name}" }, { "key": "dw-schema-table", "value": "my_prefix_${table-name}" } ] }, { "rule-type": "post-processing", "rule-id": "41", "rule-name": "41", "rule-action": "add-tag", "object-locator": { "schema-name": "aat", "table-name": "ITEM", }, "tag-set": [ { "key": "tag_1", "value": "value_1" }, { "key": "tag_2", "value": "value_2" } ] } ] }

The first post-processing rule adds two tags (dw-schema-name and dw-schema-table) with corresponding dynamic values (${schema-name} and my_prefix_${table-name}) to almost all S3 objects created in the target. The exception is the S3 object identified and tagged with the second post-processing rule. Thus, each target S3 object identified by the wildcard object locator is created with tags that identify the schema and table to which it corresponds in the source.

The second post-processing rule adds tag_1 and tag_2 with corresponding static values value_1 and value_2 to a created S3 object that is identified by an exact-match object locator. This created S3 object thus corresponds to the single table in the source named ITEM with a schema named aat. Because of the exact match, these tags replace any tags on this object added from the first post-processing rule, which matches S3 objects by wildcard only.

Example Adding both Dynamic Tag Names and Values to S3 Objects

The following example has two selection rules and one post-processing rule. Here, input from the source includes just the ITEM table in either the retail or wholesale schema.

{ "rules": [ { "rule-type": "selection", "rule-id": "1", "rule-name": "1", "object-locator": { "schema-name": "retail", "table-name": "ITEM" }, "rule-action": "include" }, { "rule-type": "selection", "rule-id": "1", "rule-name": "1", "object-locator": { "schema-name": "wholesale", "table-name": "ITEM" }, "rule-action": "include" }, { "rule-type": "post-processing", "rule-id": "21", "rule-name": "21", "rule-action": "add-tag", "object-locator": { "schema-name": "%", "table-name": "ITEM", }, "tag-set": [ { "key": "dw-schema-name", "value":"${schema-name}" }, { "key": "dw-schema-table", "value": "my_prefix_ITEM" }, { "key": "${schema-name}_ITEM_tag_1", "value": "value_1" }, { "key": "${schema-name}_ITEM_tag_2", "value": "value_2" } ] ] }

The tag set for the post-processing rule adds two tags (dw-schema-name and dw-schema-table) to all S3 objects created for the ITEM table in the target. The first tag has the dynamic value "${schema-name}" and the second tag has a static value, "my_prefix_ITEM". Thus, each target S3 object is created with tags that identify the schema and table to which it corresponds in the source.

In addition, the tag set adds two additional tags with dynamic names (${schema-name}_ITEM_tag_1 and ${schema-name}_ITEM_tag_2). These have the corresponding static values value_1 and value_2. Thus, these tags are each named for the current schema, retail or wholesale. You can't create a duplicate dynamic tag name in this object, because each object is created for a single unique schema name. The schema name is used to create an otherwise unique tag name.

Creating AWS KMS Keys to Encrypt Amazon S3 Target Objects

You can create and use custom AWS KMS keys to encrypt your Amazon S3 target objects. After you create an appropriate KMS key, you can use this key to encrypt the objects using one of the following mechanisms when you create the S3 target endpoint:

  • Use the following options for S3 target objects (with the default .csv file storage format) when you run the create-endpoint command using the AWS CLI.

    --s3-settings '{"ServiceAccessRoleArn": "your-service-access-ARN", "CsvRowDelimiter": "\n", "CsvDelimiter": ",", "BucketFolder": "your-bucket-folder", "BucketName": "your-bucket-name", "EncryptionMode": "SSE_KMS", "ServerSideEncryptionKmsKeyId": "your-KMS-key-ARN"}'

    Here, your-KMS-key-ARN is the Amazon Resource Name (ARN) for your KMS key. For more information, see Endpoint Settings When Using Amazon S3 as a Target for AWS DMS.

  • Set the extra connection attribute encryptionMode to the value SSE_KMS and the extra connection attribute serverSideEncryptionKmsKeyId to the ARN for your KMS key. For more information, see Extra Connection Attributes When Using Amazon S3 as a Target for AWS DMS.
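
For example, the extra connection attribute form might look like the following sketch, where your-KMS-key-ARN is a placeholder for the ARN of your KMS key.

encryptionMode=SSE_KMS;serverSideEncryptionKmsKeyId=your-KMS-key-ARN;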

To encrypt S3 target objects using a KMS key, you need an IAM role that has permissions to access the Amazon S3 bucket. This IAM role is then referenced in a policy (a key policy) attached to the encryption key that you create. You can do this in the IAM console by creating the following:

  • A policy with permissions to access the Amazon S3 bucket.

  • An IAM role with this policy.

  • A KMS encryption key with a key policy that references this role.

The following procedures describe how to do this.

To create an IAM policy with permissions to access the Amazon S3 bucket

  1. Open the IAM console at https://console.aws.amazon.com/iam/.

  2. In the navigation pane, choose Policies. The Policies page opens.

  3. Choose Create policy. The Create policy page opens.

  4. Choose Service and choose S3. A list of action permissions appears.

  5. Choose Expand all to expand the list and choose the following permissions at a minimum:

    • ListBucket

    • PutObject

    • DeleteObject

    Choose any other permissions you need, and then choose Collapse all to collapse the list.

  6. Choose Resources to specify the resources that you want to access. At a minimum, choose All resources to provide general S3 resource access.

  7. Add any other conditions or permissions you need, then choose Review policy. Check your results on the Review policy page.

  8. If the settings are what you need, enter a name for the policy (for example, DMS-S3-endpoint-access), and any description, then choose Create policy. The Policies page opens with a message indicating that your policy has been created.

  9. Search for and choose the policy name in the Policies list. The Summary page appears displaying JSON for the policy similar to the following.

    { "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Allow", "Action": [ "s3:PutObject", "s3:ListBucket", "s3:DeleteObject" ], "Resource": "*" } ] }

You have now created the new policy to access S3 resources for encryption with a specified name, for example DMS-S3-endpoint-access.

To create an IAM role with this policy

  1. On your IAM console, choose Roles in the navigation pane. The Roles detail page opens.

  2. Choose Create role. The Create role page opens.

  3. With AWS service selected as the trusted entity, choose DMS as the service that will use the role.

  4. Choose Next: Permissions. The Attach permissions policies view appears in the Create role page.

  5. Find and select the role policy you created in the previous procedure (DMS-S3-endpoint-access).

  6. Choose Next: Tags. The Add tags view appears in the Create role page. Here, you can add any tags you want.

  7. Choose Next: Review. The Review view appears in the Create role page. Here, you can verify the results.

  8. If the settings are what you need, enter a name for the role (required, for example, DMS-S3-endpoint-access-role), and any additional description, then choose Create role. The Roles detail page opens with a message indicating that your role has been created.

You have now created the new role to access S3 resources for encryption with a specified name, for example, DMS-S3-endpoint-access-role.

To create a KMS encryption key with a key policy that references your IAM role

  1. On your IAM console, choose Encryption keys in the navigation pane.

    If this is your first time accessing this option, an introduction to the AWS Key Management Service appears and you can choose Get Started Now to begin. Otherwise, the Encryption keys page opens.

  2. Choose Create key. The Create Alias and Description page opens.

  3. Enter an alias for the key (for example, DMS-S3-endpoint-encryption-key) and any additional description.

  4. Choose Advanced Options. In the Advanced Options section, make sure that KMS is chosen for Key Material Origin, then choose Next Step. The Add tags page opens.

  5. Add any tags that you want to help identify the key and track its usage, then choose Next Step. The Define Key Administrative Permissions page opens showing a list of users and roles that you can choose from.

  6. Add the users and roles that you want to manage the key. Make sure that these users and roles have the required permissions to manage the key.

  7. Choose whether key administrators can delete the key, then choose Next Step. The Define Key Usage Permissions page opens showing an additional list of users and roles that you can choose from.

  8. Choose the role that you previously created in Roles to enable access to encrypt S3 target objects (for example, DMS-S3-endpoint-access-role). Then choose Next Step.

    The Preview Key Policy page opens, showing the JSON for the key policy that you can review. Here, you can see where the key policy references the role that you chose in the previous step. You can also see the different key actions permitted for the different principals (users and roles), as shown in the following example.

    { "Id": "key-consolepolicy-3", "Version": "2012-10-17", "Statement": [ { "Sid": "Enable IAM User Permissions", "Effect": "Allow", "Principal": { "AWS": [ "arn:aws:iam::944454115380:root" ] }, "Action": "kms:*", "Resource": "*" }, { "Sid": "Allow access for Key Administrators", "Effect": "Allow", "Principal": { "AWS": [ "arn:aws:iam::944454115380:role/Admin" ] }, "Action": [ "kms:Create*", "kms:Describe*", "kms:Enable*", "kms:List*", "kms:Put*", "kms:Update*", "kms:Revoke*", "kms:Disable*", "kms:Get*", "kms:Delete*", "kms:TagResource", "kms:UntagResource", "kms:ScheduleKeyDeletion", "kms:CancelKeyDeletion" ], "Resource": "*" }, { "Sid": "Allow use of the key", "Effect": "Allow", "Principal": { "AWS": [ "arn:aws:iam::944454115380:role/DMS-S3-endpoint-access-role" ] }, "Action": [ "kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*", "kms:GenerateDataKey*", "kms:DescribeKey" ], "Resource": "*" }, { "Sid": "Allow attachment of persistent resources", "Effect": "Allow", "Principal": { "AWS": [ "arn:aws:iam::944454115380:role/DMS-S3-endpoint-access-role" ] }, "Action": [ "kms:CreateGrant", "kms:ListGrants", "kms:RevokeGrant" ], "Resource": "*", "Condition": { "Bool": { "kms:GrantIsForAWSResource": true } } } ]
  9. Choose Finish. The Encryption keys page opens with a message indicating that your master encryption key has been created.

You have now created a new KMS key with a specified alias (for example, DMS-S3-endpoint-encryption-key). This key enables AWS DMS to encrypt S3 target objects.

Note

You can also create and manage KMS keys using the AWS Key Management Service with IAM roles and policies. For more information, see the AWS Key Management Service Developer Guide.

Endpoint Settings When Using Amazon S3 as a Target for AWS DMS

You can use endpoint settings to configure your Amazon S3 target similar to using extra connection attributes. You can specify these settings when you create the target endpoint using the create-endpoint command in the AWS CLI, with the --s3-settings 'json-settings' option. Here, json-settings is a JSON object containing parameters to specify the settings. You can also specify a .json file containing the same json-settings object, for example, as in the following: --s3-settings file:///your-file-path/my_s3_settings.json. Here, my_s3_settings.json is the name of a .json file that contains the same json-settings object.
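
For example, the contents of my_s3_settings.json might look like the following sketch, where the role ARN, bucket folder, and bucket name are placeholders.

{
    "ServiceAccessRoleArn": "your-service-access-ARN",
    "CsvRowDelimiter": "\n",
    "CsvDelimiter": ",",
    "BucketFolder": "your-bucket-folder",
    "BucketName": "your-bucket-name"
}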

Note that the parameter names for endpoint settings are the same as the names for the equivalent extra connection attributes, except that the parameter names for endpoint settings have initial caps. Note also that not all S3 target endpoint settings that are available as extra connection attributes are available using the --s3-settings option of the create-endpoint command. For more information about the available settings in an AWS CLI call to create-endpoint, see create-endpoint in the AWS CLI Command Reference for AWS DMS. For more information on these settings, see the equivalent extra connection attributes in Extra Connection Attributes When Using Amazon S3 as a Target for AWS DMS.

You can use S3 target endpoint settings to configure the following:

  • A custom KMS key to encrypt your S3 target objects

  • Parquet files as the storage format for S3 target objects

KMS Key Settings for Data Encryption

The following examples show how to configure a custom KMS key to encrypt your S3 target objects. To start, you might make the following create-endpoint call using the AWS CLI.

aws dms create-endpoint --endpoint-identifier s3-target-endpoint --engine-name s3 --endpoint-type target --s3-settings '{"ServiceAccessRoleArn": "your-service-access-ARN", "CsvRowDelimiter": "\n", "CsvDelimiter": ",", "BucketFolder": "your-bucket-folder", "BucketName": "your-bucket-name", "EncryptionMode": "SSE_KMS", "ServerSideEncryptionKmsKeyId": "arn:aws:kms:us-east-1:944454115380:key/72abb6fb-1e49-4ac1-9aed-c803dfcc0480"}'

Here, the JSON object specified by the --s3-settings option defines two parameters. One is an EncryptionMode parameter with the value SSE_KMS. The other is a ServerSideEncryptionKmsKeyId parameter with the value arn:aws:kms:us-east-1:944454115380:key/72abb6fb-1e49-4ac1-9aed-c803dfcc0480. This value is an Amazon Resource Name (ARN) for your custom KMS key. For an S3 target, you also specify additional settings. These identify the service access role, provide delimiters for the default .csv object storage format, and give the bucket folder and bucket name used to store S3 target objects.

By default, S3 data encryption occurs using S3 server-side encryption. For the previous example's S3 target, this is also equivalent to specifying its endpoint settings as in the following example.

aws dms create-endpoint --endpoint-identifier s3-target-endpoint --engine-name s3 --endpoint-type target --s3-settings '{"ServiceAccessRoleArn": "your-service-access-ARN", "CsvRowDelimiter": "\n", "CsvDelimiter": ",", "BucketFolder": "your-bucket-folder", "BucketName": "your-bucket-name", "EncryptionMode": "SSE_S3"}'

For more information about working with S3 server-side encryption, see Protecting Data Using Server-Side Encryption.

Settings for Using .parquet Files to Store S3 Target Objects

The default format for creating S3 target objects is .csv files. The following examples show some endpoint settings for specifying .parquet files as the format for creating S3 target objects. You can specify the .parquet files format with all the defaults, as in the following example.

aws dms create-endpoint --endpoint-identifier s3-target-endpoint --engine-name s3 --endpoint-type target --s3-settings '{"ServiceAccessRoleArn": "your-service-access-ARN", "DataFormat": "parquet"}'

Here, the DataFormat parameter is set to parquet to enable the format with all the S3 defaults. These defaults include a dictionary encoding ("EncodingType": "rle-dictionary") that uses a combination of bit-packing and run-length encoding to more efficiently store repeating values.

You can add additional settings for options other than the defaults as in the following example.

aws dms create-endpoint --endpoint-identifier s3-target-endpoint --engine-name s3 --endpoint-type target --s3-settings '{"ServiceAccessRoleArn": "your-service-access-ARN", "BucketFolder": "your-bucket-folder", "BucketName": "your-bucket-name", "CompressionType": "GZIP", "DataFormat": "parquet", "EncodingType": "plain-dictionary", "DictPageSizeLimit": 3072000, "EnableStatistics": false }'

Here, in addition to parameters for several standard S3 bucket options and the DataFormat parameter, the following additional .parquet file parameters are set:

  • EncodingType – Set to a dictionary encoding (plain-dictionary) that stores values encountered in each column in a per-column chunk of the dictionary page.

  • DictPageSizeLimit – Set to a maximum dictionary page size of 3 MB.

  • EnableStatistics – Disables the default that enables the collection of statistics about Parquet file pages and row groups.

Extra Connection Attributes When Using Amazon S3 as a Target for AWS DMS

You can specify the following options as extra connection attributes. If you have multiple connection attribute settings, separate them from each other by semicolons with no additional white space.
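
For example, the following sketch combines several of the attributes described in this section into a single semicolon-separated string. The specific combination is illustrative only.

addColumnName=true;bucketFolder=testFolder;bucketName=buckettest;compressionType=GZIP;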

Option Description
addColumnName

An optional parameter that, when set to true or y, adds column name information to the .csv output file.

Default value: false

Valid values: true, false, y, n

Example:

addColumnName=true;

bucketFolder

An optional parameter to set a folder name in the S3 bucket. If provided, target objects are created as .csv or .parquet files in the path bucketFolder/schema_name/table_name/. If this parameter isn't specified, then the path used is schema_name/table_name/.

Example:

bucketFolder=testFolder;

bucketName

The name of the S3 bucket where S3 target objects are created as .csv or .parquet files.

Example:

bucketName=buckettest;

cannedAclForObjects

A value that enables AWS DMS to specify a predefined (canned) access control list for objects created in the S3 bucket as .csv or .parquet files. For more information about Amazon S3 canned ACLs, see Canned ACL in the Amazon S3 Developer Guide.

Default value: NONE

Valid values for this attribute are: NONE; PRIVATE; PUBLIC_READ; PUBLIC_READ_WRITE; AUTHENTICATED_READ; AWS_EXEC_READ; BUCKET_OWNER_READ; BUCKET_OWNER_FULL_CONTROL.

Example:

cannedAclForObjects=PUBLIC_READ;

cdcInsertsOnly

An optional parameter during a change data capture (CDC) load to write only INSERT operations to the comma-separated value (.csv) or columnar storage (.parquet) output files. By default (the false setting), the first field in a .csv or .parquet record contains the letter I (INSERT), U (UPDATE), or D (DELETE) to indicate whether the row was inserted, updated, or deleted at the source database for a CDC load to the target. If cdcInsertsOnly is set to true or y, only INSERTs from the source database are migrated to the .csv or .parquet file.

For .csv format only, how these INSERTs are recorded depends on the value of includeOpForFullLoad. If includeOpForFullLoad is set to true, the first field of every CDC record is set to I to indicate the INSERT operation at the source. If includeOpForFullLoad is set to false, every CDC record is written without a first field to indicate the INSERT operation at the source. For more information about how these parameters work together, see Indicating Source DB Operations in Migrated S3 Data.

Note

AWS DMS supports this interaction between cdcInsertsOnly and includeOpForFullLoad in versions 3.1.4 and later.

Note

AWS DMS version 3.1.4 is not yet available.

Default value: false

Valid values: true, false, y, n

Example:

cdcInsertsOnly=true;

includeOpForFullLoad

An optional parameter that, during a full load, records the INSERT operations in the comma-separated value (.csv) output files only.

Note

AWS DMS supports includeOpForFullLoad in versions 3.1.4 and later.

Note

AWS DMS version 3.1.4 is not yet available.

For full load, records can only be inserted. By default (the false setting), there is no information recorded in these output files for a full load to indicate that the rows were inserted at the source database. If includeOpForFullLoad is set to true or y, the INSERT is recorded as an I annotation in the first field of the .csv file.

Note

This parameter works together with cdcInsertsOnly for output to .csv files only. For more information about how these parameters work together, see Indicating Source DB Operations in Migrated S3 Data.

Default value: false

Valid values: true, false, y, n

Example:

includeOpForFullLoad=true;

compressionType

An optional parameter. When set to GZIP, it uses GZIP to compress the target .csv or .parquet files. When this parameter is set to the default (NONE), the files remain uncompressed.

Default value: NONE

Valid values: GZIP or NONE

Example:

compressionType=GZIP;

csvDelimiter

The delimiter used to separate columns in the .csv output files. The default is a comma (,).

Example:

csvDelimiter=,;

csvRowDelimiter

The delimiter used to separate rows in the .csv output files. The default is a newline (\n).

Example:

csvRowDelimiter=\n;

maxFileSize

A value that specifies the maximum size (in KB) of any .csv file created while migrating to an S3 target during full load.

Default value: 1048576 KB (1 GB)

Valid values: 1–1048576

Example:

maxFileSize=512;

rfc4180

An optional parameter that sets behavior to comply with RFC 4180 for data migrated to Amazon S3 using the .csv file format only. When this value is set to true or y using Amazon S3 as a target, if the data has quotation marks or newline characters in it, AWS DMS encloses the entire column with an additional pair of double quotation marks ("). Every quotation mark within the data is repeated twice. This formatting complies with RFC 4180.
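
For example, with this setting a source value such as Bob said "hello" (an illustrative value) is written to the .csv file as follows.

"Bob said ""hello"""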

Default value: true

Valid values: true, false, y, n

Example:

rfc4180=false;

encryptionMode

The server-side encryption mode that you want to use to encrypt your .csv or .parquet object files copied to S3. The valid values are SSE_S3 (S3 server-side encryption) or SSE_KMS (KMS key encryption). If you choose SSE_KMS, set the serverSideEncryptionKmsKeyId parameter to the Amazon Resource Name (ARN) for the KMS key to be used for encryption.

Default value: SSE_S3

Valid values: SSE_S3 or SSE_KMS

Example:

encryptionMode=SSE_S3;

serverSideEncryptionKmsKeyId

If you set encryptionMode to SSE_KMS, set this parameter to the Amazon Resource Name (ARN) for the KMS key. You can find this ARN by selecting the key alias in the list of KMS keys created for your account. When you create the key, you must associate specific policies and roles with it. For more information, see Creating AWS KMS Keys to Encrypt Amazon S3 Target Objects.

Example:

serverSideEncryptionKmsKeyId=arn:aws:kms:us-east-1:944454115380:key/72abb6fb-1e49-4ac1-9aed-c803dfcc0480;

dataFormat

The output format for the files that AWS DMS uses to create S3 objects. For Amazon S3 targets, AWS DMS supports either .csv or .parquet files. The .parquet files have a binary columnar storage format with efficient compression options and faster query performance. For more information about .parquet files, see https://parquet.apache.org/.

Default value: csv

Valid values: csv or parquet

Example:

dataFormat=parquet;

encodingType

The Parquet encoding type. The encoding type options include the following:

  • rle-dictionary – This dictionary encoding uses a combination of bit-packing and run-length encoding to more efficiently store repeating values.

  • plain – No encoding.

  • plain-dictionary – This dictionary encoding builds a dictionary of values encountered in a given column. The dictionary is stored in a dictionary page for each column chunk.

Default value: rle-dictionary

Valid values: rle-dictionary, plain, or plain-dictionary

Example:

encodingType=plain-dictionary;

dictPageSizeLimit

The maximum allowed size, in bytes, for a dictionary page in a .parquet file. If a dictionary page exceeds this value, the page uses plain encoding.

Default value: 1,024,000 (1 MB)

Valid values: Any valid integer value

Example:

dictPageSizeLimit=2,048,000;

rowGroupLength

The number of rows in one row group of a .parquet file.

Default value: 10,024

Valid values: Any valid integer value

Example:

rowGroupLength=20,048;

dataPageSize

The maximum allowed size, in bytes, for a data page in a .parquet file.

Default value: 1,024,000 (1 MB)

Valid values: Any valid integer value

Example:

dataPageSize=2,048,000;

parquetVersion

The version of the .parquet file format.

Default value: PARQUET_1_0

Valid values: PARQUET_1_0 or PARQUET_2_0

Example:

parquetVersion=PARQUET_2_0;

enableStatistics

Set to true or y to enable statistics about .parquet file pages and row groups.

Default value: true

Valid values: true, false, y, n

Example:

enableStatistics=false;

timestampColumnName

An optional parameter to include a timestamp column in the S3 target endpoint data.

Note

AWS DMS supports timestampColumnName in versions 3.1.4 and later.

AWS DMS includes an additional STRING column in the .csv or .parquet object files of your migrated data when you set timestampColumnName to a nonblank value.

For a full load, each row of this timestamp column contains a timestamp for when the data was transferred from the source to the target by DMS.

For a CDC load, each row of the timestamp column contains the timestamp for the commit of that row in the source database.

The string format for this timestamp column value is yyyy-MM-dd HH:mm:ss.SSSSSS. By default, the precision of this value is in microseconds. For a CDC load, the rounding of the precision depends on the commit timestamp supported by DMS for the source database.
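
For example, a value in this column might look like the following (an illustrative value).

2018-11-05 16:22:33.123456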

When the addColumnName parameter is set to true, DMS also includes the name for the timestamp column that you set as the nonblank value of timestampColumnName.

Example:

timestampColumnName=TIMESTAMP;

parquetTimestampInMillisecond

An optional parameter that specifies the precision of any TIMESTAMP column values written to an S3 object file in .parquet format.

Note

AWS DMS supports parquetTimestampInMillisecond in versions 3.1.4 and later.

When this attribute is set to true or y, AWS DMS writes all TIMESTAMP columns in a .parquet formatted file with millisecond precision. Otherwise, DMS writes them with microsecond precision.

Currently, Amazon Athena and AWS Glue can handle only millisecond precision for TIMESTAMP values. Set this attribute to true for .parquet formatted S3 endpoint object files only if you plan to query or process the data with Athena or AWS Glue.

Note

  • AWS DMS writes any TIMESTAMP column values to an S3 file in .csv format with microsecond precision.

  • The setting of this attribute has no effect on the string format of the timestamp column value inserted by setting the timestampColumnName attribute.

Default value: false

Valid values: true, false, y, n

Example:

parquetTimestampInMillisecond=true;

Indicating Source DB Operations in Migrated S3 Data

When AWS DMS migrates records to an S3 target, it can create an additional field in each migrated record. This additional field indicates the operation applied to the record at the source database.

For a full load when includeOpForFullLoad is true and the output format is .csv, DMS always creates an additional first field in each .csv record. This field contains the letter I (INSERT) to indicate that the row was inserted at the source database. For a CDC load when cdcInsertsOnly is false (the default), DMS also always creates an additional first field in each .csv or .parquet record. This field contains the letter I (INSERT), U (UPDATE), or D (DELETE) to indicate whether the row was inserted, updated, or deleted at the source database.

When the output format is .csv only, if and how DMS creates and sets this first field also depends on the settings of both includeOpForFullLoad and cdcInsertsOnly together. The following list shows how the settings of these attributes work together to affect the setting of migrated records in this format.

Note

AWS DMS supports includeOpForFullLoad and the interaction described here between includeOpForFullLoad and cdcInsertsOnly in versions 3.1.4 and later.

Note

AWS DMS version 3.1.4 is not yet available.

  • includeOpForFullLoad = true and cdcInsertsOnly = true – For full load: added first field value set to I. For CDC load: added first field value set to I.

  • includeOpForFullLoad = false and cdcInsertsOnly = false – For full load: no added field. For CDC load: added first field value set to I, U, or D.

  • includeOpForFullLoad = false and cdcInsertsOnly = true – For full load: no added field. For CDC load: no added field.

  • includeOpForFullLoad = true and cdcInsertsOnly = false – For full load: added first field value set to I. For CDC load: added first field value set to I, U, or D.

When includeOpForFullLoad and cdcInsertsOnly are set to the same value, the target records are set according to the attribute that controls record settings for the current migration type. That attribute is includeOpForFullLoad for full load and cdcInsertsOnly for CDC load.

When includeOpForFullLoad and cdcInsertsOnly are set to different values, AWS DMS makes the target record settings consistent for both CDC and full load. It does this by making the record settings for a CDC load conform to the record settings for any earlier full load specified by includeOpForFullLoad.

In other words, suppose that a full load is set to add a first field to indicate an inserted record. In this case, a following CDC load is set to add a first field that indicates an inserted, updated, or deleted record as appropriate at the source. In contrast, suppose that a full load is set to not add a first field to indicate an inserted record. In this case, a CDC load is also set to not add a first field to each record regardless of its corresponding record operation at the source.
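
For example, using the EMPLOYEE rows shown earlier in this section, a full load record written with an added first field looks like the first of the following lines, and the same record written without an added field looks like the second (illustrative values only).

I,101,Smith,Bob,4-Jun-14,New York
101,Smith,Bob,4-Jun-14,New York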