AWS Database Migration Service
User Guide (API Version 2016-01-01)


Using an Amazon Redshift Database as a Target for AWS Database Migration Service

You can migrate data to Amazon Redshift databases using AWS Database Migration Service. Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. With an Amazon Redshift database as a target, you can migrate data from all of the other supported source databases.

Note

The Amazon Redshift cluster must be in the same AWS account and the same AWS Region as the replication instance.

During a database migration to Amazon Redshift, AWS DMS first moves data to an Amazon S3 bucket. When the files reside in the bucket, AWS DMS then transfers them to the proper tables in the Amazon Redshift data warehouse. AWS DMS creates the S3 bucket in the same AWS Region as the Amazon Redshift database. The AWS DMS replication instance must be located in that same AWS Region.

If you use the AWS Command Line Interface (AWS CLI) or the DMS API to migrate data to Amazon Redshift, you must set up an AWS Identity and Access Management (IAM) role to allow S3 access. For more information about creating this IAM role, see Creating the IAM Roles to Use With the AWS CLI and AWS DMS API.
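If you're not sure whether a suitable role already exists in your AWS account, you can check from the AWS CLI. The following is a minimal sketch, assuming the dms-access-for-endpoint role name used elsewhere in this guide:

aws iam get-role --role-name dms-access-for-endpoint

If the command returns a NoSuchEntity error, create the role as described in the linked topic.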

The Amazon Redshift endpoint provides full automation for the following:

  • Schema generation and data type mapping

  • Full load of source database tables

  • Incremental load of changes made to source tables

  • Application of schema changes in data definition language (DDL) made to the source tables

  • Synchronization between full load and change data capture (CDC) processes

AWS Database Migration Service supports both full-load and change processing operations. AWS DMS reads the data from the source database and creates a series of comma-separated value (.csv) files. For full-load operations, AWS DMS creates files for each table and copies them to a separate folder for each table in Amazon S3. When the files are uploaded to Amazon S3, AWS DMS sends a COPY command, and the data in the files is copied into Amazon Redshift. For change-processing operations, AWS DMS copies the net changes to .csv files. AWS DMS then uploads the net change files to Amazon S3 and copies the data to Amazon Redshift.
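As an illustration of this flow, a full load leaves per-table files in the intermediate bucket that you can inspect with the AWS CLI. The bucket, folder, schema, and table names below are placeholders, and the LOAD-prefixed file naming reflects typical AWS DMS full-load output:

aws s3 ls s3://your-dms-bucket/your-task-folder/your-schema/your-table/

Example output (hypothetical):
2016-01-01 12:00:00   1048576 LOAD00000001.csv
2016-01-01 12:00:05   1048576 LOAD00000002.csv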

For additional details on working with Amazon Redshift as a target for AWS DMS, see the following sections:

Prerequisites for Using an Amazon Redshift Database as a Target for AWS Database Migration Service

The following list describes the prerequisites necessary for working with Amazon Redshift as a target for data migration:

  • Use the AWS Management Console to launch an Amazon Redshift cluster. Note the basic information about your AWS account and your Amazon Redshift cluster, such as your user name, password, and database name. You need these values when creating the Amazon Redshift target endpoint.

  • The Amazon Redshift cluster must be in the same AWS account and the same AWS Region as the replication instance.

  • The AWS DMS replication instance needs network connectivity to the Amazon Redshift endpoint (hostname and port) that your cluster uses. You can verify this connectivity with the sketch that follows this list.

  • AWS DMS uses an Amazon S3 bucket to transfer data to the Amazon Redshift database. For AWS DMS to create the bucket, the DMS console uses an IAM role, dms-access-for-endpoint. If you use the AWS CLI or DMS API to create a database migration with Amazon Redshift as the target database, you must create this IAM role. For more information about creating this role, see Creating the IAM Roles to Use With the AWS CLI and AWS DMS API.

  • AWS DMS converts BLOBs, CLOBs, and NCLOBs to a VARCHAR on the target Amazon Redshift instance. Amazon Redshift doesn't support VARCHAR data types larger than 64 KB, so you can't store traditional LOBs on Amazon Redshift.
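Once your replication instance and Amazon Redshift target endpoint exist, you can verify the network connectivity called out in this list. A minimal sketch using the AWS CLI, with placeholder ARNs:

aws dms test-connection \
    --replication-instance-arn your-replication-instance-ARN \
    --endpoint-arn your-redshift-endpoint-ARN

aws dms describe-connections \
    --filters Name=endpoint-arn,Values=your-redshift-endpoint-ARN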

Limitations on Using Amazon Redshift as a Target for AWS Database Migration Service

When using an Amazon Redshift database as a target, AWS DMS doesn't support the following:

  • The following DDL statement:

    ALTER TABLE <table name> MODIFY COLUMN <column name> <data type>;
  • AWS DMS can't migrate or replicate changes to a schema with a name that begins with an underscore (_). If you have such schemas, use mapping transformations to rename the schema on the target (see the sketch that follows this list).

  • Amazon Redshift doesn't support VARCHARs larger than 64 KB. LOBs from traditional databases can't be stored in Amazon Redshift.
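As mentioned in the schema-name limitation above, you can work around the underscore restriction with a table-mapping transformation rule. The following sketch renames a placeholder schema _myschema to myschema on the target; supply it in the task's table mappings:

{
    "rules": [
        {
            "rule-type": "transformation",
            "rule-id": "1",
            "rule-name": "rename-underscore-schema",
            "rule-target": "schema",
            "object-locator": { "schema-name": "_myschema" },
            "rule-action": "rename",
            "value": "myschema"
        }
    ]
}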

Configuring an Amazon Redshift Database as a Target for AWS Database Migration Service

AWS Database Migration Service must be configured to work with the Amazon Redshift instance. The following table describes the configuration properties available for the Amazon Redshift endpoint.

  • server – The name of the Amazon Redshift cluster you are using.

  • port – The port number for Amazon Redshift. The default value is 5439.

  • username – An Amazon Redshift user name for a registered user.

  • password – The password for the user named in the username property.

  • database – The name of the Amazon Redshift data warehouse (service) you are working with.

If you want to add extra connection string attributes to your Amazon Redshift endpoint, you can specify the maxFileSize and fileTransferUploadStreams attributes. For more information on these attributes, see Extra Connection Attributes When Using Amazon Redshift as a Target for AWS DMS.
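For example, a create-endpoint call that sets both attributes might look like the following sketch; every credential and server value shown is a placeholder:

aws dms create-endpoint --endpoint-identifier redshift-target-endpoint \
    --endpoint-type target --engine-name redshift \
    --username your-username --password your-password \
    --server-name your-server-name --port 5439 --database-name your-db-name \
    --extra-connection-attributes "maxFileSize=512;fileTransferUploadStreams=20"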

Using Enhanced VPC Routing with Amazon Redshift as a Target for AWS Database Migration Service

If you use Enhanced VPC Routing with your Amazon Redshift target, all COPY traffic between your Amazon Redshift cluster and your data repositories goes through your VPC. Because Enhanced VPC Routing affects the way that Amazon Redshift accesses other resources, COPY commands might fail if you haven't configured your VPC correctly.

AWS DMS can be affected by this behavior because it uses the COPY command to move data from S3 to an Amazon Redshift cluster.

Following are the steps AWS DMS takes to load data into an Amazon Redshift target:

  1. AWS DMS copies data from the source to .csv files on the replication server.

  2. AWS DMS uses the AWS SDK to copy the .csv files into an S3 bucket on your account.

  3. AWS DMS then uses the COPY command in Amazon Redshift to copy data from the .csv files in S3 to an appropriate table in Amazon Redshift.

If Enhanced VPC Routing isn't enabled, Amazon Redshift routes traffic through the internet, including traffic to other services within the AWS network, and you don't have to configure a network path. If the feature is enabled, you must specifically create a network path between your cluster's VPC and your data resources. For more information on the configuration required, see Enhanced VPC Routing in the Amazon Redshift documentation.
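To check whether Enhanced VPC Routing is enabled on your cluster, and to enable it, you can use the Amazon Redshift CLI. A minimal sketch with a placeholder cluster identifier:

aws redshift describe-clusters --cluster-identifier your-cluster-id \
    --query "Clusters[0].EnhancedVpcRouting"

aws redshift modify-cluster --cluster-identifier your-cluster-id \
    --enhanced-vpc-routing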

Creating and Using AWS KMS Keys to Encrypt Amazon Redshift Target Data

You can encrypt your target data pushed to Amazon S3 before it is copied to Amazon Redshift. To do so, you can create and use custom AWS Key Management Service (AWS KMS) keys. You specify the key you created, through either endpoint settings or extra connection attributes, when you create the Amazon Redshift target endpoint.

To encrypt Amazon Redshift target data using a KMS key, you need an AWS Identity and Access Management (IAM) role that has permissions to access Amazon Redshift data. This IAM role is then referenced in a key policy attached to the encryption key that you create. You can do this in your IAM console by creating the following:

  • An IAM role with an Amazon-managed policy.

  • A KMS encryption key with a key policy that references this role.

The following procedures describe how to do this.

To create an IAM role with the required Amazon-managed policy

  1. Open the IAM console at https://console.aws.amazon.com/iam/.

  2. In the navigation pane, choose Roles. The Roles page opens.

  3. Choose Create role. The Create role page opens.

  4. With AWS service chosen as the trusted entity, choose DMS as the service to use the role.

  5. Choose Next: Permissions. The Attach permissions policies page appears.

  6. Find and select the AmazonDMSRedshiftS3Role policy.

  7. Choose Next: Tags. The Add tags page appears. Here, you can add any tags you want.

  8. Choose Next: Review and review your results.

  9. If the settings are what you need, enter a name for the role (for example, DMS-Redshift-endpoint-access-role), and any additional description, then choose Create role. The Roles page opens with a message indicating that your role has been created.

You have now created the new role to access Amazon Redshift resources for encryption with a specified name, for example DMS-Redshift-endpoint-access-role.
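If you work from the AWS CLI instead of the console, attaching the same Amazon-managed policy to an existing role is a one-line sketch. The role name matches the example above, and the policy ARN assumes the service-role path used by Amazon-managed DMS policies:

aws iam attach-role-policy --role-name DMS-Redshift-endpoint-access-role \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonDMSRedshiftS3Role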

To create a KMS encryption key with a key policy that references your IAM role

  1. On your IAM console, choose Encryption keys in the navigation pane.

    If this is your first time accessing this option, an introduction to the AWS Key Management Service appears and you can choose Get Started Now to begin. Otherwise, the Encryption keys page opens.

  2. Choose Create key. The Create Alias and Description page opens.

  3. Enter an alias for the key (for example, DMS-Redshift-endpoint-encryption-key) and any additional description.

  4. Choose Advanced Options. In the Advanced Options section, make sure that KMS is chosen for Key Material Origin, then choose Next Step. The Add tags page opens.

  5. Add any tags that you want to help identify the key and track its usage, then choose Next Step. The Define Key Administrative Permissions page opens showing a list of users and roles that you can choose from.

  6. Add the users and roles that you want to manage the key. Make sure that these users and roles have the required permissions to manage the key.

  7. Choose whether key administrators can delete the key, then choose Next Step. The Define Key Usage Permissions page opens showing an additional list of users and roles that you can choose from.

  8. Choose the role that you previously created in Roles to enable access to encrypt Amazon Redshift target data, for example DMS-Redshift-endpoint-access-role. Then choose Next Step.

    The Preview Key Policy page opens, showing the JSON for the key policy that you can review. Here, you can see where the key policy references the role that you chose in the previous step. You can also see the different key actions permitted for the different principals (users and roles), as shown in the following example.

    { "Id": "key-consolepolicy-3", "Version": "2012-10-17", "Statement": [ { "Sid": "Enable IAM User Permissions", "Effect": "Allow", "Principal": { "AWS": [ "arn:aws:iam::944454115380:root" ] }, "Action": "kms:*", "Resource": "*" }, { "Sid": "Allow access for Key Administrators", "Effect": "Allow", "Principal": { "AWS": [ "arn:aws:iam::944454115380:role/Admin" ] }, "Action": [ "kms:Create*", "kms:Describe*", "kms:Enable*", "kms:List*", "kms:Put*", "kms:Update*", "kms:Revoke*", "kms:Disable*", "kms:Get*", "kms:Delete*", "kms:TagResource", "kms:UntagResource", "kms:ScheduleKeyDeletion", "kms:CancelKeyDeletion" ], "Resource": "*" }, { "Sid": "Allow use of the key", "Effect": "Allow", "Principal": { "AWS": [ "arn:aws:iam::944454115380:role/DMS-Redshift-endpoint-access-role" ] }, "Action": [ "kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*", "kms:GenerateDataKey*", "kms:DescribeKey" ], "Resource": "*" }, { "Sid": "Allow attachment of persistent resources", "Effect": "Allow", "Principal": { "AWS": [ "arn:aws:iam::944454115380:role/DMS-Redshift-endpoint-access-role" ] }, "Action": [ "kms:CreateGrant", "kms:ListGrants", "kms:RevokeGrant" ], "Resource": "*", "Condition": { "Bool": { "kms:GrantIsForAWSResource": true } } } ] }
  9. Choose Finish. The Encryption keys page opens with a message indicating that your master encryption key has been created.

You have now created a new KMS key with a specified alias (for example, DMS-Redshift-endpoint-encryption-key). This key enables AWS DMS to encrypt Amazon Redshift target data.
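The same key can also be created from the AWS CLI. The following sketch assumes a my-key-policy.json file containing a key policy like the JSON shown in step 8, and uses the alias from the example above:

aws kms create-key --description "DMS Redshift target encryption key" \
    --policy file://my-key-policy.json

aws kms create-alias --alias-name alias/DMS-Redshift-endpoint-encryption-key \
    --target-key-id your-key-id-from-create-key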

Note

You can also create and manage KMS keys using the AWS Key Management Service with IAM roles and policies. For more information, see the AWS Key Management Service Developer Guide.

Endpoint Settings When Using Amazon Redshift as a Target for AWS DMS

You can use endpoint settings to configure your Amazon Redshift target, similar to using extra connection attributes. You specify these settings when you create the target endpoint using the create-endpoint command in the AWS CLI, with the --redshift-settings "json-settings" option. Here, json-settings is a JSON object containing parameters that specify the settings. You can also specify a .json file containing the same json-settings object, for example: --redshift-settings file:///your-file-path/my_redshift_settings.json.

The parameter names for endpoint settings are the same as the names for the equivalent extra connection attributes, except that the parameter names for endpoint settings have initial caps. Also, not all Amazon Redshift target endpoint settings available as extra connection attributes are available using the --redshift-settings option of the create-endpoint command. For more information about the available settings in an AWS CLI call to create-endpoint, see create-endpoint in the AWS CLI Command Reference for AWS DMS. For more information on these settings, see the equivalent extra connection attributes in Extra Connection Attributes When Using Amazon Redshift as a Target for AWS DMS.
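For example, a my_redshift_settings.json file that sets the endpoint-setting equivalents of the maxFileSize and fileTransferUploadStreams attributes might contain the following. The values shown are illustrative:

{
    "MaxFileSize": 512,
    "FileTransferUploadStreams": 20
}

You would then pass this file using --redshift-settings file:///your-file-path/my_redshift_settings.json on the create-endpoint command, as described above.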

You can use Amazon Redshift target endpoint settings to configure the following:

  • A custom AWS KMS data encryption key. You can then use this key to encrypt your data pushed to Amazon S3 before it is copied to Amazon Redshift.

  • A custom S3 bucket as intermediate storage for data migrated to Amazon Redshift.

KMS Key Settings for Data Encryption

The following examples show how to configure a custom KMS key to encrypt your data pushed to S3. To start, you might make the following create-endpoint call using the AWS CLI.

aws dms create-endpoint --endpoint-identifier redshift-target-endpoint \
    --endpoint-type target --engine-name redshift \
    --username your-username --password your-password \
    --server-name your-server-name --port 5439 --database-name your-db-name \
    --redshift-settings '{"EncryptionMode": "SSE_KMS", "ServerSideEncryptionKmsKeyId": "arn:aws:kms:us-east-1:944454115380:key/24c3c5a1-f34a-4519-a85b-2debbef226d1"}'

Here, the JSON object specified by the --redshift-settings option defines two parameters. One is an EncryptionMode parameter with the value SSE_KMS. The other is a ServerSideEncryptionKmsKeyId parameter with the value arn:aws:kms:us-east-1:944454115380:key/24c3c5a1-f34a-4519-a85b-2debbef226d1. This value is the Amazon Resource Name (ARN) for your custom KMS key.

By default, S3 data encryption uses S3 server-side encryption. For the previous example's Amazon Redshift target, this default is equivalent to specifying its endpoint settings as in the following example.

aws dms create-endpoint --endpoint-identifier redshift-target-endpoint \
    --endpoint-type target --engine-name redshift \
    --username your-username --password your-password \
    --server-name your-server-name --port 5439 --database-name your-db-name \
    --redshift-settings '{"EncryptionMode": "SSE_S3"}'

For more information about working with S3 server-side encryption, see Protecting Data Using Server-Side Encryption.

Amazon S3 Bucket Settings

When you migrate data to an Amazon Redshift target endpoint, AWS DMS uses a default Amazon S3 bucket as intermediate task storage before copying the migrated data to Amazon Redshift. For example, the examples shown for creating an Amazon Redshift target endpoint with a KMS data encryption key use this default S3 bucket (see KMS Key Settings for Data Encryption).

You can instead specify a custom S3 bucket for this intermediate storage by including the following parameters in the value of your --redshift-settings option on the AWS CLI create-endpoint command:

  • BucketName – A string you specify as the name of the S3 bucket storage.

  • BucketFolder – (Optional) A string you can specify as the name of the storage folder in the specified S3 bucket.

  • ServiceAccessRoleArn – The ARN of an IAM role that permits administrative access to the S3 bucket. Typically, you create this role based on the AmazonDMSRedshiftS3Role policy. For an example, see the procedure to create an IAM role with the required Amazon-managed policy in Creating and Using AWS KMS Keys to Encrypt Amazon Redshift Target Data.

    Note

    If you specify the ARN of a different IAM role using the --service-access-role-arn option of the create-endpoint command, this IAM role option takes precedence.

The following example shows how you might use these parameters to specify a custom Amazon S3 bucket in the following create-endpoint call using the AWS CLI.

aws dms create-endpoint --endpoint-identifier redshift-target-endpoint \
    --endpoint-type target --engine-name redshift \
    --username your-username --password your-password \
    --server-name your-server-name --port 5439 --database-name your-db-name \
    --redshift-settings '{"ServiceAccessRoleArn": "your-service-access-ARN", "BucketName": "your-bucket-name", "BucketFolder": "your-bucket-folder-name"}'

Extra Connection Attributes When Using Amazon Redshift as a Target for AWS DMS

You can use extra connection attributes to configure your Amazon Redshift target. You specify these settings when you create the target endpoint. If you have multiple connection attribute settings, separate them from each other by semicolons with no additional white space.
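For example, to set several of the attributes described below at once, you might specify an attribute string like this when you create the endpoint:

acceptanydate=true;timeformat=auto;truncateColumns=true;trimBlanks=true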

The following list describes the extra connection attributes available when Amazon Redshift is the target.

maxFileSize

Specifies the maximum size (in KB) of any .csv file used to transfer data to Amazon Redshift.

Default value: 32768 KB (32 MB)

Valid values: 1–1048576

Example: maxFileSize=512

fileTransferUploadStreams

Specifies the number of threads used to upload a single file.

Default value: 10

Valid values: 1–64

Example: fileTransferUploadStreams=20

acceptanydate

Specifies whether any date format is accepted, including invalid date formats such as 0000-00-00. Boolean value.

Default value: false

Valid values: true | false

Example: acceptanydate=true

dateformat

Specifies the date format. This is a string input and is empty by default. The default format is YYYY-MM-DD but you can change it to, for example, DD-MM-YYYY. If your date or time values use different formats, use the auto argument with the dateformat parameter. The auto argument recognizes several formats that are not supported when using a dateformat string. The auto keyword is case-sensitive.

Default value: empty

Valid values: 'dateformat_string' or auto

Example: dateformat=auto

timeformat

Specifies the time format. This is a string input and is empty by default. The auto argument recognizes several formats that aren't supported when using a timeformat string. If your date and time values use formats different from each other, use the auto argument with the timeformat parameter.

Default value: empty

Valid values: 'timeformat_string' | 'auto' | 'epochsecs' | 'epochmillisecs'

Example: timeformat=auto

emptyasnull

Specifies whether AWS DMS should migrate empty CHAR and VARCHAR fields as null. A value of true sets empty CHAR and VARCHAR fields as null.

Default value: false

Valid values: true | false

Example: emptyasnull=true

truncateColumns

Truncates data in columns to the appropriate number of characters so that it fits the column specification. Applies only to columns with a VARCHAR or CHAR data type, and rows 4 MB or less in size.

Default value: false

Valid values: true | false

Example:

truncateColumns=true;

removeQuotes

Removes surrounding quotation marks from strings in the incoming data. All characters within the quotation marks, including delimiters, are retained. For more information about removing quotes for an Amazon Redshift target, see the Amazon Redshift Database Developer Guide.

Default value: false

Valid values: true | false

Example:

removeQuotes=true;

trimBlanks

Removes the trailing white space characters from a VARCHAR string. This parameter applies only to columns with a VARCHAR data type.

Default value: false

Valid values: true | false

Example:

trimBlanks=false;

encryptionMode

Specifies the server-side encryption mode that you want to use to push your data to S3 before it is copied to Amazon Redshift. The valid values are SSE_S3 (S3 server-side encryption) or SSE_KMS (KMS key encryption). If you choose SSE_KMS, set the serverSideEncryptionKmsKeyId parameter to the Amazon Resource Name (ARN) for the KMS key to be used for encryption.

Default value: SSE_S3

Valid values: SSE_S3 or SSE_KMS

Example:

encryptionMode=SSE_S3;

serverSideEncryptionKmsKeyId

If you set encryptionMode to SSE_KMS, set this parameter to the ARN for the KMS key. You can find this ARN by selecting the key alias in the list of KMS keys created for your account. When you create the key, you must associate specific policies and roles with it. For more information, see Creating and Using AWS KMS Keys to Encrypt Amazon Redshift Target Data.

Example:

serverSideEncryptionKmsKeyId=arn:aws:kms:us-east-1:944454115380:key/24c3c5a1-f34a-4519-a85b-2debbef226d1;

Target Data Types for Amazon Redshift

The Amazon Redshift endpoint for AWS DMS supports most Amazon Redshift data types. The following list shows the Amazon Redshift target data types that are supported when using AWS DMS and the default mapping from AWS DMS data types.

For additional information about AWS DMS data types, see Data Types for AWS Database Migration Service.

AWS DMS Data Type – Amazon Redshift Data Type

  • BOOLEAN – BOOL

  • BYTES – VARCHAR (Length)

  • DATE – DATE

  • TIME – VARCHAR (20)

  • DATETIME – If the scale is 0–6, TIMESTAMP (s). If the scale is 7–9, VARCHAR (37).

  • INT1 – INT2

  • INT2 – INT2

  • INT4 – INT4

  • INT8 – INT8

  • NUMERIC – If the scale is 0–37, NUMERIC (p,s). If the scale is 38–127, VARCHAR (Length).

  • REAL4 – FLOAT4

  • REAL8 – FLOAT8

  • STRING – If the length is 1–65,535, VARCHAR (length in bytes). If the length is 65,536–2,147,483,647, VARCHAR (65535).

  • UINT1 – INT2

  • UINT2 – INT2

  • UINT4 – INT4

  • UINT8 – NUMERIC (20,0)

  • WSTRING – If the length is 1–65,535, NVARCHAR (length in bytes). If the length is 65,536–2,147,483,647, NVARCHAR (65535).

  • BLOB – VARCHAR (maximum LOB size * 2). The maximum LOB size cannot exceed 31 KB. Amazon Redshift doesn't support VARCHARs larger than 64 KB.

  • NCLOB – NVARCHAR (maximum LOB size). The maximum LOB size cannot exceed 63 KB. Amazon Redshift doesn't support VARCHARs larger than 64 KB.

  • CLOB – VARCHAR (maximum LOB size). The maximum LOB size cannot exceed 63 KB. Amazon Redshift doesn't support VARCHARs larger than 64 KB.