Using an Amazon Redshift database as a target for AWS Database Migration Service

You can migrate data to Amazon Redshift databases using AWS Database Migration Service. Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. With an Amazon Redshift database as a target, you can migrate data from all of the other supported source databases.

You can use Amazon Redshift Serverless as a target for AWS DMS. For more information, see Using AWS DMS with Amazon Redshift Serverless as a Target following.

The Amazon Redshift cluster must be in the same AWS account and same AWS Region as the replication instance.

During a database migration to Amazon Redshift, AWS DMS first moves data to an Amazon S3 bucket. When the files reside in an Amazon S3 bucket, AWS DMS then transfers them to the proper tables in the Amazon Redshift data warehouse. AWS DMS creates the S3 bucket in the same AWS Region as the Amazon Redshift database. The AWS DMS replication instance must be located in that same AWS Region.

If you use the AWS CLI or DMS API to migrate data to Amazon Redshift, set up an AWS Identity and Access Management (IAM) role to allow S3 access. For more information about creating this IAM role, see Creating the IAM roles to use with the AWS CLI and AWS DMS API.

The Amazon Redshift endpoint provides full automation for the following:

  • Schema generation and data type mapping

  • Full load of source database tables

  • Incremental load of changes made to source tables

  • Application of schema changes in data definition language (DDL) made to the source tables

  • Synchronization between full load and change data capture (CDC) processes

AWS Database Migration Service supports both full load and change processing operations. AWS DMS reads the data from the source database and creates a series of comma-separated value (.csv) files. For full-load operations, AWS DMS creates files for each table. AWS DMS then copies the files for each table to a separate folder in Amazon S3. When the files are uploaded to Amazon S3, AWS DMS sends a copy command and the data in the files is copied into Amazon Redshift. For change-processing operations, AWS DMS copies the net changes to the .csv files. AWS DMS then uploads the net change files to Amazon S3 and copies the data to Amazon Redshift.
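
To give a concrete sense of the load step, the following is a sketch of a Redshift COPY command that loads one such .csv file from Amazon S3. The table, bucket, folder, file, and role names are hypothetical; AWS DMS constructs and runs the actual command for you.

COPY myschema.mytable
FROM 's3://dms-intermediate-bucket/redshift-target-endpoint/myschema/mytable/LOAD00000001.csv'
IAM_ROLE 'arn:aws:iam::111122223333:role/dms-access-for-endpoint'
FORMAT AS CSV;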

For additional details on working with Amazon Redshift as a target for AWS DMS, see the following sections:

Prerequisites for using an Amazon Redshift database as a target for AWS Database Migration Service

The following list describes the prerequisites necessary for working with Amazon Redshift as a target for data migration:

  • Use the AWS Management Console to launch an Amazon Redshift cluster. Note the basic information about your AWS account and your Amazon Redshift cluster, such as your password, user name, and database name. You need these values when creating the Amazon Redshift target endpoint.

  • The Amazon Redshift cluster must be in the same AWS account and the same AWS Region as the replication instance.

  • The AWS DMS replication instance needs network connectivity to the Amazon Redshift endpoint (hostname and port) that your cluster uses.

  • AWS DMS uses an Amazon S3 bucket to transfer data to the Amazon Redshift database. For AWS DMS to create the bucket, the console uses an IAM role, dms-access-for-endpoint. If you use the AWS CLI or DMS API to create a database migration with Amazon Redshift as the target database, you must create this IAM role. For more information about creating this role, see Creating the IAM roles to use with the AWS CLI and AWS DMS API.

  • AWS DMS converts BLOBs, CLOBs, and NCLOBs to a VARCHAR on the target Amazon Redshift instance. Amazon Redshift doesn't support VARCHAR data types larger than 64 KB, so you can't store traditional LOBs on Amazon Redshift.

  • Set the target metadata task setting BatchApplyEnabled to true for AWS DMS to handle changes to Amazon Redshift target tables during CDC. Both the source and target tables must have a primary key. Without a primary key, changes are applied statement by statement, which can adversely affect task performance during CDC by causing target latency and impacting the cluster commit queue.
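
For example, a minimal task-settings fragment that turns on batch apply might look like the following sketch. It assumes the standard location of BatchApplyEnabled in the TargetMetadata section of the task settings JSON.

{
  "TargetMetadata": {
    "BatchApplyEnabled": true
  }
}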

Privileges required for using Redshift as a target

Use the GRANT command to define access privileges for a user or user group. Privileges include access options such as being able to read data in tables and views, write data, and create tables. For more information about using GRANT with Amazon Redshift, see GRANT in the Amazon Redshift Database Developer Guide.

The following is the syntax to grant specific table, database, schema, function, procedure, or language-level privileges on Amazon Redshift tables and views.

GRANT { { SELECT | INSERT | UPDATE | DELETE | REFERENCES } [,...] | ALL [ PRIVILEGES ] }
    ON { [ TABLE ] table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] }
    TO { username [ WITH GRANT OPTION ] | GROUP group_name | PUBLIC } [, ...]

GRANT { { CREATE | TEMPORARY | TEMP } [,...] | ALL [ PRIVILEGES ] }
    ON DATABASE db_name [, ...]
    TO { username [ WITH GRANT OPTION ] | GROUP group_name | PUBLIC } [, ...]

GRANT { { CREATE | USAGE } [,...] | ALL [ PRIVILEGES ] }
    ON SCHEMA schema_name [, ...]
    TO { username [ WITH GRANT OPTION ] | GROUP group_name | PUBLIC } [, ...]

GRANT { EXECUTE | ALL [ PRIVILEGES ] }
    ON { FUNCTION function_name ( [ [ argname ] argtype [, ...] ] ) [, ...] | ALL FUNCTIONS IN SCHEMA schema_name [, ...] }
    TO { username [ WITH GRANT OPTION ] | GROUP group_name | PUBLIC } [, ...]

GRANT { EXECUTE | ALL [ PRIVILEGES ] }
    ON { PROCEDURE procedure_name ( [ [ argname ] argtype [, ...] ] ) [, ...] | ALL PROCEDURES IN SCHEMA schema_name [, ...] }
    TO { username [ WITH GRANT OPTION ] | GROUP group_name | PUBLIC } [, ...]

GRANT USAGE
    ON LANGUAGE language_name [, ...]
    TO { username [ WITH GRANT OPTION ] | GROUP group_name | PUBLIC } [, ...]

The following is the syntax for column-level privileges on Amazon Redshift tables and views.

GRANT { { SELECT | UPDATE } ( column_name [, ...] ) [, ...] | ALL [ PRIVILEGES ] ( column_name [,...] ) }
    ON { [ TABLE ] table_name [, ...] }
    TO { username | GROUP group_name | PUBLIC } [, ...]

The following is the syntax for the ASSUMEROLE privilege granted to users and groups with a specified role.

GRANT ASSUMEROLE
    ON { 'iam_role' [, ...] | ALL }
    TO { username | GROUP group_name | PUBLIC } [, ...]
    FOR { ALL | COPY | UNLOAD } [, ...]
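
As an illustration, the following sketch grants a hypothetical migration user named dms_user typical privileges on a hypothetical database mydb and schema myschema; adjust the names and the privilege list to your environment.

GRANT CREATE, TEMPORARY ON DATABASE mydb TO dms_user;
GRANT CREATE, USAGE ON SCHEMA myschema TO dms_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA myschema TO dms_user;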

Limitations on using Amazon Redshift as a target for AWS Database Migration Service

The following limitations apply when using an Amazon Redshift database as a target:

  • Don’t enable versioning for the S3 bucket you use as intermediate storage for your Amazon Redshift target. If you need S3 versioning, use lifecycle policies to actively delete old versions. Otherwise, you might encounter endpoint test connection failures because of an S3 list-object call timeout. To create a lifecycle policy for an S3 bucket, see Managing your storage lifecycle. To delete a version of an S3 object, see Deleting object versions from a versioning-enabled bucket.

  • The following DDL is not supported:

    ALTER TABLE table_name MODIFY COLUMN column_name data_type;
  • AWS DMS cannot migrate or replicate changes to a schema with a name that begins with underscore (_). If you have schemas that have a name that begins with an underscore, use mapping transformations to rename the schema on the target.

  • Amazon Redshift doesn't support VARCHARs larger than 64 KB. LOBs from traditional databases can't be stored in Amazon Redshift.

  • Applying a DELETE statement to a table with a multi-column primary key is not supported when any of the primary key column names use a reserved word. For a list of Amazon Redshift reserved words, see Reserved words in the Amazon Redshift Database Developer Guide.

  • You might experience performance issues if your source system performs UPDATE operations on the primary key of a source table. These performance issues occur when applying changes to the target, because UPDATE (and DELETE) operations depend on the primary key value to identify the target row. If you update the primary key of a source table, your task log will contain messages like the following:

    Update on table 1 changes PK to a PK that was previously updated in the same bulk update.
  • DMS doesn't support custom DNS names when configuring an endpoint for a Redshift cluster; you need to use the Amazon-provided DNS name. Because the Amazon Redshift cluster must be in the same AWS account and Region as the replication instance, validation fails if you use a custom DNS endpoint.

  • Amazon Redshift has a default 4-hour idle session timeout. When there isn't any activity within the DMS replication task, Redshift disconnects the session after 4 hours. Errors can result from DMS being unable to connect, which might require a task restart. As a workaround, set a SESSION TIMEOUT limit greater than 4 hours for the DMS replication user (see the sketch following this list). For details, see the description of ALTER USER in the Amazon Redshift Database Developer Guide.

  • When AWS DMS replicates source table data without a primary or unique key, CDC latency might be high, resulting in an unacceptable level of performance.
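
The following is a sketch of the session-timeout workaround mentioned in the limitations preceding. It assumes a hypothetical replication user named dms_user and sets an 8-hour (28,800-second) idle session timeout.

ALTER USER dms_user SESSION TIMEOUT 28800;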

Configuring an Amazon Redshift database as a target for AWS Database Migration Service

AWS Database Migration Service must be configured to work with the Amazon Redshift instance. The following list describes the configuration properties available for the Amazon Redshift endpoint.

  • server – The name of the Amazon Redshift cluster you are using.

  • port – The port number for Amazon Redshift. The default value is 5439.

  • username – An Amazon Redshift user name for a registered user.

  • password – The password for the user named in the username property.

  • database – The name of the Amazon Redshift data warehouse (service) you are working with.

If you want to add extra connection string attributes to your Amazon Redshift endpoint, you can specify the maxFileSize and fileTransferUploadStreams attributes. For more information on these attributes, see Endpoint settings when using Amazon Redshift as a target for AWS DMS.
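
For example, a sketch of setting both attributes on an existing target endpoint with the AWS CLI might look like the following; the endpoint ARN is a placeholder.

aws dms modify-endpoint \
    --endpoint-arn your-endpoint-arn \
    --redshift-settings '{"MaxFileSize": 512, "FileTransferUploadStreams": 20}'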

Using enhanced VPC routing with Amazon Redshift as a target for AWS Database Migration Service

If you use Enhanced VPC Routing with your Amazon Redshift target, all COPY traffic between your Amazon Redshift cluster and your data repositories goes through your VPC. Because Enhanced VPC Routing affects the way that Amazon Redshift accesses other resources, COPY commands might fail if you haven't configured your VPC correctly.

AWS DMS can be affected by this behavior because it uses the COPY command to move data from S3 to an Amazon Redshift cluster.

Following are the steps AWS DMS takes to load data into an Amazon Redshift target:

  1. AWS DMS copies data from the source to .csv files on the replication server.

  2. AWS DMS uses the AWS SDK to copy the .csv files into an S3 bucket on your account.

  3. AWS DMS then uses the COPY command in Amazon Redshift to copy data from the .csv files in S3 to an appropriate table in Amazon Redshift.

If Enhanced VPC Routing is not enabled, Amazon Redshift routes traffic through the internet, including traffic to other services within the AWS network. In that case, you don't have to configure the network path. If the feature is enabled, you must specifically create a network path between your cluster's VPC and your data resources. For more information on the configuration required, see Enhanced VPC routing in the Amazon Redshift documentation.

Creating and using AWS KMS keys to encrypt Amazon Redshift target data

You can encrypt your target data pushed to Amazon S3 before it is copied to Amazon Redshift. To do so, you can create and use custom AWS KMS keys. You use the key you created to encrypt your target data when you create the Amazon Redshift target endpoint, as described following.

To encrypt Amazon Redshift target data using a KMS key, you need an AWS Identity and Access Management (IAM) role that has permissions to access Amazon Redshift data. This IAM role is then accessed in a policy (a key policy) attached to the encryption key that you create. You can do this in your IAM console by creating the following:

  • An IAM role with an AWS-managed policy.

  • A KMS key with a key policy that references this role.

The following procedures describe how to do this.

To create an IAM role with the required AWS-managed policy
  1. Open the IAM console at https://console.aws.amazon.com/iam/.

  2. In the navigation pane, choose Roles. The Roles page opens.

  3. Choose Create role. The Create role page opens.

  4. With AWS service chosen as the trusted entity, choose DMS as the service to use the role.

  5. Choose Next: Permissions. The Attach permissions policies page appears.

  6. Find and select the AmazonDMSRedshiftS3Role policy.

  7. Choose Next: Tags. The Add tags page appears. Here, you can add any tags you want.

  8. Choose Next: Review and review your results.

  9. If the settings are what you need, enter a name for the role (for example, DMS-Redshift-endpoint-access-role), and any additional description, then choose Create role. The Roles page opens with a message indicating that your role has been created.

You have now created the new role to access Amazon Redshift resources for encryption with a specified name, for example DMS-Redshift-endpoint-access-role.
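
If you prefer the AWS CLI, the following sketch creates a comparable role. It assumes a trust policy for the dms.amazonaws.com service principal and the managed policy ARN shown; verify both against your environment.

aws iam create-role \
    --role-name DMS-Redshift-endpoint-access-role \
    --assume-role-policy-document '{
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": { "Service": "dms.amazonaws.com" },
          "Action": "sts:AssumeRole"
        }
      ]
    }'

aws iam attach-role-policy \
    --role-name DMS-Redshift-endpoint-access-role \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonDMSRedshiftS3Role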

To create an AWS KMS encryption key with a key policy that references your IAM role
Note

For more information about how AWS DMS works with AWS KMS encryption keys, see Setting an encryption key and specifying AWS KMS permissions.

  1. Sign in to the AWS Management Console and open the AWS Key Management Service (AWS KMS) console at https://console.aws.amazon.com/kms.

  2. To change the AWS Region, use the Region selector in the upper-right corner of the page.

  3. In the navigation pane, choose Customer managed keys.

  4. Choose Create key. The Configure key page opens.

  5. For Key type, choose Symmetric.

    Note

    When you create this key, you can only create a symmetric key, because all AWS services, such as Amazon Redshift, only work with symmetric encryption keys.

  6. Choose Advanced Options. For Key material origin, make sure that KMS is chosen, then choose Next. The Add labels page opens.

  7. For Create alias and description, enter an alias for the key (for example, DMS-Redshift-endpoint-encryption-key) and any additional description.

  8. For Tags, add any tags that you want to help identify the key and track its usage, then choose Next. The Define key administrative permissions page opens showing a list of users and roles that you can choose from.

  9. Add the users and roles that you want to manage the key. Make sure that these users and roles have the required permissions to manage the key.

  10. For Key deletion, choose whether key administrators can delete the key, then choose Next. The Define key usage permissions page opens showing an additional list of users and roles that you can choose from.

  11. For This account, choose the users that you want to be able to perform cryptographic operations on Amazon Redshift targets. Also choose the role that you previously created in Roles to enable access to encrypt Amazon Redshift target objects (for example, DMS-Redshift-endpoint-access-role).

  12. If you want to add other accounts not listed to have this same access, for Other AWS accounts, choose Add another AWS account, then choose Next. The Review and edit key policy page opens, showing the JSON for the key policy that you can review and edit by typing into the existing JSON. Here, you can see where the key policy references the role and users (for example, Admin and User1) that you chose in the previous step. You can also see the different key actions permitted for the different principals (users and roles), as shown in the following example.

    { "Id": "key-consolepolicy-3", "Version": "2012-10-17", "Statement": [ { "Sid": "Enable IAM User Permissions", "Effect": "Allow", "Principal": { "AWS": [ "arn:aws:iam::111122223333:root" ] }, "Action": "kms:*", "Resource": "*" }, { "Sid": "Allow access for Key Administrators", "Effect": "Allow", "Principal": { "AWS": [ "arn:aws:iam::111122223333:role/Admin" ] }, "Action": [ "kms:Create*", "kms:Describe*", "kms:Enable*", "kms:List*", "kms:Put*", "kms:Update*", "kms:Revoke*", "kms:Disable*", "kms:Get*", "kms:Delete*", "kms:TagResource", "kms:UntagResource", "kms:ScheduleKeyDeletion", "kms:CancelKeyDeletion" ], "Resource": "*" }, { "Sid": "Allow use of the key", "Effect": "Allow", "Principal": { "AWS": [ "arn:aws:iam::111122223333:role/DMS-Redshift-endpoint-access-role", "arn:aws:iam::111122223333:role/Admin", "arn:aws:iam::111122223333:role/User1" ] }, "Action": [ "kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*", "kms:GenerateDataKey*", "kms:DescribeKey" ], "Resource": "*" }, { "Sid": "Allow attachment of persistent resources", "Effect": "Allow", "Principal": { "AWS": [ "arn:aws:iam::111122223333:role/DMS-Redshift-endpoint-access-role", "arn:aws:iam::111122223333:role/Admin", "arn:aws:iam::111122223333:role/User1" ] }, "Action": [ "kms:CreateGrant", "kms:ListGrants", "kms:RevokeGrant" ], "Resource": "*", "Condition": { "Bool": { "kms:GrantIsForAWSResource": true } } } ]
  13. Choose Finish. The Encryption keys page opens with a message indicating that your AWS KMS key has been created.

You have now created a new KMS key with a specified alias (for example, DMS-Redshift-endpoint-encryption-key). This key enables AWS DMS to encrypt Amazon Redshift target data.
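
You can also create the key and its alias from the AWS CLI, as in the following sketch. It assumes the key policy from the previous step is saved in a local file named key-policy.json, and that you substitute the KeyId returned by create-key.

aws kms create-key \
    --description "DMS Redshift target data encryption key" \
    --policy file://key-policy.json

aws kms create-alias \
    --alias-name alias/DMS-Redshift-endpoint-encryption-key \
    --target-key-id your-key-id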

Endpoint settings when using Amazon Redshift as a target for AWS DMS

You can use endpoint settings to configure your Amazon Redshift target database similar to using extra connection attributes. You specify the settings when you create the target endpoint using the AWS DMS console, or by using the create-endpoint command in the AWS CLI, with the --redshift-settings '{"EndpointSetting": "value", ...}' JSON syntax.

The following are the endpoint settings that you can use with Amazon Redshift as a target.

MaxFileSize

Specifies the maximum size (in KB) of any .csv file used to transfer data to Amazon Redshift.

Default value: 32768 KB (32 MB)

Valid values: 1–1,048,576

Example: --redshift-settings '{"MaxFileSize": 512}'

FileTransferUploadStreams

Specifies the number of threads used to upload a single file.

Default value: 10

Valid values: 1–64

Example: --redshift-settings '{"FileTransferUploadStreams": 20}'

Acceptanydate

Specifies whether any date format is accepted, including invalid date formats such as 0000-00-00. Boolean value.

Default value: false

Valid values: true | false

Example: --redshift-settings '{"Acceptanydate": true}'

Dateformat

Specifies the date format. This is a string input and is empty by default. The default format is YYYY-MM-DD but you can change it to, for example, DD-MM-YYYY. If your date or time values use different formats, use the auto argument with the Dateformat parameter. The auto argument recognizes several formats that are not supported when using a Dateformat string. The auto keyword is case-sensitive.

Default value: empty

Valid values: "dateformat_string" or auto

Example:--redshift-settings '{"Dateformat": "auto"}'

Timeformat

Specifies the time format. This is a string input and is empty by default. The auto argument recognizes several formats that aren't supported when using a Timeformat string. If your date and time values use formats different from each other, use the auto argument with the Timeformat parameter.

Default value: empty

Valid values: "Timeformat_string" | "auto" | "epochsecs" | "epochmillisecs"

Example: --redshift-settings '{"Timeformat": "auto"}'

Emptyasnull

Specifies whether AWS DMS should migrate empty CHAR and VARCHAR fields as null. A value of true sets empty CHAR and VARCHAR fields as null.

Default value: false

Valid values: true | false

Example: --redshift-settings '{"Emptyasnull": true}'

TruncateColumns

Truncates data in columns to the appropriate number of characters so that it fits the column specification. Applies only to columns with a VARCHAR or CHAR data type, and rows 4 MB or less in size.

Default value: false

Valid values: true | false

Example: --redshift-settings '{"TruncateColumns": true}'

RemoveQuotes

Removes surrounding quotation marks from strings in the incoming data. All characters within the quotation marks, including delimiters, are retained. For more information about removing quotes for an Amazon Redshift target, see the Amazon Redshift Database Developer Guide.

Default value: false

Valid values: true | false

Example: --redshift-settings '{"RemoveQuotes": true}'

TrimBlanks

Removes the trailing white-space characters from a VARCHAR string. This parameter applies only to columns with a VARCHAR data type.

Default value: false

Valid values: true | false

Example: --redshift-settings '{"TrimBlanks": true}'

EncryptionMode

Specifies the server-side encryption mode that you want to use to push your data to S3 before it is copied to Amazon Redshift. The valid values are SSE_S3 (S3 server-side encryption) or SSE_KMS (KMS key encryption). If you choose SSE_KMS, set the ServerSideEncryptionKmsKeyId parameter to the Amazon Resource Name (ARN) for the KMS key to be used for encryption.

Note

You can also use the CLI modify-endpoint command to change the value of the EncryptionMode setting for an existing endpoint from SSE_KMS to SSE_S3. But you can't change the EncryptionMode value from SSE_S3 to SSE_KMS.

Default value: SSE_S3

Valid values: SSE_S3 or SSE_KMS

Example: --redshift-settings '{"EncryptionMode": "SSE_S3"}'

ServerSideEncryptionKmsKeyId

If you set EncryptionMode to SSE_KMS, set this parameter to the ARN for the KMS key. You can find this ARN by selecting the key alias in the list of AWS KMS keys created for your account. When you create the key, you must associate specific policies and roles with it. For more information, see Creating and using AWS KMS keys to encrypt Amazon Redshift target data.

Example: --redshift-settings '{"ServerSideEncryptionKmsKeyId":"arn:aws:kms:us-east-1:111122223333:key/11a1a1a1-aaaa-9999-abab-2bbbbbb222a2"}'

EnableParallelBatchInMemoryCSVFiles

The EnableParallelBatchInMemoryCSVFiles setting improves performance of larger multithreaded full load tasks by having DMS write to disk instead of memory. The default value is false.

CompressCsvFiles

Use this attribute to compress data sent to an Amazon Redshift target during migration. The default value is true, and compression is enabled by default.

Using a data encryption key and an Amazon S3 bucket as intermediate storage

You can use Amazon Redshift target endpoint settings to configure the following:

  • A custom AWS KMS data encryption key. You can then use this key to encrypt your data pushed to Amazon S3 before it is copied to Amazon Redshift.

  • A custom S3 bucket as intermediate storage for data migrated to Amazon Redshift.

  • Map a boolean as a boolean from a PostgreSQL source. By default, a BOOLEAN type is migrated as varchar(1). You can specify MapBooleanAsBoolean to let your Redshift target migrate the boolean type as boolean, as shown in the example following.

    --redshift-settings '{"MapBooleanAsBoolean": true}'

    Note that you must set this setting on both the source and target endpoints for it to take effect.

KMS key settings for data encryption

The following examples show how to configure a custom KMS key to encrypt your data pushed to S3. To start, you might make the following create-endpoint call using the AWS CLI.

aws dms create-endpoint \
    --endpoint-identifier redshift-target-endpoint \
    --endpoint-type target \
    --engine-name redshift \
    --username your-username \
    --password your-password \
    --server-name your-server-name \
    --port 5439 \
    --database-name your-db-name \
    --redshift-settings '{"EncryptionMode": "SSE_KMS", "ServerSideEncryptionKmsKeyId": "arn:aws:kms:us-east-1:111122223333:key/24c3c5a1-f34a-4519-a85b-2debbef226d1"}'

Here, the JSON object specified by the --redshift-settings option defines two parameters. One is an EncryptionMode parameter with the value SSE_KMS. The other is a ServerSideEncryptionKmsKeyId parameter with the value arn:aws:kms:us-east-1:111122223333:key/24c3c5a1-f34a-4519-a85b-2debbef226d1. This value is the Amazon Resource Name (ARN) for your custom KMS key.

By default, S3 data encryption occurs using S3 server-side encryption. For the previous example's Amazon Redshift target, this is equivalent to specifying its endpoint settings as in the following example.

aws dms create-endpoint \
    --endpoint-identifier redshift-target-endpoint \
    --endpoint-type target \
    --engine-name redshift \
    --username your-username \
    --password your-password \
    --server-name your-server-name \
    --port 5439 \
    --database-name your-db-name \
    --redshift-settings '{"EncryptionMode": "SSE_S3"}'

For more information about working with S3 server-side encryption, see Protecting data using server-side encryption in the Amazon Simple Storage Service User Guide.

Note

You can also use the CLI modify-endpoint command to change the value of the EncryptionMode parameter for an existing endpoint from SSE_KMS to SSE_S3. But you can’t change the EncryptionMode value from SSE_S3 to SSE_KMS.
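
A sketch of such a call follows; the endpoint ARN is a placeholder.

aws dms modify-endpoint \
    --endpoint-arn your-endpoint-arn \
    --redshift-settings '{"EncryptionMode": "SSE_S3"}'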

Amazon S3 bucket settings

When you migrate data to an Amazon Redshift target endpoint, AWS DMS uses a default Amazon S3 bucket as intermediate task storage before copying the migrated data to Amazon Redshift. For example, the examples shown for creating an Amazon Redshift target endpoint with an AWS KMS data encryption key use this default S3 bucket (see KMS key settings for data encryption).

You can instead specify a custom S3 bucket for this intermediate storage by including the following parameters in the value of your --redshift-settings option on the AWS CLI create-endpoint command:

  • BucketName – A string you specify as the name of the S3 bucket storage. If your service access role is based on the AmazonDMSRedshiftS3Role policy, this value must have a prefix of dms-, for example, dms-my-bucket-name.

  • BucketFolder – (Optional) A string you can specify as the name of the storage folder in the specified S3 bucket.

  • ServiceAccessRoleArn – The ARN of an IAM role that permits administrative access to the S3 bucket. Typically, you create this role based on the AmazonDMSRedshiftS3Role policy. For an example, see the procedure to create an IAM role with the required AWS-managed policy in Creating and using AWS KMS keys to encrypt Amazon Redshift target data.

    Note

    If you specify the ARN of a different IAM role using the --service-access-role-arn option of the create-endpoint command, this IAM role option takes precedence.

The following example shows how you might use these parameters to specify a custom Amazon S3 bucket in the following create-endpoint call using the AWS CLI.

aws dms create-endpoint \
    --endpoint-identifier redshift-target-endpoint \
    --endpoint-type target \
    --engine-name redshift \
    --username your-username \
    --password your-password \
    --server-name your-server-name \
    --port 5439 \
    --database-name your-db-name \
    --redshift-settings '{"ServiceAccessRoleArn": "your-service-access-ARN", "BucketName": "your-bucket-name", "BucketFolder": "your-bucket-folder-name"}'

Multithreaded task settings for Amazon Redshift

You can improve performance of full load and change data capture (CDC) tasks for an Amazon Redshift target endpoint by using multithreaded task settings. They enable you to specify the number of concurrent threads and the number of records to store in a buffer.

Multithreaded full load task settings for Amazon Redshift

To promote full load performance, you can use the following ParallelLoad* task settings:

  • ParallelLoadThreads – Specifies the number of concurrent threads that DMS uses during a full load to push data records to an Amazon Redshift target endpoint. The default value is zero (0) and the maximum value is 32. For more information, see Full-load task settings.

    You can set the enableParallelBatchInMemoryCSVFiles attribute to false when using the ParallelLoadThreads task setting. The attribute improves performance of larger multithreaded full load tasks by having DMS write to disk instead of memory. The default value is true.

  • ParallelLoadBufferSize – Specifies the maximum number of data record requests while using parallel load threads with a Redshift target. The default value is 100 and the maximum value is 1,000. We recommend using this option when ParallelLoadThreads > 1 (greater than one). A sketch of where these settings live follows this list.
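
As a sketch, both settings sit in the TargetMetadata section of the task settings JSON; the thread count here is only an example value.

{
  "TargetMetadata": {
    "ParallelLoadThreads": 8,
    "ParallelLoadBufferSize": 100
  }
}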

Note

Support for the use of ParallelLoad* task settings during FULL LOAD to Amazon Redshift target endpoints is available in AWS DMS versions 3.4.5 and higher.

The ReplaceInvalidChars Redshift endpoint setting is not supported for use during change data capture (CDC) or during a parallel-load enabled FULL LOAD migration task. It is supported for FULL LOAD migration when parallel load isn't enabled. For more information, see RedshiftSettings in the AWS Database Migration Service API Reference.

Multithreaded CDC task settings for Amazon Redshift

To promote CDC performance, you can use the following ParallelApply* task settings:

  • ParallelApplyThreads – Specifies the number of concurrent threads that AWS DMS uses during a CDC load to push data records to an Amazon Redshift target endpoint. The default value is zero (0) and the maximum value is 32. The minimum recommended value equals the number of slices in your cluster.

  • ParallelApplyBufferSize – Specifies the maximum number of data record requests while using parallel apply threads with a Redshift target. The default value is 100 and the maximum value is 1,000. We recommend using this option when ParallelApplyThreads > 1 (greater than one).

    To obtain the most benefit for Redshift as a target, we recommend that the value of ParallelApplyBufferSize be at least two times (double or more) the number of ParallelApplyThreads.

Note

Support for the use of ParallelApply* task settings during CDC to Amazon Redshift target endpoints is available in AWS DMS versions 3.4.3 and higher.

The level of parallelism applied depends on the correlation between the total batch size and the maximum file size used to transfer data. When using multithreaded CDC task settings with a Redshift target, benefits are gained when batch size is large in relation to the maximum file size. For example, you can use the following combination of endpoint and task settings to tune for optimal performance.

// Redshift endpoint setting
MaxFileSize=250000;

// Task settings
BatchApplyEnabled=true;
BatchSplitSize=8000;
BatchApplyTimeoutMax=1800;
BatchApplyTimeoutMin=1800;
ParallelApplyThreads=32;
ParallelApplyBufferSize=100;

With the settings in the previous example, a customer with a heavy transactional workload benefits because the batch buffer, which holds 8,000 records, fills within 1,800 seconds and is applied by 32 parallel threads with a 250 MB maximum file size.
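
As a sketch, the task settings from the previous example sit in the task settings JSON roughly as follows, assuming the standard locations of these settings (BatchApplyEnabled and the ParallelApply* settings under TargetMetadata, the batch tuning settings under ChangeProcessingTuning). MaxFileSize is set separately on the endpoint through --redshift-settings.

{
  "TargetMetadata": {
    "BatchApplyEnabled": true,
    "ParallelApplyThreads": 32,
    "ParallelApplyBufferSize": 100
  },
  "ChangeProcessingTuning": {
    "BatchSplitSize": 8000,
    "BatchApplyTimeoutMin": 1800,
    "BatchApplyTimeoutMax": 1800
  }
}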

For more information, see Change processing tuning settings.

Note

DMS queries that run during ongoing replication to a Redshift cluster can share the same WLM (workload management) queue with other application queries that are running. So, consider properly configuring WLM properties to influence performance during ongoing replication to a Redshift target. For example, if other parallel ETL queries are running, DMS runs slower and performance gains are lost.

Target data types for Amazon Redshift

The Amazon Redshift endpoint for AWS DMS supports most Amazon Redshift data types. The following list shows the Amazon Redshift target data types that are supported when using AWS DMS and the default mapping from AWS DMS data types.

For additional information about AWS DMS data types, see Data types for AWS Database Migration Service.

  • BOOLEAN – BOOL

  • BYTES – VARCHAR (Length)

  • DATE – DATE

  • TIME – VARCHAR(20)

  • DATETIME – If the scale is from 0 to 6, then, depending on the Redshift target column type, one of the following:

    TIMESTAMP (s)

    TIMESTAMPTZ (s) – If the source timestamp contains a zone offset (such as in SQL Server or Oracle), it is converted to UTC on insert/update. If it doesn't contain an offset, the time is considered to be in UTC already.

    If the scale is from 7 to 9, then: VARCHAR (37)

  • INT1 – INT2

  • INT2 – INT2

  • INT4 – INT4

  • INT8 – INT8

  • NUMERIC – If the scale is from 0 to 37, then: NUMERIC (p,s)

    If the scale is from 38 to 127, then: VARCHAR (Length)

  • REAL4 – FLOAT4

  • REAL8 – FLOAT8

  • STRING – If the length is 1–65,535, then use VARCHAR (length in bytes)

    If the length is 65,536–2,147,483,647, then use VARCHAR (65535)

  • UINT1 – INT2

  • UINT2 – INT2

  • UINT4 – INT4

  • UINT8 – NUMERIC (20,0)

  • WSTRING – If the length is 1–65,535, then use NVARCHAR (length in bytes)

    If the length is 65,536–2,147,483,647, then use NVARCHAR (65535)

  • BLOB – VARCHAR (maximum LOB size *2)

    The maximum LOB size cannot exceed 31 KB. Amazon Redshift doesn't support VARCHARs larger than 64 KB.

  • NCLOB – NVARCHAR (maximum LOB size)

    The maximum LOB size cannot exceed 63 KB. Amazon Redshift doesn't support VARCHARs larger than 64 KB.

  • CLOB – VARCHAR (maximum LOB size)

    The maximum LOB size cannot exceed 63 KB. Amazon Redshift doesn't support VARCHARs larger than 64 KB.

Using AWS DMS with Amazon Redshift Serverless as a Target

AWS DMS supports using Amazon Redshift Serverless as a target endpoint. For information about using Amazon Redshift Serverless, see Amazon Redshift Serverless in the Amazon Redshift Management Guide.

This topic describes how to use an Amazon Redshift Serverless endpoint with AWS DMS.

Note

When creating an Amazon Redshift Serverless endpoint, for the DatabaseName field of your RedshiftSettings endpoint configuration, use either the name of the Amazon Redshift data warehouse or the name of the workgroup endpoint. For the ServerName field, use the value for Endpoint displayed in the Workgroup page for the serverless cluster (for example, default-workgroup.093291321484.us-east-1.redshift-serverless.amazonaws.com). For information about creating an endpoint, see Creating source and target endpoints. For information about the workgroup endpoint, see Connecting to Amazon Redshift Serverless.
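
For example, a create-endpoint call for a serverless workgroup might look like the following sketch. The identifier, credentials, and database name are placeholders, the server name reuses the workgroup endpoint from the note preceding, and the redshift engine name is assumed, as for a provisioned cluster.

aws dms create-endpoint \
    --endpoint-identifier redshift-serverless-target \
    --endpoint-type target \
    --engine-name redshift \
    --server-name default-workgroup.093291321484.us-east-1.redshift-serverless.amazonaws.com \
    --port 5439 \
    --database-name dev \
    --username your-username \
    --password your-password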

Trust Policy with Amazon Redshift Serverless as a target

When using Amazon Redshift Serverless as a target endpoint, you must add the statement that allows the redshift-serverless.amazonaws.com service principal to assume the role (the second statement in the following policy) to your trust policy. This trust policy is attached to the dms-access-for-endpoint role.

{ "PolicyVersion": { "CreateDate": "2016-05-23T16:29:57Z", "VersionId": "v3", "Document": { "Version": "2012-10-17", "Statement": [ { "Action": [ "ec2:CreateNetworkInterface", "ec2:DescribeAvailabilityZones", "ec2:DescribeInternetGateways", "ec2:DescribeSecurityGroups", "ec2:DescribeSubnets", "ec2:DescribeVpcs", "ec2:DeleteNetworkInterface", "ec2:ModifyNetworkInterfaceAttribute" ], "Resource": "arn:aws:service:region:account:resourcetype/id", "Effect": "Allow" }, { "Sid": "", "Effect": "Allow", "Principal": { "Service": "redshift-serverless.amazonaws.com" }, "Action": "sts:AssumeRole" } ] }, "IsDefaultVersion": true } }

For more information about using a trust policy with AWS DMS, see Creating the IAM roles to use with the AWS CLI and AWS DMS API.

Limitations when using Amazon Redshift Serverless as a target

Using Redshift Serverless as a target has the following limitations: