Migrate an Amazon Redshift cluster to an AWS Region in China

Created by Jing Yan (AWS)

R Type: Relocate

Environment: Production

Technologies: Databases; Migration

Workload: All other workloads

AWS services: Amazon Redshift

Source: Amazon Redshift

Target: Amazon Redshift

Summary

This pattern provides a step-by-step approach to migrate an Amazon Redshift cluster to an AWS Region in China from another AWS Region.

This pattern uses SQL commands to recreate all the database objects, and uses the UNLOAD command to move the data from Amazon Redshift to an Amazon Simple Storage Service (Amazon S3) bucket in the source Region. The data is then migrated to an S3 bucket in the AWS Region in China. Finally, the COPY command loads the data from the S3 bucket into the target Amazon Redshift cluster.

Amazon Redshift doesn't currently support cross-Region features such as snapshot copying to AWS Regions in China. This pattern provides a way to work around that limitation. You can also reverse the steps in this pattern to migrate data from an AWS Region in China to another AWS Region.

Prerequisites and limitations

Prerequisites

  • Active AWS accounts in both a China Region and an AWS Region outside China

  • Existing Amazon Redshift clusters in both a China Region and an AWS Region outside China

Limitations

  • This is an offline migration, which means the source Amazon Redshift cluster cannot perform write operations during the migration.

Architecture

Source technology stack  

  • Amazon Redshift cluster in an AWS Region outside China

Target technology stack  

  • Amazon Redshift cluster in an AWS Region in China

Target architecture 

Tools

  • Amazon S3 – Amazon Simple Storage Service (Amazon S3) is an object storage service that offers scalability, data availability, security, and performance. You can use Amazon S3 to store data from Amazon Redshift, and you can copy data from an S3 bucket to Amazon Redshift.

  • Amazon Redshift – Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. 

  • psql – psql is a terminal-based front-end to PostgreSQL. 

Epics

Task | Description | Skills required

Launch and configure an EC2 instance in the source Region.

Sign in to the AWS Management Console and open the Amazon Elastic Compute Cloud (Amazon EC2) console. Your current Region is displayed in the navigation bar at the top of the screen. This Region cannot be an AWS Region in China. From the Amazon EC2 console dashboard, choose “Launch instance,” and create and configure an EC2 instance. Important: Make sure that the inbound rules for your EC2 security groups allow access to TCP port 22 from your source machine (see the example command after this task). For instructions on how to launch and configure an EC2 instance, see the “Related resources” section.

DBA, Developer
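If you prefer to add the inbound rule from the command line, a minimal AWS CLI sketch follows. The security group ID and the source IP address are placeholders that you replace with your own values.

# Allow SSH (TCP 22) to the EC2 instance from your source machine only
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 22 \
    --cidr 203.0.113.10/32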

Install the psql tool.

Download and install PostgreSQL. Amazon Redshift does not provide the psql tool; it is installed as part of PostgreSQL. For more information about using psql and installing PostgreSQL tools, see the “Related resources” section.

DBA

Record the Amazon Redshift cluster details.

Open the Amazon Redshift console, and choose “Clusters” in the navigation pane. Then choose the Amazon Redshift cluster name from the list. On the “Properties” tab, in the “Database configurations” section, record the “Database name” and “Port.” Open the “Connection details” section and record the “Endpoint,” which is in the “<endpoint>:<port>/<databasename>” format. Important: Make sure that the inbound rules for your Amazon Redshift security groups allow access to TCP port 5439 from your EC2 instance.

DBA

Connect psql to the Amazon Redshift cluster.

At a command prompt, specify the connection information by running the “psql -h <endpoint> -U <userid> -d <databasename> -p <port>” command. At the psql password prompt, enter the password for the “<userid>” user. You are then connected to the Amazon Redshift cluster and can interactively enter commands. An example command with placeholder values follows this task.

DBA
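For reference, a connection command might look like the following. The endpoint, user name, database name, and port are placeholder values; replace them with the details you recorded in the previous task.

psql -h redshift-cluster-1.abc123example.us-east-1.redshift.amazonaws.com -U admin -d dev -p 5439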

Create an S3 bucket.

Open the Amazon S3 console, and create an S3 bucket to hold the files exported from Amazon Redshift. For instructions on how to create an S3 bucket, see the “Related resources” section.

DBA, AWS General

Create an IAM policy that supports unloading data.

Open the AWS Identity and Access Management (IAM) console and choose “Policies.” Choose “Create policy,” and choose the “JSON” tab. Copy and paste the IAM policy for unloading data from the “Additional information” section. Important: Replace “s3_bucket_name” with your S3 bucket’s name. Choose “Review policy,” and enter a name and description for the policy. Choose “Create policy.” If you prefer the AWS CLI, see the example after this task.

DBA
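As an alternative to the console steps, a minimal AWS CLI sketch follows. It assumes that you saved the unload policy JSON from the “Additional information” section as unload-policy.json; the policy name is a placeholder of your choosing.

# Create the customer managed policy from the saved JSON document
aws iam create-policy \
    --policy-name RedshiftS3UnloadPolicy \
    --policy-document file://unload-policy.json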

Create an IAM role to allow UNLOAD operation for Amazon Redshift.

Open the IAM console and choose “Roles.” Choose “Create role,” and choose “AWS service” in “Select type of trusted entity.” Choose “Redshift” for the service, choose “Redshift – Customizable,” and then choose “Next.” Choose the “Unload” policy you created earlier, and choose “Next.” Enter a “Role name,” and choose “Create role.” For an equivalent AWS CLI sketch, see the example after this task.

DBA
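The following AWS CLI sketch creates the role and attaches the unload policy. It assumes that redshift-trust.json contains a standard trust policy allowing the redshift.amazonaws.com service principal to assume the role; the role name, policy name, and account ID are placeholders.

# Create the role with a trust policy for the Amazon Redshift service
aws iam create-role \
    --role-name RedshiftUnloadRole \
    --assume-role-policy-document file://redshift-trust.json

# Attach the unload policy created in the previous task
aws iam attach-role-policy \
    --role-name RedshiftUnloadRole \
    --policy-arn arn:aws:iam::111122223333:policy/RedshiftS3UnloadPolicy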

Associate IAM role with the Amazon Redshift cluster.

Open the Amazon Redshift console, and choose “Manage IAM roles.” Choose “Available roles” from the dropdown menu and choose the role you created earlier. Choose “Apply changes.” When the “Status” for the IAM role on the “Manage IAM roles” page shows as “In-sync,” you can run the UNLOAD command. An AWS CLI alternative follows this task.

DBA
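If you prefer the AWS CLI, a minimal sketch follows. The cluster identifier, account ID, and role name are placeholders.

# Associate the IAM role with the source Amazon Redshift cluster
aws redshift modify-cluster-iam-roles \
    --cluster-identifier my-redshift-cluster \
    --add-iam-roles arn:aws:iam::111122223333:role/RedshiftUnloadRole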

Stop write operations to the Amazon Redshift cluster.

Stop all write operations to the source Amazon Redshift cluster, and keep them stopped until the migration is complete.

DBA
Task | Description | Skills required

Launch and configure an EC2 instance in the target Region.

Sign in to the AWS Management Console for a Region in China, either Beijing or Ningxia. From the Amazon EC2 console, choose “Launch instance,” and create and configure an EC2 instance. Important: Make sure that the inbound rules for your Amazon EC2 security groups allow access to TCP port 22 from your source machine. For further instructions on how to launch and configure an EC2 instance, see the “Related resources” section.

DBA

Record the Amazon Redshift cluster details.

Open the Amazon Redshift console, and choose “Clusters” in the navigation pane. Then choose the Amazon Redshift cluster name from the list. On the “Properties” tab, in the “Database configurations” section, record the “Database name” and “Port.” Open the “Connection details” section and record the “Endpoint,” which is in the “<endpoint>:<port>/<databasename>” format. Important: Make sure that the inbound rules for your Amazon Redshift security groups allow access to TCP port 5439 from your EC2 instance.

DBA

Connect psql to the Amazon Redshift cluster.

At a command prompt, specify the connection information by running the “psql -h <endpoint> -U <userid> -d <databasename> -p <port>” command. At the psql password prompt, enter the password for the “<userid>” user. You are then connected to the Amazon Redshift cluster and can interactively enter commands.

DBA

Create an S3 bucket.

Open the Amazon S3 console, and create an S3 bucket to hold the exported files from Amazon Redshift. For help with this and other stories, see the “Related resources” section.

DBA

Create an IAM policy that supports copying data.

Open the IAM console and choose “Policies.” Choose “Create policy,” and choose the “JSON” tab. Copy and paste the IAM policy for copying data from the “Additional information” section. Important: Replace “s3_bucket_name” with your S3 bucket’s name. Choose “Review policy,” and enter a name and description for the policy. Choose “Create policy.”

DBA

Create an IAM role to allow COPY operation for Amazon Redshift.

Open the IAM console and choose “Roles.” Choose “Create role,” and choose “AWS service” in “Select type of trusted entity.” Choose “Redshift” for the service, choose “Redshift – Customizable,” and then choose “Next.” Choose the “Copy” policy you created earlier, and choose “Next.” Enter a “Role name,” and choose “Create role.”

DBA

Associate IAM role with the Amazon Redshift cluster.

Open the Amazon Redshift console, and choose “Manage IAM roles.” Choose “Available roles” from the dropdown menu and choose the role you created earlier. Choose “Apply changes.” When the “Status” for the IAM role on the “Manage IAM roles” page shows as “In-sync,” you can run the “COPY” command.

DBA
Task | Description | Skills required

Verify the rows in the source Amazon Redshift tables.

Use the scripts in the “Additional information” section to verify and record the number of rows in the source Amazon Redshift tables. Split the data evenly across the UNLOAD and COPY scripts so that each script handles a similar amount of data; this improves unloading and loading efficiency because the workload is balanced. A simplified row-count example follows this task.

DBA
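Purely as an illustration, queries such as the following can be used to record row counts. The schema and table names are placeholders, and the tbl_rows value in SVV_TABLE_INFO is approximate because it includes rows that are marked for deletion but not yet vacuumed.

-- Exact row count for a single table
select count(*) from myschema.mytable;

-- Approximate row counts for all user tables, one row per table
select "schema", "table", tbl_rows
from svv_table_info
order by "schema", "table";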

Verify the number of database objects in the source Amazon Redshift cluster.

Use the scripts in the “Additional information” section to verify and record the number of databases, users, schemas, tables, views, and user-defined functions (UDFs) in your source Amazon Redshift cluster.

DBA

Verify SQL statement results before migration.

Prepare SQL statements for data validation based on your actual business and data scenarios, and record their results before the migration. After the migration, you run the same statements against the target cluster to verify that the imported data is consistent and displayed correctly.

DBA
Task | Description | Skills required

Generate Amazon Redshift DDL scripts.

Generate Data Definition Language (DDL) scripts by using the links from the “SQL statements to query Amazon Redshift” section in the “Additional information” section. These DDL scripts should include the “create user,” “create schema,” “privileges on schema to user,” “create table/view,” “privileges on objects to user,” and “create function” queries.

DBA

Create objects in the Amazon Redshift cluster for the target Region.

Run the DDL scripts by using the AWS Command Line Interface (AWS CLI) in the AWS Region in China. These scripts create the objects in the Amazon Redshift cluster for the target Region. For a sketch of one way to do this, see the example after this task.

DBA
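One way to run a DDL statement through the AWS CLI is the Amazon Redshift Data API, sketched below with placeholder values and assuming the Data API is available in your target Region. Alternatively, you can run the DDL scripts with the psql tool on the EC2 instance in the target Region (for example, with the -f option and your script file).

# Run a single DDL statement against the target cluster (placeholder values)
aws redshift-data execute-statement \
    --region cn-north-1 \
    --cluster-identifier my-cn-redshift-cluster \
    --database dev \
    --db-user admin \
    --sql "create schema if not exists myschema;"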

Unload source Amazon Redshift cluster data to the S3 bucket.

Run the UNLOAD command to unload data from the Amazon Redshift cluster in the source Region to the S3 bucket. For an example with placeholder values, see the command after this task.

DBA, Developer
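A minimal UNLOAD sketch follows. The table name, bucket name, prefix, and role ARN are placeholders, and the DELIMITER, GZIP, and ALLOWOVERWRITE options are one reasonable choice; whatever options you use here must match the options you use with COPY later.

unload ('select * from myschema.mytable')
to 's3://source-region-bucket/myschema/mytable_'
iam_role 'arn:aws:iam::111122223333:role/RedshiftUnloadRole'
delimiter '|'
gzip
allowoverwrite;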

Transfer source Region S3 bucket data to target Region S3 bucket.

Transfer the data from your source Region S3 bucket to the target S3 bucket. Because AWS Regions in China are in a separate AWS partition with separate credentials, the “aws s3 sync” command cannot copy data between the two buckets directly, so make sure you use the process outlined in the “Transferring Amazon S3 data from AWS Regions to AWS Regions in China” article in the “Related resources” section. A simplified illustration follows this task.

Developer
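The referenced article describes the recommended approaches. Purely as an illustration, one manual option is to stage the data on an intermediate host by using two AWS CLI profiles, one holding credentials for the account outside China and one for the China account; the profile names, bucket names, and local path below are placeholders.

# Download from the source bucket with credentials for the account outside China
aws s3 cp s3://source-region-bucket/myschema/ ./staging/myschema/ --recursive --profile global

# Upload to the target bucket with credentials for the China account
aws s3 cp ./staging/myschema/ s3://cn-region-bucket/myschema/ --recursive --profile china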

Load data into the target Amazon Redshift cluster.

In the psql tool for your target Region, run the COPY command to load data from the S3 bucket to the target Amazon Redshift cluster. For an example with placeholder values, see the command after this task.

DBA
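A minimal COPY sketch follows. The table name, bucket name, prefix, account ID, and role name are placeholders; note that IAM role ARNs in AWS Regions in China use the arn:aws-cn partition prefix, and the DELIMITER and GZIP options must match the ones used by UNLOAD.

copy myschema.mytable
from 's3://cn-region-bucket/myschema/mytable_'
iam_role 'arn:aws-cn:iam::444455556666:role/RedshiftCopyRole'
delimiter '|'
gzip;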
Task | Description | Skills required

Verify and compare the number of rows in the source and target tables.

Verify and compare the number of table rows in the source and target Regions to ensure all are migrated.

DBA

Verify and compare the number of source and target database objects.

Verify and compare all database objects in the source and target Regions to ensure all are migrated.

DBA

Verify and compare SQL script results in the source and target Regions.

Run the SQL scripts prepared before the migration. Verify and compare the data to ensure that the SQL results are correct.

DBA

Reset the passwords of all users in the target Amazon Redshift cluster.

After the migration is complete and all data is verified, reset all user passwords for the Amazon Redshift cluster in the AWS Region in China. For an example statement, see the command after this task.

DBA
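A sketch of a password reset for a single user follows; the user name and password are placeholders. Amazon Redshift passwords must be 8–64 characters long and contain at least one uppercase letter, one lowercase letter, and one number.

alter user example_user password 'ExamplePassw0rd';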

Related resources

Additional information

IAM policy for unloading data

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": ["arn:aws:s3:::s3_bucket_name"] }, { "Effect": "Allow", "Action": ["s3:GetObject", "s3:DeleteObject"], "Resource": ["arn:aws:s3:::s3_bucket_name/*"] } ] }

IAM policy for copying data

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": ["arn:aws:s3:::s3_bucket_name"] }, { "Effect": "Allow", "Action": ["s3:GetObject"], "Resource": ["arn:aws:s3:::s3_bucket_name/*"] } ] }

SQL statements to query Amazon Redshift

##Database
select * from pg_database where datdba>1;

##User
select * from pg_user where usesysid>1;

##Schema
SELECT n.nspname AS "Name", pg_catalog.pg_get_userbyid(n.nspowner) AS "Owner"
FROM pg_catalog.pg_namespace n
WHERE n.nspname !~ '^pg_' AND n.nspname <> 'information_schema'
ORDER BY 1;

##Table
select count(*) from pg_tables where schemaname not in ('pg_catalog','information_schema');

select schemaname,count(*) from pg_tables where schemaname not in ('pg_catalog','information_schema') group by schemaname order by 1;

##View
SELECT n.nspname AS schemaname, c.relname AS viewname, pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
FROM pg_catalog.pg_class AS c
INNER JOIN pg_catalog.pg_namespace AS n ON c.relnamespace = n.oid
WHERE relkind = 'v' and n.nspname not in ('information_schema','pg_catalog');

##UDF
SELECT n.nspname AS schemaname, p.proname AS proname, pg_catalog.pg_get_userbyid(p.proowner) as "Owner"
FROM pg_proc p
LEFT JOIN pg_namespace n on n.oid = p.pronamespace
WHERE p.proowner != 1;

SQL scripts to generate DDL statements