Getting started with zero-ETL integrations

Before configuring your zero-ETL integration in Amazon Redshift, set up your integration source with the required parameters and permissions. Then, complete the rest of the initial setup from the Amazon Redshift console and the AWS CLI.

To create an Aurora zero-ETL integration with Amazon Redshift

To create an Aurora zero-ETL integration with Amazon Redshift, do the following:

  1. From the Amazon RDS console, create a custom DB cluster parameter group as described in the Amazon Aurora User Guide.

  2. From the Amazon RDS console, create a source Amazon Aurora DB cluster as described in the Amazon Aurora User Guide.

  3. From the Amazon Redshift console, create and configure a target Amazon Redshift data warehouse.

  4. From the Amazon RDS console, create a zero-ETL integration as described in the Amazon Aurora User Guide.

  5. From the Amazon Redshift console or the query editor v2, create an Amazon Redshift database from your integration.

    Then, query and create materialized views with replicated data.

To create an RDS zero-ETL integration with Amazon Redshift

To create an RDS zero-ETL integration with Amazon Redshift, do the following:

  1. From the Amazon RDS console, create a custom DB parameter group as described in the Amazon RDS User Guide.

  2. From the Amazon RDS console, create a source Amazon RDS instance as described in the Amazon RDS User Guide.

  3. From the Amazon Redshift console, create and configure a target Amazon Redshift data warehouse.

  4. From the Amazon RDS console, create a zero-ETL integration as described in the Amazon RDS User Guide.

  5. From the Amazon Redshift console or the query editor v2, create an Amazon Redshift database from your integration.

    Then, query and create materialized views with replicated data.
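
For either source type, you can also perform step 4 programmatically instead of using the Amazon RDS console. The following AWS CLI sketch is a minimal example; the integration name, source ARN (an Aurora DB cluster or RDS DB instance ARN), and target ARN (the ARN of your Amazon Redshift namespace) shown here are placeholders, so substitute your own values.

# Create a zero-ETL integration from the AWS CLI (all values shown are placeholders).
aws rds create-integration \
    --integration-name my-zero-etl-integration \
    --source-arn arn:aws:rds:us-east-1:123456789012:cluster:my-source-cluster \
    --target-arn arn:aws:redshift:us-east-1:123456789012:namespace:0a1b2c3d-example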

The Amazon RDS console offers a step-by-step integration creation flow, in which you specify the source database and the target Amazon Redshift data warehouse. If issues occur, then you can choose to have Amazon RDS fix the issues for you instead of manually fixing them on either the Amazon RDS or Amazon Redshift console.

Create and configure a target Amazon Redshift data warehouse

Before this step, create your integration source and configure parameters required by the source type for zero-ETL integrations.

In this step, you create and configure a target Amazon Redshift data warehouse, such as a Redshift Serverless workgroup or a provisioned cluster.

Your target data warehouse must have the following characteristics:

  • Running Amazon Redshift Serverless or a provisioned cluster of instance type ra3.16xlarge, ra3.4xlarge, or ra3.xlplus.

  • Has case sensitivity (enable_case_sensitive_identifier) turned on. For more information, see Turn on case sensitivity for your data warehouse.

  • Encrypted, if your target data warehouse is an Amazon Redshift provisioned cluster. For more information, see Amazon Redshift database encryption.

  • Created in the same AWS Region as the integration source.
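
If you plan to use an existing provisioned cluster as the target, you can check these characteristics with the AWS CLI. The following is a minimal sketch, assuming a hypothetical cluster identifier of target-cluster; it returns the cluster's node type, node count, and encryption status.

# Inspect an existing provisioned cluster (target-cluster is a placeholder identifier).
aws redshift describe-clusters \
    --cluster-identifier target-cluster \
    --query 'Clusters[0].[NodeType,NumberOfNodes,Encrypted]'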

Note

For Aurora PostgreSQL and RDS for MySQL zero-ETL integrations with Amazon Redshift, also consider the following for your target data warehouse:

  • You must create your data warehouse in Preview on the preview_2023 track. You can't use preview features in production or move your preview data warehouse to a production deployment.

  • If you choose to create an Amazon Redshift provisioned cluster, that cluster must have at least two nodes.

  • For Aurora PostgreSQL sources, you must create your target data warehouse in the US East (Ohio) AWS Region. Note that you must create your source database for Aurora PostgreSQL zero-ETL integrations using the Amazon RDS Database Preview Environment.

    For RDS for MySQL sources, you must create your target data warehouse in a supported AWS Region. For a list of AWS Regions where RDS for MySQL zero-ETL integrations are available, see Supported Regions for zero-ETL integrations with Amazon Redshift in the Amazon RDS User Guide.

To create your target data warehouse in Preview for your Aurora PostgreSQL and RDS for MySQL zero-ETL integrations, see one of the following topics depending on your deployment type:

  • To create a preview Amazon Redshift provisioned cluster, see Creating a preview cluster. Make sure that you choose the preview_2023 track in order to use zero-ETL integrations.

  • To create a preview Amazon Redshift Serverless workgroup, see Creating a preview workgroup.

To create your target data warehouse for your Aurora MySQL zero-ETL integrations, create either an Amazon Redshift provisioned cluster or a Redshift Serverless workgroup, depending on your deployment type.

When you create a provisioned cluster, Amazon Redshift also creates a default parameter group. You can't edit the default parameter group. However, you can create a custom parameter group before creating a new cluster and then associate it with the cluster, or you can edit the parameter group that will be associated with the created cluster. To use zero-ETL integrations, you must turn on case sensitivity for the parameter group, either when creating a custom parameter group or when editing an existing one.

You can create a custom parameter group and turn on case sensitivity using either the Amazon Redshift console or the AWS CLI, as described in the following sections.

Turn on case sensitivity for your data warehouse

enable_case_sensitive_identifier is a configuration value that determines whether name identifiers of databases, tables, and columns are case sensitive. You must turn on this parameter to create zero-ETL integrations in the data warehouse, because it's required to support the case sensitivity of MySQL and PostgreSQL. For a provisioned cluster, you can attach a parameter group and enable case sensitivity during cluster creation. For a Redshift Serverless workgroup, you can update the configuration through the AWS Command Line Interface (AWS CLI) only after the workgroup has been created. For more information, see enable_case_sensitive_identifier.

For Amazon Redshift Serverless – Turn on case sensitivity for Amazon Redshift Serverless using the AWS CLI. Note that you can turn on case sensitivity for Amazon Redshift Serverless only from the AWS CLI.

For Amazon Redshift provisioned clusters, turn on case sensitivity for your target cluster using one of the following procedures:

Turn on case sensitivity for Amazon Redshift Serverless using the AWS CLI

Run the following AWS CLI command to turn on case sensitivity for your workgroup.

aws redshift-serverless update-workgroup \
    --workgroup-name target-workgroup \
    --config-parameters parameterKey=enable_case_sensitive_identifier,parameterValue=true

Wait for the workgroup status to be Active before proceeding to the next step.
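
For example, you can check the status from the AWS CLI. The following is a minimal sketch that assumes the same hypothetical workgroup name, target-workgroup, used above; it prints the workgroup's current status.

# Check the current status of the workgroup (target-workgroup is a placeholder name).
aws redshift-serverless get-workgroup \
    --workgroup-name target-workgroup \
    --query 'workgroup.status'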

Turn on case sensitivity for Amazon Redshift provisioned clusters using the Amazon Redshift console

  1. Sign in to the AWS Management Console and open the Amazon Redshift console at https://console.aws.amazon.com/redshiftv2/.

  2. In the left navigation pane, choose Provisioned clusters dashboard.

  3. Choose the provisioned cluster that you want to replicate data into.

  4. In the left navigation pane, choose Configurations > Workload management.

  5. On the workload management page, choose the parameter group.

  6. Choose the Parameters tab.

  7. Choose Edit parameters, then change enable_case_sensitive_identifier to true.

  8. Choose Save.

Turn on case sensitivity for Amazon Redshift provisioned clusters using the AWS CLI

  1. Because you can't edit the default parameter group, from your terminal program, run the following AWS CLI command to create a custom parameter group. Later, you will associate it with the provisioned cluster.

    aws redshift create-cluster-parameter-group \
        --parameter-group-name zero-etl-params \
        --parameter-group-family redshift-1.0 \
        --description "Param group for zero-ETL integrations"
  2. Run the following AWS CLI command to turn on case sensitivity for the parameter group.

    aws redshift modify-cluster-parameter-group \
        --parameter-group-name zero-etl-params \
        --parameters ParameterName=enable_case_sensitive_identifier,ParameterValue=true
  3. Run the following command to associate the parameter group with the cluster.

    aws redshift modify-cluster \
        --cluster-identifier target-cluster \
        --cluster-parameter-group-name zero-etl-params
  4. Wait for the provisioned cluster to be available. You can check the status of the cluster by using the describe-clusters command. Then, run the following command to reboot the cluster.

    aws redshift reboot-cluster \
        --cluster-identifier target-cluster
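
As an alternative to polling with describe-clusters, you can block until the cluster is available by using the cluster-available waiter. The following is a minimal sketch, assuming the same hypothetical cluster identifier of target-cluster.

# Wait until the cluster reaches the available state before continuing.
aws redshift wait cluster-available \
    --cluster-identifier target-cluster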

Configure authorization for your Amazon Redshift data warehouse

To replicate data from your integration source into your Amazon Redshift data warehouse, you must first add the following two entities:

  • Authorized principal – identifies the user or role that can create zero-ETL integrations into the data warehouse.

  • Authorized integration source – identifies the source database that can update the data warehouse.

You can configure authorized principals and authorized integration sources from the Resource Policy tab on the Amazon Redshift console or using the Amazon Redshift PutResourcePolicy API operation.

Add authorized principals

To create a zero-ETL integration into your Redshift Serverless workgroup or provisioned cluster, authorize access to the associated namespace or provisioned cluster.

You can skip this step if both of the following conditions are true:

  • The AWS account that owns the Redshift Serverless workgroup or provisioned cluster also owns the source database.

  • That principal is associated with an identity-based IAM policy with permissions to create zero-ETL integrations into this Redshift Serverless namespace or provisioned cluster.

Add authorized principals to an Amazon Redshift Serverless namespace

  1. In the Amazon Redshift console, in the left navigation pane, choose Redshift Serverless.

  2. Choose Namespace configuration, then choose your namespace, and go to the Resource Policy tab.

  3. Choose Add authorized principals.

  4. For each authorized principal that you want to add, enter either the ARN of the AWS user or role, or the ID of the AWS account that you want to grant access to create zero-ETL integrations into the namespace. An account ID is stored as an ARN.

  5. Choose Save changes.

Add authorized principals to an Amazon Redshift provisioned cluster

  1. In the Amazon Redshift console, in the left navigation pane, choose Provisioned clusters dashboard.

  2. Choose Clusters, then choose the cluster, and go to the Resource Policy tab.

  3. Choose Add authorized principals.

  4. For each authorized principal that you want to add, enter either the ARN of the AWS user or role, or the ID of the AWS account that you want to grant access to create zero-ETL integrations into the cluster. An account ID is stored as an ARN.

  5. Choose Save changes.

Add authorized integration sources

To allow your source to update your Amazon Redshift data warehouse, you must add it as an authorized integration source to the namespace.

Add an authorized integration source to an Amazon Redshift Serverless namespace

  1. In the Amazon Redshift console, go to Serverless dashboard.

  2. Choose the name of the namespace.

  3. Go to the Resource Policy tab.

  4. Choose Add authorized integration source.

  5. Specify the ARN of the source for the zero-ETL integration.

Note

Removing an authorized integration source stops data from replicating into the namespace. This action deactivates all zero-ETL integrations from that source into this namespace.

Add an authorized integration source to an Amazon Redshift provisioned cluster

  1. In the Amazon Redshift console, go to Provisioned clusters dashboard.

  2. Choose the name of the provisioned cluster.

  3. Go to the Resource Policy tab.

  4. Choose Add authorized integration source.

  5. Specify the ARN of the source for the zero-ETL integration.

Note

Removing an authorized integration source stops data from replicating into the provisioned cluster. This action deactivates all zero-ETL integrations from that source into this Amazon Redshift provisioned cluster.

Configure authorization using the Amazon Redshift API

You can use the Amazon Redshift API operations to configure resource policies that work with zero-ETL integrations.

To control which source can create an inbound integration into the namespace, create a resource policy and attach it to the namespace of your target data warehouse. The resource policy specifies the source that has access to the integration and allows that source to create an inbound integration that replicates live data from the source into Amazon Redshift.

The following is a sample resource policy.

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "redshift.amazonaws.com" }, "Action": "redshift:AuthorizeInboundIntegration", "Condition": { "StringEquals": { "aws:SourceArn": "source_arn" } } }, { "Effect": "Allow", "Principal": { "AWS": "source_principal" }, "Action": "redshift:CreateInboundIntegration" } ] }

The following summarizes the Amazon Redshift API operations applicable to configuring resource policies for integrations:

  • Use the PutResourcePolicy API operation to persist the resource policy. When you provide another resource policy, the previous resource policy on the resource is replaced. The previous example resource policy grants permissions for the following actions:

    • CreateInboundIntegration – Allows the source principal to create an inbound integration for data to be replicated from the source into the target data warehouse.

    • AuthorizeInboundIntegration – Allows Amazon Redshift to continuously validate that the target data warehouse can receive data replicated from the source ARN.

  • Use the GetResourcePolicy API operation to view existing resource policies.

  • Use the DeleteResourcePolicy API operation to remove a resource policy from the resource.

To update a resource policy, you can also use the put-resource-policy AWS CLI command.
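
The following is a minimal sketch of that command, assuming the sample policy shown earlier is saved locally in a hypothetical file named rs-policy.json and that resource_arn is the ARN of your target namespace or provisioned cluster.

# Attach the resource policy to the target namespace or cluster (values are placeholders).
aws redshift put-resource-policy \
    --resource-arn resource_arn \
    --policy file://rs-policy.json

You can then view the attached policy with the get-resource-policy AWS CLI command and remove it with delete-resource-policy.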

Next steps

Now that you have configured authorization for your target Amazon Redshift data warehouse, you can create a zero-ETL integration and start replicating data.

Depending on your source, do one of the following: