Sharing both read and write data within an AWS account or across accounts (preview) - Amazon Redshift

Sharing both read and write data within an AWS account or across accounts (preview)

This is prerelease documentation for the multi-data warehouse writes through data sharing feature for Amazon Redshift, which is available in public preview in the PREVIEW_2023 track. The documentation and the feature are both subject to change. We recommend that you use this feature only with test clusters, and not in production environments. For preview terms and conditions, see Beta Service Participation in AWS Service Terms.
Note

Multi-warehouse writes through data sharing is not currently available on ra3.xlplus clusters. To use this feature, create ra3.4xl clusters, ra3.16xl clusters, or Amazon Redshift Serverless workgroups with the preview track.

For more information about setting up the preview track, see the preview documentation for Amazon Redshift provisioned clusters or for Amazon Redshift Serverless.

You can share database objects for both reads and writes across different Amazon Redshift clusters or Amazon Redshift Serverless workgroups within the same AWS account, or from one AWS account to another. You can also write data across AWS Regions. The procedures in this topic show how to set up data sharing. You can grant permissions such as SELECT, INSERT, and UPDATE on tables, and USAGE and CREATE on schemas. The data is live and available to all warehouses as soon as a write transaction is committed. When sharing data across accounts, producer account administrators can choose whether consumer accounts get read-only or read-and-write access to the data. Additionally, producer account administrators can determine whether specific namespaces or Regions get read-only access, read-and-write access, or no access at all.

The sections that follow show how to configure data sharing. The procedures assume you're working in a database in a provisioned cluster or Amazon Redshift Serverless workgroup.

Read-only data sharing vs. data sharing for reads and writes

Previously, objects in datashares were read-only in all circumstances. Writing to an object in a datashare is a new feature. Objects in datashares are write-enabled only when a producer specifically grants write privileges, such as INSERT or CREATE, on those objects to the datashare. Details follow in subsequent sections of this topic.

If you're looking for the existing documentation for read-only datashares, that's available at Sharing data across clusters in Amazon Redshift.

Prerequisites and usage notes

Isolation settings – Your database must be set to snapshot isolation. Databases created in the PREVIEW_2023 track use snapshot isolation by default. You can change the isolation setting with the ISOLATION LEVEL clause of the CREATE DATABASE or ALTER DATABASE statement.
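As a sketch, the isolation level can be set when a database is created or changed afterward. The database name here is a placeholder:

    CREATE DATABASE my_preview_db ISOLATION LEVEL SNAPSHOT;

    ALTER DATABASE my_preview_db ISOLATION LEVEL SNAPSHOT;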


Sharing database objects for read and write access in a datashare as the producer database administrator

To start data sharing, the administrator on the producer creates a datashare and adds objects to it:

  1. The producer database owner or superuser creates a datashare. A datashare is a logical container of database objects, permissions, and consumers. (Consumers are clusters or Amazon Redshift Serverless namespaces in your account and other accounts.) Each datashare is associated with the database it's created in and only objects from that database can be added. The following command creates a datashare:

    CREATE DATASHARE my_datashare [PUBLICACCESSIBLE = TRUE];

    Setting PUBLICACCESSIBLE = TRUE allows consumers to query your datashare from publicly accessible clusters and Amazon Redshift Serverless workgroups. Omit it, or explicitly set it to FALSE, if you don't want to allow this.

    The datashare owner must grant USAGE on the schemas they want to add to the datashare. Granting permissions on a schema to a datashare, including CREATE and USAGE, is new with this preview. The schemas hold the shared objects:

    CREATE SCHEMA myshared_schema1;
    CREATE SCHEMA myshared_schema2;
    GRANT USAGE ON SCHEMA myshared_schema1 TO DATASHARE my_datashare;
    GRANT CREATE, USAGE ON SCHEMA myshared_schema2 TO DATASHARE my_datashare;

    Alternatively, the administrator can continue to run ALTER commands to add a schema to the datashare. Only USAGE permissions are granted when a schema is added this way.

    ALTER DATASHARE my_datashare ADD SCHEMA myshared_schema1;
  2. After the administrator adds schemas, they can grant datashare permissions on objects in those schemas. These can be both read and write permissions. The GRANT ALL example shows how to grant all permissions on a table.

    GRANT SELECT, INSERT ON TABLE myshared_schema1.table1, myshared_schema1.table2, myshared_schema2.table1 TO DATASHARE my_datashare;
    GRANT ALL ON TABLE myshared_schema1.table4 TO DATASHARE my_datashare;

    You can continue to run commands like ALTER DATASHARE to add tables. When you do, only SELECT permissions are granted on the objects added.

    ALTER DATASHARE my_datashare ADD TABLE myshared_schema1.table1, myshared_schema1.table2, myshared_schema2.table1;
  3. The administrator grants usage on the datashare to a specific namespace in the account. You can find the namespace ID as part of the ARN in the cluster details page, in the Amazon Redshift Serverless namespace details page, or by running the command SELECT current_namespace;. For more information, see CURRENT_NAMESPACE.

    GRANT USAGE ON DATASHARE my_datashare TO NAMESPACE '86b5169f-012a-234b-9fbb-e2e24359e9a8';

Authorizing a datashare to be shared with another account

After creating a datashare on the producer and adding objects to it, you might want to share it with another AWS account. Authorizing a datashare for another account to use is a secondary security process that must be performed on the producer by a data-security administrator. It authorizes a datashare for consumption by the consumer account. You can authorize a datashare to another account for read-only purposes or for reads and writes. If authorized for writes, the account is allowed the write privileges granted to the datashare. If not, only the read privileges granted to the datashare are available. For more information on how to do this, see Authorizing datashares for writes as the producer security administrator (preview).

Associating a datashare from another account as the consumer

If the datashare is sourced from another account, you must associate it with specific clusters or namespaces in your account before it can be used with those clusters or namespaces. You can associate it with all namespaces in the account, all namespaces in specific Regions of the account, or specific namespaces. You can also choose whether a namespace gets read-only or read-and-write access. If associated for writes, the associated clusters and Serverless namespaces get the write privileges granted to the datashare. If not, only the read privileges granted to the datashare are available. For more information on how to do this, see Associating shared data as the consumer data security administrator (preview).
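Once the association is in place, the consumer administrator can confirm that the datashare is visible, for example with the following commands. The account ID and namespace GUID are placeholders:

    SHOW DATASHARES;

    DESC DATASHARE my_datashare OF ACCOUNT '123456789012' NAMESPACE '86b5169f-012a-234b-9fbb-e2e24359e9a8';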

Permissions you can grant to datashares

The following are the object types and the permissions you can grant on them in a data sharing context.

Schemas:

  • USAGE

  • CREATE

Tables:

  • SELECT

  • INSERT

  • UPDATE

  • DELETE

  • TRUNCATE

  • DROP

  • REFERENCES

Functions:

  • EXECUTE

Enabling clusters or Serverless workgroups to query the datashare

This step assumes the datashare is originating from another cluster or Amazon Redshift Serverless namespace in your account, or it is coming from another account and has been associated with the namespace you are using.

  1. The consumer database administrator can create a database from the datashare.

    CREATE DATABASE my_ds_db [WITH PERMISSIONS] FROM DATASHARE my_datashare OF NAMESPACE 'abc123def';

    If you create a database WITH PERMISSIONS, you can grant granular permissions on datashare objects to different users and roles. Without this clause, all users and roles granted USAGE permission on the datashare database are granted all permissions on all objects within the datashare database.

  2. The following shows how to grant permissions to an Amazon Redshift database user or role. You must be connected to a local database to run these statements. You can't run them if you've executed a USE command on the datashare database before issuing the grants.

    GRANT USAGE ON DATABASE my_ds_db TO ROLE data_eng;
    GRANT CREATE, USAGE ON SCHEMA my_ds_db.my_shared_schema TO ROLE data_eng;
    GRANT ALL ON ALL TABLES IN SCHEMA my_ds_db.my_shared_schema TO ROLE data_eng;
    GRANT USAGE ON DATABASE my_ds_db TO bi_user;
    GRANT USAGE ON SCHEMA my_ds_db.my_shared_schema TO bi_user;
    GRANT SELECT ON my_ds_db.my_shared_schema.table1 TO bi_user;

Writing to objects within a datashare

There are two ways to write to objects within a datashare.

With the USE command, connecting to the local database

With this command, you can query a datashare's objects with two-part notation and run multi-statement transactions. You can only run multi-statement transactions within a single database.

USE database_name;

You can go back to using the database you’re connected to by running the RESET USE command.

RESET USE;

Once you USE a database you can BEGIN a transaction in that database. You can't write to multiple databases within a single transaction.

Important

If you run a USE command without first committing or ending the current transaction, the data warehouse will restart. Make sure to end or commit open transactions prior to running a USE statement.

USE database_name; --set the context to the datashare database

INSERT INTO datashare_schema.table1 VALUES (1);

BEGIN; --start a transaction
INSERT INTO datashare_schema.table2 VALUES (1);
INSERT INTO datashare_schema.table3 VALUES (1);
COMMIT; --end the transaction; you can also use the END command
Note

You can't COPY data from Parquet or ORC files into a datashare table.

Connecting to the datashare database directly via a Redshift JDBC, ODBC, or Python driver

If you connect to a datashare database directly, you can query the tables using two-part notation and run multi-statement transactions in the datashare database right away.

--connect to the datashare database via JDBC, ODBC, or the Python driver
INSERT INTO datashare_schema.table1 VALUES (1);

BEGIN; --start a transaction
INSERT INTO datashare_schema.table2 VALUES (1);
INSERT INTO datashare_schema.table3 VALUES (1);
COMMIT; --end the transaction; you can also use the END command
Important

Running an INSERT INTO SELECT statement is supported, but to do so you must have INSERT permissions on all tables referenced in the query.
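For instance, a minimal INSERT INTO SELECT against datashare tables might look like the following. The schema and table names are the placeholder names used elsewhere in this topic:

    INSERT INTO datashare_schema.table1
    SELECT * FROM datashare_schema.table2;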

Note

Support for connecting to datashare databases via the Data API is coming soon.

Requirements and limitations for datasharing in preview

  • You must be connected directly to a datashare database or run the USE command to write to datashares. However, we will soon enable the ability to do this with three-part notation.

  • You must use Serverless workgroups, ra3.4xl clusters, or ra3.16xl clusters to use this feature. Support for ra3.xlplus clusters is planned.

  • Metadata Discovery – When you're a consumer connected directly to a datashare database through the Redshift JDBC, ODBC, or Python drivers, you can view catalog data for that database.

  • You cannot connect to datashare databases via the Data API. Support for this will be coming soon.

  • Consumers cannot see the permissions granted to the datashares. We will be adding this soon.

  • For cross-account data sharing, both the producer and consumer clusters must be encrypted.

  • Your database’s isolation level must be snapshot isolation in order to allow other Serverless workgroups and clusters to write to it.

  • Consumer writes to datashare objects don't trigger an automatic ANALYZE operation. As a result, the producer must run ANALYZE manually after data is inserted into a table to keep table statistics up to date. Without this, query plans may not be optimal.

SQL statements supported

These statements are supported for the public preview release of data sharing with writes:

  • BEGIN | START TRANSACTION

  • END | COMMIT | ROLLBACK

  • COPY without COMPUPDATE

  • { CREATE | DROP } SCHEMA

  • { CREATE | DROP | SHOW } TABLE

  • CREATE TABLE table_name AS

  • DELETE

  • { GRANT | REVOKE } privilege_name ON OBJECT_TYPE object_name TO consumer_user

  • INSERT

  • SELECT

  • INSERT INTO SELECT

  • TRUNCATE

  • UPDATE

  • SUPER data type columns
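Because COPY is supported only without compression analysis, one way to satisfy this restriction is to set COMPUPDATE OFF explicitly. The following is a sketch; the S3 path, IAM role ARN, and table name are placeholders:

    COPY datashare_schema.table1
    FROM 's3://amzn-s3-demo-bucket/data/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    FORMAT AS CSV
    COMPUPDATE OFF;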

Unsupported statement types – The following aren't supported:

  • Multi-statement queries to consumer warehouses when writing to producers.

  • Concurrency scaling queries writing from consumers to producers.

  • Auto-copy jobs writing from consumers to producers.

  • Streaming jobs writing from consumers to producers.

  • Consumers creating zero-ETL integration tables on producer clusters. For more information about zero-ETL integrations, see Working with zero-ETL integrations.

  • Writing to a table with an interleaved sort key.