Sharing write access to data (Preview) - Amazon Redshift

Sharing write access to data (Preview)

You can share database objects for both reads and writes across different Amazon Redshift clusters or Amazon Redshift Serverless workgroups within the same AWS account, across accounts, and across regions. The procedures in this topic show how to set up data sharing that includes write permissions. You can grant permissions such as SELECT, INSERT, and UPDATE for different tables and USAGE and CREATE for schemas. The data is live and available to all warehouses as soon as a write transaction is committed. Producer account administrators can determine whether or not specific namespaces or regions get read-only, read-and-write, or any access to the data.

The sections that follow show how to configure data sharing. The procedures assume you're working in a database in a provisioned cluster or Amazon Redshift Serverless workgroup.

Read-only data sharing vs. data sharing for reads and writes

Previously, objects in datashares were read only in all circumstances. Writing to an object in a datashare is a new feature. Objects in datashares are only write-enabled when a producer specifically grants write privileges like INSERT or CREATE on objects to the datashare. Additionally, for cross-account sharing, a producer has to authorize the datashare for writes and the consumer has to associate specific clusters and workgroups for writes. Details follow in subsequent sections in this topic.

Permissions you can grant to datashares (preview)

Different object types and various permissions you can grant to them in a data sharing context.

Schemas:

  • USAGE

  • CREATE

Tables:

  • SELECT

  • INSERT

  • UPDATE

  • DELETE

  • TRUNCATE

  • DROP

  • REFERENCES

Functions:

  • EXECUTE

Databases:

  • CREATE

Requirements and limitations for datasharing in preview

  • Connections – You must be connected directly to a datashare database or run the USE command to write to datashares. However, we will soon enable the ability to do this with three-part notation.

  • Availability – You must use Serverless workgroups, ra3.4xl clusters, or ra3.16xl clusters to use this feature. Support for ra3.xlplus clusters is planned.

  • Metadata Discovery – When you're a consumer connected directly to a datashare database through the Redshift JDBC, ODBC, or Python drivers, you can view catalog data in the following ways:

  • Data API – You cannot connect to datashare databases via the Data API. Support for this will be coming soon.

  • Permissions visibility – Consumers cannot see the permissions granted to the datashares. We will be adding this soon.

  • Encryption – For cross-account data sharing, both the producer and consumer cluster must be encrypted.

  • Isolation level – Your database’s isolation level must be snapshot isolation in order to allow other Serverless workgroups and clusters to write to it.

  • Auto operations – Consumers writing to datashare objects will not trigger an auto analyze operation. As a result, the producer must manually run analyze after data is inserted into the table to have table statistics updated. Without this, query plans may not be optimal.

  • Multi-statement queries and transactions – Multi-statement queries outside of a transaction block aren't currently supported. As a result, if you are using a query editor like dbeaver and you have multiple write queries, you need to wrap your queries in an explicit BEGIN...END transaction statement.

SQL statements supported

These statements are supported for the public preview release of data sharing with writes:

  • BEGIN | START TRANSACTION

  • END | COMMIT | ROLLBACK

  • COPY without COMPUPDATE

  • { CREATE | DROP } SCHEMA

  • { CREATE | DROP | SHOW } TABLE

  • CREATE TABLE table_name AS

  • DELETE

  • { GRANT | REVOKE } privilege_name ON OBJECT_TYPE object_name TO consumer_user

  • INSERT

  • SELECT

  • INSERT INTO SELECT

  • TRUNCATE

  • UPDATE

  • Super data type columns

Unsupported statement types – The following aren't supported:

  • Multi-statement queries to consumer warehouses when writing to producers.

  • Concurrency scaling queries writing from consumers to producers.

  • Auto-copy jobs writing from consumers to producers.

  • Streaming jobs writing from consumers to producers.

  • Consumers creating zero-ETL integration tables on producer clusters. For more information about zero-ETL integrations, see Working with zero-ETL integrations.

  • Writing to a table with an interleaved sort key.

Sharing data within an account with write permissions as the producer account administrator (preview)

Previously, objects in datashares were read only in all circumstances. Writing to an object in a datashare is a new feature. Objects in datashares are only write-enabled when a producer specifically grants write privileges like INSERT or CREATE on objects to the datashare. Details follow in subsequent sections in this topic.

If you're looking for the existing documentation for read-only datashares, that's available at Sharing data across clusters in Amazon Redshift.

To start data sharing, the administrator on the producer creates a datashare and adds objects to it:

  1. The producer database owner or superuser creates a datashare. A datashare is a logical container of database objects, permissions, and consumers. (Consumers are clusters or Amazon Redshift Serverless namespaces in your account and other accounts.) Each datashare is associated with the database it's created in and only objects from that database can be added. The following command creates a datashare:

    CREATE DATASHARE my_datashare [PUBLICACCESSIBLE = TRUE];

    Setting PUBLICACCESSIBLE = TRUE allows consumers to query your datashare from publicly accessible clusters and provisioned workgroups. Leave this out or explicitly set it to false if you do not want to allow it.

    The datashare owner must grant USAGE on the schemas they want to add to the datashare. The GRANT command is new. It's used to grant various actions on the schema, including CREATE and USAGE. The schemas hold shared objects:

    CREATE SCHEMA myshared_schema1; CREATE SCHEMA myshared_schema2; GRANT USAGE ON SCHEMA myshared_schema1 TO DATASHARE my_datashare; GRANT CREATE, USAGE ON SCHEMA myshared_schema2 TO DATASHARE my_datashare;

    Alternatively, the administrator can continue to run ALTER commands to add a schema to the datashare. Only USAGE permissions are granted when a schema is added this way.

    ALTER DATASHARE my_datashare ADD SCHEMA myshared_schema1;
  2. After the administrator adds schemas, they can grant datashare permissions on objects in the schema. These can be both read and write permissions. The GRANT ALL sample shows how to grant all permissions.

    GRANT SELECT, INSERT ON TABLE myshared_schema1.table1, myshared_schema1.table2, myshared_schema2.table1 TO DATASHARE my_datashare; GRANT ALL ON TABLE myshared_schema1.table4 TO DATASHARE my_datashare;

    You can continue to run commands like ALTER DATASHARE to add tables. When you do, only SELECT permissions are granted on the objects added.

    ALTER DATASHARE my_datashare ADD TABLE myshared_schema1.table1, myshared_schema1.table2, myshared_schema2.table1;
  3. The administrator grants usage on the datashare to a specific namespace in the account. You can find the namespace ID as part of the ARN in the cluster details page, in the Amazon Redshift Serverless namespace details page, or by running the command SELECT current_namespace;. For more information, see CURRENT_NAMESPACE.

    GRANT USAGE ON DATASHARE my_datashare TO NAMESPACE '86b5169f-012a-234b-9fbb-e2e24359e9a8';