Sharing datashares - Amazon Redshift

Sharing datashares

You only need datashares when you are sharing data between different Amazon Redshift provisioned clusters or serverless workgroups. Within the same cluster, you can query another database using simple three-part notation database.schema.table as long as you have the required permissions on the objects in the other database.

Managing permissions for datashares in Amazon Redshift

As a producer cluster administrator, you retain control for the datasets you are sharing. You can add new objects to or remove them from the datashare. You can also grant or revoke access to datashares as a whole for the consumer clusters, AWS accounts, or AWS Regions. When permissions are revoked, consumer clusters immediately lose access to the shared objects and stop seeing them in the list of INBOUND datashares in SVV_DATASHARES.

The following example creates the datashare salesshare, adds the schema public, and adds the table public.tickit_sales_redshift to salesshare. It also grants usage permissions on salesshare to the specified cluster namespace.

CREATE DATASHARE salesshare; ALTER DATASHARE salesshare ADD SCHEMA public; ALTER DATASHARE salesshare ADD TABLE public.tickit_sales_redshift; GRANT USAGE ON DATASHARE salesshare TO NAMESPACE '13b8833d-17c6-4f16-8fe4-1a018f5ed00d';

For CREATE DATASHARE, superusers and database owners can create datashares. For more information, see CREATE DATASHARE. For ALTER DATASHARE, the owner of the datashare with the required permissions on the datashare objects to be added or removed can alter the datashare. For information, see ALTER DATASHARE.

As a producer administrator, when you drop a datashare, it stops being listed on consumer clusters. The databases and schema references created on the consumer cluster from the dropped datashare continue to exist with no objects in them. The consumer cluster administrator must delete these databases manually.

On the consumer side, a consumer cluster administrator can determine which users and roles should get access to the shared data by creating a database from the datashare. Depending on the options you choose when creating the database, you can control access to it as follows. For more information about creating a database from a datashare, see CREATE DATABASE.

Creating the database without the WITH PERMISSIONS clause

An administrator can control access at the database or schema level. To control access at the schema level, the administrator must create an external schema from the Amazon Redshift database created from the datashare.

The following example grants permissions to access a shared table at the database level and schema level.

GRANT USAGE ON DATABASE sales_db TO Bob; CREATE EXTERNAL SCHEMA sales_schema FROM REDSHIFT DATABASE sales_db; GRANT USAGE ON SCHEMA sales_schema TO ROLE Analyst_role;

To further restrict access, you can create views on top of shared objects, exposing only the necessary data. You can then use these views to give access to the users and roles.

Once the users are granted access to the database or schema, they will have access to all shared objects in that database or schema.

Creating the database with the WITH PERMISSIONS clause

After granting usage rights on the database or schema, an administrator can further control access using the same permission granting process as they would on a local database or schema. Without individual object permissions, users can’t access any objects in the datashared database or schema even after being granted the USAGE permission.

The following example grants permissions to access a shared table at the database level.

GRANT USAGE ON DATABASE sales_db TO Bob; GRANT USAGE FOR SCHEMAS IN DATABASE sales_db TO Bob; GRANT SELECT ON sales_db.public.tickit_sales_redshift TO Bob;

After being granted access to the database or schema, users still need to be given the relevant permissions for any objects in the database or schema that you want them to access.

Granular sharing using WITH PERMISSIONS (preview)

Enabling clusters or Serverless workgroups to query the datashare

This step assumes the datashare is originating from another cluster or Amazon Redshift Serverless namespace in your account, or it is coming from another account and has been associated with the namespace you are using.

  1. The consumer database administrator can create a database from the datashare.

    CREATE DATABASE my_ds_db [WITH PERMISSIONS] FROM DATASHARE my_datashare OF NAMESPACE 'abc123def';

    If you create a database WITH PERMISSIONS you can grant granular permissions on datashare objects to different users and roles. Without this, all users and roles granted USAGE permission on the datashare database are granted all permissions on all objects within the datashare database.

  2. The following shows how to grant permissions to a Redshift database user or role. You must be connected to a local database to run these statements. You cannot run these statements if you execute a USE command on the datashare database before running the grant statements.

    GRANT USAGE ON DATABASE my_ds_db TO ROLE data_eng; GRANT CREATE, USAGE ON SCHEMA my_ds_db.my_shared_schema TO ROLE data_eng; GRANT ALL ON ALL TABLES IN SCHEMA my_ds_db.my_shared_schema TO ROLE data_eng; GRANT USAGE ON DATABASE my_ds_db TO bi_user; GRANT USAGE ON SCHEMA my_ds_db.my_shared_schema TO bi_user; GRANT SELECT ON my_ds_db.my_shared_schema.table1 TO bi_user;

Working with views in Amazon Redshift data sharing

A producer cluster can share regular, late-binding, and materialized views. When sharing regular or late-binding views, you don't have to share the base tables. The following table shows how views are supported with data sharing.

View name Can this view be added to a datashare? Can a consumer create this view on datashare objects across clusters?
Regular view Yes No
Late-binding view Yes Yes
Materialized view Yes Yes, but only with a complete refresh

The following query shows the output of a regular view that is supported with data sharing. For information about regular view definition, see CREATE VIEW.

SELECT * FROM tickit_db.public.myevent_regular_vw ORDER BY eventid LIMIT 5; eventid | eventname ----------+------------- 3835 | LeAnn Rimes 3967 | LeAnn Rimes 4856 | LeAnn Rimes 4948 | LeAnn Rimes 5131 | LeAnn Rimes

The following query shows the output of a late-binding view that is supported with data sharing. For information about late-binding view definition, see CREATE VIEW.

SELECT * FROM tickit_db.public.event_lbv ORDER BY eventid LIMIT 5; eventid | venueid | catid | dateid | eventname | starttime --------+---------+-------+--------+------------------------------+--------------------- 1 | 305 | 8 | 1851 | Gotterdammerung | 2008-01-25 14:30:00 2 | 306 | 8 | 2114 | Boris Godunov | 2008-10-15 20:00:00 3 | 302 | 8 | 1935 | Salome | 2008-04-19 14:30:00 4 | 309 | 8 | 2090 | La Cenerentola (Cinderella) | 2008-09-21 14:30:00 5 | 302 | 8 | 1982 | Il Trovatore | 2008-06-05 19:00:00

The following query shows the output of a materialized view that is supported with data sharing. For information about materialized view definition, see CREATE MATERIALIZED VIEW.

SELECT * FROM tickit_db.public.tickets_mv; catgroup | qtysold ----------+--------- Concerts | 195444 Shows | 149905

You can maintain common tables across all tenants in a producer cluster. You can also share subsets of data filtered by dimension columns, such as tenant_id (account_id or namespace_id), to consumer clusters. To do this, you can define a view on the base table with a filter on these ID columns, for example current_aws_account = tenant_id. On the consumer side, when you query the view, you see only the rows that qualify for your account. To do this, you can use the Amazon Redshift context functions current_aws_account and current_namespace.

The following query returns the account ID in which the current Amazon Redshift cluster resides. You can run this query if you are connected to Amazon Redshift.

select current_user, current_aws_account; current_user | current_aws_account -------------+-------------------- dwuser | 111111111111 (1row)

The following query returns the namespace of the current Amazon Redshift cluster. You can run this query if you are connected to the database.

select current_user, current_namespace; current_user | current_namespace -------------+-------------------------------------- dwuser | 86b5169f-01dc-4a6f-9fbb-e2e24359e9a8 (1 row)

Incremental refresh for materialized views in a datashare

Amazon Redshift supports incremental refresh for materialized views in a consumer datashare when the base tables are shared. Incremental refresh is an operation where Amazon Redshift identifies changes in the base table or tables that happened after the previous refresh and updates only the corresponding records in the materialized view. For more information about this behavior, see CREATE MATERIALIZED VIEW.

Managing access to data sharing API operations with IAM policies

To control the access to the data sharing API operations, use IAM action-based policies. For information about how to manage IAM policies, see Managing IAM policies in the IAM User Guide.

For information on the permissions required to use the data sharing API operations, see Permissions required to use the data sharing API operations in the Amazon Redshift Management Guide.

To make cross-account data sharing more secure, you can use a conditional key ConsumerIdentifier for the AuthorizeDataShare and DeauthorizeDataShare API operations. By doing this, you can explicitly control which AWS accounts can make calls to the two API operations.

You can deny authorizing or deauthorizing data sharing for any consumer that isn't your own account. To do so, specify the AWS account number in the IAM policy.

{ "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Deny", "Action": [ "redshift:AuthorizeDataShare", "redshift:DeauthorizeDataShare" ], "Resource": "*", "Condition": { "StringNotEquals": { "redshift:ConsumerIdentifier": "555555555555" } } } ] }

You can allow a producer with a DataShareArn testshare2 to explicitly share with a consumer with an AWS account of 111122223333 in the IAM policy.

{ "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Allow", "Action": [ "redshift:AuthorizeDataShare", "redshift:DeauthorizeDataShare" ], "Resource": "arn:aws:redshift:us-east-1:666666666666:datashare:af06285e-8a45-4ee9-b598-648c218c8ff1/testshare2", "Condition": { "StringEquals": { "redshift:ConsumerIdentifier": "111122223333" } } } ] }