Overview of data sharing in Amazon Redshift

With data sharing, you can securely and easily share live data across Amazon Redshift clusters.

To get started with data sharing and to manage datashares using the AWS Management Console, see Managing data sharing tasks.

Data sharing use cases for Amazon Redshift

Amazon Redshift data sharing is especially useful for these use cases:

  • Supporting different kinds of business-critical workloads – Use a central extract, transform, and load (ETL) cluster that shares data with multiple business intelligence (BI) or analytic clusters. This approach provides read workload isolation and chargeback for individual workloads. You can size and scale your individual workload compute according to the workload-specific requirements of price and performance.

  • Enabling cross-group collaboration – Enable seamless collaboration across teams and business groups for broader analytics, data science, and cross-product impact analysis.

  • Delivering data as a service – Share data as a service across your organization.

  • Sharing data between environments – Share data among development, test, and production environments. You can improve team agility by sharing data at different levels of granularity.

  • Licensing access to data in Amazon Redshift – List Amazon Redshift data sets in the AWS Data Exchange catalog that customers can find, subscribe to, and query in minutes.

Data sharing write-access use cases (preview)

Data sharing with write access supports these important use cases:

  • Update business source data on the producer – When you share data as a service across your organization, consumers can also perform actions on the source data. For instance, they can write back up-to-date values or acknowledge receipt of data. These are only two of the possible business use cases.

  • Insert additional records on the producer – Consumers can add records to the original source data. If needed, these records can be marked as originating from the consumer.

For information about how to perform write operations on a datashare, see Sharing write access to data (Preview).

Sharing data at different levels in Amazon Redshift

With Amazon Redshift, you can share data at different levels. These levels include databases, schemas, tables, views (including regular, late-binding, and materialized views), and SQL user-defined functions (UDFs). You can create multiple datashares for a given database. A datashare can contain objects from multiple schemas in the database in which the datashare is created.

This flexibility gives you fine-grained access control, which you can tailor to the different users and businesses that need access to Amazon Redshift data.
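As a rough illustration of sharing at the table level, the following Python sketch assembles the producer-side SQL for a datashare that exposes selected tables from one schema to one consumer. The helper name `datashare_statements`, the datashare name, the table names, and the namespace GUID are hypothetical placeholders. The statements follow the general form of the CREATE DATASHARE, ALTER DATASHARE, and GRANT USAGE ON DATASHARE commands; check them against the SQL reference before running them on a cluster.

```python
def datashare_statements(share, schema, tables, consumer_namespace):
    """Return producer-side SQL that shares selected tables from one schema.

    All names passed in are illustrative; this helper only builds strings
    and does not connect to a cluster.
    """
    statements = [
        f"CREATE DATASHARE {share};",
        # The schema must be added before individual objects inside it.
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
    ]
    for table in tables:
        statements.append(f"ALTER DATASHARE {share} ADD TABLE {schema}.{table};")
    # Grant the consumer cluster (identified by its namespace) access to the share.
    statements.append(
        f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{consumer_namespace}';"
    )
    return statements


# Hypothetical example values.
for statement in datashare_statements(
    share="salesshare",
    schema="public",
    tables=["sales", "venue"],
    consumer_namespace="13b8833d-17c6-4f16-8fe4-1a018f5ed00d",
):
    print(statement)
```

Because each datashare is built from explicit ADD statements, a second datashare on the same database could expose a different subset of objects to a different consumer.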

Managing data consistency in Amazon Redshift

Amazon Redshift provides transactional consistency on all producer and consumer clusters and shares up-to-date and consistent views of the data with all consumers.

You can continuously update data on the producer cluster. All queries on a consumer cluster within a transaction read the same state of the shared data. Amazon Redshift doesn't consider data changed by a transaction on the producer cluster that committed after the transaction on the consumer cluster began. After the data change is committed on the producer cluster, new transactions on the consumer cluster can immediately query the updated data.
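The transaction behavior described above can be pictured with a small toy model. The Python sketch below is not how Amazon Redshift is implemented; it only imitates the visible behavior, assuming each commit on the producer creates a new version of the shared data and each consumer transaction reads from the version that was current when it began. The class names `Producer` and `ConsumerTransaction` are invented for this illustration.

```python
class Producer:
    """Toy producer that keeps every committed state of the shared data."""

    def __init__(self):
        self._versions = [{}]  # committed states, oldest first

    def commit(self, changes):
        # A commit creates a new version; earlier versions stay readable.
        new_state = dict(self._versions[-1])
        new_state.update(changes)
        self._versions.append(new_state)

    def latest_version(self):
        return len(self._versions) - 1

    def read(self, version, key):
        return self._versions[version].get(key)


class ConsumerTransaction:
    """Toy consumer transaction pinned to the version current at its start."""

    def __init__(self, producer):
        self._producer = producer
        self._version = producer.latest_version()

    def query(self, key):
        return self._producer.read(self._version, key)


producer = Producer()
producer.commit({"revenue": 100})

running = ConsumerTransaction(producer)   # begins before the next commit
producer.commit({"revenue": 250})         # committed while 'running' is open

print(running.query("revenue"))                        # still the earlier state
print(ConsumerTransaction(producer).query("revenue"))  # new transaction sees the update
```

In this model, every query inside `running` returns the same value no matter how many commits the producer makes, while a transaction started after a commit reads the updated data.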

This strong consistency removes the risk of lower-fidelity business reports that contain invalid results while data is being shared. It is especially important for financial analysis and for cases where the results are used to prepare datasets for training machine learning models.