Data consumers
Data consumers consume the data from the data producer after the centralized catalog shares it using AWS Lake Formation. The following diagram shows two data consumers in the data lake.

There are two types of data consumer: application and data-serving. The following table describes these two types.
Application type | Application data consumers run applications in their own AWS accounts. The applications assume AWS Identity and Access Management (IAM) roles to access the shared data from a data producer and then process it according to their logic. Typically, this type of data consumer has prescriptive data requirements to fulfill an application's needs. |
Data-serving type | Data-serving data consumers are typically meant for individuals (for example, data analysts or data scientists) and applications (for example, a business intelligence application) that don't have their own AWS accounts. Multiple data-serving data consumers can exist in one organization’s data lake. For example, different lines of business might choose to set up their own data-serving data consumers to help users consume data from the data lake. These data consumers have their own IAM role principals configured in their AWS account (for example, IAM roles associated with AWS IAM Identity Center) that are used by end users in the data consumer account to access shared data through AWS services (for example, Amazon Athena). Typically, this type of data consumer has wide-ranging and continuously increasing data requirements. |
AWS Lake Formation is the most important AWS service used by a data consumer for cross-account data sharing and accessing the centralized catalog. After databases are shared by the centralized catalog, the shared resources are available in Lake Formation in the data consumer account. Data access can then be granted to local IAM principals in the data consumer account, with permission from the data producer, if required. The shared data can then be used by AWS services integrated with Lake Formation (for example, Amazon Athena and AWS Glue). You can use the following AWS services to access shared data in the data consumer account:
- Amazon Athena is an interactive query service that helps directly analyze data in Amazon Simple Storage Service (Amazon S3) using standard SQL. For more information about Athena and Lake Formation, see How Athena accesses data registered with Lake Formation in the Amazon Athena documentation.
- Amazon Redshift Spectrum helps you to efficiently query and retrieve structured and semi-structured data from files in Amazon S3 without having to load the data into Amazon Redshift tables. For more information about Redshift Spectrum and Lake Formation, see Using Redshift Spectrum with Lake Formation in the Amazon Redshift documentation.
- AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between different data stores and data streams. An AWS Glue ETL job’s associated IAM role can access the data lake data managed by Lake Formation if it has the required access permissions.
- Amazon EMR helps run big data frameworks (for example, Apache Hadoop and Apache Spark) to process and analyze large amounts of data. For more information about Amazon EMR and Lake Formation, see Integrate Amazon EMR with Lake Formation in the Amazon EMR documentation.
- Amazon QuickSight is a scalable, serverless, embeddable, and machine learning (ML)-powered business intelligence service that you can use to analyze and visualize data from your data lake. For more information about QuickSight and Lake Formation, see Authorizing connections through Lake Formation in the QuickSight documentation.
- Amazon SageMaker AI Data Wrangler (Data Wrangler) reduces the time it takes to aggregate and prepare data for ML. For more information about Data Wrangler and Lake Formation, see Prepare ML Data with Amazon SageMaker AI Data Wrangler in the Amazon SageMaker AI documentation.
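
The flow described in this section (grant a local IAM principal access to a shared table, then query it with an integrated service such as Athena) can be sketched with the AWS SDK for Python (Boto3). This is a minimal illustration under stated assumptions, not a prescribed implementation: the account IDs, role name, database, table, and S3 results location are hypothetical placeholders, and the helper function names are introduced here only for illustration. Whether the grant succeeds depends on how the resource was shared with the data consumer account.

```python
from typing import Any, Dict


def build_grant_request(principal_arn: str, catalog_id: str,
                        database: str, table: str) -> Dict[str, Any]:
    """Build keyword arguments for Lake Formation GrantPermissions (SELECT on one table)."""
    return {
        # Local IAM principal in the data consumer account (hypothetical ARN in the example).
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "Table": {
                # Assumption: the ID of the account whose Data Catalog owns the shared table.
                "CatalogId": catalog_id,
                "DatabaseName": database,
                "Name": table,
            }
        },
        "Permissions": ["SELECT"],
    }


def build_athena_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Build keyword arguments for Athena StartQueryExecution."""
    return {
        "QueryString": sql,
        # Assumption: a database (for example, one holding a resource link) that is
        # visible in the data consumer account's Data Catalog.
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def grant_select(request: Dict[str, Any]) -> None:
    """Run as a Lake Formation administrator in the data consumer account."""
    import boto3  # imported lazily so the request builders work without the SDK installed
    boto3.client("lakeformation").grant_permissions(**request)


def start_query(request: Dict[str, Any]) -> str:
    """Run as the IAM principal that was granted access; returns the query execution ID."""
    import boto3
    response = boto3.client("athena").start_query_execution(**request)
    return response["QueryExecutionId"]


# Example payloads; every identifier below is a hypothetical placeholder.
grant_request = build_grant_request(
    principal_arn="arn:aws:iam::111122223333:role/data-analyst",
    catalog_id="444455556666",
    database="sales_db",
    table="orders",
)
athena_request = build_athena_request(
    sql="SELECT * FROM orders LIMIT 10",
    database="sales_db",
    output_location="s3://example-athena-results/",
)
```

Calling `grant_select(grant_request)` and `start_query(athena_request)` requires AWS credentials for the respective principals. The two calls are kept separate because the grant is typically issued by an administrator, while the query runs under the end user's or application's IAM role.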