Onboarding and granting access
This guide’s data lake reference architecture helps you to independently scale data producers and data consumers, in addition to defining and establishing a consistent process for onboarding and granting access to those data consumers.
The following sections describe the onboarding process for data producers and data consumers and how to grant access in a data consumer account. This guide uses the named resource method between the centralized catalog and data consumers. The process for the LF-TBAC method is similar, with some differences. We recommend that you evaluate and configure these approaches to meet your organization’s data governance practices and policies.
For more information about these two methods, see the Centralized catalog section of this guide.
Onboarding data producers
The following diagram shows how to onboard a new data producer to your data lake.

The diagram shows the following onboarding process:
- The data producer selectively provides the centralized catalog with access to its data (for example, an Amazon Simple Storage Service (Amazon S3) bucket and AWS KMS key). Access is granted to two sets of AWS Identity and Access Management (IAM) principals in the centralized catalog: those that register the data producer’s data lake location in AWS Lake Formation and those that maintain the data producer's catalog.
- Register the data producer’s data lake location (for example, an S3 bucket) by using the centralized catalog’s Lake Formation.
- Create the database, tables, and table schemas for the new data from the data producer in the AWS Glue Data Catalog.
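The second and third steps of the producer onboarding process above can be sketched as boto3 request parameters. This is a minimal illustration, not a definitive implementation: the function names, ARNs, database name, table name, and columns are hypothetical, and the choice to register the location with a custom IAM role rather than the service-linked role is an assumption. The builders return plain dictionaries so that the request shapes can be reviewed before any call is made.

```python
"""Sketch: producer onboarding (register location, create catalog objects)."""


def register_location_request(bucket_arn, registration_role_arn):
    """Keyword arguments for lakeformation.register_resource."""
    return {
        "ResourceArn": bucket_arn,
        # Assumption: a custom role that the producer has granted
        # access to the S3 bucket and the AWS KMS key.
        "UseServiceLinkedRole": False,
        "RoleArn": registration_role_arn,
    }


def create_database_request(database):
    """Keyword arguments for glue.create_database."""
    return {"DatabaseInput": {"Name": database}}


def create_table_request(database, table, location, columns):
    """Keyword arguments for glue.create_table.

    columns is a list of (name, type) pairs that describe the schema.
    """
    return {
        "DatabaseName": database,
        "TableInput": {
            "Name": table,
            "TableType": "EXTERNAL_TABLE",
            "StorageDescriptor": {
                "Location": location,
                "Columns": [{"Name": n, "Type": t} for n, t in columns],
            },
        },
    }


def onboard_producer(lakeformation, glue, bucket_arn, role_arn,
                     database, table, location, columns):
    """Run both steps with clients created in the centralized catalog account."""
    lakeformation.register_resource(
        **register_location_request(bucket_arn, role_arn))
    glue.create_database(**create_database_request(database))
    glue.create_table(
        **create_table_request(database, table, location, columns))
```

In this sketch, `lakeformation` and `glue` would be created with `boto3.client("lakeformation")` and `boto3.client("glue")` by using credentials for the centralized catalog account.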
Onboarding data consumers
The following diagram shows how to onboard a new data consumer to your data lake.

The diagram shows the following onboarding process:
- The data consumer requests approval to view the data producer's data and specifies the data that it needs to access.
- The data producer’s data steward reviews the request from the data consumer and evaluates whether to:
  - Share some or all tables in the requested databases. We recommend database-level sharing when there are no data security implications of sharing all tables with the data consumer, which helps avoid the management overhead of table-level sharing.
  - Share at the data consumer's organization, OU, or account level.
- When approved by the data producer, the required Data Catalog resources are shared with the data consumer in the centralized catalog.
- Resource links that point to the shared Data Catalog resources in the centralized catalog can then be created in the data consumer's account by using Lake Formation.
After the onboarding process is complete, the data consumer's Lake Formation administrator can see the database catalog resource from the centralized catalog and the resource link. At this stage, no one else in the data consumer's account can access the data producer's data.
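The sharing and resource link steps of the consumer onboarding process can be sketched in the same way for the named resource method. The function names, account IDs, and database and table names are hypothetical. Including the grant option is an assumption, made so that the data consumer's Lake Formation administrator can later grant access to local principals. Passing `tables=None` models database-level sharing with a table wildcard, and passing a list of names models table-level sharing.

```python
"""Sketch: sharing Data Catalog resources and creating a resource link."""


def share_database_request(central_account_id, database, consumer_principal):
    """Keyword arguments for lakeformation.grant_permissions on the database.

    consumer_principal is an account ID, or an organization or OU ARN.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": consumer_principal},
        "Resource": {
            "Database": {"CatalogId": central_account_id, "Name": database}
        },
        "Permissions": ["DESCRIBE"],
        "PermissionsWithGrantOption": ["DESCRIBE"],  # assumption, see lead-in
    }


def share_tables_requests(central_account_id, database, consumer_principal,
                          tables=None):
    """One grant_permissions request per shared table, or one wildcard grant."""
    if tables is None:
        targets = [{"TableWildcard": {}}]  # all tables in the database
    else:
        targets = [{"Name": name} for name in tables]
    return [
        {
            "Principal": {"DataLakePrincipalIdentifier": consumer_principal},
            "Resource": {
                "Table": {
                    "CatalogId": central_account_id,
                    "DatabaseName": database,
                    **target,
                }
            },
            "Permissions": ["SELECT", "DESCRIBE"],
            "PermissionsWithGrantOption": ["SELECT", "DESCRIBE"],
        }
        for target in targets
    ]


def resource_link_request(link_name, central_account_id, database):
    """Keyword arguments for glue.create_database in the consumer account."""
    return {
        "DatabaseInput": {
            "Name": link_name,
            "TargetDatabase": {
                "CatalogId": central_account_id,
                "DatabaseName": database,
            },
        }
    }
```

The grant requests would be sent with a Lake Formation client in the centralized catalog account, and the resource link request with an AWS Glue client in the data consumer account.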
Grant Select access in a data consumer account
The following diagram shows the process for granting Select access to shared data resources to a local IAM principal in the data consumer account. The local IAM principal can be an IAM role for individual users or an IAM role that is used by specific AWS services.
Note
When the data being shared is of low sensitivity, you can delegate access granting to the data consumer itself without requiring approval from the data producer. This is because trust and sharing are already established between them.

The diagram shows the following process:
- The individual IAM principal in the data consumer account requests Select access to the resource link from the IAM principal in the data consumer account.
- The data producer’s data steward reviews the request from the data consumer and provides approval if all requirements are met.
- Select access is granted, which allows the IAM principal to consume the requested data.
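The final grant in the process above can be sketched as two Lake Formation requests. In Lake Formation, permissions on a resource link do not carry over to the resource that it points to, so this sketch grants DESCRIBE on the resource link and SELECT on the shared target table separately. The function name, account IDs, role ARN, and catalog names are hypothetical.

```python
"""Sketch: granting Select access to a local IAM principal."""


def grant_select_requests(consumer_account_id, link_database,
                          central_account_id, database, table,
                          principal_arn):
    """Two sets of keyword arguments for lakeformation.grant_permissions."""
    principal = {"DataLakePrincipalIdentifier": principal_arn}
    return [
        # DESCRIBE on the resource link in the consumer account's catalog,
        # so that the principal can see and reference the link.
        {
            "Principal": principal,
            "Resource": {
                "Database": {
                    "CatalogId": consumer_account_id,
                    "Name": link_database,
                }
            },
            "Permissions": ["DESCRIBE"],
        },
        # SELECT on the shared table in the centralized catalog, which is
        # what allows the principal to read the data.
        {
            "Principal": principal,
            "Resource": {
                "Table": {
                    "CatalogId": central_account_id,
                    "DatabaseName": database,
                    "Name": table,
                }
            },
            "Permissions": ["SELECT"],
        },
    ]
```

Each dictionary would be passed to `grant_permissions` on a Lake Formation client that uses the credentials of the data consumer account's Lake Formation administrator.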