AWS offerings for data mesh - AWS Prescriptive Guidance

AWS offerings for data mesh

Use the capabilities of analytics on AWS to build the data mesh–based data solution for your organization. The analytics on AWS resource recommends several AWS services for building a data mesh at low cost without compromising on performance. Customers have adopted the following options for building a data mesh–based solution:

  • Implement data mesh by using Amazon DataZone

  • Implement data mesh by using open source frameworks on AWS such as data.all

  • Implement data mesh by using AWS Lake Formation

These three options use the following AWS services:

The Amazon DataZone option also uses Amazon EventBridge.

The data.all and AWS Lake Formation options also use the following AWS services and resources:

The AWS services that you use in your implementation might differ, based on your organization's requirements.

Amazon DataZone

If you want to use a fully managed service, consider using Amazon DataZone to implement data mesh for your organization. Amazon DataZone is a data management service for cataloging, discovering, sharing, and governing data stored across AWS, on premises, and third-party sources. The following diagram shows a data mesh reference architecture based on Amazon DataZone.

Multiple producer and consumer accounts with a central governance account and Amazon DataZone.

In the reference architecture, the member accounts belong to the data domains. They're grouped into data producers and data consumers. The architecture diagram contains the following components:

  1. The data producers publish data products in the business catalog provided by the Amazon DataZone data portal. The data portal is hosted in the central governance account.

  2. Data consumers (users) log in to the data portal by using their AWS credentials or single sign-on credentials. They can browse the catalog, search by keyword for the data products that interest them, and filter the search results.

  3. After the data users on the consumer teams find a data product that interests them, they can request access to the data. Amazon DataZone has a built-in access-management workflow that the data owner uses to review and approve the request.

  4. The data consumer teams can consume the data to empower their artificial intelligence and machine learning (AI/ML), analytics and reporting, and extract, transform, and load (ETL) use cases.
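The publish, search, request, approve, and consume steps above can be sketched as a small in-memory model. This is an illustration of the workflow only, not the Amazon DataZone API: the `Catalog`, `DataProduct`, and `AccessRequest` names, the team names, and the `PENDING`/`APPROVED` status strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    owner: str                      # producer team that reviews access requests
    keywords: set[str]
    subscribers: set[str] = field(default_factory=set)


@dataclass
class AccessRequest:
    product: DataProduct
    consumer: str
    status: str = "PENDING"         # hypothetical status values


class Catalog:
    """Toy stand-in for the business catalog in the central governance account."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # Step 1: a data producer publishes a data product.
        self._products[product.name] = product

    def search(self, keyword: str) -> list[DataProduct]:
        # Step 2: a consumer searches the catalog by keyword (case-insensitive).
        wanted = keyword.lower()
        return [
            p for p in self._products.values()
            if wanted in {k.lower() for k in p.keywords}
        ]

    def request_access(self, product_name: str, consumer: str) -> AccessRequest:
        # Step 3 (first half): the consumer requests access; nothing is granted yet.
        return AccessRequest(self._products[product_name], consumer)

    def approve(self, request: AccessRequest, approver: str) -> None:
        # Step 3 (second half): only the data owner may approve the request.
        if approver != request.product.owner:
            raise PermissionError(f"{approver} does not own {request.product.name}")
        request.status = "APPROVED"
        request.product.subscribers.add(request.consumer)

    def can_consume(self, product_name: str, consumer: str) -> bool:
        # Step 4: consumption is allowed only after the owner's approval.
        return consumer in self._products[product_name].subscribers


# Example walk-through with made-up team names.
catalog = Catalog()
catalog.publish(DataProduct("orders", owner="sales-team", keywords={"sales", "orders"}))
request = catalog.request_access("orders", consumer="analytics-team")
catalog.approve(request, approver="sales-team")
```

The design point the sketch tries to capture is that the approval decision stays with the data owner in the producer domain, while the catalog itself lives in the central governance account.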

Data.all

If you understand open source and want to build and manage your own solution, consider using open source frameworks such as data.all. Data.all is a modern data marketplace that supports collaboration among diverse users. Data.all simplifies data discovery, sharing, and granular data access management while builders use the AWS portfolio of data and analytics services. The following diagram shows a data mesh reference architecture based on data.all.

Multiple producer and consumer accounts with a central governance account and data.all.

The architecture diagram contains the following components:

  1. The data producers publish data products in the catalog provided by the data.all frontend. The frontend and backend of data.all are hosted in the central governance account.

  2. Data consumers (users) log in to the data.all frontend by using their single sign-on or Amazon Cognito credentials. They can browse the catalog, search for the data products that interest them, and filter the search results.

  3. After the data users on the consumer teams find a data product that interests them, they can request access to the data. Data.all has a built-in access-management workflow that the data owner uses to review and approve access requests.

  4. The consumer teams can consume the data to empower their AI/ML, analytics and reporting, and ETL use cases.

AWS Lake Formation

If you want to build a custom data mesh solution from the ground up and manage it, consider using AWS Lake Formation. Lake Formation helps you centrally govern, secure, and globally share data for analytics and machine learning. The following diagram shows a data mesh reference architecture based on Lake Formation.

Multiple producer and consumer accounts with a central governance account and Lake Formation.

The architecture diagram contains the following components:

  1. The data producers publish data products in the AWS Glue Data Catalog of the central governance account. AWS Lake Formation manages access to the entities of the central Data Catalog.

  2. After access is granted, the consumer teams can consume the data to empower their AI/ML, analytics and reporting, and ETL use cases.
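As a rough sketch of how access to a table in the central Data Catalog might be granted, the following builds the keyword arguments for the Lake Formation `GrantPermissions` operation (`grant_permissions` in boto3). The helper names `build_select_grant` and `grant`, the account IDs, and the database and table names are placeholders for illustration. It assumes the consumer is addressed by its AWS account ID and needs only `SELECT`; a real setup might grant different permissions or use a different sharing method.

```python
def build_select_grant(
    consumer_account_id: str,
    governance_account_id: str,
    database: str,
    table: str,
) -> dict:
    """Build keyword arguments for a Lake Formation GrantPermissions call."""
    return {
        # The consumer account that receives the permission.
        "Principal": {"DataLakePrincipalIdentifier": consumer_account_id},
        # The table in the central governance account's Data Catalog.
        "Resource": {
            "Table": {
                "CatalogId": governance_account_id,
                "DatabaseName": database,
                "Name": table,
            }
        },
        "Permissions": ["SELECT"],
        # Assumption: the consumer may not re-grant the permission to others.
        "PermissionsWithGrantOption": [],
    }


def grant(lakeformation_client, request: dict) -> None:
    # lakeformation_client is expected to be boto3.client("lakeformation"),
    # created with credentials for the central governance account.
    lakeformation_client.grant_permissions(**request)


# Placeholder account IDs and names; replace with your own values.
request = build_select_grant(
    consumer_account_id="444455556666",
    governance_account_id="111122223333",
    database="sales_domain",
    table="orders",
)
```

Keeping the request construction separate from the API call makes it possible to review or unit-test the grant before anything is sent to AWS.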