Derive insights with inside-out data movement - Derive Insights from AWS Modern Data

This whitepaper is for historical reference only. Some content might be outdated and some links might not be available.

Derive insights with inside-out data movement

To get the most from your data lakes and these purpose-built stores, you need to move data between these systems easily. For example, clickstream data from web applications can be collected directly in a data lake and a portion of that data can be moved out to a data warehouse for daily reporting. We think of this concept as inside-out data movement.

Figure: Inside-out data movement
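
As a rough illustration of moving a portion of clickstream data out of the lake and into the warehouse, the following Python sketch builds an Amazon Redshift COPY statement for one day of data and submits it through the Redshift Data API. The bucket, table, IAM role, cluster, and date-partitioned prefix layout are all hypothetical, and the data is assumed to be stored as Parquet. None of these details come from the whitepaper.

```python
from datetime import date


def build_copy_sql(table: str, bucket: str, day: date, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement for one day's clickstream partition.

    Assumption: the lake stores Parquet files under
    s3://<bucket>/clickstream/dt=YYYY-MM-DD/ (a hypothetical layout).
    """
    prefix = f"s3://{bucket}/clickstream/dt={day.isoformat()}/"
    return (
        f"COPY {table} "
        f"FROM '{prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS PARQUET;"
    )


def load_daily_slice(client, sql: str, cluster_id: str, database: str, db_user: str) -> str:
    """Submit the statement through the Redshift Data API and return its ID.

    `client` is expected to behave like boto3.client("redshift-data").
    The call is asynchronous: it returns once the statement is accepted.
    """
    response = client.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        DbUser=db_user,
        Sql=sql,
    )
    return response["Id"]


if __name__ == "__main__":
    # Placeholder values; this only prints the SQL and makes no AWS call.
    print(
        build_copy_sql(
            table="reporting.clicks",
            bucket="example-lake",
            day=date(2024, 1, 15),
            iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
        )
    )
```

Passing the client in as an argument keeps the sketch testable without AWS credentials. A daily reporting job could call `load_daily_slice` once per partition from a scheduler.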

Derive real-time event-based visualization insights from your Modern Data architecture with Amazon Redshift and Amazon QuickSight

Customers often want to visualize their data as soon as it is ingested into the data lake, so that they can make decisions quickly and deliver business value downstream.

The following diagram illustrates inside-out data movement in the Modern Data architecture, using Amazon Redshift and Amazon QuickSight to derive insights through data visualization.

Figure: Derive real-time event-based visualization insights from your Modern Data architecture with Amazon Redshift and Amazon QuickSight

Data moves through the architecture in the following steps:

  1. Data ingestion: A new data file is uploaded to Amazon S3. An S3 event triggers an AWS Lambda function.

  2. Event trigger: Lambda triggers an AWS Glue workflow to start processing the file. Lambda updates the AWS Glue Data Catalog with metadata changes.

  3. Data processing: AWS Glue jobs load the transformed data into target data stores such as Amazon S3 and Amazon Redshift, and push logs and notifications to Amazon CloudWatch. CloudWatch triggers a Lambda function when an AWS Glue job completes.

  4. Data analytics: The data is available for analysis in Amazon Redshift and in the data lake (Amazon S3). Lambda calls the QuickSight ingestion API to refresh the SPICE dataset.

  5. Data visualizations: New data is reflected in QuickSight visuals. QuickSight can create a dataset by combining data from Amazon Redshift and Athena. The output is stored in SPICE for fast analytics.
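
The two Lambda functions in the steps above can be sketched in Python with boto3. This is a minimal sketch under several assumptions: the workflow name, account ID, and dataset ID are placeholders read from hypothetical environment variables; the completion event is assumed to have the "Glue Job State Change" shape; and reusing the job run ID as the ingestion ID is one option among many. The whitepaper does not specify any of these details.

```python
import os
from urllib.parse import unquote_plus

# Hypothetical configuration, with placeholder defaults.
WORKFLOW_NAME = os.environ.get("GLUE_WORKFLOW_NAME", "example-workflow")
ACCOUNT_ID = os.environ.get("AWS_ACCOUNT_ID", "123456789012")
DATASET_ID = os.environ.get("QUICKSIGHT_DATASET_ID", "example-dataset")


def _client(service):
    """Create a boto3 client lazily, so the module loads without boto3 installed."""
    import boto3

    return boto3.client(service)


def uploaded_objects(event):
    """Extract (bucket, key) pairs from an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def on_file_uploaded(event, context=None, glue=None):
    """Steps 1 and 2: start one AWS Glue workflow run per uploaded object."""
    if glue is None:
        glue = _client("glue")
    run_ids = []
    for bucket, key in uploaded_objects(event):
        response = glue.start_workflow_run(
            Name=WORKFLOW_NAME,
            # Run properties are string pairs that jobs in the workflow can read.
            RunProperties={"source_bucket": bucket, "source_key": key},
        )
        run_ids.append(response["RunId"])
    return run_ids


def on_glue_job_finished(event, context=None, quicksight=None):
    """Steps 3 and 4: refresh the SPICE dataset after a successful job run."""
    detail = event.get("detail", {})
    if detail.get("state") != "SUCCEEDED":
        return None  # Ignore failed, stopped, or timed-out runs.
    if quicksight is None:
        quicksight = _client("quicksight")
    response = quicksight.create_ingestion(
        AwsAccountId=ACCOUNT_ID,
        DataSetId=DATASET_ID,
        # Ingestion IDs must be unique per dataset; the job run ID is one option.
        IngestionId=detail["jobRunId"],
        IngestionType="FULL_REFRESH",
    )
    return response["IngestionStatus"]
```

Each handler accepts an optional client argument, so it can be exercised locally with a stub. When deployed, Lambda passes only `event` and `context`, and the handlers create real clients.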

Derive persona-centric insights from your Modern Data architecture with AWS Glue DataBrew, Amazon Athena, Amazon Redshift, and Amazon QuickSight

Many organizations want to get insights from exponentially growing data volumes to help them make decisions with speed and agility. They need to embrace data gravity by using both a central data lake and a ring of purpose-built data services and data warehouses, organized by persona or job function.

The following diagram illustrates inside-out data movement in the Modern Data architecture, using AWS Glue DataBrew, Amazon Athena, Amazon Redshift, and Amazon QuickSight to perform persona-centric data analytics.

Figure: Derive persona-centric insights from your Modern Data architecture with AWS Glue DataBrew, Amazon Athena, Amazon Redshift, and Amazon QuickSight

Data moves through the architecture in the following steps:

  1. Data ingestion: Data is ingested into Amazon S3 from different sources.

  2. Ad-hoc data processing: Data curators and data scientists use AWS Glue DataBrew to validate, clean, and enrich the data. Amazon Athena is also used to run ad-hoc queries on the data in the lake. The resulting transformations are shared with data engineers, who set up batch processing.

  3. Batch data processing: Data engineers or developers set up batch jobs in AWS Glue and AWS Glue DataBrew. Jobs can be event-triggered or scheduled to run periodically.

  4. Data analytics: Data analysts and business analysts can now analyze the prepared datasets in Amazon Redshift, or in Amazon S3 using Athena.

  5. Data visualizations: Business analysts can create visuals in QuickSight. Data curators can enrich data from multiple sources. Administrators can enforce security and data governance. Developers can embed QuickSight dashboards in applications.
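
The ad-hoc query and batch job steps above can be sketched in Python with boto3. This is an illustrative sketch, not a prescribed implementation: the function names, database, query, result location, and job name are hypothetical, and the polling loop is a simple choice that a production setup might replace with event notifications.

```python
import time

TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def _client(service):
    """Create a boto3 client lazily, so the module loads without boto3 installed."""
    import boto3

    return boto3.client(service)


def run_adhoc_query(sql, database, output_location, athena=None, poll_seconds=1.0):
    """Step 2: run an ad-hoc Athena query and wait for a terminal state.

    Returns a (query_execution_id, state) tuple.
    """
    if athena is None:
        athena = _client("athena")
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes query results to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)


def start_batch_job(job_name, databrew=None):
    """Step 3: start a run of an existing DataBrew job and return the run ID."""
    if databrew is None:
        databrew = _client("databrew")
    return databrew.start_job_run(Name=job_name)["RunId"]
```

A data scientist might call `run_adhoc_query("SELECT * FROM orders LIMIT 10", "example_db", "s3://example-athena-results/")` while exploring, and a data engineer might later call `start_batch_job("example-clean-orders")` from an event-triggered or scheduled function. All of these names are placeholders.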