Analytics architecture - Data Warehousing on AWS

This whitepaper is for historical reference only. Some content might be outdated and some links might not be available.

Analytics architecture

Analytics pipelines are designed to handle large volumes of incoming data from heterogeneous sources such as databases, applications, and devices.

A typical analytics pipeline has the following stages:

  1. Collect the data

  2. Store the data

  3. Process the data

  4. Analyze and visualize the data
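The four stages above can be sketched as a chain of plain Python functions. This is only an illustration of the flow, not an AWS API: the function names, the in-memory list standing in for a storage service, and the record fields are all hypothetical.

```python
from collections import Counter
from typing import Iterable

# Hypothetical in-memory stand-in for a storage layer such as Amazon S3.
STORE: list[dict] = []


def collect(sources: Iterable[Iterable[dict]]) -> list[dict]:
    """Stage 1: gather records from heterogeneous sources into one list."""
    return [record for source in sources for record in source]


def store(records: list[dict]) -> None:
    """Stage 2: persist the raw records (here, append them to a list)."""
    STORE.extend(records)


def process(records: list[dict]) -> list[dict]:
    """Stage 3: clean the data by dropping records without an event type."""
    return [r for r in records if r.get("event")]


def analyze(records: list[dict]) -> Counter:
    """Stage 4: aggregate the data, here by counting events per type."""
    return Counter(r["event"] for r in records)


def run_pipeline(sources: Iterable[Iterable[dict]]) -> Counter:
    """Run all four stages in order and return the aggregated result."""
    store(collect(sources))
    return analyze(process(STORE))
```

In a real pipeline each stage would typically be a separate service, so that collection, storage, processing, and analysis can scale independently.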

Analytics pipeline

Data collection

At the data collection stage, expect to handle several types of data, such as transactional data, log data, streaming data, and Internet of Things (IoT) data. AWS provides storage solutions for each of these types.

Log data

Reliably capturing system-generated logs helps you troubleshoot issues, conduct audits, and perform analytics using the information stored in the logs. Amazon S3 is a popular storage solution for non-transactional data, such as log data, that is used for analytics. Because it is designed for 99.999999999 percent (11 nines) durability, Amazon S3 is also a popular archival solution.
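As a minimal sketch of capturing logs in Amazon S3, the following Python helpers build a date-partitioned object key and gzip a batch of log lines before upload. The key layout, the function names, and the bucket passed to `upload_logs` are assumptions made for illustration; the upload itself uses the boto3 `put_object` call and needs boto3 and AWS credentials to run.

```python
import gzip
from datetime import datetime, timezone


def log_object_key(app: str, ts: datetime, host: str) -> str:
    """Build a date-partitioned key, e.g. logs/web/2024/01/05/host1-130000.log.gz."""
    return f"logs/{app}/{ts:%Y/%m/%d}/{host}-{ts:%H%M%S}.log.gz"


def compress_lines(lines: list[str]) -> bytes:
    """Join log lines with newlines and gzip them into one object body."""
    return gzip.compress("\n".join(lines).encode("utf-8"))


def upload_logs(bucket: str, app: str, host: str, lines: list[str]) -> str:
    """Upload one batch of log lines and return the object key used.

    Requires boto3 and AWS credentials; `bucket` is a hypothetical bucket name.
    """
    import boto3  # imported lazily so the helpers above work without it

    key = log_object_key(app, datetime.now(timezone.utc), host)
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=compress_lines(lines)
    )
    return key
```

Partitioning keys by date is one common design choice because it lets later processing read only the days it needs.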

Streaming data

Web applications, mobile devices, and many other software applications and services can generate staggering amounts of streaming data (sometimes terabytes per hour) that must be collected, stored, and processed continuously. With Amazon Kinesis services, you can do this simply and at low cost. Alternatively, you can use Amazon Managed Streaming for Apache Kafka (Amazon MSK) to run applications that use Apache Kafka to process streaming data. With Amazon MSK, you can use native Apache Kafka application programming interfaces (APIs) to populate data lakes, stream changes to and from databases, and power machine learning (ML) and analytics applications.
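To make continuous collection more concrete, here is a hedged sketch of sending events to an Amazon Kinesis data stream in batches. It relies on the boto3 `put_records` call, which accepts up to 500 records per request, each with a `Data` payload and a `PartitionKey`. The helper names, the `device_id` field, and the stream name passed to `send` are assumptions for illustration, and `send` needs boto3 and AWS credentials to run.

```python
import json
from typing import Iterator

MAX_RECORDS_PER_REQUEST = 500  # PutRecords accepts at most 500 records per call


def to_kinesis_records(events: list[dict], key_field: str) -> list[dict]:
    """Convert events to the {'Data', 'PartitionKey'} entries PutRecords expects."""
    return [
        {"Data": json.dumps(e).encode("utf-8"), "PartitionKey": str(e[key_field])}
        for e in events
    ]


def batches(
    records: list[dict], size: int = MAX_RECORDS_PER_REQUEST
) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def send(stream_name: str, events: list[dict], key_field: str = "device_id") -> int:
    """Send events to a stream and return the number of failed records.

    Requires boto3 and AWS credentials; `stream_name` is hypothetical.
    """
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("kinesis")
    failed = 0
    for batch in batches(to_kinesis_records(events, key_field)):
        response = client.put_records(StreamName=stream_name, Records=batch)
        failed += response["FailedRecordCount"]
    return failed
```

This sketch only counts failures. A production producer would usually retry the failed records reported in the response.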

IoT data

Devices and sensors around the world send messages continuously, and enterprises need to capture this data and derive intelligence from it. With AWS IoT, connected devices can interact easily and securely with the AWS Cloud. Use AWS IoT together with AWS services such as AWS Lambda, Amazon Kinesis services, Amazon S3, Amazon Machine Learning, and Amazon DynamoDB to build applications that gather, process, analyze, and act on IoT data, without having to manage any infrastructure.
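As one illustration of acting on IoT data without managing infrastructure, the sketch below shows a Python AWS Lambda handler that classifies a device message. It assumes an AWS IoT rule forwards each message to the function as the `event` dictionary. The field names `device_id` and `temperature_c` and the threshold value are hypothetical.

```python
TEMP_LIMIT_C = 80.0  # hypothetical alert threshold in degrees Celsius


def handler(event: dict, context: object) -> dict:
    """Lambda entry point; `event` is assumed to be a single device message."""
    device = event.get("device_id", "unknown")
    temperature = event.get("temperature_c")
    if temperature is None:
        # Messages without a reading are skipped rather than treated as errors.
        return {"device_id": device, "status": "ignored"}
    status = "alert" if float(temperature) > TEMP_LIMIT_C else "ok"
    return {"device_id": device, "status": status}
```

For example, calling `handler({"device_id": "pump-7", "temperature_c": 91.5}, None)` returns a result with status `alert`. A fuller application might write that result to Amazon DynamoDB or publish a notification.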