What is a modern data streaming architecture? - Build Modern Data Streaming Architectures on AWS

What is a modern data streaming architecture?

A modern data streaming architecture allows you to ingest, process, and analyze high volumes of high-velocity data from a variety of sources in real time to build more reactive and intelligent customer experiences. The architecture can be designed as a stack of five logical layers; each layer is composed of multiple purpose-built components that address specific requirements. The following diagram illustrates the modern data streaming architecture.


Modern data streaming architecture on AWS

The modern data streaming architecture includes the following key components:

  • Source - Streaming data comes from sources such as sensors, social media, IoT devices, mobile devices, and log files generated by your web and mobile applications. These sources generate semi-structured and unstructured data as continuous streams at high velocity.

  • Stream storage - The stream storage layer is responsible for providing scalable and cost-effective components to store streaming data. The streaming data can be stored in the order it was received for a set duration of time, and can be replayed any number of times during that period.

  • Stream ingestion - The stream ingestion layer is responsible for ingesting data into the stream storage layer. It provides the ability to collect data from tens of thousands of data sources and ingest it in near real time.

  • Stream processing - The stream processing layer is responsible for transforming data into a consumable state through data validation, cleanup, normalization, transformation, and enrichment. The streaming records are read in the order they are produced, which supports real-time analytics, event-driven applications, and streaming ETL.

  • Destination - The destination layer is a purpose-built destination chosen for your use case. Your destination can be an event-driven application, data lake, data warehouse, database, or Amazon OpenSearch Service.
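
The way the layers hand data to one another can be sketched in a few lines of Python. This is a minimal in-memory illustration, not an AWS implementation: `StreamStore`, `Record`, `process`, and `run_pipeline` are hypothetical names, the `device_id` and `temp_c` fields are invented sample data, and time is passed in explicitly so the retention behavior is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Record:
    sequence: int      # position in the stream, assigned on ingest
    timestamp: float   # arrival time in seconds
    payload: dict


@dataclass
class StreamStore:
    """Hypothetical in-memory stand-in for the stream storage layer."""
    retention_seconds: float
    _records: list = field(default_factory=list)
    _next_sequence: int = 0

    def ingest(self, payload: dict, now: float) -> Record:
        # Stream ingestion: append each record in arrival order.
        record = Record(self._next_sequence, now, payload)
        self._next_sequence += 1
        self._records.append(record)
        return record

    def replay(self, now: float, from_sequence: int = 0) -> Iterator[Record]:
        # Records inside the retention window can be read repeatedly,
        # always in the order they were received.
        cutoff = now - self.retention_seconds
        for record in self._records:
            if record.timestamp >= cutoff and record.sequence >= from_sequence:
                yield record


def process(record: Record) -> Optional[dict]:
    """Stream processing: validate, normalize, and enrich one record."""
    payload = record.payload
    if "device_id" not in payload or "temp_c" not in payload:
        return None  # validation: drop malformed records
    temp_c = float(payload["temp_c"])
    return {
        "device_id": str(payload["device_id"]).strip().lower(),  # normalization
        "temp_c": temp_c,
        "temp_f": temp_c * 9 / 5 + 32,  # enrichment: derived field
        "sequence": record.sequence,
    }


def run_pipeline(store: StreamStore, now: float, destination: list) -> list:
    # Destination: here just a list; in practice a data lake, database, etc.
    for record in store.replay(now):
        result = process(record)
        if result is not None:
            destination.append(result)
    return destination
```

Because the store keeps records rather than handing them off once, calling `replay` twice within the retention window yields the same records in the same order, which is what lets several independent consumers read the same stream.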

Refer to the following diagram for an example of the streaming data lifecycle on AWS:


Streaming data lifecycle on AWS

The streaming data lifecycle is segmented into the layers of modern data streaming architecture:

  • Stream sources - Streaming data sources can be application and clickstream logs, mobile apps, existing transactional relational and NoSQL databases, IoT sensors, and social media.

  • Stream ingestion - Use AWS IoT to ingest IoT device data into Kinesis Data Streams and Amazon MSK, Kinesis Agent to ingest streaming data into Kinesis Data Streams and Amazon Data Firehose, the AWS SDK for custom producers, AWS DMS for change data capture use cases, and Amazon MSK Connect for continuous ingestion from files and change data capture from databases.

  • Stream storage - You can use Kinesis Data Streams, Amazon MSK, or Apache Kafka on Amazon EC2 for your stream storage, depending on your use case. See Table 2 for additional details.

  • Stream processing - You can use Managed Service for Apache Flink for advanced streaming use cases with multiple destinations and stateful stream processing. AWS Lambda is a good fit for event-based and stateless processing, and for use cases like filtering, enrichment, and transformation. Use Amazon EMR to run your preferred open-source big data frameworks. Use AWS Glue if you already use AWS Glue or Apache Spark, need to process data in batch, streaming, and event modes, and want to build your streaming jobs visually.

  • Downstream destinations - Your destination can be databases, data warehouses, purpose-built systems such as Amazon OpenSearch Service, data lakes, event-driven applications, and various third-party integrations.
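
As an illustration of the stateless filtering and transformation that suits AWS Lambda, the following is a minimal sketch of a handler that reads a Kinesis Data Streams event. It relies on the event shape Lambda uses for Kinesis sources, where each record carries base64-encoded data under `kinesis.data` and its partition key under `kinesis.partitionKey`. The `event_type` and `page` fields, and the shape of the return value, are assumptions made for this example only.

```python
import base64
import json


def handler(event, context):
    """Keep only click events and reshape them; no state is kept between calls."""
    items = []
    for record in event["Records"]:
        # Lambda delivers Kinesis record data base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        try:
            message = json.loads(raw)
        except ValueError:
            continue  # skip payloads that are not valid JSON
        if not isinstance(message, dict):
            continue
        # Filtering: "event_type" is a hypothetical field in the payload.
        if message.get("event_type") != "click":
            continue
        # Transformation: keep only the fields the destination needs.
        items.append({
            "partition_key": record["kinesis"]["partitionKey"],
            "page": message.get("page", "unknown"),
        })
    return {"processed": len(items), "items": items}
```

A handler like this keeps no state across invocations, so it fits the event-based processing described above; workloads that need windows or joins across records are the stateful cases the list assigns to Managed Service for Apache Flink.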

The following diagram represents the modern streaming reference architecture on AWS.


Streaming reference architecture on AWS

For more details about this architecture, refer to Streaming reference architecture in the AWS Well-Architected Data Analytics Lens.