Microservices on AWS
AWS Whitepaper

Distributed Data Management

Monolithic applications are typically backed by a large relational database, which defines a single data model common to all application components. In a microservices approach, such a central database would defeat the goal of building decentralized and independent components. Each microservice component should have its own data persistence layer.

Distributed data management, however, raises new challenges. As the CAP theorem implies, distributed microservices architectures must trade strict consistency for availability in the presence of network partitions, and therefore need to embrace eventual consistency.

Building a centralized store of critical reference data that is curated by master data management tools and procedures provides a means for microservices to synchronize their critical data and possibly roll back state. Using AWS Lambda with scheduled Amazon CloudWatch Events, you can build a simple cleanup and deduplication mechanism, as sketched below.
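As an illustration of such a scheduled job, the following Python sketch shows a Lambda handler that could be triggered by a CloudWatch Events rule. The reference_data table, its key schema, and the version-based deduplication rule are assumptions for this example, not part of the whitepaper's architecture.

```python
import boto3

# Hypothetical reference-data table; the table name and key schema are
# illustrative only.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("reference_data")

def handler(event, context):
    """Invoked on a schedule (e.g., a CloudWatch Events rule) to remove
    duplicate reference-data items, keeping the newest version of each."""
    # Page through the full table; a single Scan returns at most 1 MB.
    response = table.scan()
    items = response["Items"]
    while "LastEvaluatedKey" in response:
        response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
        items.extend(response["Items"])

    seen = {}
    for item in items:
        key = item["business_key"]  # hypothetical natural key
        prior = seen.get(key)
        if prior is None or item["version"] > prior["version"]:
            if prior is not None:
                table.delete_item(Key={"id": prior["id"]})
            seen[key] = item
        else:
            table.delete_item(Key={"id": item["id"]})
```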

It’s very common for state changes to affect more than a single microservice. In those cases, event sourcing has proven to be a useful pattern. The core idea behind event sourcing is to represent and persist every application change as an event record. Instead of persisting application state, data is stored as a stream of events. Database transaction logs and version control systems are two well-known examples of event sourcing. Event sourcing has several benefits: state can be determined and reconstructed for any point in time, and it naturally produces a persistent audit trail that also facilitates debugging.
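The following minimal Python sketch illustrates the core idea: only events are stored, and state for any point in time is rebuilt by replaying them. The event types and the account-balance example are illustrative.

```python
from dataclasses import dataclass

# Illustrative event types; in event sourcing, these records are the
# only thing that is ever persisted.
@dataclass(frozen=True)
class Deposited:
    amount: int

@dataclass(frozen=True)
class Withdrawn:
    amount: int

def apply(balance, event):
    """Fold a single event into the current state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    return balance

def replay(events, up_to=None):
    """Reconstruct state at any point in time by replaying the stream."""
    balance = 0
    for event in events[:up_to]:
        balance = apply(balance, event)
    return balance

events = [Deposited(100), Withdrawn(30), Deposited(50)]
print(replay(events))           # current state: 120
print(replay(events, up_to=2))  # state after the first two events: 70
```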

In the context of microservices architectures, event sourcing enables decoupling different parts of an application by using a publish/subscribe pattern, and it feeds the same event data into different data models for separate microservices. Event sourcing is frequently used in conjunction with the CQRS pattern (Command Query Responsibility Segregation) to decouple read from write workloads and optimize each for performance, scalability, and security. In traditional data management systems, commands and queries are run against the same data repository.
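To make the separation concrete, here is a small, illustrative Python sketch of CQRS: commands append events to a write-side log, a projection maintains a denormalized read model, and queries touch only that read model. All names are hypothetical.

```python
# Minimal CQRS sketch: the write side appends events, the read side
# maintains a denormalized view updated from those events.
event_log = []            # write model: append-only event store
order_count_by_user = {}  # read model: precomputed view for queries

def handle_place_order(user_id, order_id):
    """Command handler: validates and appends an event; never reads views."""
    event = {"type": "OrderPlaced", "user_id": user_id, "order_id": order_id}
    event_log.append(event)
    project(event)  # in practice, delivered asynchronously via pub/sub

def project(event):
    """Projection: keeps the read model in sync with the event stream."""
    if event["type"] == "OrderPlaced":
        user = event["user_id"]
        order_count_by_user[user] = order_count_by_user.get(user, 0) + 1

def query_order_count(user_id):
    """Query handler: reads only the optimized view, never the event log."""
    return order_count_by_user.get(user_id, 0)

handle_place_order("alice", "o-1")
handle_place_order("alice", "o-2")
print(query_order_count("alice"))  # 2
```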

The following figure shows how the event sourcing pattern can be implemented on AWS. Amazon Kinesis Streams serves as the main component of the central event store, which captures application changes as events and persists them to Amazon S3.

Note

Kinesis Streams enables you to build custom applications that process or analyze streaming data for specialized needs. Kinesis Streams can continuously capture and store terabytes of data per hour from hundreds of thousands of sources, such as website clickstreams, financial transactions, social media feeds, IT logs, and location-tracking events.

The following figure depicts three different microservices composed of Amazon API Gateway, Amazon EC2, and Amazon DynamoDB. The blue arrows indicate the flow of events: when microservice 1 experiences a state change, it publishes an event by writing a message into Kinesis Streams. All microservices run their own Kinesis Streams application on a fleet of EC2 instances that read a copy of the message, filter it based on its relevance to the microservice, and possibly forward it for further processing. A sketch of this publish-and-filter flow follows.
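The following Python sketch, using boto3, hints at what the publishing side and the consumer-side relevance filter could look like. The stream name application-events, the event schema, and the filtering rule are assumptions for illustration.

```python
import json
import boto3

kinesis = boto3.client("kinesis")

def publish_event(event_type, payload):
    """Microservice 1: publish a state change to the shared event stream.
    The stream name and event schema are illustrative."""
    record = {"type": event_type, "payload": payload}
    kinesis.put_record(
        StreamName="application-events",
        Data=json.dumps(record).encode("utf-8"),
        PartitionKey=payload.get("entity_id", "default"),
    )

def is_relevant(record, interesting_types):
    """Consumer-side filter: each microservice keeps only the event types
    it cares about and discards the rest. `record` is one entry returned
    by the Kinesis GetRecords API."""
    event = json.loads(record["Data"])
    return event["type"] in interesting_types

# Example: microservice 2 only processes order-related events.
publish_event("OrderPlaced", {"entity_id": "o-1", "user_id": "alice"})
```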

Amazon S3 durably stores all events across all microservices and is the single source of truth when it comes to debugging, recovering application state, or auditing application changes.