Distributed data management
Monolithic applications are typically backed by a large relational database, which defines a single data model common to all application components. In a microservices approach, such a central database would conflict with the goal of building decentralized and independent components. Each microservice component should have its own data persistence layer.
Distributed data management, however, raises new challenges. As a consequence of the CAP theorem, distributed microservices architectures inherently trade off strong consistency for availability and need to embrace eventual consistency.
In a distributed system, business transactions can span multiple microservices. Because they cannot use a single ACID transaction, a business transaction can end up partially executed. In this case, you need control logic to undo the transactions that have already been processed. The distributed Saga pattern is commonly used for this purpose. In the case of a failed business transaction, Saga orchestrates a series of compensating transactions that undo the changes that were made by the preceding transactions.
AWS Step Functions make it easy to implement a Saga execution coordinator, as shown in the following figure.
Saga execution coordinator
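The following sketch shows one way a Saga execution coordinator could be expressed as a Step Functions state machine, assuming a hypothetical travel-booking workflow. The Lambda function ARNs (ReserveFlight, ReserveHotel, CancelFlight, CancelHotel), the account ID, and the IAM role are placeholders, not part of the original figure.

import json
import boto3

# Saga sketch: each business step catches failures and branches to
# compensating transactions that undo the work of the preceding steps.
saga_definition = {
    "StartAt": "ReserveFlight",
    "States": {
        "ReserveFlight": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ReserveFlight",
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "CancelFlight"}],
            "Next": "ReserveHotel",
        },
        "ReserveHotel": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ReserveHotel",
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "CancelHotel"}],
            "Next": "BookingSucceeded",
        },
        "BookingSucceeded": {"Type": "Succeed"},
        # Compensating transactions run in reverse order of the original steps.
        "CancelHotel": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:CancelHotel",
            "Next": "CancelFlight",
        },
        "CancelFlight": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:CancelFlight",
            "Next": "BookingFailed",
        },
        "BookingFailed": {
            "Type": "Fail",
            "Error": "BookingError",
            "Cause": "A step failed; compensating transactions were executed.",
        },
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="BookingSaga",
    definition=json.dumps(saga_definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsSagaRole",  # placeholder role
)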
Building a centralized store of critical reference data that is curated by core data management tools and procedures provides a means for microservices to synchronize their critical data and possibly roll back state. Using Lambda with scheduled Amazon CloudWatch Events, you can build a simple cleanup and deduplication mechanism.
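As an illustration, such a scheduled cleanup function could look like the following sketch. The ReferenceData table, its id key, and the businessKey attribute used to detect duplicates are assumptions made for this example.

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ReferenceData")  # hypothetical reference-data table

def handler(event, context):
    """Invoked by a scheduled CloudWatch Events rule, e.g. rate(1 hour).

    Scans the reference-data table and deletes items whose business key has
    already been seen, keeping only the first occurrence of each key.
    """
    scan = table.scan()
    items = scan["Items"]
    # Follow pagination so large tables are processed completely.
    while "LastEvaluatedKey" in scan:
        scan = table.scan(ExclusiveStartKey=scan["LastEvaluatedKey"])
        items.extend(scan["Items"])

    seen = set()
    with table.batch_writer() as batch:
        for item in items:
            business_key = item["businessKey"]  # assumed duplicate criterion
            if business_key in seen:
                batch.delete_item(Key={"id": item["id"]})
            else:
                seen.add(business_key)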
It’s very common for state changes to affect more than a single microservice. In such cases, event sourcing has proven to be a useful pattern. The core idea behind event sourcing is to represent and persist every application change as an event record. Instead of persisting application state, data is stored as a stream of events. Database transaction logs and version control systems are two well-known examples of event sourcing. Event sourcing has several benefits: state can be determined and reconstructed for any point in time, it naturally produces a persistent audit trail, and it facilitates debugging.
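The essence of the pattern can be illustrated with a minimal, self-contained sketch: state is never stored directly, only derived by replaying the event stream, which is what makes point-in-time reconstruction possible. The AccountState and Event types below are hypothetical.

from dataclasses import dataclass

@dataclass
class AccountState:
    balance: int = 0

@dataclass
class Event:
    type: str        # e.g. "Deposited" or "Withdrawn"
    amount: int
    timestamp: float

def apply(state: AccountState, event: Event) -> AccountState:
    """Apply a single event to the current state."""
    if event.type == "Deposited":
        return AccountState(state.balance + event.amount)
    if event.type == "Withdrawn":
        return AccountState(state.balance - event.amount)
    return state

def replay(events: list[Event], up_to: float | None = None) -> AccountState:
    """Reconstruct state for any point in time by replaying the event stream."""
    state = AccountState()
    for event in events:
        if up_to is not None and event.timestamp > up_to:
            break
        state = apply(state, event)
    return state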
In the context of microservices architectures, event sourcing enables decoupling different parts of an application by using a publish/subscribe pattern, and it feeds the same event data into different data models for separate microservices. Event sourcing is frequently used in conjunction with the Command Query Responsibility Segregation (CQRS) pattern to decouple read from write workloads and optimize both for performance, scalability, and security. In traditional data management systems, commands and queries are run against the same data repository.
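A minimal sketch of the CQRS split could look like the following: the command side only validates commands and appends events, while the query side maintains a separate, read-optimized projection. The in-memory event store and balance projection are simplifications for illustration only.

import time
from collections import defaultdict

# --- Command side: validates commands and appends events to the store -------
event_store: list[dict] = []  # stand-in for a durable event store

def handle_deposit_command(account_id: str, amount: int) -> None:
    if amount <= 0:
        raise ValueError("amount must be positive")
    event_store.append({
        "type": "Deposited",
        "accountId": account_id,
        "amount": amount,
        "timestamp": time.time(),
    })

# --- Query side: consumes events and maintains a read-optimized projection --
balances: dict[str, int] = defaultdict(int)

def project(event: dict) -> None:
    """Update the read model; in practice this runs asynchronously,
    driven by the published event stream."""
    if event["type"] == "Deposited":
        balances[event["accountId"]] += event["amount"]

def get_balance(account_id: str) -> int:
    """Queries never touch the write-side store, only the projection."""
    return balances[account_id]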
The following figure shows how the event sourcing pattern can be implemented on AWS. Amazon Kinesis Data Streams serves as the main component of the central event store, which captures application changes as events and persists them on Amazon S3. The figure depicts three different microservices, each composed of Amazon API Gateway, AWS Lambda, and Amazon DynamoDB. The arrows indicate the flow of the events: when Microservice 1 experiences a state change, it publishes an event by writing a message into Kinesis Data Streams. Each microservice runs its own Kinesis Data Streams application in AWS Lambda, which reads a copy of the message, filters it based on relevance to the microservice, and possibly forwards it for further processing.

If your function returns an error, Lambda retries the batch until processing succeeds or the data expires. To avoid stalled shards, you can configure the event source mapping to retry with a smaller batch size, limit the number of retries, or discard records that are too old. To retain discarded events, you can configure the event source mapping to send details about failed batches to an Amazon Simple Queue Service (Amazon SQS) queue or Amazon Simple Notification Service (Amazon SNS) topic.
Event sourcing pattern on AWS
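A sketch of the producer and consumer wiring could look like the following. The stream name, function name, queue ARN, and account ID are placeholders, and the retry settings shown are only examples of the event source mapping options described above.

import json
import boto3

kinesis = boto3.client("kinesis")
lambda_client = boto3.client("lambda")

# Producer side (e.g. Microservice 1): publish a state change as an event.
def publish_event(event: dict) -> None:
    kinesis.put_record(
        StreamName="central-event-store",    # hypothetical stream name
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=event["aggregateId"],   # keeps an aggregate's events ordered
    )

# Consumer side: each microservice owns a Lambda function subscribed to the
# stream. The mapping below limits retries, discards records that are too old,
# and sends details about failed batches to an SQS queue.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/central-event-store",
    FunctionName="microservice-2-consumer",  # hypothetical function name
    StartingPosition="LATEST",
    BatchSize=100,
    BisectBatchOnFunctionError=True,         # retry with a smaller batch size
    MaximumRetryAttempts=3,
    MaximumRecordAgeInSeconds=3600,
    DestinationConfig={
        "OnFailure": {
            "Destination": "arn:aws:sqs:us-east-1:123456789012:failed-event-batches"
        }
    },
)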
Amazon S3 durably stores all events across all microservices and is the single source of truth when it comes to debugging, recovering application state, or auditing application changes. There are two primary reasons why records may be delivered more than once to your Kinesis Data Streams application: producer retries and consumer retries. Your application must anticipate and appropriately handle processing individual records multiple times.
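One common way to handle this is to make processing idempotent, for example with a conditional write keyed on a unique event identifier, as in the following sketch. The ProcessedEvents table, the eventId field, and the process function are assumptions made for illustration.

import base64
import json
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
processed = dynamodb.Table("ProcessedEvents")  # hypothetical idempotency table

def handler(event, context):
    """Kinesis consumer that tolerates records delivered more than once."""
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        event_id = payload["eventId"]  # assumed unique ID set by the producer
        try:
            # Conditional write succeeds only the first time this event is seen.
            processed.put_item(
                Item={"eventId": event_id},
                ConditionExpression="attribute_not_exists(eventId)",
            )
        except ClientError as err:
            if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
                continue  # duplicate delivery: skip without reprocessing
            raise
        process(payload)

def process(payload: dict) -> None:
    """Application-specific processing of a single event (assumed)."""
    ...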