Understanding the workload - AWS Prescriptive Guidance

Understanding the workload

To apply the framework, start by understanding the workload that you want to analyze. A system architecture diagram provides a starting point for documenting the most relevant details of the system. However, analyzing an entire workload at once can be complex, because many systems have numerous components and interactions. Instead, we recommend that you focus on user stories, which are informal, general explanations of software features written from the perspective of the end user. Their purpose is to articulate how a software feature provides value to the customer. You can then model these user stories with architecture diagrams and data flow diagrams to make it easier to assess the technical components that provide the described business functionality. For example, an in-app mobile game purchasing solution might have two user stories, “buying in-app credits” and “obtaining in-app refunds,” as shown in the following diagram. (This example architecture highlights how you can decompose a system into user stories; it's not intended to represent a highly resilient application.)

In-app purchasing system with two user stories

Each user story consists of four common components: code and config, infrastructure, data stores, and external dependencies. Your diagrams should include all these components and reflect the interactions among them. For example, if there is excessive load on your Amazon API Gateway endpoint, consider how that load cascades to other components in the system, such as your AWS Lambda functions or Amazon DynamoDB tables. Tracking these interactions helps you understand how a failure mode can affect the user story. You can capture this flow visually with a data flow diagram or by using simple flow arrows in an architecture diagram, as in the previous illustration. For each interaction, consider capturing details such as the type of information that's being transmitted, the information that's received, whether the communication is synchronous or asynchronous, and which fault boundaries are being crossed. In the example, the DynamoDB tables are shared by both user stories, as you can see by the arrows indicating that the Lambda component in the in-app refunds story accesses the DynamoDB tables in the in-app purchasing story. This means that a failure that's caused by the in-app purchasing user story could cascade to the in-app refunds user story as a result of shared fate.
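One way to keep track of these interactions alongside the diagram is to record each user story as a small data structure and then look for components that more than one story touches. The following Python sketch is illustrative only: the class names, the component names (such as purchase-fn and credits-table), and the payload descriptions are hypothetical and are not taken from the example architecture.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    """One arrow in a data flow diagram (all field values are illustrative)."""
    source: str
    target: str
    payload: str        # type of information that's transmitted
    synchronous: bool   # True for synchronous, False for asynchronous


@dataclass
class UserStory:
    name: str
    # Component name -> category (code and config, infrastructure,
    # data stores, or external dependencies).
    components: dict[str, str]
    flows: list[Flow] = field(default_factory=list)

    def touched(self) -> set[str]:
        """Every component the story owns or reaches through a flow."""
        used = set(self.components)
        for flow in self.flows:
            used.update((flow.source, flow.target))
        return used


def shared_components(a: UserStory, b: UserStory) -> set[str]:
    """Components that both stories depend on (candidates for shared fate)."""
    return a.touched() & b.touched()


# Hypothetical component names, loosely modeled on the two user stories.
purchases = UserStory(
    name="buying in-app credits",
    components={
        "purchase-api": "infrastructure",
        "purchase-fn": "code and config",
        "credits-table": "data stores",
    },
    flows=[
        Flow("purchase-api", "purchase-fn", "purchase request", True),
        Flow("purchase-fn", "credits-table", "credit balance update", True),
    ],
)

refunds = UserStory(
    name="obtaining in-app refunds",
    components={
        "refund-api": "infrastructure",
        "refund-fn": "code and config",
    },
    flows=[
        Flow("refund-api", "refund-fn", "refund request", True),
        # This flow crosses into a table owned by the purchasing story.
        Flow("refund-fn", "credits-table", "credit balance update", True),
    ],
)

shared = shared_components(purchases, refunds)
print(sorted(shared))
```

In this sketch, the only component that both stories touch is the hypothetical credits-table, which mirrors the shared DynamoDB tables in the example.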

In addition, it's important to understand the baseline configuration for each component. The baseline configuration identifies constraints such as the average and maximum number of transactions per second, the maximum size of a payload, a client timeout, and default or current service quotas for the resource. If you are modeling a new design, we recommend that you document the functional requirements for the design and consider the limits. This helps you understand how failure modes could manifest in the component.
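A baseline can be recorded as a simple structure per component so that constraints are easy to compare. The following Python sketch is a minimal illustration under stated assumptions: the field names, the 20 percent headroom threshold, and all numeric values are hypothetical and do not represent actual AWS service quotas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Baseline configuration for one component (all values are illustrative)."""
    component: str
    avg_tps: float            # average transactions per second
    max_tps: float            # maximum observed transactions per second
    max_payload_bytes: int    # maximum size of a payload
    client_timeout_s: float   # client timeout in seconds
    quota_tps: float          # default or current service quota

    def quota_headroom(self) -> float:
        """Fraction of the quota still unused at peak load.

        A negative value means that peak load exceeds the quota.
        """
        return 1 - self.max_tps / self.quota_tps


def low_headroom(baselines: list[Baseline], min_headroom: float = 0.2) -> list[str]:
    """Names of components whose peak load leaves less than min_headroom."""
    return [b.component for b in baselines if b.quota_headroom() < min_headroom]


# Hypothetical numbers; replace them with measurements from your workload.
baselines = [
    Baseline("purchase-api", 300, 900, 64_000, 5.0, 1_000),
    Baseline("purchase-fn", 150, 400, 64_000, 3.0, 1_000),
    Baseline("credits-table", 120, 300, 4_000, 1.0, 1_000),
]

flagged = low_headroom(baselines)
print(flagged)
```

Components that appear in the flagged list are a reasonable place to start when you consider how excessive load could manifest as a failure mode.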

Finally, you should prioritize user stories based on the business value they provide. This prioritization helps you focus on your workload's most critical functionality first. You can then focus your analysis on the workload components that are part of the critical path for that functionality, and realize value from using the framework more quickly. As you iterate through the process, you can examine additional user stories at different priorities.
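The prioritization step can be sketched as a sort over user stories followed by a pass that collects critical-path components in the order they should be analyzed. In the following Python sketch, the numeric business value scores and the component names are hypothetical; how you score business value depends on your organization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    name: str
    business_value: int              # hypothetical score; higher is more critical
    critical_path: tuple[str, ...]   # components on the story's critical path


def analysis_order(stories: list[Story]) -> tuple[list[str], list[str]]:
    """Return story names by descending value, and components in first-seen order."""
    ordered = sorted(stories, key=lambda s: s.business_value, reverse=True)
    components: list[str] = []
    for story in ordered:
        for component in story.critical_path:
            if component not in components:
                components.append(component)
    return [s.name for s in ordered], components


stories = [
    Story("obtaining in-app refunds", 5,
          ("refund-api", "refund-fn", "credits-table")),
    Story("buying in-app credits", 9,
          ("purchase-api", "purchase-fn", "credits-table")),
]

story_order, component_order = analysis_order(stories)
print(story_order)
print(component_order)
```

Because shared components are listed only once, a component that sits on the critical path of a high-priority story is analyzed early even if lower-priority stories also depend on it.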