OPS04-BP01 Implement application telemetry
Application telemetry is the foundation for observability of your workload. Your application should emit telemetry that provides insight into the state of the application and the achievement of business outcomes. From troubleshooting to measuring the impact of a new feature, application telemetry informs the way you build, operate, and evolve your workload.
Application telemetry consists of metrics and logs. Metrics are diagnostic measurements, analogous to your pulse or temperature. Metrics are used collectively to describe the state of your application. Metrics collected over time can be used to develop baselines and detect anomalies. Logs are messages that the application sends about its internal state or events that occur. Error codes, transaction identifiers, and user actions are examples of events that are logged.
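As an illustration of how metrics collected over time support baselines and anomaly detection, the following is a minimal Python sketch. The latency samples, the function names, and the three-standard-deviation threshold are assumptions made for the example, not values prescribed by this best practice.

```python
import statistics


def build_baseline(samples):
    """Summarize historical metric samples as (mean, sample standard deviation)."""
    return statistics.mean(samples), statistics.stdev(samples)


def is_anomaly(value, baseline, threshold=3.0):
    """Return True if value is more than `threshold` standard deviations from the mean."""
    mean, stdev = baseline
    if stdev == 0:
        # With no historical variation, any different value is unusual.
        return value != mean
    return abs(value - mean) / stdev > threshold


# Hypothetical request latencies in milliseconds, collected over time.
history = [102, 98, 101, 97, 103, 99, 100, 100]
baseline = build_baseline(history)

print(is_anomaly(104, baseline))  # within the expected range
print(is_anomaly(250, baseline))  # far outside the baseline
```

A managed telemetry store typically offers more robust anomaly detection than this; the sketch only shows why a history of samples is needed before a single value can be judged.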
Desired Outcome:
-
Your application emits metrics and logs that provide insight into its health and the achievement of business outcomes.
-
Metrics and logs are stored centrally for all applications in the workload.
Common anti-patterns:
-
Your application doesn't emit telemetry. You are forced to rely upon your customers to tell you when something is wrong.
-
A customer has reported that your application is unresponsive. You have no telemetry and are unable to confirm that the issue exists or characterize the issue without using the application yourself to understand the current user experience.
Benefits of establishing this best practice:
-
You can understand the health of your application, the user experience, and the achievement of business outcomes.
-
You can react quickly to changes in your application health.
-
You can develop application health trends.
-
You can make informed decisions about improving your application.
-
You can detect and resolve application issues faster.
Level of risk exposed if this best practice is not established: High
Implementation guidance
Implementing application telemetry consists of three steps: identifying a location to store telemetry, identifying telemetry that describes the state of the application, and instrumenting the application to emit telemetry.
Customer example
AnyCompany Retail has a microservices-based architecture. As part of their architectural design process, they identified application telemetry that would help them understand the state of each microservice. For example, the user cart service emits telemetry about events like add to cart, abandon cart, and the time it took to add an item to the cart. All microservices log errors, warnings, and transaction information. Telemetry is sent to Amazon CloudWatch for storage and analysis.
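One way a service like the user cart service could express such events is as a structured log line in the CloudWatch embedded metric format, which lets CloudWatch extract metrics from log data. The sketch below only builds the JSON line. The namespace, dimension names, metric names, and the `cart_event_emf` function are hypothetical, and how the line reaches CloudWatch Logs is outside the scope of the example.

```python
import json
import time


def cart_event_emf(event, duration_ms, service="user-cart",
                   namespace="AnyCompanyRetail/Cart"):
    """Build one embedded metric format log line for a cart event.

    All names used here (namespace, dimensions, metrics) are illustrative.
    """
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [["Service", "Event"]],
                "Metrics": [
                    {"Name": "EventCount", "Unit": "Count"},
                    {"Name": "Duration", "Unit": "Milliseconds"},
                ],
            }],
        },
        # Dimension values and metric values are top-level members.
        "Service": service,
        "Event": event,
        "EventCount": 1,
        "Duration": duration_ms,
    }
    return json.dumps(record)


print(cart_event_emf("add_to_cart", 42.5))
```

Because the same line carries both the metric values and the surrounding context, it can serve as a log entry and a metric source at once.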
Implementation steps
-
Identify a central location for telemetry storage for the applications in your workload. The location should support both collection of telemetry and analysis capabilities. Anomaly detection and automated insights are recommended features.
-
Amazon CloudWatch provides telemetry collection, dashboards, analysis, and event generation capabilities.
-
To identify what telemetry you need, start by answering this question: what is the state of my application? Your application should emit logs and metrics that collectively answer this question. If you can’t answer the question with the existing application telemetry, work with business and engineering stakeholders to create a list of telemetry requirements.
-
You can request expert technical advice from your AWS account team as you identify and develop new application telemetry.
-
Once the additional application telemetry has been identified, work with your engineering stakeholders to instrument your application.
-
The AWS Distro for OpenTelemetry provides APIs, libraries, and agents that collect application telemetry. This example demonstrates how to instrument a JavaScript application with custom metrics.
-
If you want to understand the observability services that AWS offers, work through the One Observability Workshop or request support from your AWS account team.
-
For a deeper dive into application telemetry, read the Instrumenting distributed systems for operational visibility article in the Amazon Builders' Library, which explains how Amazon instruments applications and can serve as a guide for developing your own instrumentation guidelines.
Level of effort for the implementation plan: High. Instrumenting your application and centralizing telemetry storage can take significant investment.
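The instrumentation step can be sketched with nothing more than the Python standard library. The example below is not the AWS Distro for OpenTelemetry API. It is a hypothetical `instrumented` context manager that logs one JSON event per operation, including a transaction identifier, the outcome, and the duration. The logger name and field names are assumptions made for the example.

```python
import json
import logging
import time
import uuid
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("app.telemetry")  # hypothetical logger name


@contextmanager
def instrumented(operation, transaction_id=None):
    """Log one JSON event per operation with its duration and outcome."""
    event = {
        "operation": operation,
        "transaction_id": transaction_id or str(uuid.uuid4()),
        "status": "ok",
    }
    start = time.perf_counter()
    try:
        # The caller can attach extra fields to the event while it runs.
        yield event
    except Exception as exc:
        event["status"] = "error"
        event["error_type"] = type(exc).__name__
        raise  # telemetry must not swallow the application's errors
    finally:
        event["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
        level = logging.ERROR if event["status"] == "error" else logging.INFO
        logger.log(level, json.dumps(event))


with instrumented("add_to_cart") as event:
    event["item_count"] = 1
```

Wrapping operations this way keeps the emitted fields consistent across the codebase, which makes the resulting logs easier to query once they are stored centrally.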
Resources
Related best practices:
OPS04-BP02 Implement and configure workload telemetry – Application telemetry is a component of workload telemetry. In order to understand the health of the overall workload you need to understand the health of individual applications that make up the workload.
OPS04-BP03 Implement user activity telemetry – User activity telemetry is often a subset of application telemetry. User activity like add-to-cart events, click streams, or completed transactions provides insight into the user experience.
OPS04-BP04 Implement dependency telemetry – Dependency checks are related to application telemetry and may be instrumented into your application. If your application relies on external dependencies like DNS or a database, your application can emit metrics and logs on reachability, timeouts, and other events.
OPS04-BP05 Implement transaction traceability – Tracing transactions across a workload requires each application to emit information about how it processes shared events. The way individual applications handle these events is emitted through their application telemetry.
OPS08-BP02 Define workload metrics – Workload metrics are the key health indicators for your workload. Key application metrics are a part of workload metrics.
Related documents:
-
Amazon Builders' Library – Instrumenting Distributed Systems for Operational Visibility
-
AWS Well-Architected Operational Excellence Whitepaper – Design Telemetry
-
Monitoring application health and performance with AWS Distro for OpenTelemetry
-
New – How to better monitor your custom application metrics using Amazon CloudWatch Agent
-
Start Building – How to Monitor your Applications Effectively
Related videos:
Related examples:
Related services: