Designing and implementing logging and monitoring with Amazon CloudWatch

Khurram Nizami, Amazon Web Services (AWS)

April 2023 (document history)

This guide helps you design and implement logging and monitoring with Amazon CloudWatch and related Amazon Web Services (AWS) management and governance services for workloads that use Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), AWS Lambda, and on-premises servers. The guide is intended for operations teams, DevOps engineers, and application engineers who manage workloads on the AWS Cloud.

Your logging and monitoring approach should be based on the six pillars of the AWS Well-Architected Framework. These pillars are operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability. A well-architected monitoring and alarming solution improves reliability and performance by helping you proactively analyze and adjust your infrastructure.

This guide doesn't extensively discuss logging and monitoring for security or cost optimization because these topics require in-depth evaluation of their own. Many AWS services support security logging and monitoring, including AWS CloudTrail, AWS Config, Amazon Inspector, Amazon Detective, Amazon Macie, Amazon GuardDuty, and AWS Security Hub. For cost optimization, you can use AWS Cost Explorer, AWS Budgets, and CloudWatch billing metrics.

The following table outlines the six areas that your logging and monitoring solution should address.

Capturing and ingesting log files and metrics	Identify, configure, and send system and application logs and metrics to AWS services from different sources.
Searching and analyzing logs	Search and analyze logs for operations management, problem identification, troubleshooting, and applications analysis.
Monitoring metrics and alarming	Identify and act on observations and trends in your workloads.
Monitoring application and service availability	Reduce downtime and improve your ability to meet service level targets by continuously monitoring service availability.
Tracing applications	Trace application requests in systems and external dependencies to fine-tune performance, perform root cause analysis, and troubleshoot issues.
Creating dashboards and visualizations	Create dashboards that focus on relevant metrics and observations for your systems and workloads, which helps continuous improvement and proactive discovery of issues.
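
As an illustration of the "Monitoring metrics and alarming" area, the following Python sketch builds the parameters for a CloudWatch alarm on the Amazon EC2 CPUUtilization metric. It is a minimal example under stated assumptions, not a recommended configuration: the helper name build_cpu_alarm, the alarm name, the 80 percent threshold, and the evaluation settings are all hypothetical values chosen for illustration. The sketch only constructs a dictionary and makes no AWS calls.

```python
def build_cpu_alarm(instance_id, threshold_percent=80.0):
    """Return keyword arguments for a CPU utilization alarm (illustrative values)."""
    return {
        # Hypothetical naming scheme; use whatever convention your team follows.
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        # Evaluate 5-minute averages; alarm after 3 consecutive breaches.
        "Period": 300,
        "EvaluationPeriods": 3,
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }


params = build_cpu_alarm("i-0123456789abcdef0")  # hypothetical instance ID
print(params["AlarmName"])
```

If you use the AWS SDK for Python (Boto3), a dictionary like this could be passed to the CloudWatch client, for example boto3.client("cloudwatch").put_metric_alarm(**params), assuming that credentials and a Region are configured.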

CloudWatch can meet most logging and monitoring requirements, and provides a reliable, scalable, and flexible solution. Many AWS services automatically provide CloudWatch metrics, in addition to CloudWatch logging integration for monitoring and analysis. CloudWatch also provides agents and log drivers to support a variety of compute options such as servers (both in the cloud and on premises), containers, and serverless computing. This guide also covers the following AWS services that are used with logging and monitoring:

AWS Systems Manager Distributor, Systems Manager State Manager, and Systems Manager Automation to automate, configure, and update the CloudWatch agent for your EC2 instances and on-premises servers
Amazon OpenSearch Service for advanced log aggregation, search, and analysis
Amazon Route 53 health checks and CloudWatch Synthetics to monitor application and service availability
Amazon Managed Service for Prometheus for monitoring containerized applications at scale
AWS X-Ray for application tracing and runtime analysis
Amazon Managed Grafana to visualize and analyze data from multiple sources (for example, CloudWatch, Amazon OpenSearch Service, and Amazon Timestream)

The AWS compute services that you choose also affect the implementation and configuration of your logging and monitoring solution. For example, the way you implement and configure CloudWatch differs for Amazon EC2, Amazon ECS, Amazon EKS, and Lambda.

Application and workload owners often overlook logging and monitoring, or configure and implement it inconsistently. As a result, workloads enter production with limited observability, which delays the identification of issues and increases the time needed to troubleshoot and resolve them. At a minimum, your logging and monitoring solution must address the systems layer for operating system (OS)-level logs and metrics, in addition to the application layer for application logs and metrics. The guide provides a recommended approach for addressing these two layers across different compute types, including the three compute types outlined in the following table.

Long-running and immutable EC2 instances	System and application logs and metrics across multiple operating systems (OSs) in multiple AWS Regions or accounts.
Containers	System and application logs and metrics for your Amazon ECS and Amazon EKS clusters, including examples for different configurations.
Serverless	System and application logs and metrics for your Lambda functions and considerations for customization.
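
To make the two layers concrete for EC2 instances and on-premises servers, the following Python sketch assembles a small CloudWatch agent configuration that collects system-level metrics and a system log alongside an application log. It is a minimal sketch, not the configuration that this guide recommends: the helper name build_agent_config, the file paths, the log group names, and the metric namespace are hypothetical placeholders.

```python
import json


def build_agent_config(app_log_path, app_log_group, namespace="Example/System"):
    """Return a CloudWatch agent configuration covering both layers (illustrative)."""
    return {
        "agent": {"metrics_collection_interval": 60},
        # Systems layer: OS-level memory and disk metrics.
        "metrics": {
            "namespace": namespace,
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
                "disk": {"measurement": ["used_percent"], "resources": ["/"]},
            },
        },
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        # Systems layer: an OS log file (path is an assumption).
                        {
                            "file_path": "/var/log/messages",
                            "log_group_name": "/example/system",
                            "log_stream_name": "{instance_id}",
                        },
                        # Application layer: the application's own log file.
                        {
                            "file_path": app_log_path,
                            "log_group_name": app_log_group,
                            "log_stream_name": "{instance_id}",
                        },
                    ]
                }
            }
        },
    }


config = build_agent_config("/var/log/myapp/app.log", "/example/myapp")
print(json.dumps(config, indent=2))
```

The printed JSON could be saved as an agent configuration file or stored centrally for distribution. Keeping the system and application entries in one generated document is only one possible design; separate configuration files per layer are an equally reasonable choice.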

This guide provides a logging and monitoring solution that addresses CloudWatch and related AWS services in the following areas:

Planning your CloudWatch deployment – Considerations for planning your CloudWatch deployment and guidance on centralizing your CloudWatch configuration.
Configuring the CloudWatch agent for EC2 instances and on-premises servers – CloudWatch configuration details for system-level and application-level logging and metrics.
CloudWatch agent installation approaches for Amazon EC2 and on-premises servers – Approaches for installing the CloudWatch agent, including automated deployment using Systems Manager across multiple Regions and accounts.
Logging and monitoring on Amazon ECS – Guidance for configuring CloudWatch for cluster-level and application-level logging and metrics in Amazon ECS.
Logging and monitoring on Amazon EKS – Guidance for configuring CloudWatch for cluster-level and application-level logging and metrics in Amazon EKS.
Prometheus monitoring on Amazon EKS – Introduces and compares Amazon Managed Service for Prometheus with CloudWatch Container Insights monitoring for Prometheus.
Logging and metrics for AWS Lambda – Guidance for configuring CloudWatch for your Lambda functions.
Searching and analyzing logs in CloudWatch – Methods to analyze your logs using Amazon CloudWatch Application Insights, CloudWatch Logs Insights, and extending log analysis to Amazon OpenSearch Service.
Alarming options with CloudWatch – Introduces CloudWatch Alarms and CloudWatch Anomaly Detection and provides guidance on alarm creation and setup.
Monitoring application and service availability – Introduces and compares CloudWatch Synthetics and Route 53 health checks for automated availability monitoring.
Tracing applications with AWS X-Ray – Introduction and setup for application tracing using X-Ray for Amazon EC2, Amazon ECS, Amazon EKS, and Lambda.
Dashboards and visualizations with CloudWatch – Introduction to CloudWatch Dashboards for improved observability across AWS workloads.
CloudWatch integration with AWS services – Explains how CloudWatch integrates with various AWS services.
Amazon Managed Grafana for dashboarding and visualization – Introduces and compares Amazon Managed Grafana with CloudWatch for dashboarding and visualization.

Implementation examples are used throughout this guide across these areas and are also available from the AWS Samples GitHub repository.
