Logging

Logging is the process of recording data about events that occur in a system. A log can include problems, errors, or information about current operations. Logs can be classified into different types, such as the following:

  • Event logs

  • Server logs

  • System logs

  • Authorization and access logs

  • Audit logs

A developer can search the logs for specific error codes or patterns, filter them based on specific fields, or archive them securely for future analysis. Logs help the developer perform root cause analysis of performance issues and correlate events across system components.

Building an effective logging solution involves close coordination between the application and infrastructure teams. Application logs are not useful unless there is a scalable logging infrastructure that supports use cases such as parsing, filtering, buffering, and correlation of logs. Close coordination also simplifies common use cases such as generating a correlation ID, logging run times for business-critical methods, and defining log patterns.

Application team

An application developer must ensure that the logs generated follow logging best practices. Best practices include the following:

  • Generating correlation IDs to track unique requests

  • Logging the time taken by business-critical methods

  • Logging at an appropriate log level

  • Sharing a common logging library

When you design applications that interact with different microservices, use these logging design principles to simplify filtering and log extraction on the backend.

Generating correlation IDs to track unique requests

When the application receives a request, it can check whether a correlation ID is already present in the request headers. If an ID isn't present, the application should generate one. For example, an Application Load Balancer adds a header called X-Amzn-Trace-Id. The application can use this header to correlate the request from the load balancer to the application. Similarly, the application should inject the trace ID when it calls dependent microservices so that the logs generated by different components in a request flow are correlated.
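
The following is a minimal sketch of this pattern as a Java servlet filter. The class name, the X-Amzn-Trace-Id handling, and the correlationId MDC key are illustrative choices for the example, not a prescribed AWS API.

    import java.io.IOException;
    import java.util.UUID;

    import org.slf4j.MDC;

    import jakarta.servlet.Filter;
    import jakarta.servlet.FilterChain;
    import jakarta.servlet.ServletException;
    import jakarta.servlet.ServletRequest;
    import jakarta.servlet.ServletResponse;
    import jakarta.servlet.http.HttpServletRequest;

    public class CorrelationIdFilter implements Filter {
        private static final String TRACE_HEADER = "X-Amzn-Trace-Id";

        @Override
        public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
                throws IOException, ServletException {
            // Reuse the ID that the Application Load Balancer injected, if present;
            // otherwise generate one so downstream logs can still be correlated.
            String correlationId = ((HttpServletRequest) request).getHeader(TRACE_HEADER);
            if (correlationId == null || correlationId.isEmpty()) {
                correlationId = UUID.randomUUID().toString();
            }
            // Put the ID on the logging context so that every log statement
            // written during this request automatically includes it.
            MDC.put("correlationId", correlationId);
            try {
                chain.doFilter(request, response);
            } finally {
                MDC.remove("correlationId");
            }
        }
    }

The same ID should then be added as a header on outbound calls to dependent microservices.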

Logging the time taken by business-critical methods

When the application receives a request, it interacts with different components. The application should log the time taken by business-critical methods in a defined pattern. This makes it easier to parse the logs on the backend and helps you generate useful insights from them. You can use approaches such as aspect-oriented programming (AOP) to generate these logs so that logging concerns stay separate from your business logic.
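
For example, the following Spring AOP sketch logs the run time of business-critical methods in a fixed pattern. The com.example.business package and the method=… durationMs=… pattern are assumptions for illustration.

    import org.aspectj.lang.ProceedingJoinPoint;
    import org.aspectj.lang.annotation.Around;
    import org.aspectj.lang.annotation.Aspect;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.stereotype.Component;

    @Aspect
    @Component
    public class MethodTimingAspect {
        private static final Logger logger = LoggerFactory.getLogger(MethodTimingAspect.class);

        // Times every method in the (hypothetical) business-critical package.
        @Around("execution(* com.example.business..*(..))")
        public Object logExecutionTime(ProceedingJoinPoint joinPoint) throws Throwable {
            long start = System.currentTimeMillis();
            try {
                return joinPoint.proceed();
            } finally {
                // A fixed pattern (method=<name> durationMs=<n>) keeps the
                // logs easy to parse on the backend.
                logger.info("method={} durationMs={}",
                        joinPoint.getSignature().toShortString(),
                        System.currentTimeMillis() - start);
            }
        }
    }

Because the timing lives in the aspect, the business methods themselves contain no logging code.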

Logging at an appropriate log level

The application should write logs that have a helpful amount of information. Use log levels to categorize events by their severity. For example, use the WARNING and ERROR levels for important events that need investigation. Use INFO and DEBUG for detailed tracing and high-volume events. Set log handlers to capture only the levels that are necessary in production. Generating too much logging at the INFO level isn't helpful, and it adds pressure on the backend infrastructure. DEBUG logging can be useful, but use it cautiously: it can generate a large volume of data, so it isn't recommended in performance-testing environments.
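
The following sketch shows how the levels might map to events in a hypothetical payment service, using the SLF4J facade that is common in Java applications.

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class PaymentService {
        private static final Logger logger = LoggerFactory.getLogger(PaymentService.class);

        public void processPayment(String orderId, double amount) {
            // DEBUG: detailed, high-volume tracing; keep it disabled during
            // performance tests and in production.
            logger.debug("Validating payment of {} for order {}", amount, orderId);

            if (amount <= 0) {
                // WARN: an unexpected condition that needs investigation.
                logger.warn("Rejected non-positive amount {} for order {}", amount, orderId);
                return;
            }

            // INFO: a routine, business-relevant event.
            logger.info("Payment processed for order {}", orderId);
        }
    }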

Sharing a common logging library

The application teams should use a common logging library, such as the AWS SDK for Java, with a predefined common logging pattern that developers can use as a dependency in their projects.
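
As a sketch of what such a shared library might contain, the following class wraps the SLF4J facade and enforces one log shape across teams; the class name and the service=… event=… pattern are hypothetical.

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public final class StandardLogger {
        private final Logger delegate;
        private final String serviceName;

        public StandardLogger(Class<?> owner, String serviceName) {
            this.delegate = LoggerFactory.getLogger(owner);
            this.serviceName = serviceName;
        }

        // Every team emits the same service=<name> event=<event> shape, so
        // the backend needs only one parser expression for all applications.
        public void event(String event) {
            delegate.info("service={} event={}", serviceName, event);
        }

        public void failure(String event, Throwable cause) {
            delegate.error("service={} event={}", serviceName, event, cause);
        }
    }

Publishing the class as an internal artifact lets each project pull it in as a dependency instead of redefining the pattern.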

Infrastructure team

DevOps engineers can reduce effort by applying the following logging design principles when they filter and extract logs on the backend. The infrastructure team must set up and support the following resources.

Log agent

A log agent (log shipper) is a program that reads logs from one location and sends them to another location. Log agents are used to read log files stored on a computer and upload log events to the backend for centralization.

Logs are unstructured data that must be structured before you can derive meaningful insights from them. Log agents use parsers to read log statements, extract relevant fields such as the timestamp, log level, and service name, and structure that data into a JSON format. A lightweight log agent at the edge is useful because it consumes fewer resources on the host. The log agent can push directly to the backend, or it can use an intermediary log forwarder that pushes the data to the backend. Using a log forwarder offloads work from the log agents at the source.
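
The following toy sketch shows an agent's core loop in Java: poll a log file for newly appended data and forward it to the backend. Real agents (for example, Fluent Bit or the Amazon CloudWatch agent) add batching, retries, and checkpointing; the file path and endpoint here are placeholders.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class ToyLogAgent {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            Path logFile = Path.of("/var/log/app/application.log");      // placeholder path
            URI backend = URI.create("https://logs.example.com/ingest"); // placeholder endpoint
            long offset = 0;
            while (true) {
                // Ship only the bytes appended since the last poll.
                byte[] bytes = Files.readAllBytes(logFile);
                if (bytes.length > offset) {
                    String newLines = new String(bytes, (int) offset, (int) (bytes.length - offset));
                    offset = bytes.length;
                    HttpRequest request = HttpRequest.newBuilder(backend)
                            .POST(HttpRequest.BodyPublishers.ofString(newLines))
                            .build();
                    client.send(request, HttpResponse.BodyHandlers.discarding());
                }
                Thread.sleep(1_000); // poll interval
            }
        }
    }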

Log parser

A log parser converts unstructured logs into structured logs. Log agent parsers also enrich the logs by adding metadata. Parsing can be done at the source (application end) or centrally. The schema for storing the logs should be extensible so that you can add new fields. We recommend using standard log formats such as JSON. However, in some cases, logs must be transformed into JSON format for better searching. Writing the right parser expression enables efficient transformation.
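
As an illustration, the following sketch parses a line such as "2025-01-01T12:00:00Z INFO order-service Payment processed" into named fields that can then be serialized as JSON and enriched with metadata; the pattern and field names are assumptions.

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class LogLineParser {
        // Assumed line shape: <timestamp> <level> <service> <message>
        private static final Pattern LOG_PATTERN = Pattern.compile(
                "^(?<timestamp>\\S+) (?<level>\\w+) (?<service>\\S+) (?<message>.*)$");

        public static Map<String, String> parse(String line) {
            Matcher matcher = LOG_PATTERN.matcher(line);
            if (!matcher.matches()) {
                // Keep unparsable lines instead of silently dropping them.
                return Map.of("raw", line);
            }
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("timestamp", matcher.group("timestamp"));
            fields.put("level", matcher.group("level"));
            fields.put("service", matcher.group("service"));
            fields.put("message", matcher.group("message"));
            // Enrichment: attach metadata that the raw line doesn't carry.
            fields.put("host", System.getenv().getOrDefault("HOSTNAME", "unknown"));
            return fields;
        }
    }

Adding a named group to the pattern and a corresponding put call adds a new field without breaking the existing schema, which keeps the schema extensible.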

Logs backend

A logs backend service collects, ingests, and visualizes log data from various sources. The log agent can write directly to the backend or use an intermediary log forwarder. During performance testing, be sure to store the logs so that they can be searched later. Store logs in the backend separately for each application. For example, use a dedicated index for each application, and use an index pattern to search for logs that are spread across different related applications. We recommend saving at least 7 days of data for log searching. However, storing the data for a longer duration can result in unnecessary storage costs. Because a large volume of logs is generated during a performance test, it's important to scale and right-size the logging backend.

Log visualization

To gain meaningful and actionable insights from application logs, use dedicated visualization tools to process and transform the raw log data into graphical representations. Visualizations such as charts, graphs, and dashboards can help uncover trends, patterns, and anomalies that might not be readily apparent when looking at the raw logs.

Key benefits of using visualization tools include the ability to correlate data across multiple systems and applications to identify dependencies and bottlenecks. Interactive dashboards support drilling down into the data at different levels of granularity to troubleshoot issues or spot usage trends. Specialized data visualization platforms provide capabilities such as analytics, alerting, and data sharing that can enhance monitoring and analysis.

By using the power of data visualization on application logs, development and operations teams can gain visibility into system and application performance. The insights derived can be used for a variety of purposes, including optimizing efficiency, improving user experience, enhancing security, and capacity planning. The end result is dashboards tailored to various stakeholders, providing at-a-glance views that summarize log data into actionable and insightful information.

Automating the logging infrastructure

Because different applications have different requirements, it's important to automate the installation and operations of the logging infrastructure. Use infrastructure as code (IaC) tools to provision the logging infrastructure's backend. Then you can provision the logging infrastructure either as a shared service or as an independent, bespoke deployment for a particular application.
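
For example, the following AWS CDK (Java) sketch provisions a dedicated Amazon CloudWatch Logs log group for one application with 7-day retention; the stack and log group names are placeholders.

    import software.amazon.awscdk.App;
    import software.amazon.awscdk.RemovalPolicy;
    import software.amazon.awscdk.Stack;
    import software.amazon.awscdk.services.logs.LogGroup;
    import software.amazon.awscdk.services.logs.RetentionDays;
    import software.constructs.Construct;

    public class LoggingBackendStack extends Stack {
        public LoggingBackendStack(final Construct scope, final String id) {
            super(scope, id);
            // One log group per application keeps searches scoped, and the
            // 7-day retention balances searchability against storage cost.
            LogGroup.Builder.create(this, "AppLogGroup")
                    .logGroupName("/myapp/performance-test") // placeholder name
                    .retention(RetentionDays.ONE_WEEK)
                    .removalPolicy(RemovalPolicy.DESTROY)    // allow teardown on demand
                    .build();
        }

        public static void main(final String[] args) {
            App app = new App();
            new LoggingBackendStack(app, "LoggingBackendStack");
            app.synth();
        }
    }

Deploying a stack like this from a CD pipeline lets you create the backend on demand and destroy it when it is no longer needed, as described in the list that follows.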

We recommend that developers use continuous delivery (CD) pipelines to automate the following:

  • Deploy the logging infrastructure on demand and tear it down when it isn't required.

  • Deploy log agents across different targets.

  • Deploy log parser and forwarder configurations.

  • Deploy application dashboards.

Logging tools

AWS provides native logging, alarming, and dashboard services. The following are popular AWS services and resources for logging: