Standardizing a debugging approach - AWS Lambda

Once a Lambda-based application is deployed, monitor it and capture errors as they occur (see Monitoring and observability), and then start the debugging process. This section introduces a general approach to debugging Lambda-based applications. You can use it to identify the causes of errors and then apply what you learn to make your workloads more resilient.


[Figure 17: debugging ops]

The first step is to observe an error. By monitoring a workload effectively (see Monitoring and observability), you can identify abnormal behavior in the system. These anomalies may appear as:

  • Lambda function errors indicating that an invocation could not be completed.

  • CloudWatch alarms or other monitoring tools triggered by exceptional metric values or other operational anomalies.

  • Deviations from expected activity, such as unexpected drops in traffic or large increases in load.

  • Issue reports from end users about problems that your infrastructure monitoring has not otherwise detected.

  • Issues in third-party services, such as payment systems, that are impacting your application.
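
As a rough illustration of this first step, the following Python sketch flags two of the anomalies listed above: an elevated error rate and sharp swings in traffic. It assumes you have already retrieved per-period sums of the Lambda `Invocations` and `Errors` metrics from CloudWatch; in practice you would usually express rules like these as CloudWatch alarms rather than in code. The `MetricPeriod` type, the `find_anomalies` function, and all threshold values are hypothetical choices for this example, not AWS defaults.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MetricPeriod:
    """Sums of the Lambda Invocations and Errors metrics for one time period."""
    start: str
    invocations: int
    errors: int


def find_anomalies(
    periods: List[MetricPeriod],
    max_error_rate: float = 0.05,  # hypothetical: flag more than 5% failed invocations
    drop_ratio: float = 0.5,       # hypothetical: flag traffic falling by half or more
    spike_ratio: float = 1.0,      # hypothetical: flag traffic doubling or more
) -> List[str]:
    """Return human-readable findings for periods that look abnormal."""
    findings: List[str] = []
    previous: Optional[MetricPeriod] = None
    for period in periods:
        # Share of invocations in this period that ended in a function error.
        if period.invocations > 0:
            rate = period.errors / period.invocations
            if rate > max_error_rate:
                findings.append(f"{period.start}: error rate {rate:.1%}")
        # Change in traffic relative to the previous period.
        if previous is not None and previous.invocations > 0:
            change = (period.invocations - previous.invocations) / previous.invocations
            if change <= -drop_ratio:
                findings.append(f"{period.start}: traffic dropped {-change:.0%}")
            elif change >= spike_ratio:
                findings.append(f"{period.start}: traffic rose {change:.0%}")
        previous = period
    return findings


if __name__ == "__main__":
    samples = [
        MetricPeriod("12:00", invocations=1000, errors=2),
        MetricPeriod("12:05", invocations=1100, errors=3),
        MetricPeriod("12:10", invocations=400, errors=60),
        MetricPeriod("12:15", invocations=900, errors=1),
    ]
    for finding in find_anomalies(samples):
        print(finding)
```

Run as a script, the sample data reports the 12:10 period for both its 15.0% error rate and a 64% drop in traffic, and the 12:15 period for a 125% rise in traffic.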

Next, identify the point of failure in the application architecture. Effective monitoring lets you quickly isolate where the problem originates. If the failure involves a Lambda function, it’s important to identify the event source, the event itself, and the Lambda function that processed the event.
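
One way to make the point of failure easier to find is to log, for each failed invocation, which event source and event were involved and which function handled them. The sketch below assumes the Python runtime, where the context object exposes attributes such as `function_name`, `function_version`, and `aws_request_id`. The helper names `describe_event_source` and `failure_context` are hypothetical, the labels returned for non-record events are made up for this example, and the detection covers only a few common event shapes (SQS, S3, Kinesis, DynamoDB Streams, and SNS records, HTTP requests, and EventBridge events).

```python
import json
from types import SimpleNamespace
from typing import Any, Dict


def describe_event_source(event: Dict[str, Any]) -> str:
    """Best-effort guess at which service produced a Lambda event."""
    records = event.get("Records")
    if isinstance(records, list) and records and isinstance(records[0], dict):
        first = records[0]
        # SQS, S3, Kinesis, and DynamoDB Streams records use "eventSource";
        # SNS records use "EventSource".
        return first.get("eventSource") or first.get("EventSource") or "unknown"
    request_context = event.get("requestContext")
    if isinstance(request_context, dict):
        # ALB requests carry requestContext.elb; API Gateway and function URLs do not.
        # These two labels are made up for this example.
        return "alb" if "elb" in request_context else "api-gateway-or-function-url"
    if "source" in event and "detail-type" in event:
        # EventBridge events carry "source" and "detail-type" at the top level.
        return f"eventbridge:{event['source']}"
    return "unknown"


def failure_context(event: Dict[str, Any], context: Any) -> str:
    """Build one JSON log line linking the event source, the event, and the function."""
    return json.dumps(
        {
            "function_name": context.function_name,
            "function_version": context.function_version,
            "request_id": context.aws_request_id,
            "event_source": describe_event_source(event),
            "event": event,
        },
        sort_keys=True,
    )


if __name__ == "__main__":
    # SimpleNamespace stands in for the real Lambda context when running locally.
    local_context = SimpleNamespace(
        function_name="orders-processor",  # hypothetical function name
        function_version="$LATEST",
        aws_request_id="local-test",
    )
    sqs_event = {"Records": [{"eventSource": "aws:sqs", "body": "{\"orderId\": 42}"}]}
    print(failure_context(sqs_event, local_context))
```

Logging the full event is what makes it possible to replay the failure later; if events can contain sensitive data, consider redacting those fields before logging.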

Debug the failure by first classifying the type of error, such as a lack of permissions, insufficient capacity, a downstream outage, or a business logic error. The right course of action depends on the type of failure identified. For a business logic error in the Lambda function code, debug using the event that triggered the error. This chapter explores some typical causes of errors in Lambda functions and how you can identify and resolve them.
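
The classification step can be sketched as a lookup from the error code carried by an exception to one of the categories named above. This assumes that AWS SDK for Python errors expose a `response` dictionary with an `Error.Code` entry, as botocore's `ClientError` does; the code reads that attribute by duck typing, so the example runs without botocore installed. The code-to-category table is an illustrative and deliberately incomplete assumption, and treating every exception without an AWS error code as a business logic error is a simplification. For that case, `replay` re-runs a handler locally with the captured event and a stand-in context object. All function names here are hypothetical.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable, Dict, Optional

# Illustrative, incomplete mapping from AWS error codes to failure categories.
ERROR_CATEGORIES: Dict[str, str] = {
    "AccessDenied": "permissions",
    "AccessDeniedException": "permissions",
    "TooManyRequestsException": "capacity",
    "ThrottlingException": "capacity",
    "ProvisionedThroughputExceededException": "capacity",
    "ServiceUnavailable": "downstream outage",
    "ServiceUnavailableException": "downstream outage",
    "InternalServerError": "downstream outage",
}


def error_code_of(error: Any) -> Optional[str]:
    """Return the AWS error code if the error carries a botocore-style `response` dict."""
    response = getattr(error, "response", None)
    if isinstance(response, dict):
        return response.get("Error", {}).get("Code")
    return None


def classify_failure(error: Any) -> str:
    """Map an exception to a failure category used to choose the next debugging step."""
    code = error_code_of(error)
    if code is None:
        # Simplifying assumption: no AWS error code means the failure came from our own logic.
        return "business logic"
    return ERROR_CATEGORIES.get(code, "unclassified")


def replay(handler: Callable[[Dict[str, Any], Any], Any], event_json: str) -> Any:
    """Re-run a handler locally with a captured event to reproduce a business logic error."""
    event = json.loads(event_json)
    # Stand-in context; only the attributes the handler actually reads need to exist.
    context = SimpleNamespace(function_name="local-replay", aws_request_id="local")
    return handler(event, context)


def example_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Hypothetical handler with a bug: it assumes "quantity" is always present."""
    return {"total": event["price"] * event["quantity"]}


if __name__ == "__main__":
    try:
        replay(example_handler, '{"price": 5}')
    except Exception as err:
        print(type(err).__name__, classify_failure(err))  # KeyError business logic
```

Keeping the mapping in a plain dictionary makes it easy to extend with the error codes your own downstream services actually return.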

Finally, remediate the problem. This may involve identifying the causes of changes in the infrastructure and taking action to prevent such changes in the future. Alternatively, it may require a code change and a new version of the Lambda function. In either case, document the remediation and make sure monitoring processes are updated if necessary. If end users report an error that your infrastructure did not otherwise detect, you may be able to improve the error-handling logic in your frontend and backend APIs so that they alert you when these errors occur. Aim to collect as much detail about each error as possible, including stack traces.
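
To collect that level of detail, a handler can be wrapped so that any unhandled exception is written to the logs as a single structured record that includes the stack trace. This minimal sketch assumes the Python runtime, where output written with `print` ends up in CloudWatch Logs. The decorator re-raises the exception so that Lambda still counts the invocation as an error and any retry behavior is unchanged. The names `error_record` and `log_errors_with_stack` are hypothetical.

```python
import functools
import json
import traceback
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any], Any], Any]


def error_record(err: BaseException, context: Any) -> Dict[str, Any]:
    """Build a structured error record, including the full stack trace."""
    return {
        "level": "ERROR",
        "request_id": getattr(context, "aws_request_id", None),
        "error_type": type(err).__name__,
        "error_message": str(err),
        "stack_trace": "".join(
            traceback.format_exception(type(err), err, err.__traceback__)
        ),
    }


def log_errors_with_stack(handler: Handler) -> Handler:
    """Wrap a Lambda handler so each unhandled exception is logged as one JSON line."""

    @functools.wraps(handler)
    def wrapper(event: Dict[str, Any], context: Any) -> Any:
        try:
            return handler(event, context)
        except Exception as err:
            # In the Lambda Python runtime, stdout is sent to CloudWatch Logs.
            print(json.dumps(error_record(err, context)))
            # Re-raise so the invocation is still reported as a failure.
            raise

    return wrapper
```

Applying `@log_errors_with_stack` to a handler is enough to start emitting these records. Because each record is JSON with a fixed `level` field, a CloudWatch Logs metric filter such as `{ $.level = "ERROR" }` could count them and drive an alarm.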