Introduction

AWS Lambda is a flexible service designed for a wide variety of use cases. Across the millions of AWS customers using Lambda every month, serverless applications generally fall into several common categories:

Web applications: serve the front-end code via Amazon S3 and Amazon CloudFront, or automate the entire deployment and hosting with AWS Amplify Console.
Web and mobile backends: the front-ends interact with the backend via API Gateway. Integrated authorization and authentication are provided by Amazon Cognito or APN Partners like Auth0.
Data processing: event-based processing tasks triggered by data changes in data stores, or streaming data ETL tasks with Amazon Kinesis and Lambda.
Parallelized computing tasks: splitting highly complex, long-lived computations into individual tasks across many Lambda function instances to process data more quickly in parallel.
Internet of Things (IoT) workloads: processing data generated by physical IoT devices.

Additionally, many workloads are hybrid serverless applications, especially where legacy systems are being migrated from on-premises or instance-based environments. In these cases, developers can gradually migrate functionality from a legacy system to a Lambda-based application.

This guide is for developers and operators of Lambda-based applications, particularly those running typical production serverless applications who want to understand more clearly how to build, measure, troubleshoot, and optimize their compute processes. It covers concepts and best practices for designing Lambda-based applications, together with an approach for ongoing monitoring and troubleshooting.

Serverless applications can include a wide variety of AWS services to manage APIs, messaging, storage, and content distribution. Most of these applications rely on Lambda to connect these services and transform data throughout the application. This guide focuses on the role of Lambda in these architectures, and on how you can fine-tune your functions and their configurations to maximize reliability and maintainability while reducing cost.

Lambda-based applications are event-driven architectures with many of the characteristics of distributed systems. While Lambda handles many of the complex tasks like scaling and infrastructure management, it’s important for operators to understand the scope of knowledge that they need to manage serverless applications successfully.

As AWS customers adopt Lambda to solve many of their most challenging workloads, understanding the troubleshooting and monitoring tasks involved is key to becoming a proficient operator. Both start-ups and enterprises build Lambda-based applications, for greenfield projects and for legacy applications alike. As these applications add features and grow in traffic, many of the same operational best practices apply.

This guide covers many of the most important operational best practices and advice while explaining core topics underpinning how Lambda-based applications work. The first half of this guide provides a deep dive into foundational topics around event-driven architectures, application design, and security. The latter half covers guidance for operators around debugging, monitoring and observability, and performance optimization. The goal is to provide a concrete, actionable approach to operating and troubleshooting Lambda-based applications.

These are the topics covered in detail:

Event-driven architectures: understanding how events drive serverless applications informs the design of your workload. This chapter explains: how Lambda fits into this paradigm; the benefits and tradeoffs of event-driven architectures; design principles, stateless design, idempotency, and message ordering; retry behaviors; using AWS services; avoiding common anti-patterns.
Application design: topics include understanding quotas; scaling and concurrency; choosing and managing runtimes and SDKs; networking and VPC configurations; comparing synchronous versus asynchronous invocations; controlling traffic flow for non-serverless services.
Security: security is the top priority at AWS, but all developers have a role to play in building secure applications. This chapter covers: the shared responsibility model; applying the principle of least privilege; handling sensitive data; IAM roles and resource policies; authorization and authentication; code signing; protecting applications with public endpoints.
Debugging: the process of identifying errors in software is critical to any production workload. Topics include: standardizing a debugging approach; capturing and replaying events; troubleshooting executions, networking, and deployments; identifying common causes of errors (memory configurations, timeouts, quotas, third-party libraries, and unintended leakage between invocations).
Monitoring and observability: serverless applications have parallels with distributed applications for monitoring and observability, presenting challenges to new serverless operators. This chapter explains: instrumentation best practices; CloudWatch Logs (using Insights and AWS Resource Groups); tracing with X-Ray; alerts and automation; code storage optimization.
Performance optimization: while Lambda manages running and scaling functions, developers have many levers available that influence performance. Topics include: cold starts and latency; package sizes and dependencies; memory and power settings; performance and cost; maximizing throughput.

This guide will be revised regularly to incorporate new Lambda features and AWS services as they are released. If you have any questions or comments about any of the content in this guide, raise an issue in the GitHub repository or contact the author.

James Beswick, AWS Serverless Developer Advocate, jbeswick@amazon.com
