Comparing performance of interactive and asynchronous workloads

A distributed application consists of multiple services that communicate by passing messages over a network. Due to network delays, traffic, message retries, system failovers, and the individual performance profiles of the services, the time taken to complete a unit of work can vary.

Instead of measuring performance against an average, it is often more helpful to measure the outliers. AWS X-Ray reports show a response distribution histogram that helps you identify the performance of outliers (see chapter 5). Using percentile metrics, you can identify the latency experienced at the p95 or p99 level, for example. These show the performance of the slowest 5% or 1% of requests, respectively.
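For example, you can retrieve percentile statistics for a function's Duration metric from Amazon CloudWatch. The following is a minimal sketch using boto3; the function name my-function is a placeholder for your own function.

```python
import datetime

import boto3

cloudwatch = boto3.client("cloudwatch")

end = datetime.datetime.now(datetime.timezone.utc)
start = end - datetime.timedelta(hours=1)

# Request p95 and p99 percentiles for the Lambda Duration metric.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Lambda",
    MetricName="Duration",
    Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],  # placeholder name
    StartTime=start,
    EndTime=end,
    Period=300,  # 5-minute buckets
    ExtendedStatistics=["p95", "p99"],
)

# Print latency for the slowest 5% and 1% of requests in each bucket.
for point in sorted(response["Datapoints"], key=lambda d: d["Timestamp"]):
    stats = point["ExtendedStatistics"]
    print(point["Timestamp"], f"p95={stats['p95']:.0f}ms", f"p99={stats['p99']:.0f}ms")
```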

Performance goals are driven by the use case. In interactive workloads, invocations are triggered directly by an end user event. For applications such as web apps or mobile apps, the round-trip performance of requests is directly experienced by the end user. If a user makes an API request to the application’s backend, the backend synchronously calls downstream services before returning a response. For these types of applications, optimizing round-trip latency is important to improve the user experience.

In many interactive, synchronous workloads, you may be able to rearchitect the design to use a reactive, asynchronous approach. In this case, the initial API call persists the request in an SQS queue and immediately responds with an acknowledgement to the caller. If you are using API Gateway, you can do this with a service integration instead of a Lambda function. The work continues asynchronously, and the caller either polls for progress or the application uses a webhook or WebSocket to communicate the status of the request. This approach can improve the end user experience while also providing greater scale and resiliency for the workload.
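If you do front the queue with a Lambda function instead of a service integration, the handler only enqueues the work and acknowledges the caller. The following is a minimal sketch; the QUEUE_URL environment variable and the response shape are assumptions, not a prescribed API.

```python
import json
import os
import uuid

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["QUEUE_URL"]  # assumed to be set in the function configuration


def handler(event, context):
    # Assign an ID that the caller can use to poll for progress later.
    request_id = str(uuid.uuid4())

    # Persist the work item; processing continues asynchronously downstream.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps(
            {"requestId": request_id, "payload": event.get("body")}
        ),
    )

    # Acknowledge immediately instead of waiting for downstream services.
    return {
        "statusCode": 202,
        "body": json.dumps({"requestId": request_id}),
    }
```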

To learn more, read Managing backend requests and frontend notifications in serverless web apps.

For many asynchronous workloads, the cold start latency of an individual Lambda function is less significant than the aggregate performance. When working with event sources such as S3 or SQS, Lambda scales up to process the traffic. Because many tasks are processed in parallel, a small percentage of invocations experiencing cold start latency has a negligible impact on the overall time of the processing task.
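For illustration, an SQS-triggered consumer might look like the following minimal sketch; process_record is a hypothetical placeholder for your business logic.

```python
import json


def process_record(body):
    # Hypothetical placeholder for the actual work done per message.
    print("processing", body)


def handler(event, context):
    # Lambda delivers SQS messages in batches, and many batches are processed
    # in parallel across concurrent executions. A cold start affects only the
    # batch handled by that execution environment, so its impact on the
    # aggregate processing time is small.
    for record in event["Records"]:
        process_record(json.loads(record["body"]))
```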