Monitoring for Lambda SnapStart
You can monitor your Lambda SnapStart functions using Amazon CloudWatch, AWS X-Ray, and the Lambda Telemetry API.
Note
The AWS_LAMBDA_LOG_GROUP_NAME and AWS_LAMBDA_LOG_STREAM_NAME environment variables are not available in Lambda SnapStart functions.
CloudWatch for SnapStart
The CloudWatch log stream format differs in a few ways for SnapStart functions:
- Initialization logs – When a new execution environment is created, the REPORT doesn't include the Init Duration field. That's because Lambda initializes SnapStart functions when you create a version instead of during function invocation. For SnapStart functions, the Init Duration field is in the INIT_REPORT record. This record shows duration details for the Init phase, including the duration of any beforeCheckpoint runtime hooks.
- Invocation logs – When a new execution environment is created, the REPORT includes the Restore Duration and Billed Restore Duration fields:
  - Restore Duration: The time it takes for Lambda to restore a snapshot, load the runtime (JVM), and run any afterRestore runtime hooks. The process of restoring snapshots can include time spent on activities outside the microVM. This time is reported in Restore Duration.
  - Billed Restore Duration: The time it takes for Lambda to load the runtime (JVM) and run any afterRestore hooks. You are not charged for the time it takes to restore a snapshot.
Note
Duration charges apply to code that runs in the function handler, initialization code that's declared outside of the handler, the time it takes for the runtime (JVM) to load, and any code that runs in a runtime hook. For more information, see SnapStart pricing.
The cold start duration is the sum of Restore Duration + Duration.
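The sum described above can be checked against a single REPORT line. The following is a minimal Python sketch, assuming the tab-separated "Name: value ms" layout that REPORT lines normally use. The sample line, its values, and the function names parse_report and cold_start_ms are illustrative assumptions, not output from a real invocation.

```python
import re
from typing import Dict, Optional

# Matches "Name: 12.34 ms" fields. Anchoring on line start or a tab keeps
# "Duration" from matching inside "Restore Duration" or "Billed Duration".
FIELD = re.compile(r"(?:^|\t)([A-Za-z ]+): ([\d.]+) ms")


def parse_report(line: str) -> Dict[str, float]:
    """Return the millisecond fields of a REPORT line as floats."""
    return {name.strip(): float(value) for name, value in FIELD.findall(line)}


def cold_start_ms(fields: Dict[str, float]) -> Optional[float]:
    """Restore Duration + Duration, or None if the line has no Restore Duration."""
    if "Restore Duration" not in fields:
        return None  # no snapshot restore was reported for this invocation
    return fields["Restore Duration"] + fields["Duration"]


# Hypothetical REPORT line with made-up values (assumed layout).
sample = (
    "REPORT RequestId: 00000000-0000-0000-0000-000000000000\t"
    "Duration: 12.50 ms\tBilled Duration: 13 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 100 MB\t"
    "Restore Duration: 300.25 ms\tBilled Restore Duration: 120 ms"
)

fields = parse_report(sample)
print(cold_start_ms(fields))  # 312.75
```

Returning None for lines without a Restore Duration field keeps warm invocations from being counted as cold starts with a zero restore time.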
The following example is a Lambda Insights query that returns the latency percentiles for SnapStart functions. For more information about Lambda Insights queries, see Example workflow using queries to troubleshoot a function.
    filter @type = "REPORT"
    | parse @log /\d+:\/aws\/lambda\/(?<function>.*)/
    | parse @message /Restore Duration: (?<restoreDuration>.*?) ms/
    | stats
        count(*) as invocations,
        pct(@duration+coalesce(@initDuration,0)+coalesce(restoreDuration,0), 50) as p50,
        pct(@duration+coalesce(@initDuration,0)+coalesce(restoreDuration,0), 90) as p90,
        pct(@duration+coalesce(@initDuration,0)+coalesce(restoreDuration,0), 99) as p99,
        pct(@duration+coalesce(@initDuration,0)+coalesce(restoreDuration,0), 99.9) as p99.9
        group by function, (ispresent(@initDuration) or ispresent(restoreDuration)) as coldstart
    | sort by coldstart desc
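To make the query's logic easier to follow, here is a rough Python sketch of the same aggregation over records that have already been parsed. It adds Init Duration and Restore Duration (treated as 0 when absent) to Duration, flags an invocation as a cold start when either field is present, and groups by function and cold-start flag. The record keys (function, duration_ms, init_ms, restore_ms), the sample data, and the nearest-rank percentile method are assumptions for illustration; the pct() function in CloudWatch may compute percentiles differently.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile (assumed method; pct() in CloudWatch may differ)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_by_cold_start(records):
    """Group total latency by (function, coldstart), like the query's stats clause."""
    groups = defaultdict(list)
    for r in records:
        init = r.get("init_ms")
        restore = r.get("restore_ms")
        # Mirrors ispresent(@initDuration) or ispresent(restoreDuration).
        cold = init is not None or restore is not None
        # Mirrors @duration + coalesce(@initDuration,0) + coalesce(restoreDuration,0).
        total = r["duration_ms"] + (init or 0.0) + (restore or 0.0)
        groups[(r["function"], cold)].append(total)
    return {
        key: {
            "invocations": len(vals),
            "p50": percentile(vals, 50),
            "p90": percentile(vals, 90),
            "p99": percentile(vals, 99),
        }
        for key, vals in groups.items()
    }


# Hypothetical parsed records for one function.
records = [
    {"function": "demo", "duration_ms": 10.0, "restore_ms": 300.0},
    {"function": "demo", "duration_ms": 20.0, "restore_ms": 280.0},
    {"function": "demo", "duration_ms": 5.0},
    {"function": "demo", "duration_ms": 7.0},
    {"function": "demo", "duration_ms": 9.0},
]

stats = latency_by_cold_start(records)
print(stats[("demo", True)])
print(stats[("demo", False)])
```

Splitting the results on the cold-start flag, as the query does, keeps the restore overhead from being averaged away by the more numerous warm invocations.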
X-Ray active tracing for SnapStart
You can use X-Ray to trace requests to Lambda SnapStart functions. The X-Ray subsegments differ in a few ways for SnapStart functions:
- There is no Initialization subsegment for SnapStart functions.
- The Restore subsegment shows the time it takes for Lambda to restore a snapshot, load the runtime (JVM), and run any afterRestore runtime hooks. The process of restoring snapshots can include time spent on activities outside the microVM. This time is reported in the Restore subsegment. You aren't charged for the time spent outside the microVM to restore a snapshot.
Telemetry API events for SnapStart
Lambda sends the following SnapStart events to the Telemetry API:
- platform.restoreStart – Shows the time when the Restore phase started.
- platform.restoreRuntimeDone – Shows whether the Restore phase was successful. Lambda sends this message when the runtime sends a restore/next runtime API request. There are three possible statuses: success, failure, and timeout.
- platform.restoreReport – Shows how long the Restore phase lasted and how many milliseconds you were billed for during this phase.
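A telemetry subscriber could fold these three events into one summary of the Restore phase. The Python sketch below shows one way to do that. The event shape used here (top-level time, type, and record keys, with record.status and record.metrics.durationMs) is an assumption about the Telemetry API schema, and the sample values are made up, so verify the field names against the Telemetry API schema reference before relying on them.

```python
def summarize_restore(events):
    """Collect the Restore phase outcome from a batch of telemetry events."""
    summary = {"started_at": None, "status": None, "duration_ms": None}
    for event in events:
        kind = event.get("type")
        record = event.get("record", {})
        if kind == "platform.restoreStart":
            # When the Restore phase started.
            summary["started_at"] = event.get("time")
        elif kind == "platform.restoreRuntimeDone":
            # One of: success, failure, timeout.
            summary["status"] = record.get("status")
        elif kind == "platform.restoreReport":
            # Field name assumed; adjust to the actual schema.
            summary["duration_ms"] = record.get("metrics", {}).get("durationMs")
    return summary


# Hypothetical event batch (field names and values are assumptions).
events = [
    {"time": "2024-01-01T00:00:00.000Z", "type": "platform.restoreStart", "record": {}},
    {"time": "2024-01-01T00:00:00.300Z", "type": "platform.restoreRuntimeDone",
     "record": {"status": "success"}},
    {"time": "2024-01-01T00:00:00.301Z", "type": "platform.restoreReport",
     "record": {"status": "success", "metrics": {"durationMs": 300.5}}},
    {"time": "2024-01-01T00:00:00.400Z", "type": "platform.start", "record": {}},
]

summary = summarize_restore(events)
print(summary)
```

Event types outside the three listed above are ignored, so the same function can be applied to a mixed batch of telemetry events.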
Amazon API Gateway and function URL metrics
If you create a web API using API Gateway, then you can use the IntegrationLatency metric to measure end-to-end latency (the time between when API Gateway relays a request to the backend and when it receives a response from the backend).
If you're using a Lambda function URL, then you can use the UrlRequestLatency metric to measure end-to-end latency (the time between when the function URL receives a request and when the function URL returns a response).