Monitor asynchronous endpoint

You can monitor SageMaker using Amazon CloudWatch, which collects raw data and processes it into readable, near real-time metrics. With Amazon CloudWatch, you can access historical information and gain a better perspective on how your web application or service is performing. For more information about Amazon CloudWatch, see What is Amazon CloudWatch?

Monitoring with CloudWatch

The following is an exhaustive list of the metrics published for asynchronous endpoints; all of them are in the AWS/SageMaker namespace. Any metric not listed below is not published when the endpoint is enabled for asynchronous inference. Such unpublished metrics include (but are not limited to):

  • OverheadLatency

  • Invocations

  • InvocationsPerInstance

Common Endpoint Metrics

These metrics are the same as the metrics published for real-time endpoints today. For more information about other metrics in Amazon CloudWatch, see Monitor SageMaker with Amazon CloudWatch.


Invocation4XXErrors

The number of InvokeEndpoint requests where the model returned a 4xx HTTP response code. For each 4xx response, 1 is sent; otherwise, 0 is sent.

Units: None

Valid statistics: Average, Sum

Invocation5XXErrors

The number of InvokeEndpoint requests where the model returned a 5xx HTTP response code. For each 5xx response, 1 is sent; otherwise, 0 is sent.

Units: None

Valid statistics: Average, Sum

ModelLatency

The interval of time taken by a model to respond, as viewed from SageMaker. This interval includes the local communication time to send the request to the model container and fetch the response from it, plus the time taken to complete the inference inside the container.

Units: Microseconds

Valid statistics: Average, Sum, Min, Max, Sample Count
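
You can retrieve these metrics with the AWS SDK. The following is a minimal boto3 sketch that pulls the last hour of ModelLatency datapoints; the endpoint name my-async-endpoint and the variant name AllTraffic are placeholder values, and the example assumes the usual EndpointName and VariantName dimensions used for real-time endpoint metrics:

```python
import datetime

import boto3

cloudwatch = boto3.client("cloudwatch")

# Placeholder names -- replace with your own endpoint and production variant.
endpoint_name = "my-async-endpoint"
variant_name = "AllTraffic"

now = datetime.datetime.now(datetime.timezone.utc)

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/SageMaker",
    MetricName="ModelLatency",
    Dimensions=[
        {"Name": "EndpointName", "Value": endpoint_name},
        {"Name": "VariantName", "Value": variant_name},
    ],
    StartTime=now - datetime.timedelta(hours=1),
    EndTime=now,
    Period=300,  # 5-minute buckets
    Statistics=["Average", "Maximum"],
    Unit="Microseconds",
)

# Print the datapoints in chronological order.
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"], point["Maximum"])
```

Because ModelLatency is reported in microseconds, divide the returned values by 1,000,000 to express them in seconds.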

Asynchronous Inference Endpoint Metrics

These metrics are published for endpoints enabled for asynchronous inference. The following metrics are published with the EndpointName dimension:


ApproximateBacklogSize

The number of items in the queue for an endpoint that are currently being processed or yet to be processed.

Units: Count

Valid statistics: Average, Max, Min

ApproximateBacklogSizePerInstance

The number of items in the queue divided by the number of instances behind an endpoint. This metric is primarily used for setting up application autoscaling for an async-enabled endpoint (see the autoscaling sketch that follows these metrics).

Units: Count

Valid statistics: Average, Max, Min

ApproximateAgeOfOldestRequest

Age of the oldest request in the queue.

Units: Seconds

Valid statistics: Average, Max, Min

HasBacklogWithoutCapacity

The value of this metric is 1 when there are requests in the queue but zero instances behind the endpoint. The value is 0 at all other times. You can use this metric for autoscaling your endpoint up from zero instances upon receiving a new request in the queue.

Units: Count

Valid statistics: Average
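
As a sketch of the autoscaling setup referenced above, the following boto3 example registers an endpoint variant as a scalable target that can scale down to zero instances and attaches a target-tracking policy on ApproximateBacklogSizePerInstance. The endpoint name, variant name, capacity limits, and target of 5 queued items per instance are placeholder values, not recommendations:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Placeholder names -- substitute your own endpoint and production variant.
endpoint_name = "my-async-endpoint"
variant_name = "AllTraffic"
resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"

# Allow the variant to scale between 0 and 4 instances.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=4,
)

# Target-tracking policy that keeps roughly 5 queued items per instance.
autoscaling.put_scaling_policy(
    PolicyName="backlog-per-instance-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 300,
    },
)
```

To scale out again from zero instances, the HasBacklogWithoutCapacity metric described above is typically used as the trigger for an additional scaling policy, since there is no per-instance backlog to track while no instances are running.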

The following metrics are published with the EndpointName and VariantName dimensions:


RequestDownloadFailures

The number of inference failures caused by a problem downloading the request from Amazon S3.

Units: Count

Valid statistics: Sum

ResponseUploadFailures

The number of inference failures caused by a problem uploading the response to Amazon S3.

Units: Count

Valid statistics: Sum

NotificationFailures

The number of failures that occur when publishing notifications.

Units: Count

Valid statistics: Sum

RequestDownloadLatency

Total time to download the request payload.

Units: Microseconds

Valid statistics: Average, Sum, Min, Max, Sample Count

ResponseUploadLatency

Total time to upload the response payload.

Units: Microseconds

Valid statistics: Average, Sum, Min, Max, Sample Count

ExpiredRequests

Number of requests in the queue that fail due to reaching their specified request TTL.

Units: Count

Valid statistics: Sum

InvocationFailures

The number of invocations that fail for any reason (see the alarm example that follows these metrics).

Units: Count

Valid statistics: Sum

InvocationsProcesssed

Number of async invocations processed by the endpoint.

Units: Count

Valid statistics: Sum

TimeInBacklog

Total time the request was queued before being processed. This does not include the actual processing time (that is, download time, upload time, and model latency).

Units: Milliseconds

Valid statistics: Average, Sum, Min, Max, Sample Count

TotalProcessingTime

The time from when the inference request was received by SageMaker to when the request finished processing. This includes the time spent in the backlog and the time taken to upload the response and send notifications, if any.

Units: Milliseconds

Valid statistics: Average, Sum, Min, Max, Sample Count
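
The failure counters above (for example, InvocationFailures, RequestDownloadFailures, and NotificationFailures) are natural candidates for CloudWatch alarms. The following boto3 sketch, using placeholder endpoint and variant names and an arbitrary five-minute period, raises an alarm whenever any invocation fails:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Placeholder names -- replace with your own endpoint and production variant.
endpoint_name = "my-async-endpoint"
variant_name = "AllTraffic"

cloudwatch.put_metric_alarm(
    AlarmName=f"{endpoint_name}-invocation-failures",
    Namespace="AWS/SageMaker",
    MetricName="InvocationFailures",
    Dimensions=[
        {"Name": "EndpointName", "Value": endpoint_name},
        {"Name": "VariantName", "Value": variant_name},
    ],
    Statistic="Sum",
    Period=300,               # evaluate over 5-minute windows
    EvaluationPeriods=1,
    Threshold=1.0,            # alarm on one or more failures
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
    # AlarmActions=["arn:aws:sns:..."],  # optionally notify an SNS topic
)
```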

Amazon SageMaker Asynchronous Inference also includes host-level metrics. For information on host-level metrics, see SageMaker Jobs and Endpoint Metrics.

Logs

In addition to the model container logs that are published to Amazon CloudWatch in your account, you also get a platform log for tracing and debugging inference requests.

These logs are published under the endpoint log group:

/aws/sagemaker/Endpoints/[EndpointName]

The log stream name consists of:

[production-variant-name]/[instance-id]/data-log.

Log lines contain the request’s inference ID so that errors can be easily mapped to a particular request.
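
For example, a minimal boto3 sketch for pulling the platform log lines for a single request could look like the following; the endpoint name and inference ID are placeholder values:

```python
import boto3

logs = boto3.client("logs")

# Placeholder values -- replace with your own endpoint name and the
# inference ID of the request you want to trace.
endpoint_name = "my-async-endpoint"
inference_id = "00000000-1111-2222-3333-444444444444"

response = logs.filter_log_events(
    logGroupName=f"/aws/sagemaker/Endpoints/{endpoint_name}",
    filterPattern=f'"{inference_id}"',  # match log lines containing the ID
)

# Each event includes the log stream name ([variant]/[instance-id]/data-log)
# and the log message itself.
for event in response["events"]:
    print(event["logStreamName"], event["message"])
```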