Monitor a serverless endpoint - Amazon SageMaker

To monitor your serverless endpoint, you can use Amazon CloudWatch alarms. CloudWatch is a service that collects metrics in real time from your AWS applications and resources. An alarm watches a metric as it is collected and lets you specify, in advance, a threshold and the actions to take if that threshold is breached. For example, a CloudWatch alarm can send you a notification if your endpoint breaches an error threshold. Setting up CloudWatch alarms gives you visibility into the performance and functionality of your endpoint. For more information about CloudWatch alarms, see Using Amazon CloudWatch alarms in the Amazon CloudWatch User Guide.
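
As an illustration of the error-threshold alarm described above, the following is a minimal sketch using the AWS SDK for Python (boto3). It assumes the metric is published in the AWS/SageMaker namespace with EndpointName and VariantName dimensions. The endpoint name, variant name, SNS topic ARN, threshold, and period are hypothetical values chosen for the example, and the helper function names are not part of any AWS API.

```python
def build_5xx_alarm(endpoint_name, variant_name, sns_topic_arn, threshold=1.0):
    """Build keyword arguments for the CloudWatch put_metric_alarm call.

    Assumption: Invocations5XXErrors is published in the AWS/SageMaker
    namespace with EndpointName and VariantName dimensions.
    """
    return {
        "AlarmName": f"{endpoint_name}-{variant_name}-5xx-errors",
        "Namespace": "AWS/SageMaker",
        "MetricName": "Invocations5XXErrors",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Sum",            # total errors in each period
        "Period": 60,                  # seconds (illustrative choice)
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no traffic is not an error
        "AlarmActions": [sns_topic_arn],     # notify this SNS topic
    }


def create_alarm(alarm_kwargs):
    """Create or update the alarm. Requires AWS credentials and boto3."""
    import boto3  # imported here so the builder above runs without AWS access

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**alarm_kwargs)
```

To create the alarm, you might call create_alarm(build_5xx_alarm("my-endpoint", "AllTraffic", topic_arn)), where topic_arn is the ARN of an SNS topic you own. Keeping the parameters in a separate builder makes them easy to review before anything is sent to CloudWatch.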

Monitoring with CloudWatch

The following sections list every metric that is published for serverless endpoints. Any metric that is not listed is not published for serverless endpoints. For information about these metrics, see Monitor Amazon SageMaker with Amazon CloudWatch.

Common endpoint metrics

These CloudWatch metrics are the same as the metrics published for real-time endpoints.

The OverheadLatency metric tracks all additional latency that SageMaker adds, including the cold start time for launching new compute resources for your serverless endpoint. Compared to on-demand serverless endpoints, serverless endpoints with Provisioned Concurrency generally have significantly lower OverheadLatency.

Serverless endpoints can also use the Invocations4XXErrors, Invocations5XXErrors, Invocations, ModelLatency, ModelSetupTime and MemoryUtilization metrics. To learn more about these metrics, see SageMaker Endpoint Invocation Metrics.
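
To look at one of these metrics, such as OverheadLatency, you can query CloudWatch directly. The following is a minimal sketch using boto3 and the get_metric_statistics operation. It assumes the AWS/SageMaker namespace and the EndpointName and VariantName dimensions. The helper names, the time window, and the period are illustrative choices, not values prescribed by SageMaker.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(endpoint_name, variant_name, metric_name,
                       minutes=60, period=300):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    Assumption: the metric lives in the AWS/SageMaker namespace with
    EndpointName and VariantName dimensions.
    """
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SageMaker",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,                      # seconds per datapoint
        "Statistics": ["Average", "Maximum"],
    }


def fetch_datapoints(query):
    """Run the query and return datapoints sorted by time.

    Requires AWS credentials and boto3.
    """
    import boto3  # imported here so the builder above runs without AWS access

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**query)
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

Each returned datapoint includes a Unit field alongside the requested statistics, so you can read the unit from the response instead of hard-coding it. Comparing the Maximum of OverheadLatency before and after enabling Provisioned Concurrency is one way to observe the cold start difference described above.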

Common serverless endpoint metrics

These CloudWatch metrics are published for both on-demand serverless endpoints and serverless endpoints with Provisioned Concurrency.

Metric name: ServerlessConcurrentExecutionsUtilization
Description: The number of concurrent executions divided by the maximum concurrency.
Units: None
Valid statistics: Average, Max, Min

Serverless endpoint with Provisioned Concurrency metrics

These CloudWatch metrics are published for serverless endpoints with Provisioned Concurrency.

Metric name: ServerlessProvisionedConcurrencyExecutions
Description: The number of concurrent executions handled by the endpoint.
Units: Count
Valid statistics: Average, Max, Min

Metric name: ServerlessProvisionedConcurrencyUtilization
Description: The number of concurrent executions divided by the allocated Provisioned Concurrency.
Units: None
Valid statistics: Average, Max, Min

Metric name: ServerlessProvisionedConcurrencyInvocations
Description: The number of InvokeEndpoint requests handled by Provisioned Concurrency.
Units: Count
Valid statistics: Average, Max, Min

Metric name: ServerlessProvisionedConcurrencySpilloverInvocations
Description: The number of InvokeEndpoint requests that are not handled by Provisioned Concurrency and are instead handled by on-demand Serverless Inference.
Units: Count
Valid statistics: Average, Max, Min
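
The ratios behind these metrics can be worked through with plain arithmetic. The following sketch is only illustrative: the function names are hypothetical, and the spillover share is a derived figure that is not itself a published metric. It assumes you have already obtained the input values, for example from CloudWatch.

```python
def provisioned_utilization(concurrent_executions, allocated_concurrency):
    """Concurrent executions divided by the allocated Provisioned Concurrency,
    matching the description of ServerlessProvisionedConcurrencyUtilization."""
    if allocated_concurrency <= 0:
        raise ValueError("allocated_concurrency must be positive")
    return concurrent_executions / allocated_concurrency


def spillover_share(provisioned_invocations, spillover_invocations):
    """Fraction of requests that spilled over to on-demand Serverless Inference.

    This is a derived figure for illustration, not a published metric.
    """
    total = provisioned_invocations + spillover_invocations
    if total == 0:
        return 0.0  # no traffic, so nothing spilled over
    return spillover_invocations / total


# Example: 3 concurrent executions against 4 units of Provisioned Concurrency,
# and 10 spillover requests out of 100 total requests.
utilization = provisioned_utilization(3, 4)
share = spillover_share(90, 10)
```

A spillover share that stays well above zero suggests that traffic regularly exceeds the allocated Provisioned Concurrency, while a utilization that stays low suggests that more concurrency is allocated than the workload uses.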

Logs

If you want to monitor the logs from your endpoint for debugging or progress analysis, you can use Amazon CloudWatch Logs. The SageMaker-provided log group that you can use for serverless endpoints is /aws/sagemaker/Endpoints/[EndpointName]. For more information about using CloudWatch Logs in SageMaker, see Log Amazon SageMaker Events with Amazon CloudWatch. To learn more about CloudWatch Logs, see What is Amazon CloudWatch Logs? in the Amazon CloudWatch Logs User Guide.
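
As a sketch of how the log group above might be searched for debugging, the following uses boto3 and the CloudWatch Logs filter_log_events operation. The helper names, the default "ERROR" filter pattern, and the 30-minute window are hypothetical choices for this example; the log group path follows the /aws/sagemaker/Endpoints/[EndpointName] format given above.

```python
import time


def endpoint_log_group(endpoint_name):
    """Return the SageMaker-provided log group name for an endpoint."""
    return f"/aws/sagemaker/Endpoints/{endpoint_name}"


def fetch_recent_messages(endpoint_name, minutes=30, pattern="ERROR"):
    """Return log messages from the last `minutes` that match `pattern`.

    Requires AWS credentials and boto3. The pattern is an illustrative
    default; adjust it to whatever your container actually logs.
    """
    import boto3  # imported here so endpoint_log_group runs without AWS access

    logs = boto3.client("logs")
    start_ms = int((time.time() - minutes * 60) * 1000)  # epoch milliseconds

    messages = []
    paginator = logs.get_paginator("filter_log_events")
    for page in paginator.paginate(
        logGroupName=endpoint_log_group(endpoint_name),
        startTime=start_ms,
        filterPattern=pattern,
    ):
        messages.extend(event["message"] for event in page["events"])
    return messages
```

For example, fetch_recent_messages("my-endpoint") would search the log group /aws/sagemaker/Endpoints/my-endpoint, where my-endpoint is a placeholder for your own endpoint name.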