Monitor a serverless endpoint

To monitor your serverless endpoint, you can use Amazon CloudWatch alarms. CloudWatch is a service that collects metrics in real time from your AWS applications and resources. An alarm watches metrics as they are collected and lets you specify in advance a threshold and the actions to take if that threshold is breached. For example, your CloudWatch alarm can send you a notification if your endpoint breaches an error threshold. By setting up CloudWatch alarms, you gain visibility into the performance and functionality of your endpoint. For more information about CloudWatch alarms, see Using Amazon CloudWatch alarms in the Amazon CloudWatch User Guide.
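As a minimal sketch of the error-threshold alarm described above, the helper below builds the keyword arguments for the CloudWatch `put_metric_alarm` call. The helper name `build_error_alarm`, the alarm naming scheme, the 5-minute period, and the use of the `AWS/SageMaker` namespace with `EndpointName` and `VariantName` dimensions are assumptions for illustration; confirm them against the metrics your endpoint actually publishes.

```python
def build_error_alarm(endpoint_name, variant_name, metric_name, threshold):
    """Return keyword arguments for CloudWatch's put_metric_alarm call.

    build_error_alarm is a hypothetical helper, not part of any AWS SDK.
    """
    return {
        "AlarmName": f"{endpoint_name}-{metric_name}-alarm",
        # Assumption: endpoint metrics are published in this namespace
        # with EndpointName and VariantName dimensions.
        "Namespace": "AWS/SageMaker",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Sum",          # total errors per period
        "Period": 300,               # seconds; illustrative value
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        # Treat periods with no traffic as healthy rather than alarming.
        "TreatMissingData": "notBreaching",
    }


# Usage, assuming boto3 is installed and AWS credentials are configured.
# The metric name is an assumed spelling; confirm it in the CloudWatch console.
#   import boto3
#   params = build_error_alarm("my-endpoint", "AllTraffic", "Invocation5XXErrors", 5)
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

To receive a notification when the alarm fires, you would typically add an `AlarmActions` entry containing the ARN of an Amazon SNS topic that you own.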

Monitoring with CloudWatch

The metrics below are an exhaustive list of metrics for serverless endpoints. Any metric not listed below is not published for serverless endpoints. For information about the following metrics, see Monitor Amazon SageMaker with Amazon CloudWatch.

Common endpoint metrics

These CloudWatch metrics are the same as the metrics published for real-time endpoints.

The OverheadLatency metric tracks all additional latency that SageMaker adds, including the cold start time for launching new compute resources for your serverless endpoint. OverheadLatency is generally significantly lower for serverless endpoints with Provisioned Concurrency than for on-demand serverless endpoints.

Serverless endpoints can also use the Invocation4XXErrors, Invocation5XXErrors, Invocations, ModelLatency, ModelSetupTime, and MemoryUtilization metrics. To learn more about these metrics, see SageMaker endpoint invocation metrics.

Common serverless endpoint metrics

These CloudWatch metrics are published for both on-demand serverless endpoints and serverless endpoints with Provisioned Concurrency.

Metric Name	Description	Unit/Stats
`ServerlessConcurrentExecutionsUtilization`	The number of concurrent executions divided by the maximum concurrency.	Units: None Valid statistics: Average, Max, Min
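To read this metric programmatically, one option is the CloudWatch `get_metric_statistics` call. The sketch below only builds the request arguments; the helper name `utilization_query`, the 60-second period, and the namespace and dimension names are assumptions. Note that the CloudWatch API spells the statistics `Minimum` and `Maximum` rather than Min and Max.

```python
from datetime import datetime, timedelta, timezone


def utilization_query(endpoint_name, variant_name, minutes=60, now=None):
    """Return keyword arguments for CloudWatch's get_metric_statistics call.

    utilization_query is a hypothetical helper, not part of any AWS SDK.
    """
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SageMaker",  # assumed namespace
        "MetricName": "ServerlessConcurrentExecutionsUtilization",
        "Dimensions": [  # assumed dimension names
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # seconds per datapoint; illustrative value
        "Statistics": ["Average", "Maximum", "Minimum"],
    }


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   query = utilization_query("my-endpoint", "AllTraffic", minutes=30)
#   datapoints = boto3.client("cloudwatch").get_metric_statistics(**query)["Datapoints"]
```

Because the metric is concurrent executions divided by maximum concurrency, a Maximum close to 1 suggests the endpoint is near its configured concurrency limit.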

Serverless endpoint with Provisioned Concurrency metrics

These CloudWatch metrics are published for serverless endpoints with Provisioned Concurrency.

Metric Name	Description	Unit/Stats
`ServerlessProvisionedConcurrencyExecutions`	The number of concurrent executions handled by the endpoint.	Units: Count Valid statistics: Average, Max, Min
`ServerlessProvisionedConcurrencyUtilization`	The number of concurrent executions divided by the allocated Provisioned Concurrency.	Units: None Valid statistics: Average, Max, Min
`ServerlessProvisionedConcurrencyInvocations`	The number of `InvokeEndpoint` requests handled by Provisioned Concurrency.	Units: Count Valid statistics: Average, Max, Min
`ServerlessProvisionedConcurrencySpilloverInvocations`	The number of `InvokeEndpoint` requests not handled by Provisioned Concurrency, that is, requests handled by on-demand Serverless Inference instead.	Units: Count Valid statistics: Average, Max, Min
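One way to combine the two invocation metrics above is a spillover ratio: the share of requests that fell through to on-demand capacity. The function below is an illustrative sketch, not an AWS-defined metric. It assumes both counts were collected over the same time window and that together they account for all requests to the endpoint.

```python
def spillover_ratio(provisioned_invocations, spillover_invocations):
    """Fraction of requests served by on-demand capacity instead of
    Provisioned Concurrency. Hypothetical helper for illustration."""
    total = provisioned_invocations + spillover_invocations
    if total == 0:
        return 0.0  # no traffic in the window, so nothing spilled over
    return spillover_invocations / total


# Example: 900 requests handled by Provisioned Concurrency, 100 spilled over.
print(spillover_ratio(900, 100))  # 0.1
```

A ratio that stays well above zero may indicate that the allocated Provisioned Concurrency is too low for the observed traffic.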

Logs

If you want to monitor the logs from your endpoint for debugging or progress analysis, you can use Amazon CloudWatch Logs. The SageMaker-provided log group that you can use for serverless endpoints is /aws/sagemaker/Endpoints/[EndpointName]. For more information about using CloudWatch Logs in SageMaker, see Log Amazon SageMaker Events with Amazon CloudWatch. To learn more about CloudWatch Logs, see What is Amazon CloudWatch Logs? in the Amazon CloudWatch Logs User Guide.
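The following sketch shows how the log group name above could be assembled and passed to the CloudWatch Logs `filter_log_events` call. The helper names, the `ERROR` filter pattern, and the limit of 50 events are assumptions for illustration only.

```python
def endpoint_log_group(endpoint_name):
    """Build the SageMaker-provided log group name for an endpoint."""
    return f"/aws/sagemaker/Endpoints/{endpoint_name}"


def error_log_query(endpoint_name, since_epoch_ms, pattern="ERROR"):
    """Return keyword arguments for CloudWatch Logs' filter_log_events call.

    error_log_query is a hypothetical helper, not part of any AWS SDK.
    """
    return {
        "logGroupName": endpoint_log_group(endpoint_name),
        "filterPattern": pattern,          # illustrative pattern
        "startTime": int(since_epoch_ms),  # milliseconds since the Unix epoch
        "limit": 50,                       # illustrative page size
    }


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3, time
#   one_hour_ago_ms = (time.time() - 3600) * 1000
#   query = error_log_query("my-endpoint", one_hour_ago_ms)
#   for event in boto3.client("logs").filter_log_events(**query)["events"]:
#       print(event["message"])
```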
