InferenceMetrics
The metrics for an existing endpoint compared in an Inference Recommender job.
Contents
- MaxInvocations
  The expected maximum number of requests per minute for the instance.
  Type: Integer
  Required: Yes
- ModelLatency
  The expected model latency at maximum invocations per minute for the instance.
  Type: Integer
  Required: Yes
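As a sketch of how these fields might be consumed, the snippet below extracts MaxInvocations and ModelLatency from a hypothetical DescribeInferenceRecommendationsJob response fragment in which each compared endpoint carries an InferenceMetrics object. The endpoint name, metric values, and the helper function are illustrative assumptions, not output from a real job:

```python
# Hypothetical fragment of a DescribeInferenceRecommendationsJob response;
# the endpoint name and metric values are illustrative assumptions.
sample_response = {
    "EndpointPerformances": [
        {
            "EndpointInfo": {"EndpointName": "my-existing-endpoint"},
            "Metrics": {
                "MaxInvocations": 360,  # expected max requests per minute (Integer)
                "ModelLatency": 185,    # expected model latency (Integer)
            },
        }
    ]
}

def summarize_metrics(response):
    """Collect (endpoint name, MaxInvocations, ModelLatency) tuples."""
    return [
        (
            perf["EndpointInfo"]["EndpointName"],
            perf["Metrics"]["MaxInvocations"],
            perf["Metrics"]["ModelLatency"],
        )
        for perf in response.get("EndpointPerformances", [])
    ]

print(summarize_metrics(sample_response))
```

Because both members are required, a response that includes an InferenceMetrics object can be read without null checks on these two fields.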
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following: