AsyncInferenceClientConfig
Configures the behavior of the client used by SageMaker to interact with the model container during asynchronous inference.
Contents
- MaxConcurrentInvocationsPerInstance
The maximum number of concurrent requests sent by the SageMaker client to the model container. If no value is provided, SageMaker chooses an optimal value.
Type: Integer
Valid Range: Minimum value of 1. Maximum value of 1000.
Required: No
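As an illustration of the constraints above, the following is a minimal sketch of building this structure in Python before passing it to an SDK call. The helper name `build_client_config` and the S3 path are hypothetical and used only for illustration. The sketch assumes this structure is supplied as the `ClientConfig` member of `AsyncInferenceConfig` in a `CreateEndpointConfig` request. The range check mirrors the documented valid range of 1 to 1000.

```python
from typing import Optional


def build_client_config(max_concurrent: Optional[int] = None) -> dict:
    """Return a dict shaped like AsyncInferenceClientConfig.

    Hypothetical helper for illustration; it is not part of any AWS SDK.
    """
    if max_concurrent is None:
        # Omit the field so that SageMaker chooses a value.
        return {}
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_concurrent, bool) or not isinstance(max_concurrent, int):
        raise TypeError("MaxConcurrentInvocationsPerInstance must be an integer")
    if not 1 <= max_concurrent <= 1000:
        raise ValueError("MaxConcurrentInvocationsPerInstance must be between 1 and 1000")
    return {"MaxConcurrentInvocationsPerInstance": max_concurrent}


# Assumed placement inside the AsyncInferenceConfig of an endpoint configuration.
# The S3 path is a placeholder, not a real bucket.
async_inference_config = {
    "ClientConfig": build_client_config(4),
    "OutputConfig": {"S3OutputPath": "s3://example-bucket/async-output/"},
}
```

With boto3, a dict like `async_inference_config` would typically be passed as the `AsyncInferenceConfig` argument of `create_endpoint_config`. Validating the range locally surfaces an out-of-range value before the request is sent.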
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following: