Invoke a Multi-Model Endpoint - Amazon SageMaker

Invoke a Multi-Model Endpoint

To invoke a multi-model endpoint, call InvokeEndpoint from the Amazon SageMaker Runtime just as you would for a single-model endpoint, with one change: pass the TargetModel parameter to specify which of the models at the endpoint to target. In the InvokeEndpoint request, this parameter is sent as the X-Amzn-SageMaker-Target-Model header, which takes the relative path of the model to invoke. Amazon SageMaker constructs the absolute path of the model by combining the prefix provided in the CreateModel API call with this relative path.
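As a rough illustration of how the prefix and the relative path combine, the sketch below joins the two strings. The actual resolution happens inside the service and is not documented here; the function name resolve_model_path and the bucket name are hypothetical and used only for illustration.

```python
def resolve_model_path(model_data_prefix: str, target_model: str) -> str:
    """Join an S3 prefix with a relative model path.

    Assumption: this only mimics the documented behavior (prefix from
    CreateModel + relative TargetModel path); it is not SageMaker code.
    """
    # Normalize slashes so exactly one separates prefix and relative path.
    return model_data_prefix.rstrip("/") + "/" + target_model.lstrip("/")


# Hypothetical prefix; the relative path matches the request example.
print(resolve_model_path("s3://my-bucket/models/", "Houston_TX.tar.gz"))
```

The value passed as TargetModel is therefore only the part of the path below the prefix, not a full S3 URI.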

The following example prediction request, taken from the sample notebook, uses the AWS SDK for Python (Boto3).

# runtime_sm_client is a Boto3 'sagemaker-runtime' client; body is the CSV payload
response = runtime_sm_client.invoke_endpoint(
    EndpointName='my-endpoint',
    ContentType='text/csv',
    TargetModel='Houston_TX.tar.gz',
    Body=body)

The multi-model endpoint dynamically loads target models as needed. You can observe this when running the MME Sample Notebook as it iterates through random invocations against multiple target models hosted behind a single endpoint. The first request against a given model takes longer because the model has to be downloaded from Amazon Simple Storage Service (Amazon S3) and loaded into memory. (This is called a cold start.) Subsequent calls finish faster because there's no additional overhead after the model has loaded.
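One way to observe the cold start is to time two consecutive requests against the same target model. The following is a minimal sketch, not code from the sample notebook: the helper name timed_invoke is hypothetical, and it assumes a client object that exposes invoke_endpoint with the same keyword arguments as the request shown earlier (for example, a Boto3 'sagemaker-runtime' client).

```python
import time


def timed_invoke(client, endpoint_name, target_model, body, content_type="text/csv"):
    """Invoke one target model and return (response, elapsed_seconds).

    Assumption: `client` behaves like a Boto3 'sagemaker-runtime' client.
    """
    start = time.perf_counter()
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        TargetModel=target_model,
        Body=body,
    )
    # Wall-clock time for the whole round trip, including any model load.
    elapsed = time.perf_counter() - start
    return response, elapsed


# Example usage (requires AWS credentials and a deployed endpoint):
#   import boto3
#   client = boto3.client("sagemaker-runtime")
#   _, first = timed_invoke(client, "my-endpoint", "Houston_TX.tar.gz", body)
#   _, second = timed_invoke(client, "my-endpoint", "Houston_TX.tar.gz", body)
#   print(f"first call: {first:.3f}s, second call: {second:.3f}s")
```

If the model was not already loaded, the first call would be expected to take noticeably longer than the second. The exact difference depends on model size and the state of the instance.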


Invoking multi-model endpoints using the Amazon SageMaker Python SDK isn't supported.