Host multiple models which use different containers behind one endpoint
SageMaker multi-container endpoints enable customers to deploy multiple containers that use different models or frameworks on a single SageMaker endpoint. The containers can run in sequence as an inference pipeline, or each container can be accessed individually by using direct invocation to improve endpoint utilization and optimize costs.
For information about invoking the containers in a multi-container endpoint in sequence, see Host models along with pre-processing logic as serial inference pipeline behind one endpoint.
For information about invoking a specific container in a multi-container endpoint, see Use a multi-container endpoint with direct invocation.
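As a rough sketch of direct invocation, the helper below sends a request to one container by passing that container's ContainerHostname value as the TargetContainerHostname argument of the SageMaker Runtime invoke_endpoint call. The helper name, the JSON content type, and the payload are assumptions for illustration; the runtime client is passed in so that, in practice, you can supply boto3.client('sagemaker-runtime').

```python
import json


def invoke_container(runtime_client, endpoint_name, container_hostname, payload):
    """Send a JSON payload to one container of a Direct-mode endpoint.

    Hypothetical helper. runtime_client is expected to be a SageMaker
    Runtime client, for example boto3.client('sagemaker-runtime').
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='application/json',  # Assumed content type.
        # Route the request to the container with this ContainerHostname.
        TargetContainerHostname=container_hostname,
        Body=json.dumps(payload),
    )
    # The response Body is a stream; read it and decode the bytes.
    return response['Body'].read().decode('utf-8')
```

With the model created later in this section, a call such as invoke_container(runtime, 'my-endpoint', 'secondContainer', {'data': [1, 2]}) would reach only the second container, where 'my-endpoint' is a placeholder endpoint name.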
Create a multi-container endpoint (Boto 3)
Create a multi-container endpoint by calling the CreateModel, CreateEndpointConfig, and CreateEndpoint APIs as you would to create any other endpoint. You can run these containers sequentially as an inference pipeline, or run each individual container by using direct invocation. Multi-container endpoints have the following requirements when you call create_model:
- Use the Containers parameter instead of PrimaryContainer, and include more than one container in the Containers parameter.
- The ContainerHostname parameter is required for each container in a multi-container endpoint with direct invocation.
- Set the Mode parameter of the InferenceExecutionConfig field to Direct for direct invocation of each container, or Serial to use containers as an inference pipeline. The default mode is Serial.
Note
Currently, a multi-container endpoint supports up to 15 containers.
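The requirements and the container limit above can be checked locally before you call create_model. The function below is a hypothetical helper, not part of boto3 or the SageMaker API; it assumes the request is a dict shaped like the keyword arguments of create_model and encodes only the rules stated in this section.

```python
MAX_CONTAINERS = 15  # Limit stated in the note above.


def check_multi_container_request(request):
    """Return a list of problems found in a create_model request dict.

    Hypothetical helper: only the multi-container rules described in
    this section are checked, so an empty list does not guarantee that
    SageMaker will accept the request.
    """
    problems = []
    if 'PrimaryContainer' in request:
        problems.append('Use Containers instead of PrimaryContainer.')
    containers = request.get('Containers', [])
    if len(containers) < 2:
        problems.append('Containers must include more than one container.')
    if len(containers) > MAX_CONTAINERS:
        problems.append(f'At most {MAX_CONTAINERS} containers are supported.')
    # Serial is the default when InferenceExecutionConfig is omitted.
    mode = request.get('InferenceExecutionConfig', {}).get('Mode', 'Serial')
    if mode not in ('Direct', 'Serial'):
        problems.append(f'Unknown Mode: {mode!r}.')
    if mode == 'Direct':
        # Direct invocation requires a ContainerHostname on every container.
        for index, container in enumerate(containers):
            if not container.get('ContainerHostname'):
                problems.append(f'Container {index} needs a ContainerHostname.')
    return problems
```

An empty list means the request satisfies these rules; the service may still reject it for other reasons, such as an invalid image URI or role.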
The following example creates a multi-container model for direct invocation.
- Create container elements and an InferenceExecutionConfig with direct invocation.

```python
# Each container needs a ContainerHostname for direct invocation.
container1 = {
    'Image': '123456789012.dkr.ecr.us-east-1.amazonaws.com/myimage1:mytag',
    'ContainerHostname': 'firstContainer'
}

container2 = {
    'Image': '123456789012.dkr.ecr.us-east-1.amazonaws.com/myimage2:mytag',
    'ContainerHostname': 'secondContainer'
}

# Direct mode lets clients invoke each container individually.
inferenceExecutionConfig = {'Mode': 'Direct'}
```

- Create the model with the container elements and set the InferenceExecutionConfig field.

```python
import boto3

sm_client = boto3.Session().client('sagemaker')

# Placeholder: replace with the ARN of your SageMaker execution role.
role = 'arn:aws:iam::123456789012:role/MySageMakerExecutionRole'

response = sm_client.create_model(
    ModelName='my-direct-mode-model-name',
    InferenceExecutionConfig=inferenceExecutionConfig,
    ExecutionRoleArn=role,
    Containers=[container1, container2]
)
```
To create the endpoint, call create_endpoint_config and then create_endpoint, as you would for any other endpoint.
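A minimal sketch of creating the endpoint with create_endpoint_config and create_endpoint, assuming a multi-container model such as my-direct-mode-model-name already exists. The function name, endpoint and config names, variant name, and instance type and count are placeholders chosen for illustration. The SageMaker client is passed in (for example, a boto3 SageMaker client such as sm_client), and the endpoint_in_service waiter blocks until the endpoint is ready.

```python
def create_multi_container_endpoint(sm_client, model_name, endpoint_name,
                                    config_name, instance_type='ml.m5.large'):
    """Create an endpoint config for model_name, then an endpoint using it.

    Hypothetical helper; sm_client is expected to be a boto3 SageMaker client.
    """
    sm_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            'VariantName': 'AllTraffic',  # Placeholder variant name.
            'ModelName': model_name,
            'InitialInstanceCount': 1,  # Placeholder instance count.
            'InstanceType': instance_type,  # Placeholder instance type.
        }],
    )
    sm_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_name,
    )
    # Block until the endpoint status is InService before invoking it.
    sm_client.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)
```

Keeping the config name separate from the endpoint name matters later: an update requires a new endpoint config with a different name.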
Update a multi-container endpoint
To update a multi-container endpoint, complete the following steps.
- Call create_model to create a new model with a new value for the Mode parameter in the InferenceExecutionConfig field.
- Call create_endpoint_config to create a new endpoint config with a different name by using the new model you created in the previous step.
- Call update_endpoint to update the endpoint with the new endpoint config you created in the previous step.
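The three update steps can be sketched as one function. This is an illustration under assumptions: the function name, the new model and config names, and the variant settings are hypothetical, and the caller supplies the container list and execution role ARN (typically the same ones used for the original model). When switching to Direct, each container dict still needs a ContainerHostname. sm_client is expected to be a boto3 SageMaker client.

```python
def switch_inference_mode(sm_client, endpoint_name, containers, role_arn,
                          new_mode, new_model_name, new_config_name,
                          instance_type='ml.m5.large'):
    """Move an existing multi-container endpoint to a new Mode value.

    Hypothetical helper that follows the three update steps above.
    """
    if new_mode not in ('Direct', 'Serial'):
        raise ValueError(f'Mode must be Direct or Serial, not {new_mode!r}')
    # Step 1: create a new model with the new Mode value.
    sm_client.create_model(
        ModelName=new_model_name,
        InferenceExecutionConfig={'Mode': new_mode},
        ExecutionRoleArn=role_arn,
        Containers=containers,
    )
    # Step 2: create a new endpoint config, with a different name, for that model.
    sm_client.create_endpoint_config(
        EndpointConfigName=new_config_name,
        ProductionVariants=[{
            'VariantName': 'AllTraffic',  # Placeholder variant name.
            'ModelName': new_model_name,
            'InitialInstanceCount': 1,  # Placeholder instance count.
            'InstanceType': instance_type,  # Placeholder instance type.
        }],
    )
    # Step 3: point the existing endpoint at the new endpoint config.
    return sm_client.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=new_config_name,
    )
```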
Delete a multi-container endpoint
To delete an endpoint, call delete_endpoint and provide the name of your endpoint as the EndpointName parameter.
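A minimal sketch of the delete call. Also deleting the endpoint config and the model is an optional cleanup step that goes beyond what this section describes; the helper name and its optional parameters are hypothetical, and sm_client is expected to be a boto3 SageMaker client.

```python
def delete_multi_container_endpoint(sm_client, endpoint_name,
                                    config_name=None, model_name=None):
    """Delete an endpoint and, optionally, its endpoint config and model.

    Hypothetical helper; only delete_endpoint is required to remove the endpoint.
    """
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    # Optional cleanup: delete_endpoint does not remove these resources.
    if config_name is not None:
        sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    if model_name is not None:
        sm_client.delete_model(ModelName=model_name)
```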