Host multiple models which use different containers behind one endpoint

SageMaker multi-container endpoints enable customers to deploy multiple containers that use different models or frameworks on a single SageMaker endpoint. The containers can run in sequence as an inference pipeline, or each container can be accessed individually through direct invocation to improve endpoint utilization and optimize costs.

For information about invoking the containers in a multi-container endpoint in sequence, see Host models along with pre-processing logic as serial inference pipeline behind one endpoint.

For information about invoking a specific container in a multi-container endpoint, see Use a multi-container endpoint with direct invocation.
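As a rough illustration of direct invocation, the sketch below builds an invoke_endpoint request that targets one container through the TargetContainerHostname parameter of the SageMaker runtime client. The helper names, the endpoint name, the JSON content type, and the payload are assumptions made for the example, not values from this page.

```python
import json


def build_direct_invocation_request(endpoint_name, container_hostname, payload):
    """Build keyword arguments for sagemaker-runtime invoke_endpoint (hypothetical helper)."""
    return {
        "EndpointName": endpoint_name,
        # Assumption: the target container accepts JSON input.
        "ContentType": "application/json",
        # Must match the ContainerHostname given to the container at create_model time.
        "TargetContainerHostname": container_hostname,
        "Body": json.dumps(payload),
    }


def invoke_container(runtime_client, endpoint_name, container_hostname, payload):
    """Send the payload to one container and return the raw response bytes."""
    request = build_direct_invocation_request(endpoint_name, container_hostname, payload)
    response = runtime_client.invoke_endpoint(**request)
    # The response body is a stream; read() returns its bytes.
    return response["Body"].read()
```

With Boto3 installed and credentials configured, runtime_client would typically be created with boto3.Session().client('sagemaker-runtime').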

Create a multi-container endpoint (Boto 3)

Create a multi-container endpoint by calling the CreateModel, CreateEndpointConfig, and CreateEndpoint APIs, as you would for any other endpoint. You can run the containers sequentially as an inference pipeline, or run each container individually by using direct invocation. Multi-container endpoints have the following requirements when you call create_model:

  • Use the Containers parameter instead of PrimaryContainer, and include more than one container in the Containers parameter.

  • The ContainerHostname parameter is required for each container in a multi-container endpoint with direct invocation.

  • Set the Mode parameter of the InferenceExecutionConfig field to Direct for direct invocation of each container, or Serial to use containers as an inference pipeline. The default mode is Serial.


A multi-container endpoint currently supports up to 15 containers.
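The requirements above can be checked locally before calling create_model. The following is a minimal sketch, not part of the SageMaker API: the function name and the wording of the messages are hypothetical, and the checks cover only the rules listed in this section.

```python
MAX_CONTAINERS = 15  # limit stated in this section
VALID_MODES = {"Serial", "Direct"}


def validate_multi_container_request(containers, inference_execution_config=None):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    # Mode defaults to Serial when InferenceExecutionConfig is omitted.
    mode = (inference_execution_config or {}).get("Mode", "Serial")
    if mode not in VALID_MODES:
        problems.append(f"Mode must be 'Serial' or 'Direct', got {mode!r}")
    if len(containers) < 2:
        problems.append("Containers must include more than one container")
    if len(containers) > MAX_CONTAINERS:
        problems.append(f"at most {MAX_CONTAINERS} containers are supported")
    if mode == "Direct":
        # ContainerHostname is required on every container for direct invocation.
        for index, container in enumerate(containers):
            if not container.get("ContainerHostname"):
                problems.append(f"container {index} is missing ContainerHostname")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in a single pass.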

The following example creates a multi-container model for direct invocation.

  1. Create container elements and InferenceExecutionConfig with direct invocation.

    # Set 'Image' to the registry path of each inference image.
    container1 = {
        'Image': '',
        'ContainerHostname': 'firstContainer'
    }

    container2 = {
        'Image': '',
        'ContainerHostname': 'secondContainer'
    }

    inferenceExecutionConfig = {'Mode': 'Direct'}

  2. Create the model with the container elements and set the InferenceExecutionConfig field.

    import boto3

    sm_client = boto3.Session().client('sagemaker')

    # role must already hold the ARN of the IAM execution role for the model.
    response = sm_client.create_model(
        ModelName='my-direct-mode-model-name',
        InferenceExecutionConfig=inferenceExecutionConfig,
        ExecutionRoleArn=role,
        Containers=[container1, container2])

To create an endpoint, you would then call create_endpoint_config and create_endpoint as you would to create any other endpoint.
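For completeness, the two remaining calls might look like the sketch below. The helper names, the variant name 'AllTraffic', the instance type, and the instance count are illustrative assumptions; only the create_endpoint_config and create_endpoint calls and their parameters come from the Boto3 SageMaker client.

```python
def build_endpoint_requests(model_name, endpoint_config_name, endpoint_name,
                            instance_type="ml.m5.large", instance_count=1):
    """Build keyword arguments for create_endpoint_config and create_endpoint."""
    config_request = {
        "EndpointConfigName": endpoint_config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",  # hypothetical variant name
            "ModelName": model_name,
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,  # assumed instance type
        }],
    }
    endpoint_request = {
        "EndpointName": endpoint_name,
        "EndpointConfigName": endpoint_config_name,
    }
    return config_request, endpoint_request


def create_endpoint_for_model(sm_client, model_name, endpoint_config_name, endpoint_name):
    """Create the endpoint config, then the endpoint that uses it."""
    config_request, endpoint_request = build_endpoint_requests(
        model_name, endpoint_config_name, endpoint_name)
    sm_client.create_endpoint_config(**config_request)
    return sm_client.create_endpoint(**endpoint_request)
```

Here sm_client is the same SageMaker client used for create_model, and model_name would be 'my-direct-mode-model-name' to match the example above.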

Update a multi-container endpoint

To update a multi-container endpoint, complete the following steps.

  1. Call create_model to create a new model with a new value for the Mode parameter in the InferenceExecutionConfig field.

  2. Call create_endpoint_config to create a new endpoint config with a different name by using the new model you created in the previous step.

  3. Call update_endpoint to update the endpoint with the new endpoint config you created in the previous step.
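The three steps above can be sketched as a single helper. This is an illustration under stated assumptions: the function name, the variant name, and the instance type are hypothetical, and the new model and endpoint config names must differ from the existing ones.

```python
def switch_invocation_mode(sm_client, endpoint_name, new_model_name, new_config_name,
                           containers, role_arn, mode, instance_type="ml.m5.large"):
    """Point an existing endpoint at a new model that uses a different Mode."""
    # Step 1: create a new model with the new Mode ('Serial' or 'Direct').
    sm_client.create_model(
        ModelName=new_model_name,
        InferenceExecutionConfig={"Mode": mode},
        ExecutionRoleArn=role_arn,
        Containers=containers,
    )
    # Step 2: create a new endpoint config, under a different name, for that model.
    sm_client.create_endpoint_config(
        EndpointConfigName=new_config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",  # hypothetical variant name
            "ModelName": new_model_name,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,  # assumed instance type
        }],
    )
    # Step 3: update the endpoint to use the new endpoint config.
    return sm_client.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=new_config_name,
    )
```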

Delete a multi-container endpoint

To delete an endpoint, call delete_endpoint, and provide the name of the endpoint you want to delete as the EndpointName parameter.
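A minimal sketch of that call, assuming a hypothetical wrapper name and an sm_client created as in the earlier example:

```python
def delete_multi_container_endpoint(sm_client, endpoint_name):
    """Delete the endpoint identified by endpoint_name."""
    # EndpointName is the name that was passed to create_endpoint.
    return sm_client.delete_endpoint(EndpointName=endpoint_name)
```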