Create a Multi-Model Endpoint (AWS SDK for Python (Boto)) - Amazon SageMaker

Create a Multi-Model Endpoint (AWS SDK for Python (Boto))

You create a multi-model endpoint with the Amazon SageMaker CreateModel, CreateEndpointConfig, and CreateEndpoint APIs, just as you would create a single-model endpoint, but with two changes. When defining the container, set the Mode parameter to MultiModel. Also set the ModelDataUrl field to the Amazon S3 prefix where the model artifacts are located, instead of the path to a single model artifact as you would when deploying a single model.
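To make the two changes concrete, the following sketch builds both kinds of container definition as plain dictionaries. `container_definition` is a hypothetical helper, not part of Boto3. The image URI and S3 locations are placeholders. The sketch relies on Mode accepting the values SingleModel and MultiModel.

```python
from typing import Dict


def container_definition(image_uri: str, model_data_url: str,
                         multi_model: bool) -> Dict[str, str]:
    """Return a container definition dict for create_model (hypothetical helper)."""
    return {
        'Image': image_uri,
        # Multi-model: an S3 prefix that holds many model artifacts.
        # Single model: the path to one model artifact.
        'ModelDataUrl': model_data_url,
        # 'MultiModel' enables multi-model hosting; 'SingleModel' is the other value.
        'Mode': 'MultiModel' if multi_model else 'SingleModel',
    }


# Placeholder image URI and S3 locations, for illustration only.
single = container_definition(
    '<image-uri>', 's3://my-bucket/path/to/model.tar.gz', multi_model=False)
multi = container_definition(
    '<image-uri>', 's3://my-bucket/path/to/artifacts/', multi_model=True)

print(single['Mode'], multi['Mode'])  # SingleModel MultiModel
```

The only differences between the two dictionaries are the Mode value and whether ModelDataUrl names a prefix or a single artifact.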

For a sample notebook that uses Amazon SageMaker to deploy multiple XGBoost models to an endpoint, see Multi-Model Endpoint XGBoost Sample Notebook.

The following procedure outlines the key steps used in that sample to create a multi-model endpoint.

To deploy the model (AWS SDK for Python (Boto 3))

  1. Get a container whose image supports multi-model endpoints. Currently, only the MXNet and PyTorch prebuilt containers support multi-model endpoints. To use any other framework or algorithm, use the Amazon SageMaker inference toolkit to build a container that supports multi-model endpoints. For more information, see Build Your Own Container with Multi Model Server.

    # Set 'Image' to the URI of the container image from this step.
    container = {
        'Image': '',
        'ModelDataUrl': 's3://my-bucket/path/to/artifacts/',  # S3 prefix, not a single artifact
        'Mode': 'MultiModel'
    }
  2. Create the model that uses this container.

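    import boto3

    sm_client = boto3.client('sagemaker')

    # role holds the ARN of the IAM execution role that SageMaker assumes.
    response = sm_client.create_model(
        ModelName='my-multi-model-name',
        ExecutionRoleArn=role,
        Containers=[container])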
  3. (Optional) If you are using a serial inference pipeline, get the additional containers to include in the pipeline, and pass them in the Containers argument of CreateModel:

    preprocessor_container = {'Image': ''}
    multi_model_container = {
        'Image': '',
        'ModelDataUrl': 's3://my-bucket/path/to/artifacts/',
        'Mode': 'MultiModel'
    }
    response = sm_client.create_model(
        ModelName='my-multi-model-name',
        ExecutionRoleArn=role,
        Containers=[preprocessor_container, multi_model_container])
  4. Create an endpoint configuration for the model. We recommend configuring your endpoints with at least two instances. This allows Amazon SageMaker to serve predictions for the models from multiple Availability Zones, which keeps them highly available.

    response = sm_client.create_endpoint_config(
        EndpointConfigName='my-epc',
        ProductionVariants=[{
            'InstanceType': 'ml.m4.xlarge',
            'InitialInstanceCount': 2,  # at least two instances for availability
            'InitialVariantWeight': 1,
            'ModelName': 'my-multi-model-name',
            'VariantName': 'AllTraffic'}])

    Note: You can use only one multi-model-enabled container in a serial inference pipeline.

  5. Create the multi-model endpoint using the EndpointName and EndpointConfigName parameters.

    response = sm_client.create_endpoint(
        EndpointName='my-endpoint',
        EndpointConfigName='my-epc')
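The steps above can be collected into one function. This is a minimal sketch, not the sample notebook's code.

- `deploy_multi_model_endpoint`, `RecordingClient`, and the placeholder image URI and role ARN are hypothetical.
- The function takes the SageMaker client as an argument, so it can be exercised without AWS credentials. Against a real account you would pass `boto3.client('sagemaker')` instead of the recording stand-in.
- The optional `pre_containers` list covers the serial inference pipeline case from step 3.
- Following the note on serial inference pipelines, the function rejects a second MultiModel container.

```python
from typing import Any, Dict, List, Optional


def deploy_multi_model_endpoint(
    sm_client: Any,
    image_uri: str,
    model_data_url: str,
    role_arn: str,
    model_name: str = 'my-multi-model-name',
    endpoint_config_name: str = 'my-epc',
    endpoint_name: str = 'my-endpoint',
    instance_type: str = 'ml.m4.xlarge',
    instance_count: int = 2,
    pre_containers: Optional[List[Dict[str, str]]] = None,
) -> Dict[str, Any]:
    """Create the model, endpoint configuration, and endpoint (hypothetical helper)."""
    multi_model_container = {
        'Image': image_uri,
        'ModelDataUrl': model_data_url,  # S3 prefix, not a single artifact
        'Mode': 'MultiModel',
    }
    # Steps 2 and 3: any pipeline containers come before the multi-model one.
    containers = list(pre_containers or []) + [multi_model_container]
    # Only one multi-model-enabled container is allowed in a pipeline.
    if sum(c.get('Mode') == 'MultiModel' for c in containers) > 1:
        raise ValueError('only one MultiModel container is allowed')
    sm_client.create_model(
        ModelName=model_name,
        ExecutionRoleArn=role_arn,
        Containers=containers,
    )
    # Step 4: two or more instances are recommended for availability.
    sm_client.create_endpoint_config(
        EndpointConfigName=endpoint_config_name,
        ProductionVariants=[{
            'InstanceType': instance_type,
            'InitialInstanceCount': instance_count,
            'InitialVariantWeight': 1,
            'ModelName': model_name,
            'VariantName': 'AllTraffic',
        }],
    )
    # Step 5: create the endpoint from the configuration.
    return sm_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=endpoint_config_name,
    )


class RecordingClient:
    """Stand-in for boto3.client('sagemaker') that only records calls."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def _record(self, name: str, kwargs: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append((name, kwargs))
        return {'Operation': name}

    def create_model(self, **kwargs: Any) -> Dict[str, Any]:
        return self._record('create_model', kwargs)

    def create_endpoint_config(self, **kwargs: Any) -> Dict[str, Any]:
        return self._record('create_endpoint_config', kwargs)

    def create_endpoint(self, **kwargs: Any) -> Dict[str, Any]:
        return self._record('create_endpoint', kwargs)


client = RecordingClient()
deploy_multi_model_endpoint(
    client,
    image_uri='<image-uri>',  # placeholder
    model_data_url='s3://my-bucket/path/to/artifacts/',
    role_arn='arn:aws:iam::111122223333:role/ExampleSageMakerRole',  # placeholder
)
print([name for name, _ in client.calls])
# ['create_model', 'create_endpoint_config', 'create_endpoint']
```

Injecting the client keeps the call order and arguments easy to inspect. The order mirrors the procedure: model, then endpoint configuration, then endpoint.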