Build Your Own Container for SageMaker AI Multi-Model Endpoints
Refer to the following sections for bringing your own container and dependencies to multi-model endpoints.
Topics
- Bring your own dependencies for multi-model endpoints on CPU backed instances
- Bring your own dependencies for multi-model endpoints on GPU backed instances
- Use the SageMaker AI Inference Toolkit
Bring your own dependencies for multi-model endpoints on CPU backed instances
If none of the pre-built container images serve your needs, you can build your own container for use with CPU backed multi-model endpoints.
Custom Amazon Elastic Container Registry (Amazon ECR) images deployed in Amazon SageMaker AI are expected to adhere to the basic contract described in Custom Inference Code with Hosting Services, which governs how SageMaker AI interacts with a Docker container that runs your own inference code. For a container to load and serve multiple models concurrently, it must support additional APIs and behaviors. This additional contract includes new APIs to load, list, get, and unload models, and a different API to invoke models. There are also different behaviors for error scenarios that the APIs must abide by. To indicate that the container complies with the additional requirements, add the following label to your Dockerfile:
LABEL com.amazonaws.sagemaker.capabilities.multi-models=true
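To give a feel for the shape of that additional contract, the following is a minimal sketch of a model-management front end, written here with Flask purely for illustration. The route paths, request fields, the loaded_models registry, and the load_from_disk helper are assumptions made for this sketch, not the authoritative specification; see Custom Containers Contract for Multi-Model Endpoints for the exact contract, which Multi Model Server already implements for you.

# Illustrative sketch only: routes and payload fields are assumptions, not the
# authoritative contract. Multi Model Server provides a compliant front end.
import os
from flask import Flask, jsonify, request

app = Flask(__name__)
loaded_models = {}  # hypothetical in-memory registry: model name -> model object

# SageMaker AI signals multi-model mode through this environment variable.
MULTI_MODEL = os.environ.get("SAGEMAKER_MULTI_MODEL") == "true"

def load_from_disk(path):
    """Hypothetical helper that deserializes model artifacts from `path`."""
    raise NotImplementedError

@app.route("/models", methods=["POST"])
def load_model():
    # The request body is assumed to carry a model name and a local artifact path.
    body = request.get_json()
    loaded_models[body["model_name"]] = load_from_disk(body["url"])
    return "", 200

@app.route("/models", methods=["GET"])
def list_models():
    return jsonify({"models": list(loaded_models)})

@app.route("/models/<model_name>", methods=["GET"])
def get_model(model_name):
    if model_name not in loaded_models:
        return "", 404  # the contract defines specific error behaviors
    return jsonify({"modelName": model_name})

@app.route("/models/<model_name>", methods=["DELETE"])
def unload_model(model_name):
    loaded_models.pop(model_name, None)
    return "", 200

@app.route("/models/<model_name>/invoke", methods=["POST"])
def invoke_model(model_name):
    model = loaded_models[model_name]
    return model.predict(request.data)  # hypothetical predict method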
SageMaker AI also injects the following environment variable into the container:
SAGEMAKER_MULTI_MODEL=true
If you are creating a multi-model endpoint for a serial inference pipeline, your Dockerfile must have the required labels for both multi-models and serial inference pipelines. For more information about serial inference pipelines, see Run Real-time Predictions with an Inference Pipeline.
To help you implement these requirements for a custom container, two libraries are available:
- Multi Model Server is an open source framework for serving machine learning models that can be installed in containers to provide the front end that fulfills the requirements for the new multi-model endpoint container APIs. It provides the HTTP front end and model management capabilities required by multi-model endpoints to host multiple models within a single container, load models into and unload models out of the container dynamically, and perform inference on a specified loaded model. It also provides a pluggable custom backend handler where you can implement your own algorithm.
- SageMaker AI Inference Toolkit is a library that bootstraps Multi Model Server with a configuration and settings that make it compatible with SageMaker AI multi-model endpoints. It also allows you to tweak important performance parameters, such as the number of workers per model, depending on the needs of your scenario.
Bring your own dependencies for multi-model endpoints on GPU backed instances
The bring your own container (BYOC) capability on multi-model endpoints with GPU backed instances is not currently supported by the Multi Model Server and SageMaker AI Inference Toolkit libraries.
To create multi-model endpoints with GPU backed instances, use the SageMaker AI supported NVIDIA Triton Inference Server with the NVIDIA Triton Inference Containers. For example:
FROM 301217895009.dkr.ecr.us-west-2.amazonaws.com/sagemaker-tritonserver:22.07-py3
Important
Containers with the Triton Inference Server are the only supported containers you can use for GPU backed multi-model endpoints.
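As an illustration of how such a container is referenced when you create a model, the following sketch calls the CreateModel API through boto3 with Mode set to MultiModel. The role ARN, the S3 prefix that holds the model artifacts, and the account and region in the image URI are placeholders you would replace with your own values.

import boto3

sm_client = boto3.client("sagemaker", region_name="us-west-2")

# Placeholders: substitute your own execution role ARN and the S3 prefix that
# holds the model.tar.gz artifacts served by the multi-model endpoint.
execution_role_arn = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"
model_data_prefix = "s3://amzn-s3-demo-bucket/triton-mme/"

sm_client.create_model(
    ModelName="triton-multi-model",
    ExecutionRoleArn=execution_role_arn,
    PrimaryContainer={
        "Image": "301217895009.dkr.ecr.us-west-2.amazonaws.com/sagemaker-tritonserver:22.07-py3",
        "ModelDataUrl": model_data_prefix,
        # MultiModel mode tells SageMaker AI to treat every model artifact under
        # the prefix as a separately invocable model on this endpoint.
        "Mode": "MultiModel",
    },
)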
Use the SageMaker AI Inference Toolkit
Note
The SageMaker AI Inference Toolkit is only supported for CPU backed multi-model endpoints. It is not currently supported for GPU backed multi-model endpoints.
Pre-built containers that support multi-model endpoints are listed in Supported algorithms, frameworks, and instances for multi-model endpoints. If you want to use any other framework or algorithm, you need to build a container. The easiest way to do this is to use the SageMaker AI Inference Toolkit.
Note
The SageMaker AI inference toolkit supports only Python model handlers. If you want to implement your handler in any other language, you must build your own container that implements the additional multi-model endpoint APIs. For information, see Custom Containers Contract for Multi-Model Endpoints.
To extend a container by using the SageMaker AI inference toolkit
- Create a model handler. MMS expects a model handler, which is a Python file that implements functions to pre-process the input, get predictions from the model, and process the output. For an example of a model handler, see model_handler.py from the sample notebook. A minimal sketch of such a handler also appears after this procedure.
- Import the inference toolkit and use its model_server.start_model_server function to start MMS. The following example is from the dockerd-entrypoint.py file from the sample notebook. Notice that the call to model_server.start_model_server passes the model handler described in the previous step:

import subprocess
import sys
import shlex
import os
from retrying import retry
from subprocess import CalledProcessError
from sagemaker_inference import model_server

def _retry_if_error(exception):
    return isinstance(exception, (CalledProcessError, OSError))

@retry(stop_max_delay=1000 * 50, retry_on_exception=_retry_if_error)
def _start_mms():
    # by default the number of workers per model is 1, but we can configure it through the
    # environment variable below if desired.
    # os.environ['SAGEMAKER_MODEL_SERVER_WORKERS'] = '2'
    model_server.start_model_server(handler_service='/home/model-server/model_handler.py:handle')

def main():
    if sys.argv[1] == 'serve':
        _start_mms()
    else:
        subprocess.check_call(shlex.split(' '.join(sys.argv[1:])))

    # prevent docker exit
    subprocess.call(['tail', '-f', '/dev/null'])

main()
- In your Dockerfile, copy the model handler from the first step and specify the Python file from the previous step as the entrypoint. The following lines are from the Dockerfile used in the sample notebook:

# Copy the default custom service file to handle incoming data and inference requests
COPY model_handler.py /home/model-server/model_handler.py

# Define an entrypoint script for the docker image
ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]
- Build and register your container. The following shell script from the sample notebook builds the container and uploads it to an Amazon Elastic Container Registry repository in your AWS account:

%%sh

# The name of our algorithm
algorithm_name=demo-sagemaker-multimodel

cd container

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
$(aws ecr get-login --region ${region} --no-include-email)

# Build the docker image locally with the image name and then push it to ECR
# with the full name.
docker build -q -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}
docker push ${fullname}
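The model handler referenced in the first step is not reproduced in this section, so the following is only a minimal sketch of the general shape MMS expects: a module-level handle function plus a class that loads artifacts in initialize. The JSON payload format and the placeholder prediction logic are assumptions for illustration; see model_handler.py in the sample notebook for a complete, framework-specific implementation.

import json

class ModelHandler:
    """Minimal sketch of an MMS custom handler; the prediction logic is a placeholder."""

    def __init__(self):
        self.initialized = False
        self.model = None

    def initialize(self, context):
        # context.system_properties includes the directory that holds this model's artifacts
        model_dir = context.system_properties.get("model_dir")
        # Load your framework-specific model artifacts from model_dir here (assumption:
        # the real loading code depends on your framework and serialization format).
        self.model = object()
        self.initialized = True

    def handle(self, data, context):
        # data is a list of request payloads; each entry exposes the request body
        inputs = [json.loads(row.get("body").decode("utf-8")) for row in data]
        # Placeholder inference: replace with calls into self.model
        predictions = [{"prediction": None} for _ in inputs]
        return [json.dumps(p) for p in predictions]

_service = ModelHandler()

def handle(data, context):
    # MMS calls this module-level function; data can be None, in which case
    # there is nothing to return.
    if not _service.initialized:
        _service.initialize(context)
    if data is None:
        return None
    return _service.handle(data, context)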
You can now use this container to deploy multi-model endpoints in SageMaker AI.
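For example, one common way to deploy the image as a multi-model endpoint is through the SageMaker Python SDK's MultiDataModel class, sketched below. The role ARN, S3 prefix, image URI, instance type, payload, and model artifact name are placeholders you would replace with your own values, and the exact payload format depends on your model handler.

from sagemaker import Session
from sagemaker.multidatamodel import MultiDataModel

session = Session()

# Placeholders: your execution role, the ECR image you just pushed, and the
# S3 prefix under which the endpoint looks up model.tar.gz artifacts.
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"
image_uri = "111122223333.dkr.ecr.us-west-2.amazonaws.com/demo-sagemaker-multimodel:latest"
model_data_prefix = "s3://amzn-s3-demo-bucket/demo-multimodel/"

mme = MultiDataModel(
    name="demo-sagemaker-multimodel",
    model_data_prefix=model_data_prefix,
    image_uri=image_uri,
    role=role,
    sagemaker_session=session,
)

# Create the endpoint backed by the custom container
predictor = mme.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Invoke a specific model artifact stored under the prefix; the payload format
# is a placeholder that depends on your handler.
response = predictor.predict(data='{"input": []}', target_model="model-a.tar.gz")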