Select your cookie preferences

We use essential cookies and similar tools that are necessary to provide our site and services. We use performance cookies to collect anonymous statistics, so we can understand how customers use our site and make improvements. Essential cookies cannot be deactivated, but you can choose “Customize” or “Decline” to decline performance cookies.

If you agree, AWS and approved third parties will also use cookies to provide useful site features, remember your preferences, and display relevant content, including relevant advertising. To accept or decline all non-essential cookies, choose “Accept” or “Decline.” To make more detailed choices, choose “Customize.”

Add or Remove Models

Focus mode
Add or Remove Models - Amazon SageMaker AI

You can deploy additional models to a multi-model endpoint and invoke them through that endpoint immediately. When adding a new model, you don't need to update or bring down the endpoint, so you avoid the cost of creating and running a separate endpoint for each new model. The process for adding and removing models is the same for CPU and GPU-backed multi-model endpoints.

SageMaker AI unloads unused models from the container when the instance is reaching memory capacity and more models need to be downloaded into the container. SageMaker AI also deletes unused model artifacts from the instance storage volume when the volume is reaching capacity and new models need to be downloaded. The first invocation to a newly added model takes longer because the endpoint takes time to download the model from S3 to the container's memory in instance hosting the endpoint

With the endpoint already running, copy a new set of model artifacts to the Amazon S3 location there you store your models.

# Add an AdditionalModel to the endpoint and exercise it aws s3 cp AdditionalModel.tar.gz s3://amzn-s3-demo-bucket/path/to/artifacts/
Important

To update a model, proceed as you would when adding a new model. Use a new and unique name. Don't overwrite model artifacts in Amazon S3 because the old version of the model might still be loaded in the containers or on the storage volume of the instances on the endpoint. Invocations to the new model could then invoke the old version of the model.

Client applications can request predictions from the additional target model as soon as it is stored in S3.

response = runtime_sagemaker_client.invoke_endpoint( EndpointName='<ENDPOINT_NAME>', ContentType='text/csv', TargetModel='AdditionalModel.tar.gz', Body=body)

To delete a model from a multi-model endpoint, stop invoking the model from the clients and remove it from the S3 location where model artifacts are stored.

PrivacySite termsCookie preferences
© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.