Create a custom Docker container image for SageMaker and use it for model training in AWS Step Functions
Created by Julia Bluszcz (AWS), Neha Sharma (AWS), Aubrey Oosthuizen (AWS), Mohan Gowda Purushothama (AWS), and Mateusz Zaremba (AWS)
Environment: Production | Technologies: Machine learning & AI; DevOps | AWS services: Amazon ECR; Amazon SageMaker; AWS Step Functions |
Summary
This pattern shows how to create a Docker container image for Amazon SageMaker and use it for model training in AWS Step Functions. By packaging custom algorithms in a container, you can run almost any code in the SageMaker environment, regardless of programming language, framework, or dependencies.
In the example SageMaker notebook provided, the custom Docker container image is stored in Amazon Elastic Container Registry (Amazon ECR). Step Functions then uses the container that's stored in Amazon ECR to run a Python processing script in a SageMaker processing job. Finally, the container exports the trained model to Amazon Simple Storage Service (Amazon S3).
Prerequisites and limitations
Prerequisites
An active AWS account
An AWS Identity and Access Management (IAM) role for SageMaker with Amazon S3 permissions
Familiarity with Python
Familiarity with the Amazon SageMaker Python SDK
Familiarity with the AWS Command Line Interface (AWS CLI)
Familiarity with AWS SDK for Python (Boto3)
Familiarity with Amazon ECR
Familiarity with Docker
Product versions
AWS Step Functions Data Science SDK version 2.3.0
Amazon SageMaker Python SDK version 2.78.0
Architecture
The following diagram shows an example workflow for creating a Docker container image for SageMaker and then using it for model training in Step Functions:
The diagram shows the following workflow:
A data scientist or DevOps engineer uses an Amazon SageMaker notebook to create a custom Docker container image.
A data scientist or DevOps engineer stores the Docker container image in an Amazon ECR private repository that’s in a private registry.
A data scientist or DevOps engineer uses the Docker container to run a Python SageMaker processing job in a Step Functions workflow.
Automation and scale
The example SageMaker notebook in this pattern uses an ml.m5.xlarge notebook instance type. You can change the instance type to fit your use case. For more information about SageMaker notebook instance types, see Amazon SageMaker Pricing.
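If you prefer to provision the notebook instance programmatically, the following is a minimal sketch that uses the AWS SDK for Python (Boto3). The notebook instance name and IAM role ARN are hypothetical placeholders.

```python
# A minimal sketch: create a SageMaker notebook instance with Boto3 so that
# the instance type is easy to change to fit your use case.
import boto3

sagemaker_client = boto3.client("sagemaker")

sagemaker_client.create_notebook_instance(
    NotebookInstanceName="custom-container-example",  # hypothetical name
    InstanceType="ml.m5.xlarge",  # change this value to fit your use case
    RoleArn="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # your SageMaker IAM role
    VolumeSizeInGB=20,
)
```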
Tools
Amazon Elastic Container Registry (Amazon ECR) is a managed container image registry service that’s secure, scalable, and reliable.
Amazon SageMaker is a managed machine learning (ML) service that helps you build and train ML models and then deploy them into a production-ready hosted environment.
Amazon SageMaker Python SDK is an open source library for training and deploying machine learning models on SageMaker.
AWS Step Functions is a serverless orchestration service that helps you combine AWS Lambda functions and other AWS services to build business-critical applications.
AWS Step Functions Data Science Python SDK is an open source library that helps you create Step Functions workflows that process and publish machine learning models.
Epics
Task | Description | Skills required |
---|---|---|
Set up Amazon ECR and create a new private registry. | If you haven't already, set up Amazon ECR by following the instructions in Setting up with Amazon ECR in the Amazon ECR User Guide. Each AWS account is provided with a default private Amazon ECR registry. | DevOps engineer |
Create an Amazon ECR private repository. | Follow the instructions in Creating a private repository in the Amazon ECR User Guide. Note: The repository that you create is where you’ll store your custom Docker container images. | DevOps engineer |
Create a Dockerfile that includes the specifications needed to run your SageMaker processing job. | Configure a Dockerfile with the specifications that your SageMaker processing job needs, such as the base image and the libraries that your script requires. For instructions, see Adapting your own training container in the Amazon SageMaker Developer Guide. For more information about Dockerfiles, see the Dockerfile reference in the Docker documentation. For a sketch of the Jupyter notebook code cells that create a Dockerfile, see the first example after this table. | DevOps engineer |
Build your Docker container image and push it to Amazon ECR. | Build the container image from your Dockerfile, and then push the image to your Amazon ECR private repository. For more information, see Building and registering the container in the SageMaker example notebooks. Important: Before you build the image, make sure that you've created a Dockerfile and stored it in a directory that your notebook can access. Note: You must authenticate your Docker client to your private registry before you can push the image to it. For a sketch of the Jupyter notebook code cells that build and register the image, see the second example after this table. | DevOps engineer |
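Example Jupyter notebook code cells to create a Dockerfile. This is a minimal sketch: the `docker` directory name, the base image, and the installed libraries are assumptions, not the pattern's exact specification.

Cell 1

```python
# Create a local directory to hold the Dockerfile
# ("docker" is a hypothetical directory name)
!mkdir -p docker
```

Cell 2

```python
%%writefile docker/Dockerfile
# The base image and libraries below are assumptions; pin the versions
# that your processing script actually requires
FROM python:3.9-slim

RUN pip3 install pandas scikit-learn joblib

# Send Python output straight to the processing job logs
ENV PYTHONUNBUFFERED=TRUE
```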
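Example Jupyter notebook code cells to build the image and register it in Amazon ECR. This is a sketch: the repository name `sagemaker-processing-container` is a hypothetical placeholder, and the repository must already exist (see the earlier task in this epic).

Cell 1

```python
# Derive the account ID, Region, and target image URI for the push
import boto3

account_id = boto3.client("sts").get_caller_identity()["Account"]
region = boto3.session.Session().region_name
ecr_repository = "sagemaker-processing-container"  # hypothetical repository name
image_uri = f"{account_id}.dkr.ecr.{region}.amazonaws.com/{ecr_repository}:latest"
```

Cell 2

```python
# Authenticate your Docker client to your private Amazon ECR registry
!aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {account_id}.dkr.ecr.{region}.amazonaws.com
```

Cell 3

```python
# Build the image from the Dockerfile in the docker/ directory
!docker build -t {ecr_repository} docker
```

Cell 4

```python
# Tag the local image and push it to the Amazon ECR repository
!docker tag {ecr_repository}:latest {image_uri}
!docker push {image_uri}
```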
Task | Description | Skills required |
---|---|---|
Create a Python script that includes your custom processing and model training logic. | Write your custom processing and model training logic in a data processing script, and then save it as a Python script. For more information, see Bring your own model with SageMaker Script Mode on the AWS Machine Learning Blog. For a sketch of such a script, see the first example after this table. | Data scientist |
Create a Step Functions workflow that includes your SageMaker Processing job as one of the steps. | Install and import the AWS Step Functions Data Science SDK, and then upload your training script and dataset to Amazon S3. Then, define a SageMaker processing step that runs your custom Amazon ECR image and Python script, and chain that step into a Step Functions workflow. Important: Make sure that you've created an IAM execution role for Step Functions. Note: When you define the processing step, make sure that you use the image URI of the custom container image in Amazon ECR. The example workflow after this table includes the SageMaker processing job step only, not a complete Step Functions workflow. For a full example workflow, see Example notebooks in SageMaker in the AWS Step Functions Data Science SDK documentation. | Data scientist |
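Example Python script that includes custom processing and model training logic. This is a minimal sketch: the `training.py` file name, the `train.csv` data file, the `label` column, and the scikit-learn model are hypothetical, and the `/opt/ml/processing` paths must match the inputs and outputs that you configure in the processing step.

```python
# training.py (hypothetical file name) - runs inside the custom container
import os

import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression

# SageMaker Processing mounts inputs and collects outputs under /opt/ml/processing;
# the exact subdirectories depend on your ProcessingInput/ProcessingOutput settings
input_dir = "/opt/ml/processing/input"
model_dir = "/opt/ml/processing/model"

# Load the training data that the processing job mounted into the container
df = pd.read_csv(os.path.join(input_dir, "train.csv"))
X, y = df.drop(columns=["label"]), df["label"]

# Train a simple model; replace this with your own algorithm
model = LogisticRegression(max_iter=1000).fit(X, y)

# Save the model so that SageMaker uploads it to Amazon S3
os.makedirs(model_dir, exist_ok=True)
joblib.dump(model, os.path.join(model_dir, "model.joblib"))
```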
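Example environment setup, SageMaker processing step definition, and Step Functions workflow that runs a SageMaker processing job. This is a sketch that uses the AWS Step Functions Data Science SDK: the bucket name, role ARNs, image URI, and job name are placeholders, and the workflow contains the processing step only.

```python
import sagemaker
from sagemaker.processing import ProcessingInput, ProcessingOutput, Processor
from stepfunctions.inputs import ExecutionInput
from stepfunctions.steps import Chain, ProcessingStep
from stepfunctions.workflow import Workflow

# Placeholders: replace with your bucket, IAM roles, and the image URI from Amazon ECR
bucket = "amzn-s3-demo-bucket"
sagemaker_role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"
workflow_role = "arn:aws:iam::111122223333:role/StepFunctionsWorkflowExecutionRole"
image_uri = "111122223333.dkr.ecr.us-east-1.amazonaws.com/sagemaker-processing-container:latest"

# Upload the training script to Amazon S3 so that the processing job can mount it
code_uri = sagemaker.Session().upload_data(
    "training.py", bucket=bucket, key_prefix="code"
)

# The job name is supplied at execution time through the workflow's execution input
execution_input = ExecutionInput(schema={"ProcessingJobName": str})

# Processor that runs your custom Amazon ECR image
processor = Processor(
    image_uri=image_uri,
    role=sagemaker_role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Processing step that mounts the script and data, and collects the model output
processing_step = ProcessingStep(
    "SageMaker Processing step",
    processor=processor,
    job_name=execution_input["ProcessingJobName"],
    inputs=[
        ProcessingInput(
            source=code_uri, destination="/opt/ml/processing/input/code"
        ),
        ProcessingInput(
            source=f"s3://{bucket}/data", destination="/opt/ml/processing/input"
        ),
    ],
    outputs=[
        ProcessingOutput(
            source="/opt/ml/processing/model",
            destination=f"s3://{bucket}/model",
        )
    ],
    container_entrypoint=["python3", "/opt/ml/processing/input/code/training.py"],
)

# Chain the step into a workflow, create it, and start an execution
workflow = Workflow(
    name="SageMakerProcessingWorkflow",
    definition=Chain([processing_step]),
    role=workflow_role,
)
workflow.create()
workflow.execute(inputs={"ProcessingJobName": "custom-container-job-01"})
```

Passing the job name through `ExecutionInput` lets each workflow execution supply its own processing job name, which is useful because SageMaker requires job names to be unique.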
Related resources
Process data (Amazon SageMaker Developer Guide)
Adapting your own training container (Amazon SageMaker Developer Guide)