Create a custom Docker container image for SageMaker and use it for model training in AWS Step Functions - AWS Prescriptive Guidance

Create a custom Docker container image for SageMaker and use it for model training in AWS Step Functions

Created by Julia Bluszcz (AWS), Neha Sharma (AWS), Aubrey Oosthuizen (AWS), Mohan Gowda Purushothama (AWS), and Mateusz Zaremba (AWS)

Environment: Production

Technologies: Machine learning & AI; DevOps

AWS services: Amazon ECR; Amazon SageMaker; AWS Step Functions

Summary

This pattern shows how to create a Docker container image for Amazon SageMaker and use it for model training in AWS Step Functions. By packaging custom algorithms in a container, you can run almost any code in the SageMaker environment, regardless of programming language, framework, or dependencies.

In the example SageMaker notebook provided, the custom Docker container image is stored in Amazon Elastic Container Registry (Amazon ECR). Step Functions then uses the container stored in Amazon ECR to run a Python processing script for SageMaker. Finally, the container exports the model to Amazon Simple Storage Service (Amazon S3).

Prerequisites and limitations

Prerequisites

Product versions

  • AWS Step Functions Data Science SDK version 2.3.0

  • Amazon SageMaker Python SDK version 2.78.0

Architecture

The following diagram shows an example workflow for creating a Docker container image for SageMaker, then using it for model training in Step Functions:

Diagram: Workflow for creating a custom Docker container image for SageMaker and using it for model training in Step Functions.

The diagram shows the following workflow:

  1. A data scientist or DevOps engineer uses an Amazon SageMaker notebook to create a custom Docker container image.

  2. A data scientist or DevOps engineer stores the Docker container image in an Amazon ECR private repository that’s in a private registry.

  3. A data scientist or DevOps engineer uses the Docker container to run a Python SageMaker processing job in a Step Functions workflow.

Automation and scale

The example SageMaker notebook in this pattern uses an ml.m5.xlarge notebook instance type. You can change the instance type to fit your use case. For more information about SageMaker notebook instance types, see Amazon SageMaker Pricing.
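If you prefer to create the notebook instance programmatically instead of through the console, the following is a minimal sketch that uses the boto3 CreateNotebookInstance API. The notebook name and execution role ARN are placeholder values for illustration only.

import boto3

sm_client = boto3.client("sagemaker")

# Placeholder name and role ARN -- replace with your own values
sm_client.create_notebook_instance(
    NotebookInstanceName="byoc-example-notebook",
    InstanceType="ml.m5.xlarge",  # change the instance type to fit your use case
    RoleArn="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
)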

Tools

  • Amazon Elastic Container Registry (Amazon ECR) is a managed container image registry service that’s secure, scalable, and reliable.

  • Amazon SageMaker is a managed machine learning (ML) service that helps you build and train ML models and then deploy them into a production-ready hosted environment.

  • Amazon SageMaker Python SDK is an open source library for training and deploying machine-learning models on SageMaker.

  • AWS Step Functions is a serverless orchestration service that helps you combine AWS Lambda functions and other AWS services to build business-critical applications.

  • AWS Step Functions Data Science Python SDK is an open source library that helps you create Step Functions workflows that process and publish machine learning models.

Epics

Task | Description | Skills required

Set up Amazon ECR.

If you haven’t already, set up Amazon ECR by following the instructions in Setting up with Amazon ECR in the Amazon ECR User Guide. Each AWS account is provided with a default private Amazon ECR registry.
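If you want to confirm that your AWS CLI credentials and Amazon ECR access are configured correctly, you can run a quick check such as the following sketch from a notebook cell (it assumes that your default AWS Region is already set):

# Confirm that credentials are configured and that Amazon ECR is reachable
!aws sts get-caller-identity
!aws ecr describe-repositories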

DevOps engineer

Create an Amazon ECR private repository.

Follow the instructions in Creating a private repository in the Amazon ECR User Guide.

Note: The repository that you create is where you’ll store your custom Docker container images.
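If you prefer the AWS CLI to the console, you can create the repository with a command such as the following. The repository name byoc matches the ecr_repository value used later in this pattern, and the command assumes that your default AWS Region is already set.

# Create a private repository for the custom container image
!aws ecr create-repository --repository-name byoc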

DevOps engineer

Create a Dockerfile that includes the specifications needed to run your SageMaker processing job.

Configure a Dockerfile that includes the specifications needed to run your SageMaker processing job. For instructions, see Adapting your own training container in the Amazon SageMaker Developer Guide.

For more information about Dockerfiles, see the Dockerfile Reference in the Docker documentation.

Example Jupyter notebook code cells to create a Dockerfile

Cell 1

# Make docker folder
!mkdir -p docker

Cell 2

%%writefile docker/Dockerfile
FROM python:3.7-slim-buster
RUN pip3 install pandas==0.25.3 scikit-learn==0.21.3
ENV PYTHONUNBUFFERED=TRUE
ENTRYPOINT ["python3"]
DevOps engineer

Build your Docker container image and push it to Amazon ECR.

  1. Build the container image from the Dockerfile that you created by running the docker build command.

  2. Push the container image to Amazon ECR by running the docker push command.

For more information, see Building and registering the container in Building your own algorithm container on GitHub.

Example Jupyter notebook code cells to build and register a Docker image

Important: Before running the following cells, make sure that you’ve created a Dockerfile and stored it in the directory called docker. Also, make sure that you’ve created an Amazon ECR repository, and that you replace the ecr_repository value in the first cell with your repository’s name.

Cell 1

import boto3

tag = ':latest'
account_id = boto3.client('sts').get_caller_identity().get('Account')
region = boto3.Session().region_name
ecr_repository = 'byoc'
image_uri = '{}.dkr.ecr.{}.amazonaws.com/{}'.format(account_id, region, ecr_repository + tag)

Cell 2

# Build docker image
!docker build -t $image_uri docker

Cell 3

# Authenticate to ECR
!aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {account_id}.dkr.ecr.{region}.amazonaws.com

Cell 4

# Push docker image
!docker push $image_uri

Note: You must authenticate your Docker client to your private registry so that you can use the docker push and docker pull commands. These commands push and pull images to and from the repositories in your registry.

DevOps engineer
Task | Description | Skills required

Create a Python script that includes your custom processing and model training logic.

Write your custom processing and model training logic in a data processing script. Then, save the script as a Python file named training.py.

For more information, see Bring your own model with SageMaker Script Mode on GitHub.

Example Python script that includes custom processing and model training logic

%%writefile training.py
from numpy import empty
import pandas as pd
import os
from sklearn import datasets, svm
from joblib import dump, load

if __name__ == '__main__':
    digits = datasets.load_digits()

    # create classifier object
    clf = svm.SVC(gamma=0.001, C=100.)

    # fit the model
    clf.fit(digits.data[:-1], digits.target[:-1])

    # model output in binary format
    output_path = os.path.join('/opt/ml/processing/model', "model.joblib")
    dump(clf, output_path)
Data scientist

Create a Step Functions workflow that includes your SageMaker Processing job as one of the steps.

Install and import the AWS Step Functions Data Science SDK and upload the training.py file to Amazon S3. Then, use the Amazon SageMaker Python SDK to define a processing step in Step Functions.

Important: Make sure that you’ve created an IAM execution role for Step Functions in your AWS account.
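If that role doesn’t exist yet, the following is a minimal sketch of creating it with boto3. The role name matches the ARN used in the next cell; the permissions that you attach are up to you and must allow the workflow to create and monitor SageMaker processing jobs and to pass the SageMaker execution role.

import json
import boto3

iam = boto3.client("iam")

# Trust policy that lets Step Functions assume the role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "states.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

iam.create_role(
    RoleName="AmazonSageMaker-StepFunctionsWorkflowExecutionRole",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
# Attach a policy that grants the SageMaker and iam:PassRole permissions that your workflow needs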

Example environment set up and custom training script to upload to Amazon S3

!pip install stepfunctions

import boto3
import stepfunctions
import sagemaker
import datetime

from stepfunctions import steps
from stepfunctions.inputs import ExecutionInput
from stepfunctions.steps import (
    Chain
)
from stepfunctions.workflow import Workflow
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket()
role = sagemaker.get_execution_role()
prefix = 'byoc-training-model'

# See prerequisites section to create this role
workflow_execution_role = f"arn:aws:iam::{account_id}:role/AmazonSageMaker-StepFunctionsWorkflowExecutionRole"

execution_input = ExecutionInput(
    schema={
        "PreprocessingJobName": str,
    }
)

input_code = sagemaker_session.upload_data(
    "training.py",
    bucket=bucket,
    key_prefix="preprocessing.py",
)

Example SageMaker processing step definition that uses a custom Amazon ECR image and Python script

Note: Make sure that you use the execution_input parameter to specify the job name. The parameter’s value must be unique each time the job runs. Also, the training.py file’s code is passed as an input parameter to the ProcessingStep, which means that it will be copied inside the container. The destination for the ProcessingInput code is the same as the second argument inside the container_entrypoint.

script_processor = ScriptProcessor(
    command=['python3'],
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
)

processing_step = steps.ProcessingStep(
    "training-step",
    processor=script_processor,
    job_name=execution_input["PreprocessingJobName"],
    inputs=[
        ProcessingInput(
            source=input_code,
            destination="/opt/ml/processing/input/code",
            input_name="code",
        ),
    ],
    outputs=[
        ProcessingOutput(
            source='/opt/ml/processing/model',
            destination="s3://{}/{}".format(bucket, prefix),
            output_name='byoc-example',
        ),
    ],
    container_entrypoint=["python3", "/opt/ml/processing/input/code/training.py"],
)

Example Step Functions workflow that runs a SageMaker processing job

Note: This example workflow includes the SageMaker processing job step only, not a complete Step Functions workflow. For a full example workflow, see Example notebooks in SageMaker in the AWS Step Functions Data Science SDK documentation.

workflow_graph = Chain([processing_step])

workflow = Workflow(
    name="ProcessingWorkflow",
    definition=workflow_graph,
    role=workflow_execution_role,
)
workflow.create()

# Execute workflow
execution = workflow.execute(
    inputs={
        # Each preprocessing job (SageMaker processing job) requires a unique name
        "PreprocessingJobName": str(datetime.datetime.now().strftime("%Y%m%d%H%M-%SS")),
    }
)
execution_output = execution.get_output(wait=True)
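After the execution completes, you can confirm that the model artifact was exported to Amazon S3. The following sketch reuses the bucket and prefix variables defined earlier in the notebook.

# List the exported model artifact (for example, model.joblib) under the output prefix
s3 = boto3.client("s3")
response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
for obj in response.get("Contents", []):
    print(obj["Key"])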
Data scientist

Related resources