
Migrate ML Build, Train, and Deploy workloads to Amazon SageMaker using AWS Developer Tools

Created by Scot Marvin (AWS)

R Type: Replatform	Source: Machine Learning	Target: Amazon SageMaker
Created by: AWS	Environment: PoC or pilot	Technologies: Machine learning & AI; DevOps; Migration
AWS services: Amazon SageMaker

Summary

This pattern provides guidance for migrating an on-premises machine learning (ML) application that runs on Unix or Linux servers so that it is trained and deployed on AWS using Amazon SageMaker. The deployment uses a continuous integration and continuous deployment (CI/CD) pipeline, and the migration pattern itself is deployed as an AWS CloudFormation stack.

Prerequisites and limitations

Prerequisites

An active AWS account using AWS Landing Zone
AWS Command Line Interface (AWS CLI) installed and configured on your Unix or Linux server
An ML source code repository in GitHub, AWS CodeCommit, or Amazon Simple Storage Service (Amazon S3)

Limitations

Only 300 individual pipelines can be deployed in one AWS Region.
This pattern is intended for supervised ML workloads with train-and-deploy code in Python.

Product versions

Docker version 19.03.5, build 633a0ea, using Python 3.6.x

Architecture

Source technology stack

On-premises Linux compute instance with data either on the local file system or in a relational database

Source architecture

Target technology stack

AWS CodePipeline deployed with Amazon S3 for data storage and Amazon DynamoDB as the metadata store for tracking and logging pipeline runs

Target architecture

Application migration architecture

Native Python package and AWS CodeCommit repository (and a SQL client, for on-premises datasets stored on a database instance)

Tools

Python
Git
AWS CLI – The AWS CLI deploys the AWS CloudFormation stack and moves data to the S3 bucket. The S3 bucket, in turn, supplies the data to the target environment.
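
As an illustration of the data-movement step, the following minimal Python sketch assembles an `aws s3 cp` command for a single file. The bucket name, key prefix, and file path are hypothetical placeholders, and the function name `build_s3_copy_command` is introduced here for illustration only; it is not part of the pattern's attachment. The sketch only builds and prints the command, so it can be reviewed before anything is uploaded.

```python
import posixpath
import shlex
from typing import List


def build_s3_copy_command(local_path: str, bucket: str, key_prefix: str) -> List[str]:
    """Build the `aws s3 cp` argument list that uploads one local file."""
    # Assumption: the object keeps its file name under the given key prefix.
    file_name = posixpath.basename(local_path)
    prefix = key_prefix.strip("/")
    key = f"{prefix}/{file_name}" if prefix else file_name
    return ["aws", "s3", "cp", local_path, f"s3://{bucket}/{key}"]


if __name__ == "__main__":
    # Hypothetical bucket and path; the command is printed, not executed.
    command = build_s3_copy_command("data/train.csv", "example-ml-bucket", "datasets/")
    print(shlex.join(command))
```

Returning an argument list rather than a single string avoids shell-quoting problems if the list is later passed to `subprocess.run`.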

Epics

Task	Description	Skills required
Validate source code and datasets.		Data scientist
Identify target build, train, and deployment instance types and sizes.		Data engineer, Data scientist
Create capability list and capacity requirements.
Identify network requirements.		DBA, Systems administrator
Identify the network or host access security requirements for the source and target applications.		Data engineer, ML engineer, Systems administrator
Determine backup strategy.		ML engineer, Systems administrator
Determine availability requirements.		ML engineer, Systems administrator
Identify the application migration or switchover strategy.		Data scientist, ML engineer

Task	Description	Skills required
Create a virtual private cloud (VPC).		ML engineer, Systems administrator
Create security groups.		ML engineer, Systems administrator
Set up an Amazon S3 bucket and AWS CodeCommit repository branches for ML code.		ML engineer

Task	Description	Skills required
Use native MySQL tools or third-party tools to migrate the training, validation, and test datasets to the provisioned S3 bucket.	This is required for AWS CloudFormation stack deployment.	Data engineer, ML engineer
Package the ML training and hosting code as Python packages and push them to the provisioned repository in AWS CodeCommit or GitHub.	You need the repository's branch name to deploy the AWS CloudFormation template for migration.	Data scientist, ML engineer
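
If the on-premises data has not yet been divided into training, validation, and test sets, a split along the following lines could be done before the upload. This is a sketch under stated assumptions: the pattern does not prescribe split proportions, so the 70/15/15 ratio, the fixed seed, and the function name `split_rows` are all illustrative choices.

```python
import random
from typing import Dict, List, Sequence


def split_rows(rows: Sequence, train_pct: int = 70, validation_pct: int = 15,
               seed: int = 0) -> Dict[str, List]:
    """Shuffle rows reproducibly and split them into three datasets."""
    if train_pct <= 0 or validation_pct < 0 or train_pct + validation_pct >= 100:
        raise ValueError("percentages must leave room for a test set")
    shuffled = list(rows)
    # A seeded generator keeps the split identical across reruns.
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    n_train = len(shuffled) * train_pct // 100
    n_validation = len(shuffled) * validation_pct // 100
    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_validation],
        "test": shuffled[n_train + n_validation:],  # remainder goes to test
    }


if __name__ == "__main__":
    parts = split_rows(list(range(20)))
    print({name: len(part) for name, part in parts.items()})
```

Each of the three lists can then be written to its own file and copied to the provisioned S3 bucket.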

Task	Description	Skills required
Follow the ML workload migration strategy.		Application owner, ML engineer
Deploy the AWS CloudFormation stack.	Use the AWS CLI to create the stack declared in the YAML template provided with this solution.	Data scientist, ML engineer
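
The stack deployment step can be sketched as follows. The code assembles an `aws cloudformation create-stack` command from a template path and a parameter map. The stack name, template file name, and parameter keys shown are hypothetical; the real keys are defined by the YAML template in the attachment. The `CAPABILITY_NAMED_IAM` capability is included on the assumption that the template creates named IAM resources, and can be dropped if it does not.

```python
import shlex
from typing import Dict, List


def build_create_stack_command(stack_name: str, template_path: str,
                               parameters: Dict[str, str]) -> List[str]:
    """Build the `aws cloudformation create-stack` argument list."""
    command = [
        "aws", "cloudformation", "create-stack",
        "--stack-name", stack_name,
        "--template-body", f"file://{template_path}",
        # Assumption: the template creates named IAM resources.
        "--capabilities", "CAPABILITY_NAMED_IAM",
    ]
    if parameters:
        command.append("--parameters")
        for key, value in parameters.items():
            command.append(f"ParameterKey={key},ParameterValue={value}")
    return command


if __name__ == "__main__":
    # Hypothetical names; substitute the keys your template declares.
    command = build_create_stack_command(
        "ml-migration", "template.yaml", {"BranchName": "main"}
    )
    print(shlex.join(command))
```

The printed command can be run directly in a shell, or the list can be passed to `subprocess.run(command, check=True)` once the values have been reviewed.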

Task	Description	Skills required
Switch the application clients over to the new infrastructure.		Application owner, Data scientist, ML engineer

Task	Description	Skills required
Shut down the temporary AWS resources.	Shut down any custom resources from the AWS CloudFormation template (for example, any AWS Lambda functions that aren't being used).	Data scientist, ML engineer
Review and validate the project documents.		Application owner, Data scientist
Validate the results and the ML model evaluation metrics with operators.	Make sure that model performance matches the application users' expectations and is comparable to the on-premises state.	Application owner, Data scientist
Close out the project and provide feedback.		Application owner, ML engineer
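
The metric validation task above can be made concrete with a small comparison such as the one below. It is a minimal sketch, assuming that every metric is "higher is better" (for example accuracy or F1) and that a fixed absolute tolerance is acceptable; the metric names, values, and the function name `find_regressions` are illustrative and do not come from the pattern.

```python
from typing import Dict, List


def find_regressions(on_premises: Dict[str, float], migrated: Dict[str, float],
                     tolerance: float = 0.01) -> List[str]:
    """Return metrics where the migrated model trails the on-premises baseline.

    Assumes higher values are better. A metric that is missing from the
    migrated results is also reported, because it cannot be compared.
    """
    regressions = []
    for name, baseline in on_premises.items():
        value = migrated.get(name)
        if value is None or baseline - value > tolerance:
            regressions.append(name)
    return regressions


if __name__ == "__main__":
    # Hypothetical evaluation results for illustration only.
    baseline = {"accuracy": 0.91, "f1": 0.88}
    sagemaker = {"accuracy": 0.905, "f1": 0.80}
    print(find_regressions(baseline, sagemaker))
```

An empty result indicates that, within the chosen tolerance, the migrated model is comparable to the on-premises state; any listed metric is worth reviewing with the application owner before closing out the project.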

Related resources

Attachments

To access additional content that is associated with this document, unzip the following file: attachment.zip
