
Deploy a Model Compiled with Neo (Console)

You can create a Neo endpoint in the Amazon SageMaker console.

  1. Choose Models, and then choose Create model from the Inference group. On the Create model page, complete the Model name and IAM role fields and, if needed, the VPC field.

    
    [Image: Create Neo Model for Inference]
  2. To add information about the container used to deploy your model, choose Add container, and then choose Next. Complete the Container input options, Location of inference code image, and Location of model artifacts fields, and optionally the Container host name and Environment variables fields.

    
    [Image: Create Neo Model for Inference]
  3. To deploy Neo-compiled models, choose the following:

    • Container input options: Provide model artifacts and inference image

    • Location of inference code image: Choose one of the following images, depending on the Region and kind of application:

      • Amazon SageMaker Image Classification

        • 301217895009.dkr.ecr.us-west-2.amazonaws.com/image-classification-neo:latest

        • 785573368785.dkr.ecr.us-east-1.amazonaws.com/image-classification-neo:latest

        • 007439368137.dkr.ecr.us-east-2.amazonaws.com/image-classification-neo:latest

        • 802834080501.dkr.ecr.eu-west-1.amazonaws.com/image-classification-neo:latest

      • Amazon SageMaker XGBoost

        • 301217895009.dkr.ecr.us-west-2.amazonaws.com/xgboost-neo:latest

        • 785573368785.dkr.ecr.us-east-1.amazonaws.com/xgboost-neo:latest

        • 007439368137.dkr.ecr.us-east-2.amazonaws.com/xgboost-neo:latest

        • 802834080501.dkr.ecr.eu-west-1.amazonaws.com/xgboost-neo:latest

      • TensorFlow: The TensorFlow version used must be in the TensorFlow SageMaker Estimators list.

        • 301217895009.dkr.ecr.us-west-2.amazonaws.com/sagemaker-neo-tensorflow:[tensorflow-version]-[cpu/gpu]-py3

        • 785573368785.dkr.ecr.us-east-1.amazonaws.com/sagemaker-neo-tensorflow:[tensorflow-version]-[cpu/gpu]-py3

        • 007439368137.dkr.ecr.us-east-2.amazonaws.com/sagemaker-neo-tensorflow:[tensorflow-version]-[cpu/gpu]-py3

        • 802834080501.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-neo-tensorflow:[tensorflow-version]-[cpu/gpu]-py3

      • MXNet: The MXNet version used must be in the MXNet SageMaker Estimators list.

        • 301217895009.dkr.ecr.us-west-2.amazonaws.com/sagemaker-neo-mxnet:[mxnet-version]-[cpu/gpu]-py3

        • 785573368785.dkr.ecr.us-east-1.amazonaws.com/sagemaker-neo-mxnet:[mxnet-version]-[cpu/gpu]-py3

        • 007439368137.dkr.ecr.us-east-2.amazonaws.com/sagemaker-neo-mxnet:[mxnet-version]-[cpu/gpu]-py3

        • 802834080501.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-neo-mxnet:[mxnet-version]-[cpu/gpu]-py3

      • PyTorch: The PyTorch version used must be in the PyTorch SageMaker Estimators list.

        • 301217895009.dkr.ecr.us-west-2.amazonaws.com/sagemaker-neo-pytorch:[pytorch-version]-[cpu/gpu]-py3

        • 785573368785.dkr.ecr.us-east-1.amazonaws.com/sagemaker-neo-pytorch:[pytorch-version]-[cpu/gpu]-py3

        • 007439368137.dkr.ecr.us-east-2.amazonaws.com/sagemaker-neo-pytorch:[pytorch-version]-[cpu/gpu]-py3

        • 802834080501.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-neo-pytorch:[pytorch-version]-[cpu/gpu]-py3

    • Location of model artifacts: The full Amazon S3 path of the compiled model artifact generated by the Neo compilation API.

    • Environment variables:

      • Omit this field for SageMaker Image Classification and SageMaker XGBoost.

      • For TensorFlow, PyTorch, and MXNet, specify the SAGEMAKER_SUBMIT_DIRECTORY environment variable as the full Amazon S3 path that contains the training script.

    The script must be packaged as a *.tar.gz file, with the training script at the root level of the archive. The script must also define two additional functions for Neo serving containers:

    • neo_preprocess(payload, content_type): Function that takes in the payload and Content-Type of each incoming request and returns a NumPy array.

    • neo_postprocess(result): Function that takes the prediction results produced by Deep Learning Runtime and returns the response body.

    Neither of these two functions uses any functionality of MXNet, PyTorch, or TensorFlow. For examples that use these functions, see the Neo Model Compilation Sample Notebooks.
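    A minimal sketch of such a script is shown below. The JSON content type and the single-output handling are assumptions for illustration; adapt the parsing and serialization to your model's actual input and output formats.

```python
import json

import numpy as np


def neo_preprocess(payload, content_type):
    """Convert an incoming request body into a NumPy array.

    Assumes the client sends a JSON array of numbers; adapt the
    parsing to your model's input format.
    """
    if content_type != "application/json":
        raise ValueError("Unsupported content type: {}".format(content_type))
    return np.asarray(json.loads(payload), dtype=np.float32)


def neo_postprocess(result):
    """Turn the Deep Learning Runtime prediction into a response body.

    DLR typically returns a list of NumPy arrays; this serializes the
    first output as JSON.
    """
    output = result[0] if isinstance(result, (list, tuple)) else result
    return json.dumps(np.asarray(output).tolist())
```

    To package the script, place it at the root of a *.tar.gz archive (for example, `tar -czf sourcedir.tar.gz inference.py`, where `inference.py` is a placeholder name) and upload the archive to the S3 path you set in SAGEMAKER_SUBMIT_DIRECTORY.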

  4. Confirm that the information for the containers is accurate, and then choose Create model. This takes you to the Create model landing page, where you choose the Create endpoint button.

    
    [Image: Create model landing page]
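    The console steps above correspond to a single CreateModel API call. As a sketch using the AWS SDK for Python (boto3), with hypothetical names (the model name, role ARN, and S3 path are placeholders to replace with your own), the request could be built like this:

```python
# XGBoost Neo inference image for us-west-2 (from the list above).
image_uri = "301217895009.dkr.ecr.us-west-2.amazonaws.com/xgboost-neo:latest"

create_model_args = {
    "ModelName": "my-neo-xgboost-model",  # hypothetical name
    "ExecutionRoleArn": "arn:aws:iam::111122223333:role/MySageMakerRole",  # placeholder
    "PrimaryContainer": {
        "Image": image_uri,
        # Compiled artifact produced by the Neo compilation API (placeholder path):
        "ModelDataUrl": "s3://my-bucket/neo-output/model-ml_c5.tar.gz",
        # For TensorFlow, PyTorch, or MXNet, also set:
        # "Environment": {"SAGEMAKER_SUBMIT_DIRECTORY": "s3://my-bucket/sourcedir.tar.gz"},
    },
}

# To actually create the model:
# import boto3
# sm = boto3.client("sagemaker", region_name="us-west-2")
# sm.create_model(**create_model_args)
```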
  5. On the Create and configure endpoint page, specify the Endpoint name. For Attach endpoint configuration, choose Create a new endpoint configuration.

    
    [Image: Neo console Create and configure endpoint UI]
  6. On the New endpoint configuration page, specify the Endpoint configuration name.

    
    [Image: Neo console New endpoint configuration UI]
  7. Choose Edit next to the name of the model, and then specify the correct Instance type on the Edit Production Variant page. The Instance type value must match the one specified in your compilation job.

    
    [Image: Neo console Edit Production Variant UI]
  8. When you're done, choose Save, choose Create endpoint configuration on the New endpoint configuration page, and then choose Create endpoint.
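Steps 5 through 8 map to the CreateEndpointConfig and CreateEndpoint API calls. A sketch with hypothetical names follows; the instance type shown is only an example and, as noted above, must match the target specified in your compilation job.

```python
endpoint_config_args = {
    "EndpointConfigName": "my-neo-endpoint-config",  # hypothetical name
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-neo-xgboost-model",  # the model created earlier
            "InstanceType": "ml.c5.xlarge",  # must match the compilation target
            "InitialInstanceCount": 1,
        }
    ],
}

endpoint_args = {
    "EndpointName": "my-neo-endpoint",  # hypothetical name
    "EndpointConfigName": endpoint_config_args["EndpointConfigName"],
}

# To actually create the endpoint:
# import boto3
# sm = boto3.client("sagemaker", region_name="us-west-2")
# sm.create_endpoint_config(**endpoint_config_args)
# sm.create_endpoint(**endpoint_args)
```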