Prerequisites

Note

Follow the instructions in this section if you compiled your model using the AWS SDK for Python (Boto3), the AWS CLI, or the SageMaker AI console.

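To create a SageMaker Neo-compiled model, you need the following:

A Docker image Amazon ECR URI. You can select one that meets your needs from this list.

An entry point script file:

For PyTorch and MXNet models:

If you trained the model using SageMaker AI, the training script must implement the functions described below. The training script serves as the entry point script during inference. In the example detailed in MNIST Training, Compilation and Deployment with MXNet Module and SageMaker Neo, the training script (mnist.py) implements the required functions.

If you did not train your model using SageMaker AI, you need to provide an entry point script (inference.py) file that can be used at inference time. Depending on the framework (MXNet or PyTorch), the location of the inference script must conform to the SageMaker Python SDK Model Directory Structure for MXNet or the Model Directory Structure for PyTorch.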
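As a rough illustration of that directory layout, the sketch below packs a model file and an inference script into a model.tar.gz, with the script placed under a code directory (code/inference.py). The helper name build_model_archive is hypothetical, and the file name model.pt is an assumption borrowed from the PyTorch 1.5 example in this section. The exact layout each framework expects is defined in the SDK pages referenced above, so treat this as a starting point rather than the required structure.

```python
import io
import tarfile


def build_model_archive(archive_path, model_bytes, script_source, model_name="model.pt"):
    """Write a gzip-compressed tar with the model at the root and the script under code/.

    build_model_archive and its parameters are hypothetical names used for illustration.
    """

    def add_bytes(tar, name, payload):
        # Add an in-memory payload to the archive under the given member name.
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    with tarfile.open(archive_path, "w:gz") as tar:
        add_bytes(tar, model_name, model_bytes)
        add_bytes(tar, "code/inference.py", script_source.encode("utf-8"))
    return archive_path
```

Listing the archive members afterwards (for example with tarfile.open(path).getnames()) is a quick way to confirm that the script landed at code/inference.py before uploading the archive.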

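When you use Neo Inference Optimized Container images with PyTorch and MXNet on CPU and GPU instance types, the inference script must implement the following functions:

model_fn: Loads the model. (Optional)
input_fn: Converts the incoming request payload into a numpy array.
predict_fn: Performs the prediction.
output_fn: Converts the prediction output into the response payload.
Alternatively, you can define transform_fn to combine input_fn, predict_fn, and output_fn.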
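To make the relationship between these functions concrete, here is a minimal, framework-free sketch in which transform_fn simply chains input_fn, predict_fn, and output_fn. The argument order follows the examples in this section. The JSON handling and the use of plain Python lists instead of numpy arrays are simplifying assumptions for illustration, not what the containers require.

```python
import json


def input_fn(request_body, request_content_type):
    # Assumption for this sketch: the payload is a JSON list of numbers.
    if request_content_type != "application/json":
        raise ValueError("Unsupported content type: " + request_content_type)
    return json.loads(request_body)


def predict_fn(input_data, model):
    # The "model" here is any callable; a real script would call the loaded network.
    return model(input_data)


def output_fn(prediction, response_content_type):
    # Serialize the prediction and echo back the requested content type.
    return json.dumps(prediction), response_content_type


def transform_fn(model, request_body, request_content_type, response_content_type):
    # Equivalent to running the three separate steps one after another.
    data = input_fn(request_body, request_content_type)
    prediction = predict_fn(data, model)
    return output_fn(prediction, response_content_type)
```

Defining only transform_fn is convenient when pre-processing, prediction, and post-processing share state; keeping the three functions separate makes each step easier to test on its own.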

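The following are examples of inference.py scripts, placed in a directory named code (code/inference.py), for PyTorch and MXNet (Gluon and Module). The examples first load the model and then serve it on image data on a GPU: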

MXNet Module


import numpy as np
import json
import mxnet as mx
import neomx  # noqa: F401
from collections import namedtuple

Batch = namedtuple('Batch', ['data'])

# Change the context to mx.cpu() if deploying to a CPU endpoint
ctx = mx.gpu()

def model_fn(model_dir):
    # The compiled model artifacts are saved with the prefix 'compiled'
    sym, arg_params, aux_params = mx.model.load_checkpoint('compiled', 0)
    mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
    exe = mod.bind(for_training=False,
                   data_shapes=[('data', (1,3,224,224))],
                   label_shapes=mod._label_shapes)
    mod.set_params(arg_params, aux_params, allow_missing=True)
    
    # Run warm-up inference on empty data during model load (required for GPU)
    data = mx.nd.empty((1,3,224,224), ctx=ctx)
    mod.forward(Batch([data]))
    return mod


def transform_fn(mod, image, input_content_type, output_content_type):
    # pre-processing
    decoded = mx.image.imdecode(image)
    resized = mx.image.resize_short(decoded, 224)
    cropped, crop_info = mx.image.center_crop(resized, (224, 224))
    normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                  mean=mx.nd.array([0.485, 0.456, 0.406]),
                                  std=mx.nd.array([0.229, 0.224, 0.225]))
    transposed = normalized.transpose((2, 0, 1))
    batchified = transposed.expand_dims(axis=0)
    casted = batchified.astype(dtype='float32')
    processed_input = casted.as_in_context(ctx)

    # prediction/inference
    mod.forward(Batch([processed_input]))

    # post-processing
    prob = mod.get_outputs()[0].asnumpy().tolist()
    prob_json = json.dumps(prob)
    return prob_json, output_content_type

MXNet Gluon


import numpy as np
import json
import mxnet as mx
import neomx  # noqa: F401

# Change the context to mx.cpu() if deploying to a CPU endpoint
ctx = mx.gpu()

def model_fn(model_dir):
    # The compiled model artifacts are saved with the prefix 'compiled'
    block = mx.gluon.nn.SymbolBlock.imports('compiled-symbol.json',['data'],'compiled-0000.params', ctx=ctx)
    
    # Hybridize the model & pass required options for Neo: static_alloc=True & static_shape=True
    block.hybridize(static_alloc=True, static_shape=True)
    
    # Run warm-up inference on empty data during model load (required for GPU)
    data = mx.nd.empty((1,3,224,224), ctx=ctx)
    warm_up = block(data)
    return block


def input_fn(image, input_content_type):
    # pre-processing
    decoded = mx.image.imdecode(image)
    resized = mx.image.resize_short(decoded, 224)
    cropped, crop_info = mx.image.center_crop(resized, (224, 224))
    normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                  mean=mx.nd.array([0.485, 0.456, 0.406]),
                                  std=mx.nd.array([0.229, 0.224, 0.225]))
    transposed = normalized.transpose((2, 0, 1))
    batchified = transposed.expand_dims(axis=0)
    casted = batchified.astype(dtype='float32')
    processed_input = casted.as_in_context(ctx)
    return processed_input


def predict_fn(processed_input_data, block):
    # prediction/inference
    prediction = block(processed_input_data)
    return prediction

def output_fn(prediction, output_content_type):
    # post-processing
    prob = prediction.asnumpy().tolist()
    prob_json = json.dumps(prob)
    return prob_json, output_content_type

PyTorch 1.4 and Older


import os
import torch
import torch.nn.parallel
import torch.optim
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
from PIL import Image
import io
import json
import pickle


def model_fn(model_dir):
    """Load the model and return it.
    Providing this function is optional.
    There is a default model_fn available which will load the model
    compiled using SageMaker Neo. You can override it here.

    Keyword arguments:
    model_dir -- the directory path where the model artifacts are present
    """

    # The compiled model is saved as "compiled.pt"
    model_path = os.path.join(model_dir, 'compiled.pt')
    with torch.neo.config(model_dir=model_dir, neo_runtime=True):
        model = torch.jit.load(model_path)
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        model = model.to(device)

    # We recommend that you run warm-up inference during model load
    sample_input_path = os.path.join(model_dir, 'sample_input.pkl')
    with open(sample_input_path, 'rb') as input_file:
        model_input = pickle.load(input_file)
    if torch.is_tensor(model_input):
        model_input = model_input.to(device)
        model(model_input)
    elif isinstance(model_input, tuple):
        model_input = (inp.to(device) for inp in model_input if torch.is_tensor(inp))
        model(*model_input)
    else:
        print("Only supports a torch tensor or a tuple of torch tensors")
    # Return the model on every path, not only when warm-up is skipped
    return model


def transform_fn(model, request_body, request_content_type,
                 response_content_type):
    """Run prediction and return the output.
    The function
    1. Pre-processes the input request
    2. Runs prediction
    3. Post-processes the prediction output.
    """
    # preprocess
    decoded = Image.open(io.BytesIO(request_body))
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    normalized = preprocess(decoded)
    batchified = normalized.unsqueeze(0)
    # predict
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    batchified = batchified.to(device)
    output = model.forward(batchified)

    return json.dumps(output.cpu().numpy().tolist()), response_content_type

PyTorch 1.5 and Newer


import os
import torch
import torch.nn.parallel
import torch.optim
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
from PIL import Image
import io
import json
import pickle


def model_fn(model_dir):
    """Load the model and return it.
    Providing this function is optional.
    There is a default_model_fn available, which will load the model
    compiled using SageMaker Neo. You can override the default here.
    The model_fn only needs to be defined if your model needs extra
    steps to load, and can otherwise be left undefined.

    Keyword arguments:
    model_dir -- the directory path where the model artifacts are present
    """

    # The compiled model is saved as "model.pt"
    model_path = os.path.join(model_dir, 'model.pt')
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.jit.load(model_path, map_location=device)
    model = model.to(device)

    return model


def transform_fn(model, request_body, request_content_type,
                 response_content_type):
    """Run prediction and return the output.
    The function
    1. Pre-processes the input request
    2. Runs prediction
    3. Post-processes the prediction output.
    """
    # preprocess
    decoded = Image.open(io.BytesIO(request_body))
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    normalized = preprocess(decoded)
    batchified = normalized.unsqueeze(0)
    
    # predict
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    batchified = batchified.to(device)
    output = model.forward(batchified)
    return json.dumps(output.cpu().numpy().tolist()), response_content_type

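For inf1 instances or onnx, xgboost, keras container images

For all other Neo Inference Optimized Container images, or for inferentia instance types, the entry point script must implement the following functions for the Neo Deep Learning Runtime:
- neo_preprocess: Converts the incoming request payload into a numpy array.
- neo_postprocess: Converts the prediction output from the Neo Deep Learning Runtime into the response body.
  
  Note
  The preceding two functions do not use any functionality of MXNet, PyTorch, or TensorFlow.
For examples of how to use these functions, see the Neo Model Compilation Sample Notebooks.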
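The following sketch shows roughly what such a pair of functions might look like. The signatures (neo_preprocess taking a payload and a content type, neo_postprocess taking the runtime result and returning a body plus a content type) are assumptions modeled on the sample notebooks, and the JSON payload format is chosen only to keep the sketch self-contained. Check the notebooks for the exact contract your container expects.

```python
import json

import numpy as np


def neo_preprocess(payload, content_type):
    # Assumed signature; this sketch accepts a JSON-encoded nested list only.
    if content_type != "application/json":
        raise RuntimeError("Unsupported content type: " + content_type)
    # Convert the request payload into a numpy array, as described above.
    return np.asarray(json.loads(payload), dtype=np.float32)


def neo_postprocess(result):
    # Assumed signature; converts the runtime output into a JSON response body.
    response_body = json.dumps(np.asarray(result).tolist())
    content_type = "application/json"
    return response_body, content_type
```

Consistent with the note above, the sketch depends only on numpy and the standard library, not on MXNet, PyTorch, or TensorFlow.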

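For TensorFlow models

If your model requires custom pre- and post-processing logic before data is sent to the model, you must specify an entry point script inference.py file that can be used at inference time. The script should implement either a pair of input_handler and output_handler functions or a single handler function.

Note

If the handler function is implemented, input_handler and output_handler are ignored.

The following is a code example of an inference.py script that you can package together with the compiled model to perform custom pre- and post-processing on an image classification model. The SageMaker AI client sends the image file as an application/x-image content type to the input_handler function, where it is converted to JSON. The converted image file is then sent to the Tensorflow Model Server (TFX) using the REST API.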


import io
import json

import numpy as np
from PIL import Image

def input_handler(data, context):
    """ Pre-process request input before it is sent to TensorFlow Serving REST API
    
    Args:
    data (obj): the request data, in format of dict or string
    context (Context): an object containing request and configuration details
    
    Returns:
    (dict): a JSON-serializable dict that contains request body and headers
    """
    f = data.read()
    f = io.BytesIO(f)
    image = Image.open(f).convert('RGB')
    batch_size = 1
    image = np.asarray(image.resize((512, 512)))
    image = np.concatenate([image[np.newaxis, :, :]] * batch_size)
    body = json.dumps({"signature_name": "serving_default", "instances": image.tolist()})
    return body

def output_handler(data, context):
    """Post-process TensorFlow Serving output before it is returned to the client.
    
    Args:
    data (obj): the TensorFlow serving response
    context (Context): an object containing request and configuration details
    
    Returns:
    (bytes, string): data to return to client, response content type
    """
    if data.status_code != 200:
        raise ValueError(data.content.decode('utf-8'))

    response_content_type = context.accept_header
    prediction = data.content
    return prediction, response_content_type

If there is no custom pre- or post-processing, the SageMaker AI client converts the image file to JSON in a similar way before sending it to the SageMaker AI endpoint.
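For orientation, the sketch below builds a JSON body of the same shape that input_handler produces above: a signature_name plus an instances list holding one image as nested lists. The helper name build_predict_request is hypothetical, and the tiny 2x2 image in the usage note is made-up data. How the client serializes images internally is not specified here, so this only illustrates the request format.

```python
import json


def build_predict_request(image_rows, signature_name="serving_default"):
    """Wrap one image (nested lists: height x width x channels) in a predict request body.

    build_predict_request is a hypothetical helper used for illustration.
    """
    # "instances" is a batch; this sketch sends a batch containing a single image.
    return json.dumps({"signature_name": signature_name, "instances": [image_rows]})
```

For example, calling build_predict_request([[[0, 0, 0], [255, 255, 255]], [[10, 20, 30], [40, 50, 60]]]) yields a body whose instances list contains a single 2x2 RGB image.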

For more information, see Deploying to TensorFlow Serving Endpoints in the SageMaker Python SDK.

The Amazon S3 bucket URI that contains the compiled model artifacts.
