Create an Asynchronous Inference Endpoint

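Create an asynchronous endpoint the same way you would create an endpoint using SageMaker hosting services:

Create a model in SageMaker with CreateModel.
Create an endpoint configuration with CreateEndpointConfig.
Create an HTTPS endpoint with CreateEndpoint.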

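To create an endpoint, first create a model with CreateModel, pointing to the model artifact and a Docker registry path (image). Then create a configuration with CreateEndpointConfig, specifying one or more models created with the CreateModel API and the resources that you want SageMaker to provision. Create the endpoint with CreateEndpoint, using the endpoint configuration specified in the request. You can update an asynchronous endpoint with the UpdateEndpoint API. Use InvokeEndpointAsync to send inference requests to, and receive results from, the model hosted at the endpoint. You can delete the endpoint with the DeleteEndpoint API.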
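The invoke and delete calls mentioned above are not demonstrated elsewhere on this page, so the following is a minimal sketch of them. It assumes `runtime_client` was created with `boto3.client('sagemaker-runtime')` and `sagemaker_client` with `boto3.client('sagemaker')`. The helper names `submit_async_request` and `remove_endpoint` are hypothetical and exist only for illustration.

```python
def submit_async_request(runtime_client, endpoint_name, input_s3_uri):
    """Queue one inference request and return the S3 URI where the result will be written.

    Assumes runtime_client was created with boto3.client('sagemaker-runtime')
    and that the request payload has already been uploaded to input_s3_uri.
    """
    response = runtime_client.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=input_s3_uri,
    )
    return response["OutputLocation"]


def remove_endpoint(sagemaker_client, endpoint_name):
    """Delete the endpoint. Assumes sagemaker_client is boto3.client('sagemaker')."""
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
```

Passing the clients in as arguments keeps the helpers easy to exercise with stand-in objects before pointing them at a real account.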

For a full list of the available SageMaker images, see Available Deep Learning Containers Images. For information on how to create your own Docker image, see Use Your Own Inference Code.

Create a Model

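The following example shows how to create a model using the AWS SDK for Python (Boto3). The first few lines define:

sagemaker_client: A low-level SageMaker client object that makes it easy to send requests to and receive responses from AWS services.
sagemaker_role: A string variable with the Amazon Resource Name (ARN) of the SageMaker IAM role.
aws_region: A string variable with the name of your AWS Region.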


import boto3

# Specify your AWS Region
aws_region='<aws_region>'

# Create a low-level SageMaker service client.
sagemaker_client = boto3.client('sagemaker', region_name=aws_region)

# Role to give SageMaker permission to access AWS services.
sagemaker_role= "arn:aws:iam::<account>:role/*"

Next, specify the location of the pre-trained model stored in Amazon S3. In this example, we use a pre-trained XGBoost model named demo-xgboost-model.tar.gz. The full Amazon S3 URI is stored in a string variable, model_url:


#Create a variable w/ the model S3 URI
s3_bucket = '<your-bucket-name>' # Provide the name of your S3 bucket
bucket_prefix='saved_models'
model_s3_key = f"{bucket_prefix}/demo-xgboost-model.tar.gz"

#Specify S3 bucket w/ model
model_url = f"s3://{s3_bucket}/{model_s3_key}"
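It can help to sanity-check the URI before passing it to CreateModel. The following is a small sketch that uses only the Python standard library. The helper name `split_s3_uri` and the bucket name in the example are hypothetical, and the helper is not part of Boto3 or the SageMaker SDK.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key); raise ValueError if it is not a complete S3 URI."""
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Not a complete S3 URI: {uri!r}")
    return parsed.netloc, key


# Example with a made-up bucket name.
bucket, key = split_s3_uri("s3://example-bucket/saved_models/demo-xgboost-model.tar.gz")
print(bucket)  # example-bucket
print(key)     # saved_models/demo-xgboost-model.tar.gz
```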

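Specify a primary container. For the primary container, you specify the Docker image that contains inference code, the artifacts (from prior training), and a custom environment map that the inference code uses when you deploy the model for predictions.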

In this example, we specify an XGBoost built-in algorithm container image:


from sagemaker import image_uris

# Specify an AWS container image. 
container = image_uris.retrieve(region=aws_region, framework='xgboost', version='0.90-1')

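Create a model in Amazon SageMaker with CreateModel. Specify the following:

ModelName: A name for your model (in this example, it is stored in a string variable called model_name).
ExecutionRoleArn: The Amazon Resource Name (ARN) of the IAM role that Amazon SageMaker can assume to access model artifacts and Docker images for deployment on ML compute instances or for batch transform jobs.
PrimaryContainer: The location of the primary Docker image containing inference code, the associated artifacts, and the custom environment map that the inference code uses when the model is deployed for predictions.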


model_name = '<The_name_of_the_model>'

#Create model
create_model_response = sagemaker_client.create_model(
    ModelName = model_name,
    ExecutionRoleArn = sagemaker_role,
    PrimaryContainer = {
        'Image': container,
        'ModelDataUrl': model_url,
    })

For a full list of API parameters, see the CreateModel description in the SageMaker API Reference Guide.

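If you are using a SageMaker-provided container, you can set environment variables in this step to increase the model server timeout and payload sizes from their default values to the maximums that the framework supports. If you don't set these variables explicitly, you might not be able to take advantage of the maximum timeout and payload sizes that Asynchronous Inference supports. The following example shows how to set the environment variables for a PyTorch inference container based on TorchServe.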


model_name = '<The_name_of_the_model>'

#Create model
create_model_response = sagemaker_client.create_model(
    ModelName = model_name,
    ExecutionRoleArn = sagemaker_role,
    PrimaryContainer = {
        'Image': container,
        'ModelDataUrl': model_url,
        'Environment': {
            'TS_MAX_REQUEST_SIZE': '100000000',
            'TS_MAX_RESPONSE_SIZE': '100000000',
            'TS_DEFAULT_RESPONSE_TIMEOUT': '1000'
        },
    })

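After you finish creating your endpoint, you should verify that the environment variables are set correctly by printing them out from your inference.py script. The following table lists the environment variables that you can set for several frameworks to change the default values.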

Framework	Environment variables
PyTorch 1.8 (based on TorchServe)	'TS_MAX_REQUEST_SIZE': '100000000' 'TS_MAX_RESPONSE_SIZE': '100000000' 'TS_DEFAULT_RESPONSE_TIMEOUT': '1000'
PyTorch 1.4 (based on MMS)	'MMS_MAX_REQUEST_SIZE': '1000000000' 'MMS_MAX_RESPONSE_SIZE': '1000000000' 'MMS_DEFAULT_RESPONSE_TIMEOUT': '900'
HuggingFace Inference Container (based on MMS)	'MMS_MAX_REQUEST_SIZE': '2000000000' 'MMS_MAX_RESPONSE_SIZE': '2000000000' 'MMS_DEFAULT_RESPONSE_TIMEOUT': '900'
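One way to print these variables from inference.py is sketched below. It assumes that anything the script prints ends up in the endpoint's logs, and it only looks for names that start with the TS_ and MMS_ prefixes used in the table. The helper name `server_limit_settings` is hypothetical.

```python
import os


def server_limit_settings(environ=None, prefixes=("TS_", "MMS_")):
    """Return the model-server variables (TS_* for TorchServe, MMS_* for MMS) found in environ."""
    environ = os.environ if environ is None else environ
    return {
        name: value
        for name, value in sorted(environ.items())
        if name.startswith(prefixes)
    }


# In inference.py, you might print the result once at startup.
for name, value in server_limit_settings().items():
    print(f"{name}={value}")
```

If a variable that you set in CreateModel does not appear in the printed output, the model server is likely still running with its default limits.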


Create an Endpoint Configuration

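Once you have a model, create an endpoint configuration with CreateEndpointConfig. Amazon SageMaker hosting services use this configuration to deploy models. In the configuration, you identify one or more models created with CreateModel to deploy, along with the resources that you want Amazon SageMaker to provision. Specify the AsyncInferenceConfig object and provide an output Amazon S3 location for OutputConfig. You can optionally specify Amazon SNS topics on which to send notifications about prediction results. For more information about Amazon SNS topics, see Configuring Amazon SNS.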

The following example shows how to create an endpoint configuration using the AWS SDK for Python (Boto3):


from time import gmtime, strftime

# Create an endpoint config name. Here we create one based on the date
# so that we can search endpoints based on creation time.
endpoint_config_name = f"XGBoostEndpointConfig-{strftime('%Y-%m-%d-%H-%M-%S', gmtime())}"

# The name of the model that you want to host. This is the name that you specified when creating the model.
model_name='<The_name_of_your_model>'

create_endpoint_config_response = sagemaker_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name, # You will specify this name in a CreateEndpoint request.
    # List of ProductionVariant objects, one for each model that you want to host at this endpoint.
    ProductionVariants=[
        {
            "VariantName": "variant1", # The name of the production variant.
            "ModelName": model_name, 
            "InstanceType": "ml.m5.xlarge", # Specify the compute instance type.
            "InitialInstanceCount": 1 # Number of instances to launch initially.
        }
    ],
    AsyncInferenceConfig={
        "OutputConfig": {
            # Location to upload response outputs when no location is provided in the request.
            "S3OutputPath": f"s3://{s3_bucket}/{bucket_prefix}/output",
            # (Optional) specify Amazon SNS topics
            "NotificationConfig": {
                "SuccessTopic": "arn:aws:sns:aws-region:account-id:topic-name",
                "ErrorTopic": "arn:aws:sns:aws-region:account-id:topic-name",
            }
        },
        "ClientConfig": {
            # (Optional) Specify the max number of inflight invocations per instance
            # If no value is provided, Amazon SageMaker will choose an optimal value for you
            "MaxConcurrentInvocationsPerInstance": 4
        }
    }
)

print(f"Created EndpointConfig: {create_endpoint_config_response['EndpointConfigArn']}")

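In the preceding example, you specify the following keys for OutputConfig in the AsyncInferenceConfig field:

S3OutputPath: The location to upload response outputs to when no location is provided in the request.
NotificationConfig: (Optional) The SNS topics that receive a notification when an inference request succeeds (SuccessTopic) or fails (ErrorTopic).

You can also specify the following optional argument for ClientConfig in the AsyncInferenceConfig field:

MaxConcurrentInvocationsPerInstance: (Optional) The maximum number of concurrent requests that the SageMaker client sends to the model container.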
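Because NotificationConfig and ClientConfig are optional, it can be convenient to assemble the AsyncInferenceConfig dictionary so that keys you did not supply are left out entirely. The following is a sketch of that idea. The helper name `build_async_inference_config` and the bucket name are hypothetical, while the dictionary keys are the ones described above.

```python
def build_async_inference_config(s3_output_path, success_topic=None, error_topic=None,
                                 max_concurrent_invocations=None):
    """Assemble the AsyncInferenceConfig dict, leaving out optional keys that were not given."""
    output_config = {"S3OutputPath": s3_output_path}

    # Add NotificationConfig only if at least one SNS topic ARN was supplied.
    notification_config = {}
    if success_topic:
        notification_config["SuccessTopic"] = success_topic
    if error_topic:
        notification_config["ErrorTopic"] = error_topic
    if notification_config:
        output_config["NotificationConfig"] = notification_config

    config = {"OutputConfig": output_config}
    # Add ClientConfig only if a concurrency limit was supplied.
    if max_concurrent_invocations is not None:
        config["ClientConfig"] = {
            "MaxConcurrentInvocationsPerInstance": max_concurrent_invocations
        }
    return config


# Only the required output location is set here.
print(build_async_inference_config("s3://example-bucket/saved_models/output"))
```

The returned dictionary can be passed as the AsyncInferenceConfig argument of create_endpoint_config.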

Create an Endpoint

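Once you have your model and endpoint configuration, use the CreateEndpoint API to create your endpoint. The endpoint name must be unique within an AWS Region in your AWS account.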

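The following example creates an endpoint using the endpoint configuration specified in the request. Amazon SageMaker uses the endpoint to provision resources and deploy models.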


# The name of the endpoint. The name must be unique within an AWS Region in your AWS account.
endpoint_name = '<endpoint-name>' 

# The name of the endpoint configuration associated with this endpoint.
endpoint_config_name='<endpoint-config-name>'

create_endpoint_response = sagemaker_client.create_endpoint(
                                            EndpointName=endpoint_name, 
                                            EndpointConfigName=endpoint_config_name)
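The CreateEndpoint call returns before the resources are provisioned, so you may want to wait until the endpoint is ready before sending requests. The following is a minimal polling sketch built on the DescribeEndpoint API. The helper name `wait_until_in_service` and the default polling interval and limit are assumptions made for illustration, and `sagemaker_client` is assumed to be the Boto3 client created earlier.

```python
import time


def wait_until_in_service(sagemaker_client, endpoint_name, poll_seconds=30, max_polls=60):
    """Poll DescribeEndpoint until the endpoint is InService; raise if it fails or times out."""
    for _ in range(max_polls):
        description = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return description
        if status == "Failed":
            raise RuntimeError(description.get("FailureReason", "Endpoint creation failed"))
        time.sleep(poll_seconds)
    raise TimeoutError(f"{endpoint_name} was not InService after {max_polls} polls")
```

Boto3 also provides a built-in waiter for the same purpose, `sagemaker_client.get_waiter('endpoint_in_service')`, which may be preferable when you don't need custom handling.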

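When you call the CreateEndpoint API, Amazon SageMaker Asynchronous Inference sends a test notification to check that you have configured an Amazon SNS topic. Amazon SageMaker Asynchronous Inference also sends test notifications after calls to UpdateEndpoint and UpdateEndpointWeightsAndCapacities. This lets SageMaker check that you have the required permissions. You can simply ignore the notification. The test notification has the following form: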


{
    "eventVersion":"1.0",
    "eventSource":"aws:sagemaker",
    "eventName":"TestNotification"
}
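If a program consumes messages from the SNS topic, it can filter out this test notification before handling real results. The following sketch assumes the subscriber receives the message body as a JSON string. How the message is delivered depends on your subscription and is not shown, and the helper name `is_test_notification` is hypothetical.

```python
import json


def is_test_notification(message_body):
    """Return True if a message body is the SageMaker test notification shown above."""
    try:
        event = json.loads(message_body)
    except (TypeError, ValueError):
        # Not a JSON string, so it cannot be the test notification.
        return False
    return (
        isinstance(event, dict)
        and event.get("eventSource") == "aws:sagemaker"
        and event.get("eventName") == "TestNotification"
    )


sample = '{"eventVersion":"1.0","eventSource":"aws:sagemaker","eventName":"TestNotification"}'
print(is_test_notification(sample))  # True
```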
