Manage Models

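The Edge Manager agent can load multiple models at a time and run inference with the loaded models on edge devices. The number of models the agent can load is determined by the memory available on the device. The agent validates the model signature and loads into memory all the artifacts produced by the edge packaging job. This step requires that all the certificates described in the previous steps are installed along with the rest of the binary installation. If the model's signature cannot be validated, loading the model fails with an appropriate return code and reason.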

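The SageMaker Edge Manager agent provides a list of Model Management APIs that implement control plane and data plane APIs on edge devices. Along with this documentation, we recommend going through the sample client implementation, which shows canonical usage of the APIs described below.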

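The proto file is available as part of the release artifacts (inside the release tarball). This document lists and describes the usage of the APIs defined in that proto file.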

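Note

There is a one-to-one mapping for these APIs in the Windows release, and sample code for an application implemented in C# is shared with the release artifacts for Windows. The instructions below are for running the agent as a standalone process and apply to the release artifacts for Linux.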

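Extract the archive for your operating system. VERSION is broken into three components: <MAJOR_VERSION>.<YYYY-MM-DD>-<SHA-7>. For information on how to obtain the release version (<MAJOR_VERSION>), the time stamp of the release artifact (<YYYY-MM-DD>), and the repository commit ID (SHA-7), see Installing Edge Manager agent.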

The release artifacts hierarchy (after extracting the tar/zip archive) is shown below. The agent proto file is available under api/.


0.20201205.7ee4b0b
├── bin
│   ├── sagemaker_edge_agent_binary
│   └── sagemaker_edge_agent_client_example
└── docs
    ├── api
    │   └── agent.proto
    ├── attributions
    │   ├── agent.txt
    │   └── core.txt
    └── examples
        └── ipc_example
            ├── CMakeLists.txt
            ├── sagemaker_edge_client.cc
            ├── sagemaker_edge_client_example.cc
            ├── sagemaker_edge_client.hh
            ├── sagemaker_edge.proto
            ├── README.md
            ├── shm.cc
            ├── shm.hh
            └── street_small.bmp
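
As an illustration of the three version components, the following is a minimal Python sketch that splits a release directory name into its parts. It assumes the dotted layout of the example directory name `0.20201205.7ee4b0b` (major version, compact date, 7-character commit ID), which differs in punctuation from the `<MAJOR_VERSION>.<YYYY-MM-DD>-<SHA-7>` pattern stated above. `ReleaseVersion` and `parse_release_dir` are hypothetical names and are not part of the release artifacts.

```python
import re
from datetime import date, datetime
from typing import NamedTuple


class ReleaseVersion(NamedTuple):
    major: int      # <MAJOR_VERSION>
    released: date  # time stamp of the release artifact
    sha7: str       # 7-character repository commit ID


# Assumed layout, taken from the example directory name "0.20201205.7ee4b0b".
_RELEASE_DIR = re.compile(r"^(\d+)\.(\d{8})\.([0-9a-f]{7})$")


def parse_release_dir(name: str) -> ReleaseVersion:
    """Split a release directory name into its three components."""
    match = _RELEASE_DIR.match(name)
    if match is None:
        raise ValueError(f"unrecognized release directory name: {name!r}")
    major, stamp, sha7 = match.groups()
    # strptime raises ValueError for impossible dates such as 20201340.
    released = datetime.strptime(stamp, "%Y%m%d").date()
    return ReleaseVersion(int(major), released, sha7)
```

Calling `parse_release_dir("0.20201205.7ee4b0b")` yields major version 0, the date 2020-12-05, and the commit ID `7ee4b0b`.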

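Load Model

The Edge Manager agent supports loading multiple models. This API validates the model signature and loads into memory all the artifacts produced by the EdgePackagingJob operation. This step requires that all the required certificates are installed along with the rest of the agent binary installation. If the model's signature cannot be validated, this step fails with an appropriate return code and error messages in the log.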


// perform load for a model
// Note:
// 1. currently only local filesystem paths are supported for loading models.
// 2. multiple models can be loaded at the same time, as limited by available device memory
// 3. users are required to unload any loaded model to load another model.
// Status Codes:
// 1. OK - load is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist at the url
// 5. ALREADY_EXISTS - model with the same name is already loaded
// 6. RESOURCE_EXHAUSTED - memory is not available to load the model
// 7. FAILED_PRECONDITION - model is not compiled for the machine.
//
rpc LoadModel(LoadModelRequest) returns (LoadModelResponse);

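Unload Model

Unloads a previously loaded model. The model is identified by the alias that was provided during LoadModel. If the alias is not found or the model is not loaded, an error is returned.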


//
// perform unload for a model
// Status Codes:
// 1. OK - unload is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist
//
rpc UnLoadModel(UnLoadModelRequest) returns (UnLoadModelResponse);

List Models

Lists all the loaded models and their aliases.


//
// lists the loaded models
// Status Codes:
// 1. OK - list is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
//
rpc ListModels(ListModelsRequest) returns (ListModelsResponse);
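
To make the status codes of the load, unload, and list calls concrete, here is a toy in-memory model of the bookkeeping they imply. It is a sketch under stated assumptions and not the agent's implementation: `ToyModelRegistry`, its `memory_budget`, the `exists` flag that stands in for a file-system check, and the order in which the checks run are all hypothetical. Only the status names come from the proto comments.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyModelRegistry:
    """Hypothetical stand-in for the agent's table of loaded models."""

    memory_budget: int  # bytes available for models (assumed notion)
    loaded: Dict[str, Tuple[str, int]] = field(default_factory=dict)

    def load(self, name: str, path: str, size: int, exists: bool = True) -> str:
        # The check order below is an assumption, not documented behavior.
        if name in self.loaded:
            return "ALREADY_EXISTS"
        if not exists:
            return "NOT_FOUND"
        used = sum(model_size for _, model_size in self.loaded.values())
        if used + size > self.memory_budget:
            return "RESOURCE_EXHAUSTED"
        self.loaded[name] = (path, size)
        return "OK"

    def unload(self, name: str) -> str:
        if name not in self.loaded:
            return "NOT_FOUND"
        del self.loaded[name]
        return "OK"

    def list_models(self) -> List[str]:
        # Return the aliases of all loaded models in a stable order.
        return sorted(self.loaded)
```

With a budget of 100 bytes, loading a 60-byte model under the alias `a` succeeds, a second load under the same alias reports `ALREADY_EXISTS`, and another 60-byte model reports `RESOURCE_EXHAUSTED` until `a` is unloaded.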

Describe Model

Describes a model that is loaded on the agent.


//
// Status Codes:
// 1. OK - describe is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist at the url
//
rpc DescribeModel(DescribeModelRequest) returns (DescribeModelResponse);

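Capture Data

Allows the client application to capture input and output tensors, and optionally auxiliary data, in an Amazon S3 bucket. The client application is expected to pass a unique capture ID with each call to this API. The ID can be used later to query the status of the capture.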


//
// allows users to capture input and output tensors along with auxiliary data.
// Status Codes:
// 1. OK - data capture successfully initiated
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. ALREADY_EXISTS - capture initiated for the given capture_id
// 5. RESOURCE_EXHAUSTED - buffer is full cannot accept any more requests.
// 6. OUT_OF_RANGE - timestamp is in the future.
// 7. INVALID_ARGUMENT - capture_id is not of expected format.
//
rpc CaptureData(CaptureDataRequest) returns (CaptureDataResponse);
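
Because every call needs its own capture ID and the timestamp must not lie in the future, a client might prepare both values as in the sketch below. The expected capture ID format is not specified in this document, so using a random UUID string is an assumption, and `new_capture_id` and `capture_timestamp` are hypothetical helper names.

```python
import uuid
from datetime import datetime, timezone


def new_capture_id() -> str:
    """Return a random identifier; the UUID format is an assumption."""
    return str(uuid.uuid4())


def capture_timestamp() -> datetime:
    """Return the current UTC time, which is never ahead of the local clock."""
    return datetime.now(timezone.utc)
```

Generating the ID on the client and keeping it alongside the request makes it available later for a status query.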

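Get Capture Status

Depending on the models loaded, the input and output tensors can be large for many edge devices, and capturing them to the cloud can be time consuming. CaptureData() is therefore implemented as an asynchronous operation. The capture ID is a unique identifier that the client provides during the capture data call, and it can be used to query the status of the asynchronous call.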


//
// allows users to query status of capture data operation
// Status Codes:
// 1. OK - data capture successfully initiated
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - given capture id doesn't exist.
//
rpc GetCaptureDataStatus(GetCaptureDataStatusRequest) returns (GetCaptureDataStatusResponse);
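
Since the capture runs asynchronously, a client typically polls for its status. The sketch below shows one way to do that with exponential backoff and a timeout. `get_status` is a hypothetical callable that would wrap the GetCaptureDataStatus call, and the status strings (`IN_PROGRESS` and whatever terminal value follows it) are placeholders, because the response message is not shown in this document.

```python
import time
from typing import Callable


def wait_for_capture(
    get_status: Callable[[str], str],  # hypothetical wrapper around the RPC
    capture_id: str,
    timeout_s: float = 30.0,
    initial_delay_s: float = 0.1,
    max_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the status is no longer IN_PROGRESS or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status = get_status(capture_id)
        if status != "IN_PROGRESS":  # placeholder status name
            return status
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"capture {capture_id} is still in progress")
        sleep(delay)
        # Double the wait between polls, up to the configured ceiling.
        delay = min(delay * 2, max_delay_s)
```

The `sleep` parameter exists so that the loop can be exercised in tests without real delays.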

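Predict

The predict API performs inference on a previously loaded model. It accepts a request in the form of a tensor that is fed directly into the neural network. The output is the output tensor (or scalar) from the model. This is a blocking call.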


//
// perform inference on a model.
//
// Note:
// 1. users can choose to send the tensor data in the protobuf message or
// through a shared memory segment on a per tensor basis, the Predict
// method will handle the decode transparently.
// 2. serializing large tensors into the protobuf message can be quite expensive,
// based on our measurements it is recommended to use shared memory for
// tensors larger than 256KB.
// 3. SMEdge IPC server will not use shared memory for returning output tensors,
// i.e., the output tensor data will always be sent in byte form encoded
// in the tensors of PredictResponse.
// 4. currently SMEdge IPC server cannot handle concurrent predict calls, all
// these calls will be serialized under the hood. this shall be addressed
// in a later release.
// Status Codes:
// 1. OK - prediction is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - when model not found
// 5. INVALID_ARGUMENT - when tensor types mismatch
//
rpc Predict(PredictRequest) returns (PredictResponse);
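
The second note above recommends shared memory for tensors larger than 256KB. The sketch below shows how a client could apply that recommendation per tensor. It reads "256KB" as 256 * 1024 bytes, which is an assumption, and the element sizes in `ELEMENT_SIZE` are hypothetical because the DataType enum is not shown in this document. The returned strings match the field names of the `oneof` in the Tensor message.

```python
from math import prod
from typing import Sequence

# Assumption: "256KB" in the proto comment is read as 256 * 1024 bytes.
SHARED_MEMORY_THRESHOLD = 256 * 1024

# Hypothetical element sizes in bytes, keyed by an illustrative dtype name.
ELEMENT_SIZE = {"float32": 4, "int32": 4, "int64": 8, "uint8": 1}


def tensor_nbytes(shape: Sequence[int], dtype: str) -> int:
    """Size in bytes of a contiguous tensor with the given shape and dtype."""
    return prod(shape) * ELEMENT_SIZE[dtype]


def choose_transport(shape: Sequence[int], dtype: str) -> str:
    """Pick the Tensor oneof field to populate for an input tensor."""
    if tensor_nbytes(shape, dtype) > SHARED_MEMORY_THRESHOLD:
        return "shared_memory_handle"
    return "byte_data"
```

For example, a 1x3x224x224 float32 tensor occupies 602,112 bytes and would go through shared memory, while a 1x3x64x64 uint8 tensor occupies 12,288 bytes and would be sent inline.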

Input


// request for Predict rpc call
//
message PredictRequest {
  string name = 1;
  repeated Tensor tensors = 2;
}

//
// Tensor represents a tensor, encoded as contiguous multi-dimensional array.
//    tensor_metadata - represents metadata of the shared memory segment
//    data_or_handle - represents the data of shared memory, this could be passed in two ways:
//                        a. send across the raw bytes of the multi-dimensional tensor array
//                        b. send a SharedMemoryHandle which contains the posix shared memory segment
//                            id and offset in bytes to location of multi-dimensional tensor array.
//
message Tensor {
  TensorMetadata tensor_metadata = 1; //optional in the predict request
  oneof data {
    bytes byte_data = 4;
    // will only be used for input tensors
    SharedMemoryHandle shared_memory_handle = 5;
  }
}

//
// TensorMetadata represents the metadata for a tensor
//    name - name of the tensor
//    data_type  - data type of the tensor
//    shape - array of dimensions of the tensor
//
message TensorMetadata {
  string name = 1;
  DataType data_type = 2;
  repeated int32 shape = 3;
}

//
// SharedMemoryHandle represents a posix shared memory segment
//    offset - offset in bytes from the start of the shared memory segment.
//    segment_id - shared memory segment id corresponding to the posix shared memory segment.
//    size - size in bytes of shared memory segment to use from the offset position.
//
message SharedMemoryHandle {
  uint64 size = 1;
  uint64 offset = 2;
  uint64 segment_id = 3;
}
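
To illustrate how the `offset` and `size` fields select a tensor inside a segment, here is a small sketch built on Python's `multiprocessing.shared_memory`. It is only an analogy: Python identifies segments by name, whereas SharedMemoryHandle carries a numeric `segment_id`, so the `segment` field below is a stand-in, and `Handle`, `write_tensor`, and `read_tensor` are hypothetical names. The shipped C++ example (shm.cc, shm.hh) remains the reference for the real mechanism.

```python
from dataclasses import dataclass
from multiprocessing import shared_memory


@dataclass
class Handle:
    size: int     # bytes to use, starting at offset
    offset: int   # bytes from the start of the segment
    segment: str  # segment name; stands in for the numeric segment_id


def write_tensor(shm: shared_memory.SharedMemory, offset: int, payload: bytes) -> Handle:
    """Copy raw tensor bytes into the segment and describe where they are."""
    end = offset + len(payload)
    if offset < 0 or end > shm.size:
        raise ValueError("payload does not fit in the segment")
    shm.buf[offset:end] = payload
    return Handle(size=len(payload), offset=offset, segment=shm.name)


def read_tensor(handle: Handle) -> bytes:
    """Attach to the segment by name and copy the described bytes out."""
    shm = shared_memory.SharedMemory(name=handle.segment)
    try:
        return bytes(shm.buf[handle.offset:handle.offset + handle.size])
    finally:
        shm.close()
```

The producer creates the segment with `shared_memory.SharedMemory(create=True, size=...)` and is responsible for calling `close()` and `unlink()` once the consumer has finished reading.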

Output

Note

PredictResponse returns only Tensors, not a SharedMemoryHandle.


// response for Predict rpc call
//
message PredictResponse {
   repeated Tensor tensors = 1;
}
