Understand how the results of your model evaluation job are saved in Amazon S3 - Amazon Bedrock

The output from a model evaluation job is saved in the Amazon S3 bucket that you specified when you created the job. The results are saved as JSON Lines files (.jsonl).

Within the S3 bucket you specified, the results are saved at the following locations.

  • For model evaluation jobs that use human workers:

    s3://user-specified-S3-output-path/job-name/job-uuid/datasets/dataset-name/file-uuid_output.jsonl

  • For automatic model evaluation jobs:

    s3://user-specified-S3-output-path/job-name/job-uuid/models/model-id/taskTypes/task-type/datasets/dataset/file-uuid_output.jsonl
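The automatic-job layout above can be taken apart programmatically. The following Python sketch is illustrative only: the function name parse_automated_output_path and the returned dictionary keys are hypothetical, and it assumes that none of the trailing path components (job name, job UUID, model ID, task type, dataset name, file name) contains a slash.

```python
from typing import Dict


def parse_automated_output_path(path: str) -> Dict[str, str]:
    """Split an automatic-job output path into its named components.

    Assumes the documented layout:
    <output-path>/job-name/job-uuid/models/model-id/taskTypes/task-type/datasets/dataset/file
    """
    parts = path.rstrip("/").split("/")
    # The layout always ends with nine components; anything before them
    # belongs to the user-specified output path.
    if len(parts) < 9:
        raise ValueError(f"Path is too short for the documented layout: {path!r}")

    (job_name, job_uuid, models_dir, model_id, task_types_dir,
     task_type, datasets_dir, dataset, file_name) = parts[-9:]

    # Check the fixed directory names so that a human-worker path
    # (which has no models/ or taskTypes/ level) is rejected.
    if (models_dir, task_types_dir, datasets_dir) != ("models", "taskTypes", "datasets"):
        raise ValueError(f"Path does not match the automatic-job layout: {path!r}")

    return {
        "output_path": "/".join(parts[:-9]),
        "job_name": job_name,
        "job_uuid": job_uuid,
        "model_id": model_id,
        "task_type": task_type,
        "dataset": dataset,
        "file_name": file_name,
    }
```

Counting components from the right keeps the sketch independent of how many levels the user-specified output path has.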

The following topics describe how the results from automated and human-worker-based model evaluation jobs are saved in Amazon S3.

Output data from automated model evaluation jobs

When the job status changes to Completed, the results of the automated evaluation job are stored in the datasets directory.

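For each metric and corresponding prompt dataset that you selected when you created the model evaluation job, one JSON Lines file is generated in the datasets directory. The file uses the following naming convention: metric_input-dataset.jsonl.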

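Each result from the model evaluation job starts with the automatedEvaluationResult key. Its first child key, scores, contains the metrics that you selected in the Amazon Bedrock console. In this example, only a single metric, Accuracy, was selected. Each entry also contains result, which is the calculated value for the selected metric. To learn more about which specific value is calculated, see Review metrics for an automated model evaluation job in Amazon Bedrock (console).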

The second key, inputRecord, is a copy of what you provided in your input prompt dataset.

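The third key, modelResponses, contains a list of JSON objects. Each object contains the ARN of the model that you selected when you created the model evaluation job, together with the model's full response to the provided prompt.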

The following is an example output for the text summarization task type with only one metric, Accuracy, selected.

{ "automatedEvaluationResult": { "scores": [{ "metricName": "Accuracy", "result": 0.31920555233955383 }] }, "inputRecord": { "prompt": "Human: two bodies have been recovered from a flooded coal mine in yiyang city , central china 's henan province , as of #:## thursday morning , rescuers said . Assistant:", "referenceResponse": "two bodies recovered from flooded coal mine in c. china" }, "modelResponses": [{ "modelIdentifier": "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-v2", "response": " Here are the key points I gathered from your news summary:\n\n- Two bodies have been recovered from a flooded coal mine in Yiyang City, in China's Henan province.\n\n- This happened as of Thursday morning, according to rescuers.\n\n- The location is central China's Henan province. \n\n- Yiyang City is where the flooded coal mine is located.\n\n- The coal mine was flooded, leading to the two deaths and need for rescue/recovery efforts.\n\n- This news is developing as of Thursday morning, so the death toll and rescue details may still be unfolding.\n\nPlease let me know if I have accurately summarized the key information or if you would like me to expand on any part of it." }] }
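To make the record structure above concrete, here is a minimal Python sketch that reads JSON Lines text and collects the result values under automatedEvaluationResult.scores, grouped by metricName. The function name scores_by_metric is hypothetical, and the sketch assumes each non-empty line is one complete record shaped like the example; downloading the file from Amazon S3 is left out.

```python
import json
from typing import Dict, Iterable, List


def scores_by_metric(jsonl_lines: Iterable[str]) -> Dict[str, List[float]]:
    """Group the `result` values of every record by `metricName`."""
    collected: Dict[str, List[float]] = {}
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            # Skip blank lines, such as a trailing newline at the end of the file.
            continue
        record = json.loads(line)
        # Each record carries a list of {"metricName": ..., "result": ...} entries.
        for score in record["automatedEvaluationResult"]["scores"]:
            collected.setdefault(score["metricName"], []).append(score["result"])
    return collected
```

Because the function accepts any iterable of strings, it can be given an open file object, a list of lines, or the split body of an already downloaded object.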

Output data from model evaluation jobs that use human workers

After a model evaluation job has completed, you see the following parameters in the output data returned from the human review tasks.

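Each parameter is listed with its value type, an example value, and a description.

flowDefinitionArn
    Value type: String
    Example value: arn:aws:sagemaker:us-west-2:111122223333:flow-definition/flow-definition-name
    Description: The Amazon Resource Name (ARN) of the human review workflow (flow definition) used to create the human loop.

humanAnswers
    Value type: List of JSON objects
    Example value: "answerContent": { "evaluationResults": { "thumbsUpDown": [{ "metricName": "Relevance", "modelResponseId": "0", "result": false }] } }
    Description: A list of JSON objects that contain the worker responses in answerContent.

humanLoopName
    Value type: String
    Example value: system-generated-hash
    Description: A system-generated 40-character hexadecimal string.

inputRecord
    Value type: JSON object
    Example value: "inputRecord": { "prompt": "What does vitamin C serum do for skin?", "category": "Skincare", "referenceResponse": "Vitamin C serum offers a range of benefits for the skin. Firstly, it acts.... }
    Description: A JSON object that contains an input prompt from the input dataset.

modelResponses
    Value type: List of JSON objects
    Example value: "modelResponses": [{ "modelIdentifier": "arn:aws:bedrock:us-west-2::foundation-model/model-id", "response": "the-models-response-to-the-prompt" }]
    Description: The individual responses from the models.

inputContent
    Value type: Object
    Example value: { "additionalDataS3Uri":"s3://user-specified-S3-URI-path/datasets/dataset-name/records/record-number/human-loop-additional-data.json", "evaluationMetrics":[ { "description":"testing", "metricName":"IndividualLikertScale", "ratingMethod":"IndividualLikertScale" } ], "instructions":"example instructions" }
    Description: The human loop input content required to start the human loop, stored in your S3 bucket.

modelResponseIdMap
    Value type: Object
    Example value: { "0":"arn:aws:bedrock:us-west-2::foundation-model/model-id" }
    Description: humanAnswers.answerContent.evaluationResults contains modelResponseId values. The modelResponseIdMap connects each modelResponseId to the model name.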

The following is an example of output data from a model evaluation job.

{ "humanEvaluationResult": [{ "flowDefinitionArn": "arn:aws:sagemaker:us-west-2:111122223333:flow-definition/flow-definition-name", "humanAnswers": [{ "acceptanceTime": "2023-11-09T19:17:43.107Z", "answerContent": { "evaluationResults": { "thumbsUpDown": [{ "metricName": "Coherence", "modelResponseId": "0", "result": false }, { "metricName": "Accuracy", "modelResponseId": "0", "result": true }], "individualLikertScale": [{ "metricName": "Toxicity", "modelResponseId": "0", "result": 1 }] } }, "submissionTime": "2023-11-09T19:17:52.101Z", "timeSpentInSeconds": 8.994, "workerId": "444455556666", "workerMetadata": { "identityData": { "identityProviderType": "Cognito", "issuer": "https://cognito-idp.AWS Region.amazonaws.com/AWS Region_111222", "sub": "c6aa8eb7-9944-42e9-a6b9-" } } }], ...Additional responses have been truncated for clarity... }], "humanLoopName": "b3b1c64a2166e001e094123456789012", "inputContent":{ "additionalDataS3Uri":"s3://user-specified-S3-output-path/datasets/dataset-name/records/record-number/human-loop-additional-data.json", "evaluationMetrics":[ { "description":"testing", "metricName":"IndividualLikertScale", "ratingMethod":"IndividualLikertScale" } ], "instructions":"some dummy instructions" }, "modelResponseIdMap":{ "0":"arn:aws:bedrock:AWS Region::foundation-model/model-id" } }, "inputRecord": { "category": "Fitness", "prompt": "Human: What is High-Intensity Interval Training ? \\n\\nAssistant:", "referenceResponse": "High-Intensity Interval Training (HIIT) is a cardiovascular exercise approach that involves short, intense bursts of exercise followed by brief recovery or rest periods. In HIIT, you perform exercises at near-maximum effort during the high-intensity intervals, which can include activities like sprinting, cycling, jumping, or body weight exercises such as burpees or push-ups. These intense intervals are interspersed with short recovery periods to lower your heart rate before the next high-intensity bout. 
HIIT workouts usually consist of multiple cycles of these intervals, and the number of cycles and the length of intervals can vary. HIIT is known for its time efficiency, making it an excellent choice for those with busy schedules. It offers both aerobic and anaerobic benefits, improving cardiovascular fitness, burning calories, and promoting fat loss, all while boosting metabolism for continued calorie burning post-workout." }, "modelResponses": [{ "modelIdentifier": "arn:aws:bedrock:AWS Region::foundation-model/model-id", "response": " High-intensity interval training (HIIT) is a form of exercise that involves short bursts of intense activity alternated with recovery periods. Here are some key features of HIIT:\n\n- Alternates short periods of intense anaerobic exercise with less intense recovery periods. The high-intensity periods are typically done at 80-95% of maximum heart rate.\n\n- The duration of the intense periods can range from 5 seconds to 8 minutes, but typically are in the range of 30 seconds to 4 minutes.\n\n- The total HIIT session typically lasts between 10-30 minutes, including warm-up, recovery periods, and cool down. \n\n- Uses exercises like sprints, cycling, rowing, jumping rope, stair climbing, swimming, etc. Movements that get your heart rate up quickly.\n\n- The short, intense intervals help improve cardiovascular fitness by increasing heart rate and oxygen consumption.\n\n- HIIT burns more calories and fat than steady-state cardio in a shorter time period. It also helps build muscle strength and endurance.\n\n- Considered an efficient and effective form of exercise for fat loss and improving aerobic power. But it requires motivation to push yourself during the intense intervals.\n\n- Not suitable for beginners due to the high-intensity. Start with steady-state cardio and build up endurance before trying HIIT.\n\nIn summary, HIIT intersperses intense bursts of" }] } }
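As a sketch of how modelResponseIdMap ties worker ratings back to models, the following Python function flattens a humanAnswers list into one row per rating and replaces each modelResponseId with the mapped value. The function name flatten_human_answers and the row field names are hypothetical. Because the example above is truncated, the sketch takes the humanAnswers list and the modelResponseIdMap object as separate arguments and makes no assumption about where they sit in the full record.

```python
from typing import Any, Dict, List


def flatten_human_answers(
    human_answers: List[Dict[str, Any]],
    model_response_id_map: Dict[str, str],
) -> List[Dict[str, Any]]:
    """Return one row per worker rating, with the model resolved via the map."""
    rows: List[Dict[str, Any]] = []
    for answer in human_answers:
        results = answer["answerContent"]["evaluationResults"]
        # The keys under evaluationResults are rating methods, for example
        # "thumbsUpDown" or "individualLikertScale" in the example above.
        for rating_method, entries in results.items():
            for entry in entries:
                rows.append({
                    "ratingMethod": rating_method,
                    "metricName": entry["metricName"],
                    # Raises KeyError if the ID is missing from the map.
                    "model": model_response_id_map[entry["modelResponseId"]],
                    "result": entry["result"],
                })
    return rows
```

Applied to the answer shown in the example, this yields three rows: two thumbsUpDown ratings (Coherence and Accuracy) and one individualLikertScale rating (Toxicity), all attributed to the model mapped from ID "0".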

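The following table explains how the rating method that you selected for each metric in the Amazon Bedrock console is returned in your Amazon S3 bucket. The first child key under evaluationResults indicates the rating method.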

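How rating methods selected in the Amazon Bedrock console are saved in Amazon S3
Rating method selected       Saved in Amazon S3
Likert scale - Individual    IndividualLikertScale
Likert scale - Comparison    ComparisonLikertScale
Choice buttons               ComparisonChoice
Ordinal rank                 ComparisonRank
Thumbs up/down               ThumbsUpDown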
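The table can be expressed as a small lookup, sketched below in Python. The names RATING_METHOD_LABELS and console_label are hypothetical, and the labels are English renderings of the console names. The lookup ignores letter case, which is an assumption made because the example output on this page uses thumbsUpDown and individualLikertScale while the table lists ThumbsUpDown and IndividualLikertScale.

```python
from typing import Dict

# Stored key (as listed in the table) -> rating method name in the console.
RATING_METHOD_LABELS: Dict[str, str] = {
    "IndividualLikertScale": "Likert scale - Individual",
    "ComparisonLikertScale": "Likert scale - Comparison",
    "ComparisonChoice": "Choice buttons",
    "ComparisonRank": "Ordinal rank",
    "ThumbsUpDown": "Thumbs up/down",
}

# Lowercased index so that "thumbsUpDown" and "ThumbsUpDown" both match
# (case-insensitive matching is an assumption, not documented behavior).
_LABELS_BY_LOWER_KEY = {key.lower(): label for key, label in RATING_METHOD_LABELS.items()}


def console_label(stored_key: str) -> str:
    """Return the console rating method name for a key found under evaluationResults."""
    try:
        return _LABELS_BY_LOWER_KEY[stored_key.lower()]
    except KeyError:
        raise ValueError(f"Unknown rating method key: {stored_key!r}") from None
```

Raising on unknown keys, rather than returning a default, makes an unexpected rating method visible instead of silently mislabeling it.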