Understand how the results of your model evaluation job are saved in Amazon S3 - Amazon Bedrock

Understand how the results of your model evaluation job are saved in Amazon S3

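The output from a model evaluation job is saved in the Amazon S3 bucket you specified when you created the job. Results are saved as JSON Lines files (.jsonl).

Within that bucket, the results are saved at the following locations.

  • For model evaluation jobs that use human workers: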

    s3://user-specified-S3-output-path/job-name/job-uuid/datasets/dataset-name/file-uuid_output.jsonl

  • For automatic model evaluation jobs:

    s3://user-specified-S3-output-path/job-name/job-uuid/models/model-id/taskTypes/task-type/datasets/dataset/file-uuid_output.jsonl
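
As a rough illustration of the two layouts above, the following Python sketch splits an output URI into labeled parts. The function name `parse_output_uri` and the dictionary keys it returns are hypothetical and not part of any AWS SDK. The sketch assumes only that the marker directories (`models`, `taskTypes`, `datasets`) appear as in the path templates and that your own output path does not contain segments with those names.

```python
from urllib.parse import urlparse

# Directory markers taken from the path templates above; the segment that
# follows each marker is the value we want to extract.
_MARKERS = {"models": "model_id", "taskTypes": "task_type", "datasets": "dataset"}


def parse_output_uri(uri: str) -> dict:
    """Split a model evaluation output URI into labeled parts (hypothetical helper)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    segments = [s for s in parsed.path.split("/") if s]
    if not segments:
        raise ValueError(f"URI has no object key: {uri}")
    parts = {"bucket": parsed.netloc, "file": segments[-1]}
    # Each marker directory is immediately followed by its value.
    for marker, label in _MARKERS.items():
        if marker in segments[:-1]:
            parts[label] = segments[segments.index(marker) + 1]
    # In the templates, only automatic jobs have a models/ directory.
    parts["job_type"] = "automatic" if "model_id" in parts else "human"
    return parts
```

For example, passing the automatic-job template with a bucket name substituted for the output path yields `job_type` of `"automatic"` together with the `model_id`, `task_type`, and `dataset` segments.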

The following topics describe how the results of automatic and human-based model evaluation jobs are saved in Amazon S3.

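Output data from automated model evaluation jobs

When the job status changes to Completed, the results of the automated evaluation job are stored in the datasets directory.

For each metric and corresponding prompt dataset you selected when you created the model evaluation job, a JSON Lines file is generated in the datasets directory. The file uses the following naming convention: metric_input-dataset.jsonl.

Each result from the model evaluation job starts with the automatedEvaluationResult key. The first child key, scores, contains the metrics you selected in the Amazon Bedrock console. In this example, only a single metric, Accuracy, was selected. Each entry also contains result, the calculated value for the selected metric. To learn more about which specific value is calculated, see Automated model evaluation job report card (console).

The second key, inputRecord, is a copy of what you provided in your input prompt dataset.

The third key, modelResponses, contains a list of JSON objects. Each object contains the ARN of the model you selected when you created the model evaluation job, along with the model's full response to the prompt provided.

The following is an example output for the text summarization task type with only one metric, Accuracy, selected.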

{ "automatedEvaluationResult": { "scores": [{ "metricName": "Accuracy", "result": 0.31920555233955383 }] }, "inputRecord": { "prompt": "Human: two bodies have been recovered from a flooded coal mine in yiyang city , central china 's henan province , as of #:## thursday morning , rescuers said . Assistant:", "referenceResponse": "two bodies recovered from flooded coal mine in c. china" }, "modelResponses": [{ "modelIdentifier": "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-v2", "response": " Here are the key points I gathered from your news summary:\n\n- Two bodies have been recovered from a flooded coal mine in Yiyang City, in China's Henan province.\n\n- This happened as of Thursday morning, according to rescuers.\n\n- The location is central China's Henan province. \n\n- Yiyang City is where the flooded coal mine is located.\n\n- The coal mine was flooded, leading to the two deaths and need for rescue/recovery efforts.\n\n- This news is developing as of Thursday morning, so the death toll and rescue details may still be unfolding.\n\nPlease let me know if I have accurately summarized the key information or if you would like me to expand on any part of it." }] }
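
Once a results file has been downloaded, the records can be read line by line. The following is a minimal Python sketch that averages each metric's result across records, assuming every line has the shape shown in the example above. The function name `mean_scores` and the two abbreviated sample records are made up for illustration.

```python
import json
from collections import defaultdict


def mean_scores(jsonl_text: str) -> dict:
    """Average each metric's result across all records in a JSON Lines string."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in jsonl_text.split("\n"):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for score in record["automatedEvaluationResult"]["scores"]:
            totals[score["metricName"]] += score["result"]
            counts[score["metricName"]] += 1
    return {name: totals[name] / counts[name] for name in totals}


# Two abbreviated, made-up records in the same shape as the example above.
sample = "\n".join([
    json.dumps({"automatedEvaluationResult": {"scores": [{"metricName": "Accuracy", "result": 0.25}]},
                "inputRecord": {"prompt": "p1"}, "modelResponses": []}),
    json.dumps({"automatedEvaluationResult": {"scores": [{"metricName": "Accuracy", "result": 0.75}]},
                "inputRecord": {"prompt": "p2"}, "modelResponses": []}),
])
print(mean_scores(sample))  # {'Accuracy': 0.5}
```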

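Output data from model evaluation jobs that use human workers

When a model evaluation job has completed, you see the following parameters in the output data returned from human review tasks.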

Parameter: flowDefinitionArn
Value type: String
Example value: arn:aws:sagemaker:us-west-2:111122223333:flow-definition/flow-definition-name
Description: The Amazon Resource Name (ARN) of the human review workflow (flow definition) used to create the human loop.

Parameter: humanAnswers
Value type: List of JSON objects
Example value: "answerContent": { "evaluationResults": { "thumbsUpDown": [{ "metricName": "Relevance", "modelResponseId": "0", "result": false }] } }
Description: A list of JSON objects that contain worker responses in answerContent.

Parameter: humanLoopName
Value type: String
Example value: system-generated-hash
Description: A system-generated 40-character hex string.

Parameter: inputRecord
Value type: JSON object
Example value: "inputRecord": { "prompt": "What does vitamin C serum do for skin?", "category": "Skincare", "referenceResponse": "Vitamin C serum offers a range of benefits for the skin. Firstly, it acts.... }
Description: A JSON object that contains an input prompt from the input dataset.

Parameter: modelResponses
Value type: List of JSON objects
Example value: "modelResponses": [{ "modelIdentifier": "arn:aws:bedrock:us-west-2::foundation-model/model-id", "response": "the-models-response-to-the-prompt" }]
Description: The individual responses from the models.

Parameter: inputContent
Value type: Object
Example value: { "additionalDataS3Uri":"s3://user-specified-S3-URI-path/datasets/dataset-name/records/record-number/human-loop-additional-data.json", "evaluationMetrics":[ { "description":"testing", "metricName":"IndividualLikertScale", "ratingMethod":"IndividualLikertScale" } ], "instructions":"example instructions" }
Description: The human loop input content required to start the human loop in your S3 bucket.

Parameter: modelResponseIdMap
Value type: Object
Example value: { "0":"arn:aws:bedrock:us-west-2::foundation-model/model-id" }
Description: humanAnswers.answerContent.evaluationResults contains modelResponseId values. The modelResponseIdMap connects each modelResponseId to the model name.

The following is an example of output data from a model evaluation job.

{ "humanEvaluationResult": [{ "flowDefinitionArn": "arn:aws:sagemaker:us-west-2:111122223333:flow-definition/flow-definition-name", "humanAnswers": [{ "acceptanceTime": "2023-11-09T19:17:43.107Z", "answerContent": { "evaluationResults": { "thumbsUpDown": [{ "metricName": "Coherence", "modelResponseId": "0", "result": false }, { "metricName": "Accuracy", "modelResponseId": "0", "result": true }], "individualLikertScale": [{ "metricName": "Toxicity", "modelResponseId": "0", "result": 1 }] } }, "submissionTime": "2023-11-09T19:17:52.101Z", "timeSpentInSeconds": 8.994, "workerId": "444455556666", "workerMetadata": { "identityData": { "identityProviderType": "Cognito", "issuer": "https://cognito-idp.AWS Region.amazonaws.com/AWS Region_111222", "sub": "c6aa8eb7-9944-42e9-a6b9-" } } }], ...Additional responses have been truncated for clarity... }], "humanLoopName": "b3b1c64a2166e001e094123456789012", "inputContent":{ "additionalDataS3Uri":"s3://user-specified-S3-output-path/datasets/dataset-name/records/record-number/human-loop-additional-data.json", "evaluationMetrics":[ { "description":"testing", "metricName":"IndividualLikertScale", "ratingMethod":"IndividualLikertScale" } ], "instructions":"some dummy instructions" }, "modelResponseIdMap":{ "0":"arn:aws:bedrock:AWS Region::foundation-model/model-id" } }, "inputRecord": { "category": "Fitness", "prompt": "Human: What is High-Intensity Interval Training ? \\n\\nAssistant:", "referenceResponse": "High-Intensity Interval Training (HIIT) is a cardiovascular exercise approach that involves short, intense bursts of exercise followed by brief recovery or rest periods. In HIIT, you perform exercises at near-maximum effort during the high-intensity intervals, which can include activities like sprinting, cycling, jumping, or body weight exercises such as burpees or push-ups. These intense intervals are interspersed with short recovery periods to lower your heart rate before the next high-intensity bout. HIIT workouts usually consist of multiple cycles of these intervals, and the number of cycles and the length of intervals can vary. HIIT is known for its time efficiency, making it an excellent choice for those with busy schedules. It offers both aerobic and anaerobic benefits, improving cardiovascular fitness, burning calories, and promoting fat loss, all while boosting metabolism for continued calorie burning post-workout." }, "modelResponses": [{ "modelIdentifier": "arn:aws:bedrock:AWS Region::foundation-model/model-id", "response": " High-intensity interval training (HIIT) is a form of exercise that involves short bursts of intense activity alternated with recovery periods. Here are some key features of HIIT:\n\n- Alternates short periods of intense anaerobic exercise with less intense recovery periods. The high-intensity periods are typically done at 80-95% of maximum heart rate.\n\n- The duration of the intense periods can range from 5 seconds to 8 minutes, but typically are in the range of 30 seconds to 4 minutes.\n\n- The total HIIT session typically lasts between 10-30 minutes, including warm-up, recovery periods, and cool down. \n\n- Uses exercises like sprints, cycling, rowing, jumping rope, stair climbing, swimming, etc. Movements that get your heart rate up quickly.\n\n- The short, intense intervals help improve cardiovascular fitness by increasing heart rate and oxygen consumption.\n\n- HIIT burns more calories and fat than steady-state cardio in a shorter time period. It also helps build muscle strength and endurance.\n\n- Considered an efficient and effective form of exercise for fat loss and improving aerobic power. But it requires motivation to push yourself during the intense intervals.\n\n- Not suitable for beginners due to the high-intensity. Start with steady-state cardio and build up endurance before trying HIIT.\n\nIn summary, HIIT intersperses intense bursts of" }] } }
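
To connect worker ratings back to models, the modelResponseId in each rating can be looked up in modelResponseIdMap. The sketch below is one possible way to do that for thumbs up/down ratings. The helper name `thumbs_up_rate` is hypothetical, and because the example above is truncated, the function takes the humanAnswers list and the modelResponseIdMap object as separate arguments rather than assuming exactly how they are nested in a record.

```python
from collections import Counter


def thumbs_up_rate(human_answers: list, model_response_id_map: dict) -> dict:
    """Return the fraction of thumbs-up votes per (model ARN, metric) pair."""
    ups, totals = Counter(), Counter()
    for answer in human_answers:
        results = answer["answerContent"]["evaluationResults"]
        for rating in results.get("thumbsUpDown", []):
            # Translate the short response ID into the model identifier.
            model = model_response_id_map[rating["modelResponseId"]]
            key = (model, rating["metricName"])
            totals[key] += 1
            ups[key] += int(bool(rating["result"]))
    return {key: ups[key] / totals[key] for key in totals}


# Values copied from the example output above (one worker, one model).
answers = [{"answerContent": {"evaluationResults": {"thumbsUpDown": [
    {"metricName": "Coherence", "modelResponseId": "0", "result": False},
    {"metricName": "Accuracy", "modelResponseId": "0", "result": True},
]}}}]
id_map = {"0": "arn:aws:bedrock:us-west-2::foundation-model/model-id"}
rates = thumbs_up_rate(answers, id_map)
```

With this input, `rates` maps the Coherence pair to 0.0 and the Accuracy pair to 1.0. Other rating methods would need their own handling, since their result values are not booleans.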

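The following table explains how the rating method you selected for each metric in the Amazon Bedrock console is returned in your Amazon S3 bucket. The first child key under evaluationResults indicates the rating method.

How rating methods selected in the Amazon Bedrock console are saved in Amazon S3
Rating method selected         Saved in Amazon S3
Likert scale - individual      IndividualLikertScale
Likert scale - comparison      ComparisonLikertScale
Choice buttons                 ComparisonChoice
Ordinal rank                   ComparisonRank
Thumbs up/down                 ThumbsUpDown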
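
The mapping in the table can be kept as a small lookup when post-processing results. The following Python sketch is illustrative only: the names `RATING_METHODS` and `rating_method_for_key` are hypothetical. It compares keys without regard to case, because the example output earlier on this page shows lowerCamelCase keys (thumbsUpDown, individualLikertScale) while the table lists UpperCamelCase names.

```python
from typing import Optional

# Console rating method -> name saved in Amazon S3, copied from the table above.
RATING_METHODS = {
    "Likert scale - individual": "IndividualLikertScale",
    "Likert scale - comparison": "ComparisonLikertScale",
    "Choice buttons": "ComparisonChoice",
    "Ordinal rank": "ComparisonRank",
    "Thumbs up/down": "ThumbsUpDown",
}


def rating_method_for_key(result_key: str) -> Optional[str]:
    """Return the console rating method for a child key of evaluationResults."""
    for label, saved_name in RATING_METHODS.items():
        # Case-insensitive match; see the note in the paragraph above.
        if saved_name.lower() == result_key.lower():
            return label
    return None  # unknown key
```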