Understand how the results of your model evaluation job are saved in Amazon S3 - Amazon Bedrock

The output from a model evaluation job is saved in the Amazon S3 bucket you specified when you created the job. The results are saved as JSON Lines files (.jsonl).

The results are saved in the S3 bucket you specified as follows.

  • For model evaluation jobs that use human workers:

    s3://user-specified-S3-output-path/job-name/job-uuid/datasets/dataset-name/file-uuid_output.jsonl

  • For automatic model evaluation jobs:

    s3://user-specified-S3-output-path/job-name/job-uuid/models/model-id/taskTypes/task-type/datasets/dataset/file-uuid_output.jsonl
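The path layout for automatic jobs can be split back into its labelled parts. The following is a minimal Python sketch, not an official utility: the function name `parse_automatic_result_key` and the returned field names are hypothetical, and it assumes that no path component (for example, the model ID) itself contains a `/`.

```python
from typing import Dict, Optional

SUFFIX = "_output.jsonl"


def parse_automatic_result_key(key: str) -> Optional[Dict[str, str]]:
    """Split an automatic-job result key into labelled components.

    Expects the key to end with the layout:
    job-name/job-uuid/models/model-id/taskTypes/task-type/datasets/dataset/file
    Everything before job-name is treated as the user-specified output path.
    Returns None when the key does not match that layout.
    """
    parts = key.strip("/").split("/")
    if len(parts) < 9:
        return None
    (job_name, job_uuid, models, model_id,
     task_types, task_type, datasets, dataset, filename) = parts[-9:]
    # The three fixed directory names must appear in their expected positions.
    if (models, task_types, datasets) != ("models", "taskTypes", "datasets"):
        return None
    if not filename.endswith(SUFFIX):
        return None
    return {
        "output_path": "/".join(parts[:-9]),
        "job_name": job_name,
        "job_uuid": job_uuid,
        "model_id": model_id,
        "task_type": task_type,
        "dataset": dataset,
        "file_uuid": filename[: -len(SUFFIX)],
    }
```

A key that follows the human-worker layout has fewer components and no `models` directory, so this sketch returns `None` for it.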

The following topics describe how the results from automatic and human worker based model evaluation jobs are saved in Amazon S3.

Output data from automatic model evaluation jobs

The results of an automatic evaluation job are stored in the datasets directory when the job status changes to Completed.

For each metric and corresponding prompt dataset you selected when the model evaluation job was created, a JSON Lines file is generated in the datasets directory. The file uses the following naming convention: metric_input-dataset.jsonl.

Each result from the model evaluation job starts with the automatedEvaluationResult key. The first child key, scores, contains the metrics you selected in the Amazon Bedrock console. In this example, only a single metric, Accuracy, was selected. It also contains result, the calculated value for the selected metric. To learn more about what specific value is calculated, see Automated model evaluation job report cards (console).

The second key, inputRecord, is a copy of what you provided in your input prompt dataset.

The third key, modelResponses, contains a list of JSON objects that hold the ARN of the model selected when you created the model evaluation job. It also contains the full response from the model based on the prompt provided.

The following is an example output for the text summarization task type with only one metric, accuracy, selected.

{ "automatedEvaluationResult": { "scores": [{ "metricName": "Accuracy", "result": 0.31920555233955383 }] }, "inputRecord": { "prompt": "Human: two bodies have been recovered from a flooded coal mine in yiyang city , central china 's henan province , as of #:## thursday morning , rescuers said . Assistant:", "referenceResponse": "two bodies recovered from flooded coal mine in c. china" }, "modelResponses": [{ "modelIdentifier": "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-v2", "response": " Here are the key points I gathered from your news summary:\n\n- Two bodies have been recovered from a flooded coal mine in Yiyang City, in China's Henan province.\n\n- This happened as of Thursday morning, according to rescuers.\n\n- The location is central China's Henan province. \n\n- Yiyang City is where the flooded coal mine is located.\n\n- The coal mine was flooded, leading to the two deaths and need for rescue/recovery efforts.\n\n- This news is developing as of Thursday morning, so the death toll and rescue details may still be unfolding.\n\nPlease let me know if I have accurately summarized the key information or if you would like me to expand on any part of it." }] }
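Because each line of the output file is one record of this shape, per-metric results can be aggregated by reading the file line by line. The following Python sketch assumes only the keys shown in the example above (`automatedEvaluationResult`, `scores`, `metricName`, `result`); the function name `mean_scores` is hypothetical, and averaging is just one possible way to summarize the values.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def mean_scores(lines: Iterable[str]) -> Dict[str, float]:
    """Average the `result` value of every metric across JSON Lines records."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        for score in record["automatedEvaluationResult"]["scores"]:
            totals[score["metricName"]] += score["result"]
            counts[score["metricName"]] += 1
    return {name: totals[name] / counts[name] for name in totals}
```

An open file object can be passed directly, for example `mean_scores(open("downloaded_output.jsonl"))`, where the file name is a placeholder for a result file you have already downloaded from your bucket.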

Output data from model evaluation jobs that use human workers

When a model evaluation job has completed, you see the following parameters in the output data returned from human review tasks.

Parameter: flowDefinitionArn
Value type: String
Example value: arn:aws:sagemaker:us-west-2:111122223333:flow-definition/flow-definition-name
Description: The Amazon Resource Name (ARN) of the human review workflow (flow definition) used to create the human loop.

Parameter: humanAnswers
Value type: List of JSON objects
Example value: "answerContent": { "evaluationResults": { "thumbsUpDown": [{ "metricName": "Relevance", "modelResponseId": "0", "result": false }] } }
Description: A list of JSON objects that contain worker responses in answerContent.

Parameter: humanLoopName
Value type: String
Example value: system-generated-hash
Description: A system-generated 40-character hex string.

Parameter: inputRecord
Value type: JSON object
Example value: "inputRecord": { "prompt": "What does vitamin C serum do for skin?", "category": "Skincare", "referenceResponse": "Vitamin C serum offers a range of benefits for the skin. Firstly, it acts.... }
Description: A JSON object that contains an entry prompt from the input dataset.

Parameter: modelResponses
Value type: List of JSON objects
Example value: "modelResponses": [{ "modelIdentifier": "arn:aws:bedrock:us-west-2::foundation-model/model-id", "response": "the-models-response-to-the-prompt" }]
Description: The individual responses from the models.

Parameter: inputContent
Value type: Object
Example value: { "additionalDataS3Uri":"s3://user-specified-S3-URI-path/datasets/dataset-name/records/record-number/human-loop-additional-data.json", "evaluationMetrics":[ { "description":"testing", "metricName":"IndividualLikertScale", "ratingMethod":"IndividualLikertScale" } ], "instructions":"example instructions" }
Description: The human loop input content required to start the human loop in your S3 bucket.

Parameter: modelResponseIdMap
Value type: Object
Example value: { "0":"arn:aws:bedrock:us-west-2::foundation-model/model-id" }
Description: humanAnswers.answerContent.evaluationResults contains modelResponseId values. modelResponseIdMap connects each modelResponseId to the model name.

The following is an example of output data from a model evaluation job.

{ "humanEvaluationResult": [{ "flowDefinitionArn": "arn:aws:sagemaker:us-west-2:111122223333:flow-definition/flow-definition-name", "humanAnswers": [{ "acceptanceTime": "2023-11-09T19:17:43.107Z", "answerContent": { "evaluationResults": { "thumbsUpDown": [{ "metricName": "Coherence", "modelResponseId": "0", "result": false }, { "metricName": "Accuracy", "modelResponseId": "0", "result": true }], "individualLikertScale": [{ "metricName": "Toxicity", "modelResponseId": "0", "result": 1 }] } }, "submissionTime": "2023-11-09T19:17:52.101Z", "timeSpentInSeconds": 8.994, "workerId": "444455556666", "workerMetadata": { "identityData": { "identityProviderType": "Cognito", "issuer": "https://cognito-idp.AWS Region.amazonaws.com/AWS Region_111222", "sub": "c6aa8eb7-9944-42e9-a6b9-" } } }], ...Additional responses have been truncated for clarity... }], "humanLoopName": "b3b1c64a2166e001e094123456789012", "inputContent":{ "additionalDataS3Uri":"s3://user-specified-S3-output-path/datasets/dataset-name/records/record-number/human-loop-additional-data.json", "evaluationMetrics":[ { "description":"testing", "metricName":"IndividualLikertScale", "ratingMethod":"IndividualLikertScale" } ], "instructions":"some dummy instructions" }, "modelResponseIdMap":{ "0":"arn:aws:bedrock:AWS Region::foundation-model/model-id" } }, "inputRecord": { "category": "Fitness", "prompt": "Human: What is High-Intensity Interval Training ? \\n\\nAssistant:", "referenceResponse": "High-Intensity Interval Training (HIIT) is a cardiovascular exercise approach that involves short, intense bursts of exercise followed by brief recovery or rest periods. In HIIT, you perform exercises at near-maximum effort during the high-intensity intervals, which can include activities like sprinting, cycling, jumping, or body weight exercises such as burpees or push-ups. These intense intervals are interspersed with short recovery periods to lower your heart rate before the next high-intensity bout. 
HIIT workouts usually consist of multiple cycles of these intervals, and the number of cycles and the length of intervals can vary. HIIT is known for its time efficiency, making it an excellent choice for those with busy schedules. It offers both aerobic and anaerobic benefits, improving cardiovascular fitness, burning calories, and promoting fat loss, all while boosting metabolism for continued calorie burning post-workout." }, "modelResponses": [{ "modelIdentifier": "arn:aws:bedrock:AWS Region::foundation-model/model-id", "response": " High-intensity interval training (HIIT) is a form of exercise that involves short bursts of intense activity alternated with recovery periods. Here are some key features of HIIT:\n\n- Alternates short periods of intense anaerobic exercise with less intense recovery periods. The high-intensity periods are typically done at 80-95% of maximum heart rate.\n\n- The duration of the intense periods can range from 5 seconds to 8 minutes, but typically are in the range of 30 seconds to 4 minutes.\n\n- The total HIIT session typically lasts between 10-30 minutes, including warm-up, recovery periods, and cool down. \n\n- Uses exercises like sprints, cycling, rowing, jumping rope, stair climbing, swimming, etc. Movements that get your heart rate up quickly.\n\n- The short, intense intervals help improve cardiovascular fitness by increasing heart rate and oxygen consumption.\n\n- HIIT burns more calories and fat than steady-state cardio in a shorter time period. It also helps build muscle strength and endurance.\n\n- Considered an efficient and effective form of exercise for fat loss and improving aerobic power. But it requires motivation to push yourself during the intense intervals.\n\n- Not suitable for beginners due to the high-intensity. Start with steady-state cardio and build up endurance before trying HIIT.\n\nIn summary, HIIT intersperses intense bursts of" }] } }
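Worker ratings in a record like this are nested by rating method, and each rating refers to a model only through its modelResponseId. The following Python sketch flattens one parsed record into one row per rating and resolves each ID through modelResponseIdMap. It is a sketch under stated assumptions: the function name `iter_human_ratings` and the output field names are hypothetical, and because the brace nesting in the example above is ambiguous, it assumes that humanAnswers and modelResponseIdMap both sit inside each element of humanEvaluationResult.

```python
from typing import Any, Dict, Iterator


def iter_human_ratings(record: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per worker rating found in a parsed output record."""
    for loop in record.get("humanEvaluationResult", []):
        # Assumption: the ID-to-model map is stored alongside humanAnswers.
        id_map = loop.get("modelResponseIdMap", {})
        for answer in loop.get("humanAnswers", []):
            results = answer["answerContent"]["evaluationResults"]
            # The first child key under evaluationResults is the rating method.
            for rating_method, entries in results.items():
                for entry in entries:
                    response_id = entry["modelResponseId"]
                    yield {
                        "worker_id": answer.get("workerId"),
                        "rating_method": rating_method,
                        "metric": entry["metricName"],
                        # Fall back to the raw ID when the map has no entry.
                        "model": id_map.get(response_id, response_id),
                        "result": entry["result"],
                    }
```

If the map turns out to live at a different level in your files, only the line that reads `id_map` needs to change.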

The following table explains how the rating method you selected for each metric in the Amazon Bedrock console is returned in your Amazon S3 bucket. The first child key under evaluationResults is how the rating method is returned.

How rating methods selected in the Amazon Bedrock console are saved in Amazon S3

Rating method selected       Saved in Amazon S3
Likert scale - Individual    IndividualLikertScale
Likert scale - Comparison    ComparisonLikertScale
Choice buttons               ComparisonChoice
Ordinal rank                 ComparisonRank
Thumbs up/down               ThumbsUpDown
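When reading results programmatically, the table can be expressed as a lookup from the stored key back to the console label. This is a minimal Python sketch: the name `console_name_for` is hypothetical, and the English console labels are this sketch's rendering of the table rows, so they may differ slightly from the wording in your console. The example record earlier on this page uses keys such as thumbsUpDown with a lowercase first letter, while the table lists ThumbsUpDown, so the lookup compares keys case-insensitively.

```python
from typing import Optional

# Console label -> key stored in Amazon S3, taken from the table above.
RATING_METHOD_KEYS = {
    "Likert scale - Individual": "IndividualLikertScale",
    "Likert scale - Comparison": "ComparisonLikertScale",
    "Choice buttons": "ComparisonChoice",
    "Ordinal rank": "ComparisonRank",
    "Thumbs up/down": "ThumbsUpDown",
}


def console_name_for(stored_key: str) -> Optional[str]:
    """Return the console label for a stored rating-method key, or None."""
    wanted = stored_key.lower()
    for console_name, key in RATING_METHOD_KEYS.items():
        if key.lower() == wanted:
            return console_name
    return None
```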