Output location Console results page Output format Reading results programmatically Interpreting scores

View and interpret results

Output location

After a job completes, results are written to your output S3 path:

output-s3-uri/job-id/advanced_prompt_optimization_results.jsonl

The job ID is the last segment of the job ARN.

Console results page

Warning

If you move the results files away from the output S3 location after the job is complete, the console results page will not render.

Output format

The output file is JSONL with per-template results. Each line contains:

promptTemplateId: correlates to your input templateId
promptOptimizationResults: array with one entry per target model, each containing:
- modelId: the target model
- status: optimization status for this model
- optimizedPromptTemplate: the rewritten prompt template
- Evaluation scores per sample
- Latency (time to first token, or TTFT)
- Cost estimates

Reading results programmatically


import boto3
import json

s3 = boto3.client('s3', region_name='us-west-2')

# Job ID is the last segment of the job ARN
job_id = job_arn.split('/')[-1]
output_key = f'output/{job_id}/advanced_prompt_optimization_results.jsonl'

response = s3.get_object(Bucket='my-bucket', Key=output_key)
content = response['Body'].read().decode('utf-8')

for line in content.strip().split('\n'):
    result = json.loads(line)
    print(f"Template: {result['promptTemplateId']}")
    for opt in result.get('promptOptimizationResults', []):
        print(f"  Model: {opt['modelId']}")
        print(f"  Status: {opt['status']}")
        print(f"  Optimized: {opt['optimizedPromptTemplate'][:100]}...")

Interpreting scores

Scores are normalized; higher is better. The service normalizes all evaluation scores regardless of the original grading scale you defined in your evaluation method.

You can always see the raw LLM-as-a-judge output in your S3 bucket and on the console results page in the detailed results tab. Due to the nature of merging custom LLMJ prompts with the service-provided default prompt, the numerical score may not exactly match the discrete steps in your custom LLMJ rubric. If you want binary evaluation results or exact matching, it may be better to use a Lambda evaluator.

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

Define evaluation methods

Regions, models, and quotas