How Advanced Prompt Optimization works
Overview
Advanced Prompt Optimization takes your prompt templates, evaluation samples, and an evaluation method, then runs iterative inference, evaluation, and rewrite loops. It outputs optimized prompts with evaluation metrics for each target model. It supports multimodal inputs, including PNG, JPG, and PDF files.
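For illustration only, the three inputs the service expects could be bundled as shown in the sketch below. The field names and structure are hypothetical and only convey the shape of the inputs, not an actual API.

```python
# Hypothetical job definition; field names are illustrative only and do not
# reflect an actual API shape.
optimization_job = {
    "prompt_templates": [
        "Summarize the following support ticket in two sentences:\n\n{{ticket_text}}"
    ],
    "evaluation_samples": [
        {
            "ticket_text": "Customer reports login failures since the latest app update...",
            "reference_summary": "Login broken after update; customer wants an urgent fix.",
        }
    ],
    "evaluation_method": {
        "type": "llm_as_a_judge",
        "criteria": "Score 1-5: the summary captures the core issue and the requested action.",
    },
    "target_models": ["anthropic.claude-3-haiku-20240307-v1:0"],
}
```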
How the optimization loop works
Your evaluation samples are injected into the placeholder variables in your prompt template, then sent to your target model(s) for inference. Multimodal inputs (images and PDFs) are sent in the payload to the model along with the prompt, but should not be referenced in a double curly bracket {{placeholder}} variable. The responses are graded according to your evaluation method. The service analyzes the evaluation results, automatically rewrites your prompts, and sends them back to the models. This feedback loop repeats until it terminates according to proprietary internal optimization parameters.
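As a conceptual sketch of the injection step only (the service performs this internally), an evaluation sample's fields could be substituted into the {{placeholder}} variables of a template like this:

```python
import re

def fill_template(template: str, sample: dict) -> str:
    """Substitute {{placeholder}} variables with values from one evaluation sample."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(sample.get(m.group(1), m.group(0))),  # leave unknown placeholders untouched
        template,
    )

template = "Classify the sentiment of this review as positive, negative, or neutral:\n\n{{review_text}}"
sample = {"review_text": "The battery died after two days.", "expected_label": "negative"}
print(fill_template(template, sample))
# Image or PDF inputs would travel in the model payload alongside this text,
# not through a {{placeholder}} variable.
```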
It is important that you define your evaluation method and criteria as precisely as possible, because the evaluation steers the prompt optimization.
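For example, a vague criterion such as "the answer should be good" gives the optimizer little signal to act on, whereas a rubric like the illustrative one below tells the judge exactly what to reward and penalize. The wording and format are assumptions; the exact format you provide depends on the evaluation method you configure.

```python
# Illustrative LLM-as-a-judge rubric (content is an example, not a required format).
judge_rubric = """
Score the model response from 1 to 5:
5 - Factually correct, cites the ticket ID, and is under 50 words.
3 - Correct but misses the ticket ID or exceeds the length limit.
1 - Factually wrong or does not address the customer's question.
Return only the integer score.
"""
```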
What you receive
At the end of the optimization job, you receive:
Your prompt templates before and after optimization
Evaluation scores for each evaluation sample
Latency (time to first token, or TTFT) for each model
Cost estimates for each model
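Put together, the results for one target model might look like the sketch below. The structure and field names are hypothetical; they only mirror the four outputs listed above, not an actual API response.

```python
# Hypothetical result shape for one target model (illustrative only).
example_result = {
    "model": "anthropic.claude-3-haiku-20240307-v1:0",
    "prompt_before": "Summarize the following support ticket:\n\n{{ticket_text}}",
    "prompt_after": "You are a support triage assistant. In two sentences, summarize "
                    "the core issue and the requested action in:\n\n{{ticket_text}}",
    "sample_scores": [{"sample_id": 0, "score": 4}, {"sample_id": 1, "score": 5}],
    "latency": {"ttft_ms_p50": 310},
    "estimated_cost_usd": 0.42,
}
```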
Cost
All inference and Lambda function invocations run in your AWS account. Lambda operations are billed at Lambda's public pricing. Inference (including LLM-as-a-judge evaluations) is billed according to Bedrock's public pricing for on-demand inference. There is no separate Advanced Prompt Optimization service charge beyond these inference and Lambda costs. The current default LLM-as-a-judge model is Anthropic Claude Sonnet 4.6, unless you select a different model for your custom LLM-as-a-judge (LLMJ) prompt.
See the Bedrock public pricing page, under Prompt Optimization, then Advanced Prompt Optimization, for a calculation method to estimate the cost of running an optimization job.
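As a back-of-the-envelope sketch only: the token counts, round count, and per-token rates below are placeholder assumptions you would replace with your own figures and the rates from the pricing page.

```python
# Placeholder numbers only; substitute your own token counts and the on-demand
# rates from the Bedrock pricing page.
samples = 20               # evaluation samples
rounds = 5                 # optimization rounds (the actual count is set internally)
calls = samples * rounds   # target-model inference calls; expect roughly as many judge calls
avg_input_tokens, avg_output_tokens = 1_500, 300
price_per_1k_in, price_per_1k_out = 0.003, 0.015   # assumed $ per 1K tokens

inference_cost = calls * (avg_input_tokens / 1000 * price_per_1k_in
                          + avg_output_tokens / 1000 * price_per_1k_out)
print(f"~{calls} target-model calls, roughly ${inference_cost:.2f} "
      f"before judge and Lambda charges")
```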
Expected duration
For a single prompt with only a few evaluation samples, the job could run for 15 to 20 minutes. For many prompts, each with a large number of evaluation samples, the job could run for over an hour, potentially for multiple hours. This is because each prompt template goes through multiple rounds of the inference, evaluation, and rewriting loop, using every evaluation sample record you provide.
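As a rough illustration of why duration scales this way (the round count and per-call latency below are assumptions, and calls may not run strictly one after another):

```python
prompts = 10
samples_per_prompt = 25
rounds = 5               # assumed optimization rounds per prompt
seconds_per_call = 8     # assumed end-to-end latency per inference-plus-evaluation call

total_calls = prompts * samples_per_prompt * rounds
hours = total_calls * seconds_per_call / 3600
print(f"{total_calls} calls -> about {hours:.1f} hours if run serially")
```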