

# How Advanced Prompt Optimization works
<a name="advanced-prompt-optimization-how"></a>

## Overview
<a name="advanced-prompt-optimization-how-overview"></a>

Advanced Prompt Optimization takes your prompt templates, evaluation samples, and an evaluation method, then runs iterative inference, evaluation, and rewrite loops. It outputs an optimized prompt, with evaluation metrics, for each target model. It supports multimodal inputs, including PNG, JPEG, and PDF files.
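As a sketch, the three inputs might look like the following. The structure and field names here are illustrative only, not the service's actual schema:

```python
# Illustrative inputs for an optimization job (hypothetical structure,
# not the service's actual API schema).

# A prompt template with {{placeholder}} variables.
prompt_template = (
    "Answer the customer question using only the provided context.\n"
    "Context: {{context}}\n"
    "Question: {{question}}"
)

# Evaluation samples: one record per placeholder-filling test case.
evaluation_samples = [
    {
        "context": "Orders ship within 2 business days.",
        "question": "How fast do orders ship?",
        "reference_answer": "Within 2 business days.",
    },
]

# An evaluation method, e.g. an LLM-as-a-judge criterion scored 0 or 1.
evaluation_criteria = (
    "Score 1 if the answer is fully supported by the context, else 0."
)
```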

## How the optimization loop works
<a name="advanced-prompt-optimization-how-loop"></a>

Your evaluation samples are injected into the placeholder variables in your prompt template, and the filled-in prompts are sent to your target model(s) for inference. Multimodal inputs (images and PDFs) are sent in the payload to the model along with the prompt, but should not be referenced in a double curly bracket `{{placeholder}}` variable. The responses are graded according to your evaluation method. The service analyzes the evaluation results, automatically rewrites your prompts, and sends the rewritten prompts back to the models. This feedback loop repeats and completes according to proprietary internal optimization parameters.
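The loop above can be sketched in a few lines of Python. This is a minimal illustration of the inject → infer → grade → rewrite cycle, not the service's implementation: the real stopping criteria and rewriter are proprietary, and `call_model`, `grade`, and `rewrite` here are stand-ins you would supply.

```python
import re

def inject(template: str, sample: dict) -> str:
    """Replace each {{name}} placeholder with the sample's value."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(sample[m.group(1)]),
        template,
    )

def optimize(template, samples, call_model, grade, rewrite, rounds=3):
    """One possible shape of the feedback loop (illustrative only)."""
    best, best_score = template, float("-inf")
    for _ in range(rounds):
        # Inference: fill the template with each sample and call the model.
        responses = [call_model(inject(template, s)) for s in samples]
        # Evaluation: average the grade across all samples.
        score = sum(grade(r, s) for r, s in zip(responses, samples)) / len(samples)
        if score > best_score:
            best, best_score = template, score
        # Rewrite: produce a new candidate template from the results.
        template = rewrite(template, responses, samples)
    return best, best_score
```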

Define your evaluation method and criteria as precisely as possible, because the evaluation results steer the prompt optimization.
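For example, a criterion that names the scale and states exactly what each score means gives the loop a much clearer signal than a subjective one. These two strings are illustrative, not templates the service requires:

```python
# A vague criterion: hard for a judge to apply consistently,
# so scores (and therefore rewrites) drift.
vague = "The answer should be good."

# A precise criterion: states the scale and what each score means.
precise = (
    "Score 1 if the answer cites only facts present in the provided "
    "context and directly addresses the question; score 0 otherwise."
)
```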

## What you receive
<a name="advanced-prompt-optimization-how-output"></a>

At the end of the optimization job, you receive:
+ Your prompt templates before and after optimization
+ Evaluation scores for each evaluation sample
+ Latency (time to first token, or TTFT) for each model
+ Cost estimates for each model
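Put together, a job result covering the items above might look something like this. The field names and layout are hypothetical; consult the console or API response for the actual shape:

```python
# Hypothetical shape of an optimization job result (illustrative only).
job_result = {
    "prompts": {
        "original": "Summarize: {{document}}",
        "optimized": "You are a precise summarizer. Summarize: {{document}}",
    },
    # One score per evaluation sample.
    "per_sample_scores": [0.6, 0.9, 0.8],
    # Latency (TTFT) and cost estimate per target model.
    "per_model_metrics": {
        "model-a": {"ttft_ms": 420, "estimated_cost_usd": 0.012},
    },
}
```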

## Cost
<a name="advanced-prompt-optimization-how-cost"></a>

All inference and Lambda function invocations run in your AWS account. Lambda operations are charged at Lambda's public pricing, and inference (including LLM-as-a-judge evaluations) is charged at Bedrock's public pricing for on-demand inference. There is no separate Advanced Prompt Optimization service charge beyond these inference and Lambda costs. The current default LLM-as-a-judge model is Anthropic Claude Sonnet 4.6, unless you select a different one for your custom LLM-as-a-judge prompt.

See the Bedrock public pricing page, under Prompt Optimization, then Advanced Prompt Optimization, for a calculation method you can use to estimate the cost of running an optimization job.
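A back-of-envelope estimate can be sketched as follows. All of the parameters here are assumptions you supply: the actual number of optimization rounds is internal to the service, and the per-1K-token prices must come from the Bedrock and Lambda pricing pages for your chosen models.

```python
def estimate_cost(
    num_prompts, num_samples, rounds,
    in_tokens, out_tokens,
    price_in_per_1k, price_out_per_1k,
    judge_tokens, judge_price_per_1k,
):
    """Rough estimate in USD. `rounds` is a guess: the service's actual
    round count is proprietary. Prices are per 1K tokens."""
    calls = num_prompts * num_samples * rounds
    inference = calls * (
        in_tokens / 1000 * price_in_per_1k
        + out_tokens / 1000 * price_out_per_1k
    )
    judging = calls * judge_tokens / 1000 * judge_price_per_1k
    return inference + judging
```

For example, one prompt with 10 samples, an assumed 3 rounds, 1,000 input and 500 output tokens per call, and 2,000 judge tokens per evaluation yields 30 inference calls plus 30 judge calls.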

## Expected duration
<a name="advanced-prompt-optimization-how-duration"></a>

For a single prompt with only a few evaluation samples, the job can run for 15 to 20 minutes. For many prompts, each with a large number of evaluation samples, the job can run for over an hour, potentially multiple hours. This is because each prompt template goes through multiple rounds of the inference, evaluation, and rewrite loop, and each round runs against every evaluation sample you provide.
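The scaling can be made concrete with a small estimate: total model calls are roughly prompts × samples × rounds, and duration follows from per-call latency and parallelism. Every parameter here is an assumption, since the real round count and concurrency are internal to the service.

```python
def estimate_minutes(num_prompts, num_samples, rounds,
                     secs_per_call, concurrency=1):
    """Illustrative duration estimate. `rounds` and `concurrency`
    are guesses: the service's actual values are internal."""
    calls = num_prompts * num_samples * rounds
    return calls * secs_per_call / concurrency / 60
```

Under these assumptions, one prompt with 5 samples, 4 rounds, and 30 seconds per call gives 20 calls, or about 10 minutes when run sequentially.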