Interface ILlmAsAJudgeOptions

Options for configuring an LLM-as-a-Judge custom evaluator.

Namespace: Amazon.CDK.AWS.BedrockAgentCore
Assembly: Amazon.CDK.Lib.dll
Syntax (csharp)
public interface ILlmAsAJudgeOptions
Syntax (vb)
Public Interface ILlmAsAJudgeOptions
Remarks

Uses a foundation model to assess agent performance based on custom instructions and a rating scale.

Examples
// Create a custom LLM-as-a-Judge evaluator
var evaluator = new Evaluator(this, "MyEvaluator", new EvaluatorProps {
    EvaluatorName = "my_custom_evaluator",
    Level = EvaluationLevel.SESSION,
    EvaluatorConfig = EvaluatorConfig.LlmAsAJudge(new LlmAsAJudgeOptions {
        Instructions = "Evaluate whether the agent response is helpful and accurate.",
        ModelId = "us.anthropic.claude-sonnet-4-6",
        RatingScale = EvaluatorRatingScale.Categorical(new [] {
            new CategoricalRatingOption { Label = "Good", Definition = "The response is helpful and accurate." },
            new CategoricalRatingOption { Label = "Bad", Definition = "The response is not helpful or contains errors." }
        })
    })
});

// Use the custom evaluator in an online evaluation configuration
new OnlineEvaluationConfig(this, "MyEvaluation", new OnlineEvaluationConfigProps {
    OnlineEvaluationConfigName = "my_evaluation",
    Evaluators = new [] { EvaluatorSelector.Builtin(BuiltinEvaluator.HELPFULNESS), EvaluatorSelector.Custom(evaluator) },
    DataSource = DataSourceConfig.FromCloudWatchLogs(new CloudWatchLogsDataSourceConfig {
        LogGroupNames = new [] { "/aws/bedrock-agentcore/my-agent" },
        ServiceNames = new [] { "my-agent.default" }
    })
});

Synopsis

Properties

AdditionalModelRequestFields

Additional model-specific request fields.

InferenceConfig

Optional inference configuration parameters that control model behavior during evaluation.

Instructions

The evaluation instructions that guide the language model in assessing agent performance.

ModelId

The identifier of the Amazon Bedrock model to use for evaluation.

RatingScale

The rating scale that defines how the evaluator should score agent performance.

Properties

AdditionalModelRequestFields

Additional model-specific request fields.

IDictionary<string, object>? AdditionalModelRequestFields { get; }
Property Value

IDictionary<string, object>

Remarks

Default: No additional fields.
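
As a minimal sketch of the value shape this property expects, the snippet below builds an IDictionary<string, object>. The key "top_k" and its value are hypothetical placeholders chosen for illustration; which fields are accepted depends on the selected model and is not stated on this page.

```csharp
using System;
using System.Collections.Generic;

public static class AdditionalFieldsSketch
{
    // Builds a dictionary of the type AdditionalModelRequestFields is declared with.
    public static IDictionary<string, object> Build() =>
        new Dictionary<string, object>
        {
            // "top_k" is a hypothetical model-specific field, not confirmed by this page.
            ["top_k"] = 50
        };

    public static void Main()
    {
        foreach (var kv in Build())
        {
            Console.WriteLine($"{kv.Key} = {kv.Value}"); // prints "top_k = 50"
        }
    }
}
```

A dictionary built this way could then be assigned to AdditionalModelRequestFields when constructing the options object.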

InferenceConfig

Optional inference configuration parameters that control model behavior during evaluation.

IEvaluatorInferenceConfig? InferenceConfig { get; }
Property Value

IEvaluatorInferenceConfig

Remarks

When not specified, the foundation model uses its own default values for maxTokens, temperature, and topP.

Default: The foundation model's default inference parameters are used.

See: https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/custom-evaluators.html

Instructions

The evaluation instructions that guide the language model in assessing agent performance.

string Instructions { get; }
Property Value

string

Remarks

These instructions define the evaluation criteria, context, and expected behavior. Instructions must contain placeholders appropriate for the evaluation level (e.g., {context}, {available_tools} for SESSION level).

Note: Evaluators using reference-input placeholders (e.g., {expected_tool_trajectory}, {assertions}, {expected_response}) are only compatible with on-demand evaluation, not online evaluation.

See: https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/custom-evaluators.html
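
The placeholder rules above can be checked before deployment with a small helper. The following is a sketch only: the class and method names are hypothetical and not part of the CDK library, and the regular expression assumes placeholders are written as a brace-wrapped identifier, as in the examples in these remarks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class InstructionPlaceholders
{
    // Reference-input placeholders named in the remarks above; evaluators that
    // use them are only compatible with on-demand evaluation.
    private static readonly HashSet<string> ReferenceInputs = new HashSet<string>
    {
        "expected_tool_trajectory", "assertions", "expected_response"
    };

    // Returns the distinct {name} placeholders found in the instruction text.
    public static IReadOnlyList<string> Find(string instructions) =>
        Regex.Matches(instructions, @"\{([A-Za-z_][A-Za-z0-9_]*)\}")
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .Distinct()
             .ToList();

    // True when no reference-input placeholder appears in the instructions.
    public static bool IsOnlineCompatible(string instructions) =>
        !Find(instructions).Any(p => ReferenceInputs.Contains(p));

    public static void Main()
    {
        var online = "Given {context} and {available_tools}, rate the response.";
        var onDemand = "Compare the answer with {expected_response}.";
        Console.WriteLine(IsOnlineCompatible(online));   // True
        Console.WriteLine(IsOnlineCompatible(onDemand)); // False
    }
}
```

This only inspects the text; it does not verify that the placeholders match the chosen evaluation level, which the linked guide describes.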

ModelId

The identifier of the Amazon Bedrock model to use for evaluation.

string ModelId { get; }
Property Value

string

Remarks

Accepts standard model IDs (e.g., 'anthropic.claude-sonnet-4-6') and cross-region inference profile IDs with region prefixes (e.g., 'us.anthropic.claude-sonnet-4-6', 'eu.anthropic.claude-sonnet-4-6').
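
To illustrate the two ID shapes described above, here is a rough sketch that tells a region-prefixed inference profile ID apart from a standard model ID. The helper is hypothetical, and the prefix list contains only the two prefixes shown in the examples on this page; treat it as an assumption rather than the complete set.

```csharp
using System;

public static class ModelIdShape
{
    // Region prefixes taken from the examples above; this list is an
    // illustrative assumption and is not exhaustive.
    private static readonly string[] KnownPrefixes = { "us", "eu" };

    // True when the segment before the first '.' is one of the known region prefixes.
    public static bool HasRegionPrefix(string modelId)
    {
        int dot = modelId.IndexOf('.');
        if (dot <= 0) return false;
        string head = modelId.Substring(0, dot);
        return Array.IndexOf(KnownPrefixes, head) >= 0;
    }

    public static void Main()
    {
        Console.WriteLine(HasRegionPrefix("anthropic.claude-sonnet-4-6"));    // False
        Console.WriteLine(HasRegionPrefix("us.anthropic.claude-sonnet-4-6")); // True
    }
}
```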

RatingScale

The rating scale that defines how the evaluator should score agent performance.

EvaluatorRatingScale RatingScale { get; }
Property Value

EvaluatorRatingScale

