Submit a single prompt with the InvokeModel API operations

Run inference on a model through the API by sending an InvokeModel or InvokeModelWithResponseStream request. You can specify the media type for the request and response bodies in the contentType and accept fields. If you don't specify a value, both fields default to application/json.
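As a minimal sketch, assuming the AWS SDK for Python (Boto3) and using anthropic.claude-v2 as an illustrative model ID, a request that omits both fields relies on those defaults:

import boto3
import json

brt = boto3.client(service_name='bedrock-runtime')

# contentType and accept are omitted here, so both default to application/json.
response = brt.invoke_model(
    modelId='anthropic.claude-v2',
    body=json.dumps({"prompt": "\n\nHuman: Hello\n\nAssistant:", "max_tokens_to_sample": 50})
)

print(json.loads(response['body'].read()))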

Streaming is supported for all text output models except AI21 Labs Jurassic-2 models. To check if a model supports streaming, send a GetFoundationModel or ListFoundationModels request and check the value in the responseStreamingSupported field.
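For example, the following sketch, again assuming Boto3 and an illustrative model ID, checks the field for one model:

import boto3

# GetFoundationModel is exposed by the control-plane client (bedrock),
# not the runtime client (bedrock-runtime).
bedrock = boto3.client(service_name='bedrock')

details = bedrock.get_foundation_model(modelIdentifier='anthropic.claude-v2')

# True if the model can be used with InvokeModelWithResponseStream.
print(details['modelDetails']['responseStreamingSupported'])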

Specify the following fields, depending on the model that you use.

  1. modelId – Use the ID or Amazon Resource Name (ARN) of a model or throughput. The method for finding the ID or ARN depends on the type of model or throughput that you use (a Python sketch of these lookup calls follows this list):

    • Base model – Do one of the following:

      • Send a ListFoundationModels request and find the modelId or modelArn of the model to use in the response.

      • In the Amazon Bedrock console, select a model in Providers and find the model ID in the API request example.

    • Inference profile – Do one of the following:

      • Send a ListInferenceProfiles request and find the inferenceProfileArn of the model to use in the response.

      • In the Amazon Bedrock console, select Cross-region inference from the left navigation pane and find the ID or ARN of the inference profile in the Inference profiles section.

    • Provisioned Throughput – If you've created Provisioned Throughput for a base or custom model, do one of the following:

      • Send a ListProvisionedModelThroughputs request and find the provisionedModelArn of the model to use in the response.

      • In the Amazon Bedrock console, select Provisioned Throughput from the left navigation pane and select a Provisioned Throughput in the Provisioned throughput section. Then, find the ID or ARN of the Provisioned Throughput in the Model details section.

    • Custom model – Purchase Provisioned Throughput for the custom model (for more information, see Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock) and find the model ID or ARN of the provisioned model.

  2. body – Each base model has its own inference parameters that you set in the body field. The inference parameters for a custom or provisioned model depend on the base model from which it was created. For more information, see Inference request parameters and response fields for foundation models.
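The following sketch shows the lookup calls described in the list above, assuming the AWS SDK for Python (Boto3); the printed fields match the ListInferenceProfiles and ListProvisionedModelThroughputs response shapes:

import boto3

# The control-plane client (bedrock) exposes the lookup operations.
bedrock = boto3.client(service_name='bedrock')

# Inference profiles: pass the inferenceProfileArn as the modelId.
for profile in bedrock.list_inference_profiles()['inferenceProfileSummaries']:
    print(profile['inferenceProfileName'], profile['inferenceProfileArn'])

# Provisioned Throughput: pass the provisionedModelArn as the modelId.
for pt in bedrock.list_provisioned_model_throughputs()['provisionedModelSummaries']:
    print(pt['provisionedModelName'], pt['provisionedModelArn'])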

Invoke model code examples

The following examples show how to run inference with the InvokeModel API. For examples with different models, see the inference parameter reference for the desired model (Inference request parameters and response fields for foundation models).

CLI

The following example saves the generated response to the prompt "story of two dogs" to a file named invoke-model-output.txt.

aws bedrock-runtime invoke-model \
    --model-id anthropic.claude-v2 \
    --body '{"prompt": "\n\nHuman: story of two dogs\n\nAssistant:", "max_tokens_to_sample": 300}' \
    --cli-binary-format raw-in-base64-out \
    invoke-model-output.txt

Python

The following example returns a generated response to the prompt "explain black holes to 8th graders".

import boto3
import json

brt = boto3.client(service_name='bedrock-runtime')

# Anthropic Claude models use the Human/Assistant prompt format and the
# max_tokens_to_sample, temperature, and top_p inference parameters.
body = json.dumps({
    "prompt": "\n\nHuman: explain black holes to 8th graders\n\nAssistant:",
    "max_tokens_to_sample": 300,
    "temperature": 0.1,
    "top_p": 0.9,
})

modelId = 'anthropic.claude-v2'
accept = 'application/json'
contentType = 'application/json'

response = brt.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)

# The response body is a stream; read it and parse it as JSON.
response_body = json.loads(response.get('body').read())

# The generated text is returned in the completion field.
print(response_body.get('completion'))

Invoke model with streaming code example

Note

The AWS CLI does not support streaming.

The following example shows how to use the InvokeModelWithResponseStream API to generate streaming text with Python for the prompt "write an essay for living on mars in 1000 words".

import boto3
import json

brt = boto3.client(service_name='bedrock-runtime')

body = json.dumps({
    'prompt': '\n\nHuman: write an essay for living on mars in 1000 words\n\nAssistant:',
    'max_tokens_to_sample': 4000
})

response = brt.invoke_model_with_response_stream(
    modelId='anthropic.claude-v2',
    body=body
)

# The response body is an event stream; each event carries a chunk of the output.
stream = response.get('body')
if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            print(json.loads(chunk.get('bytes').decode()))
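With the Anthropic Claude model used here, each decoded chunk is a JSON object whose completion field holds the next piece of generated text, so concatenating the completion values across chunks reassembles the full response.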