Use the API to invoke a model with a single prompt

Run inference on a model through the API by sending an InvokeModel or InvokeModelWithResponseStream request. You can specify the media type for the request and response bodies in the contentType and accept fields. The default value for both fields is application/json if you don't specify a value.

Streaming is supported for all text output models except AI21 Labs Jurassic-2 models. To check if a model supports streaming, send a GetFoundationModel or ListFoundationModels request and check the value in the responseStreamingSupported field.
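With the AWS SDK for Python (Boto3), a minimal sketch of that check might look like the following. The helper function name is ours, and the live call is shown commented out because it requires AWS credentials; the model ID is illustrative.

```python
def supports_streaming(response: dict) -> bool:
    """Check a GetFoundationModel response for streaming support.

    Boto3 nests the model's attributes under the 'modelDetails' key.
    """
    return bool(response.get('modelDetails', {}).get('responseStreamingSupported'))


# Example call (requires AWS credentials):
# import boto3
# bedrock = boto3.client(service_name='bedrock')  # control plane, not 'bedrock-runtime'
# details = bedrock.get_foundation_model(modelIdentifier='anthropic.claude-v2')
# print(supports_streaming(details))
```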

Specify the following fields, depending on the model that you use.

  1. modelId – Use either the model ID or its ARN. The method for finding the modelId or modelArn depends on the type of model you use:

    • Base model – Send a ListFoundationModels request and find the modelId or modelArn of the model to use in the response.

    • Custom model – Purchase Provisioned Throughput for the custom model (for more information, see Provisioned Throughput for Amazon Bedrock) and find the model ID or ARN of the provisioned model.

    • Provisioned model – If you have purchased Provisioned Throughput for a base or custom model, do one of the following.

      • Send a ListProvisionedModelThroughputs request and find the provisionedModelArn of the model to use in the response.

      • In the console, select a model in Provisioned Throughput and find the model ARN in the Model details section.

  2. body – Each base model has its own inference parameters that you set in the body field. The inference parameters for a custom or provisioned model depend on the base model from which it was created. For more information, see Inference parameters for foundation models.
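To illustrate how the body varies by provider, the following sketch builds two request bodies: the Anthropic Claude text-completion shape used in the examples in this section, and the Amazon Titan text shape, which uses inputText and a textGenerationConfig object instead. The prompt text and parameter values are illustrative; always confirm field names against the inference parameter reference for your model.

```python
import json

# Anthropic Claude (text completions), matching the examples in this section:
claude_body = json.dumps({
    "prompt": "\n\nHuman: Hello\n\nAssistant:",
    "max_tokens_to_sample": 300,
})

# Amazon Titan text models express the same ideas with different field names:
titan_body = json.dumps({
    "inputText": "Hello",
    "textGenerationConfig": {"maxTokenCount": 300, "temperature": 0.5},
})
```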

Invoke model code examples

The following examples show how to run inference with the InvokeModel API. For examples with different models, see the inference parameter reference for the desired model (Inference parameters for foundation models).


The following example saves the generated response to the prompt story of two dogs to a file called invoke-model-output.txt.

aws bedrock-runtime invoke-model \
    --model-id anthropic.claude-v2 \
    --body '{"prompt": "\n\nHuman: story of two dogs\n\nAssistant:", "max_tokens_to_sample": 300}' \
    --cli-binary-format raw-in-base64-out \
    invoke-model-output.txt
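The CLI writes the raw JSON response to invoke-model-output.txt. The following sketch (assuming the Claude response shape, where the generated text is in the completion field; the helper name is ours) extracts just the text from the saved file:

```python
import json

def read_completion(path: str) -> str:
    """Return the 'completion' text from a saved Claude InvokeModel response."""
    with open(path) as f:
        return json.load(f).get('completion', '')

# Example: print(read_completion('invoke-model-output.txt'))
```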

The following example returns a generated response to the prompt explain black holes to 8th graders.

import boto3
import json

brt = boto3.client(service_name='bedrock-runtime')

body = json.dumps({
    "prompt": "\n\nHuman: explain black holes to 8th graders\n\nAssistant:",
    "max_tokens_to_sample": 300,
    "temperature": 0.1,
    "top_p": 0.9,
})

modelId = 'anthropic.claude-v2'
accept = 'application/json'
contentType = 'application/json'

response = brt.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)

response_body = json.loads(response.get('body').read())

# text
print(response_body.get('completion'))

Invoke model with streaming code example


The AWS CLI does not support streaming.

The following example shows how to use the InvokeModelWithResponseStream API to generate streaming text with Python using the prompt write an essay for living on mars in 1000 words.

import boto3
import json

brt = boto3.client(service_name='bedrock-runtime')

body = json.dumps({
    'prompt': '\n\nHuman: write an essay for living on mars in 1000 words\n\nAssistant:',
    'max_tokens_to_sample': 4000
})

response = brt.invoke_model_with_response_stream(
    modelId='anthropic.claude-v2',
    body=body
)

stream = response.get('body')
if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            print(json.loads(chunk.get('bytes').decode()))
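Each streamed event's chunk bytes decode to a JSON fragment; for Claude, each fragment carries a completion field with a piece of the generated text. A sketch (the helper name is ours) that joins the fragments into the full response:

```python
import json

def collect_completion(stream) -> str:
    """Concatenate the 'completion' text from a Claude response stream.

    Each event's chunk bytes decode to a JSON object with a 'completion'
    field (Anthropic Claude's streaming shape; other providers differ).
    """
    parts = []
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            payload = json.loads(chunk['bytes'].decode())
            parts.append(payload.get('completion', ''))
    return ''.join(parts)
```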