
Running inference on a model

The following examples show how to run inference on a model with the InvokeModel operation and, with Python, how to run inference with streaming by using the InvokeModelWithResponseStream operation.

Note

The AWS CLI does not support streaming.

For information about the parameters each model supports, see Inference parameters for foundation models. For information about writing prompts, see Prompt engineering guidelines.

AWS CLI

The following example shows how to generate text with the AWS CLI, using the prompt "story of two dogs" and the Anthropic Claude V2 model. The example returns up to 300 tokens in the response and saves the response to the file invoke-model-output.txt:

aws bedrock-runtime invoke-model \
    --model-id anthropic.claude-v2 \
    --body "{\"prompt\": \"\n\nHuman: story of two dogs\n\nAssistant:\", \"max_tokens_to_sample\" : 300}" \
    --cli-binary-format raw-in-base64-out \
    invoke-model-output.txt
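The file invoke-model-output.txt contains the JSON response body that the model returns. As a minimal sketch (assuming the Anthropic Claude response format shown in the Python examples later on this page), you could read the saved file and print only the generated text like this:

import json

# Read the JSON response that the CLI wrote to disk and print only the
# generated text from the 'completion' field.
with open('invoke-model-output.txt') as f:
    response_body = json.load(f)

print(response_body.get('completion'))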

The following example shows how to call the Llama 2 Chat 13B model.

aws bedrock-runtime invoke-model \
    --region us-east-1 \
    --model-id meta.llama2-13b-chat-v1 \
    --body "{\"prompt\": \"What is the average lifespan of a Llama?\", \"max_gen_len\" : 128, \"temperature\": 0.1, \"top_p\": 0.9}" \
    --cli-binary-format raw-in-base64-out \
    invoke-model-output.txt
Python (Boto)

The following example shows how to generate text with Python, using the prompt "explain black holes to 8th graders" and the Anthropic Claude V2 model:

import boto3
import json

brt = boto3.client(service_name='bedrock-runtime')

body = json.dumps({
    "prompt": "\n\nHuman: explain black holes to 8th graders\n\nAssistant:",
    "max_tokens_to_sample": 300,
    "temperature": 0.1,
    "top_p": 0.9,
})

modelId = 'anthropic.claude-v2'
accept = 'application/json'
contentType = 'application/json'

response = brt.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)

response_body = json.loads(response.get('body').read())

# text
print(response_body.get('completion'))
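Calls to invoke_model can fail, for example when the account does not have access to the model or the request body is malformed. A minimal sketch of wrapping the call in error handling with botocore's ClientError (the exception class Boto3 raises for service errors) might look like this:

import boto3
import json
from botocore.exceptions import ClientError

brt = boto3.client(service_name='bedrock-runtime')

body = json.dumps({
    "prompt": "\n\nHuman: explain black holes to 8th graders\n\nAssistant:",
    "max_tokens_to_sample": 300,
})

try:
    response = brt.invoke_model(
        body=body,
        modelId='anthropic.claude-v2',
        accept='application/json',
        contentType='application/json',
    )
    response_body = json.loads(response.get('body').read())
    print(response_body.get('completion'))
except ClientError as err:
    # Service-side failures (for example, access or validation errors) surface
    # as ClientError; the error code and message identify the cause.
    print(f"Request failed: {err.response['Error']['Code']}: {err.response['Error']['Message']}")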

The following example shows how to generate streaming text with Python, using the prompt "write an essay for living on mars in 1000 words" and the Anthropic Claude V2 model:

import boto3
import json

brt = boto3.client(service_name='bedrock-runtime')

body = json.dumps({
    'prompt': '\n\nHuman: write an essay for living on mars in 1000 words\n\nAssistant:',
    'max_tokens_to_sample': 100
})

response = brt.invoke_model_with_response_stream(
    modelId='anthropic.claude-v2',
    body=body
)

stream = response.get('body')
if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            print(json.loads(chunk.get('bytes').decode()))
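The loop above prints each streamed chunk as a full JSON object. If you only want the generated text as it arrives, you can print the completion field of each chunk instead. A short sketch, assuming the same Anthropic Claude chunk format used above:

import boto3
import json

brt = boto3.client(service_name='bedrock-runtime')

body = json.dumps({
    'prompt': '\n\nHuman: write an essay for living on mars in 1000 words\n\nAssistant:',
    'max_tokens_to_sample': 100
})

response = brt.invoke_model_with_response_stream(
    modelId='anthropic.claude-v2',
    body=body
)

# Print only the generated text from each streamed chunk, without a newline,
# so the output reads as one continuous completion.
for event in response.get('body'):
    chunk = event.get('chunk')
    if chunk:
        chunk_json = json.loads(chunk.get('bytes').decode())
        print(chunk_json.get('completion'), end='')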

Base model inference examples

The following Python (Boto) examples show how you can perform inference with the InvokeModel operation on different Amazon Bedrock base models.
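All of the examples follow the same pattern: serialize a model-specific JSON body, call invoke_model, and parse the JSON response. Purely as an illustration (the helper name and structure below are not part of the Amazon Bedrock API), that shared pattern could be factored into a small function:

import boto3
import json

brt = boto3.client(service_name='bedrock-runtime')

def invoke(model_id, request_body):
    """Hypothetical helper: send a model-specific request body to InvokeModel
    and return the parsed JSON response body."""
    response = brt.invoke_model(
        body=json.dumps(request_body),
        modelId=model_id,
        accept='application/json',
        contentType='application/json',
    )
    return json.loads(response.get('body').read())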

AI21 Labs Jurassic-2

This example shows how to call the AI21 Labs Jurassic-2 Mid model.

import boto3
import json

brt = boto3.client(service_name='bedrock-runtime')

body = json.dumps({
    "prompt": "Translate to spanish: 'Amazon Bedrock is the easiest way to build and scale generative AI applications with base models (FMs)'.",
    "maxTokens": 200,
    "temperature": 0.5,
    "topP": 0.5
})

modelId = 'ai21.j2-mid-v1'
accept = 'application/json'
contentType = 'application/json'

response = brt.invoke_model(
    body=body,
    modelId=modelId,
    accept=accept,
    contentType=contentType
)

response_body = json.loads(response.get('body').read())

# text
print(response_body.get('completions')[0].get('data').get('text'))

Cohere Command

This example shows how to call the Cohere Command model.

import boto3
import json

brt = boto3.client(service_name='bedrock-runtime')

body = json.dumps({
    "prompt": "How do you tie a tie?",
    "max_tokens": 200,
    "temperature": 0.5,
    "p": 0.5
})

modelId = 'cohere.command-text-v14'
accept = 'application/json'
contentType = 'application/json'

response = brt.invoke_model(
    body=body,
    modelId=modelId,
    accept=accept,
    contentType=contentType
)

response_body = json.loads(response.get('body').read())

# text
print(response_body.get('generations')[0].get('text'))

Meta Llama 2

This example shows how to call the Llama 2 Chat 13B model.

import boto3
import json

bedrock = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')

body = json.dumps({
    "prompt": "What is the average lifespan of a Llama?",
    "max_gen_len": 128,
    "temperature": 0.1,
    "top_p": 0.9,
})

modelId = 'meta.llama2-13b-chat-v1'
accept = 'application/json'
contentType = 'application/json'

response = bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)

response_body = json.loads(response.get('body').read())

print(response_body)
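The example prints the whole response object. If you only want the generated text, you can print the generation field instead; a short sketch, continuing from the example above and assuming the standard Llama 2 response format on Amazon Bedrock:

# Continuing from the example above: print only the generated text,
# which the Llama 2 response body returns in the 'generation' field.
print(response_body.get('generation'))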

Stability AI Stable Diffusion XL

This example shows how to call the Stability AI Stable Diffusion XL model.

import boto3
import json

brt = boto3.client(service_name='bedrock-runtime')

prompt_data = "A photograph of a dog on the top of a mountain covered in snow."

body = json.dumps({
    "text_prompts": [
        {"text": prompt_data}
    ],
    "cfg_scale": 10,
    "seed": 20,
    "steps": 50
})

modelId = "stability.stable-diffusion-xl-v0"
accept = "application/json"
contentType = "application/json"

response = brt.invoke_model(
    body=body,
    modelId=modelId,
    accept=accept,
    contentType=contentType
)

response_body = json.loads(response.get("body").read())

print(response_body['result'])
print(f'{response_body.get("artifacts")[0].get("base64")[0:80]}...')
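The model returns the generated image as a base64-encoded string in the first entry of the artifacts list. As a minimal sketch, continuing from the example above (the output file name image.png is only an illustration), you could decode the data and save it to a local file:

import base64

# Continuing from the example above: decode the base64 image data in the
# first artifact and write it to a local PNG file.
image_bytes = base64.b64decode(response_body.get("artifacts")[0].get("base64"))
with open("image.png", "wb") as f:
    f.write(image_bytes)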