AI21 Labs Jamba models
This section provides inference parameters and a code example for using AI21 Labs Jamba models.
Required fields
The AI21 Labs Jamba models support the following required fields (a request-body sketch follows the list):
- Messages (messages) – The previous messages in this chat, from oldest (index 0) to newest. Must have at least one user or assistant message in the list. Include both user inputs and system responses. Maximum total size for the list is about 256K tokens. Each message includes the following members:
  - Role (role) – The role of the message author. One of the following values:
    - User (user) – Input provided by the user. Any instructions given here that conflict with instructions given in the system prompt take precedence over the system prompt instructions.
    - Assistant (assistant) – Response generated by the model.
    - System (system) – Initial instructions provided to the system to provide general guidance on the tone and voice of the generated message. An initial system message is optional but recommended to provide guidance on the tone of the chat. For example, "You are a helpful chatbot with a background in earth sciences and a charming French accent."
  - Content (content) – The content of the message.
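Putting the required fields together, here is a minimal sketch of a request body built in Python. The prompt text is illustrative and mirrors the examples later in this section.

import json

# Each message carries a role (system, user, or assistant) and its content.
# The system message is optional but sets the tone for the chat.
messages = [
    {
        "role": "system",
        "content": "You are a helpful chatbot with a background in earth sciences and a charming French accent."
    },
    {
        "role": "user",
        "content": "What are the main causes of earthquakes?"
    }
]

# The messages list is the only required field in the request body.
body = json.dumps({"messages": messages})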
Inference parameters
The AI21 Labs Jamba models support the following inference parameters.
Randomness and Diversity
The AI21 Labs Jamba models support the following parameters to control randomness and diversity in the response.
- Temperature (temperature) – How much variation to provide in each answer. Setting this value to 0 guarantees the same response to the same question every time. Setting a higher value encourages more variation. Modifies the distribution from which tokens are sampled. Default: 1.0, Range: 0.0 – 2.0
- Top P (top_p) – Limit the pool of next tokens in each step to the top N percentile of possible tokens, where 1.0 means the pool of all possible tokens and 0.01 means the pool of only the most likely next tokens. (Both parameters appear in the sketch after this list.)
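As a minimal sketch, both parameters go at the top level of the request body alongside messages, as in the Jamba 1.5 example later in this section. The values here are arbitrary.

import boto3
import json

bedrock = boto3.client('bedrock-runtime', 'us-east-1')

response = bedrock.invoke_model(
    modelId='ai21.jamba-instruct-v1:0',
    body=json.dumps({
        'messages': [{'role': 'user', 'content': 'Name three famous volcanoes.'}],
        'temperature': 0.0,  # 0 makes the answer reproducible for the same question
        'top_p': 0.9         # sample only from the most likely 90% of tokens
    })
)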
Length
The AI21 Labs Jamba models support the following parameters to control the length of the generated response.
- Max completion length (max_tokens) – The maximum number of tokens to allow for each generated response message. Typically the best way to limit output length is by providing a length limit in the system prompt (for example, "limit your answers to three sentences"). Default: 4096, Range: 0 – 4096
- Stop sequences (stop) – End the message when the model generates one of these strings. The stop sequence is not included in the generated message. Each sequence can be up to 64K characters long and can contain newlines as \n characters. Examples:
  - Single stop string with a word and a period: "monkeys."
  - Multiple stop strings and a newline: ["cat", "dog", " .", "####", "\n"]
- Number of responses (n) – How many chat responses to generate. Note that n must be 1 for streaming responses. If n is set larger than 1, setting temperature=0 will always fail because all answers are guaranteed to be duplicates. Default: 1, Range: 1 – 16. (A sketch combining these parameters follows this list.)
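A minimal sketch combining the three length parameters in one request; the stop strings and limits here are illustrative.

import boto3
import json

bedrock = boto3.client('bedrock-runtime', 'us-east-1')

response = bedrock.invoke_model(
    modelId='ai21.jamba-instruct-v1:0',
    body=json.dumps({
        'messages': [{'role': 'user', 'content': 'List three types of rock.'}],
        'max_tokens': 256,        # hard cap on tokens per generated message
        'stop': ['###', '\n\n'],  # generation ends when either string appears
        'n': 2                    # two alternative answers; must be 1 when streaming
    })
)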
Repetitions
The AI21 Labs Jamba models support the following parameters to control repetition in the generated response.
- Frequency Penalty (frequency_penalty) – Reduce the frequency of repeated words within a single response message by increasing this number. This penalty gradually increases the more times a word appears during response generation. Setting this to 2.0 will produce a string with few, if any, repeated words.
- Presence Penalty (presence_penalty) – Reduce the frequency of repeated words within a single message by increasing this number. Unlike frequency penalty, presence penalty is the same no matter how many times a word appears. (Both penalties appear in the sketch after this list.)
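As a sketch, and assuming both penalties sit at the top level of the request body like the other inference parameters, a request that discourages repetition might look like this (the values are arbitrary):

import boto3
import json

bedrock = boto3.client('bedrock-runtime', 'us-east-1')

response = bedrock.invoke_model(
    modelId='ai21.jamba-instruct-v1:0',
    body=json.dumps({
        'messages': [{'role': 'user', 'content': 'Describe plate tectonics.'}],
        'frequency_penalty': 1.0,  # penalty grows each time a word repeats
        'presence_penalty': 0.5    # flat penalty once a word has appeared at all
    })
)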
Model invocation request body field
When you make an InvokeModel or InvokeModelWithResponseStream call using an AI21 Labs Jamba model, fill the body field with a JSON object that conforms to the one below. Enter the prompt in the messages field.
{ "messages": [ { "role":"system", // Non-printing contextual information for the model "content":"You are a helpful history teacher. You are kind and you respond with helpful content in a professional manner. Limit your answers to three sentences. Your listener is a high school student." }, { "role":"user", // The question we want answered. "content":"Who was the first emperor of rome?" } ], "n":1 // Limit response to one answer }
Model invocation response body field
For information about the format of the body field in the response, see https://docs.ai21.com/reference/jamba-instruct-api#response-details.
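As a non-authoritative sketch of reading that response (the field names choices, message, and content are assumed from the AI21 reference above and are not defined in this document):

import boto3
import json

bedrock = boto3.client('bedrock-runtime', 'us-east-1')

response = bedrock.invoke_model(
    modelId='ai21.jamba-instruct-v1:0',
    body=json.dumps({'messages': [{'role': 'user', 'content': 'Who was the first emperor of Rome?'}]})
)

result = json.loads(response['body'].read())

# Assumed layout: a list of choices, each holding an assistant message
# with the generated text. Verify against the AI21 response reference.
for choice in result.get('choices', []):
    print(choice['message']['content'])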
Code example
This example shows how to call the AI21 Labs Jamba-Instruct model.
invoke_model
import boto3
import json

# Create a Bedrock Runtime client in the US East (N. Virginia) Region.
bedrock = boto3.client('bedrock-runtime', 'us-east-1')

response = bedrock.invoke_model(
    modelId='ai21.jamba-instruct-v1:0',
    body=json.dumps({
        'messages': [
            {
                'role': 'user',
                'content': 'which llm are you?'
            }
        ],
    })
)

# The response body is a stream; read it before parsing the JSON.
print(json.dumps(json.loads(response['body'].read()), indent=4))
converse
import boto3
import json

bedrock = boto3.client('bedrock-runtime', 'us-east-1')

response = bedrock.converse(
    modelId='ai21.jamba-instruct-v1:0',
    messages=[
        {
            'role': 'user',
            'content': [
                {
                    'text': 'which llm are you?'
                }
            ]
        }
    ]
)

# Converse returns an already-parsed dictionary, so there is no body stream to read.
print(json.dumps(response, indent=4))
Code example for Jamba 1.5 Mini
This example shows how to call the AI21 Labs Jamba 1.5 Mini model.
invoke_model
POST https://bedrock-runtime.us-east-1.amazonaws.com/model/ai21.jamba-1-5-mini-v1:0/invoke HTTP/1.1

{
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful chatbot with a background in earth sciences and a charming French accent."
        },
        {
            "role": "user",
            "content": "What are the main causes of earthquakes?"
        }
    ],
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.9,
    "stop": ["###"],
    "n": 1
}
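For comparison, a sketch of the same request issued through boto3 rather than raw HTTP (the client setup and Region are assumptions; the body mirrors the request above):

import boto3
import json

bedrock = boto3.client('bedrock-runtime', 'us-east-1')

response = bedrock.invoke_model(
    modelId='ai21.jamba-1-5-mini-v1:0',
    body=json.dumps({
        'messages': [
            {
                'role': 'system',
                'content': 'You are a helpful chatbot with a background in earth sciences and a charming French accent.'
            },
            {
                'role': 'user',
                'content': 'What are the main causes of earthquakes?'
            }
        ],
        'max_tokens': 512,
        'temperature': 0.7,
        'top_p': 0.9,
        'stop': ['###'],
        'n': 1
    })
)

print(json.dumps(json.loads(response['body'].read()), indent=4))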