
Mistral AI models

You make inference requests to Mistral AI models with InvokeModel or InvokeModelWithResponseStream (streaming). You need the model ID for the model that you want to use. To get the model ID, see Amazon Bedrock model IDs.

Mistral AI models are available under the Apache 2.0 license. For more information about using Mistral AI models, see the Mistral AI documentation.
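If you aren't sure which model ID to use, one option is to list the foundation models available in your Region. The following is a minimal sketch that uses the Amazon Bedrock control-plane client and filters by provider; the byProvider value shown here is an assumption, so confirm the resulting IDs against Amazon Bedrock model IDs.

import boto3

# Sketch: list available Mistral AI foundation models and their model IDs.
# The byProvider filter value is an assumption; confirm the resulting IDs
# against the Amazon Bedrock model IDs documentation for your Region.
bedrock = boto3.client(service_name='bedrock')

response = bedrock.list_foundation_models(byProvider='mistral')
for model in response['modelSummaries']:
    print(model['modelId'])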

Supported models

You can use the following Mistral AI models.

  • Mistral 7B Instruct

  • Mixtral 8X7B Instruct

  • Mistral Large

Request and Response

Request

The Mistral AI models have the following inference parameters.

{ "prompt": string, "max_tokens" : int, "stop" : [string], "temperature": float, "top_p": float, "top_k": int }

The following are required parameters.

  • prompt – (Required) The prompt that you want to pass to the model, as shown in the following example.

    <s>[INST] What is your favourite condiment? [/INST]

    The following example shows how to format a multi-turn prompt.

    <s>[INST] What is your favourite condiment? [/INST] Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> [INST] Do you have mayonnaise recipes? [/INST]

    Text for the user role is inside the [INST] and [/INST] tokens, and text outside them is the assistant role. The beginning and end of a string are represented by the <s> (beginning of string) and </s> (end of string) tokens. For information about sending a chat prompt in the correct format, see Chat template in the Mistral AI documentation. A minimal helper for building this format is sketched after this list.
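The following is a minimal helper sketch, not part of any Amazon Bedrock or Mistral AI SDK, that builds a multi-turn prompt in this format from a list of prior (user, assistant) turns and a new user message.

def build_mistral_prompt(turns, next_user_message):
    """Build a Mistral instruct prompt from prior (user, assistant) turns.

    Illustrative helper only; assumes every prior turn has both a user
    message and an assistant reply.
    """
    prompt = "<s>"
    for user_message, assistant_reply in turns:
        prompt += f"[INST] {user_message} [/INST] {assistant_reply}</s> "
    prompt += f"[INST] {next_user_message} [/INST]"
    return prompt

# Reproduces the multi-turn example shown above.
print(build_mistral_prompt(
    [("What is your favourite condiment?",
      "Well, I'm quite partial to a good squeeze of fresh lemon juice. "
      "It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!")],
    "Do you have mayonnaise recipes?"))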

The following are optional parameters.

  • max_tokens – Specify the maximum number of tokens to use in the generated response. The model truncates the response once the generated text exceeds max_tokens.

    Model                     Default   Minimum   Maximum
    Mistral 7B Instruct       512       1         8,192
    Mixtral 8X7B Instruct     512       1         4,096
    Mistral Large             8,192     1         8,192

  • stop – A list of stop sequences. If the model generates one of these sequences, it stops generating further output.

    Default   Minimum   Maximum
    0         0         10

  • temperature – Controls the randomness of predictions made by the model. For more information, see Inference parameters.

    Model                     Default   Minimum   Maximum
    Mistral 7B Instruct       0.5       0         1
    Mixtral 8X7B Instruct     0.5       0         1
    Mistral Large             0.7       0         1

  • top_p – Controls the diversity of text that the model generates by setting the percentage of most-likely candidates that the model considers for the next token. For more information, see Inference parameters.

    Model                     Default   Minimum   Maximum
    Mistral 7B Instruct       0.9       0         1
    Mixtral 8X7B Instruct     0.9       0         1
    Mistral Large             1         0         1

  • top_k – Controls the number of most-likely candidates that the model considers for the next token. For more information, see Inference parameters.

    Model                     Default    Minimum   Maximum
    Mistral 7B Instruct       50         1         200
    Mixtral 8X7B Instruct     50         1         200
    Mistral Large             disabled   1         200
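For reference, the following sketch builds a request body that sets every inference parameter described above; the prompt text and parameter values are illustrative only, not recommendations.

import json

# Illustrative request body that sets every Mistral AI inference parameter.
# The prompt text and parameter values are examples, not recommendations.
body = json.dumps({
    "prompt": "<s>[INST] Summarize the plot of Hamlet in two sentences. [/INST]",
    "max_tokens": 200,
    "stop": ["</s>"],
    "temperature": 0.5,
    "top_p": 0.9,
    "top_k": 50
})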

Response

The response body from a call to InvokeModel is the following:

{ "outputs": [ { "text": string, "stop_reason": string } ] }

The response body has the following fields:

  • outputs – A list of outputs from the model. Each output has the following fields.

    • text – The text that the model generated.

    • stop_reason – The reason why the response stopped generating text. Possible values are:

      • stop – The model has finished generating text for the input prompt. The model stops because it has no more content to generate or because it generated one of the stop sequences that you defined in the stop request parameter.

      • length – The number of tokens in the generated text exceeds the value of max_tokens in the call to InvokeModel (or InvokeModelWithResponseStream, if you are streaming output). The response is truncated to max_tokens tokens. A short sketch for detecting this case follows this list.
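For example, after parsing the response body you might inspect stop_reason to detect truncation. This short sketch assumes a response_body variable that holds the parsed JSON response, as in the code example later in this topic.

# Assumes response_body holds the parsed JSON response from InvokeModel.
for output in response_body["outputs"]:
    if output["stop_reason"] == "length":
        # The output was truncated; consider retrying with a larger max_tokens.
        print("Warning: output truncated at max_tokens.")
    print(output["text"])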

Code example

This example shows how to call the Mistral 7B Instruct model.

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate text using a Mistral AI model.
"""
import json
import logging

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_text(model_id, body):
    """
    Generate text using a Mistral AI model.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        JSON: The response from the model.
    """

    logger.info("Generating text with Mistral AI model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    response = bedrock.invoke_model(
        body=body,
        modelId=model_id
    )

    logger.info("Successfully generated text with Mistral AI model %s", model_id)

    return response


def main():
    """
    Entrypoint for Mistral AI example.
    """
    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    try:
        model_id = 'mistral.mistral-7b-instruct-v0:2'

        prompt = """<s>[INST] In Bash, how do I list all text files in the current directory (excluding subdirectories) that have been modified in the last month? [/INST]"""

        body = json.dumps({
            "prompt": prompt,
            "max_tokens": 400,
            "temperature": 0.7,
            "top_p": 0.7,
            "top_k": 50
        })

        response = generate_text(model_id=model_id, body=body)
        response_body = json.loads(response.get('body').read())
        outputs = response_body.get('outputs')

        for index, output in enumerate(outputs):
            print(f"Output {index + 1}\n----------")
            print(f"Text:\n{output['text']}\n")
            print(f"Stop reason: {output['stop_reason']}\n")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " + format(message))

    else:
        print(f"Finished generating text with Mistral AI model {model_id}.")


if __name__ == "__main__":
    main()
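The preceding example uses InvokeModel. The following is a minimal sketch of streaming the same kind of request with InvokeModelWithResponseStream. It assumes that each streamed chunk decodes to JSON with the same outputs structure as the non-streaming response, so verify the chunk format against the current Amazon Bedrock documentation before relying on it.

import json

import boto3

# Sketch: stream output from Mistral 7B Instruct with InvokeModelWithResponseStream.
# Assumes each streamed chunk decodes to JSON with the same "outputs" structure
# as the non-streaming response; verify the chunk format for your model version.
bedrock = boto3.client(service_name='bedrock-runtime')

body = json.dumps({
    "prompt": "<s>[INST] Write a haiku about the ocean. [/INST]",
    "max_tokens": 200
})

response = bedrock.invoke_model_with_response_stream(
    body=body,
    modelId='mistral.mistral-7b-instruct-v0:2'
)

for event in response.get('body'):
    chunk = event.get('chunk')
    if chunk:
        chunk_json = json.loads(chunk.get('bytes').decode())
        for output in chunk_json.get('outputs', []):
            print(output.get('text', ''), end='')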