Mistral AI models - Amazon Bedrock

Mistral AI models

You make inference requests to Mistral AI models with InvokeModel or InvokeModelWithResponseStream (streaming). You need the model ID for the model that you want to use. To get the model ID, see Amazon Bedrock model IDs.

Mistral AI models are available under the Apache 2.0 license. For more information about using Mistral AI models, see the Mistral AI documentation.

Supported models

You can use the following Mistral AI models.

  • Mistral 7B Instruct

  • Mixtral 8X7B Instruct

  • Mistral Large

Request and Response


The Mistral AI models have the following inference parameters.

{
    "prompt": string,
    "max_tokens": int,
    "stop": [string],
    "temperature": float,
    "top_p": float,
    "top_k": int
}

The following are required parameters.

  • prompt – (Required) The prompt that you want to pass to the model, as shown in the following example.

    <s>[INST] What is your favourite condiment? [/INST]

    The following example shows how to format a multi-turn prompt.

    <s>[INST] What is your favourite condiment? [/INST] Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> [INST] Do you have mayonnaise recipes? [/INST]

    Text for the user role is inside the [INST]...[/INST] tokens; text outside them belongs to the assistant role. The <s> (beginning of string) and </s> (end of string) tokens mark the beginning and end of a string. For information about sending a chat prompt in the correct format, see Chat template in the Mistral AI documentation.
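The prompt layout described above can be assembled programmatically. The following is a minimal sketch, not an official helper: the function name build_mistral_prompt and the list-of-pairs input shape are assumptions made for illustration, and the spacing simply mirrors the multi-turn example shown earlier.

```python
def build_mistral_prompt(turns):
    """Build a Mistral instruct prompt from (user_text, assistant_text) pairs.

    Hypothetical helper: pass None as assistant_text for the final turn,
    which the model is expected to answer.
    """
    segments = []
    for user_text, assistant_text in turns:
        # User text goes inside the [INST]...[/INST] tokens.
        segment = f"[INST] {user_text} [/INST]"
        if assistant_text is not None:
            # A completed assistant reply is closed with the </s> token.
            segment += f" {assistant_text}</s>"
        segments.append(segment)
    # The whole prompt starts with the <s> token.
    return "<s>" + " ".join(segments)


if __name__ == "__main__":
    print(build_mistral_prompt([
        ("What is your favourite condiment?", "Fresh lemon juice."),
        ("Do you have mayonnaise recipes?", None),
    ]))
```

Keeping the turns as data and rendering the tokens in one place makes it harder to forget a closing </s> when a conversation grows.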

The following are optional parameters.

  • max_tokens – Specify the maximum number of tokens to use in the generated response. The model truncates the response once the generated text exceeds max_tokens.

    Model                    Default    Minimum    Maximum
    Mistral 7B Instruct      512                   8,192
    Mixtral 8X7B Instruct    512                   4,096
    Mistral Large            8,192                 8,192

  • stop – A list of stop sequences that, if generated by the model, stop the model from generating further output.

    Default Minimum Maximum




  • temperature – Controls the randomness of predictions made by the model. For more information, see Inference parameters.

    Model                    Default    Minimum    Maximum
    Mistral 7B Instruct      0.5
    Mixtral 8X7B Instruct    0.5
    Mistral Large            0.7



  • top_p – Controls the diversity of text that the model generates by setting the percentage of most-likely candidates that the model considers for the next token. For more information, see Inference parameters.

    Model                    Default    Minimum    Maximum
    Mistral 7B Instruct      0.9
    Mixtral 8X7B Instruct    0.9
    Mistral Large            1



  • top_k – Controls the number of most-likely candidates that the model considers for the next token. For more information, see Inference parameters.

    Model                    Default    Minimum    Maximum
    Mistral 7B Instruct      50
    Mixtral 8X7B Instruct    50
    Mistral Large            disabled
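As a rough illustration of how the optional parameters fit into a request, the sketch below builds a JSON body and leaves out any parameter that is not set, so the model falls back to its default. The helper name build_request_body and the MAX_TOKENS_LIMIT dictionary are hypothetical; the limits are the larger max_tokens values listed above, which this sketch assumes are the per-model maximums.

```python
import json

# Assumed per-model maximums for max_tokens, taken from the values listed above.
MAX_TOKENS_LIMIT = {
    "Mistral 7B Instruct": 8192,
    "Mixtral 8X7B Instruct": 4096,
    "Mistral Large": 8192,
}


def build_request_body(prompt, model_name, max_tokens=None, stop=None,
                       temperature=None, top_p=None, top_k=None):
    """Return a JSON request body containing only the parameters that were set."""
    limit = MAX_TOKENS_LIMIT[model_name]
    if max_tokens is not None and max_tokens > limit:
        raise ValueError(f"max_tokens {max_tokens} exceeds {limit} for {model_name}")

    body = {"prompt": prompt}  # prompt is the only required parameter
    optional = {
        "max_tokens": max_tokens,
        "stop": stop,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
    }
    # Omitted parameters are left to the model's defaults.
    body.update({name: value for name, value in optional.items() if value is not None})
    return json.dumps(body)


if __name__ == "__main__":
    print(build_request_body("<s>[INST] Hello [/INST]", "Mistral 7B Instruct",
                             max_tokens=400, temperature=0.7))
```

Validating max_tokens locally is optional; it only surfaces an out-of-range value before the request is sent.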




The response body from a call to InvokeModel has the following format:

{
    "outputs": [
        {
            "text": string,
            "stop_reason": string
        }
    ]
}

The response body has the following fields:

  • outputs – A list of outputs from the model. Each output has the following fields.

    • text – The text that the model generated.

    • stop_reason – The reason why the response stopped generating text. Possible values are:

      • stop – The model has finished generating text for the input prompt. The model stops either because it has no more content to generate or because it generated one of the stop sequences that you define in the stop request parameter.

      • length – The number of tokens in the generated text exceeds the value of max_tokens in the call to InvokeModel (InvokeModelWithResponseStream, if you are streaming output). The response is truncated to max_tokens tokens.
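A caller will often want to know whether a response was cut short. The following sketch inspects the stop_reason field described above on an already-decoded response body; the function name summarize_outputs and the returned dictionary shape are assumptions for illustration only.

```python
def summarize_outputs(response_body):
    """Return the text of each output and whether it was truncated.

    response_body is the decoded JSON dictionary with an "outputs" list.
    """
    results = []
    for output in response_body.get("outputs", []):
        results.append({
            "text": output["text"],
            # "length" means generation stopped because max_tokens was reached.
            "truncated": output["stop_reason"] == "length",
        })
    return results


if __name__ == "__main__":
    sample = {"outputs": [{"text": "ls *.txt", "stop_reason": "length"}]}
    for item in summarize_outputs(sample):
        if item["truncated"]:
            print("Response was truncated; consider raising max_tokens.")
        print(item["text"])
```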

Code example

This example shows how to call the Mistral 7B Instruct model.

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate text using a Mistral AI model.
"""
import json
import logging
import boto3

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_text(model_id, body):
    """
    Generate text using a Mistral AI model.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        JSON: The response from the model.
    """

    logger.info("Generating text with Mistral AI model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    response = bedrock.invoke_model(
        body=body,
        modelId=model_id
    )

    logger.info("Successfully generated text with Mistral AI model %s", model_id)

    return response


def main():
    """
    Entrypoint for Mistral AI example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    try:
        model_id = 'mistral.mistral-7b-instruct-v0:2'

        prompt = """<s>[INST] In Bash, how do I list all text files in the current directory
          (excluding subdirectories) that have been modified in the last month? [/INST]"""

        body = json.dumps({
            "prompt": prompt,
            "max_tokens": 400,
            "temperature": 0.7,
            "top_p": 0.7,
            "top_k": 50
        })

        response = generate_text(model_id=model_id,
                                 body=body)

        # The response body is a stream; read and decode it as JSON.
        response_body = json.loads(response.get('body').read())

        outputs = response_body.get('outputs')

        for index, output in enumerate(outputs):

            print(f"Output {index + 1}\n----------")
            print(f"Text:\n{output['text']}\n")
            print(f"Stop reason: {output['stop_reason']}\n")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " +
              format(message))

    else:
        print(f"Finished generating text with Mistral AI model {model_id}.")


if __name__ == "__main__":
    main()
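For the streaming operation mentioned at the top of this page, a similar call can be sketched with invoke_model_with_response_stream. This is an illustrative sketch rather than an official sample: it assumes that each streamed event carries a chunk whose bytes decode to JSON with the same outputs structure as the non-streaming response, and the helper names iter_stream_text and stream_text are hypothetical.

```python
import json


def iter_stream_text(events):
    """Yield text fragments from a sequence of streamed response events.

    Assumes each event looks like {"chunk": {"bytes": b"<json>"}} and that
    the JSON payload contains an "outputs" list, as in the response format above.
    """
    for event in events:
        chunk = event.get("chunk")
        if not chunk:
            continue  # skip events that carry no chunk payload
        payload = json.loads(chunk["bytes"])
        for output in payload.get("outputs", []):
            text = output.get("text")
            if text:
                yield text


def stream_text(model_id, body):
    """Call the model with streaming and print text as it arrives."""
    # Imported here so the parsing helper above can be used without AWS access.
    import boto3

    bedrock = boto3.client(service_name="bedrock-runtime")
    response = bedrock.invoke_model_with_response_stream(body=body, modelId=model_id)
    for text in iter_stream_text(response["body"]):
        print(text, end="", flush=True)
    print()
```

Calling stream_text with the same model_id and body used in the example above would print the generated text incrementally instead of waiting for the full response.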