Meta Llama models - Amazon Bedrock


Meta Llama models

This section provides inference parameters and code examples for using the following models from Meta.

  • Llama 2

  • Llama 2 Chat

  • Llama 3 Instruct

You make inference requests to Meta Llama models with InvokeModel or InvokeModelWithResponseStream (streaming). You need the model ID for the model that you want to use. To get a model ID, see Amazon Bedrock model IDs.

Request and response

You pass the request body in the body field of a request to InvokeModel or InvokeModelWithResponseStream.

Request

The Llama 2 Chat, Llama 2, and Llama 3 Instruct models have the following inference parameters.

{ "prompt": string, "temperature": float, "top_p": float, "max_gen_len": int }

The following are required parameters.

  • prompt – (Required) The prompt that you want to pass to the model. With Llama 2 Chat, format the conversation with the following template (a small formatting sketch in Python follows this list).

    <s>[INST] <<SYS>> {{ system_prompt }} <</SYS>> {{ user_message }} [/INST]

    Instructions between the <<SYS>> tokens provide a system prompt for the model. The following is an example prompt that includes a system prompt.

    <s>[INST] <<SYS>> You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. <</SYS>> There's a llama in my garden What should I do? [/INST]

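If you assemble prompts in code, a small helper can apply this template. The following is a minimal sketch in Python; the function name format_llama2_chat_prompt and its arguments are illustrative and not part of the Amazon Bedrock API.

def format_llama2_chat_prompt(system_prompt, user_message):
    """Wrap a system prompt and a user message in the Llama 2 Chat template."""
    # The <<SYS>> ... <</SYS>> block carries the system prompt;
    # [INST] ... [/INST] wraps the user turn.
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        "<</SYS>>\n"
        f"{user_message} [/INST]"
    )

prompt = format_llama2_chat_prompt(
    "You are a helpful, respectful and honest assistant.",
    "There's a llama in my garden What should I do?",
)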

The following are optional parameters. A sketch that builds a request body with all three appears after the list.

  • temperature – Use a lower value to decrease randomness in the response.

    Default: 0.5, Minimum: 0, Maximum: 1

  • top_p – Use a lower value to ignore less probable options. Set to 0 or 1.0 to disable.

    Default: 0.9, Minimum: 0, Maximum: 1

  • max_gen_len – Specify the maximum number of tokens to use in the generated response. The model truncates the response once the generated text exceeds max_gen_len.

    Default: 512, Minimum: 1, Maximum: 2048
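To tie the parameters above together, the following minimal sketch builds a request body that sets all three optional parameters to values inside their documented ranges; the prompt and values shown are only examples.

import json

# Example request body for a Meta Llama model. All optional parameters are set
# explicitly and stay within the documented ranges.
body = json.dumps({
    "prompt": "<s>[INST] Tell me a fact about llamas. [/INST]",
    "temperature": 0.5,   # default 0.5, range 0 to 1
    "top_p": 0.9,         # default 0.9, range 0 to 1
    "max_gen_len": 512    # default 512, range 1 to 2048
})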

Response

The Llama 2 Chat, Llama 2, and Llama 3 Instruct models return the following fields for a text completion inference call.

{ "generation": "\n\n<response>", "prompt_token_count": int, "generation_token_count": int, "stop_reason" : string }

More information about each field is provided below.

  • generation – The generated text.

  • prompt_token_count – The number of tokens in the prompt.

  • generation_token_count – The number of tokens in the generated text.

  • stop_reason – The reason why the response stopped generating text. Possible values are:

    • stop – The model has finished generating text for the input prompt.

    • length – The token length of the generated text exceeds the value of max_gen_len in the call to InvokeModel (or InvokeModelWithResponseStream, if you are streaming output). The response is truncated to max_gen_len tokens. Consider increasing the value of max_gen_len and trying again (see the sketch after this list).
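The following minimal sketch reads these fields from a parsed response body and retries once with a larger max_gen_len when the response was truncated. The helper name and the doubling strategy are illustrative choices, not part of the Amazon Bedrock API.

import json

def invoke_and_handle_truncation(bedrock, model_id, request):
    """Invoke the model and retry once with a larger max_gen_len if truncated."""
    response = bedrock.invoke_model(body=json.dumps(request), modelId=model_id)
    response_body = json.loads(response["body"].read())

    if response_body["stop_reason"] == "length":
        # The generated text was cut off at max_gen_len tokens. Retry with a
        # larger limit, capped at the documented maximum of 2048.
        request["max_gen_len"] = min(request.get("max_gen_len", 512) * 2, 2048)
        response = bedrock.invoke_model(body=json.dumps(request), modelId=model_id)
        response_body = json.loads(response["body"].read())

    print(response_body["generation"])
    print(response_body["prompt_token_count"], response_body["generation_token_count"])
    return response_body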

Example code

This example shows how to call the Meta Llama 2 Chat 13B model.

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate text with Meta Llama 2 Chat (on demand).
"""
import json
import logging
import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_text(model_id, body):
    """
    Generate text using Meta Llama 2 Chat on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        response (JSON): The text that the model generated, token information,
        and the reason the model stopped generating text.
    """
    logger.info("Generating text with Meta Llama 2 Chat model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    response = bedrock.invoke_model(
        body=body, modelId=model_id)

    response_body = json.loads(response.get('body').read())

    return response_body


def main():
    """
    Entrypoint for Meta Llama 2 Chat example.
    """
    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = "meta.llama2-13b-chat-v1"

    prompt = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. <</SYS>>
There's a llama in my garden What should I do? [/INST]"""

    max_gen_len = 128
    temperature = 0.1
    top_p = 0.9

    # Create request body.
    body = json.dumps({
        "prompt": prompt,
        "max_gen_len": max_gen_len,
        "temperature": temperature,
        "top_p": top_p
    })

    try:
        response = generate_text(model_id, body)
        print(f"Generated Text: {response['generation']}")
        print(f"Prompt Token count: {response['prompt_token_count']}")
        print(f"Generation Token count: {response['generation_token_count']}")
        print(f"Stop reason: {response['stop_reason']}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " + format(message))

    else:
        print(
            f"Finished generating text with Meta Llama 2 Chat model {model_id}.")


if __name__ == "__main__":
    main()
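The previous example uses InvokeModel. If you want to stream the output instead, the following minimal sketch uses InvokeModelWithResponseStream; it assumes that each streamed chunk for Meta Llama models carries partial text in a generation field and prints the pieces as they arrive.

import json
import boto3

bedrock = boto3.client(service_name="bedrock-runtime")

body = json.dumps({
    "prompt": "<s>[INST] Tell me a fact about llamas. [/INST]",
    "max_gen_len": 128
})

response = bedrock.invoke_model_with_response_stream(
    body=body, modelId="meta.llama2-13b-chat-v1")

# Each event in the stream contains a chunk with a JSON payload.
for event in response.get("body"):
    chunk = event.get("chunk")
    if chunk:
        payload = json.loads(chunk.get("bytes"))
        # Partial generated text arrives in the "generation" field.
        print(payload.get("generation", ""), end="")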