Meta모델Llama

이 섹션에서는 다음 모델을 사용하기 위한 추론 파라미터와 코드 예제를 제공합니다. Meta

Llama 2
Llama 2 Chat
Llama 3 Instruct

InvokeModelOR InvokeModelWithResponseStream(스트리밍) 을 사용하여 Meta Llama 모델에 추론 요청을 합니다. 사용하려는 모델의 모델 ID가 필요합니다. 모델 ID를 가져오려면 을 참조하십시오아마존 베드락 모델 ID.

주제

요청 및 응답
예제 코드

요청 및 응답

요청 본문은 요청 body 필드에서 InvokeModel또는 으로 전달됩니다 InvokeModelWithResponseStream.

Request

Llama 2 ChatLlama 2, 및 Llama 3 Instruct 모델에는 다음과 같은 추론 매개변수가 있습니다.


{
    "prompt": string,
    "temperature": float,
    "top_p": float,
    "max_gen_len": int
}

다음은 필수 파라미터입니다.

prompt — (필수) 모델에 전달하려는 프롬프트입니다.

프롬프트 형식에 대한 자세한 내용은 MetaLlama 2및 을 참조하십시오 MetaLlama 3.

다음 파라미터는 선택 사항입니다.

온도 — 반응의 임의성을 줄이려면 더 낮은 값을 사용합니다.

기본값	최소	Maximum
0.5	0	1

top_p — 가능성이 낮은 옵션을 무시하려면 더 낮은 값을 사용합니다. 비활성화하려면 0 또는 1.0으로 설정합니다.

기본값	최소	Maximum
0.9	0	1

max_gen_len — 생성된 응답에 사용할 최대 토큰 수를 지정합니다. 생성된 텍스트가 max_gen_len을 초과하면 모델은 응답을 잘라냅니다.

기본값	최소	Maximum
512	1	2048

Response

Llama 2 Chat,Llama 2, Llama 3 Instruct 모델은 텍스트 완성 추론 호출에 대해 다음 필드를 반환합니다.


{
    "generation": "\n\n<response>",
    "prompt_token_count": int,
    "generation_token_count": int,
    "stop_reason" : string
}

각 필드에 대한 자세한 내용은 아래에 나와 있습니다.

생성 — 생성된 텍스트.
prompt_token_count — 프롬프트에 있는 토큰 수입니다.
generation_token_count — 생성된 텍스트의 토큰 수입니다.
stop_reason — 응답에서 텍스트 생성이 중단된 이유입니다. 가능한 값은 다음과 같습니다.
- 중지 - 모델이 입력 프롬프트에 대한 텍스트 생성을 완료했습니다.
- 길이 - 생성된 텍스트의 토큰 길이가 InvokeModel(InvokeModelWithResponseStream, 출력을 스트리밍하는 경우)에 대한 호출에서 max_gen_len의 값을 초과합니다. 응답은 max_gen_len 토큰 수로 잘립니다. max_gen_len의 값을 높인 후에 다시 시도합니다.

예제 코드

이 예제에서는 MetaLlama 2 Chat13B 모델을 호출하는 방법을 보여줍니다.


# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate text with Meta Llama 2 Chat (on demand).
"""

import json
import logging
import boto3


from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_text(model_id, body):
    """
    Generate an image using Meta Llama 2 Chat on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        response (JSON): The text that the model generated, token information, and the
        reason the model stopped generating text.
    """

    logger.info("Generating image with Meta Llama 2 Chat model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )

    response_body = json.loads(response.get('body').read())

    return response_body


def main():
    """
    Entrypoint for Meta Llama 2 Chat example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = 'meta.llama2-13b-chat-v1'
    prompt = """What is the average lifespan of a Llama?"""
    max_gen_len = 128
    temperature = 0.1
    top_p = 0.9


    # Create request body.
    body = json.dumps({
        "prompt": prompt,
        "max_gen_len": max_gen_len,
        "temperature": temperature,
        "top_p": top_p
    })


    try:

        response = generate_text(model_id, body)

        print(f"Generated Text: {response['generation']}")
        print(f"Prompt Token count:  {response['prompt_token_count']}")
        print(f"Generation Token count:  {response['generation_token_count']}")
        print(f"Stop reason:  {response['stop_reason']}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occured: " +
              format(message))

    else:
        print(
            f"Finished generating text with Meta Llama 2 Chat model {model_id}.")


if __name__ == "__main__":
    main()

javascript가 브라우저에서 비활성화되거나 사용이 불가합니다.

AWS 설명서를 사용하려면 Javascript가 활성화되어야 합니다. 지침을 보려면 브라우저의 도움말 페이지를 참조하십시오.

문서 규칙

CohereCommand R및 Command R+ 모델

Mistral AI모델