Endpoint parameters for SageMaker large model inference

You can customize the following parameters to facilitate low-latency large model inference (LMI) with SageMaker:

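Maximum Amazon EBS volume size on the instance (VolumeSizeInGB) - If the model is larger than 30 GB and you are using an instance without a local disk, increase this parameter to be slightly larger than the size of your model.
Health check timeout quota (ContainerStartupHealthCheckTimeoutInSeconds) - If your container is set up correctly and the CloudWatch logs show a health check timeout, increase this quota so that the container has enough time to respond to health checks.
Model download timeout quota (ModelDataDownloadTimeoutInSeconds) - If the model is larger than 40 GB, increase this quota to allow enough time to download the model from Amazon S3 to the instance.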
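
The size-based guidance above can be sketched as a small helper that decides which ProductionVariant keys to override for a given model. This is only an illustration: the function name `lmi_variant_overrides`, the 20 percent headroom, and the 1800-second timeout are assumptions chosen for the example, not AWS recommendations. The health check timeout is left out because the guidance ties it to what the CloudWatch logs show rather than to model size.

```python
import math

# Thresholds taken from the guidance above, in GB.
VOLUME_THRESHOLD_GB = 30
DOWNLOAD_THRESHOLD_GB = 40


def lmi_variant_overrides(model_size_gb, has_local_disk,
                          headroom_percent=20, long_timeout_s=1800):
    """Return the ProductionVariant keys worth overriding for this model.

    headroom_percent and long_timeout_s are illustrative assumptions.
    """
    overrides = {}
    if model_size_gb > VOLUME_THRESHOLD_GB and not has_local_disk:
        # "Slightly larger than the model": add headroom, then round up.
        overrides["VolumeSizeInGB"] = math.ceil(
            model_size_gb * (100 + headroom_percent) / 100
        )
    if model_size_gb > DOWNLOAD_THRESHOLD_GB:
        # Allow more time to copy the model from Amazon S3.
        overrides["ModelDataDownloadTimeoutInSeconds"] = long_timeout_s
    return overrides
```

The returned dictionary can be merged into a production variant definition, for example with `{**base_variant, **lmi_variant_overrides(60, False)}`, where `base_variant` is a hypothetical dictionary holding the variant name, model name, and instance type.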

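The following code snippet shows how to configure these parameters programmatically. Replace the placeholder values in the example (such as aws-region and endpoint-name) with your own information.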


import boto3

aws_region = "aws-region"
sagemaker_client = boto3.client('sagemaker', region_name=aws_region)

# The name of the endpoint. The name must be unique within an AWS Region in your AWS account.
endpoint_name = "endpoint-name"

# Create an endpoint config name.
endpoint_config_name = "endpoint-config-name"

# The name of the model that you want to host.
model_name = "the-name-of-your-model"

instance_type = "instance-type"

sagemaker_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "variant1", # The name of the production variant.
            "ModelName": model_name,
            "InstanceType": instance_type, # Specify the compute instance type.
            "InitialInstanceCount": 1, # Number of instances to launch initially.
            "VolumeSizeInGB": 256, # Specify the size of the Amazon EBS volume.
            "ModelDataDownloadTimeoutInSeconds": 1800, # Specify the model download timeout in seconds.
            "ContainerStartupHealthCheckTimeoutInSeconds": 1800, # Specify the health checkup timeout in seconds
        },
    ],
)

sagemaker_client.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name)

For more information about the keys for ProductionVariants, see ProductionVariant.
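
Because create_endpoint returns before the endpoint is ready, and large models can take a long time to download and start, it can help to block until the endpoint reports InService. The following is a minimal sketch that assumes the boto3 SageMaker client from the snippet above. The helper name `wait_until_in_service` and the polling values (30 seconds between checks, up to 120 checks) are assumptions for illustration; choose values that cover the timeouts you configured.

```python
def wait_until_in_service(sagemaker_client, endpoint_name,
                          delay_s=30, max_attempts=120):
    """Block until the endpoint is InService, then return its status.

    delay_s and max_attempts are illustrative values, not AWS defaults.
    """
    # boto3 provides an "endpoint_in_service" waiter for SageMaker clients.
    waiter = sagemaker_client.get_waiter("endpoint_in_service")
    waiter.wait(
        EndpointName=endpoint_name,
        WaiterConfig={"Delay": delay_s, "MaxAttempts": max_attempts},
    )
    # Confirm the final state once the waiter returns.
    response = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    return response["EndpointStatus"]
```

Calling `wait_until_in_service(sagemaker_client, endpoint_name)` after create_endpoint returns the endpoint status once the waiter succeeds. If the endpoint fails or the attempts run out, the waiter raises an exception, and the CloudWatch logs are the place to check for a health check or download timeout.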

For examples that show how to achieve low-latency inference with large models, see Generative AI Inference Examples on Amazon SageMaker in the aws-samples GitHub repository.
