
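Getting started

This guide shows how to deploy a customized Amazon Nova model to a SageMaker real-time endpoint, configure its inference parameters, and invoke the model for testing.

Prerequisites

Before you deploy an Amazon Nova model on SageMaker inference, make sure the following are in place.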

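AWS account - If you do not have one yet, create an AWS account.
Required IAM permissions - Confirm that the following managed policies are attached to your IAM user or role.
- AmazonSageMakerFullAccess
- AmazonS3FullAccess
Required SDK/CLI versions - The following SDK versions have been tested and validated with Amazon Nova models on SageMaker inference.
- SageMaker Python SDK v3.0.0 or later (sagemaker>=3.0.0) for the resource-based API approach
- Boto3 version 1.35.0 or later (boto3>=1.35.0) for direct API calls. The examples in this guide use this approach.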
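
As a quick check of the version requirements above, the following sketch compares the installed package versions against the listed minimums. The helper names (parse_version, meets_minimum, check_installed) are illustrative and not part of any AWS SDK, and the comparison is deliberately simple: it looks only at the first three numeric components and treats a pre-release such as 3.0.0rc1 as equal to 3.0.0.

```python
from importlib import metadata

# Minimum versions listed in the prerequisites. Only the package for the
# approach you use needs to be installed; this guide uses boto3.
MINIMUM_VERSIONS = {"boto3": "1.35.0", "sagemaker": "3.0.0"}


def parse_version(text):
    """Turn '1.35.0' into (1, 35, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_installed(minimums=MINIMUM_VERSIONS):
    """Return {package: (installed_version_or_None, ok)} for each package."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = (None, False)
            continue
        report[package] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for package, (installed, ok) in check_installed().items():
        state = "OK" if ok else "missing or below minimum"
        print(f"{package}: {installed or 'not installed'} ({state})")
```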

Step 1: Configure AWS credentials

Configure your AWS credentials using one of the following methods.

Option 1: AWS CLI (recommended)


aws configure

When prompted, enter your AWS access key ID, secret access key, and default Region.

Option 2: AWS credentials file

Create or edit ~/.aws/credentials.


[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY

Option 3: Environment variables


export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key

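Note

For more information about AWS credentials, see Configuration and credential file settings.

Initialize the AWS clients

Create a Python script or notebook with the following code to initialize the AWS SDK and verify your credentials.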


import boto3

# AWS Configuration - Update these for your environment
REGION = "us-east-1"  # Supported regions: us-east-1, us-west-2
AWS_ACCOUNT_ID = "YOUR_ACCOUNT_ID"  # Replace with your AWS account ID

# Initialize AWS clients using default credential chain
sagemaker = boto3.client('sagemaker', region_name=REGION)
sts = boto3.client('sts')

# Verify credentials
try:
    identity = sts.get_caller_identity()
    print(f"Successfully authenticated to AWS Account: {identity['Account']}")
    
    if identity['Account'] != AWS_ACCOUNT_ID:
        print(f"Warning: Connected to account {identity['Account']}, expected {AWS_ACCOUNT_ID}")

except Exception as e:
    print(f"Failed to authenticate: {e}")
    print("Please verify your credentials are configured correctly.")

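On successful authentication, the output confirms your AWS account ID.

Step 2: Create a SageMaker execution role

A SageMaker execution role is an IAM role that grants SageMaker permission to access AWS resources on your behalf, such as Amazon S3 buckets for model artifacts and CloudWatch for logging.

Create the execution role

Note

Creating an IAM role requires the iam:CreateRole and iam:AttachRolePolicy permissions. Confirm that your IAM user or role has these permissions before you continue.

The following code creates an IAM role with the permissions needed to deploy a customized Amazon Nova model.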


import json

# Create SageMaker Execution Role
role_name = f"SageMakerInference-ExecutionRole-{AWS_ACCOUNT_ID}"

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole"
        }
    ]
}

iam = boto3.client('iam', region_name=REGION)

# Create the role
role_response = iam.create_role(
    RoleName=role_name,
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description='SageMaker execution role with S3 and SageMaker access'
)

# Attach required policies
iam.attach_role_policy(
    RoleName=role_name,
    PolicyArn='arn:aws:iam::aws:policy/AmazonSageMakerFullAccess'
)

iam.attach_role_policy(
    RoleName=role_name,
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess'
)

SAGEMAKER_EXECUTION_ROLE_ARN = role_response['Role']['Arn']
print(f"Created SageMaker execution role: {SAGEMAKER_EXECUTION_ROLE_ARN}")
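
The role-creation code above fails if it is run a second time, because the role name already exists. The following is a minimal sketch of a rerunnable variant, assuming a boto3 IAM client is passed in as iam. The function name ensure_execution_role is hypothetical. It relies on the IAM client's EntityAlreadyExistsException, and it assumes that re-attaching an already attached managed policy succeeds without error.

```python
import json


def ensure_execution_role(iam, role_name, trust_policy, policy_arns):
    """Create the role if it is missing, otherwise reuse it; return its ARN.

    `iam` is expected to be a boto3 IAM client, e.g. boto3.client("iam").
    """
    try:
        role = iam.create_role(
            RoleName=role_name,
            AssumeRolePolicyDocument=json.dumps(trust_policy),
            Description="SageMaker execution role with S3 and SageMaker access",
        )["Role"]
    except iam.exceptions.EntityAlreadyExistsException:
        # The role exists from an earlier run, so look it up instead of failing.
        role = iam.get_role(RoleName=role_name)["Role"]

    # Assumption: attaching a managed policy that is already attached is a no-op.
    for arn in policy_arns:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    return role["Arn"]


# Usage with the names from this step (requires AWS credentials):
#   SAGEMAKER_EXECUTION_ROLE_ARN = ensure_execution_role(
#       iam, role_name, trust_policy,
#       ["arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
#        "arn:aws:iam::aws:policy/AmazonS3FullAccess"])
```

IAM changes can take a short time to propagate, so if model creation in Step 4 reports that the role cannot be assumed right after the role is created, waiting briefly and retrying is a reasonable first step.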

Use an existing execution role (optional)

If you already have a SageMaker execution role, you can use it instead.


# Replace with your existing role ARN
SAGEMAKER_EXECUTION_ROLE_ARN = "arn:aws:iam::YOUR_ACCOUNT_ID:role/YOUR_EXISTING_ROLE_NAME"

To find existing SageMaker roles in your account:


iam = boto3.client('iam', region_name=REGION)
# list_roles is paginated, so walk every page instead of reading only the first
paginator = iam.get_paginator('list_roles')
for page in paginator.paginate():
    for role in page['Roles']:
        if 'SageMaker' in role['RoleName']:
            print(f"{role['RoleName']}: {role['Arn']}")

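Important

The execution role must have permission to access Amazon S3 and SageMaker resources, and a trust relationship with sagemaker.amazonaws.com.

For more information about SageMaker execution roles, see SageMaker Roles.

Step 3: Configure model parameters

Configure the deployment parameters for your Amazon Nova model. These settings control model behavior, resource allocation, and inference characteristics.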

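Required parameters

IMAGE: The Docker container image URI for the Amazon Nova inference container. AWS provides this image.
CONTEXT_LENGTH: The model context length.
MAX_CONCURRENCY: The maximum number of sequences per iteration. It limits how many individual user requests (prompts) can be processed concurrently in a single batch on the GPU. Range: integer greater than 0.

Optional generation parameters

DEFAULT_TEMPERATURE: Controls randomness during generation. Range: 0.0 to 2.0 (0.0 = deterministic, higher = more random).
DEFAULT_TOP_P: The nucleus sampling threshold for token selection. Range: 1e-10 to 1.0.
DEFAULT_TOP_K: Limits token selection to the K most likely tokens. Range: integer -1 or greater (-1 = no limit).
DEFAULT_MAX_NEW_TOKENS: The maximum number of tokens to generate in the response (that is, the maximum output tokens). Range: integer 1 or greater.
DEFAULT_LOGPROBS: The number of log probabilities to return per token. Range: integer from 1 to 20.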
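
The ranges above can be checked locally before the values are sent to SageMaker. The following sketch is one way to do that, not an official validator: validate_nova_environment is a hypothetical helper, and the positive-integer rule for CONTEXT_LENGTH is an assumption, because this guide does not state a range for it. The function returns the values as strings, which is the form the container environment expects.

```python
def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


# One rule per parameter, taken from the ranges listed above.
RULES = {
    "CONTEXT_LENGTH": lambda v: _is_int(v) and v > 0,  # assumption: no range given
    "MAX_CONCURRENCY": lambda v: _is_int(v) and v > 0,
    "DEFAULT_TEMPERATURE": lambda v: 0.0 <= v <= 2.0,
    "DEFAULT_TOP_P": lambda v: 1e-10 <= v <= 1.0,
    "DEFAULT_TOP_K": lambda v: _is_int(v) and v >= -1,
    "DEFAULT_MAX_NEW_TOKENS": lambda v: _is_int(v) and v >= 1,
    "DEFAULT_LOGPROBS": lambda v: _is_int(v) and 1 <= v <= 20,
}
REQUIRED = {"CONTEXT_LENGTH", "MAX_CONCURRENCY"}


def validate_nova_environment(params):
    """Check numeric values against the documented ranges.

    Returns a dict of strings suitable for the container environment,
    or raises ValueError naming the first problem found.
    """
    missing = REQUIRED - params.keys()
    if missing:
        raise ValueError(f"Missing required parameters: {sorted(missing)}")
    environment = {}
    for name, value in params.items():
        if name not in RULES:
            raise ValueError(f"Unknown parameter: {name}")
        if not RULES[name](value):
            raise ValueError(f"{name}={value!r} is outside the documented range")
        environment[name] = str(value)
    return environment


if __name__ == "__main__":
    print(validate_nova_environment(
        {"CONTEXT_LENGTH": 8000, "MAX_CONCURRENCY": 16, "DEFAULT_TOP_P": 1.0}
    ))
```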

Deployment configuration


# AWS Configuration
REGION = "us-east-1"  # Must match region from Step 1

# ECR Account mapping by region
ECR_ACCOUNT_MAP = {
    "us-east-1": "708977205387",
    "us-west-2": "176779409107"
}

# Container Image - Replace with the image URI provided by your AWS contact
# Two image tags are available (both point to the same image):
IMAGE_LATEST = f"{ECR_ACCOUNT_MAP[REGION]}.dkr.ecr.{REGION}.amazonaws.com/nova-inference-repo:SM-Inference-latest"
IMAGE_VERSIONED = f"{ECR_ACCOUNT_MAP[REGION]}.dkr.ecr.{REGION}.amazonaws.com/nova-inference-repo:v1.0.0"

# Use the versioned tag for production deployments (recommended)
IMAGE = IMAGE_VERSIONED
print(f"IMAGE = {IMAGE}")
print(f"Available tags:")
print(f"  Latest: {IMAGE_LATEST}")
print(f"  Versioned: {IMAGE_VERSIONED}")

# Model Parameters
CONTEXT_LENGTH = "8000"        # Maximum total context length
MAX_CONCURRENCY = "16"         # Maximum concurrent sequences

# Optional: default generation parameters (comment out any you do not want to set)
DEFAULT_TEMPERATURE = "0.0"   # Deterministic output
DEFAULT_TOP_P = "1.0"         # Consider all tokens
# DEFAULT_TOP_K = "50"        # Uncomment to limit to top 50 tokens
# DEFAULT_MAX_NEW_TOKENS = "2048"  # Uncomment to set max output tokens
# DEFAULT_LOGPROBS = "1"      # Uncomment to enable log probabilities

# Build environment variables for the container
environment = {
    'CONTEXT_LENGTH': CONTEXT_LENGTH,
    'MAX_CONCURRENCY': MAX_CONCURRENCY,
}

# Add optional parameters if defined
if 'DEFAULT_TEMPERATURE' in globals():
    environment['DEFAULT_TEMPERATURE'] = DEFAULT_TEMPERATURE
if 'DEFAULT_TOP_P' in globals():
    environment['DEFAULT_TOP_P'] = DEFAULT_TOP_P
if 'DEFAULT_TOP_K' in globals():
    environment['DEFAULT_TOP_K'] = DEFAULT_TOP_K
if 'DEFAULT_MAX_NEW_TOKENS' in globals():
    environment['DEFAULT_MAX_NEW_TOKENS'] = DEFAULT_MAX_NEW_TOKENS
if 'DEFAULT_LOGPROBS' in globals():
    environment['DEFAULT_LOGPROBS'] = DEFAULT_LOGPROBS

print("Environment configuration:")
for key, value in environment.items():
    print(f"  {key}: {value}")

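Configure deployment-specific parameters

Next, configure the parameters specific to your Amazon Nova model deployment, including the model artifact location and the instance type.

Set the deployment identifier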


# Deployment identifier - use a descriptive name for your use case
JOB_NAME = "my-nova-deployment"

Specify the model artifact location

Provide the Amazon S3 URI where your trained Amazon Nova model artifacts are stored. This should be the output location of your model training or fine-tuning job.


# S3 location of your trained Nova model artifacts
# Replace with your model's S3 URI - must end with /
MODEL_S3_LOCATION = "s3://your-bucket-name/path/to/model/artifacts/"
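
Because the URI must use the s3:// scheme and end with a slash, a small local check can catch mistakes before the model is created. The following sketch uses a hypothetical helper, normalize_model_s3_uri. It only inspects the string and does not confirm that the bucket or prefix exists.

```python
def normalize_model_s3_uri(uri):
    """Return the URI with a trailing slash, or raise ValueError if malformed."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"Expected an s3:// URI, got {uri!r}")
    bucket, _, _key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"No bucket name in {uri!r}")
    # The guide requires the location to end with "/".
    return uri if uri.endswith("/") else uri + "/"


if __name__ == "__main__":
    print(normalize_model_s3_uri("s3://your-bucket-name/path/to/model/artifacts"))
```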

Select the model variant and instance type


# Configure model variant and instance type
TESTCASE = {
    "model": "micro",              # Options: micro, lite, lite2
    "instance": "ml.g5.12xlarge"   # Refer to "Supported models and instances" section
}

# Generate resource names
INSTANCE_TYPE = TESTCASE["instance"]
MODEL_NAME = JOB_NAME + "-" + TESTCASE["model"] + "-" + INSTANCE_TYPE.replace(".", "-")
ENDPOINT_CONFIG_NAME = MODEL_NAME + "-Config"
ENDPOINT_NAME = MODEL_NAME + "-Endpoint"

print(f"Model Name: {MODEL_NAME}")
print(f"Endpoint Config: {ENDPOINT_CONFIG_NAME}")
print(f"Endpoint Name: {ENDPOINT_NAME}")

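Naming convention

The code automatically generates consistent names for the AWS resources.

Model name: {JOB_NAME}-{model}-{instance-type}, with the dots in the instance type replaced by hyphens
Endpoint configuration: {MODEL_NAME}-Config
Endpoint name: {MODEL_NAME}-Endpoint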
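
Generated names can grow long once the job name, variant, and instance type are joined. The sketch below builds the same three names and checks them before any API call. The constraints it encodes (alphanumerics and hyphens only, no leading or trailing hyphen, at most 63 characters) are an assumption based on the SageMaker API reference and should be confirmed there. build_resource_names is a hypothetical helper.

```python
import re

# Assumption: SageMaker model, endpoint configuration, and endpoint names
# share this pattern and length limit. Confirm in the SageMaker API reference.
NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")
MAX_NAME_LENGTH = 63


def build_resource_names(job_name, model, instance_type):
    """Build the model, endpoint configuration, and endpoint names."""
    model_name = f"{job_name}-{model}-{instance_type.replace('.', '-')}"
    names = {
        "model": model_name,
        "endpoint_config": f"{model_name}-Config",
        "endpoint": f"{model_name}-Endpoint",
    }
    for kind, name in names.items():
        if len(name) > MAX_NAME_LENGTH:
            raise ValueError(f"{kind} name is {len(name)} characters: {name}")
        if not NAME_PATTERN.fullmatch(name):
            raise ValueError(f"{kind} name has unsupported characters: {name}")
    return names


if __name__ == "__main__":
    for kind, name in build_resource_names(
        "my-nova-deployment", "micro", "ml.g5.12xlarge"
    ).items():
        print(f"{kind}: {name}")
```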

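Step 4: Create the SageMaker model and endpoint configuration

In this step, you create two required resources: a SageMaker model object that references your Amazon Nova model artifacts, and an endpoint configuration that defines how the model is deployed.

SageMaker model: A model object that packages the inference container image, the model artifact location, and the environment configuration. It is a reusable resource that you can deploy to multiple endpoints.

Endpoint configuration: Defines the infrastructure settings for the deployment, including the instance type, instance count, and model variants. This lets you manage deployment settings separately from the model itself.

Create the SageMaker model

The following code creates a SageMaker model that references your Amazon Nova model artifacts.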


try:
    model_response = sagemaker.create_model(
        ModelName=MODEL_NAME,
        PrimaryContainer={
            'Image': IMAGE,
            'ModelDataSource': {
                'S3DataSource': {
                    'S3Uri': MODEL_S3_LOCATION,
                    'S3DataType': 'S3Prefix',
                    'CompressionType': 'None'
                }
            },
            'Environment': environment
        },
        ExecutionRoleArn=SAGEMAKER_EXECUTION_ROLE_ARN,
        EnableNetworkIsolation=True
    )
    print("Model created successfully!")
    print(f"Model ARN: {model_response['ModelArn']}")
    
except sagemaker.exceptions.ClientError as e:
    print(f"Error creating model: {e}")

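Key parameters:

ModelName: A unique identifier for the model
Image: The Docker container image URI for Amazon Nova inference
ModelDataSource: The Amazon S3 location of the model artifacts
Environment: The environment variables configured in Step 3
ExecutionRoleArn: The IAM role from Step 2
EnableNetworkIsolation: Set to True for enhanced security (prevents the container from making outbound network calls)

Create the endpoint configuration

Next, create an endpoint configuration that defines the deployment infrastructure.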


# Create Endpoint Configuration
try:
    production_variant = {
        'VariantName': 'primary',
        'ModelName': MODEL_NAME,
        'InitialInstanceCount': 1,
        'InstanceType': INSTANCE_TYPE,
    }
    
    config_response = sagemaker.create_endpoint_config(
        EndpointConfigName=ENDPOINT_CONFIG_NAME,
        ProductionVariants=[production_variant]
    )
    print("Endpoint configuration created successfully!")
    print(f"Config ARN: {config_response['EndpointConfigArn']}")
    
except sagemaker.exceptions.ClientError as e:
    print(f"Error creating endpoint configuration: {e}")

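Key parameters:

VariantName: An identifier for this model variant (use 'primary' for single-model deployments)
ModelName: References the model created above
InitialInstanceCount: The number of instances to deploy (start with 1 and scale later if needed)
InstanceType: The ML instance type selected in Step 3

Verify resource creation

You can verify that the resources were created successfully.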


# Describe the model
model_info = sagemaker.describe_model(ModelName=MODEL_NAME)
print(f"Model Status: {model_info['ModelName']} created")

# Describe the endpoint configuration
config_info = sagemaker.describe_endpoint_config(EndpointConfigName=ENDPOINT_CONFIG_NAME)
print(f"Endpoint Config Status: {config_info['EndpointConfigName']} created")

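Step 5: Deploy the endpoint

The next step is to deploy your Amazon Nova model by creating a SageMaker real-time endpoint. The endpoint hosts the model and provides a secure HTTPS endpoint for inference requests.

Endpoint creation typically takes 15 to 30 minutes, because AWS provisions the infrastructure, downloads the model artifacts, and initializes the inference container.

Create the endpoint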


import time

try:
    endpoint_response = sagemaker.create_endpoint(
        EndpointName=ENDPOINT_NAME,
        EndpointConfigName=ENDPOINT_CONFIG_NAME
    )
    print("Endpoint creation initiated successfully!")
    print(f"Endpoint ARN: {endpoint_response['EndpointArn']}")
except Exception as e:
    print(f"Error creating endpoint: {e}")

Monitor endpoint creation

The following code polls the endpoint status until the deployment completes.


# Monitor endpoint creation progress
print("Waiting for endpoint creation to complete...")
print("This typically takes 15-30 minutes...\n")

while True:
    try:
        response = sagemaker.describe_endpoint(EndpointName=ENDPOINT_NAME)
        status = response['EndpointStatus']
        
        if status == 'Creating':
            print(f"⏳ Status: {status} - Provisioning infrastructure and loading model...")
        elif status == 'InService':
            print(f"✅ Status: {status}")
            print("\nEndpoint creation completed successfully!")
            print(f"Endpoint Name: {ENDPOINT_NAME}")
            print(f"Endpoint ARN: {response['EndpointArn']}")
            break
        elif status == 'Failed':
            print(f"❌ Status: {status}")
            print(f"Failure Reason: {response.get('FailureReason', 'Unknown')}")
            print("\nFull response:")
            print(response)
            break
        else:
            print(f"Status: {status}")
        
    except Exception as e:
        print(f"Error checking endpoint status: {e}")
        break
    
    time.sleep(30)  # Check every 30 seconds
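
The polling loop above runs until the endpoint reaches a final state, which can block a script indefinitely if the status never changes. The following sketch adds an upper bound and raises on failure instead of printing. wait_for_endpoint is a hypothetical helper; the sleep and clock parameters exist only so the function can be exercised without real delays.

```python
import time


def wait_for_endpoint(client, endpoint_name, timeout_s=3600, poll_s=30,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until the endpoint is InService; raise on failure or timeout.

    `client` is expected to be a boto3 SageMaker client.
    """
    deadline = clock() + timeout_s
    while True:
        response = client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return response
        if status == "Failed":
            reason = response.get("FailureReason", "Unknown")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        if clock() >= deadline:
            raise TimeoutError(
                f"Endpoint {endpoint_name} still {status} after {timeout_s}s"
            )
        sleep(poll_s)


# Usage (requires AWS credentials and the names from earlier steps):
#   info = wait_for_endpoint(sagemaker, ENDPOINT_NAME, timeout_s=45 * 60)
#
# boto3 also provides a built-in waiter that can replace a hand-written loop:
#   sagemaker.get_waiter("endpoint_in_service").wait(EndpointName=ENDPOINT_NAME)
```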

Verify that the endpoint is ready

Once the endpoint is InService, you can check its configuration.


# Get detailed endpoint information
endpoint_info = sagemaker.describe_endpoint(EndpointName=ENDPOINT_NAME)

print("\n=== Endpoint Details ===")
print(f"Endpoint Name: {endpoint_info['EndpointName']}")
print(f"Endpoint ARN: {endpoint_info['EndpointArn']}")
print(f"Status: {endpoint_info['EndpointStatus']}")
print(f"Creation Time: {endpoint_info['CreationTime']}")
print(f"Last Modified: {endpoint_info['LastModifiedTime']}")

# Get endpoint config for instance type details
endpoint_config_name = endpoint_info['EndpointConfigName']
endpoint_config = sagemaker.describe_endpoint_config(EndpointConfigName=endpoint_config_name)

# Display production variant details
for variant in endpoint_info['ProductionVariants']:
    print(f"\nProduction Variant: {variant['VariantName']}")
    print(f"  Current Instance Count: {variant['CurrentInstanceCount']}")
    print(f"  Desired Instance Count: {variant['DesiredInstanceCount']}")
    # Get instance type from endpoint config
    for config_variant in endpoint_config['ProductionVariants']:
        if config_variant['VariantName'] == variant['VariantName']:
            print(f"  Instance Type: {config_variant['InstanceType']}")
            break

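Troubleshooting endpoint creation failures

Common failure reasons:

Insufficient capacity: The requested instance type is not available in your Region
- Resolution: Try a different instance type or request a quota increase
IAM permissions: The execution role lacks the required permissions
- Resolution: Confirm that the role can access the Amazon S3 model artifacts and has the required SageMaker permissions
Model artifacts not found: The Amazon S3 URI is incorrect or inaccessible
- Resolution: Verify the Amazon S3 URI, check the bucket permissions, and confirm that the bucket is in the correct Region
Resource limits: The account limits for endpoints or instances have been exceeded
- Resolution: Request a service quota increase through Service Quotas or AWS Support

Note

If you need to delete a failed endpoint and start over: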


sagemaker.delete_endpoint(EndpointName=ENDPOINT_NAME)

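Step 6: Invoke the endpoint

Once the endpoint is InService, you can send inference requests to generate predictions from your Amazon Nova model. SageMaker supports synchronous endpoints (real time, in streaming or non-streaming mode) and asynchronous endpoints (Amazon S3 based, for batch processing).

Set up the runtime client

Create a SageMaker Runtime client with appropriate timeout settings.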


import json
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# Configure client with appropriate timeouts
config = Config(
    read_timeout=120,      # Maximum time to wait for response
    connect_timeout=10,    # Maximum time to establish connection
    retries={'max_attempts': 3}  # Number of retry attempts
)

# Create SageMaker Runtime client
runtime_client = boto3.client('sagemaker-runtime', config=config, region_name=REGION)

Create a general-purpose inference function

The following function handles both streaming and non-streaming requests.


def invoke_nova_endpoint(request_body):
    """
    Invoke Nova endpoint with automatic streaming detection.
    
    Args:
        request_body (dict): Request payload containing prompt and parameters
    
    Returns:
        dict: Response from the model (for non-streaming requests)
        None: For streaming requests (prints output directly)
    """
    body = json.dumps(request_body)
    is_streaming = request_body.get("stream", False)
    
    try:
        print(f"Invoking endpoint ({'streaming' if is_streaming else 'non-streaming'})...")
        
        if is_streaming:
            response = runtime_client.invoke_endpoint_with_response_stream(
                EndpointName=ENDPOINT_NAME,
                ContentType='application/json',
                Body=body
            )
            
            event_stream = response['Body']
            for event in event_stream:
                if 'PayloadPart' in event:
                    chunk = event['PayloadPart']
                    if 'Bytes' in chunk:
                        data = chunk['Bytes'].decode()
                        print("Chunk:", data)
        else:
            # Non-streaming inference
            response = runtime_client.invoke_endpoint(
                EndpointName=ENDPOINT_NAME,
                ContentType='application/json',
                Accept='application/json',
                Body=body
            )
            
            response_body = response['Body'].read().decode('utf-8')
            result = json.loads(response_body)
            print("✅ Response received successfully")
            return result
    
    except ClientError as e:
        error_code = e.response['Error']['Code']
        error_message = e.response['Error']['Message']
        print(f"❌ AWS Error: {error_code} - {error_message}")
    except Exception as e:
        print(f"❌ Unexpected error: {str(e)}")
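
In the streaming branch above, each chunk is decoded on its own. A multi-byte UTF-8 character that happens to be split across two chunks would then raise a decoding error. The following sketch shows one way to avoid that with an incremental decoder from the standard library. iter_stream_text is a hypothetical helper, and it makes no assumption about the format of the text inside the chunks.

```python
import codecs


def iter_stream_text(event_stream):
    """Yield decoded text from PayloadPart events, tolerating split characters."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for event in event_stream:
        part = event.get("PayloadPart")
        if not part or "Bytes" not in part:
            continue  # skip events that carry no payload bytes
        text = decoder.decode(part["Bytes"])
        if text:
            yield text
    # Flush the decoder; this raises UnicodeDecodeError if the stream
    # ended in the middle of a character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


if __name__ == "__main__":
    # The two bytes of "é" arrive in different chunks.
    events = [
        {"PayloadPart": {"Bytes": b"caf\xc3"}},
        {"PayloadPart": {"Bytes": b"\xa9 ok"}},
    ]
    print("".join(iter_stream_text(events)))
```

With a live response, the same generator can be fed response['Body'] from invoke_endpoint_with_response_stream in place of the sample list.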

Example 1: Non-streaming chat completion

Use the chat format for conversational interactions.


# Non-streaming chat request
chat_request = {
    "messages": [
        {"role": "user", "content": "Hello! How are you?"}
    ],
    "max_tokens": 100,
    "max_completion_tokens": 100,  # Alternative to max_tokens
    "stream": False,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
    "logprobs": True,
    "top_logprobs": 3,
    "allowed_token_ids": None,  # List of allowed token IDs
    "truncate_prompt_tokens": None,  # Truncate prompt to this many tokens
    "stream_options": None
}

response = invoke_nova_endpoint(chat_request)

Example 2: Simple text completion

Use the completion format for simple text generation.


# Simple completion request
completion_request = {
    "prompt": "The capital of France is",
    "max_tokens": 50,
    "stream": False,
    "temperature": 0.0,
    "top_p": 1.0,
    "top_k": -1,  # -1 means no limit
    "logprobs": 3,  # Number of log probabilities to return
    "allowed_token_ids": None,  # List of allowed token IDs
    "truncate_prompt_tokens": None,  # Truncate prompt to this many tokens
    "stream_options": None
}

response = invoke_nova_endpoint(completion_request)
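
This guide does not show the shape of the response body. The request fields resemble the widely used chat and completion formats, so the sketch below assumes, without confirmation from this guide, that the response carries a choices list with either message.content (chat) or text (completion). extract_generated_text is a hypothetical helper that returns None when the shape is different, which also covers the case where invoke_nova_endpoint returned None after an error. Print the raw response first to confirm the actual structure.

```python
def extract_generated_text(result):
    """Return the first choice's text, or None if the shape is not recognized.

    Assumption: the response looks like {"choices": [{"message": {"content": ...}}]}
    for chat requests or {"choices": [{"text": ...}]} for completion requests.
    """
    if not isinstance(result, dict):
        return None
    choices = result.get("choices") or []
    if not choices or not isinstance(choices[0], dict):
        return None
    first = choices[0]
    message = first.get("message")
    if isinstance(message, dict) and "content" in message:
        return message["content"]  # chat-style response
    return first.get("text")       # completion-style response, or None


if __name__ == "__main__":
    sample = {"choices": [{"text": " Paris."}]}  # illustrative data, not real output
    print(extract_generated_text(sample))
```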

Example 3: Streaming chat completion


# Streaming chat request
streaming_request = {
    "messages": [
        {"role": "user", "content": "Tell me a short story about a robot"}
    ],
    "max_tokens": 200,
    "stream": True,
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 40,
    "logprobs": True,
    "top_logprobs": 2,
    "stream_options": {"include_usage": True}
}

invoke_nova_endpoint(streaming_request)

Example 4: Multimodal chat completion

Use the multimodal format for image and text inputs.


# Multimodal chat request (if supported by your model)
multimodal_request = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
            ]
        }
    ],
    "max_tokens": 150,
    "temperature": 0.3,
    "top_p": 0.8,
    "stream": False
}

response = invoke_nova_endpoint(multimodal_request)
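
The image in the request above is passed as a base64 data URL. The following sketch shows one way to build such a URL from raw bytes or from a local file. Both helper names are hypothetical, and the MIME type for a file is guessed from its extension only.

```python
import base64
import mimetypes
from pathlib import Path


def image_data_url(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_file_data_url(path):
    """Read a local image file and encode it as a data URL."""
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path}")
    return image_data_url(Path(path).read_bytes(), mime_type)


if __name__ == "__main__":
    # Placeholder bytes stand in for real image content.
    print(image_data_url(b"abc"))
```

Base64 encoding increases the size of the data by about one third, and real-time endpoints limit the request payload size, so large images may need to be resized before they are sent.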

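Step 7: Clean up resources (optional)

To avoid unnecessary charges, delete the AWS resources that you created during this tutorial. SageMaker endpoints incur charges while they are running, even when you are not actively sending inference requests.

Important

Deleting resources is permanent and cannot be undone. Confirm that you no longer need these resources before you continue.

Delete the endpoint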


import boto3

# Initialize SageMaker client
sagemaker = boto3.client('sagemaker', region_name=REGION)

try:
    print("Deleting endpoint...")
    sagemaker.delete_endpoint(EndpointName=ENDPOINT_NAME)
    print(f"✅ Endpoint '{ENDPOINT_NAME}' deletion initiated")
    print("Charges will stop once deletion completes (typically 2-5 minutes)")
except Exception as e:
    print(f"❌ Error deleting endpoint: {e}")

Note

Endpoint deletion is asynchronous. You can monitor the deletion status.


import time

print("Monitoring endpoint deletion...")
while True:
    try:
        response = sagemaker.describe_endpoint(EndpointName=ENDPOINT_NAME)
        status = response['EndpointStatus']
        print(f"Status: {status}")
        time.sleep(10)
    except sagemaker.exceptions.ClientError as e:
        if e.response['Error']['Code'] == 'ValidationException':
            print("✅ Endpoint successfully deleted")
            break
        else:
            print(f"Error: {e}")
            break

Delete the endpoint configuration

After the endpoint is deleted, remove the endpoint configuration.


try:
    print("Deleting endpoint configuration...")
    sagemaker.delete_endpoint_config(EndpointConfigName=ENDPOINT_CONFIG_NAME)
    print(f"✅ Endpoint configuration '{ENDPOINT_CONFIG_NAME}' deleted")
except Exception as e:
    print(f"❌ Error deleting endpoint configuration: {e}")

Delete the model

Remove the SageMaker model object.


try:
    print("Deleting model...")
    sagemaker.delete_model(ModelName=MODEL_NAME)
    print(f"✅ Model '{MODEL_NAME}' deleted")
except Exception as e:
    print(f"❌ Error deleting model: {e}")
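
The steps above remove the SageMaker resources but leave the IAM execution role from Step 2 in place. If you created that role only for this tutorial, the following sketch shows one way to remove it, assuming a boto3 IAM client is passed in as iam. delete_execution_role is a hypothetical helper. It handles only attached managed policies, which is all that Step 2 adds; a role with inline policies or instance profiles needs those removed first. Do not run it against a role that other workloads use.

```python
def delete_execution_role(iam, role_name):
    """Detach the role's managed policies, delete the role, return the ARNs."""
    attached = iam.list_attached_role_policies(RoleName=role_name)["AttachedPolicies"]
    for policy in attached:
        # A role cannot be deleted while managed policies are still attached.
        iam.detach_role_policy(RoleName=role_name, PolicyArn=policy["PolicyArn"])
    iam.delete_role(RoleName=role_name)
    return [policy["PolicyArn"] for policy in attached]


# Usage with the names from Step 2 (requires AWS credentials):
#   removed = delete_execution_role(iam, role_name)
#   print(f"Deleted role {role_name}; detached policies: {removed}")
```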
