
Request Inferences from a Deployed Service (Amazon SageMaker SDK)


Use the following code examples to request inferences from your deployed service, based on the framework you used to train your model. The code examples for the different frameworks are similar. The main difference is that TensorFlow requires application/json as the content type.

PyTorch and MXNet

If you are using PyTorch v1.4 or later or MXNet 1.7.0 or later, and you have an Amazon SageMaker AI endpoint in the InService state, you can make inference requests using the predictor package of the SageMaker AI SDK for Python.
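Before sending requests, you can confirm that the endpoint has reached the InService state. The following is a minimal sketch using the boto3 SageMaker client; the endpoint name is a placeholder you must replace:

import boto3

# Check the status of a deployed endpoint before requesting inferences
sm_client = boto3.client('sagemaker')
response = sm_client.describe_endpoint(EndpointName='insert name of your endpoint here')
print(response['EndpointStatus'])  # expect 'InService'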

Note

The API varies based on the SageMaker AI SDK for Python version: v1.x uses the RealTimePredictor class, while v2.x renames it to Predictor.

The following code examples show how to use these APIs to send an image for inference:

SageMaker Python SDK v1.x
from sagemaker.predictor import RealTimePredictor

endpoint = 'insert name of your endpoint here'

# Read the image into memory
payload = None
with open("image.jpg", 'rb') as f:
    payload = f.read()

predictor = RealTimePredictor(endpoint=endpoint, content_type='application/x-image')
inference_response = predictor.predict(data=payload)
print(inference_response)
SageMaker Python SDK v2.x
from sagemaker.predictor import Predictor

endpoint = 'insert name of your endpoint here'

# Read the image into memory
payload = None
with open("image.jpg", 'rb') as f:
    payload = f.read()

predictor = Predictor(endpoint)
inference_response = predictor.predict(data=payload)
print(inference_response)
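In SDK v2.x, the content_type argument moved from the predictor constructor to serializer objects. If your model expects a specific content type, such as the application/x-image used in the v1.x example, you can pass a serializer explicitly. A minimal sketch, assuming the same endpoint and payload variables as the v2.x example above:

from sagemaker.predictor import Predictor
from sagemaker.serializers import IdentitySerializer

# IdentitySerializer passes the payload bytes through unchanged while
# setting the Content-Type header on the request
predictor = Predictor(endpoint, serializer=IdentitySerializer(content_type='application/x-image'))
inference_response = predictor.predict(data=payload)
print(inference_response)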

TensorFlow

The following code example shows how to use the SageMaker Python SDK API to send an image for inference:

from sagemaker.predictor import Predictor
from PIL import Image
import numpy as np
import json

endpoint = 'insert the name of your endpoint here'
input_file = 'image.jpg'  # path to the image you want to send for inference

# Read the image into memory and preprocess it for the model
image = Image.open(input_file)
batch_size = 1
image = np.asarray(image.resize((224, 224)))
image = image / 128 - 1  # scale pixel values to the range [-1, 1]
image = np.concatenate([image[np.newaxis, :, :]] * batch_size)

# TensorFlow Serving expects a JSON body with an "instances" key
body = json.dumps({"instances": image.tolist()})

predictor = Predictor(endpoint)
inference_response = predictor.predict(data=body)
print(inference_response)
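The response body contains the JSON produced by TensorFlow Serving, which typically wraps its output in a "predictions" key. A minimal sketch of decoding it; the exact shape of each prediction depends on your model's output layer:

# Decode the JSON response returned by TensorFlow Serving
result = json.loads(inference_response)
# TensorFlow Serving typically returns {"predictions": [...]}
print(result['predictions'])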