Supported Data Formats - Amazon SageMaker

Supported Data Formats

When you manually create an input manifest file for a built-in task type, your input data must be in one of the following supported file formats for the respective input data type. To learn about automated data setup, see Automated Data Setup.

Tip

When you use automated data setup, additional data formats can be used to generate an input manifest file for video frame and text-based task types.

| Task Types | Input Data Type | Supported Formats | Example Input Manifest Line |
|---|---|---|---|
| Bounding Box, Semantic Segmentation, Image Classification (Single Label and Multi-label), Verify and Adjust Labels | Image | .jpg, .jpeg, .png | {"source-ref": "s3://DOC-EXAMPLE-BUCKET1/example-image.png"} |
| Named Entity Recognition, Text Classification (Single and Multi-Label) | Text | Raw text | {"source": "Lorem ipsum dolor sit amet"} |
| Video Classification | Video clips | .mp4, .ogg, and .webm | {"source-ref": "s3:///example-video.mp4"} |
| Video Frame Object Detection, Video Frame Object Tracking (bounding boxes, polylines, polygons, or keypoints) | Video frames and video frame sequence files (for Object Tracking) | Video frames: .jpg, .jpeg, .png. Sequence files: .json | Refer to Create a Video Frame Input Manifest File. |
| 3D Point Cloud Semantic Segmentation, 3D Point Cloud Object Detection, 3D Point Cloud Object Tracking | Point clouds and point cloud sequence files (for Object Tracking) | Point clouds: Binary pack format and ASCII. For more information, see Accepted Raw 3D Data Formats. Sequence files: .json | Refer to Create an Input Manifest File for a 3D Point Cloud Labeling Job. |
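As an illustration of the image, video clip, and text rows in the table above, the following minimal Python sketch builds manifest lines and rejects file extensions that the table does not list. The function names (`source_ref_line`, `source_line`, `build_manifest`) and the `SUPPORTED_EXTENSIONS` mapping are hypothetical helpers written for this example, not part of any SageMaker API. The sketch also assumes that extension matching is case-insensitive and that the manifest contains one JSON object per line.

```python
import json
import posixpath

# Supported extensions per input data type, copied from the table above.
SUPPORTED_EXTENSIONS = {
    "image": {".jpg", ".jpeg", ".png"},
    "video": {".mp4", ".ogg", ".webm"},
}


def source_ref_line(s3_uri, data_type):
    """Return one manifest line for an image or video clip stored in Amazon S3."""
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    # Assumption: extensions are compared case-insensitively.
    extension = posixpath.splitext(s3_uri)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS[data_type]:
        raise ValueError(f"{extension!r} is not a listed {data_type} format")
    return json.dumps({"source-ref": s3_uri})


def source_line(text):
    """Return one manifest line for a raw-text data object."""
    return json.dumps({"source": text})


def build_manifest(lines):
    """Join manifest lines into file content, one JSON object per line."""
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    manifest = build_manifest([
        source_ref_line("s3://DOC-EXAMPLE-BUCKET1/example-image.png", "image"),
    ])
    print(manifest, end="")
```

In practice a manifest for a single labeling job would contain only one kind of data object, so the image and text helpers would not be mixed in the same file. Video frame and 3D point cloud jobs use sequence files and are not covered by this sketch; see the topics referenced in the table for those.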