Object Detection Algorithm
The Amazon SageMaker Object Detection algorithm detects and classifies objects in images using a single deep neural network. It is a supervised learning algorithm that takes images as input and identifies all instances of objects within the image scene. Each detected object is categorized into one of the classes in a specified collection, with a confidence score that it belongs to the class; its location and scale in the image are indicated by a rectangular bounding box. The algorithm uses the Single Shot multibox Detector (SSD) framework.
Input/Output Interface for the Object Detection Algorithm
The SageMaker Object Detection algorithm supports both RecordIO (application/x-recordio) and image (image/png, image/jpeg, and application/x-image) content types for training in file mode, and supports RecordIO (application/x-recordio) for training in pipe mode. However, you can also train in pipe mode using the image files (image/png, image/jpeg, and application/x-image), without creating RecordIO files, by using the augmented manifest format. The recommended input format for the Amazon SageMaker object detection algorithm is Apache MXNet RecordIO. For inference, the algorithm supports the application/x-image content type. To maintain better interoperability with existing deep learning frameworks, these formats differ from the protobuf data formats commonly used by other Amazon SageMaker algorithms.
See the Object Detection Sample Notebooks for more details on data formats.
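For inference, the request body is simply the raw bytes of a .jpg or .png image sent with the application/x-image content type. The following is a minimal sketch using boto3; the endpoint name and image path are placeholders, and the printed result is the algorithm's JSON prediction output.

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Placeholder image; any .jpg or .png file works.
with open("test_image.jpg", "rb") as f:
    payload = f.read()

response = runtime.invoke_endpoint(
    EndpointName="my-object-detection-endpoint",  # placeholder endpoint name
    ContentType="application/x-image",
    Body=payload,
)

# The response body is JSON; each detected object is reported with its
# class index, confidence score, and bounding box coordinates.
result = json.loads(response["Body"].read())
print(result)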
Train with the RecordIO Format
If you use the RecordIO format for training, specify both train and validation channels as values for the InputDataConfig parameter of the CreateTrainingJob request. Specify one RecordIO (.rec) file in the train channel and one RecordIO file in the validation channel. Set the content type for both channels to application/x-recordio. An example of how to generate a RecordIO file can be found in the object detection sample notebook. You can also use tools from MXNet's GluonCV to generate RecordIO files.
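As a concrete illustration, the two channels might be declared as follows. This is a minimal sketch; the bucket name and object keys are placeholders, and the rest of the CreateTrainingJob request (role, algorithm image, hyperparameters, and so on) is omitted.

# Minimal sketch: train and validation channels for RecordIO training.
input_data_config = [
    {
        "ChannelName": "train",
        "ContentType": "application/x-recordio",
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://your_bucket/train/train.rec",  # placeholder
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    },
    {
        "ChannelName": "validation",
        "ContentType": "application/x-recordio",
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://your_bucket/validation/val.rec",  # placeholder
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    },
]

You would pass this list as the InputDataConfig argument of the CreateTrainingJob call, for example through boto3's create_training_job.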
Train with the Image Format
If you use the image format for training, specify train, validation, train_annotation, and validation_annotation channels as values for the InputDataConfig parameter of the CreateTrainingJob request. Specify the individual image data (.jpg or .png) files for the train and validation channels. For annotation data, you can use the JSON format. Specify the corresponding .json files in the train_annotation and validation_annotation channels. Set the content type for all four channels to image/png or image/jpeg based on the image type. You can also use the content type application/x-image when your dataset contains both .jpg and .png images. The following is an example of a .json file:
{ "file": "your_image_directory/sample_image1.jpg", "image_size": [ { "width": 500, "height": 400, "depth": 3 } ], "annotations": [ { "class_id": 0, "left": 111, "top": 134, "width": 61, "height": 128 }, { "class_id": 0, "left": 161, "top": 250, "width": 79, "height": 143 }, { "class_id": 1, "left": 101, "top": 185, "width": 42, "height": 130 } ], "categories": [ { "class_id": 0, "name": "dog" }, { "class_id": 1, "name": "cat" } ] }
Each image needs a .json file for annotation, and the .json file should have the same name as the corresponding image. The name of the above .json file should be "sample_image1.json". There are four properties in the annotation .json file. The property "file" specifies the relative path of the image file. For example, if your training images and corresponding .json files are stored in s3://your_bucket/train/sample_image and s3://your_bucket/train_annotation, specify the path for your train and train_annotation channels as s3://your_bucket/train and s3://your_bucket/train_annotation, respectively. In the .json file, the relative path for an image named sample_image1.jpg should be sample_image/sample_image1.jpg.
The "image_size" property specifies the overall image dimensions. The SageMaker object detection algorithm currently supports only 3-channel images. The "annotations" property specifies the categories and bounding boxes for objects within the image. Each object is annotated by a "class_id" index and by four bounding box coordinates ("left", "top", "width", "height"). The "left" (x-coordinate) and "top" (y-coordinate) values represent the upper-left corner of the bounding box. The "width" and "height" values represent the dimensions of the bounding box along the x-axis and y-axis, respectively. The origin (0, 0) is the upper-left corner of the entire image. If you have multiple objects within one image, all the annotations should be included in a single .json file. The "categories" property stores the mapping between the class index and the class name. The class indices should be numbered successively, and the numbering should start with 0. The "categories" property is optional for the annotation .json file.
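Putting the rules above together, a short script like the following sketch produces a valid annotation file. The image name, classes, and box values are illustrative only.

# Minimal sketch: write the annotation .json for sample_image1.jpg in the
# layout described above. Paths, sizes, and classes are illustrative.
import json

annotation = {
    "file": "sample_image/sample_image1.jpg",  # relative path under the train channel
    "image_size": [{"width": 500, "height": 400, "depth": 3}],  # 3-channel only
    "annotations": [
        # One entry per object: class index plus the upper-left corner and box
        # size, all in pixels, with the origin at the image's upper-left corner.
        {"class_id": 0, "left": 111, "top": 134, "width": 61, "height": 128},
        {"class_id": 1, "left": 101, "top": 185, "width": 42, "height": 130},
    ],
    "categories": [  # optional mapping from class index to class name
        {"class_id": 0, "name": "dog"},
        {"class_id": 1, "name": "cat"},
    ],
}

# The annotation file must share its name with the image it annotates.
with open("sample_image1.json", "w") as f:
    json.dump(annotation, f)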
Train with Augmented Manifest Image Format
The augmented manifest format enables you to train in pipe mode using image files without needing to create RecordIO files. You need to specify both train and validation channels as values for the InputDataConfig parameter of the CreateTrainingJob request. With this format, you generate an S3 manifest file that contains the list of images and their corresponding annotations. The manifest file format should be JSON Lines, in which each line represents one sample. The images are specified using the 'source-ref' tag that points to the S3 location of the image. The annotations are provided under the "AttributeNames" parameter value as specified in the CreateTrainingJob request. The manifest can also contain additional metadata under the metadata tag, but this is ignored by the algorithm. In the following example, the "AttributeNames" are contained in the list ["source-ref", "bounding-box"]:
{"source-ref": "s3://your_bucket/image1.jpg", "bounding-box":{"image_size":[{ "width": 500, "height": 400, "depth":3}], "annotations":[{"class_id": 0, "left": 111, "top": 134, "width": 61, "height": 128}, {"class_id": 5, "left": 161, "top": 250, "width": 80, "height": 50}]}, "bounding-box-metadata":{"class-map":{"0": "dog", "5": "horse"}, "type": "groundtruth/object-detection"}} {"source-ref": "s3://your_bucket/image2.jpg", "bounding-box":{"image_size":[{ "width": 400, "height": 300, "depth":3}], "annotations":[{"class_id": 1, "left": 100, "top": 120, "width": 43, "height": 78}]}, "bounding-box-metadata":{"class-map":{"1": "cat"}, "type": "groundtruth/object-detection"}}
The order of "AttributeNames" in the input files matters when training the Object Detection algorithm. It accepts piped data in a specific order, with the image first, followed by the annotations. So the "AttributeNames" in this example are provided with "source-ref" first, followed by "bounding-box". When using Object Detection with the augmented manifest, the value of the RecordWrapperType parameter must be set to "RecordIO".
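A train channel for augmented manifest training could therefore look like the following sketch. The bucket and manifest names are placeholders, the validation channel follows the same pattern, and the ContentType is assumed here to be application/x-recordio to match the RecordIO record wrapping.

# Minimal sketch: train channel for augmented manifest training in pipe mode.
input_data_config = [
    {
        "ChannelName": "train",
        "ContentType": "application/x-recordio",
        "RecordWrapperType": "RecordIO",  # required for augmented manifest
        "InputMode": "Pipe",
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "AugmentedManifestFile",
                "S3Uri": "s3://your_bucket/train.manifest",  # placeholder
                "S3DataDistributionType": "FullyReplicated",
                # Order matters: the image reference first, then the annotations.
                "AttributeNames": ["source-ref", "bounding-box"],
            }
        },
    },
]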
For more information on augmented manifest files, see Provide Dataset Metadata to Training Jobs with an Augmented Manifest File.
Incremental Training
You can also seed the training of a new model with the artifacts from a model that you trained previously with SageMaker. Incremental training saves training time when you want to train a new model with the same or similar data. SageMaker object detection models can be seeded only with another built-in object detection model trained in SageMaker.
To use a pretrained model, in the CreateTrainingJob request, specify the ChannelName as "model" in the InputDataConfig parameter. Set the ContentType for the model channel to application/x-sagemaker-model. The input hyperparameters of both the new model and the pretrained model that you upload to the model channel must have the same settings for the base_network and num_classes input parameters. These parameters define the network architecture. For the pretrained model file, use the compressed model artifacts (in .tar.gz format) output by SageMaker. You can use either RecordIO or image formats for input data.
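In practice, this amounts to appending one more channel to the InputDataConfig list used for training, as in the following minimal sketch. The artifact URI is a placeholder for the output location of a previous SageMaker object detection training job.

# Minimal sketch: the "model" channel that seeds incremental training.
model_channel = {
    "ChannelName": "model",
    "ContentType": "application/x-sagemaker-model",
    "DataSource": {
        "S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://your_bucket/previous-job/output/model.tar.gz",  # placeholder
            "S3DataDistributionType": "FullyReplicated",
        }
    },
}
# Append this to the same InputDataConfig list as the train and validation
# channels; base_network and num_classes must match the seeding model.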
For a sample notebook that shows how to use incremental training with the SageMaker object detection algorithm, see SageMaker Object Detection Incremental Training.
EC2 Instance Recommendation for the Object Detection Algorithm
For object detection, we support the following GPU instances for training: ml.p2.xlarge, ml.p2.8xlarge, ml.p2.16xlarge, ml.p3.2xlarge, ml.p3.8xlarge, and ml.p3.16xlarge. We recommend using GPU instances with more memory for training with large batch sizes. You can also run the algorithm in multi-GPU and multi-machine settings for distributed training. However, both CPU (such as C5 and M5) and GPU (such as P2 and P3) instances can be used for inference. All the supported instance types for inference are itemized on Amazon SageMaker ML Instance Types.
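In a CreateTrainingJob request, the instance choice is expressed through the ResourceConfig parameter, as in the following minimal sketch; the values shown are placeholders.

# Minimal sketch: a ResourceConfig selecting a single GPU training instance.
resource_config = {
    "InstanceType": "ml.p3.2xlarge",  # one of the supported GPU training instances
    "InstanceCount": 1,               # increase for multi-machine distributed training
    "VolumeSizeInGB": 50,
}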
Object Detection Sample Notebooks
For a sample notebook that shows how to use the SageMaker Object Detection algorithm to train and host a model on the COCO dataset using the Single Shot multibox Detector algorithm, see Object Detection using the Image and JSON format.