Object Detection - MXNet
The Amazon SageMaker AI Object Detection - MXNet algorithm detects and classifies objects in images using a single deep neural network. It is a supervised learning algorithm that takes images as input and identifies all instances of objects within the image scene. Each object is categorized into one of the classes in a specified collection, with a confidence score that it belongs to the class. The object's location and scale in the image are indicated by a rectangular bounding box. The algorithm uses the Single Shot multibox Detector (SSD) framework and supports two base networks: VGG and ResNet. The network can be trained from scratch, or trained with models that have been pretrained on the ImageNet dataset.
Topics
- Input/Output Interface for the Object Detection Algorithm
- EC2 Instance Recommendation for the Object Detection Algorithm
- Object Detection Sample Notebooks
Input/Output Interface for the Object Detection Algorithm
The SageMaker AI Object Detection algorithm supports both RecordIO (application/x-recordio) and image (image/png, image/jpeg, and application/x-image) content types for training in file mode, and supports RecordIO (application/x-recordio) for training in pipe mode. However, you can also train in pipe mode using the image files (image/png, image/jpeg, and application/x-image), without creating RecordIO files, by using the augmented manifest format. The recommended input format for the Amazon SageMaker AI object detection algorithms is Apache MXNet RecordIO. However, you can also use raw images in .jpg or .png format. The algorithm supports only application/x-image for inference.
Note
To maintain better interoperability with existing deep learning frameworks, this differs from the protobuf data formats commonly used by other Amazon SageMaker AI algorithms.
See the Object Detection Sample Notebooks for more details on data formats.
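As a minimal sketch of the inference interface, the following shows how you might invoke a deployed endpoint with the application/x-image content type using boto3. The endpoint name and image path are placeholders, not values from this guide.

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

# Read the raw image bytes to send as the request body.
with open("sample_image1.jpg", "rb") as f:
    payload = f.read()

response = runtime.invoke_endpoint(
    EndpointName="my-object-detection-endpoint",  # placeholder endpoint name
    ContentType="application/x-image",
    Body=payload,
)

# The response body is a JSON document of detected classes, scores, and boxes.
print(response["Body"].read().decode("utf-8"))
```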
Train with the RecordIO Format
If you use the RecordIO format for training, specify both train and validation channels as values for the InputDataConfig parameter of the CreateTrainingJob request. Specify one RecordIO (.rec) file in the train channel and one RecordIO file in the validation channel. Set the content type for both channels to application/x-recordio. An example of how to generate RecordIO files can be found in the object detection sample notebook. You can also use tools from MXNet's GluonCV toolkit to generate RecordIO files for popular datasets, such as PASCAL VOC and COCO.
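As a minimal sketch (not the sample notebook's exact code), the following shows how the two RecordIO channels might be wired up with the SageMaker Python SDK. The role ARN, bucket paths, instance type, and hyperparameter values are placeholder assumptions; adjust them for your account and dataset.

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::111122223333:role/SageMakerRole"  # placeholder execution role

# Retrieve the built-in Object Detection container for the current region.
container = image_uris.retrieve("object-detection", session.boto_region_name)

estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    output_path="s3://your_bucket/output",  # placeholder bucket
    sagemaker_session=session,
)

# Placeholder hyperparameters; num_training_samples must match your dataset.
estimator.set_hyperparameters(
    base_network="resnet-50",
    num_classes=2,
    num_training_samples=1000,
    mini_batch_size=16,
    epochs=30,
)

# One .rec file per channel, both with the application/x-recordio content type.
estimator.fit({
    "train": TrainingInput("s3://your_bucket/train/train.rec",
                           content_type="application/x-recordio"),
    "validation": TrainingInput("s3://your_bucket/validation/val.rec",
                                content_type="application/x-recordio"),
})
```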
Train with the Image Format
If you use the image format for training, specify train, validation, train_annotation, and validation_annotation channels as values for the InputDataConfig parameter of the CreateTrainingJob request. Specify the individual image data (.jpg or .png) files for the train and validation channels. For annotation data, you can use the JSON format. Specify the corresponding .json files in the train_annotation and validation_annotation channels. Set the content type for all four channels to image/png or image/jpeg based on the image type. You can also use the content type application/x-image when your dataset contains both .jpg and .png images. The following is an example of a .json file.
{ "file": "your_image_directory/sample_image1.jpg", "image_size": [ { "width": 500, "height": 400, "depth": 3 } ], "annotations": [ { "class_id": 0, "left": 111, "top": 134, "width": 61, "height": 128 }, { "class_id": 0, "left": 161, "top": 250, "width": 79, "height": 143 }, { "class_id": 1, "left": 101, "top": 185, "width": 42, "height": 130 } ], "categories": [ { "class_id": 0, "name": "dog" }, { "class_id": 1, "name": "cat" } ] }
Each image needs a .json file for annotation, and the .json file should have the same name as the corresponding image. The name of the above .json file should be "sample_image1.json". There are four properties in the annotation .json file. The property "file" specifies the relative path of the image file. For example, if your training images and corresponding .json files are stored in s3://your_bucket/train/sample_image and s3://your_bucket/train_annotation, specify the path for your train and train_annotation channels as s3://your_bucket/train and s3://your_bucket/train_annotation, respectively.

In the .json file, the relative path for an image named sample_image1.jpg should be sample_image/sample_image1.jpg. The "image_size" property specifies the overall image dimensions. The SageMaker AI object detection algorithm currently supports only 3-channel images. The "annotations" property specifies the categories and bounding boxes for objects within the image. Each object is annotated with a "class_id" index and four bounding box values ("left", "top", "width", "height"). The "left" (x-coordinate) and "top" (y-coordinate) values represent the upper-left corner of the bounding box. The "width" and "height" values represent the dimensions of the bounding box in pixels. The origin (0, 0) is the upper-left corner of the entire image. If you have multiple objects within one image, all the annotations should be included in a single .json file.

The "categories" property stores the mapping between the class index and the class name. The class indices should be numbered successively, and the numbering should start with 0. The "categories" property is optional for the annotation .json file.
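Continuing the estimator sketch from the RecordIO section (same placeholder bucket paths), the four image-format channels might be configured as follows. Per the guidance above, all four channels use the image content type, and the annotation channels hold the .json files.

```python
from sagemaker.inputs import TrainingInput

# Four channels for image-format training; paths are placeholders.
estimator.fit({
    "train": TrainingInput("s3://your_bucket/train",
                           content_type="image/jpeg"),
    "validation": TrainingInput("s3://your_bucket/validation",
                                content_type="image/jpeg"),
    "train_annotation": TrainingInput("s3://your_bucket/train_annotation",
                                      content_type="image/jpeg"),
    "validation_annotation": TrainingInput("s3://your_bucket/validation_annotation",
                                           content_type="image/jpeg"),
})
```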
Train with Augmented Manifest Image Format
The augmented manifest format enables you to do training in pipe mode using image files without needing to create RecordIO files. You need to specify both train and validation channels as values for the InputDataConfig parameter of the CreateTrainingJob request. While using the format, an S3 manifest file needs to be generated that contains the list of images and their corresponding annotations. The manifest file format should be in JSON Lines format, in which each line represents one sample. The images are specified using the 'source-ref' tag that points to the S3 location of the image. The annotations are provided under the "AttributeNames" parameter value as specified in the CreateTrainingJob request. It can also contain additional metadata under the metadata tag, but these are ignored by the algorithm. In the following example, the "AttributeNames" are contained in the list ["source-ref", "bounding-box"]:
{"source-ref": "s3://your_bucket/image1.jpg", "bounding-box":{"image_size":[{ "width": 500, "height": 400, "depth":3}], "annotations":[{"class_id": 0, "left": 111, "top": 134, "width": 61, "height": 128}, {"class_id": 5, "left": 161, "top": 250, "width": 80, "height": 50}]}, "bounding-box-metadata":{"class-map":{"0": "dog", "5": "horse"}, "type": "groundtruth/object-detection"}} {"source-ref": "s3://your_bucket/image2.jpg", "bounding-box":{"image_size":[{ "width": 400, "height": 300, "depth":3}], "annotations":[{"class_id": 1, "left": 100, "top": 120, "width": 43, "height": 78}]}, "bounding-box-metadata":{"class-map":{"1": "cat"}, "type": "groundtruth/object-detection"}}
The order of "AttributeNames"
in the input files matters when
training the Object Detection algorithm. It accepts piped data in a specific order,
with image
first, followed by annotations
. So the
"AttributeNames" in this example are provided with "source-ref"
first,
followed by "bounding-box"
. When using Object Detection with Augmented
Manifest, the value of parameter RecordWrapperType
must be set as
"RecordIO"
.
For more information on augmented manifest files, see Augmented Manifest Files for Training Jobs.
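As a sketch, assuming the SageMaker Python SDK and placeholder manifest paths, the augmented manifest channels might be configured as follows. Note the RecordIO record wrapping and pipe input mode, and that the attribute names list the image reference first, then the annotation attribute.

```python
from sagemaker.inputs import TrainingInput

# Augmented manifest channels; manifest paths are placeholders.
train_input = TrainingInput(
    "s3://your_bucket/train.manifest",
    content_type="application/x-recordio",
    s3_data_type="AugmentedManifestFile",
    attribute_names=["source-ref", "bounding-box"],
    record_wrapping="RecordIO",
    input_mode="Pipe",
)

validation_input = TrainingInput(
    "s3://your_bucket/validation.manifest",
    content_type="application/x-recordio",
    s3_data_type="AugmentedManifestFile",
    attribute_names=["source-ref", "bounding-box"],
    record_wrapping="RecordIO",
    input_mode="Pipe",
)

estimator.fit({"train": train_input, "validation": validation_input})
```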
Incremental Training
You can also seed the training of a new model with the artifacts from a model that you trained previously with SageMaker AI. Incremental training saves training time when you want to train a new model with the same or similar data. SageMaker AI object detection models can be seeded only with another built-in object detection model trained in SageMaker AI.
To use a pretrained model, in the CreateTrainingJob request, specify the ChannelName as "model" in the InputDataConfig parameter. Set the ContentType for the model channel to application/x-sagemaker-model. The input hyperparameters of both the new model and the pretrained model that you upload to the model channel must have the same settings for the base_network and num_classes input parameters. These parameters define the network architecture. For the pretrained model file, use the compressed model artifacts (in .tar.gz format) output by SageMaker AI. You can use either RecordIO or image formats for input data.
For more information on incremental training and for instructions on how to use it, see Use Incremental Training in Amazon SageMaker AI.
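A minimal sketch of seeding a new training job, reusing the estimator from the earlier examples; the artifact path from the previous job is a placeholder, and base_network and num_classes must match the seeding model.

```python
from sagemaker.inputs import TrainingInput

# These two hyperparameters must match the pretrained model being seeded.
estimator.set_hyperparameters(base_network="resnet-50", num_classes=2)

estimator.fit({
    "train": TrainingInput("s3://your_bucket/train/train.rec",
                           content_type="application/x-recordio"),
    "validation": TrainingInput("s3://your_bucket/validation/val.rec",
                                content_type="application/x-recordio"),
    # Seed from a previous SageMaker AI object detection model artifact.
    "model": TrainingInput("s3://your_bucket/prev-job/output/model.tar.gz",
                           content_type="application/x-sagemaker-model"),
})
```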
EC2 Instance Recommendation for the Object Detection Algorithm
The object detection algorithm supports P2, P3, G4dn, and G5 GPU instance families. We recommend using GPU instances with more memory for training with large batch sizes. You can run the object detection algorithm in multi-GPU and multi-machine settings for distributed training.
You can use both CPU (such as C5 and M5) and GPU (such as P3 and G4dn) instances for inference.
Object Detection Sample Notebooks
For a sample notebook that shows how to use the SageMaker AI Object Detection algorithm to train and host a model on the Caltech Birds (CUB 200 2011) dataset using the Single Shot multibox Detector, see Amazon SageMaker AI Object Detection for Bird Species.