Amazon SageMaker
Developer Guide

Object Detection Hyperparameters

In the CreateTrainingJob request, you specify the training algorithm that you want to use. You can also specify algorithm-specific hyperparameters that are used to help estimate the parameters of the model from a training dataset. The following table lists the hyperparameters provided by Amazon SageMaker for training the object detection algorithm. For more information about how object detection works, see How Object Detection Works.
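
The HyperParameters field of a CreateTrainingJob request is a string-to-string map, so numeric settings must be serialized as strings before the request is sent. The following is a minimal sketch of that conversion. It assumes a hypothetical helper, to_hyperparameter_map, that is not part of any AWS SDK. The parameter names come from this page, and the values are illustrative only.

```python
from typing import Mapping, Union

Scalar = Union[str, int, float, bool]


def to_hyperparameter_map(params: Mapping[str, Scalar]) -> dict[str, str]:
    """Hypothetical helper: convert values to the string-to-string map that
    the CreateTrainingJob HyperParameters field expects."""
    out: dict[str, str] = {}
    for name, value in params.items():
        if isinstance(value, bool):
            # Flags on this page take 0 or 1, so booleans become "0" or "1".
            out[name] = str(int(value))
        else:
            out[name] = str(value)
    return out


# Illustrative values only; choose values that suit your data and instance.
hyperparameters = to_hyperparameter_map({
    "base_network": "resnet-50",
    "mini_batch_size": 16,
    "image_shape": 300,
    "lr_scheduler_step": "10, 20",
    "lr_scheduler_factor": 0.1,
})
print(hyperparameters["mini_batch_size"])  # prints 16
```
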

Parameter Name Description

The base network architecture to use. Optional.

Valid values: 'vgg-16' or 'resnet-50'

Default value: 'vgg-16'


Flag that indicates whether to use a pre-trained model for training. If set to 1, the pre-trained model with the corresponding architecture is loaded and used for training. Otherwise, the network is trained from scratch. Optional.

Valid values: 0 or 1

Default: 1


Number of output classes. This parameter defines the dimensions of the network output and is typically set to the number of classes in the dataset. Required.

Valid values: positive integer

Default: -


Number of training epochs. Optional.

Valid values: positive integer

Default: 30


Initial learning rate. Optional.

Valid values: float in (0, 1]

Default: 0.001


The ratio by which to reduce the learning rate. Used in conjunction with the lr_scheduler_step parameter, as lr_new = lr_old * lr_scheduler_factor. Optional.

Valid values: float in (0, 1)

Default: 0.1


The epochs at which to reduce the learning rate, given as a comma-delimited string: "epoch1, epoch2, ...". At each listed epoch, the learning rate is multiplied by lr_scheduler_factor. For example, if the value is set to "10, 20" and lr_scheduler_factor is set to 1/2, the learning rate is halved after the 10th epoch and halved again after the 20th epoch. Optional.

Valid values: string

Default: -
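
The interaction between lr_scheduler_step and lr_scheduler_factor can be sketched as a step schedule. This is a minimal illustration that assumes a 0-based epoch index, so the "10th epoch" boundary corresponds to index 10. It is not the algorithm's internal scheduler code, and the function names are hypothetical.

```python
def parse_steps(lr_scheduler_step: str) -> list[int]:
    """Parse a comma-delimited string such as "10, 20" into [10, 20]."""
    # int() tolerates surrounding spaces; empty pieces (e.g. a trailing comma) are skipped.
    return sorted(int(s) for s in lr_scheduler_step.split(",") if s.strip())


def learning_rate_at(epoch: int, base_lr: float, lr_scheduler_factor: float,
                     lr_scheduler_step: str) -> float:
    """Learning rate in effect during a 0-based epoch (assumption: one
    reduction is applied for every listed epoch already reached)."""
    lr = base_lr
    for step in parse_steps(lr_scheduler_step):
        if epoch >= step:
            lr *= lr_scheduler_factor  # lr_new = lr_old * lr_scheduler_factor
    return lr


# The example from the table: halve the rate after epochs 10 and 20.
for epoch in (0, 9, 10, 19, 20, 29):
    print(epoch, learning_rate_at(epoch, 0.001, 0.5, "10, 20"))
```
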


The optimizer type. For details on optimizer values, see MXNet's API. Optional.

Valid values: ['sgd', 'adam', 'rmsprop', 'adadelta']

Default: 'sgd'


The momentum for sgd. Ignored for other optimizers. Optional.

Valid values: float in (0, 1]

Default: 0.9


The weight decay coefficient for sgd and rmsprop. Ignored for other optimizers. Optional.

Valid values: float in (0, 1)

Default: 0.0005
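
As a rough illustration of how the momentum and weight decay coefficients enter an SGD step, the sketch below uses a common textbook formulation. Weight decay adds weight_decay * w to the gradient as an L2 penalty, and momentum accumulates a velocity across steps. It is written with NumPy and uses this page's default values. It is not MXNet's optimizer code, whose exact update may differ in details such as gradient rescaling and clipping.

```python
import numpy as np


def sgd_momentum_step(w, grad, velocity, learning_rate=0.001,
                      momentum=0.9, weight_decay=0.0005):
    """One SGD step with momentum and L2 weight decay (generic sketch).

    Returns the updated weights and velocity; the inputs are not modified.
    """
    # Weight decay acts like an L2 penalty: it adds weight_decay * w to the gradient.
    effective_grad = grad + weight_decay * w
    # Momentum keeps a decaying running sum of past updates.
    velocity = momentum * velocity - learning_rate * effective_grad
    return w + velocity, velocity


w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(3):
    g = 2.0 * w  # gradient of sum(w**2), a toy objective for illustration
    w, v = sgd_momentum_step(w, g, v)
print(w)
```
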


The batch size for training. In a single-machine multi-GPU setting, each GPU handles mini_batch_size/num_gpu training samples. For multi-machine training in dist_sync mode, the actual batch size is mini_batch_size * the number of machines. A large mini_batch_size usually leads to faster training, but it may cause out-of-memory errors. Memory usage depends on mini_batch_size, image_shape, and the base_network architecture. For example, on a single p3.2xlarge instance, the largest mini_batch_size that runs without an out-of-memory error is 32 with base_network set to "resnet-50" and an image_shape of 300. On the same instance, you can use 64 as the mini_batch_size with the vgg-16 base network and an image_shape of 300. Optional.

Valid values: positive integer

Default: 32
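
The batch-size arithmetic described above can be checked with two small helpers. This is a minimal sketch in which num_gpu and num_machines are illustrative inputs. The requirement that mini_batch_size divide evenly across GPUs is an assumption made here, not a documented rule.

```python
def per_gpu_batch(mini_batch_size: int, num_gpu: int) -> int:
    """Samples each GPU processes per step on a single multi-GPU machine."""
    # Assumption for this sketch: the batch is split evenly across GPUs.
    if mini_batch_size % num_gpu != 0:
        raise ValueError("mini_batch_size is not divisible by num_gpu")
    return mini_batch_size // num_gpu


def effective_batch(mini_batch_size: int, num_machines: int) -> int:
    """Total samples per synchronized step across machines in dist_sync mode."""
    return mini_batch_size * num_machines


print(per_gpu_batch(32, 4))    # prints 8
print(effective_batch(32, 2))  # prints 64
```
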


The size of the input images. Each input image is rescaled to a square image of this size. We recommend using 300 or 512 for better performance.

Valid values: positive integer ≥300

Default: 300


Forces the label width to be padded so that it is consistent across training and validation data. For example, if no image in the data contains more than 10 objects, and each object's annotation is specified with 5 numbers, [class_id, left, top, width, height], then label_width should be no smaller than (10*5 + header information length). The header information length is usually 2. We recommend using a slightly larger label_width for training, such as 60 for this example. Optional.

Valid values: positive integer large enough to accommodate the longest annotation in the data.

Default: 350
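
The label_width calculation from the example above can be written out as follows. This is a minimal sketch with hypothetical helper names. The header values in the usage example are placeholders. The padding value of -1 is an assumption chosen for illustration, not a documented format detail.

```python
def min_label_width(max_objects: int, values_per_object: int = 5,
                    header_length: int = 2) -> int:
    """Smallest label_width that fits the image with the most objects."""
    return max_objects * values_per_object + header_length


def pad_label(label: list[float], label_width: int,
              pad_value: float = -1.0) -> list[float]:
    """Pad a flat label to label_width (the pad value is an assumption)."""
    if len(label) > label_width:
        raise ValueError(f"label length {len(label)} exceeds label_width {label_width}")
    return label + [pad_value] * (label_width - len(label))


print(min_label_width(10))  # prints 52; the text above suggests 60 for headroom

# Placeholder header (2 values) followed by one object annotation (5 values).
label = [2.0, 5.0] + [0.0, 0.1, 0.2, 0.3, 0.4]
print(len(pad_label(label, 60)))  # prints 60
```
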


Number of training examples in the input dataset. Required.


If there is a mismatch between this value and the number of samples in the training set, the behavior of the lr_scheduler_step parameter is undefined, and distributed training accuracy may be affected.

Valid values: positive integer

Default: -
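
One way to see why a wrong sample count matters: suppose a scheduler converts the epochs in lr_scheduler_step into iteration counts by using the declared number of training samples. Then a mismatch shifts where the learning rate actually drops. The sketch below assumes exactly that conversion, with iterations per epoch = ceil(samples / mini_batch_size). It illustrates the failure mode only and does not describe the algorithm's internals.

```python
import math


def step_iterations(lr_scheduler_step: str, num_samples: int,
                    mini_batch_size: int) -> list[int]:
    """Iterations at which the learning rate would drop, assuming epoch
    boundaries are derived from num_samples (hypothetical conversion)."""
    iters_per_epoch = math.ceil(num_samples / mini_batch_size)
    epochs = [int(s) for s in lr_scheduler_step.split(",") if s.strip()]
    return [e * iters_per_epoch for e in epochs]


# Declared 16,000 samples, but the training set really holds 8,000.
declared = step_iterations("10, 20", num_samples=16000, mini_batch_size=32)
actual = step_iterations("10, 20", num_samples=8000, mini_batch_size=32)
print(declared, actual)  # prints [5000, 10000] [2500, 5000]
# The first drop at iteration 5000 would land at the end of epoch 20 of the real data.
```
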


Non-maximum suppression threshold. Optional.

Valid values: float in (0, 1]

Default: 0.45


Evaluation overlap threshold. Optional.

Valid values: float in (0, 1]

Default: 0.5
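
Both thresholds above are compared against the intersection over union (IoU) of two boxes. The sketch below shows greedy non-maximum suppression, which drops boxes that overlap a higher-scoring box by more than the NMS threshold. It also shows a simplified evaluation matching step, which counts a detection as correct when its IoU with a not-yet-matched ground-truth box is at least the overlap threshold. Boxes are given as corner coordinates (xmin, ymin, xmax, ymax) rather than the [left, top, width, height] annotation format. Function names are hypothetical, and the exact comparison rules (> versus >=) are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, nms_threshold=0.45):
    """Greedy NMS: keep the highest-scoring box, discard boxes overlapping it
    by more than nms_threshold, and repeat. Returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= nms_threshold]
    return keep


def match_detections(detections, ground_truth, overlap_threshold=0.5):
    """Simplified greedy matching; detections must be sorted by descending score.
    Returns True for each detection counted as a true positive."""
    matched, results = set(), []
    for det in detections:
        best_j, best_iou = None, 0.0
        for j, gt in enumerate(ground_truth):
            if j not in matched and iou(det, gt) > best_iou:
                best_j, best_iou = j, iou(det, gt)
        hit = best_j is not None and best_iou >= overlap_threshold
        if hit:
            matched.add(best_j)
        results.append(hit)
    return results


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # prints [0, 2]; boxes 0 and 1 have IoU 81/119
print(match_detections(boxes, [(0, 0, 10, 10)]))  # prints [True, False, False]
```
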


The regular expression (regex) for freezing layers in the base network. For example, if freeze_layer_pattern = "^(conv1_|conv2_).*", any layers whose names start with "conv1_" or "conv2_" are frozen, which means that the weights for these layers are not updated during training. The layer names can be found in the network symbol files for the base networks. Optional.

Valid values: string

Default: -
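
To check which layers a pattern would freeze before you start a training job, you can test it locally with Python's re module. This sketch assumes the pattern is applied to each layer name with re.search semantics. With the ^ anchor in the example, re.search and re.match behave the same. The layer names below are hypothetical, and the real names come from the network symbol files.

```python
import re


def frozen_layers(layer_names, freeze_layer_pattern):
    """Return the layer names matched (and therefore frozen) by the pattern."""
    pattern = re.compile(freeze_layer_pattern)
    return [name for name in layer_names if pattern.search(name)]


# Hypothetical layer names for illustration only.
layers = ["conv1_1", "conv1_2", "conv2_1", "conv3_1", "fc6"]
print(frozen_layers(layers, r"^(conv1_|conv2_).*"))
# prints ['conv1_1', 'conv1_2', 'conv2_1']
```
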


Weight update synchronization mode for distributed training. The weights can be updated either synchronously or asynchronously across machines. Synchronous updates typically provide better accuracy than asynchronous updates but can be slower. For details, see the Distributed Training MXNet tutorial. Optional.


This parameter is not applicable to single machine training.

Valid values: 'dist_sync' or 'dist_async'

  • 'dist_sync': The gradients are synchronized with all the workers after every batch. With dist_sync, the batch size means the batch size used on each machine, so if there are n machines and each uses batch size b, dist_sync behaves like a single machine with batch size n*b.

  • 'dist_async': Performs asynchronous updates. The weights are updated whenever gradients are received from any machine, and each weight update is atomic. However, the order of updates is not guaranteed.

Default: -
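
The difference between the two modes can be illustrated with a toy parameter server minimizing a one-dimensional quadratic. This is a simplified simulation written for illustration, not MXNet's kvstore.

- In the synchronous round, every worker computes a gradient at the same weight, and the averaged gradient is applied once.
- In the asynchronous run, each worker's gradient is applied as soon as it arrives and may have been computed from a stale weight. The final weight therefore depends on the arrival order.

The function names and the per-worker loss are assumptions.

```python
def grad(w: float, target: float) -> float:
    """Gradient of a toy per-worker loss 0.5 * (w - target) ** 2."""
    return w - target


def sync_round(w: float, targets: list[float], lr: float) -> float:
    """dist_sync-style round: all workers use the same w; their gradients
    are averaged and applied in a single update."""
    g = sum(grad(w, t) for t in targets) / len(targets)
    return w - lr * g


def async_run(w: float, targets: list[float], lr: float, arrivals: list[int]) -> float:
    """dist_async-style run: each arrival applies one worker's gradient at once.
    That gradient was computed from the weight the worker last pulled."""
    pulled = [w] * len(targets)  # every worker starts from the initial weight
    for i in arrivals:
        w = w - lr * grad(pulled[i], targets[i])  # applied on arrival
        pulled[i] = w  # the worker then pulls the latest weight
    return w


targets = [1.0, 3.0]
print(sync_round(0.0, targets, 0.5))            # prints 1.0
print(async_run(0.0, targets, 0.5, [0, 0, 1]))  # prints 2.25
print(async_run(0.0, targets, 0.5, [1, 0, 0]))  # prints 1.5
```
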