Hyperparameters are parameters that are set before a machine learning model begins learning. The following hyperparameters are supported by the Amazon SageMaker AI built-in Image Classification algorithm. See Tune an Image Classification Model for information on image classification hyperparameter tuning.
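As a rough illustration of how these settings are usually collected before a training job is launched, the sketch below builds a plain Python dictionary and runs a few sanity checks taken from the table that follows. Only the parameter names come from this page. The values, the `REQUIRED` tuple, and the `check_hyperparameters` helper are hypothetical, and the checks cover only a handful of the documented constraints.

```python
# Hypothetical values for illustration; only the parameter names come from the table.
hyperparameters = {
    "num_classes": 10,
    "num_training_samples": 50000,
    "num_layers": 18,
    "image_shape": "3,224,224",
    "epochs": 30,
    "learning_rate": 0.1,
    "mini_batch_size": 32,
    "top_k": 5,
    "use_pretrained_model": 1,
}

# The two parameters the table marks as required.
REQUIRED = ("num_classes", "num_training_samples")


def check_hyperparameters(hp):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    for name in REQUIRED:
        value = hp.get(name)
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            problems.append(f"{name} is required and must be a positive integer")
    # learning_rate falls back to the documented default of 0.1.
    if not 0 <= hp.get("learning_rate", 0.1) <= 1:
        problems.append("learning_rate must be in [0, 1]")
    if "top_k" in hp and hp["top_k"] <= 1:
        problems.append("top_k must be larger than 1")
    for flag in ("multi_label", "use_pretrained_model", "use_weighted_loss"):
        if hp.get(flag, 0) not in (0, 1):
            problems.append(f"{flag} must be 0 or 1")
    return problems


print(check_hyperparameters(hyperparameters))
```

A dictionary like this is typically what gets handed to the training job configuration. Checking it locally first can catch a missing required parameter before any compute is provisioned.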
Parameter Name | Description |
---|---|
num_classes |
Number of output classes. This parameter defines the dimensions of the network output and is typically set to the number of classes in the dataset. In addition to multi-class classification, multi-label classification is supported. See Input/Output Interface for the Image Classification Algorithm for details on using multi-label classification with augmented manifest files. Required. Valid values: positive integer |
num_training_samples |
Number of training examples in the input dataset. If this value does not match the number of samples in the training set, the behavior of the lr_scheduler_step parameter is undefined and distributed training accuracy might be affected. Required. Valid values: positive integer |
augmentation_type |
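Data augmentation type. The input images can be augmented in one of the following ways: crop (randomly crop the image and flip it horizontally), crop_color (in addition to crop, add random offsets to the hue, saturation, and lightness channels), or crop_color_transform (in addition to crop_color, apply random rotation, shear, and aspect-ratio variations). Optional. Valid values: crop, crop_color, or crop_color_transform. Default value: no default value |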
beta_1 |
The beta1 for adam, that is, the exponential decay rate for the first moment estimates. Optional. Valid values: float in the range [0, 1]. Default value: 0.9 |
beta_2 |
The beta2 for adam, that is, the exponential decay rate for the second moment estimates. Optional. Valid values: float in the range [0, 1]. Default value: 0.999 |
checkpoint_frequency |
Period to store model parameters (in number of epochs). All checkpoint files are saved as part of the final model file "model.tar.gz" and uploaded to S3 at the specified model location, so the size of the model file grows in proportion to the number of checkpoints saved during training. Optional. Valid values: positive integer no greater than epochs. Default value: no default value (a checkpoint is saved at the epoch that has the best validation accuracy) |
early_stopping |
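Set to True to use early stopping logic during training, or False to disable it. Optional. Valid values: True or False. Default value: False |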
early_stopping_min_epochs |
The minimum number of epochs that must be run before the early stopping logic can be invoked. It is used only when early_stopping is enabled. Optional. Valid values: positive integer. Default value: 10 |
early_stopping_patience |
The number of epochs to wait before ending training if no improvement is made in the relevant metric. It is used only when early_stopping is enabled. Optional. Valid values: positive integer. Default value: 5 |
early_stopping_tolerance |
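Relative tolerance used to measure an improvement in the accuracy validation metric. If the ratio of the improvement in accuracy divided by the previous best accuracy is smaller than the early_stopping_tolerance value, early stopping considers that there is no improvement. It is used only when early_stopping is enabled. Optional. Valid values: 0 ≤ float ≤ 1. Default value: 0.0 |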
epochs |
Number of training epochs. Optional. Valid values: positive integer. Default value: 30 |
eps |
The epsilon for adam and rmsprop. It is usually set to a small value to avoid division by 0. Optional. Valid values: float in the range [0, 1]. Default value: 1e-8 |
gamma |
The gamma for rmsprop, that is, the decay factor for the moving average of the squared gradient. Optional. Valid values: float in the range [0, 1]. Default value: 0.9 |
image_shape |
The input image dimensions, which are the same as the size of the input layer of the network. The format is defined as 'num_channels, height, width'. For training, if any input image is smaller than this parameter in any dimension, training fails. If an image is larger, a portion of the image is cropped, with the size of the cropped area specified by this parameter. If the augmentation_type hyperparameter is set, a random crop is taken; otherwise, a central crop is taken. At inference, input images are resized to the image_shape that was used during training. Optional. Valid values: string. Default value: '3,224,224' |
kv_store |
Weight update synchronization mode during distributed training. Weights can be updated either synchronously or asynchronously across machines. Synchronous updates typically provide better accuracy than asynchronous updates but can be slower. See distributed training in MXNet for more details. This parameter does not apply to single-machine training. Optional. Valid values: dist_sync or dist_async. Default value: no default value |
learning_rate |
Initial learning rate. Optional. Valid values: float in the range [0, 1]. Default value: 0.1 |
lr_scheduler_factor |
The factor by which the learning rate is reduced, used in conjunction with the lr_scheduler_step parameter: lr_new = lr_old * lr_scheduler_factor. Optional. Valid values: float in the range [0, 1]. Default value: 0.1 |
lr_scheduler_step |
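The epochs at which to reduce the learning rate. As explained in the lr_scheduler_factor parameter, the learning rate is multiplied by lr_scheduler_factor at these epochs. For example, if the value is set to "10, 20", the learning rate is reduced by lr_scheduler_factor after the 10th epoch and again after the 20th epoch. The epochs are delimited by ",". Optional. Valid values: string. Default value: no default value |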
mini_batch_size |
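The batch size for training. In a single-machine multi-GPU setting, each GPU handles mini_batch_size / num_gpu training samples. For multi-machine training in dist_sync mode, the actual batch size is mini_batch_size * number of machines. Optional. Valid values: positive integer. Default value: 32 |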
momentum |
The momentum for sgd and nag; ignored for other optimizers. Optional. Valid values: float in the range [0, 1]. Default value: 0.9 |
multi_label |
Flag to use for multi-label classification, where each sample can be assigned multiple labels. Average accuracy across all classes is logged. Optional. Valid values: 0 or 1. Default value: 0 |
num_layers |
Number of layers for the network. For data with a large image size (for example, 224x224, as in ImageNet), we suggest selecting the number of layers from the set [18, 34, 50, 101, 152, 200]. For data with a small image size (for example, 28x28, as in CIFAR), we suggest selecting the number of layers from the set [20, 32, 44, 56, 110]. The number of layers in each set is based on the ResNet paper. For transfer learning, the number of layers defines the architecture of the base network and can therefore only be selected from the set [18, 34, 50, 101, 152, 200]. Optional. Valid values: positive integer in [18, 34, 50, 101, 152, 200] or [20, 32, 44, 56, 110]. Default value: 152 |
optimizer |
The optimizer type. For more details on the parameters for the optimizers, see MXNet's API. Optional. Valid values: one of sgd, adam, rmsprop, or nag. Default value: sgd |
precision_dtype |
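The precision of the weights used for training. The algorithm can use either single precision (float32) or half precision (float16) for the weights. Using half precision for weights results in reduced memory consumption. Optional. Valid values: float32 or float16. Default value: float32 |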
resize |
The number of pixels in the shortest side of an image after resizing it for training. If the parameter is not set, the training data is used without resizing. The parameter should be larger than both the width and height components of image_shape to prevent training failure. Required when using image content types. Optional when using the RecordIO content type. Valid values: positive integer. Default value: no default value |
top_k |
Reports the top-k accuracy during training. This parameter must be greater than 1, because the top-1 training accuracy is the same as the regular training accuracy that is already reported. Optional. Valid values: positive integer larger than 1. Default value: no default value |
use_pretrained_model |
Flag to use a pretrained model for training. If set to 1, the pretrained model with the corresponding number of layers is loaded and used for training. Only the top fully connected (FC) layer is reinitialized with random weights. Otherwise, the network is trained from scratch. Optional. Valid values: 0 or 1. Default value: 0 |
use_weighted_loss |
Flag to use weighted cross-entropy loss for multi-label classification (used only when multi_label = 1), where the weights are calculated based on the distribution of classes. Optional. Valid values: 0 or 1. Default value: 0 |
weight_decay |
The weight decay coefficient for sgd and nag; ignored for other optimizers. Optional. Valid values: float in the range [0, 1]. Default value: 0.0001 |
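
The learning-rate and early-stopping parameters in the table interact with the epoch count in ways that are easier to see in code. The sketch below is not the algorithm's implementation. It models a step schedule driven by lr_scheduler_step and lr_scheduler_factor, and a patience-based stopping rule driven by early_stopping_min_epochs, early_stopping_patience, and early_stopping_tolerance. The function names, the 0-based epoch index in `learning_rate_at`, and the rule that a relative gain below the tolerance counts as no improvement are assumptions made for illustration.

```python
def learning_rate_at(epoch, learning_rate=0.1, lr_scheduler_step="", lr_scheduler_factor=0.1):
    """Learning rate in effect at a 0-based epoch index (assumed convention)."""
    # lr_scheduler_step is a comma-delimited string such as "10, 20".
    steps = [int(s) for s in lr_scheduler_step.split(",") if s.strip()]
    # Each step already passed multiplies the rate by the factor once.
    reductions = sum(1 for step in steps if epoch >= step)
    return learning_rate * lr_scheduler_factor ** reductions


def stopping_epoch(accuracies, min_epochs=10, patience=5, tolerance=0.0):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = None
    stale = 0  # consecutive epochs without a sufficient improvement
    for epoch, acc in enumerate(accuracies, start=1):
        improved = best is None or (
            acc > best and (best <= 0 or (acc - best) / best >= tolerance)
        )
        if improved:
            best, stale = acc, 0
        else:
            stale += 1
        # Stopping is considered only after the minimum number of epochs.
        if epoch >= min_epochs and stale >= patience:
            return epoch
    return None


# Hypothetical validation accuracies, one per epoch.
history = [0.5, 0.6, 0.6, 0.6, 0.6]
print(stopping_epoch(history, min_epochs=2, patience=2))
print(learning_rate_at(12, lr_scheduler_step="10, 20"))
```

Raising the tolerance makes the stopping rule stricter: small gains no longer reset the patience counter, so training ends sooner on a plateau.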