# Hyperparameters

| Parameter Name | Description |
|---|---|
| `num_classes` | Number of output classes. This parameter defines the dimensions of the network output and is typically set to the number of classes in the dataset. Valid values: positive integer |
| `num_training_samples` | Number of training examples in the input dataset. If there is a mismatch between this value and the number of samples in the training set, then the behavior of the `lr_scheduler_step` parameter is undefined and distributed training accuracy might be affected. Valid values: positive integer |
| `augmentation_type` | Data augmentation type. The input images can be augmented in multiple ways, as follows. `crop`: randomly crop the image and flip it horizontally. `crop_color`: in addition to `crop`, three random values in the range [-36, 36], [-50, 50], and [-50, 50] are added to the corresponding hue, saturation, and lightness channels, respectively. `crop_color_transform`: in addition to `crop_color`, random transformations, including rotation, shear, and aspect ratio variations, are applied to the image; the maximum angle of rotation is 10 degrees, the maximum shear ratio is 0.1, and the maximum aspect changing ratio is 0.25. Valid values: `crop`, `crop_color`, or `crop_color_transform`. Default value: no default value |
| `beta_1` | The beta1 for the `adam` optimizer, that is, the exponential decay rate for the first moment estimates. Valid values: float. Range in [0, 1]. Default value: 0.9 |
| `beta_2` | The beta2 for the `adam` optimizer, that is, the exponential decay rate for the second moment estimates. Valid values: float. Range in [0, 1]. Default value: 0.999 |
| `checkpoint_frequency` | Period to store model parameters (in number of epochs). Valid values: positive integer no greater than `epochs`. Default value: `epochs` |
| `epochs` | Number of training epochs. Valid values: positive integer. Default value: 30 |
| `eps` | The epsilon for the `adam` and `rmsprop` optimizers. It is usually set to a small value to avoid division by 0. Valid values: float. Range in [0, 1]. Default value: 1e-8 |
| `gamma` | The gamma for the `rmsprop` optimizer, the decay factor for the moving average of the squared gradient. Valid values: float. Range in [0, 1]. Default value: 0.9 |
| `image_shape` | The input image dimensions, which is the same size as the input layer of the network. The format is defined as 'num_channels, height, width'. Valid values: string. Default value: '3, 224, 224' |
| `kv_store` | Weight update synchronization mode during distributed training. The weight updates can be applied either synchronously or asynchronously across machines. Synchronous updates typically provide better accuracy than asynchronous updates but can be slower. See distributed training in MXNet for more details. This parameter is not applicable to single-machine training. `dist_sync`: the gradients are synchronized after every batch with all the workers. With `dist_sync`, batch size means the batch size used on each machine, so if there are n machines and the batch size is b, then `dist_sync` behaves like single-machine training with batch size n*b. `dist_async`: performs asynchronous updates. The weights are updated whenever gradients are received from any machine, and the weight updates are atomic; however, the order is not guaranteed. Valid values: `dist_sync` or `dist_async`. Default value: no default value |
| `learning_rate` | Initial learning rate. Valid values: float. Range in [0, 1]. Default value: 0.1 |
| `lr_scheduler_factor` | The ratio by which to reduce the learning rate, used in conjunction with the `lr_scheduler_step` parameter, defined as lr_new = lr_old * `lr_scheduler_factor`. Valid values: float. Range in [0, 1]. Default value: 0.1 |
| `lr_scheduler_step` | The epochs at which to reduce the learning rate. As explained in the `lr_scheduler_factor` parameter, the learning rate is reduced by `lr_scheduler_factor` at these epochs. For example, if the value is set to "10, 20", the learning rate is reduced by `lr_scheduler_factor` after the 10th epoch and again after the 20th epoch. The epochs are delimited by ",". Valid values: string. Default value: no default value |
| `mini_batch_size` | The batch size for training. In a single-machine multi-GPU setting, each GPU handles `mini_batch_size`/num_gpu training samples. Valid values: positive integer. Default value: 32 |
| `momentum` | The momentum for the `sgd` and `nag` optimizers, ignored for other optimizers. Valid values: float. Range in [0, 1]. Default value: 0 |
| `multi_label` | Flag to use for multi-label classification, where each sample can be assigned multiple labels. Average accuracy across all classes is logged. Valid values: 0 or 1. Default value: 0 |
| `num_layers` | Number of layers for the network. For data with a large image size (for example, 224x224, like ImageNet), we suggest selecting the number of layers from the set [18, 34, 50, 101, 152, 200]. For data with a small image size (for example, 28x28, like CIFAR), we suggest selecting the number of layers from the set [20, 32, 44, 56, 110]. The number of layers in each set is based on the ResNet paper. For transfer learning, the number of layers defines the architecture of the base network and hence can only be selected from the set [18, 34, 50, 101, 152, 200]. Valid values: positive integer in [18, 34, 50, 101, 152, 200] or [20, 32, 44, 56, 110]. Default value: 152 |
| `optimizer` | The optimizer type. For more details on the optimizers' parameters, refer to MXNet's API. Valid values: one of `sgd` (stochastic gradient descent), `adam` (adaptive momentum estimation), `rmsprop` (root mean square propagation), or `nag` (Nesterov accelerated gradient). Default value: `sgd` |
| `precision_dtype` | The precision of the weights used for training. The algorithm can use either single precision (`float32`) or half precision (`float16`). Valid values: `float32` or `float16`. Default value: `float32` |
| `resize` | Resizes the image before using it for training. The images are resized so that the shortest side has the number of pixels specified by this parameter. If the parameter is not set, the training data is used without resizing. Valid values: positive integer. Default value: no default value |
| `top_k` | Reports the top-k accuracy during training. This parameter must be greater than 1, since the top-1 training accuracy is the same as the regular training accuracy that is already reported. Valid values: positive integer greater than 1. Default value: no default value |
| `use_pretrained_model` | Flag to use a pre-trained model for training. If set to 1, the pre-trained model with the corresponding number of layers is loaded and used for training, and only the top fully connected (FC) layer is reinitialized with random weights. Otherwise, the network is trained from scratch. Valid values: 0 or 1. Default value: 0 |
| `use_weighted_loss` | Flag to use weighted cross-entropy loss for multi-label classification (used only when `multi_label` = 1). Valid values: 0 or 1. Default value: 0 |
| `weight_decay` | The coefficient of weight decay for the `sgd` and `nag` optimizers, ignored for other optimizers. Valid values: float. Range in [0, 1]. Default value: 0.0001 |
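
The interaction between `lr_scheduler_step` and `lr_scheduler_factor` can be sketched in a few lines of Python. This is an illustration of the schedule described in the table, not the algorithm's actual implementation; the function name, the assumption of 1-indexed epochs, and the defaults are ours.

```python
def lr_at_epoch(epoch, base_lr=0.1, steps=(10, 20), factor=0.1):
    """Learning rate in effect at a given (1-indexed) training epoch.

    Mirrors lr_scheduler_step="10, 20" with lr_scheduler_factor=0.1:
    the rate is multiplied by `factor` after each listed epoch.
    """
    lr = base_lr
    for step in steps:
        if epoch > step:  # reduction takes effect *after* the step epoch
            lr *= factor
    return lr
```

For example, with the defaults above the learning rate stays at 0.1 through epoch 10, drops to 0.01 for epochs 11-20, and to 0.001 afterward.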
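
The batch-size arithmetic behind `kv_store` and `mini_batch_size` is easy to get wrong, so here is a small sketch of the semantics the table describes (helper names are ours; `dist_async` has no well-defined global batch, so we simply return the per-machine size):

```python
def effective_batch_size(mini_batch_size, num_machines, kv_store="dist_sync"):
    """Global batch size implied by the kv_store mode.

    With dist_sync, each machine uses mini_batch_size, so one update
    behaves like single-machine training with batch size n * b.
    """
    if kv_store == "dist_sync":
        return mini_batch_size * num_machines
    # dist_async: each machine updates weights independently
    return mini_batch_size


def samples_per_gpu(mini_batch_size, num_gpus):
    """Single-machine multi-GPU: each GPU handles mini_batch_size/num_gpu samples."""
    return mini_batch_size // num_gpus
```

So four machines with `mini_batch_size` 32 under `dist_sync` behave like a single machine with batch size 128, while on one machine with four GPUs each GPU processes 8 samples per batch.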
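
Since `image_shape` is passed as a string rather than a tuple, a quick parser clarifies the expected 'num_channels, height, width' format (the function is a hypothetical helper, not part of any API):

```python
def parse_image_shape(image_shape):
    """Parse an image_shape string like '3, 224, 224' into (channels, height, width)."""
    parts = [int(p.strip()) for p in image_shape.split(",")]
    if len(parts) != 3:
        raise ValueError("image_shape must be 'num_channels, height, width'")
    return tuple(parts)
```

The default value '3, 224, 224' thus describes 3-channel (color) images at 224x224 resolution.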