
Text Classification - TensorFlow Hyperparameters

Hyperparameters are parameters that are set before a machine learning model begins learning. The following hyperparameters are supported by the Amazon SageMaker built-in Text Classification - TensorFlow algorithm. See Tune a Text Classification - TensorFlow model for information on hyperparameter tuning.
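For example, you can override these hyperparameters when launching a training job with the SageMaker Python SDK. The following is a minimal sketch; the JumpStart model ID, hyperparameter values, and S3 path are placeholders, not recommendations.

    from sagemaker.jumpstart.estimator import JumpStartEstimator

    # Placeholder JumpStart model ID for a Text Classification - TensorFlow model.
    model_id = "tensorflow-tc-bert-en-uncased-L-12-H-768-A-12-2"

    estimator = JumpStartEstimator(
        model_id=model_id,
        hyperparameters={
            "epochs": "20",
            "batch_size": "64",
            "optimizer": "adamw",
            "early_stopping": "True",
        },
    )
    estimator.fit({"training": "s3://your-bucket/your-training-data/"})

Note that hyperparameter values are passed to SageMaker as strings.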

batch_size

The batch size for training. For training on instances with multiple GPUs, this batch size is used across the GPUs.

Valid values: positive integer.

Default value: 32.

beta_1

The beta1 for the "adam" and "adamw" optimizers. Represents the exponential decay rate for the first moment estimates. Ignored for other optimizers.

Valid values: float, range: [0.0, 1.0].

Default value: 0.9.

beta_2

The beta2 for the "adam" and "adamw" optimizers. Represents the exponential decay rate for the second moment estimates. Ignored for other optimizers.

Valid values: float, range: [0.0, 1.0].

Default value: 0.999.

dropout_rate

The dropout rate for the dropout layer in the top classification layer. Used only when reinitialize_top_layer is set to "True".

Valid values: float, range: [0.0, 1.0].

Default value: 0.2.

early_stopping

Set to "True" to use early stopping logic during training. If "False", early stopping is not used.

Valid values: string, either: ("True" or "False").

Default value: "False".

early_stopping_min_delta

The minimum change needed to qualify as an improvement. An absolute change less than the value of early_stopping_min_delta does not qualify as improvement. Used only when early_stopping is set to "True".

Valid values: float, range: [0.0, 1.0].

Default value: 0.0.

early_stopping_patience

The number of epochs to continue training with no improvement. Used only when early_stopping is set to "True".

Valid values: positive integer.

Default value: 5.
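The early stopping hyperparameters behave like the min_delta and patience arguments of the Keras EarlyStopping callback. The following sketch is illustrative only; which metric the algorithm actually monitors is an assumption here.

    import tensorflow as tf

    # Assumption: validation loss is the monitored metric.
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",
        min_delta=0.0,  # early_stopping_min_delta
        patience=5,     # early_stopping_patience
    )
    # model.fit(..., callbacks=[early_stop])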

epochs

The number of training epochs.

Valid values: positive integer.

Default value: 10.

epsilon

The epsilon for "adam", "rmsprop", "adadelta", and "adagrad" optimizers. Usually set to a small value to avoid division by 0. Ignored for other optimizers.

Valid values: float, range: [0.0, 1.0].

Default value: 1e-7.

initial_accumulator_value

The starting value for the accumulators, or the per-parameter momentum values, for the "adagrad" optimizer. Ignored for other optimizers.

Valid values: float, range: [0.0, 1.0].

Default value: 0.0001.

learning_rate

The optimizer learning rate.

Valid values: float, range: [0.0, 1.0].

Default value: 0.001.

momentum

The momentum for the "sgd" and "nesterov" optimizers. Ignored for other optimizers.

Valid values: float, range: [0.0, 1.0].

Default value: 0.9.

optimizer

The optimizer type. For more information, see Optimizers in the TensorFlow documentation.

Valid values: string, any of the following: ("adamw", "adam", "sgd", "nesterov", "rmsprop", "adagrad", "adadelta").

Default value: "adam".

regularizers_l2

The L2 regularization factor for the dense layer in the top classification layer. Used only when reinitialize_top_layer is set to "True".

Valid values: float, range: [0.0, 1.0].

Default value: 0.0001.

reinitialize_top_layer

If set to "Auto", the top classification layer parameters are re-initialized during fine-tuning. For incremental training, top classification layer parameters are not re-initialized unless set to "True".

Valid values: string, any of the following: ("Auto", "True" or "False").

Default value: "Auto".

rho

The discounting factor for the gradient of the "adadelta" and "rmsprop" optimizers. Ignored for other optimizers.

Valid values: float, range: [0.0, 1.0].

Default value: 0.95.

train_only_on_top_layer

If "True", only the top classification layer parameters are fine-tuned. If "False", all model parameters are fine-tuned.

Valid values: string, either: ("True" or "False").

Default value: "False".

validation_split_ratio

The fraction of training data to randomly split to create validation data. Only used if validation data is not provided through the validation channel.

Valid values: float, range: [0.0, 1.0].

Default value: 0.2.
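The split is random, similar in spirit to holding out a fraction of examples with scikit-learn's train_test_split. A sketch of the semantics, not of the algorithm's implementation:

    from sklearn.model_selection import train_test_split

    texts = ["good movie", "bad movie", "great film", "terrible film"]
    labels = [1, 0, 1, 0]

    # Hold out 20% of the training examples as validation data.
    train_texts, val_texts, train_labels, val_labels = train_test_split(
        texts, labels, test_size=0.2  # validation_split_ratio
    )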

warmup_steps_fraction

The fraction of the total number of gradient update steps during which the learning rate increases from 0 to the initial learning rate as a warmup. Only used with the "adamw" optimizer.

Valid values: float, range: [0.0, 1.0].

Default value: 0.1.
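For example, with warmup_steps_fraction set to 0.1, the learning rate ramps up over the first 10% of gradient update steps. A sketch of that arithmetic, assuming a linear ramp; the post-warmup schedule is left aside as an assumption.

    num_examples = 10_000  # placeholder training set size
    batch_size = 32
    epochs = 10
    warmup_steps_fraction = 0.1
    base_lr = 0.001  # learning_rate

    total_steps = (num_examples // batch_size) * epochs
    warmup_steps = int(warmup_steps_fraction * total_steps)

    def learning_rate_at(step):
        # Linear warmup from 0 to base_lr over warmup_steps, then constant
        # (the post-warmup behavior is an assumption).
        if step < warmup_steps:
            return base_lr * (step + 1) / warmup_steps
        return base_lr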