NTM Hyperparameters - Amazon SageMaker

Parameter Name Description

feature_dim

The vocabulary size of the dataset.

Required

Valid values: Positive integer (min: 1, max: 1,000,000)

num_topics

The number of required topics.

Required

Valid values: Positive integer (min: 2, max: 1000)

batch_norm

Whether to use batch normalization during training.

Optional

Valid values: true or false

Default value: false

clip_gradient

The maximum magnitude for each gradient component.

Optional

Valid values: Float (min: 1e-3)

Default value: Infinity

encoder_layers

The number of layers in the encoder and the output size of each layer. When set to auto, the algorithm uses two layers of sizes 3 x num_topics and 2 x num_topics respectively.

Optional

Valid values: Comma-separated list of positive integers or auto

Default value: auto
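
The auto rule for encoder_layers can be written out as a small helper. This is only an illustrative sketch: `resolve_encoder_layers` is a hypothetical name, not part of any SageMaker API, and the parsing of the comma-separated form is an assumption based on the valid values listed above.

```python
def resolve_encoder_layers(encoder_layers, num_topics):
    """Return the encoder layer sizes implied by an encoder_layers setting.

    Hypothetical helper for illustration; not part of any SageMaker SDK.
    """
    if encoder_layers == "auto":
        # Documented auto rule: two layers of sizes 3 x num_topics and 2 x num_topics.
        return [3 * num_topics, 2 * num_topics]
    # Otherwise expect a comma-separated list of positive integers, e.g. "100,50".
    sizes = [int(part) for part in encoder_layers.split(",")]
    if any(size <= 0 for size in sizes):
        raise ValueError("encoder_layers must contain only positive integers")
    return sizes


print(resolve_encoder_layers("auto", 20))    # [60, 40]
print(resolve_encoder_layers("100,50", 20))  # [100, 50]
```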

encoder_layers_activation

The activation function to use in the encoder layers.

Optional

Valid values: sigmoid, tanh, or relu

Default value: sigmoid

epochs

The maximum number of passes over the training data.

Optional

Valid values: Positive integer (min: 1)

Default value: 50

learning_rate

The learning rate for the optimizer.

Optional

Valid values: Float (min: 1e-6, max: 1.0)

Default value: 0.001

mini_batch_size

The number of examples in each mini batch.

Optional

Valid values: Positive integer (min: 1, max: 10000)

Default value: 256

num_patience_epochs

The number of successive epochs over which the early stopping criterion is evaluated. Early stopping is triggered when the change in the loss function drops below the specified tolerance within the last num_patience_epochs epochs. To disable early stopping, set num_patience_epochs to a value larger than epochs.

Optional

Valid values: Positive integer (min: 1)

Default value: 3

optimizer

The optimizer to use for training.

Optional

Valid values: sgd, adam, adagrad, adadelta, or rmsprop

Default value: adadelta

rescale_gradient

The factor by which the gradient is rescaled.

Optional

Valid values: Float (min: 1e-3, max: 1.0)

Default value: 1.0
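
Both rescale_gradient and clip_gradient transform the gradient before the optimizer update. The sketch below shows one plausible reading of the two settings on a plain list of gradient components. The function name `rescale_and_clip` is hypothetical, and applying the rescale before the clip is an assumption: this page does not state the order.

```python
import math


def rescale_and_clip(gradient, rescale_gradient=1.0, clip_gradient=math.inf):
    """Scale each gradient component, then clip it to [-clip_gradient, clip_gradient].

    Assumption: rescaling is applied before clipping; the order is not documented here.
    """
    scaled = [g * rescale_gradient for g in gradient]
    # With the default clip_gradient of infinity, this step leaves values unchanged.
    return [max(-clip_gradient, min(clip_gradient, g)) for g in scaled]


print(rescale_and_clip([4.0, -10.0, 0.5], rescale_gradient=0.5, clip_gradient=1.5))
# [1.5, -1.5, 0.25]
```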

sub_sample

The fraction of the training data to sample for training per epoch.

Optional

Valid values: Float (min: 0.0, max: 1.0)

Default value: 1.0

tolerance

The maximum relative change in the loss function. Early stopping is triggered when the change in the loss function drops below this value within the last num_patience_epochs epochs.

Optional

Valid values: Float (min: 1e-6, max: 0.1)

Default value: 0.001
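
The interaction between tolerance and num_patience_epochs can be sketched as follows. The exact criterion the algorithm uses is not spelled out on this page, so the code makes a labeled assumption: "change" is taken to be the relative improvement of the best loss inside the patience window over the best loss before it. The function name `should_stop_early` is hypothetical.

```python
def should_stop_early(losses, num_patience_epochs=3, tolerance=0.001):
    """Return True when the loss has effectively stopped improving.

    Assumption (not confirmed by the documentation): the change is measured as
    the relative improvement of the best loss in the last num_patience_epochs
    epochs over the best loss seen before that window.
    """
    if len(losses) <= num_patience_epochs:
        return False  # not enough history to fill the patience window
    best_before = min(losses[:-num_patience_epochs])
    best_recent = min(losses[-num_patience_epochs:])
    denominator = max(abs(best_before), 1e-12)  # guard against division by zero
    relative_change = (best_before - best_recent) / denominator
    return relative_change < tolerance


print(should_stop_early([10.0, 9.0, 8.5, 8.4999, 8.4998, 8.4997]))  # True
print(should_stop_early([10.0, 9.0, 8.0, 7.0, 6.0, 5.0]))           # False
```

Under this reading, a num_patience_epochs value larger than the number of recorded epochs never fills the window, which is consistent with the documented way to disable early stopping.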

weight_decay

The weight decay coefficient. Adds L2 regularization.

Optional

Valid values: Float (min: 0.0, max: 1.0)

Default value: 0.0
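
As a convenience, the numeric ranges listed above can be checked before a training job is submitted. The following is a minimal sketch in plain Python: `NTM_RANGES`, `REQUIRED`, and `validate_ntm_hyperparameters` are hypothetical names introduced for illustration, and the check covers only the numeric entries, not batch_norm, encoder_layers, encoder_layers_activation, or optimizer.

```python
# Numeric ranges copied from the entries above: (minimum, maximum, integer_only).
# A maximum of None means the entry documents no upper bound.
NTM_RANGES = {
    "feature_dim": (1, 1_000_000, True),
    "num_topics": (2, 1000, True),
    "clip_gradient": (1e-3, None, False),
    "epochs": (1, None, True),
    "learning_rate": (1e-6, 1.0, False),
    "mini_batch_size": (1, 10000, True),
    "num_patience_epochs": (1, None, True),
    "rescale_gradient": (1e-3, 1.0, False),
    "sub_sample": (0.0, 1.0, False),
    "tolerance": (1e-6, 0.1, False),
    "weight_decay": (0.0, 1.0, False),
}
REQUIRED = ("feature_dim", "num_topics")


def validate_ntm_hyperparameters(params):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = [
        f"missing required hyperparameter: {name}"
        for name in REQUIRED
        if name not in params
    ]
    for name, value in params.items():
        if name not in NTM_RANGES:
            continue  # non-numeric settings are not range-checked in this sketch
        low, high, integer_only = NTM_RANGES[name]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or (integer_only and not isinstance(value, int)):
            problems.append(f"{name} has the wrong type: {value!r}")
        elif value < low or (high is not None and value > high):
            problems.append(f"{name}={value!r} is outside the documented range")
    return problems


print(validate_ntm_hyperparameters({"feature_dim": 5000, "num_topics": 20}))  # []
print(validate_ntm_hyperparameters({"num_topics": 1, "learning_rate": 5.0}))
```

The second call reports three problems: the missing feature_dim, num_topics below its minimum of 2, and learning_rate above its maximum of 1.0.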