Amazon SageMaker
Developer Guide

LDA Hyperparameters

In the CreateTrainingJob request, you specify the training algorithm. You can also specify algorithm-specific hyperparameters as string-to-string maps. The following table lists the hyperparameters for the LDA training algorithm provided by Amazon SageMaker. For more information, see How LDA Works.
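Because hyperparameters travel as a string-to-string map, numeric values have to be converted to strings before they go into the request. The following is a minimal sketch of that step, not an official SDK utility: `build_lda_hyperparameters` is a hypothetical helper, and the corpus sizes in the example call are made up. It checks names and value ranges against the table below and stringifies each value.

```python
# Value types for each LDA hyperparameter, as listed in the table in this section.
INTEGER_PARAMS = {"num_topics", "feature_dim", "mini_batch_size",
                  "max_restarts", "max_iterations"}
FLOAT_PARAMS = {"alpha0", "tol"}
REQUIRED_PARAMS = {"num_topics", "feature_dim", "mini_batch_size"}


def build_lda_hyperparameters(**values):
    """Validate LDA hyperparameters and return them as a string-to-string map.

    Hypothetical helper for illustration; it is not part of any SageMaker SDK.
    """
    missing = REQUIRED_PARAMS - set(values)
    if missing:
        raise ValueError(f"missing required hyperparameters: {sorted(missing)}")

    result = {}
    for name, value in values.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool):
            raise ValueError(f"{name} must be numeric, got {value!r}")
        if name in INTEGER_PARAMS:
            if not isinstance(value, int) or value <= 0:
                raise ValueError(f"{name} must be a positive integer, got {value!r}")
        elif name in FLOAT_PARAMS:
            if not isinstance(value, (int, float)) or value <= 0:
                raise ValueError(f"{name} must be a positive float, got {value!r}")
        else:
            raise ValueError(f"unknown LDA hyperparameter: {name}")
        result[name] = str(value)  # the request expects string values
    return result


hyperparameters = build_lda_hyperparameters(
    num_topics=10,
    feature_dim=5000,      # vocabulary size of the (made-up) corpus
    mini_batch_size=2000,  # total number of documents in the corpus
    alpha0=0.5,
)
print(hyperparameters)
```

The resulting dictionary is the kind of value that would be supplied as the hyperparameter map of the CreateTrainingJob request. Optional parameters that are omitted are simply left out, so the algorithm's documented defaults apply.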

Parameter Name Description
num_topics

The number of topics for LDA to find within the data.

Required

Valid values: Positive integer

feature_dim

The size of the vocabulary of the input document corpus.

Required

Valid values: Positive integer

mini_batch_size

The total number of documents in the input document corpus.

Required

Valid values: Positive integer

alpha0

Initial guess for the concentration parameter, which is the sum of the elements of the Dirichlet prior. Small values are more likely to generate sparse topic mixtures, and large values (greater than 1.0) produce more uniform mixtures.

Optional

Valid values: Positive float

Default value: 0.1

max_restarts

The number of restarts to perform during the Alternating Least Squares (ALS) spectral decomposition phase of the algorithm. Increasing this value can find better-quality local minima at the expense of additional computation, but it typically should not be adjusted.

Optional

Valid values: Positive integer

Default value: 10

max_iterations

The maximum number of iterations to perform during the ALS phase of the algorithm. Increasing this value can find better-quality minima at the expense of additional computation, but it typically should not be adjusted.

Optional

Valid values: Positive integer

Default value: 1000

tol

Target error tolerance for the ALS phase of the algorithm. A smaller tolerance can find better-quality minima at the expense of additional computation, but this value typically should not be adjusted.

Optional

Valid values: Positive float

Default value: 1e-8
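
The effect of alpha0 on topic mixtures can be illustrated with a small simulation. The sketch below is not the SageMaker LDA implementation. It assumes a symmetric Dirichlet prior in which each of the `num_topics` elements equals `alpha0 / num_topics` (so the elements sum to alpha0), and it draws samples by normalizing gamma variates from the Python standard library. The function names and the choice of five topics are illustrative.

```python
import random


def sample_topic_mixture(alpha0, num_topics, rng):
    """Draw one topic mixture from a symmetric Dirichlet prior.

    Assumption for illustration: every element of the prior is
    alpha0 / num_topics, so the elements sum to alpha0.
    """
    alpha = alpha0 / num_topics
    while True:
        # A Dirichlet sample is a vector of gamma draws divided by their sum.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_topics)]
        total = sum(draws)
        if total > 0.0:  # guard against underflow when alpha is very small
            return [d / total for d in draws]


def average_largest_weight(alpha0, num_topics, num_samples, rng):
    """Average weight of the dominant topic across sampled mixtures."""
    largest = [max(sample_topic_mixture(alpha0, num_topics, rng))
               for _ in range(num_samples)]
    return sum(largest) / num_samples


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
sparse_avg = average_largest_weight(0.1, 5, 2000, rng)    # small alpha0
uniform_avg = average_largest_weight(10.0, 5, 2000, rng)  # alpha0 > 1.0

print(f"alpha0=0.1  -> average dominant-topic weight {sparse_avg:.2f}")
print(f"alpha0=10.0 -> average dominant-topic weight {uniform_avg:.2f}")
```

With a small alpha0, most of each sampled mixture's weight falls on a single topic (a sparse mixture). With a large alpha0, the weight is spread more evenly across topics.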