Amazon SageMaker
Developer Guide

Tuning a BlazingText Model

Automatic model tuning, also known as hyperparameter tuning, finds the best version of a model by running many jobs that test a range of hyperparameters on your dataset. You choose the tunable hyperparameters, a range of values for each, and an objective metric. You choose the objective metric from the metrics that the algorithm computes. Automatic model tuning searches the hyperparameters chosen to find the combination of values that result in the model that optimizes the objective metric.

For more information about model tuning, see Automatic Model Tuning.

Metrics Computed by the BlazingText Algorithm

The BlazingText Word2Vec algorithm (skipgram, cbow, and batch_skipgram modes) reports on a single metric during training: train:mean_rho. This metric is computed on WS-353 word similarity datasets. When tuning the hyperparameter values for the Word2Vec algorithm, use this metric as the objective.

BlazingText Text Classification algorithm (supervised mode), also reports on a single metric during training: the validation:accuracy. When tuning the hyperparameter values for the text classification algorithm, use this metrics as the objective.

Metric Name Description Optimization Direction
train:mean_rho

Mean rho (Spearman's rank correlation coefficient) on WS-353 word similarity datasets.

Maximize

validation:accuracy

Classification accuracy on user specified validation dataset.

Maximize

Tunable BlazingText Hyperparameters

Tunable Hyperparameters for Word2Vec

Tune the Amazon SageMaker BlazingText Word2Vec model with the following hyperparameters. The hyperparameters that have the greatest impact on Word2Vec objective metrics are: mode, learning_rate, window_size, vector_dim, and negative_samples.

Parameter Name Parameter Type Recommended Ranges
batch_size

IntegerParameterRange

[8-32]

epochs

IntegerParameterRange

[5-15]

learning_rate

ContinuousParameterRange

MinValue: 0.005, MaxValue: 0.01

min_count

IntegerParameterRange

[0-100]

mode

CategoricalParameterRange

['batch_skipgram', 'skipgram', 'cbow']

negative_samples

IntegerParameterRange

[5-25]

sampling_threshold

ContinuousParameterRange

MinValue: 0.0001, MaxValue: 0.001

vector_dim

IntegerParameterRange

[32-300]

window_size

IntegerParameterRange

[1-10]

Tunable Hyperparameters for Text Classification

Tune the Amazon SageMaker BlazingText Text Classification model with the following hyperparameters.

Parameter Name Parameter Type Recommended Ranges
buckets

IntegerParameterRange

[1000000-10000000]

epochs

IntegerParameterRange

[5-15]

learning_rate

ContinuousParameterRange

MinValue: 0.005, MaxValue: 0.01

min_count

IntegerParameterRange

[0-100]

mode

CategoricalParameterRange

['supervised']

vector_dim

IntegerParameterRange

[32-300]

word_ngrams

IntegerParameterRange

[1-3]