How Hyperparameter Tuning Works

When you build complex machine learning systems like deep learning neural networks, exploring all of the possible hyperparameter combinations is impractical. Hyperparameter tuning accelerates this search by trying many variations of a model. It looks for the best model automatically by focusing on the most promising combinations of hyperparameter values within the ranges that you specify. To get good results, you must choose the right ranges to explore.

Use the API reference guide to understand how to interact with hyperparameter tuning. The examples on this page use the HyperParameterTuningJobConfig and HyperbandStrategyConfig APIs.
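
For orientation, the following sketch shows the overall shape of a HyperParameterTuningJobConfig passed to the low-level boto3 client. The metric name and hyperparameter name are placeholders for whatever your algorithm emits and accepts, and the TrainingJobDefinition (algorithm image, role, data channels) is omitted.

import boto3

# Skeleton of a HyperParameterTuningJobConfig. Field names follow the
# CreateHyperParameterTuningJob API; the values here are illustrative.
tuning_job_config = {
    "Strategy": "Bayesian",  # also: "Random", "Grid", or "Hyperband"
    "HyperParameterTuningJobObjective": {
        "Type": "Maximize",
        "MetricName": "validation:auc",
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 20,
        "MaxParallelTrainingJobs": 2,
    },
    "ParameterRanges": {
        "ContinuousParameterRanges": [
            {"Name": "eta", "MinValue": "0.01", "MaxValue": "0.3", "ScalingType": "Auto"},
        ],
    },
}

# The config is passed to CreateHyperParameterTuningJob together with a
# TrainingJobDefinition, which is omitted from this sketch.
sm = boto3.client("sagemaker")
# sm.create_hyper_parameter_tuning_job(
#     HyperParameterTuningJobName="example-tuning-job",
#     HyperParameterTuningJobConfig=tuning_job_config,
#     TrainingJobDefinition={...},
# )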

Note

Because the algorithm itself is stochastic, it’s possible that the hyperparameter tuning model will fail to converge on the best answer. This can occur even if the best possible combination of values is within the ranges that you choose.

Grid Search

When using grid search, hyperparameter tuning chooses combinations of values from the ranges of categorical values that you specify when you create the job. Only categorical parameters are supported when using the grid search strategy. You do not need to specify MaxNumberOfTrainingJobs; the number of training jobs created by the tuning job is calculated automatically as the total number of distinct categorical combinations possible. If you do specify MaxNumberOfTrainingJobs, its value should equal that total.
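
As a sketch, a grid search configuration declares only categorical ranges; the hyperparameter names and values below are illustrative. With two values for one parameter and three for the other, the tuning job runs 2 × 3 = 6 training jobs.

# Grid search: only categorical parameters are allowed, and the number of
# training jobs equals the number of distinct combinations (here 2 * 3 = 6).
grid_tuning_config = {
    "Strategy": "Grid",
    "HyperParameterTuningJobObjective": {
        "Type": "Minimize",
        "MetricName": "validation:rmse",
    },
    "ResourceLimits": {
        # MaxNumberOfTrainingJobs may be omitted; if given, it must equal 6 here.
        "MaxParallelTrainingJobs": 3,
    },
    "ParameterRanges": {
        "CategoricalParameterRanges": [
            {"Name": "booster", "Values": ["gbtree", "dart"]},
            {"Name": "max_depth", "Values": ["4", "6", "8"]},
        ],
    },
}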

Random Search

When using random search, hyperparameter tuning chooses a random combination of values from within the ranges that you specify for hyperparameters for each training job it launches. Because the choice of hyperparameter values doesn't depend on the results of previous training jobs, you can run the maximum number of concurrent training jobs without affecting the performance of the tuning.
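
The fragment below sketches a random search configuration. Because values are drawn independently for each training job, MaxParallelTrainingJobs can be set as high as your account limits allow without changing the quality of the search; the parameter names are illustrative.

# Random search: values are drawn independently from the declared ranges,
# so a high degree of parallelism does not change the search behavior.
random_tuning_config = {
    "Strategy": "Random",
    "HyperParameterTuningJobObjective": {
        "Type": "Maximize",
        "MetricName": "validation:accuracy",
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 50,
        "MaxParallelTrainingJobs": 10,  # safe to run many jobs at once
    },
    "ParameterRanges": {
        "ContinuousParameterRanges": [
            {"Name": "learning_rate", "MinValue": "0.0001", "MaxValue": "0.1",
             "ScalingType": "Logarithmic"},
        ],
        "IntegerParameterRanges": [
            {"Name": "num_layers", "MinValue": "2", "MaxValue": "8",
             "ScalingType": "Linear"},
        ],
    },
}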

For an example notebook that uses random search, see the Random search and hyperparameter scaling with SageMaker XGBoost and Automatic Model Tuning notebook.

Bayesian Optimization

Bayesian optimization treats hyperparameter tuning like a regression problem. Given a set of input features (the hyperparameters), hyperparameter tuning optimizes a model for the metric that you choose. To solve this regression problem, hyperparameter tuning makes guesses about which hyperparameter combinations are likely to get the best results, and runs training jobs to test those values. After testing a set of hyperparameter values, hyperparameter tuning uses regression to choose the next set of hyperparameter values to test.

Hyperparameter tuning uses an Amazon SageMaker implementation of Bayesian optimization.

When choosing the best hyperparameters for the next training job, hyperparameter tuning considers everything that it knows about this problem so far. Sometimes it chooses a combination of hyperparameter values close to the combination that resulted in the best previous training job to incrementally improve performance. This allows hyperparameter tuning to exploit the best known results. Other times, it chooses a set of hyperparameter values far removed from those it has tried. This allows it to explore the range of hyperparameter values to try to find new areas that are not yet well understood. The explore/exploit trade-off is common in many machine learning problems.
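
Selecting this strategy in the API is a single field: set Strategy to "Bayesian" in the HyperParameterTuningJobConfig, as in this minimal sketch (the metric and parameter names are placeholders).

# Bayesian optimization: each new candidate is chosen by a regression model
# fit to the results of completed training jobs, balancing exploitation of
# good regions with exploration of untried ones.
bayesian_tuning_config = {
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
        "Type": "Maximize",
        "MetricName": "validation:f1",
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 30,
        # Fewer parallel jobs lets each new suggestion see more completed
        # results; this is a trade-off, not a requirement.
        "MaxParallelTrainingJobs": 3,
    },
    "ParameterRanges": {
        "ContinuousParameterRanges": [
            {"Name": "dropout", "MinValue": "0.0", "MaxValue": "0.5",
             "ScalingType": "Linear"},
        ],
    },
}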

Hyperband

Hyperband is a multi-fidelity tuning strategy that dynamically reallocates resources. Hyperband uses both intermediate and final results of training jobs to re-allocate epochs to promising hyperparameter configurations and automatically stops those that underperform. It also seamlessly scales to using many parallel training jobs. These features can significantly speed up hyperparameter tuning over random search and Bayesian optimization strategies.

Hyperband should only be used to tune iterative algorithms that publish results at different resource levels. For example, Hyperband can be used to tune a neural network for image classification, which publishes accuracy metrics after every epoch.
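
The Hyperband strategy accepts an optional HyperbandStrategyConfig that bounds the resource (for example, epochs) a training job may consume. The sketch below assumes an image classifier that reports a validation accuracy metric after each epoch; names and values are illustrative.

# Hyperband: intermediate results are used to stop underperforming
# configurations early and to grant more epochs to promising ones.
hyperband_tuning_config = {
    "Strategy": "Hyperband",
    "StrategyConfig": {
        "HyperbandStrategyConfig": {
            "MinResource": 1,    # minimum epochs before a job can be stopped
            "MaxResource": 30,   # epochs granted to the best configurations
        },
    },
    "HyperParameterTuningJobObjective": {
        "Type": "Maximize",
        "MetricName": "validation:accuracy",
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 60,
        "MaxParallelTrainingJobs": 10,
    },
    "ParameterRanges": {
        "ContinuousParameterRanges": [
            {"Name": "learning_rate", "MinValue": "0.0001", "MaxValue": "0.1",
             "ScalingType": "Logarithmic"},
        ],
    },
}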

Hyperband with early stopping

Training jobs can be stopped early when they are unlikely to improve the objective metric of the hyperparameter tuning job. This can help reduce compute time and avoid overfitting your model. Hyperband uses an advanced internal mechanism to apply early stopping. Thus, the parameter TrainingJobEarlyStoppingType in the HyperParameterTuningJobConfig API must be set to OFF when using the Hyperband internal early stopping feature.
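
Concretely, when Strategy is "Hyperband", leave early stopping to Hyperband itself by turning the tuning-job-level setting off, as in this fragment (other strategies can use the AUTO setting instead).

# Hyperband applies early stopping internally, so the tuning-job-level
# early stopping type must be turned off when Strategy is "Hyperband".
hyperband_early_stopping_fragment = {
    "Strategy": "Hyperband",
    "TrainingJobEarlyStoppingType": "Off",
    # ... objective, ResourceLimits, StrategyConfig, and ParameterRanges
    # as in the Hyperband sketch above ...
}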

Note

Hyperparameter tuning might not improve your model. It is an advanced tool for building machine learning solutions. As such, it should be considered part of the scientific development process.