Perform Automatic Model Tuning with SageMaker

Amazon SageMaker automatic model tuning, also known as hyperparameter tuning, finds the best version of a model by running many training jobs on your dataset using the algorithm and ranges of hyperparameters that you specify. It then chooses the hyperparameter values that result in a model that performs the best, as measured by a metric that you choose.

For example, suppose that you want to solve a binary classification problem on a marketing dataset. Your goal is to maximize the area under the curve (AUC) metric by training an XGBoost model. You don't know which values of the eta, alpha, min_child_weight, and max_depth hyperparameters produce the best model. To find them, you specify a range of values for each hyperparameter. SageMaker hyperparameter tuning then launches training jobs that use hyperparameter values from those ranges and returns the training job with the highest AUC.
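As a rough illustration of the XGBoost example, the following sketch builds the tuning configuration as a plain Python dictionary. The field names follow the request shape of the CreateHyperParameterTuningJob API. The numeric ranges, the job counts, and the helper searched_names are assumptions chosen for illustration, not recommended values.

```python
# Sketch of a tuning configuration for the XGBoost example.
# All ranges and job counts below are illustrative assumptions.
tuning_job_config = {
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
        "Type": "Maximize",              # a higher AUC is better
        "MetricName": "validation:auc",  # AUC metric emitted by built-in XGBoost
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 20,
        "MaxParallelTrainingJobs": 3,
    },
    "ParameterRanges": {
        # The API expects range bounds as strings.
        "ContinuousParameterRanges": [
            {"Name": "eta", "MinValue": "0", "MaxValue": "1"},
            {"Name": "alpha", "MinValue": "0", "MaxValue": "2"},
            {"Name": "min_child_weight", "MinValue": "1", "MaxValue": "10"},
        ],
        "IntegerParameterRanges": [
            {"Name": "max_depth", "MinValue": "1", "MaxValue": "10"},
        ],
        "CategoricalParameterRanges": [],
    },
}


def searched_names(config):
    """Return the names of all hyperparameters that the job would search."""
    names = []
    for ranges in config["ParameterRanges"].values():
        names.extend(r["Name"] for r in ranges)
    return names


print(searched_names(tuning_job_config))
```

A dictionary of this shape would typically be passed as the HyperParameterTuningJobConfig argument when you create the tuning job, together with a training job definition.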

You can use SageMaker automatic model tuning with built-in algorithms, custom algorithms, and SageMaker pre-built containers for machine learning frameworks.

Amazon SageMaker automatic model tuning can use Amazon EC2 Spot Instances to reduce the cost of running training jobs. For more information about managed spot training, see Managed Spot Training in Amazon SageMaker.
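As a minimal sketch, the training job definition fields that turn on managed spot training might look like the following. The time limits are illustrative assumptions, and wait_time_is_valid is a hypothetical helper that checks the rule that the maximum wait time must be at least the maximum run time.

```python
# Fields of a training job definition that relate to managed spot training.
# The time limits are illustrative assumptions.
spot_settings = {
    "EnableManagedSpotTraining": True,
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 3600,   # upper bound on actual training time
        "MaxWaitTimeInSeconds": 7200,  # training time plus time spent waiting for Spot capacity
    },
}


def wait_time_is_valid(settings):
    """Check that the wait time is not shorter than the run time."""
    stop = settings["StoppingCondition"]
    return stop["MaxWaitTimeInSeconds"] >= stop["MaxRuntimeInSeconds"]


print(wait_time_is_valid(spot_settings))
```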

Before you start using hyperparameter tuning, you should have a well-defined machine learning problem, including the following:

  • A dataset

  • An understanding of the type of algorithm you need to train

  • A clear understanding of how you measure success

You should also prepare your dataset and algorithm so that they work in SageMaker and successfully run a training job at least once. For information about setting up and running a training job, see Get Started with Amazon SageMaker.

Resource Limits for Automatic Model Tuning

SageMaker sets the following default limits for resources used by automatic model tuning:

  • Number of parallel (concurrent) hyperparameter tuning jobs: 100

  • Number of hyperparameters that can be searched: 20

    Note

    Every possible value in a categorical hyperparameter counts against this limit.

  • Number of metrics defined per hyperparameter tuning job: 20

  • Number of parallel (concurrent) training jobs per hyperparameter tuning job: 10

    Note

    This can be increased to hundreds.

  • [Bayesian search strategy] Number of training jobs per hyperparameter tuning job: 500

  • [Random search strategy] Number of training jobs per hyperparameter tuning job: 500

    Note

    This can be increased up to ten thousand.

  • Maximum run time for a hyperparameter tuning job: 30 days
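The limits above can be checked before you submit a tuning job. The following sketch is one way to do that locally. It assumes the default values listed above, and it reads the note on categorical hyperparameters as charging one unit per categorical value. The function names and the input layout are hypothetical.

```python
# Default limits taken from the list above.
DEFAULT_LIMITS = {
    "searched_hyperparameters": 20,
    "parallel_training_jobs": 10,
    "total_training_jobs": 500,
}


def count_searched(parameter_ranges):
    """Count searched hyperparameters, charging one unit per categorical value."""
    count = len(parameter_ranges.get("continuous", []))
    count += len(parameter_ranges.get("integer", []))
    for values in parameter_ranges.get("categorical", {}).values():
        count += len(values)
    return count


def limit_violations(parameter_ranges, parallel_jobs, total_jobs):
    """Return the names of the default limits that the plan exceeds."""
    requested = {
        "searched_hyperparameters": count_searched(parameter_ranges),
        "parallel_training_jobs": parallel_jobs,
        "total_training_jobs": total_jobs,
    }
    return [name for name, value in requested.items()
            if value > DEFAULT_LIMITS[name]]


# Hypothetical plan: three numeric ranges plus one categorical with three values.
plan = {
    "continuous": ["eta", "alpha"],
    "integer": ["max_depth"],
    "categorical": {"booster": ["gbtree", "gblinear", "dart"]},
}
print(count_searched(plan))
print(limit_violations(plan, parallel_jobs=20, total_jobs=100))
```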

When you plan hyperparameter tuning jobs, you also have to take into account the limits on training resources. For information about the default resource limits for SageMaker training jobs, see SageMaker Limits. Every concurrent training instance used by any of your hyperparameter tuning jobs counts against the total number of training instances allowed. For example, suppose that you run 10 concurrent hyperparameter tuning jobs, that each tuning job runs 100 training jobs in total with 20 of them running concurrently, and that each training job runs on one ml.m4.xlarge instance. The following limits apply:

  • Number of concurrent hyperparameter tuning jobs: You don't need to increase the limit, because 10 tuning jobs is below the limit of 100.

  • Number of training jobs per hyperparameter tuning job: You don't need to increase the limit, because 100 training jobs is below the limit of 500.

  • Number of concurrent training jobs per hyperparameter tuning job: You need to request a limit increase to 20, because the default limit is 10.

  • SageMaker training ml.m4.xlarge instances: You need to request a limit increase to 200, because you have 10 hyperparameter tuning jobs, each of which is running 20 concurrent training jobs. The default limit is 20 instances.

  • SageMaker training total instance count: You need to request a limit increase to 200, because you have 10 hyperparameter tuning jobs, each of which is running 20 concurrent training jobs. The default limit is 20 instances.
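The arithmetic behind this example can be sketched in a few lines. The default values come from the limits stated in this section. The function name and the dictionary keys are hypothetical.

```python
# Default limits as stated in this section.
DEFAULTS = {
    "concurrent_tuning_jobs": 100,
    "training_jobs_per_tuning_job": 500,
    "concurrent_training_jobs_per_tuning_job": 10,
    "training_instances": 20,
}


def required_increases(tuning_jobs, total_per_job, concurrent_per_job,
                       instances_per_training_job=1):
    """Return the limits that must be raised, mapped to the value needed."""
    needed = {
        "concurrent_tuning_jobs": tuning_jobs,
        "training_jobs_per_tuning_job": total_per_job,
        "concurrent_training_jobs_per_tuning_job": concurrent_per_job,
        # Every concurrent training job of every tuning job holds instances.
        "training_instances": (tuning_jobs * concurrent_per_job
                               * instances_per_training_job),
    }
    return {name: value for name, value in needed.items()
            if value > DEFAULTS[name]}


# 10 tuning jobs, each with 100 total and 20 concurrent training jobs.
print(required_increases(10, 100, 20))
```

With the numbers from the example, only the concurrent training jobs per tuning job (20) and the training instance count (10 × 20 = 200) exceed their defaults.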

To request a quota increase:

  1. Open the AWS Support Center page, sign in if necessary, and then choose Create case.

  2. On the Create case page, choose Service limit increase.

  3. On the Case details panel, select SageMaker Automatic Model Tuning [Hyperparameter Optimization] for the Limit type.

  4. On the Requests panel for Request 1, select the Region, the resource Limit to increase, and the New Limit value that you are requesting. Select Add another request if you have additional quota increase requests.

    
    Figure: Resource limit increase request UI.
  5. In the Case description panel, provide a description of your use case.

  6. In the Contact options panel, select your preferred Contact methods (Web, Chat, or Phone), and then choose Submit.