Linear learner hyperparameters - Amazon SageMaker

Linear learner hyperparameters

The following table contains the hyperparameters for the linear learner algorithm. These are parameters that are set by users to facilitate the estimation of model parameters from data. The required hyperparameters that must be set are listed first, in alphabetical order. The optional hyperparameters that can be set are listed next, also in alphabetical order. When a hyperparameter is set to auto, Amazon SageMaker will automatically calculate and set the value of that hyperparameter.

Parameter Name Description
num_classes

The number of classes for the response variable. The algorithm assumes that classes are labeled 0, ..., num_classes - 1.

Required when predictor_type is multiclass_classifier. Otherwise, the algorithm ignores it.

Valid values: Integers from 3 to 1,000,000

predictor_type

Specifies the type of target variable as a binary classification, multiclass classification, or regression.

Required

Valid values: binary_classifier, multiclass_classifier, or regressor
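As a rough illustration of how the two required settings interact, the following sketch checks a plain dictionary of hyperparameters before it is passed to a training job. The helper name `check_required` is hypothetical and is not part of any SageMaker API; it only encodes the rules stated above.

```python
VALID_PREDICTOR_TYPES = {"binary_classifier", "multiclass_classifier", "regressor"}


def check_required(hyperparameters):
    """Validate the required linear learner settings (hypothetical helper)."""
    predictor_type = hyperparameters.get("predictor_type")
    if predictor_type not in VALID_PREDICTOR_TYPES:
        raise ValueError(
            f"predictor_type must be one of {sorted(VALID_PREDICTOR_TYPES)}"
        )
    if predictor_type == "multiclass_classifier":
        num_classes = hyperparameters.get("num_classes")
        # num_classes is required only for multiclass classification.
        if not isinstance(num_classes, int) or not 3 <= num_classes <= 1_000_000:
            raise ValueError("num_classes must be an integer from 3 to 1,000,000")
    return True


check_required({"predictor_type": "regressor"})
check_required({"predictor_type": "multiclass_classifier", "num_classes": 5})
```

For the other predictor types the sketch skips num_classes entirely, mirroring the statement that the algorithm ignores it.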

accuracy_top_k

When computing the top-k accuracy metric for multiclass classification, the value of k. If the model assigns one of the top-k scores to the true label, an example is scored as correct.

Optional

Valid values: Positive integers

Default value: 3
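The top-k rule described for accuracy_top_k can be sketched in a few lines. This is a minimal illustration, assuming each example comes with one score per class; the function name `top_k_accuracy` is made up for this example and is not the algorithm's internal code.

```python
def top_k_accuracy(scores, labels, k=3):
    """Fraction of examples whose true label is among the k highest scores."""
    correct = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this example.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        correct += label in top_k
    return correct / len(labels)


scores = [[0.1, 0.5, 0.3, 0.1], [0.7, 0.2, 0.06, 0.04]]
labels = [2, 3]
print(top_k_accuracy(scores, labels, k=2))
```

In this toy data the first example counts as correct (class 2 has the second-highest score) and the second does not.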

balance_multiclass_weights

Specifies whether to use class weights, which give each class equal importance in the loss function. Used only when the predictor_type is multiclass_classifier.

Optional

Valid values: true, false

Default value: false

beta_1

The exponential decay rate for first-moment estimates. Applies only when the optimizer value is adam.

Optional

Valid values: auto or floating-point value between 0 and 1.0

Default value: auto

beta_2

The exponential decay rate for second-moment estimates. Applies only when the optimizer value is adam.

Optional

Valid values: auto or floating-point value between 0 and 1.0

Default value: auto

bias_lr_mult

Allows a different learning rate for the bias term. The actual learning rate for the bias is learning_rate * bias_lr_mult.

Optional

Valid values: auto or positive floating-point value

Default value: auto

bias_wd_mult

Allows different regularization for the bias term. The actual L2 regularization weight for the bias is wd * bias_wd_mult. By default, there is no regularization on the bias term.

Optional

Valid values: auto or non-negative floating-point value

Default value: auto
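The two multipliers above are plain products, which the following sketch spells out. The function name and the default arguments are assumptions for illustration: bias_wd_mult=0.0 reflects the statement that the bias is not regularized by default, and bias_lr_mult=1.0 is simply a neutral placeholder, since the value that auto resolves to is not stated here.

```python
def bias_settings(learning_rate, wd, bias_lr_mult=1.0, bias_wd_mult=0.0):
    """Return the effective (learning rate, L2 weight) applied to the bias term."""
    bias_learning_rate = learning_rate * bias_lr_mult
    bias_l2_weight = wd * bias_wd_mult
    return bias_learning_rate, bias_l2_weight


# Bias learns twice as fast as the weights and is not regularized.
print(bias_settings(learning_rate=0.1, wd=0.01, bias_lr_mult=2.0, bias_wd_mult=0.0))
```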

binary_classifier_model_selection_criteria

When predictor_type is set to binary_classifier, the model evaluation criteria for the validation dataset (or for the training dataset if you don't provide a validation dataset). Criteria include:

  • accuracy—The model with the highest accuracy.

  • f_beta—The model with the highest F-beta score, where beta is set by the f_beta hyperparameter. With the default beta of 1.0, this is the F1 score.

  • precision_at_target_recall—The model with the highest precision at a given recall target.

  • recall_at_target_precision—The model with the highest recall at a given precision target.

  • loss_function—The model with the lowest value of the loss function used in training.

Optional

Valid values: accuracy, f_beta, precision_at_target_recall, recall_at_target_precision, or loss_function

Default value: accuracy

early_stopping_patience

The number of epochs to wait before ending training if no improvement is made in the relevant metric. If you have provided a value for binary_classifier_model_selection_criteria, the metric is that value. Otherwise, the metric is the same as the value specified for the loss hyperparameter.

The metric is evaluated on the validation data. If you haven't provided validation data, the metric is always the same as the value specified for the loss hyperparameter and is evaluated on the training data. To disable early stopping, set early_stopping_patience to a value greater than the value specified for epochs.

Optional

Valid values: Positive integer

Default value: 3

early_stopping_tolerance

The relative tolerance to measure an improvement in loss. If the ratio of the improvement in loss divided by the previous best loss is smaller than this value, early stopping considers the improvement to be zero.

Optional

Valid values: Positive floating-point value

Default value: 0.001
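To make the interplay between early_stopping_patience and early_stopping_tolerance concrete, here is a minimal sketch that treats the metric as a loss to be minimized. The function `should_stop` is hypothetical and only follows the description above; the algorithm's actual bookkeeping may differ.

```python
def should_stop(losses, patience=3, tolerance=0.001):
    """Return True once `patience` epochs in a row show no meaningful improvement."""
    best = losses[0]
    epochs_without_improvement = 0
    for loss in losses[1:]:
        # Improvement relative to the best loss seen so far.
        improvement = (best - loss) / abs(best) if best != 0 else 0.0
        if improvement < tolerance:
            # Too small to count: treated as zero improvement.
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return True
        else:
            best = loss
            epochs_without_improvement = 0
    return False


print(should_stop([1.0, 0.9, 0.8999, 0.8999, 0.8999]))
print(should_stop([1.0, 0.9, 0.8, 0.7]))
```

The first sequence stalls for three epochs after reaching 0.9, so training would stop; the second keeps improving by more than the tolerance.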

epochs

The maximum number of passes over the training data.

Optional

Valid values: Positive integer

Default value: 15

f_beta

The value of beta to use when calculating F score metrics for binary or multiclass classification. Also used if the value specified for binary_classifier_model_selection_criteria is f_beta.

Optional

Valid values: Positive floating-point values

Default value: 1.0
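For reference, the standard F-beta definition combines precision and recall as shown below. This is the textbook formula, given as a sketch; the function name is an illustrative choice.

```python
def f_beta_score(precision, recall, beta=1.0):
    """Standard F-beta score; beta > 1 weights recall more, beta < 1 weights precision more."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_squared = beta ** 2
    return (1 + beta_squared) * precision * recall / (beta_squared * precision + recall)


print(f_beta_score(0.5, 0.5))            # beta = 1.0 gives the F1 score
print(f_beta_score(1.0, 0.5, beta=2.0))  # recall counts more heavily
```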

feature_dim

The number of features in the input data.

Optional

Valid values: auto or positive integer

Default value: auto

huber_delta

The parameter for Huber loss. During training and metric evaluation, the algorithm computes L2 loss for errors smaller than huber_delta and L1 loss for errors larger than huber_delta.

Optional

Valid values: Positive floating-point value

Default value: 1.0
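The switch between quadratic and linear behavior can be illustrated with the standard Huber loss. This sketch uses the common textbook scaling (a factor of 0.5 on the quadratic part), which is an assumption; the constants the algorithm uses internally are not stated here.

```python
def huber_loss(error, delta=1.0):
    """Textbook Huber loss: quadratic for small errors, linear for large ones."""
    abs_error = abs(error)
    if abs_error <= delta:
        return 0.5 * abs_error ** 2
    # The linear branch is offset so the two pieces meet at abs_error == delta.
    return delta * (abs_error - 0.5 * delta)


print(huber_loss(0.5))  # quadratic region
print(huber_loss(3.0))  # linear region
```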

init_bias

Initial weight for the bias term.

Optional

Valid values: Floating-point value

Default value: 0

init_method

Sets the initial distribution function used for model weights. Functions include:

  • uniform—Uniformly distributed between (-scale, +scale)

  • normal—Normal distribution with mean 0 and standard deviation sigma

Optional

Valid values: uniform or normal

Default value: uniform

init_scale

Scales an initial uniform distribution for model weights. Applies only when the init_method hyperparameter is set to uniform.

Optional

Valid values: Positive floating-point value

Default value: 0.07

init_sigma

The initial standard deviation for the normal distribution. Applies only when the init_method hyperparameter is set to normal.

Optional

Valid values: Positive floating-point value

Default value: 0.01

l1

The L1 regularization parameter. If you don't want to use L1 regularization, set the value to 0.

Optional

Valid values: auto or non-negative float

Default value: auto

learning_rate

The step size used by the optimizer for parameter updates.

Optional

Valid values: auto or positive floating-point value

Default value: auto, whose value depends on the optimizer chosen.

loss

Specifies the loss function.

The available loss functions and their default values depend on the value of predictor_type:

  • If the predictor_type is set to regressor, the available options are auto, squared_loss, absolute_loss, eps_insensitive_squared_loss, eps_insensitive_absolute_loss, quantile_loss, and huber_loss. The default value for auto is squared_loss.

  • If the predictor_type is set to binary_classifier, the available options are auto, logistic, and hinge_loss. The default value for auto is logistic.

  • If the predictor_type is set to multiclass_classifier, the available options are auto and softmax_loss. The default value for auto is softmax_loss.

Valid values: auto, logistic, squared_loss, absolute_loss, hinge_loss, softmax_loss, eps_insensitive_squared_loss, eps_insensitive_absolute_loss, quantile_loss, or huber_loss

Optional

Default value: auto

loss_insensitivity

The parameter for the epsilon-insensitive loss type. During training and metric evaluation, any error smaller than this value is considered to be zero.

Optional

Valid values: Positive floating-point value

Default value: 0.01
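The "considered to be zero" behavior can be sketched with the usual epsilon-insensitive definitions, in which the insensitivity value is subtracted from the absolute error before the loss is applied. The two function names mirror the loss option names, but the bodies are textbook formulas and an assumption about the exact form the algorithm uses.

```python
def eps_insensitive_absolute_loss(error, loss_insensitivity=0.01):
    """Absolute loss that ignores errors smaller than loss_insensitivity."""
    return max(0.0, abs(error) - loss_insensitivity)


def eps_insensitive_squared_loss(error, loss_insensitivity=0.01):
    """Squared loss that ignores errors smaller than loss_insensitivity."""
    return max(0.0, abs(error) - loss_insensitivity) ** 2


print(eps_insensitive_absolute_loss(0.005))  # inside the insensitive band
print(eps_insensitive_absolute_loss(0.51))
print(eps_insensitive_squared_loss(0.51))
```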

lr_scheduler_factor

Each time lr_scheduler_step steps have elapsed, the learning rate is reduced by this factor. Applies only when the use_lr_scheduler hyperparameter is set to true.

Optional

Valid values: auto or positive floating-point value between 0 and 1

Default value: auto

lr_scheduler_minimum_lr

The learning rate never decreases to a value lower than the value set for lr_scheduler_minimum_lr. Applies only when the use_lr_scheduler hyperparameter is set to true.

Optional

Valid values: auto or positive floating-point value

Default value: auto

lr_scheduler_step

The number of steps between decreases of the learning rate. Applies only when the use_lr_scheduler hyperparameter is set to true.

Optional

Valid values: auto or positive integer

Default value: auto
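The three scheduler hyperparameters can be read together as a step schedule. The sketch below assumes the decrease is multiplicative (the learning rate is multiplied by lr_scheduler_factor every lr_scheduler_step steps), which is suggested by the 0 to 1 range of the factor but is not confirmed here. The function name and the example values are illustrative only.

```python
def scheduled_learning_rate(initial_lr, step, scheduler_step, factor, minimum_lr):
    """Step decay (assumed multiplicative), floored at minimum_lr."""
    decays_so_far = step // scheduler_step
    return max(minimum_lr, initial_lr * factor ** decays_so_far)


for step in (0, 100, 200, 1000):
    print(step, scheduled_learning_rate(0.1, step, scheduler_step=100,
                                        factor=0.5, minimum_lr=0.001))
```

Once enough decays have accumulated, the floor takes over and the rate stays at the minimum.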

margin

The margin for the hinge_loss function.

Optional

Valid values: Positive floating-point value

Default value: 1.0
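As a sketch of what the margin controls, the standard hinge loss penalizes any example whose signed score falls short of the margin. The mapping of 0/1 labels to -1/+1 and the function name are assumptions made for this illustration.

```python
def hinge_loss(score, label, margin=1.0):
    """Hinge loss for one example; `label` is 0 or 1, `score` is the raw model output."""
    sign = 1.0 if label == 1 else -1.0  # map {0, 1} labels to {-1, +1}
    return max(0.0, margin - sign * score)


print(hinge_loss(2.0, 1))  # confidently correct: no loss
print(hinge_loss(0.5, 1))  # correct but inside the margin
print(hinge_loss(0.5, 0))  # wrong side of the boundary
```

A larger margin asks the model to separate the classes more confidently before the loss reaches zero.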

mini_batch_size

The number of observations per mini-batch for the data iterator.

Optional

Valid values: Positive integer

Default value: 1000

momentum

The momentum of the sgd optimizer.

Optional

Valid values: auto or a floating-point value between 0 and 1.0

Default value: auto

normalize_data

Normalizes the feature data before training. Data normalization shifts the data for each feature to have a mean of zero and scales it to have unit standard deviation.

Optional

Valid values: auto, true, or false

Default value: true
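The normalization described above is ordinary standardization of each feature column. Here is a minimal sketch for a single column; it uses the population standard deviation, which is an assumption, and the function name is illustrative.

```python
import statistics


def standardize(values):
    """Shift a feature column to mean 0 and scale it to unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column has no spread to rescale; only center it.
        return [v - mean for v in values]
    return [(v - mean) / std for v in values]


print(standardize([1.0, 2.0, 3.0]))
```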

normalize_label

Normalizes the label. Label normalization shifts the label to have a mean of zero and scales it to have unit standard deviation.

The auto default value normalizes the label for regression problems but does not for classification problems. If you set the normalize_label hyperparameter to true for classification problems, the algorithm ignores it.

Optional

Valid values: auto, true, or false

Default value: auto

num_calibration_samples

The number of observations from the validation dataset to use for model calibration (when finding the best threshold).

Optional

Valid values: auto or positive integer

Default value: auto

num_models

The number of models to train in parallel. For the default, auto, the algorithm decides the number of parallel models to train. One model is trained with the training parameters you specify (regularization, optimizer, loss), and the remaining models are trained with nearby values of those parameters.

Optional

Valid values: auto or positive integer

Default value: auto

num_point_for_scaler

The number of data points to use for calculating normalization or unbiasing of terms.

Optional

Valid values: Positive integer

Default value: 10,000

optimizer

The optimization algorithm to use.

Optional

Valid values:

  • auto—The default value.

  • sgd—Stochastic gradient descent.

  • adam—Adaptive moment estimation.

  • rmsprop—A gradient-based optimization technique that uses a moving average of squared gradients to normalize the gradient.

Default value: auto. The default setting for auto is adam.

positive_example_weight_mult

The weight assigned to positive examples when training a binary classifier. The weight of negative examples is fixed at 1. If you want the algorithm to choose a weight so that errors in classifying negative vs. positive examples have equal impact on training loss, specify balanced. If you want the algorithm to choose the weight that optimizes performance, specify auto.

Optional

Valid values: balanced, auto, or a positive floating-point value

Default value: 1.0

quantile

The quantile for quantile loss. For quantile q, the model attempts to produce predictions so that the value of true_label is greater than the prediction with probability q.

Optional

Valid values: Floating-point value between 0 and 1

Default value: 0.5

target_precision

The target precision. If binary_classifier_model_selection_criteria is recall_at_target_precision, then precision is held at this value while recall is maximized.

Optional

Valid values: Floating-point value between 0 and 1.0

Default value: 0.8

target_recall

The target recall. If binary_classifier_model_selection_criteria is precision_at_target_recall, then recall is held at this value while precision is maximized.

Optional

Valid values: Floating-point value between 0 and 1.0

Default value: 0.8
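One way to picture the precision_at_target_recall criterion is as a search over score thresholds. The sketch below interprets "held at this value" as "recall at least target_recall", which is an assumption, and the function is a hypothetical illustration rather than the algorithm's calibration code. The recall_at_target_precision criterion would swap the roles of the two metrics.

```python
def precision_at_target_recall(scores, labels, target_recall=0.8):
    """Best precision over thresholds whose recall is at least target_recall."""
    positives = sum(labels)
    if positives == 0:
        return 0.0
    best_precision = 0.0
    for threshold in sorted(set(scores)):
        predicted = [s >= threshold for s in scores]
        true_pos = sum(p and y == 1 for p, y in zip(predicted, labels))
        false_pos = sum(p and y == 0 for p, y in zip(predicted, labels))
        if true_pos / positives >= target_recall:
            best_precision = max(best_precision, true_pos / (true_pos + false_pos))
    return best_precision


scores = [0.9, 0.8, 0.7, 0.6, 0.2]
labels = [1, 1, 0, 1, 0]
print(precision_at_target_recall(scores, labels, target_recall=0.8))
```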

unbias_data

Unbiases the features before training so that the mean is 0. By default, the data is unbiased because the use_bias hyperparameter is set to true.

Optional

Valid values: auto, true, or false

Default value: auto

unbias_label

Unbiases labels before training so that the mean is 0. Applies to regression only if the use_bias hyperparameter is set to true.

Optional

Valid values: auto, true, or false

Default value: auto

use_bias

Specifies whether the model should include a bias term, which is the intercept term in the linear equation.

Optional

Valid values: true or false

Default value: true

use_lr_scheduler

Whether to use a scheduler for the learning rate. If you want to use a scheduler, specify true.

Optional

Valid values: true or false

Default value: true

wd

The weight decay parameter, also known as the L2 regularization parameter. If you don't want to use L2 regularization, set the value to 0.

Optional

Valid values: auto or non-negative floating-point value

Default value: auto