Factorization Machines Hyperparameters - Amazon SageMaker

Factorization Machines Hyperparameters

The following table contains the hyperparameters for the Factorization Machines algorithm. These are parameters that are set by users to facilitate the estimation of model parameters from data. The required hyperparameters that must be set are listed first, in alphabetical order. The optional hyperparameters that can be set are listed next, also in alphabetical order.

Parameter Name Description
feature_dim

The dimension of the input feature space. This could be very high with sparse input.

Required

Valid values: Positive integer. Suggested value range: [10000,10000000]

num_factors

The dimensionality of factorization.

Required

Valid values: Positive integer. Suggested value range: [2,1000]. A value of 64 typically produces good results and is a good starting point.
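
To make the roles of feature_dim and num_factors concrete, the following is a minimal sketch of how a second-order factorization machine scores one example. It assumes the standard formulation (a bias term, linear terms, and pairwise interactions built from factorization terms). The function name fm_predict and the dense list inputs are illustrative assumptions, not part of the SageMaker API.

```python
def fm_predict(x, w0, w, v):
    """Score one example with a second-order factorization machine.

    x  : list of feature_dim floats (dense here for clarity)
    w0 : bias term (a scalar)
    w  : linear terms, one per feature (length feature_dim)
    v  : factorization terms, feature_dim rows of num_factors floats
    """
    linear = sum(w_i * x_i for w_i, x_i in zip(w, x))

    # Pairwise interactions sum_{i<j} <v_i, v_j> x_i x_j, computed per factor
    # with the identity 0.5 * ((sum_i a_i)^2 - sum_i a_i^2), a_i = v[i][f] * x[i].
    num_factors = len(v[0])
    pairwise = 0.0
    for f in range(num_factors):
        terms = [v[i][f] * x[i] for i in range(len(x))]
        total = sum(terms)
        pairwise += 0.5 * (total * total - sum(t * t for t in terms))

    return w0 + linear + pairwise


# Toy example: feature_dim = 2, num_factors = 2
score = fm_predict([1.0, 1.0], 0.5, [1.0, 2.0], [[1.0, 0.0], [1.0, 1.0]])
```

In this sketch, the factorization terms hold feature_dim × num_factors values, so raising num_factors increases model capacity and memory use roughly linearly.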

predictor_type

The type of predictor.

  • binary_classifier: For binary classification tasks.

  • regressor: For regression tasks.

Required

Valid values: String: binary_classifier or regressor
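
As an illustration of the three required settings above, the following sketch collects them in a plain Python dictionary and checks them against the valid values listed in this table. The helper check_required_hyperparameters is hypothetical and is not part of any SageMaker library; the specific values are example choices only.

```python
VALID_PREDICTOR_TYPES = {"binary_classifier", "regressor"}


def check_required_hyperparameters(hp):
    """Return a list of problems found in the required settings (empty if none)."""
    problems = []
    for name in ("feature_dim", "num_factors"):
        value = hp.get(name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            problems.append(f"{name} must be a positive integer, got {value!r}")
    if hp.get("predictor_type") not in VALID_PREDICTOR_TYPES:
        problems.append("predictor_type must be binary_classifier or regressor")
    return problems


# Example values only: required settings plus two optional ones from this table.
hyperparameters = {
    "feature_dim": 50000,
    "num_factors": 64,
    "predictor_type": "binary_classifier",
    "epochs": 10,
    "mini_batch_size": 1000,
}

problems = check_required_hyperparameters(hyperparameters)
```

Checking the dictionary locally before starting a training job is one way to catch a missing or mistyped required value early.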

bias_init_method

The initialization method for the bias term:

  • normal: Initializes the bias term with a random value sampled from a normal distribution with a mean of zero and standard deviation specified by bias_init_sigma.

  • uniform: Initializes the bias term with a random value uniformly sampled from a range specified by [-bias_init_scale, +bias_init_scale].

  • constant: Initializes the bias term to a scalar value specified by bias_init_value.

Optional

Valid values: uniform, normal, or constant

Default value: normal

bias_init_scale

The range for initialization of the bias term. Takes effect if bias_init_method is set to uniform.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: None

bias_init_sigma

The standard deviation for initialization of the bias term. Takes effect if bias_init_method is set to normal.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: 0.01

bias_init_value

The initial value of the bias term. Takes effect if bias_init_method is set to constant.

Optional

Valid values: Float. Suggested value range: [1e-8, 512].

Default value: None
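
The three initialization methods and their companion parameters can be sketched as a single function. This is only an illustration of the behavior described above, written with Python's standard random module. The function name init_value is hypothetical, and raising an error when a needed parameter is missing is an assumption of this sketch rather than documented behavior. The same pattern applies to the factors_init_* and linear_init_* parameters.

```python
import random


def init_value(method="normal", sigma=0.01, scale=None, value=None, rng=random):
    """Draw one initial value according to the chosen initialization method.

    sigma, scale, and value play the roles of bias_init_sigma,
    bias_init_scale, and bias_init_value respectively.
    """
    if method == "normal":
        # Mean of zero, standard deviation sigma.
        return rng.gauss(0.0, sigma)
    if method == "uniform":
        if scale is None:
            raise ValueError("uniform initialization needs a scale")
        # Uniform over [-scale, +scale].
        return rng.uniform(-scale, scale)
    if method == "constant":
        if value is None:
            raise ValueError("constant initialization needs a value")
        return value
    raise ValueError(f"unknown initialization method: {method!r}")


rng = random.Random(0)  # seeded generator for repeatable draws
bias_normal = init_value("normal", sigma=0.01, rng=rng)
bias_uniform = init_value("uniform", scale=0.5, rng=rng)
bias_constant = init_value("constant", value=0.25)
```

Only the parameter that matches the selected method is read, which mirrors the "Takes effect if" wording in the descriptions above.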

bias_lr

The learning rate for the bias term.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: 0.1

bias_wd

The weight decay for the bias term.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: 0.01

clip_gradient

Gradient clipping optimizer parameter. Clips the gradient by projecting onto the interval [-clip_gradient, +clip_gradient].

Optional

Valid values: Float

Default value: None
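
Projecting onto the interval [-clip_gradient, +clip_gradient] can be sketched as an element-wise clamp. This is a minimal illustration that assumes each gradient component is clipped independently and that leaving the parameter unset (None) disables clipping. The function name clip_gradient_values is hypothetical.

```python
def clip_gradient_values(grad, clip_gradient=None):
    """Clamp each gradient component to [-clip_gradient, +clip_gradient]."""
    if clip_gradient is None:
        # Assumption: no clipping is applied when the parameter is not set.
        return list(grad)
    return [max(-clip_gradient, min(clip_gradient, g)) for g in grad]


clipped = clip_gradient_values([-3.0, 0.5, 7.0], clip_gradient=1.0)
```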

epochs

The number of training epochs to run.

Optional

Valid values: Positive integer

Default value: 1

eps

Epsilon parameter to avoid division by 0.

Optional

Valid values: Float. Suggested value: small.

Default value: None

factors_init_method

The initialization method for factorization terms:

  • normal: Initializes weights with random values sampled from a normal distribution with a mean of zero and standard deviation specified by factors_init_sigma.

  • uniform: Initializes weights with random values uniformly sampled from a range specified by [-factors_init_scale, +factors_init_scale].

  • constant: Initializes the weights to a scalar value specified by factors_init_value.

Optional

Valid values: uniform, normal, or constant.

Default value: normal

factors_init_scale

The range for initialization of factorization terms. Takes effect if factors_init_method is set to uniform.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: None

factors_init_sigma

The standard deviation for initialization of factorization terms. Takes effect if factors_init_method is set to normal.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: 0.001

factors_init_value

The initial value of factorization terms. Takes effect if factors_init_method is set to constant.

Optional

Valid values: Float. Suggested value range: [1e-8, 512].

Default value: None

factors_lr

The learning rate for factorization terms.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: 0.0001

factors_wd

The weight decay for factorization terms.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: 0.00001

linear_lr

The learning rate for linear terms.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: 0.001

linear_init_method

The initialization method for linear terms:

  • normal: Initializes weights with random values sampled from a normal distribution with a mean of zero and standard deviation specified by linear_init_sigma.

  • uniform: Initializes weights with random values uniformly sampled from a range specified by [-linear_init_scale, +linear_init_scale].

  • constant: Initializes the weights to a scalar value specified by linear_init_value.

Optional

Valid values: uniform, normal, or constant.

Default value: normal

linear_init_scale

The range for initialization of linear terms. Takes effect if linear_init_method is set to uniform.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: None

linear_init_sigma

The standard deviation for initialization of linear terms. Takes effect if linear_init_method is set to normal.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: 0.01

linear_init_value

The initial value of linear terms. Takes effect if linear_init_method is set to constant.

Optional

Valid values: Float. Suggested value range: [1e-8, 512].

Default value: None

linear_wd

The weight decay for linear terms.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: 0.001

mini_batch_size

The size of mini-batch used for training.

Optional

Valid values: Positive integer

Default value: 1000

rescale_grad

Gradient rescaling optimizer parameter. If set, multiplies the gradient by rescale_grad before updating. Often chosen to be 1.0/batch_size.

Optional

Valid values: Float

Default value: None
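
The effect of rescale_grad can be sketched as a simple multiplication applied to the gradient before the update. The example assumes the gradient has been summed over a mini-batch, so that choosing 1.0/batch_size turns the sum into an average. The function name rescale_gradient is hypothetical, and treating an unset value (None) as "no rescaling" is an assumption of this sketch.

```python
def rescale_gradient(grad, rescale_grad=None):
    """Multiply each gradient component by rescale_grad, if it is set."""
    if rescale_grad is None:
        # Assumption: the gradient is left unchanged when the parameter is not set.
        return list(grad)
    return [g * rescale_grad for g in grad]


mini_batch_size = 1000  # the default mini_batch_size from this table
rescale_grad = 1.0 / mini_batch_size

# A gradient summed over the mini-batch becomes a per-example average.
averaged = rescale_gradient([1000.0, -500.0], rescale_grad)
```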