Amazon SageMaker
Developer Guide

Factorization Machines Hyperparameters

Parameter Name Description
feature_dim

The dimension of the input feature space. This could be very high with sparse input.

Required

Valid values: Positive integer. Suggested value range: [10000,10000000]

num_factors

The dimensionality of factorization.

Required

Valid values: Positive integer. Suggested value range: [2,1000]. A value of 64 is usually optimal.
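
To make the roles of feature_dim and num_factors concrete, the following is a minimal sketch of the standard second-order factorization machine score, written in plain Python. The names fm_predict, w0, w, and v are hypothetical and chosen for illustration only; this is not the SageMaker implementation. In the sketch, w holds feature_dim linear terms and v holds feature_dim rows of num_factors factorization terms each.

```python
def fm_predict(x, w0, w, v):
    """Second-order factorization machine score for one dense input vector.

    x  : list of feature_dim floats (input features)
    w0 : float (bias term)
    w  : list of feature_dim floats (linear terms)
    v  : feature_dim rows, each a list of num_factors floats
         (factorization terms)
    """
    feature_dim = len(x)
    num_factors = len(v[0])
    linear = sum(w_i * x_i for w_i, x_i in zip(w, x))
    pairwise = 0.0
    for f in range(num_factors):
        # Uses the identity
        #   sum_{i<j} v_if * v_jf * x_i * x_j
        #     = 0.5 * ((sum_i v_if * x_i)^2 - sum_i (v_if * x_i)^2)
        s = sum(v[i][f] * x[i] for i in range(feature_dim))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(feature_dim))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Toy example with feature_dim = 3 and num_factors = 2.
x = [1.0, 0.0, 2.0]
w0 = 0.5
w = [0.1, 0.2, 0.3]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
score = fm_predict(x, w0, w, v)
```

The number of factorization terms grows as feature_dim times num_factors, which is why a larger num_factors increases model size and training cost.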

predictor_type

The type of predictor.

  • binary_classifier: For binary classification tasks.

  • regressor: For regression tasks.

Required

Valid values: String: binary_classifier or regressor
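
As an illustration, the three required hyperparameters can be collected in a plain dictionary and checked against the valid values listed above. The helper check_required and the example values are hypothetical; how the dictionary is passed to a training job depends on the SDK or API in use.

```python
REQUIRED = ("feature_dim", "num_factors", "predictor_type")
PREDICTOR_TYPES = ("binary_classifier", "regressor")


def check_required(hp):
    """Raise ValueError if a required hyperparameter is missing or invalid."""
    for name in REQUIRED:
        if name not in hp:
            raise ValueError(f"missing required hyperparameter: {name}")
    for name in ("feature_dim", "num_factors"):
        value = hp[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer")
    if hp["predictor_type"] not in PREDICTOR_TYPES:
        raise ValueError("predictor_type must be binary_classifier or regressor")
    return hp


# Example values only; choose them to match your data set.
hyperparameters = check_required({
    "feature_dim": 50000,
    "num_factors": 64,
    "predictor_type": "binary_classifier",
})
```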

bias_init_method

The initialization method for the bias term:

  • normal: Initializes weights with random values sampled from a normal distribution with a mean of zero and standard deviation specified by bias_init_sigma.

  • uniform: Initializes weights with random values uniformly sampled from a range specified by [-bias_init_scale, +bias_init_scale].

  • constant: Initializes the weights to a scalar value specified by bias_init_value.

Optional

Valid values: uniform, normal, or constant

Default value: normal
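
The three initialization methods can be sketched in a few lines of standard-library Python. The function init_weights and its argument names are hypothetical; they stand in for the sigma, scale, and value hyperparameters of whichever term (bias, linear, or factorization) is being initialized, and the sketch is not the algorithm's actual code.

```python
import random


def init_weights(n, method="normal", sigma=0.01, scale=None, value=None, seed=0):
    """Return n initial weights using the named initialization method."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    if method == "normal":
        # Mean zero, standard deviation sigma.
        return [rng.gauss(0.0, sigma) for _ in range(n)]
    if method == "uniform":
        if scale is None:
            raise ValueError("uniform initialization needs a scale")
        # Sampled uniformly from [-scale, +scale].
        return [rng.uniform(-scale, scale) for _ in range(n)]
    if method == "constant":
        if value is None:
            raise ValueError("constant initialization needs a value")
        return [value] * n
    raise ValueError("method must be uniform, normal, or constant")


normal_weights = init_weights(4)
uniform_weights = init_weights(4, method="uniform", scale=0.1)
constant_weights = init_weights(4, method="constant", value=0.5)
```

Only the hyperparameter that matches the selected method is used; the other two are ignored.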

bias_init_scale

The range for initialization of the bias term. Takes effect if bias_init_method is set to uniform.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: None

bias_init_sigma

The standard deviation for initialization of the bias term. Takes effect if bias_init_method is set to normal.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: 0.01

bias_init_value

The initial value of the bias term. Takes effect if bias_init_method is set to constant.

Optional

Valid values: Float. Suggested value range: [1e-8, 512].

Default value: None

bias_lr

The learning rate for the bias term.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: 0.1

bias_wd

The weight decay for the bias term.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: 0.01
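
The sketch below shows how a learning rate and a weight decay commonly enter a plain stochastic gradient descent update, where the weight decay adds wd * weight to the gradient. This is an assumption made for illustration: the guide does not state which update rule the algorithm applies, and the function name sgd_step is hypothetical. The default arguments reuse the bias_lr and bias_wd defaults listed above.

```python
def sgd_step(weight, grad, lr=0.1, wd=0.01):
    """One plain SGD update with L2-style weight decay (assumed update rule)."""
    return weight - lr * (grad + wd * weight)


# With weight = 1.0 and grad = 0.5 the decayed gradient is 0.5 + 0.01 * 1.0.
new_bias = sgd_step(weight=1.0, grad=0.5)
```

The same pattern would apply to the linear and factorization terms with their own learning rate and weight decay values.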

clip_gradient

Gradient clipping optimizer parameter. Clips the gradient by projecting onto the interval [-clip_gradient, +clip_gradient].

Optional

Valid values: Float

Default value: None
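
Projection onto the interval [-clip_gradient, +clip_gradient] amounts to an element-wise clamp. The helper below is a minimal, hypothetical sketch of that operation; it treats a value of None as "no clipping", which is an assumption about how the default behaves.

```python
def clip(grad, clip_gradient=None):
    """Clamp each gradient component to [-clip_gradient, +clip_gradient]."""
    if clip_gradient is None:
        # Assumed meaning of the default: leave the gradient unchanged.
        return list(grad)
    return [max(-clip_gradient, min(clip_gradient, g)) for g in grad]


clipped = clip([-3.0, 0.2, 7.5], clip_gradient=1.0)
# clipped == [-1.0, 0.2, 1.0]
```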

epochs

The number of training epochs to run.

Optional

Valid values: Positive integer

Default value: 1

eps

Epsilon parameter to avoid division by 0.

Optional

Valid values: Float. Suggested value: a small positive number.

Default value: None

factors_init_method

The initialization method for factorization terms:

  • normal: Initializes weights with random values sampled from a normal distribution with a mean of zero and standard deviation specified by factors_init_sigma.

  • uniform: Initializes weights with random values uniformly sampled from a range specified by [-factors_init_scale, +factors_init_scale].

  • constant: Initializes the weights to a scalar value specified by factors_init_value.

Optional

Valid values: uniform, normal, or constant.

Default value: normal

factors_init_scale

The range for initialization of factorization terms. Takes effect if factors_init_method is set to uniform.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: None

factors_init_sigma

The standard deviation for initialization of factorization terms. Takes effect if factors_init_method is set to normal.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: 0.001

factors_init_value

The initial value of factorization terms. Takes effect if factors_init_method is set to constant.

Optional

Valid values: Float. Suggested value range: [1e-8, 512].

Default value: None

factors_lr

The learning rate for factorization terms.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: 0.0001

factors_wd

The weight decay for factorization terms.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: 0.00001

linear_lr

The learning rate for linear terms.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: 0.001

linear_init_method

The initialization method for linear terms:

  • normal: Initializes weights with random values sampled from a normal distribution with a mean of zero and standard deviation specified by linear_init_sigma.

  • uniform: Initializes weights with random values uniformly sampled from a range specified by [-linear_init_scale, +linear_init_scale].

  • constant: Initializes the weights to a scalar value specified by linear_init_value.

Optional

Valid values: uniform, normal, or constant.

Default value: normal

linear_init_scale

The range for initialization of linear terms. Takes effect if linear_init_method is set to uniform.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: None

linear_init_sigma

The standard deviation for initialization of linear terms. Takes effect if linear_init_method is set to normal.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: 0.01

linear_init_value

The initial value of linear terms. Takes effect if linear_init_method is set to constant.

Optional

Valid values: Float. Suggested value range: [1e-8, 512].

Default value: None

linear_wd

The weight decay for linear terms.

Optional

Valid values: Non-negative float. Suggested value range: [1e-8, 512].

Default value: 0.001

mini_batch_size

The size of the mini-batch used for training.

Optional

Valid values: Positive integer

Default value: 1000

rescale_grad

Gradient rescaling optimizer parameter. If set, multiplies the gradient by rescale_grad before updating. Often chosen to be 1.0/batch_size.

Optional

Valid values: Float

Default value: None
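
The effect of choosing rescale_grad as 1.0/batch_size can be illustrated with a short sketch: a gradient summed over a mini-batch becomes a per-example average. The function rescale and the sample numbers are hypothetical, and treating None as "no rescaling" is an assumption about the default.

```python
def rescale(grad, rescale_grad=None):
    """Multiply each gradient component by rescale_grad, if it is set."""
    if rescale_grad is None:
        # Assumed meaning of the default: leave the gradient unchanged.
        return list(grad)
    return [g * rescale_grad for g in grad]


batch_size = 1000                  # matches the mini_batch_size default
summed_grad = [250.0, -1000.0]     # gradient summed over the mini-batch
avg_grad = rescale(summed_grad, rescale_grad=1.0 / batch_size)
```

Averaging in this way keeps the effective step size roughly independent of the mini-batch size.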