# K-Means Hyperparameters

In the `CreateTrainingJob` request, you specify the training algorithm that you want to use. You can also specify algorithm-specific hyperparameters as a string-to-string map. The following table lists the hyperparameters for the k-means training algorithm provided by Amazon SageMaker. For more information about how k-means clustering works, see How K-Means Clustering Works.
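
Because every hyperparameter travels as a string, numeric and list values have to be serialized before they go into the request. The following is a minimal sketch of one way to build such a map in Python; the helper name `to_hyperparameter_map` and the values shown are illustrative assumptions, not part of the SageMaker API. The resulting dictionary is the kind of object that would be supplied as the `HyperParameters` field of a `CreateTrainingJob` request.

```python
import json


def to_hyperparameter_map(settings):
    """Convert Python values into a string-to-string map.

    Hypothetical helper for illustration: strings pass through unchanged,
    and every other value is JSON-encoded.
    """
    result = {}
    for name, value in settings.items():
        if isinstance(value, str):
            result[name] = value
        else:
            # json.dumps renders 10 as "10" and a list as a JSON list string.
            result[name] = json.dumps(value)
    return result


# Placeholder values chosen for illustration only.
hyperparameters = to_hyperparameter_map({
    "feature_dim": 784,
    "k": 10,
    "epochs": 1,
    "eval_metrics": ["msd"],
    "local_lloyd_tol": 0.0001,
    "mini_batch_size": 5000,
})

for name, value in hyperparameters.items():
    print(f"{name} = {value!r}")
```

JSON-encoding the non-string values keeps a list-valued parameter such as `eval_metrics` in the JSON list form that its description calls for, while plain integers and floats still come out as ordinary numeric strings.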

| Parameter Name | Description |
|---|---|
| `feature_dim` | The number of features in the input data. Valid values: positive integer. |
| `k` | The number of required clusters. Valid values: positive integer. |
| `epochs` | The number of passes over the training data. Valid values: positive integer. Default value: 1. |
| `eval_metrics` | A JSON list of metric types used to report a score for the model. Allowed values are `msd` for mean squared deviation and `ssd` for sum of squared distances. Valid values: either `["msd"]`, `["ssd"]`, or `["msd","ssd"]`. Default value: `["msd"]`. |
| `extra_center_factor` | The algorithm creates K centers = `k` × `extra_center_factor` as it runs, then reduces the number of centers from K to `k` when finalizing the model. Valid values: either a positive integer or `auto`. Default value: `auto`. |
| `half_life_time_size` | Used to determine the weight given to an observation when computing a cluster mean. This weight decays exponentially as more points are observed. When a point is first observed, it is assigned a weight of 1 when computing the cluster mean. The decay constant for the exponential decay function is chosen so that after observing `half_life_time_size` points, its weight is 1/2. If set to 0, there is no decay. Valid values: non-negative integer. Default value: 0. |
| `init_method` | The method by which the algorithm chooses the initial cluster centers. The standard k-means approach chooses them at random. The alternative k-means++ method chooses the first cluster center at random, then spreads out the remaining initial centers by selecting each one with a probability proportional to the square of the distance from each remaining data point to its nearest existing center. Valid values: either `random` or `kmeans++`. Default value: `random`. |
| `local_lloyd_init_method` | The initialization method for Lloyd's expectation-maximization (EM) procedure used to build the final model containing `k` centers. Valid values: either `random` or `kmeans++`. Default value: `kmeans++`. |
| `local_lloyd_max_iter` | The maximum number of iterations for Lloyd's expectation-maximization (EM) procedure used to build the final model containing `k` centers. Valid values: positive integer. Default value: 300. |
| `local_lloyd_num_trials` | The number of times Lloyd's expectation-maximization (EM) procedure is run when building the final model containing `k` centers; the run with the least loss is kept. Valid values: either a positive integer or `auto`. Default value: `auto`. |
| `local_lloyd_tol` | The tolerance for change in loss for early stopping of Lloyd's expectation-maximization (EM) procedure used to build the final model containing `k` centers. Valid values: float in the range [0, 1]. Default value: 0.0001. |
| `mini_batch_size` | The number of observations per mini-batch for the data iterator. Valid values: positive integer. Default value: 5000. |
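
To make the `init_method` description more concrete, the following is a minimal, standard-library sketch of k-means++ style seeding as the table describes it: the first center is drawn uniformly at random, and each later center is drawn with probability proportional to the squared distance from a point to its nearest already-chosen center. The function names and the `seed` parameter are hypothetical, and this is a textbook illustration under those assumptions, not the implementation that Amazon SageMaker uses.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_plus_plus_init(points, k, seed=None):
    """Pick k initial centers from points using k-means++ style seeding."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)

    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    # Squared distance from each point to its nearest chosen center.
    nearest = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(nearest) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            new_center = rng.choice(points)
        else:
            # Far-away points get proportionally higher selection weight.
            new_center = rng.choices(points, weights=nearest, k=1)[0]
        centers.append(new_center)
        nearest = [
            min(d, squared_distance(p, new_center))
            for p, d in zip(points, nearest)
        ]
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = kmeans_plus_plus_init(points, k=3, seed=0)
print(centers)
```

A point that has already been chosen has a squared distance of zero to its nearest center, so it carries zero weight and is not drawn again while any other point remains at a positive distance. This is what spreads the initial centers out compared with purely random selection.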