CatBoost hyperparameters

The following table contains the subset of hyperparameters that are required or most commonly used for the Amazon SageMaker CatBoost algorithm. Users set these parameters to facilitate the estimation of model parameters from data. The SageMaker CatBoost algorithm is an implementation of the open-source CatBoost package.

Note

The default hyperparameters are based on example datasets in the CatBoost sample notebooks.

By default, the SageMaker CatBoost algorithm automatically chooses an evaluation metric and loss function based on the type of classification problem. The CatBoost algorithm detects the type of classification problem based on the number of labels in your data. For regression problems, the evaluation metric and loss functions are both root mean squared error. For binary classification problems, the evaluation metric is Area Under the Curve (AUC) and the loss function is log loss. For multiclass classification problems, the evaluation metric and loss functions are multiclass cross entropy. You can use the eval_metric hyperparameter to change the default evaluation metric. Refer to the following table for more information on LightGBM hyperparameters, including descriptions, valid values, and default values.

Parameter Name	Description
`iterations`	The maximum number of trees that can be built. Valid values: integer, range: Positive integer. Default value: `500`.
`early_stopping_rounds`	The training will stop if one metric of one validation data point does not improve in the last `early_stopping_rounds` round. If `early_stopping_rounds` is less than or equal to zero, this hyperparameter is ignored. Valid values: integer. Default value: `5`.
`eval_metric`	The evaluation metric for validation data. If `eval_metric` is set to the default `"auto"` value, then the algorithm automatically chooses an evaluation metric based on the type of classification problem: `"RMSE"` for regression `"AUC"` for binary classification `"MultiClass"` for multi-class classification Valid values: string, refer to the CatBoost documentation for valid values. Default value: `"auto"`.
`learning_rate`	The rate at which the model weights are updated after working through each batch of training examples. Valid values: float, range: (`0.0`, `1.0`). Default value: `0.009`.
`depth`	Depth of the tree. Valid values: integer, range: (`1`, `16`). Default value: `6`.
`l2_leaf_reg`	Coefficient for the L2 regularization term of the cost function. Valid values: integer, range: Positive integer. Default value: `3`.
`random_strength`	The amount of randomness to use for scoring splits when the tree structure is selected. Use this parameter to avoid overfitting the model. Valid values: float, range: Positive floating point number. Default value: `1.0`.
`max_leaves`	The maximum number of leaves in the resulting tree. Can only be used with the `"Lossguide"` growing policy. Valid values: integer, range: [`2`, `64`]. Default value: `31`.
`rsm`	Random subspace method. The percentage of features to use at each split selection, when features are selected over again at random. Valid values: float, range: (`0.0`, `1.0`]. Default value: `1.0`.
`sampling_frequency`	Frequency to sample weights and objects when building trees. Valid values: string, either: (`"PerTreeLevel"` or `"PerTree"`). Default value: `"PerTreeLevel"`.
`min_data_in_leaf`	The minimum number of training samples in a leaf. CatBoost does not search for new splits in leaves with a sample count less than the specified value. Can only be used with the `"Lossguide"` and `"Depthwise"` growing policies. Valid values: integer, range: (`1` or `∞`). Default value: `1`.
`bagging_temperature`	Defines the settings of the Bayesian bootstrap. Use the Bayesian bootstrap to assign random weights to objects. If `bagging_temperature` is set to `1.0`, then the weights are sampled from an exponential distribution. If `bagging_temperature` is set to `0.0`, then all weights are 1.0. Valid values: float, range: Non-negative float. Default value: `1.0`.
`boosting_type`	The boosting scheme. "Auto" means that the `boosting_type` is selected based on processing unit type, the number of objects in the training dataset, and the selected learning mode. Valid values: string, any of the following: (`"Auto"`, `"Ordered"`, `"Plain"`). Default value: `"Auto"`.
`scale_pos_weight`	The weight for positive class in binary classification. The value is used as a multiplier for the weights of objects from positive class. Valid values: float, range: Positive float. Default value: `1.0`.
`max_bin`	The number of splits for numerical features. `"Auto"` means that `max_bin` is selected based on the processing unit type and other parameters. For details, see the CatBoost documentation. Valid values: string, either: (`"Auto"` or string of integer from `"1"` to `"65535"` inclusively). Default value: `"Auto"`.
`grow_policy`	The tree growing policy. Defines how to perform greedy tree construction. Valid values: string, any of the following: (`"SymmetricTree"`, `"Depthwise"`, or `"Lossguide"`). Default value: `"SymmetricTree"`.
`random_seed`	The random seed used for training. Valid values: integer, range: Non-negative integer. Default value: `1.0`.
`thread_count`	The number of threads to use during the training. If `thread_count` is `-1`, then the number of threads is equal to the number of processor cores. `thread_count` cannot be `0`. Valid values: integer, either: (`-1` or positive integer). Default value: `-1`.
`verbose`	The verbosity of print messages, with higher levels corresponding to more detailed print statements. Valid values: integer, range: Positive integer. Default value: `1`.

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

How It Works

Model Tuning