IP Insights Hyperparameters - Amazon SageMaker

IP Insights Hyperparameters

In the CreateTransformJob request, you specify the training algorithm. You can also specify algorithm-specific hyperparameters as string-to-string maps. The following table lists the hyperparameters for the Amazon SageMaker IP Insights algorithm.

Parameter Name Description
num_entity_vectors

The number of entity vector representations (entity embedding vectors) to train. Each entity in the training set is randomly assigned to one of these vectors using a hash function. Because of hash collisions, it might be possible to have multiple entities assigned to the same vector. This would cause the same vector to represent multiple entities. This generally has a negligible effect on model performance, as long as the collision rate is not too severe. To keep the collision rate low, set this value as high as possible. However, the model size, and, therefore, the memory requirement, for both training and inference, scales linearly with this hyperparameter. We recommend that you set this value to twice the number of unique entity identifiers.

Required

Valid values: 1 ≤ positive integer ≤ 250,000,000

vector_dim

The size of embedding vectors to represent entities and IP addresses. The larger the value, the more information that can be encoded using these representations. In practice, model size scales linearly with this parameter and limits how large the dimension can be. In addition, using vector representations that are too large can cause the model to overfit, especially for small training datasets. Overfitting occurs when a model doesn't learn any pattern in the data but effectively memorizes the training data and, therefore, cannot generalize well and performs poorly during inference. The recommended value is 128.

Required

Valid values: 4 ≤ positive integer ≤ 4096

batch_metrics_publish_interval

The interval (every X batches) at which the Apache MXNet Speedometer function prints the training speed of the network (samples/second).

Optional

Valid values: positive integer ≥ 1

Default value: 1,000

epochs

The number of passes over the training data. The optimal value depends on your data size and learning rate. Typical values range from 5 to 100.

Optional

Valid values: positive integer ≥ 1

Default value: 10

learning_rate

The learning rate for the optimizer. IP Insights use a gradient-descent-based Adam optimizer. The learning rate effectively controls the step size to update model parameters at each iteration. Too large a learning rate can cause the model to diverge because the training is likely to overshoot a minima. On the other hand, too small a learning rate slows down convergence. Typical values range from 1e-4 to 1e-1.

Optional

Valid values: 1e-6 ≤ float ≤ 10.0

Default value: 0.001

mini_batch_size

The number of examples in each mini batch. The training procedure processes data in mini batches. The optimal value depends on the number of unique account identifiers in the dataset. In general, the larger the mini_batch_size, the faster the training and the greater the number of possible shuffled-negative-sample combinations. However, with a large mini_batch_size, the training is more likely to converge to a poor local minimum and perform relatively worse for inference.

Optional

Valid values: 1 ≤ positive integer ≤ 500000

Default value: 10,000

num_ip_encoder_layers

The number of fully connected layers used to encode the IP address embedding. The larger the number of layers, the greater the model's capacity to capture patterns among IP addresses. However, using a large number of layers increases the chance of overfitting.

Optional

Valid values: 0 ≤ positive integer ≤ 100

Default value: 1

random_negative_sampling_rate

The number of random negative samples, R, to generate per input example. The training procedure relies on negative samples to prevent the vector representations of the model collapsing to a single point. Random negative sampling generates R random IP addresses for each input account in the mini batch. The sum of the random_negative_sampling_rate (R) and shuffled_negative_sampling_rate (S) must be in the interval: 1 ≤ R + S ≤ 500.

Optional

Valid values: 0 ≤ positive integer ≤ 500

Default value: 1

shuffled_negative_sampling_rate

The number of shuffled negative samples, S, to generate per input example. In some cases, it helps to use more realistic negative samples that are randomly picked from the training data itself. This kind of negative sampling is achieved by shuffling the data within a mini batch. Shuffled negative sampling generates S negative IP addresses by shuffling the IP address and account pairings within a mini batch. The sum of the random_negative_sampling_rate (R) and shuffled_negative_sampling_rate (S) must be in the interval: 1 ≤ R + S ≤ 500.

Optional

Valid values: 0 ≤ positive integer ≤ 500

Default value: 1

weight_decay

The weight decay coefficient. This parameter adds an L2 regularization factor that is required to prevent the model from overfitting the training data.

Optional

Valid values: 0.0 ≤ float ≤ 10.0

Default value: 0.00001