IP Insights Hyperparameters

In the CreateTrainingJob request, you specify the training algorithm. You can also specify algorithm-specific hyperparameters as string-to-string maps. The following table lists the hyperparameters for the Amazon SageMaker IP Insights algorithm.
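
Because hyperparameters are passed as a string-to-string map, numeric values have to be converted to strings before they go into the request. The following is a minimal sketch of that conversion. The helper `to_string_map` is hypothetical (it is not part of any SDK), and the values are illustrative only.

```python
def to_string_map(hyperparameters):
    """Convert every hyperparameter value to a string.

    Hypothetical helper for illustration; it is not an SDK function.
    """
    return {name: str(value) for name, value in hyperparameters.items()}


# Illustrative values only; the parameter names come from the table below.
hyperparameters = to_string_map({
    "num_entity_vectors": 20000,
    "vector_dim": 128,
    "epochs": 10,
    "learning_rate": 0.001,
})
print(hyperparameters)
```

The resulting dictionary has string keys and string values, which is the shape a string-to-string map requires.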

Parameter Name	Description
`num_entity_vectors`	The number of entity vector representations (entity embedding vectors) to train. Each entity in the training set is randomly assigned to one of these vectors using a hash function. Because of hash collisions, multiple entities might be assigned to the same vector, in which case one vector represents several entities. This generally has a negligible effect on model performance, as long as the collision rate is not too severe. To keep the collision rate low, set this value as high as possible. However, the model size, and therefore the memory requirement for both training and inference, scales linearly with this hyperparameter. We recommend that you set this value to twice the number of unique entity identifiers. Required. Valid values: 1 ≤ positive integer ≤ 250,000,000.
`vector_dim`	The size of the embedding vectors that represent entities and IP addresses. The larger the value, the more information can be encoded in these representations. In practice, model size scales linearly with this parameter, which limits how large the dimension can be. In addition, vector representations that are too large can cause the model to overfit, especially on small training datasets. Overfitting occurs when a model memorizes the training data instead of learning general patterns, so it generalizes poorly and performs poorly during inference. The recommended value is 128. Required. Valid values: 4 ≤ positive integer ≤ 4096.
`batch_metrics_publish_interval`	The interval (every X batches) at which the Apache MXNet Speedometer function prints the training speed of the network (samples/second). Optional. Valid values: positive integer ≥ 1. Default value: 1,000.
`epochs`	The number of passes over the training data. The optimal value depends on your data size and learning rate. Typical values range from 5 to 100. Optional. Valid values: positive integer ≥ 1. Default value: 10.
`learning_rate`	The learning rate for the optimizer. IP Insights uses a gradient-descent-based Adam optimizer. The learning rate effectively controls the step size used to update model parameters at each iteration. Too large a learning rate can cause the model to diverge because training is likely to overshoot a minimum. On the other hand, too small a learning rate slows down convergence. Typical values range from 1e-4 to 1e-1. Optional. Valid values: 1e-6 ≤ float ≤ 10.0. Default value: 0.001.
`mini_batch_size`	The number of examples in each mini batch. The training procedure processes data in mini batches. The optimal value depends on the number of unique account identifiers in the dataset. In general, the larger the `mini_batch_size`, the faster the training and the greater the number of possible shuffled-negative-sample combinations. However, with a large `mini_batch_size`, training is more likely to converge to a poor local minimum and to perform relatively worse at inference. Optional. Valid values: 1 ≤ positive integer ≤ 500,000. Default value: 10,000.
`num_ip_encoder_layers`	The number of fully connected layers used to encode the IP address embedding. The larger the number of layers, the greater the model's capacity to capture patterns among IP addresses. However, using a large number of layers increases the chance of overfitting. Optional. Valid values: 0 ≤ integer ≤ 100. Default value: 1.
`random_negative_sampling_rate`	The number of random negative samples, R, to generate per input example. The training procedure relies on negative samples to prevent the vector representations of the model from collapsing to a single point. Random negative sampling generates R random IP addresses for each input account in the mini batch. The sum of the `random_negative_sampling_rate` (R) and `shuffled_negative_sampling_rate` (S) must be in the interval: 1 ≤ R + S ≤ 500. Optional. Valid values: 0 ≤ integer ≤ 500. Default value: 1.
`shuffled_negative_sampling_rate`	The number of shuffled negative samples, S, to generate per input example. In some cases, it helps to use more realistic negative samples that are randomly picked from the training data itself. Shuffled negative sampling generates S negative IP addresses by shuffling the IP address and account pairings within a mini batch. The sum of the `random_negative_sampling_rate` (R) and `shuffled_negative_sampling_rate` (S) must be in the interval: 1 ≤ R + S ≤ 500. Optional. Valid values: 0 ≤ integer ≤ 500. Default value: 1.
`weight_decay`	The weight decay coefficient. This parameter adds an L2 regularization factor that helps prevent the model from overfitting the training data. Optional. Valid values: 0.0 ≤ float ≤ 10.0. Default value: 0.00001.
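
The valid ranges and the joint constraint on R and S can be checked before a training job is submitted. The following is a minimal sketch under stated assumptions: `RANGES`, `validate`, and `recommended_num_entity_vectors` are hypothetical names introduced here for illustration, the bounds are transcribed from the table above, and capping the recommendation at the table's maximum is an assumption rather than documented behavior.

```python
# Ranges transcribed from the table above: (minimum, maximum, type).
# A maximum of None means the table gives no upper bound.
RANGES = {
    "num_entity_vectors": (1, 250_000_000, int),
    "vector_dim": (4, 4096, int),
    "batch_metrics_publish_interval": (1, None, int),
    "epochs": (1, None, int),
    "learning_rate": (1e-6, 10.0, float),
    "mini_batch_size": (1, 500_000, int),
    "num_ip_encoder_layers": (0, 100, int),
    "random_negative_sampling_rate": (0, 500, int),
    "shuffled_negative_sampling_rate": (0, 500, int),
    "weight_decay": (0.0, 10.0, float),
}
REQUIRED = ("num_entity_vectors", "vector_dim")
DEFAULT_SAMPLING_RATE = 1  # default for both R and S, per the table


def validate(hyperparameters):
    """Return a list of problems found in a string-to-string hyperparameter map."""
    problems = []
    for name in REQUIRED:
        if name not in hyperparameters:
            problems.append(f"{name} is required")
    parsed = {}
    for name, text in hyperparameters.items():
        if name not in RANGES:
            problems.append(f"{name} is not in the table")
            continue
        low, high, kind = RANGES[name]
        try:
            value = kind(text)
        except (TypeError, ValueError):
            problems.append(f"{name}={text!r} is not a valid {kind.__name__}")
            continue
        if value < low or (high is not None and value > high):
            problems.append(f"{name}={text} is outside the valid range")
        parsed[name] = value
    # Joint constraint: 1 <= R + S <= 500, using the defaults for missing values.
    r = parsed.get("random_negative_sampling_rate", DEFAULT_SAMPLING_RATE)
    s = parsed.get("shuffled_negative_sampling_rate", DEFAULT_SAMPLING_RATE)
    if not 1 <= r + s <= 500:
        problems.append(f"R + S = {r + s} must satisfy 1 <= R + S <= 500")
    return problems


def recommended_num_entity_vectors(unique_entities):
    """Twice the number of unique entity identifiers, clamped to the valid range.

    The clamp is an assumption made for this sketch.
    """
    return max(1, min(2 * unique_entities, 250_000_000))


# Both sampling rates set to 0 pass their individual ranges but violate R + S >= 1.
print(validate({
    "num_entity_vectors": str(recommended_num_entity_vectors(10_000)),
    "vector_dim": "128",
    "random_negative_sampling_rate": "0",
    "shuffled_negative_sampling_rate": "0",
}))
```

Checking the sum separately matters because each sampling rate may be 0 on its own, so a per-parameter range check would not catch the case where both are 0.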
