A/B Testing - Machine Learning Lens


A/B Testing

A/B testing is a technique that you can use to compare the performance of different versions of the same feature while monitoring a high-level metric, such as click-through rate or conversion rate. In this context, it means running inference with different models for different users, and then analyzing the results. The models are built with the same algorithm (a built-in Amazon SageMaker algorithm or your own custom algorithm), but trained with two different hyperparameter settings.

A/B testing is similar to canary testing, but uses larger user groups and a longer time scale, typically days or even weeks. For this type of testing, the Amazon SageMaker endpoint configuration uses two production variants: one for model A and one for model B. To begin, configure both variants to balance traffic equally (50/50), and make sure that both models have identical instance configurations. After you have monitored the performance of both models with the initial equal weights, you can either gradually shift the traffic weights toward one model (60/40, 80/20, and so on) or change the weights in a single step, continuing until a single model is processing all of the live traffic.
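SageMaker routes requests to each production variant in proportion to its weight divided by the sum of all variant weights, so equal weights give a 50/50 split and weights of 4 and 1 give 80/20. The following sketch illustrates that proportionality with a small helper function; the endpoint name in the commented-out UpdateEndpointWeightsAndCapacities call is a hypothetical placeholder, not a value from this paper.

```python
# Sketch: how SageMaker splits traffic across production variants.
# Each variant receives weight / sum(all weights) of the live traffic.

def traffic_split(weights):
    """Return each variant's share of traffic as a fraction of 1.0."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Equal initial weights: a 50/50 split between the two variants.
print(traffic_split({'Model-A': 1, 'Model-B': 1}))

# To shift live traffic (for example, to 80/20) without redeploying,
# you would call the UpdateEndpointWeightsAndCapacities API.
# The endpoint name below is hypothetical:
#
# import boto3
# sm = boto3.client('sagemaker')
# sm.update_endpoint_weights_and_capacities(
#     EndpointName='my-ab-endpoint',
#     DesiredWeightsAndCapacities=[
#         {'VariantName': 'Model-A', 'DesiredWeight': 4},
#         {'VariantName': 'Model-B', 'DesiredWeight': 1},
#     ],
# )
print(traffic_split({'Model-A': 4, 'Model-B': 1}))
```

Because routing depends only on the ratio of the weights, you can rebalance traffic at any time without changing instance counts or redeploying the models.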

The following is a sample production variant configuration for A/B testing.

ProductionVariants=[
    {
        'InstanceType': 'ml.m4.xlarge',
        'InitialInstanceCount': 1,
        'ModelName': 'model_name_a',
        'VariantName': 'Model-A',
        'InitialVariantWeight': 1
    },
    {
        'InstanceType': 'ml.m4.xlarge',
        'InitialInstanceCount': 1,
        'ModelName': 'model_name_b',
        'VariantName': 'Model-B',
        'InitialVariantWeight': 1
    }
]
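The fragment above is the ProductionVariants argument of the CreateEndpointConfig API. The sketch below shows it in context using a small helper to build each variant; the endpoint-config name in the commented-out boto3 call is a hypothetical placeholder, not a value from this paper.

```python
# Sketch: building the ProductionVariants list for an A/B endpoint config.
# Model names match the sample configuration; other names are placeholders.

def make_variant(model_name, variant_name, weight=1,
                 instance_type='ml.m4.xlarge', count=1):
    """Build one production-variant definition for CreateEndpointConfig."""
    return {
        'InstanceType': instance_type,
        'InitialInstanceCount': count,
        'ModelName': model_name,
        'VariantName': variant_name,
        'InitialVariantWeight': weight,
    }

# Equal weights (1 and 1) start the test with a 50/50 traffic split.
production_variants = [
    make_variant('model_name_a', 'Model-A'),
    make_variant('model_name_b', 'Model-B'),
]

# The list is then passed to the CreateEndpointConfig API, e.g. with boto3:
#
# import boto3
# sm = boto3.client('sagemaker')
# sm.create_endpoint_config(
#     EndpointConfigName='ab-test-config',   # hypothetical name
#     ProductionVariants=production_variants,
# )
print(production_variants)
```

Keeping the instance type and count identical across both variants, as shown, ensures that any difference in the monitored metric comes from the models themselves rather than from the serving infrastructure.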