Factorization Machines Algorithm
A factorization machine is a general-purpose supervised learning algorithm that you can use for both classification and regression tasks. It is an extension of a linear model that is designed to economically capture interactions between features in high-dimensional sparse datasets. For example, in a click prediction system, the factorization machine model can capture click rate patterns observed when ads from a certain ad-category are placed on pages from a certain page-category. Factorization machines are a good choice for tasks dealing with high-dimensional sparse datasets, such as click prediction and item recommendation.
The Amazon SageMaker implementation of factorization machines considers only pairwise (2nd order) interactions between features.
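To make "pairwise (2nd order) interactions" concrete, the following is a minimal sketch of the standard second-order factorization machine scoring function from the literature. It is not the SageMaker implementation. The names fm_score, w0, w, and v are illustrative assumptions: w0 is a global bias, w holds one linear weight per feature, and v holds one low-dimensional factor vector per feature. A sparse input row is passed as a dict that maps feature index to value.

```python
def fm_score(x, w0, w, v):
    """Score one sparse example with a second-order factorization machine.

    x  : dict mapping feature index -> nonzero feature value (sparse row)
    w0 : global bias (float)
    w  : list of linear weights, one per feature
    v  : list of factor vectors, one per feature (each of length k)
    """
    # Linear part: bias plus weighted sum over the nonzero features.
    score = w0 + sum(w[i] * xi for i, xi in x.items())

    # Pairwise part: sum over i < j of <v_i, v_j> * x_i * x_j, computed with
    # the identity 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2),
    # so the cost grows linearly with the number of nonzero features.
    k = len(v[0]) if v else 0
    for f in range(k):
        s = sum(v[i][f] * xi for i, xi in x.items())
        s_sq = sum((v[i][f] * xi) ** 2 for i, xi in x.items())
        score += 0.5 * (s * s - s_sq)
    return score


# Hypothetical example: feature 0 is an ad category, feature 3 a page category.
w0 = 0.1
w = [0.2, 0.0, 0.0, -0.1]
v = [[1.0, 0.5], [0.0, 0.0], [0.0, 0.0], [0.5, 2.0]]
x = {0: 1.0, 3: 1.0}
print(fm_score(x, w0, w, v))
```

Because only the nonzero entries of x are visited, the interaction between an ad category and a page category is captured through the dot product of their factor vectors without storing a separate weight for every feature pair.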
Topics
 Input/Output Interface for the Factorization Machines Algorithm
 EC2 Instance Recommendation for the Factorization Machines Algorithm
 Factorization Machines Sample Notebooks
 How Factorization Machines Work
 Factorization Machines Hyperparameters
 Tune a Factorization Machines Model
 Factorization Machine Response Formats
Input/Output Interface for the Factorization Machines Algorithm
The factorization machines algorithm can be run in either binary classification mode or regression mode. In each mode, a dataset can be provided to the test channel along with the train channel dataset. The scoring depends on the mode used. In regression mode, the test dataset is scored using Root Mean Square Error (RMSE). In binary classification mode, the test dataset is scored using Binary Cross Entropy (Log Loss), Accuracy (at threshold=0.5), and F1 Score (at threshold=0.5).
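The metrics named above can be sketched in plain Python as follows. This is an illustration of the textbook definitions, not the code SageMaker runs. The function names rmse and binary_metrics, and the clipping constant eps, are assumptions made for this example.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, the regression-mode metric."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def binary_metrics(y_true, scores, threshold=0.5, eps=1e-12):
    """Log loss, accuracy, and F1 for binary classification mode.

    y_true : list of 0/1 labels
    scores : list of predicted scores in [0, 1] for label 1
    """
    n = len(y_true)

    # Binary cross entropy; scores are clipped to avoid log(0).
    total = 0.0
    for t, s in zip(y_true, scores):
        s = min(max(s, eps), 1 - eps)
        total += t * math.log(s) + (1 - t) * math.log(1 - s)
    log_loss = -total / n

    # Predicted labels at the threshold: score >= threshold means label 1.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"log_loss": log_loss, "accuracy": accuracy, "f1": f1}


print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(binary_metrics([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))
```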
For training, the factorization machines algorithm currently supports only the recordIO-protobuf format with Float32 tensors. Because the algorithm is used predominantly on sparse data, CSV is not a good candidate. Both File and Pipe mode training are supported for recordIO-wrapped protobuf.
For inference, factorization machines support the application/json and x-recordio-protobuf formats.
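As a rough sketch of what an application/json inference request might look like, the following builds request bodies in the dense and sparse JSON layouts described in the SageMaker common inference data formats documentation. The helper names dense_request and sparse_request are hypothetical, and the payload shape should be verified against the sample notebooks before you rely on it.

```python
import json


def dense_request(rows):
    """Serialize dense feature rows, one JSON instance per row."""
    return json.dumps({"instances": [{"features": list(r)} for r in rows]})


def sparse_request(rows, num_features):
    """Serialize sparse rows given as dicts of feature index -> value."""
    instances = []
    for row in rows:
        keys = sorted(row)  # keep keys and values in matching order
        instances.append({
            "data": {
                "features": {
                    "keys": keys,
                    "shape": [num_features],
                    "values": [row[k] for k in keys],
                }
            }
        })
    return json.dumps({"instances": instances})


# Hypothetical row with two active features out of 2000.
print(sparse_request([{182: 1.0, 26: 1.0}], num_features=2000))
```

The sparse layout sends only the nonzero entries, which suits the high-dimensional sparse inputs this algorithm targets.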

For the binary classification problem, the algorithm predicts a score and a label. The label is a number and can be either 0 or 1. The score is a number that indicates how strongly the algorithm believes that the label should be 1. The algorithm computes the score first and then derives the label from the score value. If the score is greater than or equal to 0.5, the label is 1.
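The rule for deriving the label from the score can be written in a few lines. This is a minimal sketch; the function name label_from_score is an assumption made for illustration.

```python
def label_from_score(score, threshold=0.5):
    """Return 1 if the score meets or exceeds the threshold, else 0."""
    return 1 if score >= threshold else 0


# A score exactly at the threshold yields label 1.
for score in (0.2, 0.5, 0.93):
    print(score, label_from_score(score))
```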
For the regression problem, only a score is returned, and it is the predicted value. For example, if factorization machines are used to predict a movie rating, the score is the predicted rating value.
For more details on training and inference file formats, see Factorization Machines Sample Notebooks.
EC2 Instance Recommendation for the Factorization Machines Algorithm
The Amazon SageMaker Factorization Machines algorithm is highly scalable and can train across distributed instances. We recommend training and inference with CPU instances for both sparse and dense datasets. In some circumstances, training with one or more GPUs on dense data might provide some benefit. Training with GPUs is available only on dense data. Use CPU instances for sparse data.
Factorization Machines Sample Notebooks
For a sample notebook that uses the Amazon SageMaker factorization machine learning algorithm to analyze the images of handwritten digits from zero to nine in the MNIST dataset, see An Introduction to Factorization Machines with MNIST.