Linear Learner Algorithm
Linear models are supervised learning algorithms used for solving either classification or regression problems. For input, you give the model labeled examples (x, y), where x is a high-dimensional vector and y is a numeric label. For binary classification problems, the label must be either 0 or 1. For multiclass classification problems, the labels must be from 0 to num_classes - 1. For regression problems, y is a real number. The algorithm learns a linear function or, for classification problems, a linear threshold function, and maps a vector x to an approximation of the label y.
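As a rough illustration of what "a linear function" and "a linear threshold function" mean here, the following sketch scores a feature vector and turns the score into a binary label. The function names, the weights, and the threshold of 0.0 are assumptions made for this example; they are not part of the SageMaker API and do not show how the algorithm is trained.

```python
from typing import Sequence


def linear_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Return the linear function w . x + b for one feature vector."""
    if len(w) != len(x):
        raise ValueError("weight and feature vectors must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict_binary(w: Sequence[float], b: float, x: Sequence[float],
                   threshold: float = 0.0) -> int:
    """Linear threshold function: 1 if the score exceeds the threshold, else 0."""
    return 1 if linear_score(w, b, x) > threshold else 0


# Hypothetical weights for a 3-dimensional feature vector.
weights, bias = [0.5, -1.0, 2.0], 0.25
example = [1.0, 2.0, 0.5]

print(linear_score(weights, bias, example))    # regression-style output
print(predict_binary(weights, bias, example))  # binary label, 0 or 1
```

For regression, the score itself approximates y. For binary classification, the thresholded score yields a label of 0 or 1, matching the label convention described above.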
The Amazon SageMaker linear learner algorithm provides a solution for both classification and regression problems. With the Amazon SageMaker algorithm, you can simultaneously explore different training objectives and choose the best solution from a validation set. You can also explore a large number of models and choose the best. The best model optimizes either of the following:

Continuous objectives, such as mean squared error, cross-entropy loss, and absolute error

Discrete objectives suited for classification, such as F1 measure, precision@recall, or accuracy
Compared with methods that provide a solution for only continuous objectives, the Amazon SageMaker linear learner algorithm provides a significant increase in speed over naive hyperparameter optimization techniques. It is also more convenient.
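The idea of choosing the best model on a validation set by a discrete objective can be sketched as follows. This is only an illustration under stated assumptions: the candidate models are hypothetical (weights, bias) pairs, F1 is used as the selection criterion, and the helper names are invented for this example. It is not the procedure that the linear learner algorithm runs internally.

```python
def f1_score(y_true, y_pred):
    """F1 measure for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def predict(model, x):
    """Apply a linear threshold function given as a (weights, bias) pair."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def select_best(models, x_val, y_val):
    """Return (name, f1) for the candidate with the highest validation F1."""
    scored = {
        name: f1_score(y_val, [predict(m, x) for x in x_val])
        for name, m in models.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical one-dimensional validation set and candidate models.
x_val = [[1.0], [2.0], [-1.0], [-2.0]]
y_val = [1, 1, 0, 0]
candidates = {
    "a": ([1.0], 0.0),
    "b": ([1.0], -1.5),
    "c": ([-1.0], 0.0),
}

print(select_best(candidates, x_val, y_val))
```

The candidates could have been trained on different continuous losses. The selection step compares them only on the discrete validation metric.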
The linear learner algorithm requires a data matrix, with rows representing the observations and columns representing the dimensions of the features. It also requires an additional column that contains the labels that match the data points. At a minimum, Amazon SageMaker linear learner requires you to specify the input and output data locations and the objective type (classification or regression) as arguments. The feature dimension is also required. For more information, see CreateTrainingJob. You can specify additional parameters in the HyperParameters string map of the request body. These parameters control the optimization procedure or specifics of the objective function that you train on, for example, the number of epochs, the regularization, and the loss type.
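Because HyperParameters is a string map, every value must be sent as a string, including numbers. The sketch below shows one way to assemble such a map. The helper build_hyperparameters is hypothetical, and the hyperparameter names used (feature_dim, predictor_type, epochs, loss, wd) are given as I understand them from the linear learner hyperparameter reference; verify them against that reference before use.

```python
ALLOWED_PREDICTOR_TYPES = {"binary_classifier", "multiclass_classifier", "regressor"}


def build_hyperparameters(feature_dim, predictor_type, **extra):
    """Build a string-to-string hyperparameter map (hypothetical helper)."""
    if predictor_type not in ALLOWED_PREDICTOR_TYPES:
        raise ValueError(f"unsupported predictor_type: {predictor_type!r}")
    params = {"feature_dim": feature_dim, "predictor_type": predictor_type}
    params.update(extra)
    # The request body expects strings, so convert every value.
    return {key: str(value) for key, value in params.items()}


# Example values are assumptions chosen for illustration only.
hyperparameters = build_hyperparameters(
    784,
    "binary_classifier",
    epochs=10,        # number of passes over the training data
    loss="logistic",  # loss type
    wd=0.01,          # L2 regularization (weight decay)
)
print(hyperparameters)
```

The resulting dictionary can be passed as the HyperParameters field of a CreateTrainingJob request.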
Input/Output Interface for the Linear Learner Algorithm
The Amazon SageMaker linear learner algorithm supports three data channels: train, validation (optional), and test (optional). If you provide validation data, its S3DataDistributionType should be FullyReplicated. The algorithm logs validation loss at every epoch, and uses a sample of the validation data to calibrate and select the best model. If you don't provide validation data, the algorithm uses a sample of the training data to calibrate and select the model. If you provide test data, the algorithm logs include the test score for the final model.
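The three channels can be described in the InputDataConfig list of a CreateTrainingJob request. The following sketch builds that list as plain Python dictionaries. The helper names and the S3 URIs are hypothetical, and text/csv is used as the content type only for illustration.

```python
def make_channel(name, s3_uri, content_type="text/csv",
                 distribution="FullyReplicated"):
    """Describe one data channel as a dictionary (hypothetical helper)."""
    return {
        "ChannelName": name,
        "ContentType": content_type,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": distribution,
            }
        },
    }


def build_input_data_config(train_uri, validation_uri=None, test_uri=None):
    """The train channel is required; validation and test are optional."""
    channels = [make_channel("train", train_uri)]
    if validation_uri is not None:
        channels.append(make_channel("validation", validation_uri))
    if test_uri is not None:
        channels.append(make_channel("test", test_uri))
    return channels


# Placeholder bucket and prefixes, not real locations.
input_data_config = build_input_data_config(
    "s3://example-bucket/linear-learner/train/",
    validation_uri="s3://example-bucket/linear-learner/validation/",
)
print([channel["ChannelName"] for channel in input_data_config])
```

The validation channel here keeps the default FullyReplicated distribution, in line with the recommendation above.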
For training, the linear learner algorithm supports both recordIO-wrapped protobuf and CSV formats. For the application/x-recordio-protobuf input type, only Float32 tensors are supported. For the text/csv input type, the first column is assumed to be the label, which is the target variable for prediction. You can use either File mode or Pipe mode to train linear learner models on data that is formatted as recordIO-wrapped protobuf or as CSV.
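For the text/csv input type, the label must come first in each row. A minimal sketch of producing such a file with the Python standard library is shown below. The helper name to_training_csv and the sample values are assumptions for this example, and the sketch writes no header row, which is my understanding of what the built-in algorithms expect for CSV training data.

```python
import csv
import io


def to_training_csv(features, labels):
    """Serialize rows as 'label,feature_1,...,feature_n' with no header."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same number of rows")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for label, row in zip(labels, features):
        writer.writerow([label, *row])  # label goes in the first column
    return buffer.getvalue()


# Two hypothetical observations with two features each.
csv_text = to_training_csv([[0.5, 1.5], [2.0, -1.0]], [1, 0])
print(csv_text)
```

The returned text can be written to a file and uploaded to the S3 location used for the train channel.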
For inference, the linear learner algorithm supports the application/json, application/x-recordio-protobuf, and text/csv formats. For binary classification models, it returns both the score and the predicted label. For regression, it returns only the score.
For more information on input and output file formats, see Linear Learner Response Formats for inference, and the Linear Learner Sample Notebooks.
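To illustrate the difference between the binary classification and regression outputs, the sketch below parses two JSON response bodies. The payload shape (a top-level "predictions" list whose items carry "score" and, for classification, "predicted_label") is an assumption based on my reading of the response formats documentation, and the sample bodies are made up for this example. Check the response formats page for the authoritative structure.

```python
import json


def parse_predictions(body):
    """Return a list of (score, predicted_label) tuples.

    predicted_label is None when the response does not include one,
    as is expected for regression models.
    """
    payload = json.loads(body)
    return [
        (item["score"], item.get("predicted_label"))
        for item in payload["predictions"]
    ]


# Hypothetical response bodies, not captured from a real endpoint.
binary_body = (
    '{"predictions": [{"score": 0.83, "predicted_label": 1},'
    ' {"score": 0.12, "predicted_label": 0}]}'
)
regression_body = '{"predictions": [{"score": 4.2}]}'

print(parse_predictions(binary_body))
print(parse_predictions(regression_body))
```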
EC2 Instance Recommendation for the Linear Learner Algorithm
You can train the linear learner algorithm on single- or multi-machine CPU and GPU instances. During testing, we have not found substantial evidence that multi-GPU computers are faster than single-GPU computers. Results can vary, depending on your specific use case.
Linear Learner Sample Notebooks
For a sample notebook that uses the Amazon SageMaker linear learner algorithm to analyze the images of handwritten digits from zero to nine in the MNIST dataset, see An Introduction to Linear Learner with MNIST. For instructions on how to create and access Jupyter notebook instances that you can use to run the example in Amazon SageMaker, see Use Notebook Instances. After you have created a notebook instance and opened it, choose the SageMaker Examples tab to see a list of all of the Amazon SageMaker samples. The linear learner example notebooks are located in the Introduction to Amazon algorithms section. To open a notebook, choose its Use tab and choose Create copy.