The Spline Quantile Forecaster (SQF) Recipe
The Amazon Forecast Spline Quantile Forecaster (SQF) recipe is a recurrent neural network (RNN)-based supervised learning algorithm for probabilistic forecasting of scalar (one-dimensional) time series. The underlying RNN architecture is similar to the one used in DeepAR+, except that it doesn't use a parametric likelihood. Instead, SQF estimates the quantile function directly and uses the continuous ranked probability score (CRPS) as its loss function. SQF is suitable for time series that do not follow standard parametric likelihood assumptions (for example, data with heavy tails). Currently, SQF doesn't support count data.
How SQF Works
To forecast with SQF, you must train a model using a training dataset. The dataset consists of (preferably more than one) target time series, optionally associated with a vector of feature time series and a vector of categorical features.
Training an Amazon Forecast SQF model is similar to training a DeepAR+ model. Training time series and features are augmented with built-in date-related features (such as hour-of-day, day-of-week, and so on). Multiple training examples are sliced off of the augmented training data. Each example consists of a context window, whose length is controlled by the context_length hyperparameter, and the prediction window that immediately follows, whose length is controlled by the prediction_length hyperparameter.
Similar to the example for DeepAR+, the following example for an element of a training set indexed by i applies to SQF. It consists of a target time series, Z_{i,t}, and two associated feature time series, U_{i,1,t} and X_{i,2,t}.
Additional lagged values of the training time series are included in each training example. The lags are selected to capture common seasonality patterns, using information about the frequency of the data. For example, if a model is trained using hourly data, each training example also includes data from 24, 48, and 72 hours before.
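The slicing described above can be illustrated with a minimal sketch. It is not the recipe's internal implementation: the function name slice_examples, the toy series, and the choice to attach lag values to every position in the window are assumptions made for illustration. Only the 24, 48, and 72 hour lags and the context_length and prediction_length windows come from the text.

```python
import numpy as np

def slice_examples(target, context_length, prediction_length, lags=(24, 48, 72)):
    """Slice (context, lagged, future) windows from one hourly series.

    Hypothetical helper for illustration only.
    """
    max_lag = max(lags)
    window = context_length + prediction_length
    examples = []
    # Start late enough that every lagged index stays inside the series.
    for start in range(max_lag, len(target) - window + 1):
        idx = np.arange(start, start + window)
        context = target[start:start + context_length]
        future = target[start + context_length:start + window]
        # One row per lag: the value observed `lag` steps before each position.
        lagged = np.stack([target[idx - lag] for lag in lags])
        examples.append((context, lagged, future))
    return examples

# Toy hourly series whose value equals its own time index.
series = np.arange(200, dtype=float)
examples = slice_examples(series, context_length=12, prediction_length=6)
context, lagged, future = examples[0]
```

In this sketch each example carries values from up to 72 hours before its own window, which is why the model can see further back than context_length.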
Exclusive Features of SQF
Many probabilistic models use a parametric distribution to predict each future data point. The distribution can be, for example, Gaussian, Student's t, or negative binomial, and is described by a few parameters. Although making predictions with these distributions is simpler, they have significant limitations. For example, they generally cannot represent long-tailed or multimodal distributions.
SQF overcomes these shortcomings. As the following figure shows, SQF considers a much wider family of piecewise-linear functions to parametrize the quantile function (the inverse of the cumulative distribution function). In the right plot, the original quantile function (dotted line) is approximated by means of a piecewise-linear function (blue line). This is equivalent to approximating the corresponding density with a piecewise-constant density function (left plot).
The algorithm automatically determines the shape of the predicted distribution from training data. The distribution can be unimodal, multimodal, symmetric, skewed, bounded, or unbounded.
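To make the idea of a piecewise-linear quantile function concrete, the following sketch evaluates one from a handful of knots and scores an observation with a numerical approximation of CRPS. It relies on the standard identity that CRPS equals twice the pinball (quantile) loss integrated over all quantile levels. The names quantile_fn and crps, the knot positions, and the grid-based integration are assumptions for illustration, not how the SQF recipe computes its loss.

```python
import numpy as np

def quantile_fn(alpha, knots, values):
    """Piecewise-linear quantile function: interpolate values between knot levels."""
    return np.interp(alpha, knots, values)

def crps(y, knots, values, n_grid=10_000):
    """Approximate CRPS as twice the average pinball loss over quantile levels."""
    alpha = (np.arange(n_grid) + 0.5) / n_grid  # midpoint grid on (0, 1)
    q = quantile_fn(alpha, knots, values)
    pinball = np.where(y >= q, alpha * (y - q), (1 - alpha) * (q - y))
    return 2.0 * pinball.mean()

# Hypothetical skewed forecast: half of the mass in [0, 1], a long upper tail to 10.
knots = np.array([0.0, 0.5, 0.9, 1.0])
values = np.array([0.0, 1.0, 2.0, 10.0])  # must be non-decreasing

median = quantile_fn(0.5, knots, values)
p95 = quantile_fn(0.95, knots, values)
score_near_median = crps(1.0, knots, values)
score_in_tail = crps(9.0, knots, values)
```

An observation near the median receives a lower (better) score than one deep in the tail. Because the knot values are free parameters, the same representation can describe skewed or bounded shapes without committing to a parametric family.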
To use the Amazon Forecast SQF recipe, follow the best practices for using the DeepAR+ recipe. In summary, set the likelihood hyperparameter to a piecewise-linear function, and set all other hyperparameters the same way as for DeepAR+, with the same default values.
SQF Hyperparameters
The Amazon Forecast SQF recipe has the same hyperparameters as Amazon Forecast DeepAR+. For information about the Amazon Forecast SQF hyperparameters, see DeepAR+ Hyperparameters.
Tune SQF Models
To achieve the best results with the SQF recipe, consider the following points when preparing the data.

Except when splitting the dataset for training and testing, always provide entire time series for training, testing, and calling the model for prediction. Regardless of how you set context_length, don't divide the time series or provide only a part of the time series. The model uses data points further back than context_length for the lagged values feature.
For model tuning, you can split the dataset into a training dataset and a testing dataset. In a typical evaluation scenario, you should test the model on the same time series that you use for training, but on the future prediction_length time points immediately after the last time point visible during training. You can create training and testing datasets that satisfy these criteria by using the entire dataset (all time series at full length) as the testing set and removing the last prediction_length points from each time series for training. This way, the model doesn't see the target values for the time points that are evaluated during testing. In the test phase, the last prediction_length points of each time series in the test dataset are withheld and a prediction is generated. The forecast is then compared with the actual values for the last prediction_length points. You can create more complex evaluations by repeating time series multiple times in the testing dataset, but cutting them at different endpoints. This produces accuracy metrics that are averaged over multiple forecasts from different time points.
Don't use very large values (> 400) for the prediction_length because this makes the model slow and less accurate. If you want to forecast further into the future, consider aggregating the data to a coarser frequency. For example, use 5min instead of 1min.
Because of lags, the model can look further back than the value that you set in context_length. Therefore, you don't need to set this hyperparameter to a large value. A good starting point is to set this hyperparameter to the same value as the prediction_length.
Train an SQF model on as many time series as are available. Although an SQF model trained on a single time series might work well, standard forecasting methods, such as ARIMA or ETS, might be more accurate and are more tailored to this use case. The SQF approach starts to outperform the standard methods when your dataset contains hundreds of related time series.
Also keep in mind that SQF predicts a larger number of parameters about future data than the DeepAR+ recipe does with ordinary likelihood functions. For this reason, training an optimal model might require more epochs. To allow for this, increase the epochs and early_stopping_patience hyperparameters.
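The training and testing split recommended in the data preparation points above can be sketched as follows. This is a minimal illustration in plain Python and not part of the Amazon Forecast API: the helper name split_for_backtest, the dictionary layout keyed by item ID, and the toy data are all assumptions.

```python
def split_for_backtest(series_by_id, prediction_length):
    """Return (train, test): test keeps full series, train drops the last points.

    Hypothetical helper for illustration only.
    """
    if prediction_length <= 0:
        raise ValueError("prediction_length must be positive")
    train, test = {}, {}
    for item_id, values in series_by_id.items():
        if len(values) <= prediction_length:
            raise ValueError(f"series {item_id!r} is too short to split")
        # The testing set holds every time series at full length.
        test[item_id] = list(values)
        # The training set never sees the points that are evaluated later.
        train[item_id] = list(values[:-prediction_length])
    return train, test

# Toy dataset with two series of different lengths.
data = {"a": list(range(10)), "b": list(range(100, 120))}
train, test = split_for_backtest(data, prediction_length=3)
held_out = {k: test[k][len(train[k]):] for k in data}
```

The held_out values are the actuals that a forecast would be compared against. Calling the helper on series cut at several different endpoints would give the more complex, multi-endpoint evaluation mentioned above.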