This is prerelease documentation for a service in preview release. It is subject to change.

The Spline Quantile Forecaster (SQF) Recipe

The Amazon Forecast Spline Quantile Forecaster (SQF) recipe is a supervised learning algorithm, based on recurrent neural networks (RNNs), for probabilistic forecasting of scalar (one-dimensional) time series. Its underlying RNN architecture is similar to the one used in DeepAR+, except that SQF doesn't use a likelihood function. Instead, SQF estimates the quantile function directly and uses the continuous ranked probability score (CRPS) as its loss function. SQF is suitable for time series that don't follow standard parametric likelihood assumptions (for example, data with heavy tails). Currently, SQF doesn't support count data.
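As a reference point for the loss, the CRPS of a forecast with quantile function $q(\alpha)$, evaluated at an observed value $z$, can be written in its standard quantile form: twice the quantile (pinball) loss $\Lambda_\alpha$, integrated over all quantile levels $\alpha \in [0, 1]$. The symbols $q$, $z$, and $\Lambda_\alpha$ are introduced here for illustration; the exact normalization that Amazon Forecast applies during training is not stated on this page.

```latex
\mathrm{CRPS}(q, z) = \int_0^1 2\,\Lambda_\alpha\big(q(\alpha), z\big)\,d\alpha,
\qquad
\Lambda_\alpha(q, z) = \big(\alpha - \mathbb{1}\{z < q\}\big)\,(z - q)
```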

How SQF Works

To forecast with SQF, you must first train a model on a training dataset. The dataset consists of one or more target time series (preferably many), each optionally associated with a vector of feature time series and a vector of categorical features.

Training an Amazon Forecast SQF model is similar to training a DeepAR+ model. The training time series and features are augmented with built-in date-related features (such as hour-of-day and day-of-week). Multiple training examples are then sliced from the augmented training data. Each example consists of a context window, whose length is controlled by the context_length hyperparameter, followed immediately by a prediction window, whose length is controlled by the prediction_length hyperparameter.
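The windowing step can be sketched as follows. This minimal Python sketch assumes a plain list of floats and a fixed stride between windows; the function name slice_training_examples and the stride parameter are hypothetical, and the date-feature augmentation is omitted. It is not the sampling strategy Amazon Forecast actually uses, which this page doesn't describe.

```python
from typing import List, Tuple


def slice_training_examples(
    series: List[float],
    context_length: int,
    prediction_length: int,
    stride: int = 1,
) -> List[Tuple[List[float], List[float]]]:
    """Cut (context window, prediction window) pairs out of one series.

    Each example is a context window of context_length points followed
    immediately by a prediction window of prediction_length points.
    The stride parameter is hypothetical.
    """
    window = context_length + prediction_length
    examples = []
    for start in range(0, len(series) - window + 1, stride):
        context = series[start : start + context_length]
        target = series[start + context_length : start + window]
        examples.append((context, target))
    return examples


# Usage: a toy series of 10 points, a 4-point context and a 2-point prediction window.
toy = [float(v) for v in range(10)]
pairs = slice_training_examples(toy, context_length=4, prediction_length=2)
print(len(pairs))  # 5
print(pairs[0])    # ([0.0, 1.0, 2.0, 3.0], [4.0, 5.0])
```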

As with DeepAR+, the following figure shows an element of a training set, indexed by i. It consists of a target time series, $z_{i,t}$, and two associated feature time series, $x_{i,1,t}$ and $x_{i,2,t}$.


                Image: SQF time series, full sampled.

Each training example also includes additional lagged values of the training time series. The lags are chosen based on the frequency of the data, to capture common seasonality patterns. For example, if a model is trained on hourly data, each training example also includes the values from 24, 48, and 72 hours earlier.


                Image: SQF time series, full-lags.
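The following sketch shows how hourly lag features reach back past the context window. It assumes the lags 24, 48, and 72 from the hourly example above; lag_features and HOURLY_LAGS are hypothetical names, and the lags Amazon Forecast actually selects for a given frequency may differ. A lag that points before the start of the series has no value, which is one reason to provide entire time series.

```python
from typing import Dict, List, Optional

# Hypothetical lag set taken from the hourly example in the text.
HOURLY_LAGS = [24, 48, 72]


def lag_features(series: List[float], t: int, lags: List[int]) -> Dict[int, Optional[float]]:
    """Return the value `lag` steps before time index t, or None if it is unavailable."""
    return {lag: (series[t - lag] if t - lag >= 0 else None) for lag in lags}


# Usage: 4 days of hourly data where each value equals its hour index.
hourly = [float(h) for h in range(96)]
print(lag_features(hourly, t=80, lags=HOURLY_LAGS))  # {24: 56.0, 48: 32.0, 72: 8.0}
print(lag_features(hourly, t=50, lags=HOURLY_LAGS))  # {24: 26.0, 48: 2.0, 72: None}
```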

Exclusive Features of SQF

Probabilistic forecasting models typically use a parametric distribution to predict each future data point. The distribution can be, for example, Gaussian, Student's t, or negative binomial, and is described by a few parameters. Although these distributions make prediction simpler, they have significant limitations. For example, they often cannot represent long-tailed or multi-modal distributions.

SQF overcomes these shortcomings. As the following figure shows, SQF uses a much wider family of piecewise-linear functions to parameterize the quantile function (the inverse of the cumulative distribution function). In the right plot, the original quantile function (dotted line) is approximated by a piecewise-linear function (blue line). This is equivalent to approximating the corresponding density with a piecewise-constant density function (left plot).


                    Image: SQF exclusive features

The algorithm automatically determines the shape of the predicted distribution from training data. The distribution can be uni-modal, multi-modal, symmetric, skewed, bounded, or unbounded.
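To make the piecewise-linear idea concrete, the following Python sketch represents a quantile function by its values at a few quantile levels, interpolates linearly between them, draws samples by inverse-transform sampling, and approximates the CRPS numerically. It is an illustration under stated assumptions, not SQF's implementation: the class PiecewiseLinearQuantile and the function crps are hypothetical, and this page doesn't describe how SQF parameterizes the spline or how the RNN outputs it. The example knots put 37.5% of the probability mass on each of [0, 1] and [9, 10], giving a bimodal, piecewise-constant density that a single Gaussian cannot represent.

```python
import bisect
import random
from typing import List


class PiecewiseLinearQuantile:
    """Quantile function q(alpha) that is linear between knots (hypothetical sketch).

    levels: strictly increasing quantile levels from 0.0 to 1.0
    values: non-decreasing quantile values at those levels
    """

    def __init__(self, levels: List[float], values: List[float]):
        if len(levels) != len(values) or len(levels) < 2:
            raise ValueError("need at least two matching levels and values")
        if levels[0] != 0.0 or levels[-1] != 1.0:
            raise ValueError("levels must start at 0.0 and end at 1.0")
        if any(b <= a for a, b in zip(levels, levels[1:])):
            raise ValueError("levels must be strictly increasing")
        if any(b < a for a, b in zip(values, values[1:])):
            raise ValueError("values must be non-decreasing (a valid quantile function)")
        self.levels = list(levels)
        self.values = list(values)

    def __call__(self, alpha: float) -> float:
        # Locate the segment [levels[k-1], levels[k]] containing alpha, then interpolate.
        k = bisect.bisect_left(self.levels, alpha)
        k = min(max(k, 1), len(self.levels) - 1)
        a0, a1 = self.levels[k - 1], self.levels[k]
        v0, v1 = self.values[k - 1], self.values[k]
        return v0 + (v1 - v0) * (alpha - a0) / (a1 - a0)

    def sample(self, n: int, rng: random.Random) -> List[float]:
        # Inverse-transform sampling: map uniform levels through q.
        return [self(rng.random()) for _ in range(n)]


def crps(q: PiecewiseLinearQuantile, z: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of CRPS = integral over alpha of 2 * pinball loss."""
    total = 0.0
    for i in range(steps):
        alpha = (i + 0.5) / steps
        qa = q(alpha)
        indicator = 1.0 if z < qa else 0.0
        total += 2.0 * (alpha - indicator) * (z - qa)
    return total / steps


# Bimodal example: 37.5% of the mass on [0, 1], 25% on [1, 9], 37.5% on [9, 10].
bimodal = PiecewiseLinearQuantile([0.0, 0.375, 0.625, 1.0], [0.0, 1.0, 9.0, 10.0])
print(bimodal(0.5))  # 5.0
draws = bimodal.sample(1000, random.Random(0))
print(crps(bimodal, 9.5))
```

For a constant quantile function (a point forecast), crps reduces to the absolute error, which is a quick sanity check on the integral.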

To use the Amazon Forecast SQF recipe, follow the best practices for using the DeepAR+ recipe. In summary, set the likelihood hyperparameter to piecewise-linear, and set all other hyperparameters as you would for DeepAR+; their default values are the same.
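As a sketch of that rule, the following Python function copies a DeepAR+ hyperparameter map and changes only the likelihood. It assumes the hyperparameters are held as a string-to-string map (as in the TrainingParameters map of the Forecast CreatePredictor API); the function name and the example values are hypothetical, not recommended settings.

```python
from typing import Dict


def to_sqf_training_parameters(deepar_params: Dict[str, str]) -> Dict[str, str]:
    """Copy a DeepAR+ hyperparameter map and switch the likelihood to piecewise-linear.

    Every other hyperparameter is kept unchanged; the input map is not modified.
    """
    params = dict(deepar_params)
    params["likelihood"] = "piecewise-linear"
    return params


# Hypothetical DeepAR+ settings; the values are illustrative only.
deepar = {"likelihood": "student-T", "context_length": "48", "epochs": "100"}
sqf = to_sqf_training_parameters(deepar)
print(sqf)  # {'likelihood': 'piecewise-linear', 'context_length': '48', 'epochs': '100'}
```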

SQF Hyperparameters

The Amazon Forecast SQF recipe has the same hyperparameters as Amazon Forecast DeepAR+. For information about the Amazon Forecast SQF hyperparameters, see DeepAR+ Hyperparameters.

Tune SQF Models

To achieve the best results with the SQF recipe, consider the following points when preparing the data and choosing hyperparameters.

  • Except when splitting the dataset for training and testing, always provide entire time series for training, for testing, and when calling the model for prediction. Regardless of how you set context_length, don't divide the time series or provide only part of it. The model uses data points from further back than context_length to compute the lagged-value features.

  • For model tuning, you can split the dataset into a training dataset and a testing dataset. In a typical evaluation scenario, you test the model on the same time series that you used for training, over the prediction_length time points that immediately follow the last time point visible during training. To create training and testing datasets that satisfy these criteria, use the entire dataset (all time series at full length) as the testing set, and remove the last prediction_length points from each time series to create the training set. This way, the model doesn't see the target values for the time points that are evaluated during testing. In the test phase, the last prediction_length points of each time series in the testing dataset are withheld and a prediction is generated. The forecast is then compared with the actual values for those last prediction_length points. You can create more complex evaluations by repeating time series multiple times in the testing dataset, each cut at a different endpoint. This produces accuracy metrics that are averaged over multiple forecasts from different time points.

  • Don't use very large values (greater than 400) for prediction_length, because they make the model slow and less accurate. If you want to forecast further into the future, consider aggregating your data to a coarser (lower) frequency. For example, use 5min instead of 1min.

  • Because of lags, the model can look further back than the value that you set in context_length. Therefore, you don't need to set this hyperparameter to a large value. A good starting point is to set this hyperparameter to the same value as the prediction_length.

  • Train an SQF model on as many time series as are available. Although an SQF model trained on a single time series might work well, standard forecasting methods, such as ARIMA or ETS, might be more accurate and are more tailored to this use case. The SQF approach starts to outperform the standard methods when your dataset contains hundreds of related time series.
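The training/testing split described in the list above can be sketched in Python as follows. It assumes the dataset is a dict that maps an item identifier to a list of target values; train_test_split, split_for_evaluation, and rolling_test_set are hypothetical helpers, and stepping the rolling cut points back by prediction_length is one choice among many.

```python
from typing import Dict, List, Tuple

Series = List[float]


def train_test_split(
    dataset: Dict[str, Series], prediction_length: int
) -> Tuple[Dict[str, Series], Dict[str, Series]]:
    """Training set: each series without its last prediction_length points.
    Testing set: every series at full length."""
    if prediction_length <= 0:
        raise ValueError("prediction_length must be positive")
    train = {item: s[:-prediction_length] for item, s in dataset.items()}
    test = {item: list(s) for item, s in dataset.items()}
    return train, test


def split_for_evaluation(series: Series, prediction_length: int) -> Tuple[Series, Series]:
    """Withhold the last prediction_length points: (model input, actual values)."""
    return series[:-prediction_length], series[-prediction_length:]


def rolling_test_set(
    dataset: Dict[str, Series], prediction_length: int, num_windows: int
) -> List[Tuple[str, Series]]:
    """Repeat each series, cut at successively earlier endpoints."""
    cuts = []
    for item, s in dataset.items():
        for w in range(num_windows):
            end = len(s) - w * prediction_length
            if end > prediction_length:  # keep some history before the withheld window
                cuts.append((item, s[:end]))
    return cuts


data = {"item_a": [float(v) for v in range(10)]}
train, test = train_test_split(data, prediction_length=3)
context, actual = split_for_evaluation(test["item_a"], prediction_length=3)
print(train["item_a"])  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(actual)           # [7.0, 8.0, 9.0]
print([len(s) for _, s in rolling_test_set(data, prediction_length=3, num_windows=2)])  # [10, 7]
```

The model input at test time equals the truncated training series, so the points being evaluated are never seen during training.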

Also keep in mind that SQF estimates more parameters of the future distribution than the DeepAR+ recipe does with ordinary likelihood functions. For this reason, training an optimal model might require more epochs. To allow for this, increase the values of the epochs and early_stopping_patience hyperparameters.
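The tuning tips on this page can be gathered into one helper, sketched below in Python. The baseline epoch and patience values, the training_factor of 2, and the function name are hypothetical; the hyperparameter names follow this page, and how they map onto a specific API (for example, whether prediction_length is supplied separately as the forecast horizon) is an assumption to verify.

```python
from typing import Dict, Optional

# From the guidance above: prediction_length values above 400 make the model slow and less accurate.
MAX_PREDICTION_LENGTH = 400


def sqf_tuning_parameters(
    prediction_length: int,
    base_epochs: int,
    base_patience: int,
    context_length: Optional[int] = None,
    training_factor: float = 2.0,
) -> Dict[str, str]:
    """Collect the tuning tips into a string-valued hyperparameter map (hypothetical helper)."""
    if not 0 < prediction_length <= MAX_PREDICTION_LENGTH:
        raise ValueError(
            f"prediction_length={prediction_length} is outside (0, {MAX_PREDICTION_LENGTH}]; "
            "aggregate the data to a coarser frequency instead"
        )
    if training_factor < 1.0:
        raise ValueError("training_factor should be >= 1 so that training runs longer")
    if context_length is None:
        # Starting point suggested above: context_length equal to prediction_length.
        context_length = prediction_length
    return {
        "prediction_length": str(prediction_length),
        "context_length": str(context_length),
        # SQF estimates more parameters than DeepAR+ with a standard likelihood,
        # so allow more epochs and a longer early-stopping patience.
        "epochs": str(round(base_epochs * training_factor)),
        "early_stopping_patience": str(round(base_patience * training_factor)),
    }


# Usage with hypothetical baseline values (not documented defaults).
params = sqf_tuning_parameters(prediction_length=48, base_epochs=100, base_patience=10)
print(params)
# {'prediction_length': '48', 'context_length': '48', 'epochs': '200', 'early_stopping_patience': '20'}
```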