Amazon Forecast
Developer Guide

This is prerelease documentation for a service in preview release. It is subject to change.

The Amazon Forecast API will undergo significant changes during scheduled maintenance occurring from 10 AM on 7/22/19 until 10 AM on 7/23/19. During maintenance, access to the Forecast APIs and console might be interrupted.

After 7/22/19, your Forecast resources (datasets, predictors, and forecasts) will no longer be available. However, you can save your forecasts for future use. We recommend using the CreateForecastExportJob API to save them to your S3 bucket before 7/22/19.

After maintenance concludes, before using the APIs, you must download a new SDK and modify your existing code to reflect the syntax changes. If you use only the console, you won’t need to make any changes.

We will provide new API documentation before scheduled maintenance begins. If you have questions, contact

Getting Started (AWS CLI)

In this exercise, you use the AWS Command Line Interface (CLI) to explore Amazon Forecast. You create an Amazon Forecast dataset, train a predictor, and use the resulting predictor to generate a forecast. Before you begin, make sure that you have an AWS account and that you've set up the AWS CLI. For more information, see Setting Up.

Note

The CLI commands in this exercise were tested on Linux. For information about using the CLI commands on Windows, see Specifying Parameter Values for the AWS Command Line Interface in the AWS Command Line Interface User Guide.

Step 1: Create an Amazon Forecast Dataset

Begin by creating a dataset and importing the electricity usage data into it.

To create an Amazon Forecast dataset

  1. Decide which dataset domain and dataset type are appropriate.

    The training data that you will import into the dataset influences your choice of dataset domain and type. So, let's review a few sample rows of the electricity usage data:

    2014-01-01 01:00:00, 2.53807106598985, client_0
    2014-01-01 01:00:00, 23.648648648648624, client_1
    2014-01-01 02:00:00, 9.648648648612345, client_0

    The data format is CSV (comma-separated values), and it's collected hourly (as shown by the timestamps). It includes these columns:

    • Column 1 – Timestamps that show when electricity usage was recorded.

    • Column 2 – Hourly electricity usage values (note how the timestamp values increase by hour).

    • Column 3 – Client ID values that identify the customers using the electricity.

    For this data, choose the following predefined dataset domain and dataset type:

    • Custom domain – None of the dataset domains, such as METRICS, RETAIL, or WEB_TRAFFIC, applies to this data, so choose the Custom domain.

    • TARGET_TIME_SERIES type – The data is a time series because it tracks electricity usage over time. It also includes the target that we want to forecast (Column 2, electricity usage). Therefore, choose the TARGET_TIME_SERIES dataset type.

      To understand why you choose this type, see Predefined Dataset Domains and Dataset Types.

  2. Decide on a dataset schema.

    The TARGET_TIME_SERIES dataset type for the CUSTOM domain requires the following fields: timestamp, target_value, and item_id. The target_value field is the target; Amazon Forecast generates the forecast for this field.

    To map the required fields to columns in your data, you can do one of the following:

    • Add headers in your data files. For example:

      timestamp, target_value, item_id
      2014-01-01 01:00:00, 2.53807106598985, client_0
      2014-01-01 01:00:00, 23.648648648648624, client_1
      2014-01-01 02:00:00, 9.648648648612345, client_0
    • Specify a schema when you create the dataset. For this exercise, you specify the following schema.

      Important

      The order of the field names in the schema must match the order of the fields in the training data.

      {
         "Attributes":[
            {
               "AttributeName":"timestamp",
               "AttributeType":"timestamp"
            },
            {
               "AttributeName":"target_value",
               "AttributeType":"float"
            },
            {
               "AttributeName":"item_id",
               "AttributeType":"string"
            }
         ]
      }

    You now have the information necessary to create a dataset and import data into it.
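Before importing, it can be worth a quick sanity check that each CSV row matches the schema's column order and types. The following helper is hypothetical (not part of Amazon Forecast or the AWS CLI), sketched against the three-attribute schema above:

```python
import datetime

# Attribute types from the schema above, in the order the columns must appear.
SCHEMA_TYPES = ["timestamp", "float", "string"]

def row_matches_schema(row, schema_types=SCHEMA_TYPES):
    """Return True if a parsed CSV row matches the schema's column order and types."""
    if len(row) != len(schema_types):
        return False
    for value, attr_type in zip(row, schema_types):
        value = value.strip()
        if attr_type == "timestamp":
            try:
                datetime.datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
            except ValueError:
                return False
        elif attr_type == "float":
            try:
                float(value)
            except ValueError:
                return False
        # "string" accepts any value
    return True

ok = row_matches_schema(["2014-01-01 01:00:00", "2.53807106598985", "client_0"])
```

The timestamp format checked here matches the time-stamp-format used later in this exercise, 'yyyy-MM-dd HH:mm:ss'.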

  3. Create the dataset. For more information about this operation, see CreateDataset.

    aws forecast create-dataset \
        --domain CUSTOM \
        --dataset-type TARGET_TIME_SERIES \
        --data-format CSV \
        --dataset-name electricity_demand_ds \
        --data-frequency 1H \
        --time-stamp-format 'yyyy-MM-dd HH:mm:ss' \
        --schema '{
          "Attributes":[
             { "AttributeName":"timestamp", "AttributeType":"timestamp" },
             { "AttributeName":"target_value", "AttributeType":"float" },
             { "AttributeName":"item_id", "AttributeType":"string" }
          ]
        }'

    In the request, the data-frequency value of 1H specifies an hourly data collection frequency. This is the response:

    {
        "DatasetArn": "arn:aws:forecast:us-west-2:acct-id:ds/electricity_demand_ds",
        "DatasetName": "electricity_demand_ds"
    }
  4. (Optional) Get the description of the dataset:

    aws forecast describe-dataset \
        --dataset-name electricity_demand_ds

    This is the response:

    {
        "DataFormat": "CSV",
        "DatasetArn": "arn:aws:forecast:us-west-2:acct-id:ds/electricity_demand_ds",
        "DatasetName": "electricity_demand_ds",
        "ScheduleExpression": "none",
        "DatasetType": "TARGET_TIME_SERIES",
        "Status": "ACTIVE",
        "Domain": "CUSTOM"
    }
  5. Create a dataset group and add the dataset to it:

    aws forecast create-dataset-group \
        --dataset-group-name electricity_ds_group \
        --dataset-names electricity_demand_ds \
        --role-arn role-ARN

    This is an example response:

    {
        "DatasetGroupName": "electricity_ds_group",
        "DatasetGroupArn": "arn:aws:forecast:us-west-2:acct-id:dsgroup/electricity_ds_group"
    }
  6. (Optional) Get the description of the dataset group:

    aws forecast describe-dataset-group \
        --dataset-group-name electricity_ds_group

    Here is an example response:

    {
        "DatasetGroupName": "electricity_ds_group",
        "DatasetGroupArn": "arn:aws:forecast:us-west-2:acct-id:dsgroup/electricity_ds_group",
        "Datasets": [
            "electricity_demand_ds"
        ],
        "RoleArn": "arn:aws:iam::acct-id:role/ForecastRole"
    }

    For more information, see CreateDatasetGroup.

  7. Import the electricity usage training data to the dataset:

    aws forecast create-dataset-import-job \
        --dataset-name electricity_demand_ds \
        --dataset-group-name electricity_ds_group \
        --delimiter ',' \
        --s3-uri s3://bucket/electricityusagedata.csv

    This is an example response (your version ID will be different):

    {
        "DatasetName": "electricity_demand_ds",
        "DatasetArn": "arn:aws:forecast:us-west-2:acct-id:ds/electricity_demand_ds",
        "VersionId": "5242d43a"
    }

    For more information about the operation, see CreateDatasetImportJob.

  8. Check the import status.

    aws forecast describe-dataset-import-job \
        --dataset-name electricity_demand_ds \
        --version-id version-id

    This is an example response:

    {
        "Status": "CREATING",
        "FieldStatistics": {},
        "DatasetName": "electricity_demand_ds",
        "VersionId": "version-id",
        "DatasetArn": "arn:aws:forecast:us-west-2:acct-id:ds/electricity_demand_ds"
    }

    Important

    Don't proceed until the status is ACTIVE.
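Several steps in this exercise require waiting for an ACTIVE status. Rather than rerunning the describe command by hand, you can poll from a script. A minimal sketch, assuming the AWS CLI is installed and configured and using the describe-dataset-import-job command shown above:

```python
import json
import subprocess
import time

def extract_status(response_text):
    """Parse the Status field out of a describe-* JSON response."""
    return json.loads(response_text)["Status"]

def wait_until_active(cli_args, poll_seconds=60):
    """Rerun the given AWS CLI describe command until the resource is ACTIVE."""
    while True:
        output = subprocess.check_output(cli_args)
        if extract_status(output) == "ACTIVE":
            return
        time.sleep(poll_seconds)

# Example (assumes your credentials and the dataset from this exercise):
# wait_until_active(["aws", "forecast", "describe-dataset-import-job",
#                    "--dataset-name", "electricity_demand_ds",
#                    "--version-id", "5242d43a"])
```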

    When all of the data has been imported, you get a response similar to the following. It includes statistics for the data.

    {
        "DatasetName": "electricity_demand_ds",
        "VersionId": "5242d43a",
        "FieldStatistics": {
            "target": {
                "Max": "168200.0",
                "Count": 3241200,
                "CountDistinct": 1196961,
                "Min": "0.0",
                "CountNull": 0,
                "Avg": 606.5167610461679,
                "Stddev": 3518.405223972031
            },
            "item": {
                "Count": 3241200,
                "CountDistinct": 370,
                "CountNull": 0
            },
            "date": {
                "Max": "2015-01-01T00:00:00Z",
                "Count": 3241200,
                "Min": "2014-01-01T01:00:00Z",
                "CountDistinct": 8760,
                "CountNull": 0
            }
        },
        "Status": "ACTIVE",
        "DatasetArn": "arn:aws:forecast:us-west-2:acct-id:ds/electricity_demand_ds"
    }

    For more information about the operation, see DescribeDatasetImportJob.
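The field statistics in the response (Count, CountDistinct, Min, Max, Avg, Stddev) are ordinary column summaries, which you can reproduce yourself to cross-check an import. A small illustration; using the population standard deviation is an assumption about how Forecast computes Stddev:

```python
import statistics

def field_statistics(values):
    """Summarize a numeric column the way the FieldStatistics block does."""
    non_null = [v for v in values if v is not None]
    return {
        "Count": len(values),
        "CountNull": len(values) - len(non_null),
        "CountDistinct": len(set(non_null)),
        "Min": min(non_null),
        "Max": max(non_null),
        "Avg": statistics.mean(non_null),
        "Stddev": statistics.pstdev(non_null),  # population stddev; an assumption
    }

stats = field_statistics([2.0, 4.0, 4.0, None])
```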

Step 2: Create a Predictor

To create a predictor, you use the CreatePredictor operation and provide the following:

  • A recipe – A recipe provides an algorithm that trains a predictor using data in the dataset group. For this exercise, you use a recipe called forecast_DEEP_AR, which is provided by Amazon Forecast. For a list of recipes that Amazon Forecast provides and when to use them, see Choosing an Amazon Forecast Recipe.

    Note

    If you aren't sure which recipe to use, you can set the AutoML flag in the CreatePredictor operation to tell Amazon Forecast to run AutoML. AutoML determines which recipe or recipes to use for predictor training.

  • A dataset group – You created the dataset group in the preceding step.

To create a predictor

  1. (Optional) See which recipes are available.

    1. (Optional) Get a list of recipes:

      aws forecast list-recipes

      This is the response:

      {
          "RecipeNames": [
              "forecast_ARIMA",
              "forecast_DEEP_AR",
              "forecast_DEEP_AR_PLUS",
              "forecast_ETS",
              "forecast_NPTS",
              "forecast_PROPHET"
          ]
      }

      For this exercise, you use the forecast_DEEP_AR (DeepAR) recipe.

    2. (Optional) Get details about the forecast_DEEP_AR recipe:

      aws forecast describe-recipe \
          --recipe-name forecast_DEEP_AR

      This is the response. It includes the default hyperparameter configuration information.

      {
          "Recipe": {
              "Name": "forecast_DEEP_AR",
              "Train": [
                  {
                      "TrainingInfo": {
                          "AlgorithmName": "DEEP_AR",
                          "TrainedModelName": "algorithm_DEEP_AR",
                          "TrainingParameters": {
                              "num_layers": "2",
                              "learning_rate": "1E-3",
                              "mini_batch_size": "128",
                              "epochs": "400",
                              "likelihood": "student-t",
                              "dropout_rate": "0.1",
                              "early_stopping_patience": "50",
                              "num_cells": "40"
                          }
                      },
                      "BackTestWindowCount": 2,
                      "MetricsBuckets": []
                  }
              ]
          }
      }
  2. Create the predictor and review the evaluation metrics.

    1. Create a predictor:

      aws forecast create-predictor \
          --dataset-group-name electricity_ds_group \
          --recipe-name forecast_DEEP_AR \
          --predictor-name electricitypredictor \
          --forecast-horizon 20

      This is an example response (your version ID will be different):

      {
          "PredictorArn": "arn:aws:forecast:us-west-2:acct-id:predictor/electricitypredictor",
          "PredictorName": "electricitypredictor",
          "VersionId": "51e6d9d5"
      }

      For more information about the operation, see CreatePredictor. For more information about predictors, see Predictors.

    2. (Optional) Get a list of predictors in your account:

      aws forecast list-predictors
    3. (Optional) Get a list of version IDs for the predictor (right now, you have only one version of the predictor):

      aws forecast list-predictor-versions \
          --predictor-name electricitypredictor

      This is an example response:

      {
          "PredictorVersions": [
              {
                  "VersionId": "51e6d9d5",
                  "PredictorName": "electricitypredictor"
              }
          ]
      }
    4. Get the predictor's status:

      aws forecast describe-predictor \
          --predictor-name electricitypredictor \
          --version-id version-id

      This is an example response:

      {
          "RecipeParameters": {},
          "Status": "CREATING",
          "DatasetGroup": "electricity_ds_group",
          "CreationStartTime": "2018-12-05T19:49:48.536Z",
          "VersionId": "version-id",
          "RecipeName": "forecast_DEEP_AR",
          "PredictorArn": "arn:aws:forecast:us-west-2:acct-id:predictor/electricitypredictor",
          "PredictorName": "electricitypredictor"
      }

      Important

      Model training takes time. Don't proceed until it has completed and the status of the predictor is ACTIVE.

  3. Get the accuracy metrics for the predictor.

    In addition to training a predictor, the CreatePredictor operation evaluates the predictor. In production, you can use the metrics to decide whether to use the predictor for generating forecasts. To see the evaluation metrics, use the GetAccuracyMetrics operation.

    aws forecastquery get-accuracy-metrics \
        --predictor-name electricitypredictor

    This is an example response:

    {
        "ModelMetrics": {
            "DEEP_AR": {
                "Metrics": {
                    "p50": "0.12013971889173483",
                    "p90": "0.07030490020765823",
                    "rmse": "329.8365020572516",
                    "p10": "0.07369659992536774"
                },
                "MetricsByBucket": []
            }
        }
    }

    The metrics show the error loss for each quantile. For example, the p10 value of approximately 0.0737 indicates a 7.37% error for that quantile. The metrics also include the root-mean-square error (rmse).
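To make these quantile losses concrete, here is the standard weighted quantile (pinball) loss computed on toy data. The factor of 2 and the normalization by the sum of actuals follow a common convention; whether Forecast uses exactly this formula isn't stated here, so treat this as an illustration:

```python
def weighted_quantile_loss(tau, actuals, predictions):
    """Weighted quantile (pinball) loss at quantile tau.

    Under-predictions are penalized by tau, over-predictions by (1 - tau),
    and the total is normalized by the sum of the actual values.
    """
    loss = 0.0
    for y, q in zip(actuals, predictions):
        if y >= q:
            loss += tau * (y - q)
        else:
            loss += (1.0 - tau) * (q - y)
    return 2.0 * loss / sum(actuals)

# A perfect forecast has zero loss at every quantile:
perfect = weighted_quantile_loss(0.5, [10.0, 20.0], [10.0, 20.0])
```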

Step 3: Deploy the Predictor to Generate a Forecast

To generate a forecast, you use the DeployPredictor operation to deploy the predictor. Amazon Forecast generates a forecast for the target_value field (as determined by the dataset domain and type) for each unique item_id in the dataset. In this example, the target_value field provides electricity usage and the item_id provides client IDs. You get a forecast for the hourly electricity usage by customer.

To deploy the predictor to generate a forecast

  1. Deploy the predictor:

    aws forecast deploy-predictor \
        --predictor-name electricitypredictor

    The operation uses the predictor to run inference and generate a forecast. In the response, you get the Amazon Resource Name (ARN) of the forecast. You use this ARN to retrieve the forecast. For example:

    {
        "PredictorArn": "arn:aws:forecast:us-west-2:acct-id:predictor/electricitypredictor",
        "VersionId": "51e6d9d5",
        "PredictorName": "electricitypredictor"
    }

    For more information, see DeployPredictor.

  2. (Optional) List the deployed predictors:

    aws forecast list-deployed-predictors

    This is an example response:

    { "PredictorNames": [ "electricitypredictor" ] }
  3. Get information about the predictor, including its status. If the deployment status is CREATING, wait until Amazon Forecast creates the forecast (completes the deployment).

    aws forecast describe-deployed-predictor \
        --predictor-name electricitypredictor

    This is an example response (version ID is an example value):

    {
        "PredictorArn": "arn:aws:forecast:us-west-2:acct-id:predictor/electricitypredictor",
        "DeploymentInProgressVersionId": "78974aa1",
        "PredictorName": "electricitypredictor",
        "ScheduleExpression": "none",
        "Status": "CREATING"
    }

    Important

    Don't proceed until the status is ACTIVE.

  4. Retrieve the forecast. In this exercise, you first retrieve the forecast ID and then use it to get the forecast.

    1. Get the forecast ID generated by the DeployPredictor operation:

      aws forecast list-forecasts \
          --predictor-name electricitypredictor

      The command doesn't specify a version ID for the predictor, so the operation returns the forecast IDs for all deployed versions of the predictor. So far, you have deployed only one version of the predictor, so you get one forecast ID. This is an example response:

      {
          "ForecastInfoList": [
              {
                  "VersionId": "78974aa1",
                  "ForecastId": "1542737866_6f24c60e",
                  "PredictorName": "electricitypredictor"
              }
          ]
      }

      For more information, see ListForecasts.

    2. Get the forecast:

      aws forecastquery get-forecast \
          --predictor-name electricitypredictor \
          --forecast-id 1544043602_a26d5c01 \
          --start-date 2014-01-31T00:00:00Z \
          --end-date 2015-02-01T00:00:00Z \
          --interval hour \
          --filters '{"item_id":"client_1"}'

      The command includes the following optional parameters:

      • filters – Specifies the item_id filter to retrieve the electricity forecast for client_1.

      • start-date and end-date – Specifies an optional date range to retrieve the forecast for. If you don't specify these parameters, the operation returns the entire forecast for up to 1 year.

      • interval – Specifies the aggregation interval. Ideally, you set this value to the data collection frequency that you specified when you created the dataset. For this exercise, you specify hourly. To get an aggregated forecast every day, you can also specify a greater value, such as daily.

      This is an example response:

      {
          "Forecast": {
              "ForecastId": "1544043602_a26d5c01",
              "Predictions": {
                  "p90": [
                      { "Val": 22.457826614379883, "Date": "2015-01-01T01:00:00" },
                      { "Val": 22.60888671875, "Date": "2015-01-01T02:00:00" },
                      { "Val": 21.2325496673584, "Date": "2015-01-01T03:00:00" },
                      ...
                  ],
                  "mean": [
                      { "Val": 19.724512100219727, "Date": "2015-01-01T01:00:00" },
                      { "Val": 18.654216766357422, "Date": "2015-01-01T02:00:00" },
                      { "Val": 18.023365020751953, "Date": "2015-01-01T03:00:00" },
                      ...
                  ],
                  "p50": [
                      { "Val": 19.862913131713867, "Date": "2015-01-01T01:00:00" },
                      { "Val": 18.67522430419922, "Date": "2015-01-01T02:00:00" },
                      { "Val": 18.17038345336914, "Date": "2015-01-01T03:00:00" },
                      ...
                  ],
                  "p10": [
                      { "Val": 17.527488708496094, "Date": "2015-01-01T01:00:00" },
                      { "Val": 15.716723442077637, "Date": "2015-01-01T02:00:00" },
                      ...
                  ]
              }
          }
      }

      Because this is an hourly forecast, the response shows hourly forecast values. In the response, note the following:

      • mean – For the specific date and time, the mean is the predicted mean electricity usage value for the customer.

      • p50 – Amazon Forecast is 50% confident that the actual value at the specified date and time will be below the listed value. The same applies to p90 and p10.

      For more information about this operation, see GetForecast.
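If you have already retrieved an hourly forecast and want a daily view of the mean series, you can aggregate the returned Predictions entries yourself instead of calling the API again. A sketch over the structure shown above; summing hourly usage into daily totals is an assumption about the aggregation you want:

```python
from collections import defaultdict

def aggregate_daily(predictions):
    """Sum hourly forecast values into daily totals, keyed by date string."""
    totals = defaultdict(float)
    for point in predictions:
        day = point["Date"][:10]  # "2015-01-01T01:00:00" -> "2015-01-01"
        totals[day] += point["Val"]
    return dict(totals)

hourly_mean = [
    {"Val": 19.7, "Date": "2015-01-01T01:00:00"},
    {"Val": 18.6, "Date": "2015-01-01T02:00:00"},
    {"Val": 20.0, "Date": "2015-01-02T01:00:00"},
]
daily = aggregate_daily(hourly_mean)
```

Note that this is only meaningful for the mean series: a daily quantile is not the sum of hourly quantiles, which is one reason to request the interval you actually need from the API.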

  5. Export the forecast to your Amazon S3 bucket. The IAM role that you provide must have permission to allow the S3:PutObject action to write data to your S3 bucket.

    1. Create a forecast export job:

      aws forecast create-forecast-export-job \
          --forecast-id forecast-id \
          --output-path '{"S3Uri":"s3://bucket-name","RoleArn":"roleArn"}'

      This is an example response:

      {
          "ForecastExportJobId": "64bbc087",
          "ForecastExportArn": "arn:aws:forecast::us-west-2:acct-id:forecast-export/64bbc087"
      }
    2. Get the status of the export job:

      aws forecast describe-forecast-export-job \
          --forecast-export-job-id 64bbc087

      This is an example response:

      {
          "ForecastId": "1542400676_e2bcefb1",
          "OutputPath": {
              "S3Uri": "s3://bucket-name",
              "RoleArn": "arn:aws:iam::acct-id:role/METRICS-beta"
          },
          "ForecastExportArn": "arn:aws:forecast::us-west-2:acct-id:forecast-export/64bbc087",
          "Status": "CREATING"
      }

    When the status is ACTIVE, you can find the objects in the specified S3 bucket.