Formatting your data - Amazon Lookout for Equipment

Formatting your data

To monitor your equipment, you must provide Amazon Lookout for Equipment with time-series data from the sensors on your equipment. The data that you provide to Lookout for Equipment is a series of numerical measurements from the sensors. You provide this data from either a data historian or Amazon Simple Storage Service (Amazon S3). A data historian is a software program that records and retrieves sensor data from your equipment.

To provide Amazon Lookout for Equipment with time-series data from the sensors, you must use properly formatted .csv files to create a dataset. Creating a dataset aggregates the data in a format that is suitable for analysis. You create a dataset for a single piece of equipment, or asset. You train an ML model on the dataset that you create. You then use that model to monitor your asset. You don't have to use all the data from the sensors to train a model; you can train a model using data from only some of the sensors in the dataset.

You can store the data for your asset in one of the following ways:

  • Recommended: Using one .csv file for each sensor

  • Storing all of the sensor data in one .csv file

Each .csv file must have at least two columns. The first column of the file is a timestamp that indicates the date and time. You must have at least one additional column containing the data from a sensor. Each subsequent column can have data from a different sensor.

Your sensor data must use the double (numeric) data type. You can train your model only on numeric data.
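The structural rules above (a timestamp column first, at least one sensor column, numeric sensor values) can be checked before you create a dataset. The following is a minimal sketch that uses only the Python standard library. The function name `check_sensor_csv` is hypothetical and is not part of Lookout for Equipment. The sketch also assumes that an empty cell counts as a problem, which this page does not state.

```python
import csv
import io


def check_sensor_csv(text):
    """Return a list of problems found in CSV text; an empty list means none."""
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    # The file needs a timestamp column plus at least one sensor column.
    if len(header) < 2:
        problems.append("need a timestamp column and at least one sensor column")
    for line_no, row in enumerate(data, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
            continue
        # Every column after the first must hold a numeric value.
        # Assumption: an empty cell is reported as non-numeric.
        for name, value in zip(header[1:], row[1:]):
            try:
                float(value)
            except ValueError:
                problems.append(
                    f"line {line_no}: {name!r} value {value!r} is not numeric"
                )
    return problems


sample = "Timestamp,Sensor 1\n2020-01-01 00:00:00,2\n2020-01-01 00:05:00,high\n"
for problem in check_sensor_csv(sample):
    print(problem)
```

The check does not validate the timestamp column itself; it only looks at the column count and the sensor values.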

When you are preparing your data, you should keep the following information in mind:

  • The data across all of your .csv files must span at least 180 days. For example, you can have two .csv files, with one file having data from January to April, and the other having data from May to August.

  • You can create a dataset with up to 3,000 sensors, but you can train a model on no more than 300 of them.

  • The maximum length of a sensor name is 200 characters.

  • The size of a .csv file can't exceed 5 GB. If you want to create a dataset greater than 5 GB, you must use multiple .csv files.

  • The files that you use to create a dataset can't exceed 50 GB in total.

  • You can use up to 1,000 files to create a dataset.

  • You can use the following delimiters for the data in the timestamp column:

    • '-'

    • '_'

    • ' '


      Quotation marks are used around the delimiters to make them easier to read.

  • The timestamp column can use the following formats:

    • yyyy-MM-dd-HH-mm-ss

    • yyyy-MM-dd'T'HH:mm:ss

    • yyyy-MM-dd HH:mm:ss

    • yyyy-MM-dd-HH:mm:ss

    • yyyy/MM/dd'T'HH:mm:ss

    • yyyy/MM/dd HH:mm:ss

    • yyyy MM dd'T'HH:mm:ss

    • yyyy MM dd HH:mm:ss

    • yyyyMMdd'T'HH:mm:ss

    • yyyyMMdd HH:mm:ss

    • yyyyMMddHHmmss

    • yyyy-MM-dd'T'HH:mm

    • yyyy-MM-dd HH:mm

    • yyyy-MM-dd-HH:mm

    • yyyy/MM/dd'T'HH:mm

    • yyyy/MM/dd HH:mm

    • yyyy MM dd'T'HH:mm

    • yyyy MM dd HH:mm

    • yyyyMMdd'T'HH:mm

    • yyyyMMdd HH:mm

    • yyyyMMddHHmm

  • The valid characters that you can use in the column names of the dataset are 0 to 9, a to z, A to Z, period (.), underscore (_), and hyphen (-).
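The timestamp formats and column-name rules in the preceding list can be checked locally. The following is a minimal sketch under several assumptions: the helper names (`to_strptime`, `parse_timestamp`, `is_valid_column_name`) are hypothetical, the listed patterns are read as zero-padded fields, and the 200-character sensor-name limit is applied to column names. None of this reproduces how Lookout for Equipment itself parses data.

```python
import re
from datetime import datetime

# Patterns copied from the list above.
DOC_PATTERNS = [
    "yyyy-MM-dd-HH-mm-ss", "yyyy-MM-dd'T'HH:mm:ss", "yyyy-MM-dd HH:mm:ss",
    "yyyy-MM-dd-HH:mm:ss", "yyyy/MM/dd'T'HH:mm:ss", "yyyy/MM/dd HH:mm:ss",
    "yyyy MM dd'T'HH:mm:ss", "yyyy MM dd HH:mm:ss", "yyyyMMdd'T'HH:mm:ss",
    "yyyyMMdd HH:mm:ss", "yyyyMMddHHmmss", "yyyy-MM-dd'T'HH:mm",
    "yyyy-MM-dd HH:mm", "yyyy-MM-dd-HH:mm", "yyyy/MM/dd'T'HH:mm",
    "yyyy/MM/dd HH:mm", "yyyy MM dd'T'HH:mm", "yyyy MM dd HH:mm",
    "yyyyMMdd'T'HH:mm", "yyyyMMdd HH:mm", "yyyyMMddHHmm",
]


def to_strptime(pattern):
    """Translate one of the documented patterns into a strptime format string."""
    for doc_token, py_token in (
        ("yyyy", "%Y"), ("MM", "%m"), ("dd", "%d"),
        ("HH", "%H"), ("mm", "%M"), ("ss", "%S"), ("'T'", "T"),
    ):
        pattern = pattern.replace(doc_token, py_token)
    return pattern


def parse_timestamp(text):
    """Return a datetime if text matches a documented pattern, else None."""
    for pattern in DOC_PATTERNS:
        fmt = to_strptime(pattern)
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        # strptime also accepts unpadded fields such as "1" for January.
        # The round trip keeps only zero-padded text (an assumption).
        if parsed.strftime(fmt) == text:
            return parsed
    return None


VALID_NAME = re.compile(r"[0-9A-Za-z._-]+")


def is_valid_column_name(name, max_length=200):
    """Check the allowed characters and, by assumption, the 200-character limit."""
    return len(name) <= max_length and VALID_NAME.fullmatch(name) is not None


print(parse_timestamp("2020-01-01 00:05:00"))
print(is_valid_column_name("Sensor_3"))
```

In this sketch, a value such as `1/1/2020 0:00` returns `None` because it matches none of the listed patterns, and a name that contains a space, such as `Sensor 3`, fails the column-name check.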

You can use label data to highlight any part of your dataset where your asset functioned abnormally. For more information, see Label data.

The following examples show you the different ways that you can format a .csv file.

If you are storing the data from each sensor in its own .csv file, use the following table to see how to format the data.


Timestamp        Sensor 3
1/1/2020 0:00    34
1/1/2020 0:05    33
1/1/2020 0:10    35
1/1/2020 0:15    33
1/1/2020 0:20    34

The following example shows the information from the preceding table as a .csv file.

Timestamp,Sensor 3
1/1/2020 0:00,34
1/1/2020 0:05,33
1/1/2020 0:10,35
1/1/2020 0:15,33
1/1/2020 0:20,34
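If your readings are held in memory rather than in a file, a single-sensor .csv file like this one can be generated with the Python standard library. The following is a minimal sketch. The function name `sensor_csv_text` and the column name `Sensor_3` are illustrative, and the timestamps are written as `yyyy-MM-dd HH:mm:ss`, one of the formats listed earlier on this page.

```python
import csv
import io
from datetime import datetime, timedelta


def sensor_csv_text(sensor_name, readings):
    """Build CSV text with a Timestamp column and one sensor column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Timestamp", sensor_name])
    for moment, value in readings:
        # Write timestamps as yyyy-MM-dd HH:mm:ss.
        writer.writerow([moment.strftime("%Y-%m-%d %H:%M:%S"), value])
    return buffer.getvalue()


# The same five readings as the table, taken five minutes apart.
start = datetime(2020, 1, 1)
values = [34, 33, 35, 33, 34]
readings = [(start + timedelta(minutes=5 * i), v) for i, v in enumerate(values)]
text = sensor_csv_text("Sensor_3", readings)
print(text)
```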

We recommend using "Timestamp" as the name for the column that contains the timestamps. For the column with data from the sensor, we recommend using a name that distinguishes it from other sensors.

To store the data for your asset in one .csv file, you arrange the data in the following format.


Timestamp        Sensor 1    Sensor 2
1/1/2020 0:00    2           12
1/1/2020 0:05    3           11
1/1/2020 0:10    5           10
1/1/2020 0:15    3           9
1/1/2020 0:20    4           12

The following example shows the information from the preceding table as a .csv file.

Timestamp,Sensor 1,Sensor 2
1/1/2020 0:00,2,12
1/1/2020 0:05,3,11
1/1/2020 0:10,5,10
1/1/2020 0:15,3,9
1/1/2020 0:20,4,12
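If you already have a combined file like this one and want to move to the recommended layout of one .csv file for each sensor, you can split it by column. The following is a minimal sketch that uses the Python standard library. The function name `split_by_sensor` is hypothetical, and the sketch returns the text of each file instead of writing to disk.

```python
import csv
import io


def split_by_sensor(combined_text):
    """Map each sensor column name to CSV text with the timestamp and that column."""
    rows = list(csv.reader(io.StringIO(combined_text)))
    header, data = rows[0], rows[1:]
    per_sensor = {}
    for index, sensor_name in enumerate(header[1:], start=1):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        # Keep the original timestamp column name and add one sensor column.
        writer.writerow([header[0], sensor_name])
        for row in data:
            writer.writerow([row[0], row[index]])
        per_sensor[sensor_name] = buffer.getvalue()
    return per_sensor


combined = (
    "Timestamp,Sensor 1,Sensor 2\n"
    "1/1/2020 0:00,2,12\n"
    "1/1/2020 0:05,3,11\n"
)
files = split_by_sensor(combined)
for name, content in files.items():
    print(name)
    print(content)
```

The timestamps are copied unchanged, so each output file covers the same time range as the combined file.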

You can choose your column names. We recommend using "Timestamp" as the name for the column that contains the timestamps. For the names of the columns with data from your sensors, we recommend using names that distinguish one sensor from another.

If you have them available, we recommend using labels for abnormal equipment behavior in your data. These labels could be applied to periods when the equipment did not function properly. You store the label data as a .csv file that consists of two columns. The file has no header. The first column has the start time of the abnormal behavior. The second column has the end time.

The following example shows how your label data should appear as a .csv file.

2020-02-01T20:00:00.000000,2020-02-03T00:00:00.000000
2020-07-01T20:00:00.000000,2020-07-03T00:01:00.000000
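A label file in this layout can be produced from a list of start and end times. The following is a minimal sketch that uses the Python standard library. The function name `label_csv_text` is hypothetical. The timestamp layout is copied from the example above, and the check that each end time falls after its start time is an added assumption, not a documented requirement.

```python
import csv
import io
from datetime import datetime

# Layout taken from the example rows above, including six fractional digits.
LABEL_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def label_csv_text(periods):
    """Build header-less CSV text with one start,end row per abnormal period."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for start, end in periods:
        # Assumption: a period must end after it starts.
        if end <= start:
            raise ValueError(f"period ends before it starts: {start} to {end}")
        writer.writerow([start.strftime(LABEL_FORMAT), end.strftime(LABEL_FORMAT)])
    return buffer.getvalue()


periods = [
    (datetime(2020, 2, 1, 20, 0), datetime(2020, 2, 3, 0, 0)),
    (datetime(2020, 7, 1, 20, 0), datetime(2020, 7, 3, 0, 1)),
]
label_text = label_csv_text(periods)
print(label_text)
```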

Next step

Uploading your data into Amazon S3