Creating a dataset from multiple .csv files

If you've uploaded multiple .csv files, with each sensor having its own .csv file, use the following schema to create a dataset from those files.

{
    "Components": [
        {
            "ComponentName": "Sensor1",
            "Columns": [
                { "Name": "Timestamp", "Type": "DATETIME" },
                { "Name": "Sensor1", "Type": "DOUBLE" }
            ]
        },
        {
            "ComponentName": "Sensor2",
            "Columns": [
                { "Name": "Timestamp", "Type": "DATETIME" },
                { "Name": "Sensor2", "Type": "DOUBLE" }
            ]
        },
        {
            "ComponentName": "Sensor3",
            "Columns": [
                { "Name": "Timestamp", "Type": "DATETIME" },
                { "Name": "Sensor3", "Type": "DOUBLE" }
            ]
        },
        {
            "ComponentName": "Sensor4",
            "Columns": [
                { "Name": "Timestamp", "Type": "DATETIME" },
                { "Name": "Sensor4", "Type": "DOUBLE" }
            ]
        }
    ]
}

In the preceding schema, Components refers to a collection of identifiers for the .csv files of your sensors. The ComponentName is the portion of the Amazon S3 object key prefix that identifies a .csv file. The following examples show how the values specified for ComponentName map to the .csv files you've stored in your Amazon S3 buckets:

  • "ComponentName: "Sensor1" accesses s3://DOC-EXAMPLE-BUCKET/AssetName/Sensor1/Sensor1.csv

  • "ComponentName: "Sensor2" accesses s3://DOC-EXAMPLE-BUCKET/AssetName/Sensor2/Sensor2.csv

  • "ComponentName: "Sensor3" accesses s3://DOC-EXAMPLE-BUCKET/AssetName/Sensor3/Sensor3.csv

  • "ComponentName: "Sensor4" accesses s3://DOC-EXAMPLE-BUCKET/AssetName/Sensor4/Sensor4.csv

You define a Columns object for each ComponentName that you define in the schema. The Name fields in the Columns object must match the columns in your .csv files.

Within each Columns object, the Name fields that reference the columns containing the timestamp data must have the Type field specified as DATETIME. The Name fields that reference your sensor data must have a Type of DOUBLE.
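Because every component follows the same pattern, a DATETIME Timestamp column plus a DOUBLE column named after the sensor, the schema can also be generated rather than written by hand. The following is a minimal sketch; build_schema is a hypothetical helper for illustration, not part of the Lookout for Equipment API.

```python
import json

def build_schema(sensor_names):
    """Build an inline dataset schema for a one-.csv-per-sensor layout.

    Hypothetical helper: each component gets a DATETIME Timestamp column
    and a DOUBLE column named after the sensor, matching the schema above.
    """
    return {
        "Components": [
            {
                "ComponentName": name,
                "Columns": [
                    {"Name": "Timestamp", "Type": "DATETIME"},
                    {"Name": name, "Type": "DOUBLE"},
                ],
            }
            for name in sensor_names
        ]
    }

schema = build_schema(["Sensor1", "Sensor2", "Sensor3", "Sensor4"])
print(json.dumps(schema, indent=2))
```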

You can use a schema to create a dataset for your .csv files in the Amazon Lookout for Equipment console, but we recommend using the API. You can use the following example code with the AWS SDK for Python (Boto3) to create a dataset.

import boto3
import json
import pprint
from botocore.config import Config

config = Config(
    region_name = 'Region' # Choose a valid AWS Region
)

lookoutequipment = boto3.client(service_name="lookoutequipment", config=config)

dataset_schema = {
    "Components": [
        {
            "ComponentName": "Sensor1",
            "Columns": [
                { "Name": "Timestamp", "Type": "DATETIME" },
                { "Name": "Sensor1", "Type": "DOUBLE" }
            ]
        },
        {
            "ComponentName": "Sensor2",
            "Columns": [
                { "Name": "Timestamp", "Type": "DATETIME" },
                { "Name": "Sensor2", "Type": "DOUBLE" }
            ]
        },
        {
            "ComponentName": "Sensor3",
            "Columns": [
                { "Name": "Timestamp", "Type": "DATETIME" },
                { "Name": "Sensor3", "Type": "DOUBLE" }
            ]
        },
        {
            "ComponentName": "Sensor4",
            "Columns": [
                { "Name": "Timestamp", "Type": "DATETIME" },
                { "Name": "Sensor4", "Type": "DOUBLE" }
            ]
        }
    ]
}

dataset_name = "dataset-name"

data_schema = {
    'InlineDataSchema': json.dumps(dataset_schema),
}

create_dataset_response = lookoutequipment.create_dataset(DatasetName=dataset_name, DatasetSchema=data_schema)

pp = pprint.PrettyPrinter(depth=4)
pp.pprint(create_dataset_response)

Next step

Ingesting a dataset