
Ingesting a dataset

Amazon Lookout for Equipment requires you to create a dataset from your .csv files containing your sensor data and the schema that you've provided for those files. For more information about using a schema to create a dataset, see Creating a dataset in Amazon Lookout for Equipment.

To convert the dataset into a format that is suitable for analysis, you must ingest it. Ingesting a dataset imports it into Amazon Lookout for Equipment and lets you train a machine learning model on it. To ingest your data, you start the data ingestion step and specify the Amazon Simple Storage Service (Amazon S3) location that contains your sensor data.

You can use one of the following procedures to ingest a dataset.

When you use the console to ingest a dataset, you choose the Amazon S3 prefix that contains all of the sensor data for the asset.

To ingest a dataset (console)

  1. Sign in to the AWS Management Console and open the Amazon Lookout for Equipment console.

  2. Choose a dataset that you've created.

  3. Choose Ingest data.

  4. For S3 location, provide the Amazon S3 prefix for all of the sensor data for the asset. To confirm that the prefix matches your files before you ingest, see the example that follows this procedure.

    1. If you created the dataset from multiple .csv files, with each file containing data from one sensor, you would use the prefix: s3://DOC-EXAMPLE-BUCKET1/AssetName/.

    2. If you created the dataset from a single .csv file for your asset, you would use the prefix: s3://DOC-EXAMPLE-BUCKET1/FacilityName/.

  5. For IAM role, choose a role that provides permissions to access the .csv files you've stored in Amazon S3. If you don't have a role that provides permissions, choose Create a role.

  6. Choose Ingest.
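Before you ingest, you might want to confirm that the prefix you plan to provide for S3 location actually matches your .csv files. The following is a minimal sketch, not part of the console procedure, that uses the AWS SDK for Python (Boto3) to list the objects under a prefix. The bucket and prefix values are placeholders from this topic; replace them with your own.

import boto3

s3 = boto3.client('s3')

# Placeholder values from this topic; replace them with your bucket and prefix.
bucket = 'DOC-EXAMPLE-BUCKET1'
prefix = 'AssetName/'

# List the objects under the prefix that you plan to ingest. list_objects_v2
# returns up to 1,000 keys per call, which is enough for a quick check.
response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
for obj in response.get('Contents', []):
    print(obj['Key'])

If the output doesn't include the .csv files that you expect, adjust the prefix before you choose Ingest.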

Use the following AWS SDK for Python (Boto3) example code to ingest your dataset. To run this code successfully, you must have installed the modules and defined the ROLE_ARN and DATASET_NAME values from the code examples that showed you how to create a dataset.

import boto3
import time
from botocore.config import Config

config = Config(
    region_name='Region'  # Choose a valid AWS Region.
)

lookoutequipment = boto3.client(service_name="lookoutequipment", config=config)

INGESTION_DATA_SOURCE_BUCKET = 'DOC-EXAMPLE-BUCKET1'

# If you're ingesting multiple .csv files of your sensor data, use the
# following Amazon S3 path: s3://DOC-EXAMPLE-BUCKET1/AssetName/. If you're
# ingesting a single .csv file of your asset data, use the following
# Amazon S3 path: s3://DOC-EXAMPLE-BUCKET1/FacilityName/.
INGESTION_DATA_SOURCE_PREFIX = 'my_data/sensor_readings/'

# The ROLE_ARN and DATASET_NAME values used in this script were defined in
# the previous SDK for Python example code for creating a dataset.
data_ingestion_role_arn = ROLE_ARN
dataset_name = DATASET_NAME

ingestion_input_config = {
    'S3InputConfiguration': {
        'Bucket': INGESTION_DATA_SOURCE_BUCKET,
        'Prefix': INGESTION_DATA_SOURCE_PREFIX
    }
}

# Start the data ingestion job.
start_data_ingestion_job_response = lookoutequipment.start_data_ingestion_job(
    DatasetName=dataset_name,
    RoleArn=data_ingestion_role_arn,
    IngestionInputConfiguration=ingestion_input_config)

data_ingestion_job_id = start_data_ingestion_job_response['JobId']
data_ingestion_status = start_data_ingestion_job_response['Status']

print(f'=====Data Ingestion job is started. Job ID: {data_ingestion_job_id}=====\n')

# Poll until the job leaves the IN_PROGRESS state.
print("=====Polling Data Ingestion Status=====\n")
print("Data Ingestion Status: " + data_ingestion_status)
while data_ingestion_status == 'IN_PROGRESS':
    time.sleep(30)
    describe_data_ingestion_job_response = lookoutequipment.describe_data_ingestion_job(JobId=data_ingestion_job_id)
    data_ingestion_status = describe_data_ingestion_job_response['Status']
    print("Data Ingestion Status: " + data_ingestion_status)

print("\n=====End of Polling Data Ingestion Status=====")
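The polling loop above ends when the job leaves the IN_PROGRESS state, which can mean success or failure. The following is a minimal sketch for checking the outcome. It assumes that the loop has finished and that data_ingestion_job_id, dataset_name, and the lookoutequipment client are still in scope; it reads the FailedReason field from the DescribeDataIngestionJob response (using .get() in case the field is absent) and confirms the dataset status with DescribeDataset.

# Check the final outcome of the ingestion job.
final_response = lookoutequipment.describe_data_ingestion_job(JobId=data_ingestion_job_id)

if final_response['Status'] == 'FAILED':
    # FailedReason describes why the job failed; .get() guards against
    # the field being absent from the response.
    print('Ingestion failed: ' + str(final_response.get('FailedReason')))
else:
    # Confirm that the dataset itself reports an ingested state.
    dataset_response = lookoutequipment.describe_dataset(DatasetName=dataset_name)
    print('Dataset status: ' + dataset_response['Status'])

After the ingestion succeeds, you can train a machine learning model on the dataset.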