Training a model - Amazon Lookout for Equipment

Training a model

You create a model with the dataset that you've ingested. You can train your model on up to 300 sensors in your dataset. Because failures or other serious issues with equipment are rare, Amazon Lookout for Equipment uses your dataset to establish a normal mode of behavior for your asset. If you have data showing that the asset has failed or malfunctioned, you can label those failures in the dataset. Labeling the failures or the events that required equipment maintenance can improve the accuracy of the model.

Before you train a model, you must create a dataset and ingest it. For information about creating a dataset, see Creating a dataset in Amazon Lookout for Equipment. For information about ingesting a dataset, see Ingesting a dataset.

When you create a model, you can use the following workflow to help improve your model's accuracy:

  1. Choose the sensors that your model uses.

  2. Use labels for asset failures in the dataset if you have them available.

  3. Set the time range for training the model and the time range for evaluating how well it performed. For more information, see Evaluating the output and Improving your results.

  4. Choose a sampling rate from the original dataset. For a dataset that has sensors taking readings every minute, you can use readings that have been taken every hour to train your model. For more information, see Evaluating the output and Improving your results.

  5. Evaluate how the model performed.

  6. If you want to improve the performance of the model, repeat this procedure and choose different sensors, labels, sampling rates, or time ranges.
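One way to organize step 6 is to sweep a single setting at a time. The following sketch builds one CreateModel request per candidate sampling rate; the names are placeholders, and the actual lookoutequipment.create_model call is omitted here.

```python
# Candidate sampling rates to compare; these are valid TargetSamplingRate
# values for the service (S - second, M - minute, H - hour).
CANDIDATE_RATES = ["PT5M", "PT15M", "PT1H"]

# 'model-name' and 'dataset-name' are placeholders for your own resources.
requests = []
for i, rate in enumerate(CANDIDATE_RATES):
    requests.append({
        "ModelName": f"model-name-{i}",
        "DatasetName": "dataset-name",
        "DataPreProcessingConfiguration": {"TargetSamplingRate": rate},
    })

for req in requests:
    print(req["ModelName"], "->",
          req["DataPreProcessingConfiguration"]["TargetSamplingRate"])
```

Each request trains a separate model, so you can compare their evaluation results side by side before deleting the variants you don't keep.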

You might believe that some sensors give more insight into the performance of your asset than others. You can choose which sensors are most useful in training your model.
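With the SDK, you select sensors by passing a data schema that lists only the columns you want. The sketch below assumes hypothetical component and sensor names ("Component1", "Sensor1", "Sensor3"); the columns you list must be a subset of the columns in your ingested dataset.

```python
import json

# Hypothetical subset of sensors believed to be most informative.
SELECTED_SENSORS = ["Sensor1", "Sensor3"]

# Build an inline data schema containing only the selected sensors.
# Each component lists a timestamp column plus its sensor columns.
data_schema = {
    "Components": [
        {
            "ComponentName": "Component1",  # placeholder component name
            "Columns": [{"Name": "Timestamp", "Type": "DATETIME"}]
            + [{"Name": name, "Type": "DOUBLE"} for name in SELECTED_SENSORS],
        }
    ]
}

# CreateModel accepts the schema as a JSON string (InlineDataSchema).
DATA_SCHEMA_FOR_MODEL = json.dumps(data_schema)
print(DATA_SCHEMA_FOR_MODEL)
```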

Amazon Lookout for Equipment is designed to establish a baseline for normal behavior of your assets and detect when your equipment is behaving abnormally. You can improve the model's ability to detect abnormal behavior by using label data that highlights when the equipment wasn't functioning properly.
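A label file is a CSV where each row marks a period when the equipment was known to be failing or under maintenance, as a start timestamp and an end timestamp. The timestamps and format below are illustrative assumptions; check the service documentation for the exact format it expects before uploading.

```python
import csv
import io

# Hypothetical known-failure periods: (start, end) timestamps.
label_rows = [
    ("2017-04-01T00:00:00.000000", "2017-04-05T00:00:00.000000"),
    ("2017-09-10T00:00:00.000000", "2017-09-12T00:00:00.000000"),
]

# Write the rows as CSV; upload the resulting file to the S3 bucket
# and prefix that you pass as the label data location.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(label_rows)
print(buf.getvalue())
```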

Within your dataset, you can specify a time range for training your model and a time range for testing your model's performance. You can evaluate your model's performance only if you provide these time ranges.
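A common way to pick these two ranges is to hold out the later portion of your data for evaluation. The sketch below splits a hypothetical data span roughly 70/30; the dates and ratio are assumptions, not service requirements.

```python
from datetime import datetime, timedelta

# Hypothetical overall span of the ingested data.
data_start = datetime(2016, 11, 1)
data_end = datetime(2018, 8, 13)

# Reserve roughly the last 30% of the span for evaluation.
span = data_end - data_start
split_point = data_start + timedelta(seconds=span.total_seconds() * 0.7)

training_range = (data_start, split_point)
evaluation_range = (split_point, data_end)
print("train:", training_range)
print("eval: ", evaluation_range)
```

The evaluation range should start where the training range ends, so the model is always tested on data it has not been trained on.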

You might have a lot of data in your dataset. Sampling from that dataset might help you avoid overtraining your model.
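To see what downsampling does to your data volume, the sketch below reduces hypothetical one-minute readings to one reading per hour by averaging each hour's values. The service's exact aggregation method isn't specified here; this only illustrates the reduction a coarser target sampling rate such as "PT1H" produces.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical one-minute readings over three hours.
start = datetime(2020, 1, 1)
readings = [(start + timedelta(minutes=i), 100.0 + i * 0.1) for i in range(180)]

# Group readings by hour and average each group's values.
buckets = {}
for ts, value in readings:
    hour = ts.replace(minute=0, second=0, microsecond=0)
    buckets.setdefault(hour, []).append(value)

hourly = [(hour, mean(values)) for hour, values in sorted(buckets.items())]
print(len(readings), "per-minute readings ->", len(hourly), "hourly readings")
```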

The following procedures show you how to create a model.

To train a model (console)

  1. Sign in to the AWS Management Console and open the Amazon Lookout for Equipment console.

  2. Choose a dataset that you've ingested.

  3. Choose Create model.

  4. For Model name, enter a name for your model.

  5. For Component name, under Fields, choose the sensors that you want to use to train your model.

  6. To improve the accuracy of the model, you can optionally do the following:

    • For S3 location, under Historical maintenance label event (labels) data - optional, provide the Amazon S3 location of your label data. For IAM role, specify an IAM role that gives Amazon Lookout for Equipment access to your data in Amazon S3.

    • For Training and evaluation setting - optional, provide the following:

      • Training data time range - The time range for training the model on your data.

      • Evaluation data time range - The time range for testing the model's performance on your data.

    • For Time series sample rate, specify the rate at which you want to downsample the data from your dataset.

The following example code uses the AWS SDK for Python (Boto3) to train a model.

import boto3
import json
import pprint
import time
from datetime import datetime
from botocore.config import Config

config = Config(region_name='Region')

lookoutequipment = boto3.client(service_name="lookoutequipment", config=config)

DATASET_NAME = 'dataset-name'
MODEL_NAME = 'model-name'
# The IAM role that gives Amazon Lookout for Equipment access to your label data.
ROLE_ARN = 'arn:aws:iam::account-id:role/role-name'
# You can choose a sampling rate for your data. The valid values are "PT1S",
# "PT5S", "PT10S", "PT15S", "PT30S", "PT1M", "PT5M", "PT10M", "PT15M",
# "PT30M", "PT1H". S - second, M - minute, H - hour
TARGET_SAMPLING_RATE = 'sampling-rate'
# If you have label data, specify the following variables.
LABEL_DATA_SOURCE_BUCKET = 'label-data-source-bucket'
LABEL_DATA_SOURCE_PREFIX = 'label-data-source-prefix/' # This must end with "/" if you provide a prefix

# The following are example training and evaluation start times.
# datetime(2018, 8, 13, 0, 0, 0) generates 2018-08-13 00:00:00
TRAINING_DATA_START_TIME = datetime(2016, 11, 1, 0, 0, 0)
TRAINING_DATA_END_TIME = datetime(2017, 12, 31, 0, 0, 0)

EVALUATION_DATA_START_TIME = datetime(2018, 1, 1, 0, 0, 0)
EVALUATION_DATA_END_TIME = datetime(2018, 8, 13, 0, 0, 0)

########################################################
# Construct the request for create_model
########################################################

model_name = MODEL_NAME
# You can use a schema similar to the dataset schema here. The sensors used
# here must be a subset of the sensors present in the dataset.
DATA_SCHEMA_FOR_MODEL = None

create_model_request = {
    'ModelName': model_name,
    'DatasetName': DATASET_NAME,
}

if DATA_SCHEMA_FOR_MODEL is not None:
    create_model_request['DatasetSchema'] = {
        'InlineDataSchema': DATA_SCHEMA_FOR_MODEL,
    }

if TARGET_SAMPLING_RATE is not None:
    create_model_request['DataPreProcessingConfiguration'] = {
        'TargetSamplingRate': TARGET_SAMPLING_RATE
    }

if LABEL_DATA_SOURCE_BUCKET is not None:
    create_model_request['LabelsInputConfiguration'] = {
        'S3InputConfiguration': {
            'Bucket': LABEL_DATA_SOURCE_BUCKET,
            'Prefix': LABEL_DATA_SOURCE_PREFIX
        }
    }
    # The role ARN is required so that the service can access the label data.
    create_model_request['RoleArn'] = ROLE_ARN

if TRAINING_DATA_START_TIME is not None or TRAINING_DATA_END_TIME is not None:
    create_model_request['TrainingDataStartTime'] = TRAINING_DATA_START_TIME
    create_model_request['TrainingDataEndTime'] = TRAINING_DATA_END_TIME

if EVALUATION_DATA_START_TIME is not None or EVALUATION_DATA_END_TIME is not None:
    create_model_request['EvaluationDataStartTime'] = EVALUATION_DATA_START_TIME
    create_model_request['EvaluationDataEndTime'] = EVALUATION_DATA_END_TIME

########################################################
# Create the model
########################################################
create_model_response = lookoutequipment.create_model(**create_model_request)

########################################################
# Wait until training completes
########################################################
model_status = create_model_response['Status']
print("=====Polling Model Status=====\n")
print("Model Status: " + model_status)
while model_status == 'IN_PROGRESS':
    time.sleep(30)
    describe_model_response = lookoutequipment.describe_model(ModelName=model_name)
    model_status = describe_model_response['Status']
    print("Model Status: " + model_status)
print("\n=====End of Polling Model Status=====")