AWS IoT Analytics
AWS IoT Analytics User Guide

What Is AWS IoT Analytics?

AWS IoT Analytics allows you to collect large amounts of device data, process messages, and store them. You can then query the data and run sophisticated analytics on it. AWS IoT Analytics enables advanced data exploration through integration with Jupyter Notebooks and data visualization through integration with Amazon QuickSight.

Traditional analytics and business intelligence tools are designed to process structured data. IoT data often comes from devices that record noisy processes (such as temperature, motion, or sound). As a result, the data from these devices can have significant gaps, corrupted messages, and false readings that must be cleaned up before analysis can occur. Also, IoT data is often only meaningful in the context of other data from external sources.

AWS IoT Analytics automates the steps required to analyze data from IoT devices. AWS IoT Analytics filters, transforms, and enriches IoT data before storing it in a time-series data store for analysis. You can set up the service to collect only the data you need from your devices, apply mathematical transforms to process the data, and enrich the data with device-specific metadata such as device type and location before storing it. Then, you can analyze your data by running queries using the built-in SQL query engine, or perform more complex analytics and machine learning inference. AWS IoT Analytics includes pre-built models for common IoT use cases so you can answer questions like which devices are about to fail or which customers are at risk of abandoning their wearable devices.

Why Use AWS IoT Analytics?

Benefits

Run Queries on IoT Data

With AWS IoT Analytics, you can run simple, ad-hoc queries using the built-in AWS IoT Analytics SQL query engine. The service allows you to use standard SQL queries to extract data from the data store to answer questions like the average distance traveled for a fleet of connected vehicles or how many doors are locked after 7 P.M. in a smart building. These queries can be re-used even if connected devices, fleet size, and analytic requirements change.

Run Time-Series Analytics

AWS IoT Analytics also supports time-series analyses so you can analyze the performance of devices over time and understand how and where they are being used, continuously monitor device data to predict maintenance issues, and monitor sensors to predict and react to environmental conditions.

Data Storage Optimized for IoT

AWS IoT Analytics stores the processed device data in a time-series data store that is optimized to deliver fast response times on IoT queries that typically include time as a criterion. The raw data is also automatically stored for later processing or reprocessing for another use case.

Prepares Your IoT Data for Analysis

AWS IoT Analytics includes data preparation techniques that allow you to prepare and process your data for analysis. AWS IoT Analytics is integrated with AWS IoT Core so it is easy to ingest device data directly from connected devices. It can clean false readings, fill gaps in the data, and perform mathematical transformations of message data. As the data is ingested, AWS IoT Analytics can process it using conditional statements, filter data to collect just the data you want to analyze, and enrich it with information from the AWS IoT Registry. You can also use AWS Lambda functions to enrich your device data from external sources like the Weather Service, HERE Maps, Salesforce, or Amazon DynamoDB.

Tools for Machine Learning

AWS IoT Analytics allows you to apply machine learning to your IoT data with hosted Jupyter Notebooks. You can directly connect your IoT data to the notebook and build, train, and execute models right from the AWS IoT Analytics console without having to manage any of the underlying infrastructure. Using AWS IoT Analytics, you can apply machine learning algorithms to your device data to produce a health score for each device in your fleet. After you author a notebook, you can containerize it and execute it on a schedule you specify (Automating Your Workflow).

Use Cases

Smart Agriculture

AWS IoT Analytics can enrich IoT device data with contextual metadata using AWS IoT Registry data or public data sources so that your analysis factors in time, location, temperature, altitude, and other environmental conditions. With that analysis, you can write models that output recommended actions for your devices to take in the field. For example, to determine when to water, irrigation systems might enrich humidity sensor data with data on rainfall, allowing more efficient water usage.

Predictive Maintenance

AWS IoT Analytics provides templates to build predictive maintenance models and apply them to your devices. For example, you can use AWS IoT Analytics to predict when heating and cooling systems will fail on connected cargo vehicles so the vehicles can be rerouted to prevent shipment damage. Or, an auto manufacturer can detect which of its customers have worn-out brake pads and alert them to seek maintenance for their vehicles.

Proactive Replenishing of Supplies

AWS IoT Analytics lets you build IoT applications that can monitor inventories in real time. For example, a food and drink company can analyze data from food vending machines and proactively reorder merchandise whenever the supply is running low.

Process Efficiency Scoring

With AWS IoT Analytics, you can build applications that constantly monitor the efficiency of different processes and take action to improve the process. For example, a mining company can increase the efficiency of its ore trucks by maximizing the load for each trip. With AWS IoT Analytics, the company can identify the most efficient load for a location or truck over time, compare deviations from the target load in real time, and better plan loading guidelines to improve efficiency.

Key Features

Collect
  • Integrated with AWS IoT Core – AWS IoT Analytics is fully integrated with AWS IoT Core so it can receive messages from connected devices as they stream in.

  • Use a batch API to add data from any source – AWS IoT Analytics can receive data from any source through HTTP, so any device or service that is connected to the internet can send data to AWS IoT Analytics (BatchPutMessage; see the sketch after this list).

  • Collect only the data you want to store and analyze – You can use the AWS IoT Analytics console to configure AWS IoT Analytics to receive messages from devices through MQTT topic filters in various formats and frequencies. AWS IoT Analytics validates that the data is within specific parameters you define and creates channels. Then, the service routes the channels to appropriate pipelines for message processing, transformation, and enrichment.
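
The following is a minimal sketch of the batch ingestion path using the AWS SDK for Python (Boto3); the channel name and the payload values are placeholders, and the channel is assumed to already exist:

    import json
    import boto3

    iota = boto3.client('iotanalytics')

    # Send two sample messages to an existing channel; each payload is a JSON blob.
    response = iota.batch_put_message(
        channelName='mychannel',  # placeholder channel name
        messages=[
            {'messageId': '1', 'payload': json.dumps({'temp_01': 29}).encode()},
            {'messageId': '2', 'payload': json.dumps({'temp_01': 31}).encode()},
        ],
    )

    # Messages the service rejected are listed here; an empty list means all were accepted.
    print(response['batchPutMessageErrorEntries'])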

Process
  • Cleanse and filter – AWS IoT Analytics lets you define AWS Lambda functions that are triggered when AWS IoT Analytics detects missing data, so you can run code to estimate and fill gaps. You can also define max/min filters and percentile thresholds to remove outliers in your data.

  • Transform – AWS IoT Analytics can transform messages using mathematical or conditional logic you define, so you can perform common calculations such as converting Celsius to Fahrenheit (see the sketch after this list).

  • Enrich – AWS IoT Analytics can enrich data with external data sources such as a weather forecast, and then route the data to the AWS IoT Analytics data store.
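
As a sketch of the transform step, you can test an activity against sample payloads with the RunPipelineActivity API before adding it to a pipeline. The example below uses Boto3 and assumes a hypothetical temp_c attribute in the incoming messages:

    import json
    import boto3

    iota = boto3.client('iotanalytics')

    # Try out a math activity that adds a Fahrenheit attribute computed from Celsius.
    response = iota.run_pipeline_activity(
        pipelineActivity={
            'math': {
                'name': 'toFahrenheit',
                'attribute': 'temp_f',          # attribute that receives the result
                'math': 'temp_c * 9 / 5 + 32',  # expression over existing attributes
            }
        },
        payloads=[json.dumps({'device': 'sensor_1', 'temp_c': 20}).encode()],
    )

    for payload in response['payloads']:
        print(json.loads(payload))  # expected to include "temp_f": 68.0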

Store
  • Time-Series Data Store - AWS IoT Analytics stores the device data in an IoT-optimized time-series data store for analysis. You can manage access permissions, implement data retention policies, and export your data to external access points.

  • Store Processed and Raw Data - AWS IoT Analytics stores the processed data. It also automatically stores the raw ingested data so you can process it at a later time.

Analyze
  • Run Ad-Hoc SQL Queries - AWS IoT Analytics provides a SQL query engine so you can run ad-hoc queries and get results quickly. For example, you might want to run a quick query to find the number of active users for each device in your fleet.

  • Time-Series Analysis - AWS IoT Analytics supports time-series analysis so you can analyze the performance of devices over time and understand how and where they are being used, continuously monitor device data to predict maintenance issues, and monitor sensors to predict and react to environmental conditions.

  • Hosted Notebooks for Sophisticated Analytics and Machine Learning - AWS IoT Analytics includes support for hosted notebooks in Jupyter Notebooks for statistical analysis and machine learning. The service includes a set of notebook templates that contain AWS-authored machine learning models and visualizations to help you get started with IoT use cases related to device failure profiling, forecasting events such as low usage that might signal the customer will abandon the product, or segmenting devices by customer usage levels (for example heavy users, weekend users) or device health. After you author a notebook, you can containerize it and execute it on a schedule you specify (Automating Your Workflow).

  • Prediction - You can do statistical classification through a method called logistic regression. You can also use long short-term memory (LSTM), a powerful neural network technique for predicting the output or state of a process that varies over time. The pre-built notebook templates also support the K-means clustering algorithm for device segmentation, which clusters your devices into cohorts of like devices. These templates are typically used to profile the health and state of devices such as HVAC units in a chocolate factory or the wear and tear of blades on a wind turbine. Again, these notebook templates can be containerized and executed on a schedule.

Visualize
  • QuickSight Integration - AWS IoT Analytics provides a connector to Amazon QuickSight so you can visualize your data sets in a QuickSight dashboard. You can also visualize the results of your ad-hoc analysis in the embedded Jupyter Notebooks in the AWS IoT Analytics console.

How to Use AWS IoT Analytics

AWS IoT Analytics Components and Concepts

Channel

A channel collects data from an MQTT topic and archives the raw, unprocessed messages before publishing the data to a pipeline. You can also send messages to a channel directly with the BatchPutMessage command.
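
A minimal Boto3 sketch of creating a channel; the channel name and retention period are placeholder values:

    import boto3

    iota = boto3.client('iotanalytics')

    # Create a channel that keeps raw, unprocessed messages for 30 days.
    iota.create_channel(
        channelName='mychannel',
        retentionPeriod={'numberOfDays': 30},
    )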

Pipeline

A pipeline consumes messages from one or more channels and allows you to process the messages before storing them in a data store. The processing steps, called activities (Pipeline Activities), perform transformations on your messages, such as removing, renaming, or adding message attributes; filtering messages based on attribute values; invoking your Lambda functions on messages for advanced processing; or performing mathematical transformations to normalize device data.
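
The following Boto3 sketch creates a simple pipeline that reads from a channel, drops implausible readings with a filter activity, and writes the rest to a data store; the channel, data store, and attribute names are placeholders, and the channel and data store are assumed to already exist:

    import boto3

    iota = boto3.client('iotanalytics')

    iota.create_pipeline(
        pipelineName='mypipeline',
        pipelineActivities=[
            # A pipeline begins with a channel activity that names its source channel...
            {'channel': {'name': 'readFromChannel',
                         'channelName': 'mychannel',
                         'next': 'dropOutliers'}},
            # ...can include processing activities such as a filter...
            {'filter': {'name': 'dropOutliers',
                        'filter': 'temp_01 > -40 AND temp_01 < 125',
                        'next': 'writeToDatastore'}},
            # ...and ends with a data store activity that names its destination.
            {'datastore': {'name': 'writeToDatastore',
                           'datastoreName': 'mydatastore'}},
        ],
    )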

Data store

Pipelines store their processed messages in a data store. A data store is not a database, but it is a scalable and queryable repository of your messages. You can have multiple data stores for messages coming from different devices or locations, or filtered by message attributes depending on your pipeline configuration and requirements.
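
A minimal Boto3 sketch of creating a data store; the name and retention period are placeholder values:

    import boto3

    iota = boto3.client('iotanalytics')

    # Create a data store with service-managed storage and a 90-day retention period.
    iota.create_datastore(
        datastoreName='mydatastore',
        retentionPeriod={'numberOfDays': 90},
    )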

Data set

You retrieve data from a data store by creating a data set. AWS IoT Analytics allows you to create a SQL data set or a container data set.

After you have a data set, you can explore and gain insights into your data through integration with Amazon QuickSight. Or you can perform more advanced analytical functions through integration with Jupyter Notebooks. Jupyter Notebooks provide powerful data science tools that can perform machine learning and a range of statistical analyses. For more information, see Notebook Templates.

SQL data set

A SQL data set is similar to a materialized view from a SQL database. In fact, you create a SQL data set by applying a SQL action. SQL data sets can be generated automatically on a recurring schedule by specifying a trigger.
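
A Boto3 sketch of a scheduled SQL data set; the data set name, the query (including the device and temp_01 attributes), and the cron expression are placeholders:

    import boto3

    iota = boto3.client('iotanalytics')

    # Create a SQL data set that materializes a query result every day at noon (UTC).
    iota.create_dataset(
        datasetName='daily_avg_temperature',
        actions=[{
            'actionName': 'dailyQuery',
            'queryAction': {
                'sqlQuery': 'SELECT device, AVG(temp_01) AS avg_temp '
                            'FROM mydatastore GROUP BY device',
            },
        }],
        triggers=[{'schedule': {'expression': 'cron(0 12 * * ? *)'}}],
    )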

Container data set

A container data set allows you to automatically run your analysis tools and generate results (Automating Your Workflow). It brings together a SQL data set as input, a Docker container with your analysis tools and needed library files, input and output variables, and an optional schedule trigger. The input and output variables tell the executable image where to get the data and store the results. The trigger can run your analysis when a SQL data set finishes creating its content or according to a time schedule expression. A container data set automatically runs your analysis tools, generates results, and then saves them.
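
A Boto3 sketch of a container data set; the container image URI, execution role ARN, variable names, and the triggering SQL data set are placeholders:

    import boto3

    iota = boto3.client('iotanalytics')

    iota.create_dataset(
        datasetName='container_analysis',
        actions=[{
            'actionName': 'runAnalysisContainer',
            'containerAction': {
                'image': '123456789012.dkr.ecr.us-east-1.amazonaws.com/my-analysis:latest',
                'executionRoleArn': 'arn:aws:iam::123456789012:role/my-iot-analytics-role',
                'resourceConfiguration': {'computeType': 'ACU_1', 'volumeSizeInGB': 10},
                'variables': [
                    # Tell the container which SQL data set content to read as input...
                    {'name': 'inputDataset',
                     'datasetContentVersionValue': {'datasetName': 'daily_avg_temperature'}},
                    # ...and the file to which it should write its results.
                    {'name': 'outputFile', 'outputFileUriValue': {'fileName': 'results.csv'}},
                ],
            },
        }],
        # Run whenever the input SQL data set finishes creating its content.
        triggers=[{'dataset': {'name': 'daily_avg_temperature'}}],
    )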

Trigger

You can automatically create a data set by specifying a trigger. The trigger can be a time interval (for example, create this data set every two hours) or when another data set's contents have been created (for example, create this data set when "myOtherDataset" finishes creating its content). Or, you can generate data set content manually by calling CreateDatasetContent.
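
For example, to generate data set content on demand instead of from a trigger, you can call CreateDatasetContent directly (a Boto3 sketch with a placeholder data set name):

    import boto3

    iota = boto3.client('iotanalytics')

    # Start a new generation of the data set's content; the new version ID is returned.
    response = iota.create_dataset_content(datasetName='daily_avg_temperature')
    print(response['versionId'])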

Docker container

According to www.docker.com, "a container is a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings." You can create your own Docker container to package your analysis tools or use options provided by Amazon SageMaker. You can store a container in an Amazon ECR registry that you specify so it will be available to install on your desired platform. Containerizing A Notebook describes how to containerize a notebook. Docker containers are capable of running your custom analytical code prepared with MATLAB, Octave, Wise.io, SPSS, R, Fortran, Python, Scala, Java, C++ and so on.

Delta windows

Delta windows are a series of user-defined, non-overlapping and contiguous time intervals. Delta windows allow you to create data set contents with, and perform analysis on, new data that has arrived in the data store since the last analysis. You create a delta window by setting the deltaTime in the filters portion of a queryAction of a data set (CreateDataset). Basically, this allows you to filter messages that have arrived during a specific time window, so the data contained in messages from previous time windows doesn't get counted twice.
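
A Boto3 sketch of a delta window; the query, the hourly schedule, and the time expression are placeholders, and the messages are assumed to carry an epoch-seconds attribute named ts:

    import boto3

    iota = boto3.client('iotanalytics')

    iota.create_dataset(
        datasetName='recent_readings',
        actions=[{
            'actionName': 'deltaQuery',
            'queryAction': {
                'sqlQuery': 'SELECT * FROM mydatastore',
                'filters': [{
                    'deltaTime': {
                        # Shift the window back 60 seconds to allow for late-arriving data.
                        'offsetSeconds': -60,
                        # Expression that derives each message's timestamp (assumed attribute).
                        'timeExpression': 'from_unixtime(ts)',
                    },
                }],
            },
        }],
        # Each hourly run processes only the data that arrived since the previous run.
        triggers=[{'schedule': {'expression': 'cron(0 * * * ? *)'}}],
    )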

Accessing AWS IoT Analytics

As part of AWS IoT, AWS IoT Analytics provides the following interfaces to interact with your devices and the data they generate:

AWS Command Line Interface (AWS CLI)

Run commands for AWS IoT Analytics on Windows, OS X, and Linux. These commands allow you to create and manage channels, pipelines, data stores, and data sets. To get started, see the AWS Command Line Interface User Guide. For more information about the commands for AWS IoT, see iot in the AWS Command Line Interface Reference.

Important

Use the aws iotanalytics command to interact with AWS IoT Analytics using the CLI. Use the aws iot command to interact with other parts of the IoT system using the CLI.

AWS IoT API

Build your IoT applications using HTTP or HTTPS requests. These API actions allow you to create and manage things, certificates, rules, and policies. For more information about the API actions for AWS IoT, see Actions in the AWS IoT API Reference.

AWS SDKs

Build your AWS IoT Analytics applications using language-specific APIs. These SDKs wrap the HTTP/HTTPS API and allow you to program in any of the supported languages. For more information, see AWS SDKs and Tools.
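
For example, with the AWS SDK for Python (Boto3) you can create an AWS IoT Analytics client and list the channels in your account (the region is a placeholder):

    import boto3

    iota = boto3.client('iotanalytics', region_name='us-east-1')

    # List the channels in the account and print each channel's name and status.
    for channel in iota.list_channels()['channelSummaries']:
        print(channel['channelName'], channel['status'])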

AWS IoT Device SDKs

Build applications that run on your devices and send messages to AWS IoT Analytics. For more information, see AWS IoT SDKs.

AWS IoT Analytics Message Payload Restrictions

The field names of message payloads (data) that you send to AWS IoT Analytics:

  • Must contain only alphanumeric characters and underscores (_); no other special characters are allowed.

  • Must begin with an alphabetic character or single underscore (_).

  • Cannot contain hyphens (-).

  • In regular expression terms: "^[A-Za-z_]([A-Za-z0-9]*|[A-Za-z0-9][A-Za-z0-9_]*)$".

  • Cannot be greater than 255 characters.

  • Are case-insensitive. (Fields named "foo" and "FOO" in the same payload will be considered duplicates.)

For example, {"temp_01": 29} or {"_temp_01": 29} are valid, but {"temp-01": 29}, {"01_temp": 29} or {"__temp_01": 29} are invalid in message payloads.

AWS IoT Analytics Service Limits

API                      Limit Description                             Adjustable?
SampleChannelData        1 transaction per second per channel          yes
CreateDatasetContent     1 transaction per second per data set         yes
RunPipelineActivity      1 transaction per second                      yes
other management APIs    20 transactions per second                    yes
BatchPutMessage          100,000 messages per second per channel;      yes;
                         100 messages per batch;                       yes;
                         128 KB per message                            no

Resource                                 Limit Description             Adjustable?
channel                                  50 per account                yes
data store                               25 per account                yes
pipeline                                 100 per account               yes
activities                               25 per pipeline               no
data set                                 100 per account               yes
minimum data set refresh interval        1 hour                        yes
concurrent data set content generation   2 data sets simultaneously    no

About Amazon Web Services

Amazon Web Services (AWS) is a collection of digital infrastructure services that developers can use when developing their applications. The services include computing, storage, database, and application synchronization (messaging and queuing).

AWS uses a pay-as-you-go service model. You are charged only for the services that you—or your applications—use. Also, to make AWS useful as a platform for prototyping and experimentation, AWS offers a free usage tier, in which services are free below a certain level of usage. For more information about AWS costs and the free usage tier, see Test-Driving AWS in the Free Usage Tier.

If you don't have an AWS account, go to aws.amazon.com and choose Create an AWS Account.