Monitoring - IoT Lens

Monitoring

IoT applications can be simulated using production devices set up as test devices (with a specific test MQTT namespace), or by using simulated devices. All incoming data captured using the IoT rules engine is processed using the same workflows that are used for production.

The frequency of end-to-end simulations should be driven by your release cycle or device adoption rate. Test failure pathways (code that runs only when a failure occurs) to ensure that the solution is resilient to errors. Also run device canaries continually against your production and pre-production accounts; the canaries act as key indicators of system performance during simulation tests. Document the test outputs, draft remediation plans for any issues found, and perform user acceptance tests.
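A device canary can be as simple as a synthetic device that periodically publishes a probe message and times how long the message takes to come back through the pipeline. The sketch below is illustrative only: `publish` and `wait_for_echo` are hypothetical callables standing in for your MQTT client and downstream observer, the `canary/` topic prefix is a hypothetical test namespace, and the 2-second budget is an arbitrary example, not an AWS API or recommendation.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CanaryResult:
    probe_id: str
    latency_s: Optional[float]  # None when the probe never came back
    healthy: bool


def run_device_canary(
    publish: Callable[[str, dict], None],
    wait_for_echo: Callable[[str, float], bool],
    device_id: str,
    timeout_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
) -> CanaryResult:
    """Publish one probe and time its round trip through the pipeline.

    `publish` sends a payload to a topic. `wait_for_echo` blocks until a
    message carrying the probe ID is observed downstream (or the timeout
    expires) and returns whether it arrived. Both are hypothetical hooks.
    """
    probe_id = str(uuid.uuid4())
    topic = f"canary/{device_id}/probe"  # hypothetical test namespace
    start = clock()
    publish(topic, {"probeId": probe_id, "sentAt": time.time()})
    arrived = wait_for_echo(probe_id, timeout_s)
    elapsed = clock() - start
    if not arrived:
        return CanaryResult(probe_id, None, False)
    return CanaryResult(probe_id, elapsed, elapsed <= timeout_s)
```

Emitting `healthy` and `latency_s` as custom metrics on every run turns the canary into the kind of key indicator described above.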

There are several key types of performance monitoring for IoT deployments, including device performance, cloud performance, and storage/analytics performance. Create appropriate performance metrics from the telemetry and command data collected in your logs. Start with basic performance tracking, and build on those metrics as your business's core competencies expand.

Use CloudWatch Logs metric filters to transform your IoT application standard output into custom metrics through regex (regular expressions) pattern matching. Create CloudWatch alarms based on your application’s custom metrics to gain quick insight into your IoT application’s behavior.
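The following sketch builds the keyword arguments for the CloudWatch Logs `PutMetricFilter` and CloudWatch `PutMetricAlarm` calls, plus a local dry run that counts which log lines a regex would match. It assumes the `%...%` delimiter that CloudWatch Logs uses for regular expressions in filter patterns (only a subset of regex syntax is supported there, so check the filter pattern documentation). The filter name, metric name, namespace, and sample log lines are hypothetical.

```python
import re
from typing import Iterable

NAMESPACE = "Custom/IoT"  # hypothetical custom metric namespace
METRIC = "ConnectFailures"  # hypothetical metric name


def build_metric_filter_request(log_group: str, regex: str) -> dict:
    """Keyword arguments for logs.put_metric_filter (boto3)."""
    return {
        "logGroupName": log_group,
        "filterName": "iot-connect-failures",  # hypothetical name
        "filterPattern": f"%{regex}%",  # assumed regex delimiter syntax
        "metricTransformations": [
            {
                "metricName": METRIC,
                "metricNamespace": NAMESPACE,
                "metricValue": "1",  # emit 1 for each matching log event
            }
        ],
    }


def build_alarm_request(threshold: int) -> dict:
    """Keyword arguments for cloudwatch.put_metric_alarm (boto3)."""
    return {
        "AlarmName": "iot-connect-failures-high",  # hypothetical name
        "Namespace": NAMESPACE,
        "MetricName": METRIC,
        "Statistic": "Sum",
        "Period": 300,  # 5-minute buckets
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
    }


def count_matches(lines: Iterable[str], regex: str) -> int:
    """Local dry run: count the log lines the regex would match."""
    compiled = re.compile(regex)
    return sum(1 for line in lines if compiled.search(line))
```

With boto3, the dictionaries would be passed as `boto3.client("logs").put_metric_filter(**build_metric_filter_request("AWSIotLogsV2", "Connect.*Failure"))` and `boto3.client("cloudwatch").put_metric_alarm(**build_alarm_request(5))`. Running `count_matches` against a sample of real log lines first is a cheap way to catch a pattern that matches too much or nothing at all.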

Set up fine-grained logging to track specific thing groups. During IoT solution development, enable DEBUG logging so that you can clearly follow the progress of each IoT message as it passes from your devices through the message broker and the rules engine. In production, change the logging level to ERROR and WARN.
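As a sketch of thing-group-scoped logging, the function below returns the keyword arguments for the AWS IoT `SetV2LoggingLevel` operation (`set_v2_logging_level` in boto3), choosing DEBUG during development and WARN in production. The stage names and the thing group name used below are hypothetical. In AWS IoT logging, the WARN level includes ERROR entries, which matches the ERROR and WARN guidance above.

```python
# Map a hypothetical deployment stage to an AWS IoT log level.
# WARN logs both warnings and errors.
_LEVEL_BY_STAGE = {"development": "DEBUG", "production": "WARN"}


def logging_level_request(thing_group: str, stage: str) -> dict:
    """Keyword arguments for iot.set_v2_logging_level scoped to one thing group."""
    if stage not in _LEVEL_BY_STAGE:
        raise ValueError(f"unknown stage: {stage!r}")
    return {
        "logTarget": {"targetType": "THING_GROUP", "targetName": thing_group},
        "logLevel": _LEVEL_BY_STAGE[stage],
    }
```

For example, `boto3.client("iot").set_v2_logging_level(**logging_level_request("field-trial-sensors", "development"))` would raise verbosity for that group only, which keeps DEBUG log volume (and cost) bounded. Resource-specific levels assume AWS IoT v2 logging has already been enabled for the account.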

In addition to cloud instrumentation, you must run instrumentation on devices prior to deployment to ensure that the devices make the most efficient use of their local resources, and that firmware code does not lead to unwanted scenarios such as memory leaks. Deploy code that is highly optimized for constrained devices and monitor the health of your devices using device diagnostic messages published to AWS IoT from your embedded application.
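The diagnostic messages mentioned above can carry simple health signals alongside a heuristic for unwanted behavior such as memory leaks. The sketch below is written in Python for readability (constrained firmware would typically be C); the payload fields, the window size, and the heuristic (free heap shrinking on every one of the last `window` samples) are all illustrative assumptions rather than an AWS-defined format.

```python
import json
import time
from collections import deque
from typing import Deque


class DiagnosticsReporter:
    """Builds device diagnostic messages and flags a possible memory leak."""

    def __init__(self, device_id: str, window: int = 5) -> None:
        self.device_id = device_id
        # Keep only the most recent `window` free-heap samples.
        self.samples: Deque[int] = deque(maxlen=window)

    def record(self, free_heap_bytes: int) -> None:
        self.samples.append(free_heap_bytes)

    def suspected_leak(self) -> bool:
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history yet
        values = list(self.samples)
        # Leak suspected only if free heap dropped between every pair of samples.
        return all(later < earlier for earlier, later in zip(values, values[1:]))

    def payload(self, uptime_s: int, cpu_percent: float) -> str:
        return json.dumps({
            "deviceId": self.device_id,
            "timestamp": int(time.time()),
            "uptimeSeconds": uptime_s,
            "cpuPercent": cpu_percent,
            "freeHeapBytes": self.samples[-1] if self.samples else None,
            "suspectedLeak": self.suspected_leak(),
        })
```

The device would publish `payload(...)` periodically to a diagnostics topic of your choosing (for example, a hypothetical `dt/sensor-42/diagnostics`), where a rule or metric filter can alarm on `suspectedLeak`.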

IoT connectivity

IOTPERF 02. How do you ensure your IoT device’s performance and have mechanisms to measure it?

Before developing firmware that communicates with the cloud, implement a secure, scalable connectivity platform to support the long-term growth of your device fleet. Based on the anticipated volume of devices, an IoT platform must be able to scale the communication workflows between devices and the cloud, whether that is simple ingestion of telemetry or command-and-response communication between devices.

You can build your IoT application using AWS services such as Amazon EC2, but you then take on undifferentiated heavy lifting instead of building unique value into your IoT offering. Therefore, AWS recommends that you use AWS IoT Core for your IoT platform.

AWS IoT Core supports HTTP, WebSockets, LoRaWAN, and MQTT. MQTT is a lightweight communication protocol designed to tolerate intermittent connections, minimize the code footprint on devices, and reduce network bandwidth requirements.

Defining and analyzing key performance metrics for your hardware devices helps you understand the performance characteristics of the devices and how they relate to application performance. Capturing device logs and device metrics is key to measuring, evaluating, and optimizing the performance of your IoT devices.

Best practice IOTPERF_2.1 – Capture device diagnostic data into the IoT platform

As the number of devices increases, watch out for performance bottlenecks when all the devices connect to the cloud. These devices can generate a large aggregate amount of data, and device diagnostics are critical for understanding where performance can be improved. Different types of device diagnostics let you assess the immediate health of a device and of the devices in its proximity.

Recommendation IOTPERF_2.1.1 – Deploy an agent to the device to start capturing the relevant diagnostic data

  • For microprocessor-based applications, consider deploying the AWS Systems Manager Agent (SSM Agent) so that you can continuously monitor your device’s performance metrics.

  • Sample agents are available for the device side (device or gateway). If device-side diagnostic metrics cannot be obtained, you can still obtain a limited set of cloud-side metrics. Some sample metrics follow:

    • TCP connections

      • TCP_connections

      • Connections

      • Local_interface

    • Listening TCP/UDP ports

      • Listening_TCP/UDP_ports

      • Interface

    • Network statistics

      • Bytes_in/out

      • Packets_in/out

      • Network_statistics

  • Use custom metrics to define and monitor metrics that are unique to your fleet or use case.
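The sample metrics above correspond to the device-side metrics report used by AWS IoT Device Defender. The sketch below assembles such a report, including one custom metric, as JSON. The field names and the reserved publish topic follow our reading of the Device Defender documentation and should be checked against the current report specification; the example values and the `queue_depth` custom metric are hypothetical.

```python
import json
from typing import Dict, List


def build_defender_report(
    report_id: int,
    connections: List[Dict],
    listening_tcp_ports: List[Dict],
    bytes_in: int,
    bytes_out: int,
    packets_in: int,
    packets_out: int,
    custom: Dict[str, float],
) -> str:
    """Assemble a device-side metrics report as a JSON string (assumed schema)."""
    report = {
        "header": {"report_id": report_id, "version": "1.0"},
        "metrics": {
            "tcp_connections": {
                "established_connections": {
                    "connections": connections,
                    "total": len(connections),
                }
            },
            "listening_tcp_ports": {
                "ports": listening_tcp_ports,
                "total": len(listening_tcp_ports),
            },
            "network_stats": {
                "bytes_in": bytes_in,
                "bytes_out": bytes_out,
                "packets_in": packets_in,
                "packets_out": packets_out,
            },
        },
        # Custom metrics: each name maps to a list holding one typed value.
        "custom_metrics": {name: [{"number": v}] for name, v in custom.items()},
    }
    return json.dumps(report)


def report_topic(thing_name: str) -> str:
    # Reserved topic for JSON-formatted Device Defender metrics reports.
    return f"$aws/things/{thing_name}/defender/metrics/json"
```

Using an increasing value such as the epoch time for `report_id` lets the service distinguish successive reports from the same device.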

Best practice IOTPERF_2.2 – Measure, evaluate, and optimize firmware updates

Firmware updates are critical to keeping IoT devices performant over time, but they might not always have the expected impact. As you deploy firmware updates to your devices, monitoring your KPIs helps ensure that the updates have no unintended impact on the performance of your hardware devices or your IoT applications.

Recommendation IOTPERF_2.2.1 – Implement canary deployment for device firmware
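A canary firmware deployment sends the update to a small, stable subset of the fleet first and promotes it only if the canary group's KPIs stay within tolerance of the baseline. The sketch below shows both halves in plain Python. The hash-based bucketing, the 5% relative regression budget, and the KPI names are illustrative assumptions, and every KPI is assumed to be "higher is worse".

```python
import hashlib
from typing import Dict, Iterable, List


def in_canary(device_id: str, percent: float) -> bool:
    """Deterministically place roughly `percent`% of devices in the canary.

    Hashing the ID keeps a device's assignment stable across runs.
    """
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100


def canary_devices(device_ids: Iterable[str], percent: float) -> List[str]:
    return [d for d in device_ids if in_canary(d, percent)]


def promote(
    baseline: Dict[str, float],
    canary: Dict[str, float],
    max_regression: float = 0.05,
) -> bool:
    """True if no KPI regressed by more than `max_regression` (relative)."""
    for kpi, base in baseline.items():
        observed = canary.get(kpi)
        if observed is None:
            return False  # missing telemetry counts as a failed canary
        if base == 0:
            if observed > 0:
                return False
            continue
        if (observed - base) / base > max_regression:
            return False
    return True
```

In AWS IoT, job rollout and abort configurations can automate the staged rollout and the automatic halt; a check like `promote` would sit between the canary stage and the wider rollout.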

Best practice IOTPERF_2.3 – Limit the number of messages that devices receive, and filter out messages they do not need

If devices receive every message and filter them locally at the edge, they are subjected to unnecessary load, which can be counterproductive from a power and memory consumption perspective. Sending devices only the messages they use reduces the load on their resources and improves performance.

Recommendation IOTPERF_2.3.1 – Structure the topics using the scope/verb approach

With this structure, you can subscribe to all messages for a given scope (for example, a device) or narrow the subscription to a specific verb within that scope.
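To see why scope/verb topics make subscriptions easy to narrow, the function below implements MQTT topic-filter matching (`+` matches exactly one level, `#` matches the remaining levels, including none). The `dev/<deviceId>/<verb>` layout in the examples is a hypothetical instance of the scope/verb approach, not a required naming scheme.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    # Per MQTT, a wildcard in the first level does not match topics starting with "$".
    if topic.startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(f_levels):
        if level == "#":
            # Multi-level wildcard: valid only as the last level; matches the rest.
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)
```

A device subscribed to `dev/sensor-42/config` receives only configuration messages, while a diagnostic tool subscribed to `dev/sensor-42/#` sees everything for that device scope.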

IOTPERF 03. How do you ensure that your application operates within the limits set by the AWS service?

Databases

You will have multiple databases in your IoT application, each selected for attributes such as the write frequency of data to the database, the read frequency of data from the database, and how the data is structured and queried. There are other criteria to consider when selecting a database offering:

  • Volume of data and retention period.

  • Intrinsic data organization and structure.

  • Users and applications consuming the data (either raw or processed) and their geographical location and dispersion.

  • Advanced analytics needs, such as machine learning or real-time visualizations.

  • Data synchronization across other teams, organizations, and business units.

  • Security of the data at the row, table, and database levels.

  • Interactions with other related data-driven events such as enterprise applications, drill-through dashboards, or systems of interaction.

AWS has several database offerings that support IoT solutions. For structured data, you should use Amazon Aurora, a highly scalable relational interface to organizational data. For semi-structured data that requires low latency for queries and will be used by multiple consumers, use Amazon DynamoDB, a fully managed, multi-Region, multi-master database that provides consistent single-digit millisecond latency, and offers built-in security, backup and restore, and in-memory caching.

For storing raw, unformatted event data, use AWS IoT Analytics. AWS IoT Analytics filters, transforms, and enriches IoT data before storing it in a time series data store for analysis. Use Amazon SageMaker to build, train, and deploy machine learning models based on your IoT data, both in the cloud and at the edge using AWS IoT services, such as machine learning inference in AWS IoT Greengrass. Consider storing your raw formatted time series data in a data warehouse solution such as Amazon Redshift. Unformatted data can be imported to Amazon Redshift using Amazon S3 and Amazon Data Firehose. By archiving unformatted data in a scalable, managed data storage solution, you can begin to gain business insights, explore your data, and identify trends and patterns over time.

In addition to storing and using the historical trends of your IoT data, you must have a system that stores the current state of each device and lets you query against the current state of all your devices. This supports internal analytics and customer-facing views into your IoT data.

The AWS IoT Device Shadow service is an effective mechanism for storing a virtual representation of your device in the cloud, and it is best suited for managing the current state of each device. For internal teams that need to query against the shadow for operational needs, use the managed capabilities of fleet indexing, which provides a searchable index incorporating your IoT registry and shadow metadata. If you need to provide index-based searching or filtering to a large number of external users, such as for a consumer application, dynamically archive the shadow state using a combination of the IoT rules engine, Firehose, and Amazon OpenSearch Service to store your data in a format that allows fine-grained query access for external users.
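The sketch below shows the two sides of this pattern: a device reporting its current state to its classic (unnamed) shadow through the reserved update topic, and an internal team querying the fleet index. The `status` shadow field and the `degraded` value are hypothetical. The query assumes that fleet indexing is configured to index both shadow data and connectivity status.

```python
import json


def shadow_update_topic(thing_name: str) -> str:
    # Reserved MQTT topic for updating a thing's classic (unnamed) shadow.
    return f"$aws/things/{thing_name}/shadow/update"


def reported_state_payload(state: dict) -> str:
    # A device reports its current state under state.reported.
    return json.dumps({"state": {"reported": state}})


def fleet_query(status: str, connected: bool = True) -> str:
    """Fleet indexing query string for iot.search_index(queryString=...).

    Assumes connectivity and shadow indexing are enabled and that `status`
    is a (hypothetical) field the devices report in their shadows.
    """
    return (
        f"connectivity.connected:{str(connected).lower()} "
        f"AND shadow.reported.status:{status}"
    )
```

An operations dashboard could then call `boto3.client("iot").search_index(indexName="AWS_Things", queryString=fleet_query("degraded"))` to list connected devices currently reporting a degraded state.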

IOTPERF 04. How do you bootstrap devices to use the endpoint with least latency?

In IoT, bootstrapping refers to the process of assigning an identity to a device and enabling its communication with an endpoint. Devices in a global fleet should be provisioned in the Region nearest to their physical location to minimize latency, and each device should receive its regional endpoint and certificate no later than the time of bootstrapping. This ensures the best possible latency for bidirectional communications.
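One way to decide which Region is nearest in network terms is to probe candidate endpoints during bootstrapping and pick the fastest. The sketch below times a TCP connection to port 8883 (MQTT over TLS). The Region-to-hostname mapping is an input because data endpoints are specific to your account; the probe approach itself is an illustrative assumption, and the provisioning step that issues the certificate for the chosen Region is out of scope here.

```python
import socket
import time
from typing import Callable, Dict


def tcp_connect_latency(host: str, port: int = 8883, timeout: float = 3.0) -> float:
    """Seconds to open a TCP connection; infinity if the host is unreachable."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # includes timeouts and DNS failures
        return float("inf")


def pick_region(
    endpoints: Dict[str, str],
    probe: Callable[[str], float] = tcp_connect_latency,
) -> str:
    """Return the Region whose endpoint answered fastest.

    `endpoints` maps a Region name to that Region's data endpoint hostname.
    """
    latencies = {region: probe(host) for region, host in endpoints.items()}
    return min(latencies, key=latencies.__getitem__)
```

A single connect time is a noisy signal; taking the median of a few probes per endpoint would make the choice more robust.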

Compute

IoT applications involve a high rate of ingestion that requires continuous processing of the message stream. Therefore, choose compute services that support continuous enrichment of the stream and that run business logic before and during data storage.

The most common compute service used in IoT is AWS Lambda, which allows actions to be invoked when telemetry data reaches AWS IoT Core or AWS IoT Greengrass. AWS Lambda can be used at different points throughout an IoT solution, and where you run your business logic with it depends on when you want to process a specific data event: at the edge with AWS IoT Greengrass, or in the cloud after the data reaches AWS IoT Core.
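As an example of cloud-side enrichment, the handler below could be the target of an AWS IoT rule's Lambda action, where the event is the JSON produced by the rule's SQL statement. The `deviceId` and `temperature` fields, the calibration table, and the 80-degree threshold are hypothetical. Because AWS IoT rules invoke Lambda asynchronously, the return value is not delivered anywhere; a real function would write the enriched record to storage or another service, and the return here exists only to make the logic easy to test.

```python
from datetime import datetime, timezone

# Hypothetical per-device calibration offsets; in practice these might come
# from the device registry or a database.
CALIBRATION_OFFSETS = {"sensor-42": -0.5}


def lambda_handler(event, context):
    """Enrich one telemetry message forwarded by an AWS IoT rule."""
    device_id = event["deviceId"]
    raw = float(event["temperature"])
    calibrated = raw + CALIBRATION_OFFSETS.get(device_id, 0.0)
    return {
        "deviceId": device_id,
        "temperatureC": round(calibrated, 2),
        "overThreshold": calibrated > 80.0,  # illustrative business rule
        "processedAt": datetime.now(timezone.utc).isoformat(),
    }
```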

Amazon EC2 instances can also be used for a variety of IoT use cases. They can host self-managed relational database systems and a variety of applications, such as web, reporting, or existing on-premises solutions.

Analytics

The primary business case for implementing IoT solutions is to respond more quickly to how devices are performing and being used in the field. By acting directly on incoming telemetry, businesses can make more informed decisions about which new products or features to prioritize, or how to operate workflows within their organization more efficiently. Select analytics services that give you varying views of your data based on the type of analysis you are performing. AWS provides several services that align with different analytics workflows, including time-series analytics, real-time metrics, archival, and data lake use cases.

With IoT data, your application can generate time-series analytics in addition to the streaming data messages. You can calculate metrics over time windows and then stream the values to other AWS services.
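Calculating a metric over time windows can be illustrated with tumbling (fixed, non-overlapping) windows. The plain-Python sketch below shows the computation only; it is not the API of any AWS streaming service, and the `(timestamp_seconds, value)` input shape is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tumbling_window_averages(
    readings: Iterable[Tuple[float, float]], window_s: int
) -> List[Tuple[int, float]]:
    """Average (timestamp_seconds, value) readings over fixed windows.

    Returns (window_start, mean) pairs sorted by window start.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, value in readings:
        start = int(ts // window_s) * window_s  # align to window boundary
        sums[start] += value
        counts[start] += 1
    return [(start, sums[start] / counts[start]) for start in sorted(sums)]
```

For example, readings at 0 s, 30 s, 60 s, and 119 s with a 60-second window yield two averages, one per minute, which could then be streamed onward as a single aggregated metric.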

In addition, IoT applications that use AWS IoT Analytics can implement a managed data pipeline that transforms, enriches, and filters data before storing it in a time series data store. With AWS IoT Analytics, you can also perform visualizations and analytics natively using Amazon QuickSight and Jupyter Notebooks.

IOTPERF 05. How do you ensure that your applications operate within the limits set by the AWS service?

Being aware of the soft and hard quotas of the AWS service and continuously monitoring the key performance indicators enables you to anticipate when actions must be taken to request increases in the service quotas and re-evaluate your architecture. Ensuring that your application operates within the quotas of the services that you are building on is key to providing the optimal performance to your users.

Best practice IOTPERF_5.1 – Monitor and manage your IoT service quotas using available tools and metrics

Monitoring enables you to be aware of which service quotas you might be reaching, allowing you to change your application to cope with the hard quotas or to request the increase of a soft quota with sufficient lead time.
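A simple way to act on this is to compare current usage against each quota and flag anything above a warning ratio, giving you lead time before a limit is hit. In the sketch below, the quota names, limits, and usage figures are hypothetical inputs that you would gather from your own monitoring (for example, CloudWatch metrics) and from the documented or Service Quotas values for your account; the 80% warning ratio is an arbitrary example.

```python
from typing import Dict, List, Tuple


def quotas_near_limit(
    usage: Dict[str, float],
    quotas: Dict[str, float],
    warn_ratio: float = 0.8,
) -> List[Tuple[str, float]]:
    """List (quota name, utilization) pairs at or above `warn_ratio`, highest first."""
    flagged = []
    for name, limit in quotas.items():
        if limit <= 0:
            continue  # skip unknown or unset limits
        ratio = usage.get(name, 0.0) / limit
        if ratio >= warn_ratio:
            flagged.append((name, ratio))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Flagged entries indicate either a soft quota to request an increase for, or a hard quota that requires an architectural change.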

Recommendation IOTPERF_5.1.1 – Familiarize yourself with the service quotas of the different IoT services