Incident Response
Preparing for incident response in IoT requires a plan for two types of incidents in your IoT workload. The first is an attack against an individual IoT device that attempts to disrupt its performance or alter its behavior. The second is a larger-scale IoT event, such as a network outage or a DDoS attack. In both scenarios, the architecture of your IoT application largely determines how quickly you can diagnose an incident, correlate data across it, and then apply runbooks to the affected devices in an automated, reliable fashion.
For IoT applications, follow these best practices for incident response:
- IoT devices are organized in different groups based on device attributes such as location and hardware version.
- IoT devices are searchable by dynamic attributes, such as connectivity status, firmware version, application status, and device health.
- OTA updates can be staged for devices and deployed over a period of time. Deployment rollouts are monitored and can be automatically aborted if devices fail to maintain the appropriate KPIs.
- Any update process is resilient to errors, and devices can recover and roll back from a failed software update.
- Detailed logging, metrics, and device telemetry are available that contain contextual information about how a device is currently performing and has performed over a period of time.
- Fleet-wide metrics monitor the overall health of your fleet and alert when operational KPIs are not met for a period of time.
- Any individual device that deviates from expected behavior can be quarantined, inspected, and analyzed for potential compromise of the firmware and applications.
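The staged-rollout practice in the list above can be illustrated with a minimal sketch. It is a plain Python simulation, not an AWS IoT API: `staged_rollout`, `apply_update`, `stage_sizes`, and `max_failure_rate` are hypothetical names, and the callback is assumed to return `True` when a device still meets its KPIs after the update.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class RolloutResult:
    updated: List[str]   # devices that took the update and stayed healthy
    failed: List[str]    # devices that failed the post-update health check
    aborted: bool        # True if the rollout stopped before reaching every device


def staged_rollout(
    devices: Sequence[str],
    apply_update: Callable[[str], bool],
    stage_sizes: Sequence[int] = (1, 5, 25),
    max_failure_rate: float = 0.2,
) -> RolloutResult:
    """Update devices in growing stages and abort when too many fail.

    apply_update is a hypothetical callback; stage sizes are assumed positive.
    """
    updated: List[str] = []
    failed: List[str] = []
    position = 0
    stage = 0
    while position < len(devices):
        # Reuse the last stage size once the configured stages run out.
        size = stage_sizes[min(stage, len(stage_sizes) - 1)]
        batch = devices[position:position + size]
        for device in batch:
            (updated if apply_update(device) else failed).append(device)
        position += len(batch)
        stage += 1
        # Check the cumulative failure rate after each stage completes.
        attempted = len(updated) + len(failed)
        if len(failed) / attempted > max_failure_rate:
            return RolloutResult(updated, failed, aborted=True)
    return RolloutResult(updated, failed, aborted=False)
```

Starting with a single device and widening each stage limits how many devices a bad update can reach before the abort threshold trips. The devices listed in `failed` are the ones that would need the rollback path described in the list.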
IOTSEC 11: How do you prepare to respond to an incident that impacts a single device or a fleet of devices? |
---|
Implement a strategy in which your InfoSec team can quickly identify the devices that need remediation. Ensure that the InfoSec team has runbooks that consider firmware versioning and patching for device updates. Create automated processes that proactively apply security patches to vulnerable devices as they come online.
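One way to picture patching devices as they come online is a handler that runs on each connection event. The sketch below is an assumption-laden illustration rather than an AWS IoT integration: `on_device_connected`, `minimum_patched`, and `patch_queue` are hypothetical names, and firmware versions are assumed to be dotted integers such as `1.4.2`.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '1.4.2' into (1, 4, 2) for comparison."""
    return tuple(int(part) for part in version.split("."))


def needs_patch(current: str, minimum_patched: str) -> bool:
    """Return True when the running firmware is older than the patched release."""
    return parse_version(current) < parse_version(minimum_patched)


def on_device_connected(
    device_id: str,
    firmware_version: str,
    hardware_version: str,
    minimum_patched: Dict[str, str],
    patch_queue: List[Tuple[str, str]],
) -> bool:
    """Queue a patch for a device that connects with vulnerable firmware.

    minimum_patched maps a hardware version to the oldest firmware that
    contains the security fix (hypothetical runbook data).
    """
    required = minimum_patched.get(hardware_version)
    if required is None:
        return False  # no known advisory for this hardware version
    if needs_patch(firmware_version, required):
        patch_queue.append((device_id, required))
        return True
    return False
```

Keying the lookup on hardware version reflects the runbook requirement to consider firmware versioning per device type. Comparing parsed tuples rather than raw strings avoids treating `1.10.0` as older than `1.4.2`.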
At a minimum, your security team should be able to detect an incident on a specific device based on the device logs and current device behavior. After an incident is identified, the next phase is to quarantine the application. To implement this with AWS IoT services, you can use AWS IoT thing groups with more restrictive IoT policies, along with custom group logging for those devices. This lets you enable only the features that relate to troubleshooting and gather more data to understand the root cause and remediation. Lastly, after an incident has been resolved, you must be able to deploy a firmware update to the device to return it to a known state.
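As a rough illustration of a more restrictive policy for a quarantine group, the sketch below builds an AWS IoT policy document that allows a device only to connect under its own thing name and publish to a single diagnostics topic. The function name `build_quarantine_policy` and the topic prefix `quarantine/diagnostics` are assumptions made for this example. Creating the policy and attaching it to a thing group are left out.

```python
import json
from typing import Any, Dict


def build_quarantine_policy(
    region: str,
    account_id: str,
    topic_prefix: str = "quarantine/diagnostics",  # hypothetical topic layout
) -> Dict[str, Any]:
    """Build a restrictive IoT policy document for quarantined devices."""
    arn_base = f"arn:aws:iot:{region}:{account_id}"
    # Policy variable that resolves to the connecting thing's name.
    thing_name = "${iot:Connection.Thing.ThingName}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow the device to connect only as its own thing name.
                "Effect": "Allow",
                "Action": ["iot:Connect"],
                "Resource": [f"{arn_base}:client/{thing_name}"],
            },
            {
                # Allow publishing troubleshooting data to one topic per device.
                "Effect": "Allow",
                "Action": ["iot:Publish"],
                "Resource": [f"{arn_base}:topic/{topic_prefix}/{thing_name}"],
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder region and account ID for demonstration only.
    print(json.dumps(build_quarantine_policy("us-east-1", "123456789012"), indent=2))
```

Because the document grants no subscribe or receive permissions, a quarantined device in this sketch can report diagnostics but cannot receive application messages. A real quarantine policy would also need whatever access the firmware update path requires.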