Best Practice 20.1 – Use a data lake for raw telemetry data
A data lake brings different data sources together and provides a common management framework for browsing, viewing, and extracting the sources. An effective data lake enables IoT cost management by storing data in the right format for the right use case. With a data lake, storage and interaction characteristics can be aligned to a specific dataset format and required interfaces.
Recommendation 20.1.1 – Categorize telemetry types and map to storage capabilities
- For each telemetry stream, identify its key characteristics using the 4Vs of big data: velocity, volume, veracity, and variety.
- Map each stream to the appropriate storage capability.
- For example, a stream that sends an MQTT message with a JSON payload every second is an ideal candidate for being batched, compressed, and then stored in Amazon S3.
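The batching and compression step in the example can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function names, the newline-delimited JSON layout, the `raw/<device>/<date>/` key scheme, and the bucket name in the comment are all assumptions made for this example.

```python
import gzip
import json
from datetime import datetime, timezone


def compress_batch(messages: list[dict]) -> bytes:
    """Serialize messages as newline-delimited JSON and gzip the result."""
    ndjson = "\n".join(json.dumps(m, separators=(",", ":")) for m in messages)
    return gzip.compress(ndjson.encode("utf-8"))


def decompress_batch(blob: bytes) -> list[dict]:
    """Inverse of compress_batch, useful for verifying a stored object."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]


def object_key(device_id: str, batch_start: datetime) -> str:
    """Build a date-partitioned object key (hypothetical layout)."""
    return (
        f"raw/{device_id}/"
        f"{batch_start:%Y/%m/%d/%H}/"
        f"{batch_start:%Y%m%dT%H%M%SZ}.ndjson.gz"
    )


if __name__ == "__main__":
    # One minute of per-second readings from a single (made-up) device.
    start = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    batch = [{"device_id": "sensor-1", "ts": i, "temp_c": 21.5} for i in range(60)]

    blob = compress_batch(batch)
    key = object_key("sensor-1", start)
    print(f"{key}: {len(blob)} compressed bytes for {len(batch)} messages")

    # Upload step, assuming boto3 is installed and credentials are configured;
    # the bucket name is a placeholder:
    # import boto3
    # boto3.client("s3").put_object(Bucket="example-telemetry-bucket", Key=key, Body=blob)
```

Writing one object per batch rather than one per message reduces the number of storage requests, and repetitive JSON payloads tend to compress well. The batch window (60 messages here) is an arbitrary choice that would be tuned to the stream's velocity and how quickly the data must become queryable.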
For more: