Data Modeling - Amazon Timestream

Data Modeling

  • Amazon Timestream is designed to collect, store, and analyze time series data from applications and devices that emit a sequence of timestamped data points. For optimal performance, the data you send to Timestream must have temporal characteristics, and time must be a central component of your data analysis requirements.

  • When deciding which attributes of your time series data map to measures and dimensions, consider the following:

    • Represent all metadata attributes as dimensions. Examples include the Region and Availability Zone of an EC2 instance, the name of a stock exchange, and the device ID and make of an IoT sensor.

    • Represent all measurements as measures. Examples include the CPU and memory utilization of an EC2 instance, a stock price, and the temperature and humidity readings of an IoT sensor.

    • Low-cardinality attributes that are neither measurements nor metadata can also be represented as dimensions. Examples include measurement quality (high, medium, low) or a stock purchasing recommendation (buy, sell, hold). In this case, however, each distinct value produces its own time series (three in these examples), and you will need to combine them with a UNION clause when running queries.
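
The mapping rules above can be sketched in Python as a function that turns one EC2 sample into records shaped like the `Records` entries of a Timestream `WriteRecords` request: metadata keys become `Dimensions`, and each measurement becomes its own measure. The attribute names, the `DIMENSION_KEYS`/`MEASURE_KEYS` split, and `to_records` are hypothetical. In practice the resulting list would be passed to something like the boto3 `timestream-write` client's `write_records(DatabaseName=..., TableName=..., Records=...)`, which this sketch deliberately does not call.

```python
# Hypothetical split of an EC2 sample into metadata and measurements.
DIMENSION_KEYS = ("region", "az", "instance_id")
MEASURE_KEYS = ("cpu_utilization", "memory_utilization")


def to_records(sample, time_ms):
    """Map one sample to WriteRecords-style records, one per measure."""
    # Metadata attributes become dimensions; values are sent as strings.
    dimensions = [{"Name": k, "Value": str(sample[k])} for k in DIMENSION_KEYS]
    records = []
    for name in MEASURE_KEYS:
        # Each measurement becomes a single-measure record sharing the dimensions.
        records.append({
            "Dimensions": dimensions,
            "MeasureName": name,
            "MeasureValue": str(sample[name]),
            "MeasureValueType": "DOUBLE",
            "Time": str(time_ms),
            "TimeUnit": "MILLISECONDS",
        })
    return records


sample = {
    "region": "us-east-1", "az": "us-east-1a", "instance_id": "i-0abc",
    "cpu_utilization": 37.5, "memory_utilization": 61.2,
}
records = to_records(sample, time_ms=1_700_000_000_000)
print(len(records), records[0]["MeasureName"], records[0]["MeasureValue"])
# prints: 2 cpu_utilization 37.5
```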

  • When deciding whether to store data in a single table or in multiple tables, consider the following:

    • Consider your application's access control requirements. Data that must be encrypted with different AWS KMS keys must be placed in separate databases.

    • Consider your application's data retention requirements. Data that requires different retention policies must be placed in different tables.

    • Unrelated data must be stored in separate tables.

    • Data that is queried together must be stored in the same table.
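
One way to apply these rules is a small routing function that derives a (database, table) destination from each data stream's requirements. This is a hypothetical sketch: the `Stream` fields, `destination`, and the naming scheme are illustrative assumptions, not part of Timestream, and retention is simplified to a single number of days. Streams with different KMS keys land in different databases, streams with different retention periods or unrelated query groups land in different tables, and streams that are queried together with the same retention share a table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    """Hypothetical description of an incoming data stream."""
    name: str
    kms_key: str         # key the data must be encrypted with
    retention_days: int  # how long the data must be kept (simplified)
    query_group: str     # streams that are queried together share a group


def destination(stream):
    """Return a hypothetical (database, table) pair for a stream."""
    # Different KMS keys -> different databases.
    database = f"db_{stream.kms_key}"
    # Different retention or unrelated data -> different tables;
    # same query group and same retention -> same table.
    table = f"{stream.query_group}_{stream.retention_days}d"
    return database, table


for s in [
    Stream("cpu", "key_a", 30, "ec2"),
    Stream("memory", "key_a", 30, "ec2"),
    Stream("audit", "key_b", 365, "audit"),
]:
    print(s.name, destination(s))
# prints:
# cpu ('db_key_a', 'ec2_30d')
# memory ('db_key_a', 'ec2_30d')
# audit ('db_key_b', 'audit_365d')
```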

  • Keep dimension names short to reduce data ingestion and storage costs.
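
A back-of-the-envelope calculation shows why name length matters. It assumes, for illustration only, that every record carries its dimension names and that billed size grows with those bytes; consult Timestream pricing for the actual metering rules. The names and the record count below are hypothetical.

```python
def name_bytes_per_record(dimension_names):
    """UTF-8 bytes spent on dimension names in a single record."""
    return sum(len(name.encode("utf-8")) for name in dimension_names)


long_names = ["availability_zone", "instance_identifier"]  # 17 + 19 bytes
short_names = ["az", "instance_id"]                          # 2 + 11 bytes
record_count = 10_000_000  # hypothetical ingestion volume

saved_per_record = name_bytes_per_record(long_names) - name_bytes_per_record(short_names)
total_saved = saved_per_record * record_count
print(saved_per_record, total_saved)
# prints: 23 230000000
```

Under these assumptions, shortening two names saves 23 bytes per record, or about 230 MB across 10 million records.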

  • Store Boolean measure values (a 0 or 1 state) using the Boolean data type, rather than the bigint data type. This optimizes your Timestream application for cost.
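
As a minimal sketch, a device that reports a 0/1 state can be normalized to a BOOLEAN measure before ingestion instead of being sent as BIGINT. The `door_state_record` helper, the `device_id` dimension, and the `door_open` measure name are hypothetical; the layout follows the `WriteRecords` record shape, where `MeasureValue` is always sent as a string.

```python
def door_state_record(device_id, is_open, time_ms):
    """Hypothetical helper: store a 0/1 door state as BOOLEAN, not BIGINT."""
    # Accept 0/1 or False/True from the device; reject anything else.
    if is_open not in (0, 1):
        raise ValueError(f"expected a 0/1 state, got {is_open!r}")
    return {
        "Dimensions": [{"Name": "device_id", "Value": device_id}],
        "MeasureName": "door_open",
        "MeasureValue": "true" if is_open else "false",
        "MeasureValueType": "BOOLEAN",
        "Time": str(time_ms),
        "TimeUnit": "MILLISECONDS",
    }


print(door_state_record("sensor-17", 1, 1_700_000_000_000)["MeasureValue"])
# prints: true
```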

  • When ingesting data into Timestream, note that each measure name can be associated with only one measure value type. For example, if you ingest a measure named cpu_user as a double, its values must remain doubles; the measure cannot be converted to another data type.
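
Because a type change is not allowed, it can help to catch mismatches on the client before writing. The hypothetical guard below, `MeasureTypeRegistry`, remembers the first `MeasureValueType` seen for each `MeasureName` and rejects later records that disagree. It is a sketch under clear limits: it only sees records that pass through one process and knows nothing about the types already stored in the table.

```python
class MeasureTypeRegistry:
    """Hypothetical client-side guard for measure value types."""

    def __init__(self):
        self._types = {}  # MeasureName -> first MeasureValueType seen

    def check(self, record):
        name = record["MeasureName"]
        value_type = record["MeasureValueType"]
        # setdefault records the type the first time a name is seen.
        expected = self._types.setdefault(name, value_type)
        if expected != value_type:
            raise TypeError(
                f"measure {name!r} was ingested as {expected}, got {value_type}"
            )


registry = MeasureTypeRegistry()
registry.check({"MeasureName": "cpu_user", "MeasureValueType": "DOUBLE"})
try:
    registry.check({"MeasureName": "cpu_user", "MeasureValueType": "BIGINT"})
except TypeError as err:
    print(err)
# prints: measure 'cpu_user' was ingested as DOUBLE, got BIGINT
```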

  • Whenever possible, model your Timestream data with many measures per table. This allows for better pruning.