Query - Amazon Timestream

Query

With Timestream, you can easily store and analyze metrics for DevOps, sensor data for IoT applications, and industrial telemetry data for equipment maintenance, as well as many other use cases. Timestream’s purpose-built, adaptive query engine allows you to access data across storage tiers using a single SQL statement. It transparently accesses and combines data across storage tiers without requiring you to specify the data location. You can use SQL to query data in Timestream to retrieve time series data from one or more tables. You can access the metadata information for databases and tables. Timestream’s SQL also supports built-in functions for time series analytics. You can refer to the Timestream Query Language Reference reference for additional details.

Timestream is designed to have a fully decoupled data ingestion, storage, and query architecture where each component can scale independent of other components, allowing it to offer virtually infinite scale for an application’s needs. This means that Timestream does not “tip over” when your applications send hundreds of terabytes of data per day or run millions of queries processing small or large amounts of data. As your data grows over time, Timestream’s query latency remains mostly unchanged. This is because Timestream’s query architecture can leverage massive amounts of parallelism to process larger data volumes and automatically scale to match query throughput needs of an application.

Data Model

Timestream supports two data models for queries – the flat model and the time series model.

Note

Data in Timestream is stored using the flat model and it is the default model for querying data. The time series model is a query-time concept and is used for time series analytics.

Flat Model

The flat model is Timestream’s default data model for queries. It represents time series data in a tabular format. The dimension names, time, measure names and measure values appear as columns. Each row in the table is an atomic data point corresponding to a measurement at a specific time within a time series.

The table below shows an illustrative example for how Timestream stores data representing the CPU utilization, memory utilization, and network activity of EC2 instances. In this case, the dimensions are the region, availability zone, virtual private cloud, and instance IDs of the EC2 instances. The measures are the CPU utilization, memory utilization, and the incoming network data for the EC2 instances. The columns region, az, vpc, and instance_id contain the dimension values. The column time contains the timestamp for each record. The column measure_name contains the names of the measures represented by cpu-utilization, memory_utilization, and network_bytes_in. The columns measure_value::double contains measurements emitted as doubles (e.g. CPU utilization and memory utilization). The column measure_value::bigint contains measurements emitted as integers e.g. the incoming network data.

time region az vpc instance_id measure_name measure_value::double measure_value::bigint

2019-12-04 19:00:00.000000000

us-east-1

us-east-1d

vpc-1a2b3c4d

i-1234567890abcdef0

cpu_utilization

35

null

2019-12-04 19:00:01.000000000

us-east-1

us-east-1d

vpc-1a2b3c4d

i-1234567890abcdef0

cpu_utilization

38.2

null

2019-12-04 19:00:02.000000000

us-east-1

us-east-1d

vpc-1a2b3c4d

i-1234567890abcdef0

cpu_utilization

45.3

null

2019-12-04 19:00:00.000000000

us-east-1

us-east-1d

vpc-1a2b3c4d

i-1234567890abcdef0

memory_utilization

54.9

null

2019-12-04 19:00:01.000000000

us-east-1

us-east-1d

vpc-1a2b3c4d

i-1234567890abcdef0

memory_utilization

42.6

null

2019-12-04 19:00:02.000000000

us-east-1

us-east-1d

vpc-1a2b3c4d

i-1234567890abcdef0

memory_utilization

33.3

null

2019-12-04 19:00:00.000000000

us-east-1

us-east-1d

vpc-1a2b3c4d

i-1234567890abcdef0

memory_utilization

54.9

null

2019-12-04 19:00:01.000000000

us-east-1

us-east-1d

vpc-1a2b3c4d

i-1234567890abcdef0

memory_utilization

42.6

null

2019-12-04 19:00:02.000000000

us-east-1

us-east-1d

vpc-1a2b3c4d

i-1234567890abcdef0

memory_utilization

33.3

null

2019-12-04 19:00:00.000000000

us-east-1

us-east-1d

vpc-1a2b3c4d

i-1234567890abcdef0

memory_utilization

54.9

null

2019-12-04 19:00:01.000000000

us-east-1

us-east-1d

vpc-1a2b3c4d

i-1234567890abcdef0

memory_utilization

42.6

null

2019-12-04 19:00:02.000000000

us-east-1

us-east-1d

vpc-1a2b3c4d

i-1234567890abcdef0

memory_utilization

33.3

null

2019-12-04 19:00:00.000000000

us-east-1

us-east-1d

vpc-1a2b3c4d

i-1234567890abcdef0

network_bytes_in

null

30,000

2019-12-04 19:00:01.000000000

us-east-1

us-east-1d

vpc-1a2b3c4d

i-1234567890abcdef0

network_bytes_in

null

15,200

2019-12-04 19:00:02.000000000

us-east-1

us-east-1d

vpc-1a2b3c4d

i-1234567890abcdef0

network_bytes_in

null

34,400

2019-12-04 19:00:00.000000000

us-east-1

us-east-1d

vpc-1a2b3c4d

i-1234567890abcdef0

network_bytes_in

null

1,500

2019-12-04 19:00:01.000000000

us-east-1

us-east-1d

vpc-1a2b3c4d

i-1234567890abcdef0

network_bytes_in

null

6,600

2019-12-04 19:00:02.000000000

us-east-1

us-east-1d

vpc-1a2b3c4d

i-1234567890abcdef0

network_bytes_in

null

7,200

Time series model

The time series model is a query time construct used for time series analytics. It represents data as an ordered sequence of (time, measure value) pairs. Timestream supports time series functions such as interpolation to enable you to fill the gaps in your data. To use these functions, you must convert your data into the time series model using functions such as create_time_series. Refer to Timestream Query Language Reference for more details.

Using the earlier example of the EC2 instance, here is the same data expressed as a timeseries:

region az vpc instance_id cpu_utilization

us-east-1

us-east-1d

vpc-1a2b3c4d

i-1234567890abcdef0

[{time: 2019-12-04 19:00:00.000000000, value: 35}, {time: 2019-12-04 19:00:01.000000000, value: 38.2}, {time: 2019-12-04 19:00:02.000000000, value: 45.3}]