
Writes

You can collect time series data from connected devices, IT systems, and industrial equipment, and write it into Timestream. Timestream enables you to write data points from a single time series, or data points from many series, in a single write request when the time series belong to the same table. For your convenience, Timestream offers a flexible schema that automatically detects the column names and data types for your Timestream tables based on the dimension names and the data types of the measure values you specify when invoking writes into the database. You can also write batches of data into Timestream.

Timestream supports the following data types for writes:

Data type | Description
----------+------------
BIGINT    | Represents a 64-bit integer.
BOOLEAN   | Represents the two truth values of logic, namely, true and false.
DOUBLE    | 64-bit variable-precision floating point value implementing the IEEE Standard 754 for Binary Floating-Point Arithmetic.
VARCHAR   | Variable-length character data with an optional maximum length. The maximum limit is 2 KB.
MULTI     | Data type for multi-measure records. This data type includes one or more measures of type BIGINT, BOOLEAN, DOUBLE, VARCHAR, or TIMESTAMP.
TIMESTAMP | Represents an instant in time with nanosecond precision in UTC, tracking the time since the Unix epoch. This data type is currently supported only for multi-measure records (that is, within measure values of type MULTI).

Note

Timestream supports eventual consistency semantics for reads. This means that when you query data immediately after writing a batch of data into Timestream, the query results might not reflect the results of a recently completed write operation. The results may also include some stale data. Similarly, while writing time series data with one or more new dimensions, a query can return a partial subset of columns for a short period of time. If you repeat these query requests after a short time, the results should return the latest data.

You can collect time series data from connected devices, IT systems, industrial equipment, and many other systems, and write it into Amazon Timestream. You can write data using the AWS SDKs, the AWS CLI, or through AWS Lambda, AWS IoT Core, Amazon Kinesis Data Analytics for Apache Flink, Amazon Kinesis, Amazon MSK, and open source Telegraf.

No upfront schema definition

Before sending data into Amazon Timestream, you must create a database and a table using the AWS Management Console, Timestream’s SDKs, or the Timestream APIs. For more information, see Create a database and Create a table. While creating the table, you do not need to define the schema up front. Amazon Timestream automatically detects the schema based on the measures and dimensions of the data points being sent, so you no longer need to alter your schema offline to adapt it to your rapidly changing time series data.

Writing data (Inserts and Upserts)

The write operation in Amazon Timestream enables you to insert and upsert data. By default, writes in Amazon Timestream follow the first writer wins semantics, where data is stored as append only and duplicate records are rejected. While the first writer wins semantics satisfies the requirements of many time series applications, there are scenarios where applications need to update existing records in an idempotent manner and/or write data with the last writer wins semantics, where the record with the highest version is stored in the service. To address these scenarios, Amazon Timestream provides the ability to upsert data. Upsert is an operation that inserts a record into the system when the record does not exist, or updates the record when one exists. When the record is updated, it is updated in an idempotent manner.
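
The two semantics above can be sketched with a small in-memory model. This is illustrative code, not Timestream SDK code: the `WriteSemantics` class, its keys, and its version handling are assumptions made for the example, showing how first-writer-wins rejects duplicates while a versioned upsert keeps the record with the highest version.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative in-memory sketch (not Timestream SDK code) of the two write
// semantics: first-writer-wins rejects duplicate records, while upserts keep
// the record carrying the highest version.
public class WriteSemantics {
    static final class Versioned {
        final long version;
        final double value;
        Versioned(long version, double value) { this.version = version; this.value = value; }
    }

    private final Map<String, Versioned> store = new HashMap<>();

    // First-writer-wins: data is append only, and a duplicate key is rejected.
    public boolean insert(String key, double value) {
        return store.putIfAbsent(key, new Versioned(1, value)) == null;
    }

    // Upsert with last-writer-wins: only a strictly higher version replaces the
    // stored record; replaying the same version is an idempotent no-op.
    public boolean upsert(String key, long version, double value) {
        Versioned existing = store.get(key);
        if (existing != null && existing.version >= version) {
            return false;
        }
        store.put(key, new Versioned(version, value));
        return true;
    }

    public Double valueOf(String key) {
        Versioned v = store.get(key);
        return v == null ? null : v.value;
    }
}
```

Replaying the same upsert with the same version leaves the stored record unchanged, which is what makes retries safe.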

Writing data into the memory store and the magnetic store

Amazon Timestream offers the ability to directly write data into the memory store and the magnetic store. The memory store is optimized for high throughput data writes, and the magnetic store is optimized for lower throughput writes of late arrival data. Late arrival data is data with a timestamp that lies outside the memory store retention window. You must explicitly enable the ability to write late arrival data into the magnetic store by enabling magnetic store writes for the table.
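
The routing rule can be sketched as follows. This is a hypothetical helper, not SDK code: the `StoreRouter` class and its parameters are assumptions made for the example, modeling that a record inside the memory store retention window lands in the memory store, while an older record requires magnetic store writes to be enabled on the table.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical routing sketch (not SDK code): records with a timestamp inside
// the memory store retention window go to the memory store; late-arriving
// records require magnetic store writes to be enabled on the table.
public class StoreRouter {
    public enum Target { MEMORY_STORE, MAGNETIC_STORE }

    public static Target route(Instant recordTime, Instant now,
                               Duration memoryRetention, boolean magneticWritesEnabled) {
        boolean insideMemoryWindow = !recordTime.isBefore(now.minus(memoryRetention));
        if (insideMemoryWindow) {
            return Target.MEMORY_STORE;
        }
        if (!magneticWritesEnabled) {
            throw new IllegalStateException(
                    "Late-arriving record: enable magnetic store writes for this table");
        }
        return Target.MAGNETIC_STORE;
    }
}
```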

Writing data with single-measure records and multi-measure records

Amazon Timestream offers the ability to write data using two types of records, namely, single-measure records and multi-measure records.

Single-measure records

Single-measure records enable you to send a single measure per record. When data is sent to Timestream using this format, Timestream creates one table row per record. This means that if a device emits 4 metrics and each metric is sent as a single-measure record, Timestream will create 4 rows in the table to store this data, and the device attributes will be repeated for each row. This format is recommended in cases when you want to monitor a single metric from an application or when your application does not emit multiple metrics at the same time.

Multi-measure records

With multi-measure records, you can store multiple measures in a single table row, instead of storing one measure per table row. Multi-measure records therefore enable you to migrate your existing data from relational databases to Amazon Timestream with minimal changes.

You can also batch more data in a single write request than with single-measure records. This increases data write throughput and performance, and also reduces the cost of data writes. This is because batching more data in a write request enables Amazon Timestream to identify more repeated data in a single write request (where applicable), and charge only once for the repeated data.
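
The layout difference behind these savings can be captured in a back-of-the-envelope sketch (illustrative only; the class and method names are assumptions for the example): with single-measure records, each of a device's N measures becomes its own row and repeats every dimension value, while one multi-measure record stores all N measures in a single row with a single copy of the dimensions.

```java
// Back-of-the-envelope sketch of the storage layout difference between
// single-measure and multi-measure records (illustrative, not SDK code).
public class RecordLayout {
    // Rows written for one device emitting `measures` metrics at one timestamp.
    public static int rows(boolean multiMeasure, int measures) {
        return multiMeasure ? 1 : measures;
    }

    // How many times each dimension value is repeated across those rows.
    public static int dimensionCopies(boolean multiMeasure, int measures) {
        return rows(multiMeasure, measures);
    }
}
```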

Multi-measure records

With multi-measure records, you can store your time-series data in a more compact format in the memory and magnetic store, which helps lower data storage costs. Also, the compact data storage lends itself to writing simpler queries for data retrieval, improves query performance, and lowers the cost of queries.

Furthermore, multi-measure records also support the TIMESTAMP data type for storing more than one timestamp in a time-series record. This timestamp is independent of the memory store retention period and can support values in the past, present, or future. Multi-measure records therefore help improve performance, cost, and query simplicity—and offer more flexibility for storing different types of correlated measures.

Benefits

The following are the benefits of using multi-measure records:

  • Performance and cost – Multi-measure records enable you to write multiple time-series measures in a single write request. This increases the write throughput and also reduces the cost of writes. With multi-measure records, you can store data in a more compact manner, which helps lower the data storage costs. The compact data storage of multi-measure records results in less data being processed by queries. This is designed to improve the overall query performance and help lower the query cost.

  • Query simplicity – With multi-measure records, you don’t need to write complex common table expressions (CTEs) in a query to read multiple measures with the same timestamp. This is because the measures are stored as columns in a single table row. Multi-measure records therefore enable writing simpler queries.

  • Data modeling flexibility – Multi-measure records simplify migrating data from existing relational databases to Timestream. Multi-measure records also support additional data types such as TIMESTAMP. This type is independent of the memory store retention window, and therefore enables you to store data from the past, present, or future.

Use cases

You can use multi-measure records for any time-series application that generates more than one measurement from the same device at any given time. The following are some example applications:

  • A video streaming platform that generates hundreds of metrics at a given time.

  • Medical devices that generate measurements such as blood oxygen levels, heart rate, and pulse.

  • Industrial equipment such as oil rigs that generate metrics from temperature and weather sensors.

  • Other applications that are architected with one or more microservices.

Example: Monitoring the performance and health of a video streaming application

Consider a video streaming application that is running on 200 EC2 instances. You want to use Amazon Timestream to store and analyze the metrics being emitted from the application, so you can understand the performance and health of your application, quickly identify anomalies, resolve issues, and discover optimization opportunities.

We will model this scenario with single-measure records and multi-measure records, and then compare and contrast both approaches. For each approach, we make the following assumptions:

  • Each EC2 instance emits four measures (video_startup_time, rebuffering_ratio, video_playback_failures, and average_frame_rate) and four dimensions (device_id, device_type, os_version, and region) per second.

  • You want to store 6 hours of data in the memory store and 6 months of data in the magnetic store.

  • To identify anomalies, you’ve set up 10 queries that run every minute to identify any unusual activity over the past few minutes. You’ve also built a dashboard with eight widgets that display the last 6 hours of data, so that you can effectively monitor your application. This dashboard is accessed by five users at any given time and is auto-refreshed every hour.

Using single-measure records

Data modeling: With single-measure records, we will create one record for each of the four measures (video startup time, rebuffering ratio, video playback failures, and average frame rate). Each record will have the four dimensions (device_id, device_type, os_version, and region) and a timestamp.

Writes: When you write data into Amazon Timestream, the records are constructed as follows:

public void writeRecords() {
    System.out.println("Writing records");
    // Specify repeated values for all records
    List<Record> records = new ArrayList<>();
    final long time = System.currentTimeMillis();

    List<Dimension> dimensions = new ArrayList<>();
    final Dimension device_id = new Dimension().withName("device_id").withValue("12345678");
    final Dimension device_type = new Dimension().withName("device_type").withValue("iPhone 11");
    final Dimension os_version = new Dimension().withName("os_version").withValue("14.8");
    final Dimension region = new Dimension().withName("region").withValue("us-east-1");
    dimensions.add(device_id);
    dimensions.add(device_type);
    dimensions.add(os_version);
    dimensions.add(region);

    Record videoStartupTime = new Record()
            .withDimensions(dimensions)
            .withMeasureName("video_startup_time")
            .withMeasureValue("200")
            .withMeasureValueType(MeasureValueType.BIGINT)
            .withTime(String.valueOf(time));
    Record rebufferingRatio = new Record()
            .withDimensions(dimensions)
            .withMeasureName("rebuffering_ratio")
            .withMeasureValue("0.5")
            .withMeasureValueType(MeasureValueType.DOUBLE)
            .withTime(String.valueOf(time));
    Record videoPlaybackFailures = new Record()
            .withDimensions(dimensions)
            .withMeasureName("video_playback_failures")
            .withMeasureValue("0")
            .withMeasureValueType(MeasureValueType.BIGINT)
            .withTime(String.valueOf(time));
    Record averageFrameRate = new Record()
            .withDimensions(dimensions)
            .withMeasureName("average_frame_rate")
            .withMeasureValue("0.5")
            .withMeasureValueType(MeasureValueType.DOUBLE)
            .withTime(String.valueOf(time));

    records.add(videoStartupTime);
    records.add(rebufferingRatio);
    records.add(videoPlaybackFailures);
    records.add(averageFrameRate);

    WriteRecordsRequest writeRecordsRequest = new WriteRecordsRequest()
            .withDatabaseName(DATABASE_NAME)
            .withTableName(TABLE_NAME)
            .withRecords(records);

    try {
        WriteRecordsResult writeRecordsResult = amazonTimestreamWrite.writeRecords(writeRecordsRequest);
        System.out.println("WriteRecords Status: " +
                writeRecordsResult.getSdkHttpMetadata().getHttpStatusCode());
    } catch (RejectedRecordsException e) {
        System.out.println("RejectedRecords: " + e);
        for (RejectedRecord rejectedRecord : e.getRejectedRecords()) {
            System.out.println("Rejected Index " + rejectedRecord.getRecordIndex() + ": " +
                    rejectedRecord.getReason());
        }
        System.out.println("Other records were written successfully. ");
    } catch (Exception e) {
        System.out.println("Error: " + e);
    }
}

When you store single-measure records, the data is logically represented as:

time                          | device_id | device_type | os_version | region    | measure_name            | measure_value::bigint | measure_value::double
2021-09-07 21:48:44.000000000 | 12345678  | iPhone 11   | 14.8       | us-east-1 | video_startup_time      | 200                   | -
2021-09-07 21:48:44.000000000 | 12345678  | iPhone 11   | 14.8       | us-east-1 | rebuffering_ratio       | -                     | 0.5
2021-09-07 21:48:44.000000000 | 12345678  | iPhone 11   | 14.8       | us-east-1 | video_playback_failures | 0                     | -
2021-09-07 21:48:44.000000000 | 12345678  | iPhone 11   | 14.8       | us-east-1 | average_frame_rate      | -                     | 0.85
2021-09-07 21:53:44.000000000 | 12345678  | iPhone 11   | 14.8       | us-east-1 | video_startup_time      | 500                   | -
2021-09-07 21:53:44.000000000 | 12345678  | iPhone 11   | 14.8       | us-east-1 | rebuffering_ratio       | -                     | 1.5
2021-09-07 21:53:44.000000000 | 12345678  | iPhone 11   | 14.8       | us-east-1 | video_playback_failures | 10                    | -
2021-09-07 21:53:44.000000000 | 12345678  | iPhone 11   | 14.8       | us-east-1 | average_frame_rate      | -                     | 0.2

Queries: You can write a query that retrieves all of the data points with the same timestamp received over the past 15 minutes as:

WITH cte_video_startup_time AS (
    SELECT time, device_id, device_type, os_version, region, measure_value::bigint AS video_startup_time
    FROM table
    WHERE time >= ago(15m) AND measure_name = 'video_startup_time'),
cte_rebuffering_ratio AS (
    SELECT time, device_id, device_type, os_version, region, measure_value::double AS rebuffering_ratio
    FROM table
    WHERE time >= ago(15m) AND measure_name = 'rebuffering_ratio'),
cte_video_playback_failures AS (
    SELECT time, device_id, device_type, os_version, region, measure_value::bigint AS video_playback_failures
    FROM table
    WHERE time >= ago(15m) AND measure_name = 'video_playback_failures'),
cte_average_frame_rate AS (
    SELECT time, device_id, device_type, os_version, region, measure_value::double AS average_frame_rate
    FROM table
    WHERE time >= ago(15m) AND measure_name = 'average_frame_rate')
SELECT a.time, a.device_id, a.os_version, a.region, a.video_startup_time,
       b.rebuffering_ratio, c.video_playback_failures, d.average_frame_rate
FROM cte_video_startup_time a, cte_rebuffering_ratio b, cte_video_playback_failures c, cte_average_frame_rate d
WHERE a.time = b.time AND a.device_id = b.device_id AND a.os_version = b.os_version AND a.region = b.region
  AND a.time = c.time AND a.device_id = c.device_id AND a.os_version = c.os_version AND a.region = c.region
  AND a.time = d.time AND a.device_id = d.device_id AND a.os_version = d.os_version AND a.region = d.region

Workload cost: The cost of this workload is estimated to be $373.23 per month with single-measure records.

Using multi-measure records

Data modeling: With multi-measure records, we will create one record that contains all four measures (video startup time, rebuffering ratio, video playback failures, and average frame rate), all four dimensions (device_id, device_type, os_version, and region), and a timestamp.

Writes: When you write data into Amazon Timestream, the records are constructed as follows:

public void writeRecords() {
    System.out.println("Writing records");
    // Specify repeated values for all records
    List<Record> records = new ArrayList<>();
    final long time = System.currentTimeMillis();

    List<Dimension> dimensions = new ArrayList<>();
    final Dimension device_id = new Dimension().withName("device_id").withValue("12345678");
    final Dimension device_type = new Dimension().withName("device_type").withValue("iPhone 11");
    final Dimension os_version = new Dimension().withName("os_version").withValue("14.8");
    final Dimension region = new Dimension().withName("region").withValue("us-east-1");
    dimensions.add(device_id);
    dimensions.add(device_type);
    dimensions.add(os_version);
    dimensions.add(region);

    Record videoMetrics = new Record()
            .withDimensions(dimensions)
            .withMeasureName("video_metrics")
            .withTime(String.valueOf(time))
            .withMeasureValueType(MeasureValueType.MULTI)
            .withMeasureValues(
                    new MeasureValue()
                            .withName("video_startup_time")
                            .withValue("0")
                            .withType(MeasureValueType.BIGINT),
                    new MeasureValue()
                            .withName("rebuffering_ratio")
                            .withValue("0.5")
                            .withType(MeasureValueType.DOUBLE),
                    new MeasureValue()
                            .withName("video_playback_failures")
                            .withValue("0")
                            .withType(MeasureValueType.BIGINT),
                    new MeasureValue()
                            .withName("average_frame_rate")
                            .withValue("0.5")
                            .withType(MeasureValueType.DOUBLE));
    records.add(videoMetrics);

    WriteRecordsRequest writeRecordsRequest = new WriteRecordsRequest()
            .withDatabaseName(DATABASE_NAME)
            .withTableName(TABLE_NAME)
            .withRecords(records);

    try {
        WriteRecordsResult writeRecordsResult = amazonTimestreamWrite.writeRecords(writeRecordsRequest);
        System.out.println("WriteRecords Status: " +
                writeRecordsResult.getSdkHttpMetadata().getHttpStatusCode());
    } catch (RejectedRecordsException e) {
        System.out.println("RejectedRecords: " + e);
        for (RejectedRecord rejectedRecord : e.getRejectedRecords()) {
            System.out.println("Rejected Index " + rejectedRecord.getRecordIndex() + ": " +
                    rejectedRecord.getReason());
        }
        System.out.println("Other records were written successfully. ");
    } catch (Exception e) {
        System.out.println("Error: " + e);
    }
}

When you store multi-measure records, the data is logically represented as:

time                          | device_id | device_type | os_version | region    | measure_name  | video_startup_time | rebuffering_ratio | video_playback_failures | average_frame_rate
2021-09-07 21:48:44.000000000 | 12345678  | iPhone 11   | 14.8       | us-east-1 | video_metrics | 200                | 0.5               | 0                       | 0.85
2021-09-07 21:53:44.000000000 | 12345678  | iPhone 11   | 14.8       | us-east-1 | video_metrics | 500                | 1.5               | 10                      | 0.2

Queries: You can write a query that retrieves all of the data points with the same timestamp received over the past 15 minutes as:

SELECT time, device_id, device_type, os_version, region,
       video_startup_time, rebuffering_ratio, video_playback_failures, average_frame_rate
FROM table
WHERE time >= ago(15m)

Workload cost: The cost of this workload is estimated to be $127.43 per month with multi-measure records.

Note

In this case, using multi-measure records reduces the overall estimated monthly spend by 2.5x, with the data writes cost reduced by 3.3x, the storage cost reduced by 3.3x, and the query cost reduced by 1.2x.

Writing data with a timestamp that exists in the past or in the future

Timestream offers the ability to write data with a timestamp that lies outside of the memory store retention window through two mechanisms.

  • Magnetic store writes – You can write late arrival data directly into the magnetic store through magnetic store writes. To use magnetic store writes, you must first enable magnetic store writes for a table. You can then ingest data into the table using the same mechanism used for writing data into the memory store. Amazon Timestream will automatically write the data into the magnetic store based on its timestamp.

    Note

    The write-to-read latency for the magnetic store can be up to 6 hours, unlike writing data into the memory store, where the write-to-read latency is in the sub-second range.

  • Using the TIMESTAMP data type for measures – A multi-measure record can include a measure of type TIMESTAMP, which stores an additional timestamp within the record. This timestamp is independent of the memory store retention period and can hold values in the past, present, or future.

    Note

    The TIMESTAMP datatype is currently supported only for multi-measure records.

Batching writes

Amazon Timestream enables you to write data points from a single time series and/or data points from many series in a single write request. Batching multiple data points in a single write operation is beneficial from a performance and cost perspective. See Writes in the Metering and Pricing section for more details.

Note

Your write requests to Timestream may be throttled as Timestream scales to adapt to the data ingestion needs of your application. If your applications encounter throttling exceptions, you must continue to send data at the same (or higher) throughput to allow Timestream to automatically scale to your application's needs.
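
A retry loop consistent with this guidance can be sketched as follows. This is illustrative code, not SDK code: the `WriteCall` interface and `ThrottlingException` class are stand-ins for the SDK's writeRecords call and its throttling error. The point is to retry the same request after a brief pause rather than reduce the overall send rate, so the service can scale to the ingestion rate.

```java
// Illustrative retry sketch (not SDK code): retry a throttled write after a
// short pause while keeping the overall send rate steady.
public class ThrottleRetry {
    public static class ThrottlingException extends Exception {}

    @FunctionalInterface
    public interface WriteCall {
        void run() throws ThrottlingException;
    }

    // Returns the attempt number on which the write succeeded.
    public static int writeWithRetry(WriteCall call, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                call.run();
                return attempt;
            } catch (ThrottlingException e) {
                try {
                    Thread.sleep(50); // brief pause, then resend at the same pace
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            }
        }
        throw new IllegalStateException("Write still throttled after " + maxAttempts + " attempts");
    }
}
```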

Eventual consistency for reads

Timestream supports eventual consistency semantics for reads. This means that when you query data immediately after writing a batch of data into Timestream, the query results might not reflect the results of a recently completed write operation. If you repeat these query requests after a short time, the results should return the latest data.
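
Applications that must observe a recent write can apply the "repeat after a short time" advice with a simple poll-and-retry helper. This is an illustrative sketch, not SDK code: the `Supplier` models the query call and the `Predicate` models an application-specific freshness check, both assumptions made for the example.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Illustrative sketch of handling eventual consistency (not SDK code): re-run a
// query after a short delay until its result reflects the recent write, or the
// attempt budget runs out.
public class EventualRead {
    public static <T> T queryUntilFresh(Supplier<T> query, Predicate<T> isFresh,
                                        int maxAttempts, long delayMillis) {
        T result = query.get();
        for (int attempt = 1; attempt < maxAttempts && !isFresh.test(result); attempt++) {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            result = query.get();
        }
        return result;
    }
}
```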