Amazon Athena Timestream connector - Amazon Athena

Amazon Athena Timestream connector

The Amazon Athena Timestream connector enables Amazon Athena to communicate with Amazon Timestream, making your time series data accessible through Amazon Athena. You can optionally use AWS Glue Data Catalog as a source of supplemental metadata.

Amazon Timestream is a fast, scalable, fully managed, purpose-built time series database that makes it easy to store and analyze trillions of time series data points per day. Timestream saves you time and cost in managing the lifecycle of time series data by keeping recent data in memory and moving historical data to a cost optimized storage tier based upon user defined policies.

If you have Lake Formation enabled in your account, the IAM role for your Athena federated Lambda connector that you deployed in the AWS Serverless Application Repository must have read access in Lake Formation to the AWS Glue Data Catalog.

Prerequisites

Parameters

Use the Lambda environment variables in this section to configure the Timestream connector.

  • spill_bucket – Specifies the Amazon S3 bucket for data that exceeds Lambda function limits.

  • spill_prefix – (Optional) Defaults to a subfolder in the specified spill_bucket called athena-federation-spill. We recommend that you configure an Amazon S3 storage lifecycle on this location to delete spills older than a predetermined number of days or hours.

  • spill_put_request_headers – (Optional) A JSON encoded map of request headers and values for the Amazon S3 putObject request that is used for spilling (for example, {"x-amz-server-side-encryption" : "AES256"}). For other possible headers, see PutObject in the Amazon Simple Storage Service API Reference.

  • kms_key_id – (Optional) By default, any data that is spilled to Amazon S3 is encrypted using the AES-GCM authenticated encryption mode and a randomly generated key. To have your Lambda function use stronger encryption keys generated by KMS like a7e63k4b-8loc-40db-a2a1-4d0en2cd8331, you can specify a KMS key ID.

  • disable_spill_encryption – (Optional) When set to True, disables spill encryption. Defaults to False so that data that is spilled to S3 is encrypted using AES-GCM – either using a randomly generated key or KMS to generate keys. Disabling spill encryption can improve performance, especially if your spill location uses server-side encryption.

  • glue_catalog – (Optional) Use this option to specify a cross-account AWS Glue catalog. By default, the connector attempts to get metadata from its own AWS Glue account.

Setting up databases and tables in AWS Glue

You can optionally use the AWS Glue Data Catalog as a source of supplemental metadata. To enable an AWS Glue table for use with Timestream, you must have an AWS Glue database and table with names that match the Timestream database and table that you want to supply supplemental metadata for.

Note

For best performance, use only lowercase for your database names and table names. Using mixed casing causes the connector to perform a case insensitive search that is more computationally intensive.

To configure AWS Glue table for use with Timestream, you must set its table properties in AWS Glue.

To use an AWS Glue table for supplemental metadata
  1. Edit the table in the AWS Glue console to add the following table properties:

    • timestream-metadata-flag – This property indicates to the Timestream connector that the connector can use the table for supplemental metadata. You can provide any value for timestream-metadata-flag as long as the timestream-metadata-flag property is present in the list of table properties.

    • _view_template – When you use AWS Glue for supplemental metadata, you can use this table property and specify any Timestream SQL as the view. The Athena Timestream connector uses the SQL from the view together with your SQL from Athena to run your query. This is useful if you want to use a feature of Timestream SQL that is not otherwise available in Athena.

  2. Make sure that you use the data types appropriate for AWS Glue as listed in this document.

Data types

Currently, the Timestream connector supports only a subset of the data types available in Timestream, specifically: the scalar values varchar, double, and timestamp.

To query the timeseries data type, you must configure a view in AWS Glue table properties that uses the Timestream CREATE_TIME_SERIES function. You also need to provide a schema for the view that uses the syntax ARRAY<STRUCT<time:timestamp,measure_value::double:double>> as the type for any of your time series columns. Be sure to replace double with the appropriate scalar type for your table.

The following image shows an example of AWS Glue table properties configured to set up a view over a time series.


                    Configuring table properties in AWS Glue to set up a view over a time
                        series.

Required Permissions

For full details on the IAM policies that this connector requires, review the Policies section of the athena-timestream.yaml file. The following list summarizes the required permissions.

  • Amazon S3 write access – The connector requires write access to a location in Amazon S3 in order to spill results from large queries.

  • Athena GetQueryExecution – The connector uses this permission to fast-fail when the upstream Athena query has terminated.

  • AWS Glue Data Catalog – The Timestream connector requires read only access to the AWS Glue Data Catalog to obtain schema information.

  • CloudWatch Logs – The connector requires access to CloudWatch Logs for storing logs.

  • Timestream Access – For running Timestream queries.

Performance

We recommend that you use the LIMIT clause to limit the data returned (not the data scanned) to less than 256 MB to ensure that interactive queries are performant.

The Athena Timestream connector performs predicate pushdown to decrease the data scanned by the query. LIMIT clauses reduce the amount of data scanned, but if you do not provide a predicate, you should expect SELECT queries with a LIMIT clause to scan at least 16 MB of data. Selecting a subset of columns significantly speeds up query runtime and reduces data scanned. The Timestream connector is resilient to throttling due to concurrency.

Passthrough queries

The Timestream connector supports passthrough queries. Passthrough queries use a table function to push your full query down to the data source for execution.

To use passthrough queries with Timestream, you can use the following syntax:

SELECT * FROM TABLE( system.query( query => 'query string' ))

The following example query pushes down a query to a data source in Timestream. The query selects all columns in the customer table, limiting the results to 10.

SELECT * FROM TABLE( system.query( query => 'SELECT * FROM customer LIMIT 10' ))

License information

The Amazon Athena Timestream connector project is licensed under the Apache-2.0 License.

See also

For additional information about this connector, visit the corresponding site on GitHub.com.