Amazon Athena CloudWatch connector
The Amazon Athena CloudWatch connector enables Amazon Athena to communicate with CloudWatch so that you can query your log data with SQL.
The connector maps your LogGroups as schemas and each LogStream as a table. The connector
also maps a special all_log_streams
view that contains all LogStreams in the
LogGroup. This view enables you to query all the logs in a LogGroup at once instead of
searching through each LogStream individually.
Prerequisites
Deploy the connector to your AWS account using the Athena console or the AWS Serverless Application Repository. For more information, see Deploy a data source connector or Use the AWS Serverless Application Repository to deploy a data source connector.
Parameters
Use the Lambda environment variables in this section to configure the CloudWatch connector.
- spill_bucket – Specifies the Amazon S3 bucket for data that exceeds Lambda function limits.
- spill_prefix – (Optional) Defaults to a subfolder in the specified spill_bucket called athena-federation-spill. We recommend that you configure an Amazon S3 storage lifecycle on this location to delete spills older than a predetermined number of days or hours.
- spill_put_request_headers – (Optional) A JSON encoded map of request headers and values for the Amazon S3 putObject request that is used for spilling (for example, {"x-amz-server-side-encryption" : "AES256"}). For other possible headers, see PutObject in the Amazon Simple Storage Service API Reference.
- kms_key_id – (Optional) By default, any data that is spilled to Amazon S3 is encrypted using the AES-GCM authenticated encryption mode and a randomly generated key. To have your Lambda function use stronger encryption keys generated by KMS like a7e63k4b-8loc-40db-a2a1-4d0en2cd8331, you can specify a KMS key ID.
- disable_spill_encryption – (Optional) When set to True, disables spill encryption. Defaults to False so that data that is spilled to S3 is encrypted using AES-GCM – either using a randomly generated key or KMS to generate keys. Disabling spill encryption can improve performance, especially if your spill location uses server-side encryption.
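As a rough illustration of how these settings fit together, the following Python sketch assembles a set of environment variable values. Only the variable names and the documented defaults come from the list above; the bucket name and the `connector_env` dictionary are hypothetical, and the sketch does not call any AWS API.

```python
import json

# Example header map, taken from the spill_put_request_headers description.
spill_headers = {"x-amz-server-side-encryption": "AES256"}

# Hypothetical configuration; the bucket name is a placeholder, not a real bucket.
connector_env = {
    "spill_bucket": "amzn-s3-demo-bucket",
    "spill_prefix": "athena-federation-spill",   # documented default
    # Environment variables are plain strings, so the header map is JSON encoded.
    "spill_put_request_headers": json.dumps(spill_headers),
    "disable_spill_encryption": "False",         # documented default
}

# Round trip: the encoded value decodes back to the original header map.
decoded_headers = json.loads(connector_env["spill_put_request_headers"])
```

Because every value is a string, the same dictionary could be passed to whatever deployment tooling you use to set Lambda environment variables.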
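The connector also supports AIMD (additive increase, multiplicative decrease) congestion control for handling throttling events from CloudWatch through the ThrottlingInvoker construct of the Amazon Athena Query Federation SDK. You can tweak the default throttling behavior by setting any of the following optional environment variables: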
- throttle_initial_delay_ms – The initial call delay applied after the first congestion event. The default is 10 milliseconds.
- throttle_max_delay_ms – The maximum delay between calls. You can derive the corresponding TPS by dividing 1000 ms by this value. The default is 1000 milliseconds.
- throttle_decrease_factor – The factor by which Athena reduces the call rate. The default is 0.5.
- throttle_increase_ms – The rate at which Athena decreases the call delay. The default is 10 milliseconds.
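To make the interaction of these four settings concrete, here is a minimal Python sketch of one way an AIMD delay could evolve. It is an interpretation of the parameter descriptions above, not the SDK's ThrottlingInvoker code; the class and function names are hypothetical, and the exact update rules (for example, how the delay is rounded) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ThrottleSettings:
    initial_delay_ms: int = 10     # throttle_initial_delay_ms
    max_delay_ms: int = 1000       # throttle_max_delay_ms
    decrease_factor: float = 0.5   # throttle_decrease_factor
    increase_ms: int = 10          # throttle_increase_ms


def delay_after_congestion(delay_ms: float, s: ThrottleSettings) -> float:
    """Multiplicative decrease of the call rate (assumed rule).

    Cutting the rate by decrease_factor means dividing the delay by it,
    starting from the initial delay and never exceeding the maximum.
    """
    if delay_ms <= 0:
        return s.initial_delay_ms
    return min(s.max_delay_ms, delay_ms / s.decrease_factor)


def delay_after_success(delay_ms: float, s: ThrottleSettings) -> float:
    """Additive increase of the call rate (assumed rule): shrink the delay by a fixed step."""
    return max(0, delay_ms - s.increase_ms)


def calls_per_second_at_max_delay(s: ThrottleSettings) -> float:
    """TPS when the delay sits at its maximum: 1000 ms divided by the maximum delay."""
    return 1000 / s.max_delay_ms
```

Under these assumptions and the default values, repeated congestion events double the delay each time until it reaches the 1000 ms cap, and each successful call removes 10 ms from the delay.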
Databases and tables
The Athena CloudWatch connector maps your LogGroups as schemas (that is, databases) and each
LogStream as a table. The connector also maps a special all_log_streams
view that contains all LogStreams in the LogGroup. This view enables you to query all
the logs in a LogGroup at once instead of searching through each LogStream
individually.
Every table mapped by the Athena CloudWatch connector has the following schema. This schema matches the fields provided by CloudWatch Logs.
- log_stream – A VARCHAR that contains the name of the LogStream that the row is from.
- time – An INT64 that contains the epoch time of when the log line was generated.
- message – A VARCHAR that contains the log message.
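The following Python sketch models one row of this schema and converts its time value into a readable timestamp. The `LogRow` and `row_timestamp` names are hypothetical. The sketch also assumes that the epoch value is expressed in milliseconds, which matches the 13-digit values in the passthrough example on this page but is not stated in the schema itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class LogRow:
    log_stream: str  # VARCHAR: name of the LogStream the row is from
    time: int        # INT64: epoch time of the log line (assumed to be milliseconds)
    message: str     # VARCHAR: the log message


def row_timestamp(row: LogRow) -> datetime:
    """Convert the row's epoch value to a timezone-aware UTC datetime.

    Assumes milliseconds; drop the division if your values are in seconds.
    """
    return datetime.fromtimestamp(row.time / 1000, tz=timezone.utc)
```

For example, `row_timestamp(LogRow("my-stream", 1710918615308, "hello"))` yields a UTC datetime on 20 March 2024 under the millisecond assumption.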
Examples
The following example shows how to perform a SELECT query on a specified LogStream.
SELECT * FROM "lambda:cloudwatch_connector_lambda_name"."log_group_path"."log_stream_name" LIMIT 100
The following example shows how to use the all_log_streams view to perform a query on all LogStreams in a specified LogGroup.
SELECT * FROM "lambda:cloudwatch_connector_lambda_name"."log_group_path"."all_log_streams" LIMIT 100
Required Permissions
For full details on the IAM policies that this connector requires, review the Policies section of the athena-cloudwatch.yaml file. The following list summarizes the required permissions.
- Amazon S3 write access – The connector requires write access to a location in Amazon S3 in order to spill results from large queries.
- Athena GetQueryExecution – The connector uses this permission to fast-fail when the upstream Athena query has terminated.
- CloudWatch Logs Read/Write – The connector uses this permission to read your log data and to write its diagnostic logs.
Performance
The Athena CloudWatch connector attempts to optimize queries against CloudWatch by parallelizing scans of the log streams required for your query. For certain time period filters, predicate pushdown is performed both within the Lambda function and within CloudWatch Logs.
For best performance, use only lowercase for your log group names and log stream names. Mixed-case names cause the connector to perform a case-insensitive search, which is more computationally intensive.
Passthrough queries
The CloudWatch connector supports passthrough queries that use CloudWatch Logs Insights query syntax. For more information about CloudWatch Logs Insights, see Analyzing log data with CloudWatch Logs Insights in the Amazon CloudWatch Logs User Guide.
To create passthrough queries with CloudWatch, use the following syntax:
SELECT * FROM TABLE( system.query( STARTTIME => 'start_time', ENDTIME => 'end_time', QUERYSTRING => 'query_string', LOGGROUPNAMES => 'log_group-names', LIMIT => 'max_number_of_results' ))
The following example CloudWatch passthrough query filters for the duration
field when it does not equal 1000.
SELECT * FROM TABLE( system.query( STARTTIME => '1710918615308', ENDTIME => '1710918615972', QUERYSTRING => 'fields @duration | filter @duration != 1000', LOGGROUPNAMES => '/aws/lambda/cloudwatch-test-1', LIMIT => '2' ))
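The passthrough syntax takes every argument as a quoted string, which makes it easy to assemble programmatically. The following Python sketch builds such a query from datetime values. It assumes that STARTTIME and ENDTIME are epoch milliseconds, which is inferred from the 13-digit values in the example above rather than stated in the text. The function names are hypothetical, and doubling single quotes is the standard SQL escaping convention, assumed here to apply.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(moment: datetime) -> int:
    """Whole milliseconds since the Unix epoch; expects a timezone-aware datetime."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def sql_literal(value) -> str:
    """Render a value as a single-quoted SQL string, doubling embedded quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_passthrough(start: datetime, end: datetime, query_string: str,
                      log_group_names: str, limit: int) -> str:
    """Assemble a passthrough query in the syntax shown above."""
    return (
        "SELECT * FROM TABLE( system.query( "
        f"STARTTIME => {sql_literal(to_epoch_ms(start))}, "
        f"ENDTIME => {sql_literal(to_epoch_ms(end))}, "
        f"QUERYSTRING => {sql_literal(query_string)}, "
        f"LOGGROUPNAMES => {sql_literal(log_group_names)}, "
        f"LIMIT => {sql_literal(limit)} ))"
    )
```

Integer division of timedeltas is used instead of multiplying a float timestamp by 1000, so the millisecond value is not affected by floating-point rounding.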
License information
The Amazon Athena CloudWatch connector project is licensed under the Apache-2.0 License.
Additional resources
For additional information about this connector, visit the corresponding site on GitHub.com.