Query Amazon DynamoDB tables with SQL by using Amazon Athena
Created by Gavin Perrie (AWS), Ajit Ambike (AWS), and Brad Yates (AWS)
Environment: PoC or pilot | Technologies: Analytics; Databases; Serverless | AWS services: Amazon Athena; Amazon DynamoDB; AWS Lambda; Amazon S3 |
Summary
If your data includes sources other than Amazon Simple Storage Service (Amazon S3), you can use federated queries to access those relational, non-relational, object, or custom data sources. This pattern shows how to configure federated query access through Amazon Athena to Amazon DynamoDB by using an SQL data source connector.
Using this pattern you can do the following:
Query DynamoDB tables by using SQL.
Run federated SQL queries in Athena and join DynamoDB tables with other supported data sources.
Prerequisites and limitations
Prerequisites
A DynamoDB table.
An Athena workgroup set to use Athena engine version 2. For instructions, see the Athena documentation.
An S3 bucket where the
AthenaDynamoDBConnector
AWS Lambda function can spill the data. The S3 bucket and the Lambda function must be in the same AWS Region.
If this is your first time accessing Athena, you will need an additional S3 bucket to use as the query result location. For instructions, see the Athena documentation.
Limitations
Write operations such as INSERT INTO are not supported.
Product versions
Architecture
Target architecture
The following diagram shows the connection flow after the pattern is established. The user connects to Amazon Athena to provide the query. Athena passes the query and target to the DynamoDB data source connector Lambda function, which retrieves and returns the data to Athena. If large amounts of data are returned, Athena stores the temporary results in the spill bucket before packaging and returning the complete dataset.
Tools
AWS services
Amazon Athena is an interactive query service that helps you analyze data directly in Amazon Simple Storage Service (Amazon S3) by using standard SQL. This pattern uses Amazon Athena DynamoDB Connector
, a tool built using the Amazon Athena Query Federation SDK and installed as an AWS Lambda application through the AWS Serverless Application Repository. Amazon DynamoDB is a fully managed NoSQL database service that provides fast, predictable, and scalable performance.
AWS Lambda is a compute service that helps you run code without needing to provision or manage servers. It runs your code only when needed and scales automatically, so you pay only for the compute time that you use.
Amazon Simple Storage Service (Amazon S3) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.
Code repository
The code for this pattern is available in the GitHub Athena Query Federation
Epics
Task | Description | Skills required |
---|---|---|
Deploy the AthenaDynamoDBConnector application. | To deploy AthenaDynamoDBConnecter, do the following:
| AWS DevOps |
Create a data source for Athena. | To create the data source, do the following:
| AWS DevOps |
Use Athena to query the DynamoDB table. | To query the DynamoDB table, do the following:
| App developer |
Troubleshooting
Issue | Solution |
---|---|
Query fails with | Make sure that the Athena spill bucket and Lambda function are created in the same AWS Region. |
The newly created data source isn't visible on the Athena console. | Athena data catalogs are regional. Ensure the |
You're unable to run the query against the newly created data source. | Check that the Query Result Location has been set. |