Query Amazon DynamoDB tables with SQL by using Amazon Athena - AWS Prescriptive Guidance

Query Amazon DynamoDB tables with SQL by using Amazon Athena

Created by Gavin Perrie (AWS), Ajit Ambike (AWS), and Brad Yates (AWS)

Environment: PoC or pilot

Technologies: Analytics; Databases; Serverless

AWS services: Amazon Athena; Amazon DynamoDB; AWS Lambda; Amazon S3

Summary

If your data includes sources other than Amazon Simple Storage Service (Amazon S3), you can use federated queries to access those relational, non-relational, object, or custom data sources. This pattern shows how to configure federated query access through Amazon Athena to Amazon DynamoDB by using an SQL data source connector.

Using this pattern you can do the following:

  • Query DynamoDB tables by using SQL.

  • Run federated SQL queries in Athena and join DynamoDB tables with other supported data sources.

Prerequisites and limitations

Prerequisites

  • A DynamoDB table.

  • An Athena workgroup set to use Athena engine version 2. For instructions, see the Athena documentation.

  • An S3 bucket where the AthenaDynamoDBConnector AWS Lambda function can spill the data. The S3 bucket and the Lambda function must be in the same AWS Region.

If this is your first time accessing Athena, you will need an additional S3 bucket to use as the query result location. For instructions, see the Athena documentation.

Limitations

  • Write operations such as INSERT INTO are not supported. 

Product versions

Architecture

Target architecture

The following diagram shows the connection flow after the pattern is established. The user connects to Amazon Athena to provide the query. Athena passes the query and target to the DynamoDB data source connector Lambda function, which retrieves and returns the data to Athena. If large amounts of data are returned, Athena stores the temporary results in the spill bucket before packaging and returning the complete dataset.

Workflow from users to Athena to Lambda, which connects to the S3 bucket and to the DynamoDB table.

Tools

AWS services

  • Amazon Athena is an interactive query service that helps you analyze data directly in Amazon Simple Storage Service (Amazon S3) by using standard SQL. This pattern uses Amazon Athena DynamoDB Connector, a tool built using the Amazon Athena Query Federation SDK and installed as an AWS Lambda application through the AWS Serverless Application Repository.

  • Amazon DynamoDB is a fully managed NoSQL database service that provides fast, predictable, and scalable performance.

  • AWS Lambda is a compute service that helps you run code without needing to provision or manage servers. It runs your code only when needed and scales automatically, so you pay only for the compute time that you use.

  • Amazon Simple Storage Service (Amazon S3) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.

Code repository

The code for this pattern is available in the GitHub Athena Query Federation repository.

Epics

TaskDescriptionSkills required

Deploy the AthenaDynamoDBConnector application.

To deploy AthenaDynamoDBConnecter, do the following:

  1. Sign in to the AWS Management Console, and choose the AWS Region that you're using for the DynamoDB table and the spill bucket.

  2. Open the Serverless Application Repository at https://console.aws.amazon.com/serverlessrepo/.

  3. In the navigation pane, choose Available Applications.

  4. For AWS Identity and Access Management (IAM) access, under the search bar, select the Show apps that create custom IAM roles or resource policies check box.

  5. Search for and select AthenaDynamoDBConnector, and ensure that the author listed is Amazon Athena Federation.

  6. In Application Settings, enter the following values:

    • SpillBucket ‒ Location that the function can spill data to.

    • AthenaCatalogName ‒ The name of the Lambda function that will be created. The name will also be used as the Data Source Name in Athena.

  7. Select the check box to acknowledge the creation of IAM roles and policies.

  8. Choose Deploy.

AWS DevOps

Create a data source for Athena.

To create the data source, do the following:

  1. Open the Athena console at

    https://console.aws.amazon.com/athena/.

  2. Expand the navigation pane, and choose Data sources.

  3. Choose Create data source.

  4. Choose Amazon DynamoDB.

  5. Enter the Data Source Name.

  6. Select the Lambda function that you created.

  7. Review the details, and choose Create data source.

AWS DevOps

Use Athena to query the DynamoDB table.

To query the DynamoDB table, do the following:

  1. On the Athena console, expand the navigation pane, and choose Query editor.

  2. On the Data source dropdown list, choose the data source that you created.

  3. Confirm that the DynamoDB tables are listed under Tables.

  4. Run the query.

App developer

Troubleshooting

IssueSolution

Query fails with GENERIC_INTERNAL_ERROR: The bucket is in this region: <region>.

Make sure that the Athena spill bucket and Lambda function are created in the same AWS Region.

The newly created data source isn't visible on the Athena console.

Athena data catalogs are regional. Ensure the AthenaDynamoDBConnector has been deployed in the Region where you are trying to use Athena.

You're unable to run the query against the newly created data source.

Check that the Query Result Location has been set.

Related resources