Amazon Athena

What is Amazon Athena?

Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL. With a few actions in the AWS Management Console, you can point Athena at your data stored in Amazon S3 and begin using standard SQL to run ad-hoc queries and get results in seconds.

Athena is serverless, so there is no infrastructure to set up or manage, and you pay only for the queries you run. Athena scales automatically—executing queries in parallel—so results are fast, even with large datasets and complex queries.

When should I use Athena?

Athena helps you analyze unstructured, semi-structured, and structured data stored in Amazon S3. Examples include CSV, JSON, or columnar data formats such as Apache Parquet and Apache ORC. You can use Athena to run ad-hoc queries using ANSI SQL, without the need to aggregate or load the data into Athena.

Athena integrates with the AWS Glue Data Catalog, which offers a persistent metadata store for your data in Amazon S3. This allows you to create tables and query data in Athena based on a central metadata store available throughout your AWS account and integrated with the ETL and data discovery features of AWS Glue. For more information, see Integration with AWS Glue and What is AWS Glue in the AWS Glue Developer Guide.

Athena integrates with Amazon QuickSight for easy data visualization.

You can use Athena to generate reports or to explore data with business intelligence tools or SQL clients connected with a JDBC driver. For more information, see What is Amazon QuickSight in the Amazon QuickSight User Guide and Connecting to Amazon Athena with JDBC.

You can create named queries with AWS CloudFormation and run them in Athena. A named query maps a name to a query, so you can run the query repeatedly by referring to its name. For more information, see CreateNamedQuery in the Amazon Athena API Reference, and AWS::Athena::NamedQuery in the AWS CloudFormation User Guide.
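
As a rough illustration, the following Python sketch builds a minimal CloudFormation template that declares one AWS::Athena::NamedQuery resource and prints it as JSON. The logical ID, database name, query name, and query text are hypothetical placeholders; only the resource type and its Database, QueryString, Name, and Description properties come from the CloudFormation resource definition.

```python
import json


def named_query_template(logical_id, database, name, query, description=""):
    """Return a CloudFormation template (as a dict) with one Athena named query."""
    properties = {
        "Database": database,    # database the saved query runs against
        "QueryString": query,    # SQL text stored under the query name
        "Name": name,            # name used to refer to the query later
    }
    if description:
        properties["Description"] = description
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            logical_id: {
                "Type": "AWS::Athena::NamedQuery",
                "Properties": properties,
            }
        },
    }


# Hypothetical names, for illustration only.
template = named_query_template(
    logical_id="TopPagesQuery",
    database="weblogs",
    name="top_pages",
    query=(
        "SELECT page, COUNT(*) AS hits FROM requests "
        "GROUP BY page ORDER BY hits DESC LIMIT 10"
    ),
)
template_json = json.dumps(template, indent=2)
print(template_json)
```

The printed JSON could be saved to a file and deployed as a CloudFormation stack; the deployment step itself is outside the scope of this sketch.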

Accessing Athena

You can access Athena using the AWS Management Console, through a JDBC connection, using the Athena API, or using the AWS CLI.

Understanding Tables, Databases, and the Data Catalog

Before you create tables and run queries, it may be helpful to know what the terms table, database, and data catalog mean with respect to Athena, and how those relate to the underlying datasets you want to work with and analyze.

In Athena, tables and databases are metadata definitions that define a schema for underlying source data. Athena stores this metadata in the AWS Glue Data Catalog and retrieves it when you run queries against the underlying dataset. A database is a logical grouping of tables that helps you organize them; like a table, it consists only of metadata.

The metadata for a table tells Athena where the data is located in Amazon S3, specifies the structure of the source data, specifies the schema of the table (for example, the column names, data types, and name of the table), and so on.
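
To make the pieces of table metadata concrete, here is a minimal Python sketch that models them and renders a CREATE EXTERNAL TABLE statement. The TableDefinition class, the to_ddl helper, and the example database, table, column, and bucket names are all hypothetical; the statement uses the STORED AS and LOCATION clauses of Hive-style DDL and assumes Parquet source data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TableDefinition:
    """Hypothetical container for the metadata that describes one table."""
    database: str
    name: str
    columns: List[Tuple[str, str]]  # (column name, Hive data type)
    location: str                   # S3 prefix that holds the data files
    stored_as: str = "PARQUET"      # structure (file format) of the source data


def to_ddl(table: TableDefinition) -> str:
    """Render the definition as a CREATE EXTERNAL TABLE statement."""
    cols = ",\n".join(f"  `{col}` {dtype}" for col, dtype in table.columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table.database}.{table.name} (\n"
        f"{cols}\n"
        f")\n"
        f"STORED AS {table.stored_as}\n"
        f"LOCATION '{table.location}'"
    )


# Hypothetical example values.
requests_table = TableDefinition(
    database="weblogs",
    name="requests",
    columns=[("request_time", "timestamp"), ("page", "string"), ("status", "int")],
    location="s3://example-bucket/weblogs/requests/",
)
ddl = to_ddl(requests_table)
print(ddl)
```

Nothing in the statement copies or loads data: it only records where the files are and how to interpret them, which is why dropping such a table leaves the data in Amazon S3 untouched.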

The AWS Glue Data Catalog is accessible throughout your AWS account. Other AWS services can share the AWS Glue Data Catalog, so databases and tables created elsewhere in your organization are visible in Athena, and those created in Athena are visible to the other services. In addition, AWS Glue has powerful features to automatically discover data schema and to extract, transform, and load (ETL) data.

From within Athena, you can use an AWS Glue crawler to create a table automatically.

For more information about AWS Glue and crawlers, see Integration with AWS Glue.
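
A crawler can also be set up programmatically. The sketch below is one possible way to do it through the AWS Glue API rather than from the Athena console: it builds the request for the Glue create_crawler operation and then starts the crawler. The crawler name, IAM role ARN, database name, and S3 path are hypothetical, and the client is passed in so that the code assumes nothing about how credentials are configured.

```python
def crawler_request(name, role_arn, database, s3_path):
    """Build the keyword arguments for the Glue create_crawler operation."""
    return {
        "Name": name,
        "Role": role_arn,            # IAM role the crawler assumes (hypothetical)
        "DatabaseName": database,    # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(glue_client, request):
    """Create the crawler, then start one run of it."""
    glue_client.create_crawler(**request)
    glue_client.start_crawler(Name=request["Name"])


# Hypothetical example values.
crawler = crawler_request(
    name="weblogs-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="weblogs",
    s3_path="s3://example-bucket/weblogs/",
)
```

With boto3 installed and credentials configured (both assumptions), the call would look like create_and_start_crawler(boto3.client("glue"), crawler). Tables that the crawler creates then appear in the AWS Glue Data Catalog and can be queried from Athena.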

Note

If you have tables in Athena created before August 14, 2017, they were created in an Athena-managed data catalog that exists side-by-side with the AWS Glue Data Catalog until you choose to update. For more information, see Upgrading to the AWS Glue Data Catalog Step-by-Step.

You can also choose to create tables and databases manually. When you do, Athena uses HiveQL data definition language (DDL) statements such as CREATE TABLE, CREATE DATABASE, and DROP TABLE under the hood to create tables and databases in the AWS Glue Data Catalog.

You can create tables and databases manually in Athena in the following ways:

  • Use the AWS Management Console for Athena to run the Create Table Wizard.
  • Use the AWS Management Console for Athena to write Hive DDL statements in the Query Editor.
  • Use the Athena API or the AWS CLI to execute a SQL query string with DDL statements.
  • Use the Athena JDBC driver.
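
The API option in the list above can be sketched in Python as follows. This is a minimal example under stated assumptions, not a complete client: the database name, the DDL text, and the S3 output location are hypothetical, and the client object is expected to expose the Athena start_query_execution and get_query_execution operations (for example, a boto3 Athena client).

```python
import time


def build_query_request(sql, database, output_location):
    """Build the keyword arguments for the Athena start_query_execution operation."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_statement(athena_client, request, poll_seconds=1.0):
    """Submit a statement and poll until it reaches a terminal state."""
    execution_id = athena_client.start_query_execution(**request)["QueryExecutionId"]
    while True:
        response = athena_client.get_query_execution(QueryExecutionId=execution_id)
        state = response["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return execution_id, state
        time.sleep(poll_seconds)  # still QUEUED or RUNNING


# Hypothetical example values.
request = build_query_request(
    sql="CREATE DATABASE IF NOT EXISTS weblogs",
    database="default",
    output_location="s3://example-bucket/athena-results/",
)
```

Assuming boto3 is installed and credentials are configured, run_statement(boto3.client("athena"), request) would submit the DDL statement and return its execution ID together with the final state. The same two functions work for SELECT queries, because Athena treats DDL and queries as query strings submitted through the same operation.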

When you query an existing table, Athena uses Presto, a distributed SQL engine, under the hood. Athena includes examples with sample data that show how to create a table and then query it. The console also has a tutorial that helps you get started creating a table based on data stored in Amazon S3.

  • For a step-by-step tutorial on creating a table and writing queries in the Athena Query Editor, see Getting Started.
  • Run the Athena tutorial in the console. The tutorial launches automatically the first time you log in to the AWS Management Console for Athena. You can also choose Tutorial in the console to launch it.