What is Amazon Athena?

Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL. With a few actions in the AWS Management Console, you can point Athena at your data stored in Amazon S3 and begin using standard SQL to run ad-hoc queries and get results in seconds.

Athena is serverless, so there is no infrastructure to set up or manage, and you pay only for the queries you run. Athena scales automatically, executing queries in parallel, so results are fast even with large datasets and complex queries.

When should I use Athena?

Athena helps you analyze data stored in Amazon S3. You can use Athena to run ad-hoc queries using ANSI SQL, without the need to aggregate or load the data into Athena. Athena can process unstructured, semi-structured, and structured datasets, including CSV, JSON, and columnar data formats such as Apache Parquet and Apache ORC. Athena integrates with Amazon QuickSight for easy visualization. You can also use Athena to generate reports or to explore data with business intelligence tools or SQL clients connected through a JDBC driver.

Accessing Athena

There are currently two ways to access Athena: using the AWS Management Console or through a JDBC connection. To get started with the console, see Getting Started. To learn how to use the JDBC driver, see Accessing Amazon Athena with JDBC.

Creating Tables

Before you can create tables, it is important to first know what is meant by the terms "database" and "table."

What are tables?

A table is a definition of how your data is stored. Tables are essentially metadata that describes your data in a way similar to a relation, although tables and databases in Athena do not represent a true relational database.

What are databases?

In Athena, a database is simply a logical grouping of tables. Synonymous terms include catalog and namespace.

Athena uses an internal data catalog to store information and schemas about the databases and tables that you create for your data stored in Amazon S3. You can modify the catalog using data definition language (DDL) statements or through the AWS Management Console. Any schemas you define are automatically saved unless you explicitly delete them. Athena applies schemas on read, which means that your table definitions are applied to your data in Amazon S3 when queries are executed. No data loading or transformation is required. You can delete table definitions and schemas without affecting the underlying data stored in Amazon S3.
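The schema-on-read idea can be illustrated outside Athena with a minimal Python sketch. The raw text stands in for an object stored in Amazon S3, and the schema is a separate definition applied only when rows are read. All names here (RAW_DATA, SCHEMA, read_with_schema) are hypothetical, and the sketch is an analogy, not a description of how Athena is implemented.

```python
import csv
import io

# Raw file contents, standing in for an object stored in Amazon S3 (hypothetical data).
RAW_DATA = "2017-01-01,GET,200\n2017-01-02,POST,404\n"

# A table definition: column names and types only. It holds no data.
SCHEMA = [("request_date", str), ("method", str), ("status", int)]

def read_with_schema(raw_text, schema):
    """Apply the schema to each raw row at read time and yield one dict per row."""
    for row in csv.reader(io.StringIO(raw_text)):
        yield {name: cast(value) for (name, cast), value in zip(schema, row)}

rows = list(read_with_schema(RAW_DATA, SCHEMA))
for row in rows:
    print(row)
```

Discarding SCHEMA leaves RAW_DATA untouched, which mirrors the point that deleting a table definition does not affect the underlying data.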

Amazon Athena uses Presto, a distributed SQL engine, to execute your queries. You define tables using Hive DDL statements in the Athena Query Editor in the console. Athena includes examples with sample data that show you how to do this, as well as a wizard to help you create a table based on data stored in Amazon S3.
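As a rough sketch of what such a Hive DDL statement can look like, the following Python helper assembles a CREATE EXTERNAL TABLE statement as text that could be pasted into the Query Editor. The database, table, column, and bucket names are hypothetical, and the comma-delimited row format is an assumption about the data layout.

```python
def build_create_table_ddl(database, table, columns, location):
    """Return a Hive-style CREATE EXTERNAL TABLE statement as text.

    columns is a list of (name, hive_type) pairs; location is an S3 prefix.
    """
    column_lines = ",\n".join(f"  {name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{column_lines}\n"
        ")\n"
        # Assumes comma-separated text files; other formats need a different clause.
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{location}'"
    )

# Hypothetical names used purely for illustration.
ddl = build_create_table_ddl(
    "example_db",
    "access_logs",
    [("request_date", "STRING"), ("method", "STRING"), ("status", "INT")],
    "s3://example-bucket/logs/",
)
print(ddl)
```

The statement only registers metadata. The files under the LOCATION prefix are not moved or modified.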

For more information, see Creating Databases and Tables.

Querying Data

You query data using the Athena Query Editor window that you used to create your table. Athena enables you to write DDL statements or SQL queries directly in the Query Editor. Results are automatically stored in Amazon S3. You can change the base prefix under which results are stored by choosing a setting. You also have the option to download results in CSV format. Athena supports ANSI SQL standard queries.
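Once results have been downloaded in CSV format, they can be processed with ordinary tooling. The sketch below parses a results file with Python's standard csv module. The query in the comment, the column names, and the sample values are hypothetical, and the quoted layout with a header row is an assumption about the downloaded file.

```python
import csv
import io

# Hypothetical contents of a downloaded results file for a query such as:
#   SELECT status, COUNT(*) AS requests FROM access_logs GROUP BY status
RESULTS_CSV = '"status","requests"\n"200","1520"\n"404","37"\n'

def load_results(csv_text):
    """Parse CSV query results into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))

results = load_results(RESULTS_CSV)

# CSV values arrive as strings, so convert before doing arithmetic.
total_requests = sum(int(row["requests"]) for row in results)
print(total_requests)
```

For a real file, open it with open(path, newline="") and pass the file object to csv.DictReader in place of the in-memory text.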

For more information, see Getting Started.

How to Get Started with Athena

  • See the Getting Started tutorial for an in-depth walkthrough of how to create a table and write queries in the Athena Query Editor.
  • Run the Athena onboarding tutorial in the console. You can do this by logging in to the AWS Management Console for Athena.