Getting Started
This tutorial walks you through using Amazon Athena to query data. You'll create a table based on sample data stored in Amazon Simple Storage Service, query the table, and check the results of the query.
The tutorial is using live resources, so you are charged for the queries that you run. You aren't charged for the sample data in the location that this tutorial uses, but if you upload your own data files to Amazon S3, charges do apply.
Prerequisites
-
If you have not already done so, sign up for an account in Setting Up.
-
Using the same AWS Region (for example, US West (Oregon)) and account that you are using for Athena, Create a bucket in Amazon S3 to hold your query results from Athena.
Step 1: Create a Database
You first need to create a database in Athena.
To create a database
-
Open the Athena console.
-
If this is your first time visiting the Athena console, you'll go to a Getting Started page. Choose Get Started to open the Query Editor. If it isn't your first time, the Athena Query Editor opens.
-
Choose the link to set up a query result location in Amazon S3.
-
In the Settings dialog box, enter the path to the bucket that you created in Amazon S3 for your query results. Prefix the path with
s3://
and add a forward slash to the end of the path. -
Click Save.
-
In the Athena Query Editor, you see a query pane. You can type queries and statements here.
-
To create a database named
mydatabase
, enter the following CREATE DATABASE statement.CREATE DATABASE mydatabase
-
Choose Run Query or press
Ctrl+ENTER
. -
Confirm that the catalog display refreshes and
mydatabase
appears in the Database list in the navigation pane on the left.
Step 2: Create a Table
Now that you have a database, you're ready to run a statement to create a table. The
table will be based on Athena sample data in the location
s3://athena-examples-
.
The statement that creates the table defines columns that map to the data, specifies
how
the data is delimited, and specifies the Amazon S3 location that contains the sample
data.
myregion
/cloudfront/plaintext/
To create a table
-
For Database, choose
mydatabase
. -
Choose the plus (+) sign in the Query Editor to create a tab with a new query. You can have up to ten query tabs open at once.
-
In the query pane, enter the following
CREATE TABLE
statement. In theLOCATION
statement at the end of the query, replacemyregion
with the AWS Region that you are currently using (for example,us-west-1
).CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs ( `Date` DATE, Time STRING, Location STRING, Bytes INT, RequestIP STRING, Method STRING, Host STRING, Uri STRING, Status INT, Referrer STRING, os STRING, Browser STRING, BrowserVersion STRING ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( "input.regex" = "^(?!#)([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+[^\(]+[\(]([^\;]+).*\%20([^\/]+)[\/](.*)$" ) LOCATION 's3://athena-examples-
myregion
/cloudfront/plaintext/';Note Replace
myregion
ins3://athena-examples-
with the region identifier where you run Athena, for example,myregion
/path/to/data/s3://athena-examples-us-west-1/path/to/data/
. -
Choose Run Query.
The table
cloudfront_logs
is created and appears under the list of Tables for themydatabase
database.
Step 3: Query Data
Now that you have the cloudfront_logs
table created in Athena based on the
data in Amazon S3, you can run SQL queries on the table and see the results in Athena.
For
more information about using SQL in Athena, see SQL Reference for Amazon Athena.
To run a query
-
Open a new query tab and enter the following SQL statement in the query pane.
SELECT os, COUNT(*) count FROM cloudfront_logs WHERE date BETWEEN date '2014-07-05' AND date '2014-08-05' GROUP BY os;
-
Choose Run Query.
The results look like the following:
-
You can save the results of the query to a
.csv
file by choosing the download icon on the Results pane. -
Choose the History tab to view your previous queries.
-
Choose Download results to download the results of a previous query. Query history is retained for 45 days.
For more information, see Working with Query Results, Output Files, and Query History.
Connecting to Other Data Sources
This tutorial used a data source in Amazon S3 in CSV format. You can connect Athena to a variety of data sources by using AWS Glue, ODBC and JDBC drivers, external Hive metastores, and Athena data source connectors. For more information, see Connecting to Data Sources.