This tutorial walks you through using Amazon Athena to query data. You'll create a table based on sample data stored in Amazon Simple Storage Service, query the table, and check the results of the query.
The tutorial is using live resources, so you are charged for the queries that you run. You aren't charged for the sample data sets that you use, but if you upload your own data files to Amazon S3, charges do apply.
You first need to create a database in Athena.
- Open the AWS Management Console for Athena.
- If this is your first time visiting the AWS Management Console for Athena, you'll go to a Getting Started page. Choose Get Started to open the Query Editor. If it isn't your first time, the Athena Query Editor opens.
- In the Athena Query Editor, you see a query pane with an example query. Start typing your query anywhere in the query pane.
To create a database named
mydatabase, enter the following CREATE DATABASE statement, and then choose Run Query:
CREATE DATABASE mydatabase
Confirm that the catalog display refreshes and
mydatabaseappears in the DATABASE list in the Catalog dashboard on the left side.
Now that you have a database, you're ready to create a table that's based on the sample data file. You define columns that map to the data, specify how the data is delimited, and provide the location in Amazon S3 for the file.
- Make sure that
mydatabaseis selected for DATABASE and then choose New Query.
- In the query pane, enter the following CREATE TABLE statement, and then choose Run Query:
NoteYou can querying data in regions other than the region where you run Athena. Standard inter-region data transfer rates for Amazon S3 apply in addition to standard Athena charges. To reduce data transfer charges, replace myregion in
s3://athena-examples-myregion/path/to/data/with the region identifier where you run Athena, for example,
s3://athena-examples-us-east-1/path/to/data/.CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs ( `Date` DATE, Time STRING, Location STRING, Bytes INT, RequestIP STRING, Method STRING, Host STRING, Uri STRING, Status INT, Referrer STRING, os STRING, Browser STRING, BrowserVersion STRING ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( "input.regex" = "^(?!#)([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+[^\(]+[\(]([^\;]+).*\%20([^\/]+)[\/](.*)$" ) LOCATION 's3://athena-examples-myregion/cloudfront/plaintext/';
The table is created and appears in the Catalog dashboard for your database.
Now that you have a table with data, you can run queries on the data and see the results in Athena.
- Choose New Query, enter the following statement anywhere in the query pane, and then choose Run Query:
SELECT os, COUNT(*) count FROM cloudfront_logs WHERE date BETWEEN date '2014-07-05' AND date '2014-08-05' GROUP BY os;
Results are returned that look like the following:
Optionally, you can save the results of a query to CSV by choosing the file icon on the Results pane.
You can also view the results of previous queries or queries that may take some time to complete. Choose History then either search for your query or choose View or Download to view or download the results of previous completed queries. This also displays the status of queries that are currently running.
Query results are also stored in Amazon S3 in a bucket called aws-athena-query-results-ACCOUNTID-REGION. You can change the default location in the console by choosing Settings in the upper right pane.