Amazon Athena
User Guide  | API Reference

Managing the Athena Catalog

Amazon Athena uses an internal data catalog to store information and schemas about the databases and tables that you create for your data stored in Amazon S3. You can modify the catalog using DDL statements or the AWS Management Console. Any schemas that you define are automatically saved unless you explicitly delete them. Athena applies schemas on-read, which means that your table definitions are applied to your data in Amazon S3 when queries are executed. There is no data loading or transformation required. You can delete table definitions and schema without impacting the underlying data stored on Amazon S3.

Browse the catalog#

  1. Open the AWS Management Console for Athena.

  2. If you have not used Athena, you see a Getting Started page. Choose Get Started.

  3. Choose Catalog Manager.

    _images/choose_catalog.png
  4. Select a database, for example, default.

  5. In the database, select a table. The table display shows the schema for the table on the Columns tab.

    _images/catalog_columns.png

Other table information, including table data location, can be found on the Properties tab.

To create a table using the wizard#

  1. Open the AWS Management Console for Athena.
  2. Under the database display in the Query Editor, choose Add table, which displays a wizard.
  3. Follow the steps for creating your table.

To create a database using Hive DDL#

  1. Open the AWS Management Console for Athena.

  2. Choose Query Editor.

  3. Enter CREATE DATABASE myDataBase and choose Run Query.

    _images/createdatabase.png
  4. Select your database from the menu. It is likely to be an empty database.

    _images/catalogdashboard.png

To create a table using Hive DDL#

The Athena Query Editor displays the current database. If you create a table without further qualification, the database where the table is created is the one chosen in the Databases section on the Catalog tab.

  1. In the database that you created in To create a database using Hive DDL, create a table by entering the following statement and choosing Run Query:

    CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs (
        `Date` Date,
        Time STRING,
        Location STRING,
        Bytes INT,
        RequestIP STRING,
        Method STRING,
        Host STRING,
        Uri STRING,
        Status INT,
        Referrer STRING,
        OS String,
        Browser String,
        BrowserVersion String
    ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
    "input.regex" = "^(?!#)([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+[^\(]+[\(]([^\;]+).*\%20([^\/]+)[\/](.*)$"
    ) LOCATION 's3://athena-examples/cloudfront/plaintext/';
    
  2. If the table was successfully created, you can then run queries against your data. For more information, see Getting Started.