Creating Databases and Tables
Amazon Athena uses Apache Hive data definition language (DDL) statements to define tables. You can run DDL statements using the Athena console, using a JDBC driver, or using the Athena create table wizard. When you create a new table schema in Athena, the schema is stored in a data catalog and used when executing queries, but it does not modify your data in Amazon S3.
Athena uses an approach known as schema-on-read, which means a schema is projected on to your data at the time you execute a query. This eliminates the need for data loading or transformation.
Athena uses Apache Hive to define tables and create databases, which are essentially a logical namespace of tables. When you create a database and table in Athena, you are simply describing the schema and where the table data are located in Amazon S3 for read-time querying. Database and table, therefore, have a slightly different meaning than they do for traditional relational database systems because the data isn't stored along with the schema definition for the database and table. When you query, you query the table using standard SQL and the data is read at that time. You can find guidance for how to create databases and tables using Apache Hive documentation, but the following provides guidance specifically for Athena.
Hive supports multiple data formats through the use of serializer-deserializer (SerDe) libraries. You can also define complex schemas using regular expressions. For a list of supported SerDes, see Supported Formats and SerDes.
The other benefit of using Hive is that the metastore found in Hive can be used in many other big data applications such as Spark, Hadoop, and Presto. The Athena catalog enables you to have this same Hive-compatible metastore in the cloud without needing to provision a cluster or RDS instance to host the metastore.
When you create a table, you specify an S3 bucket location for the underlying data using the
LOCATION clause. Consider the following:
- The data can be in a different region from the primary region where you run Athena. If this is the case, standard inter-region data transfer rates for Amazon S3 apply in addition to standard Athena charges.
- You must have the appropriate permissions to work with data in the Amazon S3 location. For more information, see Setting User and Amazon S3 Bucket Permissions.
- If the data is encrypted in Amazon S3, you must also have the appropriate permissions to decrypt the data. For more information, see Configuring Encryption Options.
- Athena does not support different storage classes within the bucket specified by the
LOCATIONclause, does not support the
GLACIERstorage class, and does not support Requester Pays buckets. For more information, see `Storage Classes <http://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html`_, Changing the Storage Class of an Object in |S3|, and Requester Pays Buckets in the Amazon Simple Storage Service Developer Guide.
The functions supported in Athena queries are those found within Presto. For more information, see Functions and Operators in the Presto documentation.
Athena does not support, for example,
CREATE TABLE AS SELECT, which creates a table from the result
of a SELECT query statement.
Athena does not currently support transaction-based operations on table data.
When you create, update, or delete tables, those operations are guaranteed ACID-compliant. For example, if multiple users or clients attempt to create or alter an existing table at the same time, only one will be successful.
If you use CREATE TABLE without the EXTERNAL keyword, you get an error; only tables with the EXTERNAL keyword can be created. We recommend that you always use the EXTERNAL keyword. When you drop a table in Athena, only the table metadata is removed; the data remains in Amazon S3.