Querying Delta Lake tables
Delta LakeMSCK REPAIR
statement.
The Delta Lake format stores the minimum and maximum values per column of each data file. The Athena implementation makes use of this information to enable file-skipping on predicates to eliminate unwanted files from consideration.
Considerations and limitations
Delta Lake support in Athena has the following limitations:
-
Tables with AWS Glue catalog only – Native Delta Lake support is supported only through tables registered with AWS Glue. If you have a Delta Lake table that is registered with another metastore, you can still keep it and treat it as your primary metastore. Because Delta Lake metadata is stored in the file system (for example, in Amazon S3) rather than in the metastore, Athena requires only the location property in AWS Glue to read from your Delta Lake tables.
-
V3 engine only – Delta Lake queries are supported only on Athena engine version 3. You must ensure that the workgroup you create is configured to use Athena engine version 3.
-
No time travel support – There is no support for queries that use Delta Lake’s time travel capabilities.
-
Read only – Write DML statements like
UPDATE
,INSERT
, orDELETE
are not supported. -
Lake Formation support – Lake Formation integration is not available for native Delta Lake tables. Table, column, row, and cell-level permissions are supported for symlink tables. For more information, see Using AWS Lake Formation with Amazon Athena and Set up permissions for a Delta Lake table in the AWS Lake Formation Developer Guide.
-
Limited DDL support – The following DDL statements are supported:
CREATE EXTERNAL TABLE
,SHOW COLUMNS
,SHOW TBLPROPERTIES
,SHOW PARTITIONS
,SHOW CREATE TABLE
, andDESCRIBE TABLE
. For information on using theCREATE EXTERNAL TABLE
statement, see the Getting started section.
Supported non-partition column data types
For non-partition columns, all data types that Athena supports except CHAR
are supported (CHAR
is not supported in the Delta Lake protocol itself).
Supported data types include:
boolean tinyint smallint integer bigint double float decimal varchar string binary date timestamp array map struct
Supported partition column data types
For partition columns, Athena supports tables with the following data types:
boolean integer smallint tinyint bigint decimal float double date timestamp varchar
For more information about the data types in Athena, see Data types in Amazon Athena.
Getting started
To be queryable, your Delta Lake table must exist in AWS Glue. If your table is in Amazon S3
but not in AWS Glue, run a CREATE EXTERNAL TABLE
statement using the following
syntax. If your table already exists in AWS Glue (for example, because you are using Apache
Spark or another engine with AWS Glue), you can skip this step.
CREATE EXTERNAL TABLE [
db_name
.]table_name
LOCATION 's3://DOC-EXAMPLE-BUCKET
/your-folder
/' TBLPROPERTIES ('table_type' = 'DELTA')
Note the omission of column definitions, SerDe library, and other table properties. Unlike traditional Hive tables, Delta Lake table metadata are inferred from the Delta Lake transaction log and synchronized directly to AWS Glue.
Note
For Delta Lake tables, CREATE TABLE
statements that include more
than the LOCATION
and table_type
property are not
allowed.
Reading Delta Lake tables
To query a Delta Lake table, use standard SQL SELECT
syntax:
[ WITH with_query [, ...] ]SELECT [ ALL | DISTINCT ] select_expression [, ...] [ FROM from_item [, ...] ] [ WHERE condition ] [ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ] [ HAVING condition ] [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ] [ ORDER BY expression [ ASC | DESC ] [ NULLS FIRST | NULLS LAST] [, ...] ] [ OFFSET count [ ROW | ROWS ] ] [ LIMIT [ count | ALL ] ]
For more information about SELECT
syntax, see SELECT in the Athena documentation.
The Delta Lake format stores the minimum and maximum values per column of each data file. Athena makes use of this information to enable file skipping on predicates to eliminate unnecessary files from consideration.
Synchronizing Delta Lake metadata
Athena synchronizes table metadata, including schema, partition columns, and table
properties, to AWS Glue if you use Athena to create your Delta Lake table. As time passes,
this metadata can lose its synchronization with the underlying table metadata in the
transaction log. To keep your table up to date, you can use the AWS Glue crawler for Delta
Lake tables. For more information, see Introducing native Delta Lake table support with AWS Glue crawlers