Use non-Hive table formats in Athena for Spark

When you work with sessions and notebooks in Athena for Spark, you can use Linux Foundation Delta Lake, Apache Hudi, and Apache Iceberg tables, in addition to Apache Hive tables.

Considerations and limitations

When you use table formats other than Apache Hive with Athena for Spark, consider the following points:

  • In addition to Apache Hive, only one table format is supported per notebook. To use multiple table formats in Athena for Spark, create a separate notebook for each table format. For information about creating notebooks in Athena for Spark, see Step 7: Create your own notebook.

  • The Delta Lake, Hudi, and Iceberg table formats have been tested on Athena for Spark by using AWS Glue as the metastore. You might be able to use other metastores, but such usage is not currently supported.

  • To use the additional table formats, override the default spark_catalog property, as indicated in the Athena console and in this documentation. These non-Hive catalogs can read Hive tables, in addition to their own table formats.
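As an illustration of the spark_catalog override described above, the following sketch collects one set of Spark properties per table format. The catalog and extension class names come from the upstream Iceberg, Hudi, and Delta Lake projects and are assumptions here, as is the hypothetical helper spark_properties_for; confirm the exact values against what the Athena console shows for your notebook.

```python
from typing import Dict

# Spark property overrides that point spark_catalog at a non-Hive table format.
# ASSUMPTION: class names are taken from the upstream Iceberg, Hudi, and
# Delta Lake projects; verify them against the Athena console before use.
CATALOG_PROPERTIES: Dict[str, Dict[str, str]] = {
    "iceberg": {
        "spark.sql.catalog.spark_catalog": "org.apache.iceberg.spark.SparkSessionCatalog",
        "spark.sql.catalog.spark_catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        "spark.sql.catalog.spark_catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    },
    "hudi": {
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.hudi.catalog.HoodieCatalog",
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.sql.extensions": "org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
    },
    "delta": {
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    },
}


def spark_properties_for(table_format: str) -> Dict[str, str]:
    """Return a copy of the overrides for one table format (hypothetical helper)."""
    key = table_format.strip().lower()
    if key not in CATALOG_PROPERTIES:
        supported = ", ".join(sorted(CATALOG_PROPERTIES))
        raise ValueError(f"Unknown table format {table_format!r}; expected one of: {supported}")
    return dict(CATALOG_PROPERTIES[key])


if __name__ == "__main__":
    import json

    # Print the overrides for a single format, since a notebook uses only one.
    print(json.dumps(spark_properties_for("iceberg"), indent=2))
```

Because only one non-Hive format is supported per notebook, the helper returns exactly one set of overrides rather than merging several; each notebook would be given the properties for its own format.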

Table versions

The following table lists the version of each non-Hive table format that Amazon Athena for Apache Spark supports.

Table format                   Supported version
Apache Iceberg                 1.2.1
Apache Hudi                    0.13
Linux Foundation Delta Lake    2.0.2

In Athena for Spark, these table format .jar files and their dependencies are loaded onto the classpath for Spark drivers and executors.

For an AWS Big Data Blog post that shows how to work with Iceberg, Hudi, and Delta Lake table formats using Spark SQL in Amazon Athena notebooks, see Use Amazon Athena with Spark SQL for your open-source transactional table formats.