When you use Athena to read Apache Hudi tables, consider the following points.
- Incremental queries – Athena does not support incremental queries.
- CTAS – Athena does not support CTAS or INSERT INTO on Hudi data. If you would like Athena support for writing Hudi datasets, send feedback to <athena-feedback@amazon.com>. For more information about writing Hudi data, see the following resources, and the sketch after this list:
  - Working with a Hudi dataset in the Amazon EMR Release Guide.
  - Writing Data in the Apache Hudi documentation.
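  For illustration, the following minimal sketch shows the kind of write operation this limitation covers; the table name is hypothetical:

  ```sql
  -- Hypothetical table name; writing to a Hudi table is not supported in Athena.
  INSERT INTO hudi_sales VALUES (101, 'widget');
  ```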
- MSCK REPAIR TABLE – Using MSCK REPAIR TABLE on Hudi tables in Athena is not supported. If you need to load a Hudi table not created in AWS Glue, use ALTER TABLE ADD PARTITION, as shown in the sketch that follows.
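  A minimal sketch of that approach; the table name, partition key, and Amazon S3 location are hypothetical:

  ```sql
  -- Hypothetical names; point each partition at its location under the Hudi table path.
  ALTER TABLE my_hudi_table ADD IF NOT EXISTS
    PARTITION (dt = '2023-01-01')
    LOCATION 's3://amzn-s3-demo-bucket/my_hudi_table/dt=2023-01-01/';
  ```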
- Skipping S3 Glacier objects not supported – If objects in the Apache Hudi table are in an Amazon S3 Glacier storage class, setting the read_restored_glacier_objects table property to false has no effect. For example, suppose you issue the following command:

  ```sql
  ALTER TABLE table_name SET TBLPROPERTIES ('read_restored_glacier_objects' = 'false')
  ```

  For Iceberg and Delta Lake tables, the command produces the error Unsupported table property key: read_restored_glacier_objects. For Hudi tables, the ALTER TABLE command does not produce an error, but Amazon S3 Glacier objects are still not skipped. Running SELECT queries after the ALTER TABLE command continues to return all objects.
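  A hypothetical sequence illustrating this behavior (table name invented):

  ```sql
  -- The property is accepted without error on a Hudi table...
  ALTER TABLE my_hudi_table SET TBLPROPERTIES ('read_restored_glacier_objects' = 'false');

  -- ...but subsequent queries still read the S3 Glacier objects rather than skipping them.
  SELECT COUNT(*) FROM my_hudi_table;
  ```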
- Timestamp queries – Currently, queries that attempt to read timestamp columns in Hudi real time tables either fail or produce empty results. This limitation applies only to queries that read a timestamp column; queries that include only non-timestamp columns from the same table succeed. Failed queries return a message similar to the following:

  ```
  GENERIC_INTERNAL_ERROR: class org.apache.hadoop.io.ArrayWritable cannot be cast to class org.apache.hadoop.hive.serde2.io.TimestampWritableV2 (org.apache.hadoop.io.ArrayWritable and org.apache.hadoop.hive.serde2.io.TimestampWritableV2 are in unnamed module of loader io.trino.server.PluginClassLoader @75c67992)
  ```
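  For illustration, assuming a hypothetical Hudi real time table with an id column, a name column, and an updated_at timestamp column:

  ```sql
  -- Fails or returns empty results: the query reads a timestamp column.
  SELECT id, updated_at FROM my_hudi_rt_table;

  -- Succeeds: the query reads only non-timestamp columns.
  SELECT id, name FROM my_hudi_rt_table;
  ```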