Limitations - AWS Glue

Limitations

Consider the following limitations before you use data lake frameworks with AWS Glue.

  • The following AWS Glue GlueContext methods for DynamicFrame don't support reading or writing data lake framework tables. Use the GlueContext methods for DataFrame or the Spark DataFrame API instead.

    • The following GlueContext methods for DynamicFrame are not supported with Lake Formation permission control:

      • create_dynamic_frame.from_catalog

      • write_dynamic_frame.from_catalog

      • getDynamicFrame

      • writeDynamicFrame

    • The following GlueContext methods for DataFrame are supported with Lake Formation permission control:

      • create_data_frame.from_catalog

      • write_data_frame.from_catalog

      • getDataFrame

      • writeDataFrame

  • Grouping small files is not supported.

  • Job bookmarks are not supported.

  • Apache Hudi 0.10.1 for AWS Glue 3.0 doesn't support Hudi Merge on Read (MoR) tables.

  • ALTER TABLE … RENAME TO is not available for Apache Iceberg 0.13.1 for AWS Glue 3.0.
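As a minimal sketch of the DataFrame-based alternative named in the first item above, the helpers below wrap create_data_frame.from_catalog and write_data_frame.from_catalog. The helper names and the database and table arguments are placeholders for illustration. glue_context is assumed to be the GlueContext instance of a running AWS Glue job, so nothing here imports the awsglue library directly.

```python
def read_table_as_dataframe(glue_context, database, table_name):
    """Read a Data Catalog table as a Spark DataFrame instead of a DynamicFrame.

    glue_context is assumed to be an awsglue.context.GlueContext created by the job.
    """
    return glue_context.create_data_frame.from_catalog(
        database=database,
        table_name=table_name,
    )


def write_dataframe_to_table(glue_context, data_frame, database, table_name,
                             additional_options=None):
    """Write a Spark DataFrame to a Data Catalog table.

    additional_options carries any format-specific settings; it defaults to
    an empty dict when the caller passes nothing.
    """
    return glue_context.write_data_frame.from_catalog(
        frame=data_frame,
        database=database,
        table_name=table_name,
        additional_options=additional_options or {},
    )
```

In a job script, you would pass the job's own GlueContext object and the names of your catalog database and table. The DynamicFrame counterparts of these calls are the ones listed above as unsupported.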

Limitations for data lake format tables managed by Lake Formation permissions

The data lake formats are integrated with AWS Glue ETL via Lake Formation permissions. Creating a DynamicFrame using create_dynamic_frame is not supported; use the GlueContext methods for DataFrame instead.

Note

The integration with AWS Glue ETL via Lake Formation permissions for Apache Hudi, Apache Iceberg, and Delta Lake is supported only in AWS Glue version 4.0.

Apache Iceberg has the best integration with AWS Glue ETL via Lake Formation permissions. It supports almost all operations and includes SQL support.

Hudi supports most basic operations, with the exception of administrative operations. This is because Hudi operations are generally performed by writing DataFrames, with their settings specified via additional_options. Because Spark SQL is not supported, you need to use AWS Glue APIs to create DataFrames for your operations.
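The pattern described above might look like the following sketch, which collects Hudi write settings into additional_options and passes them to write_data_frame.from_catalog. The helper names are hypothetical and every value is a placeholder. The hoodie.* keys are standard Hudi write configuration keys, but which of them your table needs is an assumption to verify against your own setup. glue_context is assumed to be the job's GlueContext instance.

```python
def build_hudi_write_options(table_name, record_key_field, precombine_field,
                             operation="upsert"):
    """Assemble Hudi write settings as a plain dict for additional_options."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": operation,
    }


def write_hudi_dataframe(glue_context, data_frame, database, table_name, options):
    """Write a Spark DataFrame to a Hudi table through the Glue DataFrame API."""
    return glue_context.write_data_frame.from_catalog(
        frame=data_frame,
        database=database,
        table_name=table_name,
        additional_options=options,
    )
```

Keeping the options in a separate builder makes it easy to inspect or log the exact settings before the write runs.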

Delta Lake supports only reading, appending, and overwriting table data. Other tasks, such as updates, require the use of Delta Lake's own libraries.

The following features are not available for Iceberg tables managed by Lake Formation permissions:

  • Compaction using AWS Glue ETL

  • Spark SQL support via AWS Glue ETL

The following features are not available for Hudi tables managed by Lake Formation permissions:

  • Removal of orphaned files

The following are limitations of Delta Lake tables managed by Lake Formation permissions:

  • All features other than inserting into and reading from Delta Lake tables.
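One way to act on these lists is a pre-flight check that fails fast before a job attempts an unavailable operation. The sketch below encodes only what this section states; the feature labels (such as "compaction" or "orphan_file_removal") and the function name are hypothetical names chosen for illustration, not AWS Glue identifiers.

```python
# Features this section lists as unavailable under Lake Formation permissions.
# The string labels are hypothetical names made up for this sketch.
UNAVAILABLE_FEATURES = {
    "iceberg": {"compaction", "spark_sql"},
    "hudi": {"orphan_file_removal", "spark_sql"},
}

# Delta Lake is described as an allow-list: only these operations are supported.
DELTA_ALLOWED_FEATURES = {"read", "append", "overwrite"}


def is_available(table_format, feature):
    """Return True if this section does not rule out the feature for the format."""
    fmt = table_format.lower()
    if fmt == "delta":
        return feature in DELTA_ALLOWED_FEATURES
    if fmt not in UNAVAILABLE_FEATURES:
        raise ValueError(f"unknown table format: {table_format!r}")
    return feature not in UNAVAILABLE_FEATURES[fmt]
```

For example, is_available("iceberg", "compaction") returns False, while is_available("delta", "append") returns True. A True result only means this section does not exclude the feature; it is not a guarantee that the operation will succeed.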