Open-source data lake frameworks simplify incremental data processing for files stored in data lakes built on Amazon S3. AWS Glue 3.0 and later versions support the following open-source data lake storage frameworks:

  • Apache Hudi

  • Linux Foundation Delta Lake

  • Apache Iceberg

AWS Glue provides native support for these frameworks so that you can read and write data stored in Amazon S3 in a transactionally consistent manner. You don't need to install a separate connector or complete extra configuration steps to use these frameworks in AWS Glue jobs.
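
As an illustration of how a job might name the framework it uses, the following minimal sketch builds the `--datalake-formats` job parameter, which the AWS Glue job parameter reference lists with the values `hudi`, `delta`, and `iceberg`. The helper name `datalake_job_arguments` and the constant `SUPPORTED_FORMATS` are hypothetical and are not part of any AWS library. The sketch only assembles the argument mapping and assumes nothing about the rest of the job definition.

```python
# Hypothetical helper, not an AWS API: it only builds the argument mapping
# that a Glue job definition could carry.
SUPPORTED_FORMATS = ("hudi", "delta", "iceberg")


def datalake_job_arguments(*frameworks):
    """Return job arguments that name the data lake frameworks a job uses."""
    if not frameworks:
        raise ValueError("name at least one framework")
    unknown = [name for name in frameworks if name not in SUPPORTED_FORMATS]
    if unknown:
        raise ValueError(f"unsupported framework(s): {unknown}")
    # Drop duplicates while keeping the caller's order.
    ordered = list(dict.fromkeys(frameworks))
    # Multiple frameworks are passed as one comma-separated value.
    return {"--datalake-formats": ",".join(ordered)}


if __name__ == "__main__":
    print(datalake_job_arguments("hudi"))
    print(datalake_job_arguments("delta", "iceberg"))
```

The returned mapping could, for example, be supplied as the `DefaultArguments` of a job created with the boto3 `create_job` call, together with a `GlueVersion` of 3.0 or later. Treat this as a sketch under those assumptions rather than a complete job definition.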

Data lake frameworks can be used as a source or a target within AWS Glue Studio through Spark Script Editor jobs. For more information about using Apache Hudi, Apache Iceberg, and Delta Lake, see Using data lake frameworks with AWS Glue ETL jobs.