Using Data Lake frameworks with AWS Glue Studio
Overview
Open-source data lake frameworks simplify incremental data processing for files stored in data lakes built on Amazon S3. AWS Glue 3.0 and later versions support the following open-source data lake storage frameworks:
- Apache Hudi
- Linux Foundation Delta Lake
- Apache Iceberg
AWS Glue provides native support for these frameworks so that you can read and write data stored in Amazon S3 in a transactionally consistent manner. You don't need to install a separate connector or complete extra configuration steps to use these frameworks in AWS Glue jobs.
You can use data lake frameworks as a source or a target within AWS Glue Studio through Spark Script Editor jobs. For more information about using Apache Hudi, Apache Iceberg, and Delta Lake, see Using data lake frameworks with AWS Glue ETL jobs.
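A Spark Script Editor job typically opts in to a framework through the `--datalake-formats` job parameter, which accepts `hudi`, `delta`, or `iceberg`, or a comma-separated combination of them. The following is a minimal sketch of building that parameter in Python. The helper name `datalake_job_arguments` is hypothetical and is not part of any AWS SDK. The sketch covers only this one parameter; the framework-specific guides may call for additional Spark settings.

```python
# Values accepted by the --datalake-formats job parameter.
SUPPORTED_FORMATS = ("hudi", "delta", "iceberg")


def datalake_job_arguments(*frameworks):
    """Return a job-parameter dict that enables the named frameworks.

    Hypothetical helper for illustration only; it performs no AWS calls.
    """
    if not frameworks:
        raise ValueError("name at least one framework")

    normalized = []
    for name in frameworks:
        key = name.strip().lower()
        if key not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported data lake framework: {name!r}")
        # Keep the first occurrence of each framework and preserve order.
        if key not in normalized:
            normalized.append(key)

    # Multiple frameworks are passed as one comma-separated value.
    return {"--datalake-formats": ",".join(normalized)}


if __name__ == "__main__":
    print(datalake_job_arguments("hudi", "iceberg"))
```

The returned dictionary can be merged into the job parameters that you set on the Job details tab in AWS Glue Studio, or into the default arguments that you pass when you create the job programmatically.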