Using Data Lake frameworks with AWS Glue Studio

Overview

Open-source data lake frameworks simplify incremental data processing for files stored in data lakes built on Amazon S3. AWS Glue 3.0 and later support the following open-source data lake storage frameworks:

  • Apache Hudi

  • Linux Foundation Delta Lake

  • Apache Iceberg

AWS Glue provides native support for these frameworks so that you can read and write data that you store in Amazon S3 in a transactionally consistent manner. There's no need to install a separate connector or complete extra configuration steps in order to use these frameworks in AWS Glue jobs.

You can use data lake frameworks as a source or a target in AWS Glue Studio through Spark script editor jobs. For more information about using Apache Hudi, Apache Iceberg, and Delta Lake, see Using data lake frameworks with AWS Glue ETL jobs.
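
As a rough illustration of what reading from one of these frameworks might look like in a Spark script editor job, the following sketch wraps the standard Spark DataFrameReader calls in a small helper. The helper name read_lake_table, the LAKE_FORMATS set, and the bucket path in the usage comment are hypothetical and are not part of AWS Glue. The sketch assumes the job is already set up for the chosen framework as described in the linked ETL documentation.

```python
# Short format names passed to Spark's DataFrameReader for each framework.
LAKE_FORMATS = {"hudi", "delta", "iceberg"}


def read_lake_table(spark, framework, location):
    """Read a data lake table into a Spark DataFrame (illustrative helper).

    spark     -- an existing SparkSession, for example the one exposed by
                 GlueContext.spark_session inside a Glue job
    framework -- one of "hudi", "delta", or "iceberg"
    location  -- an Amazon S3 path for Hudi and Delta Lake tables; for
                 Iceberg this is typically a catalog table identifier
    """
    if framework not in LAKE_FORMATS:
        raise ValueError(f"Unsupported data lake framework: {framework!r}")
    return spark.read.format(framework).load(location)


# Possible usage inside a Glue Spark script (the bucket name is a placeholder):
#
#   from pyspark.context import SparkContext
#   from awsglue.context import GlueContext
#
#   spark = GlueContext(SparkContext.getOrCreate()).spark_session
#   df = read_lake_table(spark, "delta", "s3://amzn-s3-demo-bucket/sales/")
#   df.show(5)
```

Writing to a target follows the same pattern through the DataFrame's write interface, with framework-specific options covered in the linked documentation.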