
Using the Iceberg framework in AWS Glue

AWS Glue 3.0 and later supports the Apache Iceberg framework for data lakes. Iceberg provides a high-performance table format that works just like a SQL table. This topic covers available features for using your data in AWS Glue when you transport or store your data in an Iceberg table. To learn more about Iceberg, see the official Apache Iceberg documentation.

You can use AWS Glue to perform read and write operations on Iceberg tables in Amazon S3, or work with Iceberg tables using the AWS Glue Data Catalog. Additional operations, including insert and all Spark queries and Spark writes, are also supported. Update is not supported for Iceberg tables.

Note

ALTER TABLE … RENAME TO is not available for Apache Iceberg 0.13.1 for AWS Glue 3.0.

The following table lists the version of Iceberg included in each AWS Glue version.

AWS Glue version	Supported Iceberg version
5.0	1.6.1
4.0	1.0.0
3.0	0.13.1

To learn more about the data lake frameworks that AWS Glue supports, see Using data lake frameworks with AWS Glue ETL jobs.

Enabling the Iceberg framework

To enable Iceberg for AWS Glue, complete the following tasks:

Specify iceberg as a value for the --datalake-formats job parameter. For more information, see Using job parameters in AWS Glue jobs.
Create a key named --conf for your AWS Glue job, and set it to the following value. Alternatively, you can set the following configuration using SparkConf in your script. These settings help Apache Spark correctly handle Iceberg tables.
```
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions 
--conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog 
--conf spark.sql.catalog.glue_catalog.warehouse=s3://<your-warehouse-dir>/ 
--conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog 
--conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
```
If you are reading or writing to Iceberg tables that are registered with Lake Formation, follow the guidance in Using AWS Glue with AWS Lake Formation for fine-grained access control in AWS Glue 5.0 and later. In AWS Glue 4.0, add the following configuration to enable Lake Formation support.
```
--conf spark.sql.catalog.glue_catalog.glue.lakeformation-enabled=true
--conf spark.sql.catalog.glue_catalog.glue.id=<table-catalog-id>
```
If you use AWS Glue 3.0 with Iceberg 0.13.1, you must set the following additional configurations to use the Amazon DynamoDB lock manager, which ensures atomic transactions. AWS Glue 4.0 and later use optimistic locking by default. For more information, see Iceberg AWS Integrations in the official Apache Iceberg documentation.
```
--conf spark.sql.catalog.glue_catalog.lock-impl=org.apache.iceberg.aws.glue.DynamoLockManager 
--conf spark.sql.catalog.glue_catalog.lock.table=<your-dynamodb-table-name>
```

Using a different Iceberg version

To use a version of Iceberg that AWS Glue doesn't support, specify your own Iceberg JAR files using the --extra-jars job parameter. Do not include iceberg as a value for the --datalake-formats parameter.
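As a rough illustration of the rule above, the following sketch builds a job-argument dictionary that supplies your own JAR files and rejects any --datalake-formats value that includes iceberg. The function name `custom_iceberg_job_arguments` and the JAR path are hypothetical. The sketch assumes --extra-jars accepts a comma-separated list of Amazon S3 paths.

```python
def custom_iceberg_job_arguments(jar_s3_paths, other_arguments=None):
    """Build job arguments that load user-supplied Iceberg JAR files."""
    if not jar_s3_paths:
        raise ValueError("at least one JAR path is required")
    arguments = dict(other_arguments or {})
    # The bundled Iceberg must not be enabled alongside a custom version.
    formats = arguments.get("--datalake-formats", "").split(",")
    if "iceberg" in (name.strip() for name in formats):
        raise ValueError("remove iceberg from --datalake-formats")
    arguments["--extra-jars"] = ",".join(jar_s3_paths)
    return arguments


# Hypothetical JAR location, used only for illustration.
job_arguments = custom_iceberg_job_arguments(
    ["s3://amzn-s3-demo-bucket/jars/iceberg-spark-runtime.jar"]
)
print(job_arguments)
```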

Enabling encryption for Iceberg tables

Note

Iceberg tables have their own mechanisms to enable server-side encryption. You should enable this configuration in addition to AWS Glue's security configuration.

To enable server-side encryption on Iceberg tables, review the guidance from the Iceberg documentation.

Example: Write an Iceberg table to Amazon S3 and register it to the AWS Glue Data Catalog

This example script demonstrates how to write an Iceberg table to Amazon S3. The example uses Iceberg AWS Integrations to register the table to the AWS Glue Data Catalog.

Alternatively, you can write an Iceberg table to Amazon S3 and the Data Catalog using Spark methods.

Prerequisites: You will need to provision a catalog for the Iceberg library to use. When using the AWS Glue Data Catalog, AWS Glue makes this straightforward. The AWS Glue Data Catalog is pre-configured for use by the Spark libraries as glue_catalog. Data Catalog tables are identified by a databaseName and a tableName. For more information about the AWS Glue Data Catalog, see Data discovery and cataloging in AWS Glue.

If you are not using the AWS Glue Data Catalog, you will need to provision a catalog through the Spark APIs. For more information, see Spark Configuration in the Iceberg documentation.

This example writes an Iceberg table to Amazon S3 and the Data Catalog using Spark.
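A minimal sketch of such a write follows, assuming a Spark session already configured with the glue_catalog settings from Enabling the Iceberg framework. The helper names `iceberg_identifier` and `write_iceberg_table` are hypothetical. The sketch uses the Spark `DataFrame.writeTo` API to create the table.

```python
def iceberg_identifier(database, table, catalog="glue_catalog"):
    """Return the catalog-qualified table name that Spark expects."""
    return f"{catalog}.{database}.{table}"


def write_iceberg_table(data_frame, database, table):
    """Create an Iceberg table from a Spark DataFrame.

    Assumes the DataFrame belongs to a session where glue_catalog is
    configured as an Iceberg catalog.
    """
    data_frame.writeTo(iceberg_identifier(database, table)).create()


# Usage inside an AWS Glue job (placeholders as in the rest of this topic):
#   write_iceberg_table(dataFrame, "<your_database_name>", "<your_table_name>")
```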

Example: Read an Iceberg table from Amazon S3 using the AWS Glue Data Catalog

This example reads the Iceberg table that you created in Example: Write an Iceberg table to Amazon S3 and register it to the AWS Glue Data Catalog.
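One way to sketch this read is shown below. It mirrors the `create_data_frame.from_catalog` call that appears in the Lake Formation example later in this topic. The wrapper name `read_iceberg_from_catalog` is hypothetical, and the empty default for `additional_options` is an assumption.

```python
def read_iceberg_from_catalog(glue_context, database, table,
                              additional_options=None):
    """Read an Iceberg table registered in the AWS Glue Data Catalog.

    glue_context is expected to be an awsglue.context.GlueContext.
    """
    return glue_context.create_data_frame.from_catalog(
        database=database,
        table_name=table,
        additional_options=additional_options or {},
    )


# Usage inside an AWS Glue job:
#   from awsglue.context import GlueContext
#   from pyspark.context import SparkContext
#   glueContext = GlueContext(SparkContext())
#   df = read_iceberg_from_catalog(
#       glueContext, "<your_database_name>", "<your_table_name>")
```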

Example: Insert a `DataFrame` into an Iceberg table in Amazon S3 using the AWS Glue Data Catalog

This example inserts data into the Iceberg table that you created in Example: Write an Iceberg table to Amazon S3 and register it to the AWS Glue Data Catalog.

Note

This example requires you to set the --enable-glue-datacatalog job parameter in order to use the AWS Glue Data Catalog as an Apache Spark Hive metastore. To learn more, see Using job parameters in AWS Glue jobs.
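A minimal sketch of the insert follows. It mirrors the `write_data_frame.from_catalog` call that appears in the Lake Formation example later in this topic and assumes the --enable-glue-datacatalog job parameter is set as described in the note above. The wrapper name `insert_into_iceberg` is hypothetical.

```python
def insert_into_iceberg(glue_context, data_frame, database, table,
                        additional_options=None):
    """Insert a Spark DataFrame into an existing Iceberg table.

    glue_context is expected to be an awsglue.context.GlueContext, and
    the target table must already exist in the AWS Glue Data Catalog.
    """
    return glue_context.write_data_frame.from_catalog(
        frame=data_frame,
        database=database,
        table_name=table,
        additional_options=additional_options or {},
    )


# Usage inside an AWS Glue job:
#   insert_into_iceberg(
#       glueContext, dataFrame, "<your_database_name>", "<your_table_name>")
```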

Example: Read an Iceberg table from Amazon S3 using Spark

If you are not using the AWS Glue Data Catalog, you will need to provision a catalog through the Spark APIs. For more information, see Spark Configuration in the Iceberg documentation.

This example reads an Iceberg table in Amazon S3 from the Data Catalog using Spark.
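The read can be sketched with the standard Spark reader API, assuming the session is configured with the glue_catalog settings from Enabling the Iceberg framework. The helper names `iceberg_identifier` and `read_iceberg_with_spark` are hypothetical.

```python
def iceberg_identifier(database, table, catalog="glue_catalog"):
    """Return the catalog-qualified table name that Spark expects."""
    return f"{catalog}.{database}.{table}"


def read_iceberg_with_spark(spark, database, table):
    """Load an Iceberg table as a Spark DataFrame.

    spark is expected to be a SparkSession in which glue_catalog is
    configured as an Iceberg catalog.
    """
    return spark.read.format("iceberg").load(iceberg_identifier(database, table))


# Usage inside an AWS Glue job:
#   dataFrame = read_iceberg_with_spark(
#       spark, "<your_database_name>", "<your_table_name>")
```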

Example: Read and write Iceberg table with Lake Formation permission control

This example reads and writes an Iceberg table with Lake Formation permission control.

Note

This example works only in AWS Glue 4.0. In AWS Glue 5.0 and later, follow the guidance in Using AWS Glue with AWS Lake Formation for fine-grained access control.

Create an Iceberg table and register it in Lake Formation:
1. To enable Lake Formation permission control, you’ll first need to register the table's Amazon S3 path with Lake Formation. For more information, see Registering an Amazon S3 location. You can register it either from the Lake Formation console or by using the AWS CLI:
```
aws lakeformation register-resource --resource-arn arn:aws:s3:::<s3-bucket>/<s3-folder> --use-service-linked-role --region <REGION>
```
  Once you register an Amazon S3 location, any AWS Glue table pointing to the location (or any of its child locations) will return the value for the IsRegisteredWithLakeFormation parameter as true in the GetTable call.
2. Create an Iceberg table that points to the registered path through Spark SQL:
  
  Note
  The following are Python examples.
```
dataFrame.createOrReplaceTempView("tmp_<your_table_name>")

query = f"""
CREATE TABLE glue_catalog.<your_database_name>.<your_table_name>
USING iceberg
AS SELECT * FROM tmp_<your_table_name>
"""
spark.sql(query)
```
  You can also create the table manually through the AWS Glue CreateTable API. For more information, see Creating Apache Iceberg tables.
  
  Note
  The UpdateTable API does not currently support Iceberg table format as an input to the operation.
Grant Lake Formation permissions to the job IAM role. You can grant permissions either from the Lake Formation console or by using the AWS CLI. For more information, see https://docs.aws.amazon.com/lake-formation/latest/dg/granting-table-permissions.html
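If you script the grant instead of using the console or the AWS CLI, the request could be built as in the following sketch. The helper name `table_grant_request` and the role ARN are hypothetical. The sketch assumes boto3 is available and that its Lake Formation client's `grant_permissions` operation accepts these fields.

```python
def table_grant_request(role_arn, database, table, permissions,
                        catalog_id=None):
    """Build keyword arguments for a Lake Formation table grant."""
    table_resource = {"DatabaseName": database, "Name": table}
    if catalog_id:
        table_resource["CatalogId"] = catalog_id
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"Table": table_resource},
        "Permissions": list(permissions),
    }


request = table_grant_request(
    role_arn="arn:aws:iam::111122223333:role/GlueJobRole",  # hypothetical role
    database="<your_database_name>",
    table="<your_table_name>",
    permissions=["SELECT"],  # enough for reads; writes need more
)
print(request)

# With boto3 installed and credentials configured, send the request with:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**request)
```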

Read an Iceberg table registered with Lake Formation. The code is the same as for reading a non-registered Iceberg table. Note that your AWS Glue job IAM role needs the SELECT permission for the read to succeed.


```
# Example: Read an Iceberg table from the AWS Glue Data Catalog
from awsglue.context import GlueContext
from pyspark.context import SparkContext

sc = SparkContext()
glueContext = GlueContext(sc)

# Placeholder: optional connection options; left empty here as an assumption.
additional_options = {}

df = glueContext.create_data_frame.from_catalog(
    database="<your_database_name>",
    table_name="<your_table_name>",
    additional_options=additional_options
)
```

Write to an Iceberg table registered with Lake Formation. The code is the same as for writing to a non-registered Iceberg table. Note that your AWS Glue job IAM role needs the SUPER permission for the write to succeed.
```
glueContext.write_data_frame.from_catalog(
    frame=dataFrame,
    database="<your_database_name>",
    table_name="<your_table_name>",
    additional_options=additional_options
)
```
