Creating Apache Iceberg tables
AWS Lake Formation supports creating Apache Iceberg tables that use the Apache Parquet data format
in the AWS Glue Data Catalog with data residing in Amazon S3. A table in the Data Catalog is the metadata
definition that represents the data in a data store. By default, Lake Formation creates Iceberg v2
tables. For the differences between v1 and v2 tables, see Format version changes in the Apache Iceberg documentation.
You can use the Lake Formation console or the CreateTable operation in the AWS Glue API to create an Iceberg table in the Data Catalog.
For more information, see CreateTable action (Python: create_table).
When you create an Iceberg table in the Data Catalog, you must specify the table format and the metadata file path in Amazon S3 so that the table can be read and written.
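The CreateTable operation can also be called programmatically. The following minimal Python sketch builds the request for boto3's `glue.create_table`, assuming the `OpenTableFormatInput`/`IcebergInput` request shape, in which the Data Catalog writes the initial Iceberg metadata itself (`MetadataOperation` set to `CREATE`) and `Version` selects the Iceberg format version. The database, table, column names, and bucket are hypothetical placeholders. The sketch only builds and prints the request, so it runs without AWS credentials; the actual API call is shown as a comment.

```python
import json


def build_iceberg_create_table_request(database, table, location, columns, version="2"):
    """Build keyword arguments for AWS Glue CreateTable (boto3 glue.create_table).

    database, table, location, and columns are caller-supplied placeholders.
    columns is a list of (name, type) tuples. version is the Iceberg format
    version as a string; "2" mirrors the Lake Formation default.
    """
    if version not in ("1", "2"):
        raise ValueError("Iceberg format version must be '1' or '2'")
    return {
        "DatabaseName": database,
        "OpenTableFormatInput": {
            "IcebergInput": {
                # Ask the Data Catalog to create the Iceberg metadata file.
                "MetadataOperation": "CREATE",
                "Version": version,
            }
        },
        "TableInput": {
            "Name": table,
            "TableType": "EXTERNAL_TABLE",
            "StorageDescriptor": {
                "Columns": [{"Name": name, "Type": col_type} for name, col_type in columns],
                # S3 prefix that holds the table's data and metadata.
                "Location": location,
            },
        },
    }


if __name__ == "__main__":
    request = build_iceberg_create_table_request(
        database="iceberg_db",  # hypothetical database name
        table="orders",  # hypothetical table name
        location="s3://amzn-s3-demo-bucket/iceberg/orders/",
        columns=[("order_id", "int"), ("amount", "double")],
    )
    print(json.dumps(request, indent=2))
    # With boto3 installed and credentials configured:
    #   import boto3
    #   boto3.client("glue").create_table(**request)
```

Passing `version="1"` would request an Iceberg v1 table instead of the default v2.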
After you register the Amazon S3 data location with AWS Lake Formation, you can use Lake Formation fine-grained access control permissions to secure your Iceberg table. For source data and metadata in Amazon S3 that aren't registered with Lake Formation, access is determined by IAM permissions policies for Amazon S3 and AWS Glue actions. For more information, see Managing Lake Formation permissions.
Note
The Data Catalog doesn’t support creating partitions or adding Iceberg table properties.
Prerequisites
To create Iceberg tables in the Data Catalog and set up Lake Formation data access permissions, you must meet the following requirements:
- Permissions required to create Iceberg tables without the data registered with Lake Formation.
  In addition to the permissions required to create a table in the Data Catalog, the table creator requires the following permissions:
  - s3:PutObject on resource arn:aws:s3:::{bucketName}
  - s3:GetObject on resource arn:aws:s3:::{bucketName}
  - s3:DeleteObject on resource arn:aws:s3:::{bucketName}
- Permissions required to create Iceberg tables with data registered with Lake Formation:
  To use Lake Formation to manage and secure the data in your data lake, register the Amazon S3 location that contains the table data with Lake Formation. Lake Formation can then vend credentials to AWS analytics services such as Athena, Redshift Spectrum, and Amazon EMR so that they can access the data. For more information on registering an Amazon S3 location, see Adding an Amazon S3 location to your data lake.
  A principal who reads and writes the underlying data that is registered with Lake Formation requires the following permissions:
  - lakeformation:GetDataAccess
  - DATA_LOCATION_ACCESS
    A principal who has data location permissions on a location also has location permissions on all child locations.
    For more information on data location permissions, see Underlying data access control.
- To enable compaction, the service needs to assume an IAM role that has permissions to update tables in the Data Catalog. For details, see Table optimization prerequisites.
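The IAM side of the prerequisites above can be written as identity policy documents. This is a minimal Python sketch, not an authoritative policy: it assumes a hypothetical bucket name, and because `s3:PutObject`, `s3:GetObject`, and `s3:DeleteObject` are object-level actions that IAM evaluates against object ARNs, it scopes them to the bucket ARN followed by `/*`. It also assumes `lakeformation:GetDataAccess` is granted on `"*"`. `DATA_LOCATION_ACCESS` is a Lake Formation permission, so it does not appear in these IAM policies.

```python
import json


def s3_object_policy(bucket):
    """Identity policy for a table creator whose data is NOT registered with Lake Formation."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                # Object-level actions match object ARNs, hence the trailing /*.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


def lake_formation_data_access_policy():
    """Identity policy for a principal that reads/writes data registered with Lake Formation.

    GetDataAccess lets Lake Formation vend temporary credentials. The
    DATA_LOCATION_ACCESS permission is granted separately in Lake Formation,
    not through this IAM policy.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["lakeformation:GetDataAccess"],
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(s3_object_policy("amzn-s3-demo-bucket"), indent=2))
    print(json.dumps(lake_formation_data_access_policy(), indent=2))
```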
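Registering a location and granting `DATA_LOCATION_ACCESS` can be sketched as request payloads for boto3's `lakeformation.register_resource` and `lakeformation.grant_permissions`. The sketch assumes registration with the Lake Formation service-linked role; the account ID, role name, and bucket are hypothetical. The `location_covers` helper is also hypothetical: it only illustrates the child-location rule stated above (a grant on a location applies to that location and anything beneath it at a path boundary) and is not how Lake Formation evaluates permissions internally.

```python
def register_location_request(location_arn):
    """kwargs for boto3 lakeformation.register_resource (service-linked role assumed)."""
    return {"ResourceArn": location_arn, "UseServiceLinkedRole": True}


def grant_data_location_request(principal_arn, location_arn):
    """kwargs for boto3 lakeformation.grant_permissions granting DATA_LOCATION_ACCESS."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"DataLocation": {"ResourceArn": location_arn}},
        "Permissions": ["DATA_LOCATION_ACCESS"],
    }


def location_covers(granted_arn, requested_arn):
    """Hypothetical helper: True if requested_arn is granted_arn or one of its child locations."""
    granted = granted_arn.rstrip("/")
    requested = requested_arn.rstrip("/")
    # Compare at a path boundary so ".../data" does not cover ".../data2".
    return requested == granted or requested.startswith(granted + "/")


if __name__ == "__main__":
    location = "arn:aws:s3:::amzn-s3-demo-bucket/iceberg"
    role = "arn:aws:iam::111122223333:role/hypothetical-etl-role"
    print(register_location_request(location))
    print(grant_data_location_request(role, location))
    print(location_covers(location, location + "/orders"))
    # With boto3 and credentials configured:
    #   lf = boto3.client("lakeformation")
    #   lf.register_resource(**register_location_request(location))
    #   lf.grant_permissions(**grant_data_location_request(role, location))
```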
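For the compaction prerequisite, the role must be assumable by the service. This minimal Python sketch assumes the service principal is `glue.amazonaws.com` and that compaction is enabled through the AWS Glue CreateTableOptimizer operation (boto3 `glue.create_table_optimizer`) with `Type` set to `compaction`. The role's own permissions policy is intentionally omitted; see Table optimization prerequisites for the exact actions it needs. All IDs and names are hypothetical.

```python
import json


def glue_trust_policy():
    """Trust policy letting the AWS Glue service principal assume the compaction role (assumed)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def compaction_optimizer_request(catalog_id, database, table, role_arn):
    """kwargs for boto3 glue.create_table_optimizer enabling compaction on one table."""
    return {
        "CatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Type": "compaction",
        # The role the service assumes to rewrite files and update the table.
        "TableOptimizerConfiguration": {"roleArn": role_arn, "enabled": True},
    }


if __name__ == "__main__":
    print(json.dumps(glue_trust_policy(), indent=2))
    print(json.dumps(compaction_optimizer_request(
        "111122223333", "iceberg_db", "orders",
        "arn:aws:iam::111122223333:role/hypothetical-compaction-role",
    ), indent=2))
```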
Creating an Iceberg table
You can create Iceberg v1 and v2 tables using the Lake Formation console or the AWS Command Line Interface (AWS CLI), as documented on this page. You can also create Iceberg tables using the AWS Glue console or an AWS Glue crawler. For more information, see Data Catalog and Crawlers in the AWS Glue Developer Guide.
To create an Iceberg table