Apache Iceberg tables in AWS Clean Rooms

Apache Iceberg is an open source table format for data lakes. AWS Clean Rooms can use the statistics stored in Apache Iceberg metadata to optimize query plans and reduce the number of files scanned during clean room query processing. For more information, see the Apache Iceberg documentation.
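To illustrate how column statistics can reduce file scans, the following Python sketch skips data files using per-file minimum and maximum values, similar to the lower and upper bounds that Iceberg records for each data file in its manifests. It is a conceptual model only, not a description of how AWS Clean Rooms plans queries; the `DataFileStats` class, the `files_to_scan` function, the `order_year` column, and the file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFileStats:
    """Hypothetical, simplified per-file column bounds, modeled loosely on the
    lower and upper bounds Iceberg manifests record for each data file."""
    path: str
    lower_bounds: dict[str, int]
    upper_bounds: dict[str, int]


def files_to_scan(files: list[DataFileStats], column: str, low: int, high: int) -> list[str]:
    """Return paths of files that might hold rows with low <= column <= high.

    A file is skipped only when its recorded range cannot overlap the
    predicate. Files without statistics for the column are kept, because
    they cannot be ruled out.
    """
    selected = []
    for f in files:
        lo = f.lower_bounds.get(column)
        hi = f.upper_bounds.get(column)
        if lo is not None and lo > high:
            continue  # every value in this file is above the requested range
        if hi is not None and hi < low:
            continue  # every value in this file is below the requested range
        selected.append(f.path)
    return selected


example_files = [
    DataFileStats("a.parquet", {"order_year": 2019}, {"order_year": 2020}),
    DataFileStats("b.parquet", {"order_year": 2021}, {"order_year": 2022}),
    DataFileStats("c.parquet", {}, {}),  # no statistics recorded
]
print(files_to_scan(example_files, "order_year", 2022, 2023))
```

Here `a.parquet` is skipped because its maximum value (2020) is below the requested range, while `c.parquet` must still be read because nothing is known about its contents.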

Consider the following when using AWS Clean Rooms with Iceberg tables:

  • Tables within the AWS Glue Data Catalog only – Apache Iceberg tables must be defined in the AWS Glue Data Catalog based on the open source Glue catalog implementation.

  • Parquet file format – AWS Clean Rooms only supports Iceberg tables in the Parquet data file format.

  • GZIP and Snappy compression – AWS Clean Rooms supports Parquet with GZIP and Snappy compression.

  • Iceberg versions – AWS Clean Rooms supports running queries against version 1 and version 2 Iceberg tables.

  • Partitions – You don't need to manually add partitions for your Apache Iceberg tables in AWS Glue. AWS Clean Rooms detects new partitions in Apache Iceberg tables automatically, so no manual operation is needed to update partitions in the table definition. Iceberg partitions appear as regular columns in the AWS Clean Rooms table schema, not as a separate partition key in the configured table schema.

  • Limitations

    • New Iceberg tables only

      Apache Iceberg tables converted from Apache Parquet tables are not supported.

    • Time travel queries

      AWS Clean Rooms does not support time travel queries with Apache Iceberg tables.

    • Athena engine version 2

      Iceberg tables created with Athena engine version 2 are not supported.

    • File formats

      Avro and Optimized Row Columnar (ORC) file formats are not supported.

    • Compression

      Zstandard (Zstd) compression for Parquet is not supported.
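The file format, compression, and version requirements above can be screened before you configure a table. The following Python sketch checks a parsed Iceberg `metadata.json` dictionary by reading its `format-version` field and the `write.format.default` and `write.parquet.compression-codec` table properties. The function name, constants, and messages are hypothetical. Because table properties only set defaults for new writes, a clean result does not prove that every existing data file complies, and limitations that depend on a table's history (conversion from Parquet, creation with Athena engine version 2) are not checked.

```python
SUPPORTED_FORMAT_VERSIONS = {1, 2}
SUPPORTED_CODECS = {"gzip", "snappy"}  # codecs listed as supported above


def clean_rooms_compatibility_issues(metadata: dict) -> list[str]:
    """Hypothetical pre-flight screen of a parsed Iceberg metadata.json.

    Returns human-readable issues; an empty list means nothing was flagged.
    """
    issues = []
    version = metadata.get("format-version")
    if version not in SUPPORTED_FORMAT_VERSIONS:
        issues.append(f"unsupported Iceberg format version: {version!r}")

    props = metadata.get("properties", {})
    # Iceberg writes Parquet when write.format.default is not set.
    file_format = props.get("write.format.default", "parquet").lower()
    if file_format != "parquet":
        issues.append(f"unsupported data file format: {file_format}")
    else:
        codec = props.get("write.parquet.compression-codec")
        if codec is None:
            # The writer's built-in default applies; confirm it separately.
            issues.append("Parquet compression codec not set explicitly; confirm it is gzip or snappy")
        elif codec.lower() not in SUPPORTED_CODECS:
            issues.append(f"Parquet compression codec not listed as supported: {codec}")
    return issues


example_metadata = {
    "format-version": 2,
    "properties": {"write.parquet.compression-codec": "zstd"},
}
print(clean_rooms_compatibility_issues(example_metadata))
```

The sketch reports every issue it finds rather than stopping at the first, so a single run shows all of the properties that need attention.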

Supported data types for Iceberg tables

AWS Clean Rooms can query Iceberg tables that contain the following data types:

  • boolean

  • date

  • decimal

  • double

  • float

  • int

  • list

  • long

  • map

  • string

  • struct

  • timestamp without time zone

For more information about Iceberg data types, see Schemas for Iceberg in the Apache Iceberg documentation.
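One way to check a table against this list is to walk its schema. The following Python sketch takes an Iceberg schema in the JSON form defined by the Iceberg table specification and returns the path of every field whose type is not in the list above. In that form, timestamp without time zone is serialized as `timestamp` (with time zone is `timestamptz`), and decimals are strings such as `decimal(10,2)`. The function and the example schema are hypothetical, and the list above, not this sketch, is the authority on what AWS Clean Rooms supports.

```python
import re

# Iceberg JSON primitive names matching the supported list above.
# "timestamp" is Iceberg's timestamp without time zone.
SUPPORTED_PRIMITIVES = {"boolean", "date", "double", "float", "int", "long", "string", "timestamp"}
DECIMAL_PATTERN = re.compile(r"decimal\(\s*\d+\s*,\s*\d+\s*\)")


def unsupported_columns(schema: dict) -> list[str]:
    """Return 'path: type' entries for fields whose type is not listed."""
    problems = []

    def visit(type_, path):
        if isinstance(type_, str):  # primitive type
            if type_ not in SUPPORTED_PRIMITIVES and not DECIMAL_PATTERN.fullmatch(type_):
                problems.append(f"{path}: {type_}")
            return
        kind = type_["type"]
        if kind == "struct":
            for field in type_["fields"]:
                child = f"{path}.{field['name']}" if path else field["name"]
                visit(field["type"], child)
        elif kind == "list":
            visit(type_["element"], f"{path}.element")
        elif kind == "map":
            visit(type_["key"], f"{path}.key")
            visit(type_["value"], f"{path}.value")
        else:
            problems.append(f"{path}: {kind}")

    visit(schema, "")  # the schema root is itself a struct
    return problems


example_schema = {
    "type": "struct",
    "schema-id": 0,
    "fields": [
        {"id": 1, "name": "user_id", "required": True, "type": "long"},
        {"id": 2, "name": "amount", "required": False, "type": "decimal(10,2)"},
        {"id": 3, "name": "event_time", "required": False, "type": "timestamptz"},
        {"id": 4, "name": "tags", "required": False,
         "type": {"type": "list", "element-id": 5, "element": "uuid", "element-required": False}},
    ],
}
print(unsupported_columns(example_schema))
```

The walk recurses into list elements and map keys and values because a supported container type such as `list` can still hold an unsupported element type.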