Working with Iceberg table format specification version 3

The latest version of the Apache Iceberg table format specification is version 3. This version introduces advanced capabilities for building petabyte-scale data lakes with improved performance and reduced operational overhead. It addresses common performance bottlenecks encountered with version 2, particularly around batch updates and compliance delete operations.

AWS provides support for deletion vectors and row lineage as defined in the Iceberg version 3 specification. These features are available with Apache Spark on the following AWS services.

AWS service	Version 3 support
Amazon EMR for Apache Spark	Amazon EMR release 7.12 and later
AWS Glue	Yes
AWS Glue: Iceberg REST API, table maintenance	Yes
Amazon SageMaker Unified Studio notebooks	Yes
Amazon S3 Tables: Iceberg REST API, table maintenance	Yes
Amazon Athena (Trino)	No

Key features in version 3

Deletion vectors replace the positional delete files used in version 2 with a compact binary format stored in Puffin files. This eliminates the write amplification caused by random batch updates and General Data Protection Regulation (GDPR) compliance deletes, and reduces the overhead of keeping data fresh. Workloads that process high-frequency updates can expect faster writes and lower storage costs because fewer small delete files accumulate.
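To make the merge-on-read idea concrete, the following Python sketch models a deletion vector as the set of deleted row positions for one data file and applies it at read time. It is a conceptual illustration only: the Iceberg specification serializes deletion vectors as Roaring bitmaps inside Puffin files, and the names `DataFile`, `delete_rows`, and `scan` are hypothetical. The sketch keeps a single vector per data file and merges new deletes into it, and the data file itself is never rewritten.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set


@dataclass
class DataFile:
    """Hypothetical stand-in for an immutable Parquet data file."""
    path: str
    rows: List[Dict[str, Any]]  # row position in the file = list index
    # Deletion vector: positions of rows that are logically deleted.
    # (Iceberg v3 stores this as a Roaring bitmap in a Puffin file.)
    deleted_positions: Set[int] = field(default_factory=set)


def delete_rows(data_file: DataFile, predicate: Callable[[Dict[str, Any]], bool]) -> int:
    """Mark matching rows as deleted without rewriting the data file.

    New deletes are merged into the file's single deletion vector.
    Returns the number of rows newly marked as deleted.
    """
    newly_deleted = {
        pos
        for pos, row in enumerate(data_file.rows)
        if pos not in data_file.deleted_positions and predicate(row)
    }
    data_file.deleted_positions |= newly_deleted
    return len(newly_deleted)


def scan(data_file: DataFile) -> List[Dict[str, Any]]:
    """Merge-on-read: skip rows whose position is in the deletion vector."""
    return [
        row
        for pos, row in enumerate(data_file.rows)
        if pos not in data_file.deleted_positions
    ]
```

For example, a GDPR delete for one customer would call `delete_rows(f, lambda r: r["customer_id"] == "c1")`: the underlying rows stay in place, and only the set of deleted positions grows.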

Row lineage enables precise change tracking at the row level. Your downstream systems can process changes incrementally, speeding up data pipelines and reducing compute costs for change data capture (CDC) workflows. This built-in capability eliminates the need for custom change tracking implementations.
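The following Python sketch is a simplified model of how row lineage values are resolved, assuming the inheritance rules described in the version 3 specification: a row whose `_row_id` is null inherits the data file's first row ID plus the row's position in the file, and a row whose `_last_updated_sequence_number` is null inherits the file's data sequence number. When a row is updated, the writer copies its existing `_row_id` forward and leaves the sequence number null, so the row keeps its identity but picks up the new commit's sequence number. `LineageRow` and `resolve_lineage` are hypothetical names; engines perform this resolution internally.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LineageRow:
    """Hypothetical row as written to a data file; None means 'inherit'."""
    value: str
    row_id: Optional[int] = None
    last_updated_sequence_number: Optional[int] = None


def resolve_lineage(
    rows: List[LineageRow], first_row_id: int, data_sequence_number: int
) -> List[Tuple[int, int, str]]:
    """Return (_row_id, _last_updated_sequence_number, value) for each row.

    Null lineage values inherit from the data file:
    _row_id = first_row_id + position in file,
    _last_updated_sequence_number = the file's data sequence number.
    """
    resolved = []
    for position, row in enumerate(rows):
        row_id = row.row_id if row.row_id is not None else first_row_id + position
        seq = (
            row.last_updated_sequence_number
            if row.last_updated_sequence_number is not None
            else data_sequence_number
        )
        resolved.append((row_id, seq, row.value))
    return resolved
```

For example, three rows inserted in a file with first row ID 100 at sequence number 5 resolve to row IDs 100, 101, and 102. If row 101 is later updated at sequence number 7, the new file carries `row_id=101` explicitly, so the updated row resolves to `(101, 7, ...)`.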

Version compatibility

Version 3 maintains backward compatibility with version 2 tables. AWS services support both version 2 and version 3 tables simultaneously, so you can:

Run queries across both version 2 and version 3 tables.
Upgrade existing version 2 tables to version 3 without data rewrites.
Run time travel queries that span version 2 and version 3 snapshots.
Use schema evolution and hidden partitioning across table versions.

Getting started with version 3

Prerequisites

Before working with version 3 tables, make sure that you have:

An AWS account with appropriate AWS Identity and Access Management (IAM) permissions.
Access to one or more AWS analytics services (Amazon EMR, AWS Glue, Amazon SageMaker Unified Studio notebooks, or Amazon S3 Tables).
An S3 bucket for storing table data and metadata: a table bucket if you are getting started with Amazon S3 Tables, or a general purpose S3 bucket if you are building your own Iceberg infrastructure.
A configured AWS Glue Data Catalog.

Creating version 3 tables

Creating new tables

To create a new Iceberg version 3 table, set the format-version table property to 3.

Using Spark SQL:


CREATE TABLE IF NOT EXISTS myns.orders_v3 (
    order_id bigint,
    customer_id string,
    order_date date,
    total_amount decimal(10,2),
    status string,
    created_at timestamp
)
USING iceberg
TBLPROPERTIES (
    'format-version' = '3'
)

Upgrading version 2 tables to version 3

You can upgrade existing version 2 tables to version 3 atomically without rewriting data.

Using Spark SQL:


ALTER TABLE myns.existing_table
SET TBLPROPERTIES ('format-version' = '3')

Important

Version 3 is a one-way upgrade. After a table is upgraded from version 2 to version 3, it cannot be downgraded back to version 2 through standard operations.

What happens during upgrade:

A new metadata snapshot is created atomically.
Existing Parquet data files are reused.
Row lineage fields are added to the table metadata.

After the upgrade:

The next compaction will remove old version 2 delete files.
New modifications will use the version 3 deletion vector files.

The upgrade doesn’t perform a historical backfill of row lineage change tracking records.

Enabling deletion vectors

To use deletion vectors for updates, deletes, and merges, set the table's write modes to merge-on-read.

Using Spark SQL:


ALTER TABLE myns.orders_v3
SET TBLPROPERTIES ('format-version' = '3',
                   'write.delete.mode' = 'merge-on-read',
                   'write.update.mode' = 'merge-on-read',
                   'write.merge.mode' = 'merge-on-read'
                  )

These settings ensure that update, delete, and merge operations create deletion vector files instead of rewriting entire data files.
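If you apply these settings to many tables, you could generate the statement instead of writing it by hand. The following Python sketch builds the same `ALTER TABLE ... SET TBLPROPERTIES` statement from a dictionary. `set_tblproperties_ddl` and `MERGE_ON_READ_PROPERTIES` are hypothetical names, and the sketch assumes you run the resulting string with `spark.sql()` in an existing PySpark session.

```python
MERGE_ON_READ_PROPERTIES = {
    "format-version": "3",
    "write.delete.mode": "merge-on-read",
    "write.update.mode": "merge-on-read",
    "write.merge.mode": "merge-on-read",
}


def set_tblproperties_ddl(table: str, properties: dict) -> str:
    """Build a Spark SQL ALTER TABLE ... SET TBLPROPERTIES statement."""
    if not properties:
        raise ValueError("properties must not be empty")
    for key, value in properties.items():
        # Keep the sketch simple: refuse values that would need quote escaping.
        if "'" in str(key) or "'" in str(value):
            raise ValueError(f"unsupported quote in property {key!r}")
    pairs = ",\n".join(
        f"    '{key}' = '{value}'" for key, value in properties.items()
    )
    return f"ALTER TABLE {table}\nSET TBLPROPERTIES (\n{pairs}\n)"


# Hypothetical usage in a PySpark session where `spark` already exists:
# for table in ["myns.orders_v3", "myns.returns_v3"]:
#     spark.sql(set_tblproperties_ddl(table, MERGE_ON_READ_PROPERTIES))
```

Generating the statement from one dictionary keeps the write modes consistent across tables, which matters because a table left on copy-on-write for one operation still rewrites whole data files for that operation.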

Using row lineage for change tracking

Version 3 automatically adds row lineage metadata fields to track changes.

Using Spark SQL:


-- Return rows changed after sequence number 47, the last
-- sequence number processed in this example
SELECT
    order_id,
    status,
    _row_id,
    _last_updated_sequence_number
FROM myns.orders_v3
WHERE _last_updated_sequence_number > 47

The _row_id field uniquely identifies each row, and _last_updated_sequence_number tracks when the row was last modified. Use these fields to:

Identify changed rows for incremental processing.
Track data lineage for compliance.
Optimize CDC pipelines.
Reduce compute costs by processing only changes.
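A downstream consumer might use these fields as in the following Python sketch. It takes rows returned by a query like the one above, keeps those whose `_last_updated_sequence_number` is greater than the stored watermark, upserts them into a target keyed by `_row_id`, and advances the watermark. The function names and the in-memory dictionary target are hypothetical. Because the query reads the current table state, deleted rows don't appear in its results, so this pattern captures inserts and updates only; deletes need separate handling.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def incremental_batch(
    rows: Iterable[Row], last_processed_sequence: int
) -> Tuple[List[Row], int]:
    """Select rows changed after the watermark and compute the new watermark."""
    changed = [
        row
        for row in rows
        if row["_last_updated_sequence_number"] > last_processed_sequence
    ]
    # If nothing changed, keep the existing watermark.
    new_watermark = max(
        (row["_last_updated_sequence_number"] for row in changed),
        default=last_processed_sequence,
    )
    return changed, new_watermark


def apply_changes(target: Dict[Any, Row], changed: Iterable[Row]) -> None:
    """Upsert changed rows into a downstream store keyed by _row_id."""
    for row in changed:
        target[row["_row_id"]] = row
```

Persisting the returned watermark between runs is what lets each run process only the rows modified since the previous one.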

Best practices for version 3

When to use version 3

Consider upgrading to, or starting with, version 3 when:

You perform frequent batch updates or deletes.
You need to meet GDPR or compliance delete requirements.
Your workloads involve high-frequency upserts.
You require efficient CDC workflows.
You want to reduce storage costs from small files.
You need better change tracking capabilities.

Optimizing write performance

Enable deletion vectors for update-heavy workloads:


ALTER TABLE myns.orders_v3
SET TBLPROPERTIES (
    'write.delete.mode' = 'merge-on-read',
    'write.update.mode' = 'merge-on-read',
    'write.merge.mode' = 'merge-on-read'
)

Configure appropriate file sizes:


ALTER TABLE myns.orders_v3
SET TBLPROPERTIES (
    'write.target-file-size-bytes' = '536870912'  -- 512 MB
)
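The value 536870912 is 512 MB expressed in bytes (512 × 1024 × 1024). The following Python sketch shows that conversion and a rough estimate of how many data files a table of a given size produces at that target. `mib_to_bytes` and `estimated_file_count` are hypothetical helpers, and real file counts also depend on partitioning, compression, and how data arrives.

```python
def mib_to_bytes(mib: int) -> int:
    """Convert mebibytes to bytes, the unit write.target-file-size-bytes expects."""
    return mib * 1024 * 1024


def estimated_file_count(table_bytes: int, target_file_bytes: int) -> int:
    """Rough lower bound on data files if every file reached the target size."""
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    # Ceiling division: a partial final file still counts as one file.
    return -(-table_bytes // target_file_bytes)
```

For example, `estimated_file_count(mib_to_bytes(1024 * 1024), mib_to_bytes(512))` estimates 2,048 files for 1 TiB of data. A file count far above that estimate is a sign of the small-file problem.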

Optimizing read performance

Use row lineage for incremental processing.
Use time travel to access historical data without copying.
Enable statistics collection for better query planning.

Migration strategy

When you migrate from version 2 to version 3, follow these best practices:

Test in a non-production environment first to validate the upgrade process and performance.
Upgrade during low-activity periods to minimize impact on concurrent operations.
Monitor performance and track key metrics after the upgrade.
Run compaction to consolidate delete files after the upgrade.
Update your team documentation to reflect version 3 features.

Compatibility considerations

Engine versions – Make sure that all engines accessing the table support version 3.
Third-party tools – Verify your tool’s version 3 compatibility before you upgrade.
Backup strategy – Test snapshot-based recovery procedures.
Monitoring – Update monitoring dashboards for version 3-specific metrics.

Troubleshooting

Common issues

Error: "format-version 3 is not supported"

Verify that your engine version supports version 3. For specifics, see the table at the beginning of this section.
Check catalog compatibility.
Make sure that you’re using the latest versions of AWS services.

Performance degradation after upgrade

Verify that there are no compaction failures. For more information, see Logging and monitoring for S3 Tables in the Amazon S3 documentation.

Confirm that deletion vectors are enabled. The following properties should be set:


ALTER TABLE myns.orders_v3
SET TBLPROPERTIES (
    'write.delete.mode' = 'merge-on-read',
    'write.update.mode' = 'merge-on-read',
    'write.merge.mode' = 'merge-on-read'
)

You can verify table properties with the following code:


DESCRIBE FORMATTED myns.orders_v3

Review your partition strategy. Over-partitioning can lead to small files. Run the following query to get the average file size for your table:
```
SELECT avg(file_size_in_bytes) as avg_file_size_bytes 
FROM myns.orders_v3.files
```
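To run these checks as part of a pipeline, you might collect the table properties and file sizes and evaluate them in code. The following Python sketch assumes that the properties come from `SHOW TBLPROPERTIES myns.orders_v3` (collected into a dictionary) and the sizes come from the `file_size_in_bytes` column of the `files` metadata table. `diagnose_table` is a hypothetical helper, and its `small_file_ratio` threshold of 0.75 is an arbitrary illustrative choice, not an Iceberg or AWS default. When `write.target-file-size-bytes` isn't set, the sketch falls back to 536870912 bytes (512 MB), Iceberg's default target file size.

```python
from typing import Dict, List

REQUIRED_WRITE_MODES = {
    "write.delete.mode": "merge-on-read",
    "write.update.mode": "merge-on-read",
    "write.merge.mode": "merge-on-read",
}
DEFAULT_TARGET_BYTES = 512 * 1024 * 1024  # Iceberg's default target file size


def diagnose_table(
    properties: Dict[str, str],
    file_sizes: List[int],
    small_file_ratio: float = 0.75,  # illustrative threshold, not a default
) -> List[str]:
    """Return human-readable findings; an empty list means no issues found."""
    findings = []
    # Deletion vectors are only written when the write modes are merge-on-read.
    for key, expected in REQUIRED_WRITE_MODES.items():
        actual = properties.get(key)
        if actual != expected:
            findings.append(f"{key} is {actual!r}, expected {expected!r}")
    # Compare the average file size with the configured (or default) target.
    if file_sizes:
        target = int(properties.get("write.target-file-size-bytes", DEFAULT_TARGET_BYTES))
        avg = sum(file_sizes) / len(file_sizes)
        if avg < small_file_ratio * target:
            findings.append(
                f"average file size {avg:.0f} bytes is below "
                f"{small_file_ratio:.0%} of target {target} bytes"
            )
    return findings


# Hypothetical inputs from a PySpark session where `spark` already exists:
# properties = {r["key"]: r["value"] for r in spark.sql("SHOW TBLPROPERTIES myns.orders_v3").collect()}
# file_sizes = [r["file_size_in_bytes"] for r in spark.sql("SELECT file_size_in_bytes FROM myns.orders_v3.files").collect()]
```

A finding about small files usually points to compaction not keeping up or to over-partitioning, while a finding about write modes means some operations still rewrite whole data files.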

Incompatibility with third-party tools

Verify that the tool supports the version 3 specification.
Consider maintaining version 2 tables for unsupported tools.
Contact the tool vendor for their version 3 support timeline.

Getting help

For AWS service-specific issues, contact AWS Support.
To get help from the Iceberg community, use the Iceberg Slack channel.
For information about using AWS services to manage your analytics workloads, see Analytics on AWS.

Pricing

Availability

Iceberg table format specification version 3 support is available in all AWS Regions where Amazon EMR, AWS Glue, AWS Glue Data Catalog, and S3 Tables operate. For Region availability, see AWS services by Region.

Additional resources
