Migration options summary - AWS Prescriptive Guidance

This table summarizes the main characteristics and considerations for each migration option.

| Feature | In-place migration: snapshot | In-place migration: migrate | Full data migration: CTAS or (CREATE TABLE + INSERT) |
| --- | --- | --- | --- |
| Data layout improvements as part of the migration process | | | |
|   • Re-sort data | No | No | Yes |
|   • Change partitioning (for example, to use Iceberg hidden partitioning) | No | No | Yes |
|   • Change table schema | No | No | Yes |
|   • Optimize file size | No | No | Yes |
|   • Validate the schema of existing data before adding the data | No | No | Yes |
| Supported file formats | Parquet, Avro, ORC | Parquet, Avro, ORC | Parquet, Avro, ORC, JSON, CSV |
| Source table replacement by an Iceberg table | No (creates a new table, but with additional steps you can replace the source table) | Yes (creates a backup table and substitutes the source table with an Iceberg table) | No (creates a new table) |
| Source table impact | | | |
|   • File deletion operations on the Iceberg table (expire_snapshots operations, dropping a table with purge) | Corrupts source table | Corrupts backup table | Safe, source unaffected |
| Iceberg table impact | | | |
|   • Impact if source table files are removed | Corrupts Iceberg table | Corrupts Iceberg table | No impact on Iceberg table |
|   • Impact if new files are added at the source table location | Not visible in the new table (new partitions must be incorporated with add_files) | Not visible in the new table (new partitions must be incorporated with add_files) | Not visible in the new table (new data must be added with INSERT INTO) |
| Cost | Low | Low | Higher (full data rewrite) |
| Migration speed | Fast | Fast | Slower |
| Can be used to migrate to Amazon S3 Tables | No | No | Yes |
| Requires manual DDL | No (schema and partitions are copied from source table) | No (schema and partitions are copied from source table) | With CTAS, only the partitioning must be specified |
| Best use | Quick migration without rewriting data, allowing side-by-side use of Hive and Iceberg for testing or gradual transition. | Replacing a Hive table in place without rewriting data, when an immediate switchover is acceptable. | Full Iceberg optimization with data rewrite. Ideal when redesigning partitions or schema, or improving layout and performance. Recommended whenever a full rewrite is feasible. |
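
To make the three options more concrete, the following is a minimal sketch of the Spark SQL statement each one might correspond to. It assumes that Spark with the Iceberg runtime is in use and that the statements are passed to `spark.sql(...)` in a session that already has an Iceberg catalog configured (not shown). The catalog, database, table, and column names (`spark_catalog`, `glue_catalog`, `db.sales`, `event_ts`, `customer_id`) are hypothetical placeholders, and the helper functions are illustrative only. They are not part of any library.

```python
# Sketch only: all catalog, table, and column names below are hypothetical.
# Each helper returns a Spark SQL string; nothing is executed against Spark here.


def snapshot_sql(catalog: str, source: str, target: str) -> str:
    """In-place migration (snapshot): create a new Iceberg table that
    references the source table's existing data files."""
    return (
        f"CALL {catalog}.system.snapshot("
        f"source_table => '{source}', table => '{target}')"
    )


def migrate_sql(catalog: str, source: str) -> str:
    """In-place migration (migrate): replace the source table with an
    Iceberg table, keeping the original as a backup table."""
    return f"CALL {catalog}.system.migrate(table => '{source}')"


def ctas_sql(target: str, source: str, partition_by: str, sort_by: str) -> str:
    """Full data migration (CTAS): rewrite the data into a new Iceberg table,
    choosing partitioning and sort order as part of the rewrite."""
    return (
        f"CREATE TABLE {target} USING iceberg "
        f"PARTITIONED BY ({partition_by}) "
        f"AS SELECT * FROM {source} ORDER BY {sort_by}"
    )


statements = [
    snapshot_sql("spark_catalog", "db.sales", "db.sales_iceberg"),
    migrate_sql("spark_catalog", "db.sales"),
    # days(event_ts) is an Iceberg partition transform (hidden partitioning).
    ctas_sql(
        "glue_catalog.db.sales_iceberg",
        "spark_catalog.db.sales",
        "days(event_ts)",
        "customer_id",
    ),
]

for statement in statements:
    print(statement)
```

Only the CTAS statement takes partitioning and ordering arguments. This reflects the table above: the in-place options copy schema and partitions from the source table, while the full data migration is the point at which layout can be changed.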