Create Iceberg tables
To create an Iceberg table for use in Athena, you can use a CREATE TABLE
statement as documented on this page, or you can use an AWS Glue crawler.
Use a CREATE TABLE statement
Athena creates Iceberg v2 tables. For the difference between v1 and v2 tables, see Format version changes in the Apache Iceberg documentation.
Athena CREATE TABLE creates an Iceberg table with no data. You can query the table directly from external systems such as Apache Spark if the table uses the Iceberg open source glue catalog.
Warning
Running CREATE EXTERNAL TABLE results in the error message External keyword not supported for table type ICEBERG.
To create an Iceberg table from Athena, set the 'table_type' table property to 'ICEBERG' in the TBLPROPERTIES clause, as in the following syntax summary.
CREATE TABLE [db_name.]table_name (
  col_name data_type [COMMENT col_comment] [, ...]
)
[PARTITIONED BY (col_name | transform, ... )]
LOCATION 's3://amzn-s3-demo-bucket/your-folder/'
TBLPROPERTIES (
  'table_type' = 'ICEBERG'
  [, property_name = property_value ]
)
For information about the data types that you can query in Iceberg tables, see Supported data types for Iceberg tables in Athena.
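As a minimal sketch of this syntax (the table name, columns, and S3 folder are placeholders for illustration, not values from this page), the following statement creates an unpartitioned Iceberg table:

-- Minimal unpartitioned Iceberg table; names and S3 path are illustrative.
CREATE TABLE orders_iceberg (
  order_id bigint,
  customer string,
  total decimal(10,2),
  order_ts timestamp
)
LOCATION 's3://amzn-s3-demo-bucket/orders-iceberg/'
TBLPROPERTIES ( 'table_type' = 'ICEBERG' )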
Use partitions
To create Iceberg tables with partitions, use PARTITIONED BY syntax. Columns used for partitioning must be specified in the column declarations first. Within the PARTITIONED BY clause, the column type must not be included. You can also define partition transforms in CREATE TABLE syntax. To specify multiple columns for partitioning, separate the columns with the comma (,) character, as in the following example.
CREATE TABLE iceberg_table (
  id bigint,
  data string,
  category string
)
PARTITIONED BY (category, bucket(16, id))
LOCATION 's3://amzn-s3-demo-bucket/your-folder/'
TBLPROPERTIES ( 'table_type' = 'ICEBERG' )
The following table shows the available partition transform functions.
Function | Description | Supported types
---|---|---
year(ts) | Partition by year | date, timestamp
month(ts) | Partition by month | date, timestamp
day(ts) | Partition by day | date, timestamp
hour(ts) | Partition by hour | timestamp
bucket(N, col) | Partition by hashed value mod N buckets. This is the same concept as hash bucketing for Hive tables. | int, long, decimal, date, timestamp, string, binary
truncate(L, col) | Partition by value truncated to L | int, long, decimal, string
Athena supports Iceberg's hidden partitioning. For more information, see Iceberg's hidden partitioning in the Apache Iceberg documentation.
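For example, the following sketch (table, column, and folder names are hypothetical) partitions a table with the day transform; because of hidden partitioning, a query that filters on event_ts is pruned to the matching day partitions without referencing a separate partition column:

-- Table partitioned with the day() transform; names and S3 path are illustrative.
CREATE TABLE events_by_day (
  event_id bigint,
  event_ts timestamp,
  payload string
)
PARTITIONED BY (day(event_ts))
LOCATION 's3://amzn-s3-demo-bucket/events-by-day/'
TBLPROPERTIES ( 'table_type' = 'ICEBERG' )

-- The filter on event_ts drives partition pruning; no partition column appears in the query.
SELECT count(*)
FROM events_by_day
WHERE event_ts BETWEEN TIMESTAMP '2024-01-01 00:00:00' AND TIMESTAMP '2024-01-08 00:00:00'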
Table properties
This section describes table properties that you can specify as key-value pairs in the TBLPROPERTIES clause of the CREATE TABLE statement. Athena allows only a predefined list of key-value pairs in the table properties for creating or altering Iceberg tables. The following tables show the table properties that you can specify. For more information about the compaction options, see Optimize Iceberg tables in this documentation. If you would like Athena to support a specific open source table configuration property, send feedback to athena-feedback@amazon.com.
format
Description | File data format
Allowed property values | Supported file format and compression combinations vary by Athena engine version. For more information, see Use Iceberg table compression.
Default value | parquet
write_compression
Description | File compression codec
Allowed property values | Supported file format and compression combinations vary by Athena engine version. For more information, see Use Iceberg table compression.
Default value | Default write compression varies by Athena engine version. For more information, see Use Iceberg table compression.
optimize_rewrite_data_file_threshold
Description | Data optimization specific configuration. If there are fewer data files that require optimization than the given threshold, the files are not rewritten. This allows the accumulation of more data files to produce files closer to the target size and skips unnecessary computation for cost savings.
Allowed property values | A positive number. Must be less than 50.
Default value | 5
optimize_rewrite_delete_file_threshold
Description | Data optimization specific configuration. If there are fewer delete files associated with a data file than the threshold, the data file is not rewritten. This allows the accumulation of more delete files for each data file for cost savings.
Allowed property values | A positive number. Must be less than 50.
Default value | 2
vacuum_min_snapshots_to_keep
Description | Minimum number of snapshots to retain on a table's main branch. This value takes precedence over the vacuum_max_snapshot_age_seconds property.
Allowed property values | A positive number.
Default value | 1
vacuum_max_snapshot_age_seconds
Description | Maximum age of the snapshots to retain on the main branch. This value is ignored if the snapshots retained to satisfy vacuum_min_snapshots_to_keep are older than the specified age. This table behavior property corresponds to the history.expire.max-snapshot-age-ms property in Apache Iceberg configuration.
Allowed property values | A positive number.
Default value | 432000 seconds (5 days)
vacuum_max_metadata_files_to_keep
Description | The maximum number of previous metadata files to retain on the table's main branch.
Allowed property values | A positive number.
Default value | 100
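Because Athena allows these properties when creating or altering Iceberg tables, you can also adjust them on an existing table with ALTER TABLE SET TBLPROPERTIES. The following is a sketch only (the table name and values are illustrative); property values are passed as quoted strings.

-- Adjust vacuum retention settings on an existing Iceberg table (illustrative values).
ALTER TABLE iceberg_table SET TBLPROPERTIES (
  'vacuum_min_snapshots_to_keep' = '2',
  'vacuum_max_snapshot_age_seconds' = '259200'
)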
Example CREATE TABLE statement
The following example creates an Iceberg table that has three columns.
CREATE TABLE iceberg_table (
  id int,
  data string,
  category string
)
PARTITIONED BY (category, bucket(16, id))
LOCATION 's3://amzn-s3-demo-bucket/iceberg-folder'
TBLPROPERTIES (
  'table_type' = 'ICEBERG',
  'format' = 'parquet',
  'write_compression' = 'snappy',
  'optimize_rewrite_delete_file_threshold' = '10'
)
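After the table is created, it can be populated with standard DML. For example (the values are illustrative), a single row can be inserted as follows:

-- Insert one illustrative row into the table created above.
INSERT INTO iceberg_table VALUES (1, 'example data', 'books')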
Use CREATE TABLE AS SELECT (CTAS)
For information about creating an Iceberg table using the CREATE TABLE AS statement, see CREATE TABLE AS, with particular attention to the CTAS table properties section.
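As a hedged sketch (the source table, column list, and S3 location are assumptions for illustration; see the CTAS table properties section for the authoritative property list), an Iceberg CTAS statement passes table_type and related properties in the WITH clause rather than in TBLPROPERTIES:

-- CTAS into a new Iceberg table; the source table and location are illustrative.
CREATE TABLE iceberg_ctas_table
WITH (
  table_type = 'ICEBERG',
  format = 'PARQUET',
  location = 's3://amzn-s3-demo-bucket/iceberg-ctas-folder/',
  is_external = false
)
AS SELECT id, data, category
FROM existing_table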
Use an AWS Glue crawler
You can use an AWS Glue crawler to automatically register your Iceberg tables into the AWS Glue Data Catalog. If you want to migrate from another Iceberg catalog, you can create and schedule an AWS Glue crawler and provide the Amazon S3 paths where the Iceberg tables are located. You can specify the maximum depth of the Amazon S3 paths that the AWS Glue crawler can traverse. After you schedule an AWS Glue crawler, the crawler extracts schema information and updates the AWS Glue Data Catalog with the schema changes every time it runs. The AWS Glue crawler supports schema merging across snapshots and updates the latest metadata file location in the AWS Glue Data Catalog. For more information, see Data Catalog and crawlers in AWS Glue.