SQL commands - Amazon Redshift


SQL commands

Apache Iceberg tables in Amazon Redshift provide a powerful way to manage large analytic datasets in your data lake. These tables support ACID transactions, schema evolution, and time travel capabilities while maintaining high performance for analytics workloads. Using Apache Iceberg tables, you can efficiently organize and partition your data, control file formats and compression, and seamlessly integrate with other AWS services.

You can create partitioned and unpartitioned Iceberg tables using CREATE TABLE ... USING ICEBERG and CREATE TABLE ... USING ICEBERG AS SELECT commands. You can reference Iceberg tables using either external schema notation (external_schema.table_name) or three-part notation ("catalog_name".database_name.table_name). The examples in this section demonstrate both methods.
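For example, the following queries show both reference styles; the schema, bucket, database, and table names are illustrative:

SELECT count(*) FROM my_external_schema.lineitem;

SELECT count(*) FROM "my-table-bucket@s3tablescatalog".mydb.lineitem;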

After you create a table, you can add data using standard INSERT commands. Keep in mind that while Amazon Redshift works with many Iceberg data types, you might need to convert some data formats when inserting information.

You can view Iceberg tables using the SHOW TABLES command. To remove a table from the AWS Glue Data Catalog, use the DROP TABLE command. Note that this only removes the table registration; the actual data stays in storage until you delete it separately.

You can also modify existing data using the DELETE, UPDATE, and MERGE commands. Other SQL statements, such as ALTER TABLE, are not yet supported on Iceberg tables.

You can write to an Iceberg table that was not created by Amazon Redshift. However, the following limitations apply:

  • The table must be an Iceberg v2 table.

  • The table must use Parquet as its default data format.

  • The table must not have metadata compression set to True.

  • The table must not enable Write-Audit-Publish (WAP).

The following sections demonstrate SQL syntax for creating, inserting, modifying, and managing Iceberg tables in Amazon Redshift.

CREATE TABLE

CREATE TABLE [IF NOT EXISTS] <external_schema>.<table_name>
( column_name data_type [, ...] )
USING ICEBERG
[LOCATION 's3://your-bucket-name/prefix/']
[PARTITIONED BY [[column_name | transform_function]], ...]
[TABLE PROPERTIES ('compression_type'='<compression_value>')]

You can also use three-part notation for S3 table buckets:

CREATE TABLE "<table_bucket_name>@s3tablescatalog".<database_name>.<table_name>
( column_name data_type [, ...] )
USING ICEBERG
[PARTITIONED BY [[column_name | transform_function]], ...]
[TABLE PROPERTIES ('compression_type'='<compression_value>')]

Note that <external_schema> must be an existing external schema name in which the external table will be created. For more information about how to create and manage external schemas, see CREATE EXTERNAL SCHEMA in the Amazon Redshift documentation.

The LOCATION clause defines the table location for the newly created Iceberg table. For Amazon S3 tables, LOCATION can't be specified, because the table location is determined by the Amazon S3 Tables catalog (s3tablescatalog).

In all other cases, LOCATION is required, and it must be an empty location, meaning no existing Amazon S3 objects share the same bucket and prefix. Note that the Amazon S3 bucket must be in the same AWS Region as the Amazon Redshift cluster.
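For example, the following statement creates a partitioned Iceberg table in an existing external schema; the schema, bucket, table, and column names are illustrative:

CREATE TABLE my_external_schema.sales (
    sale_id int,
    amount decimal(10, 2),
    sale_date date
)
USING ICEBERG
LOCATION 's3://amzn-s3-demo-bucket/sales/'
PARTITIONED BY (month(sale_date))
TABLE PROPERTIES ('compression_type'='zstd');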

However, AWS provides a way to replicate Iceberg tables stored in an AWS Glue Data Catalog from one AWS Region to another. For more information, see Replicate data across AWS Regions.

PARTITIONED BY defines the Iceberg table partition spec. Amazon Redshift supports all Iceberg v2 partition transforms except void. The supported transforms are:

  • identity

  • bucket[N]

  • truncate[W]

  • year

  • month

  • day

  • hour

For the full definitions of these transforms and the compatible data types, see Partition Transforms in the Apache Iceberg documentation.

The PARTITIONED BY clause supports multi-level partitioning. For example, you can run the following command:

CREATE TABLE ... USING ICEBERG LOCATION ... PARTITIONED BY (bucket(16, id), year(ship_date));

However, Amazon Redshift doesn't support using a single column in more than one transform. For example, the following syntax is not supported:

CREATE TABLE ... USING ICEBERG LOCATION ... PARTITIONED BY (bucket(16, ship_date), year(ship_date));

The TABLE PROPERTIES clause defines extra table properties for the Iceberg table. The only supported table property is compression_type, which defines the default compression for Parquet data files. If it isn't specified, snappy is used as the compression codec. The possible values for compression_type are: zstd, brotli, gzip, snappy, and uncompressed.

Note

CREATE TABLE ... LIKE ... is not supported for Iceberg tables. Unlike RMS tables, Iceberg tables also don't support column constraints or column attributes.

Alternatively, you can create and populate an Iceberg table in a single operation using CREATE TABLE AS SELECT:

CREATE TABLE AS SELECT

CREATE TABLE <external_schema>.<table_name>
[( column_name[, ...] )]
USING ICEBERG
[LOCATION 's3://your-bucket-name/prefix/']
[PARTITIONED BY [[column_name | transform_function]], ...]
[TABLE PROPERTIES ('compression_type'='<compression-value>')]
AS SELECT query

You can also use three-part notation to create tables in auto-mounted catalogs:

CREATE TABLE "<catalog_name>".<database_name>.<table_name>
[( column_name[, ...] )]
USING ICEBERG
[LOCATION 's3://your-bucket-name/prefix/']
[PARTITIONED BY [[column_name | transform_function]], ...]
[TABLE PROPERTIES ('compression_type'='<compression-value>')]
AS SELECT query

This is similar to the CREATE TABLE statement, except that the statement ends with a SELECT query that populates the table with its results.

Unlike CREATE TABLE, you can't specify column data types here; they are inferred from the SELECT query.

If the SELECT query fails for any reason, this query will fail and the Iceberg table will not be created.
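For example, the following statement creates an Iceberg table from the results of a query; the schema, bucket, and table names are illustrative:

CREATE TABLE my_external_schema.recent_sales
USING ICEBERG
LOCATION 's3://amzn-s3-demo-bucket/recent-sales/'
AS SELECT sale_id, amount, sale_date
FROM my_external_schema.sales
WHERE sale_date >= '2025-01-01';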

You can view the structure of your Iceberg tables using SHOW TABLE:

SHOW TABLE

SHOW TABLE <external_schema>.<table_name>

You can also use three-part notation with auto-mounted catalogs:

SHOW TABLE "<catalog_name>".<database_name>.<table_name>

SHOW TABLE displays the CREATE TABLE statement for an Iceberg table. The output varies based on the table type. The following is an example of SHOW TABLE output for an Iceberg table:

CREATE TABLE my_schema.items (id int, price decimal(5, 2))
USING ICEBERG
LOCATION 's3://my_s3_bucket/items/'
PARTITIONED BY (bucket(16, id))
TABLE PROPERTIES ('compression_type'='snappy')
Note

For Amazon S3 tables, because the table location is managed by the Amazon S3 Tables catalog, the LOCATION clause is omitted from the SHOW TABLE results.

After creating tables, you can add data using INSERT INTO:

INSERT INTO

INSERT INTO <external_schema>.<table_name> [(column_name [, ...])] VALUES (...)

INSERT INTO <external_schema>.<table_name> [(column_name [, ...])] (SELECT query)

-- Using three-part notation for S3 table buckets:

INSERT INTO "<table_bucket_name>@s3tablescatalog".<database_name>.<table_name> [(column_name [, ...])] VALUES (...)

INSERT INTO "<table_bucket_name>@s3tablescatalog".<database_name>.<table_name> [(column_name [, ...])] (SELECT query)

You can insert into an existing Iceberg table using the preceding syntax. If the VALUES clause is used, you provide values for the columns listed in column_name, or for all columns if the column list is omitted.

When data is inserted into a partitioned table, new rows are distributed according to the predefined partition specification. If the SELECT query fails for any reason, the query fails and no data is inserted into the Iceberg table.
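For example, the following statements show both forms of INSERT; the schema and table names are illustrative:

INSERT INTO my_external_schema.sales (sale_id, amount, sale_date)
VALUES (1001, 19.99, '2025-06-15');

INSERT INTO my_external_schema.sales
(SELECT sale_id, amount, sale_date FROM my_external_schema.staging_sales);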

DELETE

The DELETE query for Iceberg tables uses the existing DELETE syntax for RMS tables:

[ WITH [RECURSIVE] common_table_expression [, common_table_expression , ...] ]
DELETE [ FROM ] iceberg_table
[ { USING } table_name, ... ]
[ WHERE condition ]

You can also use three-part notation for S3 table buckets:

[ WITH [RECURSIVE] common_table_expression [, common_table_expression , ...] ]
DELETE [ FROM ] "<table_bucket_name>@s3tablescatalog".<database_name>.<table_name>
[ { USING } table_name, ... ]
[ WHERE condition ]

The iceberg_table can be referenced using the <external_schema>.<external_table_name> form, or using three-part notation for an auto-mounted catalog. See Referencing Iceberg tables in Amazon Redshift.

The table_name in the USING clause is joined with the target table to delete the rows that satisfy the WHERE condition. The table_name can be an Iceberg table or an Amazon Redshift RMS table.
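For example, the following statement deletes rows from the target table that match rows in a second table; the schema and table names are illustrative:

DELETE FROM my_external_schema.sales
USING my_external_schema.returned_orders
WHERE sales.sale_id = returned_orders.sale_id;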

Because Iceberg uses hidden partitioning, you can use a DELETE query to remove partitions, achieving the same effect as ALTER TABLE ... DROP PARTITION ... on Hive tables.

For example, suppose you have a partitioned Iceberg table like the following:

CREATE TABLE my_external_schema.lineitem
(l_item_id int, l_ship_date varchar, ... )
USING ICEBERG
LOCATION ...
PARTITIONED BY (l_ship_date);

You can then remove a partition with a query like this:

DELETE FROM my_external_schema.lineitem WHERE l_ship_date = '20251231';

For a query like this, Amazon Redshift optimizes the execution into a metadata-only operation and short-circuits the rest of the execution. Unlike a normal DELETE query, a metadata-only delete query doesn't show execution steps in EXPLAIN:

explain DELETE FROM my_external_schema.lineitem WHERE l_ship_date = '20251231';

QUERY PLAN
------------
"XN Seq Scan Metadata of my_external_schema.lineitem location: "s3://s3-path//table-location" format:ICEBERG (cost=0.00..0.01 rows=0 width=0)"
(0 rows)

UPDATE

The UPDATE query syntax for Iceberg tables is very similar to the existing UPDATE syntax for RMS tables:

[ WITH [RECURSIVE] common_table_expression [, common_table_expression , ...] ]
UPDATE iceberg_table [ [ AS ] alias ]
SET column = { expression } [,...]
[ FROM fromlist ]
[ WHERE condition ]

You can also use three-part notation for S3 table buckets:

[ WITH [RECURSIVE] common_table_expression [, common_table_expression , ...] ]
UPDATE "<table_bucket_name>@s3tablescatalog".<database_name>.<table_name> [ [ AS ] alias ]
SET column = { expression } [,...]
[ FROM fromlist ]
[ WHERE condition ]

The iceberg_table can be referenced using the <external_schema>.<external_table_name> form, or using three-part notation for an auto-mounted catalog. See Referencing Iceberg tables in Amazon Redshift.

You can update a table by referencing information in other tables. List these other tables in the FROM clause or use a subquery as part of the WHERE condition. The source tables can be either Iceberg tables or Amazon Redshift RMS tables.

UPDATE can also run on a partitioned table. When UPDATE changes a column value that belongs to the current partition spec, the updated row is written to the new partition determined by the updated value.

For example, suppose you have a partitioned Iceberg table like the following:

CREATE TABLE my_external_schema.lineitem
(l_item_id int, l_ship_date varchar, ... )
USING ICEBERG
LOCATION ...
PARTITIONED BY (l_ship_date);

INSERT INTO my_external_schema.lineitem VALUES (10099, '20251231', ...);

When you run the following update query:

UPDATE my_external_schema.lineitem SET l_ship_date = '20260101' WHERE l_item_id = 10099;

Amazon Redshift moves the row with l_item_id 10099 from partition 20251231 to the new partition 20260101.

Note that an UPDATE can have multiple candidate values for a single row. Consider the following queries:

CREATE TABLE my_ext_schema.t1(x1 int, y1 int) USING ICEBERG LOCATION ...;
CREATE TABLE my_ext_schema.t2(x2 int, y2 int) USING ICEBERG LOCATION ...;
INSERT INTO my_ext_schema.t1 VALUES (1,10), (2,20), (3,30);
INSERT INTO my_ext_schema.t2 VALUES (2,40), (2,50);
UPDATE my_ext_schema.t1 SET y1=y2 FROM my_ext_schema.t2 WHERE x1=x2;

In this case, y1 can be 40 or 50; the result is nondeterministic. You can set the configuration parameter error_on_nondeterministic_update to true to force a query error when this happens. This is consistent with the existing RMS table UPDATE behavior. For more information, see error_on_nondeterministic_update.

MERGE

The MERGE query conditionally merges rows from a source table into a target table. It shares the same MERGE query syntax as existing RMS tables:

MERGE INTO target_iceberg_table
USING source_table [ [ AS ] alias ]
ON match_condition
[ WHEN MATCHED THEN { UPDATE SET col_name = { expr } [,...] | DELETE }
  WHEN NOT MATCHED THEN INSERT [ ( col_name [,...] ) ] VALUES ( { expr } [, ...] )
| REMOVE DUPLICATES ]

You can also use three-part notation for S3 table buckets:

MERGE INTO "<table_bucket_name>@s3tablescatalog".<database_name>.<table_name>
USING source_table [ [ AS ] alias ]
ON match_condition
[ WHEN MATCHED THEN { UPDATE SET col_name = { expr } [,...] | DELETE }
  WHEN NOT MATCHED THEN INSERT [ ( col_name [,...] ) ] VALUES ( { expr } [, ...] )
| REMOVE DUPLICATES ]

The target_iceberg_table can be referenced using the <external_schema>.<external_table_name> form, or use the 3-part notation for auto mounted catalog. See Referencing Iceberg tables in Amazon Redshift.

The source_table can be either an Iceberg table or an Amazon Redshift RMS table.

When REMOVE DUPLICATES is used, the MERGE command uses simplified mode. For more details about simplified mode, see the MERGE command documentation.
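For example, the following statement updates matching rows in the target table and inserts the rest; the schema and table names are illustrative:

MERGE INTO my_external_schema.sales AS target
USING my_external_schema.staging_sales AS source
ON target.sale_id = source.sale_id
WHEN MATCHED THEN UPDATE SET amount = source.amount
WHEN NOT MATCHED THEN INSERT (sale_id, amount, sale_date)
VALUES (source.sale_id, source.amount, source.sale_date);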

While executing a MERGE query, Amazon Redshift generates and stores intermediate data files in the target table location. These files are garbage collected at the end of the query. Because of this, a MERGE query requires DELETE permission on the Amazon S3 bucket to work properly; an insufficient-permission error is thrown if the garbage collection operation fails. For Amazon S3 tables, garbage collection is managed by the Amazon S3 Tables service, so DELETE permission is not required to run a MERGE query.

DROP TABLE

To remove an Iceberg table from the catalog, use the DROP TABLE command:

DROP TABLE <external_schema>.<table_name>

You can also use three-part notation with auto-mounted catalogs:

DROP TABLE "<catalog_name>".<database_name>.<table_name>

Dropping an Iceberg table is a metadata-only operation. It removes the table entry from the AWS Glue Data Catalog, and from the Amazon S3 Tables catalog if it is an Amazon S3 table. Amazon Redshift doesn't clean up or delete any existing data files or metadata files under the table location. You can use features in AWS Glue and Amazon S3 Tables to remove orphaned files. For AWS Glue, see Deleting orphan files. For Amazon S3 tables, see Table maintenance.