Register S3 table bucket catalogs and query Tables from Athena
Amazon S3 table buckets are a bucket type in Amazon S3 that is purpose-built to store tabular data in Apache Iceberg tables. Table buckets automate table management tasks such as compaction, snapshot management, and garbage collection to continuously optimize query performance and minimize cost. Whether you're just starting out or have thousands of tables in your Iceberg environment, table buckets simplify data lakes at any scale. For more information, see Table buckets.
Considerations and limitations
- All DDL operations supported for Iceberg tables are supported for S3 Tables, with the following exceptions:
  - ALTER TABLE RENAME, CREATE VIEW, and ALTER DATABASE are not supported.
  - OPTIMIZE and VACUUM are not supported. Instead, you manage compaction and snapshot management in S3. For more information, see S3 Tables maintenance documentation.
- DDL queries on S3 Tables registered as Athena data sources are not supported.
- Query result reuse is not supported.
- In workgroups with SSE-KMS or CSE-KMS encryption enabled, you can't run write operations such as INSERT, UPDATE, DELETE, or MERGE on S3 Tables.
- In workgroups with the S3 Requester Pays option enabled, you can't run DML operations on S3 Tables.
Query S3 Tables from Athena
Complete these prerequisite steps before you query S3 Tables in Athena:
- Create an S3 table bucket. For more information, see Creating a table bucket in the Amazon Simple Storage Service User Guide.
- Make sure that the integration of your table buckets with the AWS Glue Data Catalog and AWS Lake Formation is successful by following Prerequisites for integration and Integrating table buckets with AWS analytics services in the Amazon Simple Storage Service User Guide.
  Note
  If you enabled the integration while creating the S3 table bucket from the S3 console in the first step, you can skip this step.
- For the principal that you use to run queries with Athena, grant Lake Formation permissions on the S3 Tables catalog, either through the Lake Formation console or the AWS CLI.
Submit queries for S3 Tables
- Submit a CREATE DATABASE query from Athena as the user or role that you granted Lake Formation permissions to. In this example, s3tablescatalog is the parent Glue Data Catalog created from the integration, and s3tablescatalog/amzn-s3-demo-bucket is the child Glue Data Catalog created for each S3 table bucket. You can run the query in either of two ways: qualify the database name with the child catalog name, or choose the child catalog in the Athena query editor and use the unqualified database name.
- Using the database that you created in the previous step, run CREATE TABLE to create a table. For example, create a table in the test_namespace database that you previously created in the s3tablescatalog/amzn-s3-demo-bucket Glue catalog.
- Insert data into the table that you created in the previous step.
- After inserting data into the table, you can query it.
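The steps above can be sketched as the following sequence of statements. This is a minimal sketch, not a definitive recipe: it assumes that the table bucket amzn-s3-demo-bucket is already integrated and that your principal has the Lake Formation permissions described earlier. The daily_sales table, its columns, and the sample rows are hypothetical. Following Athena's usual quoting conventions, the DDL statements enclose the catalog name in backticks and the DML statements use double quotes, because the catalog name contains a slash.

```sql
-- Step 1, option A: qualify the namespace (database) with the child catalog name.
CREATE DATABASE `s3tablescatalog/amzn-s3-demo-bucket`.`test_namespace`;

-- Step 1, option B: choose s3tablescatalog/amzn-s3-demo-bucket as the catalog
-- in the Athena query editor, then run the unqualified form instead:
-- CREATE DATABASE test_namespace;

-- Step 2: create a hypothetical table. No LOCATION clause is given.
CREATE TABLE `s3tablescatalog/amzn-s3-demo-bucket`.`test_namespace`.`daily_sales` (
  sale_date date,
  product_category string,
  sales_amount double)
PARTITIONED BY (month(sale_date))
TBLPROPERTIES ('table_type' = 'ICEBERG');

-- Step 3: insert illustrative sample rows.
INSERT INTO "s3tablescatalog/amzn-s3-demo-bucket"."test_namespace"."daily_sales"
VALUES
  (DATE '2024-01-15', 'Laptop', 900.00),
  (DATE '2024-01-15', 'Monitor', 250.00),
  (DATE '2024-02-01', 'Laptop', 1100.00);

-- Step 4: query the table, here aggregating January sales per category.
SELECT product_category,
       COUNT(*) AS sale_count,
       SUM(sales_amount) AS total_revenue
FROM "s3tablescatalog/amzn-s3-demo-bucket"."test_namespace"."daily_sales"
WHERE sale_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31'
GROUP BY product_category;
```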
Create S3 Tables in Athena
Athena supports creating tables in existing S3 Tables namespaces or in namespaces created in Athena with CREATE DATABASE statements. To create an S3 table from Athena, use the same syntax as for a regular Iceberg table, except that you don't specify the LOCATION, as shown in the following syntax.
CREATE TABLE [db_name.]table_name (col_name data_type [COMMENT col_comment] [, ...] )
  [PARTITIONED BY (col_name | transform, ... )]
  [TBLPROPERTIES ('property_name'='property_value' [, ...] )]
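As an illustration, the following statement fills in that syntax for a hypothetical page_views table in the test_namespace namespace. It is a sketch under stated assumptions: the table and column names are illustrative, the day and bucket partition transforms are the ones Athena documents for regular Iceberg tables, and the TBLPROPERTIES values assume that S3 Tables accept the same properties as regular Iceberg tables. There is no LOCATION clause because S3 Tables manage their own storage locations.

```sql
-- Hypothetical table in an existing S3 Tables namespace.
-- Note that there is no LOCATION clause.
CREATE TABLE `s3tablescatalog/amzn-s3-demo-bucket`.`test_namespace`.`page_views` (
  view_id bigint,
  user_id string,
  view_ts timestamp COMMENT 'Time of the page view')
PARTITIONED BY (day(view_ts), bucket(16, user_id))  -- Iceberg partition transforms
TBLPROPERTIES (
  'table_type' = 'ICEBERG',  -- same property as for a regular Iceberg table
  'format' = 'parquet'
);
```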
You can also create S3 Tables using CREATE TABLE AS SELECT (CTAS) statements. For more information, see CTAS for S3 Tables.
Register S3 table bucket catalogs as Athena data sources
To register S3 table bucket catalogs with the Athena console, perform the following steps.
- Open the Athena console at https://console.aws.amazon.com/athena/.
- In the navigation pane, choose Data sources and catalogs.
- On the Data sources and catalogs page, choose Create data source.
- For Choose a data source, choose Amazon S3 - AWS Glue Data Catalog.
- In the AWS Glue Data Catalog section, for Data source account, choose AWS Glue Data Catalog in this account.
- For Create a table or register a catalog, choose Register a new AWS Glue Catalog.
- In the Data source details section, for Data source name, enter the name that you want to use to specify the data source in your SQL queries, or use the generated default name.
- For Catalog, choose Browse to search for a list of AWS Glue catalogs in the same account. If you don't see any existing catalogs, create one in the AWS Glue console.
- In the Browse AWS Glue catalogs dialog box, select the catalog that you want to use, and then choose Choose.
- (Optional) For Tags, enter any key/value pairs that you want to associate with the data source.
- Choose Next.
- On the Review and create page, verify that the information that you entered is correct, and then choose Create data source.
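After you create the data source, you refer to it by its data source name in your SQL queries. The following is a minimal sketch that assumes a hypothetical data source named my_s3tables_source, registered for a catalog that contains the test_namespace namespace and a daily_sales table. Because DDL queries on S3 Tables registered as Athena data sources are not supported, the registered name is shown here only in a SELECT query.

```sql
-- "my_s3tables_source" is a hypothetical data source name entered in the
-- Data source name field; replace it with the name that you registered.
SELECT *
FROM "my_s3tables_source"."test_namespace"."daily_sales"
LIMIT 10;
```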
CTAS for S3 Tables
Amazon Athena now supports CREATE TABLE AS SELECT (CTAS) operations for S3 Tables. This feature enables you to create new S3 Tables based on the results of a SELECT query.
When creating a CTAS query for an S3 Table, there are a few important differences compared to standard Athena tables:
- You must omit the location property because S3 Tables automatically manage their own storage locations.
- The table_type property defaults to ICEBERG, so you don't need to specify it explicitly in your query.
- If you don't specify a format, PARQUET is used as the default format for your data.
- All other properties follow the same syntax as regular Iceberg tables.
Before you create S3 Tables using CTAS, ensure that you have the necessary permissions configured in AWS Lake Formation. Specifically, you need permissions to create tables in the S3 Tables catalog. Without these permissions, your CTAS operations will fail.
Note
If your CTAS query fails, you might have to delete the table by using the S3 Tables API before you rerun the query. You can't use the Athena DROP TABLE statement to remove a table that was partially created by the query.
Example
CREATE TABLE "s3tablescatalog/amzn-s3-demo-bucket"."namespace"."s3-table-name"
WITH ( format = 'PARQUET' )
AS SELECT * FROM source_table;
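Because of the defaults described above, a CTAS statement for an S3 table can be shorter than the preceding example. The following sketch assumes a hypothetical source_table with product_category, sale_date, and sales_amount columns and hypothetical target table names. The first statement omits location and relies on the defaults for table_type and format. The second adds a partitioning property, assuming that it is accepted with the same syntax as for regular Iceberg CTAS tables.

```sql
-- Minimal CTAS: no location (S3 Tables manages storage), no table_type
-- (defaults to ICEBERG), and no format (defaults to PARQUET).
CREATE TABLE "s3tablescatalog/amzn-s3-demo-bucket"."namespace"."sales_copy"
AS SELECT product_category, sale_date, sales_amount
FROM source_table;

-- Partitioned CTAS, assuming the regular Iceberg "partitioning" property applies.
CREATE TABLE "s3tablescatalog/amzn-s3-demo-bucket"."namespace"."sales_by_month"
WITH ( partitioning = ARRAY['month(sale_date)'] )
AS SELECT product_category, sale_date, sales_amount
FROM source_table;
```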