Get started with Amazon S3 Tables in Amazon SageMaker Unified Studio
Amazon SageMaker Unified Studio provides integrated support for S3 Tables, allowing you to create S3 table buckets and Apache Iceberg tables in those buckets.
Amazon S3 Tables provide S3 storage that’s optimized for analytics workloads, with built-in Apache Iceberg support and features designed to continuously improve query performance and reduce storage costs for tables. Data in S3 Tables is stored in table buckets, which are specialized buckets for storing tabular data. For more information, see Working with Amazon S3 Tables and table buckets.
You can begin working with S3 Tables directly by creating an S3 table bucket as a new data source within Amazon SageMaker Unified Studio.
Integrating S3 with AWS analytics services through Amazon SageMaker Unified Studio
Amazon S3 table buckets integrate with AWS Glue Data Catalog and AWS Lake Formation to allow AWS analytics services to automatically discover and access your table data. For more information, see Integrating Amazon S3 Tables with AWS analytics services.
If you've never used S3 Tables before in the current Region, you can allow Amazon SageMaker to enable the S3 Tables analytics integration when you create a new S3 Tables catalog in the Amazon SageMaker Unified Studio console.
When you allow Amazon SageMaker Unified Studio to perform the integration, Amazon SageMaker takes the following actions on your behalf in your account:
-
Creates a new AWS Identity and Access Management (IAM) service role that gives Lake Formation access to all your tables and table buckets in your current Region. This allows Lake Formation to manage access, permissions, and governance for all current and future table buckets in that Region.
-
Creates the s3tablescatalog in the AWS Glue Data Catalog in your current Region without privileged access.
-
Adds the Amazon Redshift service role (AWSServiceRoleForRedshift) as a Lake Formation read-only administrator. This allows Amazon Redshift to automatically mount all tables in S3 table buckets in the Region.
Note
Integration will be performed in the current Region only.
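To confirm that the integration has been enabled in a Region, one option is to look for the s3tablescatalog parent catalog in the AWS Glue Data Catalog. The following is a minimal sketch, not the method SageMaker itself uses. It assumes a boto3 Glue client whose version exposes get_catalog, and it assumes that Glue reports a missing catalog with EntityNotFoundException. The function names find_parent_catalog and integration_enabled are hypothetical.

```python
PARENT_CATALOG = "s3tablescatalog"  # parent catalog name used by the integration


def find_parent_catalog(glue_client, catalog_id=PARENT_CATALOG):
    """Return the Glue description of the parent catalog, or None if it is absent.

    glue_client is expected to behave like boto3.client("glue").
    """
    try:
        response = glue_client.get_catalog(CatalogId=catalog_id)
    except glue_client.exceptions.EntityNotFoundException:
        # Assumption: a catalog that does not exist surfaces as this error.
        return None
    return response["Catalog"]


def integration_enabled(glue_client):
    """True when the parent catalog exists in the Region the client points at."""
    return find_parent_catalog(glue_client) is not None
```

Because the integration is per Region, the client should be created for the same Region you selected in the console, for example boto3.client("glue", region_name="us-east-1"). The caller needs permission to read the catalog, and the result may also depend on Lake Formation permissions.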
Prerequisites
-
Create an Amazon SageMaker domain and project. For more information, see Setting up Amazon SageMaker.
Creating S3 Tables catalogs in Amazon SageMaker Unified Studio
To get started with S3 Tables in Amazon SageMaker Unified Studio, create a new Lakehouse catalog with an S3 table bucket as the source by using the following steps.
-
Open the Amazon SageMaker console at https://console.aws.amazon.com/sagemaker/ and use the Region selector in the top navigation bar to choose the appropriate AWS Region.
-
Select your Amazon SageMaker domain.
-
Select the project you want to create a table bucket in.
-
In the navigation menu, select Data, then select + to add a new data source.
-
Select Create Lakehouse catalog.
-
In the add catalog menu, choose S3 Tables as the source.
-
Enter a name for the catalog and a name for the database.
-
Choose Create catalog. This creates the following resources in your account:
-
A new S3 table bucket and the corresponding AWS Glue child catalog under the parent catalog s3tablescatalog.
-
A new database within that AWS Glue child catalog. The database name matches the database name you provided. In S3 Tables, this is the table namespace.
-
Begin creating tables in your database and querying them using the query editor or a Jupyter notebook.
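The resources created in the last steps are related by name: the child catalog sits under s3tablescatalog, and the database is the table namespace. The following sketch shows how those identifiers might be derived from a table bucket name. It assumes the standard aws partition for the table bucket ARN and the account-id:s3tablescatalog/bucket-name form for the Glue catalog ID. The class S3TablesCatalogNames and the sample values are hypothetical.

```python
from dataclasses import dataclass

PARENT_CATALOG = "s3tablescatalog"


@dataclass(frozen=True)
class S3TablesCatalogNames:
    """Identifiers that refer to one table bucket and one database in it."""

    account_id: str
    region: str
    bucket_name: str
    database_name: str

    @property
    def table_bucket_arn(self) -> str:
        # Assumption: standard "aws" partition.
        return (
            f"arn:aws:s3tables:{self.region}:{self.account_id}"
            f":bucket/{self.bucket_name}"
        )

    @property
    def catalog_path(self) -> str:
        # Child catalog under the parent catalog, as used when querying.
        return f"{PARENT_CATALOG}/{self.bucket_name}"

    @property
    def glue_catalog_id(self) -> str:
        # Assumption: account-qualified form of the child catalog path.
        return f"{self.account_id}:{self.catalog_path}"

    @property
    def namespace(self) -> str:
        # The Glue database name is the S3 Tables namespace.
        return self.database_name


names = S3TablesCatalogNames(
    account_id="111122223333",
    region="us-east-1",
    bucket_name="amzn-s3-demo-table-bucket",
    database_name="sales_db",
)
print(names.catalog_path)
```

Keeping the derivation in one place avoids retyping the parent catalog name and bucket name separately in each query or API call.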
Creating and querying S3 tables
After you add an S3 Tables catalog, it can be queried as s3tablescatalog/your-bucket-name. You can begin creating S3 tables in the catalog and querying them in Amazon SageMaker Unified Studio with the query editor and JupyterLab.
Note
You can only create S3 tables in Amazon SageMaker Unified Studio with the Athena engine or Spark. Once created, you can query tables with Athena, Amazon Redshift, or Spark.
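As an illustration of creating and querying a table with the Athena engine, the following sketch builds the SQL statements and the request parameters for the Athena StartQueryExecution API. It is not the only way to do this, and the query editor runs equivalent statements for you. The helper names, the sample table, and the workgroup parameter are hypothetical. The identifier check is a conservative assumption (lowercase letters, numbers, and underscores), and column types are passed through unchecked.

```python
import re

_IDENTIFIER = re.compile(r"^[a-z0-9][a-z0-9_]*$")


def _check_identifier(name):
    """Reject names outside a conservative lowercase/number/underscore set."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def create_table_sql(database, table, columns):
    """Build an Athena DDL statement for an Iceberg table.

    columns is a list of (name, sql_type) pairs; sql_type is not validated.
    """
    _check_identifier(database)
    _check_identifier(table)
    column_defs = ",\n".join(
        f"  {_check_identifier(name)} {sql_type}" for name, sql_type in columns
    )
    return (
        f"CREATE TABLE `{database}`.`{table}` (\n"
        f"{column_defs}\n"
        f")\n"
        f"TBLPROPERTIES ('table_type' = 'iceberg')"
    )


def select_sql(database, table, limit=10):
    """Build a simple preview query against the table."""
    _check_identifier(database)
    _check_identifier(table)
    return f'SELECT * FROM "{database}"."{table}" LIMIT {int(limit)}'


def athena_request(sql, bucket_name, database, workgroup):
    """Keyword arguments for Athena start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {
            # The catalog is addressed as s3tablescatalog/your-bucket-name.
            "Catalog": f"s3tablescatalog/{bucket_name}",
            "Database": database,
        },
        "WorkGroup": workgroup,  # hypothetical: use your project's workgroup
    }


def submit(sql, bucket_name, database, workgroup):
    """Submit the statement with boto3 and return the query execution ID."""
    import boto3  # imported here so the builders work without AWS access

    athena = boto3.client("athena")
    request = athena_request(sql, bucket_name, database, workgroup)
    return athena.start_query_execution(**request)["QueryExecutionId"]
```

For example, create_table_sql("sales_db", "daily_sales", [("sale_date", "date"), ("sales_amount", "double")]) produces a statement that you can paste into the query editor with the S3 Tables catalog and database selected, or pass to submit together with your table bucket name.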