Use an Iceberg cluster with Trino
Starting with Amazon EMR version 6.6.0, you can use Iceberg with your Trino cluster.
In this tutorial, you use the AWS CLI to work with Iceberg on an Amazon EMR Trino
cluster. To use the console to create a cluster with Iceberg installed, follow the
steps in Build an Apache Iceberg data lake using Amazon Athena, Amazon EMR, and
AWS Glue.
Create an Iceberg cluster
To use Iceberg on Amazon EMR with the AWS CLI, first create a cluster with the following steps. For information on specifying the Iceberg classification using the AWS CLI, see Supply a configuration using the AWS CLI when you create a cluster or Supply a configuration using the Java SDK when you create a cluster.
-
Create an iceberg.properties file and set a value for your chosen catalog. For example, if you want to use the Hive metastore as your catalog, your file should have the following content.
    connector.name=iceberg
    hive.metastore.uri=thrift://localhost:9083
If you want to use the AWS Glue Data Catalog as your store, your file should have the following content.
    connector.name=iceberg
    iceberg.catalog.type=glue
-
Create a bootstrap action that copies iceberg.properties from Amazon S3 to /etc/trino/conf/catalog/iceberg.properties, as in the following example. For information on bootstrap actions, see Create bootstrap actions to install additional software.
    set -ex
    sudo aws s3 cp s3://DOC-EXAMPLE-BUCKET/iceberg.properties /etc/trino/conf/catalog/iceberg.properties
-
Create a cluster with the following configuration, replacing the example bootstrap actions script path and key name with your own.
    aws emr create-cluster --release-label emr-6.7.0 \
      --applications Name=Trino \
      --region us-east-1 \
      --name My_Trino_Iceberg_Cluster \
      --bootstrap-actions '[{"Path":"s3://DOC-EXAMPLE-BUCKET","Name":"Add iceberg.properties"}]' \
      --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=c3.4xlarge InstanceGroupType=CORE,InstanceCount=3,InstanceType=c3.4xlarge \
      --use-default-roles \
      --ec2-attributes KeyName=<key-name>
Initialize a Trino session for Iceberg
To initialize a Trino session, run the following command.
trino-cli --catalog iceberg
Write to an Iceberg table
Create and write to your table with the following SQL commands.
trino> SHOW SCHEMAS;
trino> CREATE TABLE default.iceberg_table (
    id int,
    data varchar,
    category varchar)
WITH (
    format = 'PARQUET',
    partitioning = ARRAY['category', 'bucket(id, 16)'],
    location = 's3://DOC-EXAMPLE-BUCKET/<prefix>');
trino> INSERT INTO default.iceberg_table VALUES (1,'a','c1'), (2,'b','c2'), (3,'c','c3');
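If you prefer not to type the statements into an interactive session, one possible approach is to save them to a file and pass that file to the Trino CLI with its `--file` option. The sketch below assumes it runs on the cluster's primary node, where `trino-cli` is on the path. The file name `iceberg_write.sql` is a hypothetical choice, and the `s3://DOC-EXAMPLE-BUCKET/<prefix>` location is still the tutorial's placeholder, which you must replace before the statements can succeed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file name for the saved statements; override with SQL_FILE.
SQL_FILE="${SQL_FILE:-./iceberg_write.sql}"

# Same CREATE TABLE and INSERT statements as in the tutorial, each ended with
# a semicolon. The quoted heredoc keeps the placeholder location untouched.
cat > "$SQL_FILE" <<'EOF'
CREATE TABLE default.iceberg_table (
  id int,
  data varchar,
  category varchar)
WITH (
  format = 'PARQUET',
  partitioning = ARRAY['category', 'bucket(id, 16)'],
  location = 's3://DOC-EXAMPLE-BUCKET/<prefix>');

INSERT INTO default.iceberg_table
VALUES (1,'a','c1'), (2,'b','c2'), (3,'c','c3');
EOF

# Run the file only where the Trino CLI is installed (for example, on the
# cluster's primary node); otherwise just report where the file was written.
if command -v trino-cli >/dev/null 2>&1; then
  trino-cli --catalog iceberg --file "$SQL_FILE"
else
  echo "trino-cli not found; wrote $SQL_FILE only"
fi
```

Running the script twice would fail on the `CREATE TABLE` statement because the table already exists, so treat it as a one-time setup step.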
Read from an Iceberg table
To read from your Iceberg table, run the following command.
trino> SELECT * FROM default.iceberg_table;
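The same read can also be issued without opening an interactive session, which may be convenient in scripts. The following sketch uses the Trino CLI's `--execute` and `--output-format` options to print the rows as CSV with a header line. The `ORDER BY id` clause is an addition for stable output and is not part of the tutorial's query. As before, it assumes `trino-cli` is available, as it is on the cluster's primary node.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Tutorial query, with an ORDER BY added here (an assumption of this sketch)
# so that rows come back in a predictable order.
QUERY='SELECT * FROM default.iceberg_table ORDER BY id'

if command -v trino-cli >/dev/null 2>&1; then
  # Run a single statement and print the result as CSV with column names.
  trino-cli --catalog iceberg --output-format CSV_HEADER --execute "$QUERY"
else
  echo "trino-cli not found; would run: $QUERY"
fi
```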