Querying AWS Glue Data Catalog
Because AWS Glue Data Catalog is used by many AWS services as their central metadata repository, you might want to query Data Catalog metadata. To do so, you can use SQL queries in Athena. You can use Athena to query AWS Glue catalog metadata like databases, tables, partitions, and columns.
You can use individual hive DDL commands to extract metadata information for specific databases, tables, views, partitions, and columns from Athena, but the output is in a non-tabular format.
To obtain AWS Glue Catalog metadata, you query the information_schema
database
on the Athena backend. The example queries in this topic show how to use Athena to
query AWS Glue
Catalog metadata for common use cases.
You cannot use CREATE VIEW
to create a view on the
information_schema
database.
Topics
Listing Databases and Searching a Specified Database
The examples in this section show how to list the databases in metadata by schema name.
Example – Listing Databases
The following example query lists the databases from the
information_schema.schemata
table.
SELECT schema_name FROM information_schema.schemata LIMIT 10;
The following table shows sample results.
6 | alb-databas1 |
7 | alb_original_cust |
8 | alblogsdatabase |
9 | athena_db_test |
10 | athena_ddl_db |
Example – Searching a Specified Database
In the following example query, rdspostgresql
is a sample
database.
SELECT schema_name FROM information_schema.schemata WHERE schema_name = 'rdspostgresql'
The following table shows sample results.
schema_name | |
---|---|
1 | rdspostgresql |
Listing Tables in a Specified Database and Searching for a Table by Name
To list metadata for tables, you can query by table schema or by table name.
Example – Listing Tables by Schema
The following query lists tables that use the rdspostgresql
table
schema.
SELECT table_schema, table_name, table_type FROM information_schema.tables WHERE table_schema = 'rdspostgresql'
The following table shows a sample result.
table_schema | table_name | table_type | |
---|---|---|---|
1 | rdspostgresql | rdspostgresqldb1_public_account | BASE TABLE |
Example – Searching for a Table by Name
The following query obtains metadata information for the table
athena1
.
SELECT table_schema, table_name, table_type FROM information_schema.tables WHERE table_name = 'athena1'
The following table shows a sample result.
table_schema | table_name | table_type | |
---|---|---|---|
1 | default | athena1 | BASE TABLE |
Listing Partitions for a Specific Table
You can use SHOW PARTITIONS
to list
the partitions for a specified table, as in the following example.
table_name
SHOW PARTITIONS cloudtrail_logs_test2
You can also use a metadata query to list the partition numbers and partition values for a specific table. The syntax that you use depends on the Athena engine version.
Example – Querying the Partitions for a Table in Athena engine version 2
The following example query lists the partitions for the table
cloudtrail_logs_test2
using Athena engine version 2.
SELECT * FROM default."cloudtrail_logs_test2$partitions" ORDER BY partition_number
The following table shows sample results.
table_catalog | table_schema | table_name | year | month | day | |
---|---|---|---|---|---|---|
1 | awsdatacatalog | default | cloudtrail_logs_test2 | 2020 | 08 | 10 |
2 | awsdatacatalog | default | cloudtrail_logs_test2 | 2020 | 08 | 11 |
3 | awsdatacatalog | default | cloudtrail_logs_test2 | 2020 | 08 | 12 |
Example – Querying the Partitions for a Table in Athena engine version 1
The following example query lists the partitions for the table
cloudtrail_logs_test2
using Athena engine version 1.
SELECT * FROM information_schema.__internal_partitions__ WHERE table_schema = 'default' AND table_name = 'cloudtrail_logs_test2' ORDER BY partition_number
The following table shows sample results.
table_catalog | table_schema | table_name | partition_number | partition_key | partition_value | |
---|---|---|---|---|---|---|
1 | awsdatacatalog | default | cloudtrail_logs_test2 | 1 | year | 2018 |
2 | awsdatacatalog | default | cloudtrail_logs_test2 | 1 | month | 09 |
3 | awsdatacatalog | default | cloudtrail_logs_test2 | 1 | day | 30 |
Listing or Searching Columns for a Specified Table or View
You can list all columns for a table, all columns for a view, or search for a column by name in a specified database and table.
Example – Listing All Columns for a Specified Table
The following example query lists all columns for the table
rdspostgresqldb1_public_account
.
SELECT * FROM information_schema.columns WHERE table_schema = 'rdspostgresql' AND table_name = 'rdspostgresqldb1_public_account'
The following table shows sample results.
table_catalog | table_schema | table_name | column_name | ordinal_position | column_default | is_nullable | data_type | comment | extra_info | |
---|---|---|---|---|---|---|---|---|---|---|
1 | awsdatacatalog | rdspostgresql | rdspostgresqldb1_public_account | password | 1 | YES | varchar | |||
2 | awsdatacatalog | rdspostgresql | rdspostgresqldb1_public_account | user_id | 2 | YES | integer | |||
3 | awsdatacatalog | rdspostgresql | rdspostgresqldb1_public_account | created_on | 3 | YES | timestamp | |||
4 | awsdatacatalog | rdspostgresql | rdspostgresqldb1_public_account | last_login | 4 | YES | timestamp | |||
5 | awsdatacatalog | rdspostgresql | rdspostgresqldb1_public_account | 5 | YES | varchar | ||||
6 | awsdatacatalog | rdspostgresql | rdspostgresqldb1_public_account | username | 6 | YES | varchar |
Example – Listing the Columns for a Specified View
The following example query lists all the columns in the default
database for the view arrayview
.
SELECT * FROM information_schema.columns WHERE table_schema = 'default' AND table_name = 'arrayview'
The following table shows sample results.
table_catalog | table_schema | table_name | column_name | ordinal_position | column_default | is_nullable | data_type | comment | extra_info | |
---|---|---|---|---|---|---|---|---|---|---|
1 | awsdatacatalog | default | arrayview | searchdate | 1 | YES | varchar | |||
2 | awsdatacatalog | default | arrayview | sid | 2 | YES | varchar | |||
3 | awsdatacatalog | default | arrayview | btid | 3 | YES | varchar | |||
4 | awsdatacatalog | default | arrayview | p | 4 | YES | varchar | |||
5 | awsdatacatalog | default | arrayview | infantprice | 5 | YES | varchar | |||
6 | awsdatacatalog | default | arrayview | sump | 6 | YES | varchar | |||
7 | awsdatacatalog | default | arrayview | journeymaparray | 7 | YES | array(varchar) |
Example – Searching for a Column by Name in a Specified Database and Table
The following example query searches for metadata for the sid
column
in the arrayview
view of the default
database.
SELECT * FROM information_schema.columns WHERE table_schema = 'default' AND table_name = 'arrayview' AND column_name='sid'
The following table shows a sample result.
table_catalog | table_schema | table_name | column_name | ordinal_position | column_default | is_nullable | data_type | comment | extra_info | |
---|---|---|---|---|---|---|---|---|---|---|
1 | awsdatacatalog | default | arrayview | sid | 2 | YES | varchar |