Use the AWS CLI with Hive metastores
You can use aws athena
CLI commands to manage the Hive metastore data
catalogs that you use with Athena. After you have defined one or more catalogs to use with
Athena, you can reference those catalogs in your aws athena
DDL and DML
commands.
Using the AWS CLI to manage Hive metastore catalogs
Registering a catalog: Create-data-catalog
To register a data catalog, you use the create-data-catalog
command.
Use the name
parameter to specify the name that you want to use for the
catalog. Pass the ARN of the Lambda function to the metadata-function
option of the parameters
argument. To create tags for the new catalog,
use the tags
parameter with one or more space-separated
Key=
argument pairs.key
,Value=value
The following example registers the Hive metastore catalog named
hms-catalog-1
. The command has been formatted for
readability.
$ aws athena create-data-catalog --name "hms-catalog-1" --type "HIVE" --description "Hive Catalog 1" --parameters "metadata-function=arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-v3,sdk-version=1.0" --tags Key=MyKey,Value=MyValue --region us-east-1
Showing catalog details: Get-data-catalog
To show the details of a catalog, pass the name of the catalog to the
get-data-catalog
command, as in the following example.
$ aws athena get-data-catalog --name "hms-catalog-1" --region us-east-1
The following sample result is in JSON format.
{ "DataCatalog": { "Name": "hms-catalog-1", "Description": "Hive Catalog 1", "Type": "HIVE", "Parameters": { "metadata-function": "arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-v3", "sdk-version": "1.0" } } }
Listing registered catalogs: List-data-catalogs
To list the registered catalogs, use the list-data-catalogs
command
and optionally specify a Region, as in the following example. The catalogs listed
always include AWS Glue.
$ aws athena list-data-catalogs --region us-east-1
The following sample result is in JSON format.
{ "DataCatalogs": [ { "CatalogName": "AwsDataCatalog", "Type": "GLUE" }, { "CatalogName": "hms-catalog-1", "Type": "HIVE", "Parameters": { "metadata-function": "arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-v3", "sdk-version": "1.0" } } ] }
Updating a catalog: Update-data-catalog
To update a data catalog, use the update-data-catalog
command, as in
the following example. The command has been formatted for readability.
$ aws athena update-data-catalog --name "hms-catalog-1" --type "HIVE" --description "My New Hive Catalog Description" --parameters "metadata-function=arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-new,sdk-version=1.0" --region us-east-1
Deleting a catalog: Delete-data-catalog
To delete a data catalog, use the delete-data-catalog
command, as in
the following example.
$ aws athena delete-data-catalog --name "hms-catalog-1" --region us-east-1
Showing database details: Get-database
To show the details of a database, pass the name of the catalog and the database
to the get-database
command, as in the following example.
$ aws athena get-database --catalog-name hms-catalog-1 --database-name mydb
The following sample result is in JSON format.
{ "Database": { "Name": "mydb", "Description": "My database", "Parameters": { "CreatedBy": "Athena", "EXTERNAL": "TRUE" } } }
Listing databases in a catalog: List-databases
To list the databases in a catalog, use the list-databases
command
and optionally specify a Region, as in the following example.
$ aws athena list-databases --catalog-name AwsDataCatalog --region us-west-2
The following sample result is in JSON format.
{ "DatabaseList": [ { "Name": "default" }, { "Name": "mycrawlerdatabase" }, { "Name": "mydatabase" }, { "Name": "sampledb", "Description": "Sample database", "Parameters": { "CreatedBy": "Athena", "EXTERNAL": "TRUE" } }, { "Name": "tpch100" } ] }
Showing table details: Get-table-metadata
To show the metadata for a table, including column names and datatypes, pass the
name of the catalog, database, and table name to the get-table-metadata
command, as in the following example.
$ aws athena get-table-metadata --catalog-name AwsDataCatalog --database-name mydb --table-name cityuseragent
The following sample result is in JSON format.
{ "TableMetadata": { "Name": "cityuseragent", "CreateTime": 1586451276.0, "LastAccessTime": 0.0, "TableType": "EXTERNAL_TABLE", "Columns": [ { "Name": "city", "Type": "string" }, { "Name": "useragent1", "Type": "string" } ], "PartitionKeys": [], "Parameters": { "COLUMN_STATS_ACCURATE": "false", "EXTERNAL": "TRUE", "inputformat": "org.apache.hadoop.mapred.TextInputFormat", "last_modified_by": "hadoop", "last_modified_time": "1586454879", "location": "s3://amzn-s3-demo-bucket/", "numFiles": "1", "numRows": "-1", "outputformat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", "rawDataSize": "-1", "serde.param.serialization.format": "1", "serde.serialization.lib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "totalSize": "61" } } }
Showing metadata for all tables in a database: List-table-metadata
To show the metadata for all tables in a database, pass the name of the catalog
and database name to the list-table-metadata
command. The
list-table-metadata
command is similar to the
get-table-metadata
command, except that you do not specify a table
name. To limit the number of results, you can use the --max-results
option, as in the following example.
$ aws athena list-table-metadata --catalog-name AwsDataCatalog --database-name sampledb --region us-east-1 --max-results 2
The following sample result is in JSON format.
{ "TableMetadataList": [ { "Name": "cityuseragent", "CreateTime": 1586451276.0, "LastAccessTime": 0.0, "TableType": "EXTERNAL_TABLE", "Columns": [ { "Name": "city", "Type": "string" }, { "Name": "useragent1", "Type": "string" } ], "PartitionKeys": [], "Parameters": { "COLUMN_STATS_ACCURATE": "false", "EXTERNAL": "TRUE", "inputformat": "org.apache.hadoop.mapred.TextInputFormat", "last_modified_by": "hadoop", "last_modified_time": "1586454879", "location": "s3://amzn-s3-demo-bucket/", "numFiles": "1", "numRows": "-1", "outputformat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", "rawDataSize": "-1", "serde.param.serialization.format": "1", "serde.serialization.lib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "totalSize": "61" } }, { "Name": "clearinghouse_data", "CreateTime": 1589255544.0, "LastAccessTime": 0.0, "TableType": "EXTERNAL_TABLE", "Columns": [ { "Name": "location", "Type": "string" }, { "Name": "stock_count", "Type": "int" }, { "Name": "quantity_shipped", "Type": "int" } ], "PartitionKeys": [], "Parameters": { "EXTERNAL": "TRUE", "inputformat": "org.apache.hadoop.mapred.TextInputFormat", "location": "s3://amzn-s3-demo-bucket/", "outputformat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", "serde.param.serialization.format": "1", "serde.serialization.lib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "transient_lastDdlTime": "1589255544" } } ], "NextToken": "eyJsYXN0RXZhbHVhdGVkS2V5Ijp7IkhBU0hfS0VZIjp7InMiOiJ0Ljk0YWZjYjk1MjJjNTQ1YmU4Y2I5OWE5NTg0MjFjYTYzIn0sIlJBTkdFX0tFWSI6eyJzIjoiY2xlYXJpbmdob3VzZV9kYXRhIn19LCJleHBpcmF0aW9uIjp7InNlY29uZHMiOjE1ODkzNDIwMjIsIm5hbm9zIjo2NTUwMDAwMDB9fQ==" }
Running DDL and DML statements
When you use the AWS CLI to run DDL and DML statements, you can pass the name of the Hive metastore catalog in one of two ways:
-
Directly into the statements that support it.
-
To the
--query-execution-context
Catalog
parameter.
DDL statements
The following example passes in the catalog name directly as part of the
show create table
DDL statement. The command has been formatted for
readability.
$ aws athena start-query-execution --query-string "show create table hms-catalog-1.hms_tpch_partitioned.lineitem" --result-configuration "OutputLocation=s3://amzn-s3-demo-bucket/lambda/results"
The following example DDL show create table
statement uses the
Catalog
parameter of --query-execution-context
to pass
the Hive metastore catalog name hms-catalog-1
. The command has been
formatted for readability.
$ aws athena start-query-execution --query-string "show create table lineitem" --query-execution-context "Catalog=hms-catalog-1,Database=hms_tpch_partitioned" --result-configuration "OutputLocation=s3://amzn-s3-demo-bucket/lambda/results"
DML statements
The following example DML select
statement passes the catalog name
into the query directly. The command has been formatted for readability.
$ aws athena start-query-execution --query-string "select * from hms-catalog-1.hms_tpch_partitioned.customer limit 100" --result-configuration "OutputLocation=s3://amzn-s3-demo-bucket/lambda/results"
The following example DML select
statement uses the
Catalog
parameter of --query-execution-context
to pass
in the Hive metastore catalog name hms-catalog-1
. The command has been
formatted for readability.
$ aws athena start-query-execution --query-string "select * from customer limit 100" --query-execution-context "Catalog=hms-catalog-1,Database=hms_tpch_partitioned" --result-configuration "OutputLocation=s3://amzn-s3-demo-bucket/lambda/results"