Use the AWS CLI with Hive metastores - Amazon Athena

Use the AWS CLI with Hive metastores

You can use aws athena CLI commands to manage the Hive metastore data catalogs that you use with Athena. After you have defined one or more catalogs to use with Athena, you can reference those catalogs in your aws athena DDL and DML commands.

Using the AWS CLI to manage Hive metastore catalogs

Registering a catalog: Create-data-catalog

To register a data catalog, you use the create-data-catalog command. Use the name parameter to specify the name that you want to use for the catalog. Pass the ARN of the Lambda function to the metadata-function option of the parameters argument. To create tags for the new catalog, use the tags parameter with one or more space-separated Key=key,Value=value argument pairs.

The following example registers the Hive metastore catalog named hms-catalog-1. The command has been formatted for readability.

$ aws athena create-data-catalog --name "hms-catalog-1" --type "HIVE" --description "Hive Catalog 1" --parameters "metadata-function=arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-v3,sdk-version=1.0" --tags Key=MyKey,Value=MyValue --region us-east-1

Showing catalog details: Get-data-catalog

To show the details of a catalog, pass the name of the catalog to the get-data-catalog command, as in the following example.

$ aws athena get-data-catalog --name "hms-catalog-1" --region us-east-1

The following sample result is in JSON format.

{ "DataCatalog": { "Name": "hms-catalog-1", "Description": "Hive Catalog 1", "Type": "HIVE", "Parameters": { "metadata-function": "arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-v3", "sdk-version": "1.0" } } }

Listing registered catalogs: List-data-catalogs

To list the registered catalogs, use the list-data-catalogs command and optionally specify a Region, as in the following example. The catalogs listed always include AWS Glue.

$ aws athena list-data-catalogs --region us-east-1

The following sample result is in JSON format.

{ "DataCatalogs": [ { "CatalogName": "AwsDataCatalog", "Type": "GLUE" }, { "CatalogName": "hms-catalog-1", "Type": "HIVE", "Parameters": { "metadata-function": "arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-v3", "sdk-version": "1.0" } } ] }

Updating a catalog: Update-data-catalog

To update a data catalog, use the update-data-catalog command, as in the following example. The command has been formatted for readability.

$ aws athena update-data-catalog --name "hms-catalog-1" --type "HIVE" --description "My New Hive Catalog Description" --parameters "metadata-function=arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-new,sdk-version=1.0" --region us-east-1

Deleting a catalog: Delete-data-catalog

To delete a data catalog, use the delete-data-catalog command, as in the following example.

$ aws athena delete-data-catalog --name "hms-catalog-1" --region us-east-1

Showing database details: Get-database

To show the details of a database, pass the name of the catalog and the database to the get-database command, as in the following example.

$ aws athena get-database --catalog-name hms-catalog-1 --database-name mydb

The following sample result is in JSON format.

{ "Database": { "Name": "mydb", "Description": "My database", "Parameters": { "CreatedBy": "Athena", "EXTERNAL": "TRUE" } } }

Listing databases in a catalog: List-databases

To list the databases in a catalog, use the list-databases command and optionally specify a Region, as in the following example.

$ aws athena list-databases --catalog-name AwsDataCatalog --region us-west-2

The following sample result is in JSON format.

{ "DatabaseList": [ { "Name": "default" }, { "Name": "mycrawlerdatabase" }, { "Name": "mydatabase" }, { "Name": "sampledb", "Description": "Sample database", "Parameters": { "CreatedBy": "Athena", "EXTERNAL": "TRUE" } }, { "Name": "tpch100" } ] }

Showing table details: Get-table-metadata

To show the metadata for a table, including column names and datatypes, pass the name of the catalog, database, and table name to the get-table-metadata command, as in the following example.

$ aws athena get-table-metadata --catalog-name AwsDataCatalog --database-name mydb --table-name cityuseragent

The following sample result is in JSON format.

{ "TableMetadata": { "Name": "cityuseragent", "CreateTime": 1586451276.0, "LastAccessTime": 0.0, "TableType": "EXTERNAL_TABLE", "Columns": [ { "Name": "city", "Type": "string" }, { "Name": "useragent1", "Type": "string" } ], "PartitionKeys": [], "Parameters": { "COLUMN_STATS_ACCURATE": "false", "EXTERNAL": "TRUE", "inputformat": "org.apache.hadoop.mapred.TextInputFormat", "last_modified_by": "hadoop", "last_modified_time": "1586454879", "location": "s3://amzn-s3-demo-bucket/", "numFiles": "1", "numRows": "-1", "outputformat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", "rawDataSize": "-1", "serde.param.serialization.format": "1", "serde.serialization.lib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "totalSize": "61" } } }

Showing metadata for all tables in a database: List-table-metadata

To show the metadata for all tables in a database, pass the name of the catalog and database name to the list-table-metadata command. The list-table-metadata command is similar to the get-table-metadata command, except that you do not specify a table name. To limit the number of results, you can use the --max-results option, as in the following example.

$ aws athena list-table-metadata --catalog-name AwsDataCatalog --database-name sampledb --region us-east-1 --max-results 2

The following sample result is in JSON format.

{ "TableMetadataList": [ { "Name": "cityuseragent", "CreateTime": 1586451276.0, "LastAccessTime": 0.0, "TableType": "EXTERNAL_TABLE", "Columns": [ { "Name": "city", "Type": "string" }, { "Name": "useragent1", "Type": "string" } ], "PartitionKeys": [], "Parameters": { "COLUMN_STATS_ACCURATE": "false", "EXTERNAL": "TRUE", "inputformat": "org.apache.hadoop.mapred.TextInputFormat", "last_modified_by": "hadoop", "last_modified_time": "1586454879", "location": "s3://amzn-s3-demo-bucket/", "numFiles": "1", "numRows": "-1", "outputformat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", "rawDataSize": "-1", "serde.param.serialization.format": "1", "serde.serialization.lib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "totalSize": "61" } }, { "Name": "clearinghouse_data", "CreateTime": 1589255544.0, "LastAccessTime": 0.0, "TableType": "EXTERNAL_TABLE", "Columns": [ { "Name": "location", "Type": "string" }, { "Name": "stock_count", "Type": "int" }, { "Name": "quantity_shipped", "Type": "int" } ], "PartitionKeys": [], "Parameters": { "EXTERNAL": "TRUE", "inputformat": "org.apache.hadoop.mapred.TextInputFormat", "location": "s3://amzn-s3-demo-bucket/", "outputformat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", "serde.param.serialization.format": "1", "serde.serialization.lib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "transient_lastDdlTime": "1589255544" } } ], "NextToken": "eyJsYXN0RXZhbHVhdGVkS2V5Ijp7IkhBU0hfS0VZIjp7InMiOiJ0Ljk0YWZjYjk1MjJjNTQ1YmU4Y2I5OWE5NTg0MjFjYTYzIn0sIlJBTkdFX0tFWSI6eyJzIjoiY2xlYXJpbmdob3VzZV9kYXRhIn19LCJleHBpcmF0aW9uIjp7InNlY29uZHMiOjE1ODkzNDIwMjIsIm5hbm9zIjo2NTUwMDAwMDB9fQ==" }

Running DDL and DML statements

When you use the AWS CLI to run DDL and DML statements, you can pass the name of the Hive metastore catalog in one of two ways:

  • Directly into the statements that support it.

  • To the --query-execution-context Catalog parameter.

DDL statements

The following example passes in the catalog name directly as part of the show create table DDL statement. The command has been formatted for readability.

$ aws athena start-query-execution --query-string "show create table hms-catalog-1.hms_tpch_partitioned.lineitem" --result-configuration "OutputLocation=s3://amzn-s3-demo-bucket/lambda/results"

The following example DDL show create table statement uses the Catalog parameter of --query-execution-context to pass the Hive metastore catalog name hms-catalog-1. The command has been formatted for readability.

$ aws athena start-query-execution --query-string "show create table lineitem" --query-execution-context "Catalog=hms-catalog-1,Database=hms_tpch_partitioned" --result-configuration "OutputLocation=s3://amzn-s3-demo-bucket/lambda/results"

DML statements

The following example DML select statement passes the catalog name into the query directly. The command has been formatted for readability.

$ aws athena start-query-execution --query-string "select * from hms-catalog-1.hms_tpch_partitioned.customer limit 100" --result-configuration "OutputLocation=s3://amzn-s3-demo-bucket/lambda/results"

The following example DML select statement uses the Catalog parameter of --query-execution-context to pass in the Hive metastore catalog name hms-catalog-1. The command has been formatted for readability.

$ aws athena start-query-execution --query-string "select * from customer limit 100" --query-execution-context "Catalog=hms-catalog-1,Database=hms_tpch_partitioned" --result-configuration "OutputLocation=s3://amzn-s3-demo-bucket/lambda/results"