Connect Athena to a Hive metastore using an existing IAM execution role
To connect your external Hive metastore to Athena with a Lambda function that uses an existing IAM role, you can use Athena's reference implementation of the Athena connector for external Hive metastores.
The three major steps are as follows:
-
Clone and build – Clone the Athena reference implementation and build the JAR file that contains the Lambda function code.
-
AWS Lambda console – In the AWS Lambda console, create a Lambda function, assign it an existing IAM execution role, and upload the function code that you generated.
-
Amazon Athena console – In the Amazon Athena console, create a data source name that you can use to refer to your external Hive metastore in your Athena queries.
If you already have permissions to create a custom IAM role, you can use a simpler workflow that uses the Athena console and the AWS Serverless Application Repository to create and configure a Lambda function. For more information, see Connect Athena to an Apache Hive metastore.
Prerequisites
-
Git must be installed on your system.
-
You must have Apache Maven installed.
-
You must have an IAM execution role that you can assign to the Lambda function. For more information, see Allow Lambda function access to external Hive metastores.
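To confirm the first two prerequisites before you start, you can check that the git and mvn commands are on your PATH. The following is a minimal POSIX shell sketch; the has_tool function name is a hypothetical helper, and the sketch does not check the IAM role prerequisite.

```shell
# Minimal sketch: check that the tools this procedure needs are installed.
# has_tool is a hypothetical helper name, not part of any AWS tooling.
has_tool() {
  # command -v succeeds only if the named command can be found.
  command -v "$1" >/dev/null 2>&1
}

for tool in git mvn; do
  if has_tool "$tool"; then
    printf '%s: found\n' "$tool"
  else
    printf '%s: missing, install it before continuing\n' "$tool"
  fi
done
```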
Clone and build the Lambda function
The function code for the Athena reference implementation is a Maven project located on GitHub at awslabs/aws-athena-hive-metastore.
To clone and build the Lambda function code
-
Enter the following command to clone the Athena reference implementation:
git clone https://github.com/awslabs/aws-athena-hive-metastore
-
From the root of the cloned aws-athena-hive-metastore folder, run the following command to build the .jar file for the Lambda function:
mvn clean install
After the project builds successfully, the following .jar file is created in the target folder of your project:
hms-lambda-func-1.0-SNAPSHOT-withdep.jar
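The clone and build steps can be combined into one script that also verifies the JAR was produced. This is a minimal sketch: the function names and the RUN_BUILD switch are hypothetical, and because this guide only says the JAR lands in "the target folder of your project", the sketch searches every target/ folder in the clone instead of assuming one exact path.

```shell
# Minimal sketch: clone, build, and locate the connector JAR.
# build_connector, find_jar, and RUN_BUILD are hypothetical names.
REPO_URL="https://github.com/awslabs/aws-athena-hive-metastore"
JAR_NAME="hms-lambda-func-1.0-SNAPSHOT-withdep.jar"

# Print the first file named $JAR_NAME inside any target/ folder under $1.
find_jar() {
  find "$1" -type f -path "*/target/$JAR_NAME" | head -n 1
}

build_connector() {
  repo_dir="${1:-aws-athena-hive-metastore}"
  # Clone only if the directory is not already present.
  if [ ! -d "$repo_dir" ]; then
    git clone "$REPO_URL" "$repo_dir" || return 1
  fi
  # Build from the repository root in a subshell, so the caller's
  # working directory does not change.
  (cd "$repo_dir" && mvn clean install) || return 1
  jar="$(find_jar "$repo_dir")"
  if [ -n "$jar" ]; then
    printf 'Built %s\n' "$jar"
  else
    printf 'Build finished but %s was not found\n' "$JAR_NAME" >&2
    return 1
  fi
}

# Only clone and build when explicitly requested (for example, RUN_BUILD=1).
if [ "${RUN_BUILD:-0}" = "1" ]; then
  build_connector
fi
```

Running the script with RUN_BUILD=1 in the directory where you want the clone prints the path of the JAR to upload in the next section.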
In the next section, you use the AWS Lambda console to upload this file to your Amazon Web Services account.
Create and configure the Lambda function in the AWS Lambda console
In this section, you use the AWS Lambda console to create a function that uses an existing IAM execution role. After you configure a VPC for the function, you upload the function code and configure the environment variables for the function.
Create the Lambda function
In this step, you create a function in the AWS Lambda console that uses an existing IAM role.
To create a Lambda function that uses an existing IAM role
-
Sign in to the AWS Management Console and open the AWS Lambda console at https://console.aws.amazon.com/lambda/.
-
In the navigation pane, choose Functions.
-
Choose Create function.
-
Choose Author from scratch.
-
For Function name, enter the name of your Lambda function (for example, EHMSBasedLambda).
-
For Runtime, choose Java 8.
-
Under Permissions, expand Change default execution role.
-
For Execution role, choose Use an existing role.
-
For Existing role, choose the IAM execution role that your Lambda function will use for Athena (this example uses a role called AthenaLambdaExecutionRole).
-
Expand Advanced settings.
-
Select Enable VPC.
-
For VPC, choose the VPC that your function will have access to.
-
For Subnets, choose the VPC subnets for Lambda to use.
-
For Security groups, choose the VPC security groups for Lambda to use.
-
Choose Create function. The AWS Lambda console opens the configuration page for your function and begins creating your function.
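If you prefer the AWS CLI to the console, the following sketch shows a roughly equivalent aws lambda create-function call. Unlike the console flow, create-function requires the code package at creation time, so the sketch also uploads the JAR. The account ID, subnet IDs, and security group ID are placeholder assumptions, and the handler is deliberately left as a placeholder because this guide does not name the handler class; take it from the aws-athena-hive-metastore repository. By default the hypothetical run helper only prints the command; set DRY_RUN=0 to execute it.

```shell
# Minimal sketch: CLI counterpart of the console steps above.
# All IDs and ARNs below are hypothetical placeholders.
FUNCTION_NAME="EHMSBasedLambda"
ROLE_ARN="arn:aws:iam::111122223333:role/AthenaLambdaExecutionRole"
# Assumption: the handler class is not given in this guide; look it up in
# the aws-athena-hive-metastore repository before running for real.
HANDLER="REPLACE_WITH_HANDLER_CLASS"
JAR="target/hms-lambda-func-1.0-SNAPSHOT-withdep.jar"
SUBNET_IDS="subnet-0abc,subnet-0def"
SECURITY_GROUP_IDS="sg-0123"

# Print the command by default; set DRY_RUN=0 to actually run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

create_hms_function() {
  run aws lambda create-function \
    --function-name "$FUNCTION_NAME" \
    --runtime java8 \
    --role "$ROLE_ARN" \
    --handler "$HANDLER" \
    --zip-file "fileb://$JAR" \
    --vpc-config "SubnetIds=$SUBNET_IDS,SecurityGroupIds=$SECURITY_GROUP_IDS"
}

create_hms_function
```

The runtime matches the Java 8 choice in the console steps; the --vpc-config value corresponds to the VPC, subnet, and security group selections.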
Upload the code and configure the Lambda function
When the console informs you that your function has been successfully created, you are ready to upload the function code and configure its environment variables.
To upload your Lambda function code and configure its environment variables
-
In the Lambda console, make sure that you are on the Code tab of the page for the function that you created.
-
For Code source, choose Upload from, and then choose .zip or .jar file.
-
Upload the hms-lambda-func-1.0-SNAPSHOT-withdep.jar file that you generated previously.
-
On your Lambda function page, choose the Configuration tab.
-
From the pane on the left, choose Environment variables.
-
In the Environment variables section, choose Edit.
-
On the Edit environment variables page, use the Add environment variable option to add the following environment variable keys and values:
-
HMS_URIS – Use the following syntax to enter the URI of your Hive metastore host that uses the Thrift protocol at port 9083:
thrift://<host_name>:9083
-
SPILL_LOCATION – Specify an Amazon S3 location in your Amazon Web Services account to hold spillover metadata if the Lambda function response size exceeds 4 MB.
-
Choose Save.
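The upload and environment-variable steps can also be scripted with the AWS CLI. This minimal sketch assumes placeholder values for the function name, metastore host, and spill bucket; the helper names (run, hms_uri, env_json, configure_hms_function) are hypothetical. It builds HMS_URIS in the thrift://<host_name>:9083 format described above and waits for the code update to finish before changing the configuration. Note that --environment replaces the function's entire set of environment variables.

```shell
# Minimal sketch: CLI counterpart of the upload and environment steps.
# All values below are hypothetical placeholders.
FUNCTION_NAME="EHMSBasedLambda"
JAR="target/hms-lambda-func-1.0-SNAPSHOT-withdep.jar"
HMS_HOST="metastore.example.internal"
SPILL_LOCATION="s3://amzn-s3-demo-bucket/hms-spill"

# Print commands by default; set DRY_RUN=0 to actually run them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# Build the HMS_URIS value: thrift://<host_name>:9083
hms_uri() {
  printf 'thrift://%s:9083\n' "$1"
}

# Build the JSON for --environment. This sets the complete variable map,
# replacing any variables already configured on the function.
env_json() {
  printf '{"Variables":{"HMS_URIS":"%s","SPILL_LOCATION":"%s"}}\n' \
    "$(hms_uri "$1")" "$2"
}

configure_hms_function() {
  # SPILL_LOCATION must be an Amazon S3 location.
  case "$SPILL_LOCATION" in
    s3://*) ;;
    *) printf 'SPILL_LOCATION must be an s3:// URI\n' >&2; return 1 ;;
  esac
  run aws lambda update-function-code \
    --function-name "$FUNCTION_NAME" \
    --zip-file "fileb://$JAR" || return 1
  # Wait until the code update completes before updating configuration.
  run aws lambda wait function-updated \
    --function-name "$FUNCTION_NAME" || return 1
  run aws lambda update-function-configuration \
    --function-name "$FUNCTION_NAME" \
    --environment "$(env_json "$HMS_HOST" "$SPILL_LOCATION")"
}

configure_hms_function
```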
At this point, you are ready to configure Athena to use your Lambda function to connect to your Hive metastore. For steps, see Configure Athena to use a deployed Hive metastore connector.
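For reference, registering the function with Athena can also be done from the AWS CLI by creating a data catalog of type HIVE whose metadata-function parameter points at the Lambda function ARN. This is a minimal sketch with a hypothetical catalog name, Region, and account ID; follow the linked page for the complete procedure and any additional parameters. As in the earlier sketches, the hypothetical run helper prints the command unless DRY_RUN=0.

```shell
# Minimal sketch: register the Lambda function as an Athena data catalog.
# The catalog name and ARN are hypothetical placeholders.
CATALOG_NAME="ehms_catalog"
LAMBDA_ARN="arn:aws:lambda:us-east-1:111122223333:function:EHMSBasedLambda"

# Print the command by default; set DRY_RUN=0 to actually run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

register_catalog() {
  run aws athena create-data-catalog \
    --name "$CATALOG_NAME" \
    --type HIVE \
    --parameters "metadata-function=$LAMBDA_ARN"
}

register_catalog
```

After the catalog exists, queries can refer to the external Hive metastore by the catalog name.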