Allow Lambda function access to external Hive metastores - Amazon Athena

Allow Lambda function access to external Hive metastores

To invoke a Lambda function in your account, you must create a role that has the following permissions:

  • AWSLambdaVPCAccessExecutionRole – An AWS Lambda execution role permission to manage elastic network interfaces that connect your function to a VPC. Ensure that you have a sufficient number of network interfaces and IP addresses available.

  • AmazonAthenaFullAccess – The AmazonAthenaFullAccess managed policy grants full access to Athena.

  • An Amazon S3 policy to allow the Lambda function to write to S3 and to allow Athena to read from S3.

For example, the following policy defines the permission for the spill location s3:\\mybucket\spill.

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:GetBucketLocation", "s3:GetObject", "s3:ListBucket", "s3:PutObject" ], "Resource": [ "arn:aws:s3:::mybucket/spill" ] } ] }

Whenever you use IAM policies, make sure that you follow IAM best practices. For more information, see Security best practices in IAM in the IAM User Guide.

Creating Lambda functions

To create a Lambda function in your account, function development permissions or the AWSLambdaFullAccess role are required. For more information, see Identity-based IAM policies for AWS Lambda.

Because Athena uses the AWS Serverless Application Repository to create Lambda functions, the superuser or administrator who creates Lambda functions should also have IAM policies to allow Athena federated queries.

Catalog registration and metadata API operations

For access to catalog registration API and metadata API operations, use the AmazonAthenaFullAccess managed policy. If you do not use this policy, add the following API operations to your Athena policies:

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "athena:ListDataCatalogs", "athena:GetDataCatalog", "athena:CreateDataCatalog", "athena:UpdateDataCatalog", "athena:DeleteDataCatalog", "athena:GetDatabase", "athena:ListDatabases", "athena:GetTableMetadata", "athena:ListTableMetadata" ], "Resource": [ "*" ] } ] }

Cross Region Lambda invocation

To invoke a Lambda function in a region other than the region in which you are running Athena queries, use the full ARN of the Lambda function. By default, Athena invokes Lambda functions defined in the same region. If you need to invoke a Lambda function to access a Hive metastore in a region other than the region in which you run Athena queries, you must provide the full ARN of the Lambda function.

For example, suppose you define the catalog ehms on the Europe (Frankfurt) Region eu-central-1 to use the following Lambda function in the US East (N. Virginia) Region.

arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-new

When you specify the full ARN in this way, Athena can call the external-hms-service-new Lambda function on us-east-1 to fetch the Hive metastore data from eu-central-1.

Note

The catalog ehms should be registered in the same region that you run Athena queries.

Cross account Lambda invocation

Sometimes you might require access to a Hive metastore from a different account. For example, to run a Hive metastore, you might launch an EMR cluster from an account that is different from the one that you use for Athena queries. Different groups or teams might run Hive metastore with different accounts inside their VPC. Or you might want to access metadata from different Hive metastores from different groups or teams.

Athena uses the AWS Lambda support for cross account access to enable cross account access for Hive Metastores.

Note

Note that cross account access for Athena normally implies cross account access for both metadata and data in Amazon S3.

Imagine the following scenario:

  • Account 111122223333 sets up the Lambda function external-hms-service-new on us-east-1 in Athena to access a Hive Metastore running on an EMR cluster.

  • Account 111122223333 wants to allow account 444455556666 to access the Hive Metastore data.

To grant account 444455556666 access to the Lambda function external-hms-service-new, account 111122223333 uses the following AWS CLI add-permission command. The command has been formatted for readability.

$ aws --profile perf-test lambda add-permission --function-name external-hms-service-new --region us-east-1 --statement-id Id-ehms-invocation2 --action "lambda:InvokeFunction" --principal arn:aws:iam::444455556666:user/perf1-test { "Statement": "{\"Sid\":\"Id-ehms-invocation2\", \"Effect\":\"Allow\", \"Principal\":{\"AWS\":\"arn:aws:iam::444455556666:user/perf1-test\"}, \"Action\":\"lambda:InvokeFunction\", \"Resource\":\"arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-new\"}" }

To check the Lambda permission, use the get-policy command, as in the following example. The command has been formatted for readability.

$ aws --profile perf-test lambda get-policy --function-name arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-new --region us-east-1 { "RevisionId": "711e93ea-9851-44c8-a09f-5f2a2829d40f", "Policy": "{\"Version\":\"2012-10-17\", \"Id\":\"default\", \"Statement\":[{\"Sid\":\"Id-ehms-invocation2\", \"Effect\":\"Allow\", \"Principal\":{\"AWS\":\"arn:aws:iam::444455556666:user/perf1-test\"}, \"Action\":\"lambda:InvokeFunction\", \"Resource\":\"arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-new\"}]}" }

After adding the permission, you can use a full ARN of the Lambda function on us-east-1 like the following when you define catalog ehms:

arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-new

For information about cross region invocation, see Cross Region Lambda invocation earlier in this topic.

Granting cross-account access to data

Before you can run Athena queries, you must grant cross account access to the data in Amazon S3. You can do this in one of the following ways:

  • Update the access control list policy of the Amazon S3 bucket with a canonical user ID.

  • Add cross account access to the Amazon S3 bucket policy.

For example, add the following policy to the Amazon S3 bucket policy in the account 111122223333 to allow account 444455556666 to read data from the Amazon S3 location specified.

{ "Version": "2012-10-17", "Statement": [ { "Sid": "Stmt1234567890123", "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::444455556666:user/perf1-test" }, "Action": "s3:GetObject", "Resource": "arn:aws:s3:::athena-test/lambda/dataset/*" } ] }
Note

You might need to grant cross account access to Amazon S3 not only to your data, but also to your Amazon S3 spill location. Your Lambda function spills extra data to the spill location when the size of the response object exceeds a given threshold. See the beginning of this topic for a sample policy.

In the current example, after cross account access is granted to 444455556666, 444455556666 can use catalog ehms in its own account to query tables that are defined in account 111122223333.

In the following example, the SQL Workbench profile perf-test-1 is for account 444455556666. The query uses catalog ehms to access the Hive metastore and the Amazon S3 data in account 111122223333.


                    Accessing Hive metastore and Amazon S3 data across accounts in SQL
                        Workbench.