Using AWS Glue Data Catalog views - Amazon Athena

Using AWS Glue Data Catalog views

This feature is in preview release and is subject to change. For more information, see the Betas and Previews section in the AWS Service Terms document.

Use AWS Glue Data Catalog views when you want a single common view across AWS services like Amazon Athena and Amazon Redshift. In Data Catalog views, access permissions are defined by the user who created the view instead of the user who queries the view. This method of granting permissions is called definer semantics.

The following use cases show how you can use Data Catalog views.

  • Greater access control – You create a view that restricts data access based on the level of permissions the user requires. For example, you can use Data Catalog views to prevent employees who don't work in the human resources (HR) department from seeing personally identifiable information.

  • Ensure complete records – By applying certain filters onto your Data Catalog view, you make sure that the data records in a Data Catalog view are always complete.

  • Enhanced security – In Data Catalog views, the query definition that creates the view must be intact in order for the view to be created. This makes Data Catalog views less susceptible to SQL commands from malicious actors.

  • Prevent access to underlying tables – Definer semantics allow users to access a view without making the underlying table available to them. Only the user who defines the view requires access to the tables.

Data Catalog view definitions are stored in the AWS Glue Data Catalog. This means that you can use AWS Lake Formation to grant access through resource grants, column grants, or tag-based access controls. For more information about granting and revoking access in Lake Formation, see Granting and revoking permissions on Data Catalog resources in the AWS Lake Formation Developer Guide.

Permissions

Data Catalog views require three roles: Lake Formation Admin, Definer, and Invoker.

  • Lake Formation Admin – Has access to configure all Lake Formation permissions.

  • Definer – Creates the Data Catalog view. The Definer role must have full grantable SELECT permissions on all underlying tables that the view definition references.

  • Invoker – Can query the Data Catalog view or check its metadata. To show the invoker of a query, you can use the invoker_principal() DML function. For more information, see invoker_principal().

The Definer role's trust relationships must allow the sts:AssumeRole action for the AWS Glue and Lake Formation service principals, as in the following example.

{ "Version": "2012-10-17", "Statement": [ { "Sid": "", "Effect": "Allow", "Principal": { "Service": [ "glue.amazonaws.com", "lakeformation.amazonaws.com" ] }, "Action": "sts:AssumeRole" } ] }

IAM permissions for Athena access are also required. For more information, see AWS managed policies for Amazon Athena.

Limitations

  • Data Catalog views cannot reference other views.

  • You can reference up to 10 tables in the view definition.

  • Underlying tables must be registered with Lake Formation.

  • The DEFINER principal can be only an IAM role.

  • The DEFINER role must have full SELECT (grantable) permissions on the underlying tables.

  • UNPROTECTED Data Catalog views are not supported.

  • User-defined functions (UDFs) are not supported in the view definition.

  • Athena federated data sources cannot be used in Data Catalog views.

  • Data Catalog views are not supported for external Hive metastores.

  • Athena displays an error message when it detects stale views. A stale view is reported when one of the following occurs:

    • The view references tables or databases that do not exist.

    • A schema or metadata change is made in a referenced table.

    • A referenced table is dropped and recreated with a different schema or configuration.

Creating a Data Catalog view

The following example syntax shows how a user of the Definer role creates the orders_by_date Data Catalog view. The example assumes that the Definer role has full SELECT permissions on the orders table in the default database.

CREATE PROTECTED MULTI DIALECT VIEW orders_by_date SECURITY DEFINER AS SELECT orderdate, sum(totalprice) AS price FROM orders WHERE order_city = 'SEATTLE' GROUP BY orderdate

Querying a Data Catalog view

After the view is created, the Lake Formation Admin can grant SELECT permissions on the Data Catalog view to the Invoker principals. The Invoker principals can then query the view without having access to the underlying base tables referenced by the view. The following is an example Invoker query.

SELECT * from orders_by_date where price > 5000

Updating a Data Catalog view

The Lake Formation Admin or the Definer can use the ALTER VIEW UPDATE DIALECT syntax to update the view definition. The following example modifies the view definition to select columns from the returns table instead of the orders table.

ALTER VIEW orders_by_date UPDATE DIALECT AS SELECT return_date, sum(totalprice) AS price FROM returns WHERE order_city = 'SEATTLE' GROUP BY orderdate

For more information about the syntax for creating and managing Data Catalog views, see Glue Data Catalog view syntax.