Data filters in Lake Formation
You can implement column-level, row-level, and cell-level security by creating
data filters. You select a data filter when you grant
the SELECT
Lake Formation permission on tables.
Each data filter belongs to a specific table in your Data Catalog. A data filter includes the following information:
-
Filter name
-
The Catalog IDs of the table associated with the filter
-
Table name
-
Name of the database that contains the table
-
Column specification – a list of columns to include or exclude in query results.
-
Row filter expression – an expression that specifies the rows to include in query results. With some restrictions, the expression has the syntax of a
WHERE
clause in the PartiQL language. To specify all rows, entertrue
in the console or useAllRowsWildcard
in API calls.For more information about what is supported in row filter expressions, see PartiQL support in row filter expressions.
The level of filtering that you get depends on how you populate the data filter.
-
When you specify the "all columns" wildcard and provide a row filter expression, you are establishing row-level security (row filtering) only.
-
When you include or exclude specific columns and specify "all rows" using the all-rows wildcard, you are establishing column-level security (column filtering) only.
-
When you include or exclude specific columns and also provide a row filter expression, you are establishing cell-level security (cell filtering).
The following screenshot from the Lake Formation console shows a data filter that performs
cell-level filtering. For queries against the orders
table, it restricts access
to the customer_name
column and the query results return only rows where the product_type
column
contains 'pharma'.

Note the use of single quotes to enclose the string literal, 'pharma'
.
You can use the Lake Formation console to create this data filter, or you can supply the following
request object to the CreateDataCellsFilter
API operation.
{ "Name": "restrict-pharma", "DatabaseName": "sales", "TableName": "orders", "TableCatalogId": "111122223333", "RowFilter": {"FilterExpression": "product_type='pharma'"}, "ColumnWildcard": { "ExcludedColumnNames": ["customer_name"] } }
You can create as many data filters as you need for a table. In order to do so, you
require SELECT
permission with the grant option on a table. Data Lake
Administrators by default have the permission to create data
filters on all tables in that account. You typically only use a subset of the
possible data filters when granting permissions on the
table to a principal. For example, you could create a second data filter for the
orders
table that is a row-security-only data filter. Referring to the
preceding screenshot, you could choose the Access to all columns option
and include a row filter expression of product_type<>pharma
. The name of this
data filter could be no-pharma
. It restricts access to all rows that have the
product_type
column set to 'pharma'.
The request object for the CreateDataCellsFilter
API operation for this data
filter is the following.
{ "Name": "no-pharma", "DatabaseName": "sales", "TableName": "orders", "TableCatalogId": "111122223333", "RowFilter": {"FilterExpression": "product_type<>'pharma'"}, "ColumnNames": ["customer_id", "customer_name", "order_num" "product_id", "purchase_date", "product_type", "product_manufacturer", "quantity", "price"] }
You could then grant SELECT
on the orders
table with the
restrict-pharma
data filter to an administrative user, and SELECT
on the orders
table with the no-pharma
data filter to
non-administrative users. For users in the healthcare sector, you would grant
SELECT
on the orders
table with full access to all rows and
columns (no data filter), or perhaps with yet another data filter that restricts access to
pricing information.