Lake Formation workflow for application integration API operations - AWS Lake Formation

The following is the workflow for application integration API operations:

  1. A user submits a query or request for data using an integrated third-party query engine. The query engine assumes an IAM role that represents the user or a group of users, and retrieves trusted credentials to be used when calling the application integration API operations.

  2. The query engine calls GetUnfilteredTableMetadata and, if the table is partitioned, also calls GetUnfilteredPartitionsMetadata to retrieve metadata and policy information from the Data Catalog.

  3. Lake Formation performs authorization for the request. If the user doesn't have appropriate permissions on the table, then AccessDeniedException is thrown.

  4. As part of the request, the query engine declares the filtering it supports. Two flags can be sent within an array: COLUMN_PERMISSION and CELL_FILTER_PERMISSION. If the query engine doesn't support one of these features and a policy for that feature exists on the table, a PermissionTypeMismatchException is thrown and the query fails. This prevents data leakage.

  5. The returned response contains the following:

    • The entire schema for the table so that query engines can use it to parse the data from storage.

    • A list of authorized columns that the user has access to. If the authorized column list is empty, the user has DESCRIBE permissions but not SELECT permissions, and the query fails.

    • A flag, IsRegisteredWithLakeFormation, which indicates whether Lake Formation can vend credentials for this resource's data. If this returns false, then the customer's own credentials should be used to access Amazon S3.

    • A list of CellFilters, if any, that should be applied to rows of data. Each entry contains a column and an expression to evaluate against each row. The list is populated only if CELL_FILTER_PERMISSION is sent as part of the request and a data filter on the table applies to the calling user.

  6. After the metadata is retrieved, the query engine calls GetTemporaryGlueTableCredentials or GetTemporaryGluePartitionCredentials to get AWS credentials to retrieve data from the Amazon S3 location.

  7. The query engine reads relevant objects from Amazon S3, filters the data based on the policies returned in the metadata response (step 5), and returns the results to the user.
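
As an illustration of steps 5 and 7, the following minimal sketch shows how a query engine might interpret the metadata response before reading from Amazon S3. It works on a plain dictionary shaped like the response described above (AuthorizedColumns, IsRegisteredWithLakeFormation, CellFilters). The ReadPlan class and the plan_read and project functions are hypothetical names used only for this example, and the ColumnName and RowFilterExpression keys inside each cell filter are assumed field names. No AWS call is made.

```python
from dataclasses import dataclass, field


@dataclass
class ReadPlan:
    """Hypothetical container for what the engine decided to do."""
    columns: list
    use_vended_credentials: bool
    row_filters: dict = field(default_factory=dict)


def plan_read(metadata: dict) -> ReadPlan:
    """Turn a metadata response (as a dict) into a read plan."""
    authorized = list(metadata.get("AuthorizedColumns") or [])
    if not authorized:
        # An empty list means DESCRIBE without SELECT, so the query fails.
        raise PermissionError("no authorized columns for this user")

    # Assumed shape: each cell filter names a column and a row expression.
    row_filters = {
        cell_filter["ColumnName"]: cell_filter["RowFilterExpression"]
        for cell_filter in metadata.get("CellFilters") or []
    }

    return ReadPlan(
        columns=authorized,
        # False means the customer's own credentials are used for Amazon S3.
        use_vended_credentials=bool(metadata.get("IsRegisteredWithLakeFormation")),
        row_filters=row_filters,
    )


def project(row: dict, plan: ReadPlan) -> dict:
    """Keep only the authorized columns of one row."""
    return {name: row[name] for name in plan.columns if name in row}


example_metadata = {
    "AuthorizedColumns": ["order_id", "region"],
    "IsRegisteredWithLakeFormation": True,
    "CellFilters": [
        {"ColumnName": "region", "RowFilterExpression": "region = 'EU'"}
    ],
}
example_plan = plan_read(example_metadata)
example_row = project(
    {"order_id": 7, "region": "EU", "card_number": "4111"}, example_plan
)
```

Evaluating the row filter expressions themselves is left out, because how an expression is parsed and applied depends on the query engine.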

The application integration API operations for Lake Formation contain additional content for configuring integration with third-party query engines. You can see the operation details in the Credential vending API operations section.
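
For step 6 of the workflow, the request for temporary credentials might be assembled as in the sketch below. The function build_table_credentials_request is a hypothetical helper, the parameter names (TableArn, Permissions, DurationSeconds, SupportedPermissionTypes) are assumed to match the GetTemporaryGlueTableCredentials operation, and the ARN and the 900-second default are example values. Check the operation reference before relying on any of them.

```python
def build_table_credentials_request(
    table_arn: str,
    supported_permission_types: list,
    duration_seconds: int = 900,
) -> dict:
    """Assemble keyword arguments for a table credentials request.

    Nothing is sent to AWS here; the function only builds a dictionary.
    """
    if not table_arn.startswith("arn:"):
        raise ValueError("table_arn must be an ARN")
    if not supported_permission_types:
        raise ValueError("declare at least one supported permission type")

    return {
        "TableArn": table_arn,
        # The engine only reads data, so it asks for SELECT.
        "Permissions": ["SELECT"],
        "DurationSeconds": duration_seconds,
        # The same filtering flags the engine declared in step 4.
        "SupportedPermissionTypes": list(supported_permission_types),
    }


example_request = build_table_credentials_request(
    "arn:aws:glue:us-east-1:111122223333:table/sales/orders",  # example ARN
    ["COLUMN_PERMISSION", "CELL_FILTER_PERMISSION"],
)
```

With an SDK such as boto3, a dictionary like this could then be passed as keyword arguments to the Lake Formation client's credential vending call, using the trusted role assumed in step 1.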