Querying APIs - AWS Lake Formation

Querying APIs

The Querying APIs allow you to share transactionally-consistent data for data lakes managed in Amazon S3, Amazon Redshift and other AWS services.

Data Types

WorkUnitRange Structure

Defines the valid range of work unit IDs for querying the execution service.

Fields

  • WorkUnitIdMax – Required: Number (long).

    Defines the maximum work unit ID in the range. The maximum value is inclusive.

  • WorkUnitIdMin – Required: Number (long).

    Defines the minimum work unit ID in the range.

  • WorkUnitToken – Required: UTF-8 string.

    A work token used to query the execution service.
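
As an illustration of how a WorkUnitRange might be consumed, the following minimal sketch expands one range into the individual (work unit ID, token) pairs it covers. The helper name iter_work_units and the sample values are hypothetical. The sketch assumes the range is supplied as a plain dictionary keyed by the field names above.

```python
from typing import Iterator, Mapping, Tuple


def iter_work_units(work_unit_range: Mapping) -> Iterator[Tuple[int, str]]:
    """Yield (work_unit_id, work_unit_token) pairs for one WorkUnitRange."""
    first = work_unit_range["WorkUnitIdMin"]
    last = work_unit_range["WorkUnitIdMax"]
    token = work_unit_range["WorkUnitToken"]
    # WorkUnitIdMax is inclusive, so the Python range must stop one past it.
    for work_unit_id in range(first, last + 1):
        yield work_unit_id, token


# Hypothetical sample values, for illustration only.
example_range = {
    "WorkUnitIdMin": 0,
    "WorkUnitIdMax": 2,
    "WorkUnitToken": "example-token",
}
pairs = list(iter_work_units(example_range))
```

Every ID in a range shares the same WorkUnitToken, so the token is repeated for each pair.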

GetWorkUnitsResponse Structure

A structure for the output.

Fields

  • NextToken – UTF-8 string.

    A continuation token for paginating the returned list of tokens, returned if the current segment of the list is not the last.

  • QueryId – Required: UTF-8 string.

    The ID of the plan query operation.

  • WorkUnitRanges – Required: An array of WorkUnitRange objects.

    A WorkUnitRangeList object that specifies the valid range of work unit IDs for querying the execution service.

GetQueryStateResponse Structure

A structure for the output.

Fields

  • Error – UTF-8 string.

    An error message when the operation fails.

  • State – Required: UTF-8 string (valid values: PENDING | WORKUNITS_AVAILABLE | ERROR | FINISHED | EXPIRED).

    The state of a query previously submitted. The possible states are:

    • PENDING: the query is pending.

    • WORKUNITS_AVAILABLE: some work units are ready for retrieval and execution.

    • FINISHED: the query planning finished successfully, and all work units are ready for retrieval and execution.

    • ERROR: an error occurred with the query, such as an invalid query ID or a backend error.

GetWorkUnitResultsResponse Structure

A structure for the output.

Fields

  • ResultStream – Blob.

    Rows returned from the GetWorkUnitResults operation as a stream of Apache Arrow v1.0 messages.

QueryPlanningContext Structure

A structure containing information about the query plan.

Fields

  • CatalogId – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the partition in question resides. If none is provided, the AWS account ID is used by default.

  • DatabaseName – Required: UTF-8 string, at least 1 byte long, matching the Single-line string pattern.

    The database containing the table.

  • QueryAsOfTime – Timestamp.

    The point in time as of which to read the table contents. If not set, the most recent transaction commit time is used. Cannot be specified along with TransactionId.

  • QueryParameters – A map array of key-value pairs.

    Each key is a UTF-8 string.

    Each value is a UTF-8 string.

    A map consisting of key-value pairs.

  • TransactionId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Custom string pattern #11.

    The transaction ID at which to read the table contents. If this transaction is not committed, the read will be treated as part of that transaction and will see its writes. If this transaction has aborted, an error will be returned. If not set, defaults to the most recent committed transaction. Cannot be specified along with QueryAsOfTime.
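
The constraints described above can be checked on the client side before a request is sent. The following is a minimal sketch; the helper name build_planning_context is hypothetical. It enforces only two rules stated in this section: DatabaseName is required, and QueryAsOfTime and TransactionId are mutually exclusive. Length limits and string patterns are not validated.

```python
from datetime import datetime
from typing import Dict, Optional


def build_planning_context(
    database_name: str,
    catalog_id: Optional[str] = None,
    query_as_of_time: Optional[datetime] = None,
    transaction_id: Optional[str] = None,
    query_parameters: Optional[Dict[str, str]] = None,
) -> dict:
    """Assemble a QueryPlanningContext dictionary, omitting unset fields."""
    if not database_name:
        raise ValueError("DatabaseName is required and must be at least 1 byte long")
    if query_as_of_time is not None and transaction_id is not None:
        raise ValueError("QueryAsOfTime and TransactionId cannot be specified together")

    context: dict = {"DatabaseName": database_name}
    # Optional fields are added only when the caller supplied them, so the
    # service-side defaults described above still apply otherwise.
    if catalog_id is not None:
        context["CatalogId"] = catalog_id
    if query_as_of_time is not None:
        context["QueryAsOfTime"] = query_as_of_time
    if transaction_id is not None:
        context["TransactionId"] = transaction_id
    if query_parameters is not None:
        context["QueryParameters"] = dict(query_parameters)
    return context
```

Leaving both QueryAsOfTime and TransactionId unset produces a context that reads from the most recent committed transaction.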

ExecutionStatistics Structure

Statistics related to the processing of a query statement.

Fields

  • AverageExecutionTimeMillis – Number (long).

    The average time the request took to be executed.

  • DataScannedBytes – Number (long).

    The amount of data that was scanned in bytes.

  • WorkUnitsExecutedCount – Number (long).

    The number of work units executed.

PlanningStatistics Structure

Statistics related to the processing of a query statement.

Fields

  • EstimatedDataToScanBytes – Number (long).

    An estimate of the data that was scanned in bytes.

  • PlanningTimeMillis – Number (long).

    The time that it took to process the request.

  • QueueTimeMillis – Number (long).

    The time the request was in queue to be processed.

  • WorkUnitsGeneratedCount – Number (long).

    The number of work units generated.

Operations

StartQueryPlanning Action (Python: start_query_planning)

Submits a request to process a query statement.

This operation generates work units that can be retrieved with the GetWorkUnits operation as soon as the query state is WORKUNITS_AVAILABLE or FINISHED.

Request

  • QueryPlanningContext – Required: A QueryPlanningContext object.

    A structure containing information about the query plan.

  • QueryString – Required: UTF-8 string, at least 1 byte long.

    A PartiQL query statement used as an input to the planner service.

Response

A structure for the output.

  • QueryId – Required: UTF-8 string.

    The ID of the plan query operation. Use it to fetch the work unit descriptors produced by the operation, to get the query state, and as an input to the Execute operation.

Errors

  • InternalServiceException

  • InvalidInputException

  • AccessDeniedException

  • ThrottledException
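
A minimal sketch of submitting a query is shown below. It assumes a client object that exposes start_query_planning with the request fields listed above, such as one created with boto3.client("lakeformation"). The helper name start_query is hypothetical.

```python
from typing import Optional


def start_query(client, database_name: str, query_string: str,
                catalog_id: Optional[str] = None) -> str:
    """Submit a PartiQL statement for planning and return its QueryId."""
    if not query_string:
        raise ValueError("QueryString must be at least 1 byte long")

    context = {"DatabaseName": database_name}
    if catalog_id is not None:
        context["CatalogId"] = catalog_id

    response = client.start_query_planning(
        QueryPlanningContext=context,
        QueryString=query_string,
    )
    # The returned ID identifies the query in all later calls
    # (GetQueryState, GetWorkUnits, GetWorkUnitResults, GetQueryStatistics).
    return response["QueryId"]
```

The client is passed in as an argument so that the same function can be exercised against a stub in tests.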

GetQueryState Action (Python: get_query_state)

Returns the state of a query previously submitted. Clients are expected to poll GetQueryState to monitor the current state of the planning before retrieving the work units. A query state is only visible to the principal that made the initial call to StartQueryPlanning.

Request

  • QueryId – Required: UTF-8 string, exactly 36 bytes long.

    The ID of the plan query operation.

Response

A structure for the output.

  • Error – UTF-8 string.

    An error message when the operation fails.

  • State – Required: UTF-8 string (valid values: PENDING | WORKUNITS_AVAILABLE | ERROR | FINISHED | EXPIRED).

    The state of a query previously submitted. The possible states are:

    • PENDING: the query is pending.

    • WORKUNITS_AVAILABLE: some work units are ready for retrieval and execution.

    • FINISHED: the query planning finished successfully, and all work units are ready for retrieval and execution.

    • ERROR: an error occurred with the query, such as an invalid query ID or a backend error.

Errors

  • InternalServiceException

  • InvalidInputException

  • AccessDeniedException
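
The polling pattern described for GetQueryState can be sketched as follows. The helper name wait_for_work_units and its polling interval and limit are hypothetical. Treating EXPIRED as a terminal failure is an assumption, because this section lists that state without describing it.

```python
import time


def wait_for_work_units(client, query_id: str, poll_seconds: float = 1.0,
                        max_polls: int = 60, sleep=time.sleep) -> str:
    """Poll GetQueryState until work units can be retrieved.

    Returns the state that ended the wait (WORKUNITS_AVAILABLE or FINISHED).
    """
    for _ in range(max_polls):
        response = client.get_query_state(QueryId=query_id)
        state = response["State"]
        if state in ("WORKUNITS_AVAILABLE", "FINISHED"):
            return state
        # Assumption: EXPIRED, like ERROR, means the query cannot make progress.
        if state in ("ERROR", "EXPIRED"):
            detail = response.get("Error", "no error message returned")
            raise RuntimeError(f"query {query_id} ended in state {state}: {detail}")
        # PENDING: wait before asking again.
        sleep(poll_seconds)
    raise TimeoutError(f"query {query_id} was still pending after {max_polls} polls")
```

The sleep function is injectable so that the loop can be tested without real delays. A production caller would probably also add backoff between polls.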

GetWorkUnits Action (Python: get_work_units)

Retrieves the work units generated by the StartQueryPlanning operation.

Request

  • NextToken – UTF-8 string.

    A continuation token, if this is a continuation call.

  • PageSize – Number (integer).

    The size of each page to get in the AWS service call. This does not affect the number of items returned in the command's output. Setting a smaller page size results in more calls to the AWS service, retrieving fewer items in each call. This can help prevent the AWS service calls from timing out.

  • QueryId – Required: UTF-8 string, exactly 36 bytes long.

    The ID of the plan query operation.

Response

A structure for the output.

  • NextToken – UTF-8 string.

    A continuation token for paginating the returned list of tokens, returned if the current segment of the list is not the last.

  • QueryId – Required: UTF-8 string.

    The ID of the plan query operation.

  • WorkUnitRanges – Required: An array of WorkUnitRange objects.

    A WorkUnitRangeList object that specifies the valid range of work unit IDs for querying the execution service.

Errors

  • WorkUnitsNotReadyYetException

  • InternalServiceException

  • InvalidInputException

  • AccessDeniedException

  • ExpiredException
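
Because the response is paginated through NextToken, a caller typically loops until no token is returned. The following minimal sketch assumes a client exposing get_work_units with the request fields listed above. The helper name get_all_work_unit_ranges is hypothetical.

```python
from typing import List, Optional


def get_all_work_unit_ranges(client, query_id: str,
                             page_size: Optional[int] = None) -> List[dict]:
    """Collect WorkUnitRange entries from every page of GetWorkUnits."""
    kwargs = {"QueryId": query_id}
    if page_size is not None:
        kwargs["PageSize"] = page_size

    ranges: List[dict] = []
    while True:
        response = client.get_work_units(**kwargs)
        ranges.extend(response["WorkUnitRanges"])
        next_token = response.get("NextToken")
        if not next_token:
            # No continuation token: this was the last segment of the list.
            return ranges
        # Pass the token back to request the next segment.
        kwargs["NextToken"] = next_token
```

If the query state is still PENDING, the call may raise WorkUnitsNotReadyYetException. This sketch leaves that error to the caller.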

GetWorkUnitResults Action (Python: get_work_unit_results)

Returns the rows produced by executing one work unit of the query. Work units can be executed in any order and in parallel.

Request

  • QueryId – Required: UTF-8 string, exactly 36 bytes long.

    The ID of the plan query operation for which to get results.

  • WorkUnitId – Required: Number (long).

    The work unit ID for which to get results. Generate the value by enumerating WorkUnitIdMin to WorkUnitIdMax (inclusive) from a WorkUnitRange in the output of GetWorkUnits.

  • WorkUnitToken – Required: UTF-8 string, at least 1 byte long.

    A work token used to query the execution service. This is the WorkUnitToken returned by GetWorkUnits.

Response

A structure for the output.

  • ResultStream – Blob.

    Rows returned from the GetWorkUnitResults operation as a stream of Apache Arrow v1.0 messages.

Errors

  • InternalServiceException

  • InvalidInputException

  • AccessDeniedException

  • ExpiredException

  • ThrottledException
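
Because work units can be executed in any order and in parallel, results can be fetched concurrently. The following minimal sketch uses a thread pool. The helper names and the max_workers default are hypothetical. It assumes that ResultStream is returned as a file-like object with a read() method and that the client can safely be shared between threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Tuple


def fetch_work_unit(client, query_id: str, work_unit_id: int,
                    work_unit_token: str) -> bytes:
    """Fetch the raw Apache Arrow stream bytes for a single work unit."""
    response = client.get_work_unit_results(
        QueryId=query_id,
        WorkUnitId=work_unit_id,
        WorkUnitToken=work_unit_token,
    )
    # Assumption: the blob is exposed as a file-like object.
    return response["ResultStream"].read()


def fetch_all_results(client, query_id: str, work_unit_ranges: Iterable[dict],
                      max_workers: int = 4) -> Dict[Tuple[str, int], bytes]:
    """Fetch every work unit in the given ranges, keyed by (token, id)."""
    tasks = [
        (rng["WorkUnitToken"], work_unit_id)
        for rng in work_unit_ranges
        # WorkUnitIdMax is inclusive.
        for work_unit_id in range(rng["WorkUnitIdMin"], rng["WorkUnitIdMax"] + 1)
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            (token, work_unit_id): pool.submit(
                fetch_work_unit, client, query_id, work_unit_id, token
            )
            for token, work_unit_id in tasks
        }
        # result() re-raises any exception from the worker thread.
        return {key: future.result() for key, future in futures.items()}
```

The returned bytes are Apache Arrow stream messages. Decoding them requires an Arrow library, which this sketch does not include.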

GetQueryStatistics Action (Python: get_query_statistics)

Retrieves statistics on the planning and execution of a query.

Request

  • QueryId – Required: UTF-8 string, exactly 36 bytes long.

    The ID of the plan query operation.

Response

  • ExecutionStatistics – An ExecutionStatistics object.

    An ExecutionStatistics structure containing execution statistics.

  • PlanningStatistics – A PlanningStatistics object.

    A PlanningStatistics structure containing query planning statistics.

  • QuerySubmissionTime – UTF-8 string.

    The time that the query was submitted.

Errors

  • StatisticsNotReadyYetException

  • InternalServiceException

  • InvalidInputException

  • AccessDeniedException

  • ExpiredException

  • ThrottledException
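
Statistics may not be available immediately, in which case the operation raises StatisticsNotReadyYetException. One way to handle this is a bounded retry, sketched below. The helper name and retry parameters are hypothetical. The sketch assumes the exception class is reachable as client.exceptions.StatisticsNotReadyYetException, which is how boto3 clients expose modeled service errors.

```python
import time


def get_statistics_when_ready(client, query_id: str, attempts: int = 5,
                              delay_seconds: float = 2.0, sleep=time.sleep) -> dict:
    """Call GetQueryStatistics, retrying while statistics are not ready."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Assumption: the client exposes the modeled exception class here.
    not_ready = client.exceptions.StatisticsNotReadyYetException
    for attempt in range(attempts):
        try:
            return client.get_query_statistics(QueryId=query_id)
        except not_ready:
            if attempt == attempts - 1:
                # Out of retries: surface the original exception.
                raise
            sleep(delay_seconds)
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Other errors, such as ExpiredException or ThrottledException, are not caught here and propagate to the caller.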

Exceptions

StatisticsNotReadyYetException Structure

Contains details about an error related to statistics not being ready.

Fields

  • Message – UTF-8 string.

    A message describing the error.

WorkUnitsNotReadyYetException Structure

Contains details about an error related to work units not being ready.

Fields

  • Message – UTF-8 string.

    A message describing the error.

ExpiredException Structure

Contains details about an error where the query request expired.

Fields

  • Message – UTF-8 string.

    A message describing the error.

ThrottledException Structure

Contains details about an error where the query request was throttled.

Fields

  • Message – UTF-8 string.

    A message describing the error.