Optimizing query performance - Amazon Quantum Ledger Database (Amazon QLDB)

Optimizing query performance

Amazon QLDB is intended to address the needs of high-performance online transaction processing (OLTP) workloads. This means that QLDB is optimized for a specific set of query patterns, even though it supports SQL-like query capabilities. It's critical to design applications and their data models to work with these query patterns. Otherwise, as your tables grow, you will encounter significant performance problems, including query latency, transaction timeouts, and concurrency conflicts.

This section describes query constraints in QLDB and provides guidance for writing optimal queries given these constraints.

Transaction timeout limit

In QLDB, every PartiQL statement (including every SELECT query) is processed in a transaction and is subject to a transaction timeout limit. A transaction can run for up to 30 seconds before being committed. After this limit, QLDB rejects any work done on the transaction and discards the session that ran the transaction. This limit protects the service's client from leaking sessions by starting transactions and not committing or canceling them.

Concurrency conflicts

QLDB implements concurrency control by using optimistic concurrency control (OCC). Suboptimal queries can also lead to more OCC conflicts. For information about OCC, see Amazon QLDB concurrency model.

Optimal query patterns

As a best practice, you should run statements with a WHERE predicate clause that filters on an indexed field or a document ID. QLDB requires an equality operator (= or IN) on an indexed field to efficiently look up a document.

The following are examples of optimal query patterns in the user view.

--Indexed field (VIN) lookup using the = operator SELECT * FROM VehicleRegistration WHERE VIN = '1N4AL11D75C109151' --Indexed field (VIN) AND non-indexed field (City) lookup SELECT * FROM VehicleRegistration WHERE VIN = '1N4AL11D75C109151' AND City = 'Seattle' --Indexed field (VIN) lookup using the IN operator SELECT * FROM VehicleRegistration WHERE VIN IN ('1N4AL11D75C109151', 'KM8SRDHF6EU074761') --Document ID (r_id) lookup using the BY clause SELECT * FROM VehicleRegistration BY r_id WHERE r_id = '3Qv67yjXEwB9SjmvkuG6Cp'

Any query that doesn't follow these patterns invokes a full table scan. Table scans can cause transaction timeouts for queries on large tables or queries that return large result sets. They can also lead to OCC conflicts with competing transactions.

Note

When you use the IN operator for an indexed field, be sure to enclose the list of values in parentheses. Without the parentheses, the statement invokes a table scan rather than an indexed lookup.

High-cardinality indexes

We recommend indexing fields that contain high-cardinality values. For example, the VIN and LicensePlateNumber fields in the VehicleRegistration table are indexed fields that are intended to be unique.

Avoid indexing low-cardinality fields such as status codes, address states or provinces, and postal codes. If you index such a field, your queries can produce large result sets that are more likely to result in transaction timeouts or cause unintended OCC conflicts.

Committed view queries

Queries that you run in the committed view follow the same optimization guidelines as user view queries. Indexes that you create on a table are also used for queries in the committed view.

History function queries

History function queries don't use the indexes that you create on a table. QLDB history is indexed by document ID only, and you can't create additional history indexes at this time.

As a best practice, qualify a history query with both a date range (start time and end time) and a document ID (metadata.id). History queries that include a start time and end time gain the benefit of date range qualification.

Inner join queries

For inner join queries, use join criteria that includes at least an indexed field for the table on the right side of the join. Without a join index, a join query invokes multiple table scans—for every document in the left table of the join, the query fully scans the right table. The best practice is to join on fields that are indexed for each table that you're joining, in addition to specifying a WHERE equality predicate for at least one table.

For example, the following query joins the VehicleRegistration and Vehicle tables on their respective VIN fields, which are both indexed. This query also has an equality predicate on VehicleRegistration.VIN.

SELECT * FROM VehicleRegistration AS r INNER JOIN Vehicle AS v ON r.VIN = v.VIN WHERE r.VIN IN ('1N4AL11D75C109151', 'KM8SRDHF6EU074761')

Choose high-cardinality indexes for both the join criteria and the equality predicates in your join queries.

Query patterns to avoid

The following are some examples of suboptimal statements that don't scale well for larger tables in QLDB. We strongly recommend that you do not rely on these types of queries for tables that grow over time because your queries will eventually result in transaction timeouts. Because tables contain documents that vary in size, it's difficult to define precise limits for non-indexed queries.

--No predicate clause SELECT * FROM Vehicle --COUNT() is not an optimized function SELECT COUNT(*) FROM Vehicle --Low-cardinality predicate SELECT * FROM Vehicle WHERE Color = 'Silver' --Inequality (>) does not qualify for indexed lookup SELECT * FROM Vehicle WHERE "Year" > 2019 --Inequality (LIKE) SELECT * FROM Vehicle WHERE VIN LIKE '1N4AL%' --Inequality (BETWEEN) SELECT SUM(PendingPenaltyTicketAmount) FROM VehicleRegistration WHERE ValidToDate BETWEEN `2020-01-01T` AND `2020-07-01T` --No predicate clause DELETE FROM Vehicle --No document id, and no date range for the history() function SELECT * FROM history(Vehicle)

In general, we don't recommend running the following types of query patterns for production use cases in QLDB:

  • Online analytical processing (OLAP) queries

  • Exploratory queries without a predicate clause

  • Reporting queries

  • Text search

Instead, we recommend streaming your data to a purpose-built database service that is optimized for analytical use cases. For example, you can stream QLDB data to Amazon OpenSearch Service to provide full text search capabilities over documents. For a sample application that demonstrates this use case, see the GitHub repository aws-samples/amazon-qldb-streaming-amazon-opensearch-service-sample-python. For information about QLDB streams, see Streaming journal data from Amazon QLDB.

Monitoring performance

The QLDB driver provides consumed I/O usage and timing information in the result object of a statement. These metrics enable you to identify inefficient PartiQL statements. To learn more, proceed to Getting PartiQL statement statistics.

You can also use Amazon CloudWatch to track your ledger's performance for data operations. Monitor the CommandLatency metric for a specified LedgerName and CommandType. For more information, see Monitoring with Amazon CloudWatch. To learn how QLDB uses commands to manage data operations, see Session management with the driver.