Data Verification in Amazon QLDB - Amazon Quantum Ledger Database (Amazon QLDB)

Data Verification in Amazon QLDB

With Amazon QLDB, you can trust that the history of changes to your application data is accurate. QLDB uses an immutable transactional log, known as a journal, for data storage. The journal tracks every change to your data and maintains a complete and verifiable history of changes over time.

QLDB uses the SHA-256 hash function with a Merkle tree–based model to generate a cryptographic representation of your journal, known as a digest. The digest acts as a unique signature of your data's entire change history as of a point in time. It enables you to look back and verify the integrity of your document revisions relative to that signature.

What Kind of Data Can You Verify in QLDB?

In QLDB, each ledger has exactly one journal. A journal can have multiple strands, which are partitions of the journal.

Note

QLDB currently supports journals with a single strand only.

A block is an object that is committed to the journal strand during a transaction. This block contains entry objects, which represent the document revisions that resulted from the transaction. You can verify either an individual revision or an entire journal block in QLDB.

The following diagram illustrates this journal structure.


Figure: Amazon QLDB journal structure diagram showing a set of hash-chained blocks that make up a strand, and the sequence number and hash of each block.

The diagram shows that transactions are committed to the journal as blocks that contain document revision entries. It also shows that each block is hash-chained to subsequent blocks and has a sequence number to specify its address within the strand.
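The hash-chained structure described above can be illustrated with a minimal sketch. The `Block` layout, the field names, and the way the block hash is composed are assumptions made for illustration only; they are not QLDB's actual journal format.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical journal block; not QLDB's real block format."""
    sequence_no: int        # address of the block within the strand
    entries: tuple          # raw bytes standing in for document revisions
    previous_hash: bytes    # hash of the preceding block (the chain link)
    block_hash: bytes       # hash over this block's contents


def hash_block(sequence_no, entries, previous_hash):
    # Assumed composition: previous hash, sequence number, then entry hashes.
    h = hashlib.sha256()
    h.update(previous_hash)
    h.update(sequence_no.to_bytes(8, "big"))
    for entry in entries:
        h.update(hashlib.sha256(entry).digest())
    return h.digest()


def append_block(strand, entries):
    """Commit a new block to the end of the strand (append-only)."""
    previous_hash = strand[-1].block_hash if strand else bytes(32)
    sequence_no = len(strand)
    entries = tuple(entries)
    block_hash = hash_block(sequence_no, entries, previous_hash)
    strand.append(Block(sequence_no, entries, previous_hash, block_hash))


def chain_is_intact(strand):
    """Recompute every hash and check each link to the preceding block."""
    previous_hash = bytes(32)
    for expected_no, block in enumerate(strand):
        if block.sequence_no != expected_no:
            return False
        if block.previous_hash != previous_hash:
            return False
        recomputed = hash_block(block.sequence_no, block.entries, block.previous_hash)
        if recomputed != block.block_hash:
            return False
        previous_hash = block.block_hash
    return True


strand = []
append_block(strand, [b"{id: 1, balance: 100}"])
append_block(strand, [b"{id: 1, balance: 90}", b"{id: 2, balance: 10}"])
append_block(strand, [b"{id: 2, balance: 25}"])
print(chain_is_intact(strand))
```

In this sketch, editing an entry in an earlier block either invalidates that block's own hash or, if the hash is recomputed, breaks the link stored in the next block.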

What Does Data Integrity Mean?

Data integrity in QLDB means that your ledger's journal is in fact immutable. In other words, your data (specifically, each document revision) is in a state where the following are true:

  1. It exists at the same location in your journal where it was first written.

  2. It hasn't been altered in any way since it was written.

How Does Verification Work?

To understand how verification works in Amazon QLDB, you can break down the concept into four basic components.

Hashing

QLDB uses the SHA-256 cryptographic hash function to create 256-bit hash values. A hash acts as a unique, fixed-length signature of any arbitrary amount of input data. If you change any part of the input—even a single character or bit—then the output hash changes completely.


Figure: Diagram showing that the SHA-256 cryptographic hash function creates completely different hash values for two QLDB Ion documents that differ by only a single digit.
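As a rough illustration of this property, the following sketch hashes two byte strings that differ by a single digit. The sample documents are made up, and the sketch hashes raw text purely for demonstration; it does not reproduce how QLDB hashes Ion values.

```python
import hashlib

# Two hypothetical documents that differ by one digit.
doc_a = b"{id: 1, balance: 100}"
doc_b = b"{id: 1, balance: 101}"

hash_a = hashlib.sha256(doc_a).digest()
hash_b = hashlib.sha256(doc_b).digest()

# Count how many of the 256 output bits differ between the two hashes.
differing_bits = sum(bin(x ^ y).count("1") for x, y in zip(hash_a, hash_b))

print(hash_a.hex())
print(hash_b.hex())
print(f"{differing_bits} of {len(hash_a) * 8} bits differ")
```

For a well-behaved hash function, roughly half of the output bits are expected to differ even for a one-character change in the input.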

The SHA-256 hash function is one-way, meaning that it's not computationally feasible to derive the input when given only the output.


Figure: Diagram showing that it's not feasible to compute the input QLDB Ion document when given an output hash value.

The following data inputs are hashed in QLDB for verification purposes:

  • document revisions

  • PartiQL statements

  • revision entries

  • journal blocks

Digest

A digest is a cryptographic representation of your ledger's entire journal at a point in time. A journal is append-only, and journal blocks are sequenced and hash-chained, similar to the blocks in a blockchain.

QLDB enables you to generate a digest as a secure output file. Then, you can use that digest to verify the integrity of document revisions that were committed at a prior point in time. If you recalculate hashes by starting with a revision and ending with the digest, you prove that your data has not been altered in between.

Merkle Tree

As the size of your ledger grows, it becomes increasingly inefficient to recalculate the journal's full hash chain for verification. QLDB uses a Merkle tree model to address this inefficiency.

A Merkle tree is a tree data structure in which each leaf node represents a hash of a data block. Each non-leaf node is a hash of its child nodes. Commonly used in blockchains, a Merkle tree enables efficient verification of large datasets with an audit proof mechanism. For more information about Merkle trees, see the Merkle tree Wikipedia page. To learn more about Merkle audit proofs and for an example use case, see How Log Proofs Work on the Certificate Transparency site.

The QLDB implementation of the Merkle tree is constructed from a journal's full hash chain. In this model, the leaf nodes are the set of all individual document revision hashes. The root node represents the digest of the entire journal as of a point in time.

Using a Merkle audit proof, you can verify a revision by checking only a small subset of your ledger's revision history. You do this by traversing the tree from a given leaf node (revision) to its root (digest). Along this traversal path, you recursively hash sibling pairs of nodes to compute their parent hash until you end with the digest. This traversal has a time complexity of O(log n), where n is the number of nodes in the tree.
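The traversal described above can be sketched with a generic Merkle tree. This is not QLDB's implementation: the function names, the left-then-right concatenation order, and the choice to promote an unpaired node unchanged to the next level are all assumptions made for illustration.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaf_hashes):
    """Return every tree level, from the leaves (index 0) up to the root."""
    levels = [list(leaf_hashes)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        parents = []
        for i in range(0, len(current), 2):
            if i + 1 < len(current):
                parents.append(sha256(current[i] + current[i + 1]))
            else:
                parents.append(current[i])  # unpaired node is promoted (assumption)
        levels.append(parents)
    return levels


def audit_path(levels, leaf_index):
    """Collect (sibling_hash, sibling_is_left) pairs from the leaf to the root."""
    path = []
    index = leaf_index
    for level in levels[:-1]:
        sibling = index ^ 1  # the other node of the pair at this level
        if sibling < len(level):
            path.append((level[sibling], sibling < index))
        index //= 2
    return path


def recompute_root(leaf_hash, path):
    """Hash sibling pairs along the path until only the root remains."""
    current = leaf_hash
    for sibling, sibling_is_left in path:
        current = sha256(sibling + current) if sibling_is_left else sha256(current + sibling)
    return current


# 1,024 hypothetical revision hashes.
leaves = [sha256(f"revision-{i}".encode()) for i in range(1024)]
levels = build_levels(leaves)
root = levels[-1][0]

path = audit_path(levels, 700)
print(len(path))                                  # 10 sibling hashes for 1,024 leaves
print(recompute_root(leaves[700], path) == root)  # True
```

Verifying one leaf out of 1,024 needs only 10 sibling hashes here, which is the logarithmic behavior that makes audit proofs practical for large journals.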

Proof

A proof is the ordered list of node hashes that QLDB returns for a given digest and document revision. It consists of the hashes that are required by a Merkle tree model to chain the given leaf node hash (a revision) to the root hash (the digest).

Changing any committed data between a revision and a digest breaks your journal's hash chain and makes it impossible to generate a proof.

Example

The following diagram illustrates the Amazon QLDB hash tree model. It shows a set of block hashes that rolls up to the top root node, which represents the digest of a journal strand. In a ledger with a single-strand journal, this root node is also the digest of the entire ledger.


Figure: Amazon QLDB hash tree diagram for a set of block hashes in a journal strand.

Suppose that node A is the block that contains the document revision whose hash you want to verify. The following nodes represent the ordered list of hashes that QLDB provides in your proof: B, E, G. These hashes are required to recalculate the digest from hash A.

To do this, start with hash A and concatenate it with hash B. Then, hash the result to compute D. Next, use D and E to compute F. Finally, use F and G to compute the digest. The verification is successful if your recalculated digest matches the expected value. Given a revision hash and a digest, it's not feasible to reverse engineer the hashes in a proof. Therefore, this exercise proves that your revision was indeed written in this journal location relative to the digest.
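The recalculation above can be traced in a short sketch. The tree shape (eight block hashes), the block contents, and the simple left-then-right concatenation in `combine` are assumptions for illustration; the exact rule QLDB uses to combine two hashes is not described here.

```python
import hashlib


def combine(left: bytes, right: bytes) -> bytes:
    """Concatenate two hashes and hash the result (assumed combine rule)."""
    return hashlib.sha256(left + right).digest()


def leaf(label: str) -> bytes:
    return hashlib.sha256(label.encode()).digest()


# Eight hypothetical block hashes; A is the block being verified.
blocks = [leaf(f"block-{i}") for i in range(8)]
A, B = blocks[0], blocks[1]

# Build the tree once to obtain the expected digest.
D = combine(A, B)
E = combine(blocks[2], blocks[3])
F = combine(D, E)
G = combine(combine(blocks[4], blocks[5]), combine(blocks[6], blocks[7]))
expected_digest = combine(F, G)


def recalculate_digest(start_hash, proof):
    """Fold the ordered proof hashes into the starting hash."""
    current = start_hash
    for sibling in proof:
        current = combine(current, sibling)
    return current


proof = [B, E, G]  # the ordered list of hashes from the example
print(recalculate_digest(A, proof) == expected_digest)                  # True
print(recalculate_digest(leaf("tampered"), proof) == expected_digest)   # False
```

Starting from any hash other than A, or applying the proof hashes in a different order, produces a value that doesn't match the expected digest.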

What Is the Verification Process in QLDB?

Before you can verify data, you must request a digest from your ledger and save it for later. Any document revision that is committed before the latest block covered by the digest is eligible for verification against that digest.

Then, you request a proof from Amazon QLDB for an eligible revision that you want to verify. Using this proof, you call a client-side API to recalculate the digest, starting with your revision hash. As long as the previously saved digest is known and trusted outside of QLDB, the integrity of your document is proven if your recalculated digest hash matches the saved digest hash.

Note

What you are specifically proving is that the document revision was not altered between the time that you saved this digest and when you run the verification.
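The two requests in this process might look like the following sketch, which assumes the AWS SDK for Python (Boto3) and its `qldb` client with the `get_digest` and `get_revision` operations. The helper function names, the ledger name, and the placeholder values are hypothetical. The sketch stops at retrieving the proof; parsing the returned Ion text and recalculating the digest are not shown.

```python
def request_digest(qldb_client, ledger_name):
    """Request a digest and the address of the latest block that it covers."""
    response = qldb_client.get_digest(Name=ledger_name)
    # Save both values somewhere trusted outside of QLDB for later verification.
    return response["Digest"], response["DigestTipAddress"]["IonText"]


def request_revision_proof(qldb_client, ledger_name, block_address_ion,
                           document_id, digest_tip_ion):
    """Request a proof for one revision, relative to a previously saved digest."""
    response = qldb_client.get_revision(
        Name=ledger_name,
        BlockAddress={"IonText": block_address_ion},
        DocumentId=document_id,
        DigestTipAddress={"IonText": digest_tip_ion},
    )
    # Both values are returned as Ion text and must be parsed before hashing.
    return response["Proof"]["IonText"], response["Revision"]["IonText"]


# Example usage (requires AWS credentials and an existing ledger):
#
#   import boto3
#   client = boto3.client("qldb")
#   digest, tip = request_digest(client, "my-ledger")        # hypothetical ledger name
#   proof_ion, revision_ion = request_revision_proof(
#       client, "my-ledger", block_address_ion, document_id, tip)
```

Passing the client as a parameter keeps the helpers easy to exercise with a stub object when no ledger is available.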

For step-by-step guides on how to request a digest from your ledger and then verify your data, see the following: