Data verification in Amazon QLDB
With Amazon QLDB, you can trust that the history of changes to your application data is accurate. QLDB uses an immutable transactional log, known as a journal, for data storage. The journal tracks every change to your committed data and maintains a complete and verifiable history of changes over time.
QLDB uses the SHA-256 hash function with a Merkle tree–based model to generate a cryptographic representation of your journal, known as a digest. The digest acts as a unique signature of your data's entire change history as of a point in time. It enables you to look back and verify the integrity of your document revisions relative to that signature.
What kind of data can you verify in QLDB?
In QLDB, each ledger has exactly one journal. A journal can have multiple strands, which are partitions of the journal.
QLDB currently supports journals with a single strand only.
A block is an object that is committed to the journal strand during a transaction. This block contains entry objects, which represent the document revisions that resulted from the transaction. You can verify either an individual revision or an entire journal block in QLDB.
The following diagram illustrates this journal structure.
The diagram shows that transactions are committed to the journal as blocks that contain document revision entries. It also shows that each block is hash-chained to subsequent blocks and has a sequence number to specify its address within the strand.
For information about the data contents in a block, see Journal contents in Amazon QLDB.
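The block-and-strand structure described above can be sketched in Python. This is a simplified illustration only: the field names and hashing rules here are hypothetical, not the actual QLDB block schema.

```python
import hashlib
from dataclasses import dataclass

# A simplified journal block. Field names are illustrative and do not
# match the real QLDB block contents.
@dataclass
class Block:
    sequence_no: int              # the block's address within the strand
    previous_block_hash: bytes    # hash-chain link to the prior block
    entries: list[bytes]          # document revisions from the transaction

    def block_hash(self) -> bytes:
        # Each block's hash covers the previous block's hash, so altering
        # any earlier block changes every later block's hash.
        h = hashlib.sha256()
        h.update(self.previous_block_hash)
        h.update(self.sequence_no.to_bytes(8, "big"))
        for entry in self.entries:
            h.update(hashlib.sha256(entry).digest())
        return h.digest()

genesis = Block(0, b"\x00" * 32, [b"INSERT doc1"])
block1 = Block(1, genesis.block_hash(), [b"UPDATE doc1"])
print(block1.block_hash().hex())
```

Because `block1` embeds the hash of `genesis`, tampering with any entry in an earlier block invalidates the hashes of all blocks that follow it.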
What does data integrity mean?
Data integrity in QLDB means that your ledger's journal is in fact immutable. In other words, your data (specifically, each document revision) is in a state where the following are true:

It exists at the same location in your journal where it was first written.

It hasn't been altered in any way since it was written.
How does verification work?
To understand how verification works in Amazon QLDB, you can break down the concept into four basic components.
Hashing
QLDB uses the SHA-256 cryptographic hash function to create 256-bit hash values. A hash acts as a unique, fixed-length signature of any arbitrary amount of input data. If you change any part of the input—even a single character or bit—then the output hash changes completely.
The SHA-256 hash function is one-way, meaning that it's not mathematically feasible to compute the input when given an output.
The following data inputs are hashed in QLDB for verification purposes:

Document revisions

PartiQL statements

Revision entries

Journal blocks
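The avalanche property described above—any change to the input completely changes the output hash—can be demonstrated with Python's standard `hashlib` module:

```python
import hashlib

# Hash two inputs that differ by a single character.
h1 = hashlib.sha256(b"QLDB revision v1").hexdigest()
h2 = hashlib.sha256(b"QLDB revision v2").hexdigest()

# SHA-256 always produces a 256-bit digest (64 hex characters),
# regardless of input size.
print(len(h1))   # 64

# The one-character change yields a completely different hash.
print(h1 != h2)  # True
```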
Digest
A digest is a cryptographic representation of your ledger's entire journal at a point in time. A journal is append-only, and journal blocks are sequenced and hash-chained in a way similar to blockchains.
QLDB enables you to generate a digest as a secure output file. Then, you can use that digest to verify the integrity of document revisions that were committed at a prior point in time. If you recalculate hashes by starting with a revision and ending with the digest, you prove that your data has not been altered in between.
Merkle tree
As the size of your ledger grows, it becomes increasingly inefficient to recalculate the journal's full hash chain for verification. QLDB uses a Merkle tree model to address this inefficiency.
A Merkle tree is a tree data structure in which each leaf node represents a hash of a data block. Each non-leaf node is a hash of its child nodes. Commonly used in blockchains, a Merkle tree enables efficient verification of large datasets with an audit proof mechanism. For more information about Merkle trees, see the Merkle tree Wikipedia page.
The QLDB implementation of the Merkle tree is constructed from a journal's full hash chain. In this model, the leaf nodes are the set of all individual document revision hashes. The root node represents the digest of the entire journal as of a point in time.
Using a Merkle audit proof, you can verify a revision by checking only a small subset of your ledger's revision history. You do this by traversing the tree from a given leaf node (a revision) to its root (the digest). Along this traversal path, you recursively hash sibling pairs of nodes to compute their parent hash until you end with the digest. This traversal has a time complexity of log(n), where n is the number of nodes in the tree.
Proof
A proof is the ordered list of node hashes that QLDB returns for a given digest and document revision. It consists of the hashes that are required by a Merkle tree model to chain the given leaf node hash (a revision) to the root hash (the digest).
Changing any committed data between a revision and a digest breaks your journal's hash chain and makes it impossible to generate a proof.
Verification example
The following diagram illustrates the Amazon QLDB hash tree model. It shows a set of block hashes that rolls up to the top root node, which represents the digest of a journal strand. In a ledger with a single-strand journal, this root node is also the digest of the entire ledger.
Suppose that node A is the block that contains the document revision whose hash you want to verify. The following nodes represent the ordered list of hashes that QLDB provides in your proof: B, E, G. These hashes are required to recalculate the digest from hash A.
To recalculate the digest, do the following:

Start with hash A and concatenate it with hash B. Then, hash the result to compute D.

Use D and E to compute F.

Use F and G to compute the digest.
The verification is successful if your recalculated digest matches the expected value. Given a revision hash and a digest, it's not feasible to reverse engineer the hashes in a proof. Therefore, this exercise proves that your revision was indeed written in this journal location relative to the digest.
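The three steps above can be sketched in Python. The node values and pairing rule (plain concatenation) are hypothetical placeholders; QLDB's actual implementation defines its own ordering for each sibling pair.

```python
import hashlib

def parent(left: bytes, right: bytes) -> bytes:
    # Concatenate the sibling hashes and hash the result.
    return hashlib.sha256(left + right).digest()

# Hypothetical node hashes matching the diagram.
A = hashlib.sha256(b"revision under verification").digest()
B = hashlib.sha256(b"sibling block B").digest()
E = hashlib.sha256(b"subtree E").digest()
G = hashlib.sha256(b"subtree G").digest()

# The digest as originally computed: A+B -> D, D+E -> F, F+G -> digest.
expected_digest = parent(parent(parent(A, B), E), G)

def recompute(leaf: bytes, proof: list[bytes]) -> bytes:
    # Fold each proof hash into the running hash, bottom-up.
    for sibling in proof:
        leaf = parent(leaf, sibling)
    return leaf

# Verification succeeds only if the recalculated digest matches.
print(recompute(A, [B, E, G]) == expected_digest)  # True
```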
Getting started with verification
Before you can verify data, you must request a digest from your ledger and save it for later. Any document revision that is committed before the latest block covered by the digest is eligible for verification against that digest.
Then, you request a proof from Amazon QLDB for an eligible revision that you want to verify. Using this proof, you call a clientside API to recalculate the digest, starting with your revision hash. As long as the previously saved digest is known and trusted outside of QLDB, the integrity of your document is proven if your recalculated digest hash matches the saved digest hash.

What you are specifically proving is that the document revision was not altered between the time that you saved this digest and when you run the verification. As a best practice, save the digest as soon as the document revision that you want to verify is written to the journal.

We recommend that you request a digest and save it in a secure place at regular intervals. Determine the frequency at which you save digests based on how often you commit revisions in your ledger.
For step-by-step guides on how to request a digest from your ledger and then verify your data, see the following: