Data verification in Amazon QLDB

With Amazon QLDB, you can trust that the history of changes to your application data is accurate. QLDB uses an immutable transactional log, known as a journal, for data storage. The journal tracks every change to your committed data and maintains a complete and verifiable history of changes over time.

QLDB uses the SHA-256 hash function with a Merkle tree–based model to generate a cryptographic representation of your journal, known as a digest. The digest acts as a unique signature of your data's entire change history as of a point in time. You use the digest to verify the integrity of your document revisions relative to that signature.

What kind of data can you verify in QLDB?

In QLDB, each ledger has exactly one journal. A journal can have multiple strands, which are partitions of the journal.

Note

QLDB currently supports journals with a single strand only.

A block is an object that is committed to the journal strand during a transaction. This block contains entry objects, which represent the document revisions that resulted from the transaction. You can verify either an individual revision or an entire journal block in QLDB.

The following diagram illustrates this journal structure.

Amazon QLDB journal structure diagram showing a set of hash-chained blocks that make up a strand, and the sequence number and hash of each block.

The diagram shows that transactions are committed to the journal as blocks that contain document revision entries. It also shows that each block is hash-chained to subsequent blocks and has a sequence number to specify its address within the strand.

For information about the data contents in a block, see Journal contents in Amazon QLDB.

What does data integrity mean?

Data integrity in QLDB means that your ledger's journal is in fact immutable. In other words, your data (specifically, each document revision) is in a state where the following are true:

  1. It exists at the same location in your journal where it was first written.

  2. It hasn't been altered in any way since it was written.

How does verification work?

To understand how verification works in Amazon QLDB, you can break down the concept into four basic components.

Hashing

QLDB uses the SHA-256 cryptographic hash function to create 256-bit hash values. A hash acts as a unique, fixed-length signature of any arbitrary amount of input data. If you change any part of the input—even a single character or bit—then the output hash changes completely.

The following diagram shows that the SHA-256 hash function creates entirely different hash values for two QLDB documents that differ by only a single digit.

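As an illustration, the following Python snippet hashes two strings that differ by a single digit. It uses the standard hashlib module on plain UTF-8 text; QLDB itself hashes Amazon Ion values, so this sketch demonstrates only the avalanche behavior of SHA-256, not QLDB's exact hash inputs.

    import hashlib

    # Two payloads that differ by only one digit.
    doc_a = '{ VIN: "1N4AL11D75C109151", Mileage: 9467 }'
    doc_b = '{ VIN: "1N4AL11D75C109151", Mileage: 9468 }'

    # The two SHA-256 hashes share no recognizable relationship.
    print(hashlib.sha256(doc_a.encode("utf-8")).hexdigest())
    print(hashlib.sha256(doc_b.encode("utf-8")).hexdigest())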

The SHA-256 hash function is one-way, which means that it's not mathematically feasible to compute the input when given an output. The following diagram shows that it's not feasible to compute the input QLDB document when given an output hash value.

The following data inputs are hashed in QLDB for verification purposes:

  • Document revisions

  • PartiQL statements

  • Revision entries

  • Journal blocks

Digest

A digest is a cryptographic representation of your ledger's entire journal at a point in time. A journal is append-only, and journal blocks are sequenced and hash-chained, similar to blockchains.

You can request a digest for a ledger at any time. QLDB generates the digest and returns it to you as a secure output file. Then you use that digest to verify the integrity of document revisions that were committed at a prior point in time. If you recalculate hashes by starting with a revision and ending with the digest, you prove that your data has not been altered in between.
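For example, you can request a digest with the GetDigest operation in the AWS SDK for Python (Boto3). The ledger name below is a placeholder.

    import boto3

    qldb = boto3.client("qldb")

    # Request a digest that covers the ledger's journal as of now.
    response = qldb.get_digest(Name="my-ledger")

    digest = response["Digest"]                            # 32-byte SHA-256 digest
    tip_address = response["DigestTipAddress"]["IonText"]  # last block covered

    # Save both values in a trusted location outside the ledger so you can
    # verify document revisions against them later.
    print(digest.hex(), tip_address)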

Merkle tree

As the size of your ledger grows, it becomes increasingly inefficient to recalculate the journal's full hash chain for verification. QLDB uses a Merkle tree model to address this inefficiency.

A Merkle tree is a tree data structure in which each leaf node represents a hash of a data block. Each non-leaf node is a hash of its child nodes. Commonly used in blockchains, a Merkle tree helps you efficiently verify large datasets with an audit proof mechanism. For more information about Merkle trees, see the Merkle tree Wikipedia page. To learn more about Merkle audit proofs and for an example use case, see How Log Proofs Work on the Certificate Transparency site.

The QLDB implementation of the Merkle tree is constructed from a journal's full hash chain. In this model, the leaf nodes are the set of all individual document revision hashes. The root node represents the digest of the entire journal as of a point in time.

Using a Merkle audit proof, you can verify a revision by checking only a small subset of your ledger's revision history. You do this by traversing the tree from a given leaf node (revision) to its root (digest). Along this traversal path, you recursively hash sibling pairs of nodes to compute their parent hash until you end with the digest. This traversal has a time complexity of O(log n), where n is the number of nodes in the tree.
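The following Python sketch shows the shape of this traversal. The pairing convention, which orders each sibling pair by comparing the hashes as signed byte arrays starting from the last byte, follows the official QLDB sample applications; treat it as an assumption to confirm against those samples when you build your own verifier.

    import hashlib

    def join_hashes(h1: bytes, h2: bytes) -> bytes:
        """Compute a parent node hash from two child node hashes.

        Ordering convention assumed from the QLDB sample applications:
        compare the hashes as signed byte arrays, starting from the last
        byte, and hash the smaller value followed by the larger.
        """
        if not h1:
            return h2
        if not h2:
            return h1

        def signed(b: int) -> int:
            return b - 256 if b > 127 else b

        order = 0
        for b1, b2 in zip(reversed(h1), reversed(h2)):
            order = signed(b1) - signed(b2)
            if order != 0:
                break
        concatenated = h1 + h2 if order < 0 else h2 + h1
        return hashlib.sha256(concatenated).digest()

    def recompute_digest(leaf_hash: bytes, proof_hashes: list) -> bytes:
        """Fold the proof's sibling hashes into a leaf hash, leaf to root."""
        node_hash = leaf_hash
        for sibling in proof_hashes:
            node_hash = join_hashes(node_hash, sibling)
        return node_hash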

Proof

A proof is the ordered list of node hashes that QLDB returns for a given digest and document revision. It consists of the hashes that are required by a Merkle tree model to chain the given leaf node hash (a revision) to the root hash (the digest).

Changing any committed data between a revision and a digest breaks your journal's hash chain and makes it impossible to generate a proof.

Verification example

The following diagram illustrates the Amazon QLDB hash tree model. It shows a set of block hashes that rolls up to the top root node, which represents the digest of a journal strand. In a ledger with a single-strand journal, this root node is also the digest of the entire ledger.

Amazon QLDB hash tree diagram for a set of block hashes in a journal strand.

Suppose that node A is the block that contains the document revision whose hash you want to verify. The following nodes represent the ordered list of hashes that QLDB provides in your proof: B, E, G. These hashes are required to recalculate the digest from hash A.

To recalculate the digest, do the following:

  1. Start with hash A and concatenate it with hash B. Then, hash the result to compute D.

  2. Use D and E to compute F.

  3. Use F and G to compute the digest.

The verification is successful if your recalculated digest matches the expected value. Given a revision hash and a digest, it's not feasible to reverse engineer the hashes in a proof. Therefore, this exercise proves that your revision was indeed written in this journal location relative to the digest.
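Expressed with the recompute_digest sketch from the Merkle tree section, the walk above looks like the following. The hash values here are illustrative placeholders, not real QLDB output.

    # Illustrative placeholder hashes for the nodes in the diagram.
    hash_a = bytes.fromhex("aa" * 32)  # hash A: the block hash to verify
    proof = [
        bytes.fromhex("bb" * 32),      # hash B: sibling of A
        bytes.fromhex("ee" * 32),      # hash E: sibling of D
        bytes.fromhex("cc" * 32),      # hash G: sibling of F
    ]

    # Step 1: D = H(A, B); step 2: F = H(D, E); step 3: digest = H(F, G).
    candidate_digest = recompute_digest(hash_a, proof)

    # Verification succeeds if this matches the digest you saved earlier.
    print(candidate_digest.hex())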

How does data redaction affect verification?

In Amazon QLDB, a DELETE statement only logically deletes a document by creating a new revision that marks it as deleted. QLDB also supports a data redaction operation that lets you permanently delete inactive document revisions in the history of a table.

The redaction operation deletes only the user data in the specified revision, and leaves the journal sequence and the document metadata unchanged. After a revision is redacted, the user data in the revision (represented by the data structure) is replaced by a new dataHash field. The value of this field is the Amazon Ion hash of the removed data structure. For more information and an example of a redaction operation, see Redacting document revisions.

As a result, the ledger maintains its overall data integrity and remains cryptographically verifiable through the existing verification API operations. You can still use these API operations as expected to request a digest (GetDigest), request a proof (GetBlock or GetRevision), and then run your verification algorithm using the returned objects.
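For example, you can request a proof for a single revision with the GetRevision operation in Boto3. The ledger name, block address, and document ID below are placeholders; take the real values from the revision's committed metadata and from your saved digest.

    import boto3

    qldb = boto3.client("qldb")

    response = qldb.get_revision(
        Name="my-ledger",                                                   # placeholder
        BlockAddress={"IonText": '{strandId: "...", sequenceNo: 5}'},       # placeholder
        DocumentId="DOCUMENT_ID",                                           # placeholder
        DigestTipAddress={"IonText": '{strandId: "...", sequenceNo: 10}'},  # from GetDigest
    )

    proof_ion = response["Proof"]["IonText"]        # ordered node hashes (Ion text)
    revision_ion = response["Revision"]["IonText"]  # the revision, with metadata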

Recalculating a revision hash

If you plan to verify an individual document revision by recalculating its hash, you must conditionally check whether the revision was redacted. If the revision was redacted, you can use the hash value that is provided in the dataHash field. If it wasn't redacted, you can recalculate the hash by using the data field.

By doing this conditional check, you can identify redacted revisions and take the appropriate action. For example, you can log data manipulation events for monitoring purposes.
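The following sketch shows this conditional check, assuming the revision has already been parsed from Ion into a Python mapping. The ion_hash_of function is a hypothetical helper that stands in for an Amazon Ion hash implementation (for example, the ion-hash library).

    def ion_hash_of(value) -> bytes:
        """Hypothetical helper: compute the Amazon Ion hash of a value,
        for example by using the ion-hash library."""
        raise NotImplementedError

    def revision_data_hash(revision: dict) -> bytes:
        """Return the hash of a revision's user data, handling redaction."""
        if "dataHash" in revision:
            # Redacted revision: the user data was removed, and QLDB kept
            # its Ion hash in the dataHash field. You might also log the
            # data manipulation event here for monitoring purposes.
            return revision["dataHash"]
        # Active revision: recalculate the hash from the data structure.
        return ion_hash_of(revision["data"])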

Getting started with verification

Before you can verify data, you must request a digest from your ledger and save it for later. Any document revision that is committed before the latest block covered by the digest is eligible for verification against that digest.

Then, you request a proof from Amazon QLDB for an eligible revision that you want to verify. Using this proof, you call a client-side API to recalculate the digest, starting with your revision hash. As long as the previously saved digest is known and trusted outside of QLDB, the integrity of your document is proven if your recalculated digest hash matches the saved digest hash.
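Putting these pieces together, the following sketch outlines the whole flow with Boto3. It assumes the amazon.ion package (simpleion) for parsing Ion text, reuses the recompute_digest sketch from the Merkle tree section, and uses placeholder values for the ledger name, block address, and document ID.

    import boto3
    from amazon.ion import simpleion  # assumes the amazon.ion package is installed

    qldb = boto3.client("qldb")
    LEDGER = "my-ledger"  # placeholder ledger name

    # 1. Request a digest and save it in a trusted location outside QLDB.
    digest_response = qldb.get_digest(Name=LEDGER)
    saved_digest = digest_response["Digest"]
    tip_address = digest_response["DigestTipAddress"]

    # 2. Later, request a proof for an eligible revision. The block address
    #    and document ID below are placeholders; take the real values from
    #    the revision's committed metadata.
    revision_response = qldb.get_revision(
        Name=LEDGER,
        BlockAddress={"IonText": '{strandId: "...", sequenceNo: 5}'},
        DocumentId="DOCUMENT_ID",
        DigestTipAddress=tip_address,
    )

    # 3. Recalculate the digest from the revision hash and the proof, using
    #    the recompute_digest sketch from the Merkle tree section.
    revision = simpleion.loads(revision_response["Revision"]["IonText"])
    proof = simpleion.loads(revision_response["Proof"]["IonText"])
    candidate = recompute_digest(bytes(revision["hash"]),
                                 [bytes(h) for h in proof])

    # 4. The revision's integrity is proven if the digests match.
    print(candidate == saved_digest)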

Important
  • What you're specifically proving is that the document revision wasn't altered between the time that you saved this digest and when you run the verification. You can request and save a digest as soon as a revision that you want to verify later is committed to the journal.

  • As a best practice, we recommend that you request digests on a regular basis and store them away from the ledger. Determine how frequently to request digests based on how often you commit revisions in your ledger.

    For a detailed AWS blog post that discusses the value of cryptographic verification in the context of a realistic use case, see Real-world cryptographic verification with Amazon QLDB.

For step-by-step guides on how to request a digest from your ledger and then verify your data, see the following: