Redacting document revisions - Amazon Quantum Ledger Database (Amazon QLDB)

Redacting document revisions

In Amazon QLDB, a DELETE statement only logically deletes a document by creating a new revision that marks it as deleted. QLDB also supports a data redaction operation that lets you permanently delete inactive document revisions in the history of a table.

Note

Any ledgers that were created before July 22, 2021 are currently not eligible for redaction. You can view the creation time of your ledger on the Amazon QLDB console.

The redaction operation deletes only the user data in the specified revision and leaves the journal sequence and the document metadata unchanged. This maintains the overall data integrity of your ledger.

Before you get started with data redaction in QLDB, make sure that you review Redaction considerations and limitations in the Amazon QLDB PartiQL reference.

Redaction stored procedure

You can use the REDACT_REVISION stored procedure to permanently delete an individual, inactive revision in a ledger. This stored procedure deletes all of the user data in the specified revision in both indexed storage and journal storage. However, it leaves the journal sequence and the document metadata, including the document ID and hash, unchanged. This operation is irreversible.

The specified document revision must be an inactive revision in history. The latest active revision of a document is not eligible for redaction.

To redact multiple revisions, you must run the stored procedure once for each revision. You can redact one revision per transaction.

Syntax

EXEC REDACT_REVISION `block-address`, 'table-id', 'document-id'
Arguments
`block-address`

The journal block location of the document revision to be redacted. An address is an Amazon Ion structure that has two fields: strandId and sequenceNo.

This is an Ion literal value that is denoted by backticks. For example:

`{strandId:"JdxjkR9bSYB5jMHWcI464T", sequenceNo:17}`
'table-id'

The unique ID of the table whose document revision you want to redact, denoted by single quotation marks.

'document-id'

The unique document ID of the revision to be redacted, denoted by single quotation marks.

Checking whether a redaction is complete

When you submit a redaction request by running the stored procedure, QLDB processes the redaction of data asynchronously. Upon completion, the user data in the revision (represented by the data structure) is removed permanently. To check whether a redaction request has completed, you can use one of the following:

After a revision redaction is complete, the revision's data structure is replaced by a new dataHash field. The value of this field is the Ion hash of the removed data structure, as shown in the following example. As a result, the ledger maintains its overall data integrity and remains cryptographically verifiable through the existing verification API operations. To learn more about verification, see Data verification in Amazon QLDB.

Redaction example

Consider the vehicle registration document that you previously reviewed in Querying revision history. Suppose that you want to redact the second revision (version:1). The following query example shows this revision before redaction. In the query results, the data structure that will be redacted is highlighted in red italics.

SELECT * FROM history(VehicleRegistration) AS h WHERE h.metadata.id = 'ADR2Ll1fGsU4Jr4EqTdnQF' --replace with your id AND h.metadata.version = 1
{ blockAddress:{ strandId:"JdxjkR9bSYB5jMHWcI464T", sequenceNo:17 }, hash:{{LGSFZ4iEYWZeMwmAqcxxNyT4wbCtuMOmFCj8pEd6Mp0=}}, data: { VIN: "1HVBBAANXWH544237", LicensePlateNumber: "LS477D", State: "WA", PendingPenaltyTicketAmount: 42.20, ValidFromDate: 2011-10-26T, ValidToDate: 2023-09-25T, Owners: { PrimaryOwner: { PersonId: "KmA3XPKKFqYCP2zhR3d0Ho" }, SecondaryOwners: [] }, City: "Bellevue" }, metadata:{ id:"ADR2Ll1fGsU4Jr4EqTdnQF", version:1, txTime:2019-06-05T21:01:442d-3Z, txId:"9cArhIQV5xf5Tf5vtsPwPq" } }

Note the blockAddress in the query results because you need to pass this value to the REDACT_REVISION stored procedure. Then, find the unique ID of the VehicleRegistration table by querying the system catalog, as follows.

SELECT tableId FROM information_schema.user_tables WHERE name = 'VehicleRegistration'

Use this table ID along with the document ID and block address to run REDACT_REVISION. The table ID and document ID are string literals that must be enclosed in single quotation marks, and the block address is an Ion literal that is enclosed in backticks. Be sure to replace these arguments with your own values as appropriate.

EXEC REDACT_REVISION `{strandId:"JdxjkR9bSYB5jMHWcI464T", sequenceNo:17}`, '5PLf9SXwndd63lPaSIa0O6', 'ADR2Ll1fGsU4Jr4EqTdnQF'
Tip

When you use the QLDB console or the QLDB shell to query for a table ID or document ID (or any string literal value), the returned value is enclosed in double quotation marks. However, when you specify the table ID and document ID arguments of the REDACT_REVISION stored procedure, you must enclose the values in single quotation marks.

This is because you write statements in PartiQL format, but QLDB returns results in Amazon Ion format. For details on the syntax and semantics of PartiQL in QLDB, see Querying Ion with PartiQL.

A valid redaction request returns an Ion structure that represents the document revision that you are redacting, as follows.

{ blockAddress: { strandId: "JdxjkR9bSYB5jMHWcI464T", sequenceNo: 17 }, tableId: "5PLf9SXwndd63lPaSIa0O6", documentId: "ADR2Ll1fGsU4Jr4EqTdnQF", version: 1 }

When you run this stored procedure, QLDB processes your redaction request asynchronously. Upon completion of the redaction, the data structure is permanently removed and replaced by a new dataHash field. The value of this field is the Ion hash of the removed data structure, as follows.

Note

This dataHash example is provided for informational purposes only and isn't a real calculated hash value.

{ blockAddress:{ strandId:"JdxjkR9bSYB5jMHWcI464T", sequenceNo:17 }, hash:{{LGSFZ4iEYWZeMwmAqcxxNyT4wbCtuMOmFCj8pEd6Mp0=}}, dataHash: {{s83jd7sfhsdfhksj7hskjdfjfpIPP/DP2hvionas2d4=}}, metadata:{ id:"ADR2Ll1fGsU4Jr4EqTdnQF", version:1, txTime:2019-06-05T21:01:442d-3Z, txId:"9cArhIQV5xf5Tf5vtsPwPq" } }

Deleting and redacting an active revision

Active document revisions (that is, the latest non-deleted revisions of each document) are not eligible for data redaction. Before you can redact an active revision, you must first update or delete it. This moves the previously active revision to history and makes it eligible for redaction.

If your use case requires the entire document to be marked as deleted, you first use a DELETE statement. For example, the following statement logically deletes the VehicleRegistration document with a VIN of 1HVBBAANXWH544237.

DELETE FROM VehicleRegistration AS r WHERE r.VIN = '1HVBBAANXWH544237'

Then, redact the previous revision before this deletion, as described previously. If required, you can also individually redact any prior revisions.

If your use case requires the document to remain active, you first use an UPDATE or FROM statement to obscure or remove the fields that you want to redact. This process is described in the following section.

Redacting a particular field within a revision

QLDB doesn't support the redaction of a particular field within a document revision. To do so, you can first use an UPDATE-REMOVE or FROM-REMOVE statement to remove an existing field from a revision. For example, the following statement removes the LicensePlateNumber field from the VehicleRegistration document with a VIN of 1HVBBAANXWH544237.

UPDATE VehicleRegistration AS r REMOVE r.LicensePlateNumber WHERE r.VIN = '1HVBBAANXWH544237'

Then, redact the previous revision before this removal, as described previously. If required, you can also individually redact any prior revisions that include this now removed field.

To learn how to optimize your queries, proceed to Optimizing query performance.