
Understanding CDC records

Important

This feature is provided as an AWS Preview and is subject to change. For more information, see section 2, Betas and Previews, in the AWS Service Terms. To learn more about pricing for CDC streams, see the Aurora DSQL pricing page.

Before general availability, we will add a new operation type ("op": "u" for updates) to your stream payload. To ensure your application handles this change without modification, treat any unrecognized op value as an upsert by applying the after payload.

Aurora DSQL CDC delivers each change as a JSON record. The record uses an envelope structure with operation type, before and after row images, and source metadata.

How records map to Amazon Kinesis

Aurora DSQL writes each CDC record as a single Kinesis record. The Kinesis record's Data field contains the JSON payload. Aurora DSQL uses a randomized Kinesis partition key to distribute CDC records evenly across shards. To read all changes, consume all shards on the Kinesis data stream. If a record exceeds the Kinesis record size limit, Aurora DSQL splits it across multiple Kinesis records. For details, see Handling oversized records.

Note

A Kinesis record has one Data blob. The primary key values appear in the JSON payload's before field for deletes, or the after field for inserts and updates. To extract the primary key for downstream processing, read it from the appropriate field in the payload.

Primary key in the payload

For tables with a primary key, the primary key column values appear in the payload:

  • For inserts and updates, the payload includes the primary key columns along with all other columns in the after field.

  • For deletes, the primary key columns appear in the before field.

For example, consider a table with a composite primary key:

CREATE TABLE order_items (
    order_id INT,
    item_id INT,
    quantity INT,
    price NUMERIC,
    PRIMARY KEY (order_id, item_id)
);

A delete on this table produces a payload where "before": {"order_id": 1001, "item_id": 42}.
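Reading the key from the correct field can be wrapped in a small helper. The following Python sketch assumes the record has already been parsed from the Kinesis Data blob; extract_key and the pk_columns argument are illustrative names, not part of the CDC payload or any AWS SDK:

```python
import json

# Hypothetical helper: pull the primary key values out of a parsed CDC record.
# Deletes carry the key in "before"; inserts and updates carry it inside the
# full row in "after". Composite keys simply yield multiple columns.
def extract_key(record, pk_columns):
    image = record["before"] if record["op"] == "d" else record["after"]
    return {col: image[col] for col in pk_columns}

delete_record = json.loads(
    '{"op": "d", "before": {"order_id": 1001, "item_id": 42}, "after": null}'
)
print(extract_key(delete_record, ["order_id", "item_id"]))
# {'order_id': 1001, 'item_id': 42}
```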

Record payload

The payload uses the following JSON envelope format.

INSERT example

The following example shows a CDC record for an insert operation:

{
  "type": "full",
  "op": "c",
  "before": null,
  "after": {
    "order_id": 1001,
    "item_id": 42,
    "quantity": 5,
    "price": "29.99"
  },
  "source": {
    "version": "1.0",
    "ts_ms": 1705318200000,
    "ts_ns": 1705318200000000000,
    "txId": "ffthunp5stx6ffs2vyfqoatmfu",
    "schema": "public",
    "table": "order_items",
    "db": "postgres",
    "cluster": "kmabugltfmjdaj2siqr2qbxgju"
  },
  "ts_ms": 1705318200125,
  "ts_ns": 1705318200125483291
}

UPDATE example

The following example shows a CDC record for an update operation, as it will appear once Aurora DSQL begins emitting op: "u":

Important

Currently Aurora DSQL emits op: "c" for both inserts and updates. A subsequent release will emit op: "u" for updates, and op: "c" for inserts. Design your app to handle c, u, and d so your consumer keeps working across the transition.

{
  "type": "full",
  "op": "u",
  "before": null,
  "after": {
    "order_id": 1001,
    "item_id": 42,
    "quantity": 10,
    "price": "29.99"
  },
  "source": {
    "version": "1.0",
    "ts_ms": 1705318300000,
    "ts_ns": 1705318300000000000,
    "txId": "qvtiesgmd55cvlfukm3dfuotji",
    "schema": "public",
    "table": "order_items",
    "db": "postgres",
    "cluster": "kmabugltfmjdaj2siqr2qbxgju"
  },
  "ts_ms": 1705318300125,
  "ts_ns": 1705318300125483291
}

DELETE example

For deletes on tables with a primary key, the before field contains the primary key values of the deleted row:

{
  "type": "full",
  "op": "d",
  "before": {
    "order_id": 1001,
    "item_id": 42
  },
  "after": null,
  "source": {
    "version": "1.0",
    "ts_ms": 1705318400000,
    "ts_ns": 1705318400000000000,
    "txId": "xyzabc123def456ghi789jklmno",
    "schema": "public",
    "table": "order_items",
    "db": "postgres",
    "cluster": "kmabugltfmjdaj2siqr2qbxgju"
  },
  "ts_ms": 1705318400125,
  "ts_ns": 1705318400125483291
}
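Taken together, the operation types can drive a simple apply loop. The following Python sketch maintains an in-memory copy of the table and, per the preview guidance above, treats any unrecognized op as an upsert of the after image, so the consumer keeps working when op "u" appears; apply_change and pk_columns are illustrative names:

```python
def apply_change(record, table, pk_columns):
    """Apply one CDC record to an in-memory dict keyed by primary key.

    Any op other than "d" (including "c", "u", and future values) is
    treated as an upsert of the full "after" image.
    """
    if record["op"] == "d":
        key = tuple(record["before"][c] for c in pk_columns)
        table.pop(key, None)
    else:
        after = record["after"]
        key = tuple(after[c] for c in pk_columns)
        table[key] = after

state = {}
pk = ["order_id", "item_id"]
apply_change({"op": "c", "before": None,
              "after": {"order_id": 1001, "item_id": 42, "quantity": 5}}, state, pk)
apply_change({"op": "u", "before": None,
              "after": {"order_id": 1001, "item_id": 42, "quantity": 10}}, state, pk)
apply_change({"op": "d", "before": {"order_id": 1001, "item_id": 42},
              "after": None}, state, pk)
print(state)  # {}
```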

Payload fields

Field Description
type The record type. full for a complete record that includes inline before and after values. chunked for a main record that references fragment records for one or both images. fragment for an individual piece of a chunked image. For details, see Handling oversized records.
op Operation type. c = create (insert), u = update, d = delete. Currently Aurora DSQL emits c for both inserts and updates. A subsequent release will emit u for updates, and c for inserts. Design your app to handle all three values.
before For deletes on tables with a primary key, contains the primary key values of the deleted row. Aurora DSQL sets this field to null for inserts and updates, and for deletes on tables without a primary key.
after The full row state after the change, including all columns. Aurora DSQL sets this field to null for deletes.
chunked Present only when type is chunked. Contains reassembly metadata for the before image, the after image, or both. Aurora DSQL omits the chunked image from the top-level before or after field and places it under chunked instead. For details, see Handling oversized records.
source.version The CDC source metadata format version. The current version is 1.0.
source.ts_ms The transaction commit timestamp in milliseconds since the Unix epoch, Coordinated Universal Time (UTC).
source.ts_ns Transaction commit timestamp in nanoseconds, UTC. The highest precision timestamp available. Use this field to establish a total order of transactions.
source.txId A unique transaction identifier, encoded as base32. All records from the same transaction share the same txId value. Use this field to group records that belong to the same transaction.
source.schema The PostgreSQL schema name (for example, public).
source.table The table name.
source.db The database name. Always postgres for Aurora DSQL.
source.cluster The Aurora DSQL cluster identifier.
ts_ms The time at which the CDC system processed the record, in milliseconds, UTC. The difference between ts_ms and source.ts_ms is a measure of replication lag.
ts_ns The time at which the CDC system processed the record, in nanoseconds, UTC.
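Because ts_ms and source.ts_ms share the same epoch and unit, replication lag falls out of a single subtraction. A minimal sketch; replication_lag_ms is an illustrative name:

```python
def replication_lag_ms(record):
    # Processing time minus commit time, both in milliseconds since the Unix epoch.
    return record["ts_ms"] - record["source"]["ts_ms"]

record = {"ts_ms": 1705318200125, "source": {"ts_ms": 1705318200000}}
print(replication_lag_ms(record))  # 125
```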

Format details

The following details describe how Aurora DSQL CDC formats records. Design your app to handle these behaviors.

  • Full after-image for inserts and updates. Aurora DSQL includes the complete row state in the after field for all writes. The before field is null for inserts and updates. Currently both inserts and updates use op: "c", but a subsequent release will emit op: "u" for updates. Design your app to use source.ts_ns per primary key for ordering rather than relying on the op field to distinguish between inserts and updates.

  • Post-change row state only. CDC records include the full row state after each change. The state of the row before an update isn't included. For deletes on tables with a primary key, the before field contains the primary key values.

  • Numeric types serialized as strings. Aurora DSQL serializes numeric and decimal values as JSON strings to preserve exact precision.

  • Binary data encoded as Base64. Aurora DSQL encodes bytea values as Base64 strings.

  • Special floating-point and numeric values. Aurora DSQL serializes NaN and ±Infinity as the strings "NaN", "Infinity", and "-Infinity". This applies to real, double precision, and numeric types.

  • JSON columns serialized as JSON strings. Aurora DSQL serializes json column values as JSON strings that contain the raw JSON text stored in the column. Parse the string value in your app (for example, with JSON.parse in JavaScript or json.loads in Python) to access the underlying JSON value.

  • Overflow values emitted as null. If a value can't be represented in the target JSON type during serialization, Aurora DSQL emits JSON null for that column. This applies to interval values whose total microseconds exceed the 64-bit signed integer range (±9,223,372,036,854,775,807 microseconds, approximately ±292,271 years). Design your app to handle unexpected null values in columns that aren't nullable in the database schema.

  • Oversized records split into chunks. If a record exceeds the Amazon Kinesis record size limit, Aurora DSQL splits the affected before or after image into fragments and delivers them as separate Kinesis records so you still receive the change. Design your app to reassemble the images. For details, see Handling oversized records.
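Several of the behaviors above affect how a consumer restores native values from the serialized forms. The following Python sketch shows one way to decode them; the helper names are illustrative:

```python
import base64
import json
from decimal import Decimal

def decode_numeric(value):
    # numeric/decimal arrive as JSON strings to preserve exact precision.
    # "NaN", "Infinity", and "-Infinity" are also valid Decimal literals.
    return None if value is None else Decimal(value)

def decode_bytea(value):
    # bytea arrives as a Base64-encoded string.
    return None if value is None else base64.b64decode(value)

def decode_json_column(value):
    # json columns arrive as strings containing the raw JSON text.
    return None if value is None else json.loads(value)

print(decode_bytea("SGVsbG8gV29ybGQh"))          # b'Hello World!'
print(decode_numeric("29.99") + Decimal("0.01"))  # 30.00
```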

Handling oversized records

When a CDC record's serialized JSON exceeds 9 MiB, Aurora DSQL splits the before and/or after images, delivering multiple Kinesis records. Each record contains a top-level type field that indicates its structure: full for a complete record, chunked for a main record that references fragments, and fragment for an individual piece of a chunked image. The op, source, ts_ms, and ts_ns fields on a chunked main record behave the same as on a full record. Records that fit in a single Kinesis record have type set to full and don't require any extra handling.

A chunk_id is stable across retries. If Aurora DSQL redelivers a fragment, it carries the same chunk_id as the original delivery, so your app can continue buffering under the same identifier without handling partial sets from previous attempts.

Main record

A chunked main record replaces the top-level before or after field for the split image with a chunked object that describes how to reassemble it. Each entry under chunked has a chunk_id (the identifier that links fragments to this record), total_fragments (the number of fragments that make up that image), and crc32c (a CRC32C checksum, as a decimal string, over the reassembled image text). If one image is inline and the other is chunked, the inline image still appears at the top level as either a value or null.

{
  "type": "chunked",
  "op": "c",
  "before": null,
  "after": null,
  "source": {
    "version": "1.0",
    "ts_ms": 1705318200000,
    "ts_ns": 1705318200000000000,
    "txId": "ffthunp5stx6ffs2vyfqoatmfu",
    "schema": "public",
    "table": "order_items",
    "db": "postgres",
    "cluster": "cluster-id"
  },
  "chunked": {
    "after": {
      "chunk_id": "chunk-id",
      "total_fragments": 3,
      "crc32c": "2073618257"
    }
  },
  "ts_ms": 1705318200125,
  "ts_ns": 1705318200125483291
}

Fragment record

Each fragment is its own Kinesis record with type set to fragment and three fields: chunk_id matches the value in the corresponding chunked.before.chunk_id or chunked.after.chunk_id on the main record, index is the zero-based position of the fragment within the image, and data is a segment of the image's JSON text split on UTF-8 character boundaries (each fragment's data value is a valid UTF-8 string on its own). Because Aurora DSQL CDC uses UNORDERED mode and randomized partition keys, fragments and the main record can arrive on different shards and in any order. To read all fragments, consume all shards on the Kinesis data stream. For more information about delivery ordering, see Ordering.

{
  "type": "fragment",
  "chunk_id": "chunk-id",
  "index": 0,
  "data": "partial-JSON-text"
}

To reassemble an oversized image, buffer each record with type fragment by its chunk_id. When you receive a main record with type chunked, wait until you have total_fragments fragments for each chunk_id referenced under chunked.before or chunked.after, sort the fragments by index ascending, and concatenate the data strings. The concatenated result is the original before or after object as JSON text—parse it to access the column values. To verify delivery integrity, compute CRC32C over the concatenated string and compare the result to chunked.before.crc32c or chunked.after.crc32c.
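The reassembly steps above can be sketched in Python. CRC32C (the Castagnoli polynomial) isn't in the Python standard library, so this sketch includes a slow bitwise implementation; a production consumer would likely use a C-backed package instead (an assumption, not a requirement of the format). The reassemble helper is an illustrative name:

```python
import json

def crc32c(data: bytes) -> int:
    # Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
    # Dependency-free but slow; shown for completeness.
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def reassemble(main_record, fragments):
    """Fill the chunked images of a main record back in from buffered fragments.

    fragments maps chunk_id -> list of fragment records, in any arrival order.
    """
    for image_name, meta in main_record.get("chunked", {}).items():
        parts = sorted(fragments[meta["chunk_id"]], key=lambda f: f["index"])
        if len(parts) != meta["total_fragments"]:
            raise ValueError("fragments still missing; keep buffering")
        text = "".join(f["data"] for f in parts)
        if crc32c(text.encode("utf-8")) != int(meta["crc32c"]):
            raise ValueError("CRC32C mismatch; discard and await redelivery")
        main_record[image_name] = json.loads(text)
    return main_record
```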

Data type serialization

The following tables describe how Aurora DSQL serializes each PostgreSQL data type in CDC records.

Integer types

PostgreSQL type JSON representation Example
smallint (int2) JSON number 42
integer (int4) JSON number 1001
bigint (int8) JSON number 9223372036854775807
oid JSON number (unsigned) 16384

Values of bigint beyond ±2^53 might lose precision in JavaScript environments. Use BigInt or arbitrary-precision libraries in those cases.

Floating-point types

PostgreSQL type JSON representation Example Notes
real (float4) JSON number 3.14159 NaN and ±Infinity are serialized as the strings "NaN", "Infinity", "-Infinity".
double precision (float8) JSON number 3.141592653589793 Same special value handling as real.
numeric / decimal JSON string "123.45" Always a string to preserve exact precision. NaN and ±Infinity are serialized as the strings "NaN", "Infinity", "-Infinity".

Boolean

PostgreSQL type JSON representation Example
boolean JSON boolean true or false

Character types

PostgreSQL type JSON representation Example
varchar / text JSON string "Hello, world!"
bpchar (char(n)) JSON string "ABC" (trailing spaces stripped)
name JSON string "pg_class"
"char" (single-byte) JSON string "A"

Binary

PostgreSQL type JSON representation Example
bytea JSON string (Base64) "SGVsbG8gV29ybGQh"

Date and time types

PostgreSQL type JSON representation Example Notes
date JSON number (days since Unix epoch) 19797 +infinity and -infinity are represented as sentinel day counts derived from epoch-offset arithmetic. These values don't correspond to meaningful calendar dates.
time JSON number (microseconds since midnight) 52200123456
timetz JSON number (microseconds since midnight, UTC) 52200123456 The local time is adjusted to UTC by applying the stored timezone offset (seconds west of UTC). The result is wrapped to the range [0, 86400000000) microseconds.
timestamp JSON number (microseconds since Unix epoch) 1710510600123456 ±Infinity maps to sentinel values: 9223372036825200000 for +infinity and -9223372036832400000 for -infinity.
timestamptz JSON number (microseconds since Unix epoch) 1710510600123456 Stored and emitted in UTC. Same ±infinity sentinel values as timestamp.
interval JSON number (approximate total microseconds) 2802603000000 Months are approximated as 30.4375 days (2,629,800 seconds). The total is computed as (months × 2,629,800 + days × 86,400) × 1,000,000 + microseconds. If the result exceeds the 64-bit signed integer range (±9,223,372,036,854,775,807 microseconds, approximately ±292,271 years), Aurora DSQL emits JSON null for the column.
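The interval formula can be checked directly: under the documented approximation, an interval of 1 month, 2 days, and 3 seconds produces the example value in the table. The interval_micros function below is an illustrative model of the formula, not an Aurora DSQL API:

```python
MICROS_PER_SECOND = 1_000_000
SECONDS_PER_MONTH = 2_629_800   # 30.4375 days, per the documented approximation
SECONDS_PER_DAY = 86_400
INT64_MAX = 9_223_372_036_854_775_807

def interval_micros(months, days, micros):
    # (months * 2,629,800 + days * 86,400) * 1,000,000 + microseconds;
    # None models the JSON null emitted on signed 64-bit overflow.
    total = ((months * SECONDS_PER_MONTH + days * SECONDS_PER_DAY)
             * MICROS_PER_SECOND + micros)
    return None if abs(total) > INT64_MAX else total

print(interval_micros(1, 2, 3 * MICROS_PER_SECOND))  # 2802603000000
```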

Other types

PostgreSQL type JSON representation Example
uuid JSON string (standard 8-4-4-4-12 hex format) "550e8400-e29b-41d4-a716-446655440000"
oidvector JSON empty array []
json JSON string containing the raw JSON text "{\"key\": \"value\"}"

NULL values

For any data type, NULL column values are represented as JSON null.

Schema evolution in CDC records

When you modify a table's schema—for example, by adding, dropping, or renaming a column—CDC records reflect the change starting from the transaction that committed the DDL change. Records from transactions committed before the DDL change use the previous schema. For example:

  • If you add a column, records from earlier transactions don't include the new column. Records from the adding transaction onward include the new column.

  • If you drop a column, records from the dropping transaction onward no longer include that column.

  • If you rename a column, records from the renaming transaction onward use the new column name.

Track schema changes in your downstream consumer by inspecting the column names present in each record's after and before fields. The source.version field in each record identifies the CDC envelope format.
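One way to do that tracking is sketched below; detect_schema_change and known_columns are illustrative names, not part of any AWS SDK. Deletes are skipped because their before image holds only the primary key, which would otherwise look like dropped columns:

```python
def detect_schema_change(record, known_columns):
    """Return (added, dropped) column names relative to the last full row seen.

    Only inserts and updates carry the full row, so deletes (whose before
    image holds just the primary key) are skipped.
    """
    if record["op"] == "d" or record["after"] is None:
        return set(), set()
    seen = set(record["after"])
    key = (record["source"]["schema"], record["source"]["table"])
    previous = known_columns.setdefault(key, seen)
    known_columns[key] = seen
    return seen - previous, previous - seen
```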