AWS Encryption SDK message format reference

The information on this page is a reference for building your own encryption library that is compatible with the AWS Encryption SDK. If you are not building your own compatible encryption library, you likely do not need this information.

To use the AWS Encryption SDK in one of the supported programming languages, see Programming languages.

For the specification that defines the elements of a proper AWS Encryption SDK implementation, see the AWS Encryption SDK Specification in GitHub.

The encryption operations in the AWS Encryption SDK return a single data structure or encrypted message that contains the encrypted data (ciphertext) and all encrypted data keys. To understand this data structure, or to build libraries that read and write it, you need to understand the message format.

The message format consists of at least two parts: a header and a body. In some cases, the message format consists of a third part, a footer. The message format defines an ordered sequence of bytes in network byte order, also called big-endian format. The message format begins with the header, followed by the body, followed by the footer (when there is one).

The algorithm suites supported by the AWS Encryption SDK use one of two message format versions. Algorithm suites without key commitment use message format version 1. Algorithm suites with key commitment use message format version 2.

Header structure

The message header contains the encrypted data key and information about how the message body is formed. The following table describes the fields that form the header in message format versions 1 and 2. The bytes are appended in the order shown.

The Not present value indicates that the field doesn't exist in that version of the message format. Bold text indicates values that are different in each version.

Header Structure

Field | Message format version 1: Length (bytes) | Message format version 2: Length (bytes)
Version | 1 | 1
Type | 1 | Not present
Algorithm ID | 2 | 2
Message ID | 16 | 32
AAD Length | 2. When the encryption context is empty, the value of the 2-byte AAD Length field is 0. | 2. When the encryption context is empty, the value of the 2-byte AAD Length field is 0.
AAD | Variable. The length of this field appears in the previous 2 bytes (AAD Length field). When the encryption context is empty, there is no AAD field in the header. | Variable. The length of this field appears in the previous 2 bytes (AAD Length field). When the encryption context is empty, there is no AAD field in the header.
Encrypted Data Key Count | 2 | 2
Encrypted Data Key(s) | Variable. Determined by the number of encrypted data keys and the length of each. | Variable. Determined by the number of encrypted data keys and the length of each.
Content Type | 1 | 1
Reserved | 4 | Not present
IV Length | 1 | Not present
Frame Length | 4 | 4
Algorithm Suite Data | Not present | Variable. Determined by the algorithm that generated the message.
Header Authentication | Variable. Determined by the algorithm that generated the message. | Variable. Determined by the algorithm that generated the message.

Version

The version of this message format. The version is either 1 or 2, encoded as the byte 01 or 02 in hexadecimal notation.

Type

The type of this message format. The type indicates the kind of structure. The only supported type is described as customer authenticated encrypted data. Its type value is 128, encoded as byte 80 in hexadecimal notation.

This field is not present in message format version 2.

Algorithm ID

An identifier for the algorithm used. It is a 2-byte value interpreted as a 16-bit unsigned integer. For more information about the algorithms, see AWS Encryption SDK algorithms reference.

Message ID

A randomly generated value that identifies the message. The Message ID:

  • Uniquely identifies the encrypted message.

  • Weakly binds the message header to the message body.

  • Provides a mechanism to securely reuse a data key with multiple encrypted messages.

  • Protects against accidental reuse of a data key or the wearing out of keys in the AWS Encryption SDK.

This value is 128 bits in message format version 1 and 256 bits in version 2.
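The fixed-length fields at the start of the header can be read with a few lines of code. The following is a minimal sketch, not the SDK's implementation: the names `HeaderPrefix` and `parse_header_prefix` are hypothetical, and the function only covers the Version, Type, Algorithm ID, and Message ID fields described above.

```python
import struct
from dataclasses import dataclass
from typing import Optional, Tuple

TYPE_CUSTOMER_AED = 0x80  # the only supported Type value (version 1 only)


@dataclass
class HeaderPrefix:
    """Hypothetical container for the leading header fields."""
    version: int
    message_type: Optional[int]  # None in message format version 2
    algorithm_id: int
    message_id: bytes


def parse_header_prefix(buf: bytes) -> Tuple[HeaderPrefix, int]:
    """Parse Version, Type (v1 only), Algorithm ID, and Message ID.

    Returns the parsed fields and the offset of the next field (AAD Length).
    """
    if not buf:
        raise ValueError("empty message")
    version = buf[0]
    if version == 1:
        id_len, needed = 16, 1 + 1 + 2 + 16
    elif version == 2:
        id_len, needed = 32, 1 + 2 + 32
    else:
        raise ValueError(f"unsupported message format version: {version}")
    if len(buf) < needed:
        raise ValueError("truncated header")

    offset = 1
    message_type = None
    if version == 1:
        message_type = buf[offset]
        if message_type != TYPE_CUSTOMER_AED:
            raise ValueError(f"unsupported message type: {message_type:#04x}")
        offset += 1

    # All multi-byte integers are big-endian (network byte order).
    (algorithm_id,) = struct.unpack_from(">H", buf, offset)
    offset += 2
    message_id = bytes(buf[offset:offset + id_len])
    offset += id_len
    return HeaderPrefix(version, message_type, algorithm_id, message_id), offset
```

The returned offset points at the AAD Length field, so a caller can continue parsing from there.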

AAD Length

The length of the additional authenticated data (AAD). It is a 2-byte value interpreted as a 16-bit unsigned integer that specifies the number of bytes that contain the AAD.

When the encryption context is empty, the value of the AAD Length field is 0.

AAD

The additional authenticated data. The AAD is an encoding of the encryption context, an array of key-value pairs where each key and value is a string of UTF-8 encoded characters. The encryption context is converted to a sequence of bytes and used for the AAD value. When the encryption context is empty, there is no AAD field in the header.

When an algorithm suite with signing is used, the encryption context must contain the key-value pair {'aws-crypto-public-key', Qtxt}. Qtxt represents the elliptic curve point Q compressed according to SEC 1 version 2.0 and then base64-encoded. The encryption context can contain additional values, but the maximum length of the constructed AAD is 2^16 - 1 bytes.

The following table describes the fields that form the AAD. Key-value pairs are sorted, by key, in ascending order according to UTF-8 character code. The bytes are appended in the order shown.

AAD Structure

Field | Length (bytes)
Key-Value Pair Count | 2
Key Length | 2
Key | Variable. Equal to the value specified in the previous 2 bytes (Key Length).
Value Length | 2
Value | Variable. Equal to the value specified in the previous 2 bytes (Value Length).

Key-Value Pair Count

The number of key-value pairs in the AAD. It is a 2-byte value interpreted as a 16-bit unsigned integer, so the maximum number of key-value pairs in the AAD is 2^16 - 1.

When there is no encryption context or the encryption context is empty, this field is not present in the AAD structure.

Key Length

The length of the key for the key-value pair. It is a 2-byte value interpreted as a 16-bit unsigned integer that specifies the number of bytes that contain the key.

Key

The key for the key-value pair. It is a sequence of UTF-8 encoded bytes.

Value Length

The length of the value for the key-value pair. It is a 2-byte value interpreted as a 16-bit unsigned integer that specifies the number of bytes that contain the value.

Value

The value for the key-value pair. It is a sequence of UTF-8 encoded bytes.
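As an illustration of the AAD Length and AAD fields, the following sketch serializes an encryption context. The function name `serialize_aad` is hypothetical, and the sketch assumes that "ascending order according to UTF-8 character code" can be implemented by comparing the UTF-8 encoded key bytes.

```python
import struct
from typing import Dict


def serialize_aad(encryption_context: Dict[str, str]) -> bytes:
    """Return the 2-byte AAD Length field followed by the AAD field (if any)."""
    if not encryption_context:
        # Empty encryption context: AAD Length is 0 and there is no AAD field.
        return struct.pack(">H", 0)

    # Encode keys and values as UTF-8, then sort the pairs by key bytes.
    pairs = sorted(
        (key.encode("utf-8"), value.encode("utf-8"))
        for key, value in encryption_context.items()
    )
    if len(pairs) > 0xFFFF:
        raise ValueError("too many key-value pairs")

    aad = bytearray(struct.pack(">H", len(pairs)))  # Key-Value Pair Count
    for key, value in pairs:
        if len(key) > 0xFFFF or len(value) > 0xFFFF:
            raise ValueError("key or value too long")
        aad += struct.pack(">H", len(key)) + key      # Key Length, Key
        aad += struct.pack(">H", len(value)) + value  # Value Length, Value

    if len(aad) > 0xFFFF:
        raise ValueError("AAD exceeds 2^16 - 1 bytes")
    return struct.pack(">H", len(aad)) + bytes(aad)
```

For example, `serialize_aad({})` yields only the two zero bytes of the AAD Length field, which matches the rule that an empty encryption context produces no AAD field.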

Encrypted Data Key Count

The number of encrypted data keys. It is a 2-byte value interpreted as a 16-bit unsigned integer, so the maximum number of encrypted data keys in each message is 65,535 (2^16 - 1).

Encrypted Data Key(s)

A sequence of encrypted data keys. The length of the sequence is determined by the number of encrypted data keys and the length of each. The sequence contains at least one encrypted data key.

The following table describes the fields that form each encrypted data key. The bytes are appended in the order shown.

Encrypted Data Key Structure

Field | Length (bytes)
Key Provider ID Length | 2
Key Provider ID | Variable. Equal to the value specified in the previous 2 bytes (Key Provider ID Length).
Key Provider Information Length | 2
Key Provider Information | Variable. Equal to the value specified in the previous 2 bytes (Key Provider Information Length).
Encrypted Data Key Length | 2
Encrypted Data Key | Variable. Equal to the value specified in the previous 2 bytes (Encrypted Data Key Length).

Key Provider ID Length

The length of the key provider identifier. It is a 2-byte value interpreted as a 16-bit unsigned integer that specifies the number of bytes that contain the key provider ID.

Key Provider ID

The key provider identifier. It indicates the provider of the encrypted data key and is intended to be extensible.

Key Provider Information Length

The length of the key provider information. It is a 2-byte value interpreted as a 16-bit unsigned integer that specifies the number of bytes that contain the key provider information.

Key Provider Information

The key provider information. It is determined by the key provider.

When AWS KMS is the master key provider or you are using an AWS KMS keyring, this value contains the Amazon Resource Name (ARN) of the AWS KMS key.

Encrypted Data Key Length

The length of the encrypted data key. It is a 2-byte value interpreted as a 16-bit unsigned integer that specifies the number of bytes that contain the encrypted data key.

Encrypted Data Key

The encrypted data key. It is the data encryption key encrypted by the key provider.
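Each encrypted data key is three length-prefixed byte strings, so the whole sequence can be read with one small helper. The following is a sketch under that reading of the table; `EncryptedDataKey`, `read_length_prefixed`, and `parse_encrypted_data_keys` are hypothetical names, not SDK APIs.

```python
import struct
from typing import List, NamedTuple, Tuple


class EncryptedDataKey(NamedTuple):
    provider_id: bytes
    provider_info: bytes
    ciphertext: bytes  # the data key as encrypted by the key provider


def read_length_prefixed(buf: bytes, offset: int) -> Tuple[bytes, int]:
    """Read a 2-byte big-endian length, then that many bytes."""
    (length,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    end = start + length
    if end > len(buf):
        raise ValueError("truncated field")
    return bytes(buf[start:end]), end


def parse_encrypted_data_keys(
    buf: bytes, offset: int = 0
) -> Tuple[List[EncryptedDataKey], int]:
    """Parse Encrypted Data Key Count and the keys that follow it."""
    (count,) = struct.unpack_from(">H", buf, offset)
    if count == 0:
        raise ValueError("a message contains at least one encrypted data key")
    offset += 2
    keys = []
    for _ in range(count):
        provider_id, offset = read_length_prefixed(buf, offset)
        provider_info, offset = read_length_prefixed(buf, offset)
        ciphertext, offset = read_length_prefixed(buf, offset)
        keys.append(EncryptedDataKey(provider_id, provider_info, ciphertext))
    return keys, offset
```

The returned offset points at the Content Type field that follows the last encrypted data key.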

Content Type

The type of encrypted data, either nonframed or framed.

Note

Whenever possible, use framed data. The AWS Encryption SDK supports nonframed data only for legacy use. Some language implementations of the AWS Encryption SDK can still generate nonframed ciphertext. All supported language implementations can decrypt framed and nonframed ciphertext.

Framed data is divided into equal-length parts; each part is encrypted separately. Framed content is type 2, encoded as the byte 02 in hexadecimal notation.

Nonframed data is not divided; it is a single encrypted blob. Nonframed content is type 1, encoded as the byte 01 in hexadecimal notation.

Reserved

A reserved sequence of 4 bytes. This value must be 0. It is encoded as the bytes 00 00 00 00 in hexadecimal notation (that is, a 4-byte sequence of a 32-bit integer value equal to 0).

This field is not present in message format version 2.

IV Length

The length of the initialization vector (IV). It is a 1-byte value interpreted as an 8-bit unsigned integer that specifies the number of bytes that contain the IV. This value is determined by the IV bytes value of the algorithm that generated the message.

This field is not present in message format version 2, which only supports algorithm suites that use deterministic IV values in the message header.

Frame Length

The length of each frame of framed data. It is a 4-byte value interpreted as a 32-bit unsigned integer that specifies the number of bytes in each frame. When the data is nonframed, that is, when the value of the Content Type field is 1, this value must be 0.

Algorithm Suite Data

Supplementary data needed by the algorithm that generated the message. The length and contents are determined by the algorithm. Its length might be 0.

This field is not present in message format version 1.
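The fields from Content Type through Frame Length differ between versions only in whether Reserved and IV Length are present. The following sketch parses them for either version and enforces the constraints stated above. The names `BodyLayout` and `parse_body_layout` are hypothetical, and the sketch stops before Algorithm Suite Data because its length depends on the algorithm suite.

```python
import struct
from typing import NamedTuple, Optional, Tuple

NONFRAMED = 0x01
FRAMED = 0x02


class BodyLayout(NamedTuple):
    content_type: int
    iv_length: Optional[int]  # None in message format version 2
    frame_length: int


def parse_body_layout(buf: bytes, offset: int, version: int) -> Tuple[BodyLayout, int]:
    """Parse Content Type, Reserved and IV Length (v1 only), and Frame Length."""
    needed = 10 if version == 1 else 5
    if len(buf) - offset < needed:
        raise ValueError("truncated header")

    content_type = buf[offset]
    offset += 1
    if content_type not in (NONFRAMED, FRAMED):
        raise ValueError(f"unknown content type: {content_type}")

    iv_length = None
    if version == 1:
        if buf[offset:offset + 4] != b"\x00\x00\x00\x00":
            raise ValueError("Reserved field must be 0")
        offset += 4
        iv_length = buf[offset]
        offset += 1

    (frame_length,) = struct.unpack_from(">I", buf, offset)
    offset += 4
    if content_type == NONFRAMED and frame_length != 0:
        raise ValueError("Frame Length must be 0 for nonframed data")
    return BodyLayout(content_type, iv_length, frame_length), offset
```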

Header Authentication

The header authentication is determined by the algorithm that generated the message. The header authentication is calculated over the entire header. It consists of an IV and an authentication tag. The bytes are appended in the order shown.

Header Authentication Structure

Field | Length in message format version 1 (bytes) | Length in message format version 2 (bytes)
IV | Variable. Determined by the IV bytes value of the algorithm that generated the message. | N/A
Authentication Tag | Variable. Determined by the authentication tag bytes value of the algorithm that generated the message. | Variable. Determined by the authentication tag bytes value of the algorithm that generated the message.

IV

The initialization vector (IV) used to calculate the header authentication tag.

This field is not present in the header of message format version 2. Message format version 2 only supports algorithm suites that use deterministic IV values in the message header.

Authentication Tag

The authentication value for the header. It is used to authenticate the entire contents of the header.
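To make the version difference concrete, the following sketch splits the Header Authentication bytes into an IV and an authentication tag. The function name `split_header_auth` is hypothetical. The IV and tag lengths are passed in because they are determined by the algorithm suite and are not stated on this page.

```python
from typing import Optional, Tuple


def split_header_auth(
    auth: bytes, version: int, iv_len: int, tag_len: int
) -> Tuple[Optional[bytes], bytes]:
    """Split Header Authentication into (IV, Authentication Tag).

    iv_len and tag_len come from the algorithm suite (assumed known here).
    In message format version 2 there is no IV field, so the IV is None.
    """
    if version == 1:
        if len(auth) != iv_len + tag_len:
            raise ValueError("unexpected header authentication length")
        return auth[:iv_len], auth[iv_len:]
    if version == 2:
        if len(auth) != tag_len:
            raise ValueError("unexpected header authentication length")
        return None, auth
    raise ValueError(f"unsupported message format version: {version}")
```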

Body structure

The message body contains the encrypted data, called the ciphertext. The structure of the body depends on the content type (nonframed or framed). The following sections describe the format of the message body for each content type. The message body structure is the same in message format versions 1 and 2.

Non-framed data

Nonframed data is encrypted in a single blob with a unique IV and body AAD.

The following table describes the fields that form nonframed data. The bytes are appended in the order shown.

Non-Framed Body Structure

Field | Length, in bytes
IV | Variable. Equal to the value specified in the IV Length byte of the header.
Encrypted Content Length | 8
Encrypted Content | Variable. Equal to the value specified in the previous 8 bytes (Encrypted Content Length).
Authentication Tag | Variable. Determined by the algorithm implementation used.

IV

The initialization vector (IV) to use with the encryption algorithm.

Encrypted Content Length

The length of the encrypted content, or ciphertext. It is an 8-byte value interpreted as a 64-bit unsigned integer that specifies the number of bytes that contain the encrypted content.

Technically, the maximum allowed value is 2^63 - 1, or 8 exbibytes (8 EiB). However, in practice the maximum value is 2^36 - 32, or 64 gibibytes (64 GiB), due to restrictions imposed by the implemented algorithms.

Note

The Java implementation of this SDK further restricts this value to 2^31 - 1, or 2 gibibytes (2 GiB), due to restrictions in the language.

Encrypted Content

The encrypted content (ciphertext) as returned by the encryption algorithm.

Authentication Tag

The authentication value for the body. It is used to authenticate the message body.
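A nonframed body can be read in one pass once the IV and tag lengths are known. The following is a minimal sketch; `parse_nonframed_body` is a hypothetical name, and `iv_len` and `tag_len` are assumed to have been obtained from the header and the algorithm suite.

```python
import struct
from typing import Tuple


def parse_nonframed_body(
    buf: bytes, offset: int, iv_len: int, tag_len: int
) -> Tuple[bytes, bytes, bytes, int]:
    """Return (iv, encrypted_content, authentication_tag, next_offset)."""
    iv = bytes(buf[offset:offset + iv_len])
    offset += iv_len

    # Encrypted Content Length is an 8-byte big-endian unsigned integer.
    (content_len,) = struct.unpack_from(">Q", buf, offset)
    offset += 8

    content = bytes(buf[offset:offset + content_len])
    offset += content_len
    tag = bytes(buf[offset:offset + tag_len])
    offset += tag_len

    if len(iv) != iv_len or len(content) != content_len or len(tag) != tag_len:
        raise ValueError("truncated nonframed body")
    return iv, content, tag, offset
```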

Framed data

In framed data, the plaintext data is divided into equal-length parts called frames. The AWS Encryption SDK encrypts each frame separately with a unique IV and body AAD.

The frame length, which is the length of the encrypted content in the frame, can be different for each message. The maximum number of bytes in a frame is 2^32 - 1. The maximum number of frames in a message is 2^32 - 1.

There are two types of frames: regular and final. Every message must consist of or include a final frame.

All regular frames in a message have the same frame length. The final frame can have a different frame length.

The composition of frames in framed data varies with the length of the encrypted content.

  • Equal to the frame length — When the encrypted content length is the same as the frame length of the regular frames, the message can consist of a regular frame that contains the data, followed by a final frame of zero (0) length. Or, the message can consist only of a final frame that contains the data. In this case, the final frame has the same frame length as the regular frames.

  • Multiple of the frame length — When the encrypted content length is an exact multiple of the frame length of the regular frames, the message can end in a regular frame that contains the data, followed by a final frame of zero (0) length. Or, the message can end in a final frame that contains the data. In this case, the final frame has the same frame length as the regular frames.

  • Not a multiple of the frame length — When the encrypted content length is not an exact multiple of the frame length of the regular frames, the final frame contains the remaining data. The frame length of the final frame is less than the frame length of the regular frames.

  • Less than the frame length — When the encrypted content length is less than the frame length of the regular frames, the message consists of a final frame that contains all of the data. The frame length of the final frame is less than the frame length of the regular frames.
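The cases above reduce to a division with remainder. The following sketch computes how many regular frames a message has and how long its final frame is. The function name `plan_frames` and the `empty_final_frame` flag are hypothetical; the flag selects between the two layouts that are permitted when the content length is an exact multiple of the frame length. Treating zero-length content as a single empty final frame is an assumption, based on the rule that every message includes a final frame.

```python
from typing import Tuple


def plan_frames(
    content_length: int, frame_length: int, empty_final_frame: bool = True
) -> Tuple[int, int]:
    """Return (number_of_regular_frames, final_frame_length)."""
    if frame_length <= 0 or content_length < 0:
        raise ValueError("lengths must be positive")

    full_frames, remainder = divmod(content_length, frame_length)
    if remainder:
        # Not a multiple (or less than one frame): the final frame
        # carries the remaining bytes and is shorter than a regular frame.
        return full_frames, remainder
    if full_frames == 0:
        # Assumption: empty content is a single zero-length final frame.
        return 0, 0
    if empty_final_frame:
        # Exact multiple: all data in regular frames, then an empty final frame.
        return full_frames, 0
    # Exact multiple: the last full-length chunk is written as the final frame.
    return full_frames - 1, frame_length
```

With a frame length of 4096, content of 5000 bytes gives one regular frame and a 904-byte final frame, while content of 100 bytes gives only a 100-byte final frame.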

The following tables describe the fields that form the frames. The bytes are appended in the order shown.

Framed Body Structure, Regular Frame

Field | Length, in bytes
Sequence Number | 4
IV | Variable. Equal to the value specified in the IV Length byte of the header.
Encrypted Content | Variable. Equal to the value specified in the Frame Length of the header.
Authentication Tag | Variable. Determined by the algorithm used, as specified in the Algorithm ID of the header.

Sequence Number

The frame sequence number. It is an incremental counter number for the frame. It is a 4-byte value interpreted as a 32-bit unsigned integer.

Framed data must start at sequence number 1. Subsequent frames must be in order and must contain an increment of 1 of the previous frame. Otherwise, the decryption process stops and reports an error.

IV

The initialization vector (IV) for the frame. The SDK uses a deterministic method to construct a different IV for each frame in the message. Its length is specified by the algorithm suite used.

Encrypted Content

The encrypted content (ciphertext) for the frame, as returned by the encryption algorithm.

Authentication Tag

The authentication value for the frame. It is used to authenticate the entire frame.

Framed Body Structure, Final Frame

Field | Length, in bytes
Sequence Number End | 4
Sequence Number | 4
IV | Variable. Equal to the value specified in the IV Length byte of the header.
Encrypted Content Length | 4
Encrypted Content | Variable. Equal to the value specified in the previous 4 bytes (Encrypted Content Length).
Authentication Tag | Variable. Determined by the algorithm used, as specified in the Algorithm ID of the header.

Sequence Number End

An indicator for the final frame. The value is encoded as the 4 bytes FF FF FF FF in hexadecimal notation.

Sequence Number

The frame sequence number. It is an incremental counter number for the frame. It is a 4-byte value interpreted as a 32-bit unsigned integer.

Framed data must start at sequence number 1. Subsequent frames must be in order and must contain an increment of 1 of the previous frame. Otherwise, the decryption process stops and reports an error.

IV

The initialization vector (IV) for the frame. The SDK uses a deterministic method to construct a different IV for each frame in the message. The length of the IV is specified by the algorithm suite.

Encrypted Content Length

The length of the encrypted content. It is a 4-byte value interpreted as a 32-bit unsigned integer that specifies the number of bytes that contain the encrypted content for the frame.

Encrypted Content

The encrypted content (ciphertext) for the frame, as returned by the encryption algorithm.

Authentication Tag

The authentication value for the frame. It is used to authenticate the entire frame.
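The regular and final frame layouts can be told apart by the first 4 bytes of each frame. The following sketch walks a framed body, checks that sequence numbers start at 1 and increase by 1, and stops at the final frame. The name `parse_frames` is hypothetical, and `frame_length`, `iv_len`, and `tag_len` are assumed to come from the header and the algorithm suite. The sketch only splits the bytes; it does not decrypt or authenticate anything.

```python
import struct
from typing import List, Tuple

SEQUENCE_NUMBER_END = 0xFFFFFFFF  # marks the final frame

Frame = Tuple[int, bytes, bytes, bytes]  # (sequence number, IV, content, tag)


def parse_frames(
    buf: bytes, offset: int, frame_length: int, iv_len: int, tag_len: int
) -> Tuple[List[Frame], int]:
    """Return the list of frames and the offset just past the final frame."""
    frames: List[Frame] = []
    expected_seq = 1
    while True:
        (seq,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        is_final = seq == SEQUENCE_NUMBER_END
        if is_final:
            # The real sequence number follows the end marker.
            (seq,) = struct.unpack_from(">I", buf, offset)
            offset += 4
        if seq != expected_seq:
            raise ValueError(f"expected frame {expected_seq}, found {seq}")

        iv = bytes(buf[offset:offset + iv_len])
        offset += iv_len
        if is_final:
            (content_len,) = struct.unpack_from(">I", buf, offset)
            offset += 4
        else:
            content_len = frame_length
        content = bytes(buf[offset:offset + content_len])
        offset += content_len
        tag = bytes(buf[offset:offset + tag_len])
        offset += tag_len

        if len(iv) != iv_len or len(content) != content_len or len(tag) != tag_len:
            raise ValueError("truncated frame")
        frames.append((seq, iv, content, tag))
        if is_final:
            return frames, offset
        expected_seq += 1
```

When the message has a footer, the returned offset points at its Signature Length field.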

Footer structure

When an algorithm suite with signing is used, the message format contains a footer. The message footer contains a digital signature calculated over the message header and body. The following table describes the fields that form the footer. The bytes are appended in the order shown. The message footer structure is the same in message format versions 1 and 2.

Footer Structure

Field | Length, in bytes
Signature Length | 2
Signature | Variable. Equal to the value specified in the previous 2 bytes (Signature Length).

Signature Length

The length of the signature. It is a 2-byte value interpreted as a 16-bit unsigned integer that specifies the number of bytes that contain the signature.

Signature

The signature.
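Reading the footer is a single length-prefixed read. The following is a minimal sketch with a hypothetical function name, `parse_footer`. It extracts the signature bytes and does not verify them.

```python
import struct
from typing import Tuple


def parse_footer(buf: bytes, offset: int) -> Tuple[bytes, int]:
    """Return (signature, next_offset) for a footer starting at offset."""
    # Signature Length is a 2-byte big-endian unsigned integer.
    (signature_len,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    end = start + signature_len
    if end > len(buf):
        raise ValueError("truncated footer")
    return bytes(buf[start:end]), end
```

For a well-formed signed message, the returned offset equals the total message length.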