Working with rows in Amazon Keyspaces
This section provides details about working with rows in Amazon Keyspaces (for Apache Cassandra). Tables are the primary data structures in Amazon Keyspaces, and data in tables is organized into columns and rows.
Calculating row size in Amazon Keyspaces
Amazon Keyspaces (for Apache Cassandra) provides fully managed storage that offers single-digit millisecond read and write performance and stores data durably across multiple AWS Availability Zones. Amazon Keyspaces attaches metadata to all rows and primary key columns to support efficient data access and high availability.
This section provides details about how to estimate the encoded size of rows in Amazon Keyspaces. The encoded row size is used when calculating your bill and quota use. You should also use the encoded row size when calculating provisioned throughput capacity requirements for tables. To calculate the encoded size of rows in Amazon Keyspaces, you can use the following guidelines.
Partition keys can contain up to 2048 bytes of data. Each key column in the partition key requires up to 3 bytes of metadata. These metadata bytes count towards your 1 MB row size quota. When calculating the size of your row, you should assume each partition key column uses the full 3 bytes of metadata.
Each row can have up to 850 bytes of clustering column data and each clustering column requires up to 4 bytes for metadata. These metadata bytes count towards your 1 MB row size quota. When calculating the size of your row, you should assume each clustering column uses the full 4 bytes of metadata.
For regular, nonstatic, nonprimary key columns, use the raw size of the cell data based on the data type. For more information about data types, see Data types.
Static column data does not count towards the maximum row size of 1 MB. To calculate the data size of static columns, see Calculating static column size per logical partition in Amazon Keyspaces.
Client-side timestamps are stored for every column in each row when the feature is turned on. These timestamps take up approximately 20–40 bytes (depending on your data) and contribute to the storage and throughput costs for the row. Client-side timestamps count towards the maximum row size of 1 MB. For more information, see How client-side timestamps work in Amazon Keyspaces.
Add 100 bytes to the size of each row for row metadata.
The total size of an encoded row of data is based on the following formula:
partition key columns + clustering columns + regular columns + row metadata = total encoded size of row
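To make the formula concrete, the following Python sketch estimates the encoded row size based on the guidelines above. It is a minimal illustration, not part of any Amazon Keyspaces API; the function name and parameters are hypothetical, and it assumes the worst-case metadata sizes (the full 3 bytes per partition key column, the full 4 bytes per clustering column, and 40 bytes per column when client-side timestamps are turned on).

def estimate_row_size(partition_key_sizes, clustering_column_sizes,
                      regular_column_sizes, client_side_timestamps=False):
    """Estimate the encoded row size in bytes.

    Each argument is a list of raw data sizes in bytes, one entry per
    column, based on the column's data type (for example, 4 for int).
    """
    PARTITION_KEY_METADATA = 3     # assume the full 3 bytes per partition key column
    CLUSTERING_COLUMN_METADATA = 4 # assume the full 4 bytes per clustering column
    ROW_METADATA = 100             # fixed metadata added for each row
    TIMESTAMP_BYTES = 40           # upper end of the 20-40 byte range per column

    total = sum(size + PARTITION_KEY_METADATA for size in partition_key_sizes)
    total += sum(size + CLUSTERING_COLUMN_METADATA for size in clustering_column_sizes)
    total += sum(regular_column_sizes)
    if client_side_timestamps:
        # Timestamps are stored for every column in the row when the feature is on.
        column_count = (len(partition_key_sizes) + len(clustering_column_sizes)
                        + len(regular_column_sizes))
        total += column_count * TIMESTAMP_BYTES
    return total + ROW_METADATA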
Consider the following example of a table where all columns are of type integer. The table has two partition key columns, two clustering columns, and one regular column.
CREATE TABLE mykeyspace.mytable (pk_col1 int, pk_col2 int, ck_col1 int, ck_col2 int, reg_col1 int, PRIMARY KEY ((pk_col1, pk_col2), ck_col1, ck_col2));
In this example, we calculate the size of data when we write a row to the table as shown in the following statement:
INSERT INTO mykeyspace.mytable (pk_col1, pk_col2, ck_col1, ck_col2, reg_col1) VALUES (1, 2, 3, 4, 5);
To estimate the total bytes required by this write operation, you can use the following steps.
Calculate the size of a partition key column by adding the bytes for the data type stored in the column and the metadata bytes. Repeat this for all partition key columns.
Calculate the size of the first column of the partition key (pk_col1):
4 bytes for the integer data type + 3 bytes for partition key metadata = 7 bytes
Calculate the size of the second column of the partition key (pk_col2):
4 bytes for the integer data type + 3 bytes for partition key metadata = 7 bytes
Add both columns to get the total estimated size of the partition key columns:
7 bytes + 7 bytes = 14 bytes for the partition key columns
Calculate the size of a clustering column by adding the bytes for the data type stored in the column and the metadata bytes. Repeat this for all clustering columns.
Calculate the size of the first clustering column (ck_col1):
4 bytes for the integer data type + 4 bytes for clustering column metadata = 8 bytes
Calculate the size of the second clustering column (ck_col2):
4 bytes for the integer data type + 4 bytes for clustering column metadata = 8 bytes
Add both columns to get the total estimated size of the clustering columns:
8 bytes + 8 bytes = 16 bytes for the clustering columns
Add the size of the regular columns. In this example, there is only one regular column, which stores an integer and requires 4 bytes.
Finally, to get the total encoded row size, add up the bytes for all columns and add the additional 100 bytes for row metadata:
14 bytes for the partition key columns + 16 bytes for clustering columns + 4 bytes for the regular column + 100 bytes for row metadata = 134 bytes.
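As a quick check, the same arithmetic in plain Python reproduces the 134-byte total, assuming 4 bytes per int column and the worst-case metadata sizes described above:

partition_key_bytes = (4 + 3) + (4 + 3)  # pk_col1, pk_col2 -> 14 bytes
clustering_bytes = (4 + 4) + (4 + 4)     # ck_col1, ck_col2 -> 16 bytes
regular_bytes = 4                        # reg_col1
row_metadata = 100

print(partition_key_bytes + clustering_bytes + regular_bytes + row_metadata)  # 134

The estimate_row_size sketch shown earlier returns the same value for this table: estimate_row_size([4, 4], [4, 4], [4]).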
To learn how to monitor serverless resources with Amazon CloudWatch, see Monitoring Amazon Keyspaces with Amazon CloudWatch.