Working with tables in Amazon Keyspaces

This section provides details about working with tables in Amazon Keyspaces (for Apache Cassandra).

Creating tables in Amazon Keyspaces

Amazon Keyspaces performs data definition language (DDL) operations, such as creating and deleting tables, asynchronously. You can monitor the creation status of new tables in the AWS Management Console, which indicates when a table is pending or active. You can also monitor the creation status of a new table programmatically by using the system schema table.

A table shows as active in the system schema when it's ready for use. The recommended design pattern to check when a new table is ready for use is to poll the Amazon Keyspaces system schema tables (system_schema_mcs.*). For a list of DDL statements for tables, see the Tables section in the CQL language reference.

The following query shows the status of a table.

SELECT keyspace_name, table_name, status
FROM system_schema_mcs.tables
WHERE keyspace_name = 'mykeyspace' AND table_name = 'mytable';

For a table that is still being created and is pending, the output of the query looks like the following.

 keyspace_name | table_name | status
---------------+------------+----------
    mykeyspace |    mytable | CREATING

For a table that has been successfully created and is active, the output of the query looks like the following.

 keyspace_name | table_name | status
---------------+------------+--------
    mykeyspace |    mytable | ACTIVE
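In application code, the recommended polling pattern is a short loop around the query above. The following is a minimal sketch using the Python cassandra-driver, assuming a session object that is already connected and configured for Amazon Keyspaces; the function name and retry settings are illustrative, not part of any API.

import time

def wait_until_active(session, keyspace_name, table_name,
                      delay_seconds=5, max_attempts=60):
    """Poll system_schema_mcs.tables until the table status is ACTIVE."""
    query = ("SELECT status FROM system_schema_mcs.tables "
             "WHERE keyspace_name = %s AND table_name = %s")
    for _ in range(max_attempts):
        row = session.execute(query, (keyspace_name, table_name)).one()
        if row is not None and row.status == 'ACTIVE':
            return
        time.sleep(delay_seconds)
    raise TimeoutError(f"{keyspace_name}.{table_name} is still not active")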

Working with multi-Region tables in Amazon Keyspaces

A multi-Region table must have its write throughput capacity configured in one of two ways:

  • On-demand capacity mode, measured in write request units (WRUs)

  • Provisioned capacity mode with auto scaling, measured in write capacity units (WCUs)

You can use provisioned capacity mode with auto scaling or on-demand capacity mode to help ensure that a multi-Region table has sufficient capacity to perform replicated writes to all AWS Regions.

Note

Changing the capacity mode of the table in one of the Regions changes the capacity mode for all replicas.

By default, Amazon Keyspaces uses on-demand mode for multi-Region tables. With on-demand mode, you don't need to specify how much read and write throughput you expect your application to perform. Amazon Keyspaces instantly accommodates your workloads as they ramp up or down to any previously reached traffic level. If a workload’s traffic level hits a new peak, Amazon Keyspaces adapts rapidly to accommodate the workload.

If you choose provisioned capacity mode for a table, you have to configure the number of read capacity units (RCUs) and write capacity units (WCUs) per second that your application requires.
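If you manage capacity programmatically rather than in the console, the capacity mode can also be set through the Amazon Keyspaces control plane API. The following is a hedged sketch using the boto3 keyspaces client's update_table operation; verify the parameter names against the current API reference before relying on them.

import boto3

client = boto3.client("keyspaces", region_name="us-east-1")

# Switch the table to provisioned capacity mode with explicit RCUs and WCUs.
client.update_table(
    keyspaceName="mykeyspace",
    tableName="mytable",
    capacitySpecification={
        "throughputMode": "PROVISIONED",
        "readCapacityUnits": 10,
        "writeCapacityUnits": 20,
    },
)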

To plan a multi-Region table's throughput capacity needs, you should first estimate the number of WCUs per second needed for each Region. Then you add the writes from all Regions that your table is replicated in, and use the sum to provision capacity for each Region. This is required because every write that is performed in one Region must also be repeated in each replica Region.

If the table doesn't have enough capacity to handle the writes from all Regions, capacity exceptions will occur. In addition, replication latency between Regions will increase.

For example, if you have a multi-Region table where you expect 5 writes per second in US East (N. Virginia), 10 writes per second in US East (Ohio), and 5 writes per second in Europe (Ireland), you should expect the table to consume 20 WCUs in each Region: US East (N. Virginia), US East (Ohio), and Europe (Ireland). That means that in this example, you need to provision 20 WCUs for each of the table's replicas. You can monitor your table's capacity consumption using Amazon CloudWatch. For more information, see Monitoring Amazon Keyspaces with Amazon CloudWatch.

Because each multi-Region write is billed at 1.25 times the standard WCU rate, you would see a total of 75 WCUs billed in this example (20 WCUs x 3 Regions x 1.25). For more information about pricing, see Amazon Keyspaces (for Apache Cassandra) pricing.
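The arithmetic behind this example can be expressed as a short illustrative calculation in Python (the variable names are our own, and the sketch assumes each write is 1 KB or smaller, so one write per second consumes one WCU):

# Writes per second expected in each Region of the multi-Region table.
writes_per_second = {"us-east-1": 5, "us-east-2": 10, "eu-west-1": 5}

# Every write is replicated to every Region, so each replica must be
# provisioned for the sum of the writes across all Regions.
wcus_per_replica = sum(writes_per_second.values())
print(wcus_per_replica)  # 20

# Replicated writes are billed at 1.25 times the standard WCU rate.
total_billed_wcus = wcus_per_replica * len(writes_per_second) * 1.25
print(total_billed_wcus)  # 75.0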

For more information about provisioned capacity with Amazon Keyspaces auto scaling, see Manage throughput capacity automatically with Amazon Keyspaces auto scaling.

Note

If a table is running in provisioned capacity mode with auto scaling, the provisioned write capacity is allowed to float within the configured auto scaling settings for each Region.

Static columns in Amazon Keyspaces

In an Amazon Keyspaces table with clustering columns, you can use the STATIC keyword to create a static column. The value stored in a static column is shared between all rows in a logical partition. When you update the value of this column, Amazon Keyspaces applies the change automatically to all rows in the partition.

This section describes how to calculate the encoded size of data when you're writing to static columns. This process is handled separately from the process that writes data to the nonstatic columns of a row. Static data has its own size quota, and read and write operations on static columns are metered against a table's throughput capacity independently of nonstatic data.

Calculating static column size per logical partition in Amazon Keyspaces

This section provides details about how to estimate the encoded size of static columns in Amazon Keyspaces. The encoded size is used when you're calculating your bill and quota use. You should also use the encoded size when you calculate provisioned throughput capacity requirements for tables. To calculate the encoded size of static columns in Amazon Keyspaces, you can use the following guidelines.

  • Partition keys can contain up to 2048 bytes of data. Each key column in the partition key requires up to 3 bytes of metadata. These metadata bytes count towards your static data size quota of 1 MB per partition. When calculating the size of your static data, you should assume that each partition key column uses the full 3 bytes of metadata.

  • Use the raw size of the static column data values based on the data type. For more information about data types, see Data types.

  • Add 104 bytes to the size of the static data for metadata.

  • Clustering columns and regular, nonprimary key columns do not count towards the size of static data. To learn how to estimate the size of nonstatic data within rows, see Calculating row size in Amazon Keyspaces.

The total encoded size of a static column is based on the following formula:

partition key columns + static columns + metadata = total encoded size of static data

Consider the following example of a table where all columns are of type integer. The table has two partition key columns, two clustering columns, one regular column, and one static column.

CREATE TABLE mykeyspace.mytable (
    pk_col1 int,
    pk_col2 int,
    ck_col1 int,
    ck_col2 int,
    reg_col1 int,
    static_col1 int STATIC,
    PRIMARY KEY ((pk_col1, pk_col2), ck_col1, ck_col2)
);

In this example, we calculate the size of static data of the following statement:

INSERT INTO mykeyspace.mytable (pk_col1, pk_col2, static_col1) VALUES (1, 2, 6);

To estimate the total bytes required by this write operation, you can use the following steps.

  1. Calculate the size of a partition key column by adding the bytes for the data type stored in the column and the metadata bytes. Repeat this for all partition key columns.

    1. Calculate the size of the first column of the partition key (pk_col1):

      4 bytes for the integer data type + 3 bytes for partition key metadata = 7 bytes
    2. Calculate the size of the second column of the partition key (pk_col2):

      4 bytes for the integer data type + 3 bytes for partition key metadata = 7 bytes
    3. Add both columns to get the total estimated size of the partition key columns:

      7 bytes + 7 bytes = 14 bytes for the partition key columns
  2. Add the size of the static columns. In this example, we only have one static column that stores an integer (which requires 4 bytes).

  3. Finally, to get the total encoded size of the static column data, add up the bytes for the partition key columns and static columns, and add the additional 104 bytes for metadata:

    14 bytes for the partition key columns + 4 bytes for the static column + 104 bytes for metadata = 122 bytes.
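The same arithmetic can be written as a small illustrative helper in Python (the constants and names are our own, taken from the guidelines above):

PK_METADATA_BYTES = 3        # metadata bytes per partition key column
STATIC_METADATA_BYTES = 104  # fixed metadata overhead for static data

def static_data_size(partition_key_col_sizes, static_col_sizes):
    """Estimate the encoded size of the static data written by one statement."""
    pk_bytes = sum(size + PK_METADATA_BYTES for size in partition_key_col_sizes)
    return pk_bytes + sum(static_col_sizes) + STATIC_METADATA_BYTES

# Example from the text: two int (4-byte) partition key columns and
# one int (4-byte) static column -> 7 + 7 + 4 + 104 = 122 bytes.
print(static_data_size([4, 4], [4]))  # 122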

You can also update static and nonstatic data with the same statement. To estimate the total size of the write operation, you must first calculate the size of the static data update as shown earlier. Then calculate the size of the row update as shown in the example at Calculating row size in Amazon Keyspaces, and add the results.

In this case, you can write a total of 2 MB—1 MB is the maximum row size quota, and 1 MB is the quota for the maximum static data size per logical partition.

To calculate the total size of an update of static and nonstatic data in the same statement, you can use the following formula:

(partition key columns + static columns + metadata = total encoded size of static data)
+ (partition key columns + clustering columns + regular columns + row metadata = total encoded size of row)
= total encoded size of data written

Consider the following example of a table where all columns are of type integer. The table has two partition key columns, two clustering columns, one regular column, and one static column.

CREATE TABLE mykeyspace.mytable (
    pk_col1 int,
    pk_col2 int,
    ck_col1 int,
    ck_col2 int,
    reg_col1 int,
    static_col1 int STATIC,
    PRIMARY KEY ((pk_col1, pk_col2), ck_col1, ck_col2)
);

In this example, we calculate the size of data when we write a row to the table, as shown in the following statement:

INSERT INTO mykeyspace.mytable (pk_col1, pk_col2, ck_col1, ck_col2, reg_col1, static_col1) VALUES (2, 3, 4, 5, 6, 7);

To estimate the total bytes required by this write operation, you can use the following steps.

  1. Calculate the total encoded size of static data as shown earlier. In this example, it's 122 bytes.

  2. Add the total encoded size of the row based on the update of nonstatic data, following the steps at Calculating row size in Amazon Keyspaces. In this example, the total size of the row update is 134 bytes.

    122 bytes for static data + 134 bytes for nonstatic data = 256 bytes.
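Continuing the illustrative static_data_size helper from the earlier sketch, with the 134-byte row size taken from the row-size example referenced above:

# 122 bytes of static data plus the 134-byte nonstatic row update.
total_bytes = static_data_size([4, 4], [4]) + 134
print(total_bytes)  # 256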

Metering read/write operations of static data in Amazon Keyspaces

Static data is associated with logical partitions in Cassandra, not with individual rows. Logical partitions in Amazon Keyspaces can be virtually unbounded in size because they can span multiple physical storage partitions. As a result, Amazon Keyspaces meters write operations on static and nonstatic data separately. Furthermore, writes that include both static and nonstatic data require additional underlying operations to provide data consistency.

If you perform a mixed write operation of both static and nonstatic data, this results in two separate write operations—one for nonstatic and one for static data. This applies to both on-demand and provisioned read/write capacity modes.

The following example provides details about how to estimate the required read capacity units (RCUs) and write capacity units (WCUs) when you're calculating provisioned throughput capacity requirements for tables in Amazon Keyspaces that have static columns. You can estimate how much capacity your table needs to process writes that include both static and nonstatic data by using the following formula:

2 x WCUs required for nonstatic data + 2 x WCUs required for static data

For example, if your application writes 27 KB of data per second and each write includes 25.5 KB of nonstatic data and 1.5 KB of static data, then your table requires 56 WCUs (2 x 26 WCUs + 2 x 2 WCUs).
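A minimal sketch of this estimate in Python, assuming the standard metering of one WCU per 1 KB of data written, rounded up per operation:

import math

def mixed_write_wcus(nonstatic_kb, static_kb):
    """A mixed write is metered as two separate operations, one for the
    nonstatic data and one for the static data, hence the 2x factors."""
    return 2 * math.ceil(nonstatic_kb) + 2 * math.ceil(static_kb)

# Example from the text: 25.5 KB of nonstatic and 1.5 KB of static data.
print(mixed_write_wcus(25.5, 1.5))  # 2*26 + 2*2 = 56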

Amazon Keyspaces meters the reads of static and nonstatic data the same as reads of multiple rows. As a result, the price of reading static and nonstatic data in the same operation is based on the aggregate size of the data processed to perform the read.
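As an illustration, because the read is metered on the aggregate size of the data processed, an estimate might look like the following sketch. It assumes one RCU per LOCAL_QUORUM read of up to 4 KB, which is the standard Amazon Keyspaces metering; check the current documentation for your consistency level.

import math

def read_rcus(total_kb_processed, kb_per_rcu=4):
    """Estimate RCUs from the aggregate size of static and nonstatic data
    processed by a read, at one RCU per kb_per_rcu of data."""
    return math.ceil(total_kb_processed / kb_per_rcu)

# Example: reading 1.5 KB of static plus 6 KB of nonstatic data together.
print(read_rcus(1.5 + 6))  # 2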

To learn how to monitor serverless resources with Amazon CloudWatch, see Monitoring Amazon Keyspaces with Amazon CloudWatch.