Schema Design
A region can run hot when the write pattern does not distribute load evenly across all servers. This is common when processing streams of time series events: because time-based row keys increase monotonically, all incoming data tends to be written to the same region.
This concentrated write activity on a single server can slow down the whole cluster, because insert throughput is now bound to the performance of a single machine. The problem can often be mitigated with row key design strategies such as the following.
- Applying salting prefixes to keys; in other words, prepending a random number to the row key.
- Randomizing the key with a hash function.
- Promoting another field to prefix the row key.
These techniques can achieve a more evenly distributed load across all servers.
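
The three strategies above can be sketched as small key-building helpers. This is only an illustrative sketch, not a definitive implementation: the function names, the bucket count of 8, the `-` separator, and the use of MD5 are all assumptions made for the example, and the helpers only build key strings without calling any storage API.

```python
import hashlib
import random

# Hypothetical number of salt buckets; in practice this would be
# chosen relative to the number of regions or servers.
NUM_BUCKETS = 8


def salted_key(row_key: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Salting: prepend a random bucket number to the row key."""
    salt = random.randrange(num_buckets)
    return f"{salt:02d}-{row_key}"


def hashed_key(row_key: str) -> str:
    """Hashing: prepend a short, deterministic hash of the row key.

    Keeping the original key after the prefix is a choice made for
    this sketch so that the key stays human-readable.
    """
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return f"{digest[:8]}-{row_key}"


def promoted_key(field: str, timestamp_ms: int) -> str:
    """Field promotion: lead with another field, then the timestamp."""
    return f"{field}-{timestamp_ms:013d}"


if __name__ == "__main__":
    ts = 1700000000000  # example event time in milliseconds
    print(salted_key(str(ts)))
    print(hashed_key(str(ts)))
    print(promoted_key("sensor42", ts))
```

One trade-off to keep in mind: with a random salt, a reader cannot recompute the prefix for a given row, so a lookup may need to check every bucket. A hash-derived prefix is deterministic, which allows the full key to be rebuilt from the original value.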