Strategies for Resharding
The purpose of resharding is to enable your stream to adapt to changes in the rate of data flow. You split shards to increase the capacity (and cost) of your stream. You merge shards to reduce the cost (and capacity) of your stream.
One approach to resharding could be to simply split every shard in the stream—which would double the stream's capacity. However, this might provide more additional capacity than you actually need and therefore create unnecessary cost.
You can also use metrics to determine which are your "hot" or "cold" shards, that is, shards that are receiving much more data, or much less data, than expected. You could then selectively split the hot shards to increase capacity for the hash keys that target those shards. Similarly, you could merge cold shards to make better use of their unused capacity.
You can obtain some performance data for your stream from the CloudWatch metrics that Streams publishes. However, you can also collect some of your own metrics for your streams. One approach would be to log the hash key values generated by the partition keys for your data records. Recall that you specify the partition key at the time that you add the record to the stream.
putRecordRequest.setPartitionKey( String.format( "myPartitionKey" ) );
Streams uses MD5 to compute the hash key from the partition key. Because you specify the partition key for the record, you could use MD5 to compute the hash key value for that record and log it.
You could also log the IDs of the shards that your data records are assigned to.
The shard ID is available by using the
getShardId method of the
putRecordResults object returned by the
method, and the
putRecordResult object returned by the
String shardId = putRecordResult.getShardId();
With the shard IDs and the hash key values, you can determine which shards and hash keys are receiving the most or least traffic. You can then use resharding to provide more or less capacity, as appropriate for these keys.