Apache Flink Settings - Amazon Kinesis Data Analytics

Apache Flink Settings

Kinesis Data Analytics for Apache Flink is an implementation of the Apache Flink framework. Kinesis Data Analytics uses the default values described in this section. Some of these values can be set by Kinesis Data Analytics applications in code, and others cannot be changed.

This topic contains the following sections:

  • State Backend

  • Checkpointing

  • Savepointing

  • Heap Sizes

State Backend

Kinesis Data Analytics stores transient data in a state backend. Kinesis Data Analytics uses the RocksDBStateBackend. Calling setStateBackend to set a different backend has no effect.

We enable the following features on the state backend:

  • Incremental state backend snapshots

  • Asynchronous state backend snapshots

  • Local recovery of checkpoints
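Incremental snapshots are the easiest of these features to picture. With RocksDB, a checkpoint is built from immutable SST files, so an incremental checkpoint only needs to upload files that were not already uploaded for an earlier checkpoint and can reference the rest. The Java sketch below is a toy model of that bookkeeping, not the Flink or Kinesis Data Analytics implementation. The class IncrementalSnapshotModel, its checkpoint method, and the file names are hypothetical, and the model ignores the cleanup of files that compaction has made obsolete.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model (hypothetical names) of incremental snapshot bookkeeping:
// only SST files not uploaded for an earlier checkpoint are transferred;
// files that are already stored durably are simply referenced again.
// Simplification: files made obsolete by compaction are never forgotten.
class IncrementalSnapshotModel {

    // Files already present in durable storage from previous checkpoints.
    private final Set<String> uploaded = new LinkedHashSet<>();

    /** Returns the files that must be uploaded for this checkpoint and records them as uploaded. */
    Set<String> checkpoint(Set<String> currentSstFiles) {
        Set<String> toUpload = new LinkedHashSet<>(currentSstFiles);
        toUpload.removeAll(uploaded); // skip files that earlier checkpoints already stored
        uploaded.addAll(toUpload);
        return toUpload;
    }
}
```

For example, a first checkpoint over 000001.sst and 000002.sst uploads both files, while a second checkpoint that adds 000003.sst uploads only 000003.sst.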

In Kinesis Data Analytics, the state.backend.rocksdb.ttl.compaction.filter.enabled configuration option is enabled by default. Because this filter is enabled, you can update your application code to use the compaction cleanup strategy for state with a time-to-live (TTL). For more information, see State TTL in Flink 1.8.0 in the Apache Flink documentation.
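Conceptually, the compaction filter lets RocksDB drop expired state entries while it compacts its files, instead of leaving them in place until they are read. The Java sketch below is a toy model of that idea: each entry records when it was last written, and a simulated compaction removes entries whose TTL has elapsed. It is not the RocksDB or Flink implementation, and the names TtlCompactionModel, put, compact, get, and contains are hypothetical. In the Flink 1.8 API, the application-side switch is the cleanupInRocksdbCompactFilter() option on the StateTtlConfig builder; check its exact signature for your Flink version, because later releases changed it.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical names): each state entry stores the time it was
// last written, and a simulated compaction drops entries whose TTL elapsed.
class TtlCompactionModel {

    private static final class StateEntry {
        final String value;
        final long lastUpdateMillis;

        StateEntry(String value, long lastUpdateMillis) {
            this.value = value;
            this.lastUpdateMillis = lastUpdateMillis;
        }
    }

    private final Map<String, StateEntry> store = new HashMap<>();
    private final long ttlMillis;

    TtlCompactionModel(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    void put(String key, String value, long nowMillis) {
        store.put(key, new StateEntry(value, nowMillis));
    }

    /** Simulated compaction: removes expired entries and returns how many were dropped. */
    int compact(long nowMillis) {
        int before = store.size();
        // An entry is expired once lastUpdate + ttl <= now.
        store.values().removeIf(e -> nowMillis - e.lastUpdateMillis >= ttlMillis);
        return before - store.size();
    }

    String get(String key) {
        StateEntry e = store.get(key);
        return e == null ? null : e.value;
    }

    boolean contains(String key) {
        return store.containsKey(key);
    }
}
```

With a TTL of 1000 ms, an entry written at time 0 is dropped by a compaction at time 1500, while an entry written at time 800 survives it.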

For more information about state backends, see State Backends in the Apache Flink documentation.

Checkpointing

Kinesis Data Analytics for Apache Flink uses a default checkpoint configuration with the following values. Some of these values can be changed. For Kinesis Data Analytics to use modified checkpointing values, you must set CheckpointConfiguration.ConfigurationType to CUSTOM.

Setting                            Can be modified?   Default value
CheckpointingEnabled               Modifiable         True
CheckpointInterval                 Modifiable         60000 ms (1 minute)
MinPauseBetweenCheckpoints         Modifiable         5000 ms (5 seconds)
Number of Concurrent Checkpoints   Not Modifiable     1
Checkpointing Mode                 Not Modifiable     Exactly Once
Checkpoint Retention Policy        Not Modifiable     On Failure
Checkpoint Timeout                 Not Modifiable     60 minutes
Max Checkpoints Retained           Not Modifiable     1
Restart Strategy                   Not Modifiable     Fixed delay, with infinite retries every 10 seconds
Checkpoint and Savepoint Location  Not Modifiable     Durable checkpoint and savepoint data is stored in a service-owned S3 bucket
State Backend Memory Threshold     Not Modifiable     1048576 bytes (1 MiB)
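The rule that modified values take effect only when ConfigurationType is CUSTOM, and only for the modifiable settings, can be summarized as a small resolution function. The Java sketch below models it using the three modifiable settings and their defaults from the table. The class CheckpointSettings and its effective method are hypothetical, not part of the Kinesis Data Analytics API; the sketch also assumes that a setting left unspecified under CUSTOM keeps its default, and it does not model any validation the service applies to the values.

```java
// Hypothetical model of how the checkpoint settings in the table resolve.
// Only the three modifiable settings are represented; the rest are fixed.
class CheckpointSettings {

    enum ConfigurationType { DEFAULT, CUSTOM }

    // Defaults from the table (durations in milliseconds).
    static final boolean DEFAULT_CHECKPOINTING_ENABLED = true;
    static final long DEFAULT_CHECKPOINT_INTERVAL_MS = 60_000L;
    static final long DEFAULT_MIN_PAUSE_MS = 5_000L;

    final boolean checkpointingEnabled;
    final long checkpointIntervalMs;
    final long minPauseBetweenCheckpointsMs;

    private CheckpointSettings(boolean enabled, long intervalMs, long minPauseMs) {
        this.checkpointingEnabled = enabled;
        this.checkpointIntervalMs = intervalMs;
        this.minPauseBetweenCheckpointsMs = minPauseMs;
    }

    /**
     * Returns the settings the application runs with. Requested values are
     * honored only when the type is CUSTOM. Assumption: a null request means
     * "not specified" and keeps the default.
     */
    static CheckpointSettings effective(ConfigurationType type,
                                        Boolean enabled, Long intervalMs, Long minPauseMs) {
        if (type != ConfigurationType.CUSTOM) {
            // Any requested values are ignored outside CUSTOM.
            return new CheckpointSettings(DEFAULT_CHECKPOINTING_ENABLED,
                    DEFAULT_CHECKPOINT_INTERVAL_MS, DEFAULT_MIN_PAUSE_MS);
        }
        return new CheckpointSettings(
                enabled != null ? enabled : DEFAULT_CHECKPOINTING_ENABLED,
                intervalMs != null ? intervalMs : DEFAULT_CHECKPOINT_INTERVAL_MS,
                minPauseMs != null ? minPauseMs : DEFAULT_MIN_PAUSE_MS);
    }
}
```

For example, requesting a 10000 ms interval has no effect with DEFAULT, but with CUSTOM the interval becomes 10000 ms while the minimum pause stays at 5000 ms.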

Savepointing

By default, when restoring from a savepoint, the resume operation tries to map all state in the savepoint back to the program you are restoring. If you removed an operator, restoring from a savepoint that contains state for the missing operator fails. You can allow the operation to succeed by setting the AllowNonRestoredState parameter of the application's FlinkRunConfiguration to true. The resume operation then skips any state that cannot be mapped to the new program.

For more information, see Allowing Non-Restored State in the Apache Flink documentation.
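The restore behavior described above amounts to checking that every operator with state in the savepoint still exists in the new program. The Java sketch below models that check over sets of operator IDs. The class SavepointRestoreModel and its restore method are hypothetical; the model considers nothing but operator identity and only mirrors the AllowNonRestoredState semantics described in this section, rather than calling any Kinesis Data Analytics or Flink API.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the savepoint restore check: state is matched to
// operators by ID, and unmatched state either fails the restore or is skipped.
class SavepointRestoreModel {

    /**
     * Returns the operator IDs whose state is restored.
     *
     * @throws IllegalStateException if the savepoint holds state for an operator
     *         missing from the new program and allowNonRestoredState is false
     */
    static Set<String> restore(Set<String> savepointOperatorIds,
                               Set<String> programOperatorIds,
                               boolean allowNonRestoredState) {
        Set<String> unmatched = new LinkedHashSet<>(savepointOperatorIds);
        unmatched.removeAll(programOperatorIds);
        if (!unmatched.isEmpty() && !allowNonRestoredState) {
            throw new IllegalStateException("Savepoint has state for missing operators: " + unmatched);
        }
        Set<String> restored = new LinkedHashSet<>(savepointOperatorIds);
        restored.retainAll(programOperatorIds); // unmatched state is skipped
        return restored;
    }
}
```

If a savepoint holds state for source, dedupe, and sink but the new program drops dedupe, the restore fails unless allowNonRestoredState is true, in which case only the source and sink state is restored.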

Heap Sizes

Kinesis Data Analytics allocates each KPU 3 GiB of JVM heap, and reserves 1 GiB for native code allocations. For information about increasing your application capacity, see Application Scaling in Kinesis Data Analytics for Apache Flink.

For more information about JVM heap sizes, see Configuration in the Apache Flink documentation.
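As a worked example of the per-KPU figures above, the Java sketch below computes the JVM heap and the native memory reservation for a given number of KPUs, assuming both scale linearly with the KPU count (3 GiB of heap and 1 GiB of native memory per KPU). The class KpuMemory and its methods are hypothetical helpers, not part of any AWS SDK.

```java
// Hypothetical helper: memory available to an application, assuming each
// KPU contributes 3 GiB of JVM heap and 1 GiB reserved for native allocations.
class KpuMemory {

    static final long GIB = 1L << 30; // 1 GiB in bytes
    static final long HEAP_PER_KPU_GIB = 3;
    static final long NATIVE_PER_KPU_GIB = 1;

    /** Total JVM heap in bytes for the given number of KPUs. */
    static long heapBytes(int kpus) {
        requirePositive(kpus);
        return kpus * HEAP_PER_KPU_GIB * GIB;
    }

    /** Total native-memory reservation in bytes for the given number of KPUs. */
    static long nativeBytes(int kpus) {
        requirePositive(kpus);
        return kpus * NATIVE_PER_KPU_GIB * GIB;
    }

    private static void requirePositive(int kpus) {
        if (kpus < 1) {
            throw new IllegalArgumentException("KPU count must be at least 1: " + kpus);
        }
    }
}
```

For example, an application running on 4 KPUs has 12 GiB (12884901888 bytes) of JVM heap and 4 GiB (4294967296 bytes) reserved for native allocations.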