Redundancy and Failover - AWS Elemental Conductor Live 3

This is version 3.17 of the AWS Elemental Conductor Live 3 documentation. This is the latest version. For prior versions, see the Previous Versions section of AWS Elemental Conductor Live 3 Documentation.

Redundancy and Failover

AWS Elemental Conductor Live 3 supports redundancy and failover for both:

  • The Conductor node (the node running AWS Elemental Conductor Live 3)

  • The worker nodes (AWS Elemental Live and Statmux nodes)

The sections below briefly summarize how the cluster behaves during a failover. For information on configuring redundancy, see the AWS Elemental Conductor Live 3 Configuration Guide.

Conductor Node Redundancy and Failover

You can set up two Conductor nodes in the cluster, one as the “primary” or “leader” and one as the “backup”. If the leader node fails, the backup automatically takes over management of the cluster. The leader Conductor maintains the Conductor database; the backup database is a copy of the leader database and is continually synchronized with it. The backup Conductor continually monitors the leader.

As soon as the backup can no longer detect the leader on the network, it assumes that the leader has failed and takes over the leader role. This change in role takes a few seconds.

Once failover has occurred, the two Conductor nodes continue in their new roles. Even when the failed Conductor comes back online, it does not take back the leader role.
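The role change described above can be pictured with a small model. This is only an illustrative sketch, not Conductor Live 3 code or its API: the class `ConductorPair`, its method names, and the node names `conductor-a` and `conductor-b` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConductorPair:
    """Toy model of a leader/backup Conductor pair (hypothetical names)."""
    leader: str
    backup: str

    def on_leader_unreachable(self) -> None:
        # The backup assumes the leader has failed and takes over the
        # leader role, so the two nodes swap roles.
        self.leader, self.backup = self.backup, self.leader

    def on_node_recovered(self, node: str) -> None:
        # A recovered node does not take back the leader role, so the
        # current roles are deliberately left unchanged.
        return None


pair = ConductorPair(leader="conductor-a", backup="conductor-b")
pair.on_leader_unreachable()           # conductor-a drops off the network
pair.on_node_recovered("conductor-a")  # conductor-a comes back online
print(pair.leader)  # conductor-b
```

The point of the sketch is the empty recovery handler: once the roles have swapped, they stay swapped.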

For more details on how Conductor redundancy works, see the AWS Elemental Conductor Live 3 Configuration Guide.

Worker Node Redundancy and Failover

The worker nodes in the cluster can be configured redundantly, so that if a worker node fails, its activity automatically fails over to another node. Redundancy relies on backup nodes that remain inactive until a failure occurs.

Node Failure Detection

AWS Elemental Conductor Live 3 maintains contact with the AWS Elemental Live and Statmux nodes in the cluster. If it can no longer communicate with a node, Conductor Live 3 assumes that the node has failed and reports the failure. Node failure detection is always enabled in Conductor; it does not need to be configured.

Worker Node Failover

AWS Elemental Conductor Live 3 can be configured so that when a failure is detected, channels and events are automatically moved to another node, to ensure no loss of service.

For failover, nodes must be set up in a redundancy group. Each node in the group is assigned a role: either active (meaning it normally runs channels) or reserve (backup). When an active node fails, AWS Elemental Conductor Live 3 attempts a failover: it tries to move the channels on the failed node to a reserve node (the “failover node”) and to restart the previously running channels so that they run on the failover node. The channels are now associated with the failover node and do not automatically move back to the original node. AWS Elemental Live nodes and AWS Elemental Statmux nodes must be set up in separate redundancy groups.

Nodes that are not part of a redundancy group will not fail over, but AWS Elemental Conductor Live 3 will still detect a failure.
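The failover behavior within one redundancy group can be sketched as follows. This is a simplified model under stated assumptions, not the product's implementation: `Node`, `fail_over`, and the node and channel names are hypothetical, and the sketch does not model what role the failed node holds once it recovers.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    role: str                      # "active" or "reserve"
    online: bool = True
    channels: list = field(default_factory=list)


def fail_over(group, failed_name):
    """Move channels from a failed active node to an online reserve node.

    `group` is a list of Node objects in one redundancy group.
    Returns the failover node, or None if no failover could be attempted.
    """
    failed = next(n for n in group if n.name == failed_name)
    failed.online = False
    if failed.role != "active":
        return None                # only active nodes run channels
    reserve = next((n for n in group if n.role == "reserve" and n.online), None)
    if reserve is None:
        return None                # failure is detected, but nothing to fail over to
    # The channels become associated with the failover node and stay there.
    reserve.channels, failed.channels = failed.channels, []
    reserve.role = "active"        # assumption: the failover node now counts as active
    return reserve


group = [Node("live-1", "active", channels=["news", "sports"]),
         Node("live-2", "reserve")]
target = fail_over(group, "live-1")
print(target.name, target.channels)  # live-2 ['news', 'sports']
```

Nothing in the sketch moves the channels back when `live-1` returns, which mirrors the behavior described above.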

After Failover

Once failover has completed, the failed node and the failover node continue in their new roles. Specifically, the moved channels remain on the failover node unless you manually move them.

AWS Elemental Conductor Live 3 performs some special actions for a “false failover”; see How Worker Node Failover Occurs.

Redundancy Status Alert

An alert is raised if a redundancy group has one or more active nodes online but no backup nodes online. The alert persists until a node is restored to a backup role or a node without channels is manually moved to a backup role.
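The alert condition can be written as a simple predicate. The function below is a hypothetical illustration of the rule as stated here, not an actual Conductor Live 3 call; it assumes each node is represented as a `(role, online)` pair.

```python
def redundancy_alert(nodes):
    """Return True if the group should raise a redundancy status alert.

    `nodes` is a list of (role, online) pairs, where role is
    "active" or "backup" and online is a bool.
    """
    has_active_online = any(role == "active" and online for role, online in nodes)
    has_backup_online = any(role == "backup" and online for role, online in nodes)
    # Alert only when channels are running somewhere but nothing can take over.
    return has_active_online and not has_backup_online


print(redundancy_alert([("active", True), ("backup", False)]))  # True
print(redundancy_alert([("active", True), ("backup", True)]))   # False
```

Under this reading, a group with no active nodes online raises no redundancy status alert, because the first condition is not met.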

For more information about alerts and messages, see Monitoring Alerts and Messages.