Details
Fix
Status: Released
Major
Resolution: Fixed
2.0.0
None
Description
Current Situation
- If a Controller Cluster is operated in a network environment that frequently isolates individual Controller instances from the network for short periods, the cluster will stop working after repeated fail-over at short intervals.
- Network isolation means that a Controller instance can neither connect to other components nor be reached by them, while the remaining components (Secondary Controller, Cluster Watch Agent) can continue to communicate with each other.
- Repeated fail-over means that fail-over occurs at intervals shorter than the time the Controller Cluster requires to synchronize after a previous fail-over.
- As a result, both the Primary and the Secondary Controller instance can become active at the same point in time.
Desired Behavior
- The Controller Cluster should cope with situations in which individual Controller instances are repeatedly isolated from the network for short periods and in which fail-over repeatedly occurs at short intervals.
- No two Controller instances in a cluster should be active at the same point in time.
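
The desired invariant can be illustrated with a minimal sketch. It is not the Controller's actual implementation: the `Arbiter` class, its methods, and the node names are hypothetical and only stand in for a component, such as a cluster watch, that decides which instance may be active. The sketch assumes that a fail-over is refused while the previous fail-over has not yet been synchronized, and that an instance acts as active only while the arbiter names it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Arbiter:
    """Hypothetical stand-in for a cluster watch: names the single active node."""
    active: Optional[str] = None
    synchronized: bool = True  # False while the passive node is still catching up

    def request_failover(self, candidate: str) -> bool:
        """Make `candidate` active only if the cluster is synchronized."""
        if candidate == self.active:
            return True  # already active, nothing to do
        if not self.synchronized:
            return False  # previous fail-over not yet synchronized: refuse
        self.active = candidate
        self.synchronized = False  # the new passive node must catch up first
        return True

    def confirm_synchronized(self) -> None:
        """Called once the passive node has caught up with the active node."""
        self.synchronized = True

    def may_act(self, node: str) -> bool:
        """A node may act as active only while the arbiter names it."""
        return self.active == node


arbiter = Arbiter(active="primary")

# First fail-over: the cluster is synchronized, so it is granted.
first = arbiter.request_failover("secondary")

# Second fail-over arrives before synchronization has completed: refused.
second = arbiter.request_failover("primary")
active_after_refusal = [n for n in ("primary", "secondary") if arbiter.may_act(n)]

# After synchronization, fail-over back to the primary is granted.
arbiter.confirm_synchronized()
third = arbiter.request_failover("primary")
active_after_sync = [n for n in ("primary", "secondary") if arbiter.may_act(n)]
```

In this sketch exactly one node is active at every step, because activation changes hands only through a single decision point and never while synchronization is pending.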