Currently, classy reports the state of peer sites only as "connected" or "disconnected".
This is a conservative approach that side-steps problems related to network partitions,
but we should do better.
In addition to connected/disconnected state,
classy should calculate up/down state as well
(with possible combinations like up+disconnected).
Peer-down detection should be based on collective checks:
a peer is considered down when a quorum of sites decides that it is down.
Likewise, when a site learns that its peers have decided that it is down,
it should react, either by:
- Changing its run level from whatever it currently is to stopped and back
- Or, more conservatively, providing a hook that allows for something similar
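
The quorum rule and the hook-based reaction described above could look roughly like the following. This is only a sketch to make the idea concrete, not classy's API: the language (Python) and every name in it (`Liveness`, `quorum_size`, `peer_liveness`, `describe`, `react_to_verdict`) are hypothetical. It also assumes that a quorum means a strict majority of all known sites, which the proposal does not specify.

```python
from enum import Enum
from typing import Callable, Mapping


class Liveness(Enum):
    UP = "up"
    DOWN = "down"


def quorum_size(site_count: int) -> int:
    # Assumption: quorum is a strict majority of all known sites.
    return site_count // 2 + 1


def peer_liveness(down_votes: Mapping[str, bool], site_count: int) -> Liveness:
    """Decide up/down for one peer.

    down_votes maps a voting site's name to True if that site
    currently believes the peer is down.
    """
    votes = sum(1 for thinks_down in down_votes.values() if thinks_down)
    return Liveness.DOWN if votes >= quorum_size(site_count) else Liveness.UP


def describe(connected: bool, liveness: Liveness) -> str:
    # Connectivity is a local observation; liveness is the collective verdict.
    # The two are independent, so "up+disconnected" is a valid combination.
    link = "connected" if connected else "disconnected"
    return f"{liveness.value}+{link}"


def react_to_verdict(
    local_site: str,
    verdicts: Mapping[str, Liveness],
    on_declared_down: Callable[[str], None],
) -> bool:
    """Invoke the hook if the cluster has declared the local site down.

    Returns True if the hook was called.
    """
    if verdicts.get(local_site) is Liveness.DOWN:
        on_declared_down(local_site)
        return True
    return False
```

With five sites, three down votes reach the quorum, so the peer is reported as down. With only two, it stays up even if the local site cannot reach it, which is the up+disconnected case. The hook passed to `react_to_verdict` is where a run level change to stopped and back could be plugged in.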