PHOENIX-7843 TLA+ specification of Consistent Failover#2461
Conversation
| @@ -0,0 +1,380 @@ | |||
| # Phoenix Consistent Failover -- TLA+ Specification | |||
|
|
|||
| Formal specification of the Phoenix Consistent Failover protocol using TLA+ and the TLC model checker. The spec verifies safety properties (mutual exclusion, zero RPO, abort correctness) under arbitrary interleavings of admin actions, HDFS failures, RS crashes, ZK connection/session failures, watcher retry exhaustion, and the anti-flapping timer. | |||
There was a problem hiding this comment.
This claims that the model is for verifying safety properties but at the end of this README file, it states that three liveness properties are also verified. Should we revise this statement?
|
|
||
| ### Post-Condition | ||
|
|
||
| Cluster `c` transitions to ATS or ANISTS, both of which map to the ACTIVE_TO_STANDBY role, blocking mutations (`isMutationBlocked() = true`). |
There was a problem hiding this comment.
This will change, shall I update it later when I make the implementation change? or do we want to right now? ANISTS will map to ACTIVE. This will optimize the failover time
There was a problem hiding this comment.
Let's wait until the implementation changes.
|
|
||
| ## Overview | ||
|
|
||
| `HAGroupStore` models the peer-reactive transitions and auto-completion actions of the Phoenix Consistent Failover protocol. These actions correspond to the `FailoverManagementListener` (`HAGroupStoreManager.java` L633-706), which reacts to peer ZK state changes via `PathChildrenCache` watchers, and the local auto-completion resolvers from `createLocalStateTransitions()` (L140-150). |
There was a problem hiding this comment.
There is no discussion of SYSTEM.HA_GROUP system table. Is that intentional?
There was a problem hiding this comment.
Yes.
The spec's runtime state machine operates on the per-cluster ZK znode managed by HAGroupStoreClient, which is the source of truth for HA group state. Catalog reads and writes happen out of band of the failover protocol and are intentionally not modeled.
| \/ /\ clusterState[c] = "AbTAIS" | ||
| /\ clusterState' = [clusterState EXCEPT ![c] = | ||
| IF outDirEmpty[c] /\ \A rs \in RS : writerMode[c][rs] \in {"INIT", "SYNC"} | ||
| THEN "AIS" | ||
| ELSE "ANIS"] |
There was a problem hiding this comment.
This is not the implementation today. IIRC, this change is also not planned yet. Currently we go to AIS unconditionally as we expect the Active (A) cluster to be in a sync state when it moved to ATS so no new mutation has happened since then. Writer should handle AIS to ANIS transition once new write comes in.
There was a problem hiding this comment.
Ok, this is a valid finding. Thanks @ritegarg . I will update the model.
|
|
||
| | TLA+ Action | Java Source | | ||
| |---|---| | ||
| | `AdminStartFailover(c)` | `HAGroupStoreManager.initiateFailoverOnActiveCluster()` L375-400 | |
There was a problem hiding this comment.
The line number citations can become stale, should we use a commit hash to pin these or we expect the reader to figure out from the method name?
There was a problem hiding this comment.
Agreed, I think file level citation is sufficient. The robots like to pin to line numbers but I can ask them to be removed.
HAGroupStore.tla AutoComplete: AbTAIS resolves unconditionally to AIS,
matching createLocalStateTransitions() in HAGroupStoreManager. The
clause atomically snaps any remaining SYNC_AND_FWD writers to SYNC,
like ANISToAIS, modeling the ACTIVE_IN_SYNC ZK event firing.
Types.tla: Introduced MutationServingActiveStates == { "AIS", "ANIS",
"AWOP", "ANISWOP" }
Writer.tla: Tightened the cluster state guards on every writer
degradation action to fire only in states where the cluster is
serving mutations and thus could arrive at a writer.
ConsistentFailover.tla WriterClusterConsistency: narrowed the allowed
degraded writer set to {"ANIS", "ANISTS", "ATS", "ANISWOP", "AbTAIS"}
(drops AbTANIS as unreachable, drops AWOP since the AIS coupling
forbids (AWOP, S&F)).
Note system tables like SYSTEM.HA_GROUP are not modeled explicitly.
Removed every Lxxx-yyy citation across all 21 spec files, keeping
file level citations and method/field names.
|
Rebased and updated, incorporating review feedback. |
Formal specification of the Phoenix Consistent Failover protocol and implementation using TLA+ and the TLC model checker. The spec verifies safety properties such as mutual exclusion, zero RPO, and abort correctness under arbitrary interleavings of admin actions, HDFS failures, RS crashes, ZK connection/session failures, watcher retry exhaustion, and the anti-flapping timer.
Literate programming versions of all specification files are available in the
markdown/directory, referenced from theREADME.md. Each file includes the complete TLA+ code with comments converted to prose that discusses modeling choices, tradeoffs, and implementation traceability in depth.