Skip to content

PHOENIX-7843 TLA+ specification of Consistent Failover#2461

Open
apurtell wants to merge 2 commits into
apache:PHOENIX-7562-feature-newfrom
apurtell:PHOENIX-7843
Open

PHOENIX-7843 TLA+ specification of Consistent Failover#2461
apurtell wants to merge 2 commits into
apache:PHOENIX-7562-feature-newfrom
apurtell:PHOENIX-7843

Conversation

@apurtell
Copy link
Copy Markdown
Contributor

@apurtell apurtell commented May 5, 2026

Formal specification of the Phoenix Consistent Failover protocol and implementation using TLA+ and the TLC model checker. The spec verifies safety properties such as mutual exclusion, zero RPO, and abort correctness under arbitrary interleavings of admin actions, HDFS failures, RS crashes, ZK connection/session failures, watcher retry exhaustion, and the anti-flapping timer.

Literate programming versions of all specification files are available in the markdown/ directory, referenced from the README.md. Each file includes the complete TLA+ code with comments converted to prose that discusses modeling choices, tradeoffs, and implementation traceability in depth.

@apurtell apurtell requested a review from kadirozde May 6, 2026 20:43
@@ -0,0 +1,380 @@
# Phoenix Consistent Failover -- TLA+ Specification

Formal specification of the Phoenix Consistent Failover protocol using TLA+ and the TLC model checker. The spec verifies safety properties (mutual exclusion, zero RPO, abort correctness) under arbitrary interleavings of admin actions, HDFS failures, RS crashes, ZK connection/session failures, watcher retry exhaustion, and the anti-flapping timer.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This claims that the model is for verifying safety properties but at the end of this README file, it states that three liveness properties are also verified. Should we revise this statement?


### Post-Condition

Cluster `c` transitions to ATS or ANISTS, both of which map to the ACTIVE_TO_STANDBY role, blocking mutations (`isMutationBlocked() = true`).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will change, shall I update it later when I make the implementation change? or do we want to right now? ANISTS will map to ACTIVE. This will optimize the failover time

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's wait until the implementation changes.


## Overview

`HAGroupStore` models the peer-reactive transitions and auto-completion actions of the Phoenix Consistent Failover protocol. These actions correspond to the `FailoverManagementListener` (`HAGroupStoreManager.java` L633-706), which reacts to peer ZK state changes via `PathChildrenCache` watchers, and the local auto-completion resolvers from `createLocalStateTransitions()` (L140-150).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no discussion of SYSTEM.HA_GROUP system table. Is that intentional?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

The spec's runtime state machine operates on the per-cluster ZK znode managed by HAGroupStoreClient, which is the source of truth for HA group state. Catalog reads and writes happen out of band of the failover protocol and are intentionally not modeled.

Comment on lines +252 to +256
\/ /\ clusterState[c] = "AbTAIS"
/\ clusterState' = [clusterState EXCEPT ![c] =
IF outDirEmpty[c] /\ \A rs \in RS : writerMode[c][rs] \in {"INIT", "SYNC"}
THEN "AIS"
ELSE "ANIS"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the implementation today. IIRC, this change is also not planned yet. Currently we go to AIS unconditionally as we expect the Active (A) cluster to be in a sync state when it moved to ATS so no new mutation has happened since then. Writer should handle AIS to ANIS transition once new write comes in.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this is a valid finding. Thanks @ritegarg . I will update the model.


| TLA+ Action | Java Source |
|---|---|
| `AdminStartFailover(c)` | `HAGroupStoreManager.initiateFailoverOnActiveCluster()` L375-400 |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line number citations can become stale, should we use a commit hash to pin these or we expect the reader to figure out from the method name?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I think file level citation is sufficient. The robots like to pin to line numbers but I can ask them to be removed.

@apurtell apurtell requested a review from ritegarg May 23, 2026 04:58
HAGroupStore.tla AutoComplete: AbTAIS resolves unconditionally to AIS,
matching createLocalStateTransitions() in HAGroupStoreManager. The
clause atomically snaps any remaining SYNC_AND_FWD writers to SYNC,
like ANISToAIS, modeling the ACTIVE_IN_SYNC ZK event firing.

Types.tla: Introduced MutationServingActiveStates == { "AIS", "ANIS",
"AWOP", "ANISWOP" }

Writer.tla: Tightened the cluster state guards on every writer
degradation action to fire only in states where the cluster is
serving mutations and thus could arrive at a writer.

ConsistentFailover.tla WriterClusterConsistency: narrowed the allowed
degraded writer set to {"ANIS", "ANISTS", "ATS", "ANISWOP", "AbTAIS"}
(drops AbTANIS as unreachable, drops AWOP since the AIS coupling
forbids (AWOP, S&F)).

Note system tables like SYSTEM.HA_GROUP are not modeled explicitly.

Removed every Lxxx-yyy citation across all 21 spec files, keeping
file level citations and method/field names.
@apurtell apurtell requested a review from kadirozde May 23, 2026 04:59
@apurtell
Copy link
Copy Markdown
Contributor Author

Rebased and updated, incorporating review feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants