Background
Follow-up from the PR #74 review (terylt). Split out from #75 (retry/circuit-breaker), which is a separate concern.
The Valkey session store depends on two operator-owned Valkey server settings that the backend documents as part of the deployment contract but does not verify at runtime:
- maxmemory-policy noeviction (runbook §2 / R9): prevents silent eviction of taint labels.
- AOF persistence, appendfsync everysec floor (runbook §5 / R19): prevents crash-before-fsync silent loss of acknowledged appends.
Getting either wrong silently weakens the fail-closed guarantee: a label vanishes, the next read returns a normal Ok(empty) (not an error), so fail-closed never trips and the request proceeds under-labeled. This is invisible to all alarms because nothing errors.
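A minimal sketch of why this failure mode is silent. The Decision type and decide function are hypothetical stand-ins, not the backend's actual code; the only thing taken from the description above is that a store error fails closed while Ok(empty) is treated as a normal read.

```rust
use std::collections::HashSet;

/// Hypothetical outcome of a fail-closed label lookup (illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// The store errored, so the request is rejected (fail closed).
    Deny,
    /// The store answered, so the request proceeds with whatever labels came back.
    Allow { labels: HashSet<String> },
}

/// Hypothetical decision function: only an Err trips fail-closed.
/// An evicted or lost label set is indistinguishable from "never labeled".
pub fn decide(read: Result<HashSet<String>, String>) -> Decision {
    match read {
        Err(_) => Decision::Deny,
        Ok(labels) => Decision::Allow { labels },
    }
}
```

Under these assumptions, decide(Err(..)) yields Deny, but decide(Ok(HashSet::new())) yields Allow with no labels. That second result is exactly what a read after an eviction or a lost append would produce, and nothing in it looks like an error.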
Note: the operator runbook (docs/operations/valkey-session-store.md) previously claimed a CONFIG GET maxmemory-policy self-check already existed. It did not — that claim was corrected in PR #74. This issue tracks actually building it.
Proposal
Add a best-effort startup self-check that issues CONFIG GET maxmemory-policy, CONFIG GET appendonly, and CONFIG GET appendfsync, and emits a structured warning when the instance looks cache-shaped (eviction enabled, or AOF off / RDB-only). Mirror the existing session_store_ttl_unsound warning style.
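One possible shape for the classification step, as a sketch only. ValkeyConfigSnapshot, ConfigWarning, and classify are hypothetical names; the sketch assumes the three CONFIG GET replies have already been fetched and reduced to optional strings, and it leaves out the actual Valkey call and the structured-log emission.

```rust
/// Hypothetical holder for the three CONFIG GET replies.
/// None means the value could not be read (for example, the probe was skipped).
#[derive(Debug, Default, Clone)]
pub struct ValkeyConfigSnapshot {
    pub maxmemory_policy: Option<String>,
    pub appendonly: Option<String>,
    pub appendfsync: Option<String>,
}

/// Hypothetical warning kinds for a cache-shaped instance.
#[derive(Debug, PartialEq, Eq)]
pub enum ConfigWarning {
    /// maxmemory-policy is anything other than noeviction.
    EvictionEnabled { policy: String },
    /// appendonly is not "yes" (AOF off, possibly RDB-only).
    AofDisabled,
    /// AOF is on, but appendfsync is below the everysec floor.
    WeakFsync { appendfsync: String },
}

/// Classify a snapshot. Unknown (None) values produce no warning here,
/// because the check is best-effort and must not guess.
pub fn classify(snapshot: &ValkeyConfigSnapshot) -> Vec<ConfigWarning> {
    let mut warnings = Vec::new();

    if let Some(policy) = snapshot.maxmemory_policy.as_deref() {
        if policy != "noeviction" {
            warnings.push(ConfigWarning::EvictionEnabled {
                policy: policy.to_string(),
            });
        }
    }

    match snapshot.appendonly.as_deref() {
        Some("yes") => {
            // "always" is stricter than "everysec", so both meet the floor.
            if let Some(fsync) = snapshot.appendfsync.as_deref() {
                if fsync != "everysec" && fsync != "always" {
                    warnings.push(ConfigWarning::WeakFsync {
                        appendfsync: fsync.to_string(),
                    });
                }
            }
        }
        Some(_) => warnings.push(ConfigWarning::AofDisabled),
        None => {}
    }

    warnings
}
```

Keeping classification as a pure function over already-fetched values means it can be unit-tested without a running Valkey instance. Each returned warning would then be logged in the same style as session_store_ttl_unsound.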
Design constraint
The connection pool is currently lazy — build_pool does not dial Valkey, and connection failures surface on first request (where they correctly fail closed). A startup self-check requires dialing at config-load, which is a deliberate design change:
- It must stay best-effort: a self-check failure (unreachable, ACL lacks CONFIG|GET) must not fail config-load. It warns and proceeds, so the check never becomes a new availability dependency.
- The ACL minimum already grants +config|get (runbook §3), but the check must tolerate its absence gracefully.
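The best-effort constraint could be encoded in the return type, roughly as sketched below. ProbeError, SelfCheckOutcome, and run_self_check are hypothetical names, and the probe closure stands in for whatever code dials Valkey and issues CONFIG GET. The point of the sketch is that the function does not return a Result, so config-load has no error to propagate.

```rust
/// Hypothetical failure modes of the startup probe.
#[derive(Debug, PartialEq, Eq)]
pub enum ProbeError {
    /// Valkey could not be dialed at config-load.
    Unreachable(String),
    /// The ACL user lacks +config|get.
    PermissionDenied(String),
}

/// Hypothetical result of the self-check. There is deliberately no
/// error variant that the caller could turn into a config-load failure.
#[derive(Debug, PartialEq, Eq)]
pub enum SelfCheckOutcome {
    /// The probe succeeded; (setting, value) pairs are ready to classify.
    Verified(Vec<(String, String)>),
    /// The probe failed; the caller logs the reason as a warning and proceeds.
    Skipped { reason: String },
}

/// Run the probe and fold every failure into a Skipped outcome.
pub fn run_self_check<F>(probe: F) -> SelfCheckOutcome
where
    F: FnOnce() -> Result<Vec<(String, String)>, ProbeError>,
{
    match probe() {
        Ok(values) => SelfCheckOutcome::Verified(values),
        Err(ProbeError::Unreachable(detail)) => SelfCheckOutcome::Skipped {
            reason: format!("valkey unreachable at startup: {}", detail),
        },
        Err(ProbeError::PermissionDenied(detail)) => SelfCheckOutcome::Skipped {
            reason: format!("ACL does not permit CONFIG GET: {}", detail),
        },
    }
}
```

With this shape, a missing +config|get grant or an unreachable instance degrades to a logged warning, and the lazy pool still surfaces real connection failures on the first request.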
References
- R9 (noeviction operator contract), R19 (persistence/durability) in docs/brainstorms/valkey-session-store-requirements.md
- Runbook §2, §5 in docs/operations/valkey-session-store.md