Add a config-selectable, Valkey-backed SessionStore alongside the in-process MemorySessionStore, so session security labels (extensions.security.labels, monotonic taint driving information-flow authz) persist across restarts and are shared across gateway nodes. Fail-closed, primary-only reads, optional sliding TTL.
Plan: docs/plans/2026-06-17-001-feat-valkey-session-store-plan.md
Requirements: docs/brainstorms/valkey-session-store-requirements.md
Implementation units
- Make SessionStore trait fallible (Result + crate-local thiserror error); adapt MemorySessionStore + tests. (R4, R15)
- Propagate fail-closed through cmf_invoker (for_request, persist_session) + route_handler; append-error → Deny via post-persist_session continue_processing, with violation-merge precedence + distinguished alarm. (R4, R5, R18; AE1, AE6)
- SessionStoreFactory trait + config-selection seam (in-visitor store swap during visit_global; default Memory when absent). (R2, R3; AE3, AE5)
- New apl-session-valkey crate: config + internal connection layer (redis-rs + deadpool-redis over rustls); feature-gated, excluded from default-members. (R10, R11, R13, R14)
- ValkeySessionStore: atomic SADD+EXPIRE, SMEMBERS load, taint:v1:<sha256(session_id)> key, R5/R15 error mapping, TTL refresh fail-open, noeviction/TTL self-checks. (R1, R5, R6, R7, R8, R9, R15, R16, R17)
- ValkeySessionStoreFactory (kind: valkey) + feature-gated FFI wiring (mirrors apl-cedarling). (R2, R13; AE5)
- Container-backed integration tests (testcontainers valkey): cross-node union, TTL refresh, noeviction, ACL denial, fail-closed, decode-error; loud skip + CI env gate. (R12; AE2, AE4)
- Operator runbook: noeviction, least-privilege ACL, TLS/mTLS, TTL soundness rule, refresh-failure alarm, blast-radius (session-bearing only). (R8, R9, R10)
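As a rough illustration of the first unit, a fallible store trait with an in-memory implementation might look like the sketch below. The method names (`load_labels`, `append_labels`), the error variants, and the use of `BTreeSet<String>` for labels are assumptions made for this example, not the signatures in this PR. The PR derives its error with thiserror; the sketch implements the error traits by hand so it stays std-only.

```rust
use std::collections::{BTreeSet, HashMap};
use std::fmt;
use std::sync::Mutex;

/// Hypothetical crate-local error type (variant names are assumptions).
#[derive(Debug)]
pub enum SessionStoreError {
    Unavailable(String),
    Decode(String),
}

impl fmt::Display for SessionStoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Unavailable(msg) => write!(f, "session store unavailable: {}", msg),
            Self::Decode(msg) => write!(f, "session store decode error: {}", msg),
        }
    }
}

impl std::error::Error for SessionStoreError {}

/// Fallible store interface: every operation can report a store error,
/// which callers are expected to treat as fail-closed.
pub trait SessionStore {
    fn load_labels(&self, session_id: &str) -> Result<BTreeSet<String>, SessionStoreError>;
    fn append_labels(
        &self,
        session_id: &str,
        labels: &BTreeSet<String>,
    ) -> Result<(), SessionStoreError>;
}

/// In-process store; labels only ever grow (monotonic union).
#[derive(Default)]
pub struct MemorySessionStore {
    sessions: Mutex<HashMap<String, BTreeSet<String>>>,
}

impl SessionStore for MemorySessionStore {
    fn load_labels(&self, session_id: &str) -> Result<BTreeSet<String>, SessionStoreError> {
        let guard = self
            .sessions
            .lock()
            .map_err(|_| SessionStoreError::Unavailable("lock poisoned".to_string()))?;
        // An unknown session simply has no labels yet.
        Ok(guard.get(session_id).cloned().unwrap_or_default())
    }

    fn append_labels(
        &self,
        session_id: &str,
        labels: &BTreeSet<String>,
    ) -> Result<(), SessionStoreError> {
        let mut guard = self
            .sessions
            .lock()
            .map_err(|_| SessionStoreError::Unavailable("lock poisoned".to_string()))?;
        // Union the new labels into the existing set; nothing is ever removed.
        guard
            .entry(session_id.to_owned())
            .or_default()
            .extend(labels.iter().cloned());
        Ok(())
    }
}
```

The in-memory variant can only fail on a poisoned lock, but keeping the same `Result` signature lets a Valkey-backed store surface network and decode errors through the same interface.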
Key decisions
- Fail-closed on store errors; append-error fails the request closed uniformly with load-error (continue_processing computed after persist_session).
- Client: redis-rs 1.x + deadpool-redis 0.23, default-features=false, rustls via tokio-rustls-comp.
- Atomic union via server-side SADD (no client read-modify-write); pipe().atomic() SADD+EXPIRE in one round trip.
- No HMAC in v0; trust Valkey within the boundary (TLS/mTLS + least-privilege ACL + noeviction + network isolation).
- Committed timeout defaults: 250ms connect / 500ms command / 1 retry / breaker after 5 failures.
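To make the committed defaults concrete, here is a minimal std-only sketch of how they could be represented and applied. The struct and field names, the `Breaker` type, and `call_with_retry` are hypothetical. The breaker semantics are also an assumption: it opens after N consecutive failed calls, counts one failure per call after retries are exhausted, and resets on success. Recovery (half-open probing) and the actual enforcement of the connect/command timeouts are not modeled.

```rust
use std::time::Duration;

/// Defaults quoted from this PR description; field names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct ValkeyTimeouts {
    pub connect: Duration,
    pub command: Duration,
    pub retries: u32,
    pub breaker_threshold: u32,
}

impl Default for ValkeyTimeouts {
    fn default() -> Self {
        Self {
            connect: Duration::from_millis(250),
            command: Duration::from_millis(500),
            retries: 1,
            breaker_threshold: 5,
        }
    }
}

/// Assumed semantics: open after `threshold` consecutive failed calls.
pub struct Breaker {
    threshold: u32,
    consecutive_failures: u32,
}

impl Breaker {
    pub fn new(threshold: u32) -> Self {
        Self { threshold, consecutive_failures: 0 }
    }

    /// A success resets the counter; a failure increments it.
    pub fn record(&mut self, ok: bool) {
        if ok {
            self.consecutive_failures = 0;
        } else {
            self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        }
    }

    pub fn is_open(&self) -> bool {
        self.consecutive_failures >= self.threshold
    }
}

/// Runs `op` once plus up to `cfg.retries` extra attempts.
/// Returns `None` without calling `op` when the breaker is already open.
pub fn call_with_retry<T, E>(
    cfg: &ValkeyTimeouts,
    breaker: &mut Breaker,
    mut op: impl FnMut() -> Result<T, E>,
) -> Option<Result<T, E>> {
    if breaker.is_open() {
        return None;
    }
    let mut last = op();
    let mut attempts_left = cfg.retries;
    while last.is_err() && attempts_left > 0 {
        attempts_left -= 1;
        last = op();
    }
    // One breaker sample per call, taken after retries are exhausted.
    breaker.record(last.is_ok());
    Some(last)
}
```

Under these assumptions, a caller would map both `None` (breaker open) and `Some(Err(_))` to the same fail-closed outcome.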
Notable risks
- Trait Result change ripples to ~10 trait-method test files + the AplOptions struct-literal sites (U3).
- Availability tradeoff: a Valkey outage fail-closes session-bearing requests fleet-wide (anonymous traffic unaffected).
- noeviction and sliding-TTL refresh are operator/alarm-guarded, not client-enforced.
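The fail-closed behavior and its availability tradeoff can be sketched as follows. This is an illustration only: `handle_request`, `Decision`, and `StoreError` are hypothetical names, the store is modeled as two closures, and policy evaluation is elided (treated as allowing). The points it tries to show are that requests without a session never touch the store, and that the final decision is computed after the persist step, so an append error denies the request the same way a load error does.

```rust
#[derive(Debug, PartialEq)]
pub enum Decision {
    Allow,
    Deny,
}

/// Stand-in for any store failure (timeout, ACL denial, decode error, ...).
#[derive(Debug)]
pub struct StoreError;

/// Hypothetical request flow; `load` and `persist` stand in for the session store.
pub fn handle_request(
    session_id: Option<&str>,
    load: impl Fn(&str) -> Result<Vec<String>, StoreError>,
    persist: impl Fn(&str, &[String]) -> Result<(), StoreError>,
) -> Decision {
    // Anonymous traffic: no session, so the store is never consulted.
    let sid = match session_id {
        Some(sid) => sid,
        None => return Decision::Allow,
    };

    // Load error: fail closed.
    let labels = match load(sid) {
        Ok(labels) => labels,
        Err(_) => return Decision::Deny,
    };

    // Policy evaluation elided; assume it allows and leaves the labels unchanged.

    // The decision is computed only after persisting, so an append error
    // denies the request just like a load error.
    let continue_processing = persist(sid, &labels).is_ok();
    if continue_processing {
        Decision::Allow
    } else {
        Decision::Deny
    }
}
```

With this shape, a store outage turns every session-bearing request into a deny while requests with no session id proceed as before.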