[FEATURE]: Valkey-backed SessionStore (cross-node / cross-restart session labels) #73

@araujof

Description

Add a config-selectable, Valkey-backed SessionStore alongside the in-process MemorySessionStore, so that session security labels (extensions.security.labels, the monotonic taint that drives information-flow authorization) persist across restarts and are shared across gateway nodes. The store is fail-closed, reads only from the primary, and supports an optional sliding TTL.

Plan: docs/plans/2026-06-17-001-feat-valkey-session-store-plan.md
Requirements: docs/brainstorms/valkey-session-store-requirements.md

Implementation units

  • Make SessionStore trait fallible (Result + crate-local thiserror error); adapt MemorySessionStore + tests. (R4, R15)
  • Propagate fail-closed through cmf_invoker (for_request, persist_session) + route_handler; append-error → Deny via post-persist_session continue_processing, with violation-merge precedence + distinguished alarm. (R4, R5, R18; AE1, AE6)
  • SessionStoreFactory trait + config-selection seam (in-visitor store swap during visit_global; default Memory when absent). (R2, R3; AE3, AE5)
  • New apl-session-valkey crate: config + internal connection layer (redis-rs + deadpool-redis over rustls); feature-gated, excluded from default-members. (R10, R11, R13, R14)
  • ValkeySessionStore: atomic SADD+EXPIRE, SMEMBERS load, taint:v1:<sha256(session_id)> key, R5/R15 error mapping, TTL refresh fail-open, noeviction/TTL self-checks. (R1, R5, R6, R7, R8, R9, R15, R16, R17)
  • ValkeySessionStoreFactory (kind: valkey) + feature-gated FFI wiring (mirrors apl-cedarling). (R2, R13; AE5)
  • Container-backed integration tests (testcontainers valkey): cross-node union, TTL refresh, noeviction, ACL denial, fail-closed, decode-error; loud skip + CI env gate. (R12; AE2, AE4)
  • Operator runbook: noeviction, least-privilege ACL, TLS/mTLS, TTL soundness rule, refresh-failure alarm, blast-radius (session-bearing only). (R8, R9, R10)
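
To make the first unit concrete, here is a minimal sketch of what a fallible SessionStore with an in-memory implementation might look like. Everything in it is an assumption for illustration: the method names `load` and `append`, their signatures, the `SessionStoreError` variants, and the synchronous shape (the real trait may well be async). The plan calls for a thiserror-derived error; the sketch hand-writes `Display` only to stay dependency-free. `append` is a set union, which mirrors the monotonic, SADD-style semantics described above.

```rust
use std::collections::{BTreeSet, HashMap};
use std::fmt;
use std::sync::Mutex;

/// Hypothetical crate-local error; the variants are illustrative only.
#[derive(Debug, PartialEq, Eq)]
pub enum SessionStoreError {
    /// The backend could not be reached or did not answer in time.
    Unavailable(String),
    /// The backend returned data that could not be decoded.
    Decode(String),
}

impl fmt::Display for SessionStoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Unavailable(msg) => write!(f, "session store unavailable: {msg}"),
            Self::Decode(msg) => write!(f, "session store decode error: {msg}"),
        }
    }
}

impl std::error::Error for SessionStoreError {}

/// Assumed shape of the fallible trait: every operation returns a Result,
/// so callers are forced to decide what a store failure means.
pub trait SessionStore: Send + Sync {
    fn load(&self, session_id: &str) -> Result<BTreeSet<String>, SessionStoreError>;
    fn append(&self, session_id: &str, labels: &BTreeSet<String>)
        -> Result<(), SessionStoreError>;
}

/// In-process store: labels only ever grow (monotonic union).
#[derive(Default)]
pub struct MemorySessionStore {
    sessions: Mutex<HashMap<String, BTreeSet<String>>>,
}

impl SessionStore for MemorySessionStore {
    fn load(&self, session_id: &str) -> Result<BTreeSet<String>, SessionStoreError> {
        let sessions = self
            .sessions
            .lock()
            .map_err(|_| SessionStoreError::Unavailable("lock poisoned".into()))?;
        // An unknown session simply has no labels yet.
        Ok(sessions.get(session_id).cloned().unwrap_or_default())
    }

    fn append(
        &self,
        session_id: &str,
        labels: &BTreeSet<String>,
    ) -> Result<(), SessionStoreError> {
        let mut sessions = self
            .sessions
            .lock()
            .map_err(|_| SessionStoreError::Unavailable("lock poisoned".into()))?;
        // Union into the existing set; nothing is ever removed.
        sessions
            .entry(session_id.to_owned())
            .or_default()
            .extend(labels.iter().cloned());
        Ok(())
    }
}
```

A Valkey-backed implementation would presumably satisfy the same trait, mapping connection and timeout failures to one error variant and malformed members to another, so that cmf_invoker can treat both stores uniformly.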

Key decisions

  • Fail-closed on store errors; an append error fails the request closed in the same way as a load error (continue_processing is computed after persist_session).
  • Client: redis-rs 1.x + deadpool-redis 0.23, default-features=false, rustls via tokio-rustls-comp.
  • Atomic union via server-side SADD (no client read-modify-write); pipe().atomic() SADD+EXPIRE in one round trip.
  • No HMAC in v0: Valkey is trusted within the boundary (TLS/mTLS + least-privilege ACL + noeviction + network isolation).
  • Committed timeout defaults: 250ms connect / 500ms command / 1 retry / breaker after 5 failures.
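
The fail-closed rule and the committed defaults above can be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: the names `ValkeyTimeouts`, `Decision`, the free function `continue_processing`, and `Breaker` are hypothetical, the breaker is assumed to count consecutive failures (the issue only says "after 5 failures"), and a policy violation is assumed to take precedence over a store error when both occur.

```rust
use std::time::Duration;

/// Hypothetical config struct carrying the defaults listed in the issue.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ValkeyTimeouts {
    pub connect: Duration,
    pub command: Duration,
    pub retries: u32,
    pub breaker_threshold: u32,
}

impl Default for ValkeyTimeouts {
    fn default() -> Self {
        Self {
            connect: Duration::from_millis(250),
            command: Duration::from_millis(500),
            retries: 1,
            breaker_threshold: 5,
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny { reason: &'static str },
}

/// Computed only after the persist step has run, so an append failure
/// can still turn an otherwise allowed request into a Deny.
pub fn continue_processing(policy_allows: bool, persist: Result<(), String>) -> Decision {
    match (policy_allows, persist) {
        // Assumption: an existing violation wins over a store error.
        (false, _) => Decision::Deny { reason: "policy violation" },
        // Fail closed: the labels were not durably recorded.
        (true, Err(_)) => Decision::Deny { reason: "session store append failed" },
        (true, Ok(())) => Decision::Allow,
    }
}

/// Minimal breaker; assumes "5 failures" means 5 consecutive failures.
pub struct Breaker {
    threshold: u32,
    consecutive_failures: u32,
}

impl Breaker {
    pub fn new(threshold: u32) -> Self {
        Self { threshold, consecutive_failures: 0 }
    }

    /// Record one command outcome; any success resets the count.
    pub fn record(&mut self, ok: bool) {
        self.consecutive_failures = if ok {
            0
        } else {
            self.consecutive_failures.saturating_add(1)
        };
    }

    pub fn is_open(&self) -> bool {
        self.consecutive_failures >= self.threshold
    }
}
```

With these defaults a single failing command can cost up to two 500ms attempts before the request is denied, and the breaker opens on the fifth failure in a row. A real breaker would also need a half-open or cool-down state, which the sketch omits.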

Notable risks

  • Trait Result change ripples to ~10 trait-method test files + the AplOptions struct-literal sites (U3).
  • Availability tradeoff: a Valkey outage fail-closes session-bearing requests fleet-wide (anonymous traffic unaffected).
  • noeviction and sliding-TTL refresh are operator/alarm-guarded, not client-enforced.

Metadata

Status
In progress
