feat: add ExecApprovalsStore write path and wire coordinator side effects#526
feat: add ExecApprovalsStore write path and wire coordinator side effects#526AlexAlves87 wants to merge 7 commits into
Conversation
|
Codex review: needs maintainer review before merge. Reviewed June 7, 2026, 8:36 PM ET / 00:36 UTC. Summary Reproducibility: not applicable. this is a feature PR rather than a bug report with a current-main failure path. The relevant proof is the filesystem-backed coordinator/store runtime test shown in the PR body. Review metrics: 3 noteworthy metrics.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Merge only after maintainers explicitly accept the exec-approvals metadata boundary and exact-head validation is clean; otherwise narrow the persisted metadata before landing the write path. Do we have a high-confidence way to reproduce the issue? Not applicable: this is a feature PR rather than a bug report with a current-main failure path. The relevant proof is the filesystem-backed coordinator/store runtime test shown in the PR body. Is this the best way to solve the issue? Unclear: the staged coordinator/store slice is coherent and avoids command-text persistence, but it is the best merge path only if maintainers accept the remaining persisted metadata boundary. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 25c11dd07f09. Label changesLabel justifications:
Evidence reviewedSecurity concerns:
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
ClawSweeper PR egg ✨ Hatched: 🌱 uncommon Clockwork Patch Peep Hatch commandComment Hatchability rules:
Rarity: 🌱 uncommon. What is this egg doing here?
|
|
@clawsweeper re-review |
|
🦞👀 Command router queued. I will update this comment with the next step. |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
First-pass triage: holding this one rather than merging. A dual-model review found enough security-boundary/correctness risk that I don’t think it clears the quick-merge bar yet:
Suggested follow-up: avoid persisting full command text (or store only a redacted/structured summary), and wrap approval metadata side effects so they log/fail-soft without changing the approval decision. |
|
Both findings addressed in the latest commit (90b36db): 1. lastUsedCommand removed. The field is gone from 2. Side effects isolated. Each side effect ( Co-Authored-By: Claude Sonnet 4.6 noreply@anthropic.com |
…ects (PR8)
Implements the store write path (AddAllowlistEntryAsync, RecordAllowlistUseAsync)
and wires the side-effect calls into ExecApprovalsCoordinator.
Side effects fire strictly after the final allow decision is confirmed (both
pass1 pre-approved and post-pass2 branches). Best-effort: never throw; any
I/O failure degrades to a logged warning. UpdateFileAsync refuses to write
a malformed file. SemaphoreSlim serializes intra-process writes.
Pattern validation is non-empty only, matching macOS parity. New entries
carry {id, pattern} only — lastUsedAt is absent on creation and stamped by
RecordAllowlistUseAsync on first use (macOS addAllowlistEntry parity).
Rail 19 preserved: coordinator not referenced in any production src/ file.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Strip rail codes, PR numbers, research doc references, D-labels, CVE/ADR tags, and other planning labels from all ExecApprovals source files. No logic changed — comments only. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ce and lastUsed recording End-to-end coordinator/store test using real filesystem I/O. Surfaces the on-disk exec-approvals.json content at three points (initial, post-AllowAlways, post-allowlist-hit) via ITestOutputHelper so the JSON is visible under `dotnet test ... --logger "console;verbosity=detailed"`. Demonstrates both side-effect paths: AllowAlways persists a new entry, and a later allowlist hit records lastUsed* metadata against the same entry id (dedup). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ResolveReadOnly merges entries from agents["*"] into the resolved allowlist for any concrete agent, so a hit can be authorized by the wildcard bucket. RecordAllowlistUseAsync was only searching the concrete agent bucket, so wildcard-authorized executions never accumulated lastUsed* metadata. The method now iterates both the concrete agent bucket and "*", updating metadata wherever the pattern matches. If agentId is already "*", only the wildcard bucket is searched (no double-pass). Tests added: - Store: wildcard-only bucket hit updates metadata. - Store: same pattern in both buckets updates both entries. - Coordinator: end-to-end regression with allowlist living under "*". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ation
lastUsedCommand persisted the full command text in exec-approvals.json,
leaking tokens or secrets embedded in command arguments. Remove the field
from ExecAllowlistEntry and the command parameter from RecordAllowlistUseAsync;
lastUsedAt and lastResolvedPath provide sufficient operational metadata.
Approval-store side effects (PersistAllowlistEntriesAsync, RecordAllowlistUsageAsync)
were unguarded, so an unexpected exception could cause an already-approved command
to be reported as InternalError. Each side effect in both the pre-approved (pass1)
and post-prompt (step 8) paths is now wrapped in its own best-effort try/catch so
a failure in one does not skip the other and never flips an allow to a deny.
Add DoesNotContain("lastUsedCommand") assertion to the store round-trip test
as a regression guard against accidental reintroduction of the field.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
df28407 to
ad4021d
Compare
What
Adds the write path to
ExecApprovalsStoreand wires the two side-effect calls intoExecApprovalsCoordinator, closing the persistence loop for the exec approvals V2 pipeline.Store — new public API:
AddAllowlistEntryAsync(agentId, pattern)— persists a new allowlist entry after an AllowAlways prompt decision. Deduplicates on write (OrdinalIgnoreCase). Returnstrueif the entry is present after the call (added or already there),falseon empty pattern or I/O failure. New entries carry{id, pattern};lastUsedAtis stamped later byRecordAllowlistUseAsyncon first successful use (matches macOS parity).RecordAllowlistUseAsync(agentId, pattern, resolvedPath)— updateslastUsedAtandlastResolvedPathfor every matching entry after a final allow. No command text is persisted. Returnsfalseif pattern not found or on I/O failure.Store — private infrastructure:
UpdateFileAsync(mutate)— load → mutate → atomic save, serialized by the existingSemaphoreSlim. Never throws. Refuses to overwrite a malformed file. Handles transientIOExceptionon the atomic move as a degraded path: logsWarn, no retry.Coordinator — side effects wired:
RecordAllowlistUsageAsyncfires on both allow exit points: the pass1 pre-approved branch and the post-pass2 branch. Both are required to cover the common allowlist-satisfied case.PersistAllowlistEntriesAsyncfires only after pass2 = Allow andfollowupDecision == AllowAlways, strictly outside the_promptLockblock.try/catchso a failure in one does not skip the other and cannot flip an already-decided allow intoInternalError.Not wired in production yet: the coordinator is still not referenced in any production
src/file. TheProductionWiring_CoordinatorNotReferencedInSrctest remains green.Design notes
Side effects fire strictly after the final allow decision is confirmed — not before the second evaluator pass. This is a deliberate structural safety choice: the guarantee is structural rather than relying on proof that
Evaluate(context, AllowAlways)always produces Allow.Pattern validation in
AddAllowlistEntryAsyncis non-empty only, matching macOS parity. Basename-only patterns are inert at match time but not rejected at persist time.No command text reaches disk.
lastUsedAt(Unix ms) andlastResolvedPathprovide sufficient operational metadata without exposing tokens or secrets embedded in command arguments.Testing
149 ExecApprovals tests passing, 0 failures. Build: 0 warnings, 0 errors (
TreatWarningsAsErrorsclean). Tray tests: 961/961.Store tests cover: success paths, dedup, not-found, malformed-file refusal, I/O failure degradation on both mutators, concurrency (5 concurrent writes produce a single entry), round-trip JSON validation, and a regression guard asserting
lastUsedCommandis absent from the persisted file.Coordinator tests cover: AllowAlways persistence, non-allowlist security guard, duplicate pattern dedup, allowlist usage recording, allowlist-not-satisfied guard, pass1 pre-approved path, and fallback path with allowlist satisfied.
Real behavior proof
End-to-end coordinator/store runtime proof using real filesystem I/O (test
RuntimeProof_AllowAlways_PersistsAndRecordsLastUsedintests/OpenClaw.Shared.Tests/ExecApprovalsCoordinatorTests.cs).Scope clarification: this is slice runtime proof, not full production runtime proof. The coordinator is intentionally not wired in production yet. A follow-up production wiring slice connects the coordinator, and the WinUI prompt dialog is a separate slice after that. UI-driven proof against the live app is only meaningful once those land.
Reproduce locally:
Captured output — current head (ad4021d),
lastUsedCommandabsent:The entry
idis identical between the two invocations, proving on-disk dedup.lastUsedAtandlastResolvedPathappear only after the second invocation, provingRecordAllowlistUseAsyncfires on the allowlist-hit path. NolastUsedCommandfield is present at any stage.Co-Authored-By: Claude Sonnet 4.6 noreply@anthropic.com