feat(secops): persist operator decisions and tighten edge-case prompt by oscarvalenzuelab · Pull Request #135 · AInvirion/ctrlrelay

oscarvalenzuelab · 2026-05-26T05:27:27Z

Why

The 6am secops sweep was waking the operator with the same major-version questions every day because automation_decisions had a schema but no producer and no consumer — a sweep that asked "approve PR #60?" on Monday asked the same thing on Tuesday regardless of how it was answered. On top of that, the agent invented options outside the policy ladder when the gate couldn't fire (no CI configured, or CI failing on a pre-existing unrelated check like a transitive idna CVE in the lockfile blocking every Dependabot PR).

Example BLOCKED row from production that triggered this fix:

"Want me to batch the 4 patch + 4 minor PRs into a single review PR like before, or proceed merging the patch ones individually despite no CI?"

That option set was never in the prompt. The agent freelanced it because the policy ladder went silent on the "no CI" case.

What

Wires the persistence layer end-to-end and pins three prompt edge cases the agent was reliably fumbling.

Persistence (both halves of the loop now exist)

After the BLOCKED loop in run_secops_all receives an answer (and in resume_secops_from_pending for the out-of-band sweeper path), regex extracts every PR# from the question and records one automation_decisions row per PR# keyed on (repo, "dependabot_pr", "#N") with the operator's verbatim Telegram answer. Regex handles PR #60, bare #60, dedupes, and writes nothing when the question has no PR# (CodeQL-only questions, etc).
Before each sweep, run_secops_all pulls the last 30 days of decisions for that repo and threads them via ctx.extra['prior_decisions'] into _build_prompt, which renders a ## Prior operator decisions (last 30 days) block. Carries instructions to act on prior answers unless circumstances changed materially (different version, CI state flipped).

Storing the verbatim answer rather than a canonical yes/no means the agent can distinguish "approve" from "approve once we bump idna in the lockfile" from "skip until next quarter" without us trying to model operator nuance upstream.

Edge cases the prompt now covers

No CI configured (no .github/workflows/*.yml): treat ALL Dependabot PRs as ASK regardless of tier and signal BLOCKED ONCE with a consolidated question per repo (not one per PR).
CI failing on a check unrelated to the PR's package: signal BLOCKED with a one-line question naming the underlying issue and a root-cause hint pulled from gh run view --log-failed. Previously the agent reported the failure and moved on, leaving patch PRs stuck for weeks.
Scope discipline: three legitimate exits (merge / leave / BLOCKED) with an explicit prohibition on batching PRs into a review PR, opening tracking issues, or proposing manual reviews.

Test plan

17 new unit tests covering state helpers (per-repo scoping, since_ts window, newest-first ordering), PR# regex (all observed variants + dedupe + no-PR case), decision capture in both the in-loop BLOCKED path and the resume-from-pending path, prompt injection of prior decisions, and the three new prompt directives
Targeted test suites green: tests/test_secops_pipeline.py, tests/test_state.py, tests/test_config.py, tests/test_cli_secops.py — 109/109
Full suite: 664/665 (one pre-existing tests/test_transport.py::test_ask_waits_past_intermediate_ack_and_returns_answer flake that also hangs on main and is unrelated to this change)
After merge: pipx install --force ~/Projects/ainvirion/ctrlrelay + launchctl kickstart -k gui/501/com.ctrlrelay.ctrlrelay-poller and verify sqlite3 ~/.ctrlrelay/state.db "SELECT COUNT(*) FROM automation_decisions" grows after the next BLOCKED answer

Migration notes

Schema: no migration needed — automation_decisions already exists in v0.5.0.
Behavior: prior ask-policy answers given before this change are not retroactively populated; the rolling window starts filling on the first sweep after deploy. Expect 1-2 sweeps of "asks about the same PRs you've seen before" while operators re-answer for the persistence layer to learn them.

The 6am sweep was waking the operator with the same major-version questions every day because automation_decisions had a schema with no producer or consumer — a sweep that asked "approve PR #60?" yesterday asked the same thing today regardless of how it was answered. On top of that, the agent invented options outside the policy ladder when the policy gate couldn't fire (no CI configured, or CI failing on a pre-existing unrelated check). Wires the persistence layer end-to-end and pins three prompt edge cases the agent was reliably fumbling. ### Persistence Both halves of the loop now exist: - After the BLOCKED loop in run_secops_all receives an answer (and in resume_secops_from_pending for the out-of-band sweeper path), regex extracts every PR# from the question and records one automation_decisions row per PR# keyed on (repo, "dependabot_pr", "#N") with the operator's verbatim Telegram answer. Regex handles PR #60, bare #60, dedupes, and writes nothing when the question has no PR# (CodeQL-only questions, etc). - Before each sweep, run_secops_all pulls the last 30 days of decisions for that repo and threads them via ctx.extra['prior_decisions'] into _build_prompt, which renders a "Prior operator decisions (last 30 days)" block. Carries instructions to act on prior answers unless circumstances changed materially (different version, CI state flipped). The verbatim- answer approach avoids semantic parsing of Telegram replies — the agent already interprets free-form text well. Storing the verbatim answer rather than a canonical yes/no means the agent can distinguish "approve" from "approve once we bump idna in the lockfile" from "skip until next quarter" without us trying to model operator nuance upstream. ### Edge cases the prompt now covers - No CI configured (no .github/workflows/*.yml): the auto-merge-with- passing-CI gate cannot evaluate, so previously the agent freelanced. Prompt now: treat ALL Dependabot PRs as ASK regardless of tier and signal BLOCKED ONCE with a consolidated question per repo (not one per PR). - CI failing on a check unrelated to the PR's package (pip-audit flagging a transitive idna CVE while the PR bumps boto3): the agent was reporting the failure and moving on, leaving patch PRs stuck for weeks. Prompt now requires BLOCKED with a one-line question naming the underlying issue and a root-cause hint pulled from gh run view --log-failed. - Scope discipline: three legitimate exits (merge / leave / BLOCKED) with an explicit prohibition on batching PRs into a review PR, opening tracking issues, or proposing manual reviews. Cuts off the freelance options observed in production sweeps ("Want me to batch the patch + minor PRs into a single review PR like before?"). ### Tests 17 new tests cover: the state helpers (per-repo scoping, since_ts window, newest-first ordering), the PR# regex (all observed variants + dedupe + no-PR case), decision capture in both the in-loop BLOCKED path and the resume-from-pending path, prompt injection of prior decisions, and the three new prompt directives. 109 of the targeted test files green; full suite is 664/665 with one pre-existing test_transport.py flake that also hangs on main code.

Two correctness gaps in the persistence layer surfaced during review: 1. list_recent_automation_decisions filtered only by repo, so a `codeql_alert` (or any future `operation`) row in the same repo would render into the Dependabot-PR prompt as a "prior PR decision". A "suppress this CVE" answer for a CodeQL alert could end up being applied to a Dependabot PR. Added an optional `operation` kwarg, default None (full audit view preserved for debugging / future audit UI), and the secops pipeline now passes `operation="dependabot_pr"` at the read site. 2. The prompt rule said "act on prior decision only when circumstances are unchanged (different version bump, CI state flipped)" but the rendered entry only showed item_id + answer. The agent literally could not detect a force-pushed PR that swapped the version under the same PR number, so the rule was either unenforceable or forced re-asking every time (defeating the persistence layer). We were already storing the question snippet as `context` at write time but never reading it back. Now rendered inline as `(prior question: ...)` with a 240-char cap, and the rule directive explicitly points the agent at the snippet to compare. Older rows without context degrade to the item+answer form (not `prior question: None`). 3 new tests cover both gaps + the missing-context degradation path. Targeted suite 73/73.

oscarvalenzuelab added 2 commits May 25, 2026 22:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(secops): persist operator decisions and tighten edge-case prompt#135

feat(secops): persist operator decisions and tighten edge-case prompt#135
oscarvalenzuelab wants to merge 2 commits into
mainfrom
feat/secops-persist-decisions-and-edge-cases

oscarvalenzuelab commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

oscarvalenzuelab commented May 26, 2026

Why

What

Persistence (both halves of the loop now exist)

Edge cases the prompt now covers

Test plan

Migration notes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant