feat(secops): persist operator decisions and tighten edge-case prompt#135
Open
oscarvalenzuelab wants to merge 2 commits into
Open
feat(secops): persist operator decisions and tighten edge-case prompt#135oscarvalenzuelab wants to merge 2 commits into
oscarvalenzuelab wants to merge 2 commits into
Conversation
The 6am sweep was waking the operator with the same major-version questions every day because automation_decisions had a schema with no producer or consumer — a sweep that asked "approve PR #60?" yesterday asked the same thing today regardless of how it was answered. On top of that, the agent invented options outside the policy ladder when the policy gate couldn't fire (no CI configured, or CI failing on a pre-existing unrelated check). Wires the persistence layer end-to-end and pins three prompt edge cases the agent was reliably fumbling. ### Persistence Both halves of the loop now exist: - After the BLOCKED loop in run_secops_all receives an answer (and in resume_secops_from_pending for the out-of-band sweeper path), regex extracts every PR# from the question and records one automation_decisions row per PR# keyed on (repo, "dependabot_pr", "#N") with the operator's verbatim Telegram answer. Regex handles PR #60, bare #60, dedupes, and writes nothing when the question has no PR# (CodeQL-only questions, etc). - Before each sweep, run_secops_all pulls the last 30 days of decisions for that repo and threads them via ctx.extra['prior_decisions'] into _build_prompt, which renders a "Prior operator decisions (last 30 days)" block. Carries instructions to act on prior answers unless circumstances changed materially (different version, CI state flipped). The verbatim- answer approach avoids semantic parsing of Telegram replies — the agent already interprets free-form text well. Storing the verbatim answer rather than a canonical yes/no means the agent can distinguish "approve" from "approve once we bump idna in the lockfile" from "skip until next quarter" without us trying to model operator nuance upstream. ### Edge cases the prompt now covers - No CI configured (no .github/workflows/*.yml): the auto-merge-with- passing-CI gate cannot evaluate, so previously the agent freelanced. Prompt now: treat ALL Dependabot PRs as ASK regardless of tier and signal BLOCKED ONCE with a consolidated question per repo (not one per PR). - CI failing on a check unrelated to the PR's package (pip-audit flagging a transitive idna CVE while the PR bumps boto3): the agent was reporting the failure and moving on, leaving patch PRs stuck for weeks. Prompt now requires BLOCKED with a one-line question naming the underlying issue and a root-cause hint pulled from gh run view --log-failed. - Scope discipline: three legitimate exits (merge / leave / BLOCKED) with an explicit prohibition on batching PRs into a review PR, opening tracking issues, or proposing manual reviews. Cuts off the freelance options observed in production sweeps ("Want me to batch the patch + minor PRs into a single review PR like before?"). ### Tests 17 new tests cover: the state helpers (per-repo scoping, since_ts window, newest-first ordering), the PR# regex (all observed variants + dedupe + no-PR case), decision capture in both the in-loop BLOCKED path and the resume-from-pending path, prompt injection of prior decisions, and the three new prompt directives. 109 of the targeted test files green; full suite is 664/665 with one pre-existing test_transport.py flake that also hangs on main code.
Two correctness gaps in the persistence layer surfaced during review: 1. list_recent_automation_decisions filtered only by repo, so a `codeql_alert` (or any future `operation`) row in the same repo would render into the Dependabot-PR prompt as a "prior PR decision". A "suppress this CVE" answer for a CodeQL alert could end up being applied to a Dependabot PR. Added an optional `operation` kwarg, default None (full audit view preserved for debugging / future audit UI), and the secops pipeline now passes `operation="dependabot_pr"` at the read site. 2. The prompt rule said "act on prior decision only when circumstances are unchanged (different version bump, CI state flipped)" but the rendered entry only showed item_id + answer. The agent literally could not detect a force-pushed PR that swapped the version under the same PR number, so the rule was either unenforceable or forced re-asking every time (defeating the persistence layer). We were already storing the question snippet as `context` at write time but never reading it back. Now rendered inline as `(prior question: ...)` with a 240-char cap, and the rule directive explicitly points the agent at the snippet to compare. Older rows without context degrade to the item+answer form (not `prior question: None`). 3 new tests cover both gaps + the missing-context degradation path. Targeted suite 73/73.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
The 6am secops sweep was waking the operator with the same major-version questions every day because
automation_decisionshad a schema but no producer and no consumer — a sweep that asked "approve PR #60?" on Monday asked the same thing on Tuesday regardless of how it was answered. On top of that, the agent invented options outside the policy ladder when the gate couldn't fire (no CI configured, or CI failing on a pre-existing unrelated check like a transitiveidnaCVE in the lockfile blocking every Dependabot PR).Example BLOCKED row from production that triggered this fix:
That option set was never in the prompt. The agent freelanced it because the policy ladder went silent on the "no CI" case.
What
Wires the persistence layer end-to-end and pins three prompt edge cases the agent was reliably fumbling.
Persistence (both halves of the loop now exist)
run_secops_allreceives an answer (and inresume_secops_from_pendingfor the out-of-band sweeper path), regex extracts every PR# from the question and records oneautomation_decisionsrow per PR# keyed on(repo, "dependabot_pr", "#N")with the operator's verbatim Telegram answer. Regex handlesPR #60, bare#60, dedupes, and writes nothing when the question has no PR# (CodeQL-only questions, etc).run_secops_allpulls the last 30 days of decisions for that repo and threads them viactx.extra['prior_decisions']into_build_prompt, which renders a## Prior operator decisions (last 30 days)block. Carries instructions to act on prior answers unless circumstances changed materially (different version, CI state flipped).Storing the verbatim answer rather than a canonical yes/no means the agent can distinguish "approve" from "approve once we bump idna in the lockfile" from "skip until next quarter" without us trying to model operator nuance upstream.
Edge cases the prompt now covers
.github/workflows/*.yml): treat ALL Dependabot PRs as ASK regardless of tier and signal BLOCKED ONCE with a consolidated question per repo (not one per PR).gh run view --log-failed. Previously the agent reported the failure and moved on, leaving patch PRs stuck for weeks.Test plan
since_tswindow, newest-first ordering), PR# regex (all observed variants + dedupe + no-PR case), decision capture in both the in-loop BLOCKED path and the resume-from-pending path, prompt injection of prior decisions, and the three new prompt directivestests/test_secops_pipeline.py,tests/test_state.py,tests/test_config.py,tests/test_cli_secops.py— 109/109tests/test_transport.py::test_ask_waits_past_intermediate_ack_and_returns_answerflake that also hangs onmainand is unrelated to this change)pipx install --force ~/Projects/ainvirion/ctrlrelay+launchctl kickstart -k gui/501/com.ctrlrelay.ctrlrelay-pollerand verifysqlite3 ~/.ctrlrelay/state.db "SELECT COUNT(*) FROM automation_decisions"grows after the next BLOCKED answerMigration notes
automation_decisionsalready exists in v0.5.0.ask-policy answers given before this change are not retroactively populated; the rolling window starts filling on the first sweep after deploy. Expect 1-2 sweeps of "asks about the same PRs you've seen before" while operators re-answer for the persistence layer to learn them.