Skip to content

feat(secops): persist operator decisions and tighten edge-case prompt#135

Open
oscarvalenzuelab wants to merge 2 commits into
mainfrom
feat/secops-persist-decisions-and-edge-cases
Open

feat(secops): persist operator decisions and tighten edge-case prompt#135
oscarvalenzuelab wants to merge 2 commits into
mainfrom
feat/secops-persist-decisions-and-edge-cases

Conversation

@oscarvalenzuelab
Copy link
Copy Markdown
Collaborator

Why

The 6am secops sweep was waking the operator with the same major-version questions every day because automation_decisions had a schema but no producer and no consumer — a sweep that asked "approve PR #60?" on Monday asked the same thing on Tuesday regardless of how it was answered. On top of that, the agent invented options outside the policy ladder when the gate couldn't fire (no CI configured, or CI failing on a pre-existing unrelated check like a transitive idna CVE in the lockfile blocking every Dependabot PR).

Example BLOCKED row from production that triggered this fix:

"Want me to batch the 4 patch + 4 minor PRs into a single review PR like before, or proceed merging the patch ones individually despite no CI?"

That option set was never in the prompt. The agent freelanced it because the policy ladder went silent on the "no CI" case.

What

Wires the persistence layer end-to-end and pins three prompt edge cases the agent was reliably fumbling.

Persistence (both halves of the loop now exist)

  • After the BLOCKED loop in run_secops_all receives an answer (and in resume_secops_from_pending for the out-of-band sweeper path), regex extracts every PR# from the question and records one automation_decisions row per PR# keyed on (repo, "dependabot_pr", "#N") with the operator's verbatim Telegram answer. Regex handles PR #60, bare #60, dedupes, and writes nothing when the question has no PR# (CodeQL-only questions, etc).
  • Before each sweep, run_secops_all pulls the last 30 days of decisions for that repo and threads them via ctx.extra['prior_decisions'] into _build_prompt, which renders a ## Prior operator decisions (last 30 days) block. Carries instructions to act on prior answers unless circumstances changed materially (different version, CI state flipped).

Storing the verbatim answer rather than a canonical yes/no means the agent can distinguish "approve" from "approve once we bump idna in the lockfile" from "skip until next quarter" without us trying to model operator nuance upstream.

Edge cases the prompt now covers

  • No CI configured (no .github/workflows/*.yml): treat ALL Dependabot PRs as ASK regardless of tier and signal BLOCKED ONCE with a consolidated question per repo (not one per PR).
  • CI failing on a check unrelated to the PR's package: signal BLOCKED with a one-line question naming the underlying issue and a root-cause hint pulled from gh run view --log-failed. Previously the agent reported the failure and moved on, leaving patch PRs stuck for weeks.
  • Scope discipline: three legitimate exits (merge / leave / BLOCKED) with an explicit prohibition on batching PRs into a review PR, opening tracking issues, or proposing manual reviews.

Test plan

  • 17 new unit tests covering state helpers (per-repo scoping, since_ts window, newest-first ordering), PR# regex (all observed variants + dedupe + no-PR case), decision capture in both the in-loop BLOCKED path and the resume-from-pending path, prompt injection of prior decisions, and the three new prompt directives
  • Targeted test suites green: tests/test_secops_pipeline.py, tests/test_state.py, tests/test_config.py, tests/test_cli_secops.py — 109/109
  • Full suite: 664/665 (one pre-existing tests/test_transport.py::test_ask_waits_past_intermediate_ack_and_returns_answer flake that also hangs on main and is unrelated to this change)
  • After merge: pipx install --force ~/Projects/ainvirion/ctrlrelay + launchctl kickstart -k gui/501/com.ctrlrelay.ctrlrelay-poller and verify sqlite3 ~/.ctrlrelay/state.db "SELECT COUNT(*) FROM automation_decisions" grows after the next BLOCKED answer

Migration notes

  • Schema: no migration needed — automation_decisions already exists in v0.5.0.
  • Behavior: prior ask-policy answers given before this change are not retroactively populated; the rolling window starts filling on the first sweep after deploy. Expect 1-2 sweeps of "asks about the same PRs you've seen before" while operators re-answer for the persistence layer to learn them.

The 6am sweep was waking the operator with the same major-version
questions every day because automation_decisions had a schema with no
producer or consumer — a sweep that asked "approve PR #60?" yesterday
asked the same thing today regardless of how it was answered. On top
of that, the agent invented options outside the policy ladder when
the policy gate couldn't fire (no CI configured, or CI failing on a
pre-existing unrelated check).

Wires the persistence layer end-to-end and pins three prompt edge
cases the agent was reliably fumbling.

### Persistence

Both halves of the loop now exist:

- After the BLOCKED loop in run_secops_all receives an answer (and
  in resume_secops_from_pending for the out-of-band sweeper path),
  regex extracts every PR# from the question and records one
  automation_decisions row per PR# keyed on (repo, "dependabot_pr",
  "#N") with the operator's verbatim Telegram answer. Regex handles
  PR #60, bare #60, dedupes, and writes nothing when the question
  has no PR# (CodeQL-only questions, etc).
- Before each sweep, run_secops_all pulls the last 30 days of
  decisions for that repo and threads them via
  ctx.extra['prior_decisions'] into _build_prompt, which renders a
  "Prior operator decisions (last 30 days)" block. Carries
  instructions to act on prior answers unless circumstances changed
  materially (different version, CI state flipped). The verbatim-
  answer approach avoids semantic parsing of Telegram replies —
  the agent already interprets free-form text well.

Storing the verbatim answer rather than a canonical yes/no means the
agent can distinguish "approve" from "approve once we bump idna in the
lockfile" from "skip until next quarter" without us trying to model
operator nuance upstream.

### Edge cases the prompt now covers

- No CI configured (no .github/workflows/*.yml): the auto-merge-with-
  passing-CI gate cannot evaluate, so previously the agent freelanced.
  Prompt now: treat ALL Dependabot PRs as ASK regardless of tier and
  signal BLOCKED ONCE with a consolidated question per repo (not one
  per PR).
- CI failing on a check unrelated to the PR's package (pip-audit
  flagging a transitive idna CVE while the PR bumps boto3): the
  agent was reporting the failure and moving on, leaving patch PRs
  stuck for weeks. Prompt now requires BLOCKED with a one-line
  question naming the underlying issue and a root-cause hint pulled
  from gh run view --log-failed.
- Scope discipline: three legitimate exits (merge / leave / BLOCKED)
  with an explicit prohibition on batching PRs into a review PR,
  opening tracking issues, or proposing manual reviews. Cuts off the
  freelance options observed in production sweeps ("Want me to batch
  the patch + minor PRs into a single review PR like before?").

### Tests

17 new tests cover: the state helpers (per-repo scoping, since_ts
window, newest-first ordering), the PR# regex (all observed variants
+ dedupe + no-PR case), decision capture in both the in-loop BLOCKED
path and the resume-from-pending path, prompt injection of prior
decisions, and the three new prompt directives. 109 of the targeted
test files green; full suite is 664/665 with one pre-existing
test_transport.py flake that also hangs on main code.
Two correctness gaps in the persistence layer surfaced during review:

1. list_recent_automation_decisions filtered only by repo, so a
   `codeql_alert` (or any future `operation`) row in the same repo
   would render into the Dependabot-PR prompt as a "prior PR
   decision". A "suppress this CVE" answer for a CodeQL alert could
   end up being applied to a Dependabot PR.

   Added an optional `operation` kwarg, default None (full audit
   view preserved for debugging / future audit UI), and the secops
   pipeline now passes `operation="dependabot_pr"` at the read site.

2. The prompt rule said "act on prior decision only when
   circumstances are unchanged (different version bump, CI state
   flipped)" but the rendered entry only showed item_id + answer.
   The agent literally could not detect a force-pushed PR that
   swapped the version under the same PR number, so the rule was
   either unenforceable or forced re-asking every time (defeating
   the persistence layer).

   We were already storing the question snippet as `context` at
   write time but never reading it back. Now rendered inline as
   `(prior question: ...)` with a 240-char cap, and the rule
   directive explicitly points the agent at the snippet to
   compare. Older rows without context degrade to the item+answer
   form (not `prior question: None`).

3 new tests cover both gaps + the missing-context degradation
path. Targeted suite 73/73.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant