From f69a82d5c38c9ea3c8517a0847d4954b345e732a Mon Sep 17 00:00:00 2001 From: Claude Date: Sat, 13 Jun 2026 21:26:00 +0000 Subject: [PATCH 1/9] docs(research): diagnose /ba:plan review-cost verbosity Evidence-based diagnosis of where reviewing a plan doc costs too much. Volume map across all 24 plan docs by detail level and section (code vs prose split); review-value inferred from the user's keep-code criterion, the hand-built HTML companion workaround (PR #15), and the existing plan-iteration LoC gate. Finds the cost is the template's framing tax at COMPREHENSIVE (altitude redundancy, intra-doc and brainstorm->plan duplication), not the code blocks. Proposes a restructure + demotion direction with a line target, no code or scope content removed. https://claude.ai/code/session_014UWUxvppgQFWuYRTFz8ssu --- .../2026-06-13-plan-verbosity-research.md | 247 ++++++++++++++++++ 1 file changed, 247 insertions(+) create mode 100644 docs/research/2026-06-13-plan-verbosity-research.md diff --git a/docs/research/2026-06-13-plan-verbosity-research.md b/docs/research/2026-06-13-plan-verbosity-research.md new file mode 100644 index 0000000..f2ece11 --- /dev/null +++ b/docs/research/2026-06-13-plan-verbosity-research.md @@ -0,0 +1,247 @@ +--- +title: Where the review cost lives in /ba:plan output — a verbosity diagnosis +type: research +status: complete +date: 2026-06-13 +tags: [plan, verbosity, review-cost, form, dev-workflow] +scope: form (how/where plan content is presented for review) — NOT purpose (what a plan is for) +evidence: 24 plan docs, plan.md + sibling command specs, 1 session transcript, 7 PRs +--- + +## Summary + +**Problem (fixed):** reviewing a `/ba:plan` doc costs too much to read. + +**Diagnosis:** The cost is **not** the code blocks (kept on purpose, ~40% of volume) and **not** +the core review payload (scope, behaviors-to-test, success criteria, the exact file changes). The +cost is a **framing tax that the template imposes by level**: COMPREHENSIVE plans mandate ~10 prose +sections, several of which restate the same thing at different altitudes (Overview vs Proposed +Solution vs Technical Approach), duplicate content that already lives elsewhere in the same doc +(Documentation Plan, Testing Strategy), or re-host decision framing the **brainstorm already owns** +(scope rationale, alternatives considered). The pain concentrates almost entirely at the +COMPREHENSIVE level (avg 1,040 lines; worst 1,151), is mild at STANDARD (avg 510), and is absent at +MINIMAL (avg 179 — already near-ideal). + +**Recommended direction (not a blanket trim):** a **targeted restructure + selective demotion**, +concentrated at COMPREHENSIVE — (1) put the high-scrutiny "review spine" up top in a fixed order, +(2) demote low-scrutiny framing to a collapsed appendix, (3) collapse the 3–4 overlapping "approach" +sections into one, (4) pointer-ize (not restate) the brainstorm→plan decision duplication, and +(5) make code blocks *cheaper to scan* (per-file intent headers), never removed. A read-optimized +companion view is a real option — the user already hand-built one (PR #15's HTML render) — but it +treats the navigation symptom, not the content cause. + +**Proposed target:** cut the non-code framing tax ~40–50% so COMPREHENSIVE lands at ~650–750 lines +and STANDARD at ~400, **without removing a single line of code, acceptance criterion, behavior, or +scope boundary.** MINIMAL is left alone. + +--- + +## Evidence base & method + +| Source | What it gave | Reliability | +|---|---|---| +| 24 plan docs in `docs/plans/` (55–1,151 lines, 12,340 total) | Volume by level, by section, code-vs-prose split | **High** — the actual output | +| `commands/ba/plan.md` + `brainstorm.md`, `slice.md`, `review-plan.md` | Template structure; division of labor; duplication seams | **High** | +| Session transcript (`2de4a809-…jsonl`) | — | **None** — it is *this* research session; self-referential, no plan-review behavior captured | +| 7 PRs (#2–#18) incl. plan-bearing #12, #15 | Behavioral workaround signals from PR bodies | **Medium** — no inline review comments exist (solo repo; review happens in-session) | + +**Honest gap:** the handoff's best-hoped evidence — the user's actual skim-vs-scrutinize behavior — +is **not directly observable** here. No usable transcript, no PR review comments. Review-value below +is therefore **inferred** from three indirect signals, flagged as such: +1. The user's one hard stated criterion (keep code blocks — getting code roughly right at plan time + prevents rewrites later). +2. A **behavioral workaround**: for PR #15's 1,063-line plan the user hand-built a companion HTML + render ("sticky TOC, scrollspy, phase-gate callouts") and told reviewers "HTML render recommended + for the 1063-line scroll." → review cost is dominated by **navigation/scroll of large docs**. +3. The system **already encodes "plan growth is bad"**: `/ba:review-plan` Step 5.5 runs a + `plan-iteration-gate` that tracks plan-body LoC delta across review rounds and fires a reminder at + iteration ≥ 3. Verbosity is already a recognized failure mode internally. + +--- + +## Finding 1 — Volume map: where the lines are + +### By detail level + +| Level | n | avg total | avg code | code % | +|---|---|---|---|---| +| MINIMAL | 6 | **179** | 38 | 22% | +| STANDARD | 16 | **510** | 202 | 40% | +| COMPREHENSIVE | 3 | **1,040** | 450 | 43% | + +The cost curve is steep and level-driven. MINIMAL is already cheap. The review pain the handoff +describes is a COMPREHENSIVE (and high-end STANDARD) phenomenon. + +### By section (aggregate lines across all 24 docs; code vs prose split) + +| Section | docs | total | code | prose | +|---|---|---:|---:|---:| +| Changes Required | 10 | 2,244 | 1,412 | 832 | +| Phase N bodies | 4 | 2,217 | 1,339 | 878 | +| Convention Compliance | **25** | 359 | 0 | 359 | +| Behaviors to Test | 14 | 351 | 22 | 329 | +| Success Criteria | 15 | 293 | 0 | 293 | +| What We're NOT Doing | **25** | 277 | 0 | 277 | +| Current State | 19 | 242 | 0 | 242 | +| Proposed Solution | 19 | 207 | 0 | 207 | +| Dependencies & Risks | 18 | 159 | 0 | 159 | +| System-Wide Impact | 19 | 135 | 0 | 135 | +| Technical Considerations | 16 | 126 | 0 | 126 | +| Overview | 19 | 92 | 0 | 92 | +| Sources / Internal References | 19 | 188 | 0 | 188 | + +Two-thirds of total volume is the **implementation body** (Changes Required + Phase bodies), and it +is ~60% code. The remaining third is a **long tail of recurring prose sections present in nearly +every doc** — that tail is where the cheap wins are. + +--- + +## Finding 2 — Review-value: high-scrutiny payload vs skim-tax (inferred) + +Bucketing each recurring section by *inferred* review-value (criteria above; honest about the gap): + +**HIGH value — the review spine (the user must read these to trust the plan):** +- **The implementation body** — Changes Required / Phase bodies. The exact file paths + actual + code/markdown. This is the user's stated keep-criterion. *Keep; make cheaper to scan.* +- **Behaviors to Test** + **Success Criteria** + **Acceptance Criteria** — the testable contract; + concrete and checkable. *Keep.* +- **What We're NOT Doing** — scope boundary; checkable. *Keep — but see Finding 4 (duplicated from + brainstorm).* + +**LOW value — skim-tax (template-mandated prose that rarely earns its space):** +- **Overview / Proposed Solution / Technical Approach / Architecture** — in a COMPREHENSIVE plan you + get 3–4 sections narrating the same approach at different altitudes before any code appears. + Collapsible to one short "Approach." +- **Risk Analysis & Mitigation** *vs* **Dependencies & Risks** — two risk sections saying + overlapping things. +- **System-Wide Impact** (5 mandated subsections at COMPREHENSIVE: Interaction Graph, Error + Propagation, State Lifecycle, API Surface Parity, Integration Test Scenarios) — high prose, often + speculative; rarely all 5 apply. +- **Testing Strategy** / **Documentation Plan** — frequently *pure duplication*: the propose plan's + Documentation Plan just re-lists files already in the implementation body; its Testing Strategy + says "this repo has no test harness" and points back to scenarios listed above. +- **Convention Compliance** — in all 25 docs (~14 lines each); an audit checklist with low + *ongoing* review value once resolved. +- **Sources / Internal References** — reference apparatus, skimmed. + +The skim-tax group (Overview + Proposed Solution + Technical Considerations + Current State + +System-Wide Impact + Dependencies/Risk + Convention Compliance + Sources) totals **~1,500 prose +lines across the corpus** — roughly **60–80 low-scrutiny prose lines per STANDARD/COMPREHENSIVE doc, +none of it code.** That is the target. + +--- + +## Finding 3 — The repo-specific twist: "code %" undercounts the payload + +This plugin's *deliverable is markdown* (command specs). So a plan's "code blocks" are often +markdown, and large stretches of the implementation body that *look* like prose (e.g. the propose +plan's `## Important Guidelines`, 143L, and its embedded `Step 2/3/5` bodies, ~160L each) are +actually the **to-be-written command text** — exactly the content the user keeps on purpose to avoid +rewrites. **Implication: a naïve prose-trim would hit the highest-value content.** The fix must +distinguish *deliverable payload* (the command text being authored — keep) from *meta-framing about +the deliverable* (Overview, Risk Analysis, Documentation Plan — demote/merge). The lever is the +framing, not the body. + +--- + +## Finding 4 — Duplication at the brainstorm→plan seam + +Decision framing is **owned by `/ba:brainstorm`** (`## Why This Approach`, `## Scope Boundaries`, +`## Key Decisions`, `## Rejected Designs`/Locked Design). But `plan.md` **re-hosts it**: + +- Step 0 (`plan.md:39–49`): "Extract and carry forward **ALL** … Scope boundaries (What We're NOT + Doing) … Chosen approach and why alternatives were rejected … Acceptance criteria." +- Step 6 cross-check (`plan.md:457–471`) forces every brainstorm decision back into the plan body. +- COMPREHENSIVE's `### Alternative Approaches Considered` is a near-verbatim re-derivation of the + brainstorm's `## Why This Approach` / `## Rejected Designs`. + +Step 0 says to reference decisions via `(see brainstorm: …)` and *not paraphrase* — but Step 6's +"every decision must be reflected" pushes toward **restating**. The net effect: scope, alternatives, +and acceptance criteria are written twice (brainstorm + plan). For a brainstorm-originated +COMPREHENSIVE plan this is a meaningful, avoidable chunk of the framing tax — and `slice`/`review-plan` +already prove the repo's "one source of truth" instinct (slice annotates in place; review-plan edits +in place). The plan should **pointer-ize** carried decisions, not re-narrate them. + +--- + +## Where the cost actually lives (the verdict) + +1. **Level inflation** is the biggest single driver: COMPREHENSIVE is 2× STANDARD and 6× MINIMAL. + The template offers no middle demotion path, so plans land COMPREHENSIVE and absorb all ~10 prose + sections. +2. **Altitude redundancy**: 3–4 sections narrate the approach before the code; 2 sections narrate + risk; 5 mandated System-Wide-Impact subsections. +3. **Intra-doc duplication**: Documentation Plan / Testing Strategy restate the implementation body. +4. **Cross-doc duplication**: scope + alternatives + acceptance criteria restated from the brainstorm. +5. **Navigation**: even well-structured, a 1,000-line doc is costly to scroll — the user's HTML + workaround targets exactly this. + +Code blocks are **not** in this list. They are not the problem. + +--- + +## Solution options (open) — with recommendation + +Ranked by leverage-to-risk. These are **form** changes (presentation/structure), not purpose changes. + +1. **★ Restructure into a "review spine" + demoted appendix (recommended core).** Fixed top-of-doc + order: scope (What We're NOT Doing) → Behaviors to Test → the file-by-file changes (code) → + Success Criteria. Everything low-scrutiny (Risk Analysis, Testing Strategy, Documentation Plan, + full System-Wide Impact, Sources, Convention Compliance) moves below a `
`-collapsed + "Appendix / supporting analysis" fold. Nothing is deleted; the *first screenful* becomes the + review surface. Mirrors the already-shipped collapsed-section pattern + (`2026-06-07-feat-preexisting-collapsed-section`). +2. **★ Collapse overlapping framing (recommended).** Merge Overview + Proposed Solution + Technical + Approach → one short **Approach** (≤8 lines). Merge Dependencies & Risks + Risk Analysis → one + **Risks** table. Make System-Wide Impact's 5 subsections *include-only-if-it-applies* instead of + always-5. Drop Documentation Plan + Testing Strategy when they only restate the body. +3. **★ Pointer-ize brainstorm duplication (recommended).** When `origin:` is set, the plan cites + `(see brainstorm: …#section)` for alternatives/scope rationale and keeps only the one-line scope + checklist — relax Step 6 from "reflect every decision" to "link every decision." +4. **★ Make code cheaper to scan, not removed (recommended — honors the hard constraint).** Precede + each code block with a one-line `**File** — intent` header and order changes consistently so the + user scans diff-shaped content fast. This is the only change that touches the body, and it *adds* + scannability without cutting. +5. **Demote COMPREHENSIVE itself.** Add guidance/tripwire steering toward STANDARD unless 10+ files + *and* phasing are genuinely needed; most current COMPREHENSIVE plans would survive as STANDARD. +6. **Companion read-optimized view (secondary).** Auto-generate the HTML/TOC render the user already + builds by hand. Real value for navigation, but a band-aid: it leaves the full doc's content cost + intact and adds a generated artifact to maintain. Do this only *after* 1–4. +7. **Do nothing / "cost is elsewhere" (rejected).** The evidence does not support this: the framing + tax is real, measured, and concentrated. + +**Division-of-labor check (per handoff):** none of 1–6 duplicates a sibling. brainstorm still owns +decision framing (option 3 *reduces* plan's encroachment); slice still owns delivery order; +review-plan still owns verdicts + the iteration/LoC gate (options 1–3 would *lower* the LoC the gate +polices). No new responsibility is invented. + +--- + +## Proposed target + +| Level | now (avg) | target | how | +|---|---|---|---| +| MINIMAL | 179 | **leave as-is** | already near-ideal | +| STANDARD | 510 | **~400** | options 2–4 | +| COMPREHENSIVE | 1,040 | **~650–750** | options 1–4 (spine + fold + merge + pointer-ize) | + +Hard floor: **no code, acceptance criterion, behavior, or scope line is removed** — the reduction is +entirely framing collapse, intra/cross-doc de-duplication, and demotion below a fold. Success is +"the review spine fits in the first 1–2 screens; the rest is reachable but folded," not a raw line +count. + +--- + +## Explicitly out of scope / constraints respected + +- **Purpose untouched.** No redefinition of what a plan is for; only how/where content is presented. +- **Code blocks preserved.** Made cheaper to scan (option 4), never cut. +- **No new sibling responsibility.** Decision framing stays in brainstorm. + +## Suggested next step + +A `/ba:plan` template-revision plan implementing options 1–4 (spine + fold + section merges + +pointer-ize the brainstorm seam + per-file intent headers), with option 5 (COMPREHENSIVE-demotion +nudge) as a small add and option 6 (auto companion view) parked as a follow-up. Validate by +re-rendering 2–3 existing COMPREHENSIVE plans under the new structure and confirming the review spine +lands in the first 1–2 screens. From 1de2a29572260ad64975f30e9f6f2e19476b2bcd Mon Sep 17 00:00:00 2001 From: Claude Date: Sat, 13 Jun 2026 21:41:14 +0000 Subject: [PATCH 2/9] docs(research): label plan-verbosity pass as md-only repo scope Add a scope banner naming what this pass could not see (real code repo, larger plan corpus, execution transcripts, PR review comments) so the planned code-repo pass and later consolidation know which findings are measured (volume) vs hypotheses to validate (review-value). https://claude.ai/code/session_014UWUxvppgQFWuYRTFz8ssu --- docs/research/2026-06-13-plan-verbosity-research.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/docs/research/2026-06-13-plan-verbosity-research.md b/docs/research/2026-06-13-plan-verbosity-research.md index f2ece11..380f875 100644 --- a/docs/research/2026-06-13-plan-verbosity-research.md +++ b/docs/research/2026-06-13-plan-verbosity-research.md @@ -8,6 +8,15 @@ scope: form (how/where plan content is presented for review) — NOT purpose (wh evidence: 24 plan docs, plan.md + sibling command specs, 1 session transcript, 7 PRs --- +> **Scope of this pass (read first).** This is the **md-only-repo pass**, run against the +> `dev-workflow` plugin repo itself — whose deliverable *is* markdown, so plan "code blocks" are +> command specs (see Finding 3). It did **not** have access to: a real code (non-md) repo, a larger +> plan corpus, actual execution transcripts, or inline PR review comments. A separate **code-repo +> pass** is planned on another machine and will be pushed to this branch for consolidation. +> Treat the volume findings as solid (measured), but treat every **review-value** judgment as a +> hypothesis to validate against the code-repo pass's transcripts and PR comments — that is the +> evidence this pass had to infer around (see "Honest gap"). + ## Summary **Problem (fixed):** reviewing a `/ba:plan` doc costs too much to read. From 5733e5019122c42db701a5e533ecbcdd88f8e338 Mon Sep 17 00:00:00 2001 From: Claude Date: Sat, 13 Jun 2026 22:34:15 +0000 Subject: [PATCH 3/9] docs(research): add code-repo pass prompt for plan-verbosity research Self-contained prompt to run the second (code-repo) plan-verbosity pass: bakes in the fixed problem/scope, points at behavioral evidence (transcripts + PR review comments) the md-only pass lacked, lists the prior hypotheses to validate or overturn, and instructs pushing a new file to this branch for consolidation. https://claude.ai/code/session_014UWUxvppgQFWuYRTFz8ssu --- docs/research/CODEREPO-PASS-PROMPT.md | 72 +++++++++++++++++++++++++++ 1 file changed, 72 insertions(+) create mode 100644 docs/research/CODEREPO-PASS-PROMPT.md diff --git a/docs/research/CODEREPO-PASS-PROMPT.md b/docs/research/CODEREPO-PASS-PROMPT.md new file mode 100644 index 0000000..2014d81 --- /dev/null +++ b/docs/research/CODEREPO-PASS-PROMPT.md @@ -0,0 +1,72 @@ +# Plan-verbosity research — code-repo pass (consolidates with an earlier md-only pass) + +> This prompt is meant to be run on a machine with a **real code repository** (not the dev-workflow +> plugin repo, whose deliverable is markdown). It is the second of two research passes; the first +> (md-only) lives at `docs/research/2026-06-13-plan-verbosity-research.md` on branch +> `claude/plan-verbosity-research-3al1it`. Paste everything below the line as the session prompt. + +--- + +## Operate in research mode +Investigate and document only — do NOT modify the `/ba:plan` command (or any command). Diagnose; +the fix is a separate later step. Persist findings to a durable doc under `docs/research/` named +`YYYY-MM-DD-plan-verbosity-coderepo-research.md` so it can be consolidated with a prior pass. + +## The problem (fixed — do not redrift) +Reviewing a `/ba:plan` output doc costs me too much to read. That's the whole problem. +This is about the FORM of plan output (how/where content is presented for review), NOT its PURPOSE +(what a plan is for). Changing presentation = in scope. Redefining what a plan is, or "should this +command exist" = out of scope. + +## Why this pass exists: there's a prior pass, and it was evidence-starved +I already ran this on my dev-workflow *plugin* repo. That repo's deliverable is markdown, so its +plans are atypical (plan "code blocks" are command specs), the corpus was small (24 plans), and it +had NO usable execution transcripts and NO inline PR review comments — so its review-value findings +are INFERRED, not observed. That doc's diagnosis: the cost is a template "framing tax" at the +COMPREHENSIVE detail level (redundant approach/risk sections, intra-doc duplication like +Testing-Strategy/Documentation-Plan restating the body, and brainstorm→plan duplication of +scope/alternatives) — NOT the code blocks. Volume target it proposed: COMPREHENSIVE 1040→~650-750, +STANDARD 510→~400, MINIMAL left alone. + +THIS repo is a real code repo with more plans, real transcripts, and PR review threads — the +behavioral evidence the prior pass lacked. Your job is to get the FULL picture and either +corroborate or overturn the prior diagnosis. + +## Start from MY actual evidence — do not take my self-diagnosis at face value +1. Pull real plan docs across ALL three detail levels (MINIMAL/STANDARD/COMPREHENSIVE), including a + painful large COMPREHENSIVE one and a lean MINIMAL one. Sample every level. +2. Bucket their content by section × (volume, review-value). Measure volume (lines per section, + code vs prose). This part is objective — do it carefully. +3. For review-value, use BEHAVIORAL evidence, not my claims: + - Conversation transcripts (.jsonl under ~/.claude/projects//) — what I skim, where I + intervene mid-flight, what I ask to change/cut/expand when a plan is on screen. + - My PR review comments / review threads on plan-bearing PRs — what I actually flag. + Infer what I scrutinize vs skim from these, then map it onto the section buckets. + +## Things I know about my own review (inputs to verify, not gospel) +- Code blocks are bulky but I keep them on PURPOSE — getting code roughly right at plan time stops + me rewriting later. Do NOT propose cutting them; at most, make them cheaper to scan. +- My self-reported review criteria are lossy. Prefer inference from transcripts/PR comments. + +## Specifically validate or overturn the prior pass's hypotheses +- Do the framing sections (Overview/Proposed Solution/Technical Approach, Risk Analysis, Testing + Strategy, Documentation Plan, System-Wide Impact) actually get skimmed in my real transcripts? +- Does the "cost lives in framing, not code" claim hold when code blocks are real code, not + markdown specs? Re-derive the prose/code ratio and the volume targets from THIS corpus. +- Is the COMPREHENSIVE-level inflation the dominant driver here too, or is the cost distributed + differently? +- Use the SAME section-taxonomy buckets so the two volume maps merge directly. + +## Don't duplicate sibling commands +Before proposing new structure, check what brainstorm/slice/review-plan (or this repo's equivalents) +already own. Don't have `plan` re-host a responsibility a sibling already covers. + +## Output +A research doc that: (a) presents the volume map and the BEHAVIORAL review-value findings for THIS +repo; (b) explicitly states where it agrees with / contradicts the md-only pass; (c) proposes a fix +direction (trim, compress, demote, restructure, progressive disclosure, a companion read view, or +"leave it — cost is elsewhere") with an evidence-based line/time target. Goal: plan docs that are +"cheap and trustworthy to review." Then commit and push to branch +`claude/plan-verbosity-research-3al1it` (the same branch holding the md-only pass) so I can +consolidate both. Pull the branch first; add a new file, don't overwrite the existing +`*-plan-verbosity-research.md`. From c8092391b2f2b186a0f6ed30c1012b56f034f9e8 Mon Sep 17 00:00:00 2001 From: Bruno Azevedo Date: Mon, 15 Jun 2026 00:31:26 +0100 Subject: [PATCH 4/9] docs: consolidate plan-verbosity research + brainstorm justification-gated plan code MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit research: sanitized consolidation of the md-only and code-repo plan-verbosity passes into one diagnosis — combined volume map, reconciled review-value (corroborated/overturned/unresolved), re-derived per-level targets, and fix direction. brainstorm: justification-gated hybrid for /ba:plan code — decisions-by-default (approach, exact file paths, patterns, test scenarios); real code only behind a Code-shape decision label, anchored to brainstorm Locked Design. Reframes plan code as design validation, not rework-prevention; review-plan restructure deferred. Co-Authored-By: Claude Opus 4.8 (1M context) --- ...plan-code-justification-gate-brainstorm.md | 123 ++++++++ ...14-plan-verbosity-consolidated-research.md | 294 ++++++++++++++++++ 2 files changed, 417 insertions(+) create mode 100644 docs/brainstorms/2026-06-15-plan-code-justification-gate-brainstorm.md create mode 100644 docs/research/2026-06-14-plan-verbosity-consolidated-research.md diff --git a/docs/brainstorms/2026-06-15-plan-code-justification-gate-brainstorm.md b/docs/brainstorms/2026-06-15-plan-code-justification-gate-brainstorm.md new file mode 100644 index 0000000..57b250e --- /dev/null +++ b/docs/brainstorms/2026-06-15-plan-code-justification-gate-brainstorm.md @@ -0,0 +1,123 @@ +--- +date: 2026-06-15 +topic: plan-code-justification-gate +status: approved +triage_level: full +tags: [plan, execute, brainstorm, pipeline, verbosity, keep-code] +--- + +# Justification-Gated Hybrid: Decisions-by-Default Code in /ba:plan + +## What We're Building + +A change to how the dev-workflow planning pipeline treats code in plan documents. Today `/ba:plan` +mandates literal implementation code in every template tier (`Include **actual code** — not +descriptions of code`). We're moving to a **justification-gated hybrid**: + +- **Default** plan content is *decisions* — approach, scope boundaries, exact file paths, + patterns-to-follow, and test scenarios (pseudo-code allowed for shape, not literal implementation). +- A **real code block is permitted only** where the code's *shape* is itself the design decision, + and only when flagged with an explicit `**Code-shape decision:** ` + label. +- When a brainstorm precedes the plan, code-bearing blocks **anchor to its `## Locked Design`**. + +This reframes the *purpose* of code in a plan from **rework-prevention** (unmeasured, and already +half-disabled by `/ba:execute`) to **design validation** — a forcing function applied precisely where +the design is non-obvious. It is a form/contract change, not a redefinition of what a plan is for. + +## Why This Approach + +Provenance forced the question. `/ba:plan`'s keep-code rule is inherited ce-plan heritage, and ce +reversed it ("separate planning from implementation") six days after our fork, then hardened a no-code +model for 2.5+ months. Our own consolidated verbosity research +(`docs/research/2026-06-14-plan-verbosity-consolidated-research.md`) found the implementation body is +60–80% of every painful plan and the #1 review-cost driver is prose *inside* the code — but that +research took keep-code as a fixed input and never tested dropping it. + +Three approaches were weighed: + +- **(a) Keep full code** (status quo) — **rejected.** It is the volume and the cost, its stated + benefit (rework-prevention) is unmeasured, and `execute.md:213` already treats plan code as a + "structural reference, not complete comment inventory" — so the mechanism that would prevent + rewrites is already disabled. +- **(b) Pure decisions-only** (ce's headline) — **rejected.** Throws away the design-validation + forcing function that has real value where the shape is non-obvious. +- **(c) Justification-gated hybrid** — **chosen.** It is where ce *actually* landed (their carve-out: + code allowed "unless the plan is about code shape as a design artifact"), and it matches the + reviewer's own split instinct — code earns its place only where shape is the risk, and doesn't + the rest of the time. It also dissolves the unmeasured-benefit problem by changing *why* code is + kept (design validation, not rework-prevention), so no rework measurement is required. + +## Key Decisions + +- **Hybrid, not binary** — decisions-by-default; literal code only where shape is the design decision. + *Rationale:* preserves the forcing function where it pays, removes it where it is pure cost. +- **Trigger = justification-gated** — a real code block requires a `**Code-shape decision:** ` label. *Rationale:* self-policing, works with or without a brainstorm, and the label + simultaneously fills the under-served "design rationale" gap the research identified. +- **Anchor to Locked Design** — when a brainstorm origin exists, code-bearing blocks lock onto its + `## Locked Design` (interface, signatures, invariants, error modes). *Rationale:* the pipeline + already has this binding decisions-only artifact; generalize it rather than invent a new one. Wire + `plan.md` Step 0 to name it as a source (it currently doesn't). +- **Purpose reframe** — code in a plan is for *design validation*, not *rework-prevention*. + *Rationale:* removes the need to measure rework (moot) and aligns with how `/ba:execute` already + behaves. +- **Plan stays the authority** — the decisions-first contract must keep the plan unambiguously + authoritative so `/ba:execute` implements the plan's decisions and does not silently invent build + choices the plan deliberately left as pseudo-code. *Rationale:* preserves the execution-commands + convention (the plan is the authority on what to build). +- **Review-plan restructure deferred** — not bundled here. *Rationale:* orthogonal to keep-code, and + the hybrid changes review-plan's inputs (lighter, code-light plans), so decide its fate after the + hybrid lands. [considered, deferred] + +## Scope Boundaries + +- **NOT** redefining what a plan is *for* — purpose untouched; form/contract only. +- **NOT** pure decisions-only — the code-shape carve-out stays. +- **NOT** measuring rework-prevention — moot under the design-validation reframe. +- **NOT** changing `/ba:review-plan` in this effort (deferred). Also out: the integrated-economy-pass + idea and the auto-HTML-companion idea (both parked for separate evaluation). +- **NOT** changing the MINIMAL/STANDARD/COMPREHENSIVE tier *purposes* — the decisions-default applies + across all tiers (tiers scale context/phasing, not the presence of code). + +## Acceptance Criteria + +- `plan.md`'s three templates (MINIMAL / STANDARD / COMPREHENSIVE) default to decisions + pseudo + + exact file paths + patterns-to-follow + test scenarios; literal code appears only under a + `**Code-shape decision:** ` label. +- The "Key rules for all templates" section no longer mandates "actual code, not descriptions"; it + states the justification-gated rule instead. +- `plan.md` Step 0 names `## Locked Design` as a binding source when a brainstorm origin exists. +- `execute.md`'s default path writes code from the plan's decisions; literal-code reuse is reserved + for `**Code-shape decision:**` blocks; the "follow it exactly / already reviewed" framing is + softened toward "implement to the plan's decisions"; Step 1.5b LoC projection default flips to + estimate-from-description. +- `review-plan.md` and `brainstorm.md` require no contract-breaking change (confirmed by the + pipeline-contract map). +- **Validation:** re-render one STANDARD and one COMPREHENSIVE plan under the new rule — code volume + drops to only justified blocks, and the design rationale (the "why") now appears where code used to. + +## Open Questions + +None — trigger rule, purpose reframe, and review-plan scope were all resolved in dialogue. + +## Convention Compliance + +Convention-checker run before write: **0 violations**, 4 aligned, 4 N/A. The design is convention-clean +and *strengthens* the "planning commands never write code" rule (it pushes `/ba:plan` further toward a +documenter role). Downstream requirements to carry into the plan/execute stage (not brainstorm +violations): + +1. **README.md sync** — update command descriptions/contracts that describe plan output as code-bearing. +2. **plugin.json version bump** — current `0.23.0`; bump on the shipped change (it is the auto-update + cache key — do not defer). +3. **Cross-file consistency** — the `**Code-shape decision:**` label and the `## Locked Design` anchor + wording must stay in sync across `plan.md`, `execute.md`, and any README description (mirror the + existing `/ba:review` "keep them in sync" discipline). + +## Next Steps + +→ `/ba:plan` to create the implementation plan. Primary target: `plan.md` template rewrite + inverted +"Key rules" + Step 0 Locked-Design wiring (HIGHEST blast radius). Secondary: `execute.md` default-flip +and softened "follow exactly" framing (MODERATE). Include the README sync and version bump as explicit +plan steps. diff --git a/docs/research/2026-06-14-plan-verbosity-consolidated-research.md b/docs/research/2026-06-14-plan-verbosity-consolidated-research.md new file mode 100644 index 0000000..9b9e4c5 --- /dev/null +++ b/docs/research/2026-06-14-plan-verbosity-consolidated-research.md @@ -0,0 +1,294 @@ +--- +title: Where the review cost lives in /ba:plan output — consolidated diagnosis +type: research +status: complete +date: 2026-06-14 +tags: [plan, verbosity, review-cost, form, dev-workflow, consolidated] +scope: form (how/where plan content is presented for review) — NOT purpose (what a plan is for) +consolidates: + - docs/research/2026-06-13-plan-verbosity-research.md # md-only pass (plugin repo) + - "[internal] code-repo pass, 2026-06-14" # real code repo; local-only, withheld +sanitization: SANITIZED — generic section names, ratios, and line counts only. No internal repo/file/ + feature/ticket names and no verbatim transcript or review-comment quotes. Findings that can only be + supported by an internal specific are marked "[internal evidence, withheld]". Safe to push. +--- + +> **What this is.** A single reconciliation of two prior passes that asked the same question — *why +> does reviewing a `/ba:plan` doc cost too much to read?* Pass A (**md-only**) ran against a +> markdown-deliverable repo: volume was measured but review-value was **inferred**. Pass B +> (**code-repo**) ran against a real code repo with execution transcripts: review-value is +> **observed behavioral evidence**. Where B confirms A, the claim is promoted to a finding. Where B +> contradicts A, A's claim is dropped or revised and the overturning evidence is named. +> +> **Sensitivity.** The code-repo pass contains company-internal data. This consolidated doc is the +> sanitized public-safe version: ratios, line counts, and patterns only. The internal source doc is +> kept local and must not be pushed. + +## Summary + +**Problem (fixed):** reviewing a `/ba:plan` doc costs the reviewer too much to read. This is about +the **form** of plan output (how/where content is presented), not its **purpose**. Code blocks are +kept on purpose and are never proposed for cutting — at most made cheaper to scan. + +**Reconciled diagnosis.** The two passes agree on *what is skimmed* and disagree on *where the volume +is* — and the disagreement is fully explained by the deliverable type. + +1. **Both passes agree the standalone framing sections are low-scrutiny skim-tax.** The md-only pass + inferred it; the code-repo transcripts **confirm it on direct evidence** — the reviewer never + engages Overview / Proposed Solution / Technical Considerations / Risk Analysis / Testing Strategy + / Documentation Plan / System-Wide Impact / Sources / Convention Compliance as content during + review. +2. **The code-repo pass overturns the md-only headline that framing is the *volume*.** In a real code + repo the framing sections are only **~5% of a COMPREHENSIVE doc and ~14% of a STANDARD doc**. The + volume the reviewer pays to read is the **implementation body** (60–80% of every painful plan) — + code plus the prose interleaved *with* the code. +3. **The dominant, previously-invisible cost is prose *inside* the code blocks** — comment/JSDoc + verbosity. This was undetectable in a markdown repo (its "code" was command specs) and is the #1 + recurring review-cost complaint in the real-repo transcripts. +4. **Design-tension rationale is under-served** — the one place the reviewer wants *more* prose, not + less. Verbosity is not uniformly the enemy. + +**Recommended fix direction (re-derived, not averaged):** the lever is **inside the body, not the +framing**. Restructure to a review-spine + progressively disclose (fold) the skimmed framing; +**compress the prose inside the kept code** (lean why-only comments); make the body **scannable** +(per-change intent headers + consistent ordering); **expand** design-tension rationale and +pointer-ize the brainstorm duplication; reserve a companion read-view for the rare phase-heavy giant. +**Reframe the target from line count to read-time / scannability** — a framing trim cannot make a +code-heavy plan cheap, because the residue is code kept on purpose. + +--- + +## Evidence base & method + +| Pass | Corpus | Review-value signal | Reliability of review-value | +|---|---|---|---| +| **A — md-only** (markdown-deliverable repo) | 24 plan docs; template + sibling specs | No usable transcripts, no inline review comments | **Inferred** (hypothesis) | +| **B — code-repo** (real TypeScript/React repo) | 35 plan docs; template + sibling specs; 11 plan-on-screen execution/authoring transcripts; 1 hand-built HTML companion | What the reviewer reacts to vs skims with a plan on screen | **Observed** (behavioral) | + +Both passes used the **same section × (volume, code-vs-prose) taxonomy**, so the volume maps merge +directly. "Code" = lines inside ` ``` ` fences; "prose" = non-blank lines outside fences. Neither +pass mined PR/MR review threads — in both repos the plan docs are local and never appear in MRs, so +review-comment data is an *unresolved* source (see the reconciled table). The code-repo pass's 11 +plan-on-screen transcripts were the decisive direct signal and stand on their own. + +--- + +## (a) Combined volume map + +### By detail level + +| Level | n (A / B) | avg lines (A / B) | code % (A / B) | reconciled reading | +|---|---|---|---|---| +| MINIMAL | 6 / 10 | 179 / 251 | 22% / **51%** | near-ideal in both; mostly spine | +| STANDARD | 16 / 23 | 510 / 568 | 40% / **57%** | the workhorse level; majority code in a real repo | +| COMPREHENSIVE | 3 / 2 | 1,040 / 1,532 | 43% / 42% | rare; dominated by code-bearing phase bodies | + +**The code ratio flips, exactly as the reconciliation rule predicted (md-only undercounts real +code).** At the two common levels the real repo is **majority code** (51% / 57%) where the md-only +repo read as ~22% / 40% — because the md-only "code" was prose-like command specs. **Consequence: the +prose actually available to trim at the dominant level (STANDARD) is only ~150 lines total, roughly +half of it review-spine.** + +**`detail_level` does not predict volume (code-repo only — A could not see this).** STANDARD spanned +99 → 1,722 lines (17×); the largest STANDARD plans were 70–72% code and exceeded the md-only pass's +COMPREHENSIVE *average*. Level inflation runs the **opposite** way from A's claim: plans don't +over-select COMPREHENSIVE and absorb framing — they *under*-select it and cram comprehensive-scale +**code** into STANDARD. + +### By section (merged taxonomy) + +The striking merge result: **the standalone framing prose budget is roughly constant in absolute +lines across both repos (~80–100 lines/doc).** What changes is the denominator — in a code repo the +body dwarfs it. Both passes measured framing correctly; they only disagreed on its *share*. + +| Section | Bucket | Pass A signal (prose-ish) | Pass B per-doc prose (MIN / STD / COMP) | +|---|---|---|---| +| Changes Required / Phase bodies (impl body) | **SPINE** (code + impl prose) | ~⅔ of total volume, ~60% code | 15 / 45 / 362 **(+ code; 60–80% of painful plans)** | +| Behaviors to Test | **SPINE** | 351 ln / 14 docs | 10 / 14 / 47–93 | +| Success / Acceptance Criteria | **SPINE** | 293 ln / 15 docs | 9 / (nested) / (nested) | +| What We're NOT Doing | **SPINE** (scope) | 277 ln / all docs | 6 / 9 / 10 | +| Technical Approach / Considerations | framing | 126 ln / 16 docs | — / 18 / 32 | +| Current State | framing | 242 ln / 19 docs | — / 13 / 25 | +| Proposed Solution | framing (altitude-dup) | 207 ln / 19 docs | — / 9 / 11 | +| Overview | framing | 92 ln / 19 docs | — / 3 / 3 | +| Risk Analysis / Dependencies & Risks | framing | 159 ln / 18 docs | 3 / 6 / 12 | +| System-Wide Impact | framing (5 subs, mostly vestigial) | 135 ln / 19 docs | — / 5 / 14 | +| Testing Strategy / Documentation Plan | framing (often restates body) | in skim group | — / — / 15 | +| Convention Compliance | framing (audit artifact, all docs) | 359 ln / all docs | 10 / 12 / 19 | +| Sources / References | framing | 188 ln / 19 docs | 6 / 16 / 22 | + +**Trimmable framing budget (code-repo, measured):** ≈ **82 prose lines on a STANDARD doc** (~14%) and +≈ **98 on a COMPREHENSIVE doc** (~5% of a ~1,976-line plan). The md-only pass's own skim-tax estimate +was ~60–80 prose lines/doc — **the same order of magnitude.** The passes never actually disagreed on +how big the framing is; they disagreed on whether collapsing it solves the problem. + +### What generalizes vs what was a plugin-repo artifact + +| md-only claim | Generalizes? | +|---|---| +| "Code ≈ 40% of volume" | **Plugin artifact** — real code is 51–57% at MIN/STD; the figure held only because md "code" was specs | +| "Cost is framing, not code" (headline) | **Plugin artifact** — re-tested against real code blocks, the body *is* the cost | +| "COMPREHENSIVE-level inflation is the dominant driver" | **Plugin artifact** — small COMP sample + a repo habit of landing COMPREHENSIVE; inflation actually runs into STANDARD | +| "Target COMPREHENSIVE 1040→650–750 via framing collapse" | **Plugin artifact** — the math does not reach it once code blocks are real | +| Framing sections are low-scrutiny skim-tax | **Generalizes** (now observed) | +| Code/payload is kept on purpose; don't cut it | **Generalizes** | +| brainstorm→plan duplication of scope/alternatives | **Generalizes** | +| review-plan already polices plan-body LoC growth | **Generalizes** | +| MINIMAL is near-ideal; leave it | **Generalizes** | +| Review-spine-first + fold; pointer-ize brainstorm; intent headers | **Generalizes** (helps first-screen in both) | + +--- + +## (b) Reconciled review-value table + +Each prior claim marked **corroborated** (behavioral evidence confirms the inference), +**overturned** (behavioral evidence / real code contradicts it), or **unresolved**. + +| Claim | Status | What decided it | +|---|---|---| +| Code blocks are kept on purpose — don't cut them | **Corroborated** | Behavioral; and the constraint *extends* — the trim target is the prose/comments *inside* the code, not the code | +| Standalone framing sections are low review-value / skimmed | **Corroborated** | Observed in transcripts: framing is never engaged as content in any plan-on-screen session [internal evidence, withheld for specifics] | +| MINIMAL is near-ideal — leave it | **Corroborated** | Mostly code + ~75 spine prose lines; both repos | +| review-plan already polices plan-body LoC | **Corroborated** | Step 5.5 iteration/LoC gate confirmed in both repos | +| brainstorm→plan duplication of scope / alternatives / acceptance criteria | **Corroborated** | Step 0 ("carry forward ALL…") + Step 6 ("reflect every decision") force restatement; majority of plans are brainstorm-originated | +| **Cost lives in the framing tax (~10 prose sections)** | **Overturned** | Framing is ~5% of a COMP doc / ~14% of STANDARD — skimmed, but **not the volume** | +| **COMPREHENSIVE-level inflation is the dominant driver** | **Overturned** | Inflation runs *into* STANDARD (comprehensive-scale code crammed into STANDARD); COMP is rare | +| **Target: COMPREHENSIVE 1040→650–750 via framing collapse** | **Overturned** | Framing collapse buys ~5% on COMP / ~14% on STANDARD; the residue is code kept on purpose. Wrong target metric | +| Navigation (HTML companion) is a primary cost signal | **Revised (partially overturned)** | Real, but a single instance, on a ~489-line STANDARD plan — a read-comfort nicety, not a giant-doc rescue; secondary | +| Comment / JSDoc verbosity *inside* the code blocks is a cost | **New & dominant** | The #1 recurring review-cost friction in transcripts; structurally invisible to a markdown-deliverable repo [internal evidence, withheld for specifics] | +| Design-tension rationale is under-served (wants *more*) | **New** | The reviewer interrogates approach / alternatives in essentially every painful session; the template gives it ~5 prose lines | +| PR/MR review-comment behavior | **Unresolved** | Neither pass has it — A had none; B did not mine MRs and the plan docs are local. Likely low-impact for a local-doc workflow, but untested | +| Does the comment-verbosity lever generalize to non-code deliverables? | **Unresolved** | In a markdown repo the analog is verbose prose inside command specs; plausible but unmeasured. Stated as "lean prose inside the kept payload" | + +--- + +## (c) Where the review cost actually lives — final diagnosis + +1. **The implementation body is 60–80% of every painful plan, and it is code kept on purpose.** Line + count is intrinsic to the work; it cannot be trimmed away. *(Overturns A's "framing is the + volume.")* +2. **Comment / JSDoc verbosity inside that body is the #1 recurring review friction.** This is the + reconciliation of "don't cut code" with "cheaper to read": tighten the *prose in the code*, not + the code. *(New; invisible to the md-only pass.)* +3. **The standalone framing sections are genuinely skimmed but small (~5–14%).** Folding them improves + the first-screen review surface; trimming them does **not** solve the read-cost problem. + *(Corroborates A's review-value inference; overturns A's volume thesis.)* +4. **Design-tension rationale is thin and repeatedly demanded** — an *expansion* target, the one place + more prose lowers total review cost (it shortens the interrogation loop). *(New.)* +5. **Navigation of phase-heavy giant plans** is a real but occasional pain — secondary. *(Revises A.)* + +Code blocks are not the thing to cut — both passes agree. But unlike the md-only pass, the dominant +lever is **inside the body** (comments + scannability), not the framing around it. + +--- + +## (d) Recommended fix direction & evidence-based targets + +**Direction: restructure + progressive disclosure + compress-in-body + scannability + expand-rationale.** +Not a blanket trim, not a demotion nudge, not a line-count target. Ranked by leverage; all are +**form** changes; none cuts code, criteria, behaviors, or scope. + +1. **★ Compress the prose inside the authored code — lean, why-only comments (highest leverage; new + from code-repo).** Bake comment hygiene into the code the command emits: why-only, no branch + enumeration, no restating the renderer/contract, concise prose. Directly kills the #1 repeated + friction. *Target:* zero post-plan "your comment is too verbose" review turns. +2. **★ Per-change intent headers + consistent file ordering in the body.** The body is 60–80% of the + doc; a one-line `**File** — intent` per block and a fixed change order make it scan fast without + removing a line. *Target:* body becomes diff-shaped; read-*time* down with line-count ~flat. +3. **★ Review-spine-first + fold the skimmed framing (progressive disclosure).** Fixed top order: + `What We're NOT Doing` → `Behaviors to Test` → file-by-file changes → `Success Criteria`. Move + Risk / Testing Strategy / Documentation Plan / full System-Wide Impact / Sources / Convention + Compliance below a `
` fold. *Target:* spine in the first 1–2 screens; ~82 (STANDARD) / + ~98 (COMP) framing lines demoted, **not deleted**. +4. **★ Expand design-tension rationale; pointer-ize the brainstorm duplication.** Replace the + three-altitude Overview / Proposed Solution / Technical Approach narration with one short + **Approach** *plus* a real **design-tension** note (the fork actually taken and why). When a plan + is brainstorm-originated, cite the brainstorm section instead of re-deriving it — relax Step 6 from + "reflect every decision" to "link every decision." *Target:* fewer "what alternatives did you + consider?" turns. +5. **Auto-generate an HTML companion for phase-heavy giant plans only.** Reproduce the read-view the + reviewer hand-built, gated to ~1,000+ line / multi-phase plans. Secondary; navigation only. + +**Explicitly rejected:** (a) a blanket framing *trim* as the headline fix — it touches ~5–14% and +leaves the cost intact; (b) a COMPREHENSIVE→STANDARD demotion nudge — inflation runs the other way, +this would make it worse; (c) any line-count target premised on framing collapse — the math does not +reach it once code blocks are real. + +### Re-derived per-level targets (combined evidence, not an average of the two passes) + +| Level | Prior A target | **Reconciled target** | Reasoning | +|---|---|---|---| +| MINIMAL | leave as-is | **leave as-is** | Both passes agree — mostly code, ~75 spine prose lines, near-ideal | +| STANDARD | 510 → ~400 lines | **line count ~flat; first-screen surface ≈ spine only; in-code comments lean** | Trimmable framing is only ~14%; the real win is fold + lean comments + scannable body. A line target misrepresents it | +| COMPREHENSIVE | 1,040 → 650–750 lines | **No framing-collapse line target. Measure read-time. Fold ~98 framing lines; lean comments cut a meaningful share of the ~500 impl-adjacent prose lines on a giant plan; HTML companion for navigation** | 80% of a giant plan is code-bearing phase bodies kept on purpose. A ~2,000-line plan stays large; comment hygiene + scannability + fold lower read-cost without cutting code (~10–20% off via comment hygiene, not ~35%) | + +**Success metric:** "the review spine and a scannable body land first; comments are why-only; framing +is folded; design-tension is surfaced up front" — measured by **re-read time and post-plan correction +turns**, not raw line count. + +--- + +## Sibling division-of-labor check (carried from both passes — no new responsibility invented) + +- **brainstorm** owns decision framing: *Why This Approach*, *Key Decisions*, *Scope Boundaries*, + *Acceptance Criteria*, *Rejected Designs* / *Locked Design*, *Convention Compliance*. Fix #4 + *reduces* `plan`'s encroachment here (link, don't re-derive). +- **slice** owns delivery order / LoC counting / slice markers — annotates the plan **in place**. +- **review-plan** owns reviewer discovery/run/consolidate **and** the Plan-Iteration Discipline Check + (Step 5.5): it polices plan-body LoC *growth across rounds*, not *birth verbosity*. Fixes #1–#3 + *lower* the LoC it polices. Verbosity is already a recognized internal failure mode — these fixes + attack it at authoring time, where the gate cannot reach. + +All five fixes stay inside `plan`'s lane; none duplicates a sibling. + +--- + +## (e) Provenance — what each pass contributed, and where they disagreed + +**Pass A (md-only) contributed:** +- The reusable **section × volume × code/prose taxonomy** (both maps merge because of it). +- The **volume-by-section map** and the per-level cost curve. +- The **brainstorm→plan duplication** analysis (Step 0 + Step 6 seam) and the pointer-ize fix. +- The **review-spine + fold + intent-header** structural options. +- The "code kept on purpose" constraint framing and the sibling division-of-labor check. +- *Limitation:* review-value was **inferred** (no transcripts); its "code" was prose-like specs, so + it undercounted real code and over-weighted framing as the volume. + +**Pass B (code-repo) contributed:** +- **Behavioral confirmation** of review-value from transcripts (framing is skimmed; spine is + scrutinized) — turning A's hypotheses into findings. +- The **real code/prose ratio** (code share flips to majority at MIN/STD). +- The **comment/JSDoc-verbosity finding** — the #1 recurring friction, invisible to a markdown repo. +- The **design-rationale-under-served** finding (an expansion target). +- The **level-inflation-runs-into-STANDARD** correction and the "`detail_level` doesn't predict + volume" observation. +- The **navigation-is-rare** correction. +- *Limitation:* did not mine MR review threads (plans are local); single-instance HTML-companion data. + +**Where they disagreed (and who won):** +- *Is framing the volume?* A: yes. B: no (~5–14%). **B wins** — direct measurement on real code. +- *Is COMPREHENSIVE-inflation the driver?* A: yes. B: no, it runs into STANDARD. **B wins** — larger, + level-spanning corpus. +- *Is the 650–750 line target right?* A: yes. B: unachievable. **B wins** — the residue is code. +- *Is navigation a primary cost?* A: yes. B: secondary/occasional. **B wins** — single instance. +- *Are framing sections skim-tax?* Both: yes. **No conflict** — A inferred, B observed; promoted. + +--- + +## Constraints respected / out of scope + +- **Purpose untouched.** Only how/where content is presented; no redefinition of what a plan is for. +- **Code blocks preserved.** Made cheaper to *scan* (#2) and cheaper to *read* via lean comments + (#1); never cut. +- **No new sibling responsibility.** brainstorm still owns decision framing; slice still owns delivery + order; review-plan still owns the LoC/iteration gate. +- **Sanitization.** No internal repo/file/feature/ticket names; no verbatim transcript or + review-comment quotes. Internal-only support points are marked "[internal evidence, withheld]". + +## Suggested next step + +A `/ba:plan` template-revision plan implementing #1–#4 — comment-hygiene for authored code → body +intent-headers + consistent ordering → review-spine + fold → expand-rationale / pointer-ize the +brainstorm seam — with #5 (auto HTML companion for the rare phase-heavy giant) parked as a follow-up. +**Validate by re-rendering one giant COMPREHENSIVE plan and one mid STANDARD plan under the new +structure and timing a re-read — not by counting lines.** From 46af3d514909a6c3cfabc645777210b578c7f0b0 Mon Sep 17 00:00:00 2001 From: Bruno Azevedo Date: Tue, 16 Jun 2026 15:38:43 +0100 Subject: [PATCH 5/9] refactor(plan): default plan templates to decisions, code under Code-shape decision label Invert plan.md's keep-code mandate: templates and Key rules now default to decisions + pseudo-code; a literal code block is permitted only under a **Code-shape decision:** label. Wire Step 0 + Key rules to the brainstorm's ## Locked Design as the anchor for code-bearing blocks. Plan: docs/plans/2026-06-16-refactor-plan-code-justification-gate-plan.md Co-Authored-By: Claude Opus 4.8 (1M context) --- commands/ba/plan.md | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/commands/ba/plan.md b/commands/ba/plan.md index 5656f68..06da71e 100644 --- a/commands/ba/plan.md +++ b/commands/ba/plan.md @@ -44,6 +44,7 @@ ls -la docs/brainstorms/*.md 2>/dev/null | head -10 - Open questions (flag for resolution during planning) - Acceptance criteria - Convention compliance findings + - `## Locked Design` (if the brainstorm has one): reference it explicitly in any `**Code-shape decision:**` block — its interface/signatures/invariants/error modes are the anchor for code-bearing blocks. Do not paraphrase its signatures. This is *additive* — a code-shape block is still permitted with no brainstorm (it anchors to the plan's own Proposed Solution / research). 4. **Skip the idea refinement below** — the brainstorm already answered WHAT to build 5. **The brainstorm is the origin document.** Reference specific decisions with `(see brainstorm: docs/brainstorms/)` throughout the plan. Do not paraphrase decisions in a way that loses their original context. 6. Do not omit brainstorm content — if the brainstorm discussed it, the plan must address it. @@ -146,7 +147,7 @@ Select based on complexity. Simpler is mostly better. **Best for:** Simple bugs, small improvements, clear features with ≤3 files. -Sections: Problem/feature description, acceptance criteria, context, MVP code. +Sections: Problem/feature description, acceptance criteria, context, MVP (decisions + pseudo-code by default; literal code only under a `**Code-shape decision:**` label). #### STANDARD (Most Features) @@ -227,7 +228,8 @@ A Kent C. Dodds-style checklist of user-observable behaviors this plan must sati ### [filename.ext] ```language -[Actual code — not descriptions of code] +[Decisions: approach, exact paths, patterns to follow, pseudo-code for shape, test scenarios. +Add a literal code block only under a **Code-shape decision:** label — see "Key rules".] ``` ## Sources @@ -283,7 +285,8 @@ A Kent C. Dodds-style checklist of user-observable behaviors this plan must sati **File**: `exact/path/to/file.ext` ```language -[Actual code — not descriptions] +[Decisions: approach, exact paths, patterns to follow, pseudo-code for shape, test scenarios. +Add a literal code block only under a **Code-shape decision:** label — see "Key rules".] ``` ### Success Criteria @@ -351,7 +354,8 @@ A Kent C. Dodds-style checklist of user-observable behaviors this plan must sati #### Changes Required **File**: `exact/path/to/file.ext` ```language -[Actual code] +[Decisions: approach, exact paths, patterns to follow, pseudo-code for shape, test scenarios. +Add a literal code block only under a **Code-shape decision:** label — see "Key rules".] ``` #### Success Criteria @@ -419,11 +423,17 @@ A Kent C. Dodds-style checklist of user-observable behaviors this plan must sati **Key rules for all templates:** - Include **exact file paths** — never placeholders -- Include **actual code** — not descriptions of code +- **Default to decisions, not code** — approach, exact file paths, patterns to follow, pseudo-code for shape, and test scenarios. Include a literal code block ONLY under a `**Code-shape decision:** ` label. - Separate success criteria into **Automated** and **Manual** - Phase gates in COMPREHENSIVE: automated passes first, then pause for manual verification - Always include "What We're NOT Doing" +**Code-shape decision rule:** Add a literal code block + label only when re-deriving the shape from a prose decision would plausibly produce a *different, wrong* structure (a specific reducer state machine, a concurrency-sensitive ordering, a tricky query window). +- ✅ Positive: a state-machine reducer whose exact case/transition set is the decision → include it, labeled. +- ❌ Negative: a standard CRUD handler or an obvious mapping → describe it; do not include literal code. +- When unsure, include the code + label: a false-positive label costs a little review attention; a false-negative loses the design. +- **Anchor:** when a brainstorm origin exists, a code-shape block anchors to the brainstorm's `## Locked Design` (interface, signatures, invariants, error modes). With no brainstorm, it anchors to the plan's own Proposed Solution / research findings. + --- ## Step 5: Convention-Compliance Check @@ -532,7 +542,7 @@ Convention compliance: [N aligned, N overrides, N debt items] ## Important Guidelines - **Research before writing** — understand the codebase before proposing changes -- **Exact file paths and code** — never use placeholders +- **Exact file paths and decisions** — never use placeholders; literal code only under a `**Code-shape decision:**` label - **Separate automated and manual verification** — different audiences, different timing - **"What We're NOT Doing" at every level** — prevents scope creep - **Convention compliance is mandatory** — not optional, not skippable From 644b2a9e7f7c24d81b8f28e7c452cba39a24c47c Mon Sep 17 00:00:00 2001 From: Bruno Azevedo Date: Tue, 16 Jun 2026 15:40:27 +0100 Subject: [PATCH 6/9] refactor(execute,slice): consume Code-shape decision label, stay backward-safe Soften execute Step 2b to implement-to-decisions while keeping any literal code block binding-when-present (backward-safe for pre-change plans). Add a literal-vs-pseudo discriminator to Step 1.5b and slice LoC counting (label- preceded = literal; unlabeled fence = pseudo -> estimate; no labels anywhere = treat all fences as literal). Reconcile the plan-is-authority guideline. Plan: docs/plans/2026-06-16-refactor-plan-code-justification-gate-plan.md Co-Authored-By: Claude Opus 4.8 (1M context) --- commands/ba/execute.md | 14 +++++++------- commands/ba/slice.md | 4 ++-- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/commands/ba/execute.md b/commands/ba/execute.md index 5f762e0..f1f4324 100644 --- a/commands/ba/execute.md +++ b/commands/ba/execute.md @@ -134,11 +134,13 @@ Do **not** include files you "might also touch" — only files the slice's tasks ### 1.5b. Project LoC per file +A fenced block counts as **literal code** only when it is immediately preceded by a `**Code-shape decision:**` label; any unlabeled fence is pseudo-code. Exception: if the plan has no `**Code-shape decision:**` labels anywhere (a pre-change plan), treat every fenced block as literal. + For each file in the list: -- **Plan provides code in this slice's tasks**: count the lines of the provided code block. -- **No code in plan, file exists**: estimate the diff size from the task description; reference similar implementations in the codebase if needed. -- **New file, no code in plan**: estimate from the closest analogue (similar new files in this codebase). +- **Plan provides a literal code block for this file** (fence under a `**Code-shape decision:**` label, or any fence in a pre-change plan with no labels at all): count the lines of that block. +- **Decisions/pseudo-code only, file exists**: estimate the diff size from the task description; reference similar implementations in the codebase if needed. +- **New file, decisions only**: estimate from the closest analogue (similar new files in this codebase). Sum the per-file estimates. Call this the **projection** (M). @@ -206,9 +208,7 @@ For COMPREHENSIVE plans, also announce phase transitions: "**--- Phase [N]: [Pha ### 2b. Implement -Read the plan's code for this task and implement it. Follow the plan exactly — it has already been reviewed and approved. - -If the plan provides actual code, use it. If the plan describes the change without full code, implement it following existing codebase patterns. +Implement the plan's decisions for this task. Where the plan provides a literal code block, implement that code as specified — it captures a committed decision (a `**Code-shape decision:**` block makes this explicit; any literal code block in a plan is binding). Where the plan gives decisions, pseudo-code, or descriptions, implement to them following existing codebase patterns. Where a literal code block and prose both address the same file or function, the code block governs the structure; the prose is context. When rewriting an existing file, read the original first and carry over any WHY comments (non-obvious rationale, workarounds, invariant explanations) that are not reproduced in the plan's code block but are not explicitly removed by the plan. Plan code samples are structural references, not complete comment inventories. @@ -440,7 +440,7 @@ Use **AskUserQuestion**: ## Important Guidelines -- **The plan is the authority.** Follow it. Don't add features, refactor surrounding code, or "improve" things beyond what the plan specifies. +- **The plan's decisions are the authority.** Literal code blocks are authoritative verbatim; implement everything else to the plan's decisions. Don't add features, refactor surrounding code, or invent build choices the plan deliberately left as decisions. - **Track progress in the plan file.** Every completed task gets `[x]`. This is how resume works. - **Test after every task — targeted, not full suite.** Run tests related to changed files. Defer full suite + lint to completion or CI. - **Report deviations immediately.** Don't silently work around plan/reality mismatches. diff --git a/commands/ba/slice.md b/commands/ba/slice.md index 5baa2bb..7b8b85f 100644 --- a/commands/ba/slice.md +++ b/commands/ba/slice.md @@ -69,9 +69,9 @@ Based on detail level, extract the task list and estimate LoC for each: ### LoC Counting Rules -- Count only lines inside code fences (triple-backtick blocks) +- Count only lines inside **literal** code fences. A fence counts as literal only when it is immediately preceded by a `**Code-shape decision:**` label; any unlabeled fence is pseudo-code — do not line-count it, fall through to the estimate rule below. **Backward-compat:** if the plan has no `**Code-shape decision:**` labels anywhere (a pre-change plan), treat every fenced block as literal and count it — otherwise old plans route all code to the estimate fallback and slice under-sizes MRs. (Mirror execute Step 1.5b's routing.) - Exclude test file changes (files matching: `*.test.*`, `*.spec.*`, `*_test.*`, `test_*.*`, files under `tests/`, `__tests__/`, `test/`) -- If a task has no code fence, estimate conservatively (~30-50 LoC per described file change) and mark as "est. approximate" +- If a task has no literal code fence (pseudo-code/decisions only), estimate conservatively (~30-50 LoC per described file change) and mark as "est. approximate" - Track total estimated LoC across all tasks ### Single-Slice Check From 2d184ab2e7f3b70ae89a79fc5f75a8c539c1aabd Mon Sep 17 00:00:00 2001 From: Bruno Azevedo Date: Tue, 16 Jun 2026 15:42:34 +0100 Subject: [PATCH 7/9] refactor(docs): document Code-shape decision label, add sync convention, bump version Soften README /ba:plan description (code -> decisions + labeled code-shape), add the cross-file label-sync convention to CLAUDE.md (plan/execute/slice/ README mirrors), bump plugin.json 0.23.0 -> 0.24.0, and land the executed plan. Plan: docs/plans/2026-06-16-refactor-plan-code-justification-gate-plan.md Co-Authored-By: Claude Opus 4.8 (1M context) --- .claude-plugin/plugin.json | 2 +- CLAUDE.md | 1 + README.md | 2 +- ...actor-plan-code-justification-gate-plan.md | 301 ++++++++++++++++++ 4 files changed, 304 insertions(+), 2 deletions(-) create mode 100644 docs/plans/2026-06-16-refactor-plan-code-justification-gate-plan.md diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json index 6b4cd50..5473e17 100644 --- a/.claude-plugin/plugin.json +++ b/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "dev-workflow", - "version": "0.23.0", + "version": "0.24.0", "description": "Research, brainstorm, plan, slice, execute, review, compound, and propose commands with triage, convention compliance, and knowledge compounding", "author": { "name": "Bruno Azevedo" diff --git a/CLAUDE.md b/CLAUDE.md index bc7049e..9ce8ae9 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -79,5 +79,6 @@ Claude Code plugin providing brainstorm and plan commands with triage, conventio - Agents may declare `tools` in frontmatter to restrict available tools (e.g., locator agents use Grep, Glob, LS only — no Read) - `/ba:review` selection is a stateless per-diff judgment — every reviewer (built-in and discovered external) appears in the **selection ledger** each run, selected or set aside with a one-line reason, and is reachable via **Adjust**. Reviewers are never silently dropped; no selection state is persisted. (This never-hide convention is mirrored in `README.md` and `commands/ba/review.md` Step 2 — keep them in sync.) - `/ba:review` dispatches reviewer subagents with a protected-artifacts guard naming `docs/brainstorms/`, `docs/plans/`, `docs/solutions/`, `docs/research/`, and `docs/reviews/` — reviewers must not suggest deleting, relocating, or otherwise removing files under these roots (content review is unaffected) +- Plan documents default to **decisions** (approach, exact file paths, patterns, pseudo-code for shape, test scenarios); a literal code block is permitted only under a `**Code-shape decision:** ` label. The label wording is mirrored across `commands/ba/plan.md` ("Key rules for all templates" trigger block), `commands/ba/execute.md` (Step 2b + Step 1.5b LoC projection), `commands/ba/slice.md` (LoC Counting Rules), and `README.md` (`/ba:plan` description) — keep them in sync. (This convention covers the *label* only; the `## Locked Design` anchor it references is owned by `commands/ba/brainstorm.md`.) - Update README.md whenever commands, agents, or artifact paths are added or changed - Git workflow commands (`ba:propose`) commit, push, and open PR/MR — they never modify source files outside the staged diff diff --git a/README.md b/README.md index 6bcaf1d..68f4445 100644 --- a/README.md +++ b/README.md @@ -98,7 +98,7 @@ Brainstorm docs are saved to `docs/brainstorms/YYYY-MM-DD--brainstorm.md` ### `/ba:plan [feature]` -Transforms feature descriptions into implementation plans with exact file paths and code. +Transforms feature descriptions into implementation plans with exact file paths and decisions (literal code only under a `**Code-shape decision:**` label, where the code's shape is the design decision). - **Auto-detects brainstorms** — searches `docs/brainstorms/` for recent (14-day) topic-matched docs and carries forward all decisions - **Parallel research** — dispatches agents to analyze codebase patterns and search prior learnings diff --git a/docs/plans/2026-06-16-refactor-plan-code-justification-gate-plan.md b/docs/plans/2026-06-16-refactor-plan-code-justification-gate-plan.md new file mode 100644 index 0000000..d6bef13 --- /dev/null +++ b/docs/plans/2026-06-16-refactor-plan-code-justification-gate-plan.md @@ -0,0 +1,301 @@ +--- +title: Justification-Gated Code in /ba:plan +type: refactor +status: completed +date: 2026-06-16 +origin: docs/brainstorms/2026-06-15-plan-code-justification-gate-brainstorm.md +detail_level: standard +iteration_count: 1 +tags: [plan, execute, slice, verbosity, keep-code, code-shape-decision] +--- + +# Justification-Gated Code in /ba:plan Implementation Plan + +## Overview + +Change the dev-workflow planning pipeline so plan documents default to **decisions** (approach, exact file +paths, patterns-to-follow, pseudo-code for shape, test scenarios) and permit a **literal code block only where +the code's shape *is* the design decision** — flagged with an explicit `**Code-shape decision:** ` label. +This reframes the purpose of code in a plan from *rework-prevention* (unmeasured, already half-disabled by +`/ba:execute`) to *design validation* — a forcing function applied precisely where the design is non-obvious. +It is a form/contract change to existing command prompts, not a redefinition of what a plan is for (see +brainstorm: docs/brainstorms/2026-06-15-plan-code-justification-gate-brainstorm.md). + +## Current State + +The keep-code mandate is asserted in several independent places and consumed downstream: + +- **plan.md** mandates literal code in all three template tiers and at the rule level: + - MINIMAL detail-level description: `commands/ba/plan.md:149` — "Sections: ... context, MVP code." + - Template placeholders: `commands/ba/plan.md:230` ("[Actual code — not descriptions of code]"), + `:286` ("[Actual code — not descriptions]"), `:354` ("[Actual code]"). + - "Key rules for all templates": `commands/ba/plan.md:422` — "Include **actual code** — not descriptions of code". + - "Important Guidelines": `commands/ba/plan.md:535` — "**Exact file paths and code** — never use placeholders". + - Step 0 brainstorm carry-forward list (`commands/ba/plan.md:39-46`) does **not** name `## Locked Design`. +- **execute.md** already half-tolerates code-light plans but frames the plan as follow-exactly: + - Step 2b: `commands/ba/execute.md:209` ("Follow the plan exactly — it has already been reviewed and approved"), + `:211` (forks on "provides actual code" vs "describes the change without full code"), + `:213` ("Plan code samples are structural references, not complete comment inventories"). + - Step 1.5b LoC projection: `commands/ba/execute.md:139-141` (counts literal code-block lines; estimates when "no code in plan"). + - Standing guideline: `commands/ba/execute.md:443` — "**The plan is the authority.** Follow it." +- **slice.md** sizes tasks by counting code-fence lines (`commands/ba/slice.md:66-68`, `:72`), with a graceful + no-fence fallback at `:74` ("estimate conservatively (~30-50 LoC per described file change)"). +- **README.md:101** — "Transforms feature descriptions into implementation plans **with exact file paths and code**." +- **`## Locked Design`** is authored by `commands/ba/brainstorm.md:272` (and gated at `:303`/`:305`) and already + consumed by `commands/ba/propose.md:295,423,441`. It contains interface, signatures, invariants, error modes, + and a usage example — the pipeline's existing decisions-only "shape" artifact. +- **No contract break needed** in `review-plan.md` or `plan-iteration-gate` — both count plan-body LoC + shape-agnostically (`review-plan.md:216-220`, `plan-iteration-gate.md:38`); the only effect is a smaller LoC + number when code becomes prose. `convention-checker` does **not** mandate literal code (it lists "code patterns" + as a convention *category* at `convention-checker.md:127`, not a presence requirement) — verified, no edit needed. + +## What We're NOT Doing + +- **NOT** redefining what a plan is *for* — purpose untouched; form/contract only (see brainstorm: Scope Boundaries). +- **NOT** going pure decisions-only — the code-shape carve-out stays. +- **NOT** measuring rework-prevention — moot under the design-validation reframe. +- **NOT** changing `/ba:review-plan` (deferred — orthogonal, and the hybrid changes its inputs; decide its fate + after the hybrid lands). +- **NOT** changing the MINIMAL/STANDARD/COMPREHENSIVE tier *purposes* — decisions-default applies across all tiers + (tiers scale context/phasing, not the presence of code). +- **NOT** editing `brainstorm.md` or `propose.md` — `## Locked Design` is owned by brainstorm.md and already + correctly consumed by propose.md; plan.md only *references* it. (Confirmed by the convention check — those files + need no contract change.) +- **NOT** recalibrating the scope-tripwire / estimate threshold — flagged as a follow-up to observe whether + estimate-dominated projections under-fire; out of scope here. Note the two consumers have different exposure: + execute's tripwire (`T = 2 × N`) only degrades on decisions-only steps where estimation was already the fallback, + whereas `/ba:slice` now estimates on the *common* path and its low fallback (`~30-50 LoC`) directly drives MR + sizing — so slice is the more urgent recalibration candidate when this follow-up is taken up. + +## Behaviors to Test + +- [ ] A plan authored by `/ba:plan` defaults to decisions + exact file paths + patterns + pseudo-code + test + scenarios, with no literal implementation code unless a `**Code-shape decision:**` label is present. +- [ ] A `**Code-shape decision:**` block appears only where re-deriving the shape from prose would plausibly + produce a different, wrong structure; the rule's example pair makes the boundary unambiguous. +- [ ] When a brainstorm origin exists, a code-shape block names the brainstorm's `## Locked Design` as its anchor. +- [ ] When no brainstorm exists, a code-shape block is still permitted and anchors to the plan's own Proposed + Solution / research findings (the MINIMAL/standalone case is not gutted to prose-only). +- [ ] `/ba:execute` on a **pre-change plan** (all literal code, no labels) still implements that code as written — + no fidelity regression. +- [ ] `/ba:execute` on a **new decisions-default plan** writes code from the decisions and reuses literal code + verbatim only where a code block is present. +- [ ] `/ba:execute` Step 1.5b and `/ba:slice` estimate diff size from the description for pseudo-code/decisions and + line-count only literal code blocks (those preceded by a `**Code-shape decision:**` label) — the scope-tripwire + still fires. +- [ ] `/ba:slice` on a pre-change plan (no labels anywhere, all literal code) line-counts every fenced block — it + does not route old plans to the low estimate fallback and under-size MRs. +- [ ] The softened execute framing does not contradict "the plan is the authority": decisions remain binding, + literal code blocks remain authoritative verbatim, and execute does not invent build choices around them. + +## Proposed Solution + +Invert the keep-code default everywhere it is asserted, introduce a single self-policing trigger (the +`**Code-shape decision:**` label), wire plan.md to brainstorm's `## Locked Design`, and make the downstream +consumers (execute, slice) backward-safe and pseudo-code-aware. The label is the load-bearing mechanism: it is +self-policing, works with or without a brainstorm, and simultaneously fills the under-served "design rationale" +gap the verbosity research identified (see brainstorm: Key Decisions). + +**Backward-compatibility strategy (resolves the largest gap):** rather than a frontmatter flag, make execute treat +**any literal code block as binding when present**. For pre-change plans (all literal code) this preserves full +fidelity; for new plans — which under the authoring rule carry literal code *only* under a `**Code-shape +decision:**` label — "any literal code is binding" is equivalent to "code-shape blocks are binding." The two +cases unify with no version marker required. + +## Technical Considerations + +- **Trigger precision is the main risk.** A judgment-call label over-applies (justify everything → status quo with + noise) or under-applies (drop a genuinely non-obvious shape → execute re-derives it wrong). Mitigation: the rule + ships with one positive example, one negative example, and a tiebreaker ("when unsure, include + label — a + false-positive label costs review attention; a false-negative loses the design"). +- **Pseudo-code must not be line-counted as literal code** in execute Step 1.5b and slice LoC estimation, or the + scope-tripwire projection is biased low. Route pseudo-code/decisions to estimate-from-description; line-count only + literal code blocks. +- **Cross-file sync.** The `**Code-shape decision:**` label wording must stay identical across plan.md (author), + execute.md (consume), and README.md (describe). Add a CLAUDE.md convention naming these mirrors, following the + existing `/ba:review` "keep them in sync" precedent (`CLAUDE.md:80`). + +## System-Wide Impact + +- **Interaction graph**: `/ba:plan` (author) → plan doc → consumed by `/ba:execute` (implement), `/ba:slice` + (size tasks), `/ba:review-plan` (review), `plan-iteration-gate` (LoC discipline). Only plan.md, execute.md, + slice.md change behavior; review-plan and the gate are shape-agnostic and unaffected. +- **Error propagation**: the only failure mode is a mis-applied label; the example pair + tiebreaker bound it. + No runtime/source-code error path is touched (these are prompt files). +- **State lifecycle risks**: pre-change plans on disk are the partial-state risk — addressed by the + "literal code is binding when present" rule so old plans execute unchanged. + +## Implementation Approach + +All edits are to Markdown command-prompt files plus one JSON version bump. The exact target wording is itself the +deliverable — for a prompt-engineering change the wording *is* the design decision, so each edit below gives the +precise replacement text. + +### Changes Required + +**File**: `commands/ba/plan.md` + +**Code-shape decision:** the prompt wording is the artifact; the exact phrasing of the label rule, examples, and +tiebreaker determines whether the model over/under-applies the trigger, so the literal text is the design. + +1. **MINIMAL detail-level description** (`:149`) — replace "context, MVP code." with wording that makes MVP + decisions-default: e.g. "context, MVP (decisions + pseudo-code by default; literal code only under a + `**Code-shape decision:**` label)." +2. **Template placeholders** (`:230`, `:286`, `:354`) — replace each `[Actual code ...]` placeholder with a + decisions-default placeholder, e.g.: + ``` + [Decisions: approach, exact paths, patterns to follow, pseudo-code for shape, test scenarios. + Add a literal code block only under a **Code-shape decision:** label — see "Key rules".] + ``` +3. **"Key rules for all templates"** (`:420-425`) — replace line 422 ("Include **actual code** — not descriptions + of code") with the justification-gated rule, and append the trigger spec block immediately after the list: + ``` + - **Default to decisions, not code** — approach, exact file paths, patterns to follow, pseudo-code for + shape, and test scenarios. Include a literal code block ONLY under a `**Code-shape decision:** ` label. + + **Code-shape decision rule:** Add a literal code block + label only when re-deriving the shape from a prose + decision would plausibly produce a *different, wrong* structure (a specific reducer state machine, a + concurrency-sensitive ordering, a tricky query window). + - ✅ Positive: a state-machine reducer whose exact case/transition set is the decision → include it, labeled. + - ❌ Negative: a standard CRUD handler or an obvious mapping → describe it; do not include literal code. + - When unsure, include the code + label: a false-positive label costs a little review attention; a + false-negative loses the design. + - **Anchor:** when a brainstorm origin exists, a code-shape block anchors to the brainstorm's + `## Locked Design` (interface, signatures, invariants, error modes). With no brainstorm, it anchors to the + plan's own Proposed Solution / research findings. + ``` +4. **"Important Guidelines"** (`:535`) — soften "**Exact file paths and code** — never use placeholders" to + "**Exact file paths and decisions** — never use placeholders; literal code only under a `**Code-shape + decision:**` label". +5. **Step 0 carry-forward list** (`:39-46`) — add a bullet that, *when a brainstorm origin exists*, code-shape + blocks **additionally** anchor to its `## Locked Design` (interface/signatures/invariants/error modes) — phrased + as additive, not as a gate. A code-shape block is still permitted with no brainstorm (it anchors to the plan's + own Proposed Solution / research). Distinguish `## Locked Design` from the prose carry-forward items already in + the list: reference it explicitly in code-shape blocks; do not paraphrase its interface signatures. + +**File**: `commands/ba/execute.md` + +**Code-shape decision:** the binding-when-present rule is the backward-compat contract; its exact phrasing +determines whether old plans regress, so the literal replacement text matters. + +6. **Step 2b** (`:209-213`) — soften "Follow the plan exactly — it has already been reviewed and approved" to an + implement-to-decisions framing, and make literal code binding-when-present: + ``` + Implement the plan's decisions for this task. Where the plan provides a literal code block, implement that + code as specified — it captures a committed decision (a `**Code-shape decision:**` block makes this explicit; + any literal code block in a plan is binding). Where the plan gives decisions, pseudo-code, or descriptions, + implement to them following existing codebase patterns. Where a literal code block and prose both address the + same file or function, the code block governs the structure; the prose is context. + ``` + Keep `:213` (carry over WHY comments not reproduced in the plan) — it stays coherent under the new wording. +7. **Step 1.5b LoC projection** (`:139-141`) — route pseudo-code/decisions to estimate-from-description; line-count + only literal code blocks; flip the default branch. A fenced block is **literal** only when it is immediately + preceded by a `**Code-shape decision:**` label; any unlabeled fence is pseudo-code. Exception: if the plan has + no `**Code-shape decision:**` labels anywhere (a pre-change plan), treat every fenced block as literal: + ``` + - **Plan provides a literal code block for this file** (fence under a `**Code-shape decision:**` label, or any + fence in a pre-change plan with no labels at all): count the lines of that block. + - **Decisions/pseudo-code only, file exists**: estimate the diff size from the description; reference similar + implementations if needed. + - **New file, decisions only**: estimate from the closest analogue. + ``` +8. **Standing guideline** (`:443`) — reconcile "**The plan is the authority.** Follow it." with decisions-default: + ``` + - **The plan's decisions are the authority.** Literal code blocks are authoritative verbatim; implement + everything else to the plan's decisions. Don't add features, refactor surrounding code, or invent build + choices the plan deliberately left as decisions. + ``` + +**File**: `commands/ba/slice.md` + +9. **LoC counting rules** (`:66`, `:72-74`) — clarify that pseudo-code in fences is not literal code: only literal + code blocks (fences immediately preceded by a `**Code-shape decision:**` label) are line-counted; pseudo-code/ + decision prose uses the estimate fallback already at `:74`. **Backward-compat:** if the plan has no + `**Code-shape decision:**` labels anywhere (a pre-change plan), treat every fenced block as literal and + line-count it — otherwise old plans route all code to the low estimate fallback and slice under-sizes MRs. + Mirror execute Step 1.5b's routing exactly so the two stay consistent. + +**File**: `README.md` + +10. **Line 101** — replace "with exact file paths and code" with "with exact file paths and decisions (literal + code only where the code's shape is the design decision)". + +**File**: `.claude-plugin/plugin.json` + +11. **Version** — bump `0.23.0` → `0.24.0` (behavioral contract change; it is the auto-update cache key — do not + defer). + +**File**: `CLAUDE.md` + +12. **Add a cross-file sync convention** (mirroring the `:80` precedent) naming the `**Code-shape decision:**` + label mirror locations: `commands/ba/plan.md` ("Key rules" trigger block), `commands/ba/execute.md` (Step 2b + and Step 1.5b), `commands/ba/slice.md` (LoC classification rule), and `README.md` (`/ba:plan` description) — + keep them in sync. Scope this convention to the *label* only; the `## Locked Design` anchor is owned by + `commands/ba/brainstorm.md` and is referenced, not redefined, here. + +### Success Criteria + +#### Automated: +- [x] `grep -n "actual code" commands/ba/plan.md` — returns nothing (mandate removed at `:230/:286/:354/:422`). +- [x] `grep -rn "Code-shape decision" commands/ba/plan.md commands/ba/execute.md commands/ba/slice.md README.md` — + label present and identically worded in all four. +- [x] `grep -n '"version"' .claude-plugin/plugin.json` — shows `0.24.0`. +- [x] `grep -n "Locked Design" commands/ba/plan.md` — Step 0 names it as a source. +- [x] `grep -n "exact file paths and code" README.md` — returns nothing (line 101 softened). + +#### Manual: +- [ ] Re-render one STANDARD and one COMPREHENSIVE plan under the new rule: code volume drops to only justified + blocks, and the design "why" now appears where code used to (brainstorm Validation criterion). +- [ ] Dry-read execute Step 2b against an existing pre-change plan in `docs/plans/` — confirm its literal code is + still treated as binding. +- [ ] Confirm the `**Code-shape decision:**` example pair reads unambiguously (a reviewer can classify a sample + block correctly using only the positive/negative examples + tiebreaker). + +## Dependencies & Risks + +- **Risk: trigger drift.** Mitigated by the example pair + tiebreaker; revisit if re-rendered plans show + over/under-application. +- **Risk: label fatigue.** The include-when-unsure tiebreaker biases toward applying the label, which over time can + normalize it and weaken its signal value (if every block is labeled, the label stops marking non-obvious shape). + Acceptable for a first version — the example pair bounds the clearest cases; named here as a known drift risk to + watch, not mitigated with extra mechanism. +- **Risk: scope-tripwire under-fires** on estimate-dominated plans (estimates may run low vs. counted code). + Accepted for this change; threshold recalibration is a flagged follow-up, not bundled (see What We're NOT Doing). +- **Dependency: none external.** All edits are self-contained prompt/JSON files. Prompt-only changes ship on a + dry-run; no real-harness integration test gates this merge (per standing practice). + +## Sources & References + +- Origin brainstorm: `docs/brainstorms/2026-06-15-plan-code-justification-gate-brainstorm.md` — key decisions + carried forward: hybrid (decisions-default, code-by-exception), justification-gated trigger, anchor to + `## Locked Design`, purpose reframe to design-validation, plan stays the authority, review-plan deferred. +- Consolidated verbosity research: `docs/research/2026-06-14-plan-verbosity-consolidated-research.md` — implementation + body is 60–80% of every painful plan; #1 review-cost driver is prose inside code. +- Current state: `commands/ba/plan.md:149,230,286,354,422,535`, `:39-46`; `commands/ba/execute.md:209-213,139-141,443`; + `commands/ba/slice.md:66-74`; `README.md:101`; `.claude-plugin/plugin.json:3`. +- `## Locked Design` definition: `commands/ba/brainstorm.md:272,303,305`; consumers: `commands/ba/propose.md:295,423,441`. +- Sync precedent: `CLAUDE.md:80` (`/ba:review` "keep them in sync"). + +## Deviations + +### Task 4 (README edit 10): wording adjusted to carry the label token +- **Expected**: Edit 10's illustrative text "…where the code's shape is the design decision" (no `**Code-shape decision:**` token). +- **Found**: That wording fails AC2 (label "identically worded in all four" files) and makes edit-12's sync convention — which names README as a label mirror — untruthful. +- **Why**: Edit 10's phrasing predates the review round that added README to the sync mirror set; intent (two sources) is that README carries the label. +- **Resolution**: accepted — README now reads "literal code only under a `**Code-shape decision:**` label, where the code's shape is the design decision", satisfying AC2 and the sync convention. + +## Convention Compliance + +- [x] **Planning commands never write code** (`CLAUDE.md:75`) — aligned; rule governs command runtime, not plan-doc + content. A decisions-default plan is *less* code-bearing than today, so cannot newly violate it. The change + strengthens plan.md's documenter role. +- [x] **Update README.md on behavior change** (`CLAUDE.md:81`) — aligned; edit 10 softens the one code-bearing claim. +- [x] **Bump plugin.json version** (`CLAUDE.md:79`) — aligned; edit 11 bumps `0.23.0` → `0.24.0` in the same change. +- [x] **Cross-file sync discipline** (`CLAUDE.md:80`) — aligned; edit 12 adds a convention scoped to the + `**Code-shape decision:**` label across plan.md/execute.md/slice.md/README.md, naming section anchors per the + precedent. +- [x] **`## Locked Design` mirror completeness** — resolved: the new sync convention is scoped to the label only; + the anchor stays owned by brainstorm.md and consumed by propose.md (no contract change there), so the convention + is not factually incomplete. +- [x] **No new artifact files** — all 12 edits modify existing files; frontmatter/agent-naming/namespace conventions N/A. From d56f5bb315251ee05c1922a4194058febf22aa7d Mon Sep 17 00:00:00 2001 From: Bruno Azevedo Date: Tue, 16 Jun 2026 16:42:36 +0100 Subject: [PATCH 8/9] refactor: apply code-review fixes to Code-shape decision contract Resolve two High internal contradictions and four Medium consistency nits from post-implementation review: - execute.md Step 2b: reference Step 1.5b's literal/pseudo classification instead of implying every fence binds. - slice.md detail-level bullets: defer to the LoC Counting Rules instead of 'count lines in code fences' (which contradicted the literal-only rule). - execute.md Step 1.5b: render the 3-case classification as a list. - plan.md template placeholders: carry the suffix. - slice.md: move the mirror pointer to a maintainer HTML comment. - CLAUDE.md: name the template placeholders as label mirror targets. Plan: docs/plans/2026-06-16-refactor-plan-code-justification-gate-plan.md Co-Authored-By: Claude Opus 4.8 (1M context) --- CLAUDE.md | 2 +- commands/ba/execute.md | 8 ++++++-- commands/ba/plan.md | 6 +++--- commands/ba/slice.md | 8 ++++---- 4 files changed, 14 insertions(+), 10 deletions(-) diff --git a/CLAUDE.md b/CLAUDE.md index 9ce8ae9..6cc7671 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -79,6 +79,6 @@ Claude Code plugin providing brainstorm and plan commands with triage, conventio - Agents may declare `tools` in frontmatter to restrict available tools (e.g., locator agents use Grep, Glob, LS only — no Read) - `/ba:review` selection is a stateless per-diff judgment — every reviewer (built-in and discovered external) appears in the **selection ledger** each run, selected or set aside with a one-line reason, and is reachable via **Adjust**. Reviewers are never silently dropped; no selection state is persisted. (This never-hide convention is mirrored in `README.md` and `commands/ba/review.md` Step 2 — keep them in sync.) - `/ba:review` dispatches reviewer subagents with a protected-artifacts guard naming `docs/brainstorms/`, `docs/plans/`, `docs/solutions/`, `docs/research/`, and `docs/reviews/` — reviewers must not suggest deleting, relocating, or otherwise removing files under these roots (content review is unaffected) -- Plan documents default to **decisions** (approach, exact file paths, patterns, pseudo-code for shape, test scenarios); a literal code block is permitted only under a `**Code-shape decision:** ` label. The label wording is mirrored across `commands/ba/plan.md` ("Key rules for all templates" trigger block), `commands/ba/execute.md` (Step 2b + Step 1.5b LoC projection), `commands/ba/slice.md` (LoC Counting Rules), and `README.md` (`/ba:plan` description) — keep them in sync. (This convention covers the *label* only; the `## Locked Design` anchor it references is owned by `commands/ba/brainstorm.md`.) +- Plan documents default to **decisions** (approach, exact file paths, patterns, pseudo-code for shape, test scenarios); a literal code block is permitted only under a `**Code-shape decision:** ` label. The label wording is mirrored across `commands/ba/plan.md` ("Key rules for all templates" trigger block **and** the three template placeholders), `commands/ba/execute.md` (Step 2b + Step 1.5b LoC projection), `commands/ba/slice.md` (LoC Counting Rules), and `README.md` (`/ba:plan` description) — keep them in sync. (This convention covers the *label* only; the `## Locked Design` anchor it references is owned by `commands/ba/brainstorm.md`.) - Update README.md whenever commands, agents, or artifact paths are added or changed - Git workflow commands (`ba:propose`) commit, push, and open PR/MR — they never modify source files outside the staged diff diff --git a/commands/ba/execute.md b/commands/ba/execute.md index f1f4324..92b0466 100644 --- a/commands/ba/execute.md +++ b/commands/ba/execute.md @@ -134,7 +134,11 @@ Do **not** include files you "might also touch" — only files the slice's tasks ### 1.5b. Project LoC per file -A fenced block counts as **literal code** only when it is immediately preceded by a `**Code-shape decision:**` label; any unlabeled fence is pseudo-code. Exception: if the plan has no `**Code-shape decision:**` labels anywhere (a pre-change plan), treat every fenced block as literal. +Classify each fenced block as **literal** or **pseudo-code** before counting (three cases): + +- **Fence under a `**Code-shape decision:**` label** → literal. +- **Unlabeled fence, in a plan that has at least one `**Code-shape decision:**` label** → pseudo-code. +- **Any fence, in a plan with no `**Code-shape decision:**` labels anywhere** (a pre-change plan) → literal. For each file in the list: @@ -208,7 +212,7 @@ For COMPREHENSIVE plans, also announce phase transitions: "**--- Phase [N]: [Pha ### 2b. Implement -Implement the plan's decisions for this task. Where the plan provides a literal code block, implement that code as specified — it captures a committed decision (a `**Code-shape decision:**` block makes this explicit; any literal code block in a plan is binding). Where the plan gives decisions, pseudo-code, or descriptions, implement to them following existing codebase patterns. Where a literal code block and prose both address the same file or function, the code block governs the structure; the prose is context. +Implement the plan's decisions for this task. Where the plan provides a literal code block — classified per Step 1.5b (a fence under a `**Code-shape decision:**` label, or any fence in a pre-change plan with no labels anywhere) — implement that code as specified; it captures a committed decision and is binding verbatim. Where the plan gives decisions, pseudo-code, or unlabeled fences (pseudo-code), implement to them following existing codebase patterns. Where a literal code block and prose both address the same file or function, the code block governs the structure; the prose is context. When rewriting an existing file, read the original first and carry over any WHY comments (non-obvious rationale, workarounds, invariant explanations) that are not reproduced in the plan's code block but are not explicitly removed by the plan. Plan code samples are structural references, not complete comment inventories. diff --git a/commands/ba/plan.md b/commands/ba/plan.md index 06da71e..e271e8d 100644 --- a/commands/ba/plan.md +++ b/commands/ba/plan.md @@ -229,7 +229,7 @@ A Kent C. Dodds-style checklist of user-observable behaviors this plan must sati ```language [Decisions: approach, exact paths, patterns to follow, pseudo-code for shape, test scenarios. -Add a literal code block only under a **Code-shape decision:** label — see "Key rules".] +Add a literal code block only under a **Code-shape decision:** label — see "Key rules".] ``` ## Sources @@ -286,7 +286,7 @@ A Kent C. Dodds-style checklist of user-observable behaviors this plan must sati **File**: `exact/path/to/file.ext` ```language [Decisions: approach, exact paths, patterns to follow, pseudo-code for shape, test scenarios. -Add a literal code block only under a **Code-shape decision:** label — see "Key rules".] +Add a literal code block only under a **Code-shape decision:** label — see "Key rules".] ``` ### Success Criteria @@ -355,7 +355,7 @@ A Kent C. Dodds-style checklist of user-observable behaviors this plan must sati **File**: `exact/path/to/file.ext` ```language [Decisions: approach, exact paths, patterns to follow, pseudo-code for shape, test scenarios. -Add a literal code block only under a **Code-shape decision:** label — see "Key rules".] +Add a literal code block only under a **Code-shape decision:** label — see "Key rules".] ``` #### Success Criteria diff --git a/commands/ba/slice.md b/commands/ba/slice.md index 7b8b85f..df17da3 100644 --- a/commands/ba/slice.md +++ b/commands/ba/slice.md @@ -63,13 +63,13 @@ Read the `detail_level` field from YAML frontmatter. If missing, infer: Based on detail level, extract the task list and estimate LoC for each: -- **MINIMAL**: Each acceptance criterion checkbox is a task. Estimate LoC from the "MVP" code section -- count lines in code fences, distribute across criteria proportionally. -- **STANDARD**: Each `**File**:` block under "Changes Required" is a task. Count lines in each file block's code fence. -- **COMPREHENSIVE**: Each `**File**:` block within a phase is a task. Count code fence lines per block. Respect phase boundaries -- never merge tasks across phases. +- **MINIMAL**: Each acceptance criterion checkbox is a task. Estimate LoC from the "MVP" section by applying the LoC Counting Rules below to each fence, distributing across criteria proportionally. +- **STANDARD**: Each `**File**:` block under "Changes Required" is a task. Apply the LoC Counting Rules below to each file block's fence. +- **COMPREHENSIVE**: Each `**File**:` block within a phase is a task. Apply the LoC Counting Rules below to each block's fence. Respect phase boundaries -- never merge tasks across phases. ### LoC Counting Rules -- Count only lines inside **literal** code fences. A fence counts as literal only when it is immediately preceded by a `**Code-shape decision:**` label; any unlabeled fence is pseudo-code — do not line-count it, fall through to the estimate rule below. **Backward-compat:** if the plan has no `**Code-shape decision:**` labels anywhere (a pre-change plan), treat every fenced block as literal and count it — otherwise old plans route all code to the estimate fallback and slice under-sizes MRs. (Mirror execute Step 1.5b's routing.) +- Count only lines inside **literal** code fences. A fence counts as literal only when it is immediately preceded by a `**Code-shape decision:**` label; any unlabeled fence is pseudo-code — do not line-count it, fall through to the estimate rule below. **Backward-compat:** if the plan has no `**Code-shape decision:**` labels anywhere (a pre-change plan), treat every fenced block as literal and count it — otherwise old plans route all code to the estimate fallback and slice under-sizes MRs. - Exclude test file changes (files matching: `*.test.*`, `*.spec.*`, `*_test.*`, `test_*.*`, files under `tests/`, `__tests__/`, `test/`) - If a task has no literal code fence (pseudo-code/decisions only), estimate conservatively (~30-50 LoC per described file change) and mark as "est. approximate" - Track total estimated LoC across all tasks From 40891c936ffb131410ea0177d61623fb59e9a231 Mon Sep 17 00:00:00 2001 From: Bruno Azevedo Date: Wed, 17 Jun 2026 00:56:54 +0100 Subject: [PATCH 9/9] fix(plan): add when-unsure tiebreaker to Important Guidelines line The Important Guidelines summary mentioned the **Code-shape decision:** label but omitted the when-unsure tiebreaker stated in the Key-rules body (line 434). Append a short pointer so the quick-reference matches the body. Closes the one finding suppressed by the review's confidence gate. Plan: docs/plans/2026-06-16-refactor-plan-code-justification-gate-plan.md Co-Authored-By: Claude Opus 4.8 (1M context) --- commands/ba/plan.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/commands/ba/plan.md b/commands/ba/plan.md index e271e8d..7129a67 100644 --- a/commands/ba/plan.md +++ b/commands/ba/plan.md @@ -542,7 +542,7 @@ Convention compliance: [N aligned, N overrides, N debt items] ## Important Guidelines - **Research before writing** — understand the codebase before proposing changes -- **Exact file paths and decisions** — never use placeholders; literal code only under a `**Code-shape decision:**` label +- **Exact file paths and decisions** — never use placeholders; literal code only under a `**Code-shape decision:**` label (when unsure, include the label) - **Separate automated and manual verification** — different audiences, different timing - **"What We're NOT Doing" at every level** — prevents scope creep - **Convention compliance is mandatory** — not optional, not skippable