[roadmap] /ba:polish — conversational browser polish command (v0) #24

@azevedo

Summary

Add a new slash command, /ba:polish, that introduces a polish phase to the plugin workflow — the human-judgment bookend that runs after /ba:review. It's a human-in-the-loop, conversational step where the main agent drives a real browser so the developer can iterate on the feel of a just-built feature (design, spacing, copy, empty/error states, motion). Inspired by Every's ce-polish-beta and their Ideate → brainstorm → plan → work → review → polish → compound loop, where polish became "the new end" once the middle of the cycle got automated.

It removes the manual setup tax (launch dev server, open browser, screenshot) and closes the screenshot → diagnose → fix → hot-reload → re-screenshot loop inside one conversation.

This issue tracks v0 only.

Existing assets

  • Branch: claude/compound-engineering-workflow-gTOMG (where the design work lives)
  • Approved brainstorm: docs/brainstorms/2026-06-03-ba-polish-command-brainstorm.md (status: approved, full triage). Commits 9957096 (design) and 7bec19c (cheap objective signals folded into observe).
  • Not yet started: the implementation plan (/ba:plan) and the command itself (commands/ba/polish.md).

Locked design (from the brainstorm)

Invocation: /ba:polish [scope-hint] [--route <path>] [--no-launch] [--base <ref>]
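To make the argument shape concrete, here is a minimal sketch that models the invocation with Python's argparse. It is illustrative only: the real command is a markdown prompt file, not a Python program, and the None defaults (including the default for --base) are assumptions, since the issue does not state them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Model the documented invocation; all defaults here are assumptions."""
    parser = argparse.ArgumentParser(prog="/ba:polish")
    # Optional free-text hint that narrows the computed scope.
    parser.add_argument("scope_hint", nargs="?", default=None)
    # Escape hatch: polish one specific route.
    parser.add_argument("--route", metavar="<path>", default=None)
    # Escape hatch: never launch a dev server (probe-or-assume instead).
    parser.add_argument("--no-launch", action="store_true")
    # Escape hatch: override the ref the branch diff is computed against.
    parser.add_argument("--base", metavar="<ref>", default=None)
    return parser


# Bare invocation: every escape hatch stays off.
bare = build_parser().parse_args([])

# Narrowed invocation with two escape hatches.
narrowed = build_parser().parse_args(["checkout", "--route", "/cart", "--no-launch"])
```

The bare case matters most: per the acceptance criteria, the full loop should run with no flags in the common case, so every flag is optional.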

Phase contract (ordered; each gates the next):

  1. Preflight — verify agent-browser on PATH; verify a git repo with a non-empty diff vs base. Hard-stops only.
  2. Scope — compute changed routes/components from the branch diff; ordered most-changed-first, narrowed by scope-hint/--route.
  3. Serve — resolve a base URL: probe for a running dev server; else auto-detect framework + launch (--no-launch skips to probe-or-assume).
  4. Drive + Iterate — the only interactive phase. Per target: open via the ui-driver seam (navigate/observe/act); observe surfaces the FEEL read plus cheap objective signals the driver already produces (layout shift / CLS and console errors). Take direction, apply UI/feel fixes, hot-reload, re-observe. Bug-notes accumulate in memory.
  5. Wrap — fires once on session-end; replays accumulated bug-notes as a single deferral handoff to /ba:review / /ba:debug. Writes nothing.
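The Scope phase above can be sketched as a pure function over `git diff --numstat` output. This is a sketch under stated assumptions: "most-changed" is measured here as added plus deleted lines, the scope hint is treated as a plain substring match on the path, and the mapping from changed files to routes/components is left out entirely. The names ChangedFile, parse_numstat, and order_targets are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ChangedFile:
    path: str
    churn: int  # added + deleted lines (assumed measure of "most-changed")


def parse_numstat(numstat: str) -> list[ChangedFile]:
    """Parse `git diff --numstat` text: one 'added<TAB>deleted<TAB>path' per line."""
    files = []
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        added, deleted, path = parts
        # Binary files report "-" for both counts; count them as zero churn.
        churn = sum(int(n) for n in (added, deleted) if n.isdigit())
        files.append(ChangedFile(path, churn))
    return files


def order_targets(files: list[ChangedFile], scope_hint: str | None = None) -> list[ChangedFile]:
    """Narrow by the optional hint, then order most-changed-first."""
    if scope_hint:
        files = [f for f in files if scope_hint in f.path]
    return sorted(files, key=lambda f: f.churn, reverse=True)


# Sample numstat text; in a real session this would come from the branch diff vs base.
SAMPLE = (
    "12\t3\tsrc/routes/cart.tsx\n"
    "2\t0\tsrc/components/Badge.tsx\n"
    "-\t-\tpublic/logo.png\n"
    "40\t10\tsrc/routes/checkout.tsx\n"
)

all_targets = [f.path for f in order_targets(parse_numstat(SAMPLE))]
route_targets = [f.path for f in order_targets(parse_numstat(SAMPLE), "routes")]
```

Keeping the ordering logic separate from the git call makes it easy to check without a repository; the sample paths are made up for illustration.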

Key decisions:

  • Sole v0 driver: Vercel agent-browser CLI, chosen over Chrome DevTools MCP. It covers ~90% of the inspection surface at near-zero standing token cost, versus ~17–18k tokens/turn of MCP schemas, and it lets the main agent drive directly, which keeps the conversational loop intact.
  • Main agent drives, no sub-agents — polish is interactive; sub-agent fan-out suits autonomous work, not tight human-in-the-loop.
  • Note bugs, don't fix them — "review catches the bug; polish asks if it feels right." Correctness bugs are surfaced and deferred.
  • Owns getting the app running (detect-or-launch; tear down only what it started).
  • Scopes from the current branch diff vs base.
  • No persisted artifacts — purely conversational.
  • New "Polish Commands" category in CLAUDE.md/README.md: /ba:polish doesn't fit the documented "execution = plan-driven" class (it's plan-less, driven by live feel + the diff).
  • Single navigate/observe/act ui-driver seam retained from rejected Design C — the one axis with concrete expected churn (driver may be swapped, as Every did once); other speculative seams dropped as YAGNI.
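One way to picture the single navigate/observe/act seam is as a small structural interface that any driver adapter satisfies. The sketch below is hypothetical throughout: the names UiDriver, Observation, FakeDriver, and open_and_observe, the field names, and the string-based act signature are assumptions, and no real agent-browser calls are shown because this issue does not describe its CLI surface.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Observation:
    feel_read: str                 # the subjective read of the page
    layout_shift: float            # cheap objective signal: CLS
    console_errors: list[str] = field(default_factory=list)


class UiDriver(Protocol):
    """The one seam: swapping drivers means writing a new adapter for this."""

    def navigate(self, url: str) -> None: ...
    def observe(self) -> Observation: ...
    def act(self, instruction: str) -> None: ...


class FakeDriver:
    """In-memory stand-in for a real adapter, used only to exercise the seam."""

    def __init__(self) -> None:
        self.url: str | None = None
        self.actions: list[str] = []

    def navigate(self, url: str) -> None:
        self.url = url

    def observe(self) -> Observation:
        return Observation(feel_read=f"snapshot of {self.url}", layout_shift=0.0)

    def act(self, instruction: str) -> None:
        self.actions.append(instruction)


def open_and_observe(driver: UiDriver, url: str, actions: list[str]) -> Observation:
    """Open a target, perform any steps needed to reach the state, then observe."""
    driver.navigate(url)
    for instruction in actions:
        driver.act(instruction)
    return driver.observe()


driver = FakeDriver()
obs = open_and_observe(driver, "http://localhost:3000/cart", ["click #open-modal"])
```

Because callers depend only on UiDriver, replacing the v0 adapter later would not touch the Drive + Iterate logic, which is the churn this seam is meant to absorb.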

Acceptance criteria

  • /ba:polish exists at commands/ba/polish.md with frontmatter matching sibling commands (name, description, argument-hint).
  • Bare /ba:polish runs the full Preflight→Scope→Serve→Drive→Wrap loop with no flags in the common case.
  • --route, --no-launch, --base behave as the documented escape hatches.
  • Missing agent-browser → clear install hint, stops before any server launch; empty diff and server-won't-start each stop with their specified message.
  • All browser interaction goes through a single navigate/observe/act seam with agent-browser as the v0 adapter.
  • Noticed correctness bugs are surfaced and deferred, never auto-fixed; wrap replays them once.
  • observe surfaces CLS + console errors alongside the FEEL read; no broader objective-verification suite runs in v0.
  • No file is written by a polish session.
  • A new Polish Commands category is added to CLAUDE.md and README.md, /ba:polish is listed, and version in .claude-plugin/plugin.json is bumped.
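Two of the criteria above, hard stops that prevent later phases and a wrap that replays bug-notes exactly once without writing anything, can be sketched together. This is a minimal illustration, not the command's implementation: HardStop, PolishSession, run_phases, and preflight are hypothetical names, and the stop message is a placeholder because the issue refers to specified messages without quoting them.

```python
from __future__ import annotations

import shutil
from typing import Callable


class HardStop(Exception):
    """Raised by a phase to halt the session; later phases never run."""


class PolishSession:
    """Holds bug-notes in memory only; nothing is written to disk."""

    def __init__(self) -> None:
        self.bug_notes: list[str] = []
        self._wrapped = False

    def note_bug(self, note: str) -> None:
        self.bug_notes.append(note)

    def wrap(self) -> list[str]:
        """Return the accumulated notes once; repeat calls return nothing."""
        if self._wrapped:
            return []
        self._wrapped = True
        return list(self.bug_notes)


def preflight(which: Callable[[str], str | None] = shutil.which) -> None:
    # Placeholder message: the real install hint is not given in this issue.
    if which("agent-browser") is None:
        raise HardStop("agent-browser not found on PATH (install hint goes here)")


def run_phases(phases: list[tuple[str, Callable[[], None]]]) -> list[str]:
    """Run phases in order; each gates the next. Returns the phases that completed."""
    completed = []
    for name, phase in phases:
        try:
            phase()
        except HardStop as stop:
            print(f"{name} stopped: {stop}")
            break
        completed.append(name)
    return completed


# Simulate a missing driver: the serve phase must never run.
launched: list[str] = []
completed = run_phases([
    ("preflight", lambda: preflight(which=lambda name: None)),
    ("serve", lambda: launched.append("dev server")),
])

session = PolishSession()
session.note_bug("cart total ignores discount code")
first_replay = session.wrap()
second_replay = session.wrap()
```

Injecting the `which` lookup keeps the missing-driver case checkable on any machine, whether or not agent-browser is installed.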

Out of scope (v0)

  • Tier-2 platform-map seam (consuming an external repo-owned "how the app works" map) — parked as a separate roadmap item.
  • Chrome DevTools MCP as primary or secondary driver — researched and rejected (token cost).
  • Persisted artifacts of any kind.
  • Fixing correctness bugs.
  • Sub-agent dispatch.
  • Browser-driven objective verification suites (a11y scoring, web vitals, visual regression, a /frontend-verify skill) — deferred as a loose idea.

Next step

/ba:plan against the approved brainstorm to produce the implementation plan, then execute on claude/compound-engineering-workflow-gTOMG.

Related

https://claude.ai/code/session_014AfNMUnKn3oAsZhvNk5Vxa

Metadata

Labels

cluster:polish (Browser/last-mile polish & evidence)
ready (Clear starting point, can build now)
