
[roadmap] /ba:prove — capture screenshot evidence of a feature for the MR #20

@azevedo

Idea

A last-mile command that captures objective screenshot evidence that a feature works and stages it for attachment to the MR/PR. Lately this has been done by hand with chrome-devtools MCP (or agent-browser): drive the app, often mock network requests to reach different scenarios (empty / error / loading / success), snap screenshots, attach to the MR.

This lives at the same last-mile moment as /ba:polish (server up, browser open, last step before shipping) and shares the ui-driver / browser seam — but it is a different lane and should not be folded into polish.

Why it is not polish (and not review)

Same moment + same tooling, but different on every axis the polish brainstorm locked down:

|  | polish | /ba:prove (this) | (verification, separately deferred) |
| --- | --- | --- | --- |
| Signal | subjective (feel) | objective (it works) | objective (it works) |
| Audience | you, now | the reviewer, later | the agent, as a gate |
| Output | ephemeral, no artifacts | durable artifacts (PNGs on the MR) | pass/fail |
| Drives state | observes current state | constructs states (mock network per scenario) | constructs states |

Folding it into polish would break polish's locked identity ("purely conversational, no persisted artifacts, subjective feel"). It also doesn't belong in /ba:review (server-less and already slow).

Proposed shape

  • Own command, /ba:prove (its own lane: distinct audience + a non-trivial mocking core), orchestrated by /ba:propose since the artifact is MR-facing — e.g. /ba:propose --capture calls it, or it runs standalone.
  • Shares the ui-driver seam with polish; does not reach into polish's conversational loop.
  • Cheap → expensive split:
    • v1 (mock-free): capture the happy path. Navigate (or accept a list of routes), snap, and stage images for the MR body. Immediately useful.
    • v2 (scenario-driven): define states, mock network per state, drive + snap each. The real value, the real cost. Likely overlaps with the repo's existing mocking (MSW-style / fixtures).
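
The v1/v2 split above could be sketched as plain data plus a pure planner. This is only an illustration: `MockRule`, `Scenario`, `plan_capture`, the step names, the `/orders` route and the `**/api/orders` pattern are all hypothetical, and the actual browser driving is left to whichever ui-driver is chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MockRule:
    """One mocked network response (hypothetical shape, not an existing API)."""
    url_pattern: str
    status: int
    body: str = ""
    delay_ms: int = 0


@dataclass(frozen=True)
class Scenario:
    """A named UI state to reach and screenshot."""
    name: str
    route: str
    mocks: tuple = ()  # tuple of MockRule; empty means "mock-free" (v1)


def plan_capture(scenarios, out_dir="evidence"):
    """Turn scenarios into ordered driver steps. No browser is touched here."""
    steps = []
    for s in scenarios:
        for rule in s.mocks:
            steps.append(("mock", rule))
        steps.append(("navigate", s.route))
        steps.append(("screenshot", f"{out_dir}/{s.name}.png"))
        if s.mocks:
            # Reset interception so one scenario cannot leak into the next.
            steps.append(("clear_mocks",))
    return steps


# v1: mock-free happy path, just a route to visit.
V1 = [Scenario("success", "/orders")]

# v2: one scenario per state, each with its own mocked responses.
V2 = [
    Scenario("empty", "/orders", (MockRule("**/api/orders", 200, "[]"),)),
    Scenario("error", "/orders", (MockRule("**/api/orders", 500),)),
    Scenario("loading", "/orders",
             (MockRule("**/api/orders", 200, "[]", delay_ms=5000),)),
]

if __name__ == "__main__":
    for step in plan_capture(V2):
        print(step)
```

Keeping the plan as data means v1 is simply v2 with no mocks, and the same step list could be replayed by chrome-devtools MCP or agent-browser once the driver question is settled.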

Prior art — Every's compound-engineering plugin

  • ce-demo-reel — closest match. Core principle: "Evidence means USING THE PRODUCT, not running tests." Captures tiers (browser GIF / terminal recording / screenshot reel / static PNGs / no-evidence), scans for secrets before upload, uploads to a public host, returns {tier, description, url, path}, and the caller integrates it into the PR description under a "Demo"/"Screenshots" label. Strong template for the artifact + propose hand-off.
  • ce-test-browser — diff→route mapping, test each page, pass/fail table. This is the verification lane (objective gate for the agent), not this issue — useful reference for the deferred verification idea instead.

Key divergence: neither Every skill does network mocking / scenario states — they capture whatever happy-path state the app is in. The scenario-mocking layer is the net-new, expensive part of this idea and the thing to scope carefully.
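
A rough sketch of the hand-off described for ce-demo-reel, where the capture step returns {tier, description, url, path} and the caller splices it into the PR description. The field names come from the description above; the `Evidence` class, the `splice_evidence` function, the tier strings and the Markdown heading/image layout are assumptions made for illustration, not ce-demo-reel's or /ba:propose's real interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    tier: str                # e.g. "static-png" or "no-evidence" (names assumed)
    description: str
    url: Optional[str]       # where the artifact was uploaded, if anywhere
    path: Optional[str]      # local file the artifact was staged at


def splice_evidence(mr_body: str, items: list, label: str = "Screenshots") -> str:
    """Append an evidence section to the MR body; leave it untouched if empty."""
    shown = [e for e in items if e.tier != "no-evidence" and e.url]
    if not shown:
        return mr_body
    lines = [f"## {label}", ""]
    lines += [f"![{e.description}]({e.url})" for e in shown]
    return mr_body.rstrip() + "\n\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    items = [
        Evidence("static-png", "Empty state",
                 "https://example.com/empty.png", "evidence/empty.png"),
    ]
    print(splice_evidence("Adds the orders page.", items))
```

Returning the body unchanged for the "no-evidence" tier keeps text-only or config-only changes from growing an empty section.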

Borrow from ce-demo-reel

  • Tiered capture with a graceful "no evidence needed" tier (text/config-only changes).
  • Secret scanning before any upload (patterns like sk-, ghp_, Bearer, ?token=); keep credential entry outside the captured region.
  • Standardized return that propose splices into the MR body.
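
The secret-scanning step in the list above might look roughly like this. The four prefixes are the ones named in the list; the exact regexes, the minimum lengths, and the function names `find_secrets` and `safe_to_upload` are illustrative guesses. The sketch also assumes the driver can hand back the page's visible text and URL, since a regex cannot inspect the pixels of a PNG.

```python
import re

# Prefixes come from the list above; the regex details are assumptions.
SECRET_PATTERNS = {
    "sk- key": re.compile(r"\bsk-[A-Za-z0-9_-]{8,}"),
    "GitHub token": re.compile(r"\bghp_[A-Za-z0-9]{8,}"),
    "Bearer token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/=-]{8,}"),
    "token query param": re.compile(r"[?&]token=[^&\s]+"),
}


def find_secrets(text: str) -> list:
    """Return the name of every pattern that matches, in definition order."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]


def safe_to_upload(visible_text: str, page_url: str) -> bool:
    """Gate an upload on both the captured text and the URL being clean."""
    return not find_secrets(visible_text) and not find_secrets(page_url)


if __name__ == "__main__":
    print(find_secrets("Authorization: Bearer abcdefgh12345678"))
    print(safe_to_upload("Orders (0)", "https://app.test/orders"))
```

A check like this is cheap enough to run on every capture, and a failed check would block the upload rather than try to redact the image.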

Meta-insight (ui-driver seam)

This is the second concrete consumer of the ui-driver seam (after polish; verification would be a third). That's the justification we said was missing when we deferred the seam-as-extension-point — it upgrades "ui-driver seam" from polish-private convenience to deliberate shared last-mile browser infra. Doesn't change v0 polish scope; it's a reason to keep the seam clean.

Open questions

  • Standalone /ba:prove vs. /ba:propose --capture vs. both?
  • Where do scenario definitions live — inline args, a small config, or reuse existing test fixtures/mocks?
  • chrome-devtools MCP vs. agent-browser as the driver (request interception for mocking is the deciding capability).
  • How to host/attach screenshots: upload to a public host (ce-demo-reel style), commit into the MR, or inline?

Status

Deliberately a roadmap item, not scheduled. Gated on the ui-driver seam existing. Distinct from /ba:polish (feel) and the separately-deferred verification/gate idea.

https://claude.ai/code/session_014AfNMUnKn3oAsZhvNk5Vxa

Labels

  • cluster:polish (Browser/last-mile polish & evidence)
  • needs-brainstorm (Idea not yet shaped into a plan)
