Idea
A last-mile command that captures objective screenshot evidence that a feature works and stages it for attachment to the MR/PR. Lately this has been done by hand with chrome-devtools MCP (or agent-browser): drive the app, often mock network requests to reach different scenarios (empty / error / loading / success), snap screenshots, attach to the MR.
This lives at the same last-mile moment as /ba:polish (server up, browser open, last step before shipping) and shares the ui-driver / browser seam — but it is a different lane and should not be folded into polish.
Why it is not polish (and not review)
Same moment + same tooling, but different on every axis the polish brainstorm locked down:
| | polish | /ba:prove (this) | (verification, separately deferred) |
|---|---|---|---|
| Signal | subjective (feel) | objective (it works) | objective (it works) |
| Audience | you, now | the reviewer, later | the agent, as a gate |
| Output | ephemeral, no artifacts | durable artifacts (PNGs on the MR) | pass/fail |
| Drives state | observes current state | constructs states (mock network per scenario) | constructs states |
Folding it into polish would break polish's locked identity ("purely conversational, no persisted artifacts, subjective feel"). It also doesn't belong in /ba:review (server-less and already slow).
Proposed shape
- Own command, /ba:prove (its own lane: distinct audience + a non-trivial mocking core), orchestrated by /ba:propose since the artifact is MR-facing. For example, /ba:propose --capture calls it, or it runs standalone.
- Shares the ui-driver seam with polish; does not reach into polish's conversational loop.
- Cheap → expensive split:
  - v1 (mock-free): capture the happy path: navigate (or hand it routes), snap, stage images for the MR body. Immediately useful.
  - v2 (scenario-driven): define states, mock network per state, drive + snap each. The real value, the real cost. Likely overlaps with the repo's existing mocking (MSW-style / fixtures).
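To make the v1/v2 split concrete, here is a minimal sketch of scenario-driven capture. Everything named here is hypothetical: the `MockRule` and `Scenario` shapes, the `UiDriver` method names (a stand-in for the ui-driver seam, which does not exist yet), and the `/orders` route. A scenario with no mocks is the v1 happy path; the others are v2.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class MockRule:
    """One mocked network response (hypothetical shape)."""
    url_pattern: str
    status: int = 200
    body: str = "{}"
    delay_ms: int = 0  # a long delay can hold the UI in its loading state


@dataclass(frozen=True)
class Scenario:
    """A named UI state to construct and capture."""
    name: str
    route: str
    mocks: tuple[MockRule, ...] = ()


class UiDriver(Protocol):
    """Stand-in for the ui-driver seam; these method names are assumptions."""
    def set_mocks(self, rules: tuple[MockRule, ...]) -> None: ...
    def navigate(self, route: str) -> None: ...
    def screenshot(self, path: Path) -> None: ...


def capture_scenarios(driver: UiDriver, scenarios: list[Scenario], out_dir: Path) -> list[Path]:
    """Drive each scenario in order and return the staged screenshot paths."""
    staged = []
    for scenario in scenarios:
        driver.set_mocks(scenario.mocks)  # v1 scenarios simply pass no mocks
        driver.navigate(scenario.route)
        path = out_dir / f"{scenario.name}.png"
        driver.screenshot(path)
        staged.append(path)
    return staged


class RecordingDriver:
    """Fake driver that records calls instead of touching a browser."""
    def __init__(self) -> None:
        self.calls: list[tuple[str, object]] = []

    def set_mocks(self, rules): self.calls.append(("set_mocks", rules))
    def navigate(self, route): self.calls.append(("navigate", route))
    def screenshot(self, path): self.calls.append(("screenshot", path))


# Hypothetical scenarios for an imaginary /orders page.
SCENARIOS = [
    Scenario("success", "/orders"),
    Scenario("empty", "/orders", (MockRule("/api/orders", body="[]"),)),
    Scenario("error", "/orders", (MockRule("/api/orders", status=500),)),
    Scenario("loading", "/orders", (MockRule("/api/orders", delay_ms=60_000),)),
]
```

Keeping the driver behind a small protocol means the same loop could sit on top of chrome-devtools MCP or agent-browser, whichever ends up supporting request interception.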
Prior art — Every's compound-engineering plugin
ce-demo-reel — closest match. Core principle: "Evidence means USING THE PRODUCT, not running tests." Captures tiers (browser GIF / terminal recording / screenshot reel / static PNGs / no-evidence), scans for secrets before upload, uploads to a public host, returns {tier, description, url, path}, and the caller integrates it into the PR description under a "Demo"/"Screenshots" label. Strong template for the artifact + propose hand-off.
ce-test-browser — diff→route mapping, test each page, pass/fail table. This is the verification lane (objective gate for the agent), not this issue — useful reference for the deferred verification idea instead.
Key divergence: neither Every skill does network mocking / scenario states — they capture whatever happy-path state the app is in. The scenario-mocking layer is the net-new, expensive part of this idea and the thing to scope carefully.
Borrow from ce-demo-reel
- Tiered capture with a graceful "no evidence needed" tier (text/config-only changes).
- Secret scanning before any upload (patterns like sk-, ghp_, Bearer, ?token=); set credentials outside the captured region.
- Standardized return that propose splices into the MR body.
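The last two borrowed pieces could look roughly like the sketch below. It is illustrative only: the regexes are guesses at what "patterns like sk-, ghp_, Bearer, ?token=" would mean in practice, the scan covers text associated with a capture (URLs, visible text) rather than image pixels, and `find_secrets`, `Evidence`, and `splice_into_mr_body` are hypothetical names. The record fields mirror the {tier, description, url, path} return described above.

```python
import re
from dataclasses import dataclass

# Patterns named in the note above; the exact regexes are assumptions.
SECRET_PATTERNS = {
    "sk- key": re.compile(r"\bsk-[A-Za-z0-9_-]{8,}"),
    "ghp_ token": re.compile(r"\bghp_[A-Za-z0-9]{8,}"),
    "Bearer header": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/=-]{8,}"),
    "?token= query": re.compile(r"[?&]token=[^&\s]+"),
}


def find_secrets(text: str) -> list[str]:
    """Return the label of every secret pattern that matches the text."""
    return [label for label, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


@dataclass(frozen=True)
class Evidence:
    """Standardized return, mirroring {tier, description, url, path}."""
    tier: str
    description: str
    url: str
    path: str


def splice_into_mr_body(body: str, items: list[Evidence], label: str = "Screenshots") -> str:
    """Append a labeled section with one image per evidence item."""
    if not items:
        return body  # "no evidence needed" tier: leave the body untouched
    lines = [f"## {label}", ""]
    lines += [f"![{item.description}]({item.url})" for item in items]
    return body.rstrip() + "\n\n" + "\n".join(lines) + "\n"
```

A caller would run `find_secrets` on anything about to leave the machine and refuse to upload on a non-empty result, then hand the resulting `Evidence` list to propose for splicing.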
Meta-insight (ui-driver seam)
This is the second concrete consumer of the ui-driver seam (after polish; verification would be a third). That's the justification we said was missing when we deferred the seam-as-extension-point — it upgrades "ui-driver seam" from polish-private convenience to deliberate shared last-mile browser infra. Doesn't change v0 polish scope; it's a reason to keep the seam clean.
Open questions
- Standalone /ba:prove vs. /ba:propose --capture vs. both?
- Where do scenario definitions live: inline args, a small config, or reuse of existing test fixtures/mocks?
- chrome-devtools MCP vs. agent-browser as the driver (request interception for mocking is the deciding capability).
- How to host/attach screenshots: upload to a public host (ce-demo-reel style) vs. commit into the MR vs. inline?
Status
Deliberately a roadmap item, not scheduled. Gated on the ui-driver seam existing. Distinct from /ba:polish (feel) and the separately-deferred verification/gate idea.
https://claude.ai/code/session_014AfNMUnKn3oAsZhvNk5Vxa