chore(release): prepare 1.0.0#275
…76)
crystallize wrote a durable Page directly through store.put_page, embedding sess.task, sess.note, and sess.agent verbatim into the rendered markdown body. The body was never gated by propose_page + approve, so an agent calling kb.session_start(task=<payload>) and getting any one claim approved via crystallize could land arbitrary content into pages/. The page surfaces in kb.read_page, kb.list_pages, kb.context, and (once #60 is fixed) kb.search.
Restrict the summary body to fields the proposing agent cannot influence: session id (server-generated), timestamps (server clock), and the list of approved artifact ids. The agent-controlled fields remain on the Session model itself and are still queryable, but are no longer promoted into a durable Page.
Also include summary_page_id in the session.crystallize audit event's object_ids when a page is written, so vouch audit truthfully attributes the write.
Adds two regression tests:
- test_crystallize_summary_page_does_not_leak_agent_controlled_fields
- test_crystallize_audit_event_records_summary_page_id
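The strict-derivation idea can be sketched as follows. This is a minimal illustration, not the project's code: the `Session` dataclass, its field names other than `task`, `note`, and `agent`, and the `build_summary_body` helper are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Session:
    # Hypothetical stand-in for the real Session model.
    id: str                       # server-generated
    started_at: datetime          # server clock
    ended_at: datetime            # server clock
    task: str = ""                # agent-controlled: must not reach the page
    note: str = ""                # agent-controlled
    agent: str = ""               # agent-controlled
    approved_ids: list[str] = field(default_factory=list)


def build_summary_body(sess: Session) -> str:
    """Render the summary page from server-derived fields only."""
    lines = [
        f"# Session {sess.id}",
        "",
        f"- started: {sess.started_at.isoformat()}",
        f"- ended: {sess.ended_at.isoformat()}",
        "",
        "## Approved artifacts",
    ]
    lines += [f"- {artifact_id}" for artifact_id in sess.approved_ids]
    return "\n".join(lines)


sess = Session(
    id="sess-0001",
    started_at=datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc),
    ended_at=datetime(2026, 1, 1, 12, 30, tzinfo=timezone.utc),
    task="<payload>",
    note="ignore previous instructions",
    agent="mallory",
    approved_ids=["claim-1"],
)
body = build_summary_body(sess)
```

Because the renderer never reads `task`, `note`, or `agent`, nothing the agent supplies can appear in `body`, whatever those fields contain.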
…index FTS5 on update (#78)
Two compounding bugs let archived / superseded / redacted claims keep flowing back to agents through kb.context:
1. build_context_pack had no claim.status filter, so any retrieval hit was appended to the pack regardless of subsequent lifecycle mutations.
2. store.update_claim refreshed the embedding cache but never re-indexed the FTS5 row, so claims_fts.status stayed frozen at first-index time. Every lifecycle op (archive, supersede, contradict, confirm) routes through update_claim and was therefore invisible to FTS5 / semantic search.
This made ClaimStatus.{ARCHIVED, SUPERSEDED, REDACTED} purely decorative on the read side: agents kept quoting retracted knowledge as if it were live.
Fix:
- storage.update_claim now also calls index_db.index_claim under a short-lived sqlite connection (mirroring proposals.approve's first-index pattern). FTS5 errors are logged-and-skipped so a lifecycle op never fails because the embedding/FTS5 layer is misconfigured.
- context.build_context_pack resolves each claim hit, drops it if the on-disk status is in {ARCHIVED, SUPERSEDED, REDACTED}, and drops it if the claim YAML has been deleted between index time and now. CONTESTED claims keep surfacing so contradictions remain visible.
Tests:
- test_context_pack_excludes_archived_claims
- test_context_pack_excludes_superseded_claims
- test_update_claim_refreshes_fts5_status (asserts the FTS5 status column via direct SQL against state.db)
No on-disk-layout, schema, or bundle-format change.
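The read-side filter can be sketched like this. It is an illustration under assumptions: the `ACTIVE` member, the string values of the enum, and the `filter_hits` / `load_status` names are hypothetical; only the ARCHIVED, SUPERSEDED, REDACTED, and CONTESTED names come from the commit message.

```python
from enum import Enum


class ClaimStatus(str, Enum):
    # Member values are assumed; ACTIVE is a hypothetical "live" status.
    ACTIVE = "active"
    CONTESTED = "contested"
    ARCHIVED = "archived"
    SUPERSEDED = "superseded"
    REDACTED = "redacted"


HIDDEN = {ClaimStatus.ARCHIVED, ClaimStatus.SUPERSEDED, ClaimStatus.REDACTED}


def filter_hits(hit_ids, load_status):
    """Keep only hits whose current on-disk status is still servable.

    load_status(claim_id) returns the current ClaimStatus, or None when the
    claim file has been deleted since it was indexed.
    """
    kept = []
    for claim_id in hit_ids:
        status = load_status(claim_id)
        if status is None or status in HIDDEN:
            continue
        kept.append(claim_id)
    return kept


on_disk = {
    "c1": ClaimStatus.ACTIVE,
    "c2": ClaimStatus.ARCHIVED,
    "c3": ClaimStatus.CONTESTED,
    "c4": ClaimStatus.SUPERSEDED,
}
# "c5" is still in the index, but its file is gone.
visible = filter_hits(["c1", "c2", "c3", "c4", "c5"], on_disk.get)
```

Resolving the status at read time means a stale index row can at worst cost a wasted lookup; it cannot resurface a retracted claim.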
Resolves conflicts in CHANGELOG.md, tests/test_sessions.py, and auto-merges src/vouch/sessions.py with the recently-merged:
- #61 (FTS5 indexing of crystallize summary page): adds index_db.index_page(...) right after store.put_page(...) inside crystallize. Composes cleanly with this PR's strict-derivation _build_summary_body and the summary_page_id audit fix.
- #62 (test_crystallize_collects_approval_failures): preserved as a sibling test.
- #75 (bundle import sha256 verification): CHANGELOG entry only.
- #73 (bundle POSIX separators): CHANGELOG entry only.
Both regression tests added by this PR (test_crystallize_summary_page_does_not_leak_agent_controlled_fields and test_crystallize_audit_event_records_summary_page_id) still pass against the merged sessions.crystallize.
#81)
The 'claims must cite sources' guarantee (README §'Why this exists' point 3; CONTRIBUTING §'Things we won't merge') used to live only in proposals.propose_claim, so every other write path silently accepted Claim(evidence=[]) and landed an uncited claim:
- store.put_claim direct: existence-check loop iterates zero times.
- store.update_claim: writes the YAML without re-validating.
- bundle.import_apply via _validate_content: defers to Claim.model_validate, which accepted evidence=[] because the model had no min-length constraint.
Add @field_validator('evidence') on Claim, which raises ValueError when the list is empty. Closes all three bypass paths in one place.
store.update_claim additionally re-validates via Claim.model_validate(claim.model_dump()) before persisting, so in-place mutation (c.evidence = []; store.update_claim(c)) raises before the YAML hits disk; the field validator only fires at construction time, not on attribute assignment.
Four regression tests:
- test_claim_model_rejects_empty_evidence (tests/test_storage.py): Claim(evidence=[]) raises pydantic.ValidationError.
- test_put_claim_rejects_empty_evidence: store.put_claim raises; no claims/<id>.yaml is written.
- test_update_claim_rejects_empty_evidence: in-place mutation + update_claim raises; the on-disk YAML is unchanged.
- test_import_rejects_uncited_claim (tests/test_bundle.py): a schema-valid bundle whose claim YAML has evidence: [] is rejected by import_check (schema validation issue) and import_apply raises before writing.
The existing guard in proposals.propose_claim becomes a redundant user-facing error message and is left in place for the friendlier CLI/JSONL error string.
No on-disk-layout, schema, or bundle-format change; data the model never should have accepted now raises.
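Why the extra re-validation in update_claim is needed can be shown with a small stand-in. This sketch deliberately uses a plain dataclass instead of the pydantic model so it runs on the standard library alone; `validate_claim`, the dict-backed `store`, and the error message are hypothetical. The point it illustrates is the same: a check that runs at construction does not run again on attribute assignment.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    # Plain-dataclass stand-in for the pydantic Claim model.
    id: str
    text: str
    evidence: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        validate_claim(self)      # runs once, at construction time


def validate_claim(claim: "Claim") -> None:
    if not claim.evidence:
        raise ValueError(f"claim {claim.id} must cite at least one source")


def update_claim(store: dict, claim: Claim) -> None:
    """Re-validate before persisting, since assignment skips __post_init__."""
    validate_claim(claim)
    store[claim.id] = list(claim.evidence)


store: dict = {}
claim = Claim(id="c1", text="water boils at 100 C at sea level", evidence=["src-1"])
update_claim(store, claim)

try:
    Claim(id="c2", text="uncited", evidence=[])
    construction_rejected = False
except ValueError:
    construction_rejected = True

claim.evidence = []               # in-place mutation: nothing raises here
try:
    update_claim(store, claim)
    update_rejected = False
except ValueError:
    update_rejected = True
```

The stored copy of `c1` still carries its original citation after the rejected update, which mirrors the "on-disk YAML is unchanged" assertion in the regression test.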
…ing from tarball (#80)
import_check previously only verified that tar members listed in the manifest had matching hashes, but never checked the reverse: whether every manifest entry had a corresponding tar member. A bundle whose manifest.json referenced claims/c1.yaml but whose tarball contained only manifest.json would pass import_check with ok=True, and import_apply would silently write nothing: no exception, no audit event indicating data loss.
Add the missing-member pass (mirroring the existing check in export_check) so that manifest entries without a matching tar member produce a "manifest lists missing file" issue. import_apply then refuses to import because check.issues is non-empty.
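A minimal sketch of the missing-member pass, using only the standard library. The manifest shape (`{"files": {path: sha256}}`) and the `missing_members` / `make_bundle` helpers are assumptions made for illustration; the real bundle format may differ.

```python
import io
import json
import tarfile


def missing_members(bundle: bytes) -> list[str]:
    """Return one issue per manifest entry that has no matching tar member."""
    with tarfile.open(fileobj=io.BytesIO(bundle), mode="r:*") as tar:
        names = set(tar.getnames())
        manifest = json.load(tar.extractfile("manifest.json"))
    # Assumed manifest shape: {"files": {"<path>": "<sha256>", ...}}
    return [
        f"manifest lists missing file: {path}"
        for path in sorted(manifest["files"])
        if path not in names
    ]


def make_bundle(files: dict[str, bytes]) -> bytes:
    """Build an in-memory .tar.gz from a name -> content mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


manifest = json.dumps({"files": {"claims/c1.yaml": "0" * 64}}).encode()

# The manifest references claims/c1.yaml, but the tarball only holds the manifest.
issues = missing_members(make_bundle({"manifest.json": manifest}))

# The same manifest with the member present produces no issue from this pass.
ok = missing_members(
    make_bundle({"manifest.json": manifest, "claims/c1.yaml": b"id: c1\n"})
)
```

This pass only checks presence; hash verification of the members that do exist is a separate step.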
read-only `vouch diff <old> <new>` that shows what changed between two claim or two page revisions: field-level changes plus a line-diff of the long text/body. auto-detects kind, hides churning metadata.
new read-only `vouch diff <old> <new>` compares two claims or two pages by id and shows what changed: scalar/list fields as old → new, plus a line-diff of the long text/body. auto-detects kind, hides churning metadata (timestamps, approved_by), supports --json. closes a roadmap 0.1 item.
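The output shape described above (scalar fields as old → new, long text as a line diff, churning metadata hidden) can be approximated with difflib. This is a sketch over plain dicts: `diff_records`, the `CHURN` and `LONG_TEXT` sets, and the timestamp field names are assumptions, not the command's actual implementation.

```python
import difflib

CHURN = {"created_at", "updated_at", "approved_by"}   # assumed field names
LONG_TEXT = {"text", "body"}


def diff_records(old: dict, new: dict) -> list[str]:
    """Field-level changes plus a unified line diff for long text fields."""
    out: list[str] = []
    for key in sorted(set(old) | set(new)):
        if key in CHURN or old.get(key) == new.get(key):
            continue
        if key in LONG_TEXT:
            out.extend(difflib.unified_diff(
                str(old.get(key, "")).splitlines(),
                str(new.get(key, "")).splitlines(),
                fromfile=f"old/{key}", tofile=f"new/{key}", lineterm="",
            ))
        else:
            out.append(f"{key}: {old.get(key)!r} → {new.get(key)!r}")
    return out


old = {"status": "active", "text": "a\nb", "updated_at": "t1"}
new = {"status": "archived", "text": "a\nc", "updated_at": "t2"}
changes = diff_records(old, new)
```

Here `updated_at` differs between the two records but is skipped, so only the status change and the text diff are reported.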
mypy inferred old/new as Claim from the first branch, so the page branch's reassignment tripped an incompatible-assignment error in CI. annotate the union up front.
feat(diff): add `vouch diff` for claim/page revisions
…review)
The new Claim.evidence min-citation validator (#81) also fires when claims are read back from disk. A KB that has a pre-existing uncited claims/<id>.yaml from before the fix would otherwise crash vouch lint / vouch doctor with a bare pydantic.ValidationError deep in store.list_claims().
Add _load_claims_for_lint(), a per-file iteration that catches pydantic.ValidationError (and any other load error) and surfaces each bad file as a Finding with code='invalid_claim' and an explicit repair hint: 'edit the YAML to add a citation, or delete the file'.
lint() also stops calling status() to populate counts (status() calls the strict store.list_claims(), which would re-raise on the same files) and builds the counts dict inline from the safely-loaded valid claims.
Regression test in tests/test_health.py:
- test_lint_surfaces_legacy_uncited_claim_yaml_without_crashing hand-crafts a claims/legacy.yaml with evidence: [] (matches the on-disk shape an older buggy write path would have left), asserts vouch lint runs to completion, surfaces invalid_claim in findings with the repair-hint message, and that the well-formed sibling claim is still discovered.
CHANGELOG migration note expanded to describe the repair hint.
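The per-file tolerant load can be sketched as below. To stay on the standard library the sketch reads JSON files rather than YAML and represents findings as plain dicts; `load_claims_for_lint`, the finding keys, and the `counts` keys are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_claims_for_lint(claims_dir: Path):
    """Load claims one file at a time, turning load errors into findings."""
    valid, findings = [], []
    for path in sorted(claims_dir.glob("*.json")):
        try:
            claim = json.loads(path.read_text(encoding="utf-8"))
            if not claim.get("evidence"):
                raise ValueError("evidence must not be empty")
            valid.append(claim)
        except Exception as exc:  # any load error becomes a finding, not a crash
            findings.append({
                "code": "invalid_claim",
                "path": path.name,
                "message": f"{exc}; edit the file to add a citation, or delete the file",
            })
    return valid, findings


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "good.json").write_text(
        '{"id": "good", "evidence": ["src-1"]}', encoding="utf-8")
    (root / "legacy.json").write_text(
        '{"id": "legacy", "evidence": []}', encoding="utf-8")
    valid, findings = load_claims_for_lint(root)
    # Counts are built from the safely-loaded claims, not from a strict loader.
    counts = {"claims": len(valid), "invalid_claims": len(findings)}
```

One bad file yields one finding, and the well-formed sibling is still loaded and counted.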
The new inline-built counts in lint() (from 5c881de) is a literal dict with mixed value types (str/int/bool), which mypy correctly inferred as dict[str, object] and rejected against the HealthReport.counts: dict[str, int] annotation. The original status() returned the same mixed dict via an untyped 'dict' return, which masked the mismatch — the narrow type was effectively never checked at the call site. Widen counts to dict[str, Any] to match runtime reality. Also tighten status()'s return annotation from 'dict' to 'dict[str, Any]' for consistency. No caller does arithmetic on counts values; they just echo or pass through, so the widening is risk-free.
feat: add guided proposal review CLI
Feat/pending json
Feat/deterministic vouch sync
…age-bypass fix(sessions): close crystallize review-gate bypass via summary page
fix(models): require Claim.evidence to be non-empty at the model layer
the plan assumed a KBStore.list_pending() that does not exist; the codebase queries pending via list_proposals(ProposalStatus.PENDING) (server.py, cli.py). also fix payload dict-access and note the ascii-only claim-text constraint forced by storage's latin-1 yaml write.
SubprocessRunner lives in auto_pr (not dual_solve); add choice: str|None annotation mypy requires; ascii -- in the proposed-id echo to dodge the locale latin-1 encode issue.
…rrupt the kb
a github issue title or engine summary with a non-latin-1 char (em dash, smart quote) flowed into propose_claim's yaml write; storage encodes with the locale default (latin-1 here), so the write raised UnicodeEncodeError mid-stream and left a zero-byte proposal that poisoned list_proposals for the whole kb.
record_to_kb now coerces claim text to ascii at the boundary (the verbatim original stays in the Source content, written as bytes), the cli wraps finalize so any residual write error is a clean ClickException, and a regression test exercises a non-ascii title.
also fixes the ground_prompt docstring (M3.2).
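One way to coerce text to ascii at the boundary is sketched below. The replacement table and the `to_ascii` name are assumptions; the commit does not say how record_to_kb performs the coercion, only that it does.

```python
import unicodedata

# Assumed replacement table for punctuation commonly seen in issue titles.
_PUNCT = {
    "\u2014": "--",   # em dash
    "\u2013": "-",    # en dash
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2026": "...",  # horizontal ellipsis
}


def to_ascii(text: str) -> str:
    """Coerce text to ASCII so a latin-1 (or ASCII) write cannot fail."""
    for src, dst in _PUNCT.items():
        text = text.replace(src, dst)
    # Decompose accented letters, then drop whatever is still non-ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


title = "fix \u201cfoo\u201d \u2014 caf\u00e9"
safe = to_ascii(title)
encoded = safe.encode("latin-1")   # would raise UnicodeEncodeError on the raw title
```

The coercion is lossy by design, which is why keeping the verbatim original elsewhere (as the commit does in the Source content) matters.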
feat(dual-solve): run claude + codex on an issue, operator picks the winner
each engine run is multi-minute and fully captured, so the operator watched a blank terminal with no signal of which engine was active. thread an optional on_progress callback through prepare() and wire the cli to echo it to stderr: a line per phase -- fetch, ground, and each engine's run with elapsed time and diff size. defaults to none so existing behaviour and tests are unchanged, and stderr keeps stdout and --json output clean.
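A sketch of the optional-callback pattern follows. The `prepare` body here is a hypothetical stand-in (the real one fetches the issue and runs the engines), and the message formats and example URL are invented for illustration; only the `on_progress` parameter name, its None default, and the stderr echo come from the commit message.

```python
import sys
import time
from typing import Callable, Optional

Progress = Optional[Callable[[str], None]]


def prepare(issue_url: str, engines=("claude", "codex"), on_progress: Progress = None):
    """Stand-in for prepare(): reports each phase only if a callback is given."""
    def emit(msg: str) -> None:
        if on_progress is not None:
            on_progress(msg)

    emit(f"fetch: {issue_url}")
    emit("ground: building prompt")
    results = {}
    for engine in engines:
        started = time.monotonic()
        diff = f"(diff from {engine})"      # placeholder for the real engine run
        elapsed = time.monotonic() - started
        emit(f"{engine}: done in {elapsed:.1f}s, diff {len(diff)} bytes")
        results[engine] = diff
    return results


seen: list[str] = []


def echo(msg: str) -> None:
    # CLI wiring: progress goes to stderr so stdout and --json output stay clean.
    seen.append(msg)
    print(msg, file=sys.stderr)


results = prepare("https://github.com/example/repo/issues/1", on_progress=echo)
silent = prepare("https://github.com/example/repo/issues/1")   # default: no callback
```

Callers that pass nothing get the same return value and no output, which is what keeps existing behaviour and tests unchanged.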
a single-page app, shipped inside the review-ui, that takes a github issue link, runs dual-solve against the repo the server lives in, streams progress over the existing websocket, shows both engines' diffs side by side, and lets a human pick the winner -- keeping the branch and proposing the rationale into the kb through the same review gate. captures the decisions made during brainstorming: shipped feature in src/vouch/web, fixed target repo, buildless vue 3, full pick semantics, and an executing http surface that stays gated (off by default, edit-only, bearer-token guarded) and requires a vep before merge.
a bite-sized, tdd task plan turning the design spec into the shipped feature: vep, the gate + plumbing, the run/job/progress-bridge backend, the choose endpoint that finalizes through the review gate, the buildless vue spa, and the changelog. carries the dual-solve constraints forward (import-as-used, ascii-coerced kb text, .venv tooling, edit-only over http).
the vep documents the new HTTP routes (/dual-solve, /dual-solve/run,
/dual-solve/job/{id}, /dual-solve/choose), the --allow-dual-solve gate,
the edit-only autonomy constraint, and the security argument that the
review-gate invariant is preserved (web finalize only proposes, never
auto-approves).
mount the /dual-solve shell route only when the server is started with --allow-dual-solve. when the flag is absent, register() returns immediately and the route does not exist (404). this is the security gate: the dual-solve runner spawns external processes (claude+codex) so it must be an explicit opt-in.
changes:
- src/vouch/web/dual_solve_api.py: new register() function; calls ds.repo_root() at app-build time to fail fast if not in a git repo.
- src/vouch/web/templates/dual_solve.html: spa shell template with the #dual-solve-app vue mount point.
- src/vouch/web/server.py: build_app gains allow_dual_solve=False; _tmpl injects dual_solve_enabled; _register_dual_solve called before return app.
- src/vouch/web/__init__.py: create_app gains allow_dual_solve=False and passes it to build_app.
- src/vouch/web/templates/base.html: conditional nav link for dual-solve.
- src/vouch/cli.py: review-ui gains --allow-dual-solve flag; passed to create_app.
- tests/test_web_dual_solve.py: two tests: enabled renders 200 with the vue mount point; disabled returns 404.
add DualSolveJob dataclass, _serialize helper, POST /dual-solve/run (201),
GET /dual-solve/job/{id}, and the sync→async progress bridge.
the job is created synchronously in the route handler before the
asyncio.create_task fires, so the single-flight 409 check is reliable.
the on_progress callback runs on the worker thread and bridges to the hub
via run_coroutine_threadsafe, capturing the event loop in the handler (not
the worker). autonomy is hard-forced to "edit" regardless of what the caller
sends.
deviation from spec: the single-flight guard uses `status not in
("done", "error")` rather than `status in ("running", "finalizing")`.
this is required because in the starlette testclient's per-request portal
environment, the background task (run_in_threadpool → anyio thread) completes
during portal teardown before the second post arrives, leaving status as
"ready". the broader guard matches the semantic intent (reject while a job is
still active/awaiting-decision).
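The sync-to-async bridge can be sketched with the standard library alone. This is an illustration, not the handler itself: `Hub`, `blocking_job`, and `handle_run` are hypothetical, and `asyncio.to_thread` stands in for the `run_in_threadpool` call named above.

```python
import asyncio


class Hub:
    """Hypothetical stand-in for the websocket hub."""

    def __init__(self) -> None:
        self.frames: list[str] = []

    async def broadcast(self, frame: str) -> None:
        self.frames.append(frame)


def blocking_job(on_progress) -> str:
    # Runs on a worker thread; it must not touch the event loop directly.
    for phase in ("fetch", "ground", "run"):
        on_progress(phase)
    return "ready"


async def handle_run(hub: Hub) -> str:
    loop = asyncio.get_running_loop()   # captured in the handler, not the worker

    def on_progress(msg: str) -> None:
        # Schedule the coroutine on the captured loop from the worker thread,
        # then wait for the broadcast to complete before continuing.
        asyncio.run_coroutine_threadsafe(hub.broadcast(msg), loop).result()

    return await asyncio.to_thread(blocking_job, on_progress)


hub = Hub()
status = asyncio.run(handle_run(hub))
```

Capturing the loop in the handler matters because `asyncio.get_running_loop()` raises RuntimeError when called from a thread that has no running loop, which is exactly the worker's situation.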
the previous guard rejected new runs whenever a prior job was in any
non-terminal state, including "ready". a ready job the operator abandons
(closes the tab, never chooses) would block every future run forever and
make the stale-worktree cleanup path unreachable — a permanent leak.
corrected to 409 only on ("running", "finalizing"). ready/done/error jobs
are replaced by a new run, which first runs ds.cleanup on their candidates.
test_run_is_single_flight is rewritten to set the precondition directly
(no timing dependency on when the background task completes). a new test,
test_run_replaces_abandoned_ready_job_and_cleans_up, locks in the
replace-and-cleanup behavior for the ready case.
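The corrected guard reduces to a small state check, sketched here without any web framework. `Runner`, `Job`, and the returned status-code tuple are hypothetical; `cleanup` stands in for ds.cleanup and only records what it was asked to remove.

```python
from dataclasses import dataclass, field
from typing import Optional

BUSY = ("running", "finalizing")


@dataclass
class Job:
    id: str
    status: str = "running"
    candidates: list[str] = field(default_factory=list)


class Runner:
    def __init__(self) -> None:
        self.active: Optional[Job] = None
        self.cleaned: list[str] = []
        self._next = 0

    def cleanup(self, candidates: list[str]) -> None:
        # Stand-in for ds.cleanup: the real call removes stale worktrees.
        self.cleaned.extend(candidates)

    def start(self) -> tuple[int, Optional[Job]]:
        """Return (http_status, job): 409 while busy, else replace and run."""
        if self.active is not None and self.active.status in BUSY:
            return 409, None
        if self.active is not None:
            # ready/done/error jobs are replaced; clean up their candidates first.
            self.cleanup(self.active.candidates)
        self._next += 1
        self.active = Job(id=f"job-{self._next}")
        return 201, self.active


runner = Runner()
first_code, first = runner.start()
busy_code, _ = runner.start()            # first job is still "running"
first.status = "ready"                   # operator abandons the job
first.candidates = ["wt-claude", "wt-codex"]
second_code, second = runner.start()
```

Setting `first.status` directly mirrors how the rewritten test plants its precondition instead of depending on background-task timing.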
adds POST /dual-solve/choose which takes {job_id, winner, reason},
looks up the candidate matching the winner engine on the in-memory job,
calls ds.finalize with record=True and proposed_by=reviewer() so the
rationale lands in the kb review queue, then marks the job done and
broadcasts a "done" ws frame.
winner=null skips finalize entirely and returns empty proposed_ids.
guards: 404 when job_id doesn't match the active job, 409 when the
job is not in "ready" status, 500 on ValueError/RuntimeError from
finalize. the review gate is preserved — finalize only ever calls
proposals.propose_claim; no approve/durable-write is added here.
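The guard order in the choose handler can be sketched as a pure function. Several details are assumptions: the dict-shaped job, the 200 success code, the `finalize` callable's signature, and marking the job done in the winner=null case are not stated in the commit message.

```python
from typing import Callable, Optional


def choose(
    active_job: Optional[dict],
    job_id: str,
    winner: Optional[str],
    finalize: Callable[[str], list],
) -> tuple[int, dict]:
    """Return (http_status, payload) for a choose request."""
    if active_job is None or active_job["id"] != job_id:
        return 404, {"detail": "unknown job"}
    if active_job["status"] != "ready":
        return 409, {"detail": "job is not ready"}
    proposed_ids: list = []
    if winner is not None:
        try:
            # finalize only proposes; approval stays behind the review gate.
            proposed_ids = finalize(active_job["candidates"][winner])
        except (ValueError, RuntimeError) as exc:
            return 500, {"detail": str(exc)}
    active_job["status"] = "done"
    return 200, {"proposed_ids": proposed_ids}


def fake_finalize(candidate: str) -> list:
    return [f"prop-for-{candidate}"]


job = {"id": "j1", "status": "ready",
       "candidates": {"claude": "branch-a", "codex": "branch-b"}}
not_found = choose(job, "bogus", "claude", fake_finalize)
picked = choose(job, "j1", "claude", fake_finalize)
conflict = choose(job, "j1", "codex", fake_finalize)    # job is now "done"
neither = choose({"id": "j2", "status": "ready", "candidates": {}},
                 "j2", None, fake_finalize)
```

Checking the job id before the status keeps the 404 and 409 cases distinct, so a stale client and a too-early client get different answers.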
…s, constrain winner
rename test_choose_before_ready_is_conflict to test_choose_unknown_job_is_not_found to match what it actually tests (404 on bogus job_id, not the 409 guard). add test_choose_when_not_ready_is_conflict that exercises the real 409 path by pre-planting a job in status="finalizing" and asserting the guard fires.
add an error broadcast in the choose handler's except block so clients see the error frame before the 500 is raised, matching _run_job's error path.
tighten _ChooseReq.winner to Literal["claude","codex"]|None so an unrecognised engine name is rejected with 422 rather than silently treated as "keep neither".
also enter testclient as a context manager in _client() so the blocking portal stays alive across all requests within a test; without this asyncio.create_task background jobs are cancelled when the per-request ephemeral portal closes, which caused test_choose_winner_finalizes_and_returns_ids and test_choose_neither to fail intermittently depending on test ordering.
vendor vue 3.4.38 (esm-browser.prod build, 150 kb) under
src/vouch/web/static/vendor/ with sha256 recorded in VENDOR.md.
add dual_solve.js — the full vue 3 spa: runform, websocket progress
log, side-by-side diff panes (with a minimal unified-diff parser), and
a choice bar that posts to /dual-solve/choose.
add dual_solve.css — two-column pane grid with diff line colouring.
extend base.html with a {% block head %} slot so per-page stylesheets
can be injected without touching the shared layout. dual_solve.html
uses it to pull in dual_solve.css.
add test_spa_assets_are_served to tests/test_web_dual_solve.py:
asserts the js, css, and vendor vue files all return 200, and that
the rendered page references both dual_solve.js and createApp.
no npm, no bundler, no build step — the esm-browser build includes the
template compiler so template: strings work at runtime.
note the new `review-ui --allow-dual-solve` browser surface under [unreleased]/added, pointing at vep-0006. the pick keeps the branch and proposes through the existing review gate; nothing auto-approves.
…recondition from the final whole-branch review: assert the executing run/choose routes 404 when --allow-dual-solve is off (not just the page), so a future refactor cannot silently expose them; and document why a prepare failure leaks no worktrees today.
feat(dual-solve): run agents in docker sandbox
extracts the inline styles and script from the original single-file landing page into shared base.css / app.css / app.js, and adds three sibling pages (gittensor, how-it-works, reference) that reuse the same nav and colophon. index.html shrinks to the masthead and plates; the rest of the chrome and the scroll/draw behaviour now live in the shared assets, so the pages stay in step instead of each carrying their own copy.
feat(web): split the landing page into a shared-asset multi-page site
bump the version to 1.0.0 across pyproject.toml, src/vouch/__init__.py, and openclaw.plugin.json, and move the [Unreleased] changelog entries into a dated [1.0.0] section. also restore the pypi packaging that the test branch had regressed: the distribution name is set back to vouch-kb (the bare name "vouch" is owned by an unrelated project on pypi, and trusted publishing is configured for vouch-kb), and the release.yml trusted-publishing workflow is re-added from main. the reviewGatedKB contract is bumped to vouch-kb-1.0.0 to match. verified green before tagging: pytest (utf-8), mypy src, ruff, and python -m build producing vouch_kb-1.0.0.
Review skipped (CodeRabbit): too many files. This PR contains 230 files, which is 80 over the limit of 150.
summary
prepares and integrates the 1.0.0 release. brings the `test` integration line (everything since 0.1.0: dual-solve, auto-pr, the review-ui SPA, typed page kinds, and more) into `main`, and bumps to 1.0.0.
release prep
- version bumped to `1.0.0` in `pyproject.toml`, `src/vouch/__init__.py`, `openclaw.plugin.json`
- changelog: `[Unreleased]` → `[1.0.0]`, dated 2026-06-26
- restores the pypi packaging that the `test` branch had regressed: distribution name back to `vouch-kb` (the bare name `vouch` is owned by an unrelated project on pypi; trusted publishing is configured for `vouch-kb`), and the `release.yml` trusted-publishing workflow re-added from `main`
- `reviewGatedKB` contract → `vouch-kb-1.0.0`
verification (before tagging)
- `pytest` (utf-8, ignoring embeddings) green
- `mypy src` clean · `ruff check src tests` clean
- `python -m build` produces `vouch_kb-1.0.0` (sdist + wheel)
after merge
tag `v1.0.0` on `main` → `release.yml` publishes `vouch-kb 1.0.0` to pypi via trusted publishing.
note
the `test` branch independently regressed the packaging (name reverted to `vouch`, `release.yml` dropped). this release fixes it on the way to `main`; the `test` branch itself should be re-synced separately so it doesn't reintroduce the regression.