feat: knowledge base doc-freshness layer (drift detection + refresh) by efenocchi · Pull Request #285 · activeloopai/hivemind

efenocchi · 2026-06-23T16:50:06Z

What

Adds a Knowledge Base / per-file documentation layer that stays fresh on code deltas — the third layer alongside session summaries and the AST code graph. For the Krisp use case: internal docs that go stale because code changes many times a day.

Honest framing (per the team's own oracle eval): this is justified as doc freshness / onboarding trust / token savings, NOT QA-answer accuracy.

How it works

A doc anchors each described symbol to a content_hash of its source slice. On a commit, the graph diff + blast-radius selects only the docs whose anchored code (or its callers) changed; a bounded LLM edit regenerates them through an objective gate; the new version is written INSERT-only (version-bumped, immutable created_at).

code change -> graph diff + blast-radius -> impacted docs
            -> host LLM bounded rewrite -> gate -> setDoc (version+1)

Steps in this PR

1 - schema + config: hivemind_docs table (INSERT-only, version-bumped) + docsTableName.
2 - store: src/docs insert/edit/list/get + idempotent setDoc (no history fork).
3 - drift detector: anchors.ts (source-slice hash) + impact.ts (direct hash drift + blast-radius widening over the graph).
5 - refresh engine: gate.ts (objective invariants: bounded edit, anchors exist, slow-tier protected) + refresh.ts (re-anchor -> generate -> gate -> write) + refresh-llm.ts (host claude -p, reuses the wiki-worker seam) + hivemind docs CLI (set/show/list/archive/refresh, --anchor, --dry-run).
8 - trigger: opt-in HIVEMIND_DOCS_AUTO_REFRESH=1 fires a detached refresh at the end of graph build.

Testing

~105 unit tests; new docs files >=85% line coverage (gate/write/read/refresh-llm 100%).
End-to-end through the real hivemind binary against a test Deeplake table: real code change -> drift detected -> claude rewrote the doc to match -> gate passed -> v2 written, created_at preserved.

Not in scope

No VFS routing for the docs memory subtree yet (CLI/worker only); no spec-layer correlation; per-agent doc-worker variants (codex/cursor/hermes/pi) follow the existing wiki-worker fork pattern.

Summary by CodeRabbit

Release Notes

New Features
- Added hivemind docs CLI command with subcommands: set, show, list, archive, and refresh
- Documentation can be anchored to code symbols for automatic staleness tracking
- Automatic documentation refresh when code changes (opt-in via HIVEMIND_DOCS_AUTO_REFRESH)
- LLM-assisted documentation updates with change-limit validation
Chores
- Updated configuration schema to support documentation table settings

github-actions · 2026-06-23T16:51:35Z

Coverage Report

Scope: files changed in this PR. Enforced threshold: 90% per metric (per file via vitest.config.ts).

Status	Category	Percentage	Covered / Total
🔴	Lines	88.26% (🎯 90%)	1263 / 1431
🔴	Statements	88.01% (🎯 90%)	1432 / 1627
🟢	Functions	98.24% (🎯 90%)	167 / 170
🔴	Branches	80.70% (🎯 90%)	786 / 974

File Coverage — 17 files changed

File	Stmts	Branches	Functions	Lines
`src/cli/index.ts`	🔴 86.1%	🔴 84.6%	🟢 100.0%	🔴 83.7%
`src/commands/docs.ts`	🔴 82.8%	🔴 71.9%	🟢 93.8%	🔴 85.1%
`src/commands/graph.ts`	🔴 74.3%	🔴 62.9%	🟢 95.2%	🔴 73.9%
`src/config.ts`	🟢 100.0%	🟢 100.0%	🟢 100.0%	🟢 100.0%
`src/deeplake-api.ts`	🟢 99.0%	🔴 88.6%	🟢 100.0%	🟢 99.6%
`src/deeplake-schema.ts`	🟢 96.0%	🔴 84.8%	🟢 100.0%	🟢 95.2%
`src/docs/anchors.ts`	🟢 94.3%	🔴 89.3%	🟢 100.0%	🟢 100.0%
`src/docs/auto-refresh-trigger.ts`	🟢 100.0%	🔴 81.8%	🟢 100.0%	🟢 100.0%
`src/docs/gate.ts`	🟢 100.0%	🟢 100.0%	🟢 100.0%	🟢 100.0%
`src/docs/impact.ts`	🔴 87.3%	🔴 81.8%	🟢 100.0%	🟢 91.1%
`src/docs/index.ts`	🟢 100.0%	🟢 100.0%	🟢 100.0%	🟢 100.0%
`src/docs/read.ts`	🟢 100.0%	🔴 86.2%	🟢 100.0%	🟢 100.0%
`src/docs/refresh-llm.ts`	🟢 100.0%	🔴 85.7%	🟢 100.0%	🟢 100.0%
`src/docs/refresh.ts`	🟢 94.0%	🔴 75.0%	🟢 100.0%	🟢 100.0%
`src/docs/write.ts`	🟢 100.0%	🟢 100.0%	🟢 100.0%	🟢 100.0%
`src/graph/load-current.ts`	🟢 94.1%	🟢 100.0%	🟢 100.0%	🟢 92.8%
`src/graph/render/impact.ts`	🟢 92.2%	🔴 70.3%	🔴 87.5%	🟢 98.7%

_{Generated for commit 9fc89ec.}

coderabbitai · 2026-06-23T16:54:38Z

📝 Walkthrough

Walkthrough

Adds a complete hivemind docs subsystem: a versioned INSERT-only Deeplake docs table (schema, config, read/write helpers), source-anchor tracking via SHA-256 hashing of symbol line slices, drift/impact detection using direct hash comparison and reverse-BFS blast-radius widening, an LLM refresh pipeline with an LCS-based edit gate, a maybeSpawnDocsRefresh post-build trigger, and a five-subcommand CLI (set, show, list, archive, refresh).

Changes

hivemind docs subsystem

Layer / File(s)	Summary
Schema, config, and DB provisioning `src/deeplake-schema.ts`, `src/config.ts`, `src/deeplake-api.ts`, `tests/shared/deeplake-api.test.ts`, `tests/claude-code/flush-memory.test.ts`, `tests/claude-code/skillify-auto-pull.test.ts`, `tests/claude-code/spawn-wiki-worker.test.ts`, `tests/shared/graph/deeplake-p.test.ts`	Adds `DOCS_COLUMNS` schema constant with module-load validation, `docsTableName` to `Config`/`loadConfig` (env `HIVEMIND_DOCS_TABLE`), and `DeeplakeApi.ensureDocsTable` that creates/heals the docs table and ensures the `(doc_id, version)` index. Updates test fixtures with `docsTableName`.
Docs read/write data layer and barrel `src/docs/read.ts`, `src/docs/write.ts`, `src/docs/index.ts`, `tests/shared/docs.test.ts`	Adds typed INSERT-only versioned write helpers (`insertDoc`, `editDoc`, `setDoc`, `archiveDoc`, `appendVersion`) and read helpers (`listDocs`, `getDocLatest`, `parseAnchors`, `normalize`). Barrel `index.ts` re-exports all public API. Tests cover SQL shape, escaping, version immutability, deduplication, and `parseAnchors` edge cases.
Anchor tracking and graph snapshot loading `src/docs/anchors.ts`, `src/graph/load-current.ts`, `src/graph/render/impact.ts`, `tests/shared/docs-impact.test.ts`, `tests/shared/graph/load-current.test.ts`	Adds `parseSourceLocation`, line-slice reading, SHA-256 hashing, `buildAnchor`, and `anchorStatus`. Adds `loadCurrentSnapshot` (null-safe local snapshot loader). Refactors `graph/render/impact.ts` into a shared `reverseBfs` helper exposing new `impactedNodes` export. Tests cover all anchor status states and snapshot loading scenarios.
Doc drift detection and impact computation `src/docs/impact.ts`, `tests/shared/docs-impact.test.ts`	Adds `computeStaleDocs` (direct anchor hash comparison), `widenByBlastRadius` (caller-closure relational pass via `impactedNodes`), and `computeImpactedDocs` orchestrator unioning both passes over an optional `SnapshotDiff`. Tests cover all staleness paths and relational widening.
Edit gate `src/docs/gate.ts`, `tests/shared/docs-refresh.test.ts`	Adds `countChangedLines` (LCS-based rolling DP), `gateDocEdit` that rejects empty/overlong content, slow-tier docs, missing-symbol anchors, and over-budget edits. Tests cover all rejection paths and the acceptance path.
LLM refresh orchestration, prompt building, and auto-refresh trigger `src/docs/refresh.ts`, `src/docs/refresh-llm.ts`, `src/docs/auto-refresh-trigger.ts`, `tests/shared/docs-refresh.test.ts`, `tests/shared/docs-refresh-llm.test.ts`, `tests/shared/docs-auto-refresh-trigger.test.ts`	Adds `refreshDocs` orchestrator (reanchor, gather changed symbols, generate, gate, persist), `buildRefreshPrompt`, `makeClaudeGenerate` (spawns `claude` CLI via `execFileSync`), and `maybeSpawnDocsRefresh` (opt-in detached trigger gated by `HIVEMIND_DOCS_AUTO_REFRESH=1`). Tests cover prompt content, LLM wiring, all orchestrator outcome paths, and spawn gating.
CLI command routing and graph build hook `src/cli/index.ts`, `src/commands/docs.ts`, `src/commands/graph.ts`, `tests/claude-code/cli-docs.test.ts`	Adds `runDocsCommand` with `set`/`show`/`list`/`archive`/`refresh` subcommands (flag parsing, content resolution, anchor building from `--anchor`). Wires it into `cli/index.ts`. Adds `maybeSpawnDocsRefresh` as Step 8 in graph build after snapshot write. Tests cover dispatch, all subcommands, SQL patterns, temp-file integration, and gate rejection.

Sequence Diagrams

sequenceDiagram
  actor User
  participant CLI as hivemind docs refresh
  participant loadCurrentSnapshot
  participant listDocs
  participant computeImpactedDocs
  participant ensureDocsTable
  participant refreshDocs
  participant makeClaudeGenerate as Claude CLI
  participant gateDocEdit
  participant setDoc as setDoc (Deeplake)

  User->>CLI: docs refresh [--dry-run]
  CLI->>loadCurrentSnapshot: load local graph snapshot
  CLI->>listDocs: fetch active docs
  CLI->>computeImpactedDocs: detect stale docs via anchors + blast radius
  alt --dry-run
    CLI-->>User: print impacted docs, no writes
  else live run
    CLI->>ensureDocsTable: create/heal docs table
    CLI->>refreshDocs: iterate impacted docs
    refreshDocs->>makeClaudeGenerate: buildRefreshPrompt + execFileSync(claude)
    makeClaudeGenerate-->>refreshDocs: proposed markdown
    refreshDocs->>gateDocEdit: validate (tier, size, anchors)
    alt gate pass
      refreshDocs->>setDoc: persist new version
      setDoc-->>refreshDocs: WriteResult
    else gate fail
      refreshDocs-->>refreshDocs: record rejected
    end
    refreshDocs-->>CLI: RefreshReport
    CLI-->>User: print refreshed/rejected/skipped counts
  end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

activeloopai/hivemind#192: Modifies src/commands/graph.ts around the snapshot write and cloud push flow — the exact location where this PR inserts the maybeSpawnDocsRefresh post-build hook.
activeloopai/hivemind#194: Modifies src/cli/index.ts main() to add a new top-level subcommand branch, using the same dispatch pattern this PR replicates for docs/doc.

Poem

🐇 Hop hop, the docs are alive!
Each symbol hashed, each anchor set just right,
A slow tier? Gate says: not tonight!
The Claude CLI spawns in the night,
✨ Versions stack — insert, never UPDATE —
And the rabbit refreshes drift at dawn's first light! 🌅

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 61.76% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main feature: a documentation layer that maintains freshness through drift detection and refresh mechanisms, which aligns perfectly with the changeset's primary objectives.
Description check	✅ Passed	The PR description comprehensively covers the feature objectives, implementation strategy, architecture, testing approach, and scoping boundaries, fulfilling the template's requirements for summary and test plan validation.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/knowledge-base-docs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

…g parsing

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (4)

tests/shared/docs-impact.test.ts (1)

133-138: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Assert the full changed payload, not just state.

This test currently verifies only the discriminator and misses regressions in from/to values.

✅ Proposed assertion tightening

   it("changed when the code differs from the stored hash", () => {
     const n = node("a.ts:foo:function", "a.ts", "L1-L2");
     const stale: DocAnchor = { symbol_id: n.id, content_hash: "deadbeef" };
     const st = anchorStatus(stale, snap([n]), dir);
-    expect(st.state).toBe("changed");
+    expect(st).toEqual({
+      state: "changed",
+      from: "deadbeef",
+      to: sha("function foo() {\n  return 1;"),
+    });
   });

As per path instructions, tests/**: Prefer asserting on specific values (paths, messages) over generic substrings.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/shared/docs-impact.test.ts` around lines 133 - 138, The test for the
"changed when the code differs from the stored hash" case is only asserting on
the state property of the anchorStatus result, but it should verify the complete
changed payload including the from and to values. Replace the single assertion
on st.state with a comprehensive assertion that checks the full object structure
returned by anchorStatus to catch potential regressions in the from and to
fields.

Source: Path instructions

tests/shared/graph/load-current.test.ts (1)

54-58: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Make the fallback test assert concrete snapshot content.

not.toBeNull() is broad here; assert the expected nodes/links payload directly.

✅ Proposed assertion tightening

   it("falls back to snapshot_sha256 when there is no commit context", () => {
     writeLastBuild(cwd, { ts: 1, commit_sha: null, snapshot_sha256: "feedface" });
     writeSnapshotFile(cwd, "feedface", JSON.stringify({ nodes: [], links: [] }));
-    expect(loadCurrentSnapshot(cwd)).not.toBeNull();
+    expect(loadCurrentSnapshot(cwd)).toEqual({ nodes: [], links: [] });
   });

As per path instructions, tests/**: Prefer asserting on specific values (paths, messages) over generic substrings.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/shared/graph/load-current.test.ts` around lines 54 - 58, The test
assertion in the "falls back to snapshot_sha256 when there is no commit context"
test is too broad. Instead of using
expect(loadCurrentSnapshot(cwd)).not.toBeNull(), replace it with a specific
assertion that validates the returned snapshot object contains the expected
concrete values for the nodes and links properties that were written to the
snapshot file using writeSnapshotFile. This ensures the function correctly loads
and returns the actual snapshot payload rather than just checking it exists.

Source: Path instructions

tests/claude-code/cli-docs.test.ts (1)

87-223: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Tighten assertions to exact expected messages/values instead of generic substrings.

A large part of this suite validates only partial output via regex substring matches. Switching key checks to exact strings (or structured fields) will make these CLI contract tests significantly more robust.

As per path instructions, tests/**: Prefer asserting on specific values (paths, messages) over generic substrings.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/claude-code/cli-docs.test.ts` around lines 87 - 223, Replace generic
regex substring matching assertions with exact string or structured value checks
throughout the test suite. Instead of using toMatch() with broad regex patterns
like /hivemind docs/ or /Not logged in/, use toContain() for exact string
matches or direct equality checks for specific values. Apply this to key test
cases such as "prints usage with no subcommand", "exits 2 when not logged in",
"creates v1 via the real store", "prints (no doc) when nothing matches", "lists
rows", "archives an existing doc", and other test assertions that currently rely
on partial pattern matching to make the CLI contract tests more precise and
robust.
Source: Path instructions

tests/shared/docs.test.ts (1)

129-159: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use exact error-message assertions for failure paths.

Several rejection tests currently assert generic substrings (or just toThrow()), which weakens contract checks. Prefer exact message assertions for these cases.

✅ Example tightening

-    ).rejects.toThrow(/must not be empty/);
+    ).rejects.toThrow("Doc content must not be empty");
@@
-    ).rejects.toThrow(/doc_id must not be empty/);
+    ).rejects.toThrow("Doc doc_id must not be empty");
@@
-    ).rejects.toThrow(/Doc not found: missing.ts/);
+    ).rejects.toThrow("Doc not found: missing.ts");
@@
-    await expect(
-      insertDoc(query, `x"; DROP TABLE y; --`, { doc_id: "a.ts", path: "/p", content: "x" }),
-    ).rejects.toThrow();
+    await expect(
+      insertDoc(query, `x"; DROP TABLE y; --`, { doc_id: "a.ts", path: "/p", content: "x" }),
+    ).rejects.toThrow(`Invalid SQL identifier: "x\\"; DROP TABLE y; --"`);

As per path instructions, tests/**: “Prefer asserting on specific values (paths, messages) over generic substrings.”

Also applies to: 212-227, 290-293

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/shared/docs.test.ts` around lines 129 - 159, The rejection tests in
this section use generic error assertions that are too permissive. Replace the
regex substring matches in the tests "rejects empty content", "rejects empty
doc_id", and "rejects content longer than" with exact error message strings
instead of regex patterns like /must not be empty/ and /exceeds 50000 chars/.
For the "rejects SQL-identifier injection in the table name" test, replace the
generic toThrow() with a specific error message assertion to ensure the exact
failure mode is being tested. This tightens the contract checks to verify
precise error messages rather than loose substring matches.

Source: Path instructions

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/commands/docs.ts`:
- Around line 195-196: The resolveContent() function can throw exceptions when
provided with invalid file paths or unreadable stdin, but it is currently
executed before the try/catch block at line 195, allowing uncaught exceptions to
terminate the command abruptly. Move the resolveContent() function call (and any
dependent code that uses its result) inside the existing try/catch error handler
to ensure failures are caught and handled as controlled CLI errors instead of
uncaught exceptions.

In `@src/docs/gate.ts`:
- Around line 79-87: The countChangedLines function call on line 80 performs an
expensive O(n*m) LCS diff operation before validating the content size
constraints, which means very large inputs consume significant CPU even though
they will be rejected by the GATE_MAX_CONTENT_LENGTH check. Move the content
validation checks (the empty content check and the GATE_MAX_CONTENT_LENGTH
check) to execute before the countChangedLines call so that oversized proposals
are rejected before the quadratic diff algorithm runs. This applies to the
countChangedLines call and the subsequent size validation checks in the reasons
array population section.

In `@src/docs/refresh.ts`:
- Around line 147-153: The code invokes args.generate() without first checking
if the document should be rejected as a slow-tier doc via gateDocEdit, which
unnecessarily leaks protected content to the LLM and incurs avoidable costs.
Move the gateDocEdit check before the args.generate() call so that slow-tier
documents are rejected without ever being sent to the generate function,
preventing unnecessary LLM invocations and content exposure.

In `@src/docs/write.ts`:
- Around line 56-57: The SetDocInput.project field is currently being ignored
during version bumps when a document already exists. In the setDoc function, the
project value from SetDocInput is dropped and appendVersion always uses the
previous project value instead of the new one provided. To fix this, propagate
the project value from SetDocInput through the setDoc function and pass it as a
parameter to appendVersion so it uses the new project value (when provided)
instead of always defaulting to previous.project. This will enable proper
project reassignment and prevent stale metadata from affecting listDocs
filtering.

In `@src/graph/render/impact.ts`:
- Around line 24-27: The reverse traversal logic for collecting impacted nodes
is currently processing all relations but should be restricted to dependency
relations only as documented in the comments. Update the traversal in the
impactedNodes collection section (around lines 41-45) to filter and process only
the dependency-related edges: calls, imports, extends, implements, and
method_of. Remove or skip any other relation types that are not part of the
dependency graph to prevent over-expansion of the closure and avoid marking
unrelated documents as impacted.

In `@tests/shared/docs-refresh.test.ts`:
- Line 81: The test assertions in this file use `.join().toMatch()` with regex
patterns to broadly match substring content in the reasons array, which can hide
incorrect reason values. Replace these broad substring checks with direct
assertions on the specific reason values. Instead of joining and using regex
matching on r.reasons, assert the exact expected reason value directly using
methods like toContain() or by checking specific array indices with toBe().
Apply this fix to all occurrences at lines 81, 87, 92, 97, 103, 180, and 214.

---

Nitpick comments:
In `@tests/claude-code/cli-docs.test.ts`:
- Around line 87-223: Replace generic regex substring matching assertions with
exact string or structured value checks throughout the test suite. Instead of
using toMatch() with broad regex patterns like /hivemind docs/ or /Not logged
in/, use toContain() for exact string matches or direct equality checks for
specific values. Apply this to key test cases such as "prints usage with no
subcommand", "exits 2 when not logged in", "creates v1 via the real store",
"prints (no doc) when nothing matches", "lists rows", "archives an existing
doc", and other test assertions that currently rely on partial pattern matching
to make the CLI contract tests more precise and robust.

In `@tests/shared/docs-impact.test.ts`:
- Around line 133-138: The test for the "changed when the code differs from the
stored hash" case is only asserting on the state property of the anchorStatus
result, but it should verify the complete changed payload including the from and
to values. Replace the single assertion on st.state with a comprehensive
assertion that checks the full object structure returned by anchorStatus to
catch potential regressions in the from and to fields.

In `@tests/shared/docs.test.ts`:
- Around line 129-159: The rejection tests in this section use generic error
assertions that are too permissive. Replace the regex substring matches in the
tests "rejects empty content", "rejects empty doc_id", and "rejects content
longer than" with exact error message strings instead of regex patterns like
/must not be empty/ and /exceeds 50000 chars/. For the "rejects SQL-identifier
injection in the table name" test, replace the generic toThrow() with a specific
error message assertion to ensure the exact failure mode is being tested. This
tightens the contract checks to verify precise error messages rather than loose
substring matches.

In `@tests/shared/graph/load-current.test.ts`:
- Around line 54-58: The test assertion in the "falls back to snapshot_sha256
when there is no commit context" test is too broad. Instead of using
expect(loadCurrentSnapshot(cwd)).not.toBeNull(), replace it with a specific
assertion that validates the returned snapshot object contains the expected
concrete values for the nodes and links properties that were written to the
snapshot file using writeSnapshotFile. This ensures the function correctly loads
and returns the actual snapshot payload rather than just checking it exists.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 6517aef1-95e4-4670-9211-446883525e7a

📥 Commits

Reviewing files that changed from the base of the PR and between 9f7077a and 1de043e.

📒 Files selected for processing (31)

src/cli/index.ts
src/commands/docs.ts
src/commands/graph.ts
src/config.ts
src/deeplake-api.ts
src/deeplake-schema.ts
src/docs/anchors.ts
src/docs/auto-refresh-trigger.ts
src/docs/gate.ts
src/docs/impact.ts
src/docs/index.ts
src/docs/read.ts
src/docs/refresh-llm.ts
src/docs/refresh.ts
src/docs/write.ts
src/graph/load-current.ts
src/graph/render/impact.ts
tests/claude-code/cli-docs.test.ts
tests/claude-code/flush-memory-wiring.test.ts
tests/claude-code/flush-memory.test.ts
tests/claude-code/skillify-auto-pull.test.ts
tests/claude-code/spawn-wiki-worker.test.ts
tests/shared/deeplake-api.test.ts
tests/shared/docs-auto-refresh-trigger.test.ts
tests/shared/docs-impact.test.ts
tests/shared/docs-refresh-llm.test.ts
tests/shared/docs-refresh.test.ts
tests/shared/docs.test.ts
tests/shared/graph/deeplake-pull.test.ts
tests/shared/graph/deeplake-push.test.ts
tests/shared/graph/load-current.test.ts

coderabbitai · 2026-06-23T17:10:19Z

+    const content = resolveContent(positional[1], args);
+    const path = flagValue(args, "--path") ?? defaultVfsPath(project, docId);


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Handle content-read failures inside the guarded error path.

resolveContent() can throw (bad --file path, unreadable stdin), but it executes before the try/catch, so the command can terminate with an uncaught exception instead of a controlled CLI error.

Proposed fix

- const content = resolveContent(positional[1], args); + let content = ""; const path = flagValue(args, "--path") ?? defaultVfsPath(project, docId); const tier = parseTier(args); @@ - try { + try { + content = resolveContent(positional[1], args); const out = await setDoc(query, tableName, { doc_id: docId, path, content, anchors, tier, project, agent: cfg.userName, plugin_version: pluginVersion, }); console.log(`Set doc ${out.doc_id} → v${out.version}.`); } catch (err) { console.error(`Set failed: ${(err as Error).message}`); process.exit(1); }

Also applies to: 227-242

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/commands/docs.ts` around lines 195 - 196, The resolveContent() function can throw exceptions when provided with invalid file paths or unreadable stdin, but it is currently executed before the try/catch block at line 195, allowing uncaught exceptions to terminate the command abruptly. Move the resolveContent() function call (and any dependent code that uses its result) inside the existing try/catch error handler to ensure failures are caught and handled as controlled CLI errors instead of uncaught exceptions.

coderabbitai · 2026-06-23T17:10:19Z

+  const reasons: string[] = [];
+  const changedLines = countChangedLines(input.prevContent, input.newContent);
+
+  if (input.newContent.length === 0) {
+    reasons.push("proposed content is empty");
+  }
+  if (input.newContent.length > GATE_MAX_CONTENT_LENGTH) {
+    reasons.push(`proposed content exceeds ${GATE_MAX_CONTENT_LENGTH} chars (got ${input.newContent.length})`);
+  }


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Short-circuit oversized proposals before running the quadratic diff.

Line 80 computes countChangedLines(...) before Line 85 enforces GATE_MAX_CONTENT_LENGTH. Since the LCS diff is O(n*m), very large model output can consume substantial CPU/time even when it is guaranteed to be rejected by the size cap.

⚙️ Proposed fix

export function gateDocEdit(input: GateInput): GateResult { const reasons: string[] = []; - const changedLines = countChangedLines(input.prevContent, input.newContent); + let changedLines = 0; if (input.newContent.length === 0) { reasons.push("proposed content is empty"); } if (input.newContent.length > GATE_MAX_CONTENT_LENGTH) { reasons.push(`proposed content exceeds ${GATE_MAX_CONTENT_LENGTH} chars (got ${input.newContent.length})`); } if (input.tier === "slow") { reasons.push("slow-tier docs are human-curated; automatic refresh is not allowed"); } @@ - const budget = input.maxChangedLines ?? DEFAULT_MAX_CHANGED_LINES; - if (changedLines > budget) { - reasons.push(`edit exceeds the bounded-change budget: ${changedLines} > ${budget} lines`); + if (input.newContent.length <= GATE_MAX_CONTENT_LENGTH) { + changedLines = countChangedLines(input.prevContent, input.newContent); + const budget = input.maxChangedLines ?? DEFAULT_MAX_CHANGED_LINES; + if (changedLines > budget) { + reasons.push(`edit exceeds the bounded-change budget: ${changedLines} > ${budget} lines`); + } } return { ok: reasons.length === 0, reasons, changedLines }; }

Also applies to: 99-102

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/docs/gate.ts` around lines 79 - 87, The countChangedLines function call on line 80 performs an expensive O(n*m) LCS diff operation before validating the content size constraints, which means very large inputs consume significant CPU even though they will be rejected by the GATE_MAX_CONTENT_LENGTH check. Move the content validation checks (the empty content check and the GATE_MAX_CONTENT_LENGTH check) to execute before the countChangedLines call so that oversized proposals are rejected before the quadratic diff algorithm runs. This applies to the countChangedLines call and the subsequent size validation checks in the reasons array population section.

coderabbitai · 2026-06-23T17:10:19Z

+    let newContent: string;
+    try {
+      newContent = await args.generate({ doc, reasons: imp.reasons, changedSymbols });
+    } catch (err) {
+      outcomes.push({ doc_id: imp.doc_id, status: "skipped", reasons: [`generate failed: ${(err as Error).message}`] });
+      continue;
+    }


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Do not invoke the LLM for slow-tier docs.

gateDocEdit rejects slow-tier edits, but Line 149 still sends slow-tier content to generate(...) first. That leaks protected content to the model path and incurs unnecessary cost before guaranteed rejection.

🔒 Proposed fix

for (const imp of args.impacted) { const doc = args.docsById.get(imp.doc_id); if (!doc) { outcomes.push({ doc_id: imp.doc_id, status: "skipped", reasons: ["no current doc row"] }); continue; } + if (doc.tier === "slow") { + outcomes.push({ + doc_id: imp.doc_id, + status: "rejected", + reasons: ["slow-tier docs are human-curated; automatic refresh is not allowed"], + }); + continue; + } const newAnchors = reanchor(doc, nodeById, args.repoRoot); const changedSymbols = gatherChangedSymbols(imp.reasons, nodeById, args.repoRoot);

Also applies to: 155-166

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/docs/refresh.ts` around lines 147 - 153, The code invokes args.generate() without first checking if the document should be rejected as a slow-tier doc via gateDocEdit, which unnecessarily leaks protected content to the LLM and incurs avoidable costs. Move the gateDocEdit check before the args.generate() call so that slow-tier documents are rejected without ever being sent to the generate function, preventing unnecessary LLM invocations and content exposure.

coderabbitai · 2026-06-23T17:10:19Z

+  /** Project key the doc belongs to. */
+  project?: string;


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Propagate project during setDoc version bumps.

SetDocInput.project is currently ignored when the doc already exists: setDoc drops it and appendVersion always persists previous.project. This makes project reassignment impossible and leaves listDocs(..., { project }) filtering against stale metadata.

🔧 Proposed fix

export interface EditDocInput { /** Stable doc_id (the source file path). */ doc_id: string; @@ /** New VFS path. Omit to keep the previous path. */ path?: string; + /** New project key. Omit to keep the previous project. */ + project?: string; agent?: string; plugin_version?: string; } @@ return appendVersion(query, tableName, previous, { doc_id: input.doc_id, content: input.content, anchors: input.anchors, tier: input.tier, status: input.status, path: input.path, + project: input.project, agent: input.agent, plugin_version: input.plugin_version, }); } @@ const status = next.status ?? (previous.status as "active" | "archived"); const path = next.path ?? previous.path; + const project = next.project ?? previous.project; @@ - `'${sqlStr(previous.project)}', ` + + `'${sqlStr(project)}', ` +

Also applies to: 194-203, 225-255

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/docs/write.ts` around lines 56 - 57, The SetDocInput.project field is currently being ignored during version bumps when a document already exists. In the setDoc function, the project value from SetDocInput is dropped and appendVersion always uses the previous project value instead of the new one provided. To fix this, propagate the project value from SetDocInput through the setDoc function and pass it as a parameter to appendVersion so it uses the new project value (when provided) instead of always defaulting to previous.project. This will enable proper project reassignment and prevent stale metadata from affecting listDocs filtering.

coderabbitai · 2026-06-23T17:10:19Z

+ * `impactedNodes` (many seeds, for doc drift widening). Walks INCOMING
+ * `calls`/`imports`/`extends`/`implements`/`method_of` edges from the seeds
+ * to collect every transitive dependent. Deterministic: seeds and each level
+ * are processed in id order so `viaOf` is stable.


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Filter reverse traversal to dependency relations only.

Line 25 documents a dependency-only walk, but Lines 41-45 currently include every relation. That can over-expand the closure and mark unrelated docs as impacted.

🔧 Proposed fix

function reverseBfs( snap: GraphSnapshot, seeds: Iterable<string>, maxDepth = MAX_DEPTH, ): { depthOf: Map<string, number>; viaOf: Map<string, { rel: string; from: string }> } { + const DEP_RELATIONS = new Set<GraphEdge["relation"]>([ + "calls", + "imports", + "extends", + "implements", + "method_of", + ]); + // Reverse adjacency: target -> [edges pointing at it]. Only edges whose // SOURCE is a real node are kept, so a dependent is always a graph node. const nodeIds = new Set(snap.nodes.map((n) => n.id)); const incoming = new Map<string, GraphEdge[]>(); for (const e of snap.links) { if (!nodeIds.has(e.source)) continue; + if (!DEP_RELATIONS.has(e.relation)) continue; const list = incoming.get(e.target); if (list) list.push(e); else incoming.set(e.target, [e]); }

Also applies to: 41-45

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/graph/render/impact.ts` around lines 24 - 27, The reverse traversal logic for collecting impacted nodes is currently processing all relations but should be restricted to dependency relations only as documented in the comments. Update the traversal in the impactedNodes collection section (around lines 41-45) to filter and process only the dependency-related edges: calls, imports, extends, implements, and method_of. Remove or skip any other relation types that are not part of the dependency graph to prevent over-expansion of the closure and avoid marking unrelated documents as impacted.

coderabbitai · 2026-06-23T17:10:19Z

+  it("rejects empty content", () => {
+    const r = gateDocEdit({ tier: "fast", prevContent: "x", newContent: "", newAnchors: [], snap: s });
+    expect(r.ok).toBe(false);
+    expect(r.reasons.join()).toMatch(/empty/);


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Use exact reason assertions instead of join()+regex checks.

These assertions are currently broad substring matches, which can hide wrong reason text. Assert specific reason values directly.

✅ Suggested assertion style

- expect(r.reasons.join()).toMatch(/empty/); + expect(r.reasons).toContain("proposed content is empty"); - expect(r.reasons.join()).toMatch(/slow-tier/); + expect(r.reasons).toContain("slow-tier docs are human-curated; automatic refresh is not allowed"); - expect(report.outcomes[0].reasons?.join()).toMatch(/LLM down/); + expect(report.outcomes[0].reasons).toContain("generate failed: LLM down");

As per path instructions, tests/**: Prefer asserting on specific values (paths, messages) over generic substrings.

Also applies to: 87-87, 92-92, 97-97, 103-103, 180-180, 214-214

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/shared/docs-refresh.test.ts` at line 81, The test assertions in this file use `.join().toMatch()` with regex patterns to broadly match substring content in the reasons array, which can hide incorrect reason values. Replace these broad substring checks with direct assertions on the specific reason values. Instead of joining and using regex matching on r.reasons, assert the exact expected reason value directly using methods like toContain() or by checking specific array indices with toBe(). Apply this fix to all occurrences at lines 81, 87, 92, 97, 103, 180, and 214.

Source: Path instructions

efenocchi added 11 commits June 22, 2026 10:28

feat: add hivemind_docs schema and config

da6a0ed

feat: add insert-only hivemind_docs store

30a87e0

fix: add idempotent setDoc to prevent doc history fork

10443fd

feat: add ensureDocsTable and hivemind docs command

0ce3c1b

refactor: extract reusable impactedNodes from renderImpact

2f9518e

feat: add doc drift detection via anchors and blast-radius

266bb36

feat: add doc-edit gate and refresh engine

7e2a1c8

feat: add loadCurrentSnapshot for non-vfs callers

68a141f

feat: add docs refresh command and anchor authoring

71c8b0d

feat: trigger doc refresh from graph build opt-in

2ddbcd6

test: cover docs refresh engine, cli, and snapshot loader

7822f38

test: cover ensureDocsTable create and heal paths

eb92ff8

coderabbitai Bot requested a review from khustup2 June 23, 2026 16:55

fix: keep refresh dry-run read-only, unwrap model fences, boolean fla…

1de043e

…g parsing

coderabbitai Bot reviewed Jun 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: knowledge base doc-freshness layer (drift detection + refresh)#285

feat: knowledge base doc-freshness layer (drift detection + refresh)#285
efenocchi wants to merge 13 commits into
mainfrom
feat/knowledge-base-docs

efenocchi commented Jun 23, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

github-actions Bot commented Jun 23, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 23, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagrams

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Uh oh!

coderabbitai Bot Jun 23, 2026

Uh oh!

coderabbitai Bot Jun 23, 2026

Uh oh!

coderabbitai Bot Jun 23, 2026

Uh oh!

coderabbitai Bot Jun 23, 2026

Uh oh!

coderabbitai Bot Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		const content = resolveContent(positional[1], args);
		const path = flagValue(args, "--path") ?? defaultVfsPath(project, docId);

Conversation

efenocchi commented Jun 23, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

How it works

Steps in this PR

Testing

Not in scope

Summary by CodeRabbit

Release Notes

Uh oh!

github-actions Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Coverage Report

Uh oh!

coderabbitai Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagrams

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

efenocchi commented Jun 23, 2026 •

edited by coderabbitai Bot

Loading

github-actions Bot commented Jun 23, 2026 •

edited

Loading

coderabbitai Bot commented Jun 23, 2026 •

edited

Loading