Skip to content

feat: knowledge base doc-freshness layer (drift detection + refresh)#285

Open
efenocchi wants to merge 13 commits into
mainfrom
feat/knowledge-base-docs
Open

feat: knowledge base doc-freshness layer (drift detection + refresh)#285
efenocchi wants to merge 13 commits into
mainfrom
feat/knowledge-base-docs

Conversation

@efenocchi

@efenocchi efenocchi commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

What

Adds a Knowledge Base / per-file documentation layer that stays fresh on code deltas — the third layer alongside session summaries and the AST code graph. For the Krisp use case: internal docs that go stale because code changes many times a day.

Honest framing (per the team's own oracle eval): this is justified as doc freshness / onboarding trust / token savings, NOT QA-answer accuracy.

How it works

A doc anchors each described symbol to a content_hash of its source slice. On a commit, the graph diff + blast-radius selects only the docs whose anchored code (or its callers) changed; a bounded LLM edit regenerates them through an objective gate; the new version is written INSERT-only (version-bumped, immutable created_at).

code change -> graph diff + blast-radius -> impacted docs
            -> host LLM bounded rewrite -> gate -> setDoc (version+1)

Steps in this PR

  • 1 - schema + config: hivemind_docs table (INSERT-only, version-bumped) + docsTableName.
  • 2 - store: src/docs insert/edit/list/get + idempotent setDoc (no history fork).
  • 3 - drift detector: anchors.ts (source-slice hash) + impact.ts (direct hash drift + blast-radius widening over the graph).
  • 5 - refresh engine: gate.ts (objective invariants: bounded edit, anchors exist, slow-tier protected) + refresh.ts (re-anchor -> generate -> gate -> write) + refresh-llm.ts (host claude -p, reuses the wiki-worker seam) + hivemind docs CLI (set/show/list/archive/refresh, --anchor, --dry-run).
  • 8 - trigger: opt-in HIVEMIND_DOCS_AUTO_REFRESH=1 fires a detached refresh at the end of graph build.

Testing

  • ~105 unit tests; new docs files >=85% line coverage (gate/write/read/refresh-llm 100%).
  • End-to-end through the real hivemind binary against a test Deeplake table: real code change -> drift detected -> claude rewrote the doc to match -> gate passed -> v2 written, created_at preserved.

Not in scope

No VFS routing for the docs memory subtree yet (CLI/worker only); no spec-layer correlation; per-agent doc-worker variants (codex/cursor/hermes/pi) follow the existing wiki-worker fork pattern.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added hivemind docs CLI command with subcommands: set, show, list, archive, and refresh
    • Documentation can be anchored to code symbols for automatic staleness tracking
    • Automatic documentation refresh when code changes (opt-in via HIVEMIND_DOCS_AUTO_REFRESH)
    • LLM-assisted documentation updates with change-limit validation
  • Chores

    • Updated configuration schema to support documentation table settings

@github-actions

github-actions Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Coverage Report

Scope: files changed in this PR. Enforced threshold: 90% per metric (per file via vitest.config.ts).

Status Category Percentage Covered / Total
🔴 Lines 88.26% (🎯 90%) 1263 / 1431
🔴 Statements 88.01% (🎯 90%) 1432 / 1627
🟢 Functions 98.24% (🎯 90%) 167 / 170
🔴 Branches 80.70% (🎯 90%) 786 / 974
File Coverage — 17 files changed
File Stmts Branches Functions Lines
src/cli/index.ts 🔴 86.1% 🔴 84.6% 🟢 100.0% 🔴 83.7%
src/commands/docs.ts 🔴 82.8% 🔴 71.9% 🟢 93.8% 🔴 85.1%
src/commands/graph.ts 🔴 74.3% 🔴 62.9% 🟢 95.2% 🔴 73.9%
src/config.ts 🟢 100.0% 🟢 100.0% 🟢 100.0% 🟢 100.0%
src/deeplake-api.ts 🟢 99.0% 🔴 88.6% 🟢 100.0% 🟢 99.6%
src/deeplake-schema.ts 🟢 96.0% 🔴 84.8% 🟢 100.0% 🟢 95.2%
src/docs/anchors.ts 🟢 94.3% 🔴 89.3% 🟢 100.0% 🟢 100.0%
src/docs/auto-refresh-trigger.ts 🟢 100.0% 🔴 81.8% 🟢 100.0% 🟢 100.0%
src/docs/gate.ts 🟢 100.0% 🟢 100.0% 🟢 100.0% 🟢 100.0%
src/docs/impact.ts 🔴 87.3% 🔴 81.8% 🟢 100.0% 🟢 91.1%
src/docs/index.ts 🟢 100.0% 🟢 100.0% 🟢 100.0% 🟢 100.0%
src/docs/read.ts 🟢 100.0% 🔴 86.2% 🟢 100.0% 🟢 100.0%
src/docs/refresh-llm.ts 🟢 100.0% 🔴 85.7% 🟢 100.0% 🟢 100.0%
src/docs/refresh.ts 🟢 94.0% 🔴 75.0% 🟢 100.0% 🟢 100.0%
src/docs/write.ts 🟢 100.0% 🟢 100.0% 🟢 100.0% 🟢 100.0%
src/graph/load-current.ts 🟢 94.1% 🟢 100.0% 🟢 100.0% 🟢 92.8%
src/graph/render/impact.ts 🟢 92.2% 🔴 70.3% 🔴 87.5% 🟢 98.7%

Generated for commit 9fc89ec.

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a complete hivemind docs subsystem: a versioned INSERT-only Deeplake docs table (schema, config, read/write helpers), source-anchor tracking via SHA-256 hashing of symbol line slices, drift/impact detection using direct hash comparison and reverse-BFS blast-radius widening, an LLM refresh pipeline with an LCS-based edit gate, a maybeSpawnDocsRefresh post-build trigger, and a five-subcommand CLI (set, show, list, archive, refresh).

Changes

hivemind docs subsystem

Layer / File(s) Summary
Schema, config, and DB provisioning
src/deeplake-schema.ts, src/config.ts, src/deeplake-api.ts, tests/shared/deeplake-api.test.ts, tests/claude-code/flush-memory*.test.ts, tests/claude-code/skillify-auto-pull.test.ts, tests/claude-code/spawn-wiki-worker.test.ts, tests/shared/graph/deeplake-p*.test.ts
Adds DOCS_COLUMNS schema constant with module-load validation, docsTableName to Config/loadConfig (env HIVEMIND_DOCS_TABLE), and DeeplakeApi.ensureDocsTable that creates/heals the docs table and ensures the (doc_id, version) index. Updates test fixtures with docsTableName.
Docs read/write data layer and barrel
src/docs/read.ts, src/docs/write.ts, src/docs/index.ts, tests/shared/docs.test.ts
Adds typed INSERT-only versioned write helpers (insertDoc, editDoc, setDoc, archiveDoc, appendVersion) and read helpers (listDocs, getDocLatest, parseAnchors, normalize). Barrel index.ts re-exports all public API. Tests cover SQL shape, escaping, version immutability, deduplication, and parseAnchors edge cases.
Anchor tracking and graph snapshot loading
src/docs/anchors.ts, src/graph/load-current.ts, src/graph/render/impact.ts, tests/shared/docs-impact.test.ts, tests/shared/graph/load-current.test.ts
Adds parseSourceLocation, line-slice reading, SHA-256 hashing, buildAnchor, and anchorStatus. Adds loadCurrentSnapshot (null-safe local snapshot loader). Refactors graph/render/impact.ts into a shared reverseBfs helper exposing new impactedNodes export. Tests cover all anchor status states and snapshot loading scenarios.
Doc drift detection and impact computation
src/docs/impact.ts, tests/shared/docs-impact.test.ts
Adds computeStaleDocs (direct anchor hash comparison), widenByBlastRadius (caller-closure relational pass via impactedNodes), and computeImpactedDocs orchestrator unioning both passes over an optional SnapshotDiff. Tests cover all staleness paths and relational widening.
Edit gate
src/docs/gate.ts, tests/shared/docs-refresh.test.ts
Adds countChangedLines (LCS-based rolling DP), gateDocEdit that rejects empty/overlong content, slow-tier docs, missing-symbol anchors, and over-budget edits. Tests cover all rejection paths and the acceptance path.
LLM refresh orchestration, prompt building, and auto-refresh trigger
src/docs/refresh.ts, src/docs/refresh-llm.ts, src/docs/auto-refresh-trigger.ts, tests/shared/docs-refresh.test.ts, tests/shared/docs-refresh-llm.test.ts, tests/shared/docs-auto-refresh-trigger.test.ts
Adds refreshDocs orchestrator (reanchor, gather changed symbols, generate, gate, persist), buildRefreshPrompt, makeClaudeGenerate (spawns claude CLI via execFileSync), and maybeSpawnDocsRefresh (opt-in detached trigger gated by HIVEMIND_DOCS_AUTO_REFRESH=1). Tests cover prompt content, LLM wiring, all orchestrator outcome paths, and spawn gating.
CLI command routing and graph build hook
src/cli/index.ts, src/commands/docs.ts, src/commands/graph.ts, tests/claude-code/cli-docs.test.ts
Adds runDocsCommand with set/show/list/archive/refresh subcommands (flag parsing, content resolution, anchor building from --anchor). Wires it into cli/index.ts. Adds maybeSpawnDocsRefresh as Step 8 in graph build after snapshot write. Tests cover dispatch, all subcommands, SQL patterns, temp-file integration, and gate rejection.

Sequence Diagrams

sequenceDiagram
  actor User
  participant CLI as hivemind docs refresh
  participant loadCurrentSnapshot
  participant listDocs
  participant computeImpactedDocs
  participant ensureDocsTable
  participant refreshDocs
  participant makeClaudeGenerate as Claude CLI
  participant gateDocEdit
  participant setDoc as setDoc (Deeplake)

  User->>CLI: docs refresh [--dry-run]
  CLI->>loadCurrentSnapshot: load local graph snapshot
  CLI->>listDocs: fetch active docs
  CLI->>computeImpactedDocs: detect stale docs via anchors + blast radius
  alt --dry-run
    CLI-->>User: print impacted docs, no writes
  else live run
    CLI->>ensureDocsTable: create/heal docs table
    CLI->>refreshDocs: iterate impacted docs
    refreshDocs->>makeClaudeGenerate: buildRefreshPrompt + execFileSync(claude)
    makeClaudeGenerate-->>refreshDocs: proposed markdown
    refreshDocs->>gateDocEdit: validate (tier, size, anchors)
    alt gate pass
      refreshDocs->>setDoc: persist new version
      setDoc-->>refreshDocs: WriteResult
    else gate fail
      refreshDocs-->>refreshDocs: record rejected
    end
    refreshDocs-->>CLI: RefreshReport
    CLI-->>User: print refreshed/rejected/skipped counts
  end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • activeloopai/hivemind#192: Modifies src/commands/graph.ts around the snapshot write and cloud push flow — the exact location where this PR inserts the maybeSpawnDocsRefresh post-build hook.
  • activeloopai/hivemind#194: Modifies src/cli/index.ts main() to add a new top-level subcommand branch, using the same dispatch pattern this PR replicates for docs/doc.

Poem

🐇 Hop hop, the docs are alive!
Each symbol hashed, each anchor set just right,
A slow tier? Gate says: not tonight!
The Claude CLI spawns in the night,
✨ Versions stack — insert, never UPDATE —
And the rabbit refreshes drift at dawn's first light! 🌅

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 61.76% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main feature: a documentation layer that maintains freshness through drift detection and refresh mechanisms, which aligns perfectly with the changeset's primary objectives.
Description check ✅ Passed The PR description comprehensively covers the feature objectives, implementation strategy, architecture, testing approach, and scoping boundaries, fulfilling the template's requirements for summary and test plan validation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/knowledge-base-docs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot requested a review from khustup2 June 23, 2026 16:55

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (4)
tests/shared/docs-impact.test.ts (1)

133-138: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Assert the full changed payload, not just state.

This test currently verifies only the discriminator and misses regressions in from/to values.

✅ Proposed assertion tightening
   it("changed when the code differs from the stored hash", () => {
     const n = node("a.ts:foo:function", "a.ts", "L1-L2");
     const stale: DocAnchor = { symbol_id: n.id, content_hash: "deadbeef" };
     const st = anchorStatus(stale, snap([n]), dir);
-    expect(st.state).toBe("changed");
+    expect(st).toEqual({
+      state: "changed",
+      from: "deadbeef",
+      to: sha("function foo() {\n  return 1;"),
+    });
   });

As per path instructions, tests/**: Prefer asserting on specific values (paths, messages) over generic substrings.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/shared/docs-impact.test.ts` around lines 133 - 138, The test for the
"changed when the code differs from the stored hash" case is only asserting on
the state property of the anchorStatus result, but it should verify the complete
changed payload including the from and to values. Replace the single assertion
on st.state with a comprehensive assertion that checks the full object structure
returned by anchorStatus to catch potential regressions in the from and to
fields.

Source: Path instructions

tests/shared/graph/load-current.test.ts (1)

54-58: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Make the fallback test assert concrete snapshot content.

not.toBeNull() is broad here; assert the expected nodes/links payload directly.

✅ Proposed assertion tightening
   it("falls back to snapshot_sha256 when there is no commit context", () => {
     writeLastBuild(cwd, { ts: 1, commit_sha: null, snapshot_sha256: "feedface" });
     writeSnapshotFile(cwd, "feedface", JSON.stringify({ nodes: [], links: [] }));
-    expect(loadCurrentSnapshot(cwd)).not.toBeNull();
+    expect(loadCurrentSnapshot(cwd)).toEqual({ nodes: [], links: [] });
   });

As per path instructions, tests/**: Prefer asserting on specific values (paths, messages) over generic substrings.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/shared/graph/load-current.test.ts` around lines 54 - 58, The test
assertion in the "falls back to snapshot_sha256 when there is no commit context"
test is too broad. Instead of using
expect(loadCurrentSnapshot(cwd)).not.toBeNull(), replace it with a specific
assertion that validates the returned snapshot object contains the expected
concrete values for the nodes and links properties that were written to the
snapshot file using writeSnapshotFile. This ensures the function correctly loads
and returns the actual snapshot payload rather than just checking it exists.

Source: Path instructions

tests/claude-code/cli-docs.test.ts (1)

87-223: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Tighten assertions to exact expected messages/values instead of generic substrings.

A large part of this suite validates only partial output via regex substring matches. Switching key checks to exact strings (or structured fields) will make these CLI contract tests significantly more robust.

As per path instructions, tests/**: Prefer asserting on specific values (paths, messages) over generic substrings.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/claude-code/cli-docs.test.ts` around lines 87 - 223, Replace generic
regex substring matching assertions with exact string or structured value checks
throughout the test suite. Instead of using toMatch() with broad regex patterns
like /hivemind docs/ or /Not logged in/, use toContain() for exact string
matches or direct equality checks for specific values. Apply this to key test
cases such as "prints usage with no subcommand", "exits 2 when not logged in",
"creates v1 via the real store", "prints (no doc) when nothing matches", "lists
rows", "archives an existing doc", and other test assertions that currently rely
on partial pattern matching to make the CLI contract tests more precise and
robust.

Source: Path instructions

tests/shared/docs.test.ts (1)

129-159: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use exact error-message assertions for failure paths.

Several rejection tests currently assert generic substrings (or just toThrow()), which weakens contract checks. Prefer exact message assertions for these cases.

✅ Example tightening
-    ).rejects.toThrow(/must not be empty/);
+    ).rejects.toThrow("Doc content must not be empty");
@@
-    ).rejects.toThrow(/doc_id must not be empty/);
+    ).rejects.toThrow("Doc doc_id must not be empty");
@@
-    ).rejects.toThrow(/Doc not found: missing.ts/);
+    ).rejects.toThrow("Doc not found: missing.ts");
@@
-    await expect(
-      insertDoc(query, `x"; DROP TABLE y; --`, { doc_id: "a.ts", path: "/p", content: "x" }),
-    ).rejects.toThrow();
+    await expect(
+      insertDoc(query, `x"; DROP TABLE y; --`, { doc_id: "a.ts", path: "/p", content: "x" }),
+    ).rejects.toThrow(`Invalid SQL identifier: "x\\"; DROP TABLE y; --"`);

As per path instructions, tests/**: “Prefer asserting on specific values (paths, messages) over generic substrings.”

Also applies to: 212-227, 290-293

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/shared/docs.test.ts` around lines 129 - 159, The rejection tests in
this section use generic error assertions that are too permissive. Replace the
regex substring matches in the tests "rejects empty content", "rejects empty
doc_id", and "rejects content longer than" with exact error message strings
instead of regex patterns like /must not be empty/ and /exceeds 50000 chars/.
For the "rejects SQL-identifier injection in the table name" test, replace the
generic toThrow() with a specific error message assertion to ensure the exact
failure mode is being tested. This tightens the contract checks to verify
precise error messages rather than loose substring matches.

Source: Path instructions

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/commands/docs.ts`:
- Around line 195-196: The resolveContent() function can throw exceptions when
provided with invalid file paths or unreadable stdin, but it is currently
executed before the try/catch block at line 195, allowing uncaught exceptions to
terminate the command abruptly. Move the resolveContent() function call (and any
dependent code that uses its result) inside the existing try/catch error handler
to ensure failures are caught and handled as controlled CLI errors instead of
uncaught exceptions.

In `@src/docs/gate.ts`:
- Around line 79-87: The countChangedLines function call on line 80 performs an
expensive O(n*m) LCS diff operation before validating the content size
constraints, which means very large inputs consume significant CPU even though
they will be rejected by the GATE_MAX_CONTENT_LENGTH check. Move the content
validation checks (the empty content check and the GATE_MAX_CONTENT_LENGTH
check) to execute before the countChangedLines call so that oversized proposals
are rejected before the quadratic diff algorithm runs. This applies to the
countChangedLines call and the subsequent size validation checks in the reasons
array population section.

In `@src/docs/refresh.ts`:
- Around line 147-153: The code invokes args.generate() without first checking
if the document should be rejected as a slow-tier doc via gateDocEdit, which
unnecessarily leaks protected content to the LLM and incurs avoidable costs.
Move the gateDocEdit check before the args.generate() call so that slow-tier
documents are rejected without ever being sent to the generate function,
preventing unnecessary LLM invocations and content exposure.

In `@src/docs/write.ts`:
- Around line 56-57: The SetDocInput.project field is currently being ignored
during version bumps when a document already exists. In the setDoc function, the
project value from SetDocInput is dropped and appendVersion always uses the
previous project value instead of the new one provided. To fix this, propagate
the project value from SetDocInput through the setDoc function and pass it as a
parameter to appendVersion so it uses the new project value (when provided)
instead of always defaulting to previous.project. This will enable proper
project reassignment and prevent stale metadata from affecting listDocs
filtering.

In `@src/graph/render/impact.ts`:
- Around line 24-27: The reverse traversal logic for collecting impacted nodes
is currently processing all relations but should be restricted to dependency
relations only as documented in the comments. Update the traversal in the
impactedNodes collection section (around lines 41-45) to filter and process only
the dependency-related edges: calls, imports, extends, implements, and
method_of. Remove or skip any other relation types that are not part of the
dependency graph to prevent over-expansion of the closure and avoid marking
unrelated documents as impacted.

In `@tests/shared/docs-refresh.test.ts`:
- Line 81: The test assertions in this file use `.join().toMatch()` with regex
patterns to broadly match substring content in the reasons array, which can hide
incorrect reason values. Replace these broad substring checks with direct
assertions on the specific reason values. Instead of joining and using regex
matching on r.reasons, assert the exact expected reason value directly using
methods like toContain() or by checking specific array indices with toBe().
Apply this fix to all occurrences at lines 81, 87, 92, 97, 103, 180, and 214.

---

Nitpick comments:
In `@tests/claude-code/cli-docs.test.ts`:
- Around line 87-223: Replace generic regex substring matching assertions with
exact string or structured value checks throughout the test suite. Instead of
using toMatch() with broad regex patterns like /hivemind docs/ or /Not logged
in/, use toContain() for exact string matches or direct equality checks for
specific values. Apply this to key test cases such as "prints usage with no
subcommand", "exits 2 when not logged in", "creates v1 via the real store",
"prints (no doc) when nothing matches", "lists rows", "archives an existing
doc", and other test assertions that currently rely on partial pattern matching
to make the CLI contract tests more precise and robust.

In `@tests/shared/docs-impact.test.ts`:
- Around line 133-138: The test for the "changed when the code differs from the
stored hash" case is only asserting on the state property of the anchorStatus
result, but it should verify the complete changed payload including the from and
to values. Replace the single assertion on st.state with a comprehensive
assertion that checks the full object structure returned by anchorStatus to
catch potential regressions in the from and to fields.

In `@tests/shared/docs.test.ts`:
- Around line 129-159: The rejection tests in this section use generic error
assertions that are too permissive. Replace the regex substring matches in the
tests "rejects empty content", "rejects empty doc_id", and "rejects content
longer than" with exact error message strings instead of regex patterns like
/must not be empty/ and /exceeds 50000 chars/. For the "rejects SQL-identifier
injection in the table name" test, replace the generic toThrow() with a specific
error message assertion to ensure the exact failure mode is being tested. This
tightens the contract checks to verify precise error messages rather than loose
substring matches.

In `@tests/shared/graph/load-current.test.ts`:
- Around line 54-58: The test assertion in the "falls back to snapshot_sha256
when there is no commit context" test is too broad. Instead of using
expect(loadCurrentSnapshot(cwd)).not.toBeNull(), replace it with a specific
assertion that validates the returned snapshot object contains the expected
concrete values for the nodes and links properties that were written to the
snapshot file using writeSnapshotFile. This ensures the function correctly loads
and returns the actual snapshot payload rather than just checking it exists.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 6517aef1-95e4-4670-9211-446883525e7a

📥 Commits

Reviewing files that changed from the base of the PR and between 9f7077a and 1de043e.

📒 Files selected for processing (31)
  • src/cli/index.ts
  • src/commands/docs.ts
  • src/commands/graph.ts
  • src/config.ts
  • src/deeplake-api.ts
  • src/deeplake-schema.ts
  • src/docs/anchors.ts
  • src/docs/auto-refresh-trigger.ts
  • src/docs/gate.ts
  • src/docs/impact.ts
  • src/docs/index.ts
  • src/docs/read.ts
  • src/docs/refresh-llm.ts
  • src/docs/refresh.ts
  • src/docs/write.ts
  • src/graph/load-current.ts
  • src/graph/render/impact.ts
  • tests/claude-code/cli-docs.test.ts
  • tests/claude-code/flush-memory-wiring.test.ts
  • tests/claude-code/flush-memory.test.ts
  • tests/claude-code/skillify-auto-pull.test.ts
  • tests/claude-code/spawn-wiki-worker.test.ts
  • tests/shared/deeplake-api.test.ts
  • tests/shared/docs-auto-refresh-trigger.test.ts
  • tests/shared/docs-impact.test.ts
  • tests/shared/docs-refresh-llm.test.ts
  • tests/shared/docs-refresh.test.ts
  • tests/shared/docs.test.ts
  • tests/shared/graph/deeplake-pull.test.ts
  • tests/shared/graph/deeplake-push.test.ts
  • tests/shared/graph/load-current.test.ts

Comment thread src/commands/docs.ts
Comment on lines +195 to +196
const content = resolveContent(positional[1], args);
const path = flagValue(args, "--path") ?? defaultVfsPath(project, docId);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Handle content-read failures inside the guarded error path.

resolveContent() can throw (bad --file path, unreadable stdin), but it executes before the try/catch, so the command can terminate with an uncaught exception instead of a controlled CLI error.

Proposed fix
-    const content = resolveContent(positional[1], args);
+    let content = "";
     const path = flagValue(args, "--path") ?? defaultVfsPath(project, docId);
     const tier = parseTier(args);
@@
-    try {
+    try {
+      content = resolveContent(positional[1], args);
       const out = await setDoc(query, tableName, {
         doc_id: docId,
         path,
         content,
         anchors,
         tier,
         project,
         agent: cfg.userName,
         plugin_version: pluginVersion,
       });
       console.log(`Set doc ${out.doc_id} → v${out.version}.`);
     } catch (err) {
       console.error(`Set failed: ${(err as Error).message}`);
       process.exit(1);
     }

Also applies to: 227-242

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/commands/docs.ts` around lines 195 - 196, The resolveContent() function
can throw exceptions when provided with invalid file paths or unreadable stdin,
but it is currently executed before the try/catch block at line 195, allowing
uncaught exceptions to terminate the command abruptly. Move the resolveContent()
function call (and any dependent code that uses its result) inside the existing
try/catch error handler to ensure failures are caught and handled as controlled
CLI errors instead of uncaught exceptions.

Comment thread src/docs/gate.ts
Comment on lines +79 to +87
const reasons: string[] = [];
const changedLines = countChangedLines(input.prevContent, input.newContent);

if (input.newContent.length === 0) {
reasons.push("proposed content is empty");
}
if (input.newContent.length > GATE_MAX_CONTENT_LENGTH) {
reasons.push(`proposed content exceeds ${GATE_MAX_CONTENT_LENGTH} chars (got ${input.newContent.length})`);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Short-circuit oversized proposals before running the quadratic diff.

Line 80 computes countChangedLines(...) before Line 85 enforces GATE_MAX_CONTENT_LENGTH. Since the LCS diff is O(n*m), very large model output can consume substantial CPU/time even when it is guaranteed to be rejected by the size cap.

⚙️ Proposed fix
 export function gateDocEdit(input: GateInput): GateResult {
   const reasons: string[] = [];
-  const changedLines = countChangedLines(input.prevContent, input.newContent);
+  let changedLines = 0;
 
   if (input.newContent.length === 0) {
     reasons.push("proposed content is empty");
   }
   if (input.newContent.length > GATE_MAX_CONTENT_LENGTH) {
     reasons.push(`proposed content exceeds ${GATE_MAX_CONTENT_LENGTH} chars (got ${input.newContent.length})`);
   }
   if (input.tier === "slow") {
     reasons.push("slow-tier docs are human-curated; automatic refresh is not allowed");
   }
@@
-  const budget = input.maxChangedLines ?? DEFAULT_MAX_CHANGED_LINES;
-  if (changedLines > budget) {
-    reasons.push(`edit exceeds the bounded-change budget: ${changedLines} > ${budget} lines`);
+  if (input.newContent.length <= GATE_MAX_CONTENT_LENGTH) {
+    changedLines = countChangedLines(input.prevContent, input.newContent);
+    const budget = input.maxChangedLines ?? DEFAULT_MAX_CHANGED_LINES;
+    if (changedLines > budget) {
+      reasons.push(`edit exceeds the bounded-change budget: ${changedLines} > ${budget} lines`);
+    }
   }
 
   return { ok: reasons.length === 0, reasons, changedLines };
 }

Also applies to: 99-102

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/docs/gate.ts` around lines 79 - 87, The countChangedLines function call
on line 80 performs an expensive O(n*m) LCS diff operation before validating the
content size constraints, which means very large inputs consume significant CPU
even though they will be rejected by the GATE_MAX_CONTENT_LENGTH check. Move the
content validation checks (the empty content check and the
GATE_MAX_CONTENT_LENGTH check) to execute before the countChangedLines call so
that oversized proposals are rejected before the quadratic diff algorithm runs.
This applies to the countChangedLines call and the subsequent size validation
checks in the reasons array population section.

Comment thread src/docs/refresh.ts
Comment on lines +147 to +153
let newContent: string;
try {
newContent = await args.generate({ doc, reasons: imp.reasons, changedSymbols });
} catch (err) {
outcomes.push({ doc_id: imp.doc_id, status: "skipped", reasons: [`generate failed: ${(err as Error).message}`] });
continue;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Do not invoke the LLM for slow-tier docs.

gateDocEdit rejects slow-tier edits, but Line 149 still sends slow-tier content to generate(...) first. That leaks protected content to the model path and incurs unnecessary cost before guaranteed rejection.

🔒 Proposed fix
   for (const imp of args.impacted) {
     const doc = args.docsById.get(imp.doc_id);
     if (!doc) {
       outcomes.push({ doc_id: imp.doc_id, status: "skipped", reasons: ["no current doc row"] });
       continue;
     }
+    if (doc.tier === "slow") {
+      outcomes.push({
+        doc_id: imp.doc_id,
+        status: "rejected",
+        reasons: ["slow-tier docs are human-curated; automatic refresh is not allowed"],
+      });
+      continue;
+    }
 
     const newAnchors = reanchor(doc, nodeById, args.repoRoot);
     const changedSymbols = gatherChangedSymbols(imp.reasons, nodeById, args.repoRoot);

Also applies to: 155-166

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/docs/refresh.ts` around lines 147 - 153, The code invokes args.generate()
without first checking if the document should be rejected as a slow-tier doc via
gateDocEdit, which unnecessarily leaks protected content to the LLM and incurs
avoidable costs. Move the gateDocEdit check before the args.generate() call so
that slow-tier documents are rejected without ever being sent to the generate
function, preventing unnecessary LLM invocations and content exposure.

Comment thread src/docs/write.ts
Comment on lines +56 to +57
/** Project key the doc belongs to. */
project?: string;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Propagate project during setDoc version bumps.

SetDocInput.project is currently ignored when the doc already exists: setDoc drops it and appendVersion always persists previous.project. This makes project reassignment impossible and leaves listDocs(..., { project }) filtering against stale metadata.

🔧 Proposed fix
 export interface EditDocInput {
   /** Stable doc_id (the source file path). */
   doc_id: string;
@@
   /** New VFS path. Omit to keep the previous path. */
   path?: string;
+  /** New project key. Omit to keep the previous project. */
+  project?: string;
   agent?: string;
   plugin_version?: string;
 }
@@
   return appendVersion(query, tableName, previous, {
     doc_id: input.doc_id,
     content: input.content,
     anchors: input.anchors,
     tier: input.tier,
     status: input.status,
     path: input.path,
+    project: input.project,
     agent: input.agent,
     plugin_version: input.plugin_version,
   });
 }
@@
   const status = next.status ?? (previous.status as "active" | "archived");
   const path = next.path ?? previous.path;
+  const project = next.project ?? previous.project;
@@
-    `'${sqlStr(previous.project)}', ` +
+    `'${sqlStr(project)}', ` +

Also applies to: 194-203, 225-255

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/docs/write.ts` around lines 56 - 57, The SetDocInput.project field is
currently being ignored during version bumps when a document already exists. In
the setDoc function, the project value from SetDocInput is dropped and
appendVersion always uses the previous project value instead of the new one
provided. To fix this, propagate the project value from SetDocInput through the
setDoc function and pass it as a parameter to appendVersion so it uses the new
project value (when provided) instead of always defaulting to previous.project.
This will enable proper project reassignment and prevent stale metadata from
affecting listDocs filtering.

Comment on lines +24 to +27
* `impactedNodes` (many seeds, for doc drift widening). Walks INCOMING
* `calls`/`imports`/`extends`/`implements`/`method_of` edges from the seeds
* to collect every transitive dependent. Deterministic: seeds and each level
* are processed in id order so `viaOf` is stable.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Filter reverse traversal to dependency relations only.

Line 25 documents a dependency-only walk, but Lines 41-45 currently include every relation. That can over-expand the closure and mark unrelated docs as impacted.

🔧 Proposed fix
 function reverseBfs(
   snap: GraphSnapshot,
   seeds: Iterable<string>,
   maxDepth = MAX_DEPTH,
 ): { depthOf: Map<string, number>; viaOf: Map<string, { rel: string; from: string }> } {
+  const DEP_RELATIONS = new Set<GraphEdge["relation"]>([
+    "calls",
+    "imports",
+    "extends",
+    "implements",
+    "method_of",
+  ]);
+
   // Reverse adjacency: target -> [edges pointing at it]. Only edges whose
   // SOURCE is a real node are kept, so a dependent is always a graph node.
   const nodeIds = new Set(snap.nodes.map((n) => n.id));
   const incoming = new Map<string, GraphEdge[]>();

   for (const e of snap.links) {
     if (!nodeIds.has(e.source)) continue;
+    if (!DEP_RELATIONS.has(e.relation)) continue;
     const list = incoming.get(e.target);
     if (list) list.push(e); else incoming.set(e.target, [e]);
   }

Also applies to: 41-45

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/graph/render/impact.ts` around lines 24 - 27, The reverse traversal logic
for collecting impacted nodes is currently processing all relations but should
be restricted to dependency relations only as documented in the comments. Update
the traversal in the impactedNodes collection section (around lines 41-45) to
filter and process only the dependency-related edges: calls, imports, extends,
implements, and method_of. Remove or skip any other relation types that are not
part of the dependency graph to prevent over-expansion of the closure and avoid
marking unrelated documents as impacted.

it("rejects empty content", () => {
const r = gateDocEdit({ tier: "fast", prevContent: "x", newContent: "", newAnchors: [], snap: s });
expect(r.ok).toBe(false);
expect(r.reasons.join()).toMatch(/empty/);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Use exact reason assertions instead of join()+regex checks.

These assertions are currently broad substring matches, which can hide wrong reason text. Assert specific reason values directly.

✅ Suggested assertion style
- expect(r.reasons.join()).toMatch(/empty/);
+ expect(r.reasons).toContain("proposed content is empty");

- expect(r.reasons.join()).toMatch(/slow-tier/);
+ expect(r.reasons).toContain("slow-tier docs are human-curated; automatic refresh is not allowed");

- expect(report.outcomes[0].reasons?.join()).toMatch(/LLM down/);
+ expect(report.outcomes[0].reasons).toContain("generate failed: LLM down");

As per path instructions, tests/**: Prefer asserting on specific values (paths, messages) over generic substrings.

Also applies to: 87-87, 92-92, 97-97, 103-103, 180-180, 214-214

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/shared/docs-refresh.test.ts` at line 81, The test assertions in this
file use `.join().toMatch()` with regex patterns to broadly match substring
content in the reasons array, which can hide incorrect reason values. Replace
these broad substring checks with direct assertions on the specific reason
values. Instead of joining and using regex matching on r.reasons, assert the
exact expected reason value directly using methods like toContain() or by
checking specific array indices with toBe(). Apply this fix to all occurrences
at lines 81, 87, 92, 97, 103, 180, and 214.

Source: Path instructions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant