Constraint-Adapted Subtask Cascade for Agentic Decomposition & Execution
Get frontier-quality work from small, local language models — by growing a
knowledge graph with a person in the loop until every step is small enough to do reliably.
A local command-line tool that decomposes hard, fuzzy tasks into tiny, verified steps a small model can actually finish — with you steering through an Obsidian vault.
Status — in active development. The engine, the knowledge-first two-stage pipeline, and the Obsidian human-in-the-loop layer are built and green on the model-free path (the whole engine is testable with no model in the loop). What is still being hardened is end-to-end convergence with a real small model — keeping a small model on-rails across a long live run is the open hard part, and that path is opt-in today.
Works today: build from source, then cascade init, cascade solve, and cascade vault. Coming: an MCP server, a larger-model elevation path, and packaged release binaries for macOS / Windows / Linux.
CASCADE gets real work out of small language models running on your own machine (via
Ollama) instead of a large, metered frontier API — by shrinking every step
until it is small enough to do reliably, and keeping a person in the loop to supply the
judgment a small model can't.
The durable bet: a small model fails when the decision entropy of a single step exceeds its working-memory headroom — not because the overall task is big. So the fix is mechanical: shrink each step until the decisions it forces are within reach (often down to "write one function with one test"), and it becomes reliable.
The pivot that defines CASCADE today: a small model cannot enumerate its own ignorance — it doesn't know what it doesn't know. So design, decisions, and the unknown-unknowns are offloaded to a person (and, later, to larger models), and the model elevates when it gets stuck — it asks for help loudly instead of thrashing or faking progress. Human-in-the-loop is the standard operating mode here, not an escape hatch for failures.
One engine, two modes: "split this" and "do this" are the same constrained, schema-validated, low-entropy call — only the node's type and the schema its output is checked against differ. The bet within the bet: producing a valid decomposition is far lower-entropy than producing a good plan. The grammar removes the freedom small models lack.
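The "one engine, two modes" idea can be sketched as a single validation entry point where only the schema changes with the node type. This is a minimal illustration under stated assumptions: the NodeType values, the splitReply and doReply shapes, and the validate function are hypothetical names, not CASCADE's actual schemas or API.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// NodeType selects which schema a model reply is checked against.
// These names are illustrative, not CASCADE's own.
type NodeType string

const (
	Split NodeType = "split" // reply must list child steps
	Do    NodeType = "do"    // reply must carry a single result
)

// splitReply and doReply stand in for the two output schemas (assumed shapes).
type splitReply struct {
	Children []string `json:"children"`
}

type doReply struct {
	Result string `json:"result"`
}

// validate is the single entry point for both modes: the only thing that
// changes between "split this" and "do this" is the schema applied.
func validate(t NodeType, raw []byte) error {
	switch t {
	case Split:
		var r splitReply
		if err := json.Unmarshal(raw, &r); err != nil {
			return err
		}
		if len(r.Children) == 0 {
			return errors.New("split: no children")
		}
	case Do:
		var r doReply
		if err := json.Unmarshal(raw, &r); err != nil {
			return err
		}
		if r.Result == "" {
			return errors.New("do: empty result")
		}
	default:
		return fmt.Errorf("unknown node type %q", t)
	}
	return nil
}

func main() {
	// A well-formed split reply passes; a split-shaped reply to a "do" node fails.
	fmt.Println(validate(Split, []byte(`{"children":["fetch forecast","render page"]}`)))
	fmt.Println(validate(Do, []byte(`{"children":["oops"]}`)))
}
```

The point of the sketch is that the caller never branches on "planning" versus "executing"; it only picks a schema, which is where the low-entropy constraint comes from.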
CASCADE does not build one big plan upfront. It works in two stages across two linked graphs.
Stage 1 — Interrogate to a knowledge graph. Decomposition begins by asking, not planning:
ask a question → answer it → each answer raises sub-questions → recurse → exhaust a branch → back up.
Research questions are answered by grounding (fetching and probing real docs/APIs); capability
questions ("can this machine do X?") by a tool registry. Unknown-unknowns surface naturally as
answers raise new questions. The result is a durable knowledge.json.
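The ask, answer, recurse, back-up loop described above can be sketched as a depth-first recursion with a question budget. Everything named here is an assumption for illustration: the Question type, the Answerer signature, the budget (loosely modeled on the idea behind --max-questions), and the canned answer table that stands in for real grounding.

```go
package main

import "fmt"

// Question is one node of a hypothetical knowledge graph.
type Question struct {
	Text   string
	Answer string
	Subs   []*Question
}

// Answerer returns an answer plus the sub-questions that answer raises.
// In a real run this would be grounding or a tool registry; here it is a stub.
type Answerer func(q string) (answer string, raised []string)

// interrogate answers q, then recurses depth-first into each raised
// sub-question until the branch is exhausted or the budget runs out.
func interrogate(q string, ask Answerer, budget *int) *Question {
	if *budget <= 0 {
		return nil
	}
	*budget--
	answer, raised := ask(q)
	node := &Question{Text: q, Answer: answer}
	for _, sub := range raised {
		if child := interrogate(sub, ask, budget); child != nil {
			node.Subs = append(node.Subs, child)
		}
	}
	return node // branch exhausted: back up to the caller
}

// count returns the number of answered questions at or below n.
func count(n *Question) int {
	if n == nil {
		return 0
	}
	total := 1
	for _, s := range n.Subs {
		total += count(s)
	}
	return total
}

// canned is one row of the stub answer table.
type canned struct {
	answer string
	raised []string
}

// table is invented sample data, not output from a real run.
var table = map[string]canned{
	"build a weather app": {"needs a forecast source and a UI", []string{"which forecast API?", "which UI toolkit?"}},
	"which forecast API?": {"a public HTTP JSON API", []string{"what does the response look like?"}},
}

// cannedAnswers is a stub Answerer backed by the fixed table above.
func cannedAnswers(q string) (string, []string) {
	if c, ok := table[q]; ok {
		return c.answer, c.raised
	}
	return "answered directly", nil
}

func main() {
	budget := 10
	root := interrogate("build a weather app", cannedAnswers, &budget)
	fmt.Println(count(root), "questions answered,", budget, "left in budget")
}
```

Passing the budget by pointer lets one counter bound the whole tree rather than each branch, which is one plausible way to keep a curious model from exploring forever.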
Stage 2 — Plan from knowledge, then execute. The task DAG is projected out of the completed knowledge graph — not authored in advance. Tasks then execute bottom-up, and the leaves are agentic: they write files, run commands, and call cached tools. The facts gathered in Stage 1 flow down into each task as its grounding context.
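Bottom-up execution over a task DAG amounts to running every task only after the tasks it depends on. The sketch below shows one way to compute such an order with a depth-first post-order walk; the Task type, its Deps and FactIDs fields, and the bottomUp function are hypothetical, and the source does not say which ordering algorithm CASCADE uses.

```go
package main

import "fmt"

// Task is a node in a hypothetical task DAG.
type Task struct {
	ID      string
	Deps    []string // tasks that must finish first
	FactIDs []string // knowledge nodes that ground this task (illustrative)
}

// bottomUp returns an execution order in which every task appears after
// all of its dependencies (depth-first post-order), or an error on a cycle.
func bottomUp(tasks map[string]*Task, root string) ([]string, error) {
	const (
		visiting = 1
		done     = 2
	)
	state := map[string]int{}
	var order []string
	var visit func(id string) error
	visit = func(id string) error {
		t, ok := tasks[id]
		if !ok {
			return fmt.Errorf("unknown task %q", id)
		}
		switch state[id] {
		case done:
			return nil
		case visiting:
			return fmt.Errorf("cycle at %q", id)
		}
		state[id] = visiting
		for _, d := range t.Deps {
			if err := visit(d); err != nil {
				return err
			}
		}
		state[id] = done
		order = append(order, id) // leaves land first, the root last
		return nil
	}
	if err := visit(root); err != nil {
		return nil, err
	}
	return order, nil
}

func main() {
	tasks := map[string]*Task{
		"app":    {ID: "app", Deps: []string{"fetch", "render"}},
		"fetch":  {ID: "fetch", FactIDs: []string{"k-api"}},
		"render": {ID: "render", Deps: []string{"fetch"}, FactIDs: []string{"k-ui"}},
	}
	order, err := bottomUp(tasks, "app")
	fmt.Println(order, err)
}
```

In this example the leaf "fetch" runs first, then "render", then the root "app". The cycle check matters because a graph projected from knowledge, rather than authored by hand, has no guarantee of being acyclic until it is checked.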
Two cross-linked dimensions. There are two graphs — a knowledge dimension (questions and facts) and an execution dimension (tasks) — cross-indexed by unique ID in both directions. A task knows the facts that ground it; a fact knows the tasks it informs. A task's context is its deepest linked knowledge node's path-to-root: the specific fact plus the chain that situates it, and no more.
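The "deepest linked node's path-to-root" rule can be sketched as a short walk up parent links. The Fact type, its field names, and the tie-breaking choice (first linked fact wins) are assumptions made for this example, not CASCADE's data model.

```go
package main

import "fmt"

// Fact is a node in a hypothetical knowledge graph.
type Fact struct {
	ID     string
	Parent string   // empty for the root
	Text   string
	Tasks  []string // back-links: IDs of the tasks this fact informs
}

// pathToRoot walks from a fact up to the root and returns the chain
// root-first, so the most general fact comes first.
func pathToRoot(facts map[string]*Fact, id string) []string {
	var chain []string
	seen := map[string]bool{} // guards against a malformed parent loop
	for id != "" && !seen[id] {
		f, ok := facts[id]
		if !ok {
			break
		}
		seen[id] = true
		chain = append(chain, f.Text)
		id = f.Parent
	}
	for i, j := 0, len(chain)-1; i < j; i, j = i+1, j-1 {
		chain[i], chain[j] = chain[j], chain[i]
	}
	return chain
}

// taskContext picks the deepest of a task's linked facts and returns
// only that fact's chain: the specific fact plus what situates it.
func taskContext(facts map[string]*Fact, linked []string) []string {
	var best []string
	for _, id := range linked {
		if p := pathToRoot(facts, id); len(p) > len(best) {
			best = p
		}
	}
	return best
}

func main() {
	facts := map[string]*Fact{
		"k1": {ID: "k1", Text: "The app shows a forecast"},
		"k2": {ID: "k2", Parent: "k1", Text: "Forecasts come from an HTTP JSON API"},
		"k3": {ID: "k3", Parent: "k2", Text: "The response has a temperature field", Tasks: []string{"t-parse"}},
	}
	for _, line := range taskContext(facts, []string{"k2", "k3"}) {
		fmt.Println(line)
	}
}
```

Sibling branches of the knowledge graph never enter the context, which is what keeps the prompt for a single task small.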
Two principles keep deep decomposition safe:
- Verify before you descend: every split is checked (schema, then meaning) before any child is spawned, which resets per-step reliability at each level instead of letting it decay.
- Verify before you ascend: pieces that don't fit are caught by a separate reviewer as they recombine on the way back up.
And because structure is durable while conversation is disposable, the whole graph lives as JSON
on disk (knowledge.json, tree.json). No model context is load-bearing — a run survives crashes,
can be put down and picked up later, and can be rendered by any external tool. That last property is
exactly what makes the Obsidian round-trip possible.
The engine walks JSON; you never have to edit JSON. The same graph is projected into an
Obsidian vault as linked Markdown notes, because brainstorming, linking, and tagging are native
there.
- Round-trip engines. A deterministic, idempotent render (JSON → Markdown) and an ingest (Markdown → JSON). Your edits merge in: hand-authored content wins on the fields you own, and the engine-computed fields are never clobbered. cascade vault sync keeps both sides live in both directions until you stop it.
- Graph semantics in the vault. [[links]] are hard directional edges (parent / raises / depends / knowledge); #tags are soft topical labels; kind, status, and atomicity live in frontmatter. Reserved tags like #needs-human and #atomic are a comfort layer that the engine normalizes back into structured fields.
- Elevate-when-stuck → a Resolver. When the model dead-ends, blocks, or exhausts its budget, it raises #needs-human in the vault instead of thrashing. A Resolver is whoever resolves it (a person, a larger model, or an external service) behind one common seam. Two calls are explicitly the person's, not the model's: is this atomic enough to just execute? and do we keep decomposing this knowledge, or stop here and start executing?
- Iterative and online. Neither you nor the model solves it in one shot. You can jump in mid-run (add a fact, retag, relink) and the model picks up from the changed graph. Growing the graph live, together, is the point.
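The merge rule and the reserved-tag normalization described in the list above can be sketched together. Which fields count as human-owned versus engine-computed is an assumption here (title, body, and tags versus status), as are the Note type and the merge function; in this sketch a reserved tag only switches its structured field on, and clearing it is left out of scope.

```go
package main

import (
	"fmt"
	"strings"
)

// Note is a hypothetical merged view of one graph node.
type Note struct {
	Title      string   // human-owned (assumed)
	Body       string   // human-owned (assumed)
	Tags       []string // human-owned soft labels
	Status     string   // engine-computed (assumed)
	NeedsHuman bool     // structured field behind #needs-human
	Atomic     bool     // structured field behind #atomic
}

// merge combines the engine's node with the note as edited in the vault.
// Human-owned fields come from the edit; engine-computed fields keep the
// engine's value; reserved tags are folded into structured fields.
func merge(engine, edited Note) Note {
	out := engine
	out.Title = edited.Title
	out.Body = edited.Body
	out.Tags = nil
	for _, t := range edited.Tags {
		switch strings.ToLower(strings.TrimPrefix(t, "#")) {
		case "needs-human":
			out.NeedsHuman = true
		case "atomic":
			out.Atomic = true
		default:
			out.Tags = append(out.Tags, t) // ordinary topical label
		}
	}
	return out
}

func main() {
	engine := Note{Title: "Fetch forecast", Status: "verified"}
	edited := Note{
		Title:  "Fetch the 3-day forecast",
		Status: "pending", // a stray edit to an engine field is ignored
		Tags:   []string{"#weather", "#atomic"},
	}
	fmt.Printf("%+v\n", merge(engine, edited))
}
```

Starting from the engine's copy and overwriting only the owned fields is what makes the merge safe to repeat: running it twice on the same inputs gives the same note.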
- Start from a concept: a Markdown note or a one-line ask becomes the seed.
- Interrogate: the model asks, grounds its answers against real sources, and recurses, growing the knowledge graph.
- Elevate when stuck: #needs-human surfaces in Obsidian; a person answers, links, tags, or says "that's enough, start executing."
- Plan: the task DAG is projected from the knowledge graph.
- Execute: bottom-up, agentic leaves do the work; every result is reviewed by a separate step before it is accepted.
- Learn back (in progress): facts discovered while executing flow back into the knowledge graph, so later tasks and reruns start smarter.
The person is the third gate, alongside verify before you descend and verify before you ascend — for the calls neither automated check can make.
Requirements
- Ollama installed and running, plus a small local model (cascade init pulls one for you).
- Go 1.26+ to build from source. macOS, Windows, or Linux.
- Obsidian is recommended for the human-in-the-loop workflow but optional; the CLI review gates work without it.
Build
git clone https://github.com/robinonsay/cascade
cd cascade
go build ./cmd/cascade
Packaged release binaries are coming. Until then, build from source.
Commands
cascade init # detect Ollama, report installed models, create the cache + config
cascade solve # the nominal run: interrogate → knowledge graph → review → decompose + execute
cascade vault sync --base <dir> --vault <dir> # live two-way sync between the graph and your Obsidian vault
cascade solve takes flags to point it at your own task (--root <file.md>), bound the interrogation (--max-questions <n>), tune the review gate (--review-depth <n>), turn on grounding sources (--project-root <dir>, --web-search), control the live stream (--quiet / --verbose), and auto-approve (--yes). Every flag and why it exists is in the command reference.
The human-in-the-loop workflow is meant to be attended and file-mediated through the vault, not run headless with --yes. --yes is for demos and CI.
New here? Read the User Guide. It walks you from install → first run → steering in Obsidian → reading the result, with a full weather-app tutorial.
A capable agent can hand a big, decomposable task to CASCADE instead of burning its own context on it — CASCADE decomposes and executes on the small local model while the agent spends a handful of tool calls and plays the Resolver (the role a person plays at the vault). Register the stdio MCP server with any MCP client:
{ "mcpServers": { "cascade": { "command": "cascade", "args": ["mcp"] } } }
It exposes three tools: solve, resolve, inspect. See the MCP server reference for the full contract.
Using Claude Code? Drop the bundled using-cascade skill into your project so Claude knows how
to drive a run (grow the knowledge graph, check it for accuracy, give detailed resolutions). From your
own repo:
mkdir -p .claude/skills/using-cascade
curl -fsSL https://raw.githubusercontent.com/robinonsay/cascade/main/.claude/skills/using-cascade/SKILL.md \
  -o .claude/skills/using-cascade/SKILL.md
Then ask Claude to "use cascade to solve …" and it will follow the operator playbook. The skill is a single self-contained file; copy it anywhere .claude/skills/ is read.
- Local & private. Your task, code, and data never leave your machine; the models run on Ollama.
- Cheap. Small local models instead of per-token frontier API calls — cost scales with your hardware, not your token count.
- Reliable on hard work. Verification gates at every level keep quality from decaying as a task is broken down.
- A collaboration, not an autopilot. You supply judgment, decisions, and the unknown-unknowns; the small model does the narrow, well-specified work. You stay in the driver's seat.
- Gets better over time. Tools it builds are cached per-machine and reused, so common work (fetch a URL, parse some JSON, verify a result) isn't rewritten from scratch.
It's for someone who wants real output from local models on tasks too fuzzy for one-shot prompting — and who is willing to steer, brainstorming and correcting in Obsidian, rather than expecting full autonomy.
The engine was built along a phased axis — Phase 0 skeleton (no model) → 1 local execution via Ollama → 2 runnable verification of every result → 3 escalation + parallel execution → 4 templates + tool cache → 5 reliability study. Two later layers cut across those phases and are what the system is today:
- Knowledge-first interrogation — the two-stage rearchitecture: interrogate to a knowledge graph, then plan and execute from it.
- Obsidian-mediated co-development — the human-in-the-loop surface and the JSON⇔Markdown round-trip engines.
Near-future:
- MCP server + larger-model elevation — a second delivery surface, and the non-person Resolver path (hand a stuck node to a bigger model).
- Packaged release binaries + onboarding — cross-platform distribution.
- A co-evolving graph — execution teaches the knowledge graph, and the two are traversed asymmetrically (knowledge breadth-first, tasks depth-first).
The headline demo — building a working weather app from a one-line request — is the milestone the whole system is measured against.
Using CASCADE — the User Guide:
- Installation: dependencies, build, cascade init.
- Quickstart: your first run in five minutes.
- Command reference: every command and flag, and why it exists.
- The workflow: how a run goes, stage by stage.
- Steering in Obsidian: the vault round-trip, giving feedback, elevation.
- Watching & results: the live stream and reading the output.
- Tutorial: build a weather app, end to end.
- Working on a project over time: many tasks, one growing graph.
The design — the specs:
- docs/cascade-spec.md: the architecture (the what).
- docs/cascade-spec-v0.2-additions.md: agentic execution, the tool cache, MCP, and the two-phase control flow.
For contributors: cmd/cascade is the CLI; internal/* holds the components (engine,
knowledge, interrogation, plan, vault, elevation, verifier, assembler, adapter,
store, …); docs/ holds the specs. The on-disk graph is plain JSON, renderable by any external
tool.