laimis91/assistant-framework

Assistant Framework

A Personal AI Assistant framework for developers. 16 first-class assistant-* skills: structured workflow, clarification, TDD enforcement, debugging, thinking tools, research, security analysis, cross-session memory, documentation generation, codebase onboarding, idea generation, visual diagrams, review automation, skill creation, self-improving reflexion, and purpose-driven context (Telos).

What it does

  1. Structured Workflow — TRIAGE > DISCOVER > PLAN > BUILD & TEST > DOCUMENT with approval gates and two-stage review
  2. Clarification — Converts ambiguous, fragmented, or multi-intent prompts into an executable brief
  3. TDD Enforcement — Red-Green-Refactor cycle with strict verification gates at each transition
  4. Debugging — Evidence-first root-cause workflow: reproduce, hypothesize, isolate, fix, verify
  5. Thinking Tools — On-demand structured reasoning (first principles, multi-perspective debate, stress testing, etc.)
  6. Research Tools — Tiered information gathering with URL verification and confidence scoring
  7. Security Analysis — STRIDE threat modeling, OWASP code review, CVE dependency audit, attack surface mapping
  8. Memory System — Cross-session learning: user preferences, feedback rules, task insights, project context
  9. Documentation — Auto-generates API docs, architecture docs, README, changelogs, migration guides, code explanations
  10. Onboarding — Systematic codebase learning: maps structure, identifies patterns, records project context
  11. Idea Generation — Diverge-converge-refine brainstorming pipeline with codebase awareness
  12. Visual Diagrams — Mermaid diagrams from code: architecture, sequence, ER, flow, component, class, state
  13. Review Automation — Autonomous review/fix/re-review loop with confidence thresholds
  14. Skill Creation — Scaffolds V1 skills with contracts, phase gates, and handoffs
  15. Reflexion — Self-improving agent: post-task reflection, lesson recall, strategy profiles, confidence calibration
  16. Telos — Purpose context framework (Daniel Miessler's Telos Method): problems, mission, goals, strategies, projects — so agents prioritize work that matters

Installation

Install all skills for any supported agent:

./install.sh --agent claude   # → ~/.claude/skills/assistant-*/
./install.sh --agent codex    # → ~/.codex/skills/assistant-*/
./install.sh --agent gemini   # → ~/.gemini/skills/assistant-*/

The release inventory is the tracked skills/assistant-* set. skills/unity-* directories are local-only and ignored by git; they are not installed or validated as framework release skills.

Plugin boundaries are contract-backed in docs/plugin-architecture.md. The current installer still uses the root skills/assistant-* release inventory by default, and it also supports focused profile installs:

./install.sh --agent codex --plugin assistant-core
./install.sh --agent codex --plugin assistant-research
./install.sh --agent codex --plugin assistant-dev

The repo also includes scaffolded Codex plugin manifests at plugins/assistant-core/.codex-plugin/plugin.json, plugins/assistant-research/.codex-plugin/plugin.json, and plugins/assistant-dev/.codex-plugin/plugin.json. The core scaffold has plugin-local copies of the four core skills, the research scaffold has plugin-local copies of the three research skills, and the dev scaffold has plugin-local copies of the nine development skills. These plugin-local copies are generated release artifacts from the root skills/assistant-* source of truth; verify or refresh them with tools/plugins/sync-plugin-skills.sh --check and tools/plugins/sync-plugin-skills.sh --apply. The installer performs manifest-aware dry-run validation for the core, research, and dev profiles, but the scaffolds are not marketplace-registered yet; root installs remain the compatibility path.

Install a single skill:

./install.sh --agent claude --skill assistant-thinking

Preview without making changes:

./install.sh --agent claude --dry-run

Each skill auto-triggers independently based on what you're doing.

Hook profiles control how much lifecycle automation is installed. The default is minimal: skill routing plus session/compaction context helpers. This follows the prompt-load reduction plan in docs/instruction-overload-reduction.md. Use strict only when you want the full enforcement stack (workflow-enforcer, guard, consolidated stop review, etc.), or none/--no-hooks for skills only:

./install.sh --agent claude --hook-profile minimal  # default, low-friction
./install.sh --agent claude --hook-profile strict   # full enforcement hooks
./install.sh --agent claude --hook-profile none     # skills/tools only
./install.sh --agent claude --no-hooks              # alias for none

Test hooks before installing:

./install.sh --agent claude --test-hooks

Skills

Only tracked assistant-* directories are first-class release skills.

assistant-workflow

Core development pipeline: idea-to-action decomposition, triage, discover, plan, build & test, verify, document.

Triggers on: build, implement, fix, refactor, plan, create, idea

assistant-clarify

Clarification workflow for ambiguous, fragmented, or multi-intent prompts. Restates the likely goal, surfaces constraints, asks targeted questions, and produces a structured execution brief.

Triggers on: messy prompt, unclear prompt, figure out what I mean, help me structure this

assistant-tdd

Test-Driven Development enforcement: Red-Green-Refactor cycle with verification gates. Bug fix pattern (reproduce → fix → protect). Integrates with workflow's build loop and review cycle.

Triggers on: TDD, tests first, test-driven, write the test first, red green refactor

assistant-debugging

Evidence-first debugging: reproduce or bound the failure, rank competing hypotheses, isolate root cause, apply the smallest durable fix, and verify with original reproduction plus regression checks.

Triggers on: debug, root cause, investigate failure, flaky test, failing test, production issue

assistant-thinking

Six structured reasoning tools: clarify, perspectives, stress-test, deep-think, hypothesize, creative.

Triggers on: think about, clarify, perspectives, stress test, brainstorm, debate

assistant-research

Tiered research (quick/standard/extensive/deep), five-lens decision briefing, deep investigation, URL verification.

Triggers on: research, investigate, look into, find out, what is

assistant-security

STRIDE threat model, OWASP code review, CVE dependency audit, attack surface mapping.

Triggers on: security, threat model, audit, vulnerability, OWASP

assistant-review

Autonomous code review loop: review, fix, re-review until clean or the loop reaches its cap. Prioritizes concrete bugs, regressions, risks, and missing tests.

Triggers on: review, fresh review, code review, review this, check the code

assistant-memory

Memory management via a SQLite-backed knowledge graph (~/.{agent}/memory/memory.db). Records rules, preferences, insights, and project context. Survives skill reinstalls. Legacy graph.jsonl files are supported only for import or as a fallback seed.

Triggers on: remember this, save insight, update memory, preferences

assistant-docs

Documentation generation and maintenance. Six modes: API docs, architecture overview, README, changelog, migration guide, code explainer. Detects stale docs and offers updates.

Triggers on: document, write docs, update readme, changelog, API docs, architecture doc

assistant-skill-creator

Creates or updates V1 skills with input/output contracts, phase gates, and handoff definitions following the framework contract guide.

Triggers on: create skill, new skill, add contracts, skill contracts, scaffold skill

assistant-onboard

Systematic codebase learning for new projects. Six-phase protocol: surface scan, architecture map, pattern recognition, knowledge gaps, record project context, report.

Triggers on: learn this codebase, onboard, get familiar with, map this project

assistant-ideate

Structured brainstorming pipeline: understand → diverge (8-15 ideas) → converge (scored ranking) → refine (top candidates) → decide. Codebase-aware ideation scans TODOs, complexity hotspots, and recent momentum.

Triggers on: brainstorm, feature idea, what if, how could we, possibilities

assistant-diagrams

Visual documentation from code analysis. Seven diagram types: architecture, sequence, entity-relationship, flow, component, class, state. All output as Mermaid for markdown embedding.

Triggers on: diagram, draw, visualize, show me the flow, architecture diagram

assistant-reflexion

Self-improving agent loop. Post-task reflection captures what worked and what didn't. Pre-task lesson recall loads relevant lessons from past work. Strategy profiles accumulate per project type. Confidence calibration tracks prediction accuracy.

Triggers on: reflect, what did we learn, lessons, how did that go, calibrate

assistant-telos

Purpose context framework based on Daniel Miessler's Telos Method. Guides you through building a purpose chain (problems → mission → goals → challenges → strategies → projects) stored at ~/.claude/telos.md. Loaded at every session start so agents can prioritize work aligned with what actually matters to you.

Triggers on: telos, my purpose, why am I doing this, what matters most, my mission, update telos

Tools

Memory Graph (MCP Server)

The sole persistence layer for cross-session memory. Provides queryable context so the agent can ask targeted questions like "What do I know about the desktop app?" via MCP tools.

15 MCP tools: memory_context, memory_search (FTS5-powered), memory_doctor, memory_add_entity, memory_add_relation, memory_add_insight, memory_remove_entity, memory_remove_relation, memory_graph, memory_reflect, memory_decide, memory_pattern, memory_consolidate, memory_stats, memory_trend

Installed automatically to ~/.{agent}/tools/memory-graph/ by the installer. The installer auto-registers the MCP server in your agent settings when jq is available. If it is not auto-registered, add the entry manually. In the examples below, replace ~ with the absolute path of your home directory, because most MCP hosts do not expand the tilde:

Claude Code (~/.claude.json):

{
  "mcpServers": {
    "memory-graph": {
      "command": "~/.claude/tools/memory-graph/run-memory-graph.sh",
      "args": ["--memory-dir", "~/.claude/memory"]
    }
  }
}

Codex (~/.codex/config.toml):

[mcp_servers.memory-graph]
command = "~/.codex/tools/memory-graph/run-memory-graph.sh"
args = ["--memory-dir", "~/.codex/memory"]

Gemini (~/.gemini/settings.json):

{
  "mcpServers": {
    "memory-graph": {
      "command": "~/.gemini/tools/memory-graph/run-memory-graph.sh",
      "args": ["--memory-dir", "~/.gemini/memory"]
    }
  }
}
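
Because the manual entries above need an absolute path in place of ~, one way to avoid typing it by hand is to let the shell expand $HOME when the snippet is generated. The following is a minimal sketch, not part of install.sh: the function name render_memory_graph_entry is hypothetical, and it only covers the JSON shape used for Claude Code and Gemini (the Codex TOML entry would need the same substitution).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the framework): print an mcpServers
# entry with the home directory already expanded, since a literal "~" is
# usually passed through unexpanded by MCP hosts.
# Assumes the home path contains no characters that need JSON escaping.
render_memory_graph_entry() {
  agent="$1"               # claude or gemini
  home_dir="${2:-$HOME}"   # optional override, defaults to $HOME
  cat <<EOF
{
  "mcpServers": {
    "memory-graph": {
      "command": "$home_dir/.$agent/tools/memory-graph/run-memory-graph.sh",
      "args": ["--memory-dir", "$home_dir/.$agent/memory"]
    }
  }
}
EOF
}

# Print the entry for Claude Code using the current user's home directory.
render_memory_graph_entry claude
```

The output can be merged into the settings file by hand; the sketch deliberately does not edit any file itself.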

Requires .NET 8+ SDK for the initial build (builds automatically on first run). See tools/memory-graph/DESIGN.md for architecture details.

Cognitive Complexity

Roslyn-based analyzer that scores method complexity. Used by the workflow skill's quality review stage. See tools/cognitive-complexity/.

Skill Validator

Source validator for first-class skill metadata and contract structure:

tools/skills/validate-skills.sh

By default it validates only the release inventory: tracked skills/assistant-*/SKILL.md skills and their contracts/*.yaml files. Local-only skills/unity-* directories are excluded by default.

Target a specific skill by name, directory, or SKILL.md path:

tools/skills/validate-skills.sh --skill assistant-thinking
tools/skills/validate-skills.sh --skill skills/assistant-thinking
tools/skills/validate-skills.sh --skill skills/assistant-thinking/SKILL.md

Use --include-local only when you explicitly want to validate every skills/*/SKILL.md, including local-only skill experiments:

tools/skills/validate-skills.sh --include-local
tools/skills/validate-skills.sh --include-local --list

Skill Evals

Provider-neutral per-skill eval fixtures live at skills/<skill>/evals/cases.json and run locally through tools/evals/run-skill-evals.sh:

tools/evals/run-skill-evals.sh --validate-fixture
tools/evals/run-skill-evals.sh --list
tools/evals/run-skill-evals.sh --emit-prompts /tmp/skill-eval-prompts
tools/evals/run-skill-evals.sh --responses /tmp/skill-eval-responses

The default eval inventory is the first-class assistant-* skills that have fixtures; local-only unity-* skills are excluded unless --include-local is passed. Fixture coverage is complete for all 16 tracked assistant skills: assistant-clarify, assistant-debugging, assistant-diagrams, assistant-docs, assistant-ideate, assistant-memory, assistant-onboard, assistant-reflexion, assistant-research, assistant-review, assistant-security, assistant-skill-creator, assistant-tdd, assistant-telos, assistant-thinking, and assistant-workflow. Local grading is a heuristic substring check: useful as a Level 4 conformance proxy, but not a replacement for semantic review. Detailed usage is in docs/evals/README.md.
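
To make the limits of substring-based grading concrete, here is a minimal sketch of such a check. It is an assumption about the general technique, not the logic of run-skill-evals.sh: the function name grade_response, the PASS/FAIL output format, and the sample response are all made up for illustration, and the real fixture schema in cases.json is not shown here.

```shell
#!/bin/sh
# grade_response FILE SUBSTRING...
# Hypothetical grader: the case passes only if every expected substring
# occurs in the response file (fixed-string, case-insensitive match).
grade_response() {
  response_file="$1"
  shift
  failures=0
  for expected in "$@"; do
    if grep -Fiq -e "$expected" "$response_file"; then
      printf 'PASS  %s\n' "$expected"
    else
      printf 'FAIL  %s\n' "$expected"
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}

# Demo with an invented response that negates the expected behavior.
response=$(mktemp)
printf '%s\n' "I will NOT write a failing test first." > "$response"
grade_response "$response" "failing test first" && echo "case passed"
rm -f "$response"
```

The demo case passes even though the response says the opposite of what the skill should do, which is why a check of this kind works as a conformance proxy and still needs semantic review on top.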

Structure

install.sh                         <- Top-level installer (skills + hooks + memory)
version.txt                        <- Framework version
graph-seed.jsonl                   <- Default knowledge graph seed data

skills/
  assistant-workflow/
    SKILL.md                       <- Core pipeline (always loaded when triggered)
    references/                    <- Plan templates, checklists, prompt packs
    playbooks/                     <- Project-type architecture guides
    scripts/                       <- Mega task automation
    agents/                        <- Agent presets (claude/codex/gemini.conf)

  assistant-clarify/
    SKILL.md                       <- Clarification workflow for ambiguous or multi-intent prompts
    evals/cases.json               <- Pilot provider-neutral behavior eval fixtures

  assistant-tdd/
    SKILL.md                       <- TDD enforcement (Red-Green-Refactor cycle)

  assistant-thinking/
    SKILL.md                       <- Tool descriptions and usage guidance
    clarify.md                     <- First principles: hard vs soft constraints
    perspectives.md                <- Multi-perspective debate (4 roles, 3 rounds)
    stress-test.md                 <- Steelman + counter-argument
    deep-think.md                  <- 8 analytical lenses
    hypothesize.md                 <- Goal-first + hypothesis plurality
    creative.md                    <- Low-probability sampling
    evals/cases.json               <- Pilot provider-neutral behavior eval fixtures

  assistant-research/
    SKILL.md                       <- Tool descriptions and usage guidance
    research.md                    <- Tiered: quick / standard / extensive / deep
    five-lens-briefing.md          <- STORM-inspired decision briefing: perspective scan / contradictions / synthesis / peer review
    investigate.md                 <- Deep investigation with ethical framework
    url-verify.md                  <- URL verification protocol

  assistant-security/
    SKILL.md                       <- Tool descriptions and severity scale
    threat-model.md                <- STRIDE analysis
    code-review.md                 <- OWASP Top 10 review
    dependency-audit.md            <- CVE dependency checking
    attack-surface.md              <- Attack surface mapping
    prompts/threat-model.md        <- Deep analysis prompt pack

  assistant-review/
    SKILL.md                       <- Autonomous review/fix/re-review loop

  assistant-memory/
    SKILL.md                       <- Memory categories, rules, hygiene
    templates/                     <- Entry format templates
      insight-template.md
      feedback-template.md
      user-pref-template.md

  assistant-docs/
    SKILL.md                       <- Mode selection and general protocol
    api-docs.md                    <- API surface documentation
    architecture.md                <- System overview generation
    readme-gen.md                  <- README generation from code analysis
    changelog.md                   <- Release notes from git history
    migration.md                   <- Breaking change migration guides
    explainer.md                   <- Code explanation for learning

  assistant-skill-creator/
    SKILL.md                       <- V1 skill scaffolding with contracts and phase gates

  assistant-onboard/
    SKILL.md                       <- Six-phase onboarding protocol

  assistant-ideate/
    SKILL.md                       <- Diverge-converge-refine pipeline

  assistant-diagrams/
    SKILL.md                       <- Diagram type selection and protocol
    arch-diagram.md                <- Architecture (component) diagrams
    sequence-diagram.md            <- Interaction sequence diagrams
    er-diagram.md                  <- Entity-relationship diagrams
    flow-diagram.md                <- Flowcharts and decision trees
    component-diagram.md           <- Module dependency diagrams
    class-diagram.md               <- Type hierarchy diagrams
    state-diagram.md               <- State machine diagrams

  assistant-reflexion/
    SKILL.md                       <- Self-improvement loop protocol

  assistant-telos/
    SKILL.md                       <- Purpose context framework (Telos Method)

  unity-*/                         <- Local-only skill experiments ignored by git, not release inventory

hooks/                             <- Automated behaviors (Claude + Codex + Gemini)
  scripts/
    learning-signals.sh            <- Detect learning signals in user prompts
    session-start.sh               <- Inject task journal + memory on start/resume
    pre-compress.sh                <- Save state before context compression
    post-compact.sh                <- Restore context after compaction
    stop-review.sh                 <- Enforce self-review before task handoff
    session-end.sh                 <- Reminder to capture insights
    skill-router.sh                <- Data-driven skill routing (UserPromptSubmit)
  claude-settings.json             <- Hook config for Claude Code
  codex-settings.json              <- Hook config for Codex hooks.json
  gemini-settings.json             <- Hook config for Gemini CLI

tools/
  skills/
    validate-skills.sh             <- Source validator for first-class skill metadata and contracts
  evals/
    run-skill-evals.sh             <- Provider-neutral per-skill eval fixture helper
    run-framework-instruction-evals.sh <- Provider-neutral framework instruction eval helper
  cognitive-complexity/            <- Roslyn-based complexity analyzer
  memory-graph/
    DESIGN.md                      <- Architecture and data model
    run-memory-graph.sh            <- Build-and-run script
    src/MemoryGraph/               <- C# MCP server (stdio, JSON-RPC)
      Graph/                       <- In-memory knowledge graph abstractions + legacy JSONL compatibility
      Storage/                     <- Authoritative SQLite + FTS5 store (graph memory, reflexions, decisions, strategies)
      Tools/                       <- 15 MCP tool implementations
      Server/                      <- JSON-RPC message loop
    tests/MemoryGraph.Tests/       <- 65 xUnit tests

tests/
  test-hooks.sh                    <- Hook integration tests

How it works

For ideas (vague)

You: "I want to add caching to our API"
Workflow skill: Decomposes into 6-8 testable criteria, asks for confirmation, then triages

For tasks (concrete)

You: "Fix the null reference in UserService.GetById"
Workflow skill: Triages as Small, quick discovery, lightweight plan, fix + test + self-review

For TDD

You: "Use TDD to add a password strength validator"
TDD skill: Activates Red-Green-Refactor. Writes failing test first, implements minimum to pass, refactors, logs each cycle in task journal.

For thinking

You: "Think about whether we should use microservices or modular monolith"
Thinking skill: Loads perspectives.md, runs 4-perspective debate

For research

You: "Research the best .NET caching libraries"
Research skill: Runs standard-tier research with URL verification

For security

You: "Audit the auth flow for vulnerabilities"
Security skill: Loads code-review.md, runs OWASP Top 10 analysis

For documentation

You: "Document the API"
Docs skill: Scans endpoints, extracts parameters/types, generates API reference with examples

For new projects

You: "Learn this codebase"
Onboard skill: Maps structure, identifies patterns, records project context through the memory graph, reports summary

For brainstorming

You: "What are some ideas for improving the search experience?"
Ideate skill: Understands context, generates 10+ ideas, scores them, refines top 3

For diagrams

You: "Draw the architecture diagram"
Diagrams skill: Traces code, maps components and dependencies, outputs Mermaid diagram

For self-improvement

[After completing a task]
Reflexion skill: Captures what worked, what didn't, extracts lessons for future tasks
[Before starting next task]
Reflexion: Recalls relevant lessons, adjusts plan based on past experience

For purpose alignment

You: "telos create"
Telos skill: Walks you through problems → mission → goals → challenges → strategies → projects
You: "Does this task align with my goals?"
Telos skill: Checks active work against your purpose chain

Hooks (automated behaviors)

Hooks fire automatically on agent lifecycle events. Installed for Claude Code, Gemini CLI, and Codex. Codex hooks use ~/.codex/hooks.json with the hooks feature enabled. Codex compaction hooks require Codex CLI 0.129.0 or newer; older Codex installs still receive the supported lifecycle and tool-use hooks.

| Hook | Event | What it does |
| --- | --- | --- |
| Session start | Session begins/resumes | Injects task journal + memory feedback into context |
| Skill router | User submits prompt | Pattern-matches prompt against skill triggers; injects reminder to invoke the correct skill |
| Workflow enforcer | User submits prompt | Injects current workflow state plus runtime phase-gate warnings for decomposition, plan, review, document, and metrics gates |
| Learning signals | User submits prompt | Detects corrections, approvals, frustrations, and pivots; logs to signals.jsonl for trend analysis |
| Workflow guard | Before tool use | Warns when direct edits happen during an active build/review workflow and keeps supported tool-use adjustments centralized |
| Pre-compress | Before context compaction | Reminds agent to update task journal before state is lost |
| Post-compact | After compaction completes | Re-injects task journal and feedback rules |
| Stop review | Agent finishes responding during active build/review/document work | Consolidated strict stop gate: approved medium+ plan, structured Spec Review, Quality Review, Final Result, medium+ rubric score, and metrics before task handoff |
| Session end | Session terminates | Logs reminder about uncaptured insights |

These replace manual steps — you no longer need to ask "did you read the task journal?" or "do a fresh review".
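
The Codex CLI floor mentioned above (0.129.0 or newer for compaction hooks) has to be compared field by field as numbers; a plain string comparison would rank 0.9.0 above 0.129.0. The following is a minimal sketch of such a check in POSIX sh, assuming three-part numeric versions. The function name version_ge and the hard-coded installed value are illustrative; how install.sh actually detects the Codex version is not stated in this document.

```shell
#!/bin/sh
# version_ge INSTALLED REQUIRED
# Succeeds when INSTALLED >= REQUIRED. Assumes both are MAJOR.MINOR.PATCH
# with purely numeric fields (no pre-release suffixes).
version_ge() {
  old_ifs=$IFS
  IFS=.
  # Split both versions on dots: $1 $2 $3 = installed, $4 $5 $6 = required.
  set -- $1 $2
  IFS=$old_ifs
  [ "$1" -gt "$4" ] && return 0
  [ "$1" -lt "$4" ] && return 1
  [ "$2" -gt "$5" ] && return 0
  [ "$2" -lt "$5" ] && return 1
  [ "$3" -ge "$6" ]
}

# Placeholder value; a real check would read it from the installed CLI.
installed="0.128.0"
if version_ge "$installed" "0.129.0"; then
  echo "compaction hooks supported"
else
  echo "compaction hooks skipped; other lifecycle hooks still apply"
fi
```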

Skill routing

The skill router hook prevents the agent from freelancing tasks that skills already handle. It fires on every user prompt, scans all installed skills for triggers: frontmatter, and injects a context reminder when a match is found.

Adding triggers to a skill — add a triggers: block to the SKILL.md frontmatter:

---
name: my-skill
description: "..."
triggers:
  - pattern: "keyword1|keyword2|multi word phrase"
    priority: 80
    reminder: "You MUST invoke the Skill tool with skill='my-skill' BEFORE proceeding."
  - pattern: "another pattern"
    priority: 60
    min_words: 5
    reminder: "Consider invoking my-skill for this request."
---

| Field | Required | Description |
| --- | --- | --- |
| pattern | Yes | Regex pattern matched against the user's prompt (case-insensitive, word-boundary) |
| priority | No | Higher = checked first. Default: 50. Use 80-90 for specific triggers, 30-50 for broad ones |
| reminder | No | Custom text injected into agent context. Default: generic "invoke skill X" message |
| min_words | No | Minimum word count in prompt to trigger. Prevents false positives on short messages |

Priority ordering ensures specific skills match before broad ones (e.g., "use TDD to implement X" matches assistant-tdd at priority 85, not assistant-workflow at priority 30).

No script changes needed when adding new skills — just add the frontmatter and reinstall.
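
The matching rules described above (case-insensitive, whole-word regex, optional min_words, higher priority checked first) can be sketched in a few lines of POSIX sh. This is an illustration under stated assumptions, not the contents of skill-router.sh: the trigger table is hard-coded instead of being read from SKILL.md frontmatter, the patterns are invented, only the priorities 85 and 30 come from the example above, and "first match wins" is an assumption.

```shell
#!/bin/sh
TAB=$(printf '\t')

# Illustrative trigger table: priority<TAB>min_words<TAB>skill<TAB>pattern.
triggers() {
  printf '30\t0\tassistant-workflow\tbuild|implement|fix|refactor\n'
  printf '85\t0\tassistant-tdd\ttdd|tests first|red green refactor\n'
}

# Print the skill of the highest-priority trigger that matches the prompt.
# Prints nothing when no trigger matches.
route_prompt() {
  prompt="$1"
  # Arithmetic expansion strips any padding that wc adds on some systems.
  word_count=$(( $(printf '%s\n' "$prompt" | wc -w) ))
  triggers | sort -t "$TAB" -k1,1nr |
    while IFS="$TAB" read -r priority min_words skill pattern; do
      [ "$word_count" -ge "$min_words" ] || continue
      # -E extended regex, -i case-insensitive, -w whole-word, -q quiet
      if printf '%s\n' "$prompt" | grep -Eiqw -e "$pattern"; then
        printf '%s\n' "$skill"
        break
      fi
    done
}

route_prompt "use TDD to implement X"
```

In this sketch the prompt matches both tables rows ("TDD" and "implement"), and sorting by priority before matching is what makes the more specific skill win. The whole-word flag keeps a word such as "fixture" from triggering on "fix".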

Design principles

  • Never guess — Ask when ambiguous, state assumptions when clear
  • Right-sized ceremony — Small tasks get lightweight treatment, large tasks get full workflow
  • Composable skills — Each first-class assistant-* skill works standalone or together with the others
  • Progressive loading — Each SKILL.md is small. Tool files load on demand.
  • Thinking tools are tools, not phases — Use them when needed, not on every task
  • Memory survives reinstalls — Data in ~/.{agent}/memory/, not in skill directories
  • Learning compounds — Insights from past work inform future decisions
  • Self-improving — Every task makes the next task better through reflexion
  • Covers weaknesses — Documentation, diagrams, and onboarding compensate for developer blind spots
