A Personal AI Assistant framework for developers. 16 first-class assistant-* skills: structured workflow, clarification, TDD enforcement, debugging, thinking tools, research, security analysis, cross-session memory, documentation generation, codebase onboarding, idea generation, visual diagrams, review automation, skill creation, self-improving reflexion, and purpose-driven context (Telos).
- Structured Workflow — TRIAGE > DISCOVER > PLAN > BUILD & TEST > DOCUMENT with approval gates and two-stage review
- Clarification — Converts ambiguous, fragmented, or multi-intent prompts into an executable brief
- TDD Enforcement — Red-Green-Refactor cycle with strict verification gates at each transition
- Debugging — Evidence-first root-cause workflow: reproduce, hypothesize, isolate, fix, verify
- Thinking Tools — On-demand structured reasoning (first principles, multi-perspective debate, stress testing, etc.)
- Research Tools — Tiered information gathering with URL verification and confidence scoring
- Security Analysis — STRIDE threat modeling, OWASP code review, CVE dependency audit, attack surface mapping
- Memory System — Cross-session learning: user preferences, feedback rules, task insights, project context
- Documentation — Auto-generates API docs, architecture docs, README, changelogs, migration guides, code explanations
- Onboarding — Systematic codebase learning: maps structure, identifies patterns, records project context
- Idea Generation — Diverge-converge-refine brainstorming pipeline with codebase awareness
- Visual Diagrams — Mermaid diagrams from code: architecture, sequence, ER, flow, component, class, state
- Review Automation — Autonomous review/fix/re-review loop with confidence thresholds
- Skill Creation — Scaffolds V1 skills with contracts, phase gates, and handoffs
- Reflexion — Self-improving agent: post-task reflection, lesson recall, strategy profiles, confidence calibration
- Telos — Purpose context framework (Daniel Miessler's Telos Method): problems, mission, goals, strategies, projects — so agents prioritize work that matters
Install all skills for any supported agent:
./install.sh --agent claude # → ~/.claude/skills/assistant-*/
./install.sh --agent codex # → ~/.codex/skills/assistant-*/
./install.sh --agent gemini # → ~/.gemini/skills/assistant-*/
The release inventory is the tracked skills/assistant-* set. skills/unity-* directories are local-only and ignored by git; they are not installed or validated as framework release skills.
Plugin boundaries are contract-backed in docs/plugin-architecture.md. The current installer still uses the root skills/assistant-* release inventory by default, and it also supports focused profile installs:
./install.sh --agent codex --plugin assistant-core
./install.sh --agent codex --plugin assistant-research
./install.sh --agent codex --plugin assistant-dev
The repo also includes scaffolded Codex plugin manifests at plugins/assistant-core/.codex-plugin/plugin.json, plugins/assistant-research/.codex-plugin/plugin.json, and plugins/assistant-dev/.codex-plugin/plugin.json. The core scaffold has plugin-local copies of the four core skills, the research scaffold has plugin-local copies of the three research skills, and the dev scaffold has plugin-local copies of the nine development skills. These plugin-local copies are generated release artifacts from the root skills/assistant-* source of truth; verify or refresh them with tools/plugins/sync-plugin-skills.sh --check and tools/plugins/sync-plugin-skills.sh --apply. The installer performs manifest-aware dry-run validation for the core, research, and dev profiles, but the scaffolds are not marketplace-registered yet; root installs remain the compatibility path.
Install a single skill:
./install.sh --agent claude --skill assistant-thinking
Preview without making changes:
./install.sh --agent claude --dry-run
Each skill auto-triggers independently based on what you're doing.
Hook profiles control how much lifecycle automation is installed. The default is minimal: skill routing plus session/compaction context helpers. This follows the prompt-load reduction plan in docs/instruction-overload-reduction.md. Use strict only when you want the full enforcement stack (workflow-enforcer, guard, consolidated stop review, etc.), or none/--no-hooks for skills only:
./install.sh --agent claude --hook-profile minimal # default, low-friction
./install.sh --agent claude --hook-profile strict # full enforcement hooks
./install.sh --agent claude --hook-profile none # skills/tools only
./install.sh --agent claude --no-hooks # alias for none
Test hooks before installing:
./install.sh --agent claude --test-hooks
Only tracked assistant-* directories are first-class release skills.
Core development pipeline: idea-to-action decomposition, triage, discover, plan, build & test, verify, document.
Triggers on: build, implement, fix, refactor, plan, create, idea
Clarification workflow for ambiguous, fragmented, or multi-intent prompts. Restates the likely goal, surfaces constraints, asks targeted questions, and produces a structured execution brief.
Triggers on: messy prompt, unclear prompt, figure out what I mean, help me structure this
Test-Driven Development enforcement: Red-Green-Refactor cycle with verification gates. Bug fix pattern (reproduce → fix → protect). Integrates with workflow's build loop and review cycle.
Triggers on: TDD, tests first, test-driven, write the test first, red green refactor
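A minimal sketch of one Red-Green cycle, using the password strength validator from the usage examples. The function name `password_strength` and the strength rules (12+ characters, at least one letter and one digit) are assumptions made for illustration, not rules the skill prescribes.

```python
import unittest


def password_strength(password: str) -> str:
    """Green step: the minimum implementation that satisfies the tests below.

    The thresholds here are hypothetical, chosen only for this example.
    """
    has_digit = any(c.isdigit() for c in password)
    has_alpha = any(c.isalpha() for c in password)
    if len(password) >= 12 and has_digit and has_alpha:
        return "strong"
    return "weak"


class PasswordStrengthTest(unittest.TestCase):
    # Red step: these tests are written first and fail until
    # password_strength exists and behaves as specified.
    def test_short_password_is_weak(self):
        self.assertEqual(password_strength("abc123"), "weak")

    def test_long_mixed_password_is_strong(self):
        self.assertEqual(password_strength("correct-horse-42"), "strong")
```

Saved as a test module, this can be run with `python -m unittest`. The refactor step would then tidy the implementation while both tests stay green.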
Evidence-first debugging: reproduce or bound the failure, rank competing hypotheses, isolate root cause, apply the smallest durable fix, and verify with original reproduction plus regression checks.
Triggers on: debug, root cause, investigate failure, flaky test, failing test, production issue
Six structured reasoning tools: clarify, perspectives, stress-test, deep-think, hypothesize, creative.
Triggers on: think about, clarify, perspectives, stress test, brainstorm, debate
Tiered research (quick/standard/extensive/deep), five-lens decision briefing, deep investigation, URL verification.
Triggers on: research, investigate, look into, find out, what is
STRIDE threat model, OWASP code review, CVE dependency audit, attack surface mapping.
Triggers on: security, threat model, audit, vulnerability, OWASP
Autonomous code review loop: review, fix, re-review until clean or the loop reaches its cap. Prioritizes concrete bugs, regressions, risks, and missing tests.
Triggers on: review, fresh review, code review, review this, check the code
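The review/fix/re-review loop described above can be sketched as a bounded loop. This is an illustration only: the `review` and `fix` callables and the default cap of three rounds are hypothetical, since the actual cap and finding format are defined by the skill itself.

```python
from typing import Callable, Dict, List


def review_loop(
    review: Callable[[], List[str]],
    fix: Callable[[List[str]], None],
    max_rounds: int = 3,  # hypothetical cap, not the skill's real value
) -> Dict[str, object]:
    """Review, fix, and re-review until clean or the cap is reached."""
    findings: List[str] = []
    for round_no in range(1, max_rounds + 1):
        findings = review()
        if not findings:
            return {"clean": True, "rounds": round_no, "remaining": []}
        # Only attempt a fix if another review round is left to check it.
        if round_no < max_rounds:
            fix(findings)
    return {"clean": False, "rounds": max_rounds, "remaining": findings}


# Toy example: each fix resolves the first outstanding finding.
open_issues = ["missing null check", "no regression test"]
result = review_loop(
    review=lambda: list(open_issues),
    fix=lambda found: open_issues.remove(found[0]),
)
print(result)
```

Returning the remaining findings when the cap is hit keeps unresolved issues visible instead of silently reporting success.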
Memory management via SQLite-backed knowledge graph (~/.{agent}/memory/memory.db). Records rules, preferences, insights, and project context. Survives skill reinstalls. Legacy graph.jsonl files are imported or used as fallback seed compatibility only.
Triggers on: remember this, save insight, update memory, preferences
Documentation generation and maintenance. Six modes: API docs, architecture overview, README, changelog, migration guide, code explainer. Detects stale docs and offers updates.
Triggers on: document, write docs, update readme, changelog, API docs, architecture doc
Creates or updates V1 skills with input/output contracts, phase gates, and handoff definitions following the framework contract guide.
Triggers on: create skill, new skill, add contracts, skill contracts, scaffold skill
Systematic codebase learning for new projects. Six-phase protocol: surface scan, architecture map, pattern recognition, knowledge gaps, record project context, report.
Triggers on: learn this codebase, onboard, get familiar with, map this project
Structured brainstorming pipeline: understand → diverge (8-15 ideas) → converge (scored ranking) → refine (top candidates) → decide. Codebase-aware ideation scans TODOs, complexity hotspots, and recent momentum.
Triggers on: brainstorm, feature idea, what if, how could we, possibilities
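As a rough illustration of the converge step, the sketch below ranks ideas by a simple impact-per-effort score and keeps the top candidates. The `impact` and `effort` fields and the scoring formula are assumptions for this example; the skill's actual scoring criteria are not specified here.

```python
from typing import Dict, List


def converge(ideas: List[Dict], top_n: int = 3) -> List[Dict]:
    """Rank ideas by a hypothetical impact/effort ratio and keep the best."""
    ranked = sorted(ideas, key=lambda i: i["impact"] / i["effort"], reverse=True)
    return ranked[:top_n]


# Hypothetical output of the diverge step (shortened for the example).
ideas = [
    {"name": "typo-tolerant search", "impact": 8, "effort": 4},
    {"name": "search-as-you-type", "impact": 9, "effort": 3},
    {"name": "saved searches", "impact": 5, "effort": 5},
    {"name": "query analytics", "impact": 6, "effort": 4},
]

for idea in converge(ideas):
    print(idea["name"])
```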
Visual documentation from code analysis. Seven diagram types: architecture, sequence, entity-relationship, flow, component, class, state. All output as Mermaid for markdown embedding.
Triggers on: diagram, draw, visualize, show me the flow, architecture diagram
Self-improving agent loop. Post-task reflection captures what worked and what didn't. Pre-task lesson recall loads relevant lessons from past work. Strategy profiles accumulate per project type. Confidence calibration tracks prediction accuracy.
Triggers on: reflect, what did we learn, lessons, how did that go, calibrate
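One common way to track prediction accuracy is the Brier score: the mean squared gap between a stated confidence and the actual outcome. The sketch below is an assumption about how calibration could be measured, not a description of how the Reflexion skill computes it.

```python
from typing import List, Tuple


def brier_score(predictions: List[Tuple[float, bool]]) -> float:
    """Mean squared error between confidence (0..1) and outcome (True/False).

    0.0 is perfect calibration; always answering 0.5 scores 0.25.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    total = sum((conf - (1.0 if ok else 0.0)) ** 2 for conf, ok in predictions)
    return total / len(predictions)


# Hypothetical history: (stated confidence, did the task succeed?)
history = [(0.9, True), (0.8, True), (0.7, False), (0.6, True)]
print(round(brier_score(history), 3))
```

For the sample history the score is about 0.175. A score that drifts upward over time would suggest confidence estimates are becoming less reliable.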
Purpose context framework based on Daniel Miessler's Telos Method. Guides you through building a purpose chain (problems → mission → goals → challenges → strategies → projects) stored at ~/.claude/telos.md. Loaded at every session start so agents can prioritize work aligned with what actually matters to you.
Triggers on: telos, my purpose, why am I doing this, what matters most, my mission, update telos
The sole persistence layer for cross-session memory. Provides queryable context so the agent can ask targeted questions like "What do I know about the desktop app?" via MCP tools.
15 MCP tools: memory_context, memory_search (FTS5-powered), memory_doctor, memory_add_entity, memory_add_relation, memory_add_insight, memory_remove_entity, memory_remove_relation, memory_graph, memory_reflect, memory_decide, memory_pattern, memory_consolidate, memory_stats, memory_trend
Installed automatically to ~/.{agent}/tools/memory-graph/ by the installer. The installer auto-registers the MCP server in your agent settings when jq is available. If not auto-registered, add manually (replace ~ with your actual home directory — most MCP hosts do not expand tilde):
Claude Code (~/.claude.json):
{
  "mcpServers": {
    "memory-graph": {
      "command": "~/.claude/tools/memory-graph/run-memory-graph.sh",
      "args": ["--memory-dir", "~/.claude/memory"]
    }
  }
}
Codex (~/.codex/config.toml):
[mcp_servers.memory-graph]
command = "~/.codex/tools/memory-graph/run-memory-graph.sh"
args = ["--memory-dir", "~/.codex/memory"]
Gemini (~/.gemini/settings.json):
{
  "mcpServers": {
    "memory-graph": {
      "command": "~/.gemini/tools/memory-graph/run-memory-graph.sh",
      "args": ["--memory-dir", "~/.gemini/memory"]
    }
  }
}
Requires .NET 8+ SDK for the initial build (builds automatically on first run). See tools/memory-graph/DESIGN.md for architecture details.
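Because most MCP hosts do not expand the tilde, the manual entries above need absolute paths. The helper below is a small sketch that builds the JSON entry with the home directory already resolved; `memory_graph_entry` is a hypothetical name, and the output matches the JSON shape shown for Claude Code and Gemini (Codex uses TOML, so the same two paths would be copied into config.toml by hand).

```python
import json
from pathlib import Path


def memory_graph_entry(agent: str, home: Path) -> dict:
    """Build the memory-graph MCP entry with absolute paths (no tilde)."""
    base = home / f".{agent}"
    return {
        "mcpServers": {
            "memory-graph": {
                "command": str(base / "tools" / "memory-graph" / "run-memory-graph.sh"),
                "args": ["--memory-dir", str(base / "memory")],
            }
        }
    }


# Print an entry for the current user; merge it into the settings file by hand.
print(json.dumps(memory_graph_entry("claude", Path.home()), indent=2))
```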
Roslyn-based analyzer that scores method complexity. Used by the workflow skill's quality review stage. See tools/cognitive-complexity/.
Source validator for first-class skill metadata and contract structure:
tools/skills/validate-skills.sh
By default it validates only the release inventory: tracked skills/assistant-*/SKILL.md skills and their contracts/*.yaml files. Local-only skills/unity-* directories are excluded by default.
Target a specific skill by name, directory, or SKILL.md path:
tools/skills/validate-skills.sh --skill assistant-thinking
tools/skills/validate-skills.sh --skill skills/assistant-thinking
tools/skills/validate-skills.sh --skill skills/assistant-thinking/SKILL.md
Use --include-local only when you explicitly want to validate every skills/*/SKILL.md, including local-only skill experiments:
tools/skills/validate-skills.sh --include-local
tools/skills/validate-skills.sh --include-local --list
Provider-neutral per-skill eval fixtures live at skills/<skill>/evals/cases.json
and run locally through tools/evals/run-skill-evals.sh:
tools/evals/run-skill-evals.sh --validate-fixture
tools/evals/run-skill-evals.sh --list
tools/evals/run-skill-evals.sh --emit-prompts /tmp/skill-eval-prompts
tools/evals/run-skill-evals.sh --responses /tmp/skill-eval-responses
The default eval inventory is first-class assistant-* skills with fixtures and
excludes local-only unity-* skills unless --include-local is passed. Coverage
is currently complete: all 16 tracked assistant-* skills have fixtures
(assistant-clarify, assistant-debugging, assistant-diagrams, assistant-docs,
assistant-ideate, assistant-memory, assistant-onboard,
assistant-reflexion, assistant-research, assistant-review,
assistant-security, assistant-skill-creator, assistant-tdd,
assistant-telos, assistant-thinking, and assistant-workflow). Local-only
Unity skills remain opt-in through --include-local. Local grading is heuristic
substring-based checking, useful as a Level 4 conformance proxy but not a
replacement for semantic review. Detailed usage is in docs/evals/README.md.
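To make the limits of substring-based grading concrete, here is a minimal sketch of such a check. The `must_include` and `must_not_include` parameters are hypothetical and do not reflect the actual cases.json schema, which is documented in docs/evals/README.md.

```python
from typing import Dict, List, Sequence


def grade(
    response: str,
    must_include: Sequence[str],
    must_not_include: Sequence[str] = (),
) -> Dict[str, object]:
    """Case-insensitive substring check against a single response."""
    text = response.lower()
    missing: List[str] = [s for s in must_include if s.lower() not in text]
    forbidden: List[str] = [s for s in must_not_include if s.lower() in text]
    return {
        "passed": not missing and not forbidden,
        "missing": missing,
        "forbidden": forbidden,
    }


# A faithful response passes...
print(grade("First I write a failing test.", ["failing test"]))
# ...but so does one that says the opposite, because only substrings are checked.
print(grade("I will skip the failing test step.", ["failing test"]))
```

The second call passing is exactly why this kind of grading works as a conformance proxy but cannot stand in for semantic review.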
install.sh <- Top-level installer (skills + hooks + memory)
version.txt <- Framework version
graph-seed.jsonl <- Default knowledge graph seed data
skills/
  assistant-workflow/
    SKILL.md <- Core pipeline (always loaded when triggered)
    references/ <- Plan templates, checklists, prompt packs
    playbooks/ <- Project-type architecture guides
    scripts/ <- Mega task automation
    agents/ <- Agent presets (claude/codex/gemini.conf)
  assistant-clarify/
    SKILL.md <- Clarification workflow for ambiguous or multi-intent prompts
    evals/cases.json <- Pilot provider-neutral behavior eval fixtures
  assistant-tdd/
    SKILL.md <- TDD enforcement (Red-Green-Refactor cycle)
  assistant-thinking/
    SKILL.md <- Tool descriptions and usage guidance
    clarify.md <- First principles: hard vs soft constraints
    perspectives.md <- Multi-perspective debate (4 roles, 3 rounds)
    stress-test.md <- Steelman + counter-argument
    deep-think.md <- 8 analytical lenses
    hypothesize.md <- Goal-first + hypothesis plurality
    creative.md <- Low-probability sampling
    evals/cases.json <- Pilot provider-neutral behavior eval fixtures
  assistant-research/
    SKILL.md <- Tool descriptions and usage guidance
    research.md <- Tiered: quick / standard / extensive / deep
    five-lens-briefing.md <- STORM-inspired decision briefing: perspective scan / contradictions / synthesis / peer review
    investigate.md <- Deep investigation with ethical framework
    url-verify.md <- URL verification protocol
  assistant-security/
    SKILL.md <- Tool descriptions and severity scale
    threat-model.md <- STRIDE analysis
    code-review.md <- OWASP Top 10 review
    dependency-audit.md <- CVE dependency checking
    attack-surface.md <- Attack surface mapping
    prompts/threat-model.md <- Deep analysis prompt pack
  assistant-review/
    SKILL.md <- Autonomous review/fix/re-review loop
  assistant-memory/
    SKILL.md <- Memory categories, rules, hygiene
    templates/ <- Entry format templates
      insight-template.md
      feedback-template.md
      user-pref-template.md
  assistant-docs/
    SKILL.md <- Mode selection and general protocol
    api-docs.md <- API surface documentation
    architecture.md <- System overview generation
    readme-gen.md <- README generation from code analysis
    changelog.md <- Release notes from git history
    migration.md <- Breaking change migration guides
    explainer.md <- Code explanation for learning
  assistant-skill-creator/
    SKILL.md <- V1 skill scaffolding with contracts and phase gates
  assistant-onboard/
    SKILL.md <- Six-phase onboarding protocol
  assistant-ideate/
    SKILL.md <- Diverge-converge-refine pipeline
  assistant-diagrams/
    SKILL.md <- Diagram type selection and protocol
    arch-diagram.md <- Architecture (component) diagrams
    sequence-diagram.md <- Interaction sequence diagrams
    er-diagram.md <- Entity-relationship diagrams
    flow-diagram.md <- Flowcharts and decision trees
    component-diagram.md <- Module dependency diagrams
    class-diagram.md <- Type hierarchy diagrams
    state-diagram.md <- State machine diagrams
  assistant-reflexion/
    SKILL.md <- Self-improvement loop protocol
  assistant-telos/
    SKILL.md <- Purpose context framework (Telos Method)
  unity-*/ <- Local-only skill experiments ignored by git, not release inventory
hooks/ <- Automated behaviors (Claude + Codex + Gemini)
  scripts/
    learning-signals.sh <- Detect learning signals in user prompts
    session-start.sh <- Inject task journal + memory on start/resume
    pre-compress.sh <- Save state before context compression
    post-compact.sh <- Restore context after compaction
    stop-review.sh <- Enforce self-review before task handoff
    session-end.sh <- Reminder to capture insights
    skill-router.sh <- Data-driven skill routing (UserPromptSubmit)
  claude-settings.json <- Hook config for Claude Code
  codex-settings.json <- Hook config for Codex hooks.json
  gemini-settings.json <- Hook config for Gemini CLI
tools/
  skills/
    validate-skills.sh <- Source validator for first-class skill metadata and contracts
  evals/
    run-skill-evals.sh <- Provider-neutral per-skill eval fixture helper
    run-framework-instruction-evals.sh <- Provider-neutral framework instruction eval helper
  cognitive-complexity/ <- Roslyn-based complexity analyzer
  memory-graph/
    DESIGN.md <- Architecture and data model
    run-memory-graph.sh <- Build-and-run script
    src/MemoryGraph/ <- C# MCP server (stdio, JSON-RPC)
      Graph/ <- In-memory knowledge graph abstractions + legacy JSONL compatibility
      Storage/ <- Authoritative SQLite + FTS5 store (graph memory, reflexions, decisions, strategies)
      Tools/ <- 15 MCP tool implementations
      Server/ <- JSON-RPC message loop
    tests/MemoryGraph.Tests/ <- 65 xUnit tests
tests/
  test-hooks.sh <- Hook integration tests
You: "I want to add caching to our API"
Workflow skill: Decomposes into 6-8 testable criteria, asks for confirmation, then triages
You: "Fix the null reference in UserService.GetById"
Workflow skill: Triages as Small, quick discovery, lightweight plan, fix + test + self-review
You: "Use TDD to add a password strength validator"
TDD skill: Activates Red-Green-Refactor. Writes failing test first, implements minimum to pass, refactors, logs each cycle in task journal.
You: "Think about whether we should use microservices or modular monolith"
Thinking skill: Loads perspectives.md, runs 4-perspective debate
You: "Research the best .NET caching libraries"
Research skill: Runs standard-tier research with URL verification
You: "Audit the auth flow for vulnerabilities"
Security skill: Loads code-review.md, runs OWASP Top 10 analysis
You: "Document the API"
Docs skill: Scans endpoints, extracts parameters/types, generates API reference with examples
You: "Learn this codebase"
Onboard skill: Maps structure, identifies patterns, records project context through the memory graph, reports summary
You: "What are some ideas for improving the search experience?"
Ideate skill: Understands context, generates 10+ ideas, scores them, refines top 3
You: "Draw the architecture diagram"
Diagrams skill: Traces code, maps components and dependencies, outputs Mermaid diagram
[After completing a task]
Reflexion skill: Captures what worked, what didn't, extracts lessons for future tasks
[Before starting next task]
Reflexion: Recalls relevant lessons, adjusts plan based on past experience
You: "telos create"
Telos skill: Walks you through problems → mission → goals → challenges → strategies → projects
You: "Does this task align with my goals?"
Telos skill: Checks active work against your purpose chain
Hooks fire automatically on agent lifecycle events. Installed for Claude Code, Gemini CLI, and Codex. Codex hooks use ~/.codex/hooks.json with the hooks feature enabled. Codex compaction hooks require Codex CLI 0.129.0 or newer; older Codex installs still receive the supported lifecycle and tool-use hooks.
| Hook | Event | What it does |
|---|---|---|
| Session start | Session begins/resumes | Injects task journal + memory feedback into context |
| Skill router | User submits prompt | Pattern-matches prompt against skill triggers; injects reminder to invoke the correct skill |
| Workflow enforcer | User submits prompt | Injects current workflow state plus runtime phase-gate warnings for decomposition, plan, review, document, and metrics gates |
| Learning signals | User submits prompt | Detects corrections, approvals, frustrations, and pivots; logs to signals.jsonl for trend analysis |
| Workflow guard | Before tool use | Warns when direct edits happen during an active build/review workflow and keeps supported tool-use adjustments centralized |
| Pre-compress | Before context compaction | Reminds agent to update task journal before state is lost |
| Post-compact | After compaction completes | Re-injects task journal and feedback rules |
| Stop review | Agent finishes responding during active build/review/document work | Consolidated strict stop gate: approved medium+ plan, structured Spec Review, Quality Review, Final Result, medium+ rubric score, and metrics before task handoff |
| Session end | Session terminates | Logs reminder about uncaptured insights |
These replace manual steps — you no longer need to ask "did you read the task journal?" or "do a fresh review".
The skill router hook prevents the agent from freelancing tasks that skills already handle. It fires on every user prompt, scans all installed skills for triggers: frontmatter, and injects a context reminder when a match is found.
Adding triggers to a skill — add a triggers: block to the SKILL.md frontmatter:
---
name: my-skill
description: "..."
triggers:
  - pattern: "keyword1|keyword2|multi word phrase"
    priority: 80
    reminder: "You MUST invoke the Skill tool with skill='my-skill' BEFORE proceeding."
  - pattern: "another pattern"
    priority: 60
    min_words: 5
    reminder: "Consider invoking my-skill for this request."
---
| Field | Required | Description |
|---|---|---|
| pattern | Yes | Regex pattern matched against the user's prompt (case-insensitive, word-boundary) |
| priority | No | Higher = checked first. Default: 50. Use 80-90 for specific triggers, 30-50 for broad ones |
| reminder | No | Custom text injected into agent context. Default: generic "invoke skill X" message |
| min_words | No | Minimum word count in prompt to trigger. Prevents false positives on short messages |
Priority ordering ensures specific skills match before broad ones (e.g., "use TDD to implement X" matches assistant-tdd at priority 85, not assistant-workflow at priority 30).
No script changes needed when adding new skills — just add the frontmatter and reinstall.
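The matching rules above (priority ordering, case-insensitive word-boundary regex, optional min_words) can be sketched in a few lines. This is an illustrative model, not the logic of skill-router.sh itself; the `Trigger` class, the `route` function, and the fallback reminder text are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trigger:
    skill: str
    pattern: str
    priority: int = 50  # documented default
    min_words: int = 0
    reminder: str = ""


def route(prompt: str, triggers: List[Trigger]) -> Optional[str]:
    """Return the reminder of the highest-priority matching trigger, if any."""
    word_count = len(prompt.split())
    # Higher priority is checked first, so specific skills win over broad ones.
    for trig in sorted(triggers, key=lambda t: t.priority, reverse=True):
        if word_count < trig.min_words:
            continue
        # Case-insensitive match anchored on word boundaries.
        if re.search(rf"\b(?:{trig.pattern})\b", prompt, re.IGNORECASE):
            # Fallback wording is an assumption, standing in for the generic message.
            return trig.reminder or f"Invoke the {trig.skill} skill before proceeding."
    return None


TRIGGERS = [
    Trigger("assistant-workflow", "build|implement|fix|refactor", priority=30),
    Trigger("assistant-tdd", "TDD|tests first|red green refactor", priority=85),
]

print(route("use TDD to implement X", TRIGGERS))
```

With these two triggers, "use TDD to implement X" matches both patterns, and the higher priority of assistant-tdd decides the result.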
- Never guess — Ask when ambiguous, state assumptions when clear
- Right-sized ceremony — Small tasks get lightweight treatment, large tasks get full workflow
- Composable skills — Each first-class assistant-* skill works standalone or together with the others
- Progressive loading — Each SKILL.md is small. Tool files load on demand.
- Thinking tools are tools, not phases — Use them when needed, not on every task
- Memory survives reinstalls — Data in ~/.{agent}/memory/, not in skill directories
- Learning compounds — Insights from past work inform future decisions
- Self-improving — Every task makes the next task better through reflexion
- Covers weaknesses — Documentation, diagrams, and onboarding compensate for developer blind spots