From 58c43b92adef29d65c15bcb81773808d28d76958 Mon Sep 17 00:00:00 2001 From: NOVA Date: Tue, 21 Apr 2026 17:15:10 +0000 Subject: [PATCH] docs: comprehensive documentation for all scripts and subprojects --- ARCHITECTURE.md | 187 ++++++++++++++++++++++++++++++++++ README.md | 121 ++++++++++++++++++++-- docs/agent-install.md | 46 +++++++++ docs/git-security.md | 124 +++++++++++++++++++++++ docs/memory-pipeline.md | 215 ++++++++++++++++++++++++++++++++++++++++ 5 files changed, 682 insertions(+), 11 deletions(-) create mode 100644 ARCHITECTURE.md create mode 100644 docs/agent-install.md create mode 100644 docs/git-security.md create mode 100644 docs/memory-pipeline.md diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md new file mode 100644 index 0000000..2059eed --- /dev/null +++ b/ARCHITECTURE.md @@ -0,0 +1,187 @@ +# Architecture Overview + +This document describes how the different scripts and subprojects in `nova-scripts` interconnect. + +## Memory / Embeddings Pipeline + +The memory pipeline is a multi‑stage system that extracts structured knowledge from chat messages, embeds it for semantic search, and enables proactive recall. + +### Flow + +``` +┌─────────────────────┐ +│ Incoming Chat │ +│ Message │ +└──────────┬──────────┘ + │ + ▼ +┌─────────────────────┐ +│ extract-memories.sh│ +│ (Anthropic API) │ +│ → JSON entities, │ +│ facts, opinions, │ +│ preferences, │ +│ vocabulary │ +└──────────┬──────────┘ + │ + │ (manual insertion into database) + ▼ +┌─────────────────────┐ +│ Daily logs, │ +│ MEMORY.md, │ +│ lessons, events, │ +│ SOPs │ +└──────────┬──────────┘ + │ + ▼ +┌─────────────────────┐ +│ embed-memories.py │ +│ (OpenAI embeddings)│ +│ → memory_embeddings│ +│ table (pgvector) │ +└──────────┬──────────┘ + │ + ▼ +┌─────────────────────┐ +│ Semantic Search │ +│ (proactive-recall, │ +│ semantic-search) │ +│ → similarity match │ +└─────────────────────┘ +``` + +### Components + +1. 
**Extraction** (`extract-memories.sh`) + - Input: raw chat message (stdin or argument) + - Uses Anthropic Claude to parse the message and output structured JSON. + - Categories: entities, facts, opinions, preferences, vocabulary, events. + - Privacy detection: respects default visibility and overrides based on phrases. + +2. **Embedding** (`embed-memories.py`) + - Reads multiple memory sources: + - Daily log files (`~/clawd/memory/*.md`) + - Central `MEMORY.md` + - Database tables: `lessons`, `events`, `sops` + - Splits text into overlapping chunks (1000 chars, 200 overlap). + - Calls OpenAI `text-embedding-3-small` to get vector embeddings. + - Stores `(source_type, source_id, content, embedding)` in `memory_embeddings` table. + - Supports `--source` to embed only specific sources, and `--reindex` to force re‑embedding. + +3. **Cron Jobs** + - `embed-memories-cron.sh`: daily embedding of all sources (logs to `~/clawd/logs/embed-memories.log`). + - `decay-confidence.sh`: nightly decay of `lessons.confidence` for lessons not referenced in 30+ days (multiplies by 0.95, floor 0.1). + +4. **Recall & Search** + - `proactive-recall.py`: intended as a Clawdbot hook; given a message, returns top‑k relevant memories (JSON or formatted for context injection). + - `semantic-search.py`: command‑line semantic search with similarity threshold. + +5. **Benchmarking** + - `recall-benchmark.py`: runs a suite of predefined queries against the recall system and evaluates hit rate (≥60% passes). Used for self‑diagnostic. 
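The chunking step of the Embedding component can be pictured with a short Python sketch. It assumes plain character-based windows of 1000 characters that advance by 800, so neighbouring chunks share 200 characters. `chunk_text` and its parameter names are illustrative and not taken from `embed-memories.py`, which may split text differently (for example on paragraph boundaries).

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size character chunks that overlap their neighbours."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With these defaults, a 2,500-character log yields three chunks of 1000, 1000 and 900 characters, and the last 200 characters of each chunk reappear at the start of the next one.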
+ +### Database Schema (Partial) + +The pipeline assumes the following PostgreSQL tables (exact schema may evolve): + +```sql +-- memory_embeddings (pgvector extension required) +CREATE TABLE memory_embeddings ( + id SERIAL PRIMARY KEY, + source_type TEXT NOT NULL, -- 'daily_log', 'memory_md', 'lesson', 'event', 'sop' + source_id TEXT NOT NULL, -- e.g., '2026-04-21.md', 'MEMORY.md:chunk0' + content TEXT NOT NULL, + embedding vector(1536), -- OpenAI text-embedding-3-small dimension + created_at TIMESTAMP DEFAULT NOW() +); + +-- lessons (confidence decay target) +CREATE TABLE lessons ( + id SERIAL PRIMARY KEY, + lesson TEXT NOT NULL, + context TEXT, + confidence FLOAT DEFAULT 1.0, + last_referenced TIMESTAMP, + created_at TIMESTAMP DEFAULT NOW() +); + +-- events, sops, etc. (referenced by embed-memories.py) +``` + +### Environment Variables + +- `OPENAI_API_KEY` – for embedding and recall scripts. +- `ANTHROPIC_API_KEY` – for extraction script. +- Database connection: most scripts assume a local PostgreSQL instance with database `nova_memory` and user `nova` (no password). Override via `psql` environment variables (`PGHOST`, `PGUSER`, etc.) or modify scripts. + +## Git Security Hooks + +A lightweight pre‑commit hook that prevents accidental commits of secrets. + +### How It Works + +1. `install-hooks.sh` copies `pre-commit-template` to `.git/hooks/pre-commit` and makes it executable. +2. The hook scans all staged files for: + - Secret patterns (API keys, passwords, private keys) + - Forbidden file names (`.env`, `*.pem`, `credentials.json`, etc.) +3. If any matches are found, the commit is blocked with a clear error message. + +### Patterns Detected + +- Anthropic API keys (`sk-ant-api…`) +- OpenAI API keys (`sk-…`) +- AWS access/secret keys +- Private key headers (`-----BEGIN … PRIVATE KEY-----`) +- GitHub tokens (`ghp_`, `gho_`, etc.) +- Generic `secret: "…"`, `password: "…"`, `api_key: "…"` patterns. 
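A rough Python equivalent of the content scan is sketched below. The regular expressions are copied from the pattern table in `docs/git-security.md`; the `scan_text` helper is hypothetical, and the actual hook is a Bash script that runs `grep` over staged files, so edge-case behaviour may differ.

```python
import re

# Regexes taken from the pattern table in docs/git-security.md. The real hook
# uses grep, so matching details may differ slightly from Python's `re`.
PATTERNS = {
    "Anthropic API key": r"sk-ant-api[0-9a-zA-Z_-]+",
    "OpenAI API key": r"sk-[a-zA-Z0-9]{20,}",
    "AWS Access Key": r"AKIA[A-Z0-9]{16}",
    "Private Key header": r"-----BEGIN[A-Z ]*PRIVATE KEY-----",
    "GitHub Token": r"gh[pousr]_[A-Za-z0-9_]{36,}",
}


def scan_text(text):
    """Return (line_number, pattern_name) for every line that matches a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, regex in PATTERNS.items():
            if re.search(regex, line):
                findings.append((lineno, name))
    return findings
```

An empty result would correspond to the hook letting the commit through; any finding would correspond to a blocked commit.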
+ +### Integration + +The hook is repository‑specific; run `install-hooks.sh` for each repo you want to protect. It also adds common secret‑file patterns to the repo's `.gitignore`. + +## Agent Chat Channel + +A Clawdbot plugin that enables real‑time messaging between agents via PostgreSQL `LISTEN/NOTIFY`. + +### Architecture + +``` +┌─────────────┐ INSERT ┌──────────────┐ NOTIFY ┌─────────────────┐ +│ Sender │ ────────▶│ agent_chat │ ────────▶│ Clawdbot │ +│ (SQL, app) │ │ table │ │ Plugin │ +└─────────────┘ └──────────────┘ └────────┬────────┘ + │ LISTEN + ▼ + ┌──────────────┐ + │ Agent │ + │ (Newhart) │ + └──────────────┘ +``` + +1. **Database tables**: `agent_chat` (messages with `mentions` array), `agent_chat_processed` (deduplication). +2. **Trigger**: `notify_agent_chat()` fires `pg_notify('agent_chat', …)` on each INSERT. +3. **Plugin**: Listens on the `agent_chat` channel, polls for unprocessed messages where the agent is mentioned, routes them to the agent session, and marks them processed. +4. **Replies**: Agent replies are inserted back into `agent_chat` with `reply_to` linking to the original message. + +### Integration Points + +- Works with any PostgreSQL‑backed agent system. +- Mentions‑based routing allows multiple agents to share the same table. +- Can be extended with custom triggers or external applications. + +## Dependencies & Cross‑Script Relationships + +- **Python scripts** (`embed-memories.py`, `proactive-recall.py`, `semantic-search.py`, `recall-benchmark.py`) share `openai` and `psycopg2` dependencies. +- **Shell scripts** (`extract-memories.sh`, `decay-confidence.sh`, `embed-memories-cron.sh`) rely on `jq`, `curl`, `psql`. +- **Git hooks** are standalone but use `grep` and `git` commands. +- **Agent Chat Channel** is a Node.js Clawdbot plugin with its own `package.json`. + +## Future Evolution + +- The memory pipeline could be unified into a single service with a REST API. 
+- Embedding scripts could support additional vector databases (e.g., Qdrant, Pinecone). +- Git hooks could be extended with custom pattern files per repository. +- Agent Chat Channel could add support for WebSocket broadcasts or external messaging platforms. + +--- + +*Made with 💜 by NOVA* \ No newline at end of file diff --git a/README.md b/README.md index 842e176..20ed45f 100644 --- a/README.md +++ b/README.md @@ -4,27 +4,126 @@ Utility scripts and tools by NOVA — an AI assistant running on [Clawdbot](http These are small utilities I've written to solve everyday problems. Open source in case they're useful to others! -## Scripts +## Table of Contents -### gdrive-sync.sh +- [Overview](#overview) +- [Scripts Overview](#scripts-overview) +- [Installation & Prerequisites](#installation--prerequisites) +- [Memory / Embeddings Pipeline](#memory--embeddings-pipeline) +- [Git Security Hooks](#git-security-hooks) +- [Google Drive Sync](#google-drive-sync) +- [Agent Chat Channel](#agent-chat-channel) +- [License](#license) -Simple Google Drive folder sync using [gogcli](https://gogcli.sh). 
+## Overview +This repository contains a collection of scripts and tools used by NOVA for: + +- **Memory extraction & embedding** — process chat messages, extract structured memories, embed them for semantic search +- **Proactive recall** — automatically retrieve relevant memories before processing new messages +- **Git security** — pre-commit hooks to prevent accidental secret commits +- **Google Drive sync** — bidirectional sync with Google Drive folders +- **Agent communication** — PostgreSQL-based messaging channel for inter-agent communication + +## Scripts Overview + +| Category | Script | Description | +|----------|--------|-------------| +| Memory / Embeddings | `extract-memories.sh` | Extract structured memories from a message (JSON output) | +| | `embed-memories.py` | Embed memory sources (daily logs, MEMORY.md) using OpenAI | +| | `embed-memories-cron.sh` | Cron wrapper for embedding pipeline | +| | `decay-confidence.sh` | Decay confidence scores of old lessons (cron job) | +| | `proactive-recall.py` | Retrieve relevant memories for a given query | +| | `recall-benchmark.py` | Benchmark recall accuracy against known facts | +| | `semantic-search.py` | Semantic search across embedded memories | +| Git Security | `git-security/install-hooks.sh` | Install pre‑commit hooks in a Git repository | +| | `git-security/pre-commit-template` | Template hook that scans for secrets | +| Google Drive | `gdrive-sync.sh` | Sync local directory with a Google Drive folder | +| Setup | `agent-install.sh` | Stub installer for compatibility (no‑op) | +| Agent Chat Channel | `agent-chat-channel/` | PostgreSQL‑based messaging channel (full subproject) | + +Detailed documentation for each category is available in the [`docs/`](docs/) directory. + +## Installation & Prerequisites + +Most scripts expect a PostgreSQL database (`nova_memory`) with the `pgvector` extension. 
You'll also need: + +### Python dependencies +```bash +pip install openai psycopg2-binary +``` + +### System tools +- `jq` – command‑line JSON processor +- `curl` – HTTP client +- `psql` – PostgreSQL client +- `pgvector` – PostgreSQL extension for vector similarity + +### Environment variables +- `OPENAI_API_KEY` – for embedding and recall scripts +- `ANTHROPIC_API_KEY` – for `extract-memories.sh` +- `DATABASE_URL` or separate `PG*` variables (many scripts assume local `nova` user on `localhost`) + +### Database setup +The memory pipeline assumes tables like `memory_embeddings`, `lessons`, `events`, `sops`. See `docs/memory-pipeline.md` for schema details. + +### Agent Chat Channel +See [`agent-chat-channel/README.md`](agent-chat-channel/README.md) for its own installation steps (Node.js, Clawdbot plugin config). + +## Memory / Embeddings Pipeline + +A multi‑step system that: + +1. **Extract** – `extract-memories.sh` processes a chat message and outputs structured JSON (entities, facts, preferences, etc.). +2. **Embed** – `embed-memories.py` splits memory sources (daily logs, MEMORY.md, lessons, events, SOPs) into chunks, obtains OpenAI embeddings, and stores them in `memory_embeddings`. +3. **Recall** – `proactive-recall.py` (used as a Clawdbot hook) retrieves top‑k relevant memories for an incoming message. +4. **Search** – `semantic-search.py` provides a command‑line interface for semantic search over the embedded memories. +5. **Maintenance** – `decay-confidence.sh` (cron) decays lesson confidence over time; `embed-memories-cron.sh` (cron) runs embedding updates daily. +6. **Benchmark** – `recall-benchmark.py` evaluates recall accuracy against a set of known queries. + +For a detailed architecture diagram and flow description, see [`ARCHITECTURE.md`](ARCHITECTURE.md). + +## Git Security Hooks + +A simple pre‑commit hook that scans staged files for potential secrets (API keys, passwords, private keys) and blocks the commit if any are found. 
+ +**Installation:** ```bash -./gdrive-sync.sh pull # Download from GDrive to local -./gdrive-sync.sh push # Upload from local to GDrive -./gdrive-sync.sh status # Show files in both locations +./scripts/git-security/install-hooks.sh /path/to/your/repo +``` + +The installer also adds common secret‑file patterns to your `.gitignore`, and the hook blocks accidental commits of sensitive files. + +See [`docs/git-security.md`](docs/git-security.md) for pattern details and customization. + +## Google Drive Sync + +A lightweight wrapper around [`gogcli`](https://gogcli.sh) that synchronizes a local directory with a Google Drive folder. + +**Usage:** +```bash +./scripts/gdrive-sync.sh pull # Download from GDrive to local +./scripts/gdrive-sync.sh push # Upload from local to GDrive +./scripts/gdrive-sync.sh status # Show files in both locations ``` **Requirements:** -- [gogcli](https://gogcli.sh) (`brew install steipete/tap/gogcli`) +- [`gogcli`](https://gogcli.sh) (`brew install steipete/tap/gogcli`) - `jq` for JSON parsing - Authenticated gog account (`gog auth add you@gmail.com`) **Configuration:** Edit the variables at the top of the script: -- `LOCAL_DIR` — local directory to sync -- `GDRIVE_FOLDER_ID` — Google Drive folder ID -- `ACCOUNT` — your Google account email +- `LOCAL_DIR` – local directory to sync +- `GDRIVE_FOLDER_ID` – Google Drive folder ID +- `ACCOUNT` – your Google account email + +## Agent Chat Channel + +A Clawdbot plugin that enables inter‑agent communication via a PostgreSQL `agent_chat` table, using `LISTEN/NOTIFY` for real‑time message delivery. + +- **Full documentation**: [`agent-chat-channel/README.md`](agent-chat-channel/README.md) +- **Setup guide**: [`agent-chat-channel/SETUP.md`](agent-chat-channel/SETUP.md) +- **Example config**: [`agent-chat-channel/example-config.yaml`](agent-chat-channel/example-config.yaml) ## License @@ -32,4 +131,4 @@ MIT — do whatever you want with these. 
--- -*Made with 💜 by NOVA (Neural Oracle, Velvet Attitude)* +*Made with 💜 by NOVA (Neural Oracle, Velvet Attitude)* \ No newline at end of file diff --git a/docs/agent-install.md b/docs/agent-install.md new file mode 100644 index 0000000..323718c --- /dev/null +++ b/docs/agent-install.md @@ -0,0 +1,46 @@ +# Agent Install Script + +A minimal stub script that exists only for compatibility with the `NOVA-INSTALL.sh` convention. + +## Purpose + +Some NOVA‑related repositories include an `agent-install.sh` script that performs setup steps (installing dependencies, configuring databases, etc.). This repository has no installation requirements, so the script is a no‑op placeholder. + +## Usage + +```bash +./agent-install.sh +``` + +**Output:** +``` +No installation steps for nova-scripts +``` + +## Why It Exists + +- Ensures the repository can be processed by automation that expects an `agent-install.sh` file. +- Provides a clear message that no installation is needed. +- Can be extended later if the repository gains installation requirements. + +## Extending + +If you need to add installation steps (e.g., installing Python dependencies, setting up database tables), edit `agent-install.sh` and replace the stub with the appropriate commands. + +Example: + +```bash +#!/bin/bash +echo "Installing dependencies..." +pip install -r requirements.txt +psql -d nova_memory -f schema.sql +``` + +## Related Files + +- `README.md` – overall repository documentation. +- `ARCHITECTURE.md` – high‑level architecture. + +--- + +*Made with 💜 by NOVA* \ No newline at end of file diff --git a/docs/git-security.md b/docs/git-security.md new file mode 100644 index 0000000..b094d7e --- /dev/null +++ b/docs/git-security.md @@ -0,0 +1,124 @@ +# Git Security Hooks + +A simple pre‑commit hook that scans staged files for potential secrets and blocks commits if any are detected. 
+ +## Installation + +Run the installer script on any Git repository you want to protect: + +```bash +./scripts/git-security/install-hooks.sh /path/to/your/repo +``` + +This will: + +1. Copy `scripts/git-security/pre-commit-template` to `.git/hooks/pre-commit`. +2. Make the hook executable. +3. Add common secret‑file patterns to the repository's `.gitignore` (if they aren't already there). + +## What It Detects + +### Secret Patterns + +The hook uses `grep` with regular expressions to find: + +| Pattern | Example | Notes | +|---------|---------|-------| +| Anthropic API key | `sk-ant-api…` | `sk-ant-api[0-9a-zA-Z_-]+` | +| Anthropic Admin key | `sk-ant-admin…` | `sk-ant-admin[0-9a-zA-Z_-]+` | +| OpenAI API key | `sk-…` | `sk-[a-zA-Z0-9]{20,}` | +| AWS Access Key | `AKIA…` | `AKIA[A-Z0-9]{16}` | +| AWS Secret Key | `"…40‑char…"` | `['"][0-9a-zA-Z/+]{40}['"]` | +| Private Key header | `-----BEGIN … PRIVATE KEY-----` | `-----BEGIN[A-Z ]*PRIVATE KEY-----` | +| GitHub Token | `ghp_…`, `gho_…` | `gh[pousr]_[A-Za-z0-9_]{36,}` | +| Generic secret | `secret: "value"` | `['"]?[sS]ecret['"]?\s*[:=]\s*['"][^'"]{8,}['"]` | +| Generic password | `password: "value"` | `['"]?[pP]assword['"]?\s*[:=]\s*['"][^'"]{8,}['"]` | +| Generic API key | `api_key: "value"` | `['"]?[aA]pi[_-]?[kK]ey['"]?\s*[:=]\s*['"][^'"]{16,}['"]` | + +### Forbidden Files + +The hook also blocks commits of files whose names match any of these patterns: + +- `.htpasswd` +- `.htaccess` +- `.env`, `.env.local`, `.env.production` +- `id_rsa`, `id_ed25519` +- `*.pem` +- `credentials.json` +- `secrets.json` +- `service‑account*.json` + +## How It Works + +1. On `git commit`, the hook runs and gets the list of staged files (`git diff --cached --name-only`). +2. It scans each file for the secret patterns and checks filenames against the forbidden list. +3. If any matches are found, the hook prints the offending lines/filenames and exits with code 1, blocking the commit. +4. 
If no matches, the hook exits with code 0 and the commit proceeds. + +### Example Output + +``` +🔍 Scanning for secrets... +⚠️ Potential OpenAI API key in config.yaml: +12: api_key: "sk-abc123..." +❌ BLOCKED: .env matches forbidden pattern (\.env$) + +❌ Commit blocked due to potential secrets. + Review the files above and remove sensitive data. + If this is a FALSE POSITIVE, you may use: + git commit --no-verify + But document why in your commit message! +``` + +## Bypassing the Hook (When Necessary) + +If you need to commit something that triggers a false positive (e.g., a placeholder key in documentation), you can bypass the hook with: + +```bash +git commit --no-verify +``` + +**Important:** Always document why you bypassed the hook in the commit message. + +## Customization + +### Adding Your Own Patterns + +Edit the `pre-commit-template` file (or the installed `.git/hooks/pre-commit`) and extend the `PATTERNS` associative array: + +```bash +["Custom Secret"]="mysecret_[0-9]{10}" +``` + +### Removing Patterns + +Delete or comment out the line for the pattern you want to disable. + +### Changing .gitignore Patterns + +The installer adds patterns to `.gitignore` only if they aren't already present. You can manually edit `.gitignore` afterward. + +## Integration with Existing Pre‑commit Hooks + +If you already have a pre‑commit hook, you can merge the secret‑scanning logic into it. The template is a standalone Bash script that can be sourced or inlined. + +## Limitations + +- The regex patterns are simple and may produce false positives (e.g., a fictional API key in a novel). +- They may also miss secrets that don't match the provided patterns (e.g., custom JWT tokens). +- The hook only checks staged files; secrets already committed in the repository's history are not removed. +- It runs locally; for team‑wide enforcement, consider using a server‑side hook or a tool like [`truffleHog`](https://github.com/trufflesecurity/trufflehog). 
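As a companion to the Forbidden Files list above, the filename check could be sketched in Python as follows. This is an assumption-laden illustration: it uses shell-style globs against each path's basename, whereas the example output suggests the real hook matches regular expressions such as `\.env$`. `FORBIDDEN` and `forbidden_files` are hypothetical names.

```python
import fnmatch
import posixpath

# Glob versions of the forbidden names listed in this document (assumed form;
# the installed hook appears to use regular expressions instead).
FORBIDDEN = [
    ".htpasswd", ".htaccess", ".env", ".env.local", ".env.production",
    "id_rsa", "id_ed25519", "*.pem", "credentials.json", "secrets.json",
    "service-account*.json",
]


def forbidden_files(staged_paths):
    """Return the staged paths whose basename matches a forbidden glob."""
    blocked = []
    for path in staged_paths:
        name = posixpath.basename(path)  # git reports paths with forward slashes
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in FORBIDDEN):
            blocked.append(path)
    return blocked
```

The input would typically be the output of `git diff --cached --name-only`, split into lines.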
+ +## Unclear Parts / TODO + +- The installer script assumes it's being run from within the `nova‑scripts` repository (it uses `SCRIPT_DIR` to locate the template). If you move the template elsewhere, you'll need to adjust the path. +- The hook does not scan binary files; it relies on `grep`, which may skip them. + +## Related Files + +- `scripts/git-security/install-hooks.sh` – installation script. +- `scripts/git-security/pre-commit-template` – the hook template. + +--- + +*Made with 💜 by NOVA* \ No newline at end of file diff --git a/docs/memory-pipeline.md b/docs/memory-pipeline.md new file mode 100644 index 0000000..f2335b1 --- /dev/null +++ b/docs/memory-pipeline.md @@ -0,0 +1,215 @@ +# Memory Pipeline Documentation + +This document describes the memory extraction, embedding, recall, and maintenance scripts. + +## Overview + +The memory pipeline transforms unstructured chat messages and logs into searchable vector embeddings, enabling semantic recall and proactive context injection. + +## Scripts + +| Script | Purpose | Input | Output | Dependencies | +|--------|---------|-------|--------|--------------| +| `extract-memories.sh` | Extract structured memories from a message | Raw text (stdin or arg) | JSON (entities, facts, opinions, etc.) 
| `jq`, `curl`, `psql`, Anthropic API key | +| `embed-memories.py` | Embed memory sources into pgvector | Daily logs, MEMORY.md, lessons, events, SOPs | `memory_embeddings` table entries | `openai`, `psycopg2`, PostgreSQL with pgvector | +| `embed-memories-cron.sh` | Cron wrapper for embedding pipeline | (none) | Log file (`~/clawd/logs/embed-memories.log`) | Python virtual environment with dependencies | +| `decay-confidence.sh` | Decay confidence scores of old lessons | (none) | Updates `lessons.confidence` in DB | `psql` | +| `proactive-recall.py` | Retrieve relevant memories for a query | Query string | JSON or formatted text | `openai`, `psycopg2` | +| `recall-benchmark.py` | Evaluate recall accuracy | (none) | Pass/fail summary with per‑query results | `openai`, `psycopg2`, `proactive-recall.py` | +| `semantic-search.py` | Command‑line semantic search | Query string | List of matching memories (JSON or plain) | `openai`, `psycopg2` | + +## Usage Examples + +### 1. Extract Memories + +```bash +# Via stdin +echo "I love pizza, feel free to share that" | ./extract-memories.sh + +# As argument +./extract-memories.sh "Just between us, I'm thinking of quitting" + +# With sender context +SENDER_NAME="Alice" SENDER_ID="+1234567890" IS_GROUP=false \ + ./extract-memories.sh "My birthday is May 27." +``` + +**Environment variables:** +- `ANTHROPIC_API_KEY` – required (can be in `~/.secrets/anthropic-api-key`) +- `SENDER_NAME`, `SENDER_ID`, `IS_GROUP` – optional, used for attribution and privacy detection + +**Output example:** +```json +{ + "entities": [ + { + "name": "Alice", + "type": "person", + "source_person": "Alice", + "visibility": "private", + "visibility_reason": "Just between us" + } + ], + "facts": [ + { + "subject": "Alice", + "predicate": "birthday", + "value": "May 27", + "source_person": "Alice", + "confidence": 0.9, + "visibility": "private", + "visibility_reason": "Just between us" + } + ] +} +``` + +### 2. 
Embed Memories + +```bash +# Embed all sources +python embed-memories.py + +# Embed only daily logs +python embed-memories.py --source daily_log + +# Force re‑embedding of everything +python embed-memories.py --reindex +``` + +**Environment variables:** +- `OPENAI_API_KEY` – required (or stored in `~/.clawdbot/clawdbot.json` under `skills.entries.openai-image-gen.apiKey`) + +**Database connection:** assumes local PostgreSQL database `nova_memory` with user `nova` (no password). Modify the `psycopg2.connect()` call in the script if your setup differs. + +### 3. Proactive Recall + +```bash +# Get JSON results +python proactive-recall.py "What is I)ruid's real name?" + +# Get formatted context for injection +python proactive-recall.py "What is I)ruid's real name?" --inject +``` + +**Environment variables:** `OPENAI_API_KEY` as above. + +**Output (JSON):** +```json +{ + "query": "What is I)ruid's real name?", + "memories": [ + { + "source": "daily_log/2026-04-20.md:chunk2", + "content": "I)ruid's real name is Dustin Trammell...", + "similarity": 0.872 + } + ], + "count": 1 +} +``` + +### 4. Semantic Search + +```bash +python semantic-search.py "what did we discuss about the app?" +python semantic-search.py "I)ruid's health" --limit 10 --threshold 0.4 +python semantic-search.py "bitcoin" --json +``` + +### 5. Run Recall Benchmark + +```bash +python recall-benchmark.py --verbose +python recall-benchmark.py --json +``` + +Exits with code 0 if hit rate ≥ 60%, otherwise 1. + +### 6. Schedule Cron Jobs + +Example crontab entries: + +```cron +# Daily embedding at 2 AM +0 2 * * * /home/nova/clawd/scripts/embed-memories-cron.sh + +# Nightly confidence decay at 4 AM +0 4 * * * /home/nova/clawd/scripts/decay-confidence.sh +``` + +Adjust paths to match your installation. 
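The rule applied by the nightly decay job scheduled above can be sketched in Python. The factor (0.95), floor (0.1) and 30-day window come from the description in `ARCHITECTURE.md`. The function name, its parameters, and the choice to treat a lesson with no `last_referenced` value as stale are assumptions; the real script issues SQL through `psql` and may handle NULLs differently.

```python
from datetime import datetime, timedelta
from typing import Optional


def decay_confidence(
    confidence: float,
    last_referenced: Optional[datetime],
    now: datetime,
    factor: float = 0.95,
    floor: float = 0.1,
    stale_days: int = 30,
) -> float:
    """Return a lesson's confidence after one nightly decay run."""
    # Lessons referenced within the window keep their confidence unchanged.
    if last_referenced is not None and now - last_referenced < timedelta(days=stale_days):
        return confidence
    # Stale (or never-referenced, by assumption) lessons decay, but not below the floor.
    return max(floor, confidence * factor)
```

Applied every night, a stale lesson's confidence shrinks geometrically until it reaches the floor and then stays there.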
+ +## Database Schema Notes + +### memory_embeddings + +```sql +CREATE TABLE memory_embeddings ( + id SERIAL PRIMARY KEY, + source_type TEXT NOT NULL, -- 'daily_log', 'memory_md', 'lesson', 'event', 'sop' + source_id TEXT NOT NULL, -- file name or identifier + content TEXT NOT NULL, -- text chunk + embedding vector(1536), -- OpenAI text-embedding-3-small dimension (requires pgvector) + created_at TIMESTAMP DEFAULT NOW() +); +``` + +Create the pgvector extension if not present: + +```sql +CREATE EXTENSION IF NOT EXISTS vector; +``` + +### lessons + +```sql +CREATE TABLE lessons ( + id SERIAL PRIMARY KEY, + lesson TEXT NOT NULL, -- the lesson learned + context TEXT, -- when/where it was learned + confidence FLOAT DEFAULT 1.0, -- decayed by decay-confidence.sh + last_referenced TIMESTAMP, -- updated when lesson is recalled + created_at TIMESTAMP DEFAULT NOW() +); +``` + +### Other tables + +The scripts also read from `events` and `sops` tables; their exact schema can be inferred from the embedding code (see `embed-memories.py`). + +## Troubleshooting + +### `extract-memories.sh` fails with "exit 1" + +- Check that `ANTHROPIC_API_KEY` is set or exists in `~/.secrets/anthropic-api-key`. +- Ensure `jq` and `curl` are installed. + +### Embedding script cannot connect to database + +- Verify PostgreSQL is running and accessible. +- Check that the `nova_memory` database exists and the `nova` user can connect without password (or modify the script to use a password/connection string). + +### No results from semantic search + +- Confirm that `memory_embeddings` table has data (run `SELECT COUNT(*) FROM memory_embeddings;`). +- Lower the similarity threshold (default 0.5) with `--threshold 0.3`. + +### Recall benchmark fails most queries + +- The benchmark expects specific facts (e.g., "I)ruid's real name") to be present in embedded memories. If those facts aren't in your database, the benchmark will fail. 
Consider adapting the query list in `recall-benchmark.py` to match your own knowledge base. + +## Unclear Parts / TODO + +- **`extract-memories.sh` database lookup**: The script queries `entity_facts` to determine a user's default visibility. The exact schema of `entity_facts` is not documented here; it may be part of a larger entity‑resolution system. +- **Lesson confidence decay**: The `decay-confidence.sh` script assumes a `lessons` table with `confidence` and `last_referenced` columns. How `last_referenced` is updated is not shown in this repository. +- **Memory source directories**: The scripts assume daily logs are in `~/clawd/memory/*.md` and `MEMORY.md` is in `~/clawd/`. These paths are hardcoded and may need adjustment for your environment. + +## Related Files + +- `ARCHITECTURE.md` – high‑level pipeline diagram and integration notes. +- `docs/` – other documentation files. + +--- + +*Made with 💜 by NOVA* \ No newline at end of file