diff --git a/README.md b/README.md
index fc49782..7af7f1f 100644
--- a/README.md
+++ b/README.md
@@ -6,32 +6,21 @@ A progressive RAG system built from first principles -- from raw embeddings and
 ## What It Does (Current State)
 
-Ingestion
+**Ingestion**
 
 1. **Loads** `.txt` files (PDF, DOCX, Markdown from Phase 4)
 2. **Chunks** each document into overlapping word windows
 3. **Embeds** each chunk using OpenAI `text-embedding-3-small`, producing a 1536-dimensional vector
 4. **Stores** vectors with metadata (`source`, `chunk_index`) in a persistent Chroma collection
 
-Search
+**Search**
 
 1. **Embeds** the query using the same model
 2. **Queries** Chroma for the top-K nearest vectors using built-in ANN (Approximate Nearest Neighbor) search
 3. **Returns** results with chunk text, source filename, and distance score
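
The chunking and metadata steps in the Ingestion list can be sketched as below. This is a minimal illustration, not the repository's implementation: the function names `chunk_words` and `build_records`, the `size` and `overlap` defaults, and the `source:index` id format are all assumptions made for the example. Only the `source` and `chunk_index` metadata keys come from the README text.

```python
from typing import Dict, List


def chunk_words(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words.

    The defaults are illustrative assumptions, not values taken from the project.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far the window start advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def build_records(source: str, text: str, size: int = 200, overlap: int = 50) -> List[Dict]:
    """Pair each chunk with the metadata the README says is stored alongside its vector."""
    return [
        {
            "id": f"{source}:{i}",  # hypothetical id scheme, chosen for this sketch
            "text": chunk,
            "metadata": {"source": source, "chunk_index": i},
        }
        for i, chunk in enumerate(chunk_words(text, size, overlap))
    ]


if __name__ == "__main__":
    for record in build_records("notes.txt", "a b c d e f g", size=4, overlap=2):
        print(record["metadata"]["chunk_index"], record["text"])
```

Each record would then be embedded and written to the vector store. The overlap keeps a sentence that straddles a window boundary fully present in at least one chunk, at the cost of storing some words twice.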