MemoryBridge

English | 中文

Drop-in memory middleware for LLM agents. Point your agent's OpenAI-compatible API endpoint to MemoryBridge and it automatically retrieves relevant memories, manages conversation context, and stores new memories — all without changing a single line of your agent code.

Agent ──(OpenAI API + Token)──→ MemoryBridge ──(HTTP)──→ Your LLM Provider
                                 │   ↑
                                 │   ├── Retrieve memories  (read)
                                 │   └── Store memories     (write)
                                 │
                             ┌───┴────┐
                             │  Qdrant  │
                             └───┬────┘
                                 │
                             Mem0 (memory engine)

For a deep dive into the architecture, see docs/ARCHITECTURE.md.


Why MemoryBridge?

| Problem | Solution |
| --- | --- |
| Agents forget past conversations | Automatic long-term memory retrieval and injection |
| Every LLM provider has a different API | Single OpenAI-compatible endpoint; provider details are configured per token |
| Memory code is mixed with agent logic | Transparent proxy: the agent sees only a standard chat API |
| Manual context window management | Automatic session windowing and memory truncation |
| Stateless chat requests lose context | Automatic session history injection: recent conversation history is inserted right after the system prompt |

Key Design Choices:

| Principle | Description |
| --- | --- |
| Drop-in compatible | Same request/response format as any OpenAI-compatible chat API |
| Transparent proxy | Only intercepts messages for memory injection; all other fields are passed through untouched |
| Tool-call safe | LLM tool_calls responses are preserved in session history for correct multi-turn tool chains |
| Hard failure, no fallback | Missing token → 401, invalid config → fail immediately. Never guess, degrade, or default |
| Per-request isolation | Fresh handler instance per HTTP request, no shared mutable state |
| Config externalized | LLM/Embedding keys are stored per token via the Admin API; .env holds only infrastructure settings |
| Minimal dependencies | Zero Docker, zero Redis, zero ORM, zero message queue |

Quick Start

Prerequisites

  • Python 3.11+
  • Linux x86_64 / aarch64, or macOS x86_64 / arm64

3 Steps to Run

# 1. Install
git clone https://github.com/51193/MemoryBridge && cd MemoryBridge
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
uv sync

# 2. Initialize (downloads Qdrant, creates databases, generates admin token)
uv run python src/memory_bridge/host_manager.py --init
# Output: ADMIN TOKEN (save this): <32-char hex>

# 3. Start
uv run python src/memory_bridge/host_manager.py
# Visit http://localhost:8000/health → {"status":"ok","qdrant":"connected"}
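
A startup script can check the health payload before sending traffic. The following is a minimal sketch, assuming the response body has exactly the shape shown in step 3 (`status` and `qdrant` keys); the helper name `is_ready` is hypothetical and not part of MemoryBridge.

```python
import json


def is_ready(body: str) -> bool:
    """Return True only when both the bridge and Qdrant report healthy.

    Assumes the /health body looks like {"status":"ok","qdrant":"connected"}.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    return payload.get("status") == "ok" and payload.get("qdrant") == "connected"


print(is_ready('{"status":"ok","qdrant":"connected"}'))  # True
```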

Register a Service Token

After starting, register a Service Token using the Admin Token from step 2 above. Each token carries its own LLM and Embedding configuration:

curl -X POST http://localhost:8000/v1/admin/tokens \
  -H "Authorization: Bearer <admin_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "label": "my-llm",
    "main_llm_base_url": "https://api.deepseek.com",
    "main_llm_api_key": "sk-xxx",
    "memory_llm_base_url": "https://api.deepseek.com",
    "memory_llm_api_key": "sk-xxx",
    "memory_llm_provider": "openai",
    "embed_base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "embed_api_key": "sk-xxx",
    "embed_dims": 1024,
    "embed_provider": "openai"
  }'
# Returns: {"token": "<32-char service_token>", ...}
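
The same registration call can be made from Python with only the standard library. This is a sketch rather than an official client: the helper name `build_register_request` is hypothetical, and the field values are the placeholders from the curl example above.

```python
import json
import urllib.request


def build_register_request(
    base_url: str, admin_token: str, config: dict
) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/admin/tokens request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/admin/tokens",
        data=json.dumps(config).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values copied from the curl example; replace with real keys.
config = {
    "label": "my-llm",
    "main_llm_base_url": "https://api.deepseek.com",
    "main_llm_api_key": "sk-xxx",
    "memory_llm_base_url": "https://api.deepseek.com",
    "memory_llm_api_key": "sk-xxx",
    "memory_llm_provider": "openai",
    "embed_base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "embed_api_key": "sk-xxx",
    "embed_dims": 1024,
    "embed_provider": "openai",
}
request = build_register_request("http://localhost:8000", "<admin_token>", config)

# To actually send it against a running server:
# with urllib.request.urlopen(request) as response:
#     service_token = json.load(response)["token"]
```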

Make Your First Request

# Exactly like calling any LLM API — just use the service token
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer <service_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hi, my name is Ming"}],
    "stream": false
  }'

MemoryBridge will retrieve any relevant past memories, inject them into the system prompt, forward the enriched request to your LLM provider, and asynchronously store new memories from the conversation — all transparently.
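
The injection step described above can be pictured roughly as follows. This is an illustrative sketch, not the project's ContextBuilder: the function name `inject_memories` and the "Relevant memories:" block format are assumptions (the real wording comes from the files under templates/), and message content is assumed to be a plain string.

```python
def inject_memories(messages: list[dict], memories: list[str]) -> list[dict]:
    """Return a new message list with a memory block in the system prompt.

    Hypothetical format; the input list is never mutated.
    """
    if not memories:
        return list(messages)
    block = "Relevant memories:\n" + "\n".join(f"- {m}" for m in memories)
    if messages and messages[0].get("role") == "system":
        # Append the block to the existing system prompt.
        merged = {**messages[0], "content": f"{messages[0]['content']}\n\n{block}"}
        return [merged, *messages[1:]]
    # No system prompt in the request: create one that holds the block.
    return [{"role": "system", "content": block}, *messages]


request_messages = [{"role": "user", "content": "What is my name?"}]
enriched = inject_memories(request_messages, ["User's name is Ming"])
print(enriched[0]["content"])
```

The agent still sends the original `messages`; only the copy forwarded to the LLM provider carries the extra system content.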

Multi-turn Conversation (with Streaming)

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer <service_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [
      {"role": "user", "content": "My name is Ming"},
      {"role": "assistant", "content": "Hello Ming!"},
      {"role": "user", "content": "What is my name?"}
    ],
    "stream": true
  }'
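
With "stream": true, an OpenAI-compatible endpoint answers with server-sent events, one `data:` line per chunk and a final `data: [DONE]`. The sketch below shows how a client might collect the text, assuming MemoryBridge passes the provider's stream through in that standard format; `iter_content` is a hypothetical helper and the sample lines are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streaming lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Illustrative stream, not captured from a real server.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Your name "}}]}',
    'data: {"choices":[{"delta":{"content":"is Ming."}}]}',
    "data: [DONE]",
]
print("".join(iter_content(sample)))  # Your name is Ming.
```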

Token System

MemoryBridge enforces token authentication. There are two types:

| Type | Purpose | Created By |
| --- | --- | --- |
| Admin Token | Register/manage Service Tokens | Auto-generated by --init |
| Service Token | Call /v1/chat/completions | Admin API |

Each Service Token carries its full LLM/Embedding configuration:

| Field | Required | Description |
| --- | --- | --- |
| label | No | Human-readable label |
| main_llm_base_url | Yes | Chat LLM base URL (path /v1/chat/completions auto-appended) |
| main_llm_api_key | Yes | Chat LLM API key |
| memory_llm_base_url | Yes | Memory extraction LLM base URL |
| memory_llm_api_key | Yes | Memory extraction LLM API key |
| memory_llm_provider | Yes | Memory extraction LLM provider (e.g. deepseek, dashscope) |
| memory_llm_body | No | Transparent JSON (model, temperature, etc.) |
| embed_base_url | Yes | Embedding service base URL |
| embed_api_key | Yes | Embedding API key |
| embed_dims | Yes | Vector dimensions (e.g. 1024); must not be 0 |
| embed_provider | Yes | Embedding provider name (e.g. openai, dashscope) |
| embed_body | No | Transparent JSON (model, etc.) |
| memory_enabled | No | Enable memory (default: true) |
| memory_limit | No | Memories per retrieval (1–20, default: 5) |
| session_window_size | No | Session history window (1–100, default: 10) |
| memory_prompt | No | Custom memory extraction prompt |

Hard-failure semantics: if any required field is null, empty, or zero (for numeric fields), the token is treated as invalid and requests that use it are rejected with 401.
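
The rule can be checked on the client side before registering a token. The sketch below mirrors the rule as stated here and is not the server's actual validation code; `REQUIRED_FIELDS` simply lists the fields marked "Yes" in the table, and `missing_required` is a hypothetical helper.

```python
REQUIRED_FIELDS = (
    "main_llm_base_url",
    "main_llm_api_key",
    "memory_llm_base_url",
    "memory_llm_api_key",
    "memory_llm_provider",
    "embed_base_url",
    "embed_api_key",
    "embed_dims",
    "embed_provider",
)


def missing_required(config: dict) -> list[str]:
    """Return the required fields that are absent, null, empty, or zero."""
    bad = []
    for name in REQUIRED_FIELDS:
        value = config.get(name)
        is_blank_text = isinstance(value, str) and not value.strip()
        if value is None or is_blank_text or value == 0:
            bad.append(name)
    return bad


# A config with an empty key and a zero dimension fails on both fields.
partial = {name: "x" for name in REQUIRED_FIELDS}
partial["embed_api_key"] = ""
partial["embed_dims"] = 0
print(missing_required(partial))  # ['embed_api_key', 'embed_dims']
```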


API Reference

Chat Endpoint

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/chat/completions | OpenAI-compatible chat, with memory injection |

Admin Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/admin/tokens | Register a Service Token (201) |
| GET | /v1/admin/tokens/{token} | View config (API keys masked) |
| POST | /v1/admin/tokens/{token} | Update config (fields optional) |
| DELETE | /v1/admin/tokens/{token} | Delete a token (204) |

Health

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Health check (bridge + Qdrant status) |

Error Codes

| Status | Meaning |
| --- | --- |
| 200 | Success |
| 201 | Token registered |
| 204 | Token deleted |
| 401 | Token missing, invalid, or config parse failed |
| 403 | Service token used on admin endpoint |
| 404 | Token not found |
| 422 | Request validation failed |
| 500 | Unexpected internal error |
| 502 | Memory retrieval failed or LLM provider unreachable |
| Other | Passed through from the LLM provider (status code, headers, and body) |

Request Flow

Incoming request
  → TokenAuthMiddleware (auth + request-id + read raw JSON body)
  → router (parse messages, validate, create Handler instance)
  → ChatRequestHandler.execute()
      ├─ SessionStore.get()       ← Read recent conversation history
      ├─ MemoryManager.search()   ← Memory retrieval (Qdrant vector search)
      ├─ ContextBuilder.build()   ← Assemble context (memory block + request)
      ├─ _inject_history()        ← Insert session history after system prompt
      ├─ response_sender.send_*() ← Forward enriched request to LLM
      └─ _store_memory()          ← Async store (session + memories)
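
The history-injection step in this flow can be sketched as below. It is an illustration under stated assumptions, not the project's `_inject_history`: the window is assumed to count individual messages (it may count turns in the real code), and the sketch does not try to keep tool-call chains intact when the window cuts through one.

```python
def inject_history(
    request_messages: list[dict], history: list[dict], window_size: int = 10
) -> list[dict]:
    """Place the most recent history right after any leading system messages."""
    recent = history[-window_size:] if window_size > 0 else []
    # Count the system messages at the start of the request.
    n_system = 0
    while (
        n_system < len(request_messages)
        and request_messages[n_system].get("role") == "system"
    ):
        n_system += 1
    return request_messages[:n_system] + recent + request_messages[n_system:]


stored_history = [
    {"role": "user", "content": "My name is Ming"},
    {"role": "assistant", "content": "Hello Ming!"},
    {"role": "user", "content": "I live in Hangzhou"},
]
incoming = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "What is my name?"},
]
merged = inject_history(incoming, stored_history, window_size=2)
print([m["role"] for m in merged])  # ['system', 'assistant', 'user', 'user']
```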

Project Structure

src/memory_bridge/
├── main.py              # FastAPI app factory + lifespan
├── host_manager.py      # Process manager (Qdrant + API subprocess)
├── host_process.py      # Subprocess lifecycle management
├── host_init.py         # First-time initialization
├── config.py            # Infrastructure config (pydantic-settings)
├── exceptions.py        # Custom exceptions
├── api/                 # Routes + middleware + request handling
│   ├── response_sender.py   # Non-stream / streaming LLM dispatch
├── core/                # Memory / session / context / token / logging
├── providers/           # HTTP client (transparent proxy)
└── models/              # Request / response models

templates/
├── memory_template.md     # Memory injection template (English)
└── memory_template_zh.md  # Memory injection template (Chinese)

Development

uv sync --extra dev            # Install dev dependencies
uv run mypy src/               # Type check (strict mode)
uv run ruff check src/ tests/  # Lint
uv run pytest -v               # Unit tests
uv run pytest --cov=src/memory_bridge --cov-fail-under=95  # Coverage gate

The .env template is committed and contains no secrets. It sets MEMORY_SEARCH_TIMEOUT=30, which keeps memory searches from hanging indefinitely, and MEMORY_STORE_TIMEOUT=120, which bounds background memory store operations that involve LLM fact extraction.
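
A timeout of this kind is commonly enforced with `asyncio.wait_for`. The sketch below shows the general pattern only; whether MemoryBridge applies its timeouts this way is an assumption, the value is assumed to be in seconds, and `search_with_timeout` and `slow_search` are hypothetical names.

```python
import asyncio

MEMORY_SEARCH_TIMEOUT = 30.0  # assumed to be seconds, mirroring the .env value


async def search_with_timeout(search, query: str, timeout: float = MEMORY_SEARCH_TIMEOUT):
    """Await search(query), raising TimeoutError if it runs past the limit."""
    return await asyncio.wait_for(search(query), timeout=timeout)


async def slow_search(query: str) -> list[str]:
    """Stand-in for a memory search that takes too long."""
    await asyncio.sleep(1.0)
    return ["never returned"]


async def demo() -> str:
    try:
        await search_with_timeout(slow_search, "name", timeout=0.05)
    except asyncio.TimeoutError:
        return "timed out"
    return "completed"


print(asyncio.run(demo()))  # timed out
```

`asyncio.wait_for` cancels the pending task when the limit is reached, so a stuck search does not keep running in the background.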

Test structure mirrors source:

tests/
├── api/              ← src/memory_bridge/api/
├── core/             ← src/memory_bridge/core/
├── providers/        ← src/memory_bridge/providers/
├── architecture/     # Architecture guard tests (boundary checks)
└── integration/      # Integration smoke tests

Install from Release

Download the single-file .pyz from GitHub Releases:

mkdir -p /opt/memorybridge && cd /opt/memorybridge
wget https://github.com/51193/MemoryBridge/releases/latest/download/memorybridge.pyz
python3 memorybridge.pyz --init     # First-time setup
python3 memorybridge.pyz            # Start

License

Apache 2.0 — see also NOTICE for bundled open source attributions, and docs/LICENSES.md for a full third-party license summary.
