Drop-in memory middleware for LLM agents. Point your agent's OpenAI-compatible API endpoint to MemoryBridge and it automatically retrieves relevant memories, manages conversation context, and stores new memories — all without changing a single line of your agent code.
Agent ──(OpenAI API + Token)──→ MemoryBridge ──(HTTP)──→ Your LLM Provider
                                    │    ↑
                                    │    ├── Retrieve memories (read)
                                    │    └── Store memories (write)
                                    │
                                ┌───┴────┐
                                │ Qdrant │
                                └───┬────┘
                                    │
                          Mem0 (memory engine)
For a deep dive into the architecture, see docs/ARCHITECTURE.md.
| Problem | Solution |
|---|---|
| Agents forget past conversations | Automatic long-term memory retrieval and injection |
| Every LLM provider has different APIs | OpenAI-compatible single endpoint; provider details per token |
| Memory code mixed with agent logic | Transparent proxy — agent sees only a standard chat API |
| Manual context window management | Automatic session windowing and memory truncation |
| Stateless chat requests lose context | Automatic session history injection — recent conversation history prepended after system prompt |
Key Design Choices:
| Principle | Description |
|---|---|
| Drop-in compatible | Same request/response format as any OpenAI-compatible chat API |
| Transparent proxy | Only intercepts messages for memory injection; all other fields passed through untouched |
| Tool-call safe | LLM tool_calls responses are preserved in session history for correct multi-turn tool chains |
| Hard failure, no fallback | Missing token → 401, invalid config → fail immediately. Never guess, degrade, or default |
| Per-request isolation | Fresh handler instance per HTTP request, no shared mutable state |
| Config externalized | LLM/Embedding keys stored per-token via Admin API; .env holds only infrastructure |
| Minimal dependencies | Zero Docker, zero Redis, zero ORM, zero message queue |
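
To make the "transparent proxy" principle from the table above concrete, here is a minimal Python sketch that rewrites only the `messages` field of a request body and forwards every other field unchanged. The helper name `enrich_body`, and the choice to carry memories as a leading system message, are assumptions made for illustration; they are not taken from MemoryBridge's source.

```python
import copy
from typing import Any


def enrich_body(body: dict[str, Any], memory_block: str) -> dict[str, Any]:
    """Return a copy of `body` with a memory system message prepended.

    Hypothetical helper: only `messages` is rewritten. Fields such as
    model, stream, temperature, or tools are passed through as-is.
    """
    enriched = copy.deepcopy(body)  # never mutate the caller's request
    if memory_block:
        memory_message = {"role": "system", "content": memory_block}
        enriched["messages"] = [memory_message, *enriched["messages"]]
    return enriched


request_body = {
    "model": "deepseek-chat",
    "temperature": 0.2,
    "messages": [{"role": "user", "content": "What is my name?"}],
}
forwarded = enrich_body(request_body, "Known facts: the user's name is Ming.")
```

Working on a copy keeps the original request intact, which fits the per-request isolation principle listed in the same table.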
- Python 3.11+
- Linux x86_64 / aarch64, or macOS x86_64 / arm64
# 1. Install
git clone https://github.com/51193/MemoryBridge && cd MemoryBridge
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
uv sync
# 2. Initialize (downloads Qdrant, creates databases, generates admin token)
uv run python src/memory_bridge/host_manager.py --init
# Output: ADMIN TOKEN (save this): <32-char hex>
# 3. Start
uv run python src/memory_bridge/host_manager.py
# Visit http://localhost:8000/health → {"status":"ok","qdrant":"connected"}

After starting, register a Service Token using the Admin Token from step 2 above. Each token carries its own LLM and Embedding configuration:
curl -X POST http://localhost:8000/v1/admin/tokens \
-H "Authorization: Bearer <admin_token>" \
-H "Content-Type: application/json" \
-d '{
"label": "my-llm",
"main_llm_base_url": "https://api.deepseek.com",
"main_llm_api_key": "sk-xxx",
"memory_llm_base_url": "https://api.deepseek.com",
"memory_llm_api_key": "sk-xxx",
"memory_llm_provider": "openai",
"embed_base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
"embed_api_key": "sk-xxx",
"embed_dims": 1024,
"embed_provider": "openai"
}'
# Returns: {"token": "<32-char service_token>", ...}

# Exactly like calling any LLM API: just use the service token
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Authorization: Bearer <service_token>" \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-chat",
"messages": [{"role": "user", "content": "Hi, my name is Ming"}],
"stream": false
}'

MemoryBridge will retrieve any relevant past memories, inject them into the system prompt, forward the enriched request to your LLM provider, and asynchronously store new memories from the conversation, all transparently.
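
The same call can be made from Python with only the standard library. The sketch below builds the request without sending it; `build_chat_request` is a hypothetical helper written for this example, and `BASE_URL` assumes the local address used in the quick start.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed: local address from the quick start


def build_chat_request(
    service_token: str,
    messages: list[dict],
    model: str = "deepseek-chat",
    stream: bool = False,
) -> urllib.request.Request:
    """Build an OpenAI-style chat request addressed to MemoryBridge."""
    payload = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {service_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "<service_token>",
    [{"role": "user", "content": "Hi, my name is Ming"}],
)

# Sending it requires a running MemoryBridge instance:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
#     print(reply["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, an existing OpenAI-style client should also work once its base URL and API key point at MemoryBridge and the service token.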
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Authorization: Bearer <service_token>" \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-chat",
"messages": [
{"role": "user", "content": "My name is Ming"},
{"role": "assistant", "content": "Hello Ming!"},
{"role": "user", "content": "What is my name?"}
],
"stream": true
}'

MemoryBridge enforces token authentication. There are two types:
| Type | Purpose | Created By |
|---|---|---|
| Admin Token | Register/manage Service Tokens | --init auto-generates |
| Service Token | Call `/v1/chat/completions` | Admin API |
Each Service Token carries its full LLM/Embedding configuration:
| Field | Required | Description |
|---|---|---|
| `label` | No | Human-readable label |
| `main_llm_base_url` | Yes | Chat LLM base URL (path `/v1/chat/completions` auto-appended) |
| `main_llm_api_key` | Yes | Chat LLM API key |
| `memory_llm_base_url` | Yes | Memory extraction LLM base URL |
| `memory_llm_api_key` | Yes | Memory extraction LLM API key |
| `memory_llm_provider` | Yes | Memory extraction LLM provider (e.g. `deepseek`, `dashscope`) |
| `memory_llm_body` | No | Transparent JSON (model, temperature, etc.) |
| `embed_base_url` | Yes | Embedding service base URL |
| `embed_api_key` | Yes | Embedding API key |
| `embed_dims` | Yes | Vector dimensions (e.g. 1024); must not be 0 |
| `embed_provider` | Yes | Embedding provider name (e.g. `openai`, `dashscope`) |
| `embed_body` | No | Transparent JSON (model, etc.) |
| `memory_enabled` | No | Enable memory (default: true) |
| `memory_limit` | No | Memories per retrieval (1–20, default: 5) |
| `session_window_size` | No | Session history window (1–100, default: 10) |
| `memory_prompt` | No | Custom memory extraction prompt |
Hard-failure semantics: If any required field is null, empty, or zero (for numeric fields), the token is treated as invalid and requests that use it are rejected with 401.
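
The rule above can be sketched in a few lines of Python. This is an illustration only: the names `REQUIRED_FIELDS`, `missing_required_fields`, and `auth_status` are invented for this example and do not come from MemoryBridge's code; the field list is copied from the table.

```python
from typing import Any

# Fields marked "Yes" in the Required column of the table above.
REQUIRED_FIELDS = (
    "main_llm_base_url", "main_llm_api_key",
    "memory_llm_base_url", "memory_llm_api_key", "memory_llm_provider",
    "embed_base_url", "embed_api_key", "embed_dims", "embed_provider",
)


def missing_required_fields(config: dict[str, Any]) -> list[str]:
    """List required fields that are absent, null, empty, or zero."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = config.get(name)
        if value is None or value == "" or value == 0:
            missing.append(name)
    return missing


def auth_status(config: dict[str, Any]) -> int:
    """Hard failure: any missing required field makes the token invalid."""
    return 401 if missing_required_fields(config) else 200


valid_config = {
    "main_llm_base_url": "https://api.deepseek.com",
    "main_llm_api_key": "sk-xxx",
    "memory_llm_base_url": "https://api.deepseek.com",
    "memory_llm_api_key": "sk-xxx",
    "memory_llm_provider": "openai",
    "embed_base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "embed_api_key": "sk-xxx",
    "embed_dims": 1024,
    "embed_provider": "openai",
}
```

Note that the check has no default branch: a zero `embed_dims` is rejected and never replaced with a guessed dimension, in line with the "never guess, degrade, or default" principle.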
| Method | Path | Description |
|---|---|---|
| `POST` | `/v1/chat/completions` | OpenAI-compatible chat, with memory injection |
| Method | Path | Description |
|---|---|---|
| `POST` | `/v1/admin/tokens` | Register a Service Token (201) |
| `GET` | `/v1/admin/tokens/{token}` | View config (API keys masked) |
| `POST` | `/v1/admin/tokens/{token}` | Update config (fields optional) |
| `DELETE` | `/v1/admin/tokens/{token}` | Delete a token (204) |
| Method | Path | Description |
|---|---|---|
| `GET` | `/health` | Health check (bridge + Qdrant status) |
| Status | Meaning |
|---|---|
| `200` | Success |
| `201` | Token registered |
| `204` | Token deleted |
| `401` | Token missing, invalid, or config parse failed |
| `403` | Service token used on admin endpoint |
| `404` | Token not found |
| `422` | Request validation failed |
| `500` | Unexpected internal error |
| `502` | Memory retrieval failed or LLM provider unreachable |
| Other | Passed through from LLM provider (status code + headers + body) |
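
On the client side, the table above can be turned into a small lookup. The sketch below is one possible way to do it; the names `describe_status` and `is_retryable` are hypothetical, and retrying only on 502 is an assumed client policy, not something MemoryBridge prescribes.

```python
# Status meanings copied from the table above.
BRIDGE_STATUS_MEANINGS = {
    200: "Success",
    201: "Token registered",
    204: "Token deleted",
    401: "Token missing, invalid, or config parse failed",
    403: "Service token used on admin endpoint",
    404: "Token not found",
    422: "Request validation failed",
    500: "Unexpected internal error",
    502: "Memory retrieval failed or LLM provider unreachable",
}


def describe_status(status: int) -> str:
    """Map a status code to its meaning; unlisted codes come from the provider."""
    return BRIDGE_STATUS_MEANINGS.get(status, "Passed through from LLM provider")


def is_retryable(status: int) -> bool:
    """Assumed policy: 502 may be transient, so only that code is retried."""
    return status == 502
```

Codes such as 401 or 422 indicate a problem with the token or the request itself, so repeating the same call unchanged is unlikely to help.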
Incoming request
  → TokenAuthMiddleware (auth + request-id + read raw JSON body)
  → router (parse messages, validate, create Handler instance)
  → ChatRequestHandler.execute()
      ├─ SessionStore.get()        ← Read recent conversation history
      ├─ MemoryManager.search()    ← Memory retrieval (Qdrant vector search)
      ├─ ContextBuilder.build()    ← Assemble context (memory block + request)
      ├─ _inject_history()         ← Prepend session history after system prompt
      ├─ response_sender.send_*()  ← Forward enriched request to LLM
      └─ _store_memory()           ← Async store (session + memories)
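
The history-injection step in the pipeline above can be pictured with a short Python sketch. It is not the project's `_inject_history()` implementation: the function below is a standalone approximation, and it assumes the session window is counted in messages (the default of 10 mirrors `session_window_size`).

```python
Message = dict[str, object]


def inject_history(
    messages: list[Message],
    history: list[Message],
    window_size: int = 10,
) -> list[Message]:
    """Insert recent session history after any leading system messages."""
    # Keep only the newest `window_size` history entries.
    recent = history[-window_size:] if window_size > 0 else []
    # Find where the leading run of system messages ends.
    split = 0
    while split < len(messages) and messages[split].get("role") == "system":
        split += 1
    return [*messages[:split], *recent, *messages[split:]]


history = [
    {"role": "user", "content": "My name is Ming"},
    {"role": "assistant", "content": "Hello Ming!"},
]
incoming = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is my name?"},
]
enriched = inject_history(incoming, history)
```

Here `enriched` holds the system prompt first, then the two history messages, then the new user question, so a stateless request still reaches the LLM with its recent context.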
src/memory_bridge/
├── main.py # FastAPI app factory + lifespan
├── host_manager.py # Process manager (Qdrant + API subprocess)
├── host_process.py # Subprocess lifecycle management
├── host_init.py # First-time initialization
├── config.py # Infrastructure config (pydantic-settings)
├── exceptions.py # Custom exceptions
├── api/ # Routes + middleware + request handling
│   ├── response_sender.py # Non-stream / streaming LLM dispatch
├── core/ # Memory / session / context / token / logging
├── providers/ # HTTP client (transparent proxy)
└── models/ # Request / response models
templates/
├── memory_template.md # Memory injection template (English)
└── memory_template_zh.md # Memory injection template (Chinese)
uv sync --extra dev # Install dev dependencies
uv run mypy src/ # Type check (strict mode)
uv run ruff check src/ tests/ # Lint
uv run pytest -v # Unit tests
uv run pytest --cov=src/memory_bridge --cov-fail-under=95 # Coverage gate

The .env template (committed, no secrets) includes MEMORY_SEARCH_TIMEOUT=30 (to prevent memory searches from hanging indefinitely) and MEMORY_STORE_TIMEOUT=120 (to protect background memory store operations that involve LLM fact extraction).
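
As a rough picture of how such a timeout can guard an async call, the sketch below wraps a search coroutine in `asyncio.wait_for`. It assumes the setting is expressed in seconds; `guarded_search`, `fake_search`, and `MemorySearchTimeout` are names invented for this example, not MemoryBridge APIs.

```python
import asyncio
import os

# Assumption: the value is in seconds; 30 mirrors the .env template.
SEARCH_TIMEOUT_SECONDS = float(os.environ.get("MEMORY_SEARCH_TIMEOUT", "30"))


class MemorySearchTimeout(Exception):
    """Raised when a memory search exceeds its time budget."""


async def fake_search(delay: float) -> list[str]:
    """Stand-in for a vector search that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return ["The user's name is Ming"]


async def guarded_search(search, timeout: float = SEARCH_TIMEOUT_SECONDS) -> list[str]:
    """Await `search`, failing loudly instead of hanging indefinitely."""
    try:
        return await asyncio.wait_for(search, timeout=timeout)
    except asyncio.TimeoutError:
        raise MemorySearchTimeout(f"memory search exceeded {timeout}s") from None
```

For example, `asyncio.run(guarded_search(fake_search(0.0), timeout=1.0))` returns the stand-in result, while a search slower than the timeout raises `MemorySearchTimeout` rather than silently continuing without memories.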
Test structure mirrors source:
tests/
├── api/ ← src/memory_bridge/api/
├── core/ ← src/memory_bridge/core/
├── providers/ ← src/memory_bridge/providers/
├── architecture/ # Architecture guard tests (boundary checks)
└── integration/ # Integration smoke tests
Download the single-file .pyz from GitHub Releases:
mkdir -p /opt/memorybridge && cd /opt/memorybridge
wget https://github.com/51193/MemoryBridge/releases/latest/download/memorybridge.pyz
python3 memorybridge.pyz --init # First-time setup
python3 memorybridge.pyz # Start

Apache 2.0. See also NOTICE for bundled open-source attributions, and docs/LICENSES.md for a full third-party license summary.