Skip to content

Fix #1897: bug(memos-local-plugin): Hermes background model status checks can trigger paid #1899

Open
Memtensor-AI wants to merge 1 commit into
dev-20260604-v2.0.19from
bugfix/autodev-1897
Open

Fix #1897: bug(memos-local-plugin): Hermes background model status checks can trigger paid #1899
Memtensor-AI wants to merge 1 commit into
dev-20260604-v2.0.19from
bugfix/autodev-1897

Conversation

@Memtensor-AI

Copy link
Copy Markdown
Collaborator

Description

Fixes #1897: paid LLM request storm on Hermes when the configured provider returns terminal errors (402 insufficient balance / 401 invalid key / 403). Issue reporter saw ~12,900 paid DeepSeek requests in 24 h tracking exactly the local system_model_status audit-row count.

Root cause: the system_model_status row name is misleading. It is not a health probe; it is the audit row written once per LLM call inside apps/memos-local-plugin/core/llm/client.ts::callWithFallback. With no circuit breaker, every pipeline subscriber (capture / session-relation / reward / L2 / L3 / skill / retrieval LLM filter / world-model) kept firing on every turn, induction, and closed episode, generating one paid request each even though the provider was permanently broken until the operator fixed billing.

Fix: per-LlmClient circuit breaker that trips on terminal provider errors (HTTP 401/402/403 or messages containing "insufficient balance" / "invalid api key" / "unauthorized" / "account suspended" / "billing"). When open, calls short-circuit inside the facade — zero paid requests, no further system_model_status="error" rows. Half-open after a 5-minute cool-down (configurable, min 30 s); the next call probes the provider and either closes the breaker on success or re-opens it on terminal failure. Host fallback rescues a call without tripping the breaker. A coalesced system_model_status="circuit_open" row is emitted at most once per ~25 s so the Logs viewer still surfaces the suppression event without replacing paid spam with audit-row spam. The breaker is enabled by default; legacy behaviour is available via circuitBreaker.enabled = false.

Tests: 9 new vitest cases under apps/memos-local-plugin/tests/unit/llm/client.test.ts covering trip on 402, trip on "insufficient balance" message, no trip on generic transient, coalescing under load, half-open close on success, host-fallback rescues without trip, legacy disabled mode, stats exposure, and re-open on terminal probe failure. All 59 LLM and 28 pipeline tests pass; tsc --noEmit clean. Three full-suite failures (storage migrator / traces-count / e2e episode merge) are pre-existing on the base branch dev-20260604-v2.0.19 and unrelated to this change.

Out of scope (tracked separately): 429 Retry-After handling (issue #1620), per-tool rate limits, daily budget caps.

Related Issue (Required): Fixes #1897

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (does not change functionality, e.g. code style improvements, linting)
  • Documentation update

How Has This Been Tested?

Automated tests are pending.

  • Unit Test
  • Test Script Or Test Steps (please provide)
  • Pipeline Automated API Test (please provide)

Checklist

  • I have performed a self-review of my own code
  • I have commented my code in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have created related documentation issue/PR in MemOS-Docs (if applicable)
  • I have linked the issue to this PR (if applicable)
  • I have mentioned the person who will review this PR

@MatthewZhuang, @CarltonXiang, @syzsunshine219 please review this PR.

Reviewer Checklist

…nal provider errors

Issue #1897 reported ~12,900 paid LLM requests in 24 h on Hermes
against a DeepSeek key with insufficient balance. The local
`system_model_status` row count (12,900) closely tracked the
provider-side `request_count` (11,344) for the same billing window.
The naming is misleading: `system_model_status` is not a health probe;
it is the audit row written once per LLM call (ok / fallback / error)
inside `core/llm/client.ts`. With no circuit breaker, every pipeline
subscriber (capture / session-relation / reward / L2 / L3 / skill /
retrieval LLM filter / world-model) kept firing on every turn / closed
episode / induction, generating one paid request each.

Add a per-`LlmClient` circuit breaker:

- Trips on terminal errors: HTTP 401/402/403 or messages containing
  `insufficient balance` / `invalid api key` / `unauthorized` /
  `account suspended` / `billing`.
- Open: short-circuits subsequent calls inside the facade without
  contacting the provider. Throws `MemosError(LLM_UNAVAILABLE)` with
  `details.circuitOpen=true` so existing catch blocks still work.
- Half-open after cool-down (default 5 min, configurable, min 30 s):
  next call probes the provider; success closes the breaker, terminal
  failure re-opens it for another cool-down.
- Host fallback rescues a call without tripping the breaker —
  fallback exists precisely to keep going when the primary is down.
- Coalesces `system_model_status="circuit_open"` audit rows to at
  most one per ~25 s while the breaker stays open, so we don't
  replace paid spam with audit-row spam.
- Exposes `circuitOpen` / `circuitOpenUntil` / `circuitOpenedReason`
  via `LlmClientStats` for the Overview viewer card.
- Enabled by default; legacy behaviour available via
  `circuitBreaker.enabled = false`.

Tests: 9 new vitest cases under `tests/unit/llm/client.test.ts`
covering trip on 402, trip on "insufficient balance" message, no
trip on generic transient, coalescing, half-open close on success,
host-fallback rescues without trip, disabled mode, stats fields,
and re-open on terminal probe failure. All 59 LLM and 28 pipeline
tests pass; `tsc --noEmit` clean.

Out of scope (tracked separately): 429 `Retry-After` handling
(issue #1620), per-tool rate limits, daily budget caps.
@Memtensor-AI

Copy link
Copy Markdown
Collaborator Author

✅ Automated Test Results: PASSED

All tests passed (35/35 executed, 36 skipped). memos_local_plugin/smoke: 0 passed, 1 skipped, memos_local_plugin/contract: 35 passed, 35 skipped. Duration: 5s

Branch: bugfix/autodev-1897

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-generated bug Something isn't working | 功能异常

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants