Fix #1897: bug(memos-local-plugin): Hermes background model status checks can trigger paid #1899
Open
Memtensor-AI wants to merge 1 commit into
Open
Fix #1897: bug(memos-local-plugin): Hermes background model status checks can trigger paid #1899Memtensor-AI wants to merge 1 commit into
Memtensor-AI wants to merge 1 commit into
Conversation
…nal provider errors Issue #1897 reported ~12,900 paid LLM requests in 24 h on Hermes against a DeepSeek key with insufficient balance. The local `system_model_status` row count (12,900) closely tracked the provider-side `request_count` (11,344) for the same billing window. The naming is misleading: `system_model_status` is not a health probe; it is the audit row written once per LLM call (ok / fallback / error) inside `core/llm/client.ts`. With no circuit breaker, every pipeline subscriber (capture / session-relation / reward / L2 / L3 / skill / retrieval LLM filter / world-model) kept firing on every turn / closed episode / induction, generating one paid request each. Add a per-`LlmClient` circuit breaker: - Trips on terminal errors: HTTP 401/402/403 or messages containing `insufficient balance` / `invalid api key` / `unauthorized` / `account suspended` / `billing`. - Open: short-circuits subsequent calls inside the facade without contacting the provider. Throws `MemosError(LLM_UNAVAILABLE)` with `details.circuitOpen=true` so existing catch blocks still work. - Half-open after cool-down (default 5 min, configurable, min 30 s): next call probes the provider; success closes the breaker, terminal failure re-opens it for another cool-down. - Host fallback rescues a call without tripping the breaker — fallback exists precisely to keep going when the primary is down. - Coalesces `system_model_status="circuit_open"` audit rows to at most one per ~25 s while the breaker stays open, so we don't replace paid spam with audit-row spam. - Exposes `circuitOpen` / `circuitOpenUntil` / `circuitOpenedReason` via `LlmClientStats` for the Overview viewer card. - Enabled by default; legacy behaviour available via `circuitBreaker.enabled = false`. Tests: 9 new vitest cases under `tests/unit/llm/client.test.ts` covering trip on 402, trip on "insufficient balance" message, no trip on generic transient, coalescing, half-open close on success, host-fallback rescues without trip, disabled mode, stats fields, and re-open on terminal probe failure. All 59 LLM and 28 pipeline tests pass; `tsc --noEmit` clean. Out of scope (tracked separately): 429 `Retry-After` handling (issue #1620), per-tool rate limits, daily budget caps.
Collaborator
Author
✅ Automated Test Results: PASSEDAll tests passed (35/35 executed, 36 skipped). memos_local_plugin/smoke: 0 passed, 1 skipped, memos_local_plugin/contract: 35 passed, 35 skipped. Duration: 5s Branch: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Fixes #1897: paid LLM request storm on Hermes when the configured provider returns terminal errors (402 insufficient balance / 401 invalid key / 403). Issue reporter saw ~12,900 paid DeepSeek requests in 24 h tracking exactly the local
system_model_statusaudit-row count.Root cause: the
system_model_statusrow name is misleading. It is not a health probe; it is the audit row written once per LLM call insideapps/memos-local-plugin/core/llm/client.ts::callWithFallback. With no circuit breaker, every pipeline subscriber (capture / session-relation / reward / L2 / L3 / skill / retrieval LLM filter / world-model) kept firing on every turn, induction, and closed episode, generating one paid request each even though the provider was permanently broken until the operator fixed billing.Fix: per-
LlmClientcircuit breaker that trips on terminal provider errors (HTTP 401/402/403 or messages containing "insufficient balance" / "invalid api key" / "unauthorized" / "account suspended" / "billing"). When open, calls short-circuit inside the facade — zero paid requests, no furthersystem_model_status="error"rows. Half-open after a 5-minute cool-down (configurable, min 30 s); the next call probes the provider and either closes the breaker on success or re-opens it on terminal failure. Host fallback rescues a call without tripping the breaker. A coalescedsystem_model_status="circuit_open"row is emitted at most once per ~25 s so the Logs viewer still surfaces the suppression event without replacing paid spam with audit-row spam. The breaker is enabled by default; legacy behaviour is available viacircuitBreaker.enabled = false.Tests: 9 new vitest cases under
apps/memos-local-plugin/tests/unit/llm/client.test.tscovering trip on 402, trip on "insufficient balance" message, no trip on generic transient, coalescing under load, half-open close on success, host-fallback rescues without trip, legacy disabled mode, stats exposure, and re-open on terminal probe failure. All 59 LLM and 28 pipeline tests pass;tsc --noEmitclean. Three full-suite failures (storage migrator / traces-count / e2e episode merge) are pre-existing on the base branch dev-20260604-v2.0.19 and unrelated to this change.Out of scope (tracked separately): 429
Retry-Afterhandling (issue #1620), per-tool rate limits, daily budget caps.Related Issue (Required): Fixes #1897
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Automated tests are pending.
Checklist
@MatthewZhuang, @CarltonXiang, @syzsunshine219 please review this PR.
Reviewer Checklist