fix(blockrun): surface Surf endpoint discovery so the agent stops guessing #67
Merged
A user hit a cascade: the agent guessed non-existent Surf paths
(market/concept-ranking, market/token-ranking), the /openapi.json
self-correction path 404'd, and after 3 failures the tool-failure
circuit breaker disabled BlockRun for the session.
Root cause was twofold and neither was the LLM "being dumb":
1. The system prompt advertised Surf as wildcards (/v1/surf/market/*) with
prose ("token rankings"), with no exact paths inline — so the model
completed the prose into a plausible-but-wrong path. The real path list
lives in a skill doc that isn't reliably loaded, and openapi discovery
was unreachable through this tool.
2. The gateway already returns the full valid-endpoint list under
`available` on a 404, but the tool buried it in fullOutput and only
showed the model "Not Found" — so the agent never saw the answer.
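Why openapi discovery was unreachable can be illustrated with a small sketch. The PR description states that the tool prepends `/api` while the openapi rewrite lives at the site root. The host name and the `toolUrl` helper below are hypothetical, used only to show the effect of that prefix:

```typescript
// Hypothetical base URL; the real gateway host is not given in this PR.
const GATEWAY_BASE = "https://gateway.example";

// Assumed behavior, per the PR description: every path gets an /api prefix.
function toolUrl(path: string): string {
  return `${GATEWAY_BASE}/api${path}`;
}

// The spec is served from the site root, so the prefixed URL misses it.
console.log(toolUrl("/openapi.json")); // https://gateway.example/api/openapi.json
```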
Fixes (prevention + cure):
- Inline the high-frequency Surf endpoints (market/ranking, fear-greed,
price, etf, options, exchange/*, liquidations, indicators) with exact
paths, plus an explicit "there is no market/token-ranking" guard against
the observed hallucination.
- Add an endpoint-discovery note telling the model that a wrong /v1/surf/
path returns `available` — read it and retry instead of looping.
- Surface the gateway's `available` list (and `message`) into the
model-visible output on any 4xx, straight from the live registry so it's
always complete and never drifts.
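A minimal sketch of the third fix, assuming the error body is already parsed JSON. Only the `available` and `message` field names and the `Array.isArray` guard come from this PR; `formatErrorForModel` and its output layout are hypothetical and not the actual code in `blockrun.ts`:

```typescript
// Assumed shape of a gateway error body. Only `available` and `message`
// are named in the PR; the types are left unknown and checked at runtime.
interface GatewayErrorBody {
  message?: unknown;
  available?: unknown;
}

// Hypothetical helper: builds the text the model sees for a failed request.
function formatErrorForModel(status: number, statusText: string, body: unknown): string {
  const lines: string[] = [`HTTP ${status} ${statusText}`];
  const is4xx = status >= 400 && status < 500;
  if (is4xx && typeof body === "object" && body !== null) {
    const { message, available } = body as GatewayErrorBody;
    if (typeof message === "string" && message.length > 0) {
      lines.push(`message: ${message}`);
    }
    // Array.isArray guard: 4xx bodies without an endpoint list are unaffected.
    if (Array.isArray(available)) {
      const paths = available.filter((p): p is string => typeof p === "string");
      if (paths.length > 0) {
        lines.push("available endpoints:", ...paths.map((p) => `  ${p}`));
      }
    }
  }
  return lines.join("\n");
}
```

With a body such as `{ message: "...", available: ["/v1/surf/market/ranking"] }` on a 404, the model-visible text would include the valid path instead of only "Not Found". A 4xx body with no `available` array yields just the status line.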
Regression: same repro prompt now picks /v1/surf/market/ranking on the
first try and returns data — no 404s, no circuit breaker.
Problem
A user hit this cascade (real report):
Reproduced locally: the agent guessed non-existent Surf paths, tried to self-correct via `/openapi.json` (which 404s through this tool), and tripped the tool-failure circuit breaker.
Root cause (two issues, neither is "the LLM being dumb")
1. The prompt invited guessing. Surf was advertised as wildcards (`/v1/surf/market/*`) + prose ("token rankings"), with no exact paths inline. The model completed the prose into a plausible-but-wrong path (`market/token-ranking`). The exact list lives in a skill doc that isn't reliably loaded, and `/openapi.json` only covers 17 of 83 Surf endpoints and is unreachable through this tool (it prepends `/api`, and the openapi rewrite is site-root).
2. The gateway already returned the answer; the tool hid it. A wrong `/v1/surf/...` path returns a 404 whose body lists every valid endpoint under `available` (all 83, straight from the live registry). But the tool put that in `fullOutput` and only showed the model `"Not Found"`, so the agent never saw it.
Fix (prevention + cure)
- Inline the high-frequency Surf endpoints (`market/ranking`, `fear-greed`, `price`, `etf`, `options`, `exchange/*`, liquidations, indicators) + an explicit guard: "there is no `market/token-ranking` or `market/concept-ranking`".
- Add an endpoint-discovery note: a wrong `/v1/surf/` path returns `available`; read it and retry instead of looping.
- Surface `available` + `message` into the model-visible output on any 4xx (`blockrun.ts`). The list comes straight from the live registry, so it's complete and never drifts. Guarded with `Array.isArray` so non-Surf 4xx bodies are unaffected.
Regression
Same repro prompt, after the fix:
(Tested with a forced model since the router's default pick `nvidia/deepseek-v4-flash` was provider-side 503 at the time; unrelated infra.)
Known limitation / follow-up
The inlined list in `context.ts` is hand-written, so it can drift from the gateway's `surf.ts` registry over time. The durable fix is a generated typed `Surf` tool whose endpoint enum is derived from `SURF_ENDPOINTS` (or `/openapi.json`), tracked as a follow-up. This PR is the immediate, low-risk stop-gap; the 404-surfacing cure already covers all 83 endpoints regardless of what's inlined.
Scope
Two files (`src/tools/blockrun.ts`, `src/agent/context.ts`), +18/-7. No behavior change for non-Surf endpoints. `npm run build` clean.
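The follow-up named under "Known limitation / follow-up" could look roughly like the sketch below, which derives a union type from a registry array so that an invalid path is rejected before any request is sent. This is an illustration under assumptions, not the planned implementation: the array here is a two-entry stand-in for the real `SURF_ENDPOINTS` (83 entries per this PR), `example/placeholder` is a made-up entry, and `isSurfEndpoint` and `resolveSurfPath` are hypothetical names.

```typescript
// Stand-in for the gateway registry. Only "market/ranking" is a path
// confirmed by this PR; "example/placeholder" is purely illustrative.
const SURF_ENDPOINTS = ["market/ranking", "example/placeholder"] as const;

// Union of the literal strings above; it tracks the array automatically.
type SurfEndpoint = (typeof SURF_ENDPOINTS)[number];

// Hypothetical runtime check that narrows a model-supplied string.
function isSurfEndpoint(path: string): path is SurfEndpoint {
  return SURF_ENDPOINTS.some((e) => e === path);
}

type Resolved =
  | { ok: true; url: string }
  | { ok: false; available: readonly string[] };

// Hypothetical resolver: a valid path becomes a /v1/surf/ URL, an invalid
// one returns the full list so the caller can retry without a round trip.
function resolveSurfPath(path: string): Resolved {
  if (isSurfEndpoint(path)) {
    return { ok: true, url: `/v1/surf/${path}` };
  }
  return { ok: false, available: SURF_ENDPOINTS };
}
```

Because the type is derived from the array, adding or removing a registry entry changes the accepted set in one place, which is the drift the hand-written list in `context.ts` cannot avoid.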