Summary
When running on zai-org/GLM-5.2-FP8, the model occasionally emits a tool_use block whose input matches a different tool's schema than the one named in the call. NCode's Zod validator rejects it, the model retries in a loop, and the turn exhausts itself — visible to the user as garbled thinking and torn output that looks like transcript corruption.
This is a model-side argument-shape confusion, but NCode could do more to prevent it, detect it, and help the user recover.
Observed mismatches (one query chain)
- Bash called with {content, file_path} (Write's shape), missing command
- Write called with {command, description, timeout} (Bash's shape), missing file_path and content
- Write called with content as an object instead of a string
Counts on a single affected chain: 12 "Error normalizing tool input" events and roughly 16 tool rejections. The API returned 200 with stop_reason=tool_use every time, with no overload and no content filter. The on-disk JSONL transcript is fully valid (0 malformed lines), so this is not a storage or harness corruption bug.
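The three mismatches above can be told apart mechanically, which is the kind of check a distinct schema-swap event would need. The sketch below is only an illustration: the `TOOLS` table, the `Diagnosis` type, and `diagnose` are hypothetical names, the schemas are simplified guesses at the Bash and Write parameters listed above, and it uses plain key and type checks rather than NCode's actual Zod schemas.

```typescript
// Hypothetical, simplified tool schemas: required keys mapped to the expected typeof result.
// These are assumptions based on the argument shapes named in this report.
type FieldType = "string" | "number" | "boolean" | "object";
type ToolSchema = { required: Record<string, FieldType> };

const TOOLS: Record<string, ToolSchema> = {
  Bash: { required: { command: "string" } },
  Write: { required: { file_path: "string", content: "string" } },
};

type Diagnosis =
  | { kind: "ok" }
  | { kind: "schema_swap"; named: string; looksLike: string; missing: string[] }
  | { kind: "invalid"; named: string; missing: string[]; wrongType: string[] };

function diagnose(named: string, input: Record<string, unknown>): Diagnosis {
  const schema = TOOLS[named];
  // Unknown tool name: report as invalid with nothing to compare against.
  if (!schema) return { kind: "invalid", named, missing: [], wrongType: [] };

  // Required keys that are absent, and required keys present with the wrong type.
  const missing = Object.keys(schema.required).filter((k) => !(k in input));
  const wrongType = Object.entries(schema.required)
    .filter(([k, t]) => k in input && typeof input[k] !== t)
    .map(([k]) => k);
  if (missing.length === 0 && wrongType.length === 0) return { kind: "ok" };

  // Swap check: the named tool is missing required keys, and some OTHER tool
  // has every one of its required keys present in this input.
  if (missing.length > 0) {
    for (const [other, s] of Object.entries(TOOLS)) {
      if (other === named) continue;
      if (Object.keys(s.required).every((k) => k in input)) {
        return { kind: "schema_swap", named, looksLike: other, missing };
      }
    }
  }
  return { kind: "invalid", named, missing, wrongType };
}

// The three observed mismatches, in order:
console.log(diagnose("Bash", { content: "x", file_path: "/tmp/a" }).kind); // schema_swap
console.log(diagnose("Write", { command: "ls", description: "list", timeout: 1000 }).kind); // schema_swap
console.log(diagnose("Write", { file_path: "/tmp/a", content: { text: "x" } }).kind); // invalid
```

A `schema_swap` result is what would map to a dedicated telemetry event, while the third case (right keys, wrong type) stays a plain validation error that names the offending field.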
What NCode could do to help
- Schema-swap detection. When a tool_use is rejected for unrecognized_keys or invalid_type on a required field, check whether the rejected keys match another tool's required params. If so, log a distinct event (ncode_tool_use_schema_swap) separate from generic ncode_tool_use_error, so this class of glitch is observable in telemetry instead of buried in "Error normalizing tool input".
- Break the retry loop. When the same tool_use call fails the same schema validation N times in a row on the same queryChainId (say 3+), stop re-emitting the tool result and inject a structured system message telling the model its last tool call used the wrong argument shape, instead of letting Zod's raw error bounce around the context until the turn budget is gone.
- User-visible signal. When this happens, surface a one-line hint to the user in the UI (e.g. "model emitted a malformed tool call N times; start a fresh turn or /compact"). Right now the only evidence is buried in the debug log; the user just sees the model spinning.
- Resume guardrail. On --resume, if the last assistant message in the transcript contains a tool_use whose input fails the named tool's schema, warn the user before sending the next prompt, because the malformed message stays in history and can re-trigger the loop on resume. Optionally offer to strip it.
- Stricter input coercion at the boundary. For tools where a field has a known wrong-shape pattern (e.g. content arriving as an object instead of a string), consider a targeted coercion or a clearer error message that tells the model which field it got wrong, rather than the full Zod dump.
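To make the retry-loop and clearer-error suggestions concrete, here is a minimal sketch of a per-chain streak counter. Everything in it is hypothetical: `ValidationFailure`, `Action`, and `RetryLoopBreaker` are illustrative names, not NCode internals, and how the corrective message is actually injected into the conversation is left out. The threshold of 3 follows the "say 3+" suggestion above.

```typescript
// Hypothetical shape of one schema-validation failure; field names are illustrative.
interface ValidationFailure {
  queryChainId: string;
  toolName: string;
  missing: string[]; // required keys absent from the input
  unexpected: string[]; // keys the named tool does not accept
}

type Action =
  | { kind: "return_raw_error" }
  | { kind: "inject_correction"; message: string; userHint: string };

class RetryLoopBreaker {
  // Per chain: signature of the last failure and how many times in a row it occurred.
  private streaks = new Map<string, { signature: string; count: number }>();
  private readonly threshold: number;

  constructor(threshold = 3) {
    this.threshold = threshold;
  }

  onFailure(f: ValidationFailure): Action {
    // Two failures count as "the same" if tool name and offending keys match.
    const signature = JSON.stringify([
      f.toolName,
      [...f.missing].sort(),
      [...f.unexpected].sort(),
    ]);
    const prev = this.streaks.get(f.queryChainId);
    const count = prev && prev.signature === signature ? prev.count + 1 : 1;
    this.streaks.set(f.queryChainId, { signature, count });

    // Below the threshold, behave as today and hand back the validator's error.
    if (count < this.threshold) return { kind: "return_raw_error" };

    // At or past the threshold, replace the raw dump with a short, field-level message.
    const missing = f.missing.join(", ") || "none";
    const unexpected = f.unexpected.join(", ") || "none";
    return {
      kind: "inject_correction",
      message:
        `Your last ${count} calls to ${f.toolName} used the wrong argument shape. ` +
        `Missing required fields: ${missing}. Unexpected fields: ${unexpected}. ` +
        `Re-read the ${f.toolName} schema before calling it again.`,
      userHint: `model emitted a malformed tool call ${count} times; start a fresh turn or /compact`,
    };
  }

  // A successful tool call ends the streak for that chain.
  onSuccess(queryChainId: string): void {
    this.streaks.delete(queryChainId);
  }
}
```

Keying the streak on the failure signature rather than on the tool name alone means a model that alternates between two different wrong shapes restarts the count each time; whether that is the right trade-off depends on how the loop looks in practice.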
Why this matters
The current UX presents as "corruption in thinking + output," which sent me down a rabbit hole auditing the JSONL and debug log before realizing the transcript was clean. A distinct event + an early break + a UI hint would have made this a one-line diagnosis instead of a multi-hour forensic dig.
Environment
- Model: /data/models/hf/zai-org__GLM-5.2-FP8
- Permission mode: bypassPermissions
- Affects: resumed sessions (the malformed tool_use persists in history)
- Session ID (for correlating against telemetry, if useful): cfc2b511-c223-42ef-a5b8-c1b5c4ef3380
- queryChainId: 76ebaf5a-3915-4da0-ac2e-e2bcd3e2975c
Happy to provide a redacted excerpt of the relevant debug-log lines if helpful — I'm not attaching the raw log since NCode debug logs contain env vars / config snapshots / file contents and shouldn't be uploaded to a public tracker.