Strip reasoning_content from chat history before sending to the LLM #371
Open
jwzj720 wants to merge 1 commit into
NBI stores assistant messages in chat history with a `reasoning_content`
key (unconditionally, even as an empty string), then replays the full
history back to the LLM API on the next turn. `reasoning_content` (and
`reasoning`) are OUTPUT-only fields; strict-validating OpenAI-compatible
endpoints reject them on input. For example, Databricks model serving
(pydantic `extra="forbid"`) returns:
Error code: 400 - {"message":"messages.0.reasoning_content:
Extra inputs are not permitted"}
Because the key is always present in stored history, every request that
replays an assistant message is rejected.
Fix: add a `strip_reasoning_fields()` helper to both the OpenAI-compatible
and LiteLLM-compatible providers that returns a sanitized copy of the
messages list with `reasoning_content` and `reasoning` removed from each
message dict. It is applied in `completions()` right before the messages
are passed to the API client, replacing the prior `messages.copy()`. The
caller's list and NBI's stored history are left untouched (per-dict copy),
so reasoning is still available for the UI; only the outbound request is
sanitized. Both streaming and non-streaming paths use the sanitized list.
Adds focused unit tests asserting the helper removes the keys without
mutating its input, and that the outbound API call receives messages
without the reasoning fields.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mbektas
reviewed
Jun 11, 2026
DEFAULT_CONTEXT_WINDOW = 4096

def strip_reasoning_fields(messages: list[dict]) -> list[dict]:
Collaborator
@jwzj720 can you move this method to util.py to prevent duplication? Otherwise it looks good to me.
Problem
NBI stores assistant messages in chat history with a `reasoning_content` key and replays the full history back to the model on every subsequent turn. `reasoning_content` (and `reasoning`) are output-only fields. Strict-validating OpenAI-compatible endpoints reject them as input. For example, Databricks model serving (pydantic `extra="forbid"`) returns:

    Error code: 400 - {"message":"messages.0.reasoning_content: Extra inputs are not permitted"}

Because the key is written unconditionally (even as an empty string when there is no reasoning), the first turn of a chat works but every follow-up turn against such an endpoint fails with a 400.
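The failure mode can be reproduced without a real endpoint. The sketch below is a stdlib-only stand-in for a strict validator; it is not Databricks' or pydantic's code, and `ALLOWED_MESSAGE_KEYS` and `validate_messages` are hypothetical names chosen for illustration. It only mimics the effect of `extra="forbid"`: any key outside the accepted set fails the whole request. The message index in the error depends on where the assistant message sits in the list.

```python
# Hypothetical stand-in for a strict request validator. The real endpoint
# uses pydantic with extra="forbid"; this version only mimics the effect.
# The allowed key set is an assumption made for this example.
ALLOWED_MESSAGE_KEYS = {"role", "content", "name", "tool_calls", "tool_call_id"}


def validate_messages(messages: list[dict]) -> None:
    """Raise ValueError on the first key outside ALLOWED_MESSAGE_KEYS."""
    for index, message in enumerate(messages):
        for key in message:
            if key not in ALLOWED_MESSAGE_KEYS:
                raise ValueError(
                    f"messages.{index}.{key}: Extra inputs are not permitted"
                )


# First turn: the history holds only the user message, so validation passes.
first_turn = [{"role": "user", "content": "hi"}]
validate_messages(first_turn)

# Follow-up turn: the stored assistant reply carries reasoning_content,
# even though it is an empty string, and the whole request is rejected.
follow_up = first_turn + [
    {"role": "assistant", "content": "hello", "reasoning_content": ""},
    {"role": "user", "content": "and then?"},
]
try:
    validate_messages(follow_up)
    rejected = None
except ValueError as exc:
    rejected = str(exc)
```

An empty string is enough to trigger the rejection because the check is on the presence of the key, not on its value.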
Where it comes from
- Stored in `extension.py` (`add_message(..., {"role": "assistant", "content": ..., "reasoning_content": ...})`, both the buffered and streamed paths).
- Replayed in `base_chat_participant.py` (`messages = [system] + request.chat_history` → `chat_model.completions(messages, ...)`).
- Sent by the providers unmodified (`messages=messages.copy()`), so the rejected field goes out on the wire.

Fix
Add a `strip_reasoning_fields()` helper to the LiteLLM-compatible and OpenAI-compatible providers and apply it in `completions()` immediately before the API call (replacing the prior `messages.copy()`). It returns a sanitized copy with `reasoning_content` and `reasoning` removed from each message dict, without mutating the caller's list or NBI's stored history, so reasoning stays available to the UI and only the outbound request is cleaned. Both streaming and non-streaming paths use it.

These two providers are the ones that hit strict OpenAI-compatible / LiteLLM validators (including Databricks).
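For readers without the diff open, here is a minimal sketch of what such a helper could look like. Only the signature appears in the review excerpt; the body and the `REASONING_KEYS` constant are assumptions based on the behavior described above (per-dict copy, both keys removed, non-dict entries passed through), not the PR's actual code.

```python
# Assumed constant name; the two keys are the ones named in the description.
REASONING_KEYS = ("reasoning_content", "reasoning")


def strip_reasoning_fields(messages: list[dict]) -> list[dict]:
    """Return a new list whose dict entries lack the reasoning keys.

    Each dict is rebuilt (a shallow, per-dict copy), so neither the
    caller's list nor its message dicts are modified. Non-dict entries
    are passed through as-is.
    """
    sanitized = []
    for message in messages:
        if isinstance(message, dict):
            sanitized.append(
                {k: v for k, v in message.items() if k not in REASONING_KEYS}
            )
        else:
            sanitized.append(message)
    return sanitized


# Usage: the stored history keeps its reasoning, the outbound copy does not.
history = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello", "reasoning_content": ""},
]
outbound = strip_reasoning_fields(history)
```

Because the copy is shallow, nested values such as tool-call lists are shared between the stored and outbound messages. That is harmless as long as the API client does not mutate them.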
Tests
Added to `tests/test_openai_compatible_llm_provider.py`:
- `strip_reasoning_fields()` removes both keys without mutating its input, and leaves non-dict entries untouched.

Reported environment
The 400 was observed with:
- `claude-sonnet-4-6` (Databricks-hosted, reasoning on by default)
- `400 ... messages.0.reasoning_content: Extra inputs are not permitted`

The added test asserts the offending field is no longer present in the outbound request, which is exactly what the endpoint rejected.
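The outbound-request assertion described above might look roughly like the sketch below. It does not reproduce the PR's test file: `RecordingClient`, the standalone `completions()` function, and the inline copy of `strip_reasoning_fields()` are hypothetical stand-ins, kept in one block so the example runs on its own.

```python
class RecordingClient:
    """Hypothetical fake API client that records the kwargs of each call."""

    def __init__(self):
        self.calls = []

    def create(self, **kwargs):
        self.calls.append(kwargs)
        return {"choices": [{"message": {"role": "assistant", "content": "ok"}}]}


def strip_reasoning_fields(messages):
    """Assumed helper body: drop the reasoning keys from each message dict."""
    drop = ("reasoning_content", "reasoning")
    return [
        {k: v for k, v in m.items() if k not in drop} if isinstance(m, dict) else m
        for m in messages
    ]


def completions(client, messages):
    """Hypothetical provider entry point: sanitize, then call the client."""
    return client.create(messages=strip_reasoning_fields(messages))


def test_outbound_request_has_no_reasoning_fields():
    client = RecordingClient()
    history = [
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello", "reasoning_content": "thinking"},
    ]

    completions(client, history)

    sent = client.calls[0]["messages"]
    # The request that reaches the client carries neither reasoning key.
    assert all(
        "reasoning_content" not in m and "reasoning" not in m for m in sent
    )
    # The stored history still has its reasoning for the UI.
    assert history[1]["reasoning_content"] == "thinking"


test_outbound_request_has_no_reasoning_fields()
```

Asserting on what the fake client received, instead of on the helper's return value alone, checks that `completions()` actually routes the messages through the helper.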