
Strip reasoning_content from chat history before sending to the LLM #371

Open
jwzj720 wants to merge 1 commit into plmbr:main from jwzj720:fix/strip-reasoning-content-from-request

jwzj720 commented Jun 9, 2026

Problem

NBI stores assistant messages in chat history with a reasoning_content key and replays the full history back to the model on every subsequent turn. reasoning_content (and reasoning) are output-only fields. Strict-validating OpenAI-compatible endpoints reject them as input — for example Databricks model serving (pydantic extra="forbid") returns:

Error code: 400 - {"message":"messages.0.reasoning_content: Extra inputs are not permitted"}

Because the key is written unconditionally (even as an empty string when there is no reasoning), the first turn of a chat works but every follow-up turn against such an endpoint 400s.

Where it comes from

  • The assistant message is stored with the key in extension.py (add_message(..., {"role": "assistant", "content": ..., "reasoning_content": ...}), both the buffered and streamed paths).
  • It is replayed via base_chat_participant.py (messages = [system] + request.chat_history → chat_model.completions(messages, ...)).
  • The LiteLLM and OpenAI-compatible providers forward the messages to the API unmodified (messages=messages.copy()), so the rejected field goes out on the wire.
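The replay path above can be reproduced without NBI or a live endpoint. This is a minimal sketch, assuming a hypothetical stand-in validator (`first_extra_input`) that mimics pydantic's `extra="forbid"` behavior over an assumed set of allowed message keys; it is not Databricks' actual schema. It shows why the first turn passes and the second fails. The index in the error depends on where the assistant message lands in the list, so it need not match the `messages.0` in the Databricks log.

```python
from typing import Optional

# Assumption: a plausible set of keys a strict chat endpoint accepts.
ALLOWED_MESSAGE_KEYS = {"role", "content", "name", "tool_calls", "tool_call_id"}


def first_extra_input(messages: list) -> Optional[str]:
    """Return the first 'Extra inputs' error a strict validator would raise.

    Mimics pydantic extra="forbid": any key outside ALLOWED_MESSAGE_KEYS is
    reported as messages.<index>.<key>. Returns None if the list is clean.
    """
    for i, msg in enumerate(messages):
        for key in msg:
            if key not in ALLOWED_MESSAGE_KEYS:
                return f"messages.{i}.{key}: Extra inputs are not permitted"
    return None


system = {"role": "system", "content": "You are a helpful assistant."}
history = [{"role": "user", "content": "first prompt"}]

# Turn 1: no assistant message has been stored yet, so the request is clean.
print(first_extra_input([system] + history))  # None

# The reply is stored with reasoning_content, even when there was no reasoning.
history.append({"role": "assistant", "content": "answer", "reasoning_content": ""})
history.append({"role": "user", "content": "second prompt"})

# Turn 2: the replayed history now carries the output-only key.
print(first_extra_input([system] + history))
# messages.2.reasoning_content: Extra inputs are not permitted
```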

Fix

Add a strip_reasoning_fields() helper to the LiteLLM-compatible and OpenAI-compatible providers and apply it in completions() immediately before the API call (replacing the prior messages.copy()). It returns a sanitized copy with reasoning_content and reasoning removed from each message dict, without mutating the caller's list or NBI's stored history — so reasoning stays available to the UI and only the outbound request is cleaned. Both streaming and non-streaming paths use it.
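A minimal sketch of such a helper, written from the description above. Only the function's signature appears on this page, so the body and the `REASONING_KEYS` constant are assumptions rather than the PR's actual code:

```python
REASONING_KEYS = ("reasoning_content", "reasoning")  # output-only fields


def strip_reasoning_fields(messages: list[dict]) -> list[dict]:
    """Return a sanitized copy of `messages` for the outbound request.

    Each dict entry is rebuilt without the reasoning keys, so neither the
    caller's list nor the dicts it holds (the stored chat history) are
    mutated. Entries that are not dicts are passed through unchanged.
    """
    sanitized = []
    for message in messages:
        if isinstance(message, dict):
            sanitized.append(
                {k: v for k, v in message.items() if k not in REASONING_KEYS}
            )
        else:
            sanitized.append(message)
    return sanitized


# Usage: only the outbound list loses the reasoning keys.
stored = [{"role": "assistant", "content": "hi", "reasoning_content": "thinking..."}]
outbound = strip_reasoning_fields(stored)
print(outbound)  # [{'role': 'assistant', 'content': 'hi'}]
print(stored[0]["reasoning_content"])  # thinking...
```

Building new dicts (rather than deleting keys in place) is what keeps the UI's copy of the reasoning intact.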

These two providers are the ones that hit strict OpenAI-compatible / LiteLLM validators (including Databricks).

Tests

Added to tests/test_openai_compatible_llm_provider.py:

  • strip_reasoning_fields() removes both keys without mutating its input, and leaves non-dict entries untouched;
  • the outbound API call (mocked client) receives messages with no reasoning fields, while the input/stored history is preserved.
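The test file's contents are not shown on this page; the following is one way the two tests could be written with `unittest.mock`. `send_completion` is a hypothetical stand-in for a provider's `completions()` call site, and the mock targets `client.chat.completions.create` as in the `openai` Python SDK (a LiteLLM-based provider would patch its own entry point instead). Both test functions run under pytest as-is:

```python
from unittest.mock import MagicMock

REASONING_KEYS = ("reasoning_content", "reasoning")


def strip_reasoning_fields(messages: list[dict]) -> list[dict]:
    # Rebuild dicts without reasoning keys; pass non-dict entries through.
    return [
        {k: v for k, v in m.items() if k not in REASONING_KEYS}
        if isinstance(m, dict)
        else m
        for m in messages
    ]


def send_completion(client, model: str, messages: list[dict], stream: bool = False):
    # Hypothetical stand-in for the provider's completions() call site.
    return client.chat.completions.create(
        model=model, messages=strip_reasoning_fields(messages), stream=stream
    )


def test_strip_reasoning_fields_does_not_mutate_input():
    original = [
        {"role": "assistant", "content": "a", "reasoning_content": "r", "reasoning": "r"},
        "not-a-dict",
    ]
    result = strip_reasoning_fields(original)
    assert result == [{"role": "assistant", "content": "a"}, "not-a-dict"]
    assert original[0]["reasoning_content"] == "r"  # input left intact
    assert result[1] is original[1]  # non-dict entry passed through as-is


def test_outbound_request_has_no_reasoning_fields():
    history = [
        {"role": "user", "content": "q1"},
        {"role": "assistant", "content": "a1", "reasoning_content": ""},
        {"role": "user", "content": "q2"},
    ]
    for stream in (False, True):  # both paths send the sanitized list
        client = MagicMock()
        send_completion(client, "test-model", history, stream=stream)
        sent = client.chat.completions.create.call_args.kwargs["messages"]
        assert all("reasoning_content" not in m and "reasoning" not in m for m in sent)
    assert "reasoning_content" in history[1]  # stored history preserved
```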
$ python -m pytest tests/test_openai_compatible_llm_provider.py -q
4 passed

Reported environment

The 400 was observed with:

  • NBI: 5.0.1
  • Provider: OpenAI-compatible / LiteLLM → Databricks model serving
  • Model: claude-sonnet-4-6 (Databricks-hosted, reasoning on by default)
  • Repro: ask a prompt, let it answer, then send a second prompt → server log shows 400 ... messages.0.reasoning_content: Extra inputs are not permitted.

The added test asserts the offending field is no longer present in the outbound request, which is exactly what the endpoint rejected.

NBI stores assistant messages in chat history with a `reasoning_content`
key (unconditionally, even as an empty string), then replays the full
history back to the LLM API on the next turn. `reasoning_content` (and
`reasoning`) are OUTPUT-only fields; strict-validating OpenAI-compatible
endpoints reject them on input. For example, Databricks model serving
(pydantic `extra="forbid"`) returns:

    Error code: 400 - {"message":"messages.0.reasoning_content: Extra inputs are not permitted"}

Because the key is always present in stored history, every follow-up
turn's request is rejected; only the first turn, which has no stored
assistant message yet, succeeds.

Fix: add a `strip_reasoning_fields()` helper to both the OpenAI-compatible
and LiteLLM-compatible providers that returns a sanitized copy of the
messages list with `reasoning_content` and `reasoning` removed from each
message dict. It is applied in `completions()` right before the messages
are passed to the API client, replacing the prior `messages.copy()`. The
caller's list and NBI's stored history are left untouched (per-dict copy),
so reasoning is still available for the UI; only the outbound request is
sanitized. Both streaming and non-streaming paths use the sanitized list.

Adds focused unit tests asserting the helper removes the keys without
mutating its input, and that the outbound API call receives messages
without the reasoning fields.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

DEFAULT_CONTEXT_WINDOW = 4096


def strip_reasoning_fields(messages: list[dict]) -> list[dict]:

Collaborator

@jwzj720 can you move this method to util.py to prevent duplication? Otherwise it looks good to me.
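The reviewer's suggestion amounts to defining the helper once and importing it in both providers. A single-file sketch of that layout; the `util.py` placement comes from the comment, while the class names, the `build_request` method, and the import path are hypothetical:

```python
# ---- util.py (shared module, as the review suggests) ----
REASONING_KEYS = ("reasoning_content", "reasoning")


def strip_reasoning_fields(messages: list[dict]) -> list[dict]:
    """Return copies of the messages without output-only reasoning keys."""
    return [
        {k: v for k, v in m.items() if k not in REASONING_KEYS}
        if isinstance(m, dict)
        else m
        for m in messages
    ]


# ---- provider modules (hypothetical names) ----
# Each provider would import the shared helper, e.g.
#     from .util import strip_reasoning_fields   # path is an assumption
# instead of carrying its own copy, so a future change (say, another
# output-only key) is made in one place.


class OpenAICompatibleProviderSketch:
    def build_request(self, model: str, messages: list[dict]) -> dict:
        return {"model": model, "messages": strip_reasoning_fields(messages)}


class LiteLLMCompatibleProviderSketch:
    def build_request(self, model: str, messages: list[dict]) -> dict:
        return {"model": model, "messages": strip_reasoning_fields(messages)}
```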

mbektas added the blocked label (Blocked due to conflicts or no response from author) on Jun 16, 2026