fix: prevent tool-call preface text leaks#9075
Conversation
There was a problem hiding this comment.
Hey - I've found 4 issues, and left some high level feedback:
- In
tool_loop_agent_runner.step, the newif not llm_resp.tools_call_nameguard now controls allllm_resultemissions; consider restructuring this block (e.g., early-return or a dedicated helper for tool-call turns) to make the separation between internal tool-call narration and user-visible results more explicit and easier to maintain. - The string-pattern loop in
_normalize_contentthat strips[{text=..., type=text}]shells is quite heuristic and relies on a magic length cap (8192); consider extracting this into a small helper with clearer intent and tighter conditions so it doesn’t accidentally rewrite legitimate content that happens to start/end with the same tokens. - In
_normalize_content, the early return for repr-shell strings bypasses the subsequent JSON-array normalization logic; if these repr strings can sometimes contain list-like or mixed content, it may be worth passing the simplified string through the same normalization path to keep behavior consistent across inputs.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `tool_loop_agent_runner.step`, the new `if not llm_resp.tools_call_name` guard now controls all `llm_result` emissions; consider restructuring this block (e.g., early-return or a dedicated helper for tool-call turns) to make the separation between internal tool-call narration and user-visible results more explicit and easier to maintain.
- The string-pattern loop in `_normalize_content` that strips `[{text=..., type=text}]` shells is quite heuristic and relies on a magic length cap (8192); consider extracting this into a small helper with clearer intent and tighter conditions so it doesn’t accidentally rewrite legitimate content that happens to start/end with the same tokens.
- In `_normalize_content`, the early return for repr-shell strings bypasses the subsequent JSON-array normalization logic; if these repr strings can sometimes contain list-like or mixed content, it may be worth passing the simplified string through the same normalization path to keep behavior consistent across inputs.
## Individual Comments
### Comment 1
<location path="tests/test_tool_loop_agent_runner.py" line_range="511-520" />
<code_context>
+@pytest.mark.asyncio
+async def test_tool_call_assistant_preface_is_not_sent_as_llm_result(
+ runner, provider_request, mock_tool_executor, mock_hooks, mock_provider
+):
+ """Assistant preface text on a tool-call turn must stay internal."""
+ mock_provider.should_call_tools = True
+ mock_provider.max_calls_before_normal_response = 1
+
+ await runner.reset(
+ provider=mock_provider,
+ request=provider_request,
+ run_context=ContextWrapper(context=None),
+ tool_executor=mock_tool_executor,
+ agent_hooks=mock_hooks,
+ streaming=False,
+ )
+
+ responses = []
+ async for response in runner.step_until_done(3):
+ responses.append(response)
+
+ llm_texts = [
+ response.data["chain"].get_plain_text()
+ for response in responses
+ if response.type == "llm_result"
+ ]
+ assert "我需要使用工具来帮助您" not in llm_texts
+ assert llm_texts == ["这是我的最终回答"]
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add an assertion that no llm_result is emitted on the tool-call turn itself, to more directly prove the preface text is suppressed.
Currently the test only verifies suppression indirectly by checking that collected `llm_result` texts equal the final answer. To better guard against regressions, also assert that exactly one `llm_result` is emitted (or that the tool-call turn produces none). For instance, inspect `responses` so that any response corresponding to a tool call from the provider yields no `llm_result`, directly tying the test behavior to the bug this PR fixes.
Suggested implementation:
```python
responses = []
async for response in runner.step_until_done(3):
responses.append(response)
# Collect all llm_result responses and their texts
llm_results = [response for response in responses if response.type == "llm_result"]
llm_texts = [
response.data["chain"].get_plain_text()
for response in llm_results
]
# There must be exactly one llm_result emitted, corresponding to the final answer.
# This ensures that no llm_result is emitted on the earlier tool-call turn.
assert len(llm_results) == 1
# Assistant preface text must never surface in llm_result outputs.
assert "我需要使用工具来帮助您" not in llm_texts
assert llm_texts == ["这是我的最终回答"]
```
To more directly tie the assertion to the tool-call turn (as suggested in the review), you can:
1. Identify responses that correspond to tool calls (for example, by checking a specific `response.type` such as `"tool_call"` or a similar field your framework uses).
2. Add an assertion that there is no `llm_result` response at the same step/turn as those tool-call responses. For instance, group responses by `response.step_index` or equivalent, and assert that steps containing tool-call responses have no accompanying `llm_result`.
These additional checks should be implemented once you confirm the concrete shape and fields of the `response` objects in your test harness.
</issue_to_address>
### Comment 2
<location path="tests/test_tool_loop_agent_runner.py" line_range="527-530" />
<code_context>
+ async for response in runner.step_until_done(3):
+ responses.append(response)
+
+ llm_texts = [
+ response.data["chain"].get_plain_text()
+ for response in responses
+ if response.type == "llm_result"
+ ]
+ assert "我需要使用工具来帮助您" not in llm_texts
+ assert llm_texts == ["这是我的最终回答"]
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Consider asserting the sequence of response types to document the intended control flow of tool calls vs final answer.
Beyond asserting the text values, also assert the ordered sequence of `response.type` values (e.g., tool-related types followed by a single `llm_result`). This makes the test more robust against regressions in the tool loop and clarifies that the intermediate assistant preface must not be an `llm_result`, while the final answer must be.
```suggestion
responses = []
async for response in runner.step_until_done(3):
responses.append(response)
# Assert control flow: tool-related responses first, then a single final llm_result
response_types = [response.type for response in responses]
assert response_types[-1] == "llm_result"
assert "llm_result" not in response_types[:-1]
llm_texts = [
response.data["chain"].get_plain_text()
for response in responses
if response.type == "llm_result"
]
assert "我需要使用工具来帮助您" not in llm_texts
assert llm_texts == ["这是我的最终回答"]
```
</issue_to_address>
### Comment 3
<location path="tests/test_openai_source.py" line_range="63-66" />
<code_context>
)
+def test_normalize_content_handles_text_part_repr_string():
+ raw = "[{text=I will use the file reading tool., type=text}]"
+
+ assert ProviderOpenAIOfficial._normalize_content(raw) == (
+ "I will use the file reading tool."
+ )
</code_context>
<issue_to_address>
**suggestion (testing):** Add a companion test covering the strip=False case for text-part repr strings.
The tests currently cover `_normalize_content` with `strip=True`, but the function has different behavior when `strip=False`. Please add a test that uses a `raw` repr string with leading/trailing whitespace and asserts the result of `_normalize_content(raw, strip=False)` to verify normalization when stripping is disabled.
</issue_to_address>
### Comment 4
<location path="tests/test_openai_source.py" line_range="71-74" />
<code_context>
+ )
+
+
+def test_normalize_content_drops_empty_nested_text_part_repr_string():
+ raw = "[{text=[{text=[{text=, type=text}], type=text}], type=text}]"
+
+ assert ProviderOpenAIOfficial._normalize_content(raw) == ""
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Cover a nested repr string where the innermost text is non-empty to ensure we return the actual text, not an empty string.
Please also add a nested repr case with non-empty content, e.g. `[{text=[{text=[{text=hello, type=text}], type=text}], type=text}]`, and assert that `_normalize_content` returns `"hello"`. This will confirm the loop handles multiple layers correctly and doesn’t strip valid text.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Code Review
This pull request prevents assistant preface text from being sent as an LLM result during tool-call turns in the tool loop agent runner, and adds handling in the OpenAI source to normalize non-standard string representations of text parts (e.g., [{text=..., type=text}]). Corresponding unit tests have been added for both changes. The reviewer suggested extracting magic strings and length limits into named constants within the content normalization logic to improve readability and maintainability.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| check_content = raw_content.strip() | ||
| while ( | ||
| check_content.startswith("[{text=") | ||
| and check_content.endswith(", type=text}]") | ||
| and len(check_content) < 8192 | ||
| ): | ||
| check_content = check_content[ | ||
| len("[{text=") : -len(", type=text}]") | ||
| ].strip() | ||
| if check_content != raw_content.strip(): | ||
| return check_content.strip() if strip else check_content |
There was a problem hiding this comment.
For better readability and maintainability, it's a good practice to extract magic strings and numbers into named constants. This makes the code easier to understand and modify in the future if these values need to be changed.
check_content = raw_content.strip()
# Handle non-standard repr-like strings from some providers, e.g., "[{text=..., type=text}]"
REPR_LIKE_PREFIX = "[{text=""
REPR_LIKE_SUFFIX = ", type=text}]"
REPR_LIKE_MAX_LEN = 8192
while (
check_content.startswith(REPR_LIKE_PREFIX)
and check_content.endswith(REPR_LIKE_SUFFIX)
and len(check_content) < REPR_LIKE_MAX_LEN
):
check_content = check_content[
len(REPR_LIKE_PREFIX) : -len(REPR_LIKE_SUFFIX)
].strip()
if check_content != raw_content.strip():
return check_content.strip() if strip else check_content
Summary
[{text=..., type=text}]to plain text, and drop empty nested repr shells.Test plan
uv run pytest tests/test_tool_loop_agent_runner.py -k "tool_call_assistant_preface_is_not_sent_as_llm_result"uv run pytest tests/test_openai_source.py -k "normalize_content_handles_text_part_repr_string or normalize_content_drops_empty_nested_text_part_repr_string"uv run ruff format astrbot/core/agent/runners/tool_loop_agent_runner.py astrbot/core/provider/sources/openai_source.py tests/test_tool_loop_agent_runner.py tests/test_openai_source.py --checkuv run ruff check astrbot/core/agent/runners/tool_loop_agent_runner.py astrbot/core/provider/sources/openai_source.py tests/test_tool_loop_agent_runner.py tests/test_openai_source.pyNote: Running the full related files with
uv run pytest tests/test_tool_loop_agent_runner.py tests/test_openai_source.pypassed the tool-loop file but hit 3 pre-existing Windows path separator assertions intest_openai_source.py(file_uri_to_pathexpects/while Windows returns\).🤖 Generated with Claude Code
Summary by Sourcery
Prevent internal assistant preface text from being surfaced as user-visible LLM output during tool-call turns and improve normalization of OpenAI-compatible content representations.
Bug Fixes:
Tests: