LCORE-2310: Added tool processing module for pydantic-ai#1855
Conversation
|
Warning Review limit reached
More reviews will be available in 22 minutes and 11 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughThis PR introduces a new utility module for summarizing and recording tool calls and results during agent stream dispatch. The implementation handles function-based tool calls, native tool calls (web search, file search, MCP), result summarization for each type, and extraction of RAG artifacts from file-search operations. ChangesTool call and result processing for agent stream dispatch
Sequence DiagramssequenceDiagram
participant StreamPart as Tool call part
participant Summarizer as Summarizer
participant Recorder as Recorder
participant State as AgentTurnAccumulator
StreamPart->>Summarizer: ToolCallPart or NativeToolCallPart
Summarizer->>Summarizer: Match on call type and tool kind
Summarizer-->>Recorder: ToolCallSummary
Recorder->>Recorder: Check deduplication by emitted_id
Recorder->>Recorder: Increment round if pending
Recorder->>State: Append to turn_summary.tool_calls
Recorder-->>Recorder: Return summary
flowchart LR
FSR["OpenAI file-search results"]
FSR -->|iterate| RD["Referenced document"]
FSR -->|iterate| RC["RAG chunk"]
RD -->|URL/title dedup| SD["Seen documents set"]
RC -->|text content| CC["Filtered chunks"]
RD -->|source label| RM["Source mapping"]
RC -->|score + attrs| RM
sequenceDiagram
participant NTR as Native tool result
participant Dispatcher as Route by tool name
participant Summarizer as Result summarizer
participant Recorder as Recorder
participant State as AgentTurnAccumulator
NTR->>Dispatcher: NativeToolReturnPart
Dispatcher->>Dispatcher: Match tool_name
alt File search
Dispatcher->>Summarizer: summarize_file_search_result
Summarizer-->>Summarizer: Extract RAG chunks + documents
Summarizer-->>Recorder: (ToolResultSummary, chunks, docs)
else Web search or MCP
Dispatcher->>Summarizer: type-specific summarizer
Summarizer-->>Recorder: ToolResultSummary
end
Recorder->>Recorder: Dedup by emitted_id
Recorder->>State: Append results, chunks, documents
Recorder-->>Recorder: Mark round_increment_pending
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/utils/agents/tool_processor.py`:
- Around line 289-302: The code constructs AnyUrl(doc_url) unguarded, so a
malformed external metadata URL will raise and abort referenced-document
extraction; wrap the URL validation when building the ReferencedDocument in a
try/except (catch pydantic ValidationError or a generic Exception) and fall back
to None for doc_url while preserving doc_title and other fields (keep use of
_file_search_attribute_url, _file_search_attribute_str,
resolve_source_for_result, vector_store_ids and rag_id_mapping). Optionally log
or record the bad URL but ensure a bad URL does not prevent returning a
ReferencedDocument with document_id/title/source.
- Around line 380-385: The code currently mutates caller-owned payloads by
calling content.pop("status") on part.content inside the function that builds
ToolResultSummary; instead make a shallow copy of part.content (e.g., copy =
dict(part.content) or content_copy = cast(dict[str, Any], dict(part.content)))
and pop "status" from that copy, then use json.dumps(content_copy) for the
content field; apply the same change to the other occurrence that creates a
ToolResultSummary (the block around lines 505-509) so neither summarizer mutates
the original part.content or caller-owned objects.
- Around line 55-65: Update the signature and docstring of
summarize_native_tool_call to use modern union types (change
Optional[ToolCallSummary] to ToolCallSummary | None) and ensure all parameters
and return types are fully annotated; replace the Args: section with
Google-style "Parameters" and "Returns" headings and add a "Raises" section if
the function can raise exceptions. Locate the function by name
summarize_native_tool_call and apply the same style across the file for any
other helpers that still use Optional[...] or "Args:" so the module conforms to
the repo's typing and Google docstring conventions.
- Around line 480-483: The dispatcher currently uses the presence of an "error"
key to choose summarize_mcp_list_tools_result, which misroutes normal call
failures; change the branching to inspect the explicit originating-call
discriminator (e.g., the call name/id carried on the part or tool_round) instead
of the "error" field. Concretely, read the call identifier from the part (e.g.,
part.metadata['call'] or part.get('call')) or from tool_round (e.g.,
tool_round.call_name) and if it equals the list-tools call identifier (e.g.,
"mcp_list_tools" or whatever your call name is) then call
summarize_mcp_list_tools_result(part, tool_round), otherwise call
summarize_mcp_call_result(part, tool_round); do not use the presence of "error"
to decide routing.
- Around line 455-460: The code is converting structured MCP outputs to Python
repr by doing str(output); update the return in ToolResultSummary creation so
that when output is a dict/list (or other JSON-serializable types) it is
serialized to a JSON string instead of using str(), e.g., serialize the local
variable output with json.dumps (or a helper serializer) and assign that JSON
string to ToolResultSummary.content; ensure the code imports json and preserves
non-string scalars by falling back to json.dumps(output) only for non-str inputs
while leaving existing string outputs unchanged.
- Around line 68-104: The current match against MCPServerTool.kind in
summarize_native_tool_call() and process_native_tool_result() misses labeled
names like f"{MCPServerTool.kind}:<label>"; update those functions to detect MCP
tool names that either equal MCPServerTool.kind or start with
f"{MCPServerTool.kind}:" (use
part.tool_name.startswith(f"{MCPServerTool.kind}:") or strip the prefix with
part.tool_name.removeprefix(f"{MCPServerTool.kind}:") before matching) so the
existing MCP logic that uses label =
part.tool_name.removeprefix(f"{MCPServerTool.kind}:") and the MCP list/call
branches still run for labeled tools; ensure subsequent uses of part.tool_name
use the extracted label where appropriate.
In `@tests/unit/utils/agents/test_tool_processor.py`:
- Around line 106-119: The test coverage is missing end-to-end MCP cases for
labeled server names and error payloads: update/add tests (e.g., around
test_mcp_list_tools_call and the other ranges noted) to call
summarize_native_tool_call with both tool_name=MCPServerTool.kind and with
explicit labeled server_label values, then pass resulting payloads through
summarize_mcp_tool_result and process_native_tool_result including an {"error":
"..."} result payload to assert the summary/error fields are handled correctly;
use the functions summarize_native_tool_call, summarize_mcp_tool_result, and
process_native_tool_result and assert expected name, args (including
server_label), and error propagation in the test assertions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: abc4e8d7-58dc-47b2-a9e1-57713fd2208f
📒 Files selected for processing (2)
src/utils/agents/tool_processor.pytests/unit/utils/agents/test_tool_processor.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: build-pr
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/utils/agents/test_tool_processor.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/utils/agents/tool_processor.py
89c9c20 to
be57676
Compare
| label = tool_name.removeprefix(_MCP_SERVER_TOOL_PREFIX) | ||
| action = args.get("action") | ||
| # MCP list tools | ||
| if action == "list_tools": |
There was a problem hiding this comment.
Relies on pydantic open responses processor:
def _map_mcp_list_tools(
item: responses.response_output_item.McpListTools, provider_name: str
) -> tuple[NativeToolCallPart, NativeToolReturnPart]:
tool_name = ':'.join([MCPServerTool.kind, item.server_label])
return (
NativeToolCallPart(
tool_name=tool_name,
tool_call_id=item.id,
provider_name=provider_name,
args={'action': 'list_tools'},
),
...
)| Returns: | ||
| Tool result summary in LCS turn-summary format. | ||
| """ | ||
| content = cast(dict[str, Any], part.content) |
There was a problem hiding this comment.
Safe to cast for all processing openai responses parts
| content=json.dumps(list_summary.model_dump()), | ||
| type="mcp_list_tools", | ||
| round=tool_round, | ||
| ) |
There was a problem hiding this comment.
Relies on pydantic-ai open responses processor:
def _map_mcp_list_tools(
item: responses.response_output_item.McpListTools, provider_name: str
) -> tuple[NativeToolCallPart, NativeToolReturnPart]:
tool_name = ':'.join([MCPServerTool.kind, item.server_label])
return (
...
NativeToolReturnPart(
tool_name=tool_name,
tool_call_id=item.id,
content=item.model_dump(mode='json', include={'tools', 'error'}),
provider_name=provider_name,
),
)| content=str(output), | ||
| type="mcp_call", | ||
| round=tool_round, | ||
| ) |
There was a problem hiding this comment.
Relies on pydantic-ai processor:
def _map_mcp_call(
item: responses.response_output_item.McpCall, provider_name: str
) -> tuple[NativeToolCallPart, NativeToolReturnPart]:
tool_name = ':'.join([MCPServerTool.kind, item.server_label])
return (
...
NativeToolReturnPart(
tool_name=tool_name,
tool_call_id=item.id,
content={
'output': item.output,
'error': item.error,
},
provider_name=provider_name,
),
)| results = [ | ||
| OpenAIFileSearchResult.model_validate(result) | ||
| for result in content.get("results", []) | ||
| ] |
There was a problem hiding this comment.
Relies on pydantic-ai processor:
def _map_file_search_tool_call(
item: responses.ResponseFileSearchToolCall,
provider_name: str,
) -> tuple[NativeToolCallPart, NativeToolReturnPart]:
result: dict[str, Any] = {
'status': item.status,
}
if item.results is not None:
result['results'] = [r.model_dump(mode='json') for r in item.results]
return (
...
NativeToolReturnPart(
tool_name=FileSearchTool.kind,
tool_call_id=item.id,
content=result,
provider_name=provider_name,
),
)| id=call_id, | ||
| name=args.get("tool_name") or "", | ||
| args=args.get("tool_args", {}), | ||
| type="mcp_call", |
There was a problem hiding this comment.
Relies on pydantic-ai processor:
def _map_mcp_call(
item: responses.response_output_item.McpCall, provider_name: str
) -> tuple[NativeToolCallPart, NativeToolReturnPart]:
tool_name = ':'.join([MCPServerTool.kind, item.server_label])
return (
NativeToolCallPart(
tool_name=tool_name,
tool_call_id=item.id,
args={
'action': 'call_tool',
'tool_name': item.name,
'tool_args': json.loads(item.arguments) if item.arguments else {},
},
provider_name=provider_name,
),
...
)
Description
Adds module for tool processing and recording from pydantic-ai agent stream.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Tests