LCORE-2310: Added tool processing module for pydantic-ai by asimurka · Pull Request #1855 · lightspeed-core/lightspeed-stack

asimurka · 2026-06-05T09:05:00Z

Description

Adds a module for processing and recording tool calls and tool results from the pydantic-ai agent stream.

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: Cursor

Related Tickets & Documents

Related Issue # https://redhat.atlassian.net/browse/LCORE-2310
Closes #

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

New Features
- Enhanced agent tool processing with improved support for web search, file search (RAG), and MCP tools.
- Better tracking and summarization of tool interactions for enhanced visibility into agent operations.
- Extraction of RAG artifacts, referenced documents, and chunk metadata from search results.
Tests
- Added comprehensive unit test coverage for tool processing and result handling.

coderabbitai · 2026-06-05T09:07:26Z

Warning

Review limit reached

@asimurka, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 22 minutes and 11 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 775812a1-e92d-4d0b-afeb-b56dce94855b

📥 Commits

Reviewing files that changed from the base of the PR and between 3b0fe3d and be57676.

📒 Files selected for processing (2)

src/utils/agents/tool_processor.py
tests/unit/utils/agents/test_tool_processor.py

Walkthrough

This PR introduces a new utility module for summarizing and recording tool calls and results during agent stream dispatch. The implementation handles function-based tool calls, native tool calls (web search, file search, MCP), result summarization for each type, and extraction of RAG artifacts from file-search operations.

Changes

Tool call and result processing for agent stream dispatch

| Layer / File(s) | Summary |
| --- | --- |
| Tool call summarization and recording `src/utils/agents/tool_processor.py`, `tests/unit/utils/agents/test_tool_processor.py` | Imports and constants establish the module foundation. Function and native tool-call summarizers convert `ToolCallPart` and `NativeToolCallPart` into `ToolCallSummary` objects with id/name/args/type. Function and native tool-call recorders deduplicate by emitted ID, conditionally increment the tool round, and append summaries to turn state. Tests validate call mapping for function calls, web search, file search (RAG), and MCP (list-tools vs call), plus unknown-tool handling. |
| Function tool result summarization and recording `src/utils/agents/tool_processor.py`, `tests/unit/utils/agents/test_tool_processor.py` | Function tool results are summarized with status, JSON-encoded content, type, and round. Results are recorded by deduplicating on result ID and marking round progression. Tests verify status/type/round are captured and duplicates are suppressed. |
| Web search and MCP result summarization `src/utils/agents/tool_processor.py`, `tests/unit/utils/agents/test_tool_processor.py` | Web search results extract the status field and serialize remaining content to JSON. MCP list-tools results serialize tool entries and server labels; MCP call results extract output and handle both success and error payloads. A dispatcher selects the correct summarizer based on result payload shape. Tests cover all three tool types and verify JSON serialization and error handling. |
| RAG artifact extraction from file search results `src/utils/agents/tool_processor.py`, `tests/unit/utils/agents/test_tool_processor.py` | Referenced documents are built from file-search result attributes (URL, title, document ID) with deduplication by `(url, title)` tuple across results and alternate attribute key fallbacks. RAG chunks map each file-search row to content, score, and attributes, filtering empty text. Helper functions resolve source labels and extract metadata fields. Tests verify document deduplication, alternate attribute keys, chunk filtering, and source mapping. |
| File search result integration and native tool result recording `src/utils/agents/tool_processor.py`, `tests/unit/utils/agents/test_tool_processor.py` | File-search result summarizer calls out to RAG extraction helpers and returns a tuple of tool result, chunks, and documents. The native tool result recorder routes by tool name, dispatches to type-specific summarizers, records results/chunks/documents to turn state while deduplicating by emitted ID, and marks round progression pending. Tests verify file-search returns are integrated with chunks and documents, and web-search/MCP results are recorded without artifacts. |
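
The RAG extraction row above describes deduplicating referenced documents by `(url, title)` and dropping chunks with empty text. A minimal sketch of that idea follows, using only the standard library. `ReferencedDoc`, the attribute key names in `URL_KEYS` and `TITLE_KEYS`, and both helper functions are hypothetical stand-ins chosen for illustration; the module in this PR may use different models and keys.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ReferencedDoc:
    """Hypothetical stand-in for the project's referenced-document model."""

    url: str | None
    title: str | None


# Hypothetical attribute keys; the real module may look up different names.
URL_KEYS = ("doc_url", "url")
TITLE_KEYS = ("title", "filename")


def first_str(attributes: dict[str, Any], keys: tuple[str, ...]) -> str | None:
    """Return the first non-empty string stored under any of the given keys."""
    for key in keys:
        value = attributes.get(key)
        if isinstance(value, str) and value:
            return value
    return None


def extract_documents(results: list[dict[str, Any]]) -> list[ReferencedDoc]:
    """Build referenced documents, keeping the first one per (url, title)."""
    seen: set[tuple[str | None, str | None]] = set()
    documents: list[ReferencedDoc] = []
    for result in results:
        attributes = result.get("attributes") or {}
        key = (first_str(attributes, URL_KEYS), first_str(attributes, TITLE_KEYS))
        # Skip rows with no usable metadata and rows already seen.
        if key == (None, None) or key in seen:
            continue
        seen.add(key)
        documents.append(ReferencedDoc(url=key[0], title=key[1]))
    return documents


def extract_chunks(results: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Map each row to a chunk dict, dropping rows whose text is empty."""
    return [
        {
            "content": row["text"],
            "score": row.get("score"),
            "attributes": row.get("attributes") or {},
        }
        for row in results
        if row.get("text")
    ]
```

Keeping the dedup key as a tuple means two rows with the same URL but different titles are treated as distinct documents, which matches the `(url, title)` wording in the summary.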

Sequence Diagrams

sequenceDiagram
  participant StreamPart as Tool call part
  participant Summarizer as Summarizer
  participant Recorder as Recorder
  participant State as AgentTurnAccumulator
  StreamPart->>Summarizer: ToolCallPart or NativeToolCallPart
  Summarizer->>Summarizer: Match on call type and tool kind
  Summarizer-->>Recorder: ToolCallSummary
  Recorder->>Recorder: Check deduplication by emitted_id
  Recorder->>Recorder: Increment round if pending
  Recorder->>State: Append to turn_summary.tool_calls
  Recorder-->>Recorder: Return summary
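
The recorder steps in the diagram above (deduplicate by emitted ID, bump the round if an increment is pending, append to the turn summary) can be sketched roughly as follows. `CallSummary` and `TurnState` are hypothetical stand-ins for `ToolCallSummary` and `AgentTurnAccumulator`, and the field names and the starting round of 0 are assumptions made for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class CallSummary:
    """Hypothetical stand-in for ToolCallSummary."""

    id: str
    name: str
    args: dict[str, Any]
    type: str


@dataclass
class TurnState:
    """Hypothetical stand-in for AgentTurnAccumulator."""

    tool_calls: list[CallSummary] = field(default_factory=list)
    emitted_call_ids: set[str] = field(default_factory=set)
    tool_round: int = 0  # assumed starting value
    round_increment_pending: bool = False


def record_tool_call(state: TurnState, summary: CallSummary) -> CallSummary | None:
    """Record a call once; return None when its ID was already emitted."""
    if summary.id in state.emitted_call_ids:
        return None
    # A result recorded earlier marks the round as pending; the next call
    # opens a new round.
    if state.round_increment_pending:
        state.tool_round += 1
        state.round_increment_pending = False
    state.emitted_call_ids.add(summary.id)
    state.tool_calls.append(summary)
    return summary
```

Returning `None` for duplicates lets a caller decide whether to emit a stream event without inspecting the state itself.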

flowchart LR
  FSR["OpenAI file-search results"]
  FSR -->|iterate| RD["Referenced document"]
  FSR -->|iterate| RC["RAG chunk"]
  RD -->|URL/title dedup| SD["Seen documents set"]
  RC -->|text content| CC["Filtered chunks"]
  RD -->|source label| RM["Source mapping"]
  RC -->|score + attrs| RM

sequenceDiagram
  participant NTR as Native tool result
  participant Dispatcher as Route by tool name
  participant Summarizer as Result summarizer
  participant Recorder as Recorder
  participant State as AgentTurnAccumulator
  NTR->>Dispatcher: NativeToolReturnPart
  Dispatcher->>Dispatcher: Match tool_name
  alt File search
    Dispatcher->>Summarizer: summarize_file_search_result
    Summarizer-->>Summarizer: Extract RAG chunks + documents
    Summarizer-->>Recorder: (ToolResultSummary, chunks, docs)
  else Web search or MCP
    Dispatcher->>Summarizer: type-specific summarizer
    Summarizer-->>Recorder: ToolResultSummary
  end
  Recorder->>Recorder: Dedup by emitted_id
  Recorder->>State: Append results, chunks, documents
  Recorder-->>Recorder: Mark round_increment_pending
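
As a rough illustration of the routing step in the diagram above, the sketch below dispatches on the tool name and returns a summary together with any RAG chunks. The kind strings (`file_search`, `web_search`, `mcp_server`), the `*_call` type labels for file and web search, and the dict-based summary shape are assumptions for this example; the PR reads the kinds from pydantic-ai's tool classes and uses its own summary models.

```python
from __future__ import annotations

import json
from typing import Any

# Assumed kind strings; the PR takes these from pydantic-ai's tool classes.
FILE_SEARCH = "file_search"
WEB_SEARCH = "web_search"
MCP_SERVER = "mcp_server"


def summarize_native_result(
    tool_name: str, call_id: str, content: dict[str, Any], tool_round: int
) -> tuple[dict[str, Any], list[dict[str, Any]]] | None:
    """Route a native tool result by name; return (summary, chunks) or None."""
    # Build a filtered copy so the caller-owned payload is never mutated.
    remaining = {k: v for k, v in content.items() if k != "status"}
    summary: dict[str, Any] = {
        "id": call_id,
        "status": content.get("status"),  # None when the payload has no status
        "content": json.dumps(remaining),
        "round": tool_round,
    }
    if tool_name == FILE_SEARCH:
        rows = content.get("results") or []
        chunks = [
            {"content": row["text"], "score": row.get("score")}
            for row in rows
            if row.get("text")
        ]
        return {**summary, "type": "file_search_call"}, chunks
    if tool_name == WEB_SEARCH:
        return {**summary, "type": "web_search_call"}, []
    if tool_name == MCP_SERVER or tool_name.startswith(f"{MCP_SERVER}:"):
        # Distinguish list-tools from call results by payload shape.
        kind = "mcp_list_tools" if "tools" in content else "mcp_call"
        return {**summary, "type": kind}, []
    return None
```

Only the file-search branch produces chunks here, mirroring the note that web-search and MCP results are recorded without artifacts.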

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

tisnik

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding a tool processing module for pydantic-ai, which directly matches the core changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



coderabbitai

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/utils/agents/tool_processor.py`:
- Around line 289-302: The code constructs AnyUrl(doc_url) unguarded, so a
malformed external metadata URL will raise and abort referenced-document
extraction; wrap the URL validation when building the ReferencedDocument in a
try/except (catch pydantic ValidationError or a generic Exception) and fall back
to None for doc_url while preserving doc_title and other fields (keep use of
_file_search_attribute_url, _file_search_attribute_str,
resolve_source_for_result, vector_store_ids and rag_id_mapping). Optionally log
or record the bad URL but ensure a bad URL does not prevent returning a
ReferencedDocument with document_id/title/source.
- Around line 380-385: The code currently mutates caller-owned payloads by
calling content.pop("status") on part.content inside the function that builds
ToolResultSummary; instead make a shallow copy of part.content (e.g., copy =
dict(part.content) or content_copy = cast(dict[str, Any], dict(part.content)))
and pop "status" from that copy, then use json.dumps(content_copy) for the
content field; apply the same change to the other occurrence that creates a
ToolResultSummary (the block around lines 505-509) so neither summarizer mutates
the original part.content or caller-owned objects.
- Around line 55-65: Update the signature and docstring of
summarize_native_tool_call to use modern union types (change
Optional[ToolCallSummary] to ToolCallSummary | None) and ensure all parameters
and return types are fully annotated; replace the Args: section with
Google-style "Parameters" and "Returns" headings and add a "Raises" section if
the function can raise exceptions. Locate the function by name
summarize_native_tool_call and apply the same style across the file for any
other helpers that still use Optional[...] or "Args:" so the module conforms to
the repo's typing and Google docstring conventions.
- Around line 480-483: The dispatcher currently uses the presence of an "error"
key to choose summarize_mcp_list_tools_result, which misroutes normal call
failures; change the branching to inspect the explicit originating-call
discriminator (e.g., the call name/id carried on the part or tool_round) instead
of the "error" field. Concretely, read the call identifier from the part (e.g.,
part.metadata['call'] or part.get('call')) or from tool_round (e.g.,
tool_round.call_name) and if it equals the list-tools call identifier (e.g.,
"mcp_list_tools" or whatever your call name is) then call
summarize_mcp_list_tools_result(part, tool_round), otherwise call
summarize_mcp_call_result(part, tool_round); do not use the presence of "error"
to decide routing.
- Around line 455-460: The code is converting structured MCP outputs to Python
repr by doing str(output); update the return in ToolResultSummary creation so
that when output is a dict/list (or other JSON-serializable types) it is
serialized to a JSON string instead of using str(), e.g., serialize the local
variable output with json.dumps (or a helper serializer) and assign that JSON
string to ToolResultSummary.content; ensure the code imports json and preserves
non-string scalars by falling back to json.dumps(output) only for non-str inputs
while leaving existing string outputs unchanged.
- Around line 68-104: The current match against MCPServerTool.kind in
summarize_native_tool_call() and process_native_tool_result() misses labeled
names like f"{MCPServerTool.kind}:<label>"; update those functions to detect MCP
tool names that either equal MCPServerTool.kind or start with
f"{MCPServerTool.kind}:" (use
part.tool_name.startswith(f"{MCPServerTool.kind}:") or strip the prefix with
part.tool_name.removeprefix(f"{MCPServerTool.kind}:") before matching) so the
existing MCP logic that uses label =
part.tool_name.removeprefix(f"{MCPServerTool.kind}:") and the MCP list/call
branches still run for labeled tools; ensure subsequent uses of part.tool_name
use the extracted label where appropriate.

In `@tests/unit/utils/agents/test_tool_processor.py`:
- Around line 106-119: The test coverage is missing end-to-end MCP cases for
labeled server names and error payloads: update/add tests (e.g., around
test_mcp_list_tools_call and the other ranges noted) to call
summarize_native_tool_call with both tool_name=MCPServerTool.kind and with
explicit labeled server_label values, then pass resulting payloads through
summarize_mcp_tool_result and process_native_tool_result including an {"error":
"..."} result payload to assert the summary/error fields are handled correctly;
use the functions summarize_native_tool_call, summarize_mcp_tool_result, and
process_native_tool_result and assert expected name, args (including
server_label), and error propagation in the test assertions.
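
One of the findings above concerns `content.pop("status")` mutating a caller-owned payload. A small sketch of the non-mutating variant is shown here; `split_status` is a hypothetical helper name, not a function from this PR.

```python
from __future__ import annotations

import json
from typing import Any


def split_status(content: dict[str, Any]) -> tuple[str | None, str]:
    """Return (status, JSON of the remaining keys) without touching the input."""
    remaining = dict(content)  # shallow copy; the caller's dict stays intact
    status = remaining.pop("status", None)
    return status, json.dumps(remaining)
```

A shallow copy is enough here because only a top-level key is removed; nested values are serialized, not modified.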


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: abc4e8d7-58dc-47b2-a9e1-57713fd2208f

📥 Commits

Reviewing files that changed from the base of the PR and between bf9b8a7 and 3b0fe3d.

📒 Files selected for processing (2)

src/utils/agents/tool_processor.py
tests/unit/utils/agents/test_tool_processor.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)

GitHub Check: build-pr
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request

🧰 Additional context used

📓 Path-based instructions (2)

tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

tests/unit/utils/agents/test_tool_processor.py

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

src/utils/agents/tool_processor.py

asimurka · 2026-06-05T10:35:10Z

+            label = tool_name.removeprefix(_MCP_SERVER_TOOL_PREFIX)
+            action = args.get("action")
+            # MCP list tools
+            if action == "list_tools":


Relies on pydantic-ai open responses processor:

def _map_mcp_list_tools(
    item: responses.response_output_item.McpListTools, provider_name: str
) -> tuple[NativeToolCallPart, NativeToolReturnPart]:
    tool_name = ':'.join([MCPServerTool.kind, item.server_label])
    return (
        NativeToolCallPart(
            tool_name=tool_name,
            tool_call_id=item.id,
            provider_name=provider_name,
            args={'action': 'list_tools'},
        ),
        ...
    )
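
Since the quoted mapping builds the tool name as `kind:server_label`, the consumer side has to split the label back out. A minimal sketch of that step, assuming `MCPServerTool.kind` is the string `mcp_server` (the PR reads the value from the class rather than hard-coding it) and using a hypothetical helper name:

```python
from __future__ import annotations

MCP_KIND = "mcp_server"  # assumed value of MCPServerTool.kind
MCP_PREFIX = f"{MCP_KIND}:"


def mcp_server_label(tool_name: str) -> str | None:
    """Return the server label for an MCP tool name, or None if not MCP."""
    if tool_name == MCP_KIND:
        return ""  # bare kind, no label attached
    if tool_name.startswith(MCP_PREFIX):
        return tool_name.removeprefix(MCP_PREFIX)
    return None
```

Returning `None` for non-MCP names keeps the prefix check and the label extraction in one place, so callers cannot strip a prefix from a name that never had it.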

asimurka · 2026-06-05T10:39:52Z

+    Returns:
+        Tool result summary in LCS turn-summary format.
+    """
+    content = cast(dict[str, Any], part.content)


Safe to cast for all parts produced when processing OpenAI responses.

asimurka · 2026-06-05T10:41:57Z

+        content=json.dumps(list_summary.model_dump()),
+        type="mcp_list_tools",
+        round=tool_round,
+    )


Relies on pydantic-ai open responses processor:

def _map_mcp_list_tools(
    item: responses.response_output_item.McpListTools, provider_name: str
) -> tuple[NativeToolCallPart, NativeToolReturnPart]:
    tool_name = ':'.join([MCPServerTool.kind, item.server_label])
    return (
        ...
        NativeToolReturnPart(
            tool_name=tool_name,
            tool_call_id=item.id,
            content=item.model_dump(mode='json', include={'tools', 'error'}),
            provider_name=provider_name,
        ),
    )

asimurka · 2026-06-05T10:43:35Z

+        content=str(output),
+        type="mcp_call",
+        round=tool_round,
+    )


Relies on pydantic-ai processor:

def _map_mcp_call(
    item: responses.response_output_item.McpCall, provider_name: str
) -> tuple[NativeToolCallPart, NativeToolReturnPart]:
    tool_name = ':'.join([MCPServerTool.kind, item.server_label])
    return (
        ...
        NativeToolReturnPart(
            tool_name=tool_name,
            tool_call_id=item.id,
            content={
                'output': item.output,
                'error': item.error,
            },
            provider_name=provider_name,
        ),
    )
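
Given the `{'output': ..., 'error': ...}` payload shape quoted above, a result summarizer has to pick a status and a string body. The sketch below is one possible variant, not the code in this PR: it serializes structured output with `json.dumps` (as the bot review suggested) where the diff uses `str(output)`. The function name, the dict-based summary, and the `success`/`failure` status strings are assumptions.

```python
from __future__ import annotations

import json
from typing import Any


def to_text(value: Any) -> str:
    """Keep strings as-is, map None to empty, JSON-encode everything else."""
    if value is None:
        return ""
    return value if isinstance(value, str) else json.dumps(value)


def summarize_mcp_call(
    call_id: str, content: dict[str, Any], tool_round: int
) -> dict[str, Any]:
    """Summarize an MCP call result payload of the form {'output', 'error'}."""
    error = content.get("error")
    # A truthy error wins over any output that may also be present.
    status = "failure" if error else "success"
    body = to_text(error) if error else to_text(content.get("output"))
    return {
        "id": call_id,
        "status": status,
        "content": body,
        "type": "mcp_call",
        "round": tool_round,
    }
```

JSON keeps structured output machine-readable for downstream consumers, whereas `str()` on a dict yields a Python repr with single quotes.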

asimurka · 2026-06-05T10:45:45Z

+    results = [
+        OpenAIFileSearchResult.model_validate(result)
+        for result in content.get("results", [])
+    ]


Relies on pydantic-ai processor:

def _map_file_search_tool_call(
    item: responses.ResponseFileSearchToolCall,
    provider_name: str,
) -> tuple[NativeToolCallPart, NativeToolReturnPart]:
    result: dict[str, Any] = {
        'status': item.status,
    }
    if item.results is not None:
        result['results'] = [r.model_dump(mode='json') for r in item.results]
    return (
        ...
        NativeToolReturnPart(
            tool_name=FileSearchTool.kind,
            tool_call_id=item.id,
            content=result,
            provider_name=provider_name,
        ),
    )

asimurka · 2026-06-05T10:48:50Z

+                id=call_id,
+                name=args.get("tool_name") or "",
+                args=args.get("tool_args", {}),
+                type="mcp_call",


Relies on pydantic-ai processor:

def _map_mcp_call(
    item: responses.response_output_item.McpCall, provider_name: str
) -> tuple[NativeToolCallPart, NativeToolReturnPart]:
    tool_name = ':'.join([MCPServerTool.kind, item.server_label])
    return (
        NativeToolCallPart(
            tool_name=tool_name,
            tool_call_id=item.id,
            args={
                'action': 'call_tool',
                'tool_name': item.name,
                'tool_args': json.loads(item.arguments) if item.arguments else {},
            },
            provider_name=provider_name,
        ),
        ...
    )

jrobertboos

LGTM

Added tool processing module for pydantic-ai

3b0fe3d

asimurka requested review from jrobertboos and tisnik June 5, 2026 09:05

coderabbitai Bot reviewed Jun 5, 2026

Fixed dynamic MCP naming by pydantic

be57676

asimurka force-pushed the pydantic_tool_processors branch from 89c9c20 to be57676 on June 5, 2026 09:42

asimurka commented Jun 5, 2026

jrobertboos approved these changes Jun 5, 2026
