feat: support Claude Code transcripts #168

Open
LoikStyle wants to merge 1 commit into XortexAI:main from LoikStyle:loikstyle/claude-code-transcript-156

Conversation

@LoikStyle

Summary

  • Add Claude Code JSONL transcript parsing to the context import pipeline
  • Extract only conversational text from user/assistant turns and ignore tool-only blocks
  • Include focused regression coverage for Claude Code transcript uploads

Test Plan

  • python3 -m pytest tests/test_claude_code_transcript.py -q -o addopts=''
  • python3 -m py_compile src/api/routes/memory.py server.py

Fixes #156

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements a parser for Claude Code JSONL transcripts, adding _content_to_text and _parse_claude_code_transcript functions to server.py and src/api/routes/memory.py, and includes a new test file. Feedback indicates that the parsing logic is duplicated and should be moved to a shared module to reduce maintenance overhead. Additionally, a performance optimization was suggested to include a heuristic check for JSON content before attempting to parse the transcript lines.

Comment thread server.py
Comment on lines +798 to +810
def _content_to_text(content: Any) -> str:
    """Extract readable text from Claude Code message content blocks."""
    if isinstance(content, str):
        return content.strip()
    if isinstance(content, list):
        chunks: list[str] = []
        for item in content:
            if isinstance(item, str):
                chunks.append(item)
            elif isinstance(item, dict) and item.get("type") == "text":
                chunks.append(str(item.get("text", "")))
        return "\n".join(chunk.strip() for chunk in chunks if chunk.strip()).strip()
    return ""

medium

The logic for _content_to_text and _parse_claude_code_transcript is duplicated between server.py and src/api/routes/memory.py. This increases maintenance overhead and the risk of inconsistencies as the parsing logic evolves. Consider moving these utilities to a shared module (e.g., src/utils/transcripts.py) that both files can import from.

Comment thread server.py
current_user_query: str | None = None
assistant_chunks: list[str] = []

for raw_line in text.splitlines():

medium

The current implementation of _parse_claude_code_transcript iterates through every line of the input text and attempts to parse it as JSON. This can be inefficient for large non-JSON transcripts (e.g., standard markdown files that don't match Cursor or Antigravity formats). Since Claude Code transcripts are JSONL files, adding a quick heuristic check at the beginning of the function can avoid unnecessary processing.

Suggested change
-    for raw_line in text.splitlines():
+    if not text.strip().startswith("{"):
+        return []
+    for raw_line in text.splitlines():
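
A minimal sketch of the same early-exit check, written as a standalone helper. The name `looks_like_jsonl` is hypothetical and not part of the PR. Where `text.strip()` may build a stripped copy of the whole input just to look at its first character, a regular expression anchored at the start only scans the leading whitespace:

```python
import re

# Hypothetical helper, not from the PR: matches optional leading
# whitespace followed by an opening brace at the start of the input.
_JSON_OBJECT_START = re.compile(r"\s*\{")


def looks_like_jsonl(text: str) -> bool:
    """Return True if the first non-whitespace character is '{'."""
    return _JSON_OBJECT_START.match(text) is not None
```

This is only a heuristic. Input that starts with `{` can still contain lines that are not valid JSON, so the per-line `json.loads` call would still need its own error handling.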

Comment thread src/api/routes/memory.py
current_user_query: str | None = None
assistant_chunks: List[str] = []

for raw_line in text.splitlines():

medium

The current implementation of _parse_claude_code_transcript iterates through every line of the input text and attempts to parse it as JSON. This can be inefficient for large non-JSON transcripts. Adding a quick heuristic check at the beginning of the function can avoid unnecessary processing for files that are clearly not in JSONL format.

Suggested change
-    for raw_line in text.splitlines():
+    if not text.strip().startswith("{"):
+        return []
+    for raw_line in text.splitlines():

Comment thread src/api/routes/memory.py
current_user_query: str | None = None
assistant_chunks: List[str] = []

for raw_line in text.splitlines():
Contributor

Good call. Since Claude Code transcripts are JSONL, the shared parser should first reject obvious non-JSONL input before iterating through every line. This should be fixed in the shared parser rather than separately in both files.

Comment thread server.py


def _content_to_text(content: Any) -> str:
    """Extract readable text from Claude Code message content blocks."""
Contributor

Agree. Since this parser is used by both the standalone server and the production memory route, please move the Claude transcript parsing into src/utils/transcripts.py and have both server.py and src/api/routes/memory.py import the shared parser from there.
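
One possible shape for the shared module, as a rough sketch rather than the PR's actual code. Several things here are assumptions: that each JSONL record carries a `message` object with `role` and `content` fields, that the parser returns a list of (user query, assistant reply) pairs, and the public names `content_to_text` and `parse_claude_code_transcript`. Only the variable names `current_user_query` and `assistant_chunks` come from the diff.

```python
"""Sketch of a shared module such as src/utils/transcripts.py."""
from __future__ import annotations

import json
from typing import Any


def content_to_text(content: Any) -> str:
    """Join the text blocks of a message and ignore tool blocks."""
    if isinstance(content, str):
        return content.strip()
    if isinstance(content, list):
        parts = [
            item if isinstance(item, str) else str(item.get("text", ""))
            for item in content
            if isinstance(item, str)
            or (isinstance(item, dict) and item.get("type") == "text")
        ]
        return "\n".join(p.strip() for p in parts if p.strip())
    return ""


def parse_claude_code_transcript(text: str) -> list[tuple[str, str]]:
    """Pair each user query with the assistant text that follows it.

    The return shape is an assumption made for this sketch.
    """
    pairs: list[tuple[str, str]] = []
    current_user_query: str | None = None
    assistant_chunks: list[str] = []

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        message = record.get("message") if isinstance(record, dict) else None
        if not isinstance(message, dict):
            continue
        body = content_to_text(message.get("content"))
        if not body:
            continue  # tool-only turn, nothing conversational to keep
        role = message.get("role")
        if role == "user":
            # A new user query closes the previous exchange, if any.
            if current_user_query and assistant_chunks:
                pairs.append((current_user_query, "\n".join(assistant_chunks)))
            current_user_query, assistant_chunks = body, []
        elif role == "assistant" and current_user_query is not None:
            assistant_chunks.append(body)

    # Flush the last exchange once the input is exhausted.
    if current_user_query and assistant_chunks:
        pairs.append((current_user_query, "\n".join(assistant_chunks)))
    return pairs
```

With a module like this in place, `server.py` and `src/api/routes/memory.py` would each import the parser instead of defining their own copy. The exact import path depends on how the project is packaged.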

Contributor

@Ankit-Kotnala Ankit-Kotnala left a comment

The feature is good, but @LoikStyle should centralize the parser and clean up the test before merge.

Development

Successfully merging this pull request may close these issues.

add support of claude code transcript in /context page

2 participants