fix(parsers/python): segment-match path exclusion/classification + resolve relative-import anchors #90
Open
gadievron wants to merge 2 commits into
…solve relative-import anchors
Three independent defects in parsers/python/function_extractor.py:
1. extract_all(): the no-args scan excluded files with
`any(excl in str(file_path) for excl in [...])` -- an unanchored substring test on the full path, so
a file whose path merely contains a token ('myvenv/keep.py' contains 'venv') was silently dropped,
and an ancestor directory containing a token could exclude the whole scan. Now matches whole path
SEGMENTS: `{tokens} & set(file_path.relative_to(repo_path).parts)`. Python's own token set
(__pycache__/.git/venv/.venv/node_modules) is preserved.
2. classify_function(): classification used `'<token>' in path_lower` substring tests, so
'interviews/api.py' was classified 'view_function'. 'view_function' is in
entry_point_detector.ENTRY_POINT_TYPES (:26-32), so that misclassification became a false entry-point
seed that cascades into false reachability (consumed at entry_point_detector.py:177). The 'views'
token now matches a whole path segment via a new _path_has_segment helper. The 'middleware' token is
given the same segment fix because it shares the substring defect, but note 'middleware' (the python
label) is NOT in ENTRY_POINT_TYPES -- so that half is classification accuracy, not a reachability
change. The 'test' classifier is left as a substring on purpose (test-file conventions use 'tests/'
and 'test_*'/'*_test' forms a segment match would miss; 'test' is not an entry-point type, so it
seeds no false reachability).
3. extract_imports(): the ast.ImportFrom branch read node.module but never node.level, so relative
imports lost their package anchor ('from . import X' stored bare 'X'; 'from ..pkg import Y' stored
anchor-less 'pkg.Y'). call_graph_builder._resolve_import then rebuilt a wrong/no file path and the
edges were dropped (verified: pre-fix the candidate resolves to None, post-fix it resolves to the real
pkg/sub/helpers.py). Now reconstructs the absolute anchor from the importing file's package location
(level=1 -> own package, level=2 -> parent, ...); over-deep levels degrade to no leading dot. Absolute
imports (level=0) are unchanged.
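The anchor reconstruction in item 3 can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `absolute_anchor` and `import_targets` are hypothetical names, the importing file is assumed to be given as a repo-relative POSIX path, and the stored form is assumed to be `module.name` as the examples above suggest.

```python
import ast
from pathlib import PurePosixPath
from typing import List, Optional


def absolute_anchor(rel_file: str, module: Optional[str], level: int) -> str:
    """Rebuild the dotted anchor of a `from ... import` node (hypothetical helper).

    rel_file is the importing file's path relative to the repo root.
    """
    if level == 0:
        # Absolute import: nothing to reconstruct.
        return module or ""
    package_parts = list(PurePosixPath(rel_file).parent.parts)
    up = level - 1  # level=1 -> own package, level=2 -> parent, ...
    if up > len(package_parts):
        # Over-deep relative import: degrade to the bare module, no leading dot.
        return module or ""
    base = package_parts[: len(package_parts) - up]
    if module:
        base.append(module)
    return ".".join(base)


def import_targets(source: str, rel_file: str) -> List[str]:
    """Collect `anchor.name` strings for every ImportFrom node in source."""
    targets = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            anchor = absolute_anchor(rel_file, node.module, node.level)
            for alias in node.names:
                targets.append(f"{anchor}.{alias.name}" if anchor else alias.name)
    return targets
```

For a file at `pkg/sub/mod.py`, `from . import helpers` would come out as `pkg.sub.helpers` under these assumptions, which is the shape a resolver needs to find `pkg/sub/helpers.py`.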
Scope: the php/ruby function_extractor.py extract_all + classify siblings carry related defects and are
not widened here.
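The segment-versus-substring distinction behind items 1 and 2 can be sketched as below. The names `is_excluded` and `has_segment` are illustrative assumptions and not the PR's actual functions; only the token set and the `set(...parts)` intersection come from the description above.

```python
from pathlib import PurePath, PurePosixPath

# Token set quoted in the commit message.
EXCLUDED_SEGMENTS = {"__pycache__", ".git", "venv", ".venv", "node_modules"}


def is_excluded(file_path: PurePath, repo_path: PurePath) -> bool:
    """True only if a whole path segment below repo_path is an excluded token."""
    parts = file_path.relative_to(repo_path).parts
    return bool(EXCLUDED_SEGMENTS & set(parts))


def has_segment(path: str, segment: str) -> bool:
    """True if `segment` is a whole component of `path` (case-insensitive).

    Hypothetical stand-in for the PR's _path_has_segment helper.
    """
    normalized = path.lower().replace("\\", "/")
    return segment in PurePosixPath(normalized).parts


repo = PurePosixPath("/venv/repo")  # ancestor directory contains a token
print(is_excluded(PurePosixPath("/venv/repo/myvenv/keep.py"), repo))  # False
print(is_excluded(PurePosixPath("/venv/repo/venv/lib/x.py"), repo))   # True
print(has_segment("interviews/api.py", "views"))                      # False
print(has_segment("app/views/api.py", "views"))                       # True
```

Because the path is made relative to the repo root before splitting, an ancestor directory named after a token no longer excludes the whole scan. Whether the real helper also treats a file name such as `views.py` as a match is not stated in the description, so this sketch compares directory and file components verbatim.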
Tests: tests/test_python_function_extractor.py -- loads the module under a unique importlib name (the
bare 'function_extractor' name is shared by five other parsers, so a plain import would pollute
sys.modules for the rest of the suite). Three checks: segment-vs-substring exclusion, entry-point
classification by segment, and relative-import anchor reconstruction. RED 3 failed (pre-fix) -> GREEN 3
passed; full suite 179 passed / 63 skipped.
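Loading a module under a unique name, as the test description mentions, is typically done with `importlib.util`. The sketch below is an assumption about the general approach, not the test file's actual code; `load_module_as` and the unique name passed to it are hypothetical.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def load_module_as(unique_name: str, file_path: Path) -> ModuleType:
    """Execute file_path as a module named unique_name.

    The module is not registered in sys.modules, so a bare name such as
    'function_extractor' stays free for other parsers' tests.
    """
    spec = importlib.util.spec_from_file_location(unique_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {file_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

A test could then call `load_module_as("python_function_extractor_under_test", path_to_module)` and use the returned object directly, leaving `sys.modules` untouched for the rest of the suite.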
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Windows CI
The segment-vs-substring exclusion regression test recorded the processed path via
str(Path.relative_to(...)), which yields backslash separators on Windows and fails the
forward-slash 'in seen' assertions. Use .as_posix() so the comparison is OS-independent.
The substring-over-exclusion assertions are unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
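The separator difference can be reproduced on any OS with `pathlib`'s pure path classes. This is a small illustration only; `recorded_key` is a hypothetical name for whatever the test uses to record a processed path.

```python
from pathlib import PurePath, PureWindowsPath


def recorded_key(path: PurePath, root: PurePath) -> str:
    """Repo-relative path with forward slashes on every platform."""
    return path.relative_to(root).as_posix()


win_root = PureWindowsPath(r"C:\repo")
win_file = PureWindowsPath(r"C:\repo\myvenv\keep.py")

# str() keeps the platform separator, so a 'myvenv/keep.py' lookup fails.
print(str(win_file.relative_to(win_root)))   # myvenv\keep.py
# as_posix() normalizes to forward slashes.
print(recorded_key(win_file, win_root))      # myvenv/keep.py
```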