feat(zig): add Zig language support with tree-sitter extraction #739
Open
hiwepy wants to merge 12 commits into
- Rewrite function_declaration handling entirely in visitNode with custom walkBodyForCalls to correctly resolve chained field accesses (std.debug.print, std.mem.eql, std.fs.cwd, etc.) that the core extractCall resolves to just the leaf method name
- Add @import("std").testing pattern detection for import references nested inside field_expression nodes
- Add test_declaration extraction as function nodes (both string and identifier name forms)
- Add comptime_declaration handling so inner declarations are visited
- Add using_namespace_declaration support (deprecated but present)
- Handle packed struct / extern struct / tagged union / enum(u8)
- Filter _ wildcard from non-exhaustive enum members
- Add export fn visibility / isExported detection
- Detect threadlocal var as variable (not constant)
- 36 tests covering functions, types, fields, methods, enums, imports, chained calls, tests, comptime, threadlocal, and stdlib patterns
Prevent false-positive resolution of Zig stdlib calls (std.debug.print, std.mem.eql, etc.) by treating the `std.`, `builtin.`, and `@`-prefixed namespaces as external. Without this, the resolver matches e.g. `std.debug.print` to an unrelated project node named "print", creating incorrect call edges. Also treats `root` module references as external. This matches the existing patterns for Go (GO_STDLIB_PACKAGES), Python (PYTHON_BUILT_INS), C/C++ (C_BUILT_INS/CPP_BUILT_INS), and Pascal (PASCAL_BUILT_INS).
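The external-namespace check described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the names `ZIG_EXTERNAL_PREFIXES` and `isZigExternalRef` are hypothetical, and the real resolver may match on different data.

```typescript
// Hypothetical names; the PR's actual identifiers are not shown in the commit message.
const ZIG_EXTERNAL_PREFIXES: readonly string[] = ["std.", "builtin.", "@"];

// Returns true when a call target should be treated as external and therefore
// never matched against project nodes with the same leaf name.
function isZigExternalRef(name: string): boolean {
  // `root` module references are also treated as external.
  if (name === "root" || name.startsWith("root.")) {
    return true;
  }
  return ZIG_EXTERNAL_PREFIXES.some((prefix) => name.startsWith(prefix));
}
```

With a check like this, `std.debug.print` is classified as external before name matching runs, so it cannot be linked to a project function that happens to be called `print`.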
Enable type-annotation-based `references` edges for Zig, matching the pattern used by Java, TypeScript, Go, Rust, and other languages.

- Add `zig` to TYPE_ANNOTATION_LANGUAGES so the core extractor's extractTypeAnnotations path runs for Zig function/method/field nodes
- Add extractZigTypeAnnotations to handle Zig-specific AST: `parameters` is not a named field, type names are `identifier` nodes
- Add extractZigTypeRefs in zig.ts for the visitNode hook path (the hook handles function_declaration, bypassing the core extractor)
- Filter parameter-name identifiers (self, allocator, etc.) from type references by skipping the first identifier in `parameter` nodes
- Filter module-path segments in field_expression chains (std.mem.Allocator → only Allocator is a type reference)
- Add extractZigTypeRefsFromSubtree in tree-sitter.ts with the same filtering for the core extractor path (struct field types via container_field)

Results on agentscope-zig (719 files):
Before: 27,515 edges (contains 15,830 + calls 9,553 + imports 2,132)
After: 34,976 edges (contains 16,311 + calls 9,704 + references 6,790 + imports 2,171)
Edges/file: 40.6 → 49.2 (+21%)
References/file: 9.4 (Java: 7.3)
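The two filtering rules (skip the parameter name, keep only the leaf of a module path) can be illustrated with a simplified sketch. It assumes a parameter has already been flattened into its identifier texts in source order; the function name `typeRefFromParameter` is hypothetical, and the real extractor walks tree-sitter nodes rather than string arrays.

```typescript
// Simplified model: the identifier texts found in one `parameter` node, in order.
// For `allocator: std.mem.Allocator` this would be ["allocator", "std", "mem", "Allocator"].
function typeRefFromParameter(identifiers: string[]): string | null {
  // The first identifier is the parameter name (self, allocator, ...), not a type.
  const typeSegments = identifiers.slice(1);
  if (typeSegments.length === 0) {
    return null;
  }
  // Module-path segments (std, mem) are dropped; only the leaf is a type reference.
  const leaf = typeSegments[typeSegments.length - 1];
  return leaf === undefined ? null : leaf;
}
```

For example, `typeRefFromParameter(["allocator", "std", "mem", "Allocator"])` yields `"Allocator"`, and a parameter with no type identifiers yields `null`.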
- Add builtin_function handling to walkBodyForCalls: non-@import builtins (@as, @ptrCast, @intCast, @sizeOf, @memcpy, @min, @max, etc.) now produce call references instead of being silently skipped
- Strip self/this/super receiver prefix from field_expression call chains so self.beforeAgentExecution() resolves to "beforeAgentExecution" (matches the core extractCall behavior for other languages)
- The SKIP_RECEIVERS filter in buildFieldChain mirrors the core's calibration for Java (this), Python (self/cls), and ObjC (self/super)

Results on agentscope-zig (719 files):
calls: 9,704 → 10,045 (+341, +3.5%)
total: 34,976 → 35,900 (+924, +2.6%)
ReActAgent::call: 3 calls → 6 calls (all correct)
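A rough sketch of the receiver stripping, assuming the field chain has already been collected into an array of segments. `SKIP_RECEIVERS` is named in the commit message, but its exact contents and the helper `callTargetFromChain` are assumptions made for illustration.

```typescript
// Assumed contents, based on the receivers listed in the commit message.
const SKIP_RECEIVERS: ReadonlySet<string> = new Set(["self", "this", "super"]);

// Joins a field chain into a call target, dropping a leading receiver so that
// self.beforeAgentExecution() resolves by method name alone.
function callTargetFromChain(segments: string[]): string {
  const head = segments[0];
  const hasReceiver =
    segments.length > 1 && head !== undefined && SKIP_RECEIVERS.has(head);
  const kept = hasReceiver ? segments.slice(1) : segments;
  return kept.join(".");
}
```

Chains that do not start with a receiver, such as `std.debug.print`, keep their full path.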
Replace the `constant` node creation for @import with proper `import` nodes that mirror Java's import tracking structure:

- Each @import("module") now creates an `import` node (name = module path) with the full declaration as signature
- The unresolved `imports` reference connects from the import node to the module name (instead of from the file node directly)
- The visitNode hook returns `true` for @import variables, skipping the core's extractVariable (which would create a duplicate `constant` node)

This makes the Zig node/edge structure structurally equivalent to Java:
import nodes: 0 → 2,167
constant nodes: 2,645 → 559 (only real constants now)
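The node and reference shapes described above might look roughly like this. The sketch is illustrative only: it detects `@import` with a regular expression on the declaration text instead of walking the AST as the PR does, and the `GraphNode` and `UnresolvedRef` types and the id format are hypothetical.

```typescript
// Hypothetical shapes; CodeGraph's real node and reference types may differ.
interface GraphNode {
  id: string;
  kind: "import" | "constant";
  name: string;
  signature: string;
}

interface UnresolvedRef {
  fromNodeId: string;
  kind: "imports";
  targetName: string;
}

// Builds an `import` node plus an `imports` reference that originates from it.
// Returns null for declarations that are not @import, leaving them to the core path.
function importFromDeclaration(
  filePath: string,
  declaration: string,
): { node: GraphNode; ref: UnresolvedRef } | null {
  const match = /@import\("([^"]+)"\)/.exec(declaration);
  if (match === null) {
    return null;
  }
  const modulePath = match[1] ?? "";
  const node: GraphNode = {
    id: `${filePath}#import:${modulePath}`, // assumed id scheme
    kind: "import",
    name: modulePath,
    signature: declaration.trim(),
  };
  const ref: UnresolvedRef = {
    fromNodeId: node.id, // from the import node, not the file node
    kind: "imports",
    targetName: modulePath,
  };
  return { node, ref };
}
```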
- type_alias nodes: detect `const Name = type_expression;` patterns
(function pointer aliases, simple type aliases) by checking for
exactly 2 named children where the second is a type-expression node
(pointer_type, builtin_type, function_signature, etc.) and not a
value literal (integer/string/call). Creates a `type_alias` node
instead of the generic `constant` that the core would produce.
- instantiates edges: handle `struct_initializer` (Foo{} or Foo{.x=1})
nodes in walkBodyForCalls, extracting the type name from the first
identifier child and creating an `instantiates` unresolved reference.
Results on agentscope-zig (780 files):
nodes: 17,567 → 18,820 (+1,253)
edges: 35,855 → 39,245 (+3,390)
type_alias: 250 NEW
instantiates: 1,099 NEW
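The type_alias rule (exactly two named children, the second being a type-expression node) can be sketched as below. The `AstNode` interface is a small subset of a tree-sitter node, and the set of type-expression kinds is limited to the three the commit message names; the PR's full list is not shown, so treat both as assumptions.

```typescript
// Subset of a tree-sitter syntax node, enough for this check.
interface AstNode {
  type: string;
  namedChildren: AstNode[];
}

// Only the kinds named in the commit message; the real list is longer ("etc.").
const TYPE_EXPRESSION_KINDS: ReadonlySet<string> = new Set([
  "pointer_type",
  "builtin_type",
  "function_signature",
]);

// Classifies a `const Name = <expr>;` declaration node.
function classifyConstDeclaration(decl: AstNode): "type_alias" | "constant" {
  if (decl.namedChildren.length !== 2) {
    return "constant";
  }
  const value = decl.namedChildren[1];
  // Value literals (integer, string, call) fall through to "constant".
  return value !== undefined && TYPE_EXPRESSION_KINDS.has(value.type)
    ? "type_alias"
    : "constant";
}
```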
# Conflicts: # src/extraction/grammars.ts
…e/callconv
Restructure the Zig extractor to match the Kotlin pattern:
- Split 330-line visitNode into 8 named handler functions + 24-line switch dispatcher
- Implement extractModifiers hook (inline/noinline/comptime)
- Add tagged union metadata ({ taggedUnion: true }) + enum_member extraction for union(enum) variants
- Capture calling_convention in function signatures (callconv(.C))
- Mark inline/noinline functions in metadata
- Fix fromNodeId: 'standalone' → importId.id for @embedFile/@cImport references
- Remove dead Zig type extraction methods from tree-sitter.ts (-75 lines)
- Add 20 new tests: defer/errdefer/try/catch/for/while/labeled block/switch/orelse call tracking, .zon files, anonymous struct literals, extern fn, module docs, destructuring, C variadic
Validated on agentscope-zig (1413 Zig files): agent A/B n=2 shows 0 Read/0 Grep, 62-72% faster than without codegraph.
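The handler-plus-dispatcher layout might be sketched like this. The node type names come from the commit messages; the handler names and their stub bodies are hypothetical and only show the control flow, where returning `true` tells the core extractor to skip its default handling.

```typescript
// Minimal stand-in for a tree-sitter node; only `type` matters for dispatch.
interface ZigNode {
  type: string;
}

// Hypothetical stubs. A real handler would create nodes and references
// before reporting whether it fully handled the declaration.
function handleFunction(_node: ZigNode): boolean { return true; }
function handleTest(_node: ZigNode): boolean { return true; }
function handleComptime(_node: ZigNode): boolean { return true; }
function handleUsingNamespace(_node: ZigNode): boolean { return true; }

// Thin dispatcher: one case per node type, everything else falls back to the core.
function visitNode(node: ZigNode): boolean {
  switch (node.type) {
    case "function_declaration":
      return handleFunction(node);
    case "test_declaration":
      return handleTest(node);
    case "comptime_declaration":
      return handleComptime(node);
    case "using_namespace_declaration":
      return handleUsingNamespace(node);
    default:
      return false; // not handled here; core extractor proceeds
  }
}
```

Keeping the switch this small means each declaration kind can be tested and changed through its own handler.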
…ooks

Align Zig extractor with Rust/Go/C# patterns:
- Add extractImport hook for @import detection (standard hook vs visitNode)
- Add getReceiverType hook for self: *Type method receiver extraction
- Add ZIG_BUILTIN_TYPES filter to prevent builtin type reference noise
- Add container field type-reference extraction (custom types on fields)
- Add function parameter/return type-reference extraction
- Add @cImport, @call, @embedFile detection
- Add real-world eval test against agentscope-zig (1412 files, 0 errors)

Constraint: Zig syntax is fundamentally different from class-based languages (const Foo = struct {} vs class Foo {}), requiring custom visitNode logic that cannot be replaced by purely declarative config.
Rejected: Moving all visitNode logic to declarative config | Zig container pattern requires runtime AST inspection to distinguish type declarations from regular constants.
Confidence: high
Scope-risk: moderate
Directive: Do not remove visitNode for variable_declaration. It handles the const = struct/enum/union pattern that has no declarative equivalent.
Tested: 80 tests passing (73 unit + 7 real-world), 1412-file corpus validated
Not-tested: Zig async/await (deprecated in 0.16.0), comptime blocks with complex nested type construction
Summary
Add comprehensive Zig language support to CodeGraph using web-tree-sitter.
Features
- `fn` declarations, distinguish methods inside struct/enum/union containers via `self: *Type` receiver detection (`getReceiverType` hook)
- `const Foo = struct {}` pattern, including packed/extern variants
- `error{ NotFound, OutOfMemory }` extraction
- `const Callback = *const fn(*anyopaque) void`
- Enum members (`_` filtered for non-exhaustive enums)
- `const`/`var` with `isConst` differentiation, `threadlocal` support
- `@import` detection: Standard `extractImport` hook (aligned with Rust/C#/Go patterns)
- `@cImport`/`@cInclude`: C interop import detection
- `@call` indirect calls: `@call(.auto, funcName, .{})` → `calls` reference
- `@embedFile`: File embedding as `imports` reference
- Type `references` edges (builtin types filtered)
- `Foo{ .x = 1 }` → `instantiates` edges
- `std.debug.print` chained calls resolved to full path
- `test "name"` extracted as functions
- `comptime {}` extracted
- `usingnamespace`: Namespace imports tracked

Standardization
Aligned with Rust/Go/C# extractor patterns:
- `extractImport` hook for `@import` detection
- `getReceiverType` hook for `self: *Type` method receiver extraction
- `ZIG_BUILTIN_TYPES` filter to prevent builtin type reference noise
- `getVisibility`/`isExported`/`getSignature` implementations

Testing
`.zig` files, 0 parse errors, 32,747 nodes, 71,483 references extracted)

Files Changed
- `src/extraction/languages/zig.ts`: Zig language extractor
- `src/extraction/grammars.ts`: Zig grammar registration
- `src/extraction/wasm/tree-sitter-zig.wasm`: Vendored WASM grammar
- `__tests__/zig-extraction.test.ts`: Unit tests
- `__tests__/zig-real-world-eval.test.ts`: Real-world evaluation