feat(zig): add Zig language support with tree-sitter extraction #739

Open
hiwepy wants to merge 12 commits into colbymchenry:main from partme-ai:feat/zig-supported

Conversation

hiwepy commented Jun 8, 2026

Summary

Add comprehensive Zig language support to CodeGraph using web-tree-sitter.

Features

  • Functions & methods: Extract fn declarations, distinguish methods inside struct/enum/union containers via self: *Type receiver detection (getReceiverType hook)
  • Struct/enum/union/opaque: Named type declarations via const Foo = struct {} pattern, including packed/extern variants
  • Error sets: error{ NotFound, OutOfMemory } extraction
  • Type aliases: const Callback = *const fn(*anyopaque) void
  • Enum members: Extract enum variants (excluding _ for non-exhaustive enums)
  • Container fields: Struct fields with type-reference edges to custom types
  • Variable declarations: const/var with isConst differentiation, threadlocal support
  • @import detection: Standard extractImport hook (aligned with Rust/C#/Go patterns)
  • @cImport/@cInclude: C interop import detection
  • @call indirect calls: @call(.auto, funcName, .{}) → calls reference
  • @embedFile: File embedding as imports reference
  • Type references: Function parameter/return types, variable type annotations, container field types → references edges (builtin types filtered)
  • Struct initializers: Foo{ .x = 1 } → instantiates edges
  • Call tracking: std.debug.print chained calls resolved to full path
  • Test declarations: test "name" extracted as functions
  • Comptime blocks: Declarations inside comptime {} extracted
  • usingnamespace: Namespace imports tracked
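
To make the receiver-based method detection from the first bullet concrete, here is a minimal sketch. It works on the text of a parameter list rather than on tree-sitter nodes, and `receiverTypeFromParams` is a hypothetical name for illustration, not the PR's `getReceiverType` implementation.

```typescript
// Hypothetical helper (not from the PR): given the text of a Zig parameter
// list, return the receiver type when the first parameter is named `self`.
function receiverTypeFromParams(params: string): string | null {
  // Accepts `self: Type`, `self: *Type`, and `self: *const Type`.
  const match =
    /^\s*self\s*:\s*(?:\*\s*(?:const\s+)?)?([A-Za-z_][A-Za-z0-9_]*)/.exec(params);
  return match ? match[1] : null;
}

// A function whose first parameter is `self: *Parser` is treated as a method
// of `Parser`; anything else stays a plain function.
const receiver = receiverTypeFromParams("self: *Parser, input: []const u8");
const plain = receiverTypeFromParams("allocator: std.mem.Allocator");
```

Under this assumption, forms such as `self: @This()` return `null` and would need separate handling.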

Standardization

Aligned with Rust/Go/C# extractor patterns:

  • extractImport hook for @import detection
  • getReceiverType hook for self: *Type method receiver extraction
  • ZIG_BUILTIN_TYPES filter to prevent builtin type reference noise
  • Consistent getVisibility/isExported/getSignature implementations
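
A builtin-type filter along the lines of `ZIG_BUILTIN_TYPES` might look roughly like the sketch below. The set contents are illustrative and not the PR's actual list, and `isBuiltinType` and `filterTypeRefs` are assumed names.

```typescript
// Illustrative subset of Zig builtin type names; the PR's real list is not shown.
const ZIG_BUILTIN_TYPES_SKETCH = new Set<string>([
  "bool", "void", "noreturn", "type", "anyerror", "anyopaque",
  "usize", "isize", "f16", "f32", "f64", "f80", "f128",
  "comptime_int", "comptime_float",
]);

// Hypothetical predicate: true when a type name should not produce a reference.
function isBuiltinType(name: string): boolean {
  // Zig permits arbitrary-width integers such as u7 or i128, so match by pattern.
  return ZIG_BUILTIN_TYPES_SKETCH.has(name) || /^[iu]\d+$/.test(name);
}

// Keep only names that may refer to project-defined types.
function filterTypeRefs(names: string[]): string[] {
  return names.filter((name) => !isBuiltinType(name));
}

const kept = filterTypeRefs(["u8", "Allocator", "usize", "Point", "i128"]);
```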

Testing

  • 73 unit tests covering all features
  • 7 real-world evaluation tests against agentscope-zig (1,412 .zig files, 0 parse errors, 32,747 nodes, 71,483 references extracted)

Files Changed

  • src/extraction/languages/zig.ts — Zig language extractor
  • src/extraction/grammars.ts — Zig grammar registration
  • src/extraction/wasm/tree-sitter-zig.wasm — Vendored WASM grammar
  • __tests__/zig-extraction.test.ts — Unit tests
  • __tests__/zig-real-world-eval.test.ts — Real-world evaluation

hiwepy and others added 12 commits June 7, 2026 14:50
- Rewrite function_declaration handling entirely in visitNode with
  custom walkBodyForCalls to correctly resolve chained field chains
  (std.debug.print, std.mem.eql, std.fs.cwd, etc.) that the core
  extractCall resolves to just the leaf method name
- Add @import("std").testing pattern detection for import references
  nested inside field_expression nodes
- Add test_declaration extraction as function nodes (both string and
  identifier name forms)
- Add comptime_declaration handling so inner declarations are visited
- Add using_namespace_declaration support (deprecated but present)
- Handle packed struct / extern struct / tagged union / enum(u8)
- Filter _ wildcard from non-exhaustive enum members
- Add export fn visibility / isExported detection
- Threadlocal var detection as variable (not constant)
- 36 tests covering functions, types, fields, methods, enums, imports,
  chained calls, tests, comptime, threadlocal, and stdlib patterns
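
The difference between resolving a call to its leaf name and to its full field chain can be sketched with a simplified expression type. `Expr` and `fieldChain` are hypothetical stand-ins for tree-sitter's `field_expression` nodes and the PR's `walkBodyForCalls` logic, not the actual code.

```typescript
// Simplified stand-in for tree-sitter nodes: either a bare identifier or a
// field access on another expression.
type Expr =
  | { kind: "identifier"; name: string }
  | { kind: "field_expression"; object: Expr; member: string };

// Walk the nested field accesses and rebuild the dotted path.
function fieldChain(expr: Expr): string {
  return expr.kind === "identifier"
    ? expr.name
    : `${fieldChain(expr.object)}.${expr.member}`;
}

// std.debug.print parses as ((std).debug).print
const callee: Expr = {
  kind: "field_expression",
  object: {
    kind: "field_expression",
    object: { kind: "identifier", name: "std" },
    member: "debug",
  },
  member: "print",
};

const fullPath = fieldChain(callee);
```

A leaf-only resolver would report just `print` for the same expression, which is the ambiguity this commit avoids.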
Prevent false-positive resolution of Zig stdlib calls (std.debug.print,
std.mem.eql, etc.) by treating the `std.`, `builtin.`, and `@`-prefixed
namespaces as external. Without this, the resolver matches e.g.
`std.debug.print` to an unrelated project node named "print", creating
incorrect call edges.

Also treats `root` module references as external.

This matches the existing patterns for Go (GO_STDLIB_PACKAGES), Python
(PYTHON_BUILT_INS), C/C++ (C_BUILT_INS/CPP_BUILT_INS), and Pascal
(PASCAL_BUILT_INS).
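
As a rough sketch of the external-namespace check described in this commit: `isExternalZigReference` is an assumed name, and matching on the first path segment (so that a bare `std` also counts as external) is an assumption rather than the PR's confirmed behavior.

```typescript
// Hypothetical predicate: should this reference be left unresolved because it
// points outside the project (stdlib, builtin module, root module, @builtins)?
function isExternalZigReference(name: string): boolean {
  if (name.startsWith("@")) return true; // builtin functions such as @memcpy
  const head = name.split(".")[0];
  return head === "std" || head === "builtin" || head === "root";
}

const external = ["std.debug.print", "@memcpy", "root.main", "builtin.os"].map(
  isExternalZigReference,
);
const internal = ["print", "stdlib.print", "Agent.run"].map(isExternalZigReference);
```

Comparing the whole first segment, rather than using a raw string prefix, keeps a project name like `stdlib.print` resolvable.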
Enable type-annotation-based `references` edges for Zig, matching the
pattern used by Java, TypeScript, Go, Rust, and other languages.

- Add `zig` to TYPE_ANNOTATION_LANGUAGES so the core extractor's
  extractTypeAnnotations path runs for Zig function/method/field nodes
- Add extractZigTypeAnnotations to handle Zig-specific AST:
  `parameters` is not a named field, type names are `identifier` nodes
- Add extractZigTypeRefs in zig.ts for the visitNode hook path
  (the hook handles function_declaration, bypassing the core extractor)
- Filter parameter-name identifiers (self, allocator, etc.) from
  type references by skipping the first identifier in `parameter` nodes
- Filter module-path segments in field_expression chains
  (std.mem.Allocator → only Allocator is a type reference)
- Add extractZigTypeRefsFromSubtree in tree-sitter.ts with the same
  filtering for the core extractor path (struct field types via
  container_field)

Results on agentscope-zig (719 files):
  Before: 27,515 edges (contains 15,830 + calls 9,553 + imports 2,132)
  After:  34,976 edges (contains 16,311 + calls 9,704 + references 6,790 + imports 2,171)
  Edges/file: 40.6 → 49.2 (+21%)
  References/file: 9.4 (Java: 7.3)
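
The two filtering rules above (skip the parameter name, keep only the last segment of a module path) can be illustrated with a simplified parameter shape. `Param`, `lastSegment`, and `typeRefsFromParams` are hypothetical; the real extractor works on tree-sitter `parameter` and `field_expression` nodes, and builtin-type filtering is omitted here.

```typescript
// Simplified view of a Zig parameter: its name and the dotted path of its type.
interface Param {
  name: string;
  typePath: string;
}

// std.mem.Allocator -> Allocator; module-path segments are not type references.
function lastSegment(typePath: string): string {
  const parts = typePath.split(".");
  return parts[parts.length - 1];
}

// Only the type side of each parameter contributes a reference; the parameter
// name (self, allocator, ...) is never emitted.
function typeRefsFromParams(params: Param[]): string[] {
  return params.map((param) => lastSegment(param.typePath));
}

const refs = typeRefsFromParams([
  { name: "self", typePath: "Agent" },
  { name: "allocator", typePath: "std.mem.Allocator" },
]);
```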
- Add builtin_function handling to walkBodyForCalls: non-@import builtins
  (@as, @ptrCast, @intCast, @sizeOf, @memcpy, @min, @max, etc.) now
  produce call references instead of being silently skipped
- Strip self/this/super receiver prefix from field_expression call chains
  so self.beforeAgentExecution() resolves to "beforeAgentExecution"
  (matches the core extractCall behavior for other languages)
- The SKIP_RECEIVERS filter in buildFieldChain mirrors the core's
  calibration for Java (this), Python (self/cls), and ObjC (self/super)

Results on agentscope-zig (719 files):
  calls:    9,704 → 10,045 (+341, +3.5%)
  total:   34,976 → 35,900 (+924, +2.6%)
  ReActAgent::call: 3 calls → 6 calls (all correct)
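
Receiver stripping of the kind described for `buildFieldChain` could look like the sketch below. It assumes the chain is already split into segments; `SKIP_RECEIVERS_SKETCH` and `callTargetName` are illustrative names, not the PR's code.

```typescript
// Receiver names that carry no resolution information of their own.
const SKIP_RECEIVERS_SKETCH = new Set<string>(["self", "this", "super"]);

// Hypothetical helper: drop a leading receiver so `self.foo()` resolves as `foo`.
function callTargetName(chain: string[]): string {
  const segments =
    chain.length > 1 && SKIP_RECEIVERS_SKETCH.has(chain[0]) ? chain.slice(1) : chain;
  return segments.join(".");
}

const method = callTargetName(["self", "beforeAgentExecution"]);
const stdlibCall = callTargetName(["std", "debug", "print"]);
```

Only a leading receiver is removed, so non-receiver chains such as `std.debug.print` keep their full path.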
Replace the `constant` node creation for @import with proper `import`
nodes that mirror Java's import tracking structure:

- Each @import("module") now creates an `import` node (name = module path)
  with the full declaration as signature
- The unresolved `imports` reference connects from the import node to the
  module name (instead of from the file node directly)
- The visitNode hook returns `true` for @import variables, skipping the
  core's extractVariable (which would create a duplicate `constant` node)

This makes the Zig node/edge structure equivalent to Java's:
  import nodes: 0 → 2,167
  constant nodes: 2,645 → 559 (only real constants now)
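
The "return `true` to skip the core extractor" convention can be sketched as follows. `GraphNode`, `VarDecl`, and `visitVariable` are simplified, hypothetical shapes; the real hook operates on tree-sitter nodes and CodeGraph's own node types.

```typescript
// Simplified output node and input declaration shapes (assumptions).
interface GraphNode {
  kind: "import" | "constant";
  name: string;
  signature: string;
}

interface VarDecl {
  name: string; // declared identifier, e.g. `std`
  init: string; // initializer text, e.g. `@import("std")`
  text: string; // full declaration text
}

// Returns true when the declaration was handled here, so the caller should
// not also create a generic `constant` node for it.
function visitVariable(decl: VarDecl, out: GraphNode[]): boolean {
  const match = /^@import\("([^"]+)"\)/.exec(decl.init);
  if (!match) return false; // not an import: fall through to the default path
  out.push({ kind: "import", name: match[1], signature: decl.text });
  return true;
}

const nodes: GraphNode[] = [];
const handledImport = visitVariable(
  { name: "std", init: '@import("std")', text: 'const std = @import("std");' },
  nodes,
);
const handledConst = visitVariable(
  { name: "max_len", init: "42", text: "const max_len = 42;" },
  nodes,
);
```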
- type_alias nodes: detect `const Name = type_expression;` patterns
  (function pointer aliases, simple type aliases) by checking for
  exactly 2 named children where the second is a type-expression node
  (pointer_type, builtin_type, function_signature, etc.) and not a
  value literal (integer/string/call). Creates a `type_alias` node
  instead of the generic `constant` that the core would produce.
- instantiates edges: handle `struct_initializer` (Foo{} or Foo{.x=1})
  nodes in walkBodyForCalls, extracting the type name from the first
  identifier child and creating an `instantiates` unresolved reference.

Results on agentscope-zig (780 files):
  nodes:    17,567 → 18,820 (+1,253)
  edges:    35,855 → 39,245 (+3,390)
  type_alias: 250 NEW
  instantiates: 1,099 NEW
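
The type-alias heuristic from this commit (exactly two named children, the second being a type-expression node) reduces to a small predicate. The node-kind set below contains only the kinds named in the commit message and is not the full list; `AstNode` and `isTypeAlias` are hypothetical.

```typescript
// Minimal stand-in for a tree-sitter node.
interface AstNode {
  type: string;
  namedChildren: AstNode[];
}

// Only the kinds named in the commit message; the real set is larger ("etc.").
const TYPE_EXPRESSION_KINDS = new Set<string>([
  "pointer_type",
  "builtin_type",
  "function_signature",
]);

// `const Name = <type expression>;` has a name child and a type-expression
// child; value literals (integer, string, call) do not qualify.
function isTypeAlias(decl: AstNode): boolean {
  if (decl.namedChildren.length !== 2) return false;
  return TYPE_EXPRESSION_KINDS.has(decl.namedChildren[1].type);
}

const leaf = (type: string): AstNode => ({ type, namedChildren: [] });
const aliasDecl: AstNode = {
  type: "variable_declaration",
  namedChildren: [leaf("identifier"), leaf("pointer_type")],
};
const valueDecl: AstNode = {
  type: "variable_declaration",
  namedChildren: [leaf("identifier"), leaf("integer")],
};
```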
# Conflicts:
#	src/extraction/grammars.ts
…e/callconv

Restructure the Zig extractor to match the Kotlin pattern:
- Split 330-line visitNode into 8 named handler functions + 24-line switch dispatcher
- Implement extractModifiers hook (inline/noinline/comptime)
- Add tagged union metadata ({ taggedUnion: true }) + enum_member extraction for union(enum) variants
- Capture calling_convention in function signatures (callconv(.C))
- Mark inline/noinline functions in metadata
- Fix fromNodeId: 'standalone' → importId.id for @embedFile/@cImport references
- Remove dead Zig type extraction methods from tree-sitter.ts (-75 lines)
- Add 20 new tests: defer/errdefer/try/catch/for/while/labeled block/switch/orelse call tracking, .zon files, anonymous struct literals, extern fn, module docs, destructuring, C variadic

Validated on agentscope-zig (1413 Zig files): agent A/B n=2 shows 0 Read/0 Grep, 62-72% faster than without codegraph.
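
The handler-plus-dispatcher structure described in this commit might be shaped roughly like this. The three node types appear elsewhere in this PR, but the handler names and bodies are placeholders, not the actual eight handlers.

```typescript
// Minimal stand-in for a tree-sitter node.
interface AstNode {
  type: string;
  text: string;
}

// Records what each placeholder handler saw, for demonstration only.
const visited: string[] = [];

function handleFunction(node: AstNode): boolean {
  visited.push(`function:${node.text}`);
  return true;
}

function handleVariable(node: AstNode): boolean {
  visited.push(`variable:${node.text}`);
  return true;
}

function handleTest(node: AstNode): boolean {
  visited.push(`test:${node.text}`);
  return true;
}

// Thin dispatcher: route by node type, return false for anything unhandled so
// the core extractor can process it.
function visitNode(node: AstNode): boolean {
  switch (node.type) {
    case "function_declaration":
      return handleFunction(node);
    case "variable_declaration":
      return handleVariable(node);
    case "test_declaration":
      return handleTest(node);
    default:
      return false;
  }
}

const handled = visitNode({ type: "test_declaration", text: 'test "parse"' });
const unhandled = visitNode({ type: "comment", text: "// note" });
```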
…ooks

Align Zig extractor with Rust/Go/C# patterns:
- Add extractImport hook for @import detection (standard hook vs visitNode)
- Add getReceiverType hook for self: *Type method receiver extraction
- Add ZIG_BUILTIN_TYPES filter to prevent builtin type reference noise
- Add container field type-reference extraction (custom types on fields)
- Add function parameter/return type-reference extraction
- Add @cImport, @call, @embedFile detection
- Add real-world eval test against agentscope-zig (1412 files, 0 errors)

Constraint: Zig syntax is fundamentally different from class-based languages
  (const Foo = struct {} vs class Foo {}), requiring custom visitNode logic
  that cannot be replaced by purely declarative config.
Rejected: Moving all visitNode logic to declarative config | Zig container
  pattern requires runtime AST inspection to distinguish type declarations
  from regular constants.
Confidence: high
Scope-risk: moderate
Directive: Do not remove visitNode for variable_declaration — it handles
  the const = struct/enum/union pattern that has no declarative equivalent.
Tested: 80 tests passing (73 unit + 7 real-world), 1412-file corpus validated
Not-tested: Zig async/await (deprecated in 0.16.0), comptime blocks with
  complex nested type construction
