Parser enhancements wave: integration base #71

Merged
StreamDemon merged 7 commits into main from feature/parser-enhancements-base
Jul 2, 2026

@StreamDemon StreamDemon commented Jul 2, 2026

Summary

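Wave complete. All six roadmap items shipped:

  • Shared `NUMERIC_SUFFIXES` table for numeric-suffix scanning and validation in the lexer
  • Span-based tokens, with `Token::text(source)` replacing the owned lexeme
  • `Copy` tokens, removing the `.clone()` calls from the parse loops
  • Dedicated operator enums for `ExprKind::Unary`/`Binary` in place of `String`
  • Attribute arguments parsed into `Attribute.args`, and attributes kept on actor handlers
  • Six new corpus fixtures plus automatic `tests/corpus/*.sp` discovery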

Ride-along: issue #78 (spurious "expected identifier" for non-ident-headed type args like Vec<&str>) was found while designing the attribute-argument parser, verified with a probe, and filed for a separate fix — it is a behavior change, so it deliberately does not ride this wave.

Related Issue

None (review-driven; companion to PR #69).

Spec Sections Affected

None — implementation quality only; no grammar or behavior changes.

Checklist

  • Code follows the Sploosh design principles (one way to do it, explicit over implicit, etc.)
  • Documentation updated in relevant docs/ pages — N/A, no behavior change
  • Tests added or updated
  • All build targets still compile (if applicable)
  • Spec-only PR (skip Build Targets section if checked)

Build Targets Tested

  • cargo fmt --all -- --check, cargo clippy --workspace --all-targets -- -D warnings, cargo test --workspace green on the assembled base (54 tests, 13 corpus fixtures). Every sub-PR was individually CI-green and cubic-reviewed before merging.

Test Plan

@StreamDemon StreamDemon force-pushed the feature/parser-enhancements-base branch from c04bde5 to c7bb5a3 Compare July 2, 2026 08:23
Empty seed commit for the enhancement-wave base PR; the roadmap lives in
the PR description. Rebased onto main after the parser correctness wave
(PR #69) merged.
@StreamDemon StreamDemon force-pushed the feature/parser-enhancements-base branch from c7bb5a3 to 1a01d83 Compare July 2, 2026 08:24
`numeric_suffix` and `validate_numeric_body` each carried their own copy
of the 13-entry suffix table, so any future suffix change had to be made
twice or the two paths would drift apart. Hoist the table into a shared
`NUMERIC_SUFFIXES` const and return the matched `&'static str` instead
of allocating a fresh `String` on every suffix scan.

A new test iterates the shared table and lexes `1<suffix>` for every
entry, so additions to the list are covered automatically.
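
A minimal sketch of the shared-table idea described above, not the code from this PR. The table contents are placeholders (the real 13 entries are not listed in this thread), and `split_suffix` is a hypothetical stand-in for the stripping step inside `validate_numeric_body`.

```rust
/// Illustrative table only: the real `NUMERIC_SUFFIXES` has 13 entries,
/// which this thread does not list.
const NUMERIC_SUFFIXES: &[&str] = &[
    "i8", "i16", "i32", "i64", "u8", "u16", "u32", "u64", "f32", "f64",
];

/// Returns the suffix that `rest` starts with, borrowed from the table,
/// so no `String` is allocated per scan.
fn numeric_suffix(rest: &str) -> Option<&'static str> {
    NUMERIC_SUFFIXES
        .iter()
        .copied()
        .find(|s| rest.starts_with(*s))
}

/// Hypothetical helper: splits a literal into its body and optional suffix
/// using the same table, so the scan and validation paths cannot drift.
fn split_suffix(literal: &str) -> (&str, Option<&'static str>) {
    NUMERIC_SUFFIXES
        .iter()
        .copied()
        .find_map(|s| literal.strip_suffix(s).map(|body| (body, Some(s))))
        .unwrap_or((literal, None))
}
```

A test that loops over `NUMERIC_SUFFIXES` and checks `1<suffix>` for each entry then picks up new suffixes without further edits.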
cubic-dev-ai Bot previously approved these changes Jul 2, 2026

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 1 file

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.
Architecture diagram
sequenceDiagram
    participant L as Lexer
    participant Src as Source Text
    participant NS as NUMERIC_SUFFIXES const
    participant Tok as Token

    Note over L,Tok: Numeric literal scanning with shared suffix const

    L->>Src: rest = &source[pos..]
    L->>L: scan numeric digits & separators
    L->>NS: iterate over suffixes for match
    NS-->>L: suffix (e.g., "i32") as &'static str
    alt suffix found
        L->>Src: advance pos by suffix.len()
        L->>Tok: create token with &'static suffix (no String alloc)
    else no suffix
        L->>Tok: create token without suffix
    end

    Note over L: Later, validate_numeric_body uses same const
    L->>NS: find_map(f: strip suffix from body)
    NS-->>L: suffix to strip (if present)
    L->>L: validate separators in remainder

    Note over L,Tok: NEW: test covers all suffixes
    Test->>L: lex("1i32")
    L->>NS: iterate suffixes
    NS-->>L: "i32"
    L-->>Test: Token(IntLit, "1i32")
    Test->>Test: assert kind and lexeme

Auto-approved: Refactors numeric suffix handling into a shared const and eliminates per-scan allocations; adds a test covering all suffixes.

Every token carried an owned copy of its source text, so lexing a file
allocated one String per token even though the source buffer already
holds the same bytes. Token is now just a kind and a span; the new
`Token::text(source)` slices the original buffer on demand.

The parser threads the source string through and derives text only at
the few places that need it (identifiers, literals, the `vec` head, the
extern target). Unary and binary operator text now comes from the token
kind rather than the lexeme, since the kind already determines it.
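
To illustrate the span-slicing shape, here is a small sketch under stated assumptions. Only `Token::text(source)` is named in the commit message. The `Span` layout, the `TokenKind` variants, and the toy `lex` function are hypothetical and much simpler than the real lexer.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: usize, // byte offset, inclusive
    end: usize,   // byte offset, exclusive
}

// Hypothetical, heavily reduced kind set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Ident,
    IntLit,
    Plus,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Token {
    kind: TokenKind,
    span: Span,
}

impl Token {
    /// Slices the original buffer on demand instead of owning a copy.
    fn text<'src>(&self, source: &'src str) -> &'src str {
        &source[self.span.start..self.span.end]
    }
}

/// Toy lexer: records only a kind and byte offsets for each token.
fn lex(source: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = source.char_indices().peekable();
    while let Some((start, c)) = chars.next() {
        if c.is_whitespace() {
            continue;
        }
        let kind = if c == '+' {
            TokenKind::Plus
        } else if c.is_ascii_digit() {
            TokenKind::IntLit
        } else {
            TokenKind::Ident
        };
        let mut end = start + c.len_utf8();
        if kind != TokenKind::Plus {
            // Extend the span over the rest of the word or number.
            while let Some(&(i, ch)) = chars.peek() {
                if !(ch.is_alphanumeric() || ch == '_') {
                    break;
                }
                end = i + ch.len_utf8();
                chars.next();
            }
        }
        tokens.push(Token { kind, span: Span { start, end } });
    }
    tokens
}
```

The only per-file allocation left in this sketch is the `Vec<Token>` itself. Callers that need text pay for a slice, not a copy.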

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 2 files (changes from recent commits).

Requires human review: Refactors token representation in lexer and parser to use span-based text slicing; moderate risk due to core data structure change.

With the lexeme gone, a token is a payload-free kind plus a span — both
trivially copyable. Deriving Copy lets every parse loop pass and return
tokens by value, so the `.clone()` calls sprinkled through `at`, `eat`,
`expect`, `bump`, `peek_kind`, and the recovery helpers all disappear.

The PR #71 roadmap sketched this slot as "`at`/`eat`/`expect` take
`&TokenKind`"; deriving Copy reaches the same goal (no clones in parse
loops) with by-value call sites instead of reference threading, which
only became possible after the span-slicing change landed.
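
The by-value call sites might look roughly like the sketch below. The helper names `at`, `eat`, `expect`, `bump`, and `peek_kind` come from the commit message. Their signatures, the `Parser` fields, the reduced `TokenKind`, and `ident_list` are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Ident,
    Comma,
    Eof,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

// Payload-free kind plus span: deriving Copy is trivial.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Token {
    kind: TokenKind,
    span: Span,
}

struct Parser<'src> {
    source: &'src str,
    tokens: Vec<Token>,
    pos: usize,
}

impl<'src> Parser<'src> {
    /// `tokens` must end with an `Eof` token.
    fn new(source: &'src str, tokens: Vec<Token>) -> Self {
        Parser { source, tokens, pos: 0 }
    }

    fn peek(&self) -> Token {
        self.tokens[self.pos] // copied out, no `.clone()`
    }

    fn peek_kind(&self) -> TokenKind {
        self.peek().kind
    }

    fn at(&self, kind: TokenKind) -> bool {
        self.peek_kind() == kind
    }

    fn bump(&mut self) -> Token {
        let tok = self.peek();
        if tok.kind != TokenKind::Eof {
            self.pos += 1; // never advance past the trailing Eof
        }
        tok
    }

    fn eat(&mut self, kind: TokenKind) -> Option<Token> {
        if self.at(kind) { Some(self.bump()) } else { None }
    }

    fn expect(&mut self, kind: TokenKind) -> Result<Token, String> {
        self.eat(kind)
            .ok_or_else(|| format!("expected {kind:?}, found {:?}", self.peek_kind()))
    }

    fn text(&self, tok: Token) -> &'src str {
        let source: &'src str = self.source;
        &source[tok.span.start..tok.span.end]
    }

    /// Hypothetical rule: IDENT { "," IDENT } EOF.
    fn ident_list(&mut self) -> Result<Vec<&'src str>, String> {
        let mut names = Vec::new();
        loop {
            let tok = self.expect(TokenKind::Ident)?;
            names.push(self.text(tok));
            if self.eat(TokenKind::Comma).is_none() {
                break;
            }
        }
        self.expect(TokenKind::Eof)?;
        Ok(names)
    }
}
```

Because `Token` is `Copy`, `expect` can hand the token back while the parser stays mutably borrowed for the next call, which is what reference threading would have complicated.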

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 2 files (changes from recent commits).

Requires human review: This PR refactors the lexer to store tokens as span-based slices and the parser to use them, along with other internal improvements. Although no behavior changes are intended, such refactors have a broad impact across core data structures and code paths, making human review necessary.

`ExprKind::Unary`/`Binary` stored their operator as an owned String,
which allocated per node and let any string masquerade as an operator.
Dedicated enums make illegal operators unrepresentable, shrink the
nodes, and give match exhaustiveness checking to every consumer.

`=` never reaches the AST (it builds `ExprKind::Assign`), so the parser
classifies infix tokens with a private `Infix { Assign, Op(BinaryOp) }`
wrapper instead of widening the public enum with a variant no AST node
can carry. `as_str()`/`Display` on both enums recover the source
spelling for diagnostics and the future pretty-printer (#67).
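
A sketch of the enum-plus-wrapper arrangement, assuming a small operator set. `BinaryOp`, `Infix { Assign, Op(BinaryOp) }`, and `as_str()`/`Display` are named in the commit message. `UnaryOp`, the specific variants, the `TokenKind` names, and `classify_infix` are hypothetical.

```rust
use std::fmt;

// Hypothetical token kinds, reduced to what the sketch needs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Plus,
    Minus,
    Star,
    EqualsEquals,
    Equals,
    Ident,
}

// AST-facing enum (public in the real crate): only operators a node can carry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BinaryOp {
    Add,
    Sub,
    Mul,
    Equal,
}

impl BinaryOp {
    fn as_str(self) -> &'static str {
        match self {
            BinaryOp::Add => "+",
            BinaryOp::Sub => "-",
            BinaryOp::Mul => "*",
            BinaryOp::Equal => "==",
        }
    }
}

impl fmt::Display for BinaryOp {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnaryOp {
    Neg,
    Not,
}

impl UnaryOp {
    fn as_str(self) -> &'static str {
        match self {
            UnaryOp::Neg => "-",
            UnaryOp::Not => "!",
        }
    }
}

// Parser-private wrapper: `=` is an infix token but never a `BinaryOp`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Infix {
    Assign,
    Op(BinaryOp),
}

/// Classifies a token kind as an infix operator, if it is one.
fn classify_infix(kind: TokenKind) -> Option<Infix> {
    Some(match kind {
        TokenKind::Plus => Infix::Op(BinaryOp::Add),
        TokenKind::Minus => Infix::Op(BinaryOp::Sub),
        TokenKind::Star => Infix::Op(BinaryOp::Mul),
        TokenKind::EqualsEquals => Infix::Op(BinaryOp::Equal),
        TokenKind::Equals => Infix::Assign,
        TokenKind::Ident => return None,
    })
}
```

Adding a variant to either enum makes every non-wildcard `match` over it fail to compile until the new case is handled, which is the exhaustiveness benefit the commit message refers to.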

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 2 files (changes from recent commits).

Requires human review: Introduces structural changes in AST, lexer, and parser (enums, span-based tokens, Copy); risk of subtle breakage across multiple crates.

`@mailbox(capacity: 2048)` and `@supervisor(strategy: "one_for_one")`
lost their arguments at parse time — the parser skipped everything
inside the parens — and attributes on actor handlers were dropped
outright. Nothing downstream (semantic analysis, diagnostics) could
ever see them.

`Attribute` now carries `args: Vec<AttrArg>` plus a span covering `@`
through the closing paren, with `AttrArg` mirroring the §16 grammar
(`attr_arg = IDENT [ ":" expr | "=" expr | "(" expr ")" ] | expr`).
Only the `IDENT ":"` form needs lookahead; the `=` and call forms are
valid expressions, so they parse as expressions and canonicalize to the
most specific attr shape afterwards.

Actor handlers become `Handler { attrs, function }`, since a handler is
a `fn_def` and §16 puts attrs on `fn_def` itself. The now-unused
`skip_balanced_after_open` helper is removed.

The crates/AGENTS.md rule is "add corpus tests for every grammar shape
accepted", but extern blocks, onchain modules, use trees, casts, the
literal zoo, async/.await, `?` outside pipes, attribute arguments, and
struct-literal shapes had no fixtures. Six new fixtures close those
gaps.

The harness now discovers `tests/corpus/*.sp` instead of maintaining a
hard-coded list, so a new fixture cannot be silently skipped; an
is-empty guard catches a moved or emptied corpus directory.
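
Fixture discovery with an is-empty guard can be sketched with the standard library alone. The function name `discover_fixtures` and the error wording are assumptions, and the real harness may be structured differently.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collects every `*.sp` file directly inside `dir`, sorted for stable
/// test order. Returns an error if none are found, so a moved or emptied
/// corpus directory fails loudly instead of passing with zero fixtures.
fn discover_fixtures(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut fixtures = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let is_sp = path.extension().map_or(false, |ext| ext == "sp");
        if path.is_file() && is_sp {
            fixtures.push(path);
        }
    }
    fixtures.sort();
    if fixtures.is_empty() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("no .sp fixtures found in {}", dir.display()),
        ));
    }
    Ok(fixtures)
}
```

A harness built this way would call `discover_fixtures(Path::new("tests/corpus"))` once and run each returned path through the parser, so a new fixture is picked up as soon as the file exists.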
@StreamDemon StreamDemon marked this pull request as ready for review July 2, 2026 11:47
@StreamDemon StreamDemon merged commit 292c519 into main Jul 2, 2026
3 checks passed
@StreamDemon StreamDemon deleted the feature/parser-enhancements-base branch July 2, 2026 11:48