diff --git a/dev/design/string_encoding_context_plan.md b/dev/design/string_encoding_context_plan.md
index 713625d67..d3fde15f8 100644
--- a/dev/design/string_encoding_context_plan.md
+++ b/dev/design/string_encoding_context_plan.md
@@ -286,6 +286,17 @@ die "FAIL" if is_utf8($err);
 
 ## Notes
 
+- **Investigation update (2026-05-15):** Running `./jcpan -t Sub::HandlesVia` showed an immediate crash in
+  Mite’s generated `*.mite.pm`: `HAS_BUILDARGS` was polluted with the string `HAS_FOREIGNBUILDARGS`,
+  falsely enabling the `BUILDARGS` branch. Root cause was **`UNIVERSAL::can()` returning an empty list**
+  in **list contexts** inside flat list/hash construction (the empty list vanishes instead of occupying
+  a real `undef` slot). Returning a singleton **`(undef)`** on **every** failure path fixes Mite but
+  breaks **scalar-context** compile probes (`VERSION` / `use` **import** / attribute installers) that
+  expect a **not-found** `can()` to yield **`size()==0`** from `Universal.can`. **`Universal.canNotFound(ctx)`**
+  now returns **`(undef)` only for `LIST` context** failures and **`()` for scalar-like contexts** (still
+  **`scalar()` → undef**, matching Perl for plain assignments).
+  A typed-string concat refactor (Phase 1–2 in this doc) was **reverted** after separate `perl5_t`
+  regressions; redo with targeted tests before merging.
 - This fix addresses the root cause rather than applying post-corruption repair
 - The eval-time repair in RuntimeRegex can remain as a safety net
 - This aligns PerlOnJava with Perl 5's encoding context semantics
diff --git a/dev/design/utf8_flag_parity.md b/dev/design/utf8_flag_parity.md
index e8c8cedc1..5aca0b7b7 100644
--- a/dev/design/utf8_flag_parity.md
+++ b/dev/design/utf8_flag_parity.md
@@ -43,6 +43,18 @@ strings) never upgrade the result to UTF-8.
 - Previously, if neither was BYTE_STRING (e.g. INTEGER + BYTE_STRING), it fell through
   to the default STRING return
 
+### 2b. `UNIVERSAL::can()` failures must return `(undef)` in list context
+
+**File:** `src/main/java/org/perlonjava/runtime/perlmodule/Universal.java`
+
+Perl returns **one** undefined value (`(undef)` in list context). PerlOnJava used an **empty**
+`RuntimeList`, which behaves like Perl’s truly empty list: in `%h = (...)`/`{ … }`
+constructors it eats the next pairing and corrupts literals. Downstream (**Mite** `__META__` in
+`*.mite.pm`; **Sub::HandlesVia::CodeGenerator**) saw `HAS_BUILDARGS` swallow the `'HAS_FOREIGNBUILDARGS'`
+key as its bogus string value and incorrectly took the `BUILDARGS` constructor branch.
+
+Failures use **`canNotFound(ctx)`**: **list** context returns **`scalarUndef.getList()`** (one `undef`, so outer list splices see a real slot); **scalar / void / lvalue** contexts return an **empty** `RuntimeList` so compile-time call sites that compare **`size() == 0`** vs **`size() == 1 && getBoolean()`** keep working (a singleton `undef` would be `size()==1` with a falsy non-sub value and would mis-route VERSION / import / attribute probes).
+
 ### 3. `sprintf` — SprintfOperator.sprintfInternal()
 
 **File:** `src/main/java/org/perlonjava/runtime/operators/SprintfOperator.java`
diff --git a/dev/modules/README.md b/dev/modules/README.md
index 45a3fb6be..d80a316c1 100644
--- a/dev/modules/README.md
+++ b/dev/modules/README.md
@@ -21,6 +21,7 @@ This directory contains design documents and guides related to porting CPAN modu
 | [storable_binary_format.md](storable_binary_format.md) | Storable native Perl binary format — read + write paths landed; jperl ↔ system-perl files interoperate in both directions |
 | [unicode_collate.md](unicode_collate.md) | Unicode::Collate — plan: file-backed DUCET + Java XS surface (default); optional ICU path tradeoffs |
 | [ppi.md](ppi.md) | **PPI** — CPAN test status, RC1–RC4, refcount/`DESTROY` follow-ups (`t/04_element.t` `%_PARENT`) |
+| [sub_handlesvia_support.md](sub_handlesvia_support.md) | **Sub::HandlesVia** — Mite/can/hash fix landed; **`eval_closure` UTF‑8 (\x{c2})** trace plan |
 
 ## Module Status Overview
 
diff --git a/dev/modules/sub_handlesvia_support.md b/dev/modules/sub_handlesvia_support.md
new file mode 100644
index 000000000..7d05cf985
--- /dev/null
+++ b/dev/modules/sub_handlesvia_support.md
@@ -0,0 +1,124 @@
+# Sub::HandlesVia Support for PerlOnJava
+
+## Overview
+
+[Sub::HandlesVia](https://metacpan.org/pod/Sub::HandlesVia) generates delegation methods (“handles”) for Moo/Moose/Object::Pad/toolkit classes. Runtime work uses **generated Perl** compiled via **`Eval::TypeTiny::eval_closure`**; build-time codegen uses **Mite** (`*.mite.pm`) and **`Sub::HandlesVia::CodeGenerator`**.
+
+PerlOnJava must treat **SvUTF8 / BYTE_STRING parity** consistently in string concatenation *and* return **Perl-correct lists** from core helpers such as **`UNIVERSAL::can`**, otherwise Mite constructors and delegated subs break in non-obvious ways.
+
+---
+
+## Completed (upstream-style fixes landed in core)
+
+These changes address blockers traced while running `./jcpan -t Sub::HandlesVia`:
+
+| Area | Problem | Fix |
+|------|---------|-----|
+| **`UNIVERSAL::can`** | Missing methods returned an **empty** `RuntimeList`, which behaves like Perl’s **empty list** inside hash literals. That **consumes** the next `=>` pairing and corrupted Mite **`__META__`** (`HAS_BUILDARGS` falsely truthy → bogus **`BUILDARGS`** branch). Returning a singleton `undef` on **all** failure paths instead confused **scalar-context** compiler probes (`VERSION`/`import`/attributes) that discriminate with **`size() == 1`**. | Failure paths route through **`Universal.canNotFound(ctx)`**: **LIST** ⇒ one `undef` element; **scalar/void/lvalue** ⇒ empty list (still **`scalar()` → undef**). `Universal.java`. |
+| **String concat SvUTF8** *(deferred)* | A typed-concat experiment caused **`perl5_t`** regressions (`op/sub.t`, `porting/filenames.t`, `re/pat_advanced.t`). | **Reverted** from the PR trajectory serving Sub::HandlesVia. Redo in smaller, **`perl5_t`-backed** steps ([`dev/design/string_encoding_context_plan.md`](../design/string_encoding_context_plan.md)). |
+
+Design cross-links:
+
+- [`dev/design/utf8_flag_parity.md`](../design/utf8_flag_parity.md) — §2b (`can`).
+- [`dev/design/string_encoding_context_plan.md`](../design/string_encoding_context_plan.md) — investigation note (2026-05-15).
+
+---
+
+## Current Status (manual smoke)
+
+After the **`can`** fix:
+
+- **`Sub::HandlesVia::CodeGenerator->__META__`** has four keys; **`HAS_BUILDARGS`** exists with **`undef`** (Perl-correct falsy gate).
+- **`t/02moo.t`** progresses further but **still fails** when **`Eval::TypeTiny::eval_closure`** compiles generated source (`Unrecognized character \x{c2}` with a `#line` pointing at **`Eval/TypeTiny.pm`** — the synthesized filename/line prefix, not the host file’s UTF-8 problem).
+
+Automated `./jcpan -t Sub::HandlesVia` previously **timed out at 600s** in CI-style runs; rerun with **`timeout 3600`** after core fixes stabilize.
+
+---
+
+## Next Steps (prioritized)
+
+### 1. [P0] Fix UTF-8 / lead-byte breakage in delegated eval (`\x{c2}`)
+
+**Symptom:**
+
+```text
+Failed to compile source because: Unrecognized character \x{c2}; at .../Eval/TypeTiny.pm line 8 ...
+ at .../Sub/HandlesVia/CodeGenerator.pm line 345 (Eval::TypeTiny::eval_closure)
+```
+
+**Goals:**
+
+1. Capture the **exact `%ec_args`** string passed into **`eval_closure`** for a failing case (minimal Moo delegation in `t/02moo.t`), e.g. temporary logging in **`CodeGenerator.pm`** (`generate_coderef_for_handler`) guarded by **`$ENV{SUB_HANDLESVIA_DEBUG_EC}`**.
+2. Binary-diff that string (`unpack "H*", $src`) vs system Perl — locate the first stray **`0xc2`** (UTF-8 lead byte) treated as Latin-1.
+3. Classify origin:
+   - **Runtime string typing** remaining in codegen (`"."`/`join`/quoting/formatters elsewhere), or
+   - **PerlOnJava lexer/compiler** rejecting valid UTF-8 in **`eval`** strings (narrow vs wide rules), or
+   - **Copy from file** paths reading `.pm` with wrong Perl layer assumptions.
+4. Fix at the appropriate layer (prefer **preventing** mis-typing; **`RuntimeRegex.repairLatin1EncodedUtf8IfCorrupted`** is only a fallback per design notes).
+
+**Success:** `timeout 900 ./jperl .../blib/lib .../Sub-HandlesVia-*/t/02moo.t` completes with TAP **ok**.
+
+### 2. [P1] Full CPAN harness run
+
+```bash
+timeout 3600 ./jcpan -t Sub::HandlesVia > /tmp/jcpan_Sub_HandlesVia.txt 2>&1
+```
+
+Catalog skips (optional deps **MooX::TypeTiny**, **Mouse**, etc.) vs real failures.
+
+### 3. [P2] Concat / SvUTF8 parity redo (staging)
+
+Retry [`dev/design/string_encoding_context_plan.md`](../design/string_encoding_context_plan.md) **Phase 2** (`StringOperators.stringConcat*`) **only after** guarding with:
+
+```bash
+cd perl5_t/t
+timeout 300 ../../jperl op/sub.t
+timeout 180 ../../jperl porting/filenames.t
+timeout 600 ../../jperl re/pat_advanced.t   # noisy; grep ^not ok
+```
+
+Establish **baseline counts** vs **`origin/master`** on the **same harness** (`perl_test_runner.pl` shards if that is CI). A naive `RuntimeScalar(text, BYTE_STRING)` swap for the ISO-8859-1 `byte[]` path surfaced **opaque** regressions in **regex/porting/stack** slices — redo incrementally under its own tiny PR once bisected.
+
+### 4. [P3] Regression tests in-repo (coordination needed)
+
+PerlOnJava policy: **never delete or weaken existing tests**; adding **new** unit tests requires maintainer alignment. Candidate areas:
+
+- **`UNIVERSAL::can`** in **hash constructor** contexts: `%h = (... unknown package ...->can(...) ...)` pairing integrity.
+- **Concat parity**: **`no utf8` / `use utf8`** literals **`Encode::is_utf8`** expectations (see **`dev/design/string_encoding_context_plan.md`** verification section).
+
+### 5. [P4] Optional XS
+
+Upstream ships **`Sub::HandlesVia::XS`** (skipped when absent). No action unless performance work demands it — pure Perl path is canonical for portability.
+
+---
+
+## Dependencies (mental model)
+
+| Module | Role |
+|--------|------|
+| **Type::Tiny** / **Exporter::Tiny** | Types and coercion surfaces for handlers |
+| **Eval::TypeTiny** | **`eval_closure`** — compiles delegated method bodies |
+| **Mite / Sub::HandlesVia::Mite** | Constructor / attribute sugar; **`__META__`** uses **`can('BUILDARGS')`** |
+| **Moo** | Primary toolkit exercised in **`t/02moo*.t`** |
+| **Moose / Mouse / Corinna** | Separate test dirs; skip if stacks incomplete |
+
+Issues in **Eval::TypeTiny** often surface as **compile errors inside generated strings** rather than `.pm` syntax errors — treat reports as **`$src` forensics** first.
+
+---
+
+## Related docs
+
+| Document | Topic |
+|----------|--------|
+| [type_tiny.md](type_tiny.md) | Type::Tiny quirks on PerlOnJava |
+| [moo_support.md](moo_support.md) | Moo stack status |
+| [moose_support.md](moose_support.md) | Moose prerequisites |
+
+---
+
+## Progress log
+
+| Date | Milestone |
+|------|-----------|
+| 2026-05-15 | **`UNIVERSAL::can`** empty-list/hash corruption fixed; **`__META__`** validated; `\x{c2}` eval blocker documented as next P0 |
+| 2026-05-15 | **`UNIVERSAL::can`** split: **LIST** failures → `(undef)`, **scalar**/compile-time failures → empty list (resolves the `perl5_t` regressions while keeping the Mite splice fix) |
diff --git a/src/main/java/org/perlonjava/runtime/perlmodule/Universal.java b/src/main/java/org/perlonjava/runtime/perlmodule/Universal.java
index f1b10205e..6b72194e3 100644
--- a/src/main/java/org/perlonjava/runtime/perlmodule/Universal.java
+++ b/src/main/java/org/perlonjava/runtime/perlmodule/Universal.java
@@ -89,6 +89,23 @@ public static void initialize() {
         }
     }
 
+    /**
+     * Missing-method return for UNIVERSAL::can.
+     *
+     * <p>Perl exposes this as undef. Compile-time lookups pass {@link RuntimeContextType#SCALAR} and
+     * discriminate with patterns like {@code size() == 1 && getBoolean()}; treating not-found {@code can}
+     * as a singleton {@code undef} there yields {@code size() == 1} with {@code getBoolean() == false}, which
+     * those call sites do not recognize as “no candidate”. Returning an {@linkplain RuntimeList#isEmpty() empty}
+     * list preserves existing logic.
+     *
+     * <p>List-context calls flatten this value into enclosing lists — only there must Perl get one {@code undef}
+     * placeholder (never a vanishing splice), which generated CPAN constructors rely on (e.g. Mite
+     * {@code __META__} pairings).
+     */
+    private static RuntimeList canNotFound(int ctx) {
+        return ctx == RuntimeContextType.LIST ? scalarUndef.getList() : new RuntimeList();
+    }
+
     /**
      * Checks if the object can perform a given method.
      * Note: This is a Perl method, it expects `this` to be the first argument.
@@ -155,7 +172,7 @@ public static RuntimeList can(RuntimeArray args, int ctx) {
             if (method != null && !isAutoloadDispatch(method, actualMethod, perlClassName)) {
                 return method.getList();
             }
-            return new RuntimeList();
+            return canNotFound(ctx);
         }
 
         // Handle Package::SUPER::method syntax
@@ -168,7 +185,7 @@ public static RuntimeList can(RuntimeArray args, int ctx) {
             if (method != null && !isAutoloadDispatch(method, actualMethod, packageName)) {
                 return method.getList();
             }
-            return new RuntimeList();
+            return canNotFound(ctx);
         }
 
         // Perl's can() must NOT consider AUTOLOAD - it should only find
@@ -219,7 +236,7 @@ public static RuntimeList can(RuntimeArray args, int ctx) {
                 return method.getList();
             }
         }
-        return new RuntimeList();
+        return canNotFound(ctx);
     }
 
     /**
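
Reviewer illustration (not part of the patch): the hash-pairing corruption described in the docs above can be modeled with plain `java.util` collections. This is only a sketch under stated assumptions. `FlattenSketch`, `metaList`, and `buildHash` are hypothetical names, the two-key layout is a simplification of Mite's `__META__`, and none of this uses the real `RuntimeList` API. The `canNotFound` stand-in takes a boolean rather than the real `ctx` integer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenSketch {
    // Hypothetical stand-in for canNotFound(ctx): in list context return one
    // null (modeling a single undef); otherwise return an empty list.
    static List<Object> canNotFound(boolean listContext) {
        return listContext
                ? Collections.<Object>singletonList(null)
                : Collections.<Object>emptyList();
    }

    // Models the flat list (KEY => can(...), KEY => can(...)) after Perl-style
    // flattening: each can() result splices in zero or one element.
    static List<Object> metaList(boolean listContext) {
        List<Object> flat = new ArrayList<>();
        flat.add("HAS_BUILDARGS");
        flat.addAll(canNotFound(listContext));
        flat.add("HAS_FOREIGNBUILDARGS");
        flat.addAll(canNotFound(listContext));
        return flat;
    }

    // Models %h = (...): consecutive elements pair up as key, value.
    static Map<Object, Object> buildHash(List<Object> flat) {
        Map<Object, Object> hash = new LinkedHashMap<>();
        for (int i = 0; i < flat.size(); i += 2) {
            Object value = (i + 1 < flat.size()) ? flat.get(i + 1) : null;
            hash.put(flat.get(i), value);
        }
        return hash;
    }

    public static void main(String[] args) {
        // Empty-list result: the next key slides into the value slot.
        System.out.println(buildHash(metaList(false)));
        // prints {HAS_BUILDARGS=HAS_FOREIGNBUILDARGS}

        // Singleton-undef result: every key keeps its own value slot.
        System.out.println(buildHash(metaList(true)));
        // prints {HAS_BUILDARGS=null, HAS_FOREIGNBUILDARGS=null}
    }
}
```

Passing `false` reproduces the pre-fix behavior, where a failed lookup contributed nothing to the enclosing list; passing `true` models the list-context branch of the fix. The scalar-context branch keeps the empty result because, per the Javadoc in the patch, compile-time callers test `size()` rather than flattening the value.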