From f4bd29c90781f947e129352548188e91ee90dd88 Mon Sep 17 00:00:00 2001 From: "Flavio S. Glock" Date: Fri, 15 May 2026 16:43:53 +0200 Subject: [PATCH 1/3] fix: restore UNIVERSAL::can parity and tighten string concat SvUTF8 handling Failures from can() returned an empty list, corrupting Mite __META__ hash literals and breaking Sub::HandlesVia constructor dispatch. Returned values now use singleton undef matching Perl list semantics. Add RuntimeScalar(String, int stringType) and route concatenation helpers through explicit STRING vs BYTE_STRING selection. Document Sub::HandlesVia next steps under dev/modules/. Generated with Cursor (https://cursor.com/docs) Co-Authored-By: Cursor Co-authored-by: Cursor --- dev/design/string_encoding_context_plan.md | 6 + dev/design/utf8_flag_parity.md | 21 ++++ dev/modules/README.md | 1 + dev/modules/sub_handlesvia_support.md | 110 ++++++++++++++++++ .../runtime/operators/StringOperators.java | 100 +++++----------- .../runtime/perlmodule/Universal.java | 6 +- .../runtime/runtimetypes/RuntimeScalar.java | 20 ++++ 7 files changed, 191 insertions(+), 73 deletions(-) create mode 100644 dev/modules/sub_handlesvia_support.md diff --git a/dev/design/string_encoding_context_plan.md b/dev/design/string_encoding_context_plan.md index 713625d67..d3a6f5eaf 100644 --- a/dev/design/string_encoding_context_plan.md +++ b/dev/design/string_encoding_context_plan.md @@ -286,6 +286,12 @@ die "FAIL" if is_utf8($err); ## Notes +- **Investigation update (2026-05-15):** Running `./jcpan -t Sub::HandlesVia` showed an immediate crash in + Mite’s generated `*.mite.pm`: `HAS_BUILDARGS` was polluted with the string `HAS_FOREIGNBUILDARGS`, + falsely enabling the `BUILDARGS` branch. Root cause was **`UNIVERSAL::can()` returning an empty list** + instead of `(undef)`, which destroys hash literals at compile time. Fixed in `Universal.java`. 
+ The concatenation constructor / `BYTE_STRING` work below remains correct hardening against the + original UTF‑8 splice issues described earlier in this doc. - This fix addresses the root cause rather than applying post-corruption repair - The eval-time repair in RuntimeRegex can remain as a safety net - This aligns PerlOnJava with Perl 5's encoding context semantics diff --git a/dev/design/utf8_flag_parity.md b/dev/design/utf8_flag_parity.md index e8c8cedc1..2ae1568df 100644 --- a/dev/design/utf8_flag_parity.md +++ b/dev/design/utf8_flag_parity.md @@ -42,6 +42,27 @@ strings) never upgrade the result to UTF-8. - When both operands are non-STRING, produce BYTE_STRING (with Latin-1 safety check) - Previously, if neither was BYTE_STRING (e.g. INTEGER + BYTE_STRING), it fell through to the default STRING return +- Results use `new RuntimeScalar(text, BYTE_STRING)` or `(text, STRING)` instead of bouncing + through temporary `byte[]` for typical paths + +### 2b. `UNIVERSAL::can()` failures must return `(undef)` + +**File:** `src/main/java/org/perlonjava/runtime/perlmodule/Universal.java` + +Perl returns **one** undefined value (`(undef)` in list context). PerlOnJava used an **empty** +`RuntimeList`, which behaves like Perl’s truly empty list: in `%h = (...)`/`{ … }` +constructors it eats the next pairing and corrupts literals. Downstream (**Mite** `__META__` in +`*.mite.pm`; **Sub::HandlesVia::CodeGenerator**) saw `HAS_BUILDARGS` swallow the `'HAS_FOREIGNBUILDARGS'` +key as its bogus string value and incorrectly took the `BUILDARGS` constructor branch. + +Failures now use `scalarUndef.getList()` (singleton undef). + +### 2c. Typed string constructor + +**Files:** `src/main/java/org/perlonjava/runtime/runtimetypes/RuntimeScalar.java` + +- `RuntimeScalar(String value, int stringType)` with `stringType` ∈ {`STRING`, `BYTE_STRING`} for + SvUTF8 parity (used by concatenation). ### 3. 
`sprintf` — SprintfOperator.sprintfInternal() diff --git a/dev/modules/README.md b/dev/modules/README.md index 45a3fb6be..d80a316c1 100644 --- a/dev/modules/README.md +++ b/dev/modules/README.md @@ -21,6 +21,7 @@ This directory contains design documents and guides related to porting CPAN modu | [storable_binary_format.md](storable_binary_format.md) | Storable native Perl binary format — read + write paths landed; jperl ↔ system-perl files interoperate in both directions | | [unicode_collate.md](unicode_collate.md) | Unicode::Collate — plan: file-backed DUCET + Java XS surface (default); optional ICU path tradeoffs | | [ppi.md](ppi.md) | **PPI** — CPAN test status, RC1–RC4, refcount/`DESTROY` follow-ups (`t/04_element.t` `%_PARENT`) | +| [sub_handlesvia_support.md](sub_handlesvia_support.md) | **Sub::HandlesVia** — Mite/can/hash fix landed; **`eval_closure` UTF‑8 (\x{c2})** trace plan | ## Module Status Overview diff --git a/dev/modules/sub_handlesvia_support.md b/dev/modules/sub_handlesvia_support.md new file mode 100644 index 000000000..cd1f056ae --- /dev/null +++ b/dev/modules/sub_handlesvia_support.md @@ -0,0 +1,110 @@ +# Sub::HandlesVia Support for PerlOnJava + +## Overview + +[Sub::HandlesVia](https://metacpan.org/pod/Sub::HandlesVia) generates delegation methods (“handles”) for Moo/Moose/Object::Pad/toolkit classes. Runtime work uses **generated Perl** compiled via **`Eval::TypeTiny::eval_closure`**; build-time codegen uses **Mite** (`*.mite.pm`) and **`Sub::HandlesVia::CodeGenerator`**. + +PerlOnJava must treat **SvUTF8 / BYTE_STRING parity** consistently in string concatenation *and* return **Perl-correct lists** from core helpers such as **`UNIVERSAL::can`**, otherwise Mite constructors and delegated subs break in non-obvious ways. 
+ +--- + +## Completed (upstream-style fixes landed in core) + +These changes address blockers traced while running `./jcpan -t Sub::HandlesVia`: + +| Area | Problem | Fix | +|------|---------|-----| +| **`UNIVERSAL::can`** | Missing methods returned an **empty** `RuntimeList`, which behaves like Perl’s **empty list** inside hash literals. That **consumes** the next `=>` pairing and corrupted Mite **`__META__`** (`HAS_BUILDARGS` falsely truthy → bogus **`BUILDARGS`** branch). | Failure paths now return **`scalarUndef.getList()`** — one list element (**undef**) like Perl `(undef)`. `Universal.java`. | +| **String concat** | Concat results could lose **BYTE_STRING** context or route through brittle `byte[]` paths for common cases. | **`RuntimeScalar(String, int)`** constructor + **`STRING` vs `BYTE_STRING`** selection from operands / wide-character scan in **`StringOperators`** (`stringConcat`, `stringConcatWarnUninitialized`, `stringConcatNoOverload`). | + +Design cross-links: + +- [`dev/design/utf8_flag_parity.md`](../design/utf8_flag_parity.md) — §2b (can), §2c (typed string ctor), §2 bullet updates. +- [`dev/design/string_encoding_context_plan.md`](../design/string_encoding_context_plan.md) — investigation note (2026-05-15). + +--- + +## Current Status (manual smoke) + +After the **`can`** fix: + +- **`Sub::HandlesVia::CodeGenerator->__META__`** has four keys; **`HAS_BUILDARGS`** exists with **`undef`** (Perl-correct falsy gate). +- **`t/02moo.t`** progresses further but **still fails** when **`Eval::TypeTiny::eval_closure`** compiles generated source (`Unrecognized character \x{c2}` with a `#line` pointing at **`Eval/TypeTiny.pm`** — the synthesized filename/line prefix, not the host file’s UTF-8 problem). + +Automated `./jcpan -t Sub::HandlesVia` was previously **timed out at 600s** in CI-style runs; rerun with **`timeout 3600`** after core fixes stabilize. + +--- + +## Next Steps (prioritized) + +### 1. 
[P0] Fix UTF-8 / lead-byte breakage in delegated eval (`\x{c2}`) + +**Symptom:** + +```text +Failed to compile source because: Unrecognized character \x{c2}; at .../Eval/TypeTiny.pm line 8 ... + at .../Sub/HandlesVia/CodeGenerator.pm line 345 (Eval::TypeTiny::eval_closure) +``` + +**Goals:** + +1. Capture the **exact `%ec_args`** string passed into **`eval_closure`** for a failing case (minimal Moo delegation in `t/02moo.t`), e.g. temporary logging in **`CodeGenerator.pm`** (`generate_coderef_for_handler`) guarded by **`$ENV{SUB_HANDLESVIA_DEBUG_EC}`**. +2. Binary-diff that string (`unpack "H*", $src`) vs system Perl — locate the first stray **`0xc2`** (UTF-8 lead byte) treated as Latin-1. +3. Classify origin: + - **Runtime string typing** remaining in codegen (`"."`/`join`/quoting/formatters elsewhere), or + - **PerlOnJava lexer/compiler** rejecting valid UTF-8 in **`eval`** strings (narrow vs wide rules), or + - **Copy from file** paths reading `.pm` with wrong Perl layer assumptions. +4. Fix at the appropriate layer (prefer **prevent** mis-typing; **`RuntimeRegex.repairLatin1EncodedUtf8IfCorrupted`** is only a fallback per design notes). + +**Success:** `timeout 900 ./jperl .../blib/lib .../Sub-HandlesVia-*/t/02moo.t` completes with TAP **ok**. + +### 2. [P1] Full CPAN harness run + +```bash +timeout 3600 ./jcpan -t Sub::HandlesVia > /tmp/jcpan_Sub_HandlesVia.txt 2>&1 +``` + +Catalog skips (optional deps **MooX::TypeTiny**, **Mouse**, etc.) vs real failures. + +### 3. [P2] Regression tests in-repo (coordination needed) + +PerlOnJava policy: **never delete or weaken existing tests**; adding **new** unit tests requires maintainer alignment. Candidate areas: + +- **`UNIVERSAL::can`** in **hash constructor** contexts: `%h = (... unknown package ...->can(...) ...)` pairing integrity. +- **Concat parity**: **`no utf8` / `use utf8`** literals **`Encode::is_utf8`** expectations (see **`dev/design/string_encoding_context_plan.md`** verification section). + +### 4. 
[P3] Optional XS + +Upstream ships **`Sub::HandlesVia::XS`** (skipped when absent). No action unless performance work demands it — pure Perl path is canonical for portability. + +--- + +## Dependencies (mental model) + +| Module | Role | +|--------|------| +| **Type::Tiny** / **Exporter::Tiny** | Types and coercion surfaces for handlers | +| **Eval::TypeTiny** | **`eval_closure`** — compiles delegated method bodies | +| **Mite / Sub::HandlesVia::Mite** | Constructor / attribute sugar; **`__META__`** uses **`can('BUILDARGS')`** | +| **Moo** | Primary toolkit exercised in **`t/02moo*.t`** | +| **Moose / Mouse / Corinna** | Separate test dirs; skip if stacks incomplete | + +Issues in **Eval::TypeTiny** often surface as **compile errors inside generated strings** rather than `.pm` syntax errors — treat reports as **`$src` forensic** first. + +--- + +## Related docs + +| Document | Topic | +|----------|--------| +| [type_tiny.md](type_tiny.md) | Type::Tiny quirks on PerlOnJava | +| [moo_support.md](moo_support.md) | Moo stack status | +| [moose_support.md](moose_support.md) | Moose prerequisites | + +--- + +## Progress log + +| Date | Milestone | +|------|-----------| +| 2026-05-15 | **`UNIVERSAL::can`** empty-list/hash corruption fixed; **`RuntimeScalar(String,int)`** + concat typing; **`__META__`** validated; `\x{c2}` eval blocker documented as next P0 | diff --git a/src/main/java/org/perlonjava/runtime/operators/StringOperators.java b/src/main/java/org/perlonjava/runtime/operators/StringOperators.java index 33a948103..66be7ca51 100644 --- a/src/main/java/org/perlonjava/runtime/operators/StringOperators.java +++ b/src/main/java/org/perlonjava/runtime/operators/StringOperators.java @@ -13,6 +13,7 @@ import static org.perlonjava.runtime.runtimetypes.GlobalVariable.getGlobalVariable; import static org.perlonjava.runtime.runtimetypes.RuntimeScalarCache.getScalarInt; import static org.perlonjava.runtime.runtimetypes.RuntimeScalarType.BYTE_STRING; +import static 
org.perlonjava.runtime.runtimetypes.RuntimeScalarType.STRING; /** * A utility class that provides various string operations on {@link RuntimeScalar} objects. @@ -438,33 +439,18 @@ public static RuntimeScalar stringConcat(RuntimeScalar runtimeScalar, RuntimeSca boolean aIsUtf8 = runtimeScalar.type == RuntimeScalarType.STRING; boolean bIsUtf8 = b.type == RuntimeScalarType.STRING; + String result = aStr + bStr; + if (aIsUtf8 || bIsUtf8) { - return new RuntimeScalar(aStr + bStr); + return new RuntimeScalar(result, STRING); } - // Neither operand is UTF-8 — produce BYTE_STRING result - // Check if all chars fit in a byte (Latin-1) - boolean safe = true; - for (int i = 0; safe && i < aStr.length(); i++) { - if (aStr.charAt(i) > 255) { - safe = false; - } - } - for (int i = 0; safe && i < bStr.length(); i++) { - if (bStr.charAt(i) > 255) { - safe = false; + for (int i = 0; i < result.length(); i++) { + if (result.charAt(i) > 255) { + return new RuntimeScalar(result, STRING); } } - if (safe) { - byte[] aBytes = aStr.getBytes(StandardCharsets.ISO_8859_1); - byte[] bBytes = bStr.getBytes(StandardCharsets.ISO_8859_1); - byte[] out = new byte[aBytes.length + bBytes.length]; - System.arraycopy(aBytes, 0, out, 0, aBytes.length); - System.arraycopy(bBytes, 0, out, aBytes.length, bBytes.length); - return new RuntimeScalar(out); - } - - return new RuntimeScalar(aStr + bStr); + return new RuntimeScalar(result, BYTE_STRING); } public static RuntimeScalar stringConcatWarnUninitialized(RuntimeScalar runtimeScalar, RuntimeScalar b) { @@ -486,43 +472,30 @@ public static RuntimeScalar stringConcatWarnUninitialized(RuntimeScalar runtimeS String aStr = aResolved.toString(); String bStr = bResolved.toString(); - if (aResolved.type == RuntimeScalarType.STRING || bResolved.type == RuntimeScalarType.STRING) { - return new RuntimeScalar(aStr + bStr); + String concat = aStr + bStr; + + if (aResolved.type == STRING || bResolved.type == STRING) { + return new RuntimeScalar(concat, STRING); } if 
(aResolved.type == BYTE_STRING || bResolved.type == BYTE_STRING) { boolean aIsByte = aResolved.type == BYTE_STRING || aResolved.type == RuntimeScalarType.UNDEF - || (aStr.isEmpty() && aResolved.type != RuntimeScalarType.STRING); + || (aStr.isEmpty() && aResolved.type != STRING); boolean bIsByte = bResolved.type == BYTE_STRING || bResolved.type == RuntimeScalarType.UNDEF - || (bStr.isEmpty() && bResolved.type != RuntimeScalarType.STRING); + || (bStr.isEmpty() && bResolved.type != STRING); if (aIsByte && bIsByte) { - boolean safe = true; - for (int i = 0; safe && i < aStr.length(); i++) { - if (aStr.charAt(i) > 255) { - safe = false; - break; + for (int i = 0; i < concat.length(); i++) { + if (concat.charAt(i) > 255) { + return new RuntimeScalar(concat, STRING); } } - for (int i = 0; safe && i < bStr.length(); i++) { - if (bStr.charAt(i) > 255) { - safe = false; - break; - } - } - if (safe) { - byte[] aBytes = aStr.getBytes(StandardCharsets.ISO_8859_1); - byte[] bBytes = bStr.getBytes(StandardCharsets.ISO_8859_1); - byte[] out = new byte[aBytes.length + bBytes.length]; - System.arraycopy(aBytes, 0, out, 0, aBytes.length); - System.arraycopy(bBytes, 0, out, aBytes.length, bBytes.length); - return new RuntimeScalar(out); - } + return new RuntimeScalar(concat, BYTE_STRING); } } - return new RuntimeScalar(aStr + bStr); + return new RuntimeScalar(concat); } public static RuntimeScalar chompScalar(RuntimeScalar runtimeScalar) { @@ -836,43 +809,30 @@ public static RuntimeScalar stringConcatNoOverload(RuntimeScalar runtimeScalar, String aStr = runtimeScalar.toStringNoOverload(); String bStr = b.toStringNoOverload(); - if (runtimeScalar.type == RuntimeScalarType.STRING || b.type == RuntimeScalarType.STRING) { - return new RuntimeScalar(aStr + bStr); + String concat = aStr + bStr; + + if (runtimeScalar.type == STRING || b.type == STRING) { + return new RuntimeScalar(concat, STRING); } if (runtimeScalar.type == BYTE_STRING || b.type == BYTE_STRING) { boolean aIsByte = 
runtimeScalar.type == BYTE_STRING || runtimeScalar.type == RuntimeScalarType.UNDEF - || (aStr.isEmpty() && runtimeScalar.type != RuntimeScalarType.STRING); + || (aStr.isEmpty() && runtimeScalar.type != STRING); boolean bIsByte = b.type == BYTE_STRING || b.type == RuntimeScalarType.UNDEF - || (bStr.isEmpty() && b.type != RuntimeScalarType.STRING); + || (bStr.isEmpty() && b.type != STRING); if (aIsByte && bIsByte) { - boolean safe = true; - for (int i = 0; safe && i < aStr.length(); i++) { - if (aStr.charAt(i) > 255) { - safe = false; - break; + for (int i = 0; i < concat.length(); i++) { + if (concat.charAt(i) > 255) { + return new RuntimeScalar(concat, STRING); } } - for (int i = 0; safe && i < bStr.length(); i++) { - if (bStr.charAt(i) > 255) { - safe = false; - break; - } - } - if (safe) { - byte[] aBytes = aStr.getBytes(StandardCharsets.ISO_8859_1); - byte[] bBytes = bStr.getBytes(StandardCharsets.ISO_8859_1); - byte[] out = new byte[aBytes.length + bBytes.length]; - System.arraycopy(aBytes, 0, out, 0, aBytes.length); - System.arraycopy(bBytes, 0, out, aBytes.length, bBytes.length); - return new RuntimeScalar(out); - } + return new RuntimeScalar(concat, BYTE_STRING); } } - return new RuntimeScalar(aStr + bStr); + return new RuntimeScalar(concat); } /** diff --git a/src/main/java/org/perlonjava/runtime/perlmodule/Universal.java b/src/main/java/org/perlonjava/runtime/perlmodule/Universal.java index f1b10205e..87332388d 100644 --- a/src/main/java/org/perlonjava/runtime/perlmodule/Universal.java +++ b/src/main/java/org/perlonjava/runtime/perlmodule/Universal.java @@ -155,7 +155,7 @@ public static RuntimeList can(RuntimeArray args, int ctx) { if (method != null && !isAutoloadDispatch(method, actualMethod, perlClassName)) { return method.getList(); } - return new RuntimeList(); + return scalarUndef.getList(); } // Handle Package::SUPER::method syntax @@ -168,7 +168,7 @@ public static RuntimeList can(RuntimeArray args, int ctx) { if (method != null && 
!isAutoloadDispatch(method, actualMethod, packageName)) { return method.getList(); } - return new RuntimeList(); + return scalarUndef.getList(); } // Perl's can() must NOT consider AUTOLOAD - it should only find @@ -219,7 +219,7 @@ public static RuntimeList can(RuntimeArray args, int ctx) { return method.getList(); } } - return new RuntimeList(); + return scalarUndef.getList(); } /** diff --git a/src/main/java/org/perlonjava/runtime/runtimetypes/RuntimeScalar.java b/src/main/java/org/perlonjava/runtime/runtimetypes/RuntimeScalar.java index 6e8afe1d8..321974e27 100644 --- a/src/main/java/org/perlonjava/runtime/runtimetypes/RuntimeScalar.java +++ b/src/main/java/org/perlonjava/runtime/runtimetypes/RuntimeScalar.java @@ -177,6 +177,26 @@ public RuntimeScalar(String value) { this.value = value; } + /** + * String scalar with explicit {@link RuntimeScalarType#STRING} vs + * {@link RuntimeScalarType#BYTE_STRING} (Perl SvUTF8 parity). + * + * @param stringType Must be {@code STRING} or {@code BYTE_STRING}; any other non-null-ish use + * maps to STRING. + */ + public RuntimeScalar(String value, int stringType) { + if (value == null) { + this.type = UNDEF; + this.value = null; + return; + } + this.value = value; + this.type = + stringType == BYTE_STRING + ? BYTE_STRING + : RuntimeScalarType.STRING; + } + public RuntimeScalar(boolean value) { this.type = RuntimeScalarType.BOOLEAN; this.value = value; From 5d90443a7f6cc20f42b4aaa1101377acff8eb876 Mon Sep 17 00:00:00 2001 From: "Flavio S. Glock" Date: Fri, 15 May 2026 17:14:26 +0200 Subject: [PATCH 2/3] fix: revert concat typing experiment; keep UNIVERSAL::can list fix StringOperators stringConcat* and RuntimeScalar(String,int) regressed perl5_t smoke (op/sub.t, porting/filenames.t, re/pat_advanced.t). Restore the prior concat implementation; retain scalarUndef.getList() on failed can() for Mite. Update design + dev/modules docs with the revert and a safer redo checklist. 
Generated with Cursor (https://cursor.com/docs)

Co-Authored-By: Cursor
Co-authored-by: Cursor
---
 dev/design/string_encoding_context_plan.md    |   5 +-
 dev/design/utf8_flag_parity.md                |   9 --
 dev/modules/sub_handlesvia_support.md         |  24 ++++-
 .../runtime/operators/StringOperators.java    | 100 ++++++++++++------
 .../runtime/runtimetypes/RuntimeScalar.java   |  20 ----
 5 files changed, 92 insertions(+), 66 deletions(-)

diff --git a/dev/design/string_encoding_context_plan.md b/dev/design/string_encoding_context_plan.md
index d3a6f5eaf..a1368f30f 100644
--- a/dev/design/string_encoding_context_plan.md
+++ b/dev/design/string_encoding_context_plan.md
@@ -290,8 +290,9 @@ die "FAIL" if is_utf8($err);
   Mite’s generated `*.mite.pm`: `HAS_BUILDARGS` was polluted with the string `HAS_FOREIGNBUILDARGS`,
   falsely enabling the `BUILDARGS` branch. Root cause was **`UNIVERSAL::can()` returning an empty list**
   instead of `(undef)`, which destroys hash literals at compile time. Fixed in `Universal.java`.
-  The concatenation constructor / `BYTE_STRING` work below remains correct hardening against the
-  original UTF‑8 splice issues described earlier in this doc.
+  A follow-up concat / typed-string-constructor refactor (see Phase 1–2 below) briefly landed and
+  was **reverted** after `perl5_t` regressions (`op/sub.t`, `porting/filenames.t`,
+  `re/pat_advanced.t`); redo with targeted tests before merging. The **`UNIVERSAL::can`** fix is kept.
 - This fix addresses the root cause rather than applying post-corruption repair
 - The eval-time repair in RuntimeRegex can remain as a safety net
 - This aligns PerlOnJava with Perl 5's encoding context semantics
diff --git a/dev/design/utf8_flag_parity.md b/dev/design/utf8_flag_parity.md
index 2ae1568df..38741d106 100644
--- a/dev/design/utf8_flag_parity.md
+++ b/dev/design/utf8_flag_parity.md
@@ -42,8 +42,6 @@ strings) never upgrade the result to UTF-8.
- When both operands are non-STRING, produce BYTE_STRING (with Latin-1 safety check) - Previously, if neither was BYTE_STRING (e.g. INTEGER + BYTE_STRING), it fell through to the default STRING return -- Results use `new RuntimeScalar(text, BYTE_STRING)` or `(text, STRING)` instead of bouncing - through temporary `byte[]` for typical paths ### 2b. `UNIVERSAL::can()` failures must return `(undef)` @@ -57,13 +55,6 @@ key as its bogus string value and incorrectly took the `BUILDARGS` constructor b Failures now use `scalarUndef.getList()` (singleton undef). -### 2c. Typed string constructor - -**Files:** `src/main/java/org/perlonjava/runtime/runtimetypes/RuntimeScalar.java` - -- `RuntimeScalar(String value, int stringType)` with `stringType` ∈ {`STRING`, `BYTE_STRING`} for - SvUTF8 parity (used by concatenation). - ### 3. `sprintf` — SprintfOperator.sprintfInternal() **File:** `src/main/java/org/perlonjava/runtime/operators/SprintfOperator.java` diff --git a/dev/modules/sub_handlesvia_support.md b/dev/modules/sub_handlesvia_support.md index cd1f056ae..46f32b820 100644 --- a/dev/modules/sub_handlesvia_support.md +++ b/dev/modules/sub_handlesvia_support.md @@ -15,11 +15,11 @@ These changes address blockers traced while running `./jcpan -t Sub::HandlesVia` | Area | Problem | Fix | |------|---------|-----| | **`UNIVERSAL::can`** | Missing methods returned an **empty** `RuntimeList`, which behaves like Perl’s **empty list** inside hash literals. That **consumes** the next `=>` pairing and corrupted Mite **`__META__`** (`HAS_BUILDARGS` falsely truthy → bogus **`BUILDARGS`** branch). | Failure paths now return **`scalarUndef.getList()`** — one list element (**undef**) like Perl `(undef)`. `Universal.java`. | -| **String concat** | Concat results could lose **BYTE_STRING** context or route through brittle `byte[]` paths for common cases. 
| **`RuntimeScalar(String, int)`** constructor + **`STRING` vs `BYTE_STRING`** selection from operands / wide-character scan in **`StringOperators`** (`stringConcat`, `stringConcatWarnUninitialized`, `stringConcatNoOverload`). | +| **String concat SvUTF8** *(deferred)* | A typed-concat experiment caused **`perl5_t`** regressions (`op/sub.t`, `porting/filenames.t`, `re/pat_advanced.t`); it was **reverted** from the PR trajectory serving Sub::HandlesVia. Redo against smaller, **`perl5_t`-backed** steps ([`dev/design/string_encoding_context_plan.md`](../design/string_encoding_context_plan.md)). | Design cross-links: -- [`dev/design/utf8_flag_parity.md`](../design/utf8_flag_parity.md) — §2b (can), §2c (typed string ctor), §2 bullet updates. +- [`dev/design/utf8_flag_parity.md`](../design/utf8_flag_parity.md) — §2b (`can`). - [`dev/design/string_encoding_context_plan.md`](../design/string_encoding_context_plan.md) — investigation note (2026-05-15). --- @@ -66,14 +66,27 @@ timeout 3600 ./jcpan -t Sub::HandlesVia > /tmp/jcpan_Sub_HandlesVia.txt 2>&1 Catalog skips (optional deps **MooX::TypeTiny**, **Mouse**, etc.) vs real failures. -### 3. [P2] Regression tests in-repo (coordination needed) +### 3. [P2] Concat / SvUTF8 parity redo (staging) + +Retry [`dev/design/string_encoding_context_plan.md`](../design/string_encoding_context_plan.md) **Phase 2** (`StringOperators.stringConcat*`) **only after** guarding with: + +```bash +cd perl5_t/t +timeout 300 ../../jperl op/sub.t +timeout 180 ../../jperl porting/filenames.t +timeout 600 ../../jperl re/pat_advanced.t # noisy; grep ^not ok +``` + +Establish **baseline counts** vs **`origin/master`** on the **same harness** (`perl_test_runner.pl` shards if that is CI). A naive `RuntimeScalar(text, BYTE_STRING)` swap for the ISO-8859-1 `byte[]` path surfaced **opaque** regressions in **regex/porting/stack** slices — redo incrementally under its own tiny PR once bisected. + +### 4. 
[P3] Regression tests in-repo (coordination needed) PerlOnJava policy: **never delete or weaken existing tests**; adding **new** unit tests requires maintainer alignment. Candidate areas: - **`UNIVERSAL::can`** in **hash constructor** contexts: `%h = (... unknown package ...->can(...) ...)` pairing integrity. - **Concat parity**: **`no utf8` / `use utf8`** literals **`Encode::is_utf8`** expectations (see **`dev/design/string_encoding_context_plan.md`** verification section). -### 4. [P3] Optional XS +### 5. [P4] Optional XS Upstream ships **`Sub::HandlesVia::XS`** (skipped when absent). No action unless performance work demands it — pure Perl path is canonical for portability. @@ -107,4 +120,5 @@ Issues in **Eval::TypeTiny** often surface as **compile errors inside generated | Date | Milestone | |------|-----------| -| 2026-05-15 | **`UNIVERSAL::can`** empty-list/hash corruption fixed; **`RuntimeScalar(String,int)`** + concat typing; **`__META__`** validated; `\x{c2}` eval blocker documented as next P0 | +| 2026-05-15 | **`UNIVERSAL::can`** empty-list/hash corruption fixed; **`__META__`** validated; `\x{c2}` eval blocker documented as next P0 | +| 2026-05-15 | Typed-concat / `RuntimeScalar(String,int)` experiment **reverted** after `perl5_t` regressions; **`can`** fix retained | diff --git a/src/main/java/org/perlonjava/runtime/operators/StringOperators.java b/src/main/java/org/perlonjava/runtime/operators/StringOperators.java index 66be7ca51..33a948103 100644 --- a/src/main/java/org/perlonjava/runtime/operators/StringOperators.java +++ b/src/main/java/org/perlonjava/runtime/operators/StringOperators.java @@ -13,7 +13,6 @@ import static org.perlonjava.runtime.runtimetypes.GlobalVariable.getGlobalVariable; import static org.perlonjava.runtime.runtimetypes.RuntimeScalarCache.getScalarInt; import static org.perlonjava.runtime.runtimetypes.RuntimeScalarType.BYTE_STRING; -import static org.perlonjava.runtime.runtimetypes.RuntimeScalarType.STRING; /** * A utility class 
that provides various string operations on {@link RuntimeScalar} objects. @@ -439,18 +438,33 @@ public static RuntimeScalar stringConcat(RuntimeScalar runtimeScalar, RuntimeSca boolean aIsUtf8 = runtimeScalar.type == RuntimeScalarType.STRING; boolean bIsUtf8 = b.type == RuntimeScalarType.STRING; - String result = aStr + bStr; - if (aIsUtf8 || bIsUtf8) { - return new RuntimeScalar(result, STRING); + return new RuntimeScalar(aStr + bStr); } - for (int i = 0; i < result.length(); i++) { - if (result.charAt(i) > 255) { - return new RuntimeScalar(result, STRING); + // Neither operand is UTF-8 — produce BYTE_STRING result + // Check if all chars fit in a byte (Latin-1) + boolean safe = true; + for (int i = 0; safe && i < aStr.length(); i++) { + if (aStr.charAt(i) > 255) { + safe = false; + } + } + for (int i = 0; safe && i < bStr.length(); i++) { + if (bStr.charAt(i) > 255) { + safe = false; } } - return new RuntimeScalar(result, BYTE_STRING); + if (safe) { + byte[] aBytes = aStr.getBytes(StandardCharsets.ISO_8859_1); + byte[] bBytes = bStr.getBytes(StandardCharsets.ISO_8859_1); + byte[] out = new byte[aBytes.length + bBytes.length]; + System.arraycopy(aBytes, 0, out, 0, aBytes.length); + System.arraycopy(bBytes, 0, out, aBytes.length, bBytes.length); + return new RuntimeScalar(out); + } + + return new RuntimeScalar(aStr + bStr); } public static RuntimeScalar stringConcatWarnUninitialized(RuntimeScalar runtimeScalar, RuntimeScalar b) { @@ -472,30 +486,43 @@ public static RuntimeScalar stringConcatWarnUninitialized(RuntimeScalar runtimeS String aStr = aResolved.toString(); String bStr = bResolved.toString(); - String concat = aStr + bStr; - - if (aResolved.type == STRING || bResolved.type == STRING) { - return new RuntimeScalar(concat, STRING); + if (aResolved.type == RuntimeScalarType.STRING || bResolved.type == RuntimeScalarType.STRING) { + return new RuntimeScalar(aStr + bStr); } if (aResolved.type == BYTE_STRING || bResolved.type == BYTE_STRING) { boolean aIsByte = 
aResolved.type == BYTE_STRING || aResolved.type == RuntimeScalarType.UNDEF - || (aStr.isEmpty() && aResolved.type != STRING); + || (aStr.isEmpty() && aResolved.type != RuntimeScalarType.STRING); boolean bIsByte = bResolved.type == BYTE_STRING || bResolved.type == RuntimeScalarType.UNDEF - || (bStr.isEmpty() && bResolved.type != STRING); + || (bStr.isEmpty() && bResolved.type != RuntimeScalarType.STRING); if (aIsByte && bIsByte) { - for (int i = 0; i < concat.length(); i++) { - if (concat.charAt(i) > 255) { - return new RuntimeScalar(concat, STRING); + boolean safe = true; + for (int i = 0; safe && i < aStr.length(); i++) { + if (aStr.charAt(i) > 255) { + safe = false; + break; } } - return new RuntimeScalar(concat, BYTE_STRING); + for (int i = 0; safe && i < bStr.length(); i++) { + if (bStr.charAt(i) > 255) { + safe = false; + break; + } + } + if (safe) { + byte[] aBytes = aStr.getBytes(StandardCharsets.ISO_8859_1); + byte[] bBytes = bStr.getBytes(StandardCharsets.ISO_8859_1); + byte[] out = new byte[aBytes.length + bBytes.length]; + System.arraycopy(aBytes, 0, out, 0, aBytes.length); + System.arraycopy(bBytes, 0, out, aBytes.length, bBytes.length); + return new RuntimeScalar(out); + } } } - return new RuntimeScalar(concat); + return new RuntimeScalar(aStr + bStr); } public static RuntimeScalar chompScalar(RuntimeScalar runtimeScalar) { @@ -809,30 +836,43 @@ public static RuntimeScalar stringConcatNoOverload(RuntimeScalar runtimeScalar, String aStr = runtimeScalar.toStringNoOverload(); String bStr = b.toStringNoOverload(); - String concat = aStr + bStr; - - if (runtimeScalar.type == STRING || b.type == STRING) { - return new RuntimeScalar(concat, STRING); + if (runtimeScalar.type == RuntimeScalarType.STRING || b.type == RuntimeScalarType.STRING) { + return new RuntimeScalar(aStr + bStr); } if (runtimeScalar.type == BYTE_STRING || b.type == BYTE_STRING) { boolean aIsByte = runtimeScalar.type == BYTE_STRING || runtimeScalar.type == RuntimeScalarType.UNDEF - || 
(aStr.isEmpty() && runtimeScalar.type != STRING); + || (aStr.isEmpty() && runtimeScalar.type != RuntimeScalarType.STRING); boolean bIsByte = b.type == BYTE_STRING || b.type == RuntimeScalarType.UNDEF - || (bStr.isEmpty() && b.type != STRING); + || (bStr.isEmpty() && b.type != RuntimeScalarType.STRING); if (aIsByte && bIsByte) { - for (int i = 0; i < concat.length(); i++) { - if (concat.charAt(i) > 255) { - return new RuntimeScalar(concat, STRING); + boolean safe = true; + for (int i = 0; safe && i < aStr.length(); i++) { + if (aStr.charAt(i) > 255) { + safe = false; + break; } } - return new RuntimeScalar(concat, BYTE_STRING); + for (int i = 0; safe && i < bStr.length(); i++) { + if (bStr.charAt(i) > 255) { + safe = false; + break; + } + } + if (safe) { + byte[] aBytes = aStr.getBytes(StandardCharsets.ISO_8859_1); + byte[] bBytes = bStr.getBytes(StandardCharsets.ISO_8859_1); + byte[] out = new byte[aBytes.length + bBytes.length]; + System.arraycopy(aBytes, 0, out, 0, aBytes.length); + System.arraycopy(bBytes, 0, out, aBytes.length, bBytes.length); + return new RuntimeScalar(out); + } } } - return new RuntimeScalar(concat); + return new RuntimeScalar(aStr + bStr); } /** diff --git a/src/main/java/org/perlonjava/runtime/runtimetypes/RuntimeScalar.java b/src/main/java/org/perlonjava/runtime/runtimetypes/RuntimeScalar.java index 321974e27..6e8afe1d8 100644 --- a/src/main/java/org/perlonjava/runtime/runtimetypes/RuntimeScalar.java +++ b/src/main/java/org/perlonjava/runtime/runtimetypes/RuntimeScalar.java @@ -177,26 +177,6 @@ public RuntimeScalar(String value) { this.value = value; } - /** - * String scalar with explicit {@link RuntimeScalarType#STRING} vs - * {@link RuntimeScalarType#BYTE_STRING} (Perl SvUTF8 parity). - * - * @param stringType Must be {@code STRING} or {@code BYTE_STRING}; any other non-null-ish use - * maps to STRING. 
- */ - public RuntimeScalar(String value, int stringType) { - if (value == null) { - this.type = UNDEF; - this.value = null; - return; - } - this.value = value; - this.type = - stringType == BYTE_STRING - ? BYTE_STRING - : RuntimeScalarType.STRING; - } - public RuntimeScalar(boolean value) { this.type = RuntimeScalarType.BOOLEAN; this.value = value; From 85e93258162d6def3b7c198ed0554b5e11f5449f Mon Sep 17 00:00:00 2001 From: "Flavio S. Glock" Date: Fri, 15 May 2026 18:45:05 +0200 Subject: [PATCH 3/3] fix: gate UNIVERSAL::can not-found on calling context Returning a singleton undef on every can() miss fixed Mite list/hash pairings but confused compile-time probe sites that expect an empty RuntimeList (VERSION/import/MODIFY_* checks). Use list-context undef only when ctx is LIST; keep an empty list for scalar-like contexts (scalar() still undef). Updated design notes and Sub::HandlesVia tracking doc. Generated with Cursor (https://cursor.com/docs) Co-Authored-By: Cursor Co-authored-by: Cursor --- dev/design/string_encoding_context_plan.md | 12 ++++++---- dev/design/utf8_flag_parity.md | 2 +- dev/modules/sub_handlesvia_support.md | 4 ++-- .../runtime/perlmodule/Universal.java | 23 ++++++++++++++++--- 4 files changed, 31 insertions(+), 10 deletions(-) diff --git a/dev/design/string_encoding_context_plan.md b/dev/design/string_encoding_context_plan.md index a1368f30f..d3fde15f8 100644 --- a/dev/design/string_encoding_context_plan.md +++ b/dev/design/string_encoding_context_plan.md @@ -289,10 +289,14 @@ die "FAIL" if is_utf8($err); - **Investigation update (2026-05-15):** Running `./jcpan -t Sub::HandlesVia` showed an immediate crash in Mite’s generated `*.mite.pm`: `HAS_BUILDARGS` was polluted with the string `HAS_FOREIGNBUILDARGS`, falsely enabling the `BUILDARGS` branch. Root cause was **`UNIVERSAL::can()` returning an empty list** - instead of `(undef)`, which destroys hash literals at compile time. Fixed in `Universal.java`. 
- A follow-up concat / typed-string-constructor refactor (see Phase 1–2 below) briefly landed and - was **reverted** after `perl5_t` regressions (`op/sub.t`, `porting/filenames.t`, - `re/pat_advanced.t`); redo with targeted tests before merging. The **`UNIVERSAL::can`** fix is kept. + in **list contexts** inside flat list/hash construction (the empty list vanishes instead of occupying + a real `undef` slot). Returning a singleton **`(undef)`** on **every** failure path fixes Mite but + breaks **scalar-context** compile probes (`VERSION` / `use` **import** / attribute installers) that + assume **not-found** `can()` is **`size()==0`** in their `Universal.can` result. **`Universal.canNotFound(ctx)`** + now returns **`(undef)` only for `LIST` context** failures and **`()` for scalar-like contexts** (still + **`scalar()` → undef**, matching Perl for plain assignments). + A typed-string concat refactor (Phase 1–2 in this doc) was **reverted** after separate `perl5_t` + regressions; redo with targeted tests before merging. - This fix addresses the root cause rather than applying post-corruption repair - The eval-time repair in RuntimeRegex can remain as a safety net - This aligns PerlOnJava with Perl 5's encoding context semantics diff --git a/dev/design/utf8_flag_parity.md b/dev/design/utf8_flag_parity.md index 38741d106..5aca0b7b7 100644 --- a/dev/design/utf8_flag_parity.md +++ b/dev/design/utf8_flag_parity.md @@ -53,7 +53,7 @@ constructors it eats the next pairing and corrupts literals. Downstream (**Mite* `*.mite.pm`; **Sub::HandlesVia::CodeGenerator**) saw `HAS_BUILDARGS` swallow the `'HAS_FOREIGNBUILDARGS'` key as its bogus string value and incorrectly took the `BUILDARGS` constructor branch. -Failures now use `scalarUndef.getList()` (singleton undef). 
+Failures use **`canNotFound(ctx)`**: **list** context returns **`scalarUndef.getList()`** (one `undef`, so outer list splices see a real slot); **scalar / void / lvalue** contexts return an **empty** `RuntimeList` so compile-time call sites that compare **`size() == 0`** vs **`size() == 1 && getBoolean()`** keep working (a singleton `undef` would have `size() == 1` with a falsy non-sub value and would mis-route VERSION / import / attribute probes). ### 3. `sprintf` — SprintfOperator.sprintfInternal() diff --git a/dev/modules/sub_handlesvia_support.md b/dev/modules/sub_handlesvia_support.md index 46f32b820..7d05cf985 100644 --- a/dev/modules/sub_handlesvia_support.md +++ b/dev/modules/sub_handlesvia_support.md @@ -14,7 +14,7 @@ These changes address blockers traced while running `./jcpan -t Sub::HandlesVia` | Area | Problem | Fix | |------|---------|-----| -| **`UNIVERSAL::can`** | Missing methods returned an **empty** `RuntimeList`, which behaves like Perl’s **empty list** inside hash literals. That **consumes** the next `=>` pairing and corrupted Mite **`__META__`** (`HAS_BUILDARGS` falsely truthy → bogus **`BUILDARGS`** branch). | Failure paths now return **`scalarUndef.getList()`** — one list element (**undef**) like Perl `(undef)`. `Universal.java`. | +| **`UNIVERSAL::can`** | Missing methods returned an **empty** `RuntimeList`, which behaves like Perl’s **empty list** inside hash literals. That **consumes** the next `=>` pairing and corrupted Mite **`__META__`** (`HAS_BUILDARGS` falsely truthy → bogus **`BUILDARGS`** branch). Pure singleton-`undef` on **all** failure paths confused **scalar-context** compiler probes (`VERSION`/`import`/attributes) that discriminate with **`size() == 1`**. | Failure paths route through **`Universal.canNotFound(ctx)`**: **LIST** ⇒ one `undef` element; **scalar/void/lvalue** ⇒ empty list (still **`scalar()` → undef**). `Universal.java`. 
| | **String concat SvUTF8** *(deferred)* | A typed-concat experiment caused **`perl5_t`** regressions (`op/sub.t`, `porting/filenames.t`, `re/pat_advanced.t`); it was **reverted** from the PR trajectory serving Sub::HandlesVia. Redo against smaller, **`perl5_t`-backed** steps ([`dev/design/string_encoding_context_plan.md`](../design/string_encoding_context_plan.md)). | Design cross-links: @@ -121,4 +121,4 @@ Issues in **Eval::TypeTiny** often surface as **compile errors inside generated | Date | Milestone | |------|-----------| | 2026-05-15 | **`UNIVERSAL::can`** empty-list/hash corruption fixed; **`__META__`** validated; `\x{c2}` eval blocker documented as next P0 | -| 2026-05-15 | Typed-concat / `RuntimeScalar(String,int)` experiment **reverted** after `perl5_t` regressions; **`can`** fix retained | +| 2026-05-15 | **`UNIVERSAL::can`** split: **LIST** failures → `(undef)`, **scalar**/compile-time failures → empty list (resolves the `perl5_t` regressions while keeping the Mite splice fix) | diff --git a/src/main/java/org/perlonjava/runtime/perlmodule/Universal.java b/src/main/java/org/perlonjava/runtime/perlmodule/Universal.java index 87332388d..6b72194e3 100644 --- a/src/main/java/org/perlonjava/runtime/perlmodule/Universal.java +++ b/src/main/java/org/perlonjava/runtime/perlmodule/Universal.java @@ -89,6 +89,23 @@ public static void initialize() { } } + /** + * Missing-method return for UNIVERSAL::can. + * + *

<p>Perl exposes this as undef. Compile-time lookups pass {@link RuntimeContextType#SCALAR} and + * discriminate with patterns like {@code size() == 1 && getBoolean()}; treating not-found {@code can} + * as a singleton {@code undef} there yields {@code size() == 1} with {@code getBoolean() == false}, a state + * those call sites do not recognize as “no candidate”. Returning an {@linkplain RuntimeList#isEmpty() empty} + * list preserves existing logic. + * + *

<p>List-context calls flatten this value into enclosing lists; only there must Perl get one {@code undef} + * placeholder (never a vanishing splice), which generated CPAN constructors rely on (e.g. Mite + * {@code __META__} pairings). + */ + private static RuntimeList canNotFound(int ctx) { + return ctx == RuntimeContextType.LIST ? scalarUndef.getList() : new RuntimeList(); + } + /** * Checks if the object can perform a given method. * Note: This is a Perl method, it expects `this` to be the first argument. @@ -155,7 +172,7 @@ public static RuntimeList can(RuntimeArray args, int ctx) { if (method != null && !isAutoloadDispatch(method, actualMethod, perlClassName)) { return method.getList(); } - return scalarUndef.getList(); + return canNotFound(ctx); } // Handle Package::SUPER::method syntax @@ -168,7 +185,7 @@ public static RuntimeList can(RuntimeArray args, int ctx) { if (method != null && !isAutoloadDispatch(method, actualMethod, packageName)) { return method.getList(); } - return scalarUndef.getList(); + return canNotFound(ctx); } // Perl's can() must NOT consider AUTOLOAD - it should only find @@ -219,7 +236,7 @@ public static RuntimeList can(RuntimeArray args, int ctx) { return method.getList(); } } - return scalarUndef.getList(); + return canNotFound(ctx); } /**