Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions dev/design/string_encoding_context_plan.md
Original file line number Diff line number Diff line change
Expand Up @@ -286,6 +286,13 @@ die "FAIL" if is_utf8($err);

## Notes

- **Investigation update (2026-05-15):** Running `./jcpan -t Sub::HandlesVia` showed an immediate crash in
Mite’s generated `*.mite.pm`: `HAS_BUILDARGS` was polluted with the string `HAS_FOREIGNBUILDARGS`,
falsely enabling the `BUILDARGS` branch. Root cause was **`UNIVERSAL::can()` returning an empty list**
instead of `(undef)`, which destroys hash literals at compile time. Fixed in `Universal.java`.
A follow-up concat / typed-string-constructor refactor (see Phase 1–2 below) briefly landed and
was **reverted** after `perl5_t` regressions (`op/sub.t`, `porting/filenames.t`,
`re/pat_advanced.t`); redo with targeted tests before merging. The **`UNIVERSAL::can`** fix is kept.
- This fix addresses the root cause rather than applying post-corruption repair
- The eval-time repair in RuntimeRegex can remain as a safety net
- This aligns PerlOnJava with Perl 5's encoding context semantics
Expand Down
12 changes: 12 additions & 0 deletions dev/design/utf8_flag_parity.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,18 @@ strings) never upgrade the result to UTF-8.
- Previously, if neither was BYTE_STRING (e.g. INTEGER + BYTE_STRING), it fell
through to the default STRING return

### 2b. `UNIVERSAL::can()` failures must return `(undef)`

**File:** `src/main/java/org/perlonjava/runtime/perlmodule/Universal.java`

Perl returns **one** undefined value (`(undef)` in list context). PerlOnJava used an **empty**
`RuntimeList`, which behaves like Perl’s truly empty list: in `%h = (...)`/`{ … }`
constructors it eats the next pairing and corrupts literals. Downstream (**Mite** `__META__` in
`*.mite.pm`; **Sub::HandlesVia::CodeGenerator**) saw `HAS_BUILDARGS` swallow the `'HAS_FOREIGNBUILDARGS'`
key as its bogus string value and incorrectly took the `BUILDARGS` constructor branch.

Failures now use `scalarUndef.getList()` (singleton undef).

### 3. `sprintf` — SprintfOperator.sprintfInternal()

**File:** `src/main/java/org/perlonjava/runtime/operators/SprintfOperator.java`
Expand Down
1 change: 1 addition & 0 deletions dev/modules/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ This directory contains design documents and guides related to porting CPAN modu
| [storable_binary_format.md](storable_binary_format.md) | Storable native Perl binary format — read + write paths landed; jperl ↔ system-perl files interoperate in both directions |
| [unicode_collate.md](unicode_collate.md) | Unicode::Collate — plan: file-backed DUCET + Java XS surface (default); optional ICU path tradeoffs |
| [ppi.md](ppi.md) | **PPI** — CPAN test status, RC1–RC4, refcount/`DESTROY` follow-ups (`t/04_element.t` `%_PARENT`) |
| [sub_handlesvia_support.md](sub_handlesvia_support.md) | **Sub::HandlesVia** — Mite/can/hash fix landed; **`eval_closure` UTF‑8 (\x{c2})** trace plan |

## Module Status Overview

Expand Down
31 changes: 30 additions & 1 deletion dev/modules/ppi.md
Original file line number Diff line number Diff line change
Expand Up @@ -217,7 +217,7 @@ file before closing the phase.

## Progress Tracking

### Current Status: Phase 1 done; Phase 2 blocked on cross-cutting refcount work
### Current Status: Phase 1 done; Phase 1b (hash-slot refcount) landed; Phase 2 / `04_element` leak still open

### Completed Phases
- [x] **Phase 1** (2026-04-20): Fix `Package::->method()` bareword trailing-`::` stripping (RC1).
Expand All @@ -227,6 +227,18 @@ file before closing the phase.
- Verified: `t/ppi_token_whitespace.t` now passes all 6 subtests.
- `make` (full unit tests) passes.

- [x] **Phase 1b** (2026-05-15): Additional refcount parity for **hash slot** assignment
(`RuntimeScalar.setLargeRefCounted`).
- **Problem**: `RuntimeScalar.setLargeRefCounted` had a fast path that treated any pair of
`refCount==-1` referents as “fully untracked” and skipped refcount bookkeeping. That
diverged from `incrementRefCountForContainerStore` (used by `push` and anon hash finalize),
so some container stores could miss `deferDestroyForContainerClear` on `%hash=()`.
- **Change**: Bypass that fast path for `RuntimeHash`/`RuntimeArray` at `refCount<0`,
promote `-1→0`, then increment like container-store.
- **PPI `t/04_element.t` tests 202/206**: still fail (~2134 `%_PARENT` keys remain after
`PPI::Document->DESTROY` on large `PPI/Node.pm` parse); dominant cause still selective
refcount / teardown, not this hash-slot path alone.

### Next Steps

1. **Phase 2 is blocked** on a broader refcount-parity investigation. The root
Expand Down Expand Up @@ -294,6 +306,23 @@ equivalent cleanup). Other failures in `./jcpan -t PPI` (e.g. `t/07_token.t`,

These improve destructor reliability but **do not** yet pass tests 202/206 alone.

### Future: collision-safe `refaddr` (`perlObjectId`)

- Today `Scalar::Util::refaddr` on PerlOnJava uses `System.identityHashCode` on the referent,
which is **not** a unique id (collisions are possible; key space is 32-bit).
- PPI indexes `%PPI::Singletons::_PARENT` by `refaddr($child)`; in theory collisions could merge
unrelated children and confuse parent links (low probability at moderate tree sizes).
- **Idea**: assign each `RuntimeBase` a monotonic `long` at construction (`AtomicLong`),
return it from `refaddr` for `HASHREFERENCE` / blessed objects / etc.
- **Cost**: ~8 bytes per `RuntimeBase` instance (including every `RuntimeScalar`) plus an atomic
increment on every allocation; see earlier overhead discussion.
- **Experiment** (2026-05): a prototype did **not** fix `04_element` 202/206; the dominant issue
is refcount/teardown, not key collision. Keep this as an **optional** hardening step after
`04_element` is understood.
- **Tests**: `src/test/resources/unit/*.t` must remain runnable with **stock `perl`**
(`prove`, one-liner checks). Do not add jperl-only assertions about numeric `refaddr` layout
unless guarded (e.g. `plan skip_all` when `$^X` is not jperl).

### Next investigation steps (ordered)

1. **Reproduce at scale**
Expand Down
124 changes: 124 additions & 0 deletions dev/modules/sub_handlesvia_support.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,124 @@
# Sub::HandlesVia Support for PerlOnJava

## Overview

[Sub::HandlesVia](https://metacpan.org/pod/Sub::HandlesVia) generates delegation methods (“handles”) for Moo/Moose/Object::Pad/toolkit classes. Runtime work uses **generated Perl** compiled via **`Eval::TypeTiny::eval_closure`**; build-time codegen uses **Mite** (`*.mite.pm`) and **`Sub::HandlesVia::CodeGenerator`**.

PerlOnJava must treat **SvUTF8 / BYTE_STRING parity** consistently in string concatenation *and* return **Perl-correct lists** from core helpers such as **`UNIVERSAL::can`**, otherwise Mite constructors and delegated subs break in non-obvious ways.

---

## Completed (upstream-style fixes landed in core)

These changes address blockers traced while running `./jcpan -t Sub::HandlesVia`:

| Area | Problem | Fix |
|------|---------|-----|
| **`UNIVERSAL::can`** | Missing methods returned an **empty** `RuntimeList`, which behaves like Perl’s **empty list** inside hash literals. That **consumes** the next `=>` pairing and corrupted Mite **`__META__`** (`HAS_BUILDARGS` falsely truthy → bogus **`BUILDARGS`** branch). | Failure paths now return **`scalarUndef.getList()`** — one list element (**undef**) like Perl `(undef)`. `Universal.java`. |
| **String concat SvUTF8** *(deferred)* | A typed-concat experiment caused **`perl5_t`** regressions (`op/sub.t`, `porting/filenames.t`, `re/pat_advanced.t`); it was **reverted** from the PR trajectory serving Sub::HandlesVia. Redo against smaller, **`perl5_t`-backed** steps ([`dev/design/string_encoding_context_plan.md`](../design/string_encoding_context_plan.md)). |

Design cross-links:

- [`dev/design/utf8_flag_parity.md`](../design/utf8_flag_parity.md) — §2b (`can`).
- [`dev/design/string_encoding_context_plan.md`](../design/string_encoding_context_plan.md) — investigation note (2026-05-15).

---

## Current Status (manual smoke)

After the **`can`** fix:

- **`Sub::HandlesVia::CodeGenerator->__META__`** has four keys; **`HAS_BUILDARGS`** exists with **`undef`** (Perl-correct falsy gate).
- **`t/02moo.t`** progresses further but **still fails** when **`Eval::TypeTiny::eval_closure`** compiles generated source (`Unrecognized character \x{c2}` with a `#line` pointing at **`Eval/TypeTiny.pm`** — the synthesized filename/line prefix, not the host file’s UTF-8 problem).

Automated `./jcpan -t Sub::HandlesVia` was previously **timed out at 600s** in CI-style runs; rerun with **`timeout 3600`** after core fixes stabilize.

---

## Next Steps (prioritized)

### 1. [P0] Fix UTF-8 / lead-byte breakage in delegated eval (`\x{c2}`)

**Symptom:**

```text
Failed to compile source because: Unrecognized character \x{c2}; at .../Eval/TypeTiny.pm line 8 ...
at .../Sub/HandlesVia/CodeGenerator.pm line 345 (Eval::TypeTiny::eval_closure)
```

**Goals:**

1. Capture the **exact `%ec_args`** string passed into **`eval_closure`** for a failing case (minimal Moo delegation in `t/02moo.t`), e.g. temporary logging in **`CodeGenerator.pm`** (`generate_coderef_for_handler`) guarded by **`$ENV{SUB_HANDLESVIA_DEBUG_EC}`**.
2. Binary-diff that string (`unpack "H*", $src`) vs system Perl — locate the first stray **`0xc2`** (UTF-8 lead byte) treated as Latin-1.
3. Classify origin:
- **Runtime string typing** remaining in codegen (`"."`/`join`/quoting/formatters elsewhere), or
- **PerlOnJava lexer/compiler** rejecting valid UTF-8 in **`eval`** strings (narrow vs wide rules), or
- **Copy from file** paths reading `.pm` with wrong Perl layer assumptions.
4. Fix at the appropriate layer (prefer **prevent** mis-typing; **`RuntimeRegex.repairLatin1EncodedUtf8IfCorrupted`** is only a fallback per design notes).

**Success:** `timeout 900 ./jperl .../blib/lib .../Sub-HandlesVia-*/t/02moo.t` completes with TAP **ok**.

### 2. [P1] Full CPAN harness run

```bash
timeout 3600 ./jcpan -t Sub::HandlesVia > /tmp/jcpan_Sub_HandlesVia.txt 2>&1
```

Catalog skips (optional deps **MooX::TypeTiny**, **Mouse**, etc.) vs real failures.

### 3. [P2] Concat / SvUTF8 parity redo (staging)

Retry [`dev/design/string_encoding_context_plan.md`](../design/string_encoding_context_plan.md) **Phase 2** (`StringOperators.stringConcat*`) **only after** guarding with:

```bash
cd perl5_t/t
timeout 300 ../../jperl op/sub.t
timeout 180 ../../jperl porting/filenames.t
timeout 600 ../../jperl re/pat_advanced.t # noisy; grep ^not ok
```

Establish **baseline counts** vs **`origin/master`** on the **same harness** (`perl_test_runner.pl` shards if that is CI). A naive `RuntimeScalar(text, BYTE_STRING)` swap for the ISO-8859-1 `byte[]` path surfaced **opaque** regressions in **regex/porting/stack** slices — redo incrementally under its own tiny PR once bisected.

### 4. [P3] Regression tests in-repo (coordination needed)

PerlOnJava policy: **never delete or weaken existing tests**; adding **new** unit tests requires maintainer alignment. Candidate areas:

- **`UNIVERSAL::can`** in **hash constructor** contexts: `%h = (... unknown package ...->can(...) ...)` pairing integrity.
- **Concat parity**: **`no utf8` / `use utf8`** literals **`Encode::is_utf8`** expectations (see **`dev/design/string_encoding_context_plan.md`** verification section).

### 5. [P4] Optional XS

Upstream ships **`Sub::HandlesVia::XS`** (skipped when absent). No action unless performance work demands it — pure Perl path is canonical for portability.

---

## Dependencies (mental model)

| Module | Role |
|--------|------|
| **Type::Tiny** / **Exporter::Tiny** | Types and coercion surfaces for handlers |
| **Eval::TypeTiny** | **`eval_closure`** — compiles delegated method bodies |
| **Mite / Sub::HandlesVia::Mite** | Constructor / attribute sugar; **`__META__`** uses **`can('BUILDARGS')`** |
| **Moo** | Primary toolkit exercised in **`t/02moo*.t`** |
| **Moose / Mouse / Corinna** | Separate test dirs; skip if stacks incomplete |

Issues in **Eval::TypeTiny** often surface as **compile errors inside generated strings** rather than `.pm` syntax errors — treat reports as **`$src` forensic** first.

---

## Related docs

| Document | Topic |
|----------|--------|
| [type_tiny.md](type_tiny.md) | Type::Tiny quirks on PerlOnJava |
| [moo_support.md](moo_support.md) | Moo stack status |
| [moose_support.md](moose_support.md) | Moose prerequisites |

---

## Progress log

| Date | Milestone |
|------|-----------|
| 2026-05-15 | **`UNIVERSAL::can`** empty-list/hash corruption fixed; **`__META__`** validated; `\x{c2}` eval blocker documented as next P0 |
| 2026-05-15 | Typed-concat / `RuntimeScalar(String,int)` experiment **reverted** after `perl5_t` regressions; **`can`** fix retained |
70 changes: 70 additions & 0 deletions src/main/java/org/perlonjava/backend/jvm/EmitLiteral.java
Original file line number Diff line number Diff line change
Expand Up @@ -328,6 +328,76 @@ public static void emitString(EmitterContext ctx, StringNode node) {
}
}

/**
* Emits a string operand for {@code \\} (ref-to-literal) using the string cache's
* <strong>singleton</strong> scalars ({@link RuntimeScalarCache#getScalarByteString(int)} /
* {@link RuntimeScalarCache#getScalarString(int)}) for cacheable literals, and records
* them on {@link JavaClassInfo#addPadConstant}.
*
* <p>Ordinary {@link #emitString} uses {@link RuntimeScalarCache#materializeByteStringLiteral(int)}
* / {@link RuntimeScalarCache#materializeStringLiteral(int)} so each literal <em>occurrence</em>
* has a distinct SV (needed for {@code pos()} / {@code \\G}). Ref-to-literal in Perl shares one
* const SV in the CV; weak refs (e.g. Moo {@code weak_ref}) and {@link RuntimeCode#clearPadConstantWeakRefs}
* must target that same object identity.</p>
*/
public static void emitStringForRefToLiteral(EmitterContext ctx, StringNode node) {
if (ctx.contextType == RuntimeContextType.VOID) {
return;
}
MethodVisitor mv = ctx.mv;
if (!ctx.isBoxed) {
emitStringValue(mv, node.value);
return;
}
if (node.isVString) {
// No cache slot; same as plain literal (optree reaping may not match Perl for \\v).
emitString(ctx, node);
return;
}

if (node.forceByteString
|| (!ctx.symbolTable.isStrictOptionEnabled(HINT_UTF8) && !ctx.compilerOptions.isUnicodeSource)) {
boolean hasWideChars = false;
for (int i = 0; i < node.value.length(); i++) {
if (node.value.charAt(i) > 255) {
hasWideChars = true;
break;
}
}
if (!hasWideChars) {
int stringIndex = RuntimeScalarCache.getOrCreateByteStringIndex(node.value);
if (stringIndex >= 0) {
mv.visitLdcInsn(stringIndex);
mv.visitMethodInsn(
Opcodes.INVOKESTATIC,
"org/perlonjava/runtime/runtimetypes/RuntimeScalarCache",
"getScalarByteString",
"(I)Lorg/perlonjava/runtime/runtimetypes/RuntimeScalar;",
false);
ctx.javaClassInfo.addPadConstant(RuntimeScalarCache.getScalarByteString(stringIndex));
return;
}
emitString(ctx, node);
return;
}
// fall through — wide chars use UTF-8 string path (same as emitString)
}

int stringIndex = RuntimeScalarCache.getOrCreateStringIndex(node.value);
if (stringIndex >= 0) {
mv.visitLdcInsn(stringIndex);
mv.visitMethodInsn(
Opcodes.INVOKESTATIC,
"org/perlonjava/runtime/runtimetypes/RuntimeScalarCache",
"getScalarString",
"(I)Lorg/perlonjava/runtime/runtimetypes/RuntimeScalar;",
false);
ctx.javaClassInfo.addPadConstant(RuntimeScalarCache.getScalarString(stringIndex));
} else {
emitString(ctx, node);
}
}

/**
* Emits a string value, handling large strings by breaking them into chunks.
*/
Expand Down
21 changes: 5 additions & 16 deletions src/main/java/org/perlonjava/backend/jvm/EmitOperator.java
Original file line number Diff line number Diff line change
Expand Up @@ -1773,23 +1773,12 @@ static void handleCreateReference(EmitterVisitor emitterVisitor, OperatorNode no
contextType = RuntimeContextType.SCALAR;
}

node.operand.accept(emitterVisitor.with(contextType));

// Track cached string constants referenced via backslash for optree reaping.
// When a subroutine is replaced (e.g., *foo = sub {}), weak refs to these
// constants need to be cleared.
if (node.operand instanceof StringNode strNode) {
int idx = RuntimeScalarCache.lookupByteStringIndex(strNode.value);
if (idx >= 0) {
emitterVisitor.ctx.javaClassInfo.addPadConstant(
RuntimeScalarCache.getScalarByteString(idx));
} else {
idx = RuntimeScalarCache.lookupStringIndex(strNode.value);
if (idx >= 0) {
emitterVisitor.ctx.javaClassInfo.addPadConstant(
RuntimeScalarCache.getScalarString(idx));
}
}
// Singleton cache scalars + padConstants; see EmitLiteral.emitStringForRefToLiteral.
EmitLiteral.emitStringForRefToLiteral(
emitterVisitor.with(contextType).ctx, strNode);
} else {
node.operand.accept(emitterVisitor.with(contextType));
}

// Always create a proper reference - don't special case CODE references
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -155,7 +155,7 @@ public static RuntimeList can(RuntimeArray args, int ctx) {
if (method != null && !isAutoloadDispatch(method, actualMethod, perlClassName)) {
return method.getList();
}
return new RuntimeList();
return scalarUndef.getList();
}

// Handle Package::SUPER::method syntax
Expand All @@ -168,7 +168,7 @@ public static RuntimeList can(RuntimeArray args, int ctx) {
if (method != null && !isAutoloadDispatch(method, actualMethod, packageName)) {
return method.getList();
}
return new RuntimeList();
return scalarUndef.getList();
}

// Perl's can() must NOT consider AUTOLOAD - it should only find
Expand Down Expand Up @@ -219,7 +219,7 @@ public static RuntimeList can(RuntimeArray args, int ctx) {
return method.getList();
}
}
return new RuntimeList();
return scalarUndef.getList();
}

/**
Expand Down
Loading
Loading