
fix: bundle Unicode::UTF8 (Java XS) for jcpan #748

Merged
fglock merged 1 commit into master from feature/unicode-utf8-jcpan on May 15, 2026

@fglock (Owner) commented May 15, 2026

Summary

  • Bundles Unicode::UTF8 (compatible with CPAN 0.70) with a Java implementation behind XSLoader, matching the upstream .pm and its three XS entry points (decode_utf8, encode_utf8, valid_utf8).
  • Aligns runtime edge cases with the dist tests: Encode::_utf8_on semantics, pack("U", …) for surrogates and large UVs, decode fallback chunk handling, and FATAL utf8 warnings when the XS code runs in Java (call-site warning bits via WarningBitsRegistry).
  • ./jcpan -t Unicode::UTF8 passes the full 11241-test suite; prerequisite noise from Devel::AssertC99 / Devel::CheckCompiler may still appear when CPAN tries to build real XS.
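
For readers unfamiliar with the three entry points named above, here is a minimal Java sketch of their general shape. It is not the UnicodeUTF8 module from this PR: the class and method names are hypothetical, and it leans on the JDK's strict `CharsetDecoder`/`CharsetEncoder` rather than the DFA the PR describes. It also ignores Perl-specific behavior such as warning categories, fallbacks, and code points beyond U+10FFFF.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the PR.
final class Utf8EntryPointsSketch {

    // decode_utf8 analogue: strict decoding that throws on ill-formed input
    // instead of silently substituting U+FFFD.
    static String decodeUtf8(byte[] octets) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(octets))
                .toString();
    }

    // encode_utf8 analogue: strict encoding that throws on unpaired surrogates.
    static byte[] encodeUtf8(String chars) throws CharacterCodingException {
        ByteBuffer out = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(chars));
        byte[] octets = new byte[out.remaining()];
        out.get(octets);
        return octets;
    }

    // valid_utf8 analogue: true only if the octets are well-formed UTF-8.
    static boolean validUtf8(byte[] octets) {
        try {
            decodeUtf8(octets);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

Under these assumptions, `validUtf8` rejects the non-shortest form `C0 80` and the encoded surrogate `ED A0 80`, which are the kinds of input that t/090_non_shortest_form.t and t/060_surrogates.t appear to target.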

Test plan

  • make (unit tests)
  • timeout 600 ./jcpan -t Unicode::UTF8 → Unicode-UTF8 Result: PASS / “All tests successful”
  • Spot checks: t/060_surrogates.t, t/080_super.t, t/090_non_shortest_form.t, t/120_fallback.t, t/200_leaks.t under jperl -Ilib -It

- Add Java XS module UnicodeUTF8 (DFA aligned with Unicode-UTF8 0.70) and ship
  Unicode/UTF8.pm for XSLoader.
- Fix Encode::_utf8_on to flip SvUTF8 without decoding bytes; add
  RuntimeScalar.utf8UncheckedOctets so encode_utf8 can validate raw octets only
  for that case (t/090, t/200_leaks).
- Use WarningBitsRegistry call-site bits for utf8 warn/die from Java XS so
  Test::Fatal + FATAL utf8 works (t/090 decode).
- pack: surrogate code points from pack U; allow U template up to 0x7FFFFFFF in
  character mode (t/060, t/080).
- decode fallback chunks: one output character per returned octet (t/120).
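
The decode fallback item can be illustrated with a small Java sketch. This is one reading of "one output character per returned octet", not the PR's code: the class name, method name, and the `UnaryOperator<byte[]>` fallback signature are assumptions made for illustration. The fallback receives each ill-formed chunk, and every octet it returns is appended as a single character (code point 0..255) rather than being decoded as UTF-8 again. How many octets make up a chunk depends on the JDK decoder here and may differ from the DFA used in the PR.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

// Hypothetical illustration class; not part of the PR.
final class DecodeFallbackSketch {

    static String decodeWithFallback(byte[] octets, UnaryOperator<byte[]> fallback) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(octets);
        // Decoded UTF-8 never yields more chars than input octets.
        CharBuffer out = CharBuffer.allocate(Math.max(1, octets.length));
        StringBuilder result = new StringBuilder();

        while (true) {
            CoderResult r = decoder.decode(in, out, true);
            // Move whatever was decoded so far into the result.
            out.flip();
            result.append(out);
            out.clear();

            if (r.isUnderflow()) {
                break;              // all input consumed
            }
            if (r.isOverflow()) {
                continue;           // output buffer was full; keep decoding
            }
            // Ill-formed chunk: hand it to the fallback.
            byte[] chunk = new byte[r.length()];
            in.get(chunk);
            for (byte b : fallback.apply(chunk)) {
                // One output character per returned octet.
                result.append((char) (b & 0xFF));
            }
        }
        return result.toString();
    }
}
```

With input `61 FF 62` and a fallback that returns the two octets `C3 A9`, this sketch yields the four characters `a`, U+00C3, U+00A9, `b`: the returned octets are not reinterpreted as the UTF-8 encoding of U+00E9.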

Generated with [Cursor](https://docs.cursor.com)

Co-Authored-By: Cursor <noreply@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@fglock fglock merged commit 6184945 into master May 15, 2026
2 checks passed
@fglock fglock deleted the feature/unicode-utf8-jcpan branch May 15, 2026 19:23