
sort: Optimize sort collation for long lines #12144

Open
mattsu2020 wants to merge 2 commits into uutils:main from mattsu2020:fix_sort_performance

Conversation

@mattsu2020

@mattsu2020 mattsu2020 commented May 4, 2026

Contributor

What changed

  • Avoid precomputing ICU collation sort keys for lines larger than 65,535 bytes.
  • Store optional collation key ranges so very long lines can fall back to lazy locale comparison during sorting.
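A minimal sketch of the key-storage idea in the list above, assuming all precomputed keys live in one shared buffer and each line records an Option<Range<usize>> into it. The PR text does not show the actual uu_sort types, so CollationKeys, build_keys, MAX_KEYED_LINE_LEN, and toy_sort_key are hypothetical names, and toy_sort_key (ASCII lowercasing) only stands in for a real ICU collation sort key:

```rust
use std::ops::Range;

/// Lines longer than this many bytes get no precomputed key.
/// The PR settled on u16::MAX (65,535 bytes).
const MAX_KEYED_LINE_LEN: usize = u16::MAX as usize;

/// HYPOTHETICAL stand-in for an ICU collation sort key: the ASCII-lowercased bytes.
/// A real implementation would ask the ICU collator for the line's sort key instead.
fn toy_sort_key(line: &[u8], out: &mut Vec<u8>) {
    out.extend(line.iter().map(u8::to_ascii_lowercase));
}

/// All keys share one contiguous buffer; each line records where its key lives, if it has one.
struct CollationKeys {
    buf: Vec<u8>,
    ranges: Vec<Option<Range<usize>>>,
}

impl CollationKeys {
    /// Returns the precomputed key for line `i`, or None if the line was too long.
    fn key(&self, i: usize) -> Option<&[u8]> {
        self.ranges[i].clone().map(|r| &self.buf[r])
    }
}

fn build_keys(lines: &[&[u8]]) -> CollationKeys {
    let mut buf = Vec::new();
    let mut ranges = Vec::with_capacity(lines.len());
    for line in lines {
        if line.len() > MAX_KEYED_LINE_LEN {
            // Very long line: skip key materialization and compare lazily later.
            ranges.push(None);
        } else {
            // Append this line's key to the shared buffer and remember its range.
            let start = buf.len();
            toy_sort_key(line, &mut buf);
            ranges.push(Some(start..buf.len()));
        }
    }
    CollationKeys { buf, ranges }
}

fn main() {
    let long = vec![b'x'; MAX_KEYED_LINE_LEN + 1];
    let keys = build_keys(&[&b"Banana"[..], &b"apple"[..], &long[..]]);
    assert_eq!(keys.key(0), Some(&b"banana"[..]));
    assert_eq!(keys.key(1), Some(&b"apple"[..]));
    assert_eq!(keys.key(2), None); // the long line has no key
    // Only the two short lines contributed bytes to the key buffer (6 + 5).
    assert_eq!(keys.buf.len(), 11);
}
```

The point of the None case is that a 200 MiB line never produces a key at all, so key memory is bounded by the short lines only.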

Why

Fixes #12138. In UTF-8 locales, uutils sort precomputed ICU collation keys for every input line. For inputs with a small number of very large lines, such as 26 lines of 200 MiB each, generating and storing multi-GiB collation keys dominated the runtime.

Impact

Small and normal-sized lines keep the existing precomputed-key fast path. Very long lines skip the expensive key materialization and are compared lazily with locale_cmp instead.
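The comparison side can be sketched the same way. This is an illustration under assumptions, not the uu_sort code: Line and compare are hypothetical names, and lazy_locale_cmp is a toy case-insensitive comparison standing in for locale_cmp. The comparator uses the precomputed keys when both lines have one and falls back to a direct comparison of the raw text otherwise:

```rust
use std::cmp::Ordering;

/// HYPOTHETICAL stand-in for the PR's locale_cmp: a case-insensitive ASCII
/// comparison that orders strings the same way as the toy lowercase keys.
fn lazy_locale_cmp(a: &[u8], b: &[u8]) -> Ordering {
    a.iter()
        .map(u8::to_ascii_lowercase)
        .cmp(b.iter().map(u8::to_ascii_lowercase))
}

/// A line plus its optional precomputed key (None for very long lines).
struct Line<'a> {
    text: &'a [u8],
    key: Option<Vec<u8>>,
}

fn compare(a: &Line<'_>, b: &Line<'_>) -> Ordering {
    match (&a.key, &b.key) {
        // Fast path: both keys precomputed, so a plain byte comparison suffices.
        (Some(ka), Some(kb)) => ka.cmp(kb),
        // At least one line has no key: compare the raw text lazily.
        _ => lazy_locale_cmp(a.text, b.text),
    }
}

fn main() {
    let long = vec![b'm'; 70_000]; // over the 65,535-byte threshold, so no key
    let mut lines = vec![
        Line { text: &b"Cherry"[..], key: Some(b"cherry".to_vec()) },
        Line { text: &long[..], key: None },
        Line { text: &b"apple"[..], key: Some(b"apple".to_vec()) },
    ];
    lines.sort_by(compare);
    assert_eq!(lines[0].text, &b"apple"[..]);
    assert_eq!(lines[1].text, &b"Cherry"[..]);
    assert_eq!(lines[2].text.len(), 70_000);
}
```

Mixing the two paths is only sound because comparing ICU sort keys bytewise gives the same order as comparing the strings with the collator directly. If the two paths disagreed, the comparator would not define a consistent total order, and the sorted output could depend on which pairs happened to be compared.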

Validation

  • cargo check -p uu_sort
  • cargo test -p uu_sort
  • cargo test -p coreutils --test tests test_sort::test_default_unsorted_ints -- --exact
  • Compared output against GNU sort with cmp for 52 MiB and 130 MiB reproducer inputs.
  • Hyperfine on the issue-sized 5.1 GiB input with LC_ALL=en_US.UTF-8 --parallel 1 --buffer-size 8G:
    • uutils release: 5.054 s
    • GNU gsort 9.11: 33.685 s
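For scale, plain arithmetic on the numbers reported above (the input size assumes the issue's shape of 26 lines of 200 MiB each):

```latex
\begin{aligned}
26 \times 200\ \text{MiB} &= 5200\ \text{MiB} \approx 5.08\ \text{GiB} \\
\frac{t_{\text{GNU}}}{t_{\text{uutils}}} &= \frac{33.685\ \text{s}}{5.054\ \text{s}} \approx 6.67
\end{aligned}
```

So on this run uutils sort was roughly 6.7 times faster than GNU gsort 9.11, whereas the linked issue reported GNU sort being about 40 times faster than uutils.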

@mattsu2020 mattsu2020 changed the title [codex] Optimize sort collation for long lines sort:Optimize sort collation for long lines May 4, 2026
@mattsu2020 mattsu2020 marked this pull request as ready for review May 4, 2026 13:00
@github-actions

github-actions Bot commented May 4, 2026


GNU testsuite comparison:

Skip an intermittent issue tests/cut/bounded-memory (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/tail/tail-n0f (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/io-errors (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tail/retry (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/dd/no-allocate is now passing!

@codspeed-hq

codspeed-hq Bot commented May 4, 2026


Merging this PR will not alter performance

✅ 70 untouched benchmarks
⏩ 299 skipped benchmarks[1]

Comparing mattsu2020:fix_sort_performance (3e6f8d6) with main (80cc829)[2]


Footnotes

  1. 299 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

  2. No successful run was found on main (47d7c76) during the generation of this report, so 80cc829 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@xtqqczze

xtqqczze commented May 4, 2026

Contributor

Out of interest, why choose 1 MiB as the limit, rather than something lower like u16::MAX?

@mattsu2020

Contributor Author

> Out of interest, why choose 1 MiB as the limit, rather than something lower like u16::MAX?

Measurements with a 64 KiB threshold showed performance at least equivalent on the issue workload, so we'll change the threshold to u16::MAX.

@xtqqczze

xtqqczze commented May 4, 2026

Contributor

@mattsu2020 Could you also add a benchmark (in a separate PR)?

@mattsu2020

Contributor Author

> @mattsu2020 Could you also add a benchmark (in a separate PR)?

Sure, I’ll keep this PR focused on the fix and open a separate PR adding a benchmark for long-line locale collation.

@sylvestre

Contributor

CodSpeed is not happy.

@mattsu2020

Contributor Author

> CodSpeed is not happy.

Memory usage is within expectations, but I'll look into the CPU results.

@mattsu2020 mattsu2020 marked this pull request as draft May 23, 2026 10:23
@mattsu2020 mattsu2020 force-pushed the fix_sort_performance branch from 0534052 to d481c7d Compare May 26, 2026 22:21
@mattsu2020 mattsu2020 marked this pull request as ready for review May 26, 2026 22:47
Avoid precomputing ICU collation sort keys for lines larger than 65,535
bytes. Store optional collation key ranges using the high bit so very
long lines can fall back to lazy locale comparison during sorting.

Fixes uutils#12138
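The commit message says the optional key ranges are stored "using the high bit". The actual layout is not shown here, so the following is a hypothetical sketch of one such encoding: PackedKeyRange (an illustrative name) packs a start offset and a length into two u32 fields and reserves the top bit of the length as a "no precomputed key" flag. That takes 8 bytes per line instead of the 24 bytes that Option<Range<usize>> needs on a 64-bit target:

```rust
use std::ops::Range;

/// HYPOTHETICAL packed key range: `len`'s high bit set means
/// "no precomputed key; compare this line lazily".
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PackedKeyRange {
    start: u32,
    len: u32,
}

impl PackedKeyRange {
    const NO_KEY: u32 = 1 << 31;

    /// Encodes an optional key range; returns None if it cannot be represented.
    fn new(range: Option<Range<usize>>) -> Option<Self> {
        match range {
            None => Some(Self { start: 0, len: Self::NO_KEY }),
            Some(r) => {
                let start = u32::try_from(r.start).ok()?;
                let len = u32::try_from(r.len()).ok()?;
                // A length that already uses the flag bit would be ambiguous.
                (len & Self::NO_KEY == 0).then_some(Self { start, len })
            }
        }
    }

    /// Decodes back to an optional range into the shared key buffer.
    fn decode(self) -> Option<Range<usize>> {
        if self.len & Self::NO_KEY != 0 {
            None
        } else {
            let start = self.start as usize;
            Some(start..start + self.len as usize)
        }
    }
}

fn main() {
    let with_key = PackedKeyRange::new(Some(10..42)).unwrap();
    let without_key = PackedKeyRange::new(None).unwrap();
    assert_eq!(with_key.decode(), Some(10..42));
    assert_eq!(without_key.decode(), None);
    // Two u32 fields: 8 bytes on every platform.
    assert_eq!(std::mem::size_of::<PackedKeyRange>(), 8);
}
```

A packed form like this only works while key offsets stay below 2^32 and key lengths below 2^31, which is why new returns None for values it cannot represent rather than silently truncating them.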
@sylvestre sylvestre force-pushed the fix_sort_performance branch from 47cf9b2 to 3e6f8d6 Compare June 7, 2026 20:58

Development

Successfully merging this pull request may close these issues.

Case where GNU sort is 40 times faster than uutils

3 participants