A more efficient slice comparison implementation for T: !BytewiseEq by krtab · Pull Request #116846 · rust-lang/rust

krtab · 2023-10-17T14:49:54Z

(This is a follow up PR on #113654)

This PR changes the implementation of [T] slice comparison when T: !BytewiseEq. The previous implementation using zip was not optimized properly by the compiler, which didn't leverage the fact that both lengths were equal. As an example, performance improves by about 20% when testing that [Some(0_u64); 4096].as_slice() == [Some(0_u64); 4096].as_slice().
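
For readers who want to picture the shape being discussed, here is a minimal sketch of a zip-based comparison run on the workload quoted above. The function name `eq_zip` is hypothetical and this is not the actual libcore source; it only illustrates the "check the lengths, then zip" pattern that the description says was not optimized well.

```rust
/// Hypothetical stand-in for a zip-based slice comparison.
/// Not the real libcore implementation, only the same general shape.
fn eq_zip<T: PartialEq>(a: &[T], b: &[T]) -> bool {
    // The lengths are compared up front...
    if a.len() != b.len() {
        return false;
    }
    // ...but `zip` still has to track two iterators and stop at the shorter
    // one, which is the information the optimizer reportedly failed to exploit.
    a.iter().zip(b.iter()).all(|(x, y)| x == y)
}

fn main() {
    // The workload mentioned in the PR description.
    let a = [Some(0_u64); 4096];
    let b = [Some(0_u64); 4096];
    assert!(eq_zip(a.as_slice(), b.as_slice()));
    // The built-in `==` on slices should agree.
    assert!(a.as_slice() == b.as_slice());
}
```

This does not measure anything on its own; reproducing the 20% figure would need a proper benchmark harness.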

rustbot · 2023-10-17T14:50:03Z

r? @joshtriplett

(rustbot has picked a reviewer for you, use r? to override)

asquared31415 · 2023-10-17T17:42:34Z

The following code seems to generate identical code as this PR for most types and better code for float types. (also it's safe!) However, I haven't benchmarked it, so maybe there's something I'm not seeing.

if a.len() != b.len() {
    return false;
}

for idx in 0..a.len() {
    if a[idx] != b[idx] {
        return false;
    }
}
true
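
To try the snippet above outside of libcore, it can be wrapped in a standalone function. This is only a sketch: the name `eq_indexed` and the `PartialEq` bound are assumptions made for illustration, and it says nothing about the generated code, which would still need to be checked on godbolt or with a benchmark.

```rust
/// Hypothetical wrapper around the index-based loop suggested above.
fn eq_indexed<T: PartialEq>(a: &[T], b: &[T]) -> bool {
    if a.len() != b.len() {
        return false;
    }

    // Both slices have the same length here, so `b[idx]` is in bounds
    // whenever `a[idx]` is.
    for idx in 0..a.len() {
        if a[idx] != b[idx] {
            return false;
        }
    }
    true
}

fn main() {
    let a = [Some(0_u64); 8];
    let b = [Some(0_u64); 8];
    assert!(eq_indexed(&a[..], &b[..]));

    // Float slices follow `PartialEq` semantics: NaN is never equal to itself.
    let nan = [f64::NAN];
    assert!(!eq_indexed(&nan[..], &nan[..]));

    // Different lengths are rejected before any element is compared.
    assert!(!eq_indexed(&a[..], &a[..4]));
}
```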

the8472 · 2023-10-17T19:35:43Z

+        // SAFETY:
+        // This is sound because:
+        // - self.len == other.len
+        // - self.len <= isize::MAX


That isn't true for ZSTs, though the result still happens to work because bumping ZST pointers does nothing.

Thanks for the review!
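
The ZST remark can be illustrated with a small sketch. The helper `huge_unit_slice` is hypothetical; it relies on the documented `std::slice::from_raw_parts` requirement that the total size in bytes (not the element count) must not exceed `isize::MAX`, which a slice of `()` satisfies for any length.

```rust
use std::ptr::NonNull;

/// Hypothetical helper: a slice of `()` whose length exceeds `isize::MAX`.
fn huge_unit_slice() -> &'static [()] {
    // SAFETY: `()` is zero-sized, so the slice spans 0 bytes in total, and a
    // dangling pointer is non-null and suitably aligned for a ZST.
    unsafe { std::slice::from_raw_parts(NonNull::<()>::dangling().as_ptr(), usize::MAX) }
}

fn main() {
    let s = huge_unit_slice();
    // The `len <= isize::MAX` bound from the SAFETY comment does not hold here.
    assert!(s.len() > isize::MAX as usize);

    // Offsetting a ZST pointer by one element moves it by zero bytes.
    let p = s.as_ptr();
    assert_eq!(p.wrapping_add(1), p);
}
```

The slice is deliberately never iterated or compared, since a loop over `usize::MAX` elements may not terminate in reasonable time in an unoptimized build.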

krtab · 2023-10-17T21:38:03Z

The following code seems to generate identical code as this PR for most types and better code for float types. (also it's safe!) However, I haven't benchmarked it, so maybe there's something I'm not seeing.
if a.len() != b.len() {
    return false;
}

for idx in 0..a.len() {
    if a[idx] != b[idx] {
        return false;
    }
}
true

Oh! Very nice catch! A quick check on godbolt shows me that this is only properly optimized since 1.73.0, and I didn't recheck this code from my previous PR before resubmitting.

I'll update this soon. Thanks.

asquared31415 · 2023-10-18T00:06:07Z

Wow, I'm shocked that this was not as well optimized for that long, this should have been easy enough for the optimizer. Oh well, sometimes they're tricky like that!

I dug into the history, and of note is that the zip implementation seems to have been introduced as a more concise way of writing the loop, not for any specific performance reason: #61665 (comment)

the8472 · 2024-01-05T10:00:59Z

@bors try @rust-timer queue

bors · 2024-01-05T10:03:19Z

⌛ Trying commit a70613b with merge c12f891...

bors · 2024-01-05T11:29:52Z

☀️ Try build successful - checks-actions
Build commit: c12f891 (c12f8910a3463d1e5fa69bd857e9253878a9a990)

rust-timer · 2024-01-05T13:40:04Z

Finished benchmarking commit (c12f891): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.5%	[-1.1%, -0.2%]	12
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.5%	[-1.1%, -0.2%]	12

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.1%	[1.9%, 2.4%]	2
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.7%	[-3.2%, -0.3%]	2
Improvements ✅ (secondary)	-2.3%	[-2.3%, -2.3%]	1
All ❌✅ (primary)	0.2%	[-3.2%, 2.4%]	4

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.3%	[-1.6%, -1.1%]	3
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.3%	[-1.6%, -1.1%]	3

Binary size

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.0%, 0.4%]	5
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.5%	[-2.1%, -0.0%]	24
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.4%	[-2.1%, 0.4%]	29

Bootstrap: 669.803s -> 668.21s (-0.24%)
Artifact size: 311.12 MiB -> 311.14 MiB (0.01%)

the8472 · 2024-01-07T08:49:59Z

The previous implementation using zip was not optimized properly by the compiler, which didn't leverage the fact that both lengths were equal.

That in itself seems like an issue... ah yes, #100124 last attempted to fix this but that stalled.
In the meantime this does seem fine.

Would you be willing to work out the critical difference in LLVM IR and add a codegen test? That's optional though, I can accept the PR without that.

krtab · 2024-01-08T15:38:21Z

I had a look but couldn't figure out a way to characterize the difference between the two IRs.
I added a comment hoping to prevent accidental regression.

the8472 · 2024-01-09T20:31:24Z

@bors r+

bors · 2024-01-09T20:31:26Z

📌 Commit 5b041ab has been approved by the8472

It is now in the queue for this repository.

bors · 2024-01-09T20:52:37Z

⌛ Testing commit 5b041ab with merge 190f4c9...

bors · 2024-01-09T22:49:43Z

☀️ Test successful - checks-actions
Approved by: the8472
Pushing 190f4c9 to master...

rust-timer · 2024-01-10T00:02:35Z

Finished benchmarking commit (190f4c9): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.5%	[-0.9%, -0.2%]	15
Improvements ✅ (secondary)	-0.6%	[-0.6%, -0.6%]	1
All ❌✅ (primary)	-0.5%	[-0.9%, -0.2%]	15

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.6%	[2.9%, 4.6%]	3
Regressions ❌ (secondary)	1.8%	[1.8%, 1.8%]	1
Improvements ✅ (primary)	-1.5%	[-3.6%, -0.4%]	3
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	1.0%	[-3.6%, 4.6%]	6

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.0%	[-1.1%, -0.9%]	2
Improvements ✅ (secondary)	-3.1%	[-3.5%, -2.3%]	4
All ❌✅ (primary)	-1.0%	[-1.1%, -0.9%]	2

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 667.74s -> 666.209s (-0.23%)
Artifact size: 308.59 MiB -> 308.59 MiB (0.00%)

krtab · 2024-01-10T13:02:53Z

Thanks @the8472 👍

rustbot assigned joshtriplett Oct 17, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Oct 17, 2023

krtab mentioned this pull request Oct 17, 2023

Non memcmp slice comparison optimization #113654

Closed

krtab force-pushed the slice_compare_no_memcmp_opt branch from 9348c33 to 0cc5c97 Compare October 17, 2023 15:15

the8472 reviewed Oct 17, 2023

krtab force-pushed the slice_compare_no_memcmp_opt branch from 0cc5c97 to a70613b Compare October 18, 2023 09:32

the8472 assigned the8472 and unassigned joshtriplett Jan 5, 2024

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 5, 2024

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 5, 2024

A more efficient slice comparison implementation for T: !BytewiseEq

5b041ab

The previous implementation was not optimized properly by the compiler, which didn't leverage the fact that both length were equal.

krtab force-pushed the slice_compare_no_memcmp_opt branch from a70613b to 5b041ab Compare January 8, 2024 15:37

bors removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 9, 2024

bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Jan 9, 2024

bors added the merged-by-bors This PR was explicitly merged by bors. label Jan 9, 2024

bors merged commit 190f4c9 into rust-lang:master Jan 9, 2024

rustbot added this to the 1.77.0 milestone Jan 9, 2024

matthiaskrgr mentioned this pull request May 30, 2025

stack overflow in trait selection #141772

Open

apiraino mentioned this pull request Jan 13, 2026

Long delay printing errors with metadata #150907

Closed
