
A more efficient slice comparison implementation for T: !BytewiseEq #116846

Merged
bors merged 1 commit into rust-lang:master from krtab:slice_compare_no_memcmp_opt on Jan 9, 2024


@krtab

@krtab krtab commented Oct 17, 2023

Contributor

(This is a follow up PR on #113654)

This PR changes the implementation of `[T]` slice comparison when `T: !BytewiseEq`. The previous implementation, which used `zip`, was not optimized properly by the compiler, which didn't leverage the fact that both lengths were equal. For example, the performance improvement is 20% when testing that `[Some(0_u64); 4096].as_slice() == [Some(0_u64); 4096].as_slice()`.
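
For readers who want to try the comparison from the description locally, here is a minimal sketch. The function `eq_zip` is a hypothetical stand-in for a zip-based comparison (it is an assumption about the general shape, not code copied from `library/core`), and `time_eq` is an illustrative timing helper, not a rigorous benchmark.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a zip-based slice comparison
/// (an assumption for illustration, not the library's code).
fn eq_zip<T: PartialEq>(a: &[T], b: &[T]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // `zip` stops as soon as either iterator is exhausted, so the loop
    // tracks both iterators even though the lengths are already known
    // to be equal at this point.
    a.iter().zip(b.iter()).all(|(x, y)| x == y)
}

/// Times `iters` calls of `f` on the two slices. `black_box` is used to
/// discourage the optimizer from hoisting or deleting the comparison.
fn time_eq<T>(a: &[T], b: &[T], iters: u32, f: fn(&[T], &[T]) -> bool) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f(black_box(a), black_box(b)));
    }
    start.elapsed()
}
```

Calling `time_eq(a.as_slice(), b.as_slice(), 100_000, eq_zip)` with `a` and `b` both set to `[Some(0_u64); 4096]` gives a rough number to compare against other implementations in a release build; results will vary by compiler version and machine.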

@rustbot

rustbot commented Oct 17, 2023

Collaborator

r? @joshtriplett

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-libs (Relevant to the library team, which will review and decide on the PR/issue.) labels Oct 17, 2023
@krtab krtab force-pushed the slice_compare_no_memcmp_opt branch from 9348c33 to 0cc5c97 on October 17, 2023 15:15
@asquared31415

Contributor

The following code seems to generate identical code to this PR for most types and better code for float types. (Also, it's safe!) However, I haven't benchmarked it, so maybe there's something I'm not seeing.

if a.len() != b.len() {
    return false;
}

for idx in 0..a.len() {
    if a[idx] != b[idx] {
        return false;
    }
}
true
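
To make the fragment above easy to try, here is a sketch that wraps the same loop in a generic function. The name `eq_indexed` and the `T: PartialEq` bound are assumptions added for illustration; the body is the loop from the comment.

```rust
/// Index-based slice equality: the loop from the comment above wrapped
/// in a generic function (the name `eq_indexed` is hypothetical).
fn eq_indexed<T: PartialEq>(a: &[T], b: &[T]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // After the length check, `idx < a.len()` also implies
    // `idx < b.len()`, so both indexing operations are in bounds.
    for idx in 0..a.len() {
        if a[idx] != b[idx] {
            return false;
        }
    }
    true
}
```

Float slices are a useful test case because their equality is not a byte comparison: `NaN != NaN`, so `eq_indexed(&[f64::NAN], &[f64::NAN])` returns `false` even though the two slices hold identical bytes.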

Comment thread library/core/src/slice/cmp.rs Outdated
// SAFETY:
// This is sound because:
// - self.len == other.len
// - self.len <= isize::MAX

Member

That isn't true for ZSTs, though the result still happens to work because bumping ZST pointers does nothing.
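
The two points in this review comment can be checked with a small sketch. The helper names `huge_zst_slice` and `bump_is_noop` are hypothetical and exist only for illustration. The first builds a slice of the zero-sized type `()` whose length exceeds `isize::MAX`, and the second checks whether advancing a pointer changes its address.

```rust
use std::ptr::NonNull;

/// Builds a slice of the zero-sized type `()` whose length is larger
/// than `isize::MAX`.
fn huge_zst_slice() -> &'static [()] {
    let ptr = NonNull::<()>::dangling().as_ptr();
    // SAFETY: `ptr` is non-null and aligned, and the slice spans
    // `usize::MAX * size_of::<()>() == 0` bytes, so the total size
    // stays within the `isize::MAX` limit.
    unsafe { std::slice::from_raw_parts(ptr, usize::MAX) }
}

/// Returns true if advancing `p` by `n` elements leaves the address unchanged.
fn bump_is_noop<T>(p: *const T, n: usize) -> bool {
    // `wrapping_add` is a safe method; it moves the pointer by
    // `n * size_of::<T>()` bytes, which is zero for a ZST.
    p.wrapping_add(n) == p
}
```

For a sized type such as `u8`, `bump_is_noop` returns `false` for any non-zero `n` that does not wrap the address, which is the contrast the comment is drawing.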

Contributor Author

Thanks for the review!

@krtab

krtab commented Oct 17, 2023

Contributor Author

> The following code seems to generate identical code to this PR for most types and better code for float types. (Also, it's safe!) However, I haven't benchmarked it, so maybe there's something I'm not seeing.
>
>     if a.len() != b.len() {
>         return false;
>     }
>
>     for idx in 0..a.len() {
>         if a[idx] != b[idx] {
>             return false;
>         }
>     }
>     true

Oh! Very nice catch! A quick check on godbolt shows me that this is only properly optimized since 1.73.0, and I didn't recheck this code from my previous PR before resubmitting.

I'll update this soon. Thanks.

@asquared31415

Contributor

Wow, I'm shocked that this was not as well optimized for that long, this should have been easy enough for the optimizer. Oh well, sometimes they're tricky like that!

I dug through the history, and of note is that the zip implementation seems to have been introduced as a more concise way of writing the loop, not for any specific performance reason: #61665 (comment)

@krtab krtab force-pushed the slice_compare_no_memcmp_opt branch from 0cc5c97 to a70613b on October 18, 2023 09:32
@the8472 the8472 assigned the8472 and unassigned joshtriplett Jan 5, 2024
@the8472

the8472 commented Jan 5, 2024

Member

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf (Status: Waiting on a perf run to be completed.) label Jan 5, 2024
@bors

bors commented Jan 5, 2024

Collaborator

⌛ Trying commit a70613b with merge c12f891...

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 5, 2024
…=<try>

@bors

bors commented Jan 5, 2024

Collaborator

☀️ Try build successful - checks-actions
Build commit: c12f891 (c12f8910a3463d1e5fa69bd857e9253878a9a990)

@rust-timer

Collaborator

Finished benchmarking commit (c12f891): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.5% | [-1.1%, -0.2%] | 12 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.5% | [-1.1%, -0.2%] | 12 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.1% | [1.9%, 2.4%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.7% | [-3.2%, -0.3%] | 2 |
| Improvements ✅ (secondary) | -2.3% | [-2.3%, -2.3%] | 1 |
| All ❌✅ (primary) | 0.2% | [-3.2%, 2.4%] | 4 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.3% | [-1.6%, -1.1%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.3% | [-1.6%, -1.1%] | 3 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.2% | [0.0%, 0.4%] | 5 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.5% | [-2.1%, -0.0%] | 24 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.4% | [-2.1%, 0.4%] | 29 |

Bootstrap: 669.803s -> 668.21s (-0.24%)
Artifact size: 311.12 MiB -> 311.14 MiB (0.01%)

@rustbot rustbot removed the S-waiting-on-perf (Status: Waiting on a perf run to be completed.) label Jan 5, 2024
@the8472

the8472 commented Jan 7, 2024

Member

> The previous implementation using zip was not optimized properly by the compiler, which didn't leverage the fact that both lengths were equal.

That in itself seems like an issue... ah yes, #100124 last attempted to fix this but that stalled.
In the meantime this does seem fine.

Would you be willing to work out the critical difference in LLVM IR and add a codegen test? That's optional though, I can accept the PR without that.

The previous implementation was not optimized properly by the compiler,
which didn't leverage the fact that both lengths were equal.
@krtab krtab force-pushed the slice_compare_no_memcmp_opt branch from a70613b to 5b041ab on January 8, 2024 15:37
@krtab

krtab commented Jan 8, 2024

Contributor Author

I had a look but couldn't figure out how to characterize the difference between the two versions of the IR.
I added a comment in the hope of preventing an accidental regression.

@the8472

the8472 commented Jan 9, 2024

Member

@bors r+

@bors

bors commented Jan 9, 2024

Collaborator

📌 Commit 5b041ab has been approved by the8472

It is now in the queue for this repository.

@bors bors removed the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) label Jan 9, 2024
@bors bors added the S-waiting-on-bors (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.) label Jan 9, 2024
@bors

bors commented Jan 9, 2024

Collaborator

⌛ Testing commit 5b041ab with merge 190f4c9...

@bors

bors commented Jan 9, 2024

Collaborator

☀️ Test successful - checks-actions
Approved by: the8472
Pushing 190f4c9 to master...

@bors bors added the merged-by-bors (This PR was explicitly merged by bors.) label Jan 9, 2024
@bors bors merged commit 190f4c9 into rust-lang:master Jan 9, 2024
@rustbot rustbot added this to the 1.77.0 milestone Jan 9, 2024
@rust-timer

Collaborator

Finished benchmarking commit (190f4c9): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.5% | [-0.9%, -0.2%] | 15 |
| Improvements ✅ (secondary) | -0.6% | [-0.6%, -0.6%] | 1 |
| All ❌✅ (primary) | -0.5% | [-0.9%, -0.2%] | 15 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.6% | [2.9%, 4.6%] | 3 |
| Regressions ❌ (secondary) | 1.8% | [1.8%, 1.8%] | 1 |
| Improvements ✅ (primary) | -1.5% | [-3.6%, -0.4%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.0% | [-3.6%, 4.6%] | 6 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.0% | [-1.1%, -0.9%] | 2 |
| Improvements ✅ (secondary) | -3.1% | [-3.5%, -2.3%] | 4 |
| All ❌✅ (primary) | -1.0% | [-1.1%, -0.9%] | 2 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 667.74s -> 666.209s (-0.23%)
Artifact size: 308.59 MiB -> 308.59 MiB (0.00%)

@krtab

krtab commented Jan 10, 2024

Contributor Author

Thanks @the8472 👍


Labels

merged-by-bors: This PR was explicitly merged by bors.
S-waiting-on-bors: Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
T-libs: Relevant to the library team, which will review and decide on the PR/issue.


8 participants