[QDP] Pr1 phase kernel opt by aloha1357 · Pull Request #1386 · apache/mahout

aloha1357 · 2026-06-07T17:22:15Z

Related Issues

related #1385

Changes

Why

The original phase encoding and IQP encoding kernels suffered from GPU thread divergence due to conditional branching (if (val != 0.0) or if ((x >> i) & 1U)). In addition, the normalization factor (norm_factor) was recalculated inside the GPU kernel even though it is constant for a given launch, consuming extra cycles. Removing both inefficiencies speeds up the phase kernel by about 9.4% in the benchmark below.

How

Replaced Conditional Branching: In both phase.cu and iqp.cu, the if conditions checking bit states were replaced by casting the extracted bit to double and multiplying by it (e.g., phases[bit] * (double)((idx >> bit) & 1U)). All threads in a warp then follow the same instruction path, eliminating warp divergence.
Host-side Pre-calculation: Moved the norm_factor calculation to the host (CPU) before launching the kernel in phase.cu, passing the result as an immutable parameter.
Added Explanatory Comments: Included inline documentation near the bitwise arithmetic lines to aid code reviewers in understanding the optimizations.
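
For reviewers, the two changes above can be sketched on the host in plain C++. This is a minimal illustration, not the code in phase.cu or iqp.cu: the names phase_branchy, phase_branchless, host_norm_factor, and encode_phase are hypothetical, and the normalization 1/sqrt(2^n) is an assumption about what norm_factor holds for an n-qubit phase-encoded state.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Branching form: add phases[bit] only when that bit of idx is set.
inline double phase_branchy(const std::vector<double>& phases, unsigned idx) {
    double total = 0.0;
    for (std::size_t bit = 0; bit < phases.size(); ++bit) {
        if ((idx >> bit) & 1U) {
            total += phases[bit];
        }
    }
    return total;
}

// Branchless form quoted in the PR description: the bit (0 or 1) is cast to
// double and used as a multiplier, so every index runs the same instructions.
inline double phase_branchless(const std::vector<double>& phases, unsigned idx) {
    double total = 0.0;
    for (std::size_t bit = 0; bit < phases.size(); ++bit) {
        total += phases[bit] * (double)((idx >> bit) & 1U);
    }
    return total;
}

// Hypothetical host-side helper: compute the normalization once, before the
// launch. ASSUMPTION: norm_factor is 1/sqrt(2^n) for n qubits.
inline double host_norm_factor(unsigned num_qubits) {
    return 1.0 / std::sqrt(std::ldexp(1.0, (int)num_qubits));
}

// Host-side stand-in for the kernel body: norm_factor arrives as a read-only
// argument instead of being recomputed for every amplitude.
inline std::vector<std::complex<double>> encode_phase(
        const std::vector<double>& phases, double norm_factor) {
    assert(phases.size() < 32);  // keeps the unsigned shift well defined
    const std::size_t dim = std::size_t{1} << phases.size();
    std::vector<std::complex<double>> state(dim);
    for (std::size_t idx = 0; idx < dim; ++idx) {
        state[idx] = std::polar(norm_factor,
                                phase_branchless(phases, (unsigned)idx));
    }
    return state;
}
```

For finite phases both forms give identical sums, since multiplying by 0.0 contributes nothing. They differ if a phase is NaN or infinite: the branchless product then yields NaN even when the bit is clear, which is worth covering in the unit tests.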

Benchmark Results

Environment: Dev Machine (NVIDIA RTX 4060)
Configuration: Qubits (N): 14, Batch Size: 128, Iterations: 5

| Implementation | Per sample (ms) | Total (ms) | Notes |
| --- | --- | --- | --- |
| GPU phase (Before PR1) | 1.26 | 161.91 | Strict checkout of unoptimized `phase.cu` |
| GPU phase (This PR) | 1.16 | 147.94 | ~9.4% faster (161.91 / 147.94 ≈ 1.094), i.e. ~8.6% less total time, with zero divergence |
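
The reported figures can be cross-checked with a few lines of arithmetic. The sketch below assumes the total time covers one batch of 128 samples, which is consistent with the per-sample values; the struct and function names are hypothetical and not part of the benchmark script in this PR.

```cpp
#include <cstdio>

// Derived figures for a before/after timing pair.
struct BenchmarkSummary {
    double per_sample_before_ms;
    double per_sample_after_ms;
    double speedup_pct;     // how much faster the new kernel runs
    double time_saved_pct;  // how much less wall time it takes
};

// ASSUMPTION: total_*_ms is the time for one batch of batch_size samples.
inline BenchmarkSummary summarize(double total_before_ms,
                                  double total_after_ms,
                                  int batch_size) {
    BenchmarkSummary s;
    s.per_sample_before_ms = total_before_ms / batch_size;
    s.per_sample_after_ms = total_after_ms / batch_size;
    s.speedup_pct = (total_before_ms / total_after_ms - 1.0) * 100.0;
    s.time_saved_pct = (1.0 - total_after_ms / total_before_ms) * 100.0;
    return s;
}

inline void print_summary(const BenchmarkSummary& s) {
    std::printf("per sample: %.2f ms -> %.2f ms\n",
                s.per_sample_before_ms, s.per_sample_after_ms);
    std::printf("speedup: %.1f%%, time saved: %.1f%%\n",
                s.speedup_pct, s.time_saved_pct);
}
```

Calling summarize(161.91, 147.94, 128) gives roughly 1.26 ms and 1.16 ms per sample, a speedup of about 9.4%, and a time reduction of about 8.6%. The two percentages describe the same measurement, so it helps to state which one a headline number refers to.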

Checklist

Added or updated unit tests for all changes (Verified passing against existing CI test suite)
Added or updated documentation for all changes (Added explanatory inline comments for PR)

ryankert01

I need you to do four things:

show the benchmark of these changes (before & after)
enhance our unit tests for this function
clean up the code & comments from coding agents
read CONTRIBUTING.md

Thanks!

aloha1357 requested review from 400Ping and ryankert01 as code owners June 7, 2026 17:22

aloha1357 changed the title ~~Pr1 phase kernel opt~~ [QDP] Pr1 phase kernel opt Jun 7, 2026

ryankert01 requested changes Jun 8, 2026

aloha1357 added 2 commits June 8, 2026 15:17

feat(qdp): optimize phase kernel divergence and hoist constant mem computation

226e8f2

style(qdp): add explanatory comments for phase and iqp kernel optimizations

29ecb19

ryankert01 force-pushed the pr1-phase-kernel-opt branch from ca90282 to 29ecb19 Compare June 8, 2026 07:17

test(qdp): add PR1 phase benchmark script and N=14 batch tests

3d6db12

aloha1357 requested a review from guan404ming as a code owner June 8, 2026 22:32

aloha1357 wants to merge 3 commits into apache:main from aloha1357:pr1-phase-kernel-opt

2 participants