[4970] Generate tuning inputs on GPU via splitmix64 device RNG by itikhono · Pull Request #4971 · ROCm/AMDMIGraphX

itikhono · 2026-06-16T17:16:05Z

This PR covers the first part of issue #4970.
It eliminates the "input-gen + H2D (CPU waste)" part; the GPU part (caused by the bundle increase 1->10) remains.

During op/program tuning, candidate inputs were generated on the host (xorshf96 PRNG) and copied to the device for every candidate. This replaces that with a device kernel that fills tuning inputs directly on the GPU, removing the per-candidate host PRNG + H2D copy.
New device::generate_random uses a counter-based splitmix64 RNG (seed + i * golden_ratio_step → splitmix64), so output is deterministic per seed and reproducible across candidates for fair comparison.
time_program now allocates inputs with allocate_gpu and fills them via gpu_generate_random (which recurses into tuple sub-objects), while fill_map inputs keep the host-fill path.
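
The counter-based scheme described above (seed + i * golden_ratio_step → splitmix64) can be sketched on the host as follows. This is a minimal illustration, not the PR's kernel: the helper names (`splitmix64_mix`, `random_at`, `to_unit_float`, `fill_random`) are hypothetical, and the mapping of the 64-bit output to a float in [0, 1) is an assumption, since the description does not state the value range. The mixing constants are the standard splitmix64 ones.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Standard splitmix64 finalizer: scrambles a 64-bit value into a well-mixed output.
inline std::uint64_t splitmix64_mix(std::uint64_t z)
{
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Counter-based draw: element i depends only on (seed, i), so every
// thread can compute its own value without shared RNG state.
inline std::uint64_t random_at(std::uint64_t seed, std::uint64_t i)
{
    const std::uint64_t golden_ratio_step = 0x9E3779B97F4A7C15ULL;
    return splitmix64_mix(seed + i * golden_ratio_step);
}

// ASSUMPTION: map the top 24 bits to a float in [0, 1); the PR may use
// a different range or conversion.
inline float to_unit_float(std::uint64_t bits)
{
    return static_cast<float>(bits >> 40) * (1.0f / 16777216.0f);
}

// Host-side stand-in for the device fill: one independent draw per element.
inline std::vector<float> fill_random(std::size_t n, std::uint64_t seed)
{
    std::vector<float> out(n);
    for(std::size_t i = 0; i < n; ++i)
        out[i] = to_unit_float(random_at(seed, i));
    return out;
}
```

Because each element is a pure function of the seed and its index, two fills with the same seed produce identical buffers, which is what makes the comparison across tuning candidates fair.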

Behavior parity with the old host path

bool: handled by visit_all → normalize<bool> → 0/1, identical to the old special-case.
fp4x2 (only non-computable type): visit_all would throw, so generation falls back to a raw byte fill — matching the old uint8 host behavior.
tuples: same seed across sub-objects, same as the previous generate_argument.
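
The tuple and raw-byte cases above can be pictured with a small host-side sketch. Everything here is hypothetical: `toy_buffer` stands in for a buffer that is either a leaf or a tuple of sub-buffers, and the byte fill takes the low 8 bits of each draw, which is an assumption rather than the PR's actual conversion.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in: a leaf holds bytes, a tuple holds children.
struct toy_buffer
{
    std::vector<std::uint8_t> bytes;  // leaf storage (unused for tuples)
    std::vector<toy_buffer> children; // non-empty means "tuple"
};

// Standard splitmix64 finalizer.
inline std::uint64_t mix(std::uint64_t z)
{
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Raw byte fill (the fallback used when the element type is not computable).
// ASSUMPTION: byte i is the low 8 bits of the i-th counter-based draw.
inline void fill_raw_bytes(std::vector<std::uint8_t>& bytes, std::uint64_t seed)
{
    for(std::size_t i = 0; i < bytes.size(); ++i)
        bytes[i] = static_cast<std::uint8_t>(mix(seed + i * 0x9E3779B97F4A7C15ULL));
}

// Recurse into tuple sub-objects, passing the same seed to each one.
inline void fill_recursive(toy_buffer& b, std::uint64_t seed)
{
    if(b.children.empty())
    {
        fill_raw_bytes(b.bytes, seed);
        return;
    }
    for(auto& child : b.children)
        fill_recursive(child, seed);
}
```

One consequence of reusing the seed is that sub-buffers of equal size and type receive identical contents; the description indicates this matches the previous generate_argument behavior.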

Performance

No FPS regression across the YOLO model family (within noise).
Compile/tuning time improved by up to ~6.6x at batch 64 on MI350 and by ~10x at batch 32 on R9700 (measured together with reverting the bundle increase 1->10).

Test plan

test_gpu_generate_random: seed determinism + range, half type, empty shape no-op, tuple fills every sub-buffer, non-computable (fp4x2) raw-byte fill — 5/5 pass.
YOLO compile + inference sweep (fork vs develop).

Perf testing for YOLO-family models (MI350):

Measured with migraphx-driver perf. No meaningful difference was detected; the results are quite noisy.

Different models, batch 4:

Model	Fixed, img/s	Develop (before), img/s	Δ
yolov8m	1255.7	1242.5	+1.1%
yolov9m	1177.1	1111.2	+5.9%
yolov10m	1337.1	1303.4	+2.6%
yolo11m	1583.1	1564.2	+1.2%
yolo12m	1341.4	1350.1	−0.6%
yolo26m	1415.0	1407.2	+0.6%
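
The Δ column appears to be the relative change of the "Fixed" throughput against "Develop", rounded to one decimal place. A minimal sketch of that arithmetic, with hypothetical helper names:

```cpp
#include <cmath>

// Relative change of the fixed build versus develop, in percent.
inline double relative_delta_percent(double fixed_fps, double develop_fps)
{
    return (fixed_fps - develop_fps) / develop_fps * 100.0;
}

// Round to one decimal place, as the table does.
inline double round_to_tenth(double x)
{
    return std::round(x * 10.0) / 10.0;
}
```

For example, yolov9m gives (1177.1 − 1111.2) / 1111.2 ≈ 5.9%, and yolo12m gives (1341.4 − 1350.1) / 1350.1 ≈ −0.6%, both matching the table.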

github-actions · 2026-06-16T17:16:33Z

Thank you for your contribution! Since this is an external pull request, a maintainer must review PR and add the "ok-to-test" label if it is approved for testing.

Copilot

Pull request overview

This PR improves GPU compile/tuning throughput by generating candidate input buffers directly on the GPU (splitmix64 counter-based RNG) instead of generating on the host and copying H2D per candidate.

Changes:

Added a GPU-side random-fill kernel (device::generate_random) and a host wrapper (gpu_generate_random) that recurses into tuple sub-objects.
Updated time_program tuning path to allocate parameter buffers on GPU and fill them via gpu_generate_random (keeping fill_map on the host-fill path).
Added a GPU unit test covering determinism, supported types, empty shapes, tuples, and non-computable types.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

File	Description
test/gpu/generate_random.cpp	New GPU test coverage for deterministic RNG fill, tuples, and non-computable types.
src/targets/gpu/time_op.cpp	Switches tuning input creation to GPU allocation + GPU RNG fill (except `fill_map`).
src/targets/gpu/include/migraphx/gpu/hip.hpp	Exposes `gpu_generate_random` API.
src/targets/gpu/include/migraphx/gpu/device/generate_random.hpp	Declares new device-side RNG entrypoint.
src/targets/gpu/hip.cpp	Implements `gpu_generate_random` wrapper with tuple recursion.
src/targets/gpu/device/generate_random.cpp	Implements splitmix64-based device kernel to fill buffers.

pfultz2 · 2026-06-16T20:39:06Z


 MIGRAPHX_GPU_EXPORT void gpu_fill(context& ctx, const argument& dst, int value = 0);

+MIGRAPHX_GPU_EXPORT void gpu_generate_random(context& ctx, const argument& dst, unsigned long seed);


This should take a shape instead of an argument and return an argument:

argument gpu_generate_random(context& ctx, const shape& s, unsigned long seed)

I originally aligned it with gpu_fill function, but I agree the new signature is better. Done.
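
The difference between the two signatures discussed above can be illustrated with stand-in types. This is only a sketch: `toy_shape`, `toy_argument`, `fill_random_into`, and `make_random` are hypothetical and are not MIGraphX types or functions; the real function also takes a context and allocates on the GPU.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for a shape and an owning buffer.
struct toy_shape
{
    std::size_t elements;
};

struct toy_argument
{
    toy_shape s;
    std::vector<std::uint64_t> data;
};

// Style of the first revision: the caller allocates, the function fills in place.
inline void fill_random_into(toy_argument& dst, std::uint64_t seed)
{
    for(std::size_t i = 0; i < dst.data.size(); ++i)
    {
        std::uint64_t z = seed + i * 0x9E3779B97F4A7C15ULL;
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
        dst.data[i] = z ^ (z >> 31);
    }
}

// Style requested in review: take a shape, allocate internally, return the buffer.
inline toy_argument make_random(const toy_shape& s, std::uint64_t seed)
{
    toy_argument result{s, std::vector<std::uint64_t>(s.elements)};
    fill_random_into(result, seed);
    return result;
}
```

With the shape-in, argument-out form, a call site needs a single expression such as `make_random(toy_shape{8}, seed)` and cannot pass a buffer whose size disagrees with the intended shape.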

Generate tuning inputs on GPU via splitmix64 device RNG

c6a80fe

Copilot AI review requested due to automatic review settings June 16, 2026 17:16

itikhono requested a review from causten as a code owner June 16, 2026 17:16

Copilot started reviewing on behalf of itikhono June 16, 2026 17:16

itikhono mentioned this pull request Jun 16, 2026

YOLO-family models: slow compile that grows dramatically with input size #4970

Open

Copilot AI reviewed Jun 16, 2026

Comment thread src/targets/gpu/time_op.cpp Outdated

Comment thread src/targets/gpu/device/generate_random.cpp Outdated

pfultz2 requested changes Jun 16, 2026

Resolve review comments; new tests

17c321d

itikhono requested a review from pfultz2 June 17, 2026 09:28

pfultz2 approved these changes Jun 17, 2026

itikhono wants to merge 2 commits into ROCm:develop from itikhono:gpu-device-bench-inputs
