Vectorize Resize by pfultz2 · Pull Request #4967 · ROCm/AMDMIGraphX

pfultz2 · 2026-06-15T17:19:44Z

Motivation

Enable vectorization for resize, which helps improve the performance of the kernel.

Technical Details

Since the input and output are different sizes, the vectorize arg transformer cannot be applied uniformly across all tensors. Instead, we vectorize the output and the input differently. Output vectorization can be applied easily in most cases, but input vectorization can only be applied when the fastest axis is not being resized. So it mainly helps for layouts like NHWC.
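As an illustration of the condition described above, here is a minimal host-side sketch. It is not the code from this PR: `tensor_desc`, `fastest_axis`, `can_vectorize_input`, and `pick_vec_size` are hypothetical names, and the check assumes packed tensors described by logical lengths plus strides (so NHWC is an NCHW shape with permuted strides). The actual eligibility logic in `src/targets/gpu/jit/resize.cpp` also considers scale and coordinate-transform constraints, which this sketch reduces to "the fastest axis has the same length in input and output".

```cpp
#include <cstddef>
#include <vector>

// Hypothetical shape description: logical lengths plus element strides.
struct tensor_desc
{
    std::vector<std::size_t> lens;
    std::vector<std::size_t> strides;
};

// Returns the axis with unit stride and length > 1 (assumed unique for a
// packed layout); falls back to the last axis if none is found.
inline std::size_t fastest_axis(const tensor_desc& t)
{
    for(std::size_t i = 0; i < t.strides.size(); i++)
    {
        if(t.strides[i] == 1 && t.lens[i] > 1)
            return i;
    }
    return t.strides.size() - 1;
}

// Input loads can be vectorized only when the output's fastest axis is also
// contiguous in the input and is passed through unchanged (not resized).
inline bool can_vectorize_input(const tensor_desc& in, const tensor_desc& out)
{
    const std::size_t axis = fastest_axis(out);
    return in.strides[axis] == 1 && in.lens[axis] == out.lens[axis];
}

// Picks the widest candidate vector size that evenly divides the axis length.
// The candidate list {4, 2} is an assumption made for this sketch.
inline std::size_t pick_vec_size(const tensor_desc& t, std::size_t axis)
{
    constexpr std::size_t candidates[] = {4, 2};
    for(std::size_t n : candidates)
    {
        if(t.lens[axis] % n == 0)
            return n;
    }
    return 1;
}
```

With an NHWC input of logical shape {1, 8, 4, 4} upsampled to {1, 8, 8, 8}, the channel axis is the fastest axis and is untouched by the resize, so both input and output qualify. With a plain NCHW layout the fastest axis is W, which is resized, so only the output can be vectorized.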

Changelog Category

Add a CHANGELOG.md entry for any option other than Not Applicable

- Added: New functionality.
- Changed: Changes to existing functionality.
- Removed: Functionality or support that has been removed. (Compared to a previous release)
- Optimized: Component performance that has been optimized or improved.
- Resolved Issues: Known issues from a previous version that have been resolved.
- Not Applicable: This PR is not to be included in the changelog.

gh-app-migraphx-bot-pr-write · 2026-06-15T23:02:13Z

Test	Batch	New Rate (400886)	Old Rate (b69836)	Diff	Status
torchvision-resnet50	64	nan	3,154.40	nan	❌
torchvision-resnet50_fp16	64	nan	6,635.18	nan	❌
torchvision-densenet121	32	nan	2,694.69	nan	❌
torchvision-densenet121_fp16	32	nan	4,526.13	nan	❌
torchvision-inceptionv3	32	nan	1,797.12	nan	❌
torchvision-inceptionv3_fp16	32	nan	2,819.16	nan	❌
cadene-inceptionv4	16	nan	824.35	nan	❌
cadene-resnext64x4	16	nan	783.08	nan	❌
slim-mobilenet	64	nan	8,386.62	nan	❌
slim-nasnetalarge	64	nan	228.34	nan	❌
slim-resnet50v2	64	nan	3,313.18	nan	❌
bert-mrpc-onnx	8	nan	1,172.65	nan	❌
bert-mrpc-tf	1	nan	493.16	nan	❌
pytorch-examples-wlang-gru	1	nan	327.56	nan	❌
pytorch-examples-wlang-lstm	1	nan	465.24	nan	❌
torchvision-resnet50_1	1	nan	768.99	nan	❌
cadene-dpn92_1	1	nan	453.05	nan	❌
cadene-resnext101_1	1	nan	363.80	nan	❌
onnx-taau-downsample	1	nan	399.85	nan	❌
dlrm-criteoterabyte	1	nan	32.43	nan	❌
dlrm-criteoterabyte_fp16	1	nan	51.82	nan	❌
agentmodel	1	nan	10,024.97	nan	❌
unet_fp16	2	nan	56.82	nan	❌
resnet50v1_fp16	1	nan	953.97	nan	❌
resnet50v1_int8	1	nan	932.02	nan	❌
bert_base_cased_fp16	64	nan	1,097.66	nan	❌
bert_large_uncased_fp16	32	nan	346.32	nan	❌
bert_large_fp16	1	nan	203.57	nan	❌
distilgpt2_fp16	16	nan	2,085.63	nan	❌
yolov5s	1	nan	564.77	nan	❌
tinyllama	1	nan	45.96	nan	❌
vicuna-fastchat	1	nan	44.01	nan	❌
whisper-tiny-encoder	1	nan	417.38	nan	❌
whisper-tiny-decoder	1	nan	413.20	nan	❌
llama2_7b	1	nan	19.07	nan	❌
qwen1.5-7b	1	nan	22.86	nan	❌
phi3-3.8b	1	nan	25.80	nan	❌
llama3-8b	1	nan	18.06	nan	❌
whisper-large-encoder	1	nan	7.78	nan	❌
whisper-large-decoder	1	nan	7.09	nan	❌
mistral-7b	1	nan	23.37	nan	❌
FLUX.1-schnell	1	nan	305.98	nan	❌

Regressions detected 🔴

gh-app-migraphx-bot-pr-write · 2026-06-15T23:02:14Z

Test	Status	Result
bert-mrpc-onnx	✅	PASSED: MIGraphX meets tolerance
bert-mrpc-tf	❌	ERROR - check error output traceback Traceback (most recent call last): File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 377, in main() File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 313, in main import tensorflow as tf File "/usr/local/lib/python3.10/dist-packages/tensorflow/init.py", line 38, in from tensorflow.python.tools import module_util as _module_util File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/init.py", line 36, in from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 26, in self_check.preload_check() File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/platform/self_check.py", line 63, in preload_check from tensorflow.python.platform import _pywrap_cpu_feature_guard ImportError: libamdhip64.so.6: cannot open shared object file: No such file or directory
pytorch-examples-wlang-gru	✅	PASSED: MIGraphX meets tolerance
pytorch-examples-wlang-lstm	✅	PASSED: MIGraphX meets tolerance
dlrm-criteoterabyte	✅	PASSED: MIGraphX meets tolerance
agentmodel	✅	PASSED: MIGraphX meets tolerance
unet	✅	PASSED: MIGraphX meets tolerance
resnet50v1	✅	PASSED: MIGraphX meets tolerance
bert_base_cased_fp16	✅	PASSED: MIGraphX meets tolerance
bert_large_uncased_fp16	🔴	FAILED: MIGraphX is not within tolerance - check verbose output
bert_large	✅	PASSED: MIGraphX meets tolerance
yolov5s	✅	PASSED: MIGraphX meets tolerance
tinyllama	✅	PASSED: MIGraphX meets tolerance
vicuna-fastchat	✅	PASSED: MIGraphX meets tolerance
whisper-tiny-encoder	✅	PASSED: MIGraphX meets tolerance
whisper-tiny-decoder	✅	PASSED: MIGraphX meets tolerance
distilgpt2_fp16	🔴	FAILED: MIGraphX is not within tolerance - check verbose output
llama2_7b	✅	PASSED: MIGraphX meets tolerance
qwen1.5-7b	✅	PASSED: MIGraphX meets tolerance
phi3-3.8b	✅	PASSED: MIGraphX meets tolerance
llama3-8b	✅	PASSED: MIGraphX meets tolerance
whisper-large-encoder	❌	ERROR - check error output traceback 2026-06-17 12:07:52.141276 [WARN] [/data/src/onnx/onnx_parser.cpp:282] Model has unbound symbolic dimension(s): batch_size, encoder_sequence_length, feature_size. These default to 1 and may cause unexpected behavior. Try setting `--dim-param @<name> <value>` or `--input-dim @<input> <dims>` if program compilation fails. Traceback (most recent call last): File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 377, in main() File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 224, in main model = migraphx.parse_onnx(model_name, default_dim_value=batch) RuntimeError: /data/src/include/migraphx/op/convolution.hpp:113: normalize_compute_shape: CONVOLUTION: mismatched channel numbers: input channels (1) != weights channels (80) * group (1)
whisper-large-decoder	✅	PASSED: MIGraphX meets tolerance
mistral-7b	✅	PASSED: MIGraphX meets tolerance
FLUX.1-schnell	✅	PASSED: MIGraphX meets tolerance

codecov · 2026-06-16T17:53:03Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #4967   +/-   ##
========================================
  Coverage    92.73%   92.73%           
========================================
  Files          594      594           
  Lines        31340    31340           
========================================
  Hits         29063    29063           
  Misses        2277     2277

Copilot

Pull request overview

This PR updates the GPU resize JIT path to enable vectorized stores on the output (and conditionally vectorized loads on the input when the fastest axis is a true pass-through), aiming to improve kernel performance for common layouts like NHWC.

Changes:

- Refactors the device-side resize kernels to run through a shared resize_apply wrapper that supports mixed (scalar vs vectorized) input/output handling.
- Updates the GPU JIT resize kernel template to apply vectorize<N, Axis>() transformers to out (and to input when safe), and adjusts launch sizing to operate on vectorized output elements.
- Adds host-side logic to decide when input vectorization is safe based on stride/scale/coordinate transform constraints.
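To make the "launch sizing on vectorized output elements" point concrete, the following is a CPU emulation written for this summary, not the device kernel from the PR. The function name `resize_nearest_nhwc` is hypothetical, the batch size is fixed at 1, the data is stored physically as HWC, and the nearest-neighbor mapping uses an asymmetric coordinate transform with floor rounding as one possible choice. Each loop iteration stands in for one GPU work item that moves `N` contiguous channel values, so the iteration count is the output element count divided by `N`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Nearest-neighbor resize over an HWC buffer, processing N channels per
// work item. The caller must ensure c % N == 0.
template <std::size_t N>
std::vector<float> resize_nearest_nhwc(const std::vector<float>& in,
                                       std::size_t h_in,
                                       std::size_t w_in,
                                       std::size_t h_out,
                                       std::size_t w_out,
                                       std::size_t c)
{
    std::vector<float> out(h_out * w_out * c);
    const std::size_t chunks     = c / N;                   // vectors per pixel
    const std::size_t work_items = h_out * w_out * chunks;  // vectors, not scalars

    for(std::size_t i = 0; i < work_items; i++)
    {
        // Decompose the vectorized output index into (y, x, channel chunk).
        const std::size_t cv = i % chunks;
        const std::size_t x  = (i / chunks) % w_out;
        const std::size_t y  = i / (chunks * w_out);

        // Only the spatial axes are resized; the channel axis passes through.
        const std::size_t sy = y * h_in / h_out;
        const std::size_t sx = x * w_in / w_out;

        // One N-wide load from the input and one N-wide store to the output.
        const float* src = in.data() + (sy * w_in + sx) * c + cv * N;
        float* dst       = out.data() + (y * w_out + x) * c + cv * N;
        std::copy(src, src + N, dst);
    }
    return out;
}
```

Instantiating the template with `N = 1` gives the scalar path, and any `N` that divides the channel count should produce identical output with proportionally fewer work items. If the fastest axis were itself resized, the `N` source values would no longer be contiguous, which is why the input side needs the separate safety check.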

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
src/targets/gpu/kernels/include/migraphx/kernels/resize.hpp	Adds vectorization support via a new `resize_apply` wrapper and updates nearest/linear/cubic implementations to use it.
src/targets/gpu/jit/resize.cpp	Applies vectorize transformers in the generated kernel, selects vectorization parameters, and updates launch sizing and input-vectorization eligibility logic.

+template <index_int Axis, class Input, class Output, class Outv, class Compute>
+__device__ void resize_apply(Input input, Output out, Outv outv, Compute compute)
 {
-    auto idx       = make_index();
-    auto in_shape  = input.get_shape();
-    auto out_shape = output.get_shape();
-
-    idx.global_stride(out_shape.elements(), [&](auto out_idx) {
-        auto in_idx = compute_nearest_idx<CoordOp, NearestOp>(in_shape, out_shape, out_idx, scales);
-        output[out_idx] = input[in_idx];
-    });
+    auto idx                = make_index();
+    constexpr index_int ivn = tensor_vec_size<Input>(); // >= 2 only for a pass-through fast axis
+    constexpr index_int ovn = tensor_vec_size<Outv>();


pfultz2 added 5 commits June 15, 2026 09:19

Vectorize resize

843ffa7

Vectorize

454fe02

Use vecotrize arg transformer

781fc32

vec input

1df975f

Vectorize input

ff0b123

pfultz2 changed the title ~~Resize GPU improvements~~ Vectorize Resize Jun 15, 2026

pfultz2 added 2 commits June 16, 2026 09:25

Update comment

41a1936

Simplify

accb22b

pfultz2 marked this pull request as ready for review June 17, 2026 14:12

pfultz2 requested a review from causten as a code owner June 17, 2026 14:12

pfultz2 requested review from CharlieL7, TedThemistokleous, Copilot and kahmed10 June 17, 2026 14:12

Copilot started reviewing on behalf of pfultz2 June 17, 2026 14:13

Copilot AI reviewed Jun 17, 2026

pfultz2 and others added 3 commits June 17, 2026 09:49

Potential fix for pull request finding

00296b8

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Update src/targets/gpu/jit/resize.cpp

4008862

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Merge branch 'develop' into resize-vectorize

75dafc3

pfultz2 wants to merge 10 commits into develop from resize-vectorize

2 participants
