
Add: support input_shape_profile for trt-rtx ep #1782

Open
haoxiz-nvidia wants to merge 3 commits into main from haoxiz/onnx-ptq-model-id
Conversation

@haoxiz-nvidia

@haoxiz-nvidia haoxiz-nvidia commented Jun 22, 2026

Contributor

What does this PR do?

Add input_shape_profile support to ONNX quantization, with model_id accepted as an input; this fixes the missing input_shape_profile problem seen with some versions of TRT-RTX.

Usage

python -m modelopt.onnx.quantization --onnx_path="path\to\model.onnx" --quantize_mode=int8 --output_path="path\to\output\model.onnx" --calibration_eps=NvTensorRtRtx --use_external_data_format --high_precision_dtype=fp32 --model_id="huggingface_model_id"

Testing

Tested on 4 popular LLM models across the common quantization methods (int4, fp8, int8).

Before your PR is "Ready for review"

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅
  • Did you write any new necessary tests?: ❌
  • Did you update Changelog?: ❌
  • Did you get Claude approval on this PR?: N/A

Summary by CodeRabbit

  • New Features
    • Added model_id support to the ONNX PTQ CLI and quantization API, enabling automatic generation of input_shapes_profile when not provided.
    • Added input_shapes_profile parsing from inline JSON or a JSON file, plus trust_remote_code for resolving custom model code.
  • Enhancements
    • Extended input-shape profile handling across INT8/FP8 quantization and MatMul/MHA quantization exclusions.
    • Updated the Windows example to generate profiles when trt/NvTensorRtRtx calibration endpoints are used.
  • Tests
    • Added unit and CLI integration coverage for profile parsing, forwarding, and profile realignment.
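
The "inline JSON or a JSON file" parsing mentioned in the summary can be pictured with a small sketch. The helper name `parse_input_shapes_profile` and its validation rules are assumptions made for illustration; this is not the code in the PR.

```python
import json
import os
import tempfile


def parse_input_shapes_profile(value):
    """Return a list of per-EP option dicts from inline JSON or a JSON file path.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    if value is None:
        return None
    # Treat the argument as a path when such a file exists, otherwise as inline JSON.
    if os.path.isfile(value):
        with open(value, encoding="utf-8") as f:
            profile = json.load(f)
    else:
        profile = json.loads(value)
    # Assumed contract: one JSON object per execution provider, in EP order.
    if not isinstance(profile, list) or not all(isinstance(p, dict) for p in profile):
        raise ValueError("input_shapes_profile must be a JSON list of objects, one per EP")
    return profile


inline = '[{"nv_profile_min_shapes": "input_ids:1x1"}, {}]'
parsed_inline = parse_input_shapes_profile(inline)

# The same content read back from a file should parse identically.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "profile.json")
    with open(path, "w", encoding="utf-8") as f:
        f.write(inline)
    parsed_file = parse_input_shapes_profile(path)
```

Checking for an existing file first keeps a single CLI flag usable for both forms, at the cost of ambiguity if a file happens to be named like a JSON string.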

Signed-off-by: haoxiz <haoxiz@nvidia.com>
@haoxiz-nvidia haoxiz-nvidia self-assigned this Jun 22, 2026
@haoxiz-nvidia haoxiz-nvidia requested a review from a team as a code owner June 22, 2026 04:48
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Contributor

📝 Walkthrough

Adds input_shapes_profile support across ONNX PTQ entrypoints, quantization flow, graph exclusion logic, and ORT provider configuration. Profiles can now be parsed from the CLI or inferred from model_id, then forwarded into INT8/FP8 quantization and example scripts.

Changes

Input Shape Profile Pipeline

Layer / File(s) | Summary
--- | ---
Shape profile generation and ORT EP wiring (modelopt/onnx/quantization/ort_utils.py, tests/unit/onnx/quantization/test_ort_utils.py) | Adds create_input_shapes_profile(...), extends provider-list construction to merge per-EP profile options, and updates inference-session and TRT-guided ORT configuration to use the prepared execution providers. Tests cover profile generation, provider alignment, and session construction.
Graph analysis and quantizer threading (modelopt/onnx/quantization/graph_utils.py, modelopt/onnx/quantization/int8.py, modelopt/onnx/quantization/fp8.py) | Threads input_shapes_profile through extended-model inference, MatMul/MHA exclusion helpers, and the INT8/FP8 quantize entrypoints so the profile reaches exclusion detection and ORT configuration.
Top-level quantize() and CLI wiring (modelopt/onnx/quantization/quantize.py, modelopt/onnx/quantization/__main__.py, examples/windows/onnx_ptq/genai_llm/quantize.py, tests/unit/onnx/quantization/test_autotune_quantization_integration.py, tests/unit/onnx/quantization/test_quantize_api.py) | Adds model_id and trust_remote_code to the top-level API, generates or realigns profiles around calibration EP updates, parses --input_shapes_profile from inline JSON or file input, and forwards the new values through the CLI and example script. Tests cover parsing and profile realignment.
Platform loading adjustments (modelopt/onnx/quantization/autotune/benchmark.py, modelopt/onnx/quantization/ort_utils.py) | Refactors TensorRT plugin loading flag detection and removes a Windows ctypes type-ignore comment.

Estimated code review effort: 4 (Complex) | ~45 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name | Status | Explanation
--- | --- | ---
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly matches the main change: adding input shape profile support for the TensorRT-RTX execution provider.
Docstring Coverage | ✅ Passed | Docstring coverage is 92.86% which is sufficient. The required threshold is 80.00%.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns | ✅ Passed | The only actual diff is quantize.py, and it adds model_id flow without hardcoded trust_remote_code=True, unsafe loads, eval/exec, or new nosec comments.

@github-actions

github-actions Bot commented Jun 22, 2026

Contributor
PR Preview Action v1.8.1

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1782/

Built to branch gh-pages at 2026-07-03 08:38 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@coderabbitai coderabbitai Bot left a comment

Contributor

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt/onnx/quantization/ort_utils.py`:
- Around line 595-603: The issue is that after _prepare_ep_list filters the
calibration_eps list to remove unavailable providers, the enumeration of
execution_providers uses indices from the filtered list instead of the original
list, causing the input_shapes_profile indices to misalign. To fix this,
enumerate over the original calibration_eps list instead of the filtered
execution_providers list when building the tuple pairs, using the index to
access input_shapes_profile correctly, and mapping each original ep to either
the profile (if available) or the filtered execution_providers equivalent.

In `@modelopt/onnx/quantization/quantize.py`:
- Around line 557-559: The input_shapes_profile is being created from
calibration_eps before it has been finalized by the update_trt_ep_support
function, causing potential sync issues downstream. Move the conditional block
that checks if input_shapes_profile is None and calls
create_input_shapes_profile with model_id and calibration_eps to execute after
update_trt_ep_support has been called, ensuring calibration_eps reflects the
final list of execution providers before generating the profile.
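
The index misalignment described in the first inline comment can be illustrated with a short sketch. The function name, its parameters, and the short EP names (`"cuda"`, `"trt"`, `"cpu"`) are stand-ins chosen for illustration; the real `_prepare_ep_list` logic and the mapping to ORT provider names are not reproduced here.

```python
def pair_providers_with_profiles(calibration_eps, available_eps, input_shapes_profile):
    """Build an ORT-style providers list, keeping each profile with its original EP.

    All names here are illustrative stand-ins, not the PR's API.
    """
    if input_shapes_profile is not None and len(input_shapes_profile) != len(calibration_eps):
        raise ValueError("expected one profile entry per requested execution provider")
    providers = []
    # Enumerate the ORIGINAL list so index i always addresses the matching profile.
    for i, ep in enumerate(calibration_eps):
        if ep not in available_eps:
            continue  # dropped provider: its profile entry is dropped with it
        options = input_shapes_profile[i] if input_shapes_profile else {}
        # ORT accepts either a provider name or a (name, options) tuple.
        providers.append((ep, options) if options else ep)
    return providers


requested = ["cuda", "trt", "cpu"]
profile = [{}, {"nv_profile_min_shapes": "input_ids:1x1"}, {}]
# Pretend "cuda" is unavailable on this machine.
providers = pair_providers_with_profiles(requested, {"trt", "cpu"}, profile)
```

Had the loop enumerated the filtered list instead, `"trt"` would sit at index 0 and pick up the empty entry that belonged to `"cuda"`.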

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 19ed1a5a-2793-4772-b650-d3982467b520

📥 Commits

Reviewing files that changed from the base of the PR and between 9048d13 and db840b4.

📒 Files selected for processing (6)
  • modelopt/onnx/quantization/__main__.py
  • modelopt/onnx/quantization/fp8.py
  • modelopt/onnx/quantization/graph_utils.py
  • modelopt/onnx/quantization/int8.py
  • modelopt/onnx/quantization/ort_utils.py
  • modelopt/onnx/quantization/quantize.py

Comment thread modelopt/onnx/quantization/ort_utils.py Outdated
Comment thread modelopt/onnx/quantization/quantize.py
@codecov

codecov Bot commented Jun 22, 2026


Codecov Report

❌ Patch coverage is 28.57143% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.69%. Comparing base (cfc823d) to head (db840b4).
⚠️ Report is 45 commits behind head on main.

Files with missing lines | Patch % | Lines
--- | --- | ---
modelopt/onnx/quantization/ort_utils.py | 11.53% | 23 Missing ⚠️
modelopt/onnx/quantization/graph_utils.py | 66.66% | 1 Missing ⚠️
modelopt/onnx/quantization/quantize.py | 66.66% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1782      +/-   ##
==========================================
- Coverage   77.09%   75.69%   -1.41%     
==========================================
  Files         511      511              
  Lines       56168    58272    +2104     
==========================================
+ Hits        43302    44107     +805     
- Misses      12866    14165    +1299     
Flag | Coverage Δ
--- | ---
examples | 41.80% <14.28%> (-0.15%) ⬇️
gpu | 57.67% <25.71%> (-0.64%) ⬇️
unit | 54.41% <28.57%> (+0.07%) ⬆️


@vishalpandya1990

Contributor

Add support for onnx quantization and support model_id as input, which fix missing input_shape_profile problem for some version of trt-rtx

Is this specific to the TRT-RTX version? That is, for the same inputs to the quantize() API, does it work with certain TRT-RTX versions and fail with others?

Can you help me recap what the behaviour will be when model_id is not provided?

Also, do we handle a bad model_id and "missing key" cases (for example, an architecture or model that uses a different name for hidden_size in its config)?

Comment thread modelopt/onnx/quantization/ort_utils.py Outdated

from transformers import AutoConfig

config = AutoConfig.from_pretrained(model_id)

Contributor

Should this call (and the one above) have trust_remote_code hooked up here (with default = False), in case the model's config needs custom code?

@vishalpandya1990

Contributor

A CodeRabbit comment above mentions a nosec-related comment/error. Please check that as well.

Signed-off-by: haoxiz <haoxiz@nvidia.com>
Signed-off-by: haoxiz <haoxiz@nvidia.com>
@haoxiz-nvidia haoxiz-nvidia requested review from a team as code owners July 3, 2026 08:34
@haoxiz-nvidia haoxiz-nvidia requested a review from ynankani July 3, 2026 08:34

@coderabbitai coderabbitai Bot left a comment

Contributor

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modelopt/onnx/quantization/ort_utils.py (1)

447-500: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Handle missing config fields in create_input_shapes_profile
AutoConfig configs don’t always expose hidden_size, num_attention_heads, or num_hidden_layers, so this helper will fail with an opaque AttributeError on valid architectures that use other field names. Raise a clear ValueError (or map the common aliases) so users can fix model_id or pass an explicit profile.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/onnx/quantization/ort_utils.py` around lines 447 - 500, The
create_input_shapes_profile helper assumes AutoConfig always has hidden_size,
num_attention_heads, and num_hidden_layers, which can raise an opaque
AttributeError for valid models. Update create_input_shapes_profile to validate
these fields up front and either map common aliases or raise a clear ValueError
with the model_id context before building shapes; keep the logic localized
around the head_dim, num_kv_heads, num_layers, and make_shapes setup so callers
get an actionable error or supported fallback.
♻️ Duplicate comments (2)
modelopt/onnx/quantization/quantize.py (2)

632-643: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Correctly ordered, but only reachable if the earlier block at Lines 577-579 is removed.

This block (snapshot original EPs, update EPs, then realign-or-regenerate) is the right approach and matches what a prior review round asked for. See the comment on Lines 577-579 — as written, the elif model_id: branch here is currently unreachable whenever model_id is set, because input_shapes_profile is already non-None by the time execution reaches here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/onnx/quantization/quantize.py` around lines 632 - 643, The
calibration EP snapshot/update flow in quantize() is correct, but the
create/regenerate branch is unreachable because input_shapes_profile is already
populated earlier. Remove the earlier input_shapes_profile initialization in the
preceding block so the existing logic here can realign via
_realign_input_shapes_profile or regenerate via create_input_shapes_profile when
model_id is set, using the original_calibration_eps snapshot and the updated
calibration_eps.

577-579: 🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win

Dead/duplicate profile generation drops trust_remote_code and computes the profile before EPs are finalized.

This block regenerates input_shapes_profile from model_id before update_trt_ep_support(...) runs (line 633) and before trust_remote_code is forwarded — it calls create_input_shapes_profile(model_id, calibration_eps) without trust_remote_code, defaulting it to False regardless of what the caller passed.

Because this sets input_shapes_profile to a non-None value, the later elif model_id: branch (lines 639-642, which correctly passes trust_remote_code and uses the finalized calibration_eps) can never execute when model_id is provided — it only realigns the stale profile via _realign_input_shapes_profile. This reproduces the exact issue from a previous review round (profile computed with pre-update EP ordering) and separately drops the user's trust_remote_code flag, which was flagged as an open question by a reviewer ("Should this call... have trust_remote_code hooked up here").

Tracing test_quantize_infers_input_profiles_after_ep_support_update against this code: captured["profile_eps"] would be ["cpu", "trt"] (not ["trt", "cpu"]) and captured["trust_remote_code"] would be False (not True), so both assertions should fail with the current implementation.

🐛 Proposed fix — remove the early block, rely on the later corrected one
-    if input_shapes_profile is None and model_id:
-        input_shapes_profile = create_input_shapes_profile(model_id, calibration_eps)
-
     # quantize_static creates a shape-inferred copy at the input model's directory
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/onnx/quantization/quantize.py` around lines 577 - 579, The early
`input_shapes_profile` regeneration in `quantize()` is stale and bypasses the
later corrected path: it runs before `update_trt_ep_support(...)` and drops the
caller’s `trust_remote_code` by calling `create_input_shapes_profile(model_id,
calibration_eps)` with defaults. Remove this premature block and let the later
`elif model_id:` branch handle profile creation so
`create_input_shapes_profile`, `_realign_input_shapes_profile`, and the
finalized `calibration_eps`/`trust_remote_code` values are used consistently.
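
The intended ordering (snapshot the EPs, update them, then realign or regenerate the profile) can be sketched as follows. The three callables are injected stand-ins for `update_trt_ep_support`, `_realign_input_shapes_profile`, and `create_input_shapes_profile`; their real signatures are not shown in this thread, so the ones used here are assumptions.

```python
def resolve_profile(calibration_eps, input_shapes_profile, model_id, trust_remote_code,
                    update_eps, realign, create):
    """Finalize EPs first, then realign a user profile or generate one from model_id."""
    original_eps = list(calibration_eps)           # snapshot before any update
    calibration_eps = update_eps(calibration_eps)  # may reorder or add providers
    if input_shapes_profile is not None:
        input_shapes_profile = realign(input_shapes_profile, original_eps, calibration_eps)
    elif model_id:
        # Reachable only because nothing populated the profile earlier.
        input_shapes_profile = create(
            model_id, calibration_eps, trust_remote_code=trust_remote_code
        )
    return calibration_eps, input_shapes_profile


captured = {}


def fake_update(eps):
    return ["trt", "cpu"]  # stand-in: pretend the update moves trt first


def fake_create(model_id, eps, trust_remote_code=False):
    captured["profile_eps"] = list(eps)
    captured["trust_remote_code"] = trust_remote_code
    return [{} for _ in eps]


def fake_realign(profile, old_eps, new_eps):
    by_ep = dict(zip(old_eps, profile))
    return [by_ep.get(ep, {}) for ep in new_eps]


eps, generated = resolve_profile(["cpu", "trt"], None, "some/model", True,
                                 fake_update, fake_realign, fake_create)
_, realigned = resolve_profile(["cpu", "trt"], [{"a": 1}, {"b": 2}], None, False,
                               fake_update, fake_realign, fake_create)
```

With no earlier assignment to `input_shapes_profile`, the generator sees the post-update EP order and the caller's `trust_remote_code`, which is what the test traced in the comment above expects.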
🧹 Nitpick comments (2)
tests/unit/onnx/quantization/test_autotune_quantization_integration.py (1)

42-108: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Move new test imports to the top of the file.

The three new tests import get_parser/__main__ inside the function body (Lines 43, 59, 78) with no justification comment. Per repo conventions, imports belong at module top so import errors surface at collection time; in-function imports should be reserved for circular imports or optional dependencies, with a comment naming the reason.

♻️ Suggested fix
+from modelopt.onnx.quantization.__main__ import get_parser
+import modelopt.onnx.quantization.__main__ as quantization_cli
+
 def test_quantization_cli_parses_inline_input_shapes_profile():
-    from modelopt.onnx.quantization.__main__ import get_parser
-
     profile = [{"nv_profile_min_shapes": "input_ids:1x1"}, {}]

As per path instructions, "Imports belong at the top of the file so import errors surface at collection time, not mid-test... Put an import inside a function only when there is a concrete reason... Add a brief comment in those cases naming the reason."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/onnx/quantization/test_autotune_quantization_integration.py`
around lines 42 - 108, Move the new imports used by
test_quantization_cli_parses_inline_input_shapes_profile,
test_quantization_cli_parses_input_shapes_profile_file, and
test_quantization_cli_forwards_input_shapes_profile from inside the test bodies
to the module top level so import failures are caught during collection; if any
import must remain local, add a short comment in that test explaining the
concrete reason. Use the existing get_parser and
modelopt.onnx.quantization.__main__ references to place the imports
appropriately.

Source: Path instructions

tests/unit/onnx/quantization/test_quantize_api.py (1)

54-143: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Move importlib.import_module calls to the top of the file.

Same pattern as in test_autotune_quantization_integration.py: quantize_module = importlib.import_module("modelopt.onnx.quantization.quantize") is repeated inside three separate test functions (Lines 55, 67, 78) with no justification comment. Prefer a single top-level from modelopt.onnx.quantization import quantize as quantize_module so import errors surface at collection time and the duplication is removed.

As per path instructions, "Imports belong at the top of each file... Put an import inside a function only when there is a concrete reason... those should carry a brief comment naming the reason."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/onnx/quantization/test_quantize_api.py` around lines 54 - 143, The
test file repeats dynamic imports inside multiple test functions, which should
be moved to a top-level import. Replace the three importlib.import_module calls
in the quantize API tests with a single top-level import for quantize_module so
collection-time import failures are surfaced early and the duplication is
removed. Update the tests that reference quantize_module to use that shared
module alias consistently.

Source: Path instructions


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: eada498a-a724-4907-8f69-962d3ee690f2

📥 Commits

Reviewing files that changed from the base of the PR and between db840b4 and d1a70d8.

📒 Files selected for processing (8)
  • examples/windows/onnx_ptq/genai_llm/quantize.py
  • modelopt/onnx/quantization/__main__.py
  • modelopt/onnx/quantization/autotune/benchmark.py
  • modelopt/onnx/quantization/ort_utils.py
  • modelopt/onnx/quantization/quantize.py
  • tests/unit/onnx/quantization/test_autotune_quantization_integration.py
  • tests/unit/onnx/quantization/test_ort_utils.py
  • tests/unit/onnx/quantization/test_quantize_api.py

@haoxiz-nvidia

Contributor Author

Hi, I have resolved all the issues. For context: the current TRT-RTX EP requires an input_shape_profile; without one it reports this error:

[2026-04-20 23:42:34 ERROR] IBuilder::buildSerializedNetwork: Error Code 1: Internal Error (Failed to create any myelin custom layer tactic In nvinfer1::builder::MyelinGraphTranslatorBase::addPluginV3 at C:_src\optimizer\myelin\myelinPluginV3Layer.cpp:573)

The int4 quantization test script already used this parameter. I found it is also necessary for int8/fp8, so this PR makes it a global parameter.

def get_input_shapes_profile(model_name_or_path):
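
For readers unfamiliar with what such a profile contains, here is a minimal sketch of building one from model dimensions. Only `nv_profile_min_shapes` and the `input_ids:1x1` shape syntax appear in this thread. The other option keys, the KV-cache input names (`past_key_values.{i}.key`), the sequence-length ranges, and both function names are assumptions for illustration and may not match the referenced helper.

```python
def make_profile_entry(hidden_size, num_attention_heads, num_kv_heads, num_layers,
                       batch=1, opt_len=128, max_len=2048):
    """Build one EP's shape-profile options (all tensor names are assumed)."""
    head_dim = hidden_size // num_attention_heads

    def make_shapes(seq_len, past_len):
        parts = [
            f"input_ids:{batch}x{seq_len}",
            f"attention_mask:{batch}x{past_len + seq_len}",
        ]
        for i in range(num_layers):
            for kind in ("key", "value"):
                parts.append(
                    f"past_key_values.{i}.{kind}:{batch}x{num_kv_heads}x{past_len}x{head_dim}"
                )
        return ",".join(parts)

    return {
        "nv_profile_min_shapes": make_shapes(1, 0),
        "nv_profile_opt_shapes": make_shapes(1, opt_len),    # assumed key name
        "nv_profile_max_shapes": make_shapes(max_len, max_len),  # assumed key name
    }


def build_input_shapes_profile(calibration_eps, **dims):
    """One entry per EP; only the TRT-RTX EP receives shape options in this sketch."""
    entry = make_profile_entry(**dims)
    return [entry if ep == "NvTensorRtRtx" else {} for ep in calibration_eps]


profile = build_input_shapes_profile(
    ["NvTensorRtRtx", "cpu"],
    hidden_size=8, num_attention_heads=2, num_kv_heads=2, num_layers=1,
)
```

The resulting list has the same "one object per EP" shape as the `[{"nv_profile_min_shapes": "input_ids:1x1"}, {}]` profile used in the tests quoted earlier.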
