Add: support input_shape_profile for trt-rtx ep#1782
Conversation
Signed-off-by: haoxiz <haoxiz@nvidia.com>
📝 WalkthroughWalkthroughAdds ChangesInput Shape Profile Pipeline
Estimated code review effort: 4 (Complex) | ~45 minutes 🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
|
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modelopt/onnx/quantization/ort_utils.py`:
- Around line 595-603: The issue is that after _prepare_ep_list filters the
calibration_eps list to remove unavailable providers, the enumeration of
execution_providers uses indices from the filtered list instead of the original
list, causing the input_shapes_profile indices to misalign. To fix this,
enumerate over the original calibration_eps list instead of the filtered
execution_providers list when building the tuple pairs, using the index to
access input_shapes_profile correctly, and mapping each original ep to either
the profile (if available) or the filtered execution_providers equivalent.
In `@modelopt/onnx/quantization/quantize.py`:
- Around line 557-559: The input_shapes_profile is being created from
calibration_eps before it has been finalized by the update_trt_ep_support
function, causing potential sync issues downstream. Move the conditional block
that checks if input_shapes_profile is None and calls
create_input_shapes_profile with model_id and calibration_eps to execute after
update_trt_ep_support has been called, ensuring calibration_eps reflects the
final list of execution providers before generating the profile.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 19ed1a5a-2793-4772-b650-d3982467b520
📒 Files selected for processing (6)
modelopt/onnx/quantization/__main__.pymodelopt/onnx/quantization/fp8.pymodelopt/onnx/quantization/graph_utils.pymodelopt/onnx/quantization/int8.pymodelopt/onnx/quantization/ort_utils.pymodelopt/onnx/quantization/quantize.py
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1782 +/- ##
==========================================
- Coverage 77.09% 75.69% -1.41%
==========================================
Files 511 511
Lines 56168 58272 +2104
==========================================
+ Hits 43302 44107 +805
- Misses 12866 14165 +1299
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Is this TRT-RTX version specific? For the same input values to quantize() API, it works with certain TRT-RTX version and fails with other? Can you help me recap what is going to be behaviour without providing model-id? Also, do we handle for bad model-id and "missing key" cases (suppose some arch/model choosing different name for hidden_size in the config)? |
|
|
||
| from transformers import AutoConfig | ||
|
|
||
| config = AutoConfig.from_pretrained(model_id) |
There was a problem hiding this comment.
Should this call (and above) have trust_remote_code hooked up here (with default = false) - in case mode's config needs custom code?
|
There is a nosec related comment / error mentioned in a coderabbit comment above - Please check that as well. |
Signed-off-by: haoxiz <haoxiz@nvidia.com>
Signed-off-by: haoxiz <haoxiz@nvidia.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelopt/onnx/quantization/ort_utils.py (1)
447-500: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winHandle missing config fields in
create_input_shapes_profile
AutoConfigconfigs don’t always exposehidden_size,num_attention_heads, ornum_hidden_layers, so this helper will fail with an opaqueAttributeErroron valid architectures that use other field names. Raise a clearValueError(or map the common aliases) so users can fixmodel_idor pass an explicit profile.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modelopt/onnx/quantization/ort_utils.py` around lines 447 - 500, The create_input_shapes_profile helper assumes AutoConfig always has hidden_size, num_attention_heads, and num_hidden_layers, which can raise an opaque AttributeError for valid models. Update create_input_shapes_profile to validate these fields up front and either map common aliases or raise a clear ValueError with the model_id context before building shapes; keep the logic localized around the head_dim, num_kv_heads, num_layers, and make_shapes setup so callers get an actionable error or supported fallback.
♻️ Duplicate comments (2)
modelopt/onnx/quantization/quantize.py (2)
632-643: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winCorrectly ordered, but only reachable if the earlier block at Lines 577-579 is removed.
This block (snapshot original EPs, update EPs, then realign-or-regenerate) is the right approach and matches what a prior review round asked for. See the comment on Lines 577-579 — as written, the
elif model_id:branch here is currently unreachable whenevermodel_idis set, becauseinput_shapes_profileis already non-Noneby the time execution reaches here.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modelopt/onnx/quantization/quantize.py` around lines 632 - 643, The calibration EP snapshot/update flow in quantize() is correct, but the create/regenerate branch is unreachable because input_shapes_profile is already populated earlier. Remove the earlier input_shapes_profile initialization in the preceding block so the existing logic here can realign via _realign_input_shapes_profile or regenerate via create_input_shapes_profile when model_id is set, using the original_calibration_eps snapshot and the updated calibration_eps.
577-579: 🎯 Functional Correctness | 🔴 Critical | ⚡ Quick winDead/duplicate profile generation drops
trust_remote_codeand computes the profile before EPs are finalized.This block regenerates
input_shapes_profilefrommodel_idbeforeupdate_trt_ep_support(...)runs (line 633) and beforetrust_remote_codeis forwarded — it callscreate_input_shapes_profile(model_id, calibration_eps)withouttrust_remote_code, defaulting it toFalseregardless of what the caller passed.Because this sets
input_shapes_profileto a non-Nonevalue, the laterelif model_id:branch (lines 639-642, which correctly passestrust_remote_codeand uses the finalizedcalibration_eps) can never execute whenmodel_idis provided — it only realigns the stale profile via_realign_input_shapes_profile. This reproduces the exact issue from a previous review round (profile computed with pre-update EP ordering) and separately drops the user'strust_remote_codeflag, which was flagged as an open question by a reviewer ("Should this call... have trust_remote_code hooked up here").Tracing
test_quantize_infers_input_profiles_after_ep_support_updateagainst this code:captured["profile_eps"]would be["cpu", "trt"](not["trt", "cpu"]) andcaptured["trust_remote_code"]would beFalse(notTrue), so both assertions should fail with the current implementation.🐛 Proposed fix — remove the early block, rely on the later corrected one
- if input_shapes_profile is None and model_id: - input_shapes_profile = create_input_shapes_profile(model_id, calibration_eps) - # quantize_static creates a shape-inferred copy at the input model's directory🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modelopt/onnx/quantization/quantize.py` around lines 577 - 579, The early `input_shapes_profile` regeneration in `quantize()` is stale and bypasses the later corrected path: it runs before `update_trt_ep_support(...)` and drops the caller’s `trust_remote_code` by calling `create_input_shapes_profile(model_id, calibration_eps)` with defaults. Remove this premature block and let the later `elif model_id:` branch handle profile creation so `create_input_shapes_profile`, `_realign_input_shapes_profile`, and the finalized `calibration_eps`/`trust_remote_code` values are used consistently.
🧹 Nitpick comments (2)
tests/unit/onnx/quantization/test_autotune_quantization_integration.py (1)
42-108: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winMove new test imports to the top of the file.
The three new tests import
get_parser/__main__inside the function body (Lines 43, 59, 78) with no justification comment. Per repo conventions, imports belong at module top so import errors surface at collection time; in-function imports should be reserved for circular imports or optional dependencies, with a comment naming the reason.♻️ Suggested fix
+from modelopt.onnx.quantization.__main__ import get_parser +import modelopt.onnx.quantization.__main__ as quantization_cli + def test_quantization_cli_parses_inline_input_shapes_profile(): - from modelopt.onnx.quantization.__main__ import get_parser - profile = [{"nv_profile_min_shapes": "input_ids:1x1"}, {}]As per path instructions, "Imports belong at the top of the file so import errors surface at collection time, not mid-test... Put an import inside a function only when there is a concrete reason... Add a brief comment in those cases naming the reason."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/onnx/quantization/test_autotune_quantization_integration.py` around lines 42 - 108, Move the new imports used by test_quantization_cli_parses_inline_input_shapes_profile, test_quantization_cli_parses_input_shapes_profile_file, and test_quantization_cli_forwards_input_shapes_profile from inside the test bodies to the module top level so import failures are caught during collection; if any import must remain local, add a short comment in that test explaining the concrete reason. Use the existing get_parser and modelopt.onnx.quantization.__main__ references to place the imports appropriately.Source: Path instructions
tests/unit/onnx/quantization/test_quantize_api.py (1)
54-143: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winMove
importlib.import_modulecalls to the top of the file.Same pattern as in
test_autotune_quantization_integration.py:quantize_module = importlib.import_module("modelopt.onnx.quantization.quantize")is repeated inside three separate test functions (Lines 55, 67, 78) with no justification comment. Prefer a single top-levelfrom modelopt.onnx.quantization import quantize as quantize_moduleso import errors surface at collection time and the duplication is removed.As per path instructions, "Imports belong at the top of each file... Put an import inside a function only when there is a concrete reason... those should carry a brief comment naming the reason."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/onnx/quantization/test_quantize_api.py` around lines 54 - 143, The test file repeats dynamic imports inside multiple test functions, which should be moved to a top-level import. Replace the three importlib.import_module calls in the quantize API tests with a single top-level import for quantize_module so collection-time import failures are surfaced early and the duplication is removed. Update the tests that reference quantize_module to use that shared module alias consistently.Source: Path instructions
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@modelopt/onnx/quantization/ort_utils.py`:
- Around line 447-500: The create_input_shapes_profile helper assumes AutoConfig
always has hidden_size, num_attention_heads, and num_hidden_layers, which can
raise an opaque AttributeError for valid models. Update
create_input_shapes_profile to validate these fields up front and either map
common aliases or raise a clear ValueError with the model_id context before
building shapes; keep the logic localized around the head_dim, num_kv_heads,
num_layers, and make_shapes setup so callers get an actionable error or
supported fallback.
---
Duplicate comments:
In `@modelopt/onnx/quantization/quantize.py`:
- Around line 632-643: The calibration EP snapshot/update flow in quantize() is
correct, but the create/regenerate branch is unreachable because
input_shapes_profile is already populated earlier. Remove the earlier
input_shapes_profile initialization in the preceding block so the existing logic
here can realign via _realign_input_shapes_profile or regenerate via
create_input_shapes_profile when model_id is set, using the
original_calibration_eps snapshot and the updated calibration_eps.
- Around line 577-579: The early `input_shapes_profile` regeneration in
`quantize()` is stale and bypasses the later corrected path: it runs before
`update_trt_ep_support(...)` and drops the caller’s `trust_remote_code` by
calling `create_input_shapes_profile(model_id, calibration_eps)` with defaults.
Remove this premature block and let the later `elif model_id:` branch handle
profile creation so `create_input_shapes_profile`,
`_realign_input_shapes_profile`, and the finalized
`calibration_eps`/`trust_remote_code` values are used consistently.
---
Nitpick comments:
In `@tests/unit/onnx/quantization/test_autotune_quantization_integration.py`:
- Around line 42-108: Move the new imports used by
test_quantization_cli_parses_inline_input_shapes_profile,
test_quantization_cli_parses_input_shapes_profile_file, and
test_quantization_cli_forwards_input_shapes_profile from inside the test bodies
to the module top level so import failures are caught during collection; if any
import must remain local, add a short comment in that test explaining the
concrete reason. Use the existing get_parser and
modelopt.onnx.quantization.__main__ references to place the imports
appropriately.
In `@tests/unit/onnx/quantization/test_quantize_api.py`:
- Around line 54-143: The test file repeats dynamic imports inside multiple test
functions, which should be moved to a top-level import. Replace the three
importlib.import_module calls in the quantize API tests with a single top-level
import for quantize_module so collection-time import failures are surfaced early
and the duplication is removed. Update the tests that reference quantize_module
to use that shared module alias consistently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: eada498a-a724-4907-8f69-962d3ee690f2
📒 Files selected for processing (8)
examples/windows/onnx_ptq/genai_llm/quantize.pymodelopt/onnx/quantization/__main__.pymodelopt/onnx/quantization/autotune/benchmark.pymodelopt/onnx/quantization/ort_utils.pymodelopt/onnx/quantization/quantize.pytests/unit/onnx/quantization/test_autotune_quantization_integration.pytests/unit/onnx/quantization/test_ort_utils.pytests/unit/onnx/quantization/test_quantize_api.py
|
Hi, I have resolved all issues. The context is current trt-rtx ep will require a input_shape_profile otherwise it will report error
Before int4 quantization test script has already used this parameter. I found it is also necessary for int8/fp8 so this pr I make it a global parameter. |
What does this PR do?
Add support for onnx quantization and support model_id as input, which fix missing input_shpae_profile problem for some version of trt-rtx
Usage
Testing
Tested on 4 popular llm models on all popular quantization method(int4, fp8, int8)
Before your PR is "Ready for review"
CONTRIBUTING.md: ✅Summary by CodeRabbit
model_idsupport to the ONNX PTQ CLI and quantization API, enabling automatic generation ofinput_shapes_profilewhen not provided.input_shapes_profileparsing from inline JSON or a JSON file, plustrust_remote_codefor resolving custom model code.trt/NvTensorRtRtxcalibration endpoints are used.