Add TensorQuantizer rotate-back mode#1879
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR adds a ChangesRotate-back mode implementation
Estimated code review effort: 4 (Complex) | ~60 minutes Sequence Diagram(s)sequenceDiagram
participant Caller
participant TensorQuantizer
participant normalized_hadamard_transform
Caller->>TensorQuantizer: forward(inputs)
alt rotate_back_is_enabled and fake_quant is false
TensorQuantizer-->>Caller: raise ValueError
else fake_quant is true
TensorQuantizer->>normalized_hadamard_transform: _rotate_inputs(inputs)
TensorQuantizer->>TensorQuantizer: quantize(inputs)
opt rotate_back_is_enabled
TensorQuantizer->>normalized_hadamard_transform: _rotate_inputs(outputs)
end
TensorQuantizer-->>Caller: return outputs
end
sequenceDiagram
participant QuantModule
participant TensorQuantizer
participant ExportFlow
QuantModule->>TensorQuantizer: _fold_weight_quantizer(weights, keep_attrs)
TensorQuantizer->>TensorQuantizer: fold weight values into tensors
TensorQuantizer->>TensorQuantizer: disable_rotate()
TensorQuantizer->>TensorQuantizer: disable()
ExportFlow->>TensorQuantizer: disable_rotate() during export
🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
a51f4da to
3c6b394
Compare
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1879 +/- ##
==========================================
+ Coverage 61.21% 61.30% +0.08%
==========================================
Files 515 515
Lines 57245 57243 -2
==========================================
+ Hits 35043 35092 +49
+ Misses 22202 22151 -51
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modelopt/torch/quantization/nn/modules/tensor_quantizer.py`:
- Around line 1110-1111: The rotate_back validation in TensorQuantizer should
happen before the first rotation path is entered, since _rotate_inputs() can
invoke normalized_hadamard_transform() and fail with lower-level errors first.
Move the unsupported rotate_back check in the TensorQuantizer flow ahead of the
initial rotation branch so the intended ValueError is raised consistently, and
update the related fake_quant=False path and test coverage to assert the early
rejection via the rotate_back/config validation rather than relying on a stubbed
transform.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6f9377bc-7e6a-4144-8fd9-109d398c314f
📒 Files selected for processing (3)
modelopt/torch/quantization/config.pymodelopt/torch/quantization/nn/modules/tensor_quantizer.pytests/unit/torch/quantization/test_tensor_quant_cpu.py
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Small, cohesive feature (+106/-9, 3 files) adding a rotate_back mode to TensorQuantizer (rotate → quantize → rotate-back for fake quant), with a clear rejection for real quantization. Code is clean and reuses existing machinery: a new _rotate_inputs helper deduplicates the two rotation call-sites, and rotate_back_is_enabled mirrors the existing rotate_is_* property pattern (dict/RotateConfig/bool backward-compat handled consistently). The field_name == "rotate" short-circuit in validate_config correctly stops the "only {'enable': True}" check from mis-firing on rotate={"enable": True} (a latent bug), and RotateConfig's own validators still cover the sub-fields.
CPU test coverage is solid: config serialization round-trip, default-path (transform called once), rotate-back path (transform called twice with matching fp32/block_size args), and the real-quant ValueError. Tests monkeypatch normalized_hadamard_transform so they don't need the fast_hadamard_transform package. No licensing concerns, no oversized diff, and no prompt-injection content in the PR fields.
Why nudge rather than approve:
- Correctness rests on the Hadamard transform being self-inverse.
rotate_backapplies the samenormalized_hadamard_transforma second time rather than an explicit inverse; this is only correct because the normalized FHT is orthogonal/symmetric (HᵀH = I). That holds for the full-dim and fixed-block paths, but the auto-selected block-size path (block_size=None→_largest_pow2_divisor(dim)) picks the block size from the tensor shape at call time — worth a human confirming the forward and rotate-back calls always resolve to the same block size so the round-trip is truly identity. The CPU tests stub out the transform, so this numerical property is not actually exercised (GPU-only). - CHANGELOG not updated. This is a user-facing
QuantizerAttributeConfig.rotate.modeaddition; the 0.46 "New Features" section has no entry for it.
realAsma
left a comment
There was a problem hiding this comment.
Simplify all the fold_weight doctoring -> no need of mentioning rotate functionality in particular - it is a Lower level details obvious in the code.
realAsma
left a comment
There was a problem hiding this comment.
There are a lot of unittests; can we simplify or consolidate any of them?
are there any low level tests which are already covered by high-level tests - if yes should we remove them?
We can keep the unittests that are essential for testing the correctness or expected behavior
| enable: bool = False | ||
| mode: Literal["rotate", "rotate_back"] = "rotate" | ||
| rotate_fp32: bool = False | ||
| block_size: int | None = None |
There was a problem hiding this comment.
Can we add doc for these fields? Not clear what these are for
There was a problem hiding this comment.
🧹 Nitpick comments (1)
examples/llm_eval/quantization_utils.py (1)
66-71: 🎯 Functional Correctness | 🔵 Trivial | 💤 Low valueTrack the embed_tokens rotate workaround.
The
TODOdocuments a known rotate issue for the embedding token path that's being worked around by force-disabling rotate/quantization for*embed_tokens*quantizerrather than fixing the root cause.Do you want me to open a follow-up issue to investigate the root cause of the rotate incompatibility with the embedding path?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/llm_eval/quantization_utils.py` around lines 66 - 71, The embed_tokens rotate workaround in quantization_utils is currently only documented with a TODO, so it should be tracked with a real follow-up issue instead of an inline note. Update the embed_tokens quantizer entry in the quantization config to reference the tracking ticket or issue ID, and keep the workaround description concise so the intent is clear for the *embed_tokens* quantizer path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@examples/llm_eval/quantization_utils.py`:
- Around line 66-71: The embed_tokens rotate workaround in quantization_utils is
currently only documented with a TODO, so it should be tracked with a real
follow-up issue instead of an inline note. Update the embed_tokens quantizer
entry in the quantization config to reference the tracking ticket or issue ID,
and keep the workaround description concise so the intent is clear for the
*embed_tokens* quantizer path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 4f889424-843f-4693-8d5f-656c5e1c9110
📒 Files selected for processing (12)
examples/llm_eval/quantization_utils.pymodelopt/torch/export/plugins/vllm_fakequant_hf.pymodelopt/torch/quantization/config.pymodelopt/torch/quantization/model_quant.pymodelopt/torch/quantization/nn/functional.pymodelopt/torch/quantization/nn/modules/quant_module.pymodelopt/torch/quantization/nn/modules/tensor_quantizer.pymodelopt/torch/quantization/plugins/huggingface.pymodelopt/torch/quantization/plugins/vllm.pytests/unit/torch/quantization/plugins/test_fused_experts.pytests/unit/torch/quantization/test_print.pytests/unit/torch/quantization/test_tensor_quant_cpu.py
✅ Files skipped from review due to trivial changes (1)
- modelopt/torch/quantization/model_quant.py
🚧 Files skipped from review as they are similar to previous changes (2)
- modelopt/torch/quantization/config.py
- modelopt/torch/quantization/nn/modules/tensor_quantizer.py
52b366a to
960716f
Compare
Signed-off-by: realAsma <akuriparambi@nvidia.com>
Signed-off-by: realAsma <akuriparambi@nvidia.com>
Signed-off-by: realAsma <akuriparambi@nvidia.com>
Signed-off-by: realAsma <akuriparambi@nvidia.com>
Signed-off-by: realAsma <akuriparambi@nvidia.com>
Signed-off-by: realAsma <akuriparambi@nvidia.com>
Signed-off-by: realAsma <akuriparambi@nvidia.com>
Signed-off-by: realAsma <akuriparambi@nvidia.com>
960716f to
f5451f2
Compare
Summary
RotateConfig.modewith the existingrotatebehavior as the defaultrotate={"enable": true, "mode": "rotate_back"}for rotate -> quantize -> rotate back in fake quantizationNVFP4_W4A4_ROTATEeval config with the config-level embed-token rotate workaround and TODO for the longer-term fixLatest Update
fold_weightapplied rotation into the stored weight, disabled the quantizer, but left rotation enabled, causing the folded weight to be rotated a second time on later use*embed_tokens*quantizerhas rotate explicitly disabled because the embedding token/input path creates the rotate issue; a TODO tracks replacing this with a better fixExample
rotate_backselects rotate -> quantize -> rotate-back. The existing behavior stays asrotate=Trueorrotate={"enable": True, "mode": "rotate"}.rotate_backis supported for fake quant only; real quantization raises clearly instead of silently skipping the inverse rotation.Validation
pre-commit run --files modelopt/torch/quantization/config.py modelopt/torch/quantization/nn/modules/tensor_quantizer.py tests/unit/torch/quantization/test_tensor_quant_cpu.pypytest_pwd tests/unit/torch/quantization/test_tensor_quant_cpu.py -q --tb=short -xpython_pwd -m py_compile examples/llm_eval/quantization_utils.pypre-commit run --files examples/llm_eval/quantization_utils.pyEval Results
Qwen/Qwen3-8B MMLU eval from
examples/llm_eval/lm_eval_hf.pyusinglm-eval==0.4.10,mmlu, 5-shot,batch_size=4,calib_size=512, andcalib_batch_size=4for quantized runs.NVFP4_DEFAULT_CFGNVFP4_W4A4_ROTATENVFP4_W4A4_ROTATE_ROTATE_BACKThe previous near-random W4A4 rotate diagnostics were removed from this table. The corrected W4A4 rotate result is
0.6916 +/- 0.0037; split checks isolated the issue toembed_tokensretaining rotate in the token/input path.Full run logs, status files, GPU snapshots, and JSON outputs are retained in the private BeeBot artifact bundles for the Qwen3-8B eval/debug runs.
Review
Summary by CodeRabbit
New Features
Bug Fixes