draft: Add Decoupled Scale Search for NVFP4 local_hessian calibration#1881
Draft
Fridah-nv wants to merge 1 commit into
DSS (SOAR, arXiv:2605.12245) decouples the FP4 quantization scale s_q (high precision) from the stored FP8 dequantization scale s_d. A diagnostic (examples/llm_ptq/dss_diagnostic.py + the nvfp4_dss_diag kernel) confirmed DSS is a no-op for the plain weight-MSE L2 sweep (the existing 126-candidate FP8 sweep is already optimal, incl. saturated blocks), so DSS is wired into the non-separable local_hessian objective where it provably helps.

- Two-scale fake-quant: nvfp4_scalar_quant_decoupled + static_blockwise_fp4_fake_quant(quant_amax=...)
- NVFP4StaticQuantizer._quant_amax buffer (save/restore/_apply)
- NVFP4DSSCalibrator: SOAR 2-neighbor s_d x beta-grid search under the Hessian error
- decoupled_scale_search + dss_beta_step flags on LocalHessianCalibConfig
- decoupled NVFP4 HF export (codes from s_q, stored FP8 scale s_d); stored format unchanged so inference runtimes need no changes
- tests + e2e driver (examples/llm_ptq/dss_local_hessian_e2e.py)

Known limitation: DSS export/restore is wired for the HF path only; the Megatron export + TP-resize restore paths still produce coupled codes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
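For readers unfamiliar with the two-scale idea described above, the following is a minimal pure-Python sketch, not the PR's kernels or calibrator. It picks FP4 (E2M1) codes with s_q, reconstructs with s_d, and scores the result with a quadratic Hessian error. Every name here (`round_to_e2m1`, `fake_quant_decoupled`, `hessian_error`, `brute_force_search`) is hypothetical. Treating beta as the ratio s_q / s_d is an assumption, the exhaustive loop is a stand-in for the 2-neighbor search, and FP8 rounding of s_d is not modeled.

```python
# Illustrative sketch only; names and search strategy are assumptions, not the PR's code.
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # FP4 E2M1 magnitudes


def round_to_e2m1(x):
    """Round x to the nearest signed E2M1 value, saturating at +/-6.

    Ties resolve to the smaller magnitude here; real kernels may differ.
    """
    mag = min(E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return -mag if x < 0 else mag


def fake_quant_decoupled(block, s_q, s_d):
    """Choose codes with s_q, then reconstruct values with s_d."""
    codes = [round_to_e2m1(w / s_q) for w in block]
    return [c * s_d for c in codes]


def hessian_error(block, recon, hessian):
    """Quadratic form e^T H e with e = block - recon (non-separable if H has off-diagonals)."""
    e = [w - r for w, r in zip(block, recon)]
    n = len(e)
    return sum(e[i] * hessian[i][j] * e[j] for i in range(n) for j in range(n))


def brute_force_search(block, hessian, s_d_candidates, betas):
    """Try every (s_d, beta) pair with s_q = beta * s_d (assumed meaning of beta).

    Returns (error, s_q, s_d) for the best pair found.
    """
    best = None
    for s_d in s_d_candidates:
        for beta in betas:
            s_q = beta * s_d
            recon = fake_quant_decoupled(block, s_q, s_d)
            err = hessian_error(block, recon, hessian)
            if best is None or err < best[0]:
                best = (err, s_q, s_d)
    return best


# Toy 4-element block and a small positive-definite Hessian with off-diagonal terms.
block = [0.9, -2.7, 5.5, 0.1]
hessian = [
    [2.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.3],
    [0.0, 0.0, 0.3, 1.0],
]
s_d_candidates = [0.75, 1.0, 1.25]

# beta = 1.0 only reproduces the coupled case (s_q == s_d).
coupled = brute_force_search(block, hessian, s_d_candidates, betas=[1.0])
decoupled = brute_force_search(block, hessian, s_d_candidates, betas=[0.9, 0.95, 1.0, 1.05, 1.1])
print("coupled  :", coupled)
print("decoupled:", decoupled)
```

Because the beta grid includes 1.0, the decoupled search covers every coupled candidate, so its best error can never be worse than the coupled one in this sketch. Whether it is strictly better depends on the block and the Hessian.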
What does this PR do?
Type of change: ?
Usage
# Add a code snippet demonstrating how to use this
Testing
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).
Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).
CONTRIBUTING.md: ✅ / ❌ / N/A
Additional Information