draft: Add Decoupled Scale Search for NVFP4 local_hessian calibration by Fridah-nv · Pull Request #1881 · NVIDIA/Model-Optimizer

Fridah-nv · 2026-07-01T23:00:26Z

Decoupled Scale Search (DSS; SOAR, arXiv:2605.12245) decouples the FP4 quantization scale s_q, kept in high precision, from the stored FP8 dequantization scale s_d. A diagnostic (examples/llm_ptq/dss_diagnostic.py plus the nvfp4_dss_diag kernel) confirmed that DSS is a no-op for the plain weight-MSE L2 sweep: the existing 126-candidate FP8 sweep is already optimal, including for saturated blocks. DSS is therefore wired into the non-separable local_hessian objective, where it provably helps.
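To make the two-scale idea concrete, here is a minimal pure-Python sketch of decoupled fake quantization for one block. It is an illustration only, not the PR's kernel: the function names `fp4_round`, `fake_quant_decoupled`, and `fake_quant_coupled` are hypothetical, the FP8 rounding of s_d is omitted, and ties are broken toward the smaller magnitude rather than with the round-to-nearest-even behavior real hardware would use.

```python
# Magnitudes representable in FP4 E2M1; the sign is handled separately.
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def fp4_round(v):
    """Round v to the nearest E2M1 value, saturating at +/-6.

    Simplification: ties go to the smaller magnitude.
    """
    mag = min(FP4_GRID, key=lambda g: abs(abs(v) - g))
    return -mag if v < 0 else mag


def fake_quant_decoupled(block, s_q, s_d):
    """Pick FP4 codes with scale s_q, then dequantize with scale s_d."""
    codes = [fp4_round(x / s_q) for x in block]  # codes are chosen under s_q
    return [c * s_d for c in codes]              # only s_d would be stored


def fake_quant_coupled(block, s):
    """Conventional fake quant: one scale for both steps."""
    return fake_quant_decoupled(block, s, s)


block = [0.9, -2.2, 5.2, 12.0]
print(fake_quant_coupled(block, 2.0))
print(fake_quant_decoupled(block, s_q=2.0, s_d=1.75))
```

Because only the codes and s_d appear in the output, a runtime that dequantizes as `code * s_d` never needs to know s_q, which is consistent with the statement that the stored format is unchanged.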

- Two-scale fake-quant: nvfp4_scalar_quant_decoupled + static_blockwise_fp4_fake_quant(quant_amax=...)
- NVFP4StaticQuantizer._quant_amax buffer (save/restore/_apply)
- NVFP4DSSCalibrator: SOAR 2-neighbor s_d x beta-grid search under the Hessian error
- decoupled_scale_search + dss_beta_step flags on LocalHessianCalibConfig
- decoupled NVFP4 HF export (codes from s_q, stored FP8 scale s_d); stored format unchanged so inference runtimes need no changes
- tests + e2e driver (examples/llm_ptq/dss_local_hessian_e2e.py)
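The search described in the list above can be sketched roughly as follows. This is a hedged reading of the one-line description, not the NVFP4DSSCalibrator implementation: it assumes s_q = beta * s_d, takes the s_d candidates as given instead of deriving the two FP8 neighbors, and scores each pair with the quadratic form e^T H e. All function names and the `half_width` parameter are hypothetical.

```python
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # E2M1 magnitudes


def fp4_round(v):
    """Nearest E2M1 value, saturating at +/-6 (ties go to the smaller magnitude)."""
    mag = min(FP4_GRID, key=lambda g: abs(abs(v) - g))
    return -mag if v < 0 else mag


def fake_quant(block, s_q, s_d):
    """Choose codes under s_q, dequantize with s_d."""
    return [fp4_round(x / s_q) * s_d for x in block]


def hessian_error(w, w_hat, H):
    """Quadratic error e^T H e with e = w - w_hat; H is a nested-list matrix."""
    e = [a - b for a, b in zip(w, w_hat)]
    n = len(e)
    return sum(e[i] * H[i][j] * e[j] for i in range(n) for j in range(n))


def beta_grid(step, half_width):
    """Symmetric grid around 1.0; beta == 1.0 is the coupled baseline."""
    return [1.0 + k * step for k in range(-half_width, half_width + 1)]


def dss_search(w, H, s_d_candidates, betas):
    """Return (error, s_d, beta) minimizing the Hessian-weighted error."""
    best = None
    for s_d in s_d_candidates:
        for beta in betas:
            w_hat = fake_quant(w, beta * s_d, s_d)  # assumption: s_q = beta * s_d
            err = hessian_error(w, w_hat, H)
            if best is None or err < best[0]:
                best = (err, s_d, beta)
    return best


w = [0.9, -2.2, 5.2, 12.0]
H = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]  # toy PSD matrix
print(dss_search(w, H, s_d_candidates=[1.75, 2.0], betas=beta_grid(0.05, 4)))
```

Since the grid contains beta = 1.0, the result can never be worse than the best coupled candidate. The off-diagonal entries of H are what make the objective non-separable: the error of one element interacts with its neighbors, so a rounding choice that is worse element-wise can still lower the total.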

Known limitation: DSS export/restore is wired for the HF path only; the Megatron export + TP-resize restore paths still produce coupled codes.

What does this PR do?

Type of change: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅ / ❌ / N/A
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
Did you write any new necessary tests?: ✅ / ❌ / N/A
Did you update Changelog?: ✅ / ❌ / N/A
Did you get Claude approval on this PR?: ✅ / ❌ / N/A

Additional Information

DSS (SOAR, arXiv:2605.12245) decouples the FP4 quantization scale s_q (high precision) from the stored FP8 dequantization scale s_d. A diagnostic (examples/llm_ptq/dss_diagnostic.py + the nvfp4_dss_diag kernel) confirmed DSS is a no-op for the plain weight-MSE L2 sweep (the existing 126-candidate FP8 sweep is already optimal, incl. saturated blocks), so DSS is wired into the non-separable local_hessian objective where it provably helps.

- Two-scale fake-quant: nvfp4_scalar_quant_decoupled + static_blockwise_fp4_fake_quant(quant_amax=...)
- NVFP4StaticQuantizer._quant_amax buffer (save/restore/_apply)
- NVFP4DSSCalibrator: SOAR 2-neighbor s_d x beta-grid search under the Hessian error
- decoupled_scale_search + dss_beta_step flags on LocalHessianCalibConfig
- decoupled NVFP4 HF export (codes from s_q, stored FP8 scale s_d); stored format unchanged so inference runtimes need no changes
- tests + e2e driver (examples/llm_ptq/dss_local_hessian_e2e.py)

Known limitation: DSS export/restore is wired for the HF path only; the Megatron export + TP-resize restore paths still produce coupled codes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>

copy-pr-bot · 2026-07-01T23:00:29Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-07-01T23:00:33Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: af3981a5-4b1a-4eaa-9359-d75e26994b0c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands.

Fridah-nv wants to merge 1 commit into main from fridah/mse-dss
