
draft: Add Decoupled Scale Search for NVFP4 local_hessian calibration #1881

Draft
Fridah-nv wants to merge 1 commit into main from fridah/mse-dss


@Fridah-nv


DSS (SOAR, arXiv:2605.12245) decouples the high-precision FP4 quantization scale s_q from the stored FP8 dequantization scale s_d. A diagnostic (examples/llm_ptq/dss_diagnostic.py plus the nvfp4_dss_diag kernel) confirmed that DSS is a no-op for the plain weight-MSE L2 sweep: the existing 126-candidate FP8 sweep is already optimal, including for saturated blocks. DSS is therefore wired into the non-separable local_hessian objective, where it provably helps.
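As a rough illustration of why the Hessian objective is called non-separable, the sketch below assumes the error is the quadratic form d^T H d with d = w - w_hat and H = X^T X built from calibration activations. That form, and the names `hessian_error`, `X`, and `H`, are assumptions for illustration; the PR does not spell out the exact objective.

```python
import numpy as np

def hessian_error(w, w_hat, H):
    """Quadratic error d^T H d for the weight perturbation d = w - w_hat."""
    d = w - w_hat
    return float(d @ H @ d)

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 16))          # hypothetical activations for one 16-element block
H = X.T @ X                                # assumed Hessian proxy (not taken from the PR)
w = rng.standard_normal(16)
w_hat = w + 0.1 * rng.standard_normal(16)  # stand-in for a fake-quantized block

full = hessian_error(w, w_hat, H)
# Keeping only the diagonal of H gives a per-element (separable) error.
diag_only = hessian_error(w, w_hat, np.diag(np.diag(H)))
# The remainder comes from off-diagonal terms that couple elements of the block.
cross_terms = full - diag_only
print(full, diag_only, cross_terms)
```

With H equal to the identity this reduces to the plain squared L2 error, which is the separable case where the diagnostic found DSS to be a no-op.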

  • Two-scale fake-quant: nvfp4_scalar_quant_decoupled + static_blockwise_fp4_fake_quant(quant_amax=...)
  • NVFP4StaticQuantizer._quant_amax buffer (save/restore/_apply)
  • NVFP4DSSCalibrator: SOAR 2-neighbor s_d x beta-grid search under the Hessian error
  • decoupled_scale_search + dss_beta_step flags on LocalHessianCalibConfig
  • decoupled NVFP4 HF export (codes from s_q, stored FP8 scale s_d); stored format unchanged so inference runtimes need no changes
  • tests + e2e driver (examples/llm_ptq/dss_local_hessian_e2e.py)

Known limitation: DSS export/restore is wired for the HF path only; the Megatron export + TP-resize restore paths still produce coupled codes.
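The two-scale fake-quant idea can be sketched in NumPy as follows. This is not the PR's nvfp4_scalar_quant_decoupled kernel: the helper names are hypothetical, s_d is taken as an ordinary float rather than a true FP8 value, and the parametrization s_q = beta * s_d is an assumption about what the beta grid varies.

```python
import numpy as np

# Magnitudes representable in FP4 E2M1; the sign is handled separately.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def round_to_fp4(x):
    """Round to the nearest FP4 magnitude (ties go to the smaller one); >6 clips to 6."""
    mag = np.abs(x)
    idx = np.abs(mag[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(x) * FP4_GRID[idx]

def fake_quant_decoupled(w, s_q, s_d):
    """Choose codes with the quantization scale s_q, dequantize with the stored scale s_d."""
    codes = round_to_fp4(w / s_q)
    return codes * s_d

def fake_quant_coupled(w, s):
    """Conventional path: a single scale for both quantization and dequantization."""
    return fake_quant_decoupled(w, s, s)

rng = np.random.default_rng(0)
block = rng.standard_normal(16)      # one 16-element block
s_d = np.abs(block).max() / 6.0      # amax-based scale (not rounded to FP8 here)

coupled_err = np.sum((block - fake_quant_coupled(block, s_d)) ** 2)
betas = np.linspace(0.8, 1.2, 9)     # hypothetical beta grid
decoupled_errs = [
    np.sum((block - fake_quant_decoupled(block, b * s_d, s_d)) ** 2) for b in betas
]
print(coupled_err, min(decoupled_errs))
```

For a fixed s_d, nearest rounding with s_q = s_d already picks the closest representable value for every element, so no beta lowers the plain L2 error. That is consistent with the diagnostic result above; any gain has to come from an objective that couples elements.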

What does this PR do?

Type of change: ?

Usage

# Add a code snippet demonstrating how to use this
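A minimal sketch of how the two new flags might be enabled, assuming LocalHessianCalibConfig is selected through an algorithm dict with a "method" key. The dict layout is an assumption (the PR does not show the call site), and the dss_beta_step value is a placeholder, not a recommended setting.

```python
# Hypothetical algorithm section of a quantization config.
# Only the two flag names below come from this PR; the "method" key,
# its value, and the numeric step are assumptions for illustration.
algorithm_cfg = {
    "method": "local_hessian",        # assumed selector for LocalHessianCalibConfig
    "decoupled_scale_search": True,   # flag added by this PR
    "dss_beta_step": 0.05,            # flag added by this PR; placeholder value
}

print(algorithm_cfg)
```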

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A
  • Did you get Claude approval on this PR?: ✅ / ❌ / N/A

Additional Information

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
@copy-pr-bot

copy-pr-bot Bot commented Jul 1, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

