Add recipe used for Qwen3.5 397B NVFP4 V2 checkpoint by sugunav14 · Pull Request #1868 · NVIDIA/Model-Optimizer

sugunav14 · 2026-06-30T17:24:49Z

What does this PR do?

Type of change: New example (model-specific PTQ recipe)

Adds the built-in PTQ recipe used to produce the Qwen3.5 397B NVFP4 V2 checkpoint to the model-specific recipe registry at modelopt_recipes/huggingface/qwen3_5_moe/ptq/nvfp4_experts-fp8_rest-kv_fp8_mse.yaml. The recipe applies a mixed NVFP4/FP8 scheme to the qwen3_5_moe architecture: NVFP4 (MSE/FP8-scale-sweep static weights, dynamic inputs) on the LM routed experts; ModelOpt-default FP8 (W8A8 per-tensor static, max-calibrated) on every other Linear layer; FP8 KV cache; MTP block left in BF16. The fp8_scale_sweep MSE refinement is scoped to the static NVFP4 weights only — all FP8 / dynamic / KV quantizers stay max-calibrated.

Usage

# Select via --recipe (path relative to modelopt_recipes/)
python hf_ptq.py \
  --pyt_ckpt_path <Qwen3.5-397B-MoE> \
  --recipe huggingface/qwen3_5_moe/ptq/nvfp4_experts_mse-fp8_rest-kv_fp8.yaml \
  --export_path <output_dir>

Testing

Recipe was used to generate the Qwen3.5 397B NVFP4 V2 checkpoint.

Is this change backward compatible?: ✅
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: N/A
Did you update Changelog?: N/A
Did you get Claude approval on this PR?: ❌

Summary by CodeRabbit

New Features
- Added a new post-training quantization recipe for Qwen3_5 MoE models.
- Supports FP8 KV cache and tailored quantization settings for routed experts and other selected model components.
- Includes MSE-based quantization settings with FP8 calibration behavior for improved deployment flexibility.

Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>

coderabbitai · 2026-06-30T17:26:12Z

📝 Walkthrough

Walkthrough

A new PTQ recipe YAML configuration is added for the qwen3_5_moe model. It defines MSE-based quantization with FP8 scale sweep, applies NVFP4 to LM routed expert weight and input quantizers, enables FP8 KV cache, and disables BF16 quantizers for specific attention, visual, and mtp modules.

Changes

PTQ Recipe Configuration

Layer / File(s)	Summary
NVFP4 experts + FP8 + KV cache PTQ recipe `modelopt_recipes/huggingface/qwen3_5_moe/ptq/nvfp4_experts_mse-fp8_rest-kv_fp8.yaml`	New YAML recipe with license header, documentation comments, imports/metadata, MSE algorithm with FP8 scale sweep, and quant_cfg defining NVFP4 static weights/dynamic inputs for LM routed experts, FP8 KV cache, and disabled BF16 quantizers for attention, visual, and mtp modules.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly matches the main change: adding a PTQ recipe for the Qwen3.5 397B NVFP4 V2 checkpoint.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	The PR only adds a YAML PTQ recipe; no Python, dependency, or security-sensitive patterns like torch.load(weights_only=False), allow_pickle=True, trust_remote_code=True, eval/exec, or nosec were ad...
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch svelury/qwen3p5_397b_moe_recipe

_{Comment @coderabbitai help to get the list of available commands.}

sugunav14 · 2026-06-30T17:28:44Z

/claude review

coderabbitai

🧹 Nitpick comments (1)

modelopt_recipes/huggingface/qwen3_5_moe/ptq/nvfp4_experts-fp8_rest-kv_fp8_mse.yaml (1)
37-60: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add a focused regression test for this recipe.

This recipe’s behavior depends on several glob matches and on the disable rules being applied last, so a small naming or ordering change could silently quantize the wrong modules. Please add a recipe-level test that loads this YAML and asserts the intended contracts: expert weights use static NVFP4, expert inputs use dynamic NVFP4, KV cache stays FP8, and linear_attn.in_proj_a, linear_attn.in_proj_b, visual, and mtp remain disabled. As per coding guidelines, "For any new features / examples, make sure to add tests and that the coverage check in your PR passes."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@modelopt_recipes/huggingface/qwen3_5_moe/ptq/nvfp4_experts-fp8_rest-kv_fp8_mse.yaml`
around lines 37 - 60, Add a focused recipe-level regression test for this
Qwen3.5 MoE PTQ YAML that loads the recipe and verifies the intended
quantization contract. Use the recipe identifier from the quantize config and
assert the glob-based rules still resolve correctly: expert weight quantizers
match the NVFP4 static config, expert input quantizers match NVFP4 dynamic
config, KV cache uses FP8, and the final disable rules keep
linear_attn.in_proj_a, linear_attn.in_proj_b, visual, and mtp disabled. Place
the test alongside the existing recipe tests and ensure it fails if the ordering
of the quant_cfg entries changes or the glob patterns stop matching.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In
`@modelopt_recipes/huggingface/qwen3_5_moe/ptq/nvfp4_experts-fp8_rest-kv_fp8_mse.yaml`:
- Around line 37-60: Add a focused recipe-level regression test for this Qwen3.5
MoE PTQ YAML that loads the recipe and verifies the intended quantization
contract. Use the recipe identifier from the quantize config and assert the
glob-based rules still resolve correctly: expert weight quantizers match the
NVFP4 static config, expert input quantizers match NVFP4 dynamic config, KV
cache uses FP8, and the final disable rules keep linear_attn.in_proj_a,
linear_attn.in_proj_b, visual, and mtp disabled. Place the test alongside the
existing recipe tests and ensure it fails if the ordering of the quant_cfg
entries changes or the glob patterns stop matching.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 920c89ef-337b-4e60-92f4-1c17c8015239

📥 Commits

Reviewing files that changed from the base of the PR and between 43c2034 and b74bbea.

📒 Files selected for processing (1)

modelopt_recipes/huggingface/qwen3_5_moe/ptq/nvfp4_experts-fp8_rest-kv_fp8_mse.yaml

cjluo-nv

Bot review — DM the bot to share feedback.

Small additive PR (+60, 1 file): adds a Qwen3.5-MoE-specific PTQ recipe (huggingface/qwen3_5_moe/ptq/nvfp4_experts-fp8_rest-kv_fp8_mse.yaml) to the existing built-in recipe registry. No prompt-injection attempts in the untrusted content.

What I verified:

All 6 $import targets resolve to existing files (base_disable_all, default_disabled_quantizers, w8a8_fp8_fp8, nvfp4, nvfp4_static, kv_fp8).
Structure matches established siblings — it is essentially the general/ptq/nvfp4_experts_only_mse-kv_fp8_cast.yaml pattern (method: mse + fp8_scale_sweep: true + layerwise: false algorithm dict) plus an FP8 base layer (w8a8_fp8_fp8) for non-expert Linears and kv_fp8 instead of cast mode. nvfp4_static = static weight scales, nvfp4 = dynamic inputs, matching the stated intent.
Ordering is correct — quant_cfg is last-wins per the documented semantics; BF16 exclusions (mtp, visual, linear_attn.in_proj_*) are placed last so they win.
Placement/naming conform to modelopt_recipes/huggingface/README.md conventions (per-task README is optional).
Design review does not apply — this is a single data file consuming a pre-existing, documented recipe subsystem, not a new abstraction.
Licensing clean — new file's only license content is the standard Apache-2.0 NVIDIA header, matching LICENSE_HEADER exactly (safe exception).

Why nudge rather than approve:

No automated test, and the recipe can't be functionally exercised in CI (used to produce a real 397B-parameter checkpoint, GPU-only). This is consistent with how sibling recipes were merged, but an owner should confirm the quantizer-name glob patterns (e.g. '*language_model*mlp.experts.*weight_quantizer', '*mtp*', '*visual*') actually match the Qwen3.5-MoE module tree — that's the one correctness aspect not verifiable from the repo alone.
Minor: the leading comment calls this the "MSE variant of qwen3p5_mixed_nvfp4_fp8.yaml", but that referenced file doesn't appear in the repo (likely an internal/non-shipped recipe). Non-functional doc comment, but worth a quick fix or clarification.

codecov · 2026-06-30T17:34:11Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.86%. Comparing base (52f1ccb) to head (fb18d68).
⚠️ Report is 128 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1868      +/-   ##
==========================================
- Coverage   77.49%   72.86%   -4.64%     
==========================================
  Files         489      516      +27     
  Lines       54415    65036   +10621     
==========================================
+ Hits        42169    47388    +5219     
- Misses      12246    17648    +5402

Flag	Coverage Δ
unit	`54.86% <ø> (+0.85%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

claude · 2026-06-30T17:35:17Z

+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# MSE variant of qwen3p5_mixed_nvfp4_fp8.yaml.


[SUGGESTION] This header describes the recipe as the "MSE variant of qwen3p5_mixed_nvfp4_fp8.yaml", but no file by that name exists anywhere in the repo (the only matches for qwen3p5_mixed/mixed_nvfp4_fp8 are this file and an unrelated test). A maintainer trying to locate the non-MSE base recipe will hit a dead reference.

(1) The comment points at a file that doesn't exist; (2) it makes the relationship between recipes harder to follow and risks rotting further; (3) either drop the cross-reference or point it at the actual base recipe (e.g. the sibling w4a16_nvfp4-fp8_attn-kv_fp8_cast.yaml) if that's the intended counterpart. Everything else in the recipe — the $import targets, the mse/fp8_scale_sweep algorithm fields, and the documented "FP8 layers stay max-calibrated" behavior — checks out.

claude

Claude review passed — no blocking issues found. LGTM

Findings: CRITICAL: 0, IMPORTANT: 0, SUGGESTION: 1

This is a single-file PR adding one PTQ recipe (nvfp4_experts-fp8_rest-kv_fp8_mse.yaml). I reviewed the recipe in full and traced its imports and algorithm semantics through the source.

What I verified:

All 6 $import targets resolve to existing config units (base_disable_all, default_disabled_quantizers, w8a8_fp8_fp8, nvfp4, nvfp4_static, kv_fp8).
Schema validity — method: mse, fp8_scale_sweep, and layerwise are all valid MseCalibConfig fields (modelopt/torch/quantization/config.py).
Central claim confirmed in code — the header/comments state that fp8_scale_sweep: true refines only static NVFP4 weights with MSE while FP8 / dynamic / KV quantizers stay max-calibrated. _make_weight_mse_calibrator (model_calib.py:455-471) returns None for any quantizer that isn't a static NVFP4 weight or registered FP8-sweep backend, so those keep their max-calibrated amax. Behavior matches the documentation exactly.
Override ordering — base_disable_all → FP8 base → NVFP4 expert overrides via cfg → KV → exclusions-last follows the established last-wins pattern used by the validated sibling recipe w4a16_nvfp4-fp8_attn-kv_fp8_cast.quant_cfg.yaml.
Recipes are pre-commit-validated via load_recipe() (tools/precommit/check_modelopt_recipes.py), so structural/import errors would be caught at commit time.

Only finding (non-blocking): the header comment references qwen3p5_mixed_nvfp4_fp8.yaml as the non-MSE base, but no such file exists in the repo — a stale/dead cross-reference. Posted inline.

Backward compatibility: additive only (new file, no schema or default changes). Risk level: low.

Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>

Move the `mse` modifier from the kv group to the nvfp4 group it actually refines, matching the convention in modelopt_recipes/general/ptq (e.g. nvfp4_experts_only_mse-kv_fp8_cast.yaml). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>

cjluo-nv

Bot review — DM the bot to share feedback.

Small additive PR (+59, 1 file): adds a Qwen3.5-MoE-specific PTQ recipe under modelopt_recipes/huggingface/qwen3_5_moe/ptq/ to the existing built-in recipe registry. This consumes the pre-existing, documented modelopt.recipe subsystem rather than introducing a new abstraction, so design review doesn't apply. No prompt-injection attempts in the untrusted content.

What I verified:

All 6 $import targets resolve to existing files (base_disable_all, default_disabled_quantizers, w8a8_fp8_fp8, nvfp4, nvfp4_static, kv_fp8).
Structure matches the validated sibling general/ptq/nvfp4_experts_only_mse-kv_fp8_cast.yaml: method: mse + fp8_scale_sweep: true + layerwise: false, plus an FP8 base (w8a8_fp8_fp8) for non-expert Linears and kv_fp8. nvfp4_static = static weight scales, nvfp4 = dynamic inputs — matches the stated intent.
Last-wins ordering is correct: BF16 exclusions (*mtp*, *visual*, *linear_attn.in_proj_*) placed last so they win.
Licensing clean: the new file's only license content is the standard NVIDIA Apache-2.0 header, matching LICENSE_HEADER exactly (safe exception).

Previous review comments:

CodeRabbit's "add a regression test" (minor): still unaddressed, but consistent with how sibling recipes were merged (GPU-only 397B path, not CI-exercisable).
💬 Claude's "stale qwen3p5_mixed_nvfp4_fp8.yaml cross-reference" (minor): appears resolved — the current file header has no such reference.

Why nudge rather than approve:

No automated test, and the recipe can't be functionally exercised in CI (used to produce a real 397B-parameter GPU-only checkpoint). An owner should confirm the quantizer-name glob patterns ('*language_model*mlp.experts.*weight_quantizer', '*mtp*', '*visual*', '*linear_attn.in_proj_*') actually match the Qwen3.5-MoE module tree — the one correctness aspect not verifiable from the repo alone.
Minor doc drift: the committed filename is nvfp4_experts_mse-fp8_rest-kv_fp8.yaml, but the PR title and CodeRabbit reference nvfp4_experts-fp8_rest-kv_fp8_mse.yaml, and the PR body's --recipe usage example cites yet another spelling. Worth reconciling so copy-pasted --recipe paths resolve.

added recipe used for Qwen3.5 397B NVFP4 V2 checkpoint

b74bbea

Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>

sugunav14 requested a review from a team as a code owner June 30, 2026 17:24

sugunav14 requested a review from h-guo18 June 30, 2026 17:24

sugunav14 requested a review from cjluo-nv June 30, 2026 17:28

coderabbitai Bot reviewed Jun 30, 2026

View reviewed changes

coderabbitai Bot approved these changes Jun 30, 2026

View reviewed changes

cjluo-nv reviewed Jun 30, 2026

View reviewed changes

claude Bot reviewed Jun 30, 2026

View reviewed changes

claude Bot approved these changes Jun 30, 2026

View reviewed changes

sugunav14 and others added 2 commits June 30, 2026 17:36

modelopt review bot suggestion

bcbc2e3

Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>

cjluo-nv reviewed Jul 1, 2026

View reviewed changes

cjluo-nv approved these changes Jul 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add recipe used for Qwen3.5 397B NVFP4 V2 checkpoint#1868

Add recipe used for Qwen3.5 397B NVFP4 V2 checkpoint#1868
sugunav14 wants to merge 3 commits into
mainfrom
svelury/qwen3p5_397b_moe_recipe

sugunav14 commented Jun 30, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 30, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Uh oh!

sugunav14 commented Jun 30, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

cjluo-nv left a comment

Uh oh!

codecov Bot commented Jun 30, 2026 •

edited

Loading

Uh oh!

claude Bot Jun 30, 2026

Uh oh!

claude Bot left a comment

Uh oh!

cjluo-nv left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

sugunav14 commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Usage

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

sugunav14 commented Jun 30, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cjluo-nv left a comment

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

claude Bot Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cjluo-nv left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sugunav14 commented Jun 30, 2026 •

edited

Loading

coderabbitai Bot commented Jun 30, 2026 •

edited

Loading

codecov Bot commented Jun 30, 2026 •

edited

Loading