Add recipe used for Qwen3.5 397B NVFP4 V2 checkpoint #1868

Open
sugunav14 wants to merge 3 commits into main from svelury/qwen3p5_397b_moe_recipe

Conversation

@sugunav14 sugunav14 commented Jun 30, 2026

What does this PR do?

Type of change: New example (model-specific PTQ recipe)

Adds the built-in PTQ recipe used to produce the Qwen3.5 397B NVFP4 V2 checkpoint to the model-specific recipe registry at modelopt_recipes/huggingface/qwen3_5_moe/ptq/nvfp4_experts-fp8_rest-kv_fp8_mse.yaml. The recipe applies a mixed NVFP4/FP8 scheme to the qwen3_5_moe architecture: NVFP4 (MSE/FP8-scale-sweep static weights, dynamic inputs) on the LM routed experts; ModelOpt-default FP8 (W8A8 per-tensor static, max-calibrated) on every other Linear layer; FP8 KV cache; MTP block left in BF16. The fp8_scale_sweep MSE refinement is scoped to the static NVFP4 weights only — all FP8 / dynamic / KV quantizers stay max-calibrated.

Usage

# Select via --recipe (path relative to modelopt_recipes/)
python hf_ptq.py \
  --pyt_ckpt_path <Qwen3.5-397B-MoE> \
  --recipe huggingface/qwen3_5_moe/ptq/nvfp4_experts_mse-fp8_rest-kv_fp8.yaml \
  --export_path <output_dir>

Testing

Recipe was used to generate the Qwen3.5 397B NVFP4 V2 checkpoint.

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: N/A
  • Did you update Changelog?: N/A
  • Did you get Claude approval on this PR?: ❌

Summary by CodeRabbit

  • New Features
    • Added a new post-training quantization recipe for Qwen3_5 MoE models.
    • Supports FP8 KV cache and tailored quantization settings for routed experts and other selected model components.
    • Includes MSE-based quantization settings with FP8 calibration behavior for improved deployment flexibility.

Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
@sugunav14 sugunav14 requested a review from a team as a code owner June 30, 2026 17:24
@sugunav14 sugunav14 requested a review from h-guo18 June 30, 2026 17:24
coderabbitai Bot commented Jun 30, 2026

Walkthrough

A new PTQ recipe YAML configuration is added for the qwen3_5_moe model. It defines MSE-based quantization with FP8 scale sweep, applies NVFP4 to LM routed expert weight and input quantizers, enables FP8 KV cache, and disables BF16 quantizers for specific attention, visual, and mtp modules.

Changes

PTQ Recipe Configuration

| Layer / File(s) | Summary |
| --- | --- |
| NVFP4 experts + FP8 + KV cache PTQ recipe: modelopt_recipes/huggingface/qwen3_5_moe/ptq/nvfp4_experts_mse-fp8_rest-kv_fp8.yaml | New YAML recipe with license header, documentation comments, imports/metadata, MSE algorithm with FP8 scale sweep, and quant_cfg defining NVFP4 static weights/dynamic inputs for LM routed experts, FP8 KV cache, and disabled BF16 quantizers for attention, visual, and mtp modules. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly matches the main change: adding a PTQ recipe for the Qwen3.5 397B NVFP4 V2 checkpoint. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | The PR only adds a YAML PTQ recipe; no Python, dependency, or security-sensitive patterns like torch.load(weights_only=False), allow_pickle=True, trust_remote_code=True, eval/exec, or nosec were ad... |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
@sugunav14 sugunav14 requested a review from cjluo-nv June 30, 2026 17:28
@sugunav14

/claude review

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
modelopt_recipes/huggingface/qwen3_5_moe/ptq/nvfp4_experts-fp8_rest-kv_fp8_mse.yaml (1)

37-60: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add a focused regression test for this recipe.

This recipe’s behavior depends on several glob matches and on the disable rules being applied last, so a small naming or ordering change could silently quantize the wrong modules. Please add a recipe-level test that loads this YAML and asserts the intended contracts: expert weights use static NVFP4, expert inputs use dynamic NVFP4, KV cache stays FP8, and linear_attn.in_proj_a, linear_attn.in_proj_b, visual, and mtp remain disabled. As per coding guidelines, "For any new features / examples, make sure to add tests and that the coverage check in your PR passes."
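A minimal sketch of the kind of contract such a test could assert, assuming fnmatch-style glob semantics and a last-match-wins rule list. The `RULES` list, the setting labels, and the module names below are hypothetical stand-ins for illustration; they are not the recipe's actual entries or ModelOpt's loader API.

```python
from fnmatch import fnmatchcase

# Hypothetical, simplified rule list: (glob pattern, setting label).
# Later entries override earlier ones ("last wins").
RULES = [
    ("*", "disabled"),                                           # base: disable everything
    ("*weight_quantizer", "fp8"),                                # FP8 base for weights
    ("*input_quantizer", "fp8"),                                 # FP8 base for inputs
    ("*language_model*mlp.experts.*weight_quantizer", "nvfp4_static"),
    ("*language_model*mlp.experts.*input_quantizer", "nvfp4_dynamic"),
    ("*linear_attn.in_proj_*", "disabled"),                      # exclusions go last
    ("*visual*", "disabled"),
    ("*mtp*", "disabled"),
]


def resolve(name, rules=RULES):
    """Return the setting of the last rule whose pattern matches `name`."""
    result = None
    for pattern, setting in rules:
        if fnmatchcase(name, pattern):
            result = setting
    return result


# Hypothetical quantizer names; the real module tree may differ.
expert = "model.language_model.layers.0.mlp.experts.3.down_proj.weight_quantizer"
in_proj = "model.language_model.layers.0.linear_attn.in_proj_a.weight_quantizer"
print(resolve(expert), resolve(in_proj))
```

Moving the three exclusion rules to the front of the list changes the result for `in_proj`, which is the ordering regression a real test would want to catch.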

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@modelopt_recipes/huggingface/qwen3_5_moe/ptq/nvfp4_experts-fp8_rest-kv_fp8_mse.yaml`
around lines 37 - 60, Add a focused recipe-level regression test for this
Qwen3.5 MoE PTQ YAML that loads the recipe and verifies the intended
quantization contract. Use the recipe identifier from the quantize config and
assert the glob-based rules still resolve correctly: expert weight quantizers
match the NVFP4 static config, expert input quantizers match NVFP4 dynamic
config, KV cache uses FP8, and the final disable rules keep
linear_attn.in_proj_a, linear_attn.in_proj_b, visual, and mtp disabled. Place
the test alongside the existing recipe tests and ensure it fails if the ordering
of the quant_cfg entries changes or the glob patterns stop matching.

Source: Coding guidelines

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 920c89ef-337b-4e60-92f4-1c17c8015239

📥 Commits

Reviewing files that changed from the base of the PR and between 43c2034 and b74bbea.

📒 Files selected for processing (1)
  • modelopt_recipes/huggingface/qwen3_5_moe/ptq/nvfp4_experts-fp8_rest-kv_fp8_mse.yaml

@cjluo-nv cjluo-nv left a comment

Bot review — DM the bot to share feedback.

Small additive PR (+60, 1 file): adds a Qwen3.5-MoE-specific PTQ recipe (huggingface/qwen3_5_moe/ptq/nvfp4_experts-fp8_rest-kv_fp8_mse.yaml) to the existing built-in recipe registry. No prompt-injection attempts in the untrusted content.

What I verified:

  • All 6 $import targets resolve to existing files (base_disable_all, default_disabled_quantizers, w8a8_fp8_fp8, nvfp4, nvfp4_static, kv_fp8).
  • Structure matches established siblings — it is essentially the general/ptq/nvfp4_experts_only_mse-kv_fp8_cast.yaml pattern (method: mse + fp8_scale_sweep: true + layerwise: false algorithm dict) plus an FP8 base layer (w8a8_fp8_fp8) for non-expert Linears and kv_fp8 instead of cast mode. nvfp4_static = static weight scales, nvfp4 = dynamic inputs, matching the stated intent.
  • Ordering is correct: quant_cfg is last-wins per the documented semantics; BF16 exclusions (mtp, visual, linear_attn.in_proj_*) are placed last so they win.
  • Placement/naming conform to modelopt_recipes/huggingface/README.md conventions (per-task README is optional).
  • Design review does not apply — this is a single data file consuming a pre-existing, documented recipe subsystem, not a new abstraction.
  • Licensing clean — new file's only license content is the standard Apache-2.0 NVIDIA header, matching LICENSE_HEADER exactly (safe exception).

Why nudge rather than approve:

  • No automated test, and the recipe can't be functionally exercised in CI (used to produce a real 397B-parameter checkpoint, GPU-only). This is consistent with how sibling recipes were merged, but an owner should confirm the quantizer-name glob patterns (e.g. '*language_model*mlp.experts.*weight_quantizer', '*mtp*', '*visual*') actually match the Qwen3.5-MoE module tree — that's the one correctness aspect not verifiable from the repo alone.
  • Minor: the leading comment calls this the "MSE variant of qwen3p5_mixed_nvfp4_fp8.yaml", but that referenced file doesn't appear in the repo (likely an internal/non-shipped recipe). Non-functional doc comment, but worth a quick fix or clarification.
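One way an owner might confirm the glob patterns against the real module tree is to list every pattern that matches nothing. The sketch below assumes fnmatch-style matching; the helper `unmatched_patterns` and the names in `NAMES` are hypothetical. In practice the names would come from the loaded model (for example via PyTorch's `model.named_modules()`), which is not done here.

```python
from fnmatch import fnmatchcase


def unmatched_patterns(patterns, names):
    """Return the patterns that match none of the given names."""
    return [p for p in patterns if not any(fnmatchcase(n, p) for n in names)]


# Hypothetical quantizer names standing in for the real Qwen3.5-MoE module tree.
NAMES = [
    "model.language_model.layers.0.mlp.experts.0.up_proj.weight_quantizer",
    "model.language_model.layers.0.linear_attn.in_proj_a.input_quantizer",
    "model.visual.blocks.0.attn.qkv.weight_quantizer",
    "mtp.layers.0.mlp.experts.0.up_proj.weight_quantizer",
]

# Patterns quoted in the review, plus one deliberately wrong pattern
# ("*vision_tower*") to show what a silent mismatch looks like.
PATTERNS = [
    "*language_model*mlp.experts.*weight_quantizer",
    "*mtp*",
    "*visual*",
    "*linear_attn.in_proj_*",
    "*vision_tower*",
]

print(unmatched_patterns(PATTERNS, NAMES))
```

A pattern that matches nothing is not an error for a last-wins rule list, so a check like this has to be run explicitly.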

codecov Bot commented Jun 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.86%. Comparing base (52f1ccb) to head (fb18d68).
⚠️ Report is 128 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1868      +/-   ##
==========================================
- Coverage   77.49%   72.86%   -4.64%     
==========================================
  Files         489      516      +27     
  Lines       54415    65036   +10621     
==========================================
+ Hits        42169    47388    +5219     
- Misses      12246    17648    +5402     
Flag Coverage Δ
unit 54.86% <ø> (+0.85%) ⬆️


# See the License for the specific language governing permissions and
# limitations under the License.

# MSE variant of qwen3p5_mixed_nvfp4_fp8.yaml.

[SUGGESTION] This header describes the recipe as the "MSE variant of qwen3p5_mixed_nvfp4_fp8.yaml", but no file by that name exists anywhere in the repo (the only matches for qwen3p5_mixed/mixed_nvfp4_fp8 are this file and an unrelated test). A maintainer trying to locate the non-MSE base recipe will hit a dead reference.

(1) The comment points at a file that doesn't exist; (2) it makes the relationship between recipes harder to follow and risks rotting further; (3) either drop the cross-reference or point it at the actual base recipe (e.g. the sibling w4a16_nvfp4-fp8_attn-kv_fp8_cast.yaml) if that's the intended counterpart. Everything else in the recipe — the $import targets, the mse/fp8_scale_sweep algorithm fields, and the documented "FP8 layers stay max-calibrated" behavior — checks out.

@claude claude Bot left a comment

Claude review passed — no blocking issues found. LGTM

Findings: CRITICAL: 0, IMPORTANT: 0, SUGGESTION: 1

This is a single-file PR adding one PTQ recipe (nvfp4_experts-fp8_rest-kv_fp8_mse.yaml). I reviewed the recipe in full and traced its imports and algorithm semantics through the source.

What I verified:

  • All 6 $import targets resolve to existing config units (base_disable_all, default_disabled_quantizers, w8a8_fp8_fp8, nvfp4, nvfp4_static, kv_fp8).
  • Schema validity: method: mse, fp8_scale_sweep, and layerwise are all valid MseCalibConfig fields (modelopt/torch/quantization/config.py).
  • Central claim confirmed in code — the header/comments state that fp8_scale_sweep: true refines only static NVFP4 weights with MSE while FP8 / dynamic / KV quantizers stay max-calibrated. _make_weight_mse_calibrator (model_calib.py:455-471) returns None for any quantizer that isn't a static NVFP4 weight or registered FP8-sweep backend, so those keep their max-calibrated amax. Behavior matches the documentation exactly.
  • Override ordering: base_disable_all → FP8 base → NVFP4 expert overrides via cfg → KV → exclusions-last follows the established last-wins pattern used by the validated sibling recipe w4a16_nvfp4-fp8_attn-kv_fp8_cast.quant_cfg.yaml.
  • Recipes are pre-commit-validated via load_recipe() (tools/precommit/check_modelopt_recipes.py), so structural/import errors would be caught at commit time.
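To illustrate the difference between max calibration and an MSE-refined scale mentioned above, here is a toy sketch. It is not ModelOpt's implementation: it uses a symmetric uniform grid as a stand-in for NVFP4, sweeps a hypothetical set of candidate scales, and all function names are invented for this example.

```python
def fake_quant(values, scale, levels=7):
    """Round each value to a symmetric uniform grid, clamping to +/- levels steps."""
    out = []
    for v in values:
        q = max(-levels, min(levels, round(v / scale)))
        out.append(q * scale)
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def max_scale(values, levels=7):
    """Max calibration: the scale that maps the largest magnitude to the grid edge."""
    return max(abs(v) for v in values) / levels


def mse_swept_scale(values, levels=7, steps=20):
    """Sweep fractions of the max-calibrated scale and keep the lowest-MSE candidate."""
    base = max_scale(values, levels)
    candidates = [base * (i / steps) for i in range(1, steps + 1)]
    return min(candidates, key=lambda s: mse(values, fake_quant(values, s, levels)))


# Hypothetical weights with one outlier that stretches the max-calibrated scale.
weights = [0.1, -0.2, 0.15, 0.05, -0.1, 3.0]
s_max = max_scale(weights)
s_mse = mse_swept_scale(weights)
print(mse(weights, fake_quant(weights, s_max)), mse(weights, fake_quant(weights, s_mse)))
```

Because the max-calibrated scale is itself one of the candidates, the swept scale can never give a higher error than max calibration in this toy setup.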

Only finding (non-blocking): the header comment references qwen3p5_mixed_nvfp4_fp8.yaml as the non-MSE base, but no such file exists in the repo — a stale/dead cross-reference. Posted inline.

Backward compatibility: additive only (new file, no schema or default changes). Risk level: low.

sugunav14 and others added 2 commits June 30, 2026 17:36
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
Move the `mse` modifier from the kv group to the nvfp4 group it
actually refines, matching the convention in modelopt_recipes/general/ptq
(e.g. nvfp4_experts_only_mse-kv_fp8_cast.yaml).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>

@cjluo-nv cjluo-nv left a comment

Bot review — DM the bot to share feedback.

Small additive PR (+59, 1 file): adds a Qwen3.5-MoE-specific PTQ recipe under modelopt_recipes/huggingface/qwen3_5_moe/ptq/ to the existing built-in recipe registry. This consumes the pre-existing, documented modelopt.recipe subsystem rather than introducing a new abstraction, so design review doesn't apply. No prompt-injection attempts in the untrusted content.

What I verified:

  • All 6 $import targets resolve to existing files (base_disable_all, default_disabled_quantizers, w8a8_fp8_fp8, nvfp4, nvfp4_static, kv_fp8).
  • Structure matches the validated sibling general/ptq/nvfp4_experts_only_mse-kv_fp8_cast.yaml: method: mse + fp8_scale_sweep: true + layerwise: false, plus an FP8 base (w8a8_fp8_fp8) for non-expert Linears and kv_fp8. nvfp4_static = static weight scales, nvfp4 = dynamic inputs — matches the stated intent.
  • Last-wins ordering is correct: BF16 exclusions (*mtp*, *visual*, *linear_attn.in_proj_*) placed last so they win.
  • Licensing clean: the new file's only license content is the standard NVIDIA Apache-2.0 header, matching LICENSE_HEADER exactly (safe exception).

Previous review comments:

  • CodeRabbit's "add a regression test" (minor): still unaddressed, but consistent with how sibling recipes were merged (GPU-only 397B path, not CI-exercisable).
  • 💬 Claude's "stale qwen3p5_mixed_nvfp4_fp8.yaml cross-reference" (minor): appears resolved — the current file header has no such reference.

Why nudge rather than approve:

  • No automated test, and the recipe can't be functionally exercised in CI (used to produce a real 397B-parameter GPU-only checkpoint). An owner should confirm the quantizer-name glob patterns ('*language_model*mlp.experts.*weight_quantizer', '*mtp*', '*visual*', '*linear_attn.in_proj_*') actually match the Qwen3.5-MoE module tree — the one correctness aspect not verifiable from the repo alone.
  • Minor doc drift: the committed filename is nvfp4_experts_mse-fp8_rest-kv_fp8.yaml, but the PR title and CodeRabbit reference nvfp4_experts-fp8_rest-kv_fp8_mse.yaml, and the PR body's --recipe usage example cites yet another spelling. Worth reconciling so copy-pasted --recipe paths resolve.
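A small sketch of how a copy-pasted --recipe path could be checked before use. The helper `resolve_recipe` is hypothetical and is not part of hf_ptq.py or ModelOpt; it only assumes that recipe paths are relative to a recipes root directory, as the usage comment in the PR body states.

```python
from pathlib import Path


def resolve_recipe(recipes_root, relative_path):
    """Return the recipe file path, or raise FileNotFoundError listing sibling YAML files."""
    path = Path(recipes_root) / relative_path
    if path.is_file():
        return path
    # List the YAML files that do exist next to the requested one, to expose spelling drift.
    siblings = []
    if path.parent.is_dir():
        siblings = sorted(p.name for p in path.parent.glob("*.yaml"))
    raise FileNotFoundError(
        f"{relative_path} not found; YAML files in that directory: {siblings}"
    )
```

Calling it with the recipes root and the path from the usage example either returns the file or names the YAML files that are actually present in that directory.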
