Add recipe used for Qwen3.5 397B NVFP4 V2 checkpoint#1868
Conversation
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
📝 WalkthroughWalkthroughA new PTQ recipe YAML configuration is added for the qwen3_5_moe model. It defines MSE-based quantization with FP8 scale sweep, applies NVFP4 to LM routed expert weight and input quantizers, enables FP8 KV cache, and disables BF16 quantizers for specific attention, visual, and mtp modules. ChangesPTQ Recipe Configuration
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~5 minutes 🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
|
/claude review |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
modelopt_recipes/huggingface/qwen3_5_moe/ptq/nvfp4_experts-fp8_rest-kv_fp8_mse.yaml (1)
37-60: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd a focused regression test for this recipe.
This recipe’s behavior depends on several glob matches and on the disable rules being applied last, so a small naming or ordering change could silently quantize the wrong modules. Please add a recipe-level test that loads this YAML and asserts the intended contracts: expert weights use static NVFP4, expert inputs use dynamic NVFP4, KV cache stays FP8, and
linear_attn.in_proj_a,linear_attn.in_proj_b,visual, andmtpremain disabled. As per coding guidelines, "For any new features / examples, make sure to add tests and that the coverage check in your PR passes."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modelopt_recipes/huggingface/qwen3_5_moe/ptq/nvfp4_experts-fp8_rest-kv_fp8_mse.yaml` around lines 37 - 60, Add a focused recipe-level regression test for this Qwen3.5 MoE PTQ YAML that loads the recipe and verifies the intended quantization contract. Use the recipe identifier from the quantize config and assert the glob-based rules still resolve correctly: expert weight quantizers match the NVFP4 static config, expert input quantizers match NVFP4 dynamic config, KV cache uses FP8, and the final disable rules keep linear_attn.in_proj_a, linear_attn.in_proj_b, visual, and mtp disabled. Place the test alongside the existing recipe tests and ensure it fails if the ordering of the quant_cfg entries changes or the glob patterns stop matching.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In
`@modelopt_recipes/huggingface/qwen3_5_moe/ptq/nvfp4_experts-fp8_rest-kv_fp8_mse.yaml`:
- Around line 37-60: Add a focused recipe-level regression test for this Qwen3.5
MoE PTQ YAML that loads the recipe and verifies the intended quantization
contract. Use the recipe identifier from the quantize config and assert the
glob-based rules still resolve correctly: expert weight quantizers match the
NVFP4 static config, expert input quantizers match NVFP4 dynamic config, KV
cache uses FP8, and the final disable rules keep linear_attn.in_proj_a,
linear_attn.in_proj_b, visual, and mtp disabled. Place the test alongside the
existing recipe tests and ensure it fails if the ordering of the quant_cfg
entries changes or the glob patterns stop matching.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 920c89ef-337b-4e60-92f4-1c17c8015239
📒 Files selected for processing (1)
modelopt_recipes/huggingface/qwen3_5_moe/ptq/nvfp4_experts-fp8_rest-kv_fp8_mse.yaml
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Small additive PR (+60, 1 file): adds a Qwen3.5-MoE-specific PTQ recipe (huggingface/qwen3_5_moe/ptq/nvfp4_experts-fp8_rest-kv_fp8_mse.yaml) to the existing built-in recipe registry. No prompt-injection attempts in the untrusted content.
What I verified:
- All 6
$importtargets resolve to existing files (base_disable_all,default_disabled_quantizers,w8a8_fp8_fp8,nvfp4,nvfp4_static,kv_fp8). - Structure matches established siblings — it is essentially the
general/ptq/nvfp4_experts_only_mse-kv_fp8_cast.yamlpattern (method: mse+fp8_scale_sweep: true+layerwise: falsealgorithm dict) plus an FP8 base layer (w8a8_fp8_fp8) for non-expert Linears andkv_fp8instead of cast mode.nvfp4_static= static weight scales,nvfp4= dynamic inputs, matching the stated intent. - Ordering is correct —
quant_cfgis last-wins per the documented semantics; BF16 exclusions (mtp,visual,linear_attn.in_proj_*) are placed last so they win. - Placement/naming conform to
modelopt_recipes/huggingface/README.mdconventions (per-task README is optional). - Design review does not apply — this is a single data file consuming a pre-existing, documented recipe subsystem, not a new abstraction.
- Licensing clean — new file's only license content is the standard Apache-2.0 NVIDIA header, matching
LICENSE_HEADERexactly (safe exception).
Why nudge rather than approve:
- No automated test, and the recipe can't be functionally exercised in CI (used to produce a real 397B-parameter checkpoint, GPU-only). This is consistent with how sibling recipes were merged, but an owner should confirm the quantizer-name glob patterns (e.g.
'*language_model*mlp.experts.*weight_quantizer','*mtp*','*visual*') actually match the Qwen3.5-MoE module tree — that's the one correctness aspect not verifiable from the repo alone. - Minor: the leading comment calls this the "MSE variant of qwen3p5_mixed_nvfp4_fp8.yaml", but that referenced file doesn't appear in the repo (likely an internal/non-shipped recipe). Non-functional doc comment, but worth a quick fix or clarification.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1868 +/- ##
==========================================
- Coverage 77.49% 72.86% -4.64%
==========================================
Files 489 516 +27
Lines 54415 65036 +10621
==========================================
+ Hits 42169 47388 +5219
- Misses 12246 17648 +5402
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| # MSE variant of qwen3p5_mixed_nvfp4_fp8.yaml. |
There was a problem hiding this comment.
[SUGGESTION] This header describes the recipe as the "MSE variant of qwen3p5_mixed_nvfp4_fp8.yaml", but no file by that name exists anywhere in the repo (the only matches for qwen3p5_mixed/mixed_nvfp4_fp8 are this file and an unrelated test). A maintainer trying to locate the non-MSE base recipe will hit a dead reference.
(1) The comment points at a file that doesn't exist; (2) it makes the relationship between recipes harder to follow and risks rotting further; (3) either drop the cross-reference or point it at the actual base recipe (e.g. the sibling w4a16_nvfp4-fp8_attn-kv_fp8_cast.yaml) if that's the intended counterpart. Everything else in the recipe — the $import targets, the mse/fp8_scale_sweep algorithm fields, and the documented "FP8 layers stay max-calibrated" behavior — checks out.
There was a problem hiding this comment.
Claude review passed — no blocking issues found. LGTM
Findings: CRITICAL: 0, IMPORTANT: 0, SUGGESTION: 1
This is a single-file PR adding one PTQ recipe (nvfp4_experts-fp8_rest-kv_fp8_mse.yaml). I reviewed the recipe in full and traced its imports and algorithm semantics through the source.
What I verified:
- All 6
$importtargets resolve to existing config units (base_disable_all,default_disabled_quantizers,w8a8_fp8_fp8,nvfp4,nvfp4_static,kv_fp8). - Schema validity —
method: mse,fp8_scale_sweep, andlayerwiseare all validMseCalibConfigfields (modelopt/torch/quantization/config.py). - Central claim confirmed in code — the header/comments state that
fp8_scale_sweep: truerefines only static NVFP4 weights with MSE while FP8 / dynamic / KV quantizers stay max-calibrated._make_weight_mse_calibrator(model_calib.py:455-471) returnsNonefor any quantizer that isn't a static NVFP4 weight or registered FP8-sweep backend, so those keep their max-calibrated amax. Behavior matches the documentation exactly. - Override ordering —
base_disable_all→ FP8 base → NVFP4 expert overrides viacfg→ KV → exclusions-last follows the established last-wins pattern used by the validated sibling recipew4a16_nvfp4-fp8_attn-kv_fp8_cast.quant_cfg.yaml. - Recipes are pre-commit-validated via
load_recipe()(tools/precommit/check_modelopt_recipes.py), so structural/import errors would be caught at commit time.
Only finding (non-blocking): the header comment references qwen3p5_mixed_nvfp4_fp8.yaml as the non-MSE base, but no such file exists in the repo — a stale/dead cross-reference. Posted inline.
Backward compatibility: additive only (new file, no schema or default changes). Risk level: low.
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
Move the `mse` modifier from the kv group to the nvfp4 group it actually refines, matching the convention in modelopt_recipes/general/ptq (e.g. nvfp4_experts_only_mse-kv_fp8_cast.yaml). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Small additive PR (+59, 1 file): adds a Qwen3.5-MoE-specific PTQ recipe under modelopt_recipes/huggingface/qwen3_5_moe/ptq/ to the existing built-in recipe registry. This consumes the pre-existing, documented modelopt.recipe subsystem rather than introducing a new abstraction, so design review doesn't apply. No prompt-injection attempts in the untrusted content.
What I verified:
- All 6
$importtargets resolve to existing files (base_disable_all,default_disabled_quantizers,w8a8_fp8_fp8,nvfp4,nvfp4_static,kv_fp8). - Structure matches the validated sibling
general/ptq/nvfp4_experts_only_mse-kv_fp8_cast.yaml:method: mse+fp8_scale_sweep: true+layerwise: false, plus an FP8 base (w8a8_fp8_fp8) for non-expert Linears andkv_fp8.nvfp4_static= static weight scales,nvfp4= dynamic inputs — matches the stated intent. - Last-wins ordering is correct: BF16 exclusions (
*mtp*,*visual*,*linear_attn.in_proj_*) placed last so they win. - Licensing clean: the new file's only license content is the standard NVIDIA Apache-2.0 header, matching
LICENSE_HEADERexactly (safe exception).
Previous review comments:
- CodeRabbit's "add a regression test" (minor): still unaddressed, but consistent with how sibling recipes were merged (GPU-only 397B path, not CI-exercisable).
- 💬 Claude's "stale
qwen3p5_mixed_nvfp4_fp8.yamlcross-reference" (minor): appears resolved — the current file header has no such reference.
Why nudge rather than approve:
- No automated test, and the recipe can't be functionally exercised in CI (used to produce a real 397B-parameter GPU-only checkpoint). An owner should confirm the quantizer-name glob patterns (
'*language_model*mlp.experts.*weight_quantizer','*mtp*','*visual*','*linear_attn.in_proj_*') actually match the Qwen3.5-MoE module tree — the one correctness aspect not verifiable from the repo alone. - Minor doc drift: the committed filename is
nvfp4_experts_mse-fp8_rest-kv_fp8.yaml, but the PR title and CodeRabbit referencenvfp4_experts-fp8_rest-kv_fp8_mse.yaml, and the PR body's--recipeusage example cites yet another spelling. Worth reconciling so copy-pasted--recipepaths resolve.
What does this PR do?
Type of change: New example (model-specific PTQ recipe)
Adds the built-in PTQ recipe used to produce the Qwen3.5 397B NVFP4 V2 checkpoint to the model-specific recipe registry at
modelopt_recipes/huggingface/qwen3_5_moe/ptq/nvfp4_experts-fp8_rest-kv_fp8_mse.yaml. The recipe applies a mixed NVFP4/FP8 scheme to theqwen3_5_moearchitecture: NVFP4 (MSE/FP8-scale-sweep static weights, dynamic inputs) on the LM routed experts; ModelOpt-default FP8 (W8A8 per-tensor static, max-calibrated) on every other Linear layer; FP8 KV cache; MTP block left in BF16. Thefp8_scale_sweepMSE refinement is scoped to the static NVFP4 weights only — all FP8 / dynamic / KV quantizers stay max-calibrated.Usage
Testing
Recipe was used to generate the Qwen3.5 397B NVFP4 V2 checkpoint.
Summary by CodeRabbit