[Feat]: Support Dspark#1849
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
📝 WalkthroughWalkthroughThis PR adds DSpark, a new projector variant of DFlash speculative decoding. It introduces DSpark-specific config fields, conversion registry routing, a DSparkModule with vanilla/gated/rnn Markov heads and optional confidence head, an HFDSparkModel training/generation plugin, a DSpark exporter, a training recipe, and unit tests. ChangesDSpark speculative decoding
Estimated code review effort: 4 (Complex) | ~60 minutes Sequence Diagram(s)sequenceDiagram
participant Trainer
participant HFDSparkModel
participant BaseModel
participant DSparkModule
Trainer->>HFDSparkModel: forward(input_ids, labels)
HFDSparkModel->>BaseModel: run under no_grad for teacher logits
HFDSparkModel->>DSparkModule: run backbone/heads on draft inputs
DSparkModule-->>HFDSparkModel: hidden states and backbone logits
HFDSparkModel->>HFDSparkModel: apply Markov head to correct logits
HFDSparkModel->>HFDSparkModel: compute DSpark loss (CE, TVD, confidence BCE)
HFDSparkModel-->>Trainer: ModelOutput with loss and metrics
sequenceDiagram
participant Caller
participant HFDSparkModel
participant BaseModel
participant DSparkModule
Caller->>HFDSparkModel: pseudo_speculative_generate(input_ids)
HFDSparkModel->>BaseModel: compute anchor token
HFDSparkModel->>DSparkModule: run backbone on masked block
loop for each block position
DSparkModule->>DSparkModule: markov_step(prev_token, hidden, state)
DSparkModule-->>HFDSparkModel: corrected logits
HFDSparkModel->>HFDSparkModel: argmax to select next draft token
end
HFDSparkModel-->>Caller: anchor token and draft tokens
Related PRs: None identified. Suggested labels: speculative-decoding, feature Suggested reviewers: None identified. 🐰 A hop, a skip, a markov chain, 🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
|
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1849 +/- ##
==========================================
- Coverage 77.40% 76.80% -0.61%
==========================================
Files 515 517 +2
Lines 57118 57389 +271
==========================================
- Hits 44214 44075 -139
- Misses 12904 13314 +410
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
/claude review |
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
♻️ Duplicate comments (1)
modelopt/torch/speculative/plugins/hf_dspark.py (1)
1-51: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick winSame duplicate/inconsistent license header issue as
modeling_dspark.py.This file has the identical problem: a standalone Apache-2.0 header (lines 1-14, 2026) duplicated on top of the correctly combined "Apache-2.0 AND MIT" header (lines 37-50, 2024) that wraps the DeepSpec attribution (lines 16-35). The reference link also points to
blob/main/...loss.pyinstead of a pinned commit hash.As per coding guidelines: "The file which has code copied from another third-party GitHub repository should have the following in order: 1. A reference link (with commit hash)... 2. The original repository's Copyright/License. 3. The NVIDIA Apache 2.0 Copyright/License header," and copied files should be excluded from the
insert-licensepre-commit hook.📄 Proposed fix
-# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Adapted from https://github.com/deepseek-ai/DeepSpec/blob/main/deepspec/modeling/dspark/loss.py +# Adapted from https://github.com/deepseek-ai/DeepSpec/blob/<commit-sha>/deepspec/modeling/dspark/loss.py # Copyright (c) 2026 The DeepSpec Authors🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modelopt/torch/speculative/plugins/hf_dspark.py` around lines 1 - 51, The file has a duplicated and inconsistent license header, so keep only the correctly ordered combined notice and remove the extra standalone Apache-2.0 block at the top. Ensure the DeepSpec attribution in this header uses a pinned commit-hash reference instead of a branch-based link, and preserve the expected order in this plugin file: reference link, original third-party copyright/license, then the NVIDIA Apache-2.0 header. Also make sure this copied file is excluded from the insert-license pre-commit hook so the header is not rewritten again.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modelopt/torch/speculative/plugins/modeling_dspark.py`:
- Around line 1-51: The file has duplicated and inconsistent license headers
because the generic NVIDIA Apache-2.0 header was auto-inserted on top of the
copied DeepSpec header. Update modeling_dspark.py to keep the required header
order for copied third-party code: pinned DeepSpec reference link, original
DeepSpec copyright/license text, then the NVIDIA Apache-2.0 header, and remove
the duplicate SPDX block. Also add this file to the insert-license hook exclude
list in .pre-commit-config.yaml so the header is not re-added, and reference the
existing DeepSpec attribution section and SPDX-License-Identifier lines when
editing.
---
Duplicate comments:
In `@modelopt/torch/speculative/plugins/hf_dspark.py`:
- Around line 1-51: The file has a duplicated and inconsistent license header,
so keep only the correctly ordered combined notice and remove the extra
standalone Apache-2.0 block at the top. Ensure the DeepSpec attribution in this
header uses a pinned commit-hash reference instead of a branch-based link, and
preserve the expected order in this plugin file: reference link, original
third-party copyright/license, then the NVIDIA Apache-2.0 header. Also make sure
this copied file is excluded from the insert-license pre-commit hook so the
header is not rewritten again.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d9c9750f-27de-43e0-90d1-d82c8fd1efb7
📒 Files selected for processing (8)
modelopt/torch/export/plugins/hf_spec_export.pymodelopt/torch/speculative/config.pymodelopt/torch/speculative/dflash/conversion.pymodelopt/torch/speculative/plugins/__init__.pymodelopt/torch/speculative/plugins/hf_dspark.pymodelopt/torch/speculative/plugins/modeling_dspark.pymodelopt_recipes/general/speculative_decoding/dspark.yamltests/unit/torch/speculative/plugins/test_hf_dspark.py
| # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| # Adapted from https://github.com/deepseek-ai/DeepSpec/blob/main/deepspec/modeling/dspark/markov_head.py | ||
| # Copyright (c) 2026 The DeepSpec Authors | ||
| # | ||
| # Permission is hereby granted, free of charge, to any person obtaining a copy | ||
| # of this software and associated documentation files (the "Software"), to deal | ||
| # in the Software without restriction, including without limitation the rights | ||
| # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
| # copies of the Software, and to permit persons to whom the Software is | ||
| # furnished to do so, subject to the following conditions: | ||
| # | ||
| # The above copyright notice and this permission notice shall be included in all | ||
| # copies or substantial portions of the Software. | ||
| # | ||
| # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
| # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
| # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
| # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
| # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
| # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
| # SOFTWARE. | ||
|
|
||
| # SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 AND MIT | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
Duplicate/inconsistent license headers — likely missing pre-commit exclude entry.
This file has two separate NVIDIA header blocks: a plain Apache-2.0 header (lines 1-14, copyright 2026) and a second combined "Apache-2.0 AND MIT" header (lines 37-50, copyright 2024) wrapping the DeepSpec attribution/MIT text (lines 16-35). The mismatched years and duplicated SPDX-License-Identifier lines indicate the insert-license pre-commit hook re-added a generic header on top because this file wasn't added to its exclude list. Also, the DeepSpec reference link points at blob/main/... rather than a pinned commit hash.
As per coding guidelines: "The file which has code copied from another third-party GitHub repository should have the following in order: 1. A reference link (with commit hash)... 2. The original repository's Copyright/License. 3. The NVIDIA Apache 2.0 Copyright/License header" and "Exclude copied files from the license pre-commit hook so it doesn't auto-add the NVIDIA Apache 2.0 license on top of the file."
📄 Proposed fix: remove duplicate header, pin commit hash
-# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Adapted from https://github.com/deepseek-ai/DeepSpec/blob/main/deepspec/modeling/dspark/markov_head.py
+# Adapted from https://github.com/deepseek-ai/DeepSpec/blob/<commit-sha>/deepspec/modeling/dspark/markov_head.py
# Copyright (c) 2026 The DeepSpec AuthorsThen add this file's path to the insert-license hook's exclude list in .pre-commit-config.yaml.
Run the following to confirm the exclude list and check for other files sharing this pattern:
#!/bin/bash
rg -n "exclude" .pre-commit-config.yaml -A5
rg -n "insert-license" .pre-commit-config.yaml -A15
rg -nP "SPDX-License-Identifier" modelopt/torch/speculative/plugins/modeling_dspark.py modelopt/torch/speculative/plugins/hf_dspark.py🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modelopt/torch/speculative/plugins/modeling_dspark.py` around lines 1 - 51,
The file has duplicated and inconsistent license headers because the generic
NVIDIA Apache-2.0 header was auto-inserted on top of the copied DeepSpec header.
Update modeling_dspark.py to keep the required header order for copied
third-party code: pinned DeepSpec reference link, original DeepSpec
copyright/license text, then the NVIDIA Apache-2.0 header, and remove the
duplicate SPDX block. Also add this file to the insert-license hook exclude list
in .pre-commit-config.yaml so the header is not re-added, and reference the
existing DeepSpec attribution section and SPDX-License-Identifier lines when
editing.
Source: Coding guidelines
|
|
||
|
|
||
| def _tvd_per_token(final_logits, teacher_logits, chunk_size=1024): | ||
| """Total-variation distance ||softmax(a)-softmax(b)||_1 / ... per token, memory-lean. |
There was a problem hiding this comment.
[SUGGESTION] _tvd_per_token is named/documented as the "total-variation distance" but actually returns the L1 distance ||softmax(a)-softmax(b)||_1, which is 2·TVD (TVD = ½·L1). The math downstream is internally consistent — l1_loss uses the raw L1, and accept_rate = 1 - 0.5*l1_per_token correctly yields 1 - TVD — but the naming is a latent footgun: a future maintainer who "corrects" this to return an actual TVD (0.5 * ||·||_1) would silently break the accept_rate target (→ 1 - 0.25·L1) and halve the confidence supervision signal.
Suggest either renaming to _l1_per_token (documenting the return as L1 = 2·TVD), or keeping the name but returning 0.5 * ... and updating the accept_rate/l1_loss call sites. The former is the smaller, safer change.
There was a problem hiding this comment.
Claude review passed — no blocking issues found. LGTM
SCOPE
PR is additive: introduces DSpark as a third head in the DFlash family, selected via dflash_architecture_config.projector_type=dspark, reusing the existing DFlash mode/pipeline/registry. 8 files changed. I reviewed all of the PRs own files (plugins/hf_dspark.py, plugins/modeling_dspark.py, config.py, dflash/conversion.py, plugins/init.py, export/plugins/hf_spec_export.py, the dspark.yaml recipe, and the unit test) and cross-checked the DFlash parent (hf_dflash.py) for the integration/shape contracts. Note: a two-dot diff against origin/main surfaces ~20 commits of unrelated churn because the branch is behind main; those files are not part of this PR and were not reviewed.
FINDINGS
- CRITICAL: 0
- IMPORTANT: 0
- SUGGESTION: 1 — _tvd_per_token returns L1 (= 2*TVD) despite its total-variation-distance name; internally consistent today but a rename footgun. Non-blocking.
ASSESSMENT
Low risk. The algorithm is coherent end-to-end:
- Alignment is consistent between training and generation — shift_label next-token alignment (position k conditions on the prev token at anchor+k, predicts anchor+k+1; position 0s predecessor is the anchor) matches across _apply_markov_head/_compute_dspark_loss and pseudo_speculative_generate.
- Confidence target 1 - 0.5*L1 = 1 - TVD is the correct speculative-sampling acceptance probability.
- DDP/grad flow is handled (dummy loss flows through all draft params; recipe sets ddp_find_unused_parameters=true for the disabled-confidence-head case), and confidence_head_alpha>0 without a built head is validated in modify.
- Mode/registry: own DSparkDMRegistry avoids clobbering HFDFlashModel; convert_to_dflash_model routes correctly and the error message lists the new option.
- Export: markov head weights ride the inherited dflash_module. stripping; extra loader config fields added, mirroring DominoExporter.
- Backward compat: new dflash_*_alpha config fields are optional with defaults and inert unless projector_type==dspark; DFlash/EAGLE/Domino paths untouched.
The single SUGGESTION is a naming/maintainability nit and does not block.
|
|
||
|
|
||
| def _tvd_per_token(final_logits, teacher_logits, chunk_size=1024): | ||
| """Total-variation distance ||softmax(a)-softmax(b)||_1 / ... per token, memory-lean. |
There was a problem hiding this comment.
[SUGGESTION] _tvd_per_token is named/documented as the "total-variation distance" but actually returns the L1 distance ||softmax(a)-softmax(b)||_1, which is 2·TVD (TVD = ½·L1). The math downstream is internally consistent — l1_loss uses the raw L1, and accept_rate = 1 - 0.5*l1_per_token correctly yields 1 - TVD — but the naming is a latent footgun: a future maintainer who "corrects" this function to return an actual TVD (0.5 * ||·||_1) would silently break the accept_rate target (it would become 1 - 0.25·L1) and halve the confidence supervision signal.
Suggest either renaming to _l1_per_token (and describing the return as the L1 distance = 2·TVD), or keeping the name but returning 0.5 * ... and updating the accept_rate/l1_loss call sites accordingly. The former is the smaller, safer change.
What does this PR do?
Type of change: New feature
Adds DSpark (DeepSeek-AI, "Confidence-Scheduled Speculative Decoding with Semi-Autoregressive Generation") as a third head in the DFlash family. DSpark keeps the parallel DFlash backbone for speed and adds a lightweight sequential Markov head that injects the intra-block causal dependency the parallel backbone lacks (mitigating suffix acceptance decay), plus an optional confidence head. The Markov head adds a prefix-dependent transition bias
B_kto the backbone base logits, inducing a causal block distributionp_k(x_k | x_<k) = softmax(U_k + B_k).dflash_architecture_config.projector_type=dspark.vanilla(memoryless low-rank transition),gated(hidden-gated),rnn(GRU-like, full-prefix).ce_alpha·CE + l1_alpha·TVD + conf_alpha·confidence_BCE.New files:
plugins/hf_dspark.py,plugins/modeling_dspark.py, recipedspark.yaml, unit tests. Also touchesconfig.py(loss-weight / head fields),dflash/conversion.py,plugins/__init__.py, andhf_spec_export.py(export support).Usage
Testing
tests/unit/torch/speculative/plugins/test_hf_dspark.pycovers: convert, all three head variants, confidence-head build/grads, forward loss + metrics + grads, exported weight-key layout, and that plain DFlash mode is unaffected. Train / export / AR-generate smoke-validated end-to-end.Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (
git commit -s -S).Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded
trust_remote_code=True,torch.load(..., weights_only=False),pickle, etc.).CONTRIBUTING.md: N/AAdditional Information
Related: #1846 — DSpark shares the DFlash offline base-logit reconstruction path (final-norm fix); rebase on top of #1846 once it merges.
Summary by CodeRabbit
New Features
Bug Fixes