Skip to content

[Feat]: Support Dspark#1849

Open
h-guo18 wants to merge 3 commits into
mainfrom
haoguo/dpsark
Open

[Feat]: Support Dspark#1849
h-guo18 wants to merge 3 commits into
mainfrom
haoguo/dpsark

Conversation

@h-guo18

@h-guo18 h-guo18 commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Type of change: New feature

Adds DSpark (DeepSeek-AI, "Confidence-Scheduled Speculative Decoding with Semi-Autoregressive Generation") as a third head in the DFlash family. DSpark keeps the parallel DFlash backbone for speed and adds a lightweight sequential Markov head that injects the intra-block causal dependency the parallel backbone lacks (mitigating suffix acceptance decay), plus an optional confidence head. The Markov head adds a prefix-dependent transition bias B_k to the backbone base logits, inducing a causal block distribution p_k(x_k | x_<k) = softmax(U_k + B_k).

  • Reuses the DFlash mode/pipeline; selected via dflash_architecture_config.projector_type=dspark.
  • Three head variants: vanilla (memoryless low-rank transition), gated (hidden-gated), rnn (GRU-like, full-prefix).
  • Trained with a three-term loss: ce_alpha·CE + l1_alpha·TVD + conf_alpha·confidence_BCE.
  • Export preserves the upstream DeepSpec submodule names so checkpoints stay portable.

New files: plugins/hf_dspark.py, plugins/modeling_dspark.py, recipe dspark.yaml, unit tests. Also touches config.py (loss-weight / head fields), dflash/conversion.py, plugins/__init__.py, and hf_spec_export.py (export support).

Usage

# modelopt_recipes/general/speculative_decoding/dspark.yaml (key fields)
dflash:
  dflash_architecture_config:
    projector_type: dspark          # select the DSpark head
  dflash_ce_loss_alpha: 0.1
  dflash_l1_loss_alpha: 0.9         # L1/TVD-dominant (DeepSpec default)
  dflash_confidence_head_alpha: 1.0 # 0 disables the confidence head

Testing

tests/unit/torch/speculative/plugins/test_hf_dspark.py covers: convert, all three head variants, confidence-head build/grads, forward loss + metrics + grads, exported weight-key layout, and that plain DFlash mode is unaffected. Train / export / AR-generate smoke-validated end-to-end.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: ❌ / N/A
  • Did you get Claude approval on this PR?: ❌ / N/A

Additional Information

Related: #1846 — DSpark shares the DFlash offline base-logit reconstruction path (final-norm fix); rebase on top of #1846 once it merges.

Summary by CodeRabbit

  • New Features

    • Added support for DSpark speculative decoding, including a new draft model path, Markov-style head options, and optional confidence scoring.
    • Added a DSpark training recipe with DSpark-specific settings and loss weights.
  • Bug Fixes

    • Improved conversion and export compatibility so DSpark models use the expected configuration and checkpoint format.
    • Added validation for required DSpark settings to prevent misconfigured runs.

@copy-pr-bot

copy-pr-bot Bot commented Jun 29, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds DSpark, a new projector variant of DFlash speculative decoding. It introduces DSpark-specific config fields, conversion registry routing, a DSparkModule with vanilla/gated/rnn Markov heads and optional confidence head, an HFDSparkModel training/generation plugin, a DSpark exporter, a training recipe, and unit tests.

Changes

DSpark speculative decoding

Layer / File(s) Summary
Config fields and conversion routing
modelopt/torch/speculative/config.py, modelopt/torch/speculative/dflash/conversion.py, modelopt/torch/speculative/plugins/__init__.py
DFlashConfig gains DSpark loss-weight fields; conversion adds a DSparkDMRegistry and routes projector_type="dspark" to it; the plugins package re-exports hf_dspark.
DSparkModule Markov head architecture
modelopt/torch/speculative/plugins/modeling_dspark.py
New DSparkModule (DFlashModule subclass) adds vanilla/gated/rnn low-rank Markov transition heads, weight init, and an optional confidence head with training/inference helpers.
HFDSparkModel training and generation plugin
modelopt/torch/speculative/plugins/hf_dspark.py
New HFDSparkModel implements modify(), Markov head application, a three-term DSpark loss (CE, TVD/L1, confidence BCE), a training forward(), and autoregressive pseudo_speculative_generate().
DSpark exporter and recipe
modelopt/torch/export/plugins/hf_spec_export.py, modelopt_recipes/general/speculative_decoding/dspark.yaml
New DSparkExporter extends exported config with DSpark head fields; new YAML recipe configures DSpark training end-to-end.
Unit tests
tests/unit/torch/speculative/plugins/test_hf_dspark.py
New tests cover conversion routing, Markov head variants, forward-pass loss/metrics/gradients, and exporter output correctness.

Estimated code review effort: 4 (Complex) | ~60 minutes

Sequence Diagram(s)

sequenceDiagram
participant Trainer
participant HFDSparkModel
participant BaseModel
participant DSparkModule
Trainer->>HFDSparkModel: forward(input_ids, labels)
HFDSparkModel->>BaseModel: run under no_grad for teacher logits
HFDSparkModel->>DSparkModule: run backbone/heads on draft inputs
DSparkModule-->>HFDSparkModel: hidden states and backbone logits
HFDSparkModel->>HFDSparkModel: apply Markov head to correct logits
HFDSparkModel->>HFDSparkModel: compute DSpark loss (CE, TVD, confidence BCE)
HFDSparkModel-->>Trainer: ModelOutput with loss and metrics
Loading
sequenceDiagram
participant Caller
participant HFDSparkModel
participant BaseModel
participant DSparkModule
Caller->>HFDSparkModel: pseudo_speculative_generate(input_ids)
HFDSparkModel->>BaseModel: compute anchor token
HFDSparkModel->>DSparkModule: run backbone on masked block
loop for each block position
DSparkModule->>DSparkModule: markov_step(prev_token, hidden, state)
DSparkModule-->>HFDSparkModel: corrected logits
HFDSparkModel->>HFDSparkModel: argmax to select next draft token
end
HFDSparkModel-->>Caller: anchor token and draft tokens
Loading

Related PRs: None identified.

Suggested labels: speculative-decoding, feature

Suggested reviewers: None identified.

🐰 A hop, a skip, a markov chain,
DSpark whispers what comes next again,
Vanilla, gated, or RNN's spin,
Confidence peeking from deep within,
New draft tokens dance without a rein.

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns ✅ Passed Scanned all changed modelopt/examples Python files; none use the banned loaders, eval/exec, # nosec, and no dependency manifests were changed.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title is concise and matches the main change: adding DSpark support.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch haoguo/dpsark

Comment @coderabbitai help to get the list of available commands.

@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor
PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1849/

Built to branch gh-pages at 2026-06-29 06:53 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

h-guo18 added 2 commits June 29, 2026 06:42
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
@h-guo18 h-guo18 self-assigned this Jun 29, 2026
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
@codecov

codecov Bot commented Jun 29, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 76.75277% with 63 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.80%. Comparing base (f335459) to head (133f22f).
⚠️ Report is 20 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/torch/speculative/plugins/hf_dspark.py 70.05% 53 Missing ⚠️
...elopt/torch/speculative/plugins/modeling_dspark.py 87.65% 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1849      +/-   ##
==========================================
- Coverage   77.40%   76.80%   -0.61%     
==========================================
  Files         515      517       +2     
  Lines       57118    57389     +271     
==========================================
- Hits        44214    44075     -139     
- Misses      12904    13314     +410     
Flag Coverage Δ
examples 41.88% <15.86%> (-0.25%) ⬇️
gpu 57.63% <16.23%> (-0.85%) ⬇️
regression 14.85% <16.23%> (+0.07%) ⬆️
unit 55.02% <76.75%> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@h-guo18 h-guo18 marked this pull request as ready for review July 3, 2026 01:43
@h-guo18 h-guo18 requested review from a team as code owners July 3, 2026 01:43
@h-guo18 h-guo18 requested review from ChenhanYu and realAsma July 3, 2026 01:43
@h-guo18

h-guo18 commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author

/claude review

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 1

♻️ Duplicate comments (1)
modelopt/torch/speculative/plugins/hf_dspark.py (1)

1-51: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Same duplicate/inconsistent license header issue as modeling_dspark.py.

This file has the identical problem: a standalone Apache-2.0 header (lines 1-14, 2026) duplicated on top of the correctly combined "Apache-2.0 AND MIT" header (lines 37-50, 2024) that wraps the DeepSpec attribution (lines 16-35). The reference link also points to blob/main/...loss.py instead of a pinned commit hash.

As per coding guidelines: "The file which has code copied from another third-party GitHub repository should have the following in order: 1. A reference link (with commit hash)... 2. The original repository's Copyright/License. 3. The NVIDIA Apache 2.0 Copyright/License header," and copied files should be excluded from the insert-license pre-commit hook.

📄 Proposed fix
-# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Adapted from https://github.com/deepseek-ai/DeepSpec/blob/main/deepspec/modeling/dspark/loss.py
+# Adapted from https://github.com/deepseek-ai/DeepSpec/blob/<commit-sha>/deepspec/modeling/dspark/loss.py
 # Copyright (c) 2026 The DeepSpec Authors
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/speculative/plugins/hf_dspark.py` around lines 1 - 51, The
file has a duplicated and inconsistent license header, so keep only the
correctly ordered combined notice and remove the extra standalone Apache-2.0
block at the top. Ensure the DeepSpec attribution in this header uses a pinned
commit-hash reference instead of a branch-based link, and preserve the expected
order in this plugin file: reference link, original third-party
copyright/license, then the NVIDIA Apache-2.0 header. Also make sure this copied
file is excluded from the insert-license pre-commit hook so the header is not
rewritten again.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt/torch/speculative/plugins/modeling_dspark.py`:
- Around line 1-51: The file has duplicated and inconsistent license headers
because the generic NVIDIA Apache-2.0 header was auto-inserted on top of the
copied DeepSpec header. Update modeling_dspark.py to keep the required header
order for copied third-party code: pinned DeepSpec reference link, original
DeepSpec copyright/license text, then the NVIDIA Apache-2.0 header, and remove
the duplicate SPDX block. Also add this file to the insert-license hook exclude
list in .pre-commit-config.yaml so the header is not re-added, and reference the
existing DeepSpec attribution section and SPDX-License-Identifier lines when
editing.

---

Duplicate comments:
In `@modelopt/torch/speculative/plugins/hf_dspark.py`:
- Around line 1-51: The file has a duplicated and inconsistent license header,
so keep only the correctly ordered combined notice and remove the extra
standalone Apache-2.0 block at the top. Ensure the DeepSpec attribution in this
header uses a pinned commit-hash reference instead of a branch-based link, and
preserve the expected order in this plugin file: reference link, original
third-party copyright/license, then the NVIDIA Apache-2.0 header. Also make sure
this copied file is excluded from the insert-license pre-commit hook so the
header is not rewritten again.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d9c9750f-27de-43e0-90d1-d82c8fd1efb7

📥 Commits

Reviewing files that changed from the base of the PR and between f335459 and 133f22f.

📒 Files selected for processing (8)
  • modelopt/torch/export/plugins/hf_spec_export.py
  • modelopt/torch/speculative/config.py
  • modelopt/torch/speculative/dflash/conversion.py
  • modelopt/torch/speculative/plugins/__init__.py
  • modelopt/torch/speculative/plugins/hf_dspark.py
  • modelopt/torch/speculative/plugins/modeling_dspark.py
  • modelopt_recipes/general/speculative_decoding/dspark.yaml
  • tests/unit/torch/speculative/plugins/test_hf_dspark.py

Comment on lines +1 to +51
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Adapted from https://github.com/deepseek-ai/DeepSpec/blob/main/deepspec/modeling/dspark/markov_head.py
# Copyright (c) 2026 The DeepSpec Authors
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0 AND MIT
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Duplicate/inconsistent license headers — likely missing pre-commit exclude entry.

This file has two separate NVIDIA header blocks: a plain Apache-2.0 header (lines 1-14, copyright 2026) and a second combined "Apache-2.0 AND MIT" header (lines 37-50, copyright 2024) wrapping the DeepSpec attribution/MIT text (lines 16-35). The mismatched years and duplicated SPDX-License-Identifier lines indicate the insert-license pre-commit hook re-added a generic header on top because this file wasn't added to its exclude list. Also, the DeepSpec reference link points at blob/main/... rather than a pinned commit hash.

As per coding guidelines: "The file which has code copied from another third-party GitHub repository should have the following in order: 1. A reference link (with commit hash)... 2. The original repository's Copyright/License. 3. The NVIDIA Apache 2.0 Copyright/License header" and "Exclude copied files from the license pre-commit hook so it doesn't auto-add the NVIDIA Apache 2.0 license on top of the file."

📄 Proposed fix: remove duplicate header, pin commit hash
-# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Adapted from https://github.com/deepseek-ai/DeepSpec/blob/main/deepspec/modeling/dspark/markov_head.py
+# Adapted from https://github.com/deepseek-ai/DeepSpec/blob/<commit-sha>/deepspec/modeling/dspark/markov_head.py
 # Copyright (c) 2026 The DeepSpec Authors

Then add this file's path to the insert-license hook's exclude list in .pre-commit-config.yaml.

Run the following to confirm the exclude list and check for other files sharing this pattern:

#!/bin/bash
rg -n "exclude" .pre-commit-config.yaml -A5
rg -n "insert-license" .pre-commit-config.yaml -A15
rg -nP "SPDX-License-Identifier" modelopt/torch/speculative/plugins/modeling_dspark.py modelopt/torch/speculative/plugins/hf_dspark.py
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/speculative/plugins/modeling_dspark.py` around lines 1 - 51,
The file has duplicated and inconsistent license headers because the generic
NVIDIA Apache-2.0 header was auto-inserted on top of the copied DeepSpec header.
Update modeling_dspark.py to keep the required header order for copied
third-party code: pinned DeepSpec reference link, original DeepSpec
copyright/license text, then the NVIDIA Apache-2.0 header, and remove the
duplicate SPDX block. Also add this file to the insert-license hook exclude list
in .pre-commit-config.yaml so the header is not re-added, and reference the
existing DeepSpec attribution section and SPDX-License-Identifier lines when
editing.

Source: Coding guidelines



def _tvd_per_token(final_logits, teacher_logits, chunk_size=1024):
"""Total-variation distance ||softmax(a)-softmax(b)||_1 / ... per token, memory-lean.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SUGGESTION] _tvd_per_token is named/documented as the "total-variation distance" but actually returns the L1 distance ||softmax(a)-softmax(b)||_1, which is 2·TVD (TVD = ½·L1). The math downstream is internally consistent — l1_loss uses the raw L1, and accept_rate = 1 - 0.5*l1_per_token correctly yields 1 - TVD — but the naming is a latent footgun: a future maintainer who "corrects" this to return an actual TVD (0.5 * ||·||_1) would silently break the accept_rate target (→ 1 - 0.25·L1) and halve the confidence supervision signal.

Suggest either renaming to _l1_per_token (documenting the return as L1 = 2·TVD), or keeping the name but returning 0.5 * ... and updating the accept_rate/l1_loss call sites. The former is the smaller, safer change.

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude review passed — no blocking issues found. LGTM

SCOPE
PR is additive: introduces DSpark as a third head in the DFlash family, selected via dflash_architecture_config.projector_type=dspark, reusing the existing DFlash mode/pipeline/registry. 8 files changed. I reviewed all of the PRs own files (plugins/hf_dspark.py, plugins/modeling_dspark.py, config.py, dflash/conversion.py, plugins/init.py, export/plugins/hf_spec_export.py, the dspark.yaml recipe, and the unit test) and cross-checked the DFlash parent (hf_dflash.py) for the integration/shape contracts. Note: a two-dot diff against origin/main surfaces ~20 commits of unrelated churn because the branch is behind main; those files are not part of this PR and were not reviewed.

FINDINGS

  • CRITICAL: 0
  • IMPORTANT: 0
  • SUGGESTION: 1 — _tvd_per_token returns L1 (= 2*TVD) despite its total-variation-distance name; internally consistent today but a rename footgun. Non-blocking.

ASSESSMENT
Low risk. The algorithm is coherent end-to-end:

  • Alignment is consistent between training and generation — shift_label next-token alignment (position k conditions on the prev token at anchor+k, predicts anchor+k+1; position 0s predecessor is the anchor) matches across _apply_markov_head/_compute_dspark_loss and pseudo_speculative_generate.
  • Confidence target 1 - 0.5*L1 = 1 - TVD is the correct speculative-sampling acceptance probability.
  • DDP/grad flow is handled (dummy loss flows through all draft params; recipe sets ddp_find_unused_parameters=true for the disabled-confidence-head case), and confidence_head_alpha>0 without a built head is validated in modify.
  • Mode/registry: own DSparkDMRegistry avoids clobbering HFDFlashModel; convert_to_dflash_model routes correctly and the error message lists the new option.
  • Export: markov head weights ride the inherited dflash_module. stripping; extra loader config fields added, mirroring DominoExporter.
  • Backward compat: new dflash_*_alpha config fields are optional with defaults and inert unless projector_type==dspark; DFlash/EAGLE/Domino paths untouched.

The single SUGGESTION is a naming/maintainability nit and does not block.



def _tvd_per_token(final_logits, teacher_logits, chunk_size=1024):
"""Total-variation distance ||softmax(a)-softmax(b)||_1 / ... per token, memory-lean.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SUGGESTION] _tvd_per_token is named/documented as the "total-variation distance" but actually returns the L1 distance ||softmax(a)-softmax(b)||_1, which is 2·TVD (TVD = ½·L1). The math downstream is internally consistent — l1_loss uses the raw L1, and accept_rate = 1 - 0.5*l1_per_token correctly yields 1 - TVD — but the naming is a latent footgun: a future maintainer who "corrects" this function to return an actual TVD (0.5 * ||·||_1) would silently break the accept_rate target (it would become 1 - 0.25·L1) and halve the confidence supervision signal.

Suggest either renaming to _l1_per_token (and describing the return as the L1 distance = 2·TVD), or keeping the name but returning 0.5 * ... and updating the accept_rate/l1_loss call sites accordingly. The former is the smaller, safer change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants