specdec_bench modelopt release by talorabr · Pull Request #1886 · NVIDIA/Model-Optimizer

talorabr · 2026-07-02T11:42:26Z

What does this PR do?

Type of change: new features

New release of specdec_bench including the following features:

Agentic data split support for SPEED-Bench.
Experiment sweeps in specdec_bench.
Enhancements for error handling to enable fast failing.
Support for new specdec methods: DFLASH, PARD.

Usage

Example on how to use the agentic split:

python3 run.py       --model_dir path/to/NVIDIA-Nemotron-3-Super-120B-A12B-BF16       --tokenizer path/to/NVIDIA-Nemotron-3-Super-120B-A12B-BF16       --dataset speed       --dataset_path /path/to/SPEED-Bench/agentic/ --agentic --agentic_skip_turns_delta 4         --engine VLLM       --speculative_algorithm MTP       --output_length 2048       --draft_length 7       --tp_size 8       --concurrency 32       --show_progress

Testing

Tested the following scenarios:

Nemotron Super MTP with VLLM on SPEED-Bench (qualitative, throughput 1k, agentic)
Llama 3.1 8B with external draft (Llama 3.2 1B) on SPEED-Bench with VLLM (qualitative, throughput 1k)
Nemotron Super MTP with SGLANG on SPEED-Bench (agentic)

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ❌ This change is backward compatible in terms of the entrypoint run.py but is not backward compatible in term of the internal APIs in specdec_bench including changes to the metrics interfaces.
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: ❌
Did you update Changelog?: ❌
Did you get Claude approval on this PR?: ✅ / ❌ / N/A

Additional Information

Signed-off-by: talora <talora@nvidia.com>

talorabr · 2026-07-02T11:43:40Z

/claude review

coderabbitai · 2026-07-02T11:43:44Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4156b955-597d-43e1-9e41-9bbf348c071b

📥 Commits

Reviewing files that changed from the base of the PR and between 9038b71 and 85e9227.

📒 Files selected for processing (31)

examples/specdec_bench/README.md
examples/specdec_bench/SPECBENCH_PORTING.md
examples/specdec_bench/prepare_data.py
examples/specdec_bench/run.py
examples/specdec_bench/specdec_bench/datasets/__init__.py
examples/specdec_bench/specdec_bench/datasets/agentic_speed.py
examples/specdec_bench/specdec_bench/datasets/base.py
examples/specdec_bench/specdec_bench/datasets/mtbench.py
examples/specdec_bench/specdec_bench/datasets/specbench.py
examples/specdec_bench/specdec_bench/datasets/speed.py
examples/specdec_bench/specdec_bench/metrics/__init__.py
examples/specdec_bench/specdec_bench/metrics/acceptance_rate.py
examples/specdec_bench/specdec_bench/metrics/agentic.py
examples/specdec_bench/specdec_bench/metrics/base.py
examples/specdec_bench/specdec_bench/metrics/mtbench.py
examples/specdec_bench/specdec_bench/metrics/specbench.py
examples/specdec_bench/specdec_bench/metrics/thinking_acceptance.py
examples/specdec_bench/specdec_bench/metrics/timing.py
examples/specdec_bench/specdec_bench/models/__init__.py
examples/specdec_bench/specdec_bench/models/auto_deploy.py
examples/specdec_bench/specdec_bench/models/base.py
examples/specdec_bench/specdec_bench/models/sglang.py
examples/specdec_bench/specdec_bench/models/specbench_medusa.py
examples/specdec_bench/specdec_bench/models/trtllm_torch_api.py
examples/specdec_bench/specdec_bench/models/vllm.py
examples/specdec_bench/specdec_bench/run_utils.py
examples/specdec_bench/specdec_bench/runners/__init__.py
examples/specdec_bench/specdec_bench/runners/base.py
examples/specdec_bench/specdec_bench/runners/simple.py
examples/specdec_bench/specdec_bench/utils.py
examples/specdec_bench/sweep_example.yaml

👮 Files not reviewed due to content moderation or server errors (31)

examples/specdec_bench/prepare_data.py
examples/specdec_bench/specdec_bench/datasets/init.py
examples/specdec_bench/specdec_bench/datasets/agentic_speed.py
examples/specdec_bench/specdec_bench/datasets/mtbench.py
examples/specdec_bench/specdec_bench/datasets/base.py
examples/specdec_bench/README.md
examples/specdec_bench/specdec_bench/datasets/speed.py
examples/specdec_bench/specdec_bench/datasets/specbench.py
examples/specdec_bench/run.py
examples/specdec_bench/specdec_bench/metrics/mtbench.py
examples/specdec_bench/sweep_example.yaml
examples/specdec_bench/specdec_bench/utils.py
examples/specdec_bench/specdec_bench/run_utils.py
examples/specdec_bench/specdec_bench/metrics/agentic.py
examples/specdec_bench/SPECBENCH_PORTING.md
examples/specdec_bench/specdec_bench/models/specbench_medusa.py
examples/specdec_bench/specdec_bench/models/base.py
examples/specdec_bench/specdec_bench/models/trtllm_torch_api.py
examples/specdec_bench/specdec_bench/models/sglang.py
examples/specdec_bench/specdec_bench/models/auto_deploy.py
examples/specdec_bench/specdec_bench/runners/simple.py
examples/specdec_bench/specdec_bench/models/init.py
examples/specdec_bench/specdec_bench/models/vllm.py
examples/specdec_bench/specdec_bench/metrics/timing.py
examples/specdec_bench/specdec_bench/runners/base.py
examples/specdec_bench/specdec_bench/metrics/specbench.py
examples/specdec_bench/specdec_bench/runners/init.py
examples/specdec_bench/specdec_bench/metrics/base.py
examples/specdec_bench/specdec_bench/metrics/acceptance_rate.py
examples/specdec_bench/specdec_bench/metrics/thinking_acceptance.py
examples/specdec_bench/specdec_bench/metrics/init.py

📝 Walkthrough

[!WARNING]

Walkthrough skipped

File diffs could not be summarized.

🚥 Pre-merge checks | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Security Anti-Patterns	❓ Inconclusive	Custom check execution failed before a final verdict was produced.	Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch talora/specdec-bench-july-26

_{Comment @coderabbitai help to get the list of available commands.}

codecov · 2026-07-02T11:53:49Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.12%. Comparing base (9038b71) to head (85e9227).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1886      +/-   ##
==========================================
+ Coverage   70.21%   75.12%   +4.90%     
==========================================
  Files         515      515              
  Lines       57244    57244              
==========================================
+ Hits        40196    43002    +2806     
+ Misses      17048    14242    -2806

Flag	Coverage Δ
examples	`42.97% <ø> (+10.10%)`	⬆️
unit	`54.89% <ø> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

claude

Claude review — 2 CRITICAL, 0 IMPORTANT, 0 SUGGESTION.

Scope: examples-only PR (31 files, all under examples/specdec_bench/; no modelopt/ changes). Reviewed the high-risk source files: run.py, run_utils.py, all four engine wrappers (sglang.py, trtllm_torch_api.py, vllm.py, auto_deploy.py), models/base.py, runners, the metrics, and the dataset changes. Did not deeply review the README/markdown or sweep_example.yaml (docs/config).

Most impactful findings:

[CRITICAL] SGLANG engine constructor raises TypeError on every run (models/sglang.py:43). speculative_algorithm is read via .get() so it stays in kwargs, then engine_kwargs = dict(..., speculative_algorithm=speculative_algorithm, ..., **kwargs) passes the keyword twice, producing a duplicate-keyword TypeError. It also shadows the remapped value (e.g. MTP to EAGLE) and leaks parallel_drafting (always sent by run.py) into sgl.Engine. Needs .pop() plus handling of parallel_drafting.
[CRITICAL] AutoDeploy output-format regression (models/auto_deploy.py:69). Unlike the collector-based engines, AutoDeploy still returns nested [beam][chunk][token] as output_ids with no chunk_lengths. run.py calls decode_chat(tokenizer, output_ids[0]) on a list-of-lists via tokenizer.decode, which fails/garbles for the AUTO_DEPLOY path. Convert it to CumulativeTokenCollector or flatten and pass chunk_lengths.

Risk assessment: Medium-high. The refactor to the record-based metrics pipeline and CumulativeTokenCollector is coherent, and the sweep / error-handling additions look solid, but the two engine wrappers above are broken on their primary code paths (SGLANG unconditionally, AutoDeploy on decode). Both are self-contained fixes. The metric-interface change is an acknowledged intentional internal-API break, and the run.py entrypoint stays backward compatible. Recommend addressing the two CRITICALs before merge.

claude · 2026-07-02T12:00:19Z

-        output_dict["output_logits"] = None
-        output_dict["token_times"] = timing
-        return output_dict
+        return build_model_output(reformatted_output_ids, timing)


[CRITICAL Algorithm] Output-format contract is inconsistent with the other engines. The collector-based models (TRTLLM/vLLM/SGLANG) now return output_ids as a flat per-beam token list (collector.output_ids()), with chunking carried separately in chunk_lengths. Here AutoDeploy still returns the nested reformatted_output_ids ([beam][chunk][token]) as output_ids and passes no chunk_lengths.

run.py then calls decode_chat(tokenizer, output_tokens["output_ids"][0]), i.e. tokenizer.decode(...) on beam 0. For the collector models that's a flat list[int] (works); for AutoDeploy it's a list[list[int]] (list of chunks), which tokenizer.decode cannot handle — the AUTO_DEPLOY engine path produces a decode error / garbage output. (Metrics happen to survive because Metric.chunk_lengths() detects the nested shape, but decoding does not.)

Suggest converting AutoDeploy to CumulativeTokenCollector like the other engines, or flattening tokens per beam and passing the per-chunk lengths via build_model_output(flat_output_ids, timing, chunk_lengths=...).

specdec_bench modelopt release

85e9227

Signed-off-by: talora <talora@nvidia.com>

talorabr requested a review from a team as a code owner July 2, 2026 11:42

coderabbitai Bot approved these changes Jul 2, 2026

View reviewed changes

claude Bot reviewed Jul 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

specdec_bench modelopt release#1886

specdec_bench modelopt release#1886
talorabr wants to merge 1 commit into
mainfrom
talora/specdec-bench-july-26

talorabr commented Jul 2, 2026 •

edited

Loading

Uh oh!

talorabr commented Jul 2, 2026

Uh oh!

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading

Walkthrough skipped

❌ Failed checks (1 inconclusive)

Uh oh!

codecov Bot commented Jul 2, 2026 •

edited

Loading

Uh oh!

claude Bot left a comment

Uh oh!

claude Bot Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

talorabr commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Usage

Testing

Before your PR is "Ready for review"

Additional Information

Uh oh!

talorabr commented Jul 2, 2026

Uh oh!

coderabbitai Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough skipped

❌ Failed checks (1 inconclusive)

Uh oh!

codecov Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Uh oh!

claude Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

talorabr commented Jul 2, 2026 •

edited

Loading

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading

codecov Bot commented Jul 2, 2026 •

edited

Loading