Skip to content

specdec_bench modelopt release#1886

Open
talorabr wants to merge 1 commit into
mainfrom
talora/specdec-bench-july-26
Open

specdec_bench modelopt release#1886
talorabr wants to merge 1 commit into
mainfrom
talora/specdec-bench-july-26

Conversation

@talorabr

@talorabr talorabr commented Jul 2, 2026

Copy link
Copy Markdown

What does this PR do?

Type of change: new features

New release of specdec_bench including the following features:

  • Agentic data split support for SPEED-Bench.
  • Experiment sweeps in specdec_bench.
  • Enhancements for error handling to enable fast failing.
  • Support for new specdec methods: DFLASH, PARD.

Usage

Example on how to use the agentic split:

python3 run.py       --model_dir path/to/NVIDIA-Nemotron-3-Super-120B-A12B-BF16       --tokenizer path/to/NVIDIA-Nemotron-3-Super-120B-A12B-BF16       --dataset speed       --dataset_path /path/to/SPEED-Bench/agentic/ --agentic --agentic_skip_turns_delta 4         --engine VLLM       --speculative_algorithm MTP       --output_length 2048       --draft_length 7       --tp_size 8       --concurrency 32       --show_progress

Testing

Tested the following scenarios:

  • Nemotron Super MTP with VLLM on SPEED-Bench (qualitative, throughput 1k, agentic)
  • Llama 3.1 8B with external draft (Llama 3.2 1B) on SPEED-Bench with VLLM (qualitative, throughput 1k)
  • Nemotron Super MTP with SGLANG on SPEED-Bench (agentic)

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ❌ This change is backward compatible in terms of the entrypoint run.py but is not backward compatible in term of the internal APIs in specdec_bench including changes to the metrics interfaces.
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ❌
  • Did you update Changelog?: ❌
  • Did you get Claude approval on this PR?: ✅ / ❌ / N/A

Additional Information

Signed-off-by: talora <talora@nvidia.com>
@talorabr talorabr requested a review from a team as a code owner July 2, 2026 11:42
@talorabr

talorabr commented Jul 2, 2026

Copy link
Copy Markdown
Author

/claude review

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4156b955-597d-43e1-9e41-9bbf348c071b

📥 Commits

Reviewing files that changed from the base of the PR and between 9038b71 and 85e9227.

📒 Files selected for processing (31)
  • examples/specdec_bench/README.md
  • examples/specdec_bench/SPECBENCH_PORTING.md
  • examples/specdec_bench/prepare_data.py
  • examples/specdec_bench/run.py
  • examples/specdec_bench/specdec_bench/datasets/__init__.py
  • examples/specdec_bench/specdec_bench/datasets/agentic_speed.py
  • examples/specdec_bench/specdec_bench/datasets/base.py
  • examples/specdec_bench/specdec_bench/datasets/mtbench.py
  • examples/specdec_bench/specdec_bench/datasets/specbench.py
  • examples/specdec_bench/specdec_bench/datasets/speed.py
  • examples/specdec_bench/specdec_bench/metrics/__init__.py
  • examples/specdec_bench/specdec_bench/metrics/acceptance_rate.py
  • examples/specdec_bench/specdec_bench/metrics/agentic.py
  • examples/specdec_bench/specdec_bench/metrics/base.py
  • examples/specdec_bench/specdec_bench/metrics/mtbench.py
  • examples/specdec_bench/specdec_bench/metrics/specbench.py
  • examples/specdec_bench/specdec_bench/metrics/thinking_acceptance.py
  • examples/specdec_bench/specdec_bench/metrics/timing.py
  • examples/specdec_bench/specdec_bench/models/__init__.py
  • examples/specdec_bench/specdec_bench/models/auto_deploy.py
  • examples/specdec_bench/specdec_bench/models/base.py
  • examples/specdec_bench/specdec_bench/models/sglang.py
  • examples/specdec_bench/specdec_bench/models/specbench_medusa.py
  • examples/specdec_bench/specdec_bench/models/trtllm_torch_api.py
  • examples/specdec_bench/specdec_bench/models/vllm.py
  • examples/specdec_bench/specdec_bench/run_utils.py
  • examples/specdec_bench/specdec_bench/runners/__init__.py
  • examples/specdec_bench/specdec_bench/runners/base.py
  • examples/specdec_bench/specdec_bench/runners/simple.py
  • examples/specdec_bench/specdec_bench/utils.py
  • examples/specdec_bench/sweep_example.yaml
👮 Files not reviewed due to content moderation or server errors (31)
  • examples/specdec_bench/prepare_data.py
  • examples/specdec_bench/specdec_bench/datasets/init.py
  • examples/specdec_bench/specdec_bench/datasets/agentic_speed.py
  • examples/specdec_bench/specdec_bench/datasets/mtbench.py
  • examples/specdec_bench/specdec_bench/datasets/base.py
  • examples/specdec_bench/README.md
  • examples/specdec_bench/specdec_bench/datasets/speed.py
  • examples/specdec_bench/specdec_bench/datasets/specbench.py
  • examples/specdec_bench/run.py
  • examples/specdec_bench/specdec_bench/metrics/mtbench.py
  • examples/specdec_bench/sweep_example.yaml
  • examples/specdec_bench/specdec_bench/utils.py
  • examples/specdec_bench/specdec_bench/run_utils.py
  • examples/specdec_bench/specdec_bench/metrics/agentic.py
  • examples/specdec_bench/SPECBENCH_PORTING.md
  • examples/specdec_bench/specdec_bench/models/specbench_medusa.py
  • examples/specdec_bench/specdec_bench/models/base.py
  • examples/specdec_bench/specdec_bench/models/trtllm_torch_api.py
  • examples/specdec_bench/specdec_bench/models/sglang.py
  • examples/specdec_bench/specdec_bench/models/auto_deploy.py
  • examples/specdec_bench/specdec_bench/runners/simple.py
  • examples/specdec_bench/specdec_bench/models/init.py
  • examples/specdec_bench/specdec_bench/models/vllm.py
  • examples/specdec_bench/specdec_bench/metrics/timing.py
  • examples/specdec_bench/specdec_bench/runners/base.py
  • examples/specdec_bench/specdec_bench/metrics/specbench.py
  • examples/specdec_bench/specdec_bench/runners/init.py
  • examples/specdec_bench/specdec_bench/metrics/base.py
  • examples/specdec_bench/specdec_bench/metrics/acceptance_rate.py
  • examples/specdec_bench/specdec_bench/metrics/thinking_acceptance.py
  • examples/specdec_bench/specdec_bench/metrics/init.py

📝 Walkthrough

[!WARNING]

Walkthrough skipped

File diffs could not be summarized.

🚥 Pre-merge checks | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Security Anti-Patterns ❓ Inconclusive Custom check execution failed before a final verdict was produced. Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch talora/specdec-bench-july-26

Comment @coderabbitai help to get the list of available commands.

@codecov

codecov Bot commented Jul 2, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.12%. Comparing base (9038b71) to head (85e9227).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1886      +/-   ##
==========================================
+ Coverage   70.21%   75.12%   +4.90%     
==========================================
  Files         515      515              
  Lines       57244    57244              
==========================================
+ Hits        40196    43002    +2806     
+ Misses      17048    14242    -2806     
Flag Coverage Δ
examples 42.97% <ø> (+10.10%) ⬆️
unit 54.89% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude review — 2 CRITICAL, 0 IMPORTANT, 0 SUGGESTION.

Scope: examples-only PR (31 files, all under examples/specdec_bench/; no modelopt/ changes). Reviewed the high-risk source files: run.py, run_utils.py, all four engine wrappers (sglang.py, trtllm_torch_api.py, vllm.py, auto_deploy.py), models/base.py, runners, the metrics, and the dataset changes. Did not deeply review the README/markdown or sweep_example.yaml (docs/config).

Most impactful findings:

  1. [CRITICAL] SGLANG engine constructor raises TypeError on every run (models/sglang.py:43). speculative_algorithm is read via .get() so it stays in kwargs, then engine_kwargs = dict(..., speculative_algorithm=speculative_algorithm, ..., **kwargs) passes the keyword twice, producing a duplicate-keyword TypeError. It also shadows the remapped value (e.g. MTP to EAGLE) and leaks parallel_drafting (always sent by run.py) into sgl.Engine. Needs .pop() plus handling of parallel_drafting.

  2. [CRITICAL] AutoDeploy output-format regression (models/auto_deploy.py:69). Unlike the collector-based engines, AutoDeploy still returns nested [beam][chunk][token] as output_ids with no chunk_lengths. run.py calls decode_chat(tokenizer, output_ids[0]) on a list-of-lists via tokenizer.decode, which fails/garbles for the AUTO_DEPLOY path. Convert it to CumulativeTokenCollector or flatten and pass chunk_lengths.

Risk assessment: Medium-high. The refactor to the record-based metrics pipeline and CumulativeTokenCollector is coherent, and the sweep / error-handling additions look solid, but the two engine wrappers above are broken on their primary code paths (SGLANG unconditionally, AutoDeploy on decode). Both are self-contained fixes. The metric-interface change is an acknowledged intentional internal-API break, and the run.py entrypoint stays backward compatible. Recommend addressing the two CRITICALs before merge.

output_dict["output_logits"] = None
output_dict["token_times"] = timing
return output_dict
return build_model_output(reformatted_output_ids, timing)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CRITICAL Algorithm] Output-format contract is inconsistent with the other engines. The collector-based models (TRTLLM/vLLM/SGLANG) now return output_ids as a flat per-beam token list (collector.output_ids()), with chunking carried separately in chunk_lengths. Here AutoDeploy still returns the nested reformatted_output_ids ([beam][chunk][token]) as output_ids and passes no chunk_lengths.

run.py then calls decode_chat(tokenizer, output_tokens["output_ids"][0]), i.e. tokenizer.decode(...) on beam 0. For the collector models that's a flat list[int] (works); for AutoDeploy it's a list[list[int]] (list of chunks), which tokenizer.decode cannot handle — the AUTO_DEPLOY engine path produces a decode error / garbage output. (Metrics happen to survive because Metric.chunk_lengths() detects the nested shape, but decoding does not.)

Suggest converting AutoDeploy to CumulativeTokenCollector like the other engines, or flattening tokens per beam and passing the per-chunk lengths via build_model_output(flat_output_ids, timing, chunk_lengths=...).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant