Puzzletron tutorial fixes for runtime optimization by grzegorz-k-karch · Pull Request #1803 · NVIDIA/Model-Optimizer

grzegorz-k-karch · 2026-06-23T09:31:49Z

What does this PR do?

Type of change: Bug fix

Fixes several issues in the runtime optimization tutorial:

- Solved the OOM error by reducing GPU memory utilization.
- Correctly export the AnyModel config for vLLM: the config is now a namespace instead of a dict, so it is read correctly.
- Fixed the "validate_model_defaults not found" error: runtime optimization now has its own config files instead of reusing the memory optimization files.
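
The namespace-versus-dict fix comes down to attribute access versus key access. The following is a minimal sketch of that difference, not the PR's code: the field names (`hidden_size`, `ffn`, `intermediate_size`) and the `read_hidden_size` helper are hypothetical, and it assumes the consumer reads config fields as attributes.

```python
import json
from types import SimpleNamespace

# Illustrative config; these field names are assumptions, not taken from the PR.
raw = '{"hidden_size": 4096, "ffn": {"intermediate_size": 14336}}'

as_dict = json.loads(raw)
# object_hook runs on every decoded JSON object, so nested objects
# become namespaces as well.
as_namespace = json.loads(raw, object_hook=lambda d: SimpleNamespace(**d))


def read_hidden_size(config) -> int:
    """Hypothetical consumer that uses attribute access."""
    return config.hidden_size


print(read_hidden_size(as_namespace))
print(as_namespace.ffn.intermediate_size)

try:
    read_hidden_size(as_dict)
except AttributeError as err:
    # A plain dict only supports key access, e.g. as_dict["hidden_size"].
    print(f"dict fails: {err}")
```

If the reader expects attributes, handing it a dict fails at the first field lookup, which is consistent with the symptom described above.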

Usage

(does not apply)

Testing

Tested by running the whole pipeline as described in the tutorial.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: N/A
Did you update Changelog?: ❌
Did you get Claude approval on this PR?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

New Features
- Added runtime pruning presets for attention heads, FFN channels, and hidden dimensions.
- Added/updated default validation and solution-validation configs for the Llama 3.1 8B pruning workflow.
- Added support for converting model configs to a vLLM-compatible “AnyModel” format and capping GPU memory usage during latency benchmarks.
Bug Fixes
- Updated pruning/validation presets to use the new validation-based configuration flow.
- Reduced scoring evaluation samples for faster runs.

Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>

copy-pr-bot · 2026-06-23T09:31:53Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-06-23T09:31:58Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 52b0eff7-d1c9-470f-abb1-75d9ccc9f5b2

📥 Commits

Reviewing files that changed from the base of the PR and between bfb3619 and 32bd535.

📒 Files selected for processing (1)

modelopt/torch/puzzletron/subblock_stats/runtime_utils.py

🚧 Files skipped from review as they are similar to previous changes (1)

modelopt/torch/puzzletron/subblock_stats/runtime_utils.py

📝 Walkthrough

Walkthrough

Adds vLLM GPU memory utilization support and config conversion helpers, and introduces new Llama-3.1-8B pruneffn runtime validation and pruning YAML defaults.

Changes

vLLM GPU Memory Utilization Support

| Layer / File(s) | Summary |
|---|---|
| RuntimeConfig field and calc_runtime_stats wiring: `modelopt/torch/puzzletron/subblock_stats/runtime_utils.py`, `modelopt/torch/puzzletron/subblock_stats/calc_runtime_stats.py` | Adds `gpu_memory_utilization` to `RuntimeConfig` and passes the value from `runtime_stats_config` into construction. |
| convert_config_to_vllm_anymodel helper and runtime_vllm.py update: `modelopt/torch/puzzletron/subblock_stats/runtime_utils.py`, `modelopt/torch/puzzletron/subblock_stats/runtime_vllm.py` | Adds config.json-to-AnyModel conversion, updates config serialization, and passes `--gpu-memory-utilization` to the vLLM benchmark command. |

Llama-3.1-8B pruneffn Runtime YAML Configs

| Layer / File(s) | Summary |
|---|---|
| Validation model and solution defaults: `examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/validate_model_defaults.yaml`, `examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/validate_solutions_defaults.yaml` | Adds validation runtime defaults and solution-validation controls. |
| Pruning defaults base config: `examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/pruning_defaults.yaml` | Adds shared pruning defaults and pruning mode configuration. |
| Attention, FFN, and hidden-dim pruning strategy configs: `examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/attn_pruning.yaml`, `examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/ffn_pruning.yaml`, `examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/hidden_dim_pruning.yaml` | Adds three pruning strategy configs for attention heads, FFN channels, and hidden dimensions. |
| Top-level Llama-3_1-8B.yaml defaults update: `examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/Llama-3_1-8B.yaml` | Rewires defaults to the new validation defaults and lowers scoring eval samples. |

Estimated code review effort: 2 (Simple) | ~12 minutes

Sequence Diagram(s)

sequenceDiagram
  participant calc_runtime_for_subblocks
  participant RuntimeConfig
  participant run_vllm_latency_benchmark
  participant convert_config_to_vllm_anymodel
  participant vLLM subprocess

  calc_runtime_for_subblocks->>RuntimeConfig: construct with gpu_memory_utilization=0.5
  run_vllm_latency_benchmark->>convert_config_to_vllm_anymodel: load config.json and rewrite AnyModel config
  run_vllm_latency_benchmark->>vLLM subprocess: pass --gpu-memory-utilization
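
The flow in the diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `runtime_utils.py` or `runtime_vllm.py`: the `model_path` field, the `build_benchmark_command` helper, and the `vllm bench latency` subcommand are hypothetical choices. Only the `gpu_memory_utilization` field, its value of 0.5, and the `--gpu-memory-utilization` flag come from the walkthrough.

```python
from dataclasses import dataclass


@dataclass
class RuntimeConfig:
    """Hypothetical stand-in for the real RuntimeConfig, which has more fields."""

    model_path: str  # assumed field name, for illustration only
    gpu_memory_utilization: float = 0.5  # value shown in the sequence diagram


def build_benchmark_command(cfg: RuntimeConfig) -> list[str]:
    """Build an argv list for a vLLM latency benchmark (assumed subcommand)."""
    # vLLM treats this value as a fraction of GPU memory, so reject
    # anything outside (0, 1] before launching a subprocess.
    if not 0.0 < cfg.gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    return [
        "vllm", "bench", "latency",
        "--model", cfg.model_path,
        "--gpu-memory-utilization", str(cfg.gpu_memory_utilization),
    ]


cmd = build_benchmark_command(RuntimeConfig(model_path="/path/to/model"))
print(" ".join(cmd))
```

Capping the fraction leaves headroom for other processes on the same GPU, which is the likely reason a lower value avoids the OOM mentioned in the PR description.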

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title is concise and clearly relates to the PR’s runtime optimization fixes, though it is broader than the specific config and vLLM changes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | The PR diff adds no forbidden patterns; added Python lines contain no torch.load/numpy.load/trust_remote_code/eval/exec/nosec usage. |


github-actions · 2026-06-23T09:36:16Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1803/
Built to branch `gh-pages` at 2026-07-03 08:37 UTC. Preview will be ready when the GitHub Pages deployment is complete.

codecov · 2026-06-23T09:41:34Z

Codecov Report

❌ Patch coverage is 0% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.54%. Comparing base (9038b71) to head (32bd535).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...t/torch/puzzletron/subblock_stats/runtime_utils.py | 0.00% | 20 Missing ⚠️ |
| ...pt/torch/puzzletron/subblock_stats/runtime_vllm.py | 0.00% | 3 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1803      +/-   ##
==========================================
- Coverage   70.21%   62.54%   -7.68%     
==========================================
  Files         515      516       +1     
  Lines       57244    57511     +267     
==========================================
- Hits        40196    35970    -4226     
- Misses      17048    21541    +4493

Flag	Coverage Δ
unit	`54.89% <0.00%> (-0.01%)`	⬇️


coderabbitai

🧹 Nitpick comments (1)

modelopt/torch/puzzletron/subblock_stats/runtime_utils.py (1)
91-114: 📐 Maintainability & Code Quality | 🔵 Trivial

Add return type annotation and document (or parameterize) hardcoded Llama architecture assumption.

Return type hint: Add -> None to the function signature (line 93). The function has no explicit return statement.

Hardcoded base_architecture: Line 107 unconditionally sets base_architecture = "LlamaForCausalLM". This module is Llama-specific (imports LlamaForCausalLM and LlamaModelDescriptor), so the hardcoding appears intentional. Either add a docstring note clarifying this function is Llama-specific, or accept base_architecture as a parameter if broader model support is planned.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/puzzletron/subblock_stats/runtime_utils.py` around lines 91 -
114, The function convert_config_to_vllm_anymodel is missing a return type
annotation and has a hardcoded assumption about the model architecture. First,
add the return type hint -> None to the function signature since the function
does not explicitly return any value. Second, address the hardcoded
base_architecture assignment that unconditionally sets it to "LlamaForCausalLM".
Either add documentation in the function's docstring to clarify that this
function is Llama-specific and explain why the architecture is hardcoded, or
alternatively, parameterize the base_architecture by accepting it as an optional
function parameter with a default value to allow for broader model support in
the future.
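
As an illustration of the second option in the review comment, the sketch below adds the `-> None` annotation and turns the architecture into a parameter with a Llama default. It is a hypothetical reconstruction: the real function's signature and the keys it writes are not shown on this page, so the `model_dir` parameter and the `architectures` and `base_architecture` JSON keys are assumptions.

```python
import json
from pathlib import Path


def convert_config_to_vllm_anymodel(
    model_dir: Path,
    base_architecture: str = "LlamaForCausalLM",
) -> None:
    """Rewrite config.json in place for vLLM's AnyModel path (sketch).

    The default keeps the current Llama-only behaviour; callers with
    other model families can pass a different base_architecture.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text())

    # Assumed keys: the real helper may store these under different names.
    config["architectures"] = ["AnyModel"]
    config["base_architecture"] = base_architecture

    config_path.write_text(json.dumps(config, indent=2))
```

Keeping the default means existing call sites need no change, while the parameter removes the need to edit the function when support for other models is added.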


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c26ffda4-6f74-455d-a721-7a5ed0be45e2

📥 Commits

Reviewing files that changed from the base of the PR and between c3b913b and 8b02d7a.

📒 Files selected for processing (10)

examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/Llama-3_1-8B.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/attn_pruning.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/ffn_pruning.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/hidden_dim_pruning.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/pruning_defaults.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/validate_model_defaults.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/validate_solutions_defaults.yaml
modelopt/torch/puzzletron/subblock_stats/calc_runtime_stats.py
modelopt/torch/puzzletron/subblock_stats/runtime_utils.py
modelopt/torch/puzzletron/subblock_stats/runtime_vllm.py


grzegorz-k-karch added 2 commits June 15, 2026 12:09

adding some fixes for puzzletron/runtime tutorial

a724afd

Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>

Merge branch 'main' into gkarch/puzzletron-tutorial-fixes

8b02d7a

grzegorz-k-karch self-assigned this Jun 23, 2026

grzegorz-k-karch marked this pull request as ready for review June 23, 2026 09:33

grzegorz-k-karch requested a review from a team as a code owner June 23, 2026 09:33

coderabbitai Bot reviewed Jun 23, 2026

coderabbitai Bot approved these changes Jun 23, 2026

validation->valid in llama 3.1 config

ff55eee

Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>

kevalmorabia97 reviewed Jun 23, 2026

Comment thread examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/validate_model_defaults.yaml

kevalmorabia97 reviewed Jun 23, 2026

Comment thread modelopt/torch/puzzletron/subblock_stats/runtime_utils.py Outdated

grzegorz-k-karch and others added 2 commits June 25, 2026 13:09

reverting unnecessary change

b86f13c

Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>

Merge branch 'main' into gkarch/puzzletron-tutorial-fixes

b622ab8

grzegorz-k-karch requested a review from a team July 2, 2026 14:04

kevalmorabia97 reviewed Jul 2, 2026

Comment thread examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/validate_model_defaults.yaml Outdated

Rename validation dataset from 'valid' to 'validation'

0688d00

Signed-off-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com>

kevalmorabia97 approved these changes Jul 2, 2026

grzegorz-k-karch and others added 2 commits July 2, 2026 17:11

Add TODO for extending model support in runtime_utils

bfb3619

Added a TODO comment to extend support for other models. Signed-off-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com>

ruff fix

32bd535

Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>

grzegorz-k-karch wants to merge 8 commits into main from gkarch/puzzletron-tutorial-fixes

2 participants
