Puzzletron tutorial fixes for runtime optimization#1803
Puzzletron tutorial fixes for runtime optimization#1803grzegorz-k-karch wants to merge 8 commits into
Conversation
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds vLLM GPU memory utilization support and config conversion helpers, and introduces new Llama-3.1-8B pruneffn runtime validation and pruning YAML defaults. ChangesvLLM GPU Memory Utilization Support
Llama-3.1-8B pruneffn Runtime YAML Configs
Estimated code review effort: 2 (Simple) | ~12 minutes Sequence Diagram(s)sequenceDiagram
participant calc_runtime_for_subblocks
participant RuntimeConfig
participant run_vllm_latency_benchmark
participant convert_config_to_vllm_anymodel
participant vLLM subprocess
calc_runtime_for_subblocks->>RuntimeConfig: construct with gpu_memory_utilization=0.5
run_vllm_latency_benchmark->>convert_config_to_vllm_anymodel: load config.json and rewrite AnyModel config
run_vllm_latency_benchmark->>vLLM subprocess: pass --gpu-memory-utilization
🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1803 +/- ##
==========================================
- Coverage 70.21% 62.54% -7.68%
==========================================
Files 515 516 +1
Lines 57244 57511 +267
==========================================
- Hits 40196 35970 -4226
- Misses 17048 21541 +4493
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
modelopt/torch/puzzletron/subblock_stats/runtime_utils.py (1)
91-114: 📐 Maintainability & Code Quality | 🔵 TrivialAdd return type annotation and document (or parameterize) hardcoded Llama architecture assumption.
Return type hint: Add
-> Noneto the function signature (line 93). The function has no explicit return statement.Hardcoded
base_architecture: Line 107 unconditionally setsbase_architecture = "LlamaForCausalLM". This module is Llama-specific (importsLlamaForCausalLMandLlamaModelDescriptor), so the hardcoding appears intentional. Either add a docstring note clarifying this function is Llama-specific, or acceptbase_architectureas a parameter if broader model support is planned.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modelopt/torch/puzzletron/subblock_stats/runtime_utils.py` around lines 91 - 114, The function convert_config_to_vllm_anymodel is missing a return type annotation and has a hardcoded assumption about the model architecture. First, add the return type hint -> None to the function signature since the function does not explicitly return any value. Second, address the hardcoded base_architecture assignment that unconditionally sets it to "LlamaForCausalLM". Either add documentation in the function's docstring to clarify that this function is Llama-specific and explain why the architecture is hardcoded, or alternatively, parameterize the base_architecture by accepting it as an optional function parameter with a default value to allow for broader model support in the future.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@modelopt/torch/puzzletron/subblock_stats/runtime_utils.py`:
- Around line 91-114: The function convert_config_to_vllm_anymodel is missing a
return type annotation and has a hardcoded assumption about the model
architecture. First, add the return type hint -> None to the function signature
since the function does not explicitly return any value. Second, address the
hardcoded base_architecture assignment that unconditionally sets it to
"LlamaForCausalLM". Either add documentation in the function's docstring to
clarify that this function is Llama-specific and explain why the architecture is
hardcoded, or alternatively, parameterize the base_architecture by accepting it
as an optional function parameter with a default value to allow for broader
model support in the future.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c26ffda4-6f74-455d-a721-7a5ed0be45e2
📒 Files selected for processing (10)
examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/Llama-3_1-8B.yamlexamples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/attn_pruning.yamlexamples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/ffn_pruning.yamlexamples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/hidden_dim_pruning.yamlexamples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/pruning_defaults.yamlexamples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/validate_model_defaults.yamlexamples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/validate_solutions_defaults.yamlmodelopt/torch/puzzletron/subblock_stats/calc_runtime_stats.pymodelopt/torch/puzzletron/subblock_stats/runtime_utils.pymodelopt/torch/puzzletron/subblock_stats/runtime_vllm.py
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
Signed-off-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com>
Added a TODO comment to extend support for other models. Signed-off-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com>
What does this PR do?
Type of change: Bug fix
Fixes some issues related to runtime optimization
validate_model_defaults not founderror - runtime optimization has now its own separate config files instead of reusing memory optimization filesUsage
(does not apply)
Testing
Tested by running the whole pipeline as described in the tutorial
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (
git commit -s -S).Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded
trust_remote_code=True,torch.load(..., weights_only=False),pickle, etc.).CONTRIBUTING.md: N/AAdditional Information
Summary by CodeRabbit