[Fix] Make h1_locomotion demo runnable on main: migrate deprecated RSL-RL config and checkpoint#5903
[Fix] Make h1_locomotion demo runnable on main: migrate deprecated RSL-RL config and checkpoint#5903hujc7 wants to merge 4 commits into
Conversation
The H1 rough agent config defines the policy via the deprecated `policy` field (RslRlPpoActorCriticCfg). With rsl-rl-lib 5.0.1, which expects per-model `actor`/`critic` configs carrying `class_name`, constructing OnPolicyRunner raises KeyError: 'class_name' because the demo fed the un-migrated config straight to the runner. Migrate the parsed config via handle_deprecated_rsl_rl_cfg before building the runner, matching scripts/reinforcement_learning/rsl_rl train/play. The helper converts `policy` into `actor`/`critic` (class_name=MLPModel), eliminating the KeyError.
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot — PR #5903 (Updated)
📋 Summary
This PR now addresses both deprecation issues that arise when running h1_locomotion.py with rsl-rl >= 5.0:
- Config migration (original fix): Converts deprecated
policyfield intoactor/criticconfigs viahandle_deprecated_rsl_rl_cfg. - Checkpoint migration (new): Converts pre-5.0 combined
model_state_dictcheckpoints into the splitactor_state_dict/critic_state_dictlayout via the newhandle_deprecated_rsl_rl_checkpoint.
🔬 Isaac Lab Expert
Verdict: ✅ Well-implemented checkpoint migration
The new handle_deprecated_rsl_rl_checkpoint() in isaaclab_rl/rsl_rl/utils.py:
- Correctly gates on rsl-rl version (only converts when >= 5.0)
- Detects old layout by checking for
model_state_dictkey and absence ofactor_state_dict - Remaps keys properly:
actor.*→mlp.*in actor dict,critic.*→mlp.*in critic dict,std→distribution.std_param - Raises
ValueErrorfor unrecognized keys (defensive, avoids silent corruption) - Writes converted checkpoint to a sibling file (
_rslrl5suffix) rather than mutating the original - Carries
optimizer_state_dict,iter, andinfosfor completeness
The demo script integration is clean — calls handle_deprecated_rsl_rl_checkpoint(checkpoint, metadata.version("rsl-rl-lib")) right after fetching the published checkpoint, before the runner uses it.
🧪 Test Coverage Analyzer
Excellent test coverage added. TestHandleDeprecatedRslRlCheckpoint covers:
- ✅ Converts combined state dict with correct key remapping and value preservation
- ✅ Passes through new-layout checkpoints untouched
- ✅ Passes through when rsl-rl < 5.0 (no conversion needed)
- ✅ Raises
ValueErroron unrecognized keys
All test cases use tmp_path fixture properly and test both the path return value and the loaded content.
📦 Release Hygiene
- Version bump to 0.5.2 ✅
- CHANGELOG entry with clear description ✅
__init__.pyexports the new public function ✅
🔍 Minor Observations (non-blocking)
- The
weights_only=Falseintorch.loadis intentional (rsl-rl checkpoints may contain non-tensor objects). This matches the pattern used by rsl-rl's ownOnPolicyRunner.load(). - The conversion only supports MLP actor-critic architectures (documented in the ValueError message). RNN/Transformer checkpoints would need separate handling, but this matches the scope of the H1 demo which uses MLP.
🏁 Final Verdict: LGTM ✅
Comprehensive fix that addresses the full stack of deprecation issues (config + checkpoint). Clean implementation, good error handling, proper tests, correct release hygiene. Ready to merge.
📝 Update (bf96706): New commit extends handle_deprecated_rsl_rl_checkpoint to scripts/reinforcement_learning/rsl_rl/play.py. This is correct — play.py --use_pretrained_checkpoint loads published (pre-5.0) checkpoints through the same OnPolicyRunner.load path and would hit the same KeyError: 'actor_state_dict'. The fix imports the converter and applies it to resume_path just before runner.load(), with a no-op guard for new-layout checkpoints. Consistent with the pattern already used in the demo script. Previous assessment stands — LGTM ✅.
📝 Update (5f1d13d): Branch merged with main — brings in the daqp 0.8.5 pin bump (#5902, version 0.54.4, CHANGELOG, setup.py). These changes are from an unrelated already-merged PR and do not affect this PR's logic. No new concerns — LGTM ✅.
Greptile SummaryThis PR fixes a crash in the H1 locomotion demo caused by the
Confidence Score: 5/5Safe to merge — the change is a single, well-understood migration call that already works correctly in train.py and play.py. The fix is minimal (3 lines), directly mirrors tested patterns from the sibling training and play scripts, and handle_deprecated_rsl_rl_cfg has dedicated unit tests for rsl-rl 5.x migration. No new logic is introduced and no existing behavior is altered. No files require special attention. Important Files Changed
Sequence DiagramsequenceDiagram
participant Script as h1_locomotion.py
participant CLI as cli_args
participant Handle as handle_deprecated_rsl_rl_cfg
participant Runner as OnPolicyRunner (rsl-rl 5.x)
Script->>CLI: parse_rsl_rl_cfg(TASK, args_cli)
CLI-->>Script: agent_cfg (legacy policy field)
Note over Script: NEW: migrate config
Script->>Handle: handle_deprecated_rsl_rl_cfg(agent_cfg, "5.0.1")
Handle-->>Script: agent_cfg (actor + critic with class_name)
Script->>Runner: OnPolicyRunner(env, agent_cfg.to_dict(), ...)
Runner-->>Script: ppo_runner (no KeyError)
Reviews (1): Last reviewed commit: "Fix KeyError 'class_name' in h1_locomoti..." | Re-trigger Greptile |
The cfg migration alone unblocked OnPolicyRunner construction but the demo then failed at runner.load() with KeyError: 'actor_state_dict': the published H1 checkpoint (Isaac 5.1 assets) was saved in the pre-5.0 rsl-rl layout (single combined model_state_dict), while rsl-rl 5.x expects separate actor_state_dict/critic_state_dict. Add handle_deprecated_rsl_rl_checkpoint to isaaclab_rl, which remaps the old combined state dict (actor.*/critic.*/std) into the actor/critic layout, and call it in the demo before loading. Verified end-to-end in an Isaac Sim 5.1.0 + rsl-rl 5.0.1 container: convert -> load -> policy inference produces finite actions. Adds unit tests for the converter.
play.py --use_pretrained_checkpoint loads the published (pre-5.0) checkpoint via the same OnPolicyRunner.load path, so it hit the same KeyError: 'actor_state_dict'. Apply handle_deprecated_rsl_rl_checkpoint to the resolved checkpoint before loading (no-op for new-layout checkpoints, so locally-trained checkpoints are unaffected). Verified: baseline play.py --use_pretrained_checkpoint crashes at actor_state_dict; with the converter it loads and runs.
1. Summary
./isaaclab.sh -p scripts/demos/h1_locomotion.pycrashes onmain(Isaac Sim 5.1.0,rsl-rl-lib==5.0.1). Two sequential incompatibilities with rsl-rl 5.x; both fixed here.KeyError: 'class_name'atOnPolicyRunnerconstruction: the H1 agent config uses the deprecatedpolicyfield; rsl-rl 5.x expects per-modelactor/criticconfigs. Fixed byhandle_deprecated_rsl_rl_cfgin the demo (astrain.py/play.pyalready do).KeyError: 'actor_state_dict'atrunner.load(): the published H1 checkpoint (Isaac 5.1 assets) was saved in the pre-5.0 layout (single combinedmodel_state_dict), while rsl-rl 5.x expects separateactor_state_dict/critic_state_dict. Fixed by a newhandle_deprecated_rsl_rl_checkpointhelper that remaps the old layout, applied before loading.2. Why both are needed
The cfg migration alone unblocks construction but the demo then fails loading the old-format published checkpoint.
develop's demo avoids the second crash only because its newer (Isaac 6.0) assets ship the checkpoint in the new layout — there is no conversion code there. Onmain(5.1 assets) the checkpoint must be converted.3. Changes
source/isaaclab_rl/isaaclab_rl/rsl_rl/utils.py: addhandle_deprecated_rsl_rl_checkpoint— remapsactor.N→actor_state_dict.mlp.N,critic.N→critic_state_dict.mlp.N,std→actor_state_dict.distribution.std_param; new-layout checkpoints and rsl-rl < 5.0 pass through unchanged; raises on unrecognized keys.scripts/demos/h1_locomotion.py: migrate the agent config, then convert the checkpoint beforeppo_runner.load.scripts/reinforcement_learning/rsl_rl/play.py: apply the same converter to the resolved checkpoint beforerunner.load(covers--use_pretrained_checkpoint, which loads the same old published checkpoint; no-op for locally-trained checkpoints).source/isaaclab_rl/test/test_rsl_rl_cfg_deprecation.py: unit tests for the converter.isaaclab_rlversion bump + changelog.4. Verification
In an Isaac Sim 5.1.0 +
rsl-rl-lib==5.0.1container (the demo's GUI viewport needs a display the CI host lacks; the bug is upstream of rendering, so validated headless on the real code paths):KeyError: 'class_name'; cfg-only →KeyError: 'actor_state_dict'; with both fixes → the realh1_locomotionmodule's__init__completes (config migrates, checkpoint converts via strict-matchingload_state_dict,runner.loadsucceeds) and the policy steps with finite joint targets.play.py --use_pretrained_checkpoint: unmodified →KeyError: 'actor_state_dict'atrunner.load; with the converter → checkpoint converts, loads, and the policy runs../isaaclab.sh -fclean.5. Test plan
h1_locomotionmodule runs headless end-to-end with both fixes (init completes + inference).play.py --use_pretrained_checkpointFAIL→PASS with the converter.