Skip to content

[Fix] Make h1_locomotion demo runnable on main: migrate deprecated RSL-RL config and checkpoint#5903

Open
hujc7 wants to merge 4 commits into
isaac-sim:mainfrom
hujc7:jichuanh/fix-h1-locomotion-class-name
Open

[Fix] Make h1_locomotion demo runnable on main: migrate deprecated RSL-RL config and checkpoint#5903
hujc7 wants to merge 4 commits into
isaac-sim:mainfrom
hujc7:jichuanh/fix-h1-locomotion-class-name

Conversation

@hujc7
Copy link
Copy Markdown
Collaborator

@hujc7 hujc7 commented Jun 1, 2026

1. Summary

  • Fixes NVBug 6120938 (P0): ./isaaclab.sh -p scripts/demos/h1_locomotion.py crashes on main (Isaac Sim 5.1.0, rsl-rl-lib==5.0.1). Two sequential incompatibilities with rsl-rl 5.x; both fixed here.
  • Crash 1 — KeyError: 'class_name' at OnPolicyRunner construction: the H1 agent config uses the deprecated policy field; rsl-rl 5.x expects per-model actor/critic configs. Fixed by handle_deprecated_rsl_rl_cfg in the demo (as train.py/play.py already do).
  • Crash 2 — KeyError: 'actor_state_dict' at runner.load(): the published H1 checkpoint (Isaac 5.1 assets) was saved in the pre-5.0 layout (single combined model_state_dict), while rsl-rl 5.x expects separate actor_state_dict/critic_state_dict. Fixed by a new handle_deprecated_rsl_rl_checkpoint helper that remaps the old layout, applied before loading.

2. Why both are needed

The cfg migration alone unblocks construction but the demo then fails loading the old-format published checkpoint. develop's demo avoids the second crash only because its newer (Isaac 6.0) assets ship the checkpoint in the new layout — there is no conversion code there. On main (5.1 assets) the checkpoint must be converted.

3. Changes

  • source/isaaclab_rl/isaaclab_rl/rsl_rl/utils.py: add handle_deprecated_rsl_rl_checkpoint — remaps actor.Nactor_state_dict.mlp.N, critic.Ncritic_state_dict.mlp.N, stdactor_state_dict.distribution.std_param; new-layout checkpoints and rsl-rl < 5.0 pass through unchanged; raises on unrecognized keys.
  • scripts/demos/h1_locomotion.py: migrate the agent config, then convert the checkpoint before ppo_runner.load.
  • scripts/reinforcement_learning/rsl_rl/play.py: apply the same converter to the resolved checkpoint before runner.load (covers --use_pretrained_checkpoint, which loads the same old published checkpoint; no-op for locally-trained checkpoints).
  • source/isaaclab_rl/test/test_rsl_rl_cfg_deprecation.py: unit tests for the converter.
  • isaaclab_rl version bump + changelog.

4. Verification

In an Isaac Sim 5.1.0 + rsl-rl-lib==5.0.1 container (the demo's GUI viewport needs a display the CI host lacks; the bug is upstream of rendering, so validated headless on the real code paths):

  • Demo, unmodified → KeyError: 'class_name'; cfg-only → KeyError: 'actor_state_dict'; with both fixes → the real h1_locomotion module's __init__ completes (config migrates, checkpoint converts via strict-matching load_state_dict, runner.load succeeds) and the policy steps with finite joint targets.
  • play.py --use_pretrained_checkpoint: unmodified → KeyError: 'actor_state_dict' at runner.load; with the converter → checkpoint converts, loads, and the policy runs.
  • Converter unit tests: 4 passed. ./isaaclab.sh -f clean.

5. Test plan

  • Container repro of both demo crashes (unmodified / cfg-only).
  • Real h1_locomotion module runs headless end-to-end with both fixes (init completes + inference).
  • play.py --use_pretrained_checkpoint FAIL→PASS with the converter.
  • Converter unit tests + pre-commit clean.

The H1 rough agent config defines the policy via the deprecated
`policy` field (RslRlPpoActorCriticCfg). With rsl-rl-lib 5.0.1, which
expects per-model `actor`/`critic` configs carrying `class_name`,
constructing OnPolicyRunner raises KeyError: 'class_name' because the
demo fed the un-migrated config straight to the runner.

Migrate the parsed config via handle_deprecated_rsl_rl_cfg before
building the runner, matching scripts/reinforcement_learning/rsl_rl
train/play. The helper converts `policy` into `actor`/`critic`
(class_name=MLPModel), eliminating the KeyError.
@hujc7 hujc7 requested a review from ooctipus as a code owner June 1, 2026 19:14
@github-actions github-actions Bot added bug Something isn't working isaac-lab Related to Isaac Lab team labels Jun 1, 2026
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot — PR #5903 (Updated)

📋 Summary

This PR now addresses both deprecation issues that arise when running h1_locomotion.py with rsl-rl >= 5.0:

  1. Config migration (original fix): Converts deprecated policy field into actor/critic configs via handle_deprecated_rsl_rl_cfg.
  2. Checkpoint migration (new): Converts pre-5.0 combined model_state_dict checkpoints into the split actor_state_dict/critic_state_dict layout via the new handle_deprecated_rsl_rl_checkpoint.

🔬 Isaac Lab Expert

Verdict: ✅ Well-implemented checkpoint migration

The new handle_deprecated_rsl_rl_checkpoint() in isaaclab_rl/rsl_rl/utils.py:

  • Correctly gates on rsl-rl version (only converts when >= 5.0)
  • Detects old layout by checking for model_state_dict key and absence of actor_state_dict
  • Remaps keys properly: actor.*mlp.* in actor dict, critic.*mlp.* in critic dict, stddistribution.std_param
  • Raises ValueError for unrecognized keys (defensive, avoids silent corruption)
  • Writes converted checkpoint to a sibling file (_rslrl5 suffix) rather than mutating the original
  • Carries optimizer_state_dict, iter, and infos for completeness

The demo script integration is clean — calls handle_deprecated_rsl_rl_checkpoint(checkpoint, metadata.version("rsl-rl-lib")) right after fetching the published checkpoint, before the runner uses it.


🧪 Test Coverage Analyzer

Excellent test coverage added. TestHandleDeprecatedRslRlCheckpoint covers:

  1. ✅ Converts combined state dict with correct key remapping and value preservation
  2. ✅ Passes through new-layout checkpoints untouched
  3. ✅ Passes through when rsl-rl < 5.0 (no conversion needed)
  4. ✅ Raises ValueError on unrecognized keys

All test cases use tmp_path fixture properly and test both the path return value and the loaded content.


📦 Release Hygiene

  • Version bump to 0.5.2 ✅
  • CHANGELOG entry with clear description ✅
  • __init__.py exports the new public function ✅

🔍 Minor Observations (non-blocking)

  1. The weights_only=False in torch.load is intentional (rsl-rl checkpoints may contain non-tensor objects). This matches the pattern used by rsl-rl's own OnPolicyRunner.load().
  2. The conversion only supports MLP actor-critic architectures (documented in the ValueError message). RNN/Transformer checkpoints would need separate handling, but this matches the scope of the H1 demo which uses MLP.

🏁 Final Verdict: LGTM

Comprehensive fix that addresses the full stack of deprecation issues (config + checkpoint). Clean implementation, good error handling, proper tests, correct release hygiene. Ready to merge.


📝 Update (bf96706): New commit extends handle_deprecated_rsl_rl_checkpoint to scripts/reinforcement_learning/rsl_rl/play.py. This is correct — play.py --use_pretrained_checkpoint loads published (pre-5.0) checkpoints through the same OnPolicyRunner.load path and would hit the same KeyError: 'actor_state_dict'. The fix imports the converter and applies it to resume_path just before runner.load(), with a no-op guard for new-layout checkpoints. Consistent with the pattern already used in the demo script. Previous assessment stands — LGTM ✅.

📝 Update (5f1d13d): Branch merged with main — brings in the daqp 0.8.5 pin bump (#5902, version 0.54.4, CHANGELOG, setup.py). These changes are from an unrelated already-merged PR and do not affect this PR's logic. No new concerns — LGTM ✅.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jun 1, 2026

Greptile Summary

This PR fixes a crash in the H1 locomotion demo caused by the H1RoughPPORunnerCfg using the deprecated policy field from rsl-rl 4.x, which OnPolicyRunner in rsl-rl 5.x no longer accepts — raising a KeyError: 'class_name' when it tries to pop that key from the absent actor dict.

  • Adds handle_deprecated_rsl_rl_cfg(agent_cfg, metadata.version("rsl-rl-lib")) immediately after parse_rsl_rl_cfg, exactly mirroring the steps already performed in scripts/reinforcement_learning/rsl_rl/train.py and play.py.
  • The import style (from importlib import metadata) differs slightly from train.py/play.py (import importlib.metadata as metadata) but is functionally identical.
  • No logic changes beyond the migration call; the fix is minimal and well-scoped to the demo.

Confidence Score: 5/5

Safe to merge — the change is a single, well-understood migration call that already works correctly in train.py and play.py.

The fix is minimal (3 lines), directly mirrors tested patterns from the sibling training and play scripts, and handle_deprecated_rsl_rl_cfg has dedicated unit tests for rsl-rl 5.x migration. No new logic is introduced and no existing behavior is altered.

No files require special attention.

Important Files Changed

Filename Overview
scripts/demos/h1_locomotion.py Adds handle_deprecated_rsl_rl_cfg call after config parsing to migrate the legacy policy field into actor/critic sub-configs, matching the pattern already used by train.py and play.py; fixes the KeyError crash with rsl-rl-lib 5.x.

Sequence Diagram

sequenceDiagram
    participant Script as h1_locomotion.py
    participant CLI as cli_args
    participant Handle as handle_deprecated_rsl_rl_cfg
    participant Runner as OnPolicyRunner (rsl-rl 5.x)

    Script->>CLI: parse_rsl_rl_cfg(TASK, args_cli)
    CLI-->>Script: agent_cfg (legacy policy field)
    Note over Script: NEW: migrate config
    Script->>Handle: handle_deprecated_rsl_rl_cfg(agent_cfg, "5.0.1")
    Handle-->>Script: agent_cfg (actor + critic with class_name)
    Script->>Runner: OnPolicyRunner(env, agent_cfg.to_dict(), ...)
    Runner-->>Script: ppo_runner (no KeyError)
Loading

Reviews (1): Last reviewed commit: "Fix KeyError 'class_name' in h1_locomoti..." | Re-trigger Greptile

The cfg migration alone unblocked OnPolicyRunner construction but the
demo then failed at runner.load() with KeyError: 'actor_state_dict':
the published H1 checkpoint (Isaac 5.1 assets) was saved in the pre-5.0
rsl-rl layout (single combined model_state_dict), while rsl-rl 5.x
expects separate actor_state_dict/critic_state_dict.

Add handle_deprecated_rsl_rl_checkpoint to isaaclab_rl, which remaps the
old combined state dict (actor.*/critic.*/std) into the actor/critic
layout, and call it in the demo before loading. Verified end-to-end in
an Isaac Sim 5.1.0 + rsl-rl 5.0.1 container: convert -> load -> policy
inference produces finite actions. Adds unit tests for the converter.
@hujc7 hujc7 changed the title [Fix] Migrate deprecated RSL-RL config in h1_locomotion.py demo to fix KeyError: 'class_name' [Fix] Make h1_locomotion demo runnable on main: migrate deprecated RSL-RL config and checkpoint Jun 1, 2026
play.py --use_pretrained_checkpoint loads the published (pre-5.0)
checkpoint via the same OnPolicyRunner.load path, so it hit the same
KeyError: 'actor_state_dict'. Apply handle_deprecated_rsl_rl_checkpoint
to the resolved checkpoint before loading (no-op for new-layout
checkpoints, so locally-trained checkpoints are unaffected).

Verified: baseline play.py --use_pretrained_checkpoint crashes at
actor_state_dict; with the converter it loads and runs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants