Working change for benchmarking new Mujoco Menagerie robot assets on Isaac Lab environments#5908
Conversation
Add a reusable MuJoCo benchmark config and backend-aware pytest plumbing so the same smoke command can exercise base assets or Menagerie assets with comparable KPI output.
Run TorchScript LSTM actuator inference without cuDNN so Newton/MJWarp benchmark subprocesses do not fail during ANYmal reset on cuDNN initialization.
Treat non-zero training subprocess exits as benchmark failures even when stale TensorBoard logs exist for the same task.
Layer the Menagerie robot asset swaps and task config adjustments on top of the shared benchmark harness so v2 can compare new assets against the base benchmark branch.
Use concrete G1 event settings instead of an unresolved event preset so Newton/MJWarp training resolves the Menagerie G1 pelvis body correctly.
Fix G1 event resolution for the Menagerie pelvis root and add a Newton contact margin for Spot so the MJWarp smoke run avoids NaN observations.
Forward pytest video options to training subprocesses so benchmark smoke runs can capture clips for debugging robot assets.
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
Summary
This PR adds support for benchmarking Isaac Lab environments with MuJoCo Menagerie robot assets from the Nucleus server. It introduces a new --menagerie-physics-variant CLI option, updates robot asset configs to point to Menagerie USD paths, adapts joint/body names for the new MJCF-derived naming conventions, and extends the pytest benchmark harness with --sim-backend switching. The design is sound for a benchmarking branch, though several implementation concerns should be addressed.
Design Assessment
Design is sound — The approach of swapping USD paths via environment variables and providing CLI flags for physics variant selection is appropriate for A/B benchmarking. The "fail-fast, no silent fallback" philosophy documented in the README is the right call for benchmarking work.
Findings
🔴 Critical: Shadow Hand action/observation space mismatch — shadow_hand_env_cfg.py
The action space was changed from 20 to 24 DOFs, but the code still references 20-DOF comments in multiple places:
# shadow_hand_over_env.py lines 170, 204, 242, 255
# applied actions (20) # <-- Comment says 20, but action_space is now 24
self.actions["right_hand"],The observation space (157 → 161) and state space (187 → 191) changes suggest only +4 dimensions were added, which is correct for 24-4=20 → 24 joints. However, if trained policies exist for the 20-DOF hand, this is a breaking change that will cause shape mismatches at inference time. Ensure old checkpoints are not expected to work, or provide migration guidance.
🔴 Critical: G1 joint name mismatch potential — unitree.py:295-315
The G1 config updates joint names from *_elbow_pitch_joint / *_elbow_roll_joint to just *_elbow_joint, and removes numbered finger joints (*_one_joint, *_two_joint, etc.) in favor of *_wrist_*_joint. However, G1_29DOF_CFG still uses the original joint names in its init_state and actuators:
# G1_29DOF_CFG is NOT updated with the new Menagerie joint names
# It still references the legacy Isaac joint naming schemeThis will cause actuator resolution failures when the 29-DOF config is used with the new Menagerie USD (g1_29dof_rev_1_0.usdc). Verify that the 29-DOF USDC uses the same joint schema, or update G1_29DOF_CFG accordingly.
🟡 Warning: _spawn_variants_as_dict assumes .to_dict() method — assets.py:130-135
def _spawn_variants_as_dict(variants: object | dict | None) -> dict[str, str]:
if variants is None:
return {}
if isinstance(variants, dict):
return dict(variants)
return variants.to_dict() # type: ignore[union-attr]If variants is neither None, dict, nor has a .to_dict() method, this will raise an unclear AttributeError. Consider adding explicit validation:
if not hasattr(variants, "to_dict"):
raise TypeError(f"variants must be None, dict, or have .to_dict(); got {type(variants)}")
return variants.to_dict()🟡 Warning: Test removes Kit launch from pytest parent — test_environments_training.py
The test file previously launched AppLauncher before imports. The new version removes this and runs training as subprocesses. This is architecturally better for subprocess isolation, but any test that previously relied on the simulation context being available in the parent process will now fail silently. Verify no other tests in the benchmarking suite depend on simulation_app being initialized in the test process.
🟡 Warning: H1 joint name change may break existing configs — unitree.py:208
"torso": 0.0, # Changed to:
"torso_1": 0.0, # MuJoCo Menagerie H1 names this joint ``torso_1``, not ``torso``.This renames the torso joint throughout H1_CFG. Any downstream code or user configs referencing "torso" (e.g., reward terms, observations, custom actuators) will silently fail to match. The reward terms in h1/rough_env_cfg.py are updated, but external users won't know about this change unless it's in CHANGELOG.rst.
🟡 Warning: Potential circular import — app_launcher.py:722
def _config_resolution(self, launcher_args: dict):
from isaaclab.utils.assets import apply_menagerie_physics_variant_from_launcher_args # noqa: PLC0415This late import inside _config_resolution is fine, but the import happens on every AppLauncher instantiation. If assets.py ever imports from app_launcher.py, this will create a circular import. Currently safe, but worth a comment noting the intentional late binding.
🔵 Suggestion: Consolidate duplicate actuator configs — shadow_hand.py / shadow_hand_env_cfg.py
MENAGERIE_SHADOW_FINGERS_ACTUATOR_BASE is defined in shadow_hand.py and re-used in shadow_hand_env_cfg.py. The env_cfg version adds .replace(friction=1e-2, armature=2e-3). Consider documenting why these specific overrides are needed, or whether the base config should include them.
🔵 Suggestion: Add CHANGELOG entry — Missing
This PR makes breaking changes to robot joint names, observation/action spaces, and USD paths. A CHANGELOG.rst entry would help users understand the migration path when updating from non-Menagerie assets.
Test Coverage
- Feature PR: Coverage is partial
- Gaps:
- No unit tests verify the new
merge_mujoco_menagerie_spawn_variants()function - No tests verify the
--menagerie-physics-variantCLI flag propagation - The pytest harness tests training runs, but doesn't verify the correct USD variant was selected
- No unit tests verify the new
The existing integration tests (benchmark harness with mujoco_configs.yaml) provide smoke coverage, which is reasonable for a benchmarking PR.
CI Status
Only the labeler check is visible; main CI suite results not shown.
Verdict
Significant concerns
The PR introduces important infrastructure for MuJoCo Menagerie benchmarking, but the joint name changes and observation/action space modifications are breaking changes that need explicit documentation. The G1_29DOF config appears inconsistent with the new joint naming scheme, which could cause runtime failures. Address the critical findings before merging.
| episode_length_s = 10.0 | ||
| action_space = 20 | ||
| observation_space = 157 # (full) | ||
| action_space = 24 |
There was a problem hiding this comment.
🔴 Critical: The action_space was changed from 20 to 24, but comments in shadow_hand_over_env.py still say # applied actions (20). More importantly, this is a breaking change for any trained policies using the previous 20-DOF configuration. Consider:
- Adding a migration note in the PR description or CHANGELOG
- Updating the comments in
shadow_hand_over_env.pylines 170, 204, 242, 255 - Documenting that old checkpoints are incompatible
| @@ -294,15 +295,20 @@ | |||
| ".*_hip_pitch_joint": -0.20, | |||
There was a problem hiding this comment.
🔴 Critical: The joint names are updated here for the G1 (e.g., *_elbow_pitch_joint → *_elbow_joint, waist joints instead of torso_joint), but G1_29DOF_CFG below (line ~396) still points to the new Menagerie USD (g1_29dof_rev_1_0.usdc) without corresponding joint name updates in its init_state or actuators blocks.
Verify that the 29-DOF USDC uses the same joint schema as g1.usda. If not, the actuator regex patterns will fail to match joints, causing silent degraded control or errors.
| if variants is None: | ||
| return {} | ||
| if isinstance(variants, dict): | ||
| return dict(variants) |
There was a problem hiding this comment.
🟡 Warning: This function assumes variants has a .to_dict() method if it's not None or dict. If someone passes an unexpected type, the AttributeError will be confusing.
| return dict(variants) | |
| def _spawn_variants_as_dict(variants: object | dict | None) -> dict[str, str]: | |
| if variants is None: | |
| return {} | |
| if isinstance(variants, dict): | |
| return dict(variants) | |
| if not hasattr(variants, "to_dict"): | |
| raise TypeError(f"variants must be None, dict, or have .to_dict(); got {type(variants).__name__}") | |
| return variants.to_dict() # type: ignore[union-attr] |
| @@ -208,7 +208,7 @@ | |||
| ".*_hip_pitch": -0.28, # -16 degrees | |||
There was a problem hiding this comment.
🟡 Warning: Renaming torso → torso_1 is a breaking change for any downstream code, user configs, or custom reward terms that reference the torso joint by name. The reward terms in h1/rough_env_cfg.py are correctly updated, but external users may have custom configs.
Consider adding this to CHANGELOG.rst with migration guidance.
Greptile SummaryThis PR migrates several key robot assets (Anymal-B/C, Spot, Go2, H1, G1, Allegro, Shadow Hand) from their previous Isaac Nucleus USDs to MuJoCo Menagerie USDs, and introduces the infrastructure needed to select the correct
Confidence Score: 3/5The asset migration and variant infrastructure are broadly correct, but two changes introduce real behavioral regressions that affect all users, not just Menagerie benchmark paths. Every LSTM actuator call (Anymal-B/C locomotion) now forces the slower non-cuDNN path even when cuDNN is fully available — the Newton-compat guard was applied unconditionally. Separately, H1_MINIMAL_CFG and G1_MINIMAL_CFG now resolve to the same full-collision USD as their non-minimal counterparts; any task that chose the minimal variant for performance will silently load a heavier model. Both issues affect the default training path without any opt-out mechanism.
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
CLI["--menagerie-physics-variant"] --> ALF["apply_menagerie_physics_variant_from_launcher_args()"]
ALF --> ENVVAR["ISAACLAB_MUJOCO_MENAGERIE_PHYSICS_VARIANT env var"]
ARGV["sys.argv / Hydra presets=newton"] --> INFER["infer_menagerie_physics_variant_from_argv()"]
ENVVAR -- "auto" --> INFER
ENVVAR -- "physx/mujoco/none" --> SEL["resolved: physx | mujoco | None"]
INFER --> SEL
SEL --> MERGE["merge_mujoco_menagerie_spawn_variants()"]
USDPATH["usd_path"] --> ISMENAGERIE{"is_mujoco_menagerie_usd_path?"}
ISMENAGERIE -- No --> PASSTHROUGH["return cfg.variants unchanged"]
ISMENAGERIE -- Yes --> MERGE
CFGVARIANTS["cfg.variants"] --> MERGE
MERGE -- "Physics key set" --> PASSTHROUGH
MERGE -- "Physics key absent" --> MERGED["merged dict Physics:physx|mujoco"]
MERGED --> SV["select_usd_variants(prim_path, variants)"]
Reviews (1): Last reviewed commit: "Add video recording flags to benchmark t..." | Re-trigger Greptile |
| # Run recurrent actuator inference without cuDNN. Some benchmark subprocesses | ||
| # initialize Newton before the first TorchScript LSTM call, which can leave | ||
| # cuDNN unavailable even though the non-cuDNN CUDA LSTM path works. | ||
| with torch.inference_mode(), torch.backends.cudnn.flags(enabled=False): | ||
| torques, (self.sea_hidden_state[:], self.sea_cell_state[:]) = self.network( | ||
| self.sea_input, (self.sea_hidden_state, self.sea_cell_state) | ||
| ) |
There was a problem hiding this comment.
cuDNN disabled unconditionally on every LSTM inference step
torch.backends.cudnn.flags(enabled=False) now wraps every ActuatorNetLSTM.compute call. The comment explains this as a workaround for a specific scenario where Newton initializes before the first TorchScript LSTM call, but the guard applies universally — every user running an Anymal-B/C environment with the LSTM actuator pays the penalty of the slower non-cuDNN CUDA LSTM path on every physics step. The fix should be conditional: check torch.backends.cudnn.is_available() at the call site and only suppress cuDNN when it is genuinely unavailable, leaving the fast path intact for everyone else.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
|
|
||
|
|
||
| H1_MINIMAL_CFG = H1_CFG.copy() | ||
| H1_MINIMAL_CFG.spawn.usd_path = f"{ISAACLAB_NUCLEUS_DIR}/Robots/Unitree/H1/h1_minimal.usd" | ||
| H1_MINIMAL_CFG.spawn.usd_path = f"{MUJOCO_MENAGERIE_DIR}/unitree_h1/h1/h1.usda" | ||
| """Configuration for the Unitree H1 Humanoid robot with fewer collision meshes. | ||
|
|
||
| This configuration removes most collision meshes to speed up simulation. | ||
| The class name is kept for task compatibility. The USD path matches :obj:`H1_CFG` (MuJoCo Menagerie | ||
| ``h1.usda``); there is no separate minimal Menagerie file wired here yet. | ||
| """ | ||
|
|
||
|
|
||
| G1_CFG = ArticulationCfg( | ||
| spawn=sim_utils.UsdFileCfg( | ||
| usd_path=f"{ISAACLAB_NUCLEUS_DIR}/Robots/Unitree/G1/g1.usd", | ||
| usd_path=f"{MUJOCO_MENAGERIE_DIR}/unitree_g1/g1/g1.usda", | ||
| activate_contact_sensors=True, | ||
| rigid_props=sim_utils.RigidBodyPropertiesCfg( | ||
| disable_gravity=False, |
There was a problem hiding this comment.
H1_MINIMAL_CFG and G1_MINIMAL_CFG now point to the same USD as their full counterparts
Both H1_MINIMAL_CFG and G1_MINIMAL_CFG are set to the same .usda path as H1_CFG / G1_CFG. Tasks such as Isaac-Velocity-Flat-H1-v0 resolve H1_MINIMAL_CFG expecting fewer collision meshes for performance; they will now silently load the full model. The docstrings acknowledge this as intentional ("no separate minimal Menagerie file wired here yet"), but any downstream task relying on the minimal config for faster simulation or reduced collision contact will regress without any warning. At minimum, a runtime warning should be emitted when a "minimal" config resolves to the same path as the full config.
| from isaaclab.utils.assets import MUJOCO_MENAGERIE_DIR | ||
|
|
||
| ## | ||
| # Configuration | ||
| ## | ||
|
|
||
| # MuJoCo Menagerie ``right_hand`` uses the ``rh_`` joint prefix. Legacy Isaac Shadow assets used ``robot0_``. | ||
| MENAGERIE_SHADOW_FINGERS_ACTUATOR_BASE = ImplicitActuatorCfg( | ||
| joint_names_expr=["rh_.*"], | ||
| effort_limit_sim={ | ||
| "rh_WRJ1": 4.785, | ||
| "rh_WRJ2": 2.175, | ||
| "rh_(FF|MF|RF)J4": 0.9, | ||
| "rh_(FF|MF|RF)J(3|2)": 0.9, | ||
| "rh_(FF|MF|RF)J1": 0.7245, | ||
| "rh_LFJ5": 0.9, | ||
| "rh_LFJ(4|3|2)": 0.9, | ||
| "rh_LFJ1": 0.7245, | ||
| "rh_THJ5": 0.81, | ||
| "rh_THJ4": 2.3722, | ||
| "rh_THJ3": 1.45, | ||
| "rh_THJ(2|1)": 0.99, | ||
| }, | ||
| stiffness={ | ||
| "rh_WRJ.*": 5.0, | ||
| "rh_(FF|MF|RF|LF|TH)J.*": 1.0, | ||
| }, | ||
| damping={ | ||
| "rh_WRJ.*": 0.5, | ||
| "rh_(FF|MF|RF|LF|TH)J.*": 0.1, | ||
| }, | ||
| ) | ||
|
|
||
| SHADOW_HAND_CFG = ArticulationCfg( | ||
| spawn=sim_utils.UsdFileCfg( |
There was a problem hiding this comment.
Overly broad
rh_.* joint pattern in MENAGERIE_SHADOW_FINGERS_ACTUATOR_BASE
joint_names_expr=["rh_.*"] matches every joint whose name begins with rh_, including any passive or coupling joints the Menagerie right_hand.usda may define. The legacy config was explicit (robot0_WR.*, robot0_(FF|MF|RF|LF|TH)J(3|2|1), etc.). If the model includes passive tendons or fixed joints under the rh_ namespace, they will be silently captured by this pattern and have implicit actuators applied with unspecified effort limits, potentially corrupting the actuator mapping or causing a runtime error when joint count does not match actuated_joint_names.
| if mode in (MUJOCO_MENAGERIE_PHYSICS_VARIANT_PHYSX, MUJOCO_MENAGERIE_PHYSICS_VARIANT_MUJOCO): | ||
| return mode | ||
| raise ValueError( | ||
| f"Invalid {ISAACLAB_MUJOCO_MENAGERIE_PHYSICS_VARIANT_ENV}={raw!r}. " | ||
| f"Expected 'auto', 'none', '{MUJOCO_MENAGERIE_PHYSICS_VARIANT_PHYSX}', or " | ||
| f"'{MUJOCO_MENAGERIE_PHYSICS_VARIANT_MUJOCO}'." | ||
| ) |
There was a problem hiding this comment.
resolve_mujoco_menagerie_physics_variant_selection raises ValueError when ISAACLAB_MUJOCO_MENAGERIE_PHYSICS_VARIANT_ENV holds an unrecognised value. This exception propagates through merge_mujoco_menagerie_spawn_variants into _spawn_from_usd_file at scene creation time, producing a confusing traceback deep inside the USD spawner instead of at startup. A user who sets the env-var manually (bypassing the CLI) will only see the failure at spawn time. Consider logging a clear message and falling back to physx instead of raising.
| if mode in (MUJOCO_MENAGERIE_PHYSICS_VARIANT_PHYSX, MUJOCO_MENAGERIE_PHYSICS_VARIANT_MUJOCO): | |
| return mode | |
| raise ValueError( | |
| f"Invalid {ISAACLAB_MUJOCO_MENAGERIE_PHYSICS_VARIANT_ENV}={raw!r}. " | |
| f"Expected 'auto', 'none', '{MUJOCO_MENAGERIE_PHYSICS_VARIANT_PHYSX}', or " | |
| f"'{MUJOCO_MENAGERIE_PHYSICS_VARIANT_MUJOCO}'." | |
| ) | |
| if mode in (MUJOCO_MENAGERIE_PHYSICS_VARIANT_PHYSX, MUJOCO_MENAGERIE_PHYSICS_VARIANT_MUJOCO): | |
| return mode | |
| import logging as _logging | |
| _logging.getLogger(__name__).warning( | |
| "Invalid %s=%r; expected 'auto', 'none', '%s', or '%s'. Falling back to '%s'.", | |
| ISAACLAB_MUJOCO_MENAGERIE_PHYSICS_VARIANT_ENV, | |
| raw, | |
| MUJOCO_MENAGERIE_PHYSICS_VARIANT_PHYSX, | |
| MUJOCO_MENAGERIE_PHYSICS_VARIANT_MUJOCO, | |
| MUJOCO_MENAGERIE_PHYSICS_VARIANT_PHYSX, | |
| ) | |
| return MUJOCO_MENAGERIE_PHYSICS_VARIANT_PHYSX |
Description
Updates robot asset references to use the mujoco menagerie assets
Adjusts affected robot configs to match new assets, including joint names, actuator settings etc
Updates environment benchmarking test to improve benchmarking mujoco menagerie assets
Type of change
Screenshots
Please attach before and after screenshots of the change if applicable.
Checklist
pre-commitchecks with./isaaclab.sh --formatconfig/extension.tomlfileCONTRIBUTORS.mdor my name already exists there