Add repro script for ManagerBasedEnv.close memory leak#5898
Conversation
Runnable check script that demonstrates the leak fixed by PR #5896. Builds Isaac-Cartpole-RGB-v0 in a tight construct->step->close loop and counts live gym.spaces.Box objects plus the bytes their bounds arrays pin. Default invocation monkey-patches close() back to the pre-fix form so the leak reproduces on a patched IsaacLab; --patched uses the installed close() to show the fix releases everything. No library code changes; changelog fragment is .skip. Signed-off-by: Clemens Volk <cvolk@nvidia.com>
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
Summary
This PR adds a standalone repro/diagnostic script that demonstrates the ManagerBasedEnv.close memory leak fixed by PR #5896. The script creates a tight construct-step-close loop with a high-resolution camera env (Isaac-Cartpole-RGB-v0 at 720x1280) and tracks live gymnasium.spaces.Box instances and their pinned numpy bytes across cycles. It monkey-patches the pre-fix close() by default so the leak reproduces even on a patched install.
Design Assessment
File placement: Correctly placed under source/isaaclab/test/envs/ with the check_ prefix, following the project convention for standalone verification scripts (vs. test_ prefix for pytest-integrated tests).
Changelog: .skip fragment in source/isaaclab/changelog.d/ is appropriate - no library behavior change.
Architecture: The script is well-structured:
- Clean separation of concerns (monkey-patch setup, snapshot utility, cycle runner, delta report)
- Uses
AppLauncherandparse_env_cfgcorrectly - Proper SPDX header and docstring with usage examples
- Configurable via argparse (
--iterations,--num_envs,--camera_width,--camera_height,--patched)
Findings
Minor Suggestions (non-blocking)
-
gc.get_objects()iteration can be slow/fragile - Walking all GC-tracked objects insnapshot()is fine for a diagnostic script but would be expensive in CI. Thetry/exceptguard aroundtype(obj) is Boxis a good defensive pattern. -
No assertion / exit-code signaling - The script prints results but always exits 0. If this were to be integrated as a CI regression test in the future, it would need a pass/fail threshold (e.g.,
sys.exit(1)iflive_boxes > 0after close on--patchedmode). The author already notes willingness to convert to pytest if maintainers prefer. -
pre_fix_closemay drift - The monkey-patched pre-fixclose()is a frozen snapshot of the old implementation. IfManagerBasedEnv.closegains new cleanup logic in future PRs, this frozen version will not reflect that. A code comment noting the specific commit/PR it mimics would help future maintainers.
Positive Observations
- Excellent docstring with exact pre-fix vs post-fix expected output - makes it trivial to verify behavior
torch.cuda.empty_cache()after each cycle avoids GPU OOM confounding the RSS measurement- Parameterized camera resolution means the script can be tuned for faster runs on CI vs. more dramatic leak demonstration
- Delta table at the end makes the leak trend immediately visible
Test Coverage Assessment
This script serves as a manual verification tool rather than an automated regression test. It requires Isaac Sim runtime and GPU, making it unsuitable for lightweight CI. However, it would be valuable as:
- A nightly/weekly GPU-CI gate if the team runs sim-dependent tests
- A reference for writing a future lightweight mock-based unit test that verifies
close()cleans up wrappers without needing the full simulator
The companion PR #5896 (the actual fix) is where the real regression test should live. This PR adds diagnostic value.
Verdict
No issues found - Well-crafted diagnostic script with correct placement, proper conventions, and clear documentation. The minor suggestions above are non-blocking improvements the author could address if desired.
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
Summary
This PR adds a diagnostic script that reproduces the ManagerBasedEnv.close memory leak addressed by PR #5896. The script uses an elegant monkey-patching approach to demonstrate the leak with pre-fix behavior by default, while --patched tests the fixed behavior. The implementation is well-structured with clear documentation and useful output formatting.
Design Assessment
Design is sound. The file placement in source/isaaclab/test/envs/ is appropriate for a diagnostic/test utility. The approach of monkey-patching the close() method to reproduce pre-fix behavior is clever and allows the script to be useful on both patched and unpatched IsaacLab installations. The gc-based introspection for counting Box instances is the correct approach since numpy arrays aren't gc-tracked directly.
Findings
🟡 Warning: Broad exception handler in gc inspection — source/isaaclab/test/envs/check_manager_based_env_close_leak.py:103-107
The except Exception: continue pattern in the snapshot() function could silently hide unexpected errors during gc object inspection. While this is somewhat defensible for gc introspection (objects can be in unusual states), it would be better to at least log unexpected exceptions to aid debugging.
🔵 Suggestion: Consider adding return code for CI integration — source/isaaclab/test/envs/check_manager_based_env_close_leak.py:155-162
The script currently only prints results. For potential CI integration or automated regression testing, consider adding a non-zero exit code when the leak is detected (i.e., when --patched is used but live boxes remain after close). This would make it usable in automated pipelines.
🔵 Suggestion: Minor — unused variable in step loop — source/isaaclab/test/envs/check_manager_based_env_close_leak.py:138
The obs variable from env.step() is reassigned but the value from env.reset() (line 135) is immediately overwritten. This is harmless but the initial assignment could use _ to signal intent.
Test Coverage
- Bug fix: N/A — This PR is a diagnostic tool for #5896, not the fix itself
- Script itself: The script IS a runnable regression test; it successfully demonstrates the leak when run with default flags vs
--patched - Pytest integration: The author correctly notes this could be converted to a pytest if maintainers prefer. For a memory leak test that requires full Isaac Sim initialization, a standalone script may actually be more practical.
CI Status
Some checks are still running (Installation Tests, Docker builds). Core checks (Load Config, Detect Changes) have passed.
Verdict
Minor fixes needed
The implementation is solid and serves its purpose well. The broad exception handler is the only notable concern, and even that is defensible given gc introspection quirks. The suggestions for CI integration and minor cleanup are optional improvements. This is a useful diagnostic asset for verifying the memory leak fix in #5896.
| makes via ``np.full`` and ``astype``. | ||
| """ | ||
| gc.collect() | ||
| rss_mb = process.memory_info().rss / 1024 / 1024 |
There was a problem hiding this comment.
🟡 Warning: Broad exception handler
This except Exception: continue could silently hide unexpected errors during gc object inspection. While defensible for gc introspection (objects can be in unusual states mid-collection), consider at least logging unexpected exceptions:
| rss_mb = process.memory_info().rss / 1024 / 1024 | |
| except ReferenceError: | |
| # Object was collected between gc.get_objects() and our access | |
| continue | |
| except Exception as e: | |
| # Unexpected error - log but continue to avoid blocking diagnostics | |
| print(f"Warning: unexpected error inspecting gc object: {e}", flush=True) | |
| continue |
Alternatively, if you want to keep it silent, a comment explaining why would help future readers.
| d_bounds = history[cycle_index][2] - history[cycle_index - 1][2] | ||
| print(f" {cycle_index:>6} {d_rss:>+11.1f} {d_box:>+11d} {d_bounds:>+13.1f}", flush=True) | ||
|
|
||
| simulation_app.close() |
There was a problem hiding this comment.
🔵 Suggestion: Add exit code for CI integration
For automated regression testing, consider exiting non-zero when the leak is detected:
# After the per-cycle deltas printout:
final_boxes = history[-1][1]
if args_cli.patched and final_boxes > 0:
print(f"\n❌ LEAK DETECTED: {final_boxes} Box instances still live after close", flush=True)
sys.exit(1)
elif not args_cli.patched and final_boxes == 0:
print(f"\n⚠️ Expected leak not reproduced (is the monkey-patch working?)", flush=True)This would allow the script to be used in CI pipelines to catch regressions.
| # RSS noise. Bump to 720x1280 so each Box's bounds array is ~33 MB. | ||
| env_cfg.scene.tiled_camera.width = args_cli.camera_width | ||
| env_cfg.scene.tiled_camera.height = args_cli.camera_height | ||
| env = gym.make("Isaac-Cartpole-RGB-v0", cfg=env_cfg) |
There was a problem hiding this comment.
🔵 Suggestion: Minor — clarify intent with _
The obs from reset is immediately overwritten by the step loop. Consider using _ to signal this value is intentionally discarded:
| env = gym.make("Isaac-Cartpole-RGB-v0", cfg=env_cfg) | |
| _, _ = env.reset() |
Or keep obs only in the step loop where it's actually used for the del cleanup.
|
Closing — the formal regression test landed on #5896 itself ( |
Description
Adds a runnable check script that reproduces the
ManagerBasedEnv.closememory leak fixed by #5896. Useful for verifying the fix is in place and as a starting point for a future regression test.The script builds the smallest manager-based RL task with an image observation (
Isaac-Cartpole-RGB-v0), bumps its camera to 720×1280 so eachgym.spaces.Boxbounds array is large enough to be visible above RSS noise, and runs a tight construct → step → close loop. After each close it counts livegym.spaces.Boxinstances and sums the bytes pinned by theirlow/high/bounded_below/bounded_abovenumpy attributes.Default invocation monkey-patches
ManagerBasedEnv.closeback to its pre-#5896 form, so the leak reproduces even when the installed IsaacLab is already patched.--patchedleaves the installedclose()in place.Depends on (but does not block) #5896. The script works on either branch — the
--patchedflag toggles the close() body the script uses.Fixes # (no associated issue — diagnostic script, no behaviour change)
Repro output
Pre-fix (default):
Post-fix (
--patched):Type of change
Checklist
pre-commitchecks with./isaaclab.sh --formatsource/isaaclab/test/envs/if maintainers prefer.skipfragment added (test-only, no version bump)CONTRIBUTORS.md