Use masked_fill_ for boolean-mask scalar assignments by ooctipus · Pull Request #5914 · isaac-sim/IsaacLab

ooctipus · 2026-06-02T04:58:24Z

Summary

Replaces boolean-mask index assignments of the form tensor[mask] = scalar with Tensor.masked_fill_ in a handful of GPU tensor hot paths. masked_fill_ performs the fill in a single fused kernel and avoids the nonzero-based advanced-indexing path (and its device sync), so it's more performant while preserving identical behavior.

Touched code:

isaaclab.utils.interpolation.LinearInterpolation — out-of-bounds weight zeroing.
isaaclab.envs.mdp.observations.image — depth inf clamping.
isaaclab_contrib TacSL visuotactile renderer — artifact removal and gradient-direction zeroing.
isaaclab_tasks Shadow Hand feature extractor, Cartpole camera envs (core + showcase), and drone navigation observations — depth/normalization fills.

Skipped intentionally:

stack_ik_rel_blueprint_env_cfg.py, where images may be a Warp-backed ProxyArray that does not implement masked_fill_.
Row-mask assignments (tensor[mask, :] = 0) that aren't a direct scalar masked_fill_.

Test plan

./isaaclab.sh -f (pre-commit) passes — done locally.
Existing camera/observation and interpolation tests pass unchanged (behavior-preserving refactor).

isaaclab-review-bot

Review: Use masked_fill_ for boolean-mask scalar assignments

Overall Assessment: This is a clean, well-scoped performance refactor. The transformation from tensor[bool_mask] = scalar to tensor.masked_fill_(bool_mask, scalar) is semantically equivalent in all cases here and avoids the nonzero + scatter path that triggers a device synchronization. The PR description is thorough about what was intentionally skipped.

Findings

1. 🟡 Suggestion — `drone_arl/mdp/observations.py` (lines 198–203): Consider `torch.clamp` for sequential range bounding

The four sequential masked_fill_ calls for inf-clamping + upper-bounding could be expressed more concisely:

images = images.clamp(-10.0, 10.0)  # handles ±inf and >10 in one fused op
images = images / 10.0
images.masked_fill_(images < 0.02, -1.0)

torch.clamp handles ±inf correctly and would reduce three masked_fill_ calls (each creating a boolean mask tensor) to a single fused kernel. This is both more readable and potentially faster.

Severity: Low — the current code is correct, this is a further optimization opportunity.

2. 🟡 Note — `linear_interpolation.py` (line 78): Division-by-zero produces NaN intermediates

When upper_bound == lower_bound, the denominator self._x[upper_bound] - self._x[lower_bound] is zero, producing NaN (0/0) or ±inf (nonzero/0) in weight before masked_fill_ overwrites those positions. This is correct behavior (the fill always wins), but note that:

torch.autograd.detect_anomaly() will flag NaN in forward pass if enabled during debugging
This was also the case before this PR — not a regression

No action needed; just documenting for future readers.

3. 🟢 Observation — Negative infinity not handled in core `observations.py` (line 428)

images.masked_fill_(images == float("inf"), 0)

This only handles +inf. If a depth sensor ever returns -inf (e.g., invalid measurement), it would pass through. The drone_arl code handles both ±inf separately. This is a pre-existing gap not introduced by this PR, but worth considering in a follow-up.

4. 🟢 Correctness — `visuotactile_render.py` (line 178): `torch.abs()` mask is fine

torch.abs(height_map) < 1e-6 creates a temporary boolean tensor for the mask argument. Since height_map was already .clone()d on the previous line, the in-place masked_fill_ correctly modifies the clone. No aliasing concern.

5. 🟢 Changelog & Scope

All three changelog fragments (isaaclab, isaaclab_contrib, isaaclab_tasks) are present and well-written. The PR scope is appropriately conservative — skipping Warp-backed arrays and row-mask assignments that don't fit the masked_fill_ pattern.

Summary

Aspect	Assessment
Correctness	✅ All transformations are behavior-preserving
Performance	✅ Eliminates `nonzero`-based indexing + device sync
Scope	✅ Conservative and well-documented exclusions
Tests	⚠️ Relies on existing test coverage (reasonable for a mechanical refactor)
CI	⏳ Pending (pre-commit passed)

LGTM with one minor optimization suggestion for the drone observations. Nice cleanup! 🚀

Update (commit aa5247f): Reviewed the incremental diff (large — includes Cartpole task consolidation/rename, resample_interval_on_reset event manager feature, and additional masked_fill_ applications in the restructured cartpole camera code).

Previous suggestion #1 (drone torch.clamp): Not addressed — original comment stands as a minor optimization opportunity.
Previous notes #2–#5: Still valid, no changes needed.
New masked_fill_ usages in the consolidated cartpole code follow the same correct pattern. No new issues.
The scope expansion (task rename + event manager feature) is unrelated to the masked_fill_ refactor but appears well-implemented with proper tests and changelogs.

No new issues found. Still LGTM. ✅

greptile-apps · 2026-06-02T05:01:53Z

Greptile Summary

This PR replaces boolean-mask scalar assignments (tensor[mask] = scalar) with Tensor.masked_fill_() in ten GPU hot-path locations across isaaclab, isaaclab_contrib, and isaaclab_tasks. The change is purely a performance optimization — masked_fill_ uses a fused kernel without a nonzero-backed device sync, while preserving identical tensor semantics.

linear_interpolation.py: Out-of-bounds weight zeroing and observations.py: depth-image inf clamping are the two core library changes; both are straightforward single-line replacements.
drone_arl/mdp/observations.py: Four sequential fills are correctly converted; since images = images / 10.0 creates a new tensor, the final masked_fill_ operates on the divided result, matching the original behavior.
visuotactile_render.py: Two fills, one using a cross-tensor mask (grad_mag_orig == 0 filling grad_dir) — semantically safe because the mask is a separate boolean tensor evaluated before the fill.

Confidence Score: 5/5

Safe to merge — all changes are behavior-preserving, mechanical replacements of in-place boolean-mask assignments with the semantically equivalent masked_fill_ call.

Every converted call follows the same pattern: the boolean mask is evaluated into a separate tensor before the fill executes, so there is no aliasing, no gradient-graph difference, and no device/dtype mismatch. The sequential fills in the drone observations file correctly operate on the already-mutated tensor state, matching the original order of assignments. The one cross-tensor fill (grad_mag_orig into grad_dir) is safe for the same reason. The Warp-backed ProxyArray case was correctly identified and skipped.

No files require special attention.

Important Files Changed

Filename	Overview
source/isaaclab/isaaclab/utils/interpolation/linear_interpolation.py	Single-line replacement of out-of-bounds weight zeroing; mask is independent of `weight`, so no aliasing concern.
source/isaaclab/isaaclab/envs/mdp/observations.py	Depth inf-clamping converted; `images` is cloned at the return site so in-place modification of the sensor cache is unchanged from the original.
source/isaaclab_tasks/isaaclab_tasks/contrib/drone_arl/mdp/observations.py	Four sequential masked_fill_ calls correctly mirror the original sequential index assignments; division reassignment preserves tensor identity for the post-normalization fill.
source/isaaclab_contrib/isaaclab_contrib/sensors/tacsl_sensor/visuotactile_render.py	Two fills converted; the cross-tensor mask (grad_mag_orig fills grad_dir) is safe as PyTorch evaluates the mask into a separate boolean tensor before writing.
source/isaaclab_tasks/isaaclab_tasks/core/shadow_hand/feature_extractor.py	Two depth inf-zeroing fills replaced; the _save_images path already clones before filling, and _process_images matches the original in-place behavior on the .float() view.
source/isaaclab_tasks/isaaclab_tasks/core/direct_cartpole/cartpole_camera_env.py	Single-line depth inf-clamping replacement; straightforward and correct.
source/isaaclab_tasks/isaaclab_tasks/contrib/cartpole_showcase/cartpole_camera/cartpole_camera_env.py	Single-line depth inf-clamping replacement; identical semantics to the original.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["tensor[mask] = scalar\n(boolean-mask index assignment)"] -->|"PyTorch advanced indexing"| B["nonzero() call\n(finds non-zero indices)"]
    B --> C["device sync\n(blocks GPU pipeline)"]
    C --> D["scatter write\nto tensor"]

    E["tensor.masked_fill_(mask, scalar)\n(this PR)"] -->|"fused CUDA kernel"| F["single-pass fill\n(no device sync)"]
    F --> G["result: same values\nhigher throughput"]

    A -.->|"replaced by"| E

    style A fill:#f9c,stroke:#c66
    style E fill:#9cf,stroke:#66c
    style B fill:#fdd,stroke:#c66
    style C fill:#fdd,stroke:#c66
    style F fill:#dfd,stroke:#6c6
    style G fill:#dfd,stroke:#6c6

_{Reviews (1): Last reviewed commit: "Use masked_fill_ for boolean-mask scalar..." | Re-trigger Greptile}

Replace boolean-mask index assignments of the form ``tensor[mask] = scalar`` with ``Tensor.masked_fill_`` across camera/depth observation processing, the linear interpolation utility, the Shadow Hand feature extractor, the drone navigation observations, and the TacSL visuotactile renderer. ``masked_fill_`` performs the fill in a single fused kernel and avoids the ``nonzero``-based advanced-indexing path (and its device sync), which makes these GPU tensor operations more performant while preserving behavior.

ooctipus requested a review from jtigue-bdai as a code owner June 2, 2026 04:58

github-actions Bot added the isaac-lab Related to Isaac Lab team label Jun 2, 2026

isaaclab-review-bot Bot reviewed Jun 2, 2026

View reviewed changes

StafaH approved these changes Jun 2, 2026

View reviewed changes

ooctipus force-pushed the octi/masked-fill-perf branch from 4ae5375 to aa5247f Compare June 2, 2026 08:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use masked_fill_ for boolean-mask scalar assignments#5914

Use masked_fill_ for boolean-mask scalar assignments#5914
ooctipus wants to merge 1 commit into
isaac-sim:developfrom
ooctipus:octi/masked-fill-perf

ooctipus commented Jun 2, 2026

Uh oh!

isaaclab-review-bot Bot left a comment •

edited

Loading

Uh oh!

greptile-apps Bot commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ooctipus commented Jun 2, 2026

Summary

Test plan

Uh oh!

isaaclab-review-bot Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Review: Use masked_fill_ for boolean-mask scalar assignments

Findings

1. 🟡 Suggestion — drone_arl/mdp/observations.py (lines 198–203): Consider torch.clamp for sequential range bounding

2. 🟡 Note — linear_interpolation.py (line 78): Division-by-zero produces NaN intermediates

3. 🟢 Observation — Negative infinity not handled in core observations.py (line 428)

4. 🟢 Correctness — visuotactile_render.py (line 178): torch.abs() mask is fine

5. 🟢 Changelog & Scope

Summary

Uh oh!

greptile-apps Bot commented Jun 2, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

isaaclab-review-bot Bot left a comment •

edited

Loading

1. 🟡 Suggestion — `drone_arl/mdp/observations.py` (lines 198–203): Consider `torch.clamp` for sequential range bounding

2. 🟡 Note — `linear_interpolation.py` (line 78): Division-by-zero produces NaN intermediates

3. 🟢 Observation — Negative infinity not handled in core `observations.py` (line 428)

4. 🟢 Correctness — `visuotactile_render.py` (line 178): `torch.abs()` mask is fine