Skip to content

Use masked_fill_ for boolean-mask scalar assignments#5914

Open
ooctipus wants to merge 1 commit into
isaac-sim:developfrom
ooctipus:octi/masked-fill-perf
Open

Use masked_fill_ for boolean-mask scalar assignments#5914
ooctipus wants to merge 1 commit into
isaac-sim:developfrom
ooctipus:octi/masked-fill-perf

Conversation

@ooctipus
Copy link
Copy Markdown
Collaborator

@ooctipus ooctipus commented Jun 2, 2026

Summary

Replaces boolean-mask index assignments of the form tensor[mask] = scalar with Tensor.masked_fill_ in a handful of GPU tensor hot paths. masked_fill_ performs the fill in a single fused kernel and avoids the nonzero-based advanced-indexing path (and its device sync), so it's more performant while preserving identical behavior.

Touched code:

  • isaaclab.utils.interpolation.LinearInterpolation — out-of-bounds weight zeroing.
  • isaaclab.envs.mdp.observations.image — depth inf clamping.
  • isaaclab_contrib TacSL visuotactile renderer — artifact removal and gradient-direction zeroing.
  • isaaclab_tasks Shadow Hand feature extractor, Cartpole camera envs (core + showcase), and drone navigation observations — depth/normalization fills.

Skipped intentionally:

  • stack_ik_rel_blueprint_env_cfg.py, where images may be a Warp-backed ProxyArray that does not implement masked_fill_.
  • Row-mask assignments (tensor[mask, :] = 0) that aren't a direct scalar masked_fill_.

Test plan

  • ./isaaclab.sh -f (pre-commit) passes — done locally.
  • Existing camera/observation and interpolation tests pass unchanged (behavior-preserving refactor).

@ooctipus ooctipus requested a review from jtigue-bdai as a code owner June 2, 2026 04:58
@github-actions github-actions Bot added the isaac-lab Related to Isaac Lab team label Jun 2, 2026
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Use masked_fill_ for boolean-mask scalar assignments

Overall Assessment: This is a clean, well-scoped performance refactor. The transformation from tensor[bool_mask] = scalar to tensor.masked_fill_(bool_mask, scalar) is semantically equivalent in all cases here and avoids the nonzero + scatter path that triggers a device synchronization. The PR description is thorough about what was intentionally skipped.

Findings

1. 🟡 Suggestion — drone_arl/mdp/observations.py (lines 198–203): Consider torch.clamp for sequential range bounding

The four sequential masked_fill_ calls for inf-clamping + upper-bounding could be expressed more concisely:

images = images.clamp(-10.0, 10.0)  # handles ±inf and >10 in one fused op
images = images / 10.0
images.masked_fill_(images < 0.02, -1.0)

torch.clamp handles ±inf correctly and would reduce three masked_fill_ calls (each creating a boolean mask tensor) to a single fused kernel. This is both more readable and potentially faster.

Severity: Low — the current code is correct, this is a further optimization opportunity.

2. 🟡 Note — linear_interpolation.py (line 78): Division-by-zero produces NaN intermediates

When upper_bound == lower_bound, the denominator self._x[upper_bound] - self._x[lower_bound] is zero, producing NaN (0/0) or ±inf (nonzero/0) in weight before masked_fill_ overwrites those positions. This is correct behavior (the fill always wins), but note that:

  • torch.autograd.detect_anomaly() will flag NaN in forward pass if enabled during debugging
  • This was also the case before this PR — not a regression

No action needed; just documenting for future readers.

3. 🟢 Observation — Negative infinity not handled in core observations.py (line 428)

images.masked_fill_(images == float("inf"), 0)

This only handles +inf. If a depth sensor ever returns -inf (e.g., invalid measurement), it would pass through. The drone_arl code handles both ±inf separately. This is a pre-existing gap not introduced by this PR, but worth considering in a follow-up.

4. 🟢 Correctness — visuotactile_render.py (line 178): torch.abs() mask is fine

torch.abs(height_map) < 1e-6 creates a temporary boolean tensor for the mask argument. Since height_map was already .clone()d on the previous line, the in-place masked_fill_ correctly modifies the clone. No aliasing concern.

5. 🟢 Changelog & Scope

All three changelog fragments (isaaclab, isaaclab_contrib, isaaclab_tasks) are present and well-written. The PR scope is appropriately conservative — skipping Warp-backed arrays and row-mask assignments that don't fit the masked_fill_ pattern.

Summary

Aspect Assessment
Correctness ✅ All transformations are behavior-preserving
Performance ✅ Eliminates nonzero-based indexing + device sync
Scope ✅ Conservative and well-documented exclusions
Tests ⚠️ Relies on existing test coverage (reasonable for a mechanical refactor)
CI ⏳ Pending (pre-commit passed)

LGTM with one minor optimization suggestion for the drone observations. Nice cleanup! 🚀


Update (commit aa5247f): Reviewed the incremental diff (large — includes Cartpole task consolidation/rename, resample_interval_on_reset event manager feature, and additional masked_fill_ applications in the restructured cartpole camera code).

  • Previous suggestion #1 (drone torch.clamp): Not addressed — original comment stands as a minor optimization opportunity.
  • Previous notes #2#5: Still valid, no changes needed.
  • New masked_fill_ usages in the consolidated cartpole code follow the same correct pattern. No new issues.
  • The scope expansion (task rename + event manager feature) is unrelated to the masked_fill_ refactor but appears well-implemented with proper tests and changelogs.

No new issues found. Still LGTM. ✅

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jun 2, 2026

Greptile Summary

This PR replaces boolean-mask scalar assignments (tensor[mask] = scalar) with Tensor.masked_fill_() in ten GPU hot-path locations across isaaclab, isaaclab_contrib, and isaaclab_tasks. The change is purely a performance optimization — masked_fill_ uses a fused kernel without a nonzero-backed device sync, while preserving identical tensor semantics.

  • linear_interpolation.py: Out-of-bounds weight zeroing and observations.py: depth-image inf clamping are the two core library changes; both are straightforward single-line replacements.
  • drone_arl/mdp/observations.py: Four sequential fills are correctly converted; since images = images / 10.0 creates a new tensor, the final masked_fill_ operates on the divided result, matching the original behavior.
  • visuotactile_render.py: Two fills, one using a cross-tensor mask (grad_mag_orig == 0 filling grad_dir) — semantically safe because the mask is a separate boolean tensor evaluated before the fill.

Confidence Score: 5/5

Safe to merge — all changes are behavior-preserving, mechanical replacements of in-place boolean-mask assignments with the semantically equivalent masked_fill_ call.

Every converted call follows the same pattern: the boolean mask is evaluated into a separate tensor before the fill executes, so there is no aliasing, no gradient-graph difference, and no device/dtype mismatch. The sequential fills in the drone observations file correctly operate on the already-mutated tensor state, matching the original order of assignments. The one cross-tensor fill (grad_mag_orig into grad_dir) is safe for the same reason. The Warp-backed ProxyArray case was correctly identified and skipped.

No files require special attention.

Important Files Changed

Filename Overview
source/isaaclab/isaaclab/utils/interpolation/linear_interpolation.py Single-line replacement of out-of-bounds weight zeroing; mask is independent of weight, so no aliasing concern.
source/isaaclab/isaaclab/envs/mdp/observations.py Depth inf-clamping converted; images is cloned at the return site so in-place modification of the sensor cache is unchanged from the original.
source/isaaclab_tasks/isaaclab_tasks/contrib/drone_arl/mdp/observations.py Four sequential masked_fill_ calls correctly mirror the original sequential index assignments; division reassignment preserves tensor identity for the post-normalization fill.
source/isaaclab_contrib/isaaclab_contrib/sensors/tacsl_sensor/visuotactile_render.py Two fills converted; the cross-tensor mask (grad_mag_orig fills grad_dir) is safe as PyTorch evaluates the mask into a separate boolean tensor before writing.
source/isaaclab_tasks/isaaclab_tasks/core/shadow_hand/feature_extractor.py Two depth inf-zeroing fills replaced; the _save_images path already clones before filling, and _process_images matches the original in-place behavior on the .float() view.
source/isaaclab_tasks/isaaclab_tasks/core/direct_cartpole/cartpole_camera_env.py Single-line depth inf-clamping replacement; straightforward and correct.
source/isaaclab_tasks/isaaclab_tasks/contrib/cartpole_showcase/cartpole_camera/cartpole_camera_env.py Single-line depth inf-clamping replacement; identical semantics to the original.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["tensor[mask] = scalar\n(boolean-mask index assignment)"] -->|"PyTorch advanced indexing"| B["nonzero() call\n(finds non-zero indices)"]
    B --> C["device sync\n(blocks GPU pipeline)"]
    C --> D["scatter write\nto tensor"]

    E["tensor.masked_fill_(mask, scalar)\n(this PR)"] -->|"fused CUDA kernel"| F["single-pass fill\n(no device sync)"]
    F --> G["result: same values\nhigher throughput"]

    A -.->|"replaced by"| E

    style A fill:#f9c,stroke:#c66
    style E fill:#9cf,stroke:#66c
    style B fill:#fdd,stroke:#c66
    style C fill:#fdd,stroke:#c66
    style F fill:#dfd,stroke:#6c6
    style G fill:#dfd,stroke:#6c6
Loading

Reviews (1): Last reviewed commit: "Use masked_fill_ for boolean-mask scalar..." | Re-trigger Greptile

Replace boolean-mask index assignments of the form ``tensor[mask] = scalar``
with ``Tensor.masked_fill_`` across camera/depth observation processing, the
linear interpolation utility, the Shadow Hand feature extractor, the drone
navigation observations, and the TacSL visuotactile renderer.

``masked_fill_`` performs the fill in a single fused kernel and avoids the
``nonzero``-based advanced-indexing path (and its device sync), which makes
these GPU tensor operations more performant while preserving behavior.
@ooctipus ooctipus force-pushed the octi/masked-fill-perf branch from 4ae5375 to aa5247f Compare June 2, 2026 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants