Use masked_fill_ for boolean-mask scalar assignments#5914
Conversation
There was a problem hiding this comment.
Review: Use masked_fill_ for boolean-mask scalar assignments
Overall Assessment: This is a clean, well-scoped performance refactor. The transformation from tensor[bool_mask] = scalar to tensor.masked_fill_(bool_mask, scalar) is semantically equivalent in all cases here and avoids the nonzero + scatter path that triggers a device synchronization. The PR description is thorough about what was intentionally skipped.
Findings
1. 🟡 Suggestion — drone_arl/mdp/observations.py (lines 198–203): Consider torch.clamp for sequential range bounding
The four sequential masked_fill_ calls for inf-clamping + upper-bounding could be expressed more concisely:
images = images.clamp(-10.0, 10.0) # handles ±inf and >10 in one fused op
images = images / 10.0
images.masked_fill_(images < 0.02, -1.0)torch.clamp handles ±inf correctly and would reduce three masked_fill_ calls (each creating a boolean mask tensor) to a single fused kernel. This is both more readable and potentially faster.
Severity: Low — the current code is correct, this is a further optimization opportunity.
2. 🟡 Note — linear_interpolation.py (line 78): Division-by-zero produces NaN intermediates
When upper_bound == lower_bound, the denominator self._x[upper_bound] - self._x[lower_bound] is zero, producing NaN (0/0) or ±inf (nonzero/0) in weight before masked_fill_ overwrites those positions. This is correct behavior (the fill always wins), but note that:
torch.autograd.detect_anomaly()will flag NaN in forward pass if enabled during debugging- This was also the case before this PR — not a regression
No action needed; just documenting for future readers.
3. 🟢 Observation — Negative infinity not handled in core observations.py (line 428)
images.masked_fill_(images == float("inf"), 0)This only handles +inf. If a depth sensor ever returns -inf (e.g., invalid measurement), it would pass through. The drone_arl code handles both ±inf separately. This is a pre-existing gap not introduced by this PR, but worth considering in a follow-up.
4. 🟢 Correctness — visuotactile_render.py (line 178): torch.abs() mask is fine
torch.abs(height_map) < 1e-6 creates a temporary boolean tensor for the mask argument. Since height_map was already .clone()d on the previous line, the in-place masked_fill_ correctly modifies the clone. No aliasing concern.
5. 🟢 Changelog & Scope
All three changelog fragments (isaaclab, isaaclab_contrib, isaaclab_tasks) are present and well-written. The PR scope is appropriately conservative — skipping Warp-backed arrays and row-mask assignments that don't fit the masked_fill_ pattern.
Summary
| Aspect | Assessment |
|---|---|
| Correctness | ✅ All transformations are behavior-preserving |
| Performance | ✅ Eliminates nonzero-based indexing + device sync |
| Scope | ✅ Conservative and well-documented exclusions |
| Tests | |
| CI | ⏳ Pending (pre-commit passed) |
LGTM with one minor optimization suggestion for the drone observations. Nice cleanup! 🚀
Update (commit aa5247f): Reviewed the incremental diff (large — includes Cartpole task consolidation/rename, resample_interval_on_reset event manager feature, and additional masked_fill_ applications in the restructured cartpole camera code).
- Previous suggestion #1 (drone
torch.clamp): Not addressed — original comment stands as a minor optimization opportunity. - Previous notes #2–#5: Still valid, no changes needed.
- New
masked_fill_usages in the consolidated cartpole code follow the same correct pattern. No new issues. - The scope expansion (task rename + event manager feature) is unrelated to the
masked_fill_refactor but appears well-implemented with proper tests and changelogs.
No new issues found. Still LGTM. ✅
Greptile SummaryThis PR replaces boolean-mask scalar assignments (
Confidence Score: 5/5Safe to merge — all changes are behavior-preserving, mechanical replacements of in-place boolean-mask assignments with the semantically equivalent masked_fill_ call. Every converted call follows the same pattern: the boolean mask is evaluated into a separate tensor before the fill executes, so there is no aliasing, no gradient-graph difference, and no device/dtype mismatch. The sequential fills in the drone observations file correctly operate on the already-mutated tensor state, matching the original order of assignments. The one cross-tensor fill (grad_mag_orig into grad_dir) is safe for the same reason. The Warp-backed ProxyArray case was correctly identified and skipped. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["tensor[mask] = scalar\n(boolean-mask index assignment)"] -->|"PyTorch advanced indexing"| B["nonzero() call\n(finds non-zero indices)"]
B --> C["device sync\n(blocks GPU pipeline)"]
C --> D["scatter write\nto tensor"]
E["tensor.masked_fill_(mask, scalar)\n(this PR)"] -->|"fused CUDA kernel"| F["single-pass fill\n(no device sync)"]
F --> G["result: same values\nhigher throughput"]
A -.->|"replaced by"| E
style A fill:#f9c,stroke:#c66
style E fill:#9cf,stroke:#66c
style B fill:#fdd,stroke:#c66
style C fill:#fdd,stroke:#c66
style F fill:#dfd,stroke:#6c6
style G fill:#dfd,stroke:#6c6
Reviews (1): Last reviewed commit: "Use masked_fill_ for boolean-mask scalar..." | Re-trigger Greptile |
Replace boolean-mask index assignments of the form ``tensor[mask] = scalar`` with ``Tensor.masked_fill_`` across camera/depth observation processing, the linear interpolation utility, the Shadow Hand feature extractor, the drone navigation observations, and the TacSL visuotactile renderer. ``masked_fill_`` performs the fill in a single fused kernel and avoids the ``nonzero``-based advanced-indexing path (and its device sync), which makes these GPU tensor operations more performant while preserving behavior.
4ae5375 to
aa5247f
Compare
Summary
Replaces boolean-mask index assignments of the form
tensor[mask] = scalarwithTensor.masked_fill_in a handful of GPU tensor hot paths.masked_fill_performs the fill in a single fused kernel and avoids thenonzero-based advanced-indexing path (and its device sync), so it's more performant while preserving identical behavior.Touched code:
isaaclab.utils.interpolation.LinearInterpolation— out-of-bounds weight zeroing.isaaclab.envs.mdp.observations.image— depthinfclamping.isaaclab_contribTacSL visuotactile renderer — artifact removal and gradient-direction zeroing.isaaclab_tasksShadow Hand feature extractor, Cartpole camera envs (core + showcase), and drone navigation observations — depth/normalization fills.Skipped intentionally:
stack_ik_rel_blueprint_env_cfg.py, whereimagesmay be a Warp-backedProxyArraythat does not implementmasked_fill_.tensor[mask, :] = 0) that aren't a direct scalarmasked_fill_.Test plan
./isaaclab.sh -f(pre-commit) passes — done locally.