fix(meshing): SPD-floor the mmpde metric to stop silent NaN-bail on deformed meshes#259
Conversation
…eformed meshes _winslow_mmpde's energy/gradient use fractional powers (sqrt(detM), detM**((1-p)/2), and S**q with S = tr(J M^-1 J^T)) that are defined only for an SPD metric. The metric is a guide field FE-evaluated at a FIXED reference cloud; once the interior deforms, a reference point can fall outside the current mesh and the P1 density is then evaluated by FE extrapolation, going negative. A scalar density rho<0 gives M = rho*I with det = rho^2 > 0 (so a detM>0 check passes) but M negative-definite, so S<0 and S**q = NaN. The energy is then non-finite and the mover bails with zero displacement -> adaptation silently stops on roughly half the adapts of an adaptive convection loop (mesh under-resolves; the symptom was previously misread as adapt/solver coupling or "passive fault damping"). Project every evaluated metric tensor onto SPD with a small relative eigenvalue floor at the _eval_M chokepoint. A genuinely SPD metric is returned unchanged (short-circuit no-op); extrapolation garbage becomes a benign "coarsen here" instead of a NaN. No behaviour change on valid metrics (bit-identical), no API change. Tests: test_0762/0850/0750/0830 green (33 passed, 5 skipped MMG, 4 pre-existing xfail). Validated on the stagnant-lid adaptive convection driver: all forced adapts move, T stays in [0,1], vrms bounded over 52 steps (no-fault) and 32 steps (passive fault). Underworld development team with AI support from Claude Code
There was a problem hiding this comment.
Pull request overview
This PR hardens the Winslow/MMPDE mesh smoother against non-SPD metric tensors (typically produced by FE extrapolation after mesh deformation) by sanitizing the evaluated metric so fractional-power terms in the MMPDE functional do not generate NaNs and silently stall adaptation.
Changes:
- Wraps
_eval_Mwith an SPD projection step using eigenvalue flooring to ensure the metric is symmetric positive-definite. - Adds detailed in-code rationale describing how non-SPD metrics arise and how the sanitizer prevents NaN-driven no-op adapts.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def _spd_sanitise(M): | ||
| Ms = 0.5 * (M + np.swapaxes(M, -1, -2)) | ||
| w, Vc = np.linalg.eigh(Ms) | ||
| if w.size: | ||
| wmax = float(np.nanmax(w)) | ||
| else: | ||
| wmax = 1.0 | ||
| floor = max(wmax, 1.0) * 1.0e-8 | ||
| if np.all(np.isfinite(w)) and float(w.min()) >= floor: | ||
| return Ms # already SPD → no-op | ||
| w = np.clip(np.nan_to_num(w, nan=floor, posinf=wmax, neginf=floor), | ||
| floor, None) | ||
| return np.einsum('nij,nj,nkj->nik', Vc, w, Vc) |
There was a problem hiding this comment.
Addressed in 732557e: added an empty-batch early return (parallel rank with no owned cells no longer hits the zero-size reduction) and made the SPD test per-tensor — only the cells whose own eigenvalues fail are projected, every already-SPD tensor is returned bit-identical. Verified empty-safe + valid cells untouched + all outputs SPD.
Address Copilot review on _spd_sanitise: - Guard the empty batch (a parallel rank that owns no cells): return early before the eigenvalue reductions, which previously raised on a zero-size array. - Make the SPD test per-tensor: only the cells whose OWN eigenvalues are non-finite or below the floor are projected; every already-SPD tensor is returned bit-identical to the symmetrised input. One bad point can no longer perturb otherwise-valid cells (matches the "no-op for SPD" contract). Verified: empty input is safe, a single garbage cell leaves the rest bit-identical, all output tensors SPD. Underworld development team with AI support from Claude Code
Restores the intended "RBF metric eval" robustness. The mmpde mover's RBF/Shepard metric path baked the metric via an FE evaluation at the FIXED reference cloud `ref`; on a deformed mesh that reference can mis-locate / drift outside the deformed interior and the FE evaluation returns garbage (the P1 density — strictly positive by construction — comes back NEGATIVE, even at a field's own DOFs). A negative density is a non-SPD metric, which the mover then either NaN-bails on (a hidden stall) or, with the #259 SPD-floor, acts on as "coarsen hard here" → giant cells / holes / divergence in the adapted mesh. Fix: bake the metric at the CURRENT mesh NODES (its own DOF locations) and Shepard-interpolate from there. A Shepard (positive-weight, convex) average of the positive nodal values is GUARANTEED >= 0 (monotone) — the metric can never go non-SPD from the eval — and nodes are always inside the mesh (no out-of-domain / drift) and need no per-step cell location (fast). `ref` is kept for the _edge_mats reference frame. Best paired with #264 (deformed-mesh point-location fix), which makes the nodal FE bake exact; together the adapted mesh stays clean under forced every-step adaptation at R=5 (folded=0, cell-area-ratio flat) where it previously tore holes. Validated in a sibling worktree (this is a one-line source change; CI runs the full mover suite). Underworld development team with AI support from Claude Code
Problem
_winslow_mmpde's energy/gradient use fractional powers —sqrt(detM),detM**((1-p)/2), andS**qwithS = tr(J·M⁻¹·Jᵀ)— that are defined only for an SPD metric.The metric is a guide field FE-evaluated at a fixed reference cloud. Once the interior deforms, a reference point can fall outside the current mesh and the P1 density is then evaluated by FE extrapolation, going negative. A scalar density
ρ<0givesM = ρ·Iwithdet = ρ² > 0— so adetM>0check passes — butMis negative-definite, soS<0andS**q = NaN. The energy becomes non-finite and the mover bails with zero displacement.The effect is intermittent and silent: on an adaptive convection loop roughly half the adapts no-op, the mesh under-resolves, and the symptom is easily misread as an adapt/solver coupling problem or "passive-fault damping."
Fix
Project every evaluated metric tensor onto SPD with a small relative eigenvalue floor at the single
_eval_Mchokepoint:No API change. No behaviour change where the metric is already valid — verified bit-identical on the adapts that previously succeeded.
Validation
test_0762_fault_metric_tensor,test_0850_mesh_smoothing,test_0750_meshing_follow_metric,test_0830_mesh_adapt_variable_transfer: 33 passed, 5 skipped (MMG), 4 pre-existing xfail.[0,1], vrms bounded over 52 steps (no-fault) and 32 steps (passive fault). Without it, ~half the adapts NaN-bail and the mesh stops adapting.Underworld development team with AI support from Claude Code