Stokes_Constrained parallel correctness: gauge, convergence, knockout, rotation-gauge#265
Open
lmoresi wants to merge 3 commits into
Open
Stokes_Constrained parallel correctness: gauge, convergence, knockout, rotation-gauge#265lmoresi wants to merge 3 commits into
lmoresi wants to merge 3 commits into
Conversation
… audit Three fixes from a focused parallel-correctness investigation of the in-saddle Lagrange-multiplier free-slip solver (Stokes_Constrained), validated against the Zhong (2008) spherical-shell response (3-D, SphericalShellInternalBoundary). Item 1 — consistent pressure gauge (raw-field reproducibility). On an enclosed constrained problem the constant pressure and constant multiplier are independent gauge freedoms; the solver lands on a partition-dependent level for each, so raw pressure/multiplier are not reproducible across ranks. Added a guarded automatic pressure gauge (auto_pressure_gauge, default on) that pins the surface-mean pressure on the first constraint boundary when a pressure null space is active and the user has registered no gauge of their own. It is physics- neutral (velocity bit-identical) and fixes the raw mean pressure (10.5% -> 0.4% serial vs np=8). The pressure pin does NOT fix the raw multiplier (separate gauge freedom): dynamic topography is read gauge-invariantly via topography(reference="mean"). New parallel regression in test_1063. Item 2 — the grouped-Schur solve was false-converging. The grouped u | [p,h] Schur preconditioner uses inner iterative sub-solves (a variable/nonlinear preconditioner). A non-flexible Krylov method (the base's gmres) is invalid for that and false-converged: it reported CONVERGED_RTOL in ~2 iterations on the preconditioned norm while the TRUE residual ||r||/||b|| blew up to ~1e3, landing on a wrong, partition-dependent answer (the ~0.4% velocity spread and the failure to reproduce the Zhong response). Fixed with four defaults in Stokes_Constrained: ksp_type=fgmres (flexible Krylov), ksp_norm_type= unpreconditioned (honest true-residual stopping test), Eisenstat-Walker rtol0=rtolmax=tolerance*0.1 (EW's default 0.3 capped the linear solve at one iteration), and a bounded ksp_max_it. The solve now genuinely converges (true residual ~1e-13) and reproduces Zhong (Ut within 1%, Ub within 0.1%); new test_1064 default-constrained test asserts this. Velocity partition spread 0.36% -> 0.12%. The residual 0.12% is a separate, pre-existing bug: add_natural_bc on an INTERNAL surface assembles a partition-dependent load (the force *function* integrates identically to 1e-16, but the assembled vector differs 0.027% because interior facets at partition seams lack their neighbour support cell on the non-overlapped assembly DM). Marked with TODO(BUG) at the natural-BC assembly; filed in the planning file. Universal (affects Nitsche too), amplified here by the augmented conditioning. Item 3 — interior-multiplier knockout audit. Confirmed the knockout is genuinely lossless (a converged solve with it disabled moves the answer ~1e-8 in velocity, no interior-h blow-up); the previously observed "2-5% when off" is an iterative-conditioning artifact, not lost physics. Rewrote the docstrings accordingly. Added a guard that warns a monolithic direct solve (pc_type lu/cholesky) of the constrained saddle point is a serial diagnostic only (wrong response + segfaults in parallel). Underworld development team with AI support from Claude Code
…p solution For a free-slip (non-Dirichlet) velocity the rigid-body rotations are a true nullspace of the velocity block (A_uu·rotation = 0). The monolithic nullspace attached for the solve is not removed inside the fieldsplit/Schur iteration: the inner velocity KSP has no rotation nullspace, so it leaves an unconstrained rigid rotation in the velocity. The operator is blind to it (true residual still converges to machine precision), but the rotation amplitude is partition- dependent — the tangential velocity differed serial-vs-parallel by ~0.1–1% even at a converged residual, while the physical (radial / gauge-invariant) flow was already partition-clean. SNES_Stokes_SaddlePt.solve() now projects the velocity rotation modes out of the converged solution via MatNullSpaceRemove (_remove_velocity_rotation_gauge). Because the rotation is a genuine nullspace, removing it does not change the residual and yields the same rotation-free solution on any decomposition. This is the velocity analogue of the constant-pressure gauge (set_pressure_gauge). Surgical: only the velocity rotation modes (zero in the pressure/multiplier blocks), so the existing pressure/multiplier gauge handling is untouched; a no-op when there are no rotation modes (e.g. Dirichlet velocity BCs). Validated on the SphericalShellInternalBoundary constrained free-slip problem: the tangential surface velocity serial-vs-np8 relative difference drops from 1.2% to 2.5e-14, with the iteration count unchanged. Regression clean: test_1064 + test_1010 (9 passed, 2 pre-existing xfails) and test_1063 parallel (np=2) all pass. Also refines the TODO(BUG) at the natural-BC assembly loop to document the underlying PETSc interior-facet support[0] partition-dependence and the workarounds that were tested and ruled out. Underworld development team with AI support from Claude Code
Adds a fast serial tier-a/level-1 test that solves a fully free-slip annulus (both boundaries constraint BCs, no Dirichlet velocity BC -> rigid-rotation nullspace active) driven by a purely radial body force, and asserts the converged velocity carries no rigid-rotation component. The existing test_1063 cases all pin the rotation nullspace with a Dirichlet BC, so they did not cover this path; this test fails without _remove_velocity_rotation_gauge (the rotation coefficient is ~1e-3) and passes with it (~4e-15). Underworld development team with AI support from Claude Code
Contributor
There was a problem hiding this comment.
Pull request overview
This PR improves parallel correctness and solver robustness for the constrained free-slip Stokes solver (SNES_Stokes_Constrained / SNES_Stokes_SaddlePt) by addressing gauge freedoms (pressure and rigid-rotation), fixing false convergence in the grouped Schur fieldsplit configuration, and adding targeted regression tests to lock in partition-independent behavior.
Changes:
- Add guarded automatic pressure gauge pinning for enclosed constrained problems (
auto_pressure_gauge) to make raw mean pressure partition-reproducible. - Switch default outer Krylov/convergence settings for constrained solves to a flexible method with true-residual norm monitoring (FGMRES + unpreconditioned norm) and tighten EW behavior.
- Project rigid-body rotation modes out of the converged velocity to remove a partition-dependent rotation gauge; add regression coverage (serial + MPI).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/underworld3/systems/solvers.py |
Adds constrained-solver defaults (fgmres/unpreconditioned norm/EW tightening) and guarded automatic pressure gauge installation. |
src/underworld3/cython/petsc_generic_snes_solvers.pyx |
Adds post-solve rigid-rotation gauge removal plus extensive notes on internal-surface natural BC partition dependence. |
tests/test_1065_rotation_gauge_freeslip.py |
New serial regression test asserting the free-slip solution has ~0 projection onto the rigid-rotation mode. |
tests/parallel/test_1063_constrained_freeslip_parallel.py |
Adds an MPI regression for raw pressure gauge reproducibility and updates the CLI recomputation helper. |
tests/test_1064_constrained_spherical_shell_response.py |
Adds a new test ensuring the default constrained path matches Zhong benchmark velocity response. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+2201
to
+2206
| self._tolerance = value | ||
| self.petsc_options["snes_rtol"] = value | ||
| self.petsc_options["ksp_rtol"] = value * 1.0e-1 | ||
| self.petsc_options["ksp_atol"] = value * 1.0e-6 | ||
| self.petsc_options["snes_ksp_ew_rtol0"] = value * 1.0e-1 | ||
| self.petsc_options["snes_ksp_ew_rtolmax"] = value * 1.0e-1 |
Comment on lines
+5565
to
+5568
| if getattr(self, "_velocity_rotation_nullspace", None) is not None: | ||
| return self._velocity_rotation_nullspace | ||
| if not self._petsc_velocity_nullspace_basis: | ||
| return None |
Comment on lines
+5605
to
+5607
| rot_ns = self._build_velocity_rotation_nullspace() | ||
| if rot_ns is not None: | ||
| rot_ns.remove(gvec) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Parallel-correctness work for
SNES_Stokes_Constrained(the in-saddle Lagrange-multiplier free-slip solver), bringing the constrained free-slip solve to round-off partition-independence.What's here
1. Auto pressure gauge. Guarded
auto_pressure_gauge(default on) pins the surface-mean pressure on the first constraint boundary when a pressure null space is active and the user set no gauge. Physics-neutral; raw mean pressure 10.5%→0.4% at np=8, velocity bit-identical. Topography is read gauge-invariantly viatopography(reference="mean")— the raw multiplier keeps its own independent gauge freedom the pressure pin does not touch.2. Solver convergence. The grouped
u | [p,h]Schur preconditioner uses inner iterative sub-solves = a variable preconditioner; plaingmresis invalid for that and false-converged (preconditioned norm reported CONVERGED while the true residual blew up), landing on a wrong, partition-dependent answer. Fixed withksp_type=fgmres,ksp_norm_type=unpreconditioned, a tightened Eisenstat–Walker default, and a defensiveksp_max_it. Now genuinely converges (true resid ~1e-13) and reproduces the Zhong benchmark.3. Knockout audit + LU guard. Confirmed the interior-multiplier knockout is lossless (the earlier "2–5% off" was iterative conditioning, not lost physics). Added a warning that a monolithic direct solve (
pc_type lu/cholesky) of the constrained saddle point is a serial diagnostic only.4. Rigid-rotation gauge fix (the main new result). On a fully free-slip velocity the rigid rotations are a true nullspace of the velocity block (
A_uu·rotation = 0). The monolithic nullspace attached for the solve is not removed inside the fieldsplit/Schur iteration — the inner velocity KSP has no rotation nullspace, so it leaves an unconstrained, partition-dependent rigid rotation in the velocity. The operator is blind to it (the true residual still converges to machine precision), so the tangential velocity differed serial-vs-parallel by ~0.1–1% while the physical (radial) flow was already clean.SNES_Stokes_SaddlePt.solvenow projects the velocity rotation modes out of the converged solution viaMatNullSpaceRemove(_remove_velocity_rotation_gauge) — the velocity analogue of the constant-pressure gauge. Because the rotation is a genuine nullspace, removing it does not change the residual and yields the same rotation-free solution on any decomposition.Validated on the constrained free-slip problem: tangential surface velocity serial-vs-np8 relative difference 1.2% → 2.5e-14, iteration count unchanged. New serial guard
tests/test_1065_rotation_gauge_freeslip.py(the existingtest_1063cases all pin the rotation nullspace with a Dirichlet BC, so they did not cover this path).Known PETSc limitation (documented, not fixed here)
add_natural_bcon an internal surface assembles a slightly partition-dependent load:DMPlexComputeBdResidual_Single_Internalattributes each interior facet's integral to a partition-dependentsupport[0]cell closure, so a seam facet's non-owned closure DOFs are dropped at global assembly (F0 differs ~3e-4). The refinedTODO(BUG)records this and the workarounds that were tested and ruled out (owner-only label strip, 1-cell overlap, codim-1 submesh). Once the rotation gauge is removed, this leaves only ~3e-6 in the velocity (the elliptic solve smooths the localized load error); applying the internal traction as a volume body force makes it round-off.Tests
test_1064(constrained spherical shell) +test_1010(Cartesian Stokes): 9 passed, 2 pre-existing xfails.test_1063(parallel constrained free-slip, np=2): passed.test_1065(new rotation-gauge guard): passed.Underworld development team with AI support from Claude Code