|
| 1 | +# splat-native-ultrasound — SIMD substrate (ndarray perspective) — v1 |
| 2 | + |
| 3 | +> **Status:** PROPOSAL / integration plan (ndarray-side perspective). Design-spec only; **no code in this plan**. |
| 4 | +> **Authored:** 2026-06-05 (session `claude/lance-graph-ontology-review-Pyry3`). |
| 5 | +> **Canonical plan:** `lance-graph/.claude/plans/splat-native-ultrasound-v1.md` — this doc is the ndarray-side companion. |
| 6 | +> |
| 7 | +> **Anchored to:** `I-NOISE-FLOOR-JIRAK` (Jirak weak-dependence Berry-Esseen for any significance threshold); the ndarray vertical-SIMD consumer contract at `lance-graph/.claude/knowledge/ndarray-vertical-simd-alien-magic.md` (W1a-style primitive-addition pattern, all three backends mandatory). |
| 8 | +
|
| 9 | +--- |
| 10 | + |
| 11 | +## 0. What this is from ndarray's perspective |
| 12 | + |
| 13 | +ndarray is the **SIMD substrate** for the splat-native ultrasound arc. Every floating-point op in the splat-fit → registration → render pipeline runs on the same `src/simd_*` family ndarray already exposes. This doc spells out the **one new module** owed by ndarray — `src/simd_splat.rs` — and its W1c-style five primitives. |
| 14 | + |
| 15 | +This is one deliverable (**D-SPLAT-2**) from the canonical plan. The carriers (`Gaussian3D`, `SplatBatch<N>`) live in `lance-graph-contract`, not here; ndarray sees them as opaque `&[u8]` / typed slice views. The same separation already governs `Fingerprint<256>` and `CamCodecContract` — math here, carriers there, engines downstream. |
| 16 | + |
| 17 | +**Sprint window:** P1 (sprint 1-2) — the **foundation phase**. Must land first; the entire downstream arc depends on these primitives. |
| 18 | + |
| 19 | +--- |
| 20 | + |
| 21 | +## 1. What ndarray owns |
| 22 | + |
| 23 | +### D-SPLAT-2 — `src/simd_splat.rs` (the W1c primitive bundle) |
| 24 | + |
| 25 | +A new module under the existing `src/simd_*` family. Five batch primitives, all dispatched through the runtime `simd_caps()` singleton, all three backends mandatory (AVX-512 / NEON / scalar) per the consumer-contract knowledge doc. |
| 26 | + |
| 27 | +**Module skeleton:** |
| 28 | + |
| 29 | +```rust |
| 30 | +//! src/simd_splat.rs — anisotropic Gaussian batch ops for splat-native ultrasound |
| 31 | +//! |
| 32 | +//! Five primitives, all SIMD-dispatched via `simd_caps()`: |
| 33 | +//! - batched_cholesky_3x3 |
| 34 | +//! - batched_mahalanobis |
| 35 | +//! - batched_opacity_blend |
| 36 | +//! - batched_sh_eval_l3 |
| 37 | +//! - batched_se3_transform |
| 38 | +//! |
| 39 | +//! Operates on packed `Gaussian3D` SoA layouts owned by lance-graph-contract. |
| 40 | +//! All primitives are zero-allocation in the hot loop; SoA column slices in/out. |
| 41 | + |
| 42 | +use crate::simd::SimdBackend; |
| 43 | +use crate::simd_caps; |
| 44 | + |
| 45 | +pub fn batched_cholesky_3x3( |
| 46 | + sigma_packed: &[f32], // N × 6 packed lower-tri Σ (xx, yy, zz, xy, xz, yz) |
| 47 | + out_l_packed: &mut [f32], // N × 6 Cholesky factor L (same packing) |
| 48 | +); |
| 49 | + |
| 50 | +pub fn batched_mahalanobis( |
| 51 | + query_xyz: &[f32], // M × 3 query points |
| 52 | + mu_xyz: &[f32], // N × 3 Gaussian centroids |
| 53 | + sigma_packed: &[f32], // N × 6 packed Σ |
| 54 | + out_dist_sq: &mut [f32], // M × N output (squared Mahalanobis) |
| 55 | +); |
| 56 | + |
| 57 | +pub fn batched_opacity_blend( |
| 58 | + sorted_amplitudes: &[f32], // N (front-to-back along view ray) |
| 59 | + opacity_lut: &[u8; 256], // amplitude → opacity LUT |
| 60 | + out_alpha: &mut [u8], // composited alpha per pixel |
| 61 | +); |
| 62 | + |
| 63 | +pub fn batched_sh_eval_l3( |
| 64 | + view_dir: [f32; 3], // unit view direction |
| 65 | + sh_coeffs: &[f32], // N × 16 SH coefficients per Gaussian |
| 66 | + out_luminance: &mut [f32], // N evaluated luminances |
| 67 | +); |
| 68 | + |
| 69 | +pub fn batched_se3_transform( |
| 70 | + pose_se3: [f32; 12], // 3 translation + 9 rotation matrix |
| 71 | + mu_in: &[f32], // N × 3 input centroids |
| 72 | + mu_out: &mut [f32], // N × 3 transformed centroids |
| 73 | + sigma_in_packed: &[f32], // N × 6 input Σ |
| 74 | + sigma_out_packed: &mut [f32], // N × 6 transformed (R Σ Rᵀ) |
| 75 | +); |
| 76 | +``` |
| 77 | + |
| 78 | +### Three-backend implementation matrix (mandatory) |
| 79 | + |
| 80 | +Per the consumer-contract knowledge doc, every primitive ships in three implementations: |
| 81 | + |
| 82 | +| Primitive | AVX-512 | NEON | Scalar | |
| 83 | +|---|---|---|---| |
| 84 | +| `batched_cholesky_3x3` | 16-lane batch (process 16 Σ matrices in parallel) | 4-lane batch (NEON 128-bit) | per-Σ scalar | |
| 85 | +| `batched_mahalanobis` | 16-lane query × 16-lane Gaussian unroll | 4-lane × 4-lane | nested scalar loop | |
| 86 | +| `batched_opacity_blend` | gather LUT + alpha composite, 16-lane | 4-lane | scalar | |
| 87 | +| `batched_sh_eval_l3` | 16-lane SH-basis evaluation, view dir broadcast | 4-lane | scalar | |
| 88 | +| `batched_se3_transform` | 16-lane matmul (μ vector × R) + 16-lane Σ similarity | 4-lane | scalar | |
| 89 | + |
| 90 | +Parity tests are MANDATORY (`I-NOISE-FLOOR-JIRAK`-aware significance gates if any thresholds are claimed). |
| 91 | + |
| 92 | +### Saturating / overflow semantics |
| 93 | + |
| 94 | +- **Cholesky**: degenerate Σ (det ≤ 0) yields NaN in L; downstream uses NaN as a sentinel for "this Gaussian is degenerate, skip it" rather than panicking. Documented as the W1c "VPABSB-correction-style" semantic carve-out in the function docstring. |
| 95 | +- **Mahalanobis**: returns `f32::INFINITY` for distances against degenerate Σ rather than NaN (lets the registration math sort distance arrays without NaN-propagation pitfalls). |
| 96 | +- **Opacity blend**: front-to-back saturation at 1.0 (alpha ceiling); accumulator uses `u16` internally then truncates to `u8` output. |
| 97 | +- **SH evaluation**: f32 luminance output, no clamping (caller decides whether to clamp for visual rendering vs computational pipeline). |
| 98 | +- **SE(3) transform**: assumes input rotation matrix is orthonormal; documentation includes a `debug_assert!` validation in debug builds. |
| 99 | + |
| 100 | +--- |
| 101 | + |
| 102 | +## 2. What ndarray consumes |
| 103 | + |
| 104 | +Nothing splat-specific — D-SPLAT-2 is foundation work. The primitives consume: |
| 105 | + |
| 106 | +- **Existing `simd_caps()` dispatch** — already shipped, already battle-tested across the 5 SIMD modules. |
| 107 | +- **Existing AVX-512 / NEON test infrastructure** — same parity-test patterns as `simd_avx2`, `simd_avx512`, `simd_neon`. |
| 108 | +- **Standard `core::arch::x86_64` / `core::arch::aarch64` intrinsics** — no new third-party deps. |
| 109 | + |
| 110 | +No upstream blockers. |
| 111 | + |
| 112 | +--- |
| 113 | + |
| 114 | +## 3. What ndarray hands off |
| 115 | + |
| 116 | +### To `lance-graph-contract` (D-SPLAT-1/3 in the canonical plan) |
| 117 | + |
| 118 | +`Gaussian3D` and `SplatBatch<N>` carrier types — they consume ndarray's primitives via the typed-slice API. The carrier crate has zero direct dep on ndarray (consistent with the zero-dep policy for contract); it exposes `&[f32]` column slices for the consumer to feed into `simd_splat`. |
| 119 | + |
| 120 | +### To `crates/splat-fit` (D-SPLAT-6) |
| 121 | + |
| 122 | +The new standalone `splat-fit` crate consumes `ndarray::simd_splat` via the `ndarray-hpc` feature flag (mirrors the `deepnsm` / `bgz17` pattern: optional ndarray dep, zero-dep core). |
| 123 | + |
| 124 | +### To `lance-graph::splat::registration` (D-SPLAT-5) |
| 125 | + |
| 126 | +The ICP / SE(3) optimizer consumes `batched_cholesky_3x3`, `batched_mahalanobis`, `batched_se3_transform` directly. The Σ-sandwich Mahalanobis is two `batched_mahalanobis` calls (against atlas Σ, against live Σ) with elementwise sum — no new primitive needed. |
| 127 | + |
| 128 | +### To `lance-graph::splat-render` (D-SPLAT-12) |
| 129 | + |
| 130 | +The renderer consumes `batched_opacity_blend` + `batched_sh_eval_l3` per-frame. Sort + paint = two ndarray calls. |
| 131 | + |
| 132 | +--- |
| 133 | + |
| 134 | +## 4. Per-primitive specifications |
| 135 | + |
| 136 | +### 4.1 `batched_cholesky_3x3` |
| 137 | + |
| 138 | +**Signature** (full Rust): |
| 139 | + |
| 140 | +```rust |
| 141 | +/// Compute Cholesky factor L such that L Lᵀ = Σ, for N packed 3×3 symmetric |
| 142 | +/// positive-definite matrices Σ. |
| 143 | +/// |
| 144 | +/// # Layout |
| 145 | +/// - `sigma_packed[i*6..i*6+6]` = [Σ₀₀, Σ₁₁, Σ₂₂, Σ₀₁, Σ₀₂, Σ₁₂] |
| 146 | +/// - `out_l_packed[i*6..i*6+6]` = [L₀₀, L₁₁, L₂₂, L₁₀, L₂₀, L₂₁] |
| 147 | +/// (lower-triangular, diag + sub-diag) |
| 148 | +/// |
| 149 | +/// # Semantics |
| 150 | +/// - Degenerate Σ (non-PD, det ≤ 0) produces NaN in the output. Downstream |
| 151 | +/// uses NaN as a sentinel for "skip this Gaussian"; do NOT propagate. |
| 152 | +/// - SIMD-batched 16-at-a-time on AVX-512, 4-at-a-time on NEON, scalar fallback. |
| 153 | +/// |
| 154 | +/// # Panics |
| 155 | +/// Never. |
| 156 | +pub fn batched_cholesky_3x3(sigma_packed: &[f32], out_l_packed: &mut [f32]); |
| 157 | +``` |
| 158 | + |
| 159 | +**Algorithm** (3×3 explicit, no loops): |
| 160 | + |
| 161 | +```text |
| 162 | +L₀₀ = √Σ₀₀ |
| 163 | +L₁₀ = Σ₀₁ / L₀₀ |
| 164 | +L₂₀ = Σ₀₂ / L₀₀ |
| 165 | +L₁₁ = √(Σ₁₁ - L₁₀²) |
| 166 | +L₂₁ = (Σ₁₂ - L₂₀·L₁₀) / L₁₁ |
| 167 | +L₂₂ = √(Σ₂₂ - L₂₀² - L₂₁²) |
| 168 | +``` |
| 169 | + |
| 170 | +**Tests:** |
| 171 | +- Reference comparison against `nalgebra::Matrix3::cholesky()` on random PD Σ (N = 100k). |
| 172 | +- Degenerate-Σ NaN sentinel (Σ = zero matrix; Σ with negative eigenvalue). |
| 173 | +- SIMD parity AVX-512 ≡ NEON ≡ scalar (f32 ULP tolerance ≤ 4). |
| 174 | +- Bench: ≥ 5× scalar throughput on AVX-512 at N = 1M. |
| 175 | + |
| 176 | +### 4.2 `batched_mahalanobis` |
| 177 | + |
| 178 | +**Signature:** |
| 179 | + |
| 180 | +```rust |
| 181 | +/// Squared Mahalanobis distance: (x - μ)ᵀ Σ⁻¹ (x - μ). |
| 182 | +/// |
| 183 | +/// Computes M queries against N Gaussians. Output is M × N row-major. |
| 184 | +/// |
| 185 | +/// # Inputs |
| 186 | +/// - `query_xyz[m*3..m*3+3]` = m-th query point |
| 187 | +/// - `mu_xyz[n*3..n*3+3]` = n-th Gaussian centroid |
| 188 | +/// - `sigma_packed[n*6..n*6+6]` = n-th Σ (same packing as Cholesky) |
| 189 | +/// |
| 190 | +/// # Output |
| 191 | +/// - `out_dist_sq[m*N + n]` = ‖x_m - μ_n‖²_{Σ_n} |
| 192 | +/// |
| 193 | +/// # Semantics |
| 194 | +/// - Degenerate Σ (Cholesky NaN) yields f32::INFINITY (sortable; never NaN). |
| 195 | +/// - SIMD-batched 16×16 on AVX-512, 4×4 on NEON. |
| 196 | +pub fn batched_mahalanobis( |
| 197 | + query_xyz: &[f32], mu_xyz: &[f32], sigma_packed: &[f32], out_dist_sq: &mut [f32], |
| 198 | +); |
| 199 | +``` |
| 200 | + |
| 201 | +**Implementation note:** internally calls `batched_cholesky_3x3` once on `sigma_packed`, caches L (heap-free via stack or caller-provided scratch), then triangular-solve + squared norm per (m, n) pair. |
| 202 | + |
| 203 | +**Tests:** |
| 204 | +- Reference comparison against scipy `scipy.spatial.distance.mahalanobis` on random points + Σ. |
| 205 | +- Triangle inequality on a metric subset (commutativity / non-negativity / zero-on-equal). |
| 206 | +- SIMD parity. |
| 207 | +- Bench: ≥ 1.4× scalar at M=1k, N=1M (the splat-to-splat registration regime). |
| 208 | + |
| 209 | +### 4.3 `batched_opacity_blend` |
| 210 | + |
| 211 | +**Signature:** |
| 212 | + |
| 213 | +```rust |
| 214 | +/// Front-to-back alpha compositing over a sorted Gaussian sequence. |
| 215 | +/// |
| 216 | +/// # Inputs |
| 217 | +/// - `sorted_amplitudes[i]` = i-th Gaussian's peak echo amplitude, sorted |
| 218 | +/// front-to-back along the view ray. |
| 219 | +/// - `opacity_lut[i]` = amplitude-bucket → opacity LUT (256 buckets; |
| 220 | +/// amplitude quantized into [0..256) via the per-frame max-amplitude). |
| 221 | +/// |
| 222 | +/// # Output |
| 223 | +/// - `out_alpha[ray]` = composited alpha per ray (8-bit final). |
| 224 | +/// |
| 225 | +/// # Semantics |
| 226 | +/// - Composition: α_new = α_old + (1 - α_old) · α_i. |
| 227 | +/// - Internal accumulator: u16 with saturation; truncate to u8 at end. |
| 228 | +pub fn batched_opacity_blend( |
| 229 | + sorted_amplitudes: &[f32], opacity_lut: &[u8; 256], out_alpha: &mut [u8], |
| 230 | +); |
| 231 | +``` |
| 232 | + |
| 233 | +**Tests:** |
| 234 | +- Reference comparison against scalar reference for known sequences. |
| 235 | +- Saturation at full opacity (sequence of high-amplitude Gaussians → α = 255). |
| 236 | +- Empty sequence → α = 0. |
| 237 | +- SIMD parity. |
| 238 | + |
| 239 | +### 4.4 `batched_sh_eval_l3` |
| 240 | + |
| 241 | +**Signature:** |
| 242 | + |
| 243 | +```rust |
| 244 | +/// Evaluate degree-3 spherical harmonics at a single view direction for N Gaussians. |
| 245 | +/// |
| 246 | +/// # Inputs |
| 247 | +/// - `view_dir[3]` — unit view direction in scanner frame. |
| 248 | +/// - `sh_coeffs[i*16..i*16+16]` — i-th Gaussian's 16 SH coefficients |
| 249 | +/// (degree 0: 1, degree 1: 3, degree 2: 5, degree 3: 7 — totaling 16). |
| 250 | +/// |
| 251 | +/// # Output |
| 252 | +/// - `out_luminance[i]` — i-th Gaussian's view-dependent luminance. |
| 253 | +/// |
| 254 | +/// # Semantics |
| 255 | +/// - No clamping; caller decides downstream behavior (visual: clamp to [0, 1]; |
| 256 | +/// computational pipeline: signed luminance is fine). |
| 257 | +pub fn batched_sh_eval_l3(view_dir: [f32; 3], sh_coeffs: &[f32], out_luminance: &mut [f32]); |
| 258 | +``` |
| 259 | + |
| 260 | +**Tests:** |
| 261 | +- Reference comparison against analytical SH evaluation (closed-form basis functions at degree ≤ 3). |
| 262 | +- Rotation invariance: SH coefficients rotated by R, evaluated at view dir = identity; vs original SH coefficients, evaluated at view dir = R⁻¹. |
| 263 | +- SIMD parity. |
| 264 | + |
| 265 | +### 4.5 `batched_se3_transform` |
| 266 | + |
| 267 | +**Signature:** |
| 268 | + |
| 269 | +```rust |
| 270 | +/// Apply a rigid SE(3) transform to N Gaussian centroids and covariance matrices. |
| 271 | +/// |
| 272 | +/// # Inputs |
| 273 | +/// - `pose_se3[0..3]` = translation t |
| 274 | +/// - `pose_se3[3..12]` = rotation R (row-major 3×3) |
| 275 | +/// - `mu_in[i*3..i*3+3]` = input centroid |
| 276 | +/// - `sigma_in_packed[i*6..i*6+6]` = input Σ |
| 277 | +/// |
| 278 | +/// # Output |
| 279 | +/// - `mu_out[i*3..i*3+3]` = R · μ_in + t |
| 280 | +/// - `sigma_out_packed[i*6..i*6+6]` = R · Σ_in · Rᵀ |
| 281 | +/// |
| 282 | +/// # Semantics |
| 283 | +/// - Assumes R is orthonormal (debug_assert in debug builds). |
| 284 | +/// - SIMD-batched 16-at-a-time on AVX-512. |
| 285 | +pub fn batched_se3_transform( |
| 286 | + pose_se3: [f32; 12], mu_in: &[f32], mu_out: &mut [f32], |
| 287 | + sigma_in_packed: &[f32], sigma_out_packed: &mut [f32], |
| 288 | +); |
| 289 | +``` |
| 290 | + |
| 291 | +**Tests:** |
| 292 | +- Identity transform (pose = (0, I)) → output equals input. |
| 293 | +- Orthonormal rotation preserves Σ trace. |
| 294 | +- Composition: SE(3)(A) followed by SE(3)(B) ≡ SE(3)(B·A) within ULP tolerance. |
| 295 | +- SIMD parity. |
| 296 | + |
| 297 | +--- |
| 298 | + |
| 299 | +## 5. Cross-repo dependencies (what ndarray waits on / blocks) |
| 300 | + |
| 301 | +### Upstream (what ndarray waits on) |
| 302 | + |
| 303 | +- **Nothing.** D-SPLAT-2 is foundation work; runs before everything else. |
| 304 | + |
| 305 | +### Downstream (what blocks on ndarray D-SPLAT-2) |
| 306 | + |
| 307 | +- **`lance-graph-contract`** D-SPLAT-1/3 (carriers) — can develop in parallel; the carrier API is independent of the math implementation. Soft edge. |
| 308 | +- **`crates/splat-fit`** D-SPLAT-6 — **hard blocker**. PSF estimation calls `batched_cholesky_3x3` per peak. |
| 309 | +- **`lance-graph::splat::registration`** D-SPLAT-5 — **hard blocker**. ICP runs Mahalanobis + SE(3) transforms per iteration. |
| 310 | +- **`lance-graph::splat-render`** D-SPLAT-12 — **hard blocker**. Opacity blend + SH eval per frame. |
| 311 | +- **`splat-actors`** D-SPLAT-7 — **soft blocker**. Pose accumulator only needs `batched_se3_transform`; could partially run on scalar fallback while waiting for SIMD. |
| 312 | +- **`MedCare-rs`** D-SPLAT-10/11 — **independent**. Storage / audit don't run math. |
| 313 | +- **`OGAR`** Phase 8 (FMA TTL) — **independent**. Ontology prep doesn't touch ndarray. |
| 314 | + |
| 315 | +--- |
| 316 | + |
| 317 | +## 6. Risk matrix (ndarray-specific) |
| 318 | + |
| 319 | +| Risk | Severity | Mitigation | |
| 320 | +|---|---|---| |
| 321 | +| SIMD parity diverges across backends (AVX-512 vs NEON vs scalar) | MED | Mandatory parity tests on every primitive (per consumer-contract knowledge doc); ULP-tolerance gates documented per primitive. | |
| 322 | +| Degenerate-Σ NaN propagation into downstream | HIGH | Documented NaN-sentinel semantics; `batched_mahalanobis` converts NaN distances to `f32::INFINITY` to make sort stable; integration tests cover degenerate paths. | |
| 323 | +| AVX-512 throughput regression on older Skylake-X / Cascade Lake (VBMI gate) | MED | Reuse existing `VBMI gate for permute_bytes` pattern from ndarray PR #142; gate `batched_opacity_blend`'s 16-lane gather variant behind VBMI feature detection. | |
| 324 | +| Bench fluctuation across CI runners | LOW | Use the existing nightly bench infrastructure; ≥ 5 runs minimum; report median + IQR. | |
| 325 | +| Carrier-API changes in `lance-graph-contract` between draft and final | LOW | ndarray exposes typed-slice API (`&[f32]`); carrier struct layout changes don't affect ndarray. | |
| 326 | + |
| 327 | +--- |
| 328 | + |
| 329 | +## 7. Acceptance criteria |
| 330 | + |
| 331 | +### Per-primitive |
| 332 | + |
| 333 | +- **Correctness:** reference comparison passes (≤ 4 ULP) against the named reference implementation. |
| 334 | +- **Parity:** AVX-512 ≡ NEON ≡ scalar (within ULP tolerance). |
| 335 | +- **Bench:** primitive-specific throughput gate met on the target hardware (AVX-512: x86-64-v4; NEON: Apple M-series or Cortex-A78). |
| 336 | + |
| 337 | +### Module-level |
| 338 | + |
| 339 | +- `cargo test -p ndarray --features simd-splat` — green. |
| 340 | +- `cargo bench -p ndarray --bench simd_splat` — published baseline numbers. |
| 341 | +- Three-backend implementations live in the existing `src/simd_*` family pattern; no new `unsafe` blocks beyond what the family already uses; all `unsafe` blocks have `// SAFETY:` comments. |
| 342 | +- Module is feature-gated behind `simd-splat` (default-on, per ndarray's "ndarray is mandatory" policy from PR #463); zero-impact on existing consumers. |
| 343 | + |
| 344 | +--- |
| 345 | + |
| 346 | +## 8. Cross-references |
| 347 | + |
| 348 | +- **Canonical plan:** `lance-graph/.claude/plans/splat-native-ultrasound-v1.md` §§3.1 (D-SPLAT-1), 3.2 (D-SPLAT-2 detailed spec), 11.1-11.7 (work division) |
| 349 | +- **Consumer contract:** `lance-graph/.claude/knowledge/ndarray-vertical-simd-alien-magic.md` (W1a primitive-addition pattern; mandatory three-backend implementation; parity-test gates; VPABSB-correction example for saturating semantics) |
| 350 | +- **ndarray priors:** |
| 351 | + - `src/simd_avx512.rs` (W1a primitives in production) |
| 352 | + - `src/simd_avx2.rs`, `src/simd_neon.rs` (companion backends) |
| 353 | + - PR #142 (VBMI gate for `permute_bytes` — relevant for `batched_opacity_blend`) |
| 354 | + - PR #189 (`OntologySchema::is_ancestor` — relevant for D-SPLAT-8 FMA traversal downstream) |
| 355 | + - PR #463 (ndarray-is-mandatory in lance-graph — relevant for splat-fit feature-flag design) |
| 356 | +- **Iron rules:** `I-NOISE-FLOOR-JIRAK` (any threshold derived in downstream registration cites Jirak under weak dependence, not classical Berry-Esseen) |
| 357 | + |
| 358 | +--- |
| 359 | + |
| 360 | +## 9. Sprint window |
| 361 | + |
| 362 | +**P1, sprint 1-2.** Foundation work. **Must land before everything else in the splat-native arc.** |
| 363 | + |
| 364 | +### Sprint 1 |
| 365 | + |
| 366 | +- D-SPLAT-2 implementations on the AVX-512 backend (the lead path, since ndarray's pinned target is x86-64-v4). |
| 367 | +- Reference-implementation parity tests against `nalgebra` + analytical SH closed-form + scipy where available. |
| 368 | +- Initial bench numbers. |
| 369 | + |
| 370 | +### Sprint 2 |
| 371 | + |
| 372 | +- D-SPLAT-2 NEON backend (Apple M-series + Cortex-A78 target). |
| 373 | +- D-SPLAT-2 scalar fallback (CI floor for non-AVX-512 / non-NEON CI runners). |
| 374 | +- Three-backend parity test green. |
| 375 | +- Bench gates verified on representative hardware. |
| 376 | +- AGENT_LOG entry; PR_ARC entry. |
| 377 | + |
| 378 | +**Acceptance for handoff to P2 (lance-graph splat-fit):** all five primitives green on all three backends; bench targets met; carrier API stable from `lance-graph-contract` D-SPLAT-1/3 (concurrent sprint 1-2). |
| 379 | + |
| 380 | +--- |
| 381 | + |
| 382 | +_End of ndarray companion v1._ |
0 commit comments