AdaWorldAPI · Jun 14, 2026
diff --git a/.claude/knowledge/codec-soa-facet-map.md b/.claude/knowledge/codec-soa-facet-map.md
@@ -0,0 +1,93 @@
+# KNOWLEDGE: Codec / SoA Facet Map — speed and fidelity are separable knobs
+
+## READ BY: truth-architect, family-codec-smith, palette-engineer, savant-architect,
+##          cascade-architect, integration-lead, resonance-cartographer
+
+## STATUS: probe-backed map (ndarray PR #218, 10 reproducible probes). The holy
+##         grail = ONE SoA where every facet composes for accuracy AND speed.
+##         Mechanism established; white patches listed at the bottom.
+
+---
+
+## The one-line thesis (measured this session)
+
+**No single vector subsumes the others (Correction-6 / I-VSA-IDENTITIES category
+boundary). The unified representation is a STRUCT of orthogonal facets — one SoA
+column per category — and accuracy vs speed are TWO SEPARABLE KNOBS that compose:
+cascade-prune the coarse code (speed, lossless), then residue-refine only the
+survivors (fidelity, +bytes).**
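+
+A minimal sketch of the fidelity half, assuming a per-dimension coarse centroid and
+one f32 scale per item with signed 4-bit steps in [-7, 7] (stored unpacked in `i8`
+for clarity). `Residue4`, `encode_residue` and `decode` are hypothetical names, not
+the `edge_codec` API; the point is only that the residue buys fidelity with bytes:
+every dimension lands within half a step of the original.
+
+```rust
+/// Signed 4-bit residue: one scale per item, one step in [-7, 7] per dimension.
+/// Stored unpacked in `i8` here; a real layout would pack two steps per byte.
+pub struct Residue4 {
+    pub scale: f32,
+    pub q: Vec<i8>,
+}
+
+/// Quantize `x - centroid` so the largest residue maps to +/-7 steps.
+pub fn encode_residue(x: &[f32], centroid: &[f32]) -> Residue4 {
+    let max_abs = x
+        .iter()
+        .zip(centroid)
+        .map(|(a, c)| (a - c).abs())
+        .fold(0.0f32, f32::max);
+    // Guard the all-zero residue so we never divide by zero.
+    let scale = if max_abs > 0.0 { max_abs / 7.0 } else { 1.0 };
+    let q = x
+        .iter()
+        .zip(centroid)
+        .map(|(a, c)| ((a - c) / scale).round().clamp(-7.0, 7.0) as i8)
+        .collect();
+    Residue4 { scale, q }
+}
+
+/// Refined value = coarse centroid + dequantized residue.
+pub fn decode(centroid: &[f32], r: &Residue4) -> Vec<f32> {
+    centroid
+        .iter()
+        .zip(&r.q)
+        .map(|(c, &q)| c + q as f32 * r.scale)
+        .collect()
+}
+
+/// Squared L2 error between two vectors.
+pub fn sq_err(a: &[f32], b: &[f32]) -> f32 {
+    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
+}
+```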
+
+---
+
+## The facets — one SoA column per native category
+
+| Facet (SoA column) | Codec | Category | Measured this session | Knob |
+|---|---|---|---|---|
+| place / semantic basin | HHTL (HEEL·HIP·TWIG) | hierarchical key | cascade prune (CLAM dfs-sieve 2.3×; CAM-PQ coarse→fine 16–128× lossless) | speed |
+| episodic basin | rolling floor (Belichtungsmesser / EWMA) | self-calibrating μ+3σ | ρ=1.0 tracking under SD drift; shipped global-Welford **inert** (bug) | speed/adaptivity |
+| position (high-D) | CAM-PQ | NN-recall position | recall ~0.66 vs truth; cascade-prunable losslessly (recall 1.0 vs flat) | — |
+| orientation (phase+mag) | helix-48 | 3-DOF direction | 24-bit lossless vs ≤f16; needs +1 sign bit; ⊥ HHTL (ρ≈0); +13.6× recon | — |
+| spatial perturbation | helix → Morton pyramid | parametric field | 32,768× amortized, on-demand exact at every level, fine-scale coherent | speed/memory |
+| relation + truth | CausalEdge64 (3×8 SPO + 2³ + f/c) | relational triple | SPO = 3× CAM-PQ palette + Pearl mask; entropy ρ=−0.78 reliability proxy | — |
+| reliability / entropy | entropy_class → CausalEdge64 spare [63:61] | Staunen↔Wisdom scalar | nars_entropy validated as reliability proxy | — |
+| value refinement | edge_codec CoarseResidue / turbovec | per-item residue | ICC 0.97–0.99, 14× error cut (vs coarse-only) | fidelity |
+| time / recurrence | EpisodicWitness64 | temporal | **NOT PROBED — white patch** | — |
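+
+The reliability row's storage step (white patch 8) is plain bit packing. A sketch,
+assuming only what the row states: a 3-bit `entropy_class` in bits [63:61] of the
+64-bit edge word, with those bits otherwise unused. The constants and helpers are
+hypothetical, not `lance-graph-contract` API.
+
+```rust
+/// Assumed layout: bits 61..=63 of the 64-bit edge word are spare.
+const ENTROPY_SHIFT: u32 = 61;
+const ENTROPY_MASK: u64 = 0b111 << ENTROPY_SHIFT;
+
+/// Store a 3-bit entropy class without touching bits 0..=60.
+fn set_entropy_class(edge: u64, class: u8) -> u64 {
+    assert!(class < 8, "entropy_class must fit in 3 bits");
+    (edge & !ENTROPY_MASK) | (u64::from(class) << ENTROPY_SHIFT)
+}
+
+/// Read the 3-bit entropy class back out.
+fn entropy_class(edge: u64) -> u8 {
+    ((edge & ENTROPY_MASK) >> ENTROPY_SHIFT) as u8
+}
+```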
+
+Bit budgets are the same order (≈6 bytes each) but the **domains differ** — the
+6-byte coincidence is why "one vector" is tempting and wrong.
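+
+A sketch of the struct-not-vector shape: one `Vec` per facet, grown in lockstep and
+swept one column at a time. All field names and widths are hypothetical placeholders
+sized loosely from the table (48 bits for helix-48, one `u64` for CausalEdge64, an
+assumed 6-byte CAM-PQ code and `u32` HHTL key); this is not the BindSpace layout.
+Adding a facet means adding a `Vec` field, not folding it into an existing column.
+
+```rust
+/// One column per facet; row i of every column describes item i.
+/// Names and widths are illustrative placeholders, not the BindSpace layout.
+#[derive(Default)]
+pub struct FacetSoA {
+    pub hhtl_key: Vec<u32>,       // place / semantic basin (hierarchical key)
+    pub campq_code: Vec<[u8; 6]>, // high-D position (PQ code)
+    pub helix48: Vec<[u8; 6]>,    // orientation, 48 bits
+    pub causal_edge: Vec<u64>,    // relation + truth word
+}
+
+impl FacetSoA {
+    pub fn len(&self) -> usize {
+        self.hhtl_key.len()
+    }
+
+    /// Append one item; every column grows in lockstep.
+    pub fn push(&mut self, key: u32, code: [u8; 6], helix: [u8; 6], edge: u64) {
+        self.hhtl_key.push(key);
+        self.campq_code.push(code);
+        self.helix48.push(helix);
+        self.causal_edge.push(edge);
+    }
+
+    /// A coarse sweep reads one column only: rows whose key shares the top
+    /// `bits` bits with `prefix`. The other columns are never touched.
+    pub fn rows_with_key_prefix(&self, prefix: u32, bits: u32) -> Vec<usize> {
+        assert!((1..=32).contains(&bits));
+        let shift = 32 - bits;
+        self.hhtl_key
+            .iter()
+            .enumerate()
+            .filter(|&(_, &k)| k >> shift == prefix >> shift)
+            .map(|(i, _)| i)
+            .collect()
+    }
+}
+```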
+
+## The two knobs (the holy-grail mechanism, measured)
+
+- **SPEED = the cascade.** Coarse→fine prune (partial-ADC / HHTL lower bound is
+  admissible) + 2×2/4×4 register-blocked LUT (FastScan `pshufb`, or AMX tiles) + Morton-order
+  contiguity + rolling-floor adaptive cut. **16–128× fewer full evals at recall
+  1.000 vs flat** (`campq_cascade_probe`). Lossless — adds no error.
+- **FIDELITY = the residue plane.** Coarse centroid + signed-4-bit / SVD residue.
+  **ICC 0.97–0.99, 14×** error cut (`edge_codec_compare`). Adds bytes, not error.
+- **They compose, orthogonally:** prune the coarse code to a small survivor set,
+  then residue-refine only those. Fast AND accurate, each from its own mechanism.
+  This composition is the holy grail's load-bearing claim (each half measured;
+  the end-to-end compose is a white patch — see below).
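+
+Why the prune is lossless, in a sketch: visit items in ascending order of a cheap
+bound that never exceeds the true distance, and stop once the bound passes the
+current k-th best true distance; nothing skipped could have entered the top-k. The
+bound here (L1 over the first `m` dimensions, admissible for full L1) is a stand-in,
+not partial-ADC or the HHTL bound, and `cascade_knn` / `brute_knn` are hypothetical
+names. In the pipeline above, the per-survivor full evaluation is where the residue
+refine would go.
+
+```rust
+/// L1 over the first `m` dimensions. Never larger than the full L1 distance,
+/// so it is an admissible lower bound (m = len gives the full distance).
+fn l1_prefix(a: &[i32], b: &[i32], m: usize) -> u32 {
+    a[..m].iter().zip(&b[..m]).map(|(x, y)| x.abs_diff(*y)).sum()
+}
+
+/// Ground truth: exhaustive full-distance scan, ties broken by index.
+fn brute_knn(data: &[Vec<i32>], q: &[i32], k: usize) -> Vec<(usize, u32)> {
+    let mut all: Vec<(usize, u32)> = data
+        .iter()
+        .enumerate()
+        .map(|(i, x)| (i, l1_prefix(x, q, q.len())))
+        .collect();
+    all.sort_by_key(|&(i, d)| (d, i));
+    all.truncate(k);
+    all
+}
+
+/// Coarse bound for every item, full distance only for survivors.
+/// Returns (top-k hits, number of full evaluations).
+fn cascade_knn(data: &[Vec<i32>], q: &[i32], k: usize, m: usize) -> (Vec<(usize, u32)>, usize) {
+    if k == 0 {
+        return (Vec::new(), 0);
+    }
+    let mut order: Vec<(u32, usize)> = data
+        .iter()
+        .enumerate()
+        .map(|(i, x)| (l1_prefix(x, q, m), i))
+        .collect();
+    order.sort_unstable();
+    let mut best: Vec<(usize, u32)> = Vec::with_capacity(k + 1);
+    let mut full_evals = 0;
+    for (lb, i) in order {
+        // Bound already worse than the k-th best true distance: so is every later
+        // item (sorted by bound), so stop. This is the lossless cut.
+        if best.len() == k && lb > best[k - 1].1 {
+            break;
+        }
+        full_evals += 1;
+        best.push((i, l1_prefix(&data[i], q, q.len())));
+        best.sort_by_key(|&(j, d)| (d, j));
+        best.truncate(k);
+    }
+    (best, full_evals)
+}
+```
+
+Both functions break ties by index, so "identical to brute" is a well-defined check.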
+
+## The category boundary (the iron rule that kills the WRONG holy grail)
+
+Per Correction-6 (`bf16-hhtl-terrain.md`) + I-VSA-IDENTITIES:
+- Do NOT float-reconstruct a byte register (bgz-hhtl-d on Qwen: cos~0.1, dead).
+- Do NOT squeeze a relation OR a high-D point into a 3-DOF helix (`codec_overlap_probe`:
+  helix recall 0.245 vs CAM-PQ 0.657 on high-D; SPO is a different category entirely).
+- Do NOT measure a router by reconstruction fidelity (it routes; only calibration matters).
+- ⇒ The SoA stays a struct of facets; new capability = a new column, not a fold.
+
+## The reproducible probe family (ndarray PR #218)
+
+`reliability` (Pearson/Spearman/Cronbach/ICC) · `edge_codec` (coarse/residue/PQ) ·
+`entropy_ladder` (Staunen↔Wisdom). Probes: `edge_codec_compare`,
+`instrument_mtmm_probe`, `cakes_grail_probe`, `entropy_ladder_probe`,
+`helix_orthogonality_probe`, `helix_bitdepth_probe`, `morton_perturbation_probe`,
+`rolling_floor_probe`, `codec_overlap_probe`, `campq_cascade_probe`. Each settles a
+claim with a number; two found shipped bugs: the inert Cascade Welford detector
+(still open, white patch 5) and the bgz17 out-of-bounds gather (fixed).
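+
+For reference, the ICC the probes report as ICC(2,1) is the two-way random-effects,
+absolute-agreement, single-measure coefficient (McGraw and Wong's ICC(A,1)). A sketch
+of the textbook ANOVA formula, written independently; `icc_a1_sketch` is a
+hypothetical name, not `reliability::icc_a1`. Absolute agreement is what makes it the
+metric-safety check: a constant offset between methods keeps Spearman ρ at 1 but
+drags ICC down.
+
+```rust
+/// ICC(A,1): two-way random effects, absolute agreement, single measure.
+/// `cols[j][i]` is method (rater) j's value for subject i.
+fn icc_a1_sketch(cols: &[&[f64]]) -> f64 {
+    let (k, n) = (cols.len(), cols.first().map_or(0, |c| c.len()));
+    assert!(k >= 2 && n >= 2 && cols.iter().all(|c| c.len() == n));
+    let (kf, nf) = (k as f64, n as f64);
+    let grand = cols.iter().flat_map(|c| c.iter()).sum::<f64>() / (kf * nf);
+    // Between-subject (row) sum of squares.
+    let ssr: f64 = (0..n)
+        .map(|i| {
+            let row_mean = cols.iter().map(|c| c[i]).sum::<f64>() / kf;
+            kf * (row_mean - grand).powi(2)
+        })
+        .sum();
+    // Between-method (column) sum of squares: systematic offsets land here.
+    let ssc: f64 = cols
+        .iter()
+        .map(|c| nf * (c.iter().sum::<f64>() / nf - grand).powi(2))
+        .sum();
+    let sst: f64 = cols
+        .iter()
+        .flat_map(|c| c.iter())
+        .map(|x| (x - grand).powi(2))
+        .sum();
+    let mse = (sst - ssr - ssc) / ((nf - 1.0) * (kf - 1.0));
+    let msr = ssr / (nf - 1.0);
+    let msc = ssc / (kf - 1.0);
+    (msr - mse) / (msr + (kf - 1.0) * mse + kf / nf * (msc - mse))
+}
+```
+
+Identical columns give ICC = 1; scores 1..5 against the same scores +10 give ρ = 1
+but ICC = 5/105 ≈ 0.048.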
+
+## White patches on the map (unbuilt / unmeasured — be honest)
+
+1. **EpisodicWitness64 / temporal facet** — referenced, never probed. Biggest gap.
+2. **End-to-end compose** — cascade-prune × residue-refine measured *separately*,
+   never together as one `coarse→prune→refine` pipeline.
+3. **`cam_pq_cascade_search`** — probe-proven lossless, NOT wired into real `cam_pq.rs`.
+4. **AMX-accelerated CAM-PQ assignment** — proven pattern (`edge_residue_probe` 100%
+   assign), not wired into `cam_pq.rs`.
+5. **`TD-CASCADE-WELFORD-INERT`** — shipped `Cascade::observe` never fires `ShiftAlert`
+   per-sample (cumulative Δμ ≪ 2σ); needs windowed/EWMA. Found, not fixed.
+6. **Real COCA codebook** — every probe is synthetic-COCA-like (labeled); none run on
+   the actual baked CAM index codebook.
+7. **Full SoA assembly** — facets validated individually; the unified SoA (all columns,
+   one cascade sweep) is not assembled or measured end-to-end.
+8. **entropy_class → CausalEdge64 spare bits** (R2) — computed, not stored.
+9. **bf16-hhtl probe queue M1/M3/M4** — the routing-not-reconstruction versions, NOT RUN.
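+
+Patch 5 and the rolling-floor row in one sketch. An all-history (global Welford) mean
+moves by at most |x - μ|/n per sample, so after a long baseline even a sustained 5σ
+step barely shifts it; an exponentially weighted mean forgets at rate α and follows
+the step within a few dozen samples. Assumed here: α = 0.1, an alternating ±1
+baseline, a floor of μ + 3σ; the types are illustrative, not the `Cascade` code.
+
+```rust
+/// All-history running mean/variance (Welford), like a global detector.
+struct Welford {
+    n: u64,
+    mean: f64,
+    m2: f64,
+}
+
+impl Welford {
+    fn observe(&mut self, x: f64) {
+        self.n += 1;
+        let d = x - self.mean;
+        self.mean += d / self.n as f64; // moves by at most |x - mean| / n
+        self.m2 += d * (x - self.mean);
+    }
+    fn sd(&self) -> f64 {
+        if self.n < 2 { 0.0 } else { (self.m2 / (self.n - 1) as f64).sqrt() }
+    }
+}
+
+/// Exponentially weighted mean/variance: old samples decay by (1 - alpha).
+struct Ewma {
+    alpha: f64,
+    mean: f64,
+    var: f64,
+}
+
+impl Ewma {
+    fn observe(&mut self, x: f64) {
+        let d = x - self.mean;
+        let incr = self.alpha * d;
+        self.mean += incr;
+        self.var = (1.0 - self.alpha) * (self.var + d * incr);
+    }
+    /// Self-calibrating floor: mu + 3 sigma over the recent window.
+    fn floor(&self) -> f64 {
+        self.mean + 3.0 * self.var.sqrt()
+    }
+}
+
+/// Alternating -1/+1 baseline (mean 0, sd ~1), then a sustained step to 5.
+/// Returns (baseline sd, Welford mean shift, EWMA mean shift) over the step.
+fn step_response(baseline: usize, stepped: usize) -> (f64, f64, f64) {
+    let mut w = Welford { n: 0, mean: 0.0, m2: 0.0 };
+    let mut e = Ewma { alpha: 0.1, mean: 0.0, var: 0.0 };
+    for i in 0..baseline {
+        let x = if i % 2 == 0 { -1.0 } else { 1.0 };
+        w.observe(x);
+        e.observe(x);
+    }
+    let (sd0, w0, e0) = (w.sd(), w.mean, e.mean);
+    for _ in 0..stepped {
+        w.observe(5.0);
+        e.observe(5.0);
+    }
+    (sd0, w.mean - w0, e.mean - e0)
+}
+```
+
+With 1000 baseline samples and 50 stepped ones, the global mean shifts by about
+250/1050 ≈ 0.24 (well under 2σ), while the EWMA mean has covered most of the step.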
+
+## Cross-refs
+
+lance-graph: `.claude/knowledge/encoding-ecosystem.md` (encoding map),
+`.claude/knowledge/bf16-hhtl-terrain.md` (Correction chain incl. #6),
+`.claude/plans/entropy-ladder-spo-rung-v1.md` (R1–R6), `lance-graph-contract`
+(`CausalEdge64`, `EpisodicWitness64`, `EdgeCodecFlavor`, the BindSpace SoA columns).
diff --git a/Cargo.toml b/Cargo.toml
@@ -69,6 +69,46 @@ required-features = ["std"]
 name = "edge_residue_probe"
 required-features = ["std"]
 
+[[example]]
+name = "edge_codec_compare"
+required-features = ["std"]
+
+[[example]]
+name = "entropy_ladder_probe"
+required-features = ["std"]
+
+[[example]]
+name = "instrument_mtmm_probe"
+required-features = ["std"]
+
+[[example]]
+name = "cakes_grail_probe"
+required-features = ["std"]
+
+[[example]]
+name = "helix_orthogonality_probe"
+required-features = ["std"]
+
+[[example]]
+name = "helix_bitdepth_probe"
+required-features = ["std"]
+
+[[example]]
+name = "morton_perturbation_probe"
+required-features = ["std"]
+
+[[example]]
+name = "rolling_floor_probe"
+required-features = ["std"]
+
+[[example]]
+name = "codec_overlap_probe"
+required-features = ["std"]
+
+[[example]]
+name = "campq_cascade_probe"
+required-features = ["std"]
+
 [dependencies]
 num-integer = { workspace = true }
 num-traits = { workspace = true }

diff --git a/examples/cakes_grail_probe.rs b/examples/cakes_grail_probe.rs
@@ -0,0 +1,151 @@
+//! CAKES grail probe — measure CLAM-accelerated search vs brute ground truth
+//! with the reliability instrument. "Can we measure CLAM vs CAKES?"
+//!
+//! The spec (reference images) claims CAKES is **metric-safe-exact** (triangle
+//! inequality ⇒ zero false negatives) AND **accelerated** (clusters pruned). This
+//! probe puts both claims on the instrument:
+//!   * recall@k     — CAKES hits vs brute hits (exactness; expect 1.000)
+//!   * Spearman ρ   — returned distance order vs brute order (rank fidelity)
+//!   * ICC(2,1)     — absolute agreement of distances vs brute (metric-safety)
+//!   * speedup      — brute distance-calls / CAKES distance-calls (the win)
+//!
+//! Two CAKES algorithms are measured (CLAM/CHESS Algorithms): repeated-ρ and
+//! DFS-sieve. Brute is an inline exhaustive Hamming scan (independent truth).
+//!
+//!   cargo run --release --example cakes_grail_probe --features std
+
+use ndarray::hpc::clam::ClamTree;
+use ndarray::hpc::clam_search::{knn_dfs_sieve, knn_repeated_rho};
+use ndarray::hpc::reliability::{icc_a1, spearman};
+
+fn splitmix(s: &mut u64) -> u64 {
+    *s = s.wrapping_add(0x9E37_79B9_7F4A_7C15);
+    let mut z = *s;
+    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
+    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
+    z ^ (z >> 31)
+}
+
+fn hamming(a: &[u8], b: &[u8]) -> u64 {
+    a.iter()
+        .zip(b)
+        .map(|(x, y)| (x ^ y).count_ones() as u64)
+        .sum()
+}
+
+/// Independent ground-truth k-NN by exhaustive Hamming scan.
+fn brute_knn(data: &[u8], vec_len: usize, query: &[u8], k: usize) -> Vec<(usize, u64)> {
+    let n = data.len() / vec_len;
+    let mut all: Vec<(usize, u64)> = (0..n)
+        .map(|i| (i, hamming(query, &data[i * vec_len..(i + 1) * vec_len])))
+        .collect();
+    all.sort_by_key(|&(_, d)| d);
+    all.truncate(k);
+    all
+}
+
+fn main() {
+    println!("== CAKES grail probe: CLAM-accelerated search vs brute, on the reliability instrument ==\n");
+
+    let vec_len = 16usize; // 128-bit fingerprints
+    let n = 4000usize;
+    let n_centers = 32usize;
+    let k = 10usize;
+    let n_queries = 300usize;
+    let mut s = 0xCAFE_5EED_u64;
+
+    // Clustered binary data: each point = a random center XOR ~12 sparse flips,
+    // so the CLAM tree has real structure to prune (uniform data would not).
+    let centers: Vec<u8> = (0..n_centers * vec_len)
+        .map(|_| (splitmix(&mut s) & 0xFF) as u8)
+        .collect();
+    let mut data = vec![0u8; n * vec_len];
+    for i in 0..n {
+        let c = (splitmix(&mut s) as usize) % n_centers;
+        data[i * vec_len..(i + 1) * vec_len].copy_from_slice(&centers[c * vec_len..(c + 1) * vec_len]);
+        for _ in 0..12 {
+            let bit = (splitmix(&mut s) as usize) % (vec_len * 8);
+            data[i * vec_len + bit / 8] ^= 1 << (bit % 8);
+        }
+    }
+
+    let tree = ClamTree::build(&data, vec_len, 1);
+
+    // Accumulators per algorithm: (recall_sum, dist_calls, truth_d, algo_d).
+    let mut rep = (0.0f64, 0usize, Vec::new(), Vec::new());
+    let mut dfs = (0.0f64, 0usize, Vec::new(), Vec::new());
+    let mut brute_calls = 0usize;
+
+    for _ in 0..n_queries {
+        // Query = a random center XOR a few flips (a realistic near-cluster probe).
+        let c = (splitmix(&mut s) as usize) % n_centers;
+        let mut q = centers[c * vec_len..(c + 1) * vec_len].to_vec();
+        for _ in 0..8 {
+            let bit = (splitmix(&mut s) as usize) % (vec_len * 8);
+            q[bit / 8] ^= 1 << (bit % 8);
+        }
+
+        let truth = brute_knn(&data, vec_len, &q, k);
+        brute_calls += n;
+        let truth_set: std::collections::HashSet<usize> = truth.iter().map(|&(i, _)| i).collect();
+
+        for (algo, acc) in [
+            (knn_repeated_rho(&tree, &data, vec_len, &q, k), &mut rep),
+            (knn_dfs_sieve(&tree, &data, vec_len, &q, k), &mut dfs),
+        ] {
+            let hit = algo
+                .hits
+                .iter()
+                .filter(|&&(i, _)| truth_set.contains(&i))
+                .count();
+            acc.0 += hit as f64 / k as f64;
+            acc.1 += algo.distance_calls;
+            // Rank-aligned distance pairs (both sorted ascending) for ICC/ρ.
+            for (j, &(_, d)) in algo.hits.iter().enumerate() {
+                if j < truth.len() {
+                    acc.2.push(truth[j].1 as f64);
+                    acc.3.push(d as f64);
+                }
+            }
+        }
+    }
+
+    let report = |name: &str, acc: &(f64, usize, Vec<f64>, Vec<f64>)| {
+        let idx_recall = acc.0 / n_queries as f64;
+        let pairs = acc.2.len().max(1);
+        // Tie-correct exactness: at each rank the returned distance equals truth's
+        // (ties make INDEX recall undercount; DISTANCE recall is the real metric).
+        let dist_recall = acc
+            .2
+            .iter()
+            .zip(&acc.3)
+            .filter(|(t, a)| (*t - *a).abs() < 0.5)
+            .count() as f64
+            / pairs as f64;
+        let icc = icc_a1(&[&acc.2, &acc.3]);
+        let rho = spearman(&acc.2, &acc.3);
+        let speedup = brute_calls as f64 / acc.1.max(1) as f64;
+        println!(
+            "  {name:<14} idx-recall@{k} {idx_recall:.4}  dist-recall {dist_recall:.4}  ρ {rho:.4}  ICC {icc:.4}  speedup {speedup:.2}×",
+        );
+        (dist_recall, icc, rho, speedup)
+    };
+
+    println!(
+        "CAKES vs brute ground truth (N={n}, dim={}b, {n_centers} clusters, k={k}, {n_queries} queries):",
+        vec_len * 8
+    );
+    let (d1, i1, p1, s1) = report("repeated-ρ", &rep);
+    let (d2, i2, p2, s2) = report("dfs-sieve", &dfs);
+    println!("  note: idx-recall < 1 is a TIE artifact (integer Hamming) — dist-recall is the true exactness metric.");
+
+    let exact = d1 > 0.999 && d2 > 0.999;
+    let metric_safe = i1 > 0.999 && i2 > 0.999 && p1 > 0.999 && p2 > 0.999;
+    let mark = |b: bool| if b { "PASS" } else { "FAIL" };
+    println!("\nVERDICT:");
+    println!("  exact by distance (dist-recall = 1.000) ............ {}", mark(exact));
+    println!("  metric-safe (ICC & ρ vs brute = 1.000) ............. {}", mark(metric_safe));
+    println!("  accelerated:  dfs-sieve {s2:.2}× {} · repeated-ρ {s1:.2}× {}", mark(s2 > 1.0), mark(s1 > 1.0));
+    println!("\n  ⇒ dfs-sieve:  exact {} · metric-safe {} · {s2:.1}× (all PASS ⇒ the CAKES algorithm here)", mark(d2 > 0.999), mark(i2 > 0.999 && p2 > 0.999));
+    println!("    repeated-ρ: exact {} · metric-safe {} · {s1:.1}× (≤ 1× ⇒ radius schedule likely mistuned for tight-cluster Hamming)", mark(d1 > 0.999), mark(i1 > 0.999 && p1 > 0.999));
+}