From 38122a035c2398d2686df4ee691ca976eeddb62f Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Sat, 13 Jun 2026 19:18:56 +0000
Subject: [PATCH] feat(simd): re-export AMX/VNNI int8 GEMM (matmul_i8_to_i32)
 through simd.rs

Surface hpc::amx_matmul::{matmul_i8_to_i32, amx_available} via the canonical
ndarray::simd::* consumer entry (W1a "all SIMD from ndarray::simd"), std-gated.

This lets a consumer reach the full int8 dispatch ladder without importing
hpc::amx_matmul directly. The runtime-selected tiers are: AMX TDPBUSD tile
(byte-asm, 16384 MAC/instr, Sapphire Rapids+) -> AVX-512 VPDPBUSD ->
AVX-VNNI -> scalar, with bit-identical results across tiers. Additive
re-export only; no behaviour change.
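
Why tiers can agree bit for bit: with integer accumulation in i32, the
grouping of partial sums does not change the result, unlike floating point.
The sketch below illustrates this with two hypothetical scalar helpers
(matmul_i8_ref and matmul_i8_grouped4 are NOT part of this crate). It assumes
row-major i8 x i8 operands and wrapping i32 accumulation; the actual signature
and operand signedness of matmul_i8_to_i32 are not shown in this patch.

```rust
/// Hypothetical scalar reference (not the crate's API):
/// C[m x n] = A[m x k] * B[k x n], row-major, i8 inputs widened to i32.
pub fn matmul_i8_ref(a: &[i8], b: &[i8], m: usize, k: usize, n: usize) -> Vec<i32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut c = vec![0i32; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0i32;
            for p in 0..k {
                // An i8 * i8 product always fits in i32; only the sum may wrap.
                let prod = i32::from(a[i * k + p]) * i32::from(b[p * n + j]);
                acc = acc.wrapping_add(prod);
            }
            c[i * n + j] = acc;
        }
    }
    c
}

/// Same product, but accumulated in groups of four along k, loosely mimicking
/// how a VNNI-style dot-product step folds four byte products per i32 lane.
pub fn matmul_i8_grouped4(a: &[i8], b: &[i8], m: usize, k: usize, n: usize) -> Vec<i32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut c = vec![0i32; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0i32;
            let mut p = 0;
            while p < k {
                let end = (p + 4).min(k);
                // Sum up to four products first, then fold into the accumulator.
                let mut partial = 0i32;
                for q in p..end {
                    let prod = i32::from(a[i * k + q]) * i32::from(b[q * n + j]);
                    partial = partial.wrapping_add(prod);
                }
                acc = acc.wrapping_add(partial);
                p = end;
            }
            c[i * n + j] = acc;
        }
    }
    c
}
```

Both helpers return identical Vec<i32> output for any input, because wrapping
i32 addition is associative and commutative. A check of that shape (reference
vs. accelerated path on the same inputs) is one way to test a tiered kernel.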

Consumed by turbovec's ndarray::simd-routed polyfill scan
(lance-graph-turbovec), which scores TurboQuant as a batched int8 GEMM so the
SIMD/AMX backend selection lives in ndarray, not the consumer.

https://claude.ai/code/session_01D2WSmezQBNC3bUdHuGfGmo
---
 src/simd.rs | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/src/simd.rs b/src/simd.rs
index a648fee7..4176a822 100644
--- a/src/simd.rs
+++ b/src/simd.rs
@@ -570,6 +570,16 @@ pub use crate::hpc::cam_pq::{kmeans, squared_l2};
 
 pub use crate::hpc::heel_f64x8::cosine_f32_to_f64_simd;
 
+// Dispatched integer matmul: the polyfill entry for batched int8 scoring.
+// `matmul_i8_to_i32` runtime-selects AMX `TDPBUSD` tiles (byte-asm, 16384
+// MAC/instr, Sapphire Rapids+) → AVX-512 VPDPBUSD → AVX-VNNI → scalar, and
+// is bit-identical across tiers. Surfaced here so a consumer reaches the
+// whole int8 ladder through the canonical `ndarray::simd::*` import (W1a)
+// without importing `crate::hpc::amx_matmul` directly. `amx_available()`
+// exposes the runtime AMX check for reporting.
+#[cfg(feature = "std")]
+pub use crate::hpc::amx_matmul::{amx_available, matmul_i8_to_i32};
+
 // Elementwise slice ops — polyfill-dispatched (F32x16/F64x8 chunks + scalar tail).
 #[cfg(feature = "std")]
 pub use crate::simd_ops::{