Vamana build optimization set and fp16 support#2264
Open
bkarsin wants to merge 15 commits into
Open
Conversation
(cherry picked from commit 14e36b3)
(cherry picked from commit d8b547b)
(cherry picked from commit 4d970e0)
…L2 comparators. (cherry picked from commit 0746d15)
…e vector in shared memory in the RobustPrune occlusion loop (cherry picked from commit f45fd1b49283434eb4a3017da069ead501e938c3)
…efix sum with cub scan and hoist per-batch reverse-edge allocations (cherry picked from commit 3b8650f5ebea52421f187332a2f6f3bdd599c42e)
…lusion across multiple warps per query (raise occupancy) (cherry picked from commit 2e02f938f97e65ca073daf07397d538432b52867)
… query->existing-edge distances in the RobustPrune merge (avoid recompute) (cherry picked from commit 041d355c585b7f98302d054b80ac287d637a07e4)
…oords in FP16 smem for dim>=512 to raise GreedySearch occupancy (salvage of N8)
…nce (one warp) instead of redundantly on all 128 threads, then broadcast
…ck (4 vs 8) to raise occupancy/MLP on the degree-64 occlusion sweep
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Series of GPU Vamana build performance optimizations that addresses #2178 and #1757. Initial estimates from #1757 were not accurate, so many other optimizations were tried (some abandoned, some successful). This PR includes:
GreedySearch optimizations:
RobustPrune optimizations:
General optimizations:
Together these optimizations give significant speedups across all configs with minimal recall variance compared to the current baseline. I benchmarked performance across a range of synthetic datasets and two real-world datasets. (NOTE: finishing benchmarks and will update tables below once they are all collected).
Synthetic dataset build tests (all 1M vector datasets)
Also tested real-world BIGANN 10M (uint8 128D) and GIST (fp32 960D) datasets: