loader: increase async upload staging buffer to 4 MiB by cl0ckt0wer · Pull Request #23915 · ggml-org/llama.cpp

cl0ckt0wer · 2026-05-30T16:29:35Z

Overview

On Windows, the loader uses alignment == 1, so the async upload staging buffer is currently 1 MiB. This PR raises it to 4 MiB.
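
To illustrate why the staging buffer size matters, here is a minimal, synchronous sketch of a chunked upload through a fixed-size staging buffer. It is not the llama.cpp loader code: `staged_copy` is a hypothetical helper, the "file read" and "upload" are both plain `memcpy` calls, and the real async path presumably overlaps reads with device uploads. The point is only that a smaller buffer means more chunks, and each chunk carries a fixed per-transfer cost.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an upload loop: copy `src` into `dst` through a
// fixed-size staging buffer and return the number of chunks issued.
// A real backend would replace the second memcpy with an async device upload.
inline std::size_t staged_copy(const std::vector<unsigned char> & src,
                               std::vector<unsigned char> & dst,
                               std::size_t staging_size) {
    assert(staging_size > 0);
    dst.resize(src.size());
    std::vector<unsigned char> staging(staging_size);

    std::size_t chunks = 0;
    for (std::size_t off = 0; off < src.size(); off += staging_size) {
        // The last chunk may be shorter than the staging buffer.
        const std::size_t n = std::min(staging_size, src.size() - off);
        std::memcpy(staging.data(), src.data() + off, n); // "file read" into staging
        std::memcpy(dst.data() + off, staging.data(), n); // "upload" out of staging
        ++chunks;
    }
    return chunks;
}
```

For a payload of 10 MiB + 1 byte, this issues 11 chunks with a 1 MiB staging buffer and 3 chunks with a 4 MiB one.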

All benchmarks are warm reads, so they measure the upload path rather than cold disk I/O.

I have read and agree with the contributing guidelines
AI usage disclosure: YES - Codex assisted with code exploration, benchmarking, and PR preparation. I manually reviewed the change and benchmark results.

Environment:

Chunk size	Runs	Avg load time	Median load time	Avg throughput
1 MiB	5	4960.94 ms	4928.89 ms	4.32 GiB/s
2 MiB	5	3687.27 ms	3735.25 ms	5.82 GiB/s
4 MiB	5	3148.23 ms	3102.45 ms	6.81 GiB/s
8 MiB	5	3246.70 ms	3254.84 ms	6.61 GiB/s
16 MiB	5	3370.00 ms	3293.10 ms	6.37 GiB/s
32 MiB	2	3264.91 ms	3255.14 ms	6.57 GiB/s
64 MiB	2	3286.43 ms	3203.69 ms	6.53 GiB/s
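
The table can be summarized with a short calculation. The sketch below hard-codes the average load times from the table and computes the speedup of each chunk size relative to the 1 MiB baseline. The names `BenchRow`, `kRows`, `speedup_vs_baseline`, and `fastest_chunk_mib` are illustrative and do not exist in llama.cpp.

```cpp
#include <cassert>
#include <cstddef>

// One row of the benchmark table: chunk size and average load time.
struct BenchRow {
    int    chunk_mib;
    double avg_load_ms;
};

// Average load times copied from the table above.
constexpr BenchRow kRows[] = {
    { 1, 4960.94}, { 2, 3687.27}, { 4, 3148.23}, { 8, 3246.70},
    {16, 3370.00}, {32, 3264.91}, {64, 3286.43},
};
constexpr std::size_t kRowCount = sizeof(kRows) / sizeof(kRows[0]);

// Speedup of `row` relative to `baseline` (values > 1 mean faster loads).
inline double speedup_vs_baseline(const BenchRow & baseline, const BenchRow & row) {
    assert(row.avg_load_ms > 0.0);
    return baseline.avg_load_ms / row.avg_load_ms;
}

// Chunk size (in MiB) with the lowest average load time.
inline int fastest_chunk_mib() {
    std::size_t best = 0;
    for (std::size_t i = 1; i < kRowCount; ++i) {
        if (kRows[i].avg_load_ms < kRows[best].avg_load_ms) {
            best = i;
        }
    }
    return kRows[best].chunk_mib;
}
```

With these numbers, `fastest_chunk_mib()` returns 4, and `speedup_vs_baseline(kRows[0], kRows[2])` is about 1.58, i.e. roughly a 36% shorter average load than the 1 MiB baseline. Sizes above 4 MiB stay within a few percent of that result.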

git diff --check
Windows Vulkan build:
- cmake -S . -B C:\tmp\llama-pr-build -DGGML_VULKAN=ON -DLLAMA_BUILD_TESTS=OFF -DLLAMA_BUILD_EXAMPLES=OFF -DLLAMA_BUILD_SERVER=OFF
- cmake --build C:\tmp\llama-pr-build --config Release --parallel 8 --target llama-bench

loader: increase async upload staging buffer

541834e

cl0ckt0wer marked this pull request as ready for review May 30, 2026 17:15

cl0ckt0wer requested a review from ggerganov as a code owner May 30, 2026 17:15