
loader: increase async upload staging buffer to 4 MiB #23915

Open
cl0ckt0wer wants to merge 1 commit into ggml-org:master from cl0ckt0wer:codex/loader-4mib-staging

@cl0ckt0wer commented May 30, 2026

Overview

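On Windows, the loader uses alignment == 1, so the async upload staging buffer is currently only 1 MiB. This PR increases it to 4 MiB, which gave the fastest average and median load times in the benchmark below.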
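To make the change concrete, here is a minimal synchronous sketch of uploading a file through one fixed-size staging buffer. It is not the llama.cpp loader: `staged_upload`, `chunk_count`, and the `upload` callback are hypothetical names, and the real async path (which must wait for each copy to finish before reusing a buffer) is not modeled. It mainly shows that the number of staging round trips scales inversely with the buffer size.

```python
import io
import math


def staged_upload(src, total_size, staging_size, upload):
    """Copy total_size bytes from src through one reusable staging buffer.

    `upload(offset, chunk)` is a hypothetical stand-in for a backend
    host-to-device copy; it is not a llama.cpp or ggml function.
    Returns the number of chunks (i.e. upload calls) issued.
    """
    staging = memoryview(bytearray(staging_size))  # reused for every chunk
    offset = 0
    chunks = 0
    while offset < total_size:
        n = min(staging_size, total_size - offset)
        filled = 0
        while filled < n:  # readinto may return fewer bytes than requested
            got = src.readinto(staging[filled:n])
            if not got:
                raise EOFError(f"unexpected end of file at offset {offset + filled}")
            filled += got
        # Synchronous here: the buffer is refilled only after upload returns.
        # An async path must wait for the copy to complete before reusing it.
        upload(offset, staging[:n])
        offset += n
        chunks += 1
    return chunks


def chunk_count(total_bytes, staging_size):
    """Number of staging-buffer round trips needed for total_bytes."""
    return math.ceil(total_bytes / staging_size)


if __name__ == "__main__":
    data = bytes(range(256)) * 40  # 10 KiB of test data
    device = bytearray(len(data))  # stands in for device memory

    def copy_to_device(offset, chunk):
        device[offset:offset + len(chunk)] = chunk

    print(staged_upload(io.BytesIO(data), len(data), 4096, copy_to_device))
    print(device == data)

    MiB, GiB = 1024 ** 2, 1024 ** 3
    payload = 21.45 * GiB  # loader payload from the benchmark environment
    for size_mib in (1, 4):
        print(size_mib, "MiB:", chunk_count(payload, size_mib * MiB), "chunks")
```

Run as a script, it prints `3` and `True` for the 10 KiB demo, then 21965 chunks at 1 MiB versus 5492 at 4 MiB for the 21.45 GiB payload. Cutting the per-chunk overhead count by roughly 4x is one plausible contributor to the speedup, but these measurements do not isolate it.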

Additional information

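All benchmarks are warm reads: the model file is already in the OS file cache, so the timings reflect the loader's copy and upload path rather than cold disk reads.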

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES - Codex assisted with code exploration, benchmarking, and PR preparation. I manually reviewed the change and benchmark results.

Benchmark

Environment:

  • Model: qwen3.6:35b GGUF blob
  • Loader payload: 21.45 GiB
  • GPU/backend: AMD Radeon RX 7900 XTX, Vulkan
  • Command shape: llama-cli --no-mmap -ngl 999
  • Runs: warm-cache load timing, first 1 MiB warmup excluded
| Chunk size | Runs | Avg load (ms) | Median load (ms) | Avg throughput (GiB/s) |
|-----------:|-----:|--------------:|-----------------:|-----------------------:|
| 1 MiB      | 5    | 4960.94       | 4928.89          | 4.32                   |
| 2 MiB      | 5    | 3687.27       | 3735.25          | 5.82                   |
| 4 MiB      | 5    | 3148.23       | 3102.45          | 6.81                   |
| 8 MiB      | 5    | 3246.70       | 3254.84          | 6.61                   |
| 16 MiB     | 5    | 3370.00       | 3293.10          | 6.37                   |
| 32 MiB     | 2    | 3264.91       | 3255.14          | 6.57                   |
| 64 MiB     | 2    | 3286.43       | 3203.69          | 6.53                   |
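As a sanity check on the table, dividing the 21.45 GiB payload by each average load time reproduces the throughput column to within 0.01 GiB/s. The 16 MiB row comes out at 6.36 rather than 6.37, so the column may have been computed per run and then averaged; that is an assumption, since the aggregation is not stated. The script below also reports how much load time each size saves relative to 1 MiB; `throughput_gib_s` and `time_saved_vs` are illustrative helpers, not part of llama.cpp.

```python
PAYLOAD_GIB = 21.45  # "Loader payload" from the environment list

# Chunk size (MiB) -> average load time (ms), copied from the table.
AVG_LOAD_MS = {
    1: 4960.94,
    2: 3687.27,
    4: 3148.23,
    8: 3246.70,
    16: 3370.00,
    32: 3264.91,
    64: 3286.43,
}


def throughput_gib_s(payload_gib, load_ms):
    """Effective throughput: payload divided by load time in seconds."""
    return payload_gib / (load_ms / 1000.0)


def time_saved_vs(baseline_ms, load_ms):
    """Fraction of the baseline load time that is saved."""
    return 1.0 - load_ms / baseline_ms


if __name__ == "__main__":
    base = AVG_LOAD_MS[1]
    for size, ms in AVG_LOAD_MS.items():
        print(f"{size:>2} MiB: {throughput_gib_s(PAYLOAD_GIB, ms):.2f} GiB/s, "
              f"{time_saved_vs(base, ms):.1%} less load time than 1 MiB")
    print("fastest average:", min(AVG_LOAD_MS, key=AVG_LOAD_MS.get), "MiB")
```

By this measure 4 MiB cuts the average load time by about 36.5% versus 1 MiB, and larger buffers give no further gain in these runs.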

Validation

  • git diff --check
  • Windows Vulkan build:
    • cmake -S . -B C:\tmp\llama-pr-build -DGGML_VULKAN=ON -DLLAMA_BUILD_TESTS=OFF -DLLAMA_BUILD_EXAMPLES=OFF -DLLAMA_BUILD_SERVER=OFF
    • cmake --build C:\tmp\llama-pr-build --config Release --parallel 8 --target llama-bench

@cl0ckt0wer marked this pull request as ready for review May 30, 2026 17:15
@cl0ckt0wer requested a review from ggerganov as a code owner May 30, 2026 17:15