cuda: enable streaming auto cache (implement recommended_working_set_size) by riccardo-galbani · Pull Request #488 · antirez/ds4

riccardo-galbani · 2026-07-02T21:52:42Z

ds4_backend_supports_streaming_auto_cache() allowed the SSD
streaming auto cache planner to run only under DS4_BACKEND_METAL, or
under DS4_BACKEND_CUDA when built with DS4_ROCM_BUILD, which a
plain make cuda-generic (nvcc) build never defines. As a result,
users of nvcc CUDA builds always had to pass
--ssd-streaming-cache-experts explicitly, and
ds4_gpu_recommended_working_set_size() in ds4_cuda.cu was an
unimplemented stub returning 0.

This implements the CUDA working set size using cudaMemGetInfo's
total device memory (the closest analogue to Metal's
recommendedMaxWorkingSetSize), and extends the guard so CUDA can use
the same auto cache planner Metal already has.
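
As an illustration of the approach described above, here is a minimal sketch of reporting total device memory as the working set size. It is not the code from this PR: the names working_set_from_total() and sketch_recommended_working_set_size() are hypothetical, and the real signature of ds4_gpu_recommended_working_set_size() is not shown in this description. Only cudaMemGetInfo() is a real CUDA runtime call. The sketch assumes a failed query should fall back to 0, matching the behaviour of the old stub.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

/* Hypothetical helper (not from the PR): turn the result of a memory
 * query into a working set size. Returns 0 when the query failed, which
 * is the value the old stub always returned. */
static uint64_t working_set_from_total(int query_ok, size_t total_bytes) {
    return query_ok ? (uint64_t)total_bytes : 0;
}

/* Hypothetical stand-in for ds4_gpu_recommended_working_set_size().
 * When compiled by nvcc it asks the CUDA runtime for the total device
 * memory; in a host-only build it reports 0 (no recommendation). */
uint64_t sketch_recommended_working_set_size(void) {
#ifdef __CUDACC__
    size_t free_bytes = 0, total_bytes = 0;
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    /* Assumption: use the total, not the free amount, as the PR text says. */
    return working_set_from_total(err == cudaSuccess, total_bytes);
#else
    return working_set_from_total(0, 0);
#endif
}
```

Using the total rather than the free amount keeps the recommendation stable across runs, at the cost of ignoring memory already held by other processes; the planner's percentage margin is presumably what absorbs that difference.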

Tested on a CUDA GPU with 8 GB of VRAM using DeepSeek V4 Flash (the
project's reference model):

ds4: SSD streaming auto cache budget
ds4: cuda recommends 7.62 GiB working set
ds4: using 80% total for model + cached experts: 6.10 GiB
ds4: non-routed weights: 8.20 GiB
ds4: routed expert size: 6.75 MiB
ds4: cached expert count: 1 (0.01 GiB)
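
The arithmetic behind the log above can be sketched as follows. This is a hypothetical reconstruction, not the planner's actual code: the sketch_plan type and sketch_plan_cache() function are invented for illustration. In the log, the non-routed weights (8.20 GiB) already exceed the 80% budget (6.10 GiB), yet one expert is still cached, so the sketch assumes a floor of one cached expert. That floor is an inference from the log output only.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024.0 * 1024.0)
#define GIB (1024.0 * MIB)

/* Hypothetical result type: the budget and how many experts fit in it. */
typedef struct {
    double   budget_bytes;
    uint64_t cached_experts;
} sketch_plan;

/* Hypothetical planner sketch (not the ds4 implementation).
 * budget = working set * fraction; whatever is left after the non-routed
 * weights is divided by the size of one routed expert. */
static sketch_plan sketch_plan_cache(double working_set_bytes,
                                     double fraction,
                                     double non_routed_bytes,
                                     double expert_bytes) {
    assert(fraction > 0.0 && fraction <= 1.0);
    assert(expert_bytes > 0.0);

    sketch_plan p;
    p.budget_bytes = working_set_bytes * fraction;

    double spare = p.budget_bytes - non_routed_bytes;
    /* Assumption inferred from the log: never plan fewer than one expert,
     * even when the non-routed weights alone exceed the budget. */
    p.cached_experts = spare >= expert_bytes
                     ? (uint64_t)(spare / expert_bytes)
                     : 1;
    return p;
}
```

Called with the values from the log (a 7.62 GiB working set, a fraction of 0.80, 8.20 GiB of non-routed weights and a 6.75 MiB expert), the sketch yields a budget of about 6.10 GiB and a single cached expert. One 6.75 MiB expert is roughly 0.007 GiB, which the log rounds to 0.01 GiB.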

jhohertz · 2026-07-03T04:58:36Z

This worked for me on a 7800 XT but I also needed some stuff from #461 like these flags: DS4_ROCM_STREAM_FREE_RESERVE_GIB=1 DS4_CUDA_Q8_F16_CACHE_MB=0.

It's not exactly fast, ~33 t/s PP, 5-6 on TG out of the gate. I just wanted to see if I could. :)

cuda: implement recommended_working_set_size, enable streaming auto c…

d7a1433

…ache

riccardo-galbani wants to merge 1 commit into antirez:main from riccardo-galbani:fix-cuda-auto-cache-guard
