Pull requests: ggml-org/llama.cpp
Remove redundant CUDA copies after gated_delta_net.
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#23940
opened May 31, 2026 by
gaugarg-nv
Contributor
speculative : fix out-of-bounds read in ngram-map on prompt shrink
#23936
opened May 31, 2026 by
o7si
Contributor
cuda: reset cuda context after reading memory size
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#23935
opened May 31, 2026 by
0cc4m
Contributor
common : retry HTTP requests over IPv4 when IPv6 connect fails
#23933
opened May 31, 2026 by
CptTZ
common: use physical core count for --threads -1 default
#23932
opened May 31, 2026 by
Oxygen56
ci: remove redundant or duplicate jobs
devops
improvements to build systems and github actions
#23927
opened May 31, 2026 by
netrunnereve
Collaborator
opencl: fix compiler warnings for non-adreno path
ggml
changes relating to the ggml tensor library for machine learning
OpenCL
Issues specific to the OpenCL backend
server: handle If-None-Match weak ETags
examples
server
#23916
opened May 30, 2026 by
EZForever
Contributor
loader: increase async upload staging buffer to 4 MiB
#23915
opened May 30, 2026 by
cl0ckt0wer
ci : disable ccache for msvc windows release jobs
devops
improvements to build systems and github actions
#23911
opened May 30, 2026 by
ggerganov
Member
cuda: reserve space for quantize kv-cache at startup
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#23907
opened May 30, 2026 by
am17an
Contributor
llama-bench : fix default thread count evaluated at static init time
examples
#23905
opened May 30, 2026 by
Iamsujithd
fix: VMM pool cuMemSetAccess for ROCm gfx1151 APU
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
vocab: add normalizer.lowercase support to WPM
python
python script changes
#23899
opened May 30, 2026 by
o7si
Contributor
llama-cli: fix model params not propagated
examples
#23893
opened May 30, 2026 by
therealkenc
docs: update HOWTO-add-model.md [no release]
documentation
Improvements or additions to documentation
#23883
opened May 29, 2026 by
Xarbirus
Contributor
metal: template GLU kernels to support f16/f32
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
ggml
changes relating to the ggml tensor library for machine learning
#23882
opened May 29, 2026 by
shrivasshankar
ggml-hip: enable -ffast-math for HIP builds
ggml
changes relating to the ggml tensor library for machine learning
#23862
opened May 29, 2026 by
a-huk
1 task done
llama: save more VRAM by reserving n_outputs == n_seqs when possible
examples
server
#23861
opened May 29, 2026 by
am17an
Contributor
chat: route LiquidAI LFM2.5 through specialized parser
testing
Everything test related
#23856
opened May 29, 2026 by
mattngaw