cuda: reset cuda context after reading memory size by 0cc4m · Pull Request #23935 · ggml-org/llama.cpp

0cc4m · 2026-05-31T06:37:18Z

Overview

Alternative to #23604. It allows the router process in #21231 to read CUDA memory sizes without permanently holding memory through an initialized CUDA context. Instead of using NVML, this checks whether the CUDA context is already initialized before calling cudaMemGetInfo. If it was not, the context is released again after the call.

I also tried ref-counting, as suggested in #23604 (comment), but that is harder to get right and introduces more edge cases.
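
The control flow described in this overview can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `DeviceApi`, `MemoryInfo` and `get_memory_without_leaking_context` are hypothetical names, and the CUDA calls are hidden behind callbacks so the logic can be read and exercised without a GPU. The comments note which real calls each callback would presumably wrap (`cuDevicePrimaryCtxGetState`, `cudaMemGetInfo`, `cudaDeviceReset`); that mapping is an assumption based on the commit messages.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical abstraction over the CUDA calls involved (not part of ggml).
struct DeviceApi {
    // Would presumably wrap cuDevicePrimaryCtxGetState(dev, &flags, &active)
    // and return active != 0.
    std::function<bool(int)> primary_ctx_active;
    // Would presumably wrap cudaSetDevice(dev) followed by
    // cudaMemGetInfo(&free, &total); this call creates the context as a side effect.
    std::function<void(int, std::size_t *, std::size_t *)> mem_get_info;
    // Would presumably wrap cudaSetDevice(dev) followed by cudaDeviceReset().
    std::function<void(int)> reset_device;
};

struct MemoryInfo {
    std::size_t free_bytes;
    std::size_t total_bytes;
};

// Query device memory, releasing the context afterwards only if this
// query was what created it.
MemoryInfo get_memory_without_leaking_context(const DeviceApi & api, int device) {
    assert(device >= 0);

    // Record the context state before the query touches the device.
    const bool was_active = api.primary_ctx_active(device);

    MemoryInfo info{0, 0};
    api.mem_get_info(device, &info.free_bytes, &info.total_bytes);

    // A context that existed beforehand belongs to someone else
    // (for example an active backend), so it is left alone.
    if (!was_active) {
        api.reset_device(device);
    }
    return info;
}
```

Compared with ref-counting, this variant keeps no state of its own: the decision to reset depends only on what the driver reports at the time of the call.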

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES

0cc4m added 2 commits May 30, 2026 12:28

cuda: reset device in get_memory function if no backend is active

e21ad06

use cuDevicePrimaryCtxGetState instead of active counter

a182b35

0cc4m requested a review from a team as a code owner May 31, 2026 06:37

github-actions bot added the Nvidia GPU (issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels May 31, 2026

0cc4m wants to merge 2 commits into master from 0cc4m/cuda-get-memory-device-reset