llama-bench : fix default thread count evaluated at static init time #23905
Iamsujithd wants to merge 1 commit into
The n_threads field in cmd_params_defaults is initialized via a call to common_cpu_get_num_math() at static initialization (file scope), before main() runs. On Linux systems with NUMA nodes, when the process CPU affinity is set after startup (e.g. via numactl or the system scheduler), common_cpu_get_num_math() reads the wrong CPU affinity mask at static init time and returns an incorrect thread count for that run.
Fix: call common_cpu_get_num_math() at runtime inside get_cmd_params() when no --threads argument is provided by the user, instead of copying the statically-evaluated value from cmd_params_defaults. Also update print_usage() to reflect that the default is computed dynamically rather than showing a hard-coded number.
Fixes: ggml-org#17611
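For readers unfamiliar with the pattern, here is a minimal sketch of the difference between an eagerly evaluated default and one resolved at runtime. It is not the llama-bench code: detect_threads() is a hypothetical stand-in for common_cpu_get_num_math(), g_allowed_cpus merely simulates the process affinity mask, and bench_params, defaults_eager, defaults_lazy and resolve_threads() are illustrative names, with -1 assumed as the "auto" sentinel.

```cpp
#include <cassert>
#include <cstdio>

// Simulated "CPUs this process may run on". A real program would query the OS;
// this variable exists only so the sketch can change the answer over time.
static int g_allowed_cpus = 8;

// Hypothetical stand-in for common_cpu_get_num_math().
static int detect_threads() {
    return g_allowed_cpus;
}

struct bench_params {
    int n_threads;
};

// Problematic pattern: the initializer is evaluated once, no later than
// program startup, so it freezes whatever detect_threads() reported then.
static const bench_params defaults_eager = { detect_threads() };

// Fixed pattern: store a sentinel meaning "auto" and resolve it later.
static const bench_params defaults_lazy = { -1 };

// Called while parsing arguments: an explicit value wins, otherwise detect now.
static int resolve_threads(const bench_params & p) {
    return p.n_threads > 0 ? p.n_threads : detect_threads();
}

// Simulates an affinity restriction applied after startup and prints both defaults.
static void demo() {
    g_allowed_cpus = 2;
    // The lazily resolved default tracks the current value.
    assert(resolve_threads(defaults_lazy) == g_allowed_cpus);
    std::printf("eager default: %d\n", defaults_eager.n_threads);
    std::printf("lazy default:  %d\n", resolve_threads(defaults_lazy));
}
```

Calling demo() from main() shows the eager default still holding the startup value while the lazy one follows the simulated change. An explicit user value (any n_threads greater than zero) bypasses detection entirely, which mirrors the "only when no --threads argument is provided" condition in the description.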
Hi @Iamsujithd, thanks for your contribution! Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:
Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.
Hi! I used an AI coding assistant (Google Antigravity) to help trace the static initialization root cause of Issue #17611 and write the initial description. However, the fix itself is a simple 2-line runtime change that I have reviewed, compiled, and verified. I understand the code completely: it moves the common_cpu_get_num_math() call from static initialization time into the runtime get_cmd_params() function so that NUMA CPU affinity setup (like numactl) is correctly captured at runtime instead of static startup. I have simplified the PR description below to be brief and direct. Please let me know if you have any questions!
Title: llama-bench : fix default thread count evaluated at static init time
Description: Fixes #17611.
Moves the evaluation of the default thread count, common_cpu_get_num_math(), from static initialization time to runtime inside get_cmd_params(). This ensures that on Linux systems with NUMA nodes, any CPU affinity configured after process startup (e.g. via numactl) is read correctly.
Also updated print_usage() to show auto as the default thread count.
Verification: Tested on Apple M4 (macOS).