Per-worker proxy threads for UDF callbacks (#1136) #7
Open
otegami wants to merge 2 commits into
10c8fcf to 82dd2aa
6610a28 to 83bd94e
Wire the scalar execute path to per-worker proxy threads on DuckDB >= 1.5.0.
An init callback registered via duckdb_scalar_function_set_init runs once per worker thread. It creates a proxy (allocating the proxy's Ruby thread under the GVL through the global executor, since init runs on a non-Ruby thread) and stores it as per-worker state via duckdb_scalar_function_init_set_state. The execute callback retrieves that proxy with duckdb_scalar_function_get_state and dispatches through it via rbduckdb_function_executor_dispatch_via_proxy, so callbacks from different workers run concurrently instead of serializing on the single global executor. DuckDB frees each proxy through rbduckdb_worker_proxy_destroy.
The proxy-creating wrapper runs rbduckdb_worker_proxy_create under rb_protect, implementing the raise contract documented on that function: the executor runs callbacks unprotected, so an uncaught raise would longjmp past its done-signaling and block the waiting DuckDB worker forever. On failure the proxy stays NULL and the execute callback falls back to the global executor. On DuckDB < 1.5.0 the init hook is absent and the execute callback keeps using the global executor unchanged.
The added test records which Ruby threads run the callback and asserts, in addition to result correctness, that there are more than two distinct threads; the old implementation can never produce more than two (the calling thread plus the single global executor). Simultaneity assertions are avoided because they are scheduler-dependent. sample/issue1136.rb demonstrates the throughput win with a GVL-releasing callback (about 3.8x at SET threads=4 locally).
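A minimal, Ruby-free pthreads sketch of the routing described above, under loud assumptions: every name here is hypothetical. `executor` is a thread with a one-slot mailbox that models both the global executor and a per-worker proxy; `worker_main` plays the init hook (create a proxy once) followed by the execute callback; `executor_destroy` stands in for the role DuckDB gives rbduckdb_worker_proxy_destroy. The GVL, Ruby thread creation, and the real duckdb_scalar_function_set_init / duckdb_scalar_function_get_state calls are not modeled.

```c
#include <pthread.h>
#include <stdlib.h>

/* One callback invocation (stands in for calling a Ruby block). */
typedef void (*callback_fn)(void *arg);
typedef struct { callback_fn fn; void *arg; int done; } job;

/* An executor thread with a one-slot mailbox; in this sketch the same type
 * models both the global executor and each per-worker proxy. */
typedef struct {
    pthread_t thread; pthread_mutex_t mu; pthread_cond_t cv;
    job *pending;                            /* at most one job in flight */
    int stop;
} executor;

static void *executor_loop(void *self) {
    executor *e = self;
    pthread_mutex_lock(&e->mu);
    for (;;) {
        while (e->pending == NULL && !e->stop)
            pthread_cond_wait(&e->cv, &e->mu);
        if (e->pending == NULL) break;       /* stop requested, slot empty */
        job *j = e->pending;
        pthread_mutex_unlock(&e->mu);
        j->fn(j->arg);                       /* the callback runs on this thread */
        pthread_mutex_lock(&e->mu);
        j->done = 1;
        e->pending = NULL;                   /* free the slot */
        pthread_cond_broadcast(&e->cv);      /* wake the waiting submitter(s) */
    }
    pthread_mutex_unlock(&e->mu);
    return NULL;
}

/* calloc/free, mirroring the PR: the host may free this from a non-Ruby thread. */
static executor *executor_create(void) {
    executor *e = calloc(1, sizeof *e);
    if (e == NULL) return NULL;
    pthread_mutex_init(&e->mu, NULL);
    pthread_cond_init(&e->cv, NULL);
    if (pthread_create(&e->thread, NULL, executor_loop, e) != 0) {
        pthread_cond_destroy(&e->cv); pthread_mutex_destroy(&e->mu); free(e);
        return NULL;                         /* caller falls back to the global executor */
    }
    return e;
}

/* Shaped like a void (*)(void *) delete callback so a host can free per-worker state. */
static void executor_destroy(void *ptr) {
    executor *e = ptr;
    if (e == NULL) return;
    pthread_mutex_lock(&e->mu);
    e->stop = 1;
    pthread_cond_broadcast(&e->cv);
    pthread_mutex_unlock(&e->mu);
    pthread_join(e->thread, NULL);
    pthread_cond_destroy(&e->cv); pthread_mutex_destroy(&e->mu); free(e);
}

/* Blocks the submitting worker until its callback has completed. */
static void dispatch(executor *e, callback_fn fn, void *arg) {
    job j = { fn, arg, 0 };
    pthread_mutex_lock(&e->mu);
    while (e->pending != NULL)               /* wait for the slot */
        pthread_cond_wait(&e->cv, &e->mu);
    e->pending = &j;
    pthread_cond_broadcast(&e->cv);
    while (!j.done)
        pthread_cond_wait(&e->cv, &e->mu);
    pthread_mutex_unlock(&e->mu);
}

static executor *global_executor;            /* shared fallback, created at startup */

/* "execute": the worker's own proxy if init produced one, else the global executor. */
static void execute(executor *proxy, callback_fn fn, void *arg) {
    dispatch(proxy != NULL ? proxy : global_executor, fn, arg);
}

/* Simulated DuckDB worker: "init" once (create a proxy), then "execute". */
typedef struct { int in, out, fail_init; pthread_t ran_on; executor *proxy; } worker;

static void double_it(void *arg) {
    worker *w = arg;
    w->out = w->in * 2;
    w->ran_on = pthread_self();              /* record which thread ran the callback */
}

static void *worker_main(void *arg) {
    worker *w = arg;
    w->proxy = w->fail_init ? NULL : executor_create();  /* init hook */
    execute(w->proxy, double_it, w);                     /* execute callback */
    return NULL;
}
```

With four simulated workers where one worker's init is forced to fail, the callback runs on four distinct threads while all proxies are alive: three proxies plus the global executor for the failed worker. That mirrors the kind of distinct-thread check the PR's test relies on. A single mailbox executor like `global_executor` serializes every submitter, which is the ceiling the per-worker proxies lift.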
Wire the table execute path to per-worker proxy threads on DuckDB >= 1.5.0.
A local_init callback registered via duckdb_table_function_set_local_init runs once per worker thread. It creates a proxy (allocating the proxy's Ruby thread under the GVL through the global executor, since local_init runs on a non-Ruby thread) and stores it as thread-local init data via duckdb_init_set_init_data. The execute callback retrieves that proxy with duckdb_function_get_local_init_data and dispatches through it via rbduckdb_function_executor_dispatch_via_proxy, so callbacks from different workers run concurrently instead of serializing on the single global executor. bind and init stay on the global executor. DuckDB frees each proxy through rbduckdb_worker_proxy_destroy.
The proxy-creating wrapper runs rbduckdb_worker_proxy_create under rb_protect, implementing the raise contract documented on that function: the executor runs callbacks unprotected, so an uncaught raise would longjmp past its done-signaling and block the waiting DuckDB worker forever. On failure the proxy stays NULL and the execute callback falls back to the global executor. On DuckDB < 1.5.0 the local_init hook is absent and the execute callback keeps using the global executor unchanged.
The added test records which Ruby threads run the execute callback and asserts, in addition to result correctness, that there are more than two distinct threads; the old implementation can never produce more than two (the calling thread plus the single global executor). The test was verified to fail against a build without this change. Simultaneity assertions are avoided because they are scheduler-dependent.
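The raise contract can also be sketched without Ruby. Assumption: a Ruby raise is modeled as a longjmp (`fake_raise`) and rb_protect as a setjmp-based catch (`protect`) that imitates the shape of `VALUE rb_protect(VALUE (*)(VALUE), VALUE, int *state)` with void pointers. `executor_run`, `create_proxy_raw`, and `create_proxy_protected` are hypothetical names, not functions from the extension; `done` stands in for the executor's done-signaling.

```c
#include <setjmp.h>
#include <stdlib.h>

typedef void *(*body_fn)(void *arg);

/* Innermost active handler; fake_raise() jumps to it, like a Ruby raise. */
static jmp_buf *current_handler;

static void fake_raise(void) { longjmp(*current_handler, 1); }

/* rb_protect-like: run body and report a raise through *state instead of unwinding. */
static void *protect(body_fn body, void *arg, int *state) {
    jmp_buf env;
    jmp_buf *saved = current_handler;
    void *volatile result = NULL;
    current_handler = &env;
    if (setjmp(env) == 0) {
        result = body(arg);
        *state = 0;
    } else {
        *state = 1;                         /* body raised */
    }
    current_handler = saved;
    return result;
}

/* A request handed to the executor; the waiting worker blocks until done == 1. */
typedef struct { body_fn fn; void *arg; void *result; int done; } job;

/* The executor runs jobs unprotected: a raise inside fn skips `done = 1`. */
static void executor_run(job *j) {
    j->result = j->fn(j->arg);
    j->done = 1;
}

/* Hypothetical proxy constructor that may raise (e.g. Ruby thread creation fails). */
static void *create_proxy_raw(void *arg) {
    if (*(int *)arg) fake_raise();
    return malloc(sizeof(int));             /* stand-in proxy handle */
}

/* The wrapper actually handed to the executor: the raise stops here. */
static void *create_proxy_protected(void *arg) {
    int state;
    void *proxy = protect(create_proxy_raw, arg, &state);
    return state ? NULL : proxy;            /* NULL => execute uses the global executor */
}

/* Runs one executor step; lets a caller observe an unprotected raise under an outer handler. */
static void *run_job(void *arg) { executor_run(arg); return NULL; }
```

Run unprotected, a failing constructor jumps straight past `done = 1`, so a worker waiting on that flag would wait forever. Wrapped, the same failure returns NULL with `done` still set, which is the signal for the execute path to fall back to the global executor.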
83bd94e to 89f19d8
Summary
Implements per-worker proxy threads for scalar and table function UDF
callbacks (refs suketa/ruby-duckdb#1136), so callbacks from different DuckDB worker threads
run concurrently instead of serializing through a single global executor.
This is a personal-fork PR for local verification and integration — the
living picture of the whole feature. It is sent upstream (suketa/ruby-duckdb)
one PR at a time, in order:
Why
With one global executor, callbacks from different workers can never overlap,
even when they release the GVL (e.g. on I/O). One proxy thread per DuckDB
worker lifts exactly that ceiling. Measured with
sample/issue1136.rb (GVL-releasing scalar callback over a 500k-row scan):
The before run caps at 2 threads (calling thread + global executor) no matter
how many workers DuckDB spawns. Pure-CPU callbacks stay bounded by the GVL,
so the win is specific to GVL-releasing UDFs.
Design notes
(non-Ruby thread): route through the worker's own proxy when present, else
fall back to the global executor.
behind HAVE_DUCKDB_H_GE_V1_5_0 (set_init/set_local_init are 1.5.0 APIs).
calloc/free (not xcalloc/xfree): DuckDB frees them from non-Ruby threads. Proxy threads are GC-protected via a global array.
DuckDB::*) and the duckdb_native artifact name are untouched.
function_executor.{c,h} -> executor.{c,h}; the rename was dropped (the name is accurate and matches the *_function_* file family).
Verification
0 failures.
rake compile clean (only the pre-existing -Wshorten-64-to-32 warning); RuboCop clean.
more than two distinct threads: the global executor structurally caps at two, so each test fails without its commit (verified red/green against a proxy-less build).