[FEAT][STUBGEN] Add Rust code generation backend#609
Code Review
This pull request introduces a Rust code generator backend to tvm-ffi-stubgen, enabling the generation of Rust object bindings from C++ reflection metadata. It restructures the tool with a pluggable generator architecture, updates CMake and CLI configurations, adds comprehensive documentation, and provides an end-to-end example. Additionally, the tvm-ffi Rust runtime is updated to support passing container types as arguments. The review feedback highlights several critical improvements: using rpath linker flags in build.rs for runtime library discovery, handling forbidden Rust keywords (like self and super) by appending an underscore since they cannot be raw identifiers, adding a null check to inc_ref_raw_object to prevent undefined behavior, and caching resolved FFI functions with std::sync::OnceLock to eliminate performance bottlenecks during method invocations.
fn update_runtime_library_env(lib_dir: &str) {
    let os_env_var = match env::var("CARGO_CFG_TARGET_OS").as_deref() {
        Ok("windows") => "PATH",
        Ok("macos") => "DYLD_LIBRARY_PATH",
        Ok("linux") => "LD_LIBRARY_PATH",
        _ => return,
    };
    let current_val = env::var(os_env_var).unwrap_or_default();
    let separator = if os_env_var == "PATH" { ";" } else { ":" };
    let new_val = if current_val.is_empty() {
        lib_dir.to_string()
    } else {
        format!("{current_val}{separator}{lib_dir}")
    };
    println!("cargo:rustc-env={os_env_var}={new_val}");
}
Using cargo:rustc-env only sets the environment variable for the compiler (and makes it available via the env! macro) during the compilation of the current crate. It does not set or update the environment variable for the runtime execution of the compiled binary (e.g., when running cargo run --example demo).
To make the dynamic library discoverable at runtime without requiring the user to manually set LD_LIBRARY_PATH/DYLD_LIBRARY_PATH, you should pass rpath linker flags instead.
fn update_runtime_library_env(lib_dir: &str) {
    match env::var("CARGO_CFG_TARGET_OS").as_deref() {
        Ok("linux") => println!("cargo:rustc-link-arg=-Wl,-rpath,{lib_dir}"),
        Ok("macos") => println!("cargo:rustc-link-arg=-Wl,-rpath,{lib_dir}"),
        _ => {}
    }
}

def _rust_ident(name: str) -> str:
    """Make ``name`` a usable Rust identifier (raw-escape keywords)."""
    if name in C.RUST_KEYWORDS and name not in C.RUST_RAW_IDENT_FORBIDDEN:
        return f"r#{name}"
    return name
If a C++ field or parameter is named after one of the forbidden Rust keywords (such as self, Self, super, or crate), returning the name as-is will result in invalid Rust code (e.g., pub self: i64), which fails to compile because these keywords cannot be used as raw identifiers. We should handle these forbidden keywords by appending an underscore (e.g., self_) or using another renaming scheme to ensure the generated code is valid.
def _rust_ident(name: str) -> str:
    """Make ``name`` a usable Rust identifier (raw-escape keywords)."""
    if name in C.RUST_RAW_IDENT_FORBIDDEN:
        return f"{name}_"
    if name in C.RUST_KEYWORDS:
        return f"r#{name}"
    return name
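
For reference, the renaming rule can be sketched in Rust itself. This is only an illustration: `KEYWORDS` below is an abbreviated, hypothetical list and does not reproduce the contents of `C.RUST_KEYWORDS`, and `rust_ident` is an illustrative name. The four entries in `RAW_IDENT_FORBIDDEN` are the keywords Rust does not accept as raw identifiers.

```rust
/// Keywords that Rust rejects even in raw form (`r#self` does not compile).
const RAW_IDENT_FORBIDDEN: &[&str] = &["self", "Self", "super", "crate"];

/// Abbreviated keyword list, for illustration only (an assumption, not the
/// generator's real table).
const KEYWORDS: &[&str] = &[
    "as", "fn", "match", "move", "type", "self", "Self", "super", "crate",
];

/// Maps an arbitrary field or parameter name to a valid Rust identifier.
pub fn rust_ident(name: &str) -> String {
    if RAW_IDENT_FORBIDDEN.contains(&name) {
        // Cannot be raw-escaped, so rename with a trailing underscore.
        format!("{name}_")
    } else if KEYWORDS.contains(&name) {
        // Ordinary keyword: the raw-identifier prefix keeps the name intact.
        format!("r#{name}")
    } else {
        name.to_string()
    }
}
```

With this rule, `self` becomes `self_`, `type` becomes `r#type`, and a non-keyword such as `value` passes through unchanged.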
#[inline]
pub unsafe fn inc_ref_raw_object(handle: *mut TVMFFIObject) {
    unsafe_::inc_ref(handle)
}
If handle is null, dereferencing it inside unsafe_::inc_ref is undefined behavior (in practice, usually a segmentation fault). To make this API more robust and prevent crashes when dealing with potentially null raw pointers (e.g., from nullable object references), we should add a null check before calling inc_ref.
#[inline]
pub unsafe fn inc_ref_raw_object(handle: *mut TVMFFIObject) {
    if !handle.is_null() {
        unsafe_::inc_ref(handle)
    }
}
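
The effect of the guard can be checked in isolation. The sketch below is standalone and makes two assumptions: `MockObject` is a hypothetical stand-in for `TVMFFIObject` that holds only a reference count, and the increment is a plain atomic add rather than the real `unsafe_::inc_ref`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for `TVMFFIObject`: just a reference count.
pub struct MockObject {
    pub ref_count: AtomicU64,
}

/// Increments the reference count unless `handle` is null.
///
/// # Safety
/// A non-null `handle` must point to a live `MockObject`.
pub unsafe fn inc_ref_raw_object(handle: *mut MockObject) {
    // SAFETY: `as_ref` yields `None` for a null pointer; for any other
    // pointer the caller guarantees that it is valid.
    if let Some(obj) = unsafe { handle.as_ref() } {
        obj.ref_count.fetch_add(1, Ordering::Relaxed);
    }
}
```

Passing `std::ptr::null_mut()` is then a no-op, while a valid pointer has its count incremented by one.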
if method.is_member or params:
    _use(self.imports, "tvm_ffi::AnyView")
packed = _packed_args_expr(params, method.is_member)
getter = f' let f = get_type_method({self.obj_struct}::TYPE_KEY, "{ffi_name}")?;'
Calling get_type_method on every single method invocation is a major performance bottleneck because it performs a global Mutex lock, a C FFI call (TVMFFIGetTypeInfo), and a string comparison loop over all methods of the type. Since the reflected methods of a registered type do not change at runtime, we can cache the resolved tvm_ffi::Function using std::sync::OnceLock inside the generated method to make subsequent calls extremely fast.
getter = (
    f' static F: std::sync::OnceLock<tvm_ffi::Function> = std::sync::OnceLock::new();\n'
    f' let f = F.get_or_init(|| get_type_method({self.obj_struct}::TYPE_KEY, "{ffi_name}").unwrap());'
)
thanks @Seven-Streams some quick notes:
Signed-off-by: yuchuan <yuchuan.7streams@gmail.com>
Summary
TVM-FFI provides an efficient, easy-to-use mechanism for exposing C++ classes to Rust. Objects share the same memory representation, so Rust code can directly access objects created in C++. However, users have to hand-write the Rust definition of each registered object to avoid memory layout and alignment mismatches between the two sides. To eliminate this manual work, tvm-ffi-stubgen supports generating Rust code directly.
Key Changes
Example
Given the following C++ definition:
The Rust stub generator produces Rust wrappers for the reflected objects and methods. The object has the same memory layout as its C++ counterpart. Users can create new objects on the Rust side and call the methods defined in C++:
IntPairObj mirrors the C++ memory layout exactly, so Rust code can directly access objects created in C++ and vice versa. IntPair is the reference type that owns an allocation of it; fields are read through Deref (and written through DerefMut, since the class declares _type_mutable = true), and sum calls into the C++ implementation through the FFI.
Builder-style Construction
Construction is fully Rust-native -- no FFI call is involved -- and uniform: a nullary ffi_new() opens the builder; every field is set through its like-named consuming setter, and build() finishes the chain:
- ffi_new() -> IntPairBuilder: Opens the builder. A field with a refl::default_value (here scale) starts prefilled with its default, rendered as a Rust literal at stub-generation time; every other field starts unset.
- a(..) / b(..) / scale(..): One consuming setter per field. Setting a defaulted field overrides its default.
- build() -> Result<IntPair>: Validates and allocates: returns an error if a field without a default is still unset (the err case above), otherwise wraps the assembled value in ObjectArc and returns the reference type. This is the endpoint to use in ordinary code.
- build_obj() -> Result<IntPairObj>: Performs the same validation and assembly as build() -- build() in fact delegates to it -- but stops at the bare, unallocated struct value. It exists for inheritance: a C++ class deriving from IntPair embeds IntPairObj as its first field, and the derived type's generated builder gains a base(..) setter that takes exactly this value:
When base is left unset, the derived build() falls back to default-constructing the parent through its all-default builder. This succeeds silently when every parent field has a default; for IntPair it would fail with an error naming base, since a and b carry no default.
The builder deliberately bypasses any C++ constructor logic (it never runs IntPairObj's C++ constructor); users who need the faithful C++ semantics can hand-write a new constructor (outside the generated markers) on top of the builder.
Testing
test_stubgen.py.
e2e_test.tar.gz