[FEAT][STUBGEN] Add Rust code generation backend #609

Open
Seven-Streams wants to merge 2 commits into apache:main from Seven-Streams:main-dev/2026-06-04/rust_stubgen

@Seven-Streams Seven-Streams commented Jun 5, 2026

Summary

TVM-FFI provides an efficient, easy-to-use mechanism for exposing C++ classes to Rust. Objects share the same memory representation, so Rust code can directly access objects created in C++. However, users currently have to hand-write the Rust definition of each registered object, and any mistake causes memory layout or alignment mismatches between the two sides. To eliminate this manual work, this PR adds a Rust backend to tvm-ffi-stubgen that generates the Rust definitions directly.

Key Changes

  • Generate Rust code for registered C++ classes automatically (via CLI or CMake).
  • Mirror the C++ memory layout exactly in Rust.
  • Provide Rust-native builder-style constructors for registered classes, with reflected default values prefilled and overridable through setters.
  • Expose methods of registered classes through cross-language calls.

Example

Given the following C++ definition:

class IntPairObj: public ffi::Object {
 public:
  int64_t a;
  int64_t b;
  // `scale` carries a reflected default: the generated Rust builder prefills
  // it and exposes a `.scale(..)` setter instead of a required parameter.
  int64_t scale = 1;

  IntPairObj(int64_t a, int64_t b) : a(a), b(b) {}

  int64_t Sum() const { return (a + b) * scale; }

  // All fields are writable, so the generated Rust wrapper gets `DerefMut`.
  static constexpr bool _type_mutable = true;
  TVM_FFI_DECLARE_OBJECT_INFO_FINAL(
      /*type_key=*/"rust_stubgen.IntPair",
      /*class=*/IntPairObj,
      /*parent_class=*/ffi::Object);
};

TVM_FFI_STATIC_INIT_BLOCK() {
  namespace refl = tvm::ffi::reflection;
  refl::ObjectDef<IntPairObj>()
      .def(refl::init<int64_t, int64_t>())
      .def_rw("a", &IntPairObj::a, "the first field")
      .def_rw("b", &IntPairObj::b, "the second field")
      .def_rw("scale", &IntPairObj::scale, refl::init(false), refl::default_value(int64_t{1}),
              "sum multiplier (defaulted -> builder setter in Rust)")
      .def("sum", &IntPairObj::Sum, "(a + b) * scale");
}

The Rust stub generator produces Rust wrappers for the reflected objects and methods. The object has the same memory layout as its C++ counterpart. Users can create new objects on the Rust side and call the methods defined in C++:

   #[repr(C)]
   #[derive(tvm_ffi::derive::Object)]
   #[type_key = "rust_stubgen.IntPair"]
   pub struct IntPairObj {
       base: Object,   // the parent type, embedded as the first field
       pub a: i64,
       pub b: i64,
       pub scale: i64,
   }

   #[repr(C)]
   #[derive(tvm_ffi::derive::ObjectRef, Clone)]
   pub struct IntPair {
       data: ObjectArc<IntPairObj>,
   }

   impl IntPair {
       pub fn ffi_new() -> IntPairBuilder { /* ... */ }
       pub fn sum(&mut self) -> Result<i64> { /* ... */ }
   }

   pub struct IntPairBuilder { /* base + every field */ }

   impl IntPairBuilder {
       pub fn a(mut self, a: i64) -> Self { /* ... */ }
       pub fn b(mut self, b: i64) -> Self { /* ... */ }
       pub fn scale(mut self, scale: i64) -> Self { /* ... */ }
       pub fn build(self) -> Result<IntPair> { /* ... */ }
       pub fn build_obj(self) -> Result<IntPairObj> { /* ... */ }
   }

IntPairObj mirrors the C++ memory layout exactly, so Rust code can directly access objects created in C++ and vice versa. IntPair is the reference type that owns an allocation of it; fields are read through Deref (and written through DerefMut, since the class declares _type_mutable = true), and sum calls into the C++ implementation through the FFI.
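
The field-access pattern described above can be sketched in plain Rust. This is only an illustration under stated assumptions: `PairObj` and `Pair` are hypothetical stand-ins for the generated types, a `Box` replaces `ObjectArc` to keep the sketch free of unsafe code, and `sum` is computed locally instead of calling into C++ through the FFI.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for the generated object struct
/// (the real one also embeds `base: Object` as its first field).
#[repr(C)]
pub struct PairObj {
    pub a: i64,
    pub b: i64,
    pub scale: i64,
}

/// Hypothetical stand-in for the generated reference type.
/// `Box` is used here in place of `ObjectArc` (an assumption of this sketch).
pub struct Pair {
    data: Box<PairObj>,
}

impl Pair {
    pub fn new(a: i64, b: i64, scale: i64) -> Self {
        Pair { data: Box::new(PairObj { a, b, scale }) }
    }
}

// Reads go through `Deref`, so `pair.a` resolves to the field of `PairObj`.
impl Deref for Pair {
    type Target = PairObj;
    fn deref(&self) -> &PairObj {
        &self.data
    }
}

// Writes go through `DerefMut`; the description above says this is only
// generated when the class declares `_type_mutable = true`.
impl DerefMut for Pair {
    fn deref_mut(&mut self) -> &mut PairObj {
        &mut self.data
    }
}

/// Plain-Rust equivalent of the C++ `Sum()` from the example.
pub fn sum(p: &Pair) -> i64 {
    (p.a + p.b) * p.scale
}
```

With these definitions, `pair.scale = 10` compiles only because `DerefMut` is implemented; dropping that impl would make the wrapper read-only.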

Builder-style Construction

Construction is fully Rust-native -- no FFI call is involved -- and uniform: a nullary ffi_new() opens the builder, every field is set through its like-named consuming setter, and build() finishes the chain:

let pair = IntPair::ffi_new().a(1).b(2).build()?;             // scale = 1 (default)
let scaled = IntPair::ffi_new().a(1).b(2).scale(10).build()?; // override the default
let err = IntPair::ffi_new().a(1).build();                    // Err: field `b` is not set
  • ffi_new() -> IntPairBuilder : Opens the builder. A field with a refl::default_value (here scale) starts prefilled with its default, rendered as a Rust literal at stub-generation time; every other field starts unset.
  • a(..) / b(..) / scale(..): One consuming setter per field. Setting a defaulted field overrides its default.
  • build() -> Result<IntPair>: Validates and allocates: returns an error if a field without a default is still unset (the err case above), otherwise wraps the assembled value in ObjectArc and returns the reference type. This is the endpoint to use in ordinary code.
  • build_obj() -> Result<IntPairObj>: Performs the same validation and assembly as build() -- build() in fact delegates to it -- but stops at the bare, unallocated struct value. It exists for inheritance: the generated struct for a C++ class deriving from IntPairObj embeds IntPairObj as its first field, and the derived type's generated builder gains a base(..) setter that takes exactly this value:
// for a hypothetical `Derived` extending IntPair with a field `c`
let d = Derived::ffi_new()
    .base(IntPair::ffi_new().a(1).b(2).build_obj()?)
    .c(3)
    .build()?;

When base is left unset, the derived build() falls back to default-constructing the parent through its all-default builder. This succeeds silently when every parent field has a default; for IntPair it would fail with an error naming base, since a and b carry no default.
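
The builder behaviour described in this section can be sketched in self-contained Rust. Everything here is a hypothetical illustration, not the generated code: `ParentObj`, `DerivedObj`, and their builders are made-up names, `Box` stands in for `ObjectArc`, errors are plain `String`s, and the exact error wording is an assumption.

```rust
/// Hypothetical parent object: `a` has no default, `scale` defaults to 1.
#[derive(Debug, PartialEq)]
pub struct ParentObj {
    pub a: i64,
    pub scale: i64,
}

/// One `Option` slot per field; `None` means "not set yet".
pub struct ParentBuilder {
    a: Option<i64>,
    scale: Option<i64>,
}

impl ParentBuilder {
    pub fn new() -> Self {
        // Defaulted fields start prefilled; all others start unset.
        ParentBuilder { a: None, scale: Some(1) }
    }
    pub fn a(mut self, a: i64) -> Self {
        self.a = Some(a);
        self
    }
    pub fn scale(mut self, scale: i64) -> Self {
        self.scale = Some(scale);
        self
    }
    /// Validates and assembles the bare struct value (no allocation).
    pub fn build_obj(self) -> Result<ParentObj, String> {
        let a = self.a.ok_or_else(|| "field `a` is not set".to_string())?;
        let scale = self.scale.ok_or_else(|| "field `scale` is not set".to_string())?;
        Ok(ParentObj { a, scale })
    }
    /// Same validation, then allocation (`Box` stands in for `ObjectArc`).
    pub fn build(self) -> Result<Box<ParentObj>, String> {
        self.build_obj().map(Box::new)
    }
}

/// Hypothetical derived object: the parent is embedded as the first field.
#[derive(Debug, PartialEq)]
pub struct DerivedObj {
    pub base: ParentObj,
    pub c: i64,
}

pub struct DerivedBuilder {
    base: Option<ParentObj>,
    c: Option<i64>,
}

impl DerivedBuilder {
    pub fn new() -> Self {
        DerivedBuilder { base: None, c: None }
    }
    pub fn base(mut self, base: ParentObj) -> Self {
        self.base = Some(base);
        self
    }
    pub fn c(mut self, c: i64) -> Self {
        self.c = Some(c);
        self
    }
    pub fn build_obj(self) -> Result<DerivedObj, String> {
        let base = match self.base {
            Some(base) => base,
            // Fallback: try the parent's all-default builder. This fails
            // here because `a` has no default.
            None => ParentBuilder::new()
                .build_obj()
                .map_err(|e| format!("field `base` is not set ({})", e))?,
        };
        let c = self.c.ok_or_else(|| "field `c` is not set".to_string())?;
        Ok(DerivedObj { base, c })
    }
}
```

Keeping `build_obj()` separate from `build()` is what lets the derived builder embed the parent by value; the allocation happens only once, at the outermost `build()`.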

The builder deliberately bypasses any C++ constructor logic (it never runs IntPairObj's C++ constructor); users who need the faithful C++ semantics can hand-write a new constructor (outside the generated markers) on top of the builder.

Testing

  • Added unit tests and string-level tests in test_stubgen.py.
  • End-to-end tests are attached in the PR comments.

e2e_test.tar.gz

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a Rust code generator backend to tvm-ffi-stubgen, enabling the generation of Rust object bindings from C++ reflection metadata. It restructures the tool with a pluggable generator architecture, updates CMake and CLI configurations, adds comprehensive documentation, and provides an end-to-end example. Additionally, the tvm-ffi Rust runtime is updated to support passing container types as arguments. The review feedback highlights several critical improvements: using rpath linker flags in build.rs for runtime library discovery, handling forbidden Rust keywords (like self and super) by appending an underscore since they cannot be raw identifiers, adding a null check to inc_ref_raw_object to prevent undefined behavior, and caching resolved FFI functions with std::sync::OnceLock to eliminate performance bottlenecks during method invocations.

Comment thread examples/rust_stubgen/rust/build.rs Outdated
Comment on lines +29 to +44
fn update_runtime_library_env(lib_dir: &str) {
    let os_env_var = match env::var("CARGO_CFG_TARGET_OS").as_deref() {
        Ok("windows") => "PATH",
        Ok("macos") => "DYLD_LIBRARY_PATH",
        Ok("linux") => "LD_LIBRARY_PATH",
        _ => return,
    };
    let current_val = env::var(os_env_var).unwrap_or_default();
    let separator = if os_env_var == "PATH" { ";" } else { ":" };
    let new_val = if current_val.is_empty() {
        lib_dir.to_string()
    } else {
        format!("{current_val}{separator}{lib_dir}")
    };
    println!("cargo:rustc-env={os_env_var}={new_val}");
}

high

Using cargo:rustc-env only sets the environment variable for the compiler (and makes it available via the env! macro) during the compilation of the current crate. It does not set or update the environment variable for the runtime execution of the compiled binary (e.g., when running cargo run --example demo).

To make the dynamic library discoverable at runtime without requiring the user to manually set LD_LIBRARY_PATH/DYLD_LIBRARY_PATH, you should pass rpath linker flags instead.

fn update_runtime_library_env(lib_dir: &str) {
    match env::var("CARGO_CFG_TARGET_OS").as_deref() {
        Ok("linux") => println!("cargo:rustc-link-arg=-Wl,-rpath,{lib_dir}"),
        Ok("macos") => println!("cargo:rustc-link-arg=-Wl,-rpath,{lib_dir}"),
        _ => {}
    }
}
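
A testable variant of the same idea separates choosing the directive from printing it. This is a minimal sketch: `rpath_directive` and `emit_rpath` are hypothetical helper names, and only the `cargo:rustc-link-arg` directive and the `CARGO_CFG_TARGET_OS` variable come from Cargo itself.

```rust
use std::env;

/// Returns the build-script directive that embeds `lib_dir` as an rpath,
/// or `None` on targets this sketch does not handle (for example Windows,
/// which has no rpath mechanism).
pub fn rpath_directive(target_os: &str, lib_dir: &str) -> Option<String> {
    match target_os {
        "linux" | "macos" => Some(format!("cargo:rustc-link-arg=-Wl,-rpath,{}", lib_dir)),
        _ => None,
    }
}

/// Intended to be called from `build.rs`: prints the directive, if any,
/// so that Cargo forwards the flag to the linker.
pub fn emit_rpath(lib_dir: &str) {
    let target_os = env::var("CARGO_CFG_TARGET_OS").unwrap_or_default();
    if let Some(directive) = rpath_directive(&target_os, lib_dir) {
        println!("{}", directive);
    }
}
```

Matching on the target OS as a plain string keeps the decision logic independent of the environment, so it can be unit-tested without running a real build script.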

Comment on lines +236 to +240
def _rust_ident(name: str) -> str:
    """Make ``name`` a usable Rust identifier (raw-escape keywords)."""
    if name in C.RUST_KEYWORDS and name not in C.RUST_RAW_IDENT_FORBIDDEN:
        return f"r#{name}"
    return name

high

If a C++ field or parameter is named after one of the forbidden Rust keywords (such as self, Self, super, or crate), returning the name as-is will result in invalid Rust code (e.g., pub self: i64), which fails to compile because these keywords cannot be used as raw identifiers. We should handle these forbidden keywords by appending an underscore (e.g., self_) or using another renaming scheme to ensure the generated code is valid.

Suggested change
- def _rust_ident(name: str) -> str:
-     """Make ``name`` a usable Rust identifier (raw-escape keywords)."""
-     if name in C.RUST_KEYWORDS and name not in C.RUST_RAW_IDENT_FORBIDDEN:
-         return f"r#{name}"
-     return name
+ def _rust_ident(name: str) -> str:
+     """Make ``name`` a usable Rust identifier (raw-escape keywords)."""
+     if name in C.RUST_RAW_IDENT_FORBIDDEN:
+         return f"{name}_"
+     if name in C.RUST_KEYWORDS:
+         return f"r#{name}"
+     return name
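
To show what the two escaping strategies produce on the Rust side, here is a small Rust port of the renaming rule together with a struct that uses both forms. It is only a sketch: `rust_ident`, `KEYWORDS`, and `Escaped` are hypothetical names, and `KEYWORDS` is a deliberately incomplete subset of the real keyword list.

```rust
/// Keywords that Rust rejects even in raw-identifier form (`r#self` is invalid).
const RAW_IDENT_FORBIDDEN: &[&str] = &["self", "Self", "super", "crate"];

/// Illustrative subset of Rust keywords, not the full list.
const KEYWORDS: &[&str] = &[
    "type", "fn", "match", "move", "ref", "self", "Self", "super", "crate",
];

/// Maps an arbitrary field name to something usable as a Rust identifier.
pub fn rust_ident(name: &str) -> String {
    if RAW_IDENT_FORBIDDEN.contains(&name) {
        // Cannot be raw-escaped, so rename with a trailing underscore.
        format!("{}_", name)
    } else if KEYWORDS.contains(&name) {
        // Ordinary keywords can keep their spelling as raw identifiers.
        format!("r#{}", name)
    } else {
        name.to_string()
    }
}

/// Both escaped forms are valid field names.
pub struct Escaped {
    pub r#type: i64, // from a C++ field named `type`
    pub self_: i64,  // from a C++ field named `self`
}
```

One consequence of the underscore scheme is that the Rust name no longer matches the reflected name, so a generator would still need the original spelling wherever it looks a field up by name.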

Comment on lines +121 to +124
#[inline]
pub unsafe fn inc_ref_raw_object(handle: *mut TVMFFIObject) {
    unsafe_::inc_ref(handle)
}

medium

If handle is null, dereferencing it inside unsafe_::inc_ref will cause undefined behavior (segmentation fault). To make this API more robust and prevent crashes when dealing with potentially null raw pointers (e.g., from nullable object references), we should add a null check before calling inc_ref.

Suggested change
- #[inline]
- pub unsafe fn inc_ref_raw_object(handle: *mut TVMFFIObject) {
-     unsafe_::inc_ref(handle)
- }
+ #[inline]
+ pub unsafe fn inc_ref_raw_object(handle: *mut TVMFFIObject) {
+     if !handle.is_null() {
+         unsafe_::inc_ref(handle)
+     }
+ }
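
The null-guard pattern can be exercised in isolation with a mock header. This is a sketch under assumptions: `Header` is a hypothetical stand-in for `TVMFFIObject`, and `inc_ref_checked` is a made-up name that additionally reports whether the increment happened.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical object header; stands in for `TVMFFIObject`.
#[repr(C)]
pub struct Header {
    pub ref_count: AtomicU64,
}

/// Increments the reference count unless `handle` is null.
/// Returns `true` if the count was incremented.
///
/// # Safety
/// `handle` must be either null or a pointer to a live `Header`.
pub unsafe fn inc_ref_checked(handle: *mut Header) -> bool {
    // `as_ref` yields `None` for a null pointer, so the null case never
    // dereferences anything.
    match unsafe { handle.as_ref() } {
        Some(header) => {
            header.ref_count.fetch_add(1, Ordering::Relaxed);
            true
        }
        None => false,
    }
}
```

The guard covers only null; a dangling non-null pointer is still undefined behaviour, which is why the function remains `unsafe`.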

if method.is_member or params:
_use(self.imports, "tvm_ffi::AnyView")
packed = _packed_args_expr(params, method.is_member)
getter = f' let f = get_type_method({self.obj_struct}::TYPE_KEY, "{ffi_name}")?;'

medium

Calling get_type_method on every single method invocation is a major performance bottleneck because it performs a global Mutex lock, a C FFI call (TVMFFIGetTypeInfo), and a string comparison loop over all methods of the type. Since the reflected methods of a registered type do not change at runtime, we can cache the resolved tvm_ffi::Function using std::sync::OnceLock inside the generated method to make subsequent calls extremely fast.

Suggested change
- getter = f' let f = get_type_method({self.obj_struct}::TYPE_KEY, "{ffi_name}")?;'
+ getter = (
+     f' static F: std::sync::OnceLock<tvm_ffi::Function> = std::sync::OnceLock::new();\n'
+     f' let f = F.get_or_init(|| get_type_method({self.obj_struct}::TYPE_KEY, "{ffi_name}").unwrap());'
+ )
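
The caching pattern the generated method would contain can be sketched without any FFI. The names here are hypothetical: `resolve_sum` stands in for the expensive `get_type_method` lookup, a plain function pointer stands in for `tvm_ffi::Function`, and `LOOKUPS` exists only to make the "resolved once" behaviour observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the (hypothetical) slow lookup actually runs.
pub static LOOKUPS: AtomicUsize = AtomicUsize::new(0);

fn add(a: i64, b: i64) -> i64 {
    a + b
}

/// Stand-in for an expensive resolver such as a reflection lookup.
fn resolve_sum() -> fn(i64, i64) -> i64 {
    LOOKUPS.fetch_add(1, Ordering::SeqCst);
    add
}

/// Resolves the callee on the first call and reuses it afterwards.
pub fn sum(a: i64, b: i64) -> i64 {
    // A function-local static gives one cache slot per generated method.
    static F: OnceLock<fn(i64, i64) -> i64> = OnceLock::new();
    let f = F.get_or_init(resolve_sum);
    (*f)(a, b)
}
```

Note that the suggested change calls `.unwrap()` inside the initializer, so a failed lookup would panic instead of propagating an error through `?` as the original line did; a generator that wants to keep the `Result` path would need to resolve first and store only on success.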

tqchen commented Jun 6, 2026

thanks @Seven-Streams, some quick notes:

  • I think we should be able to look up type_index = lookup_type_index(type_key); once per type, which makes it faster than the global hash map
  • We might be able to update https://github.com/apache/tvm-ffi/blob/main/rust/tvm-ffi-macros/src/object_macros.rs to generate some of the type-index-fetch boilerplate
  • The same remark likely applies to the ctor, since we could have a global static OnceLock for the ctor per type if needed
    • Because we are in Rust, we likely don't need the FFI ctor, which is slower; directly constructing the object via the Rust API and allocation would be preferred

@Seven-Streams Seven-Streams force-pushed the main-dev/2026-06-04/rust_stubgen branch from 6568420 to e17bfcd Compare June 6, 2026 21:25
@Seven-Streams Seven-Streams force-pushed the main-dev/2026-06-04/rust_stubgen branch from e17bfcd to 3076743 Compare June 10, 2026 19:03
@Seven-Streams Seven-Streams marked this pull request as draft June 10, 2026 20:19
@Seven-Streams Seven-Streams force-pushed the main-dev/2026-06-04/rust_stubgen branch 24 times, most recently from f251c6c to 401b50d Compare June 12, 2026 17:08
@Seven-Streams Seven-Streams force-pushed the main-dev/2026-06-04/rust_stubgen branch 2 times, most recently from 9ef6b82 to 7fdf6ce Compare June 12, 2026 17:51
Signed-off-by: yuchuan <yuchuan.7streams@gmail.com>
@Seven-Streams Seven-Streams force-pushed the main-dev/2026-06-04/rust_stubgen branch from 7fdf6ce to bba0638 Compare June 12, 2026 18:05
@Seven-Streams Seven-Streams marked this pull request as ready for review June 12, 2026 18:18