Skip to content

[FEAT][Python] Tie Python wrapper lifetime to underlying C++ FFI object#593

Open
cyx-6 wants to merge 1 commit into
apache:mainfrom
cyx-6:pyobject
Open

[FEAT][Python] Tie Python wrapper lifetime to underlying C++ FFI object#593
cyx-6 wants to merge 1 commit into
apache:mainfrom
cyx-6:pyobject

Conversation

@cyx-6

@cyx-6 cyx-6 commented May 19, 2026

Copy link
Copy Markdown
Contributor

Summary

Bind one Python wrapper to one C++ FFI object ("chandle") for the object's lifetime, so identity is stable: a.x is a.x, id(a.x) stable across drop+refetch, and f(x) is x for FFI returns. Works on both the GIL and free-threaded (3.14t) builds. Implements the PyObjectTying design.

Before:

a = MyClass(Inner(...))
assert a.x is a.x          # False — fresh wrapper per attribute access
assert id(a.x) == id(a.x)  # flaky

After: both hold, and identity is preserved across a wrapper death-and-revive cycle whenever the C++ object outlives the wrapper.

Allocation layout

A two-layer custom-allocator hook lives in core libtvm_ffi:

  • TVMFFIObjectAllocHeader { delete_space } — 8-byte base header preceding every Object body.
  • TVMFFICustomAllocator { allocate, context } — process-wide registry; libtvm_ffi installs a builtin default at registry init so TVMFFIGetCustomAllocator never returns NULL.
  • TVMFFIGetCustomAllocator / TVMFFISetCustomAllocator — frontends override the default at module load.

The Python Cython module overrides the global default with TVMFFIPyAllocate, which prepends a 16-byte PyCustomAllocHeader encoding the wrapper binding. The Rust crate (ObjectArc::new[_with_extra_items]) and the Python-defined types in extra/dataclass.cc route through the same registry, so layout and lifetime semantics are uniform across frontends.

Binding state machine

State is concentrated in python/tvm_ffi/cython/tvm_ffi_python_helpers.h. Each header word (tagged_pyobj) holds the wrapper back-pointer plus low tag bits, giving a four-state machine:

State Meaning
Detached No wrapper bound
Active Wrapper bound and owns a +1 on the chandle
Inactive Wrapper dead, but its allocation is cached for in-place revival
InTransit A dealloc is mid-flight (handshake bit)

Every FFI return funnels through make_ret_object (C++ entry TVMFFIPyMakeRetObject), which returns the canonical wrapper for a chandle when one exists, reviving an Inactive cached allocation in place so a re-fetched wrapper keeps a stable id() at the same address. The cache-vs-free handshake spans three slots — a pre-bump tp_dealloc opens it, tp_free settles it, and the C++ weak deleter (TVMFFIPyDeleteSpace) reclaims the block — coordinated so a chandle that outlives its wrapper keeps the cached bytes, and a genuinely dead chandle frees them exactly once.

Frontend-allocation is detected by delete_space pointer comparison (TVMFFIPyIsCanonical), avoiding a flag bit on TVMFFIObject. Chandles created before the Python allocator is registered (statically-initialized global functions in libtvm_ffi.so) carry only the base header and are skipped.

Free-threaded build

The same tying runs on Py_GIL_DISABLED. All free-threaded machinery is behind #ifdef Py_GIL_DISABLED; the GIL build is byte-identical (verified by a function-body-map diff).

  • Per-word spin-lock. The tagged_pyobj word doubles as a spin-lock (a Locked tag bit, __atomic_* CAS) serializing every binding transition. The lock is held only across short, park-free word/header edits; alloc/revival run lock-released. The back-off (TVMFFIPyLockYield) detaches the thread state so a concurrent stop-the-world GC is never starved.
  • The revival UAF, and the fix. Cython's generated tp_dealloc bumps the wrapper refcount before running __dealloc__. On free-threaded builds that bump makes PyUnstable_TryIncRef spuriously succeed on a wrapper being torn down, so a concurrent make_ret Active-hit could revive a corpse (borrowed-ref UAF). The fix replaces Cython's tp_dealloc on each cdef CObject-family carrier with one hand-built TVMFFIPyTpDeallocSlot that runs the binding transition (bracketed by PyErr_Get/SetRaisedException) before any bump, then — stripped of the now-dead bump — GC-untrack (guarded by GC-ness), a generic __dict__ clear (guarded by a real tp_dictoffset), and tp_free. A plain carrier runs exactly transition; tp_free; Function fires both guards; both are faithful to Cython's originals minus the bump.
  • Total carrier coverage. The six carriers (CObject, CContainerBase, OpaquePyObject, Error, Tensor, Function) are a closed compile-time set, each wrapped once at init via TVMFFIPyWrapDealloc; every heap subtype is covered for free through subtype_dealloc's base-walk to the nearest carrier. An import-time layout guard Py_FatalErrors if a non-GC carrier ever gains a __dict__, turning silent owned-member drift into a loud failure.
  • Active-hit revival uses PyUnstable_TryIncRef / EnableTryIncRef to close the borrowed-read UAF (the CPython "weakmap" pattern, whose tp_dealloc support requirement this implements).

Verified safe through CPython 3.15/3.16: the PyUnstable_TryIncRef/PyMutex contract is unchanged, and the 3.15 rule that managed dict/weakref implies HAVE_GC does not affect us — no carrier uses a managed dict.

Robustness

  • align>8 double-free fix (latent). The builtin allocator offset the body by round_up(header, alignment) but free subtracted a fixed sizeof(header), symmetric only for alignment ≤ 8. A 16-aligned reflection dataclass (alignof(max_align_t)=16) was freed 8 bytes early. Both sides now use a fixed alignof(max_align_t) body offset. It was masked on the GIL build because the symmetric custom allocator shadows the builtin one for all Python objects.
  • Shutdown guard. TVMFFIPyMarkPythonFinalizing (wired to atexit) flips an atomic flag read before any PyGILState_Ensure, to avoid acquiring the GIL after Python finalization has begun. Wrapper bytes on chandles still alive at exit are intentionally leaked — the process is exiting.

_move() semantics

Under universal cache-on, callback args alias the caller's wrapper and FFI returns of the same chandle alias the caller's wrapper (one wrapper, one chandle ref). _move() is kept as an API: the rvalue setter on either side eager-detaches the canonical binding before the C++ AnyViewToOwnedAny transfer nulls the source chandle, so a downstream cache lookup never sees a stale back-pointer.

Test plan

  • New tests/python/test_pyobject_tying.py covers Active/Inactive/InTransit transitions, cache-on aliasing, _move() under cache-on, pickle stress, threading stress, GC integration, multi-chandle isolation, the weakref limitation, free-threaded concurrent carrier-type stress (Function/Error/multi-level-heap), and OpaquePyObject roundtrip/leak.
  • test_function.py::test_rvalue_ref refactored for cache-on aliasing semantics.
  • Free-threaded (3.14t): crash oracle clean, full suite 2333 passed, tying tests 43/43.
  • GIL (3.13): full suite 2358 passed, tying tests 43/43.
  • Rust suite passes.
  • CI: lint, clang-tidy, doc, C++/Python/Rust on Linux x86_64 + aarch64, macOS arm64, Windows AMD64.

Out-of-scope follow-ups

  • Name-keyed dict cache for _get_global_func to deliver id-stability for static-init Functions (whose chandles predate the Python allocator and so carry only the base header). Tracked as a TODO in function.pxi::_get_global_func.

@cyx-6 cyx-6 marked this pull request as draft May 19, 2026 10:32

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces 'PyObjectTying,' a mechanism to link the lifetime of Python wrappers to their underlying C++ FFI objects. By prepending a PyCustomizeAllocHeader to object allocations, the implementation enables stable object identity and address preservation across wrapper finalization cycles. The changes include modifications to the C++ SimpleObjAllocator, Cython bindings to utilize tp_finalize, and updates to the reflection system. Feedback from the review highlights a critical thread-safety issue in the callback used to free cached wrapper memory, which lacks GIL protection, and a potential memory leak when installing handles if a previous phantom wrapper is not correctly released.

Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread python/tvm_ffi/cython/object.pxi Outdated
@cyx-6 cyx-6 force-pushed the pyobject branch 2 times, most recently from 435caeb to 1b751b2 Compare May 24, 2026 13:03
Comment thread include/tvm/ffi/c_api.h Outdated
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread include/tvm/ffi/c_api.h Outdated
Comment thread include/tvm/ffi/c_api.h
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread src/ffi/custom_allocator.cc Outdated
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
@cyx-6 cyx-6 marked this pull request as ready for review May 28, 2026 18:10
Comment thread python/tvm_ffi/cython/object.pxi Outdated
if self.chandle != NULL:
CHECK_CALL(TVMFFIObjectDecRef(self.chandle))
self.chandle = NULL
def __del__(self):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's double check and test it out, making sure this __del__ method will be properly invoked. I had some really negative experience with overloading __del__ because it could possibly prevent Python GC from recycling memory in certain cases. Not 100% sure if it applies to Cython's cdef class's __del__ method though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a tp_finalize-free version is updated. we are now using tp_alloc + tp_free as alternative.

Comment thread python/tvm_ffi/cython/object.pxi Outdated
cls = make_fallback_cls_for_type_index(type_index)
obj = cls.__new__(cls)
(<CObject>obj).chandle = result.v_obj
TVMFFIPyAttachPyObject(result.v_obj, <PyObject*>obj)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment, this is a new class, so it must be in detached state, attach it

dlpack.deleter(dlpack)
pass
# default return the tensor
# default return the tensor.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: we don't try to attach py object for tensor types as it may appear in callback and there maybe translation depending on the context

// ----------
// I1. When a PyObject goes out of scope (no Python var refers to it),
// its +1 on chandle is always released.
// I2. When a chandle is destroyed, its cached PyObject (if any) is

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a chandle is destroyed, we can only be in Detached or Inactive state, if it is in Inactive state, the PyObject is reclaimed

// ``self.chandle`` and DecRef.
// Detached -> Detached : ``self`` is not the canonical wrapper for
// this chandle (eager-detach via move already
// cleared the binding). Just DecRef.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need clarification here, since Detached menas chandle == NULL

*/
TVM_FFI_INLINE void TVMFFIPyTPFinalize(void** ptr_to_chandle, PyObject* wrapper) {
void* chandle = *ptr_to_chandle;
if (chandle == nullptr) return;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment Detached

*
* Detached -> Detached: ``cached_pyobj`` was already cleared by an
* eager move, or chandle was not allocated through the Python custom
* allocator. Just null ``wrapper.chandle`` and DecRef.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to DecRef here? just return and pass to dealloc

@junrushao

Copy link
Copy Markdown
Member

Btw - does this approach also work for String and Bytes? I’m asking because I wasn’t sure if small string optimization is compatible with this design

@cyx-6

cyx-6 commented Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

@junrushao Not working for String and Bytes. For small ones, they are not heap-allocated. For large ones, they are not sub-classes of CObject. Both are out of the scope of this PR.

TVMFFIObjectAllocHeader base;
};

constexpr size_t kPyHeaderOffset = sizeof(PyCustomAllocHeader); // 16

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try inline

* need ``PyObject_InitVar(.., nitems)`` here and a matching basicsize check.
*/
inline PyObject* TVMFFIPyTpAlloc(PyTypeObject* type, Py_ssize_t nitems) {
void* blk = TVMFFIPyTlsReviveSlot();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use std::swap

* count). Per-thread -> free-threading safe; strict read-and-clear.
*/
inline void*& TVMFFIPyTlsReviveSlot() {
static thread_local void* slot = nullptr;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyObject*

* ``TVMFFIPyTpAlloc`` (which is handed only ``type`` and an item
* count). Per-thread -> free-threading safe; strict read-and-clear.
*/
inline void*& TVMFFIPyTlsReviveSlot() {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLS

// stable wrapper class for the life of the process, and ``make_ret_object``
// derives both the cached allocation (from the chandle) and ``type`` (= cls for that
// same type_index) from the very same chandle on the revival path.
assert(type->tp_itemsize == 0 &&

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static PyObject* custom_tp_alloc(PyTypeObject *type, Py_ssize_t nitems) {
    PyCustomizeAllocHeader *header = get_current_header();

    // Hot Path: Reuse
    if (header && header->cached_mem) {
        PyObject *obj = header->cached_mem;
        
        // Inline-friendly initialization
        PyObject_VarInit((PyVarObject *)obj, type, nitems);
        
        // Only memset the delta between header and body
        // This is where your real time is spent
        size_t head_sz = (nitems > 0 || type->tp_itemsize > 0) ? sizeof(PyVarObject) : sizeof(PyObject);
        memset((char*)obj + head_sz, 0, type->tp_basicsize - head_sz);

        if (PyType_IS_GC(type)) {
            PyObject_GC_Track(obj);
        }
        return obj;
    }

    // Cold Path: Slow allocation
    return PyType_GenericAlloc(type, nitems);
}

* on, GC or not.
*/
inline void TVMFFIPyTpFree(void* self) {
if (reinterpret_cast<TVMFFIPyCObjectHead*>(self)->chandle == TVM_FFI_PY_INACTIVE_SENTINEL) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a runtime offset check test case

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

void** TVMFFICyObjectGetCHandlePtr(PyObject* ptr)


cdef void** TVMFFICyObjectGetCHandlePtr(PyObject* ptr):
   return &((<CObject>ptr).chandle)

TVM_FFI_INLINE void TVMFFIPyTpDealloc(void** ptr_to_chandle, PyObject* wrapper) {
void* chandle = *ptr_to_chandle;
// NULL: already detached/moved. INACTIVE_SENTINEL: an inactive cached
// allocation must never re-enter __dealloc__ while inactive (defensive).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move TVMFFIPyTpDealloc after TVMFFIPyTpAlloc, TVM_FFI_PY_INACTIVE_SENTINEL/TVMFFIPyTpDealloc/ TVMFFIPyTpFree/ sit together, document the overall logic in a section, TVM_FFI_PY_INACTIVE_SENTINEL=> TVM_FFI_PY_CHANDLE_INACTIVE_SENTINEL

* header. Just null ``wrapper.chandle`` and DecRef.
*/
TVM_FFI_INLINE void TVMFFIPyTpDealloc(void** ptr_to_chandle, PyObject* wrapper) {
void* chandle = *ptr_to_chandle;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document, after dealloc, there are two possibilities:

  • chandle is set to CHANDLE_INACTIVE_SENTINEL, tpfree will not be triggered
  • other cases, normal free

* time); the only cost is no stable-id-across-drop.
*/
TVM_FFI_INLINE bool TVMFFIPyIsInactiveEligible(PyObject* wrapper) {
PyTypeObject* tp = Py_TYPE(wrapper);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cross check

*
* The (return, ``*out``) pair has four combinations; one cannot occur:
* (true, non-NULL) : Active -- the live canonical wrapper.
* (false, NULL) : Detached / non-Python -- no wrapper bound.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document

\return whether returned PyObject state is active

cyx-6 added a commit to cyx-6/tvm-ffi that referenced this pull request Jun 4, 2026
Apply Tianqi's PR apache#593 review feedback to the pyobject-tying layer:

- Drop the kPyHeaderOffset constant; use sizeof(PyCustomAllocHeader)
  directly at every site (and in the static_asserts).
- Rename TVMFFIPyTlsReviveSlot -> TVMFFIPyTLSReviveSlot and make its sole
  access primitive a swap-exchange (store next, return prior). The slot is
  typed PyObject*, and taking the block is TVMFFIPyTLSReviveSlot(nullptr) --
  read-and-clear in one step, no separate clear to forget.
- Document \param/\return on TVMFFIPyTryGetAttachedPyObject.
- Reorder the slot functions into execution order (TpAlloc -> TpDealloc ->
  TpFree) with the DEALLOC_TRANSIT marker declared between them.
- Replace the hand-rolled TVMFFIPyCObjectHead layout mirror with a Cython
  cdef public accessor TVMFFICyObjectGetCHandlePtr, so the chandle field
  offset comes from the real cdef class layout instead of a hardcoded one.

Build clean; full Python suite passes (2336 passed, 18 skipped, 2 xfailed).
cyx-6 added a commit to cyx-6/tvm-ffi that referenced this pull request Jun 4, 2026
Resolve the lint job failures on PR apache#593:

- ruff-check: add one-line docstrings to test methods (D102), hoist
  threading/weakref imports to top-level (PLC0415).
- ty: annotate the rvalue-move callback params as `Any` instead of
  `object` so `x._move()` resolves (matches test_function.py).
- ruff-format / clang-format / cmake-format: apply formatter reflow.
- taplo-format: expand the cibuildwheel `build` array to multi-line in
  pyproject.toml (applied from CI log; taplo has no wheel for this
  platform).

Reproduced locally via pre-commit (taplo-format skipped).
@junrushao

Copy link
Copy Markdown
Member

Any updates on this issue?

@cyx-6 cyx-6 force-pushed the pyobject branch 2 times, most recently from 88b235f to 38ca72d Compare June 17, 2026 08:12
Make `a.x is a.x`, `id(a.x)` stable across drop+refetch, and `f(x) is x`
for FFI returns, on both the GIL and free-threaded (3.14t) builds, by
binding one Python wrapper to one C++ FFI object ("chandle") for the
object's lifetime. Implements the PyObjectTying design.

## Allocation layout

A two-layer custom-allocator hook lives in core libtvm_ffi.so:
`TVMFFIObjectAllocHeader { delete_space }`, `TVMFFICustomAllocator
{ allocate, context }`, plus `TVMFFIGetCustomAllocator` /
`TVMFFISetCustomAllocator`. libtvm_ffi installs a builtin default at
registry init, so every Object always carries the 8-byte base header and
`TVMFFIGetCustomAllocator` never returns NULL. The Python Cython module
overrides the global default at module load with `TVMFFIPyAllocate`,
which prepends a 16-byte `PyCustomAllocHeader` encoding the wrapper
binding. `make_object` / `make_inplace_array_object` / `PyClassDeleter`
and the Rust `ObjectArc::new[_with_extra_items]` paths all funnel through
the registry, so Python-defined types and Rust-allocated objects share
the layout and lifetime semantics.

## Binding state machine

State is concentrated in `tvm_ffi_python_helpers.h`. Each header word
(`tagged_pyobj`) holds the wrapper back-pointer plus low tag bits, giving
a four-state machine:

  Detached:  no wrapper bound
  Active:    wrapper bound and owns a +1 on the chandle
  Inactive:  wrapper dead but its allocation is cached for in-place revival
  InTransit: a dealloc is mid-flight (handshake bit)

Every FFI return funnels through `make_ret_object` (C++ entry
`TVMFFIPyMakeRetObject`), which returns the canonical wrapper for a
chandle when one exists, reviving an Inactive cached allocation in place
so a re-fetched wrapper keeps a stable `id()` at the same address. The
cache-vs-free handshake spans three slots -- a pre-bump `tp_dealloc`
opens it, `tp_free` settles it, and the C++ weak deleter
(`TVMFFIPyDeleteSpace`) reclaims the block -- coordinated so a chandle
that outlives its wrapper keeps the cached bytes, and a genuinely dead
chandle frees them exactly once. Frontend-allocation is detected by
`delete_space` pointer comparison (`TVMFFIPyIsCanonical`), avoiding a
flag bit on `TVMFFIObject`; chandles created before the Python allocator
is registered (statically-initialized global functions in libtvm_ffi.so)
carry only the base header and are skipped.

## Free-threaded build

The same tying runs on Py_GIL_DISABLED. The `tagged_pyobj` word doubles
as a per-word spin-lock (a Locked tag bit, `__atomic_*` CAS) that
serializes every binding transition; the lock is held only across short,
park-free word/header edits, while alloc/revival run lock-released. The
wait/back-off (`TVMFFIPyLockYield`) detaches the thread state so a
concurrent stop-the-world GC is never starved.

The free-threaded revival hazard is Cython's generated `tp_dealloc`,
which bumps the wrapper refcount before running `__dealloc__`: that bump
makes `PyUnstable_TryIncRef` spuriously succeed on a wrapper being torn
down, so a concurrent `make_ret` Active-hit could revive a corpse
(borrowed-ref UAF). The fix replaces Cython's `tp_dealloc` on each cdef
CObject-family carrier with one hand-built `TVMFFIPyTpDeallocSlot` that
runs the binding transition (bracketed by
`PyErr_Get/SetRaisedException`) before any bump, then -- stripped of the
now-dead bump -- GC-untrack (guarded by GC-ness), a generic `__dict__`
clear (guarded by a real `tp_dictoffset`), and `tp_free`. A plain
carrier runs exactly `transition; tp_free`; Function fires both guards;
both are faithful to Cython's originals minus the bump. The six carriers
(CObject, CContainerBase, OpaquePyObject, Error, Tensor, Function) are a
closed compile-time set, each wrapped once at init via
`TVMFFIPyWrapDealloc`; every heap subtype is covered for free through
`subtype_dealloc`'s base-walk to the nearest carrier. An import-time
layout guard `Py_FatalError`s if a non-GC carrier ever gains a
`__dict__`, turning silent owned-member drift into a loud failure.
Active-hit revival uses `PyUnstable_TryIncRef` / `EnableTryIncRef` to
close the borrowed-read UAF. The GIL build is byte-identical: all
free-threaded machinery is behind `#ifdef Py_GIL_DISABLED`, and on the
GIL build the post-bump `__dealloc__` hook runs the transition directly
(safe under the GIL) while the slot installer is a no-op stub.

## Robustness

A double-free in the builtin allocator is fixed along the way: allocate
offset the body by `round_up(header, alignment)` but free subtracted a
fixed `sizeof(header)`, symmetric only for alignment <= 8 -- a 16-aligned
reflection dataclass (alignof(max_align_t)=16) was freed 8 bytes early.
Both sides now use a fixed `alignof(max_align_t)` body offset. A shutdown
guard (`TVMFFIPyMarkPythonFinalizing`, wired to atexit) flips an
atomic flag read before any `PyGILState_Ensure` to avoid acquiring the
GIL after Python finalization has begun; wrapper bytes on chandles still
alive at exit are intentionally leaked.

## `_move()` semantics

Under universal cache-on, callback args alias the caller's wrapper and
FFI returns of the same chandle alias the caller's wrapper (one wrapper,
one chandle ref). `_move()` is kept as an API: the rvalue setter on
either side eager-detaches the canonical binding before the C++
`AnyViewToOwnedAny` transfer nulls the source chandle.

## Tests

New `tests/python/test_pyobject_tying.py` covers Active/Inactive/InTransit
transitions, cache-on aliasing, `_move()` under cache-on, pickle stress,
threading stress, GC integration, multi-chandle isolation, the weakref
limitation, free-threaded concurrent carrier-type stress
(Function/Error/multi-level-heap), and OpaquePyObject roundtrip/leak.
`test_function.py::test_rvalue_ref` is refactored for the new
aliasing-aware use_count expectations. Verified: FT crash oracle clean,
FT suite 2321 passed, GIL suite 2346 passed, tying tests 43/43 on both
builds; Rust suite passes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants