Add ByteRepr for Ptr #225

Open
lucic71 wants to merge 15 commits into Cpp2Rust:master from lucic71:ptr-byterepr


lucic71 commented Jul 2, 2026


ByteRepr::to_bytes and ByteRepr::from_bytes are implemented using the new PtrRegistry, a global collection that maps each Ptr to a range [base, base + byte_len]. The right end of the interval includes base + byte_len because it is valid to form a pointer one past the end of an object (+1 OOB).

When converting a Ptr to its integer representation, the program is free to do arithmetic on that integer. When converting an integer back to a Ptr, we construct a Ptr if the integer falls inside a range registered in PtrRegistry; otherwise we panic.
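The round trip can be shown with a minimal Rust sketch of the integer view of a pointer. The `Range` and `FakePtr` types and the `ptr_to_int` / `int_to_ptr` helpers are hypothetical stand-ins, not the PR's actual Ptr or ByteRepr API. For brevity, a linear scan replaces the registry's BTreeMap lookup. The sketch shows three things:

- arithmetic on the integer is unrestricted;
- the one-past-the-end address still converts back;
- an address outside every range panics.

```rust
/// A registered range [base, base + byte_len]; the right end is inclusive.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Range {
    base: usize,
    byte_len: usize,
}

/// Hypothetical stand-in for a Ptr: the range it points into and a byte offset inside it.
#[derive(Clone, Copy, Debug, PartialEq)]
struct FakePtr {
    range: Range,
    offset: usize,
}

/// Ptr -> integer. The program may do arbitrary arithmetic on the result.
fn ptr_to_int(p: FakePtr) -> usize {
    p.range.base + p.offset
}

/// Integer -> Ptr, if the integer lies inside some registered range (inclusive end).
fn try_int_to_ptr(ranges: &[Range], addr: usize) -> Option<FakePtr> {
    ranges
        .iter()
        .find(|r| r.base <= addr && addr <= r.base + r.byte_len)
        .map(|&r| FakePtr { range: r, offset: addr - r.base })
}

/// Integer -> Ptr, panicking when no registered range contains the integer.
fn int_to_ptr(ranges: &[Range], addr: usize) -> FakePtr {
    try_int_to_ptr(ranges, addr)
        .unwrap_or_else(|| panic!("{addr:#x} is not inside any registered range"))
}
```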

The PtrRegistry interface is defined as follows:

```rust
struct PtrRegistry {
    // Builds a synthetic address space that behaves like C's. Neither Weak::as_ptr (1) nor &T (2)
    // can serve as a range base. A bump cursor inside RangeAllocator hands out disjoint,
    // never-reused [base, base + byte_len] ranges.
    //
    // (1) Weak<RefCell<T>>::as_ptr() returns the address of the Rc payload, which is only around
    // 16 bytes. A range longer than that starting at this address could overlap the ranges of
    // other valid allocations.
    // (2) &T does not remain valid across reallocations, in particular when the data lives in
    // a Vec<T>.
    ranges: RangeAllocator,

    // Mapping between Ptr and [base, base + byte_len] ranges. For efficiency, the mapping is
    // stored as base -> (Ptr, byte_len), so ByteRepr::from_bytes can find the range containing
    // a SyntheticAddr with an O(log N) lookup.
    entries: BTreeMap<SyntheticAddr, (AnyPtr, ByteLen)>,
}

impl PtrRegistry {
    // Using the real address (Weak::as_ptr) and the length of the pointed-to data, create a
    // synthetic (stable) address that is used as a key in PtrRegistry::entries.
    fn put(&mut self, real_addr: RealAddr, byte_len: ByteLen, ptr: AnyPtr) -> SyntheticAddr;

    // Using a SyntheticAddr, return the associated entry in PtrRegistry::entries. The input
    // addr can be anywhere inside [base, base + byte_len]; get returns the range's base.
    fn get(&self, addr: SyntheticAddr) -> Option<(SyntheticAddr, AnyPtr, ByteLen)>;
}
```
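The following is a minimal sketch of how this interface could be implemented. It makes several simplifying assumptions:

- Plain `usize` aliases stand in for the address and length types.
- A generic `P: Clone` stands in for AnyPtr.
- The `by_identity` map returns the same base when the same (real address, length) pair is put twice. This is an assumption of the sketch, not a detail from the PR.
- The starting cursor value `0x1000` is also an assumption.

`get` finds the greatest base not above `addr` with `BTreeMap::range(..=addr).next_back()`, which gives the O(log N) lookup mentioned in the comment.

```rust
use std::collections::{BTreeMap, HashMap};

type RealAddr = usize;
type SyntheticAddr = usize;
type ByteLen = usize;

/// Bump allocator: hands out disjoint, never-reused ranges [base, base + byte_len].
struct RangeAllocator {
    cursor: SyntheticAddr,
}

impl RangeAllocator {
    fn new() -> Self {
        // Start above 0 so that no range ever contains the NULL address.
        RangeAllocator { cursor: 0x1000 }
    }

    fn alloc(&mut self, byte_len: ByteLen) -> SyntheticAddr {
        let base = self.cursor;
        // +1 because the right end is inclusive: the next base is one past base + byte_len.
        self.cursor += byte_len + 1;
        base
    }
}

/// `P` stands in for AnyPtr.
struct PtrRegistry<P: Clone> {
    ranges: RangeAllocator,
    entries: BTreeMap<SyntheticAddr, (P, ByteLen)>,
    // Assumption: putting the same (real address, length) again returns the same base.
    by_identity: HashMap<(RealAddr, ByteLen), SyntheticAddr>,
}

impl<P: Clone> PtrRegistry<P> {
    fn new() -> Self {
        PtrRegistry {
            ranges: RangeAllocator::new(),
            entries: BTreeMap::new(),
            by_identity: HashMap::new(),
        }
    }

    fn put(&mut self, real_addr: RealAddr, byte_len: ByteLen, ptr: P) -> SyntheticAddr {
        if let Some(&base) = self.by_identity.get(&(real_addr, byte_len)) {
            return base;
        }
        let base = self.ranges.alloc(byte_len);
        self.entries.insert(base, (ptr, byte_len));
        self.by_identity.insert((real_addr, byte_len), base);
        base
    }

    fn get(&self, addr: SyntheticAddr) -> Option<(SyntheticAddr, P, ByteLen)> {
        // Greatest base <= addr, found in O(log N).
        let (&base, (ptr, len)) = self.entries.range(..=addr).next_back()?;
        // Inclusive right end, so a one-past-the-end address still resolves.
        if addr <= base + len {
            Some((base, ptr.clone(), *len))
        } else {
            None
        }
    }
}
```

Because this cursor advances by `byte_len + 1`, the next base sits right after the previous inclusive end. As a result, an address just beyond one range can already resolve to the following range. This is the kind of effect described in note 2.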

ByteRepr::to_bytes becomes a call to:

```rust
PtrRegistry::put(
    // address() (Weak::as_ptr()) is only an identity key; it cannot act as a range base. This
    // is the real address described in the interface of PtrRegistry::put above. It points at
    // the Rc payload, which does not know how many c_byte_len bytes are laid out in memory.
    // For example, the Rc payload is around 16 bytes, but we may want to serialize a pointer
    // to an object that is 100 bytes long. Using [address(), address() + 100] is wrong because
    // that range can overlap other valid Rc allocations.
    //
    // PtrRegistry::put takes care of this by creating a SyntheticAddr out of
    // (address(), c_byte_len()) that can be used as a non-overlapping range of the correct size.
    self.kind.address(),
    self.c_byte_len(),
    // This is a weak pointer and a type-erased pointer at the same time. It is saved in
    // PtrRegistry::entries. It is rebased at offset 0 because the correct offset is applied
    // again when the pointer is deserialized.
    Ptr {
        offset: 0,
        kind: self.kind.clone(),
    }
    .to_any(),
)
```
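A small overlap test shows why the real address cannot serve as a range base. The Rust sketch below uses made-up addresses, and `overlaps` treats both ends as inclusive. `encode` writes `base + offset` as native-endian bytes. It is a hypothetical illustration of what to_bytes might emit once `put` has returned the base, not the PR's code.

```rust
/// True if the inclusive ranges [a.0, a.0 + a.1] and [b.0, b.0 + b.1] share any address.
fn overlaps(a: (usize, usize), b: (usize, usize)) -> bool {
    let (a_lo, a_hi) = (a.0, a.0 + a.1);
    let (b_lo, b_hi) = (b.0, b.0 + b.1);
    a_lo <= b_hi && b_lo <= a_hi
}

/// Hypothetical encoding: the synthetic base returned by put plus the pointer's
/// offset, written as a native-endian u64.
fn encode(synthetic_base: u64, offset: u64) -> [u8; 8] {
    (synthetic_base + offset).to_ne_bytes()
}
```

With a real Rc address as the base, a 100-byte range starting at `0x5000` covers a neighbouring payload at `0x5010`. Bases handed out `byte_len + 1` apart never share an address.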

ByteRepr::from_bytes becomes a call to PtrRegistry::get, followed by reconstructing the Ptr with the correct offset and the correct type.
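Below is a minimal sketch of the from_bytes side. It assumes the 8 bytes hold a native-endian `u64` address and uses a `&str` handle in place of AnyPtr. The `Decoded` enum and this `from_bytes` function are hypothetical, and recovering the pointee type from AnyPtr is left out. The decoding rules are:

- zero decodes to NULL;
- an address inside a registered range yields the entry plus `addr - base` as the offset;
- anything else panics.

```rust
use std::collections::BTreeMap;

/// Result of decoding: NULL, or a registry entry plus the offset into it.
#[derive(Debug, PartialEq)]
enum Decoded<'a> {
    Null,
    Ptr { handle: &'a str, offset: u64 },
}

/// `entries` maps base -> (handle, byte_len); the handle stands in for AnyPtr.
fn from_bytes<'a>(entries: &BTreeMap<u64, (&'a str, u64)>, bytes: [u8; 8]) -> Decoded<'a> {
    let addr = u64::from_ne_bytes(bytes);
    // The integer representation of NULL is 0.
    if addr == 0 {
        return Decoded::Null;
    }
    match entries.range(..=addr).next_back() {
        // Re-apply the offset that was stripped (rebased to 0) when the pointer was registered.
        Some((&base, &(handle, len))) if addr <= base + len => Decoded::Ptr {
            handle,
            offset: addr - base,
        },
        _ => panic!("{addr:#x} does not describe a registered pointer"),
    }
}
```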

Notes about this implementation:

  1. PtrRegistry::put evicts dangling entries (detected with AnyPtr::is_dangling) in O(1) amortized time. Traversing all entries on every put would cost O(n) per put. Instead, it evicts only once entries has doubled in size since the last eviction. If n is the size after an eviction, n more puts happen before the size reaches 2n and the next eviction runs. That eviction scans 2n entries, so the amortized cost per put is O(2n/n) = O(1).
  2. This model does not panic on pointers crafted from out-of-bounds integers, for example:

     ```cpp
     #include <stdint.h>

     int main() {
         int a = 1;
         int b[100];
         auto p = (uintptr_t) &a;
         *(int*)(p + 50) = 42;  // out of bounds of `a`
     }
     ```

     &a is serialized as a Ptr with the range [base_a, base_a + 4]. The program then computes base_a + 50 on the integer representation, and that value happens to fall inside [base_b, base_b + 400] (b is 100 ints, i.e. 400 bytes). Ptr::from_bytes therefore happily deserializes base_a + 50 as a pointer into b. This is not ideal; a panic should be generated instead. @nunoplopes should we focus on fixing this now, or can we consider it a known limitation?

  3. RangeAllocator always allocates disjoint ranges, so two ranges never share an endpoint (endA == startB cannot happen). The right end of the interval [base, base + size] is inclusive so that a +1 OOB pointer can be created. If two ranges touched, that shared address would be ambiguous.
  4. The integer representation of NULL is always 0.
  5. Because a type-erased AnyPtr is what gets saved in PtrRegistry, a pointer put with type A can be retrieved as a pointer of type B, which is what reinterpret_cast requires.
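The doubling eviction policy from note 1 can be sketched as follows. This is an assumption-laden model, not the PR's code:

- A `Vec<Weak<()>>` stands in for the BTreeMap of AnyPtr entries.
- `Weak::strong_count() == 0` stands in for `AnyPtr::is_dangling`.
- The `scanned` counter exists only to make the amortized bound observable.
- The `.max(1)` guard is an addition of this sketch. It keeps an eviction that leaves zero entries from triggering a scan on every later put.

```rust
use std::rc::Weak;

/// Sketch of a registry whose eviction pass runs only after the entry count doubles.
struct Registry {
    // Weak<()> stands in for AnyPtr; a zero strong count plays the role of is_dangling().
    entries: Vec<Weak<()>>,
    // Number of entries left after the most recent eviction pass.
    size_after_last_eviction: usize,
    // Total entries visited by eviction passes (for checking the amortized bound).
    scanned: usize,
}

impl Registry {
    fn new() -> Self {
        Registry { entries: Vec::new(), size_after_last_eviction: 0, scanned: 0 }
    }

    fn put(&mut self, ptr: Weak<()>) {
        self.entries.push(ptr);
        // Evict only once the registry has doubled since the last pass; `max(1)` keeps
        // an empty registry from triggering a pass on every put.
        if self.entries.len() >= 2 * self.size_after_last_eviction.max(1) {
            self.scanned += self.entries.len();
            self.entries.retain(|w| w.strong_count() > 0);
            self.size_after_last_eviction = self.entries.len();
        }
    }
}
```

With n live entries, the passes run at sizes 2, 4, 8, ..., so the total number of scanned entries stays below 2n.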

lucic71 marked this pull request as draft July 2, 2026 15:17
lucic71 marked this pull request as ready for review July 3, 2026 21:22