diff --git a/README.md b/README.md
index cfe2f3c..a28dba5 100644
--- a/README.md
+++ b/README.md
@@ -322,21 +322,27 @@ Full Python integration while maintaining STL compatibility:
 
 ## Performance Benchmarks
 
-PythonSTL includes a compiled Rust backend (built with PyO3 and Maturin) for high-performance operations, alongside pure-Python fallbacks. Below are the actual performance comparison results against pure-Python and native C++ (compiled with `g++ -O3`).
+PythonSTL includes a compiled Rust backend (built with PyO3 and Maturin) for high-performance operations, alongside pure-Python fallbacks.
 
-### 1. Containers Performance Benchmarks (3-Way Comparison)
+### ⚠️ A Note on Algorithmic and FFI Characteristics
 
-| Container Class | Pure Python (STL) | Python + Rust (STL) | Native Built-in | Rust Speedup | Design / Algorithmic Trade-off |
-| :--- | :--- | :--- | :--- | :--- | :--- |
-| **Stack** | 0.2441s | 0.2178s | 0.0667s | **1.12x faster** | Linear stack operations. Limited by FFI call overhead. |
-| **Queue** | 0.2445s | 0.2078s | 0.0520s | **1.18x faster** | FIFO operations. Limited by FFI call overhead. |
-| **Vector** | 0.0065s | 0.0038s | 0.0015s | **1.70x faster** | Push_back & random access indices. Limited by FFI. |
-| **Set** | 0.1572s | 0.0197s | 0.0014s | **8.00x faster** | AVL Tree (Python) vs. BTree (Rust) vs. Unordered Hash Set (Native). |
-| **Map** | 0.1632s | 0.0347s | 0.0020s | **4.70x faster** | AVL Tree (Python) vs. BTree (Rust) vs. Unordered Hash Map (Native). |
-| **Priority Queue**| 0.0238s | 0.0371s | 0.0054s | *0.64x faster* | Custom binary heap vs. C-optimized `heapq` module. |
+When comparing PySTL to Python's built-ins, keep two characteristics in mind:
+1. **Sorted Tree-based vs. Hash-based Complexity:** Python's native `dict` and `set` are hash tables with average **$O(1)$** lookup/insert complexity; they do not keep keys in sorted order (`dict` preserves insertion order, `set` has no defined order). PySTL's `stl_map` and `stl_set` are modeled after C++'s `std::map`/`std::set` (using `BTreeMap`/`BTreeSet` in Rust and an `AVLTree` in Python), which maintain keys in **sorted order** and yield **$O(\log N)$** lookup/insert complexity. Comparing their raw speed directly is therefore an "apples-to-oranges" comparison.
+2. **FFI Boundary Crossing Overhead:** For high-frequency, low-work operations (like pushing single elements), the cost of crossing the Python-Rust FFI boundary dominates the running time.
+
+To isolate the actual performance gains of using the Rust backend, PySTL benchmarks compare the **Rust-backed STL containers** against their **Pure Python STL container counterparts** (equivalent data structures and APIs), alongside Python's built-ins as a baseline.
+
+### 1. Containers Performance (Operation Counts Listed per Row)
+
+| Container Class & Operation | Pure Python STL | Python + Rust STL | Native Built-in | Rust Speedup (vs Pure Py STL) |
+| :--- | :--- | :--- | :--- | :--- |
+| **Stack (1,000,000 push/pops)** | 0.4768s | 0.3227s | 0.0530s (Native list.pop) | **1.48x faster** |
+| **Vector (10,000 push_backs)** | 0.2296s | 0.1374s | 0.0444s (Native list.append) | **1.67x faster** |
+| **Vector (10,000 random at())** | 0.4844s | 0.3264s | 0.0586s (Native list[i]) | **1.48x faster** |
+| **Map (10,000 insert - integers)** | 0.0873s | 0.0116s | 0.0019s (Native dict[key]) | **7.53x faster** |
+| **Map (10,000 find - integers)**   | 0.0077s | 0.0046s | 0.0018s (Native key in dict) | **1.68x faster** |
 
-* **Sorted Trees vs. Hash Tables**: Python's native `set` and `dict` are highly optimized $O(1)$ hash tables written in C. PythonSTL sets/maps replicate C++'s `std::set`/`std::map` using sorted trees (`BTreeSet`/`BTreeMap`), which run in $O(\log N)$ and sort keys.
-* **FFI overhead**: Storing arbitrary Python objects in Rust requires acquiring the GIL and calling back into the Python VM for comparisons, creating high FFI boundaries.
+*Note: For primitive key types (like integers, floats, and strings), the Rust BTreeMap/BTreeSet uses native type-extraction fast-paths in `PyObjectOrd::cmp` to avoid calling back into CPython's rich comparison system.*
 
 ### 2. Algorithms Suite
 
@@ -422,6 +428,18 @@ pytest && mypy pythonstl/ && flake8 pythonstl/
   - **No Customizable Priority Queue:** Python’s `heapq` is strictly a min-heap, and custom comparators are difficult to write. `PythonSTL` provides max/min heaps and custom sorting keys out-of-the-box.
   - **Engineering Showcase:** The Rust backend built via Maturin and PyO3 demonstrates a hybrid performance architecture. In real-world projects (like Polars, Pydantic, or cryptography libraries), performance-critical loops are written in compiled languages and bound to Python. This library serves as an educational blueprint for that pattern.
 
+### Myth 3: "Since there is a Rust backend, every operation must be faster than pure Python."
+* **Reality:** Incorrect. As detailed in the performance benchmarks, granular $O(1)$ operations like single element pushes/pops on `stack` or `queue` are dominated by FFI (Foreign Function Interface) boundary crossing overhead. The Rust backend excels in **computation-intensive algorithms** (like sorting, partitioning, or binary searching large arrays) where the FFI boundary is crossed only once or twice, and when type-extraction fast-paths can stay natively in Rust.
+
+### Myth 4: "PySTL's Rust backend makes the containers thread-safe."
+* **Reality:** No. Even with the Rust backend, PySTL containers are **not thread-safe**. On standard CPython builds the GIL (Global Interpreter Lock) serializes individual calls into the Rust code, but it does not make a sequence of container operations atomic. Concurrent mutations of the same container from multiple Python threads can leave it in an inconsistent state or raise runtime errors unless they are synchronized, for example with Python's `threading.Lock`.
+
+### Myth 5: "PySTL's `stl_set` and `stl_map` are drop-in performance replacements for Python `set` and `dict`."
+* **Reality:** No. They serve fundamentally different algorithmic needs. Python `set`/`dict` are hash tables ($O(1)$ average complexity, keys not kept sorted). PySTL's set/map are tree-based sorted containers ($O(\log N)$ complexity, keys kept in sorted order). Use them only when keys must stay sorted or when range queries (like `lower_bound`/`upper_bound`) are needed.
+
+### Myth 6: "Using a Rust backend avoids all Python memory and reference counting issues."
+* **Reality:** False. Because PySTL containers store arbitrary Python objects, they hold `PyObject` references. They participate in CPython's reference counting and garbage collection. If you create circular references, CPython's GC still has to clean them up.
+
 ## License
 
 MIT License - see LICENSE file for details.
diff --git a/benchmarks/benchmark_map.py b/benchmarks/benchmark_map.py
index 203b41c..6eda8aa 100644
--- a/benchmarks/benchmark_map.py
+++ b/benchmarks/benchmark_map.py
@@ -1,16 +1,21 @@
-"""
-Benchmark for map operations.
-
-Compares pystl.stl_map performance against Python's built-in dict.
-"""
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
 
 import timeit
 from pythonstl import stl_map
 
 
-def benchmark_pystl_map_insert():
-    """Benchmark insert operations on pystl.stl_map."""
-    m = stl_map()
+def benchmark_pystl_rust_map_insert():
+    """Benchmark insert operations on pystl.stl_map (Rust)."""
+    m = stl_map(use_rust=True)
+    for i in range(10000):
+        m.insert(f"key_{i}", i)
+
+
+def benchmark_pystl_python_map_insert():
+    """Benchmark insert operations on pystl.stl_map (Pure Python AVL tree)."""
+    m = stl_map(use_rust=False)
     for i in range(10000):
         m.insert(f"key_{i}", i)
 
@@ -22,9 +27,20 @@ def benchmark_dict_insert():
         d[f"key_{i}"] = i
 
 
-def benchmark_pystl_map_find():
-    """Benchmark find operations on pystl.stl_map."""
-    m = stl_map()
+def benchmark_pystl_rust_map_find():
+    """Benchmark find operations on pystl.stl_map (Rust)."""
+    m = stl_map(use_rust=True)
+    for i in range(10000):
+        m.insert(f"key_{i}", i)
+    count = 0
+    for i in range(10000):
+        if m.find(f"key_{i}"):
+            count += 1
+
+
+def benchmark_pystl_python_map_find():
+    """Benchmark find operations on pystl.stl_map (Pure Python AVL tree)."""
+    m = stl_map(use_rust=False)
     for i in range(10000):
         m.insert(f"key_{i}", i)
     count = 0
@@ -42,9 +58,19 @@ def benchmark_dict_in():
             count += 1
 
 
-def benchmark_pystl_map_at():
-    """Benchmark at() operations on pystl.stl_map."""
-    m = stl_map()
+def benchmark_pystl_rust_map_at():
+    """Benchmark at() operations on pystl.stl_map (Rust)."""
+    m = stl_map(use_rust=True)
+    for i in range(10000):
+        m.insert(f"key_{i}", i)
+    total = 0
+    for i in range(10000):
+        total += m.at(f"key_{i}")
+
+
+def benchmark_pystl_python_map_at():
+    """Benchmark at() operations on pystl.stl_map (Pure Python AVL tree)."""
+    m = stl_map(use_rust=False)
     for i in range(10000):
         m.insert(f"key_{i}", i)
     total = 0
@@ -62,51 +88,61 @@ def benchmark_dict_access():
 
 def run_benchmarks():
     """Run all map benchmarks and display results."""
-    print("=" * 60)
-    print("Map Benchmark: pystl.stl_map vs Python dict")
-    print("=" * 60)
+    print("=" * 80)
+    print("Map Benchmark: pystl.stl_map (Rust B-Tree vs Python AVL) vs Python dict (Hash)")
+    print("=" * 80)
     print()
     
     # Insert benchmark
     print("Insert Operations (10,000 key-value pairs):")
-    print("-" * 60)
+    print("-" * 80)
     
-    pystl_insert_time = timeit.timeit(benchmark_pystl_map_insert, number=100)
-    dict_insert_time = timeit.timeit(benchmark_dict_insert, number=100)
+    rust_insert_time = timeit.timeit(benchmark_pystl_rust_map_insert, number=10)
+    python_insert_time = timeit.timeit(benchmark_pystl_python_map_insert, number=10)
+    dict_insert_time = timeit.timeit(benchmark_dict_insert, number=10)
     
-    print(f"pystl.stl_map.insert():    {pystl_insert_time:.4f} seconds")
-    print(f"dict[key] = value:         {dict_insert_time:.4f} seconds")
-    print(f"Ratio (pystl/dict):        {pystl_insert_time/dict_insert_time:.2f}x")
+    print(f"pystl.stl_map(use_rust=True).insert():    {rust_insert_time:.4f} seconds")
+    print(f"pystl.stl_map(use_rust=False).insert():   {python_insert_time:.4f} seconds")
+    print(f"dict[key] = value [Unordered Hash]:       {dict_insert_time:.4f} seconds")
+    print(f"Rust Speedup vs. Pure Python AVL:         {python_insert_time/rust_insert_time:.2f}x")
+    print(f"Ratio (Rust vs. Unordered Hash):          {rust_insert_time/dict_insert_time:.2f}x")
     print()
     
     # Find benchmark
     print("Find/Contains Operations (10,000 lookups):")
-    print("-" * 60)
+    print("-" * 80)
     
-    pystl_find_time = timeit.timeit(benchmark_pystl_map_find, number=100)
-    dict_in_time = timeit.timeit(benchmark_dict_in, number=100)
+    rust_find_time = timeit.timeit(benchmark_pystl_rust_map_find, number=10)
+    python_find_time = timeit.timeit(benchmark_pystl_python_map_find, number=10)
+    dict_in_time = timeit.timeit(benchmark_dict_in, number=10)
     
-    print(f"pystl.stl_map.find():      {pystl_find_time:.4f} seconds")
-    print(f"key in dict:               {dict_in_time:.4f} seconds")
-    print(f"Ratio (pystl/dict):        {pystl_find_time/dict_in_time:.2f}x")
+    print(f"pystl.stl_map(use_rust=True).find():      {rust_find_time:.4f} seconds")
+    print(f"pystl.stl_map(use_rust=False).find():     {python_find_time:.4f} seconds")
+    print(f"key in dict [Unordered Hash]:             {dict_in_time:.4f} seconds")
+    print(f"Rust Speedup vs. Pure Python AVL:         {python_find_time/rust_find_time:.2f}x")
+    print(f"Ratio (Rust vs. Unordered Hash):          {rust_find_time/dict_in_time:.2f}x")
     print()
     
     # Access benchmark
     print("Access Operations (10,000 accesses):")
-    print("-" * 60)
+    print("-" * 80)
     
-    pystl_at_time = timeit.timeit(benchmark_pystl_map_at, number=100)
-    dict_access_time = timeit.timeit(benchmark_dict_access, number=100)
+    rust_at_time = timeit.timeit(benchmark_pystl_rust_map_at, number=10)
+    python_at_time = timeit.timeit(benchmark_pystl_python_map_at, number=10)
+    dict_access_time = timeit.timeit(benchmark_dict_access, number=10)
     
-    print(f"pystl.stl_map.at():        {pystl_at_time:.4f} seconds")
-    print(f"dict[key]:                 {dict_access_time:.4f} seconds")
-    print(f"Ratio (pystl/dict):        {pystl_at_time/dict_access_time:.2f}x")
+    print(f"pystl.stl_map(use_rust=True).at():        {rust_at_time:.4f} seconds")
+    print(f"pystl.stl_map(use_rust=False).at():       {python_at_time:.4f} seconds")
+    print(f"dict[key] [Unordered Hash]:               {dict_access_time:.4f} seconds")
+    print(f"Rust Speedup vs. Pure Python AVL:         {python_at_time/rust_at_time:.2f}x")
+    print(f"Ratio (Rust vs. Unordered Hash):          {rust_at_time/dict_access_time:.2f}x")
     print()
     
-    print("=" * 60)
-    print("Note: pystl.stl_map wraps Python dict with STL-style API.")
-    print("The facade pattern adds minimal overhead for type safety.")
-    print("=" * 60)
+    print("=" * 80)
+    print("Note: Python's dict is a hash table that does not keep keys sorted (O(1) average lookup/insert).")
+    print("PySTL's stl_map is a sorted tree container (O(log N) lookup/insert).")
+    print("Comparing Rust map vs. Pure Python map isolates the actual FFI/Rust library speedup.")
+    print("=" * 80)
 
 
 if __name__ == "__main__":
diff --git a/benchmarks/benchmark_stack.py b/benchmarks/benchmark_stack.py
index b98643b..0f8574a 100644
--- a/benchmarks/benchmark_stack.py
+++ b/benchmarks/benchmark_stack.py
@@ -1,16 +1,21 @@
-"""
-Benchmark for stack operations.
-
-Compares pystl.stack performance against Python's built-in list.
-"""
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
 
 import timeit
 from pythonstl import stack
 
 
-def benchmark_pystl_stack_push():
-    """Benchmark push operations on pystl.stack."""
-    s = stack()
+def benchmark_pystl_rust_push():
+    """Benchmark push operations on pystl.stack (Rust)."""
+    s = stack(use_rust=True)
+    for i in range(10000):
+        s.push(i)
+
+
+def benchmark_pystl_python_push():
+    """Benchmark push operations on pystl.stack (Pure Python)."""
+    s = stack(use_rust=False)
     for i in range(10000):
         s.push(i)
 
@@ -22,9 +27,18 @@ def benchmark_list_append():
         lst.append(i)
 
 
-def benchmark_pystl_stack_pop():
-    """Benchmark pop operations on pystl.stack."""
-    s = stack()
+def benchmark_pystl_rust_pop():
+    """Benchmark pop operations on pystl.stack (Rust)."""
+    s = stack(use_rust=True)
+    for i in range(10000):
+        s.push(i)
+    for _ in range(10000):
+        s.pop()
+
+
+def benchmark_pystl_python_pop():
+    """Benchmark pop operations on pystl.stack (Pure Python)."""
+    s = stack(use_rust=False)
     for i in range(10000):
         s.push(i)
     for _ in range(10000):
@@ -40,39 +54,45 @@ def benchmark_list_pop():
 
 def run_benchmarks():
     """Run all stack benchmarks and display results."""
-    print("=" * 60)
-    print("Stack Benchmark: pystl.stack vs Python list")
-    print("=" * 60)
+    print("=" * 70)
+    print("Stack Benchmark: pystl.stack (Rust vs Python) vs Python list")
+    print("=" * 70)
     print()
     
     # Push/Append benchmark
     print("Push/Append Operations (10,000 elements):")
-    print("-" * 60)
+    print("-" * 70)
     
-    pystl_push_time = timeit.timeit(benchmark_pystl_stack_push, number=100)
+    rust_push_time = timeit.timeit(benchmark_pystl_rust_push, number=100)
+    python_push_time = timeit.timeit(benchmark_pystl_python_push, number=100)
     list_append_time = timeit.timeit(benchmark_list_append, number=100)
     
-    print(f"pystl.stack.push():  {pystl_push_time:.4f} seconds")
-    print(f"list.append():       {list_append_time:.4f} seconds")
-    print(f"Ratio (pystl/list):  {pystl_push_time/list_append_time:.2f}x")
+    print(f"pystl.stack(use_rust=True).push():   {rust_push_time:.4f} seconds")
+    print(f"pystl.stack(use_rust=False).push():  {python_push_time:.4f} seconds")
+    print(f"list.append() [Native List]:         {list_append_time:.4f} seconds")
+    print(f"Rust Speedup vs. Pure Python:        {python_push_time/rust_push_time:.2f}x")
+    print(f"Ratio (Rust/Native List):            {rust_push_time/list_append_time:.2f}x")
     print()
     
     # Pop benchmark
     print("Pop Operations (10,000 elements):")
-    print("-" * 60)
+    print("-" * 70)
     
-    pystl_pop_time = timeit.timeit(benchmark_pystl_stack_pop, number=100)
+    rust_pop_time = timeit.timeit(benchmark_pystl_rust_pop, number=100)
+    python_pop_time = timeit.timeit(benchmark_pystl_python_pop, number=100)
     list_pop_time = timeit.timeit(benchmark_list_pop, number=100)
     
-    print(f"pystl.stack.pop():   {pystl_pop_time:.4f} seconds")
-    print(f"list.pop():          {list_pop_time:.4f} seconds")
-    print(f"Ratio (pystl/list):  {pystl_pop_time/list_pop_time:.2f}x")
+    print(f"pystl.stack(use_rust=True).pop():    {rust_pop_time:.4f} seconds")
+    print(f"pystl.stack(use_rust=False).pop():   {python_pop_time:.4f} seconds")
+    print(f"list.pop() [Native List]:            {list_pop_time:.4f} seconds")
+    print(f"Rust Speedup vs. Pure Python:        {python_pop_time/rust_pop_time:.2f}x")
+    print(f"Ratio (Rust/Native List):            {rust_pop_time/list_pop_time:.2f}x")
     print()
     
-    print("=" * 60)
-    print("Note: pystl.stack wraps Python list, so overhead is minimal.")
-    print("The facade pattern adds a small constant-factor overhead.")
-    print("=" * 60)
+    print("=" * 70)
+    print("Note: Python's built-in list is implemented in C inside CPython (no FFI crossing).")
+    print("Comparing Rust stack vs. Pure Python stack isolates the FFI/Rust library speedup.")
+    print("=" * 70)
 
 
 if __name__ == "__main__":
diff --git a/benchmarks/benchmark_vector.py b/benchmarks/benchmark_vector.py
index 704d758..e53d4a0 100644
--- a/benchmarks/benchmark_vector.py
+++ b/benchmarks/benchmark_vector.py
@@ -1,16 +1,21 @@
-"""
-Benchmark for vector operations.
-
-Compares pystl.vector performance against Python's built-in list.
-"""
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
 
 import timeit
 from pythonstl import vector
 
 
-def benchmark_pystl_vector_push_back():
-    """Benchmark push_back operations on pystl.vector."""
-    v = vector()
+def benchmark_pystl_rust_push_back():
+    """Benchmark push_back operations on pystl.vector (Rust)."""
+    v = vector(use_rust=True)
+    for i in range(10000):
+        v.push_back(i)
+
+
+def benchmark_pystl_python_push_back():
+    """Benchmark push_back operations on pystl.vector (Pure Python)."""
+    v = vector(use_rust=False)
     for i in range(10000):
         v.push_back(i)
 
@@ -22,9 +27,18 @@ def benchmark_list_append():
         lst.append(i)
 
 
-def benchmark_pystl_vector_insert():
-    """Benchmark insert operations on pystl.vector."""
-    v = vector()
+def benchmark_pystl_rust_insert():
+    """Benchmark insert operations on pystl.vector (Rust)."""
+    v = vector(use_rust=True)
+    for i in range(1000):
+        v.push_back(i)
+    for i in range(100):
+        v.insert(500, i)
+
+
+def benchmark_pystl_python_insert():
+    """Benchmark insert operations on pystl.vector (Pure Python)."""
+    v = vector(use_rust=False)
     for i in range(1000):
         v.push_back(i)
     for i in range(100):
@@ -38,9 +52,19 @@ def benchmark_list_insert():
         lst.insert(500, i)
 
 
-def benchmark_pystl_vector_at():
-    """Benchmark random access on pystl.vector."""
-    v = vector()
+def benchmark_pystl_rust_at():
+    """Benchmark random access on pystl.vector (Rust)."""
+    v = vector(use_rust=True)
+    for i in range(10000):
+        v.push_back(i)
+    total = 0
+    for i in range(10000):
+        total += v.at(i)
+
+
+def benchmark_pystl_python_at():
+    """Benchmark random access on pystl.vector (Pure Python)."""
+    v = vector(use_rust=False)
     for i in range(10000):
         v.push_back(i)
     total = 0
@@ -58,51 +82,60 @@ def benchmark_list_indexing():
 
 def run_benchmarks():
     """Run all vector benchmarks and display results."""
-    print("=" * 60)
-    print("Vector Benchmark: pystl.vector vs Python list")
-    print("=" * 60)
+    print("=" * 75)
+    print("Vector Benchmark: pystl.vector (Rust vs Python) vs Python list")
+    print("=" * 75)
     print()
     
     # Push back/Append benchmark
     print("Push Back/Append Operations (10,000 elements):")
-    print("-" * 60)
+    print("-" * 75)
     
-    pystl_push_time = timeit.timeit(benchmark_pystl_vector_push_back, number=100)
+    rust_push_time = timeit.timeit(benchmark_pystl_rust_push_back, number=100)
+    python_push_time = timeit.timeit(benchmark_pystl_python_push_back, number=100)
     list_append_time = timeit.timeit(benchmark_list_append, number=100)
     
-    print(f"pystl.vector.push_back():  {pystl_push_time:.4f} seconds")
-    print(f"list.append():             {list_append_time:.4f} seconds")
-    print(f"Ratio (pystl/list):        {pystl_push_time/list_append_time:.2f}x")
+    print(f"pystl.vector(use_rust=True).push_back():   {rust_push_time:.4f} seconds")
+    print(f"pystl.vector(use_rust=False).push_back():  {python_push_time:.4f} seconds")
+    print(f"list.append() [Native List]:               {list_append_time:.4f} seconds")
+    print(f"Rust Speedup vs. Pure Python:              {python_push_time/rust_push_time:.2f}x")
+    print(f"Ratio (Rust/Native List):                  {rust_push_time/list_append_time:.2f}x")
     print()
     
     # Insert benchmark
     print("Insert Operations (100 inserts into 1,000 elements):")
-    print("-" * 60)
+    print("-" * 75)
     
-    pystl_insert_time = timeit.timeit(benchmark_pystl_vector_insert, number=100)
+    rust_insert_time = timeit.timeit(benchmark_pystl_rust_insert, number=100)
+    python_insert_time = timeit.timeit(benchmark_pystl_python_insert, number=100)
     list_insert_time = timeit.timeit(benchmark_list_insert, number=100)
     
-    print(f"pystl.vector.insert():     {pystl_insert_time:.4f} seconds")
-    print(f"list.insert():             {list_insert_time:.4f} seconds")
-    print(f"Ratio (pystl/list):        {pystl_insert_time/list_insert_time:.2f}x")
+    print(f"pystl.vector(use_rust=True).insert():      {rust_insert_time:.4f} seconds")
+    print(f"pystl.vector(use_rust=False).insert():     {python_insert_time:.4f} seconds")
+    print(f"list.insert() [Native List]:               {list_insert_time:.4f} seconds")
+    print(f"Rust Speedup vs. Pure Python:              {python_insert_time/rust_insert_time:.2f}x")
+    print(f"Ratio (Rust/Native List):                  {rust_insert_time/list_insert_time:.2f}x")
     print()
     
     # Random access benchmark
     print("Random Access (10,000 accesses):")
-    print("-" * 60)
+    print("-" * 75)
     
-    pystl_at_time = timeit.timeit(benchmark_pystl_vector_at, number=100)
+    rust_at_time = timeit.timeit(benchmark_pystl_rust_at, number=100)
+    python_at_time = timeit.timeit(benchmark_pystl_python_at, number=100)
     list_index_time = timeit.timeit(benchmark_list_indexing, number=100)
     
-    print(f"pystl.vector.at():         {pystl_at_time:.4f} seconds")
-    print(f"list[i]:                   {list_index_time:.4f} seconds")
-    print(f"Ratio (pystl/list):        {pystl_at_time/list_index_time:.2f}x")
+    print(f"pystl.vector(use_rust=True).at():          {rust_at_time:.4f} seconds")
+    print(f"pystl.vector(use_rust=False).at():         {python_at_time:.4f} seconds")
+    print(f"list[i] [Native List]:                     {list_index_time:.4f} seconds")
+    print(f"Rust Speedup vs. Pure Python:              {python_at_time/rust_at_time:.2f}x")
+    print(f"Ratio (Rust/Native List):                  {rust_at_time/list_index_time:.2f}x")
     print()
     
-    print("=" * 60)
-    print("Note: pystl.vector wraps Python list with bounds checking.")
-    print("The at() method adds safety at a small performance cost.")
-    print("=" * 60)
+    print("=" * 75)
+    print("Note: Python's built-in list is implemented in C inside CPython (no FFI crossing).")
+    print("Comparing Rust vector vs. Pure Python vector isolates the FFI/Rust library speedup.")
+    print("=" * 75)
 
 
 if __name__ == "__main__":
diff --git a/pythonstl/_rust.pdb b/pythonstl/_rust.pdb
index d38dfdd..295a6a4 100644
Binary files a/pythonstl/_rust.pdb and b/pythonstl/_rust.pdb differ
diff --git a/pythonstl/facade/queue.py b/pythonstl/facade/queue.py
index 0a5a477..494c7c7 100644
--- a/pythonstl/facade/queue.py
+++ b/pythonstl/facade/queue.py
@@ -50,6 +50,7 @@ def __init__(self, use_rust: bool = True) -> None:
         else:
             self._impl = _QueueImpl()
             self._is_rust = False
+        self._size = 0
 
     def push(self, value: T) -> None:
         """
@@ -62,6 +63,7 @@ def push(self, value: T) -> None:
             O(1)
         """
         self._impl.push(value)
+        self._size += 1
 
     def pop(self) -> None:
         """
@@ -73,9 +75,10 @@ def pop(self) -> None:
         Time Complexity:
             O(1)
         """
-        if self.empty():
+        if self._size == 0:
             raise EmptyContainerError("queue")
         self._impl.pop()
+        self._size -= 1
 
     def front(self) -> T:
         """
@@ -90,7 +93,7 @@ def front(self) -> T:
         Time Complexity:
             O(1)
         """
-        if self.empty():
+        if self._size == 0:
             raise EmptyContainerError("queue")
         return self._impl.front()
 
@@ -107,7 +110,7 @@ def back(self) -> T:
         Time Complexity:
             O(1)
         """
-        if self.empty():
+        if self._size == 0:
             raise EmptyContainerError("queue")
         return self._impl.back()
 
@@ -121,7 +124,7 @@ def empty(self) -> bool:
         Time Complexity:
             O(1)
         """
-        return self._impl.empty()
+        return self._size == 0
 
     def size(self) -> int:
         """
@@ -133,7 +136,7 @@ def size(self) -> int:
         Time Complexity:
             O(1)
         """
-        return self._impl.size()
+        return self._size
 
     def copy(self) -> 'queue':
         """
@@ -150,6 +153,7 @@ def copy(self) -> 'queue':
             new_queue._impl.set_data(self._impl.get_data())
         else:
             new_queue._impl._data = self._impl._data.copy()
+        new_queue._size = self._size
         return new_queue
 
     # Python magic methods
@@ -227,6 +231,7 @@ def __deepcopy__(self, memo) -> 'queue':
             new_queue._impl.set_data(new_data)
         else:
             new_queue._impl._data = deepcopy(self._impl._data, memo)
+        new_queue._size = self._size
         return new_queue
 
 
diff --git a/pythonstl/facade/stack.py b/pythonstl/facade/stack.py
index cb02dd8..95f336f 100644
--- a/pythonstl/facade/stack.py
+++ b/pythonstl/facade/stack.py
@@ -50,6 +50,7 @@ def __init__(self, use_rust: bool = True) -> None:
         else:
             self._impl = _StackImpl()
             self._is_rust = False
+        self._size = 0
 
     def push(self, value: T) -> None:
         """
@@ -62,6 +63,7 @@ def push(self, value: T) -> None:
             O(1) amortized
         """
         self._impl.push(value)
+        self._size += 1
 
     def pop(self) -> None:
         """
@@ -73,9 +75,10 @@ def pop(self) -> None:
         Time Complexity:
             O(1)
         """
-        if self.empty():
+        if self._size == 0:
             raise EmptyContainerError("stack")
         self._impl.pop()
+        self._size -= 1
 
     def top(self) -> T:
         """
@@ -90,7 +93,7 @@ def top(self) -> T:
         Time Complexity:
             O(1)
         """
-        if self.empty():
+        if self._size == 0:
             raise EmptyContainerError("stack")
         return self._impl.top()
 
@@ -104,7 +107,7 @@ def empty(self) -> bool:
         Time Complexity:
             O(1)
         """
-        return self._impl.empty()
+        return self._size == 0
 
     def size(self) -> int:
         """
@@ -116,7 +119,7 @@ def size(self) -> int:
         Time Complexity:
             O(1)
         """
-        return self._impl.size()
+        return self._size
 
     def copy(self) -> 'stack':
         """
@@ -133,6 +136,7 @@ def copy(self) -> 'stack':
             new_stack._impl.set_data(self._impl.get_data())
         else:
             new_stack._impl._data = self._impl._data.copy()
+        new_stack._size = self._size
         return new_stack
 
     # Python magic methods
@@ -210,6 +214,7 @@ def __deepcopy__(self, memo) -> 'stack':
             new_stack._impl.set_data(new_data)
         else:
             new_stack._impl._data = deepcopy(self._impl._data, memo)
+        new_stack._size = self._size
         return new_stack
 
 
diff --git a/pythonstl/facade/vector.py b/pythonstl/facade/vector.py
index 0a6fd58..0408492 100644
--- a/pythonstl/facade/vector.py
+++ b/pythonstl/facade/vector.py
@@ -51,6 +51,7 @@ def __init__(self, use_rust: bool = True) -> None:
         else:
             self._impl = _VectorImpl()
             self._is_rust = False
+        self._size = 0
 
     def push_back(self, value: T) -> None:
         """
@@ -63,6 +64,7 @@ def push_back(self, value: T) -> None:
             O(1) amortized
         """
         self._impl.push_back(value)
+        self._size += 1
 
     def pop_back(self) -> None:
         """
@@ -74,9 +76,10 @@ def pop_back(self) -> None:
         Time Complexity:
             O(1)
         """
-        if self.empty():
+        if self._size == 0:
             raise EmptyContainerError("vector")
         self._impl.pop_back()
+        self._size -= 1
 
     def at(self, index: int) -> T:
         """
@@ -94,8 +97,8 @@ def at(self, index: int) -> T:
         Time Complexity:
             O(1)
         """
-        if index < 0 or index >= self.size():
-            raise OutOfRangeError(index, self.size())
+        if index < 0 or index >= self._size:
+            raise OutOfRangeError(index, self._size)
         return self._impl.at(index)
 
     def insert(self, position: int, value: T) -> None:
@@ -112,9 +115,10 @@ def insert(self, position: int, value: T) -> None:
         Time Complexity:
             O(n) where n is the number of elements after position
         """
-        if position < 0 or position > self.size():
-            raise OutOfRangeError(position, self.size())
+        if position < 0 or position > self._size:
+            raise OutOfRangeError(position, self._size)
         self._impl.insert(position, value)
+        self._size += 1
 
     def erase(self, position: int) -> None:
         """
@@ -129,9 +133,10 @@ def erase(self, position: int) -> None:
         Time Complexity:
             O(n) where n is the number of elements after position
         """
-        if position < 0 or position >= self.size():
-            raise OutOfRangeError(position, self.size())
+        if position < 0 or position >= self._size:
+            raise OutOfRangeError(position, self._size)
         self._impl.erase(position)
+        self._size -= 1
 
     def clear(self) -> None:
         """
@@ -141,6 +146,7 @@ def clear(self) -> None:
             O(n) where n is the number of elements
         """
         self._impl.clear()
+        self._size = 0
 
     def reserve(self, new_capacity: int) -> None:
         """
@@ -230,7 +236,7 @@ def size(self) -> int:
         Time Complexity:
             O(1)
         """
-        return self._impl.size()
+        return self._size
 
     def capacity(self) -> int:
         """
@@ -254,7 +260,7 @@ def empty(self) -> bool:
         Time Complexity:
             O(1)
         """
-        return self._impl.empty()
+        return self._size == 0
 
     def copy(self) -> 'vector':
         """
@@ -272,6 +278,7 @@ def copy(self) -> 'vector':
         else:
             new_vector._impl._data = self._impl._data.copy()
             new_vector._impl._capacity = self._impl._capacity
+        new_vector._size = self._size
         return new_vector
 
     # Python magic methods
@@ -399,6 +406,7 @@ def __deepcopy__(self, memo) -> 'vector':
         else:
             new_vector._impl._data = deepcopy(self._impl._data, memo)
             new_vector._impl._capacity = self._impl._capacity
+        new_vector._size = self._size
         return new_vector
 
 
diff --git a/src/lib.rs b/src/lib.rs
index a2c0dde..c365d1f 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -13,7 +13,18 @@ struct PyObjectOrd(PyObject);
 impl PartialEq for PyObjectOrd {
     fn eq(&self, other: &Self) -> bool {
         Python::with_gil(|py| {
-            self.0.bind(py).eq(other.0.bind(py)).unwrap_or(false)
+            let self_ref = self.0.bind(py);
+            let other_ref = other.0.bind(py);
+            if let (Ok(a), Ok(b)) = (self_ref.extract::<i64>(), other_ref.extract::<i64>()) {
+                return a == b;
+            }
+            if let (Ok(a), Ok(b)) = (self_ref.extract::<f64>(), other_ref.extract::<f64>()) {
+                return a == b;
+            }
+            if let (Ok(a), Ok(b)) = (self_ref.extract::<&str>(), other_ref.extract::<&str>()) {
+                return a == b;
+            }
+            self_ref.eq(other_ref).unwrap_or(false)
         })
     }
 }
@@ -31,6 +42,17 @@ impl Ord for PyObjectOrd {
         Python::with_gil(|py| {
             let self_ref = self.0.bind(py);
             let other_ref = other.0.bind(py);
+            if let (Ok(a), Ok(b)) = (self_ref.extract::<i64>(), other_ref.extract::<i64>()) {
+                return a.cmp(&b);
+            }
+            if let (Ok(a), Ok(b)) = (self_ref.extract::<f64>(), other_ref.extract::<f64>()) {
+                if let Some(ord) = a.partial_cmp(&b) {
+                    return ord;
+                }
+            }
+            if let (Ok(a), Ok(b)) = (self_ref.extract::<&str>(), other_ref.extract::<&str>()) {
+                return a.cmp(&b);
+            }
             if self_ref.eq(other_ref).unwrap_or(false) {
                 Ordering::Equal
             } else if self_ref.lt(other_ref).unwrap_or(false) {
@@ -448,6 +470,51 @@ fn bubble_sort(mut arr: Vec<i32>) -> PyResult<Vec<i32>> {
 
 // ----------------- C++ STL Algorithms -----------------
 
+fn pyobject_lt(py: Python, a: &PyObject, b: &PyObject) -> PyResult<bool> {
+    let a_bound = a.bind(py);
+    let b_bound = b.bind(py);
+    if let (Ok(x), Ok(y)) = (a_bound.extract::<i64>(), b_bound.extract::<i64>()) {
+        return Ok(x < y);
+    }
+    if let (Ok(x), Ok(y)) = (a_bound.extract::<f64>(), b_bound.extract::<f64>()) {
+        return Ok(x < y);
+    }
+    if let (Ok(x), Ok(y)) = (a_bound.extract::<&str>(), b_bound.extract::<&str>()) {
+        return Ok(x < y);
+    }
+    a_bound.lt(b_bound)
+}
+
+fn pyobject_gt(py: Python, a: &PyObject, b: &PyObject) -> PyResult<bool> {
+    let a_bound = a.bind(py);
+    let b_bound = b.bind(py);
+    if let (Ok(x), Ok(y)) = (a_bound.extract::<i64>(), b_bound.extract::<i64>()) {
+        return Ok(x > y);
+    }
+    if let (Ok(x), Ok(y)) = (a_bound.extract::<f64>(), b_bound.extract::<f64>()) {
+        return Ok(x > y);
+    }
+    if let (Ok(x), Ok(y)) = (a_bound.extract::<&str>(), b_bound.extract::<&str>()) {
+        return Ok(x > y);
+    }
+    a_bound.gt(b_bound)
+}
+
+fn pyobject_eq(py: Python, a: &PyObject, b: &PyObject) -> PyResult<bool> {
+    let a_bound = a.bind(py);
+    let b_bound = b.bind(py);
+    if let (Ok(x), Ok(y)) = (a_bound.extract::<i64>(), b_bound.extract::<i64>()) {
+        return Ok(x == y);
+    }
+    if let (Ok(x), Ok(y)) = (a_bound.extract::<f64>(), b_bound.extract::<f64>()) {
+        return Ok(x == y);
+    }
+    if let (Ok(x), Ok(y)) = (a_bound.extract::<&str>(), b_bound.extract::<&str>()) {
+        return Ok(x == y);
+    }
+    a_bound.eq(b_bound)
+}
+
 #[pyfunction]
 fn next_permutation(py: Python, arr: &Bound<'_, PyList>) -> PyResult<bool> {
     let mut vec: Vec<PyObject> = arr.extract()?;
@@ -458,9 +525,9 @@ fn next_permutation(py: Python, arr: &Bound<'_, PyList>) -> PyResult<bool> {
     let mut i = vec.len() - 2;
     let mut found = false;
     loop {
-        let current = vec[i].bind(py);
-        let next = vec[i + 1].bind(py);
-        if current.lt(next).unwrap_or(false) {
+        let current = &vec[i];
+        let next = &vec[i + 1];
+        if pyobject_lt(py, current, next).unwrap_or(false) {
             found = true;
             break;
         }
@@ -480,7 +547,7 @@ fn next_permutation(py: Python, arr: &Bound<'_, PyList>) -> PyResult<bool> {
     
     let mut j = vec.len() - 1;
     while j > i {
-        if vec[j].bind(py).gt(vec[i].bind(py)).unwrap_or(false) {
+        if pyobject_gt(py, &vec[j], &vec[i]).unwrap_or(false) {
             break;
         }
         j -= 1;
@@ -506,9 +573,9 @@ fn prev_permutation(py: Python, arr: &Bound<'_, PyList>) -> PyResult<bool> {
     let mut i = vec.len() - 2;
     let mut found = false;
     loop {
-        let current = vec[i].bind(py);
-        let next = vec[i + 1].bind(py);
-        if current.gt(next).unwrap_or(false) {
+        let current = &vec[i];
+        let next = &vec[i + 1];
+        if pyobject_gt(py, current, next).unwrap_or(false) {
             found = true;
             break;
         }
@@ -528,7 +595,7 @@ fn prev_permutation(py: Python, arr: &Bound<'_, PyList>) -> PyResult<bool> {
     
     let mut j = vec.len() - 1;
     while j > i {
-        if vec[j].bind(py).lt(vec[i].bind(py)).unwrap_or(false) {
+        if pyobject_lt(py, &vec[j], &vec[i]).unwrap_or(false) {
             break;
         }
         j -= 1;
@@ -579,9 +646,8 @@ fn partition_q(arr: &mut Vec<PyObject>, left: usize, right: usize) -> usize {
     let mut i = left;
     Python::with_gil(|py| {
         let pivot_val = arr[right].clone_ref(py);
-        let pivot_bound = pivot_val.bind(py);
         for j in left..right {
-            if arr[j].bind(py).lt(pivot_bound).unwrap_or(false) {
+            if pyobject_lt(py, &arr[j], &pivot_val).unwrap_or(false) {
                 arr.swap(i, j);
                 i += 1;
             }
@@ -609,23 +675,22 @@ fn partition(py: Python, arr: &Bound<'_, PyList>, predicate: PyObject) -> PyResu
     Ok(i)
 }
 
-fn lower_bound_impl(py: Python, arr: &Bound<'_, PyList>, val: &PyObject, comp: &Option<PyObject>) -> PyResult<usize> {
-    let len = arr.len();
+fn lower_bound_impl(py: Python, vec: &[PyObject], val: &PyObject, comp: &Option<PyObject>) -> PyResult<usize> {
     let mut left = 0;
-    let mut right = len;
+    let mut right = vec.len();
     
     while left < right {
         let mid = left + (right - left) / 2;
-        let mid_val = arr.get_item(mid)?;
+        let mid_val = &vec[mid];
         
         let is_less = match comp {
             Some(c) => {
-                let mid_obj = mid_val.to_object(py);
+                let mid_obj = mid_val.clone_ref(py);
                 let res: bool = c.call1(py, (mid_obj, val.clone_ref(py)))?.extract(py)?;
                 res
             }
             None => {
-                mid_val.lt(val)?
+                pyobject_lt(py, mid_val, val)?
             }
         };
         
@@ -638,23 +703,22 @@ fn lower_bound_impl(py: Python, arr: &Bound<'_, PyList>, val: &PyObject, comp: &
     Ok(left)
 }
 
-fn upper_bound_impl(py: Python, arr: &Bound<'_, PyList>, val: &PyObject, comp: &Option<PyObject>) -> PyResult<usize> {
-    let len = arr.len();
+fn upper_bound_impl(py: Python, vec: &[PyObject], val: &PyObject, comp: &Option<PyObject>) -> PyResult<usize> {
     let mut left = 0;
-    let mut right = len;
+    let mut right = vec.len();
     
     while left < right {
         let mid = left + (right - left) / 2;
-        let mid_val = arr.get_item(mid)?;
+        let mid_val = &vec[mid];
         
         let is_less = match comp {
             Some(c) => {
-                let mid_obj = mid_val.to_object(py);
+                let mid_obj = mid_val.clone_ref(py);
                 let res: bool = c.call1(py, (val.clone_ref(py), mid_obj))?.extract(py)?;
                 res
             }
             None => {
-                val.bind(py).lt(&mid_val)?
+                pyobject_lt(py, val, mid_val)?
             }
         };
         
@@ -669,32 +733,35 @@ fn upper_bound_impl(py: Python, arr: &Bound<'_, PyList>, val: &PyObject, comp: &
 
 #[pyfunction]
 fn lower_bound(py: Python, arr: &Bound<'_, PyList>, val: PyObject, comp: Option<PyObject>) -> PyResult<usize> {
-    lower_bound_impl(py, arr, &val, &comp)
+    let vec: Vec<PyObject> = arr.extract()?;
+    lower_bound_impl(py, &vec, &val, &comp)
 }
 
 #[pyfunction]
 fn upper_bound(py: Python, arr: &Bound<'_, PyList>, val: PyObject, comp: Option<PyObject>) -> PyResult<usize> {
-    upper_bound_impl(py, arr, &val, &comp)
+    let vec: Vec<PyObject> = arr.extract()?;
+    upper_bound_impl(py, &vec, &val, &comp)
 }
 
 #[pyfunction]
 fn binary_search(py: Python, arr: &Bound<'_, PyList>, val: PyObject, comp: Option<PyObject>) -> PyResult<bool> {
-    let len = arr.len();
+    let vec: Vec<PyObject> = arr.extract()?;
+    let len = vec.len();
     if len == 0 {
         return Ok(false);
     }
-    let idx = lower_bound_impl(py, arr, &val, &comp)?;
+    let idx = lower_bound_impl(py, &vec, &val, &comp)?;
     if idx < len {
-        let elem = arr.get_item(idx)?;
+        let elem = &vec[idx];
         let eq = match &comp {
             Some(c) => {
-                let elem_obj = elem.to_object(py);
+                let elem_obj = elem.clone_ref(py);
                 let less1: bool = c.call1(py, (elem_obj.clone(), val.clone_ref(py)))?.extract(py)?;
                 let less2: bool = c.call1(py, (val.clone_ref(py), elem_obj))?.extract(py)?;
                 !less1 && !less2
             }
             None => {
-                elem.eq(&val)?
+                pyobject_eq(py, elem, &val)?
             }
         };
         Ok(eq)
@@ -705,8 +772,9 @@ fn binary_search(py: Python, arr: &Bound<'_, PyList>, val: PyObject, comp: Optio
 
 #[pyfunction]
 fn equal_range(py: Python, arr: &Bound<'_, PyList>, val: PyObject, comp: Option<PyObject>) -> PyResult<(usize, usize)> {
-    let lb = lower_bound_impl(py, arr, &val, &comp)?;
-    let ub = upper_bound_impl(py, arr, &val, &comp)?;
+    let vec: Vec<PyObject> = arr.extract()?;
+    let lb = lower_bound_impl(py, &vec, &val, &comp)?;
+    let ub = upper_bound_impl(py, &vec, &val, &comp)?;
     Ok((lb, ub))
 }