Merged
42 changes: 30 additions & 12 deletions README.md
@@ -322,21 +322,27 @@ Full Python integration while maintaining STL compatibility:

## Performance Benchmarks

PythonSTL includes a compiled Rust backend (built with PyO3 and Maturin) for high-performance operations, alongside pure-Python fallbacks. Below are the actual performance comparison results against pure-Python and native C++ (compiled with `g++ -O3`).
PythonSTL includes a compiled Rust backend (built with PyO3 and Maturin) for high-performance operations, alongside pure-Python fallbacks.

### 1. Containers Performance Benchmarks (3-Way Comparison)
### ⚠️ A Note on Algorithmic and FFI Characteristics

| Container Class | Pure Python (STL) | Python + Rust (STL) | Native Built-in | Rust Speedup | Design / Algorithmic Trade-off |
| :--- | :--- | :--- | :--- | :--- | :--- |
| **Stack** | 0.2441s | 0.2178s | 0.0667s | **1.12x faster** | Linear stack operations. Limited by FFI call overhead. |
| **Queue** | 0.2445s | 0.2078s | 0.0520s | **1.18x faster** | FIFO operations. Limited by FFI call overhead. |
| **Vector** | 0.0065s | 0.0038s | 0.0015s | **1.70x faster** | Push_back & random access indices. Limited by FFI. |
| **Set** | 0.1572s | 0.0197s | 0.0014s | **8.00x faster** | AVL Tree (Python) vs. BTree (Rust) vs. Unordered Hash Set (Native). |
| **Map** | 0.1632s | 0.0347s | 0.0020s | **4.70x faster** | AVL Tree (Python) vs. BTree (Rust) vs. Unordered Hash Map (Native). |
| **Priority Queue** | 0.0238s | 0.0371s | 0.0054s | *0.64x (slower)* | Custom binary heap vs. C-optimized `heapq` module. |
When comparing PySTL to Python's built-ins, keep two characteristics in mind:
1. **Sorted Tree-based vs. Hash-based Complexity:** Python's native `dict` and `set` are hash tables with average **$O(1)$** lookup/insert complexity and no sorted key order. PySTL's `stl_map` and `stl_set` are modeled after C++'s `std::map`/`std::set` (using `BTreeMap`/`BTreeSet` in Rust and an `AVLTree` in Python), which maintain keys in **sorted order** and have **$O(\log N)$** lookup/insert complexity. Comparing their raw speed directly is therefore an "apples-to-oranges" comparison.
2. **FFI Boundary Crossing Overhead:** For high-frequency, low-work operations (like pushing single elements), the cost of crossing the Python-Rust FFI boundary dominates the run time.

To isolate the actual performance gains of using the Rust backend, PySTL benchmarks compare the **Rust-backed STL containers** against their **Pure Python STL container counterparts** (equivalent data structures and APIs), alongside Python's built-ins as a baseline.
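
The like-for-like comparison described above boils down to timing two implementations of the same operation and dividing the results. The following is a minimal sketch of that idea using the standard `timeit` module. It is not the project's benchmark script: `speedup`, `slow_backend_insert`, and `fast_backend_insert` are hypothetical names, and two plain-list routines stand in for the two backends so the example runs without PythonSTL installed.

```python
import timeit


def speedup(baseline, candidate, number=10):
    """Return how many times faster `candidate` runs than `baseline`."""
    baseline_time = timeit.timeit(baseline, number=number)
    candidate_time = timeit.timeit(candidate, number=number)
    return baseline_time / candidate_time


# Hypothetical stand-ins for two backends that expose the same operation.
def slow_backend_insert(n=2000):
    items = []
    for i in range(n):
        items.insert(0, i)  # O(n) per insert at the front of a list
    return items


def fast_backend_insert(n=2000):
    items = []
    for i in range(n):
        items.append(i)  # amortized O(1) per insert at the back
    return items


if __name__ == "__main__":
    ratio = speedup(slow_backend_insert, fast_backend_insert)
    print(f"Speedup vs. baseline: {ratio:.2f}x")
```

A ratio above 1 means the candidate is faster than the baseline; a ratio below 1 means it is slower, which is worth stating explicitly rather than labeling it "faster".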

### 1. Containers Performance (10,000 Elements / Operations Unless Noted)

| Container Class & Operation | Pure Python STL | Python + Rust STL | Native Built-in | Rust Speedup (vs Pure Py STL) |
| :--- | :--- | :--- | :--- | :--- |
| **Stack (1,000,000 push/pops)** | 0.4768s | 0.3227s | 0.0530s (Native list.pop) | **1.48x faster** |
| **Vector (10,000 push_backs)** | 0.2296s | 0.1374s | 0.0444s (Native list.append) | **1.67x faster** |
| **Vector (10,000 random at())** | 0.4844s | 0.3264s | 0.0586s (Native list[i]) | **1.48x faster** |
| **Map (10,000 insert - integers)** | 0.0873s | 0.0116s | 0.0019s (Native dict[key]) | **7.53x faster** |
| **Map (10,000 find - integers)** | 0.0077s | 0.0046s | 0.0018s (Native key in dict) | **1.68x faster** |

* **Sorted Trees vs. Hash Tables**: Python's native `set` and `dict` are highly optimized $O(1)$ hash tables written in C. PythonSTL sets/maps replicate C++'s `std::set`/`std::map` using sorted trees (`BTreeSet`/`BTreeMap`), which run in $O(\log N)$ and sort keys.
* **FFI overhead**: Storing arbitrary Python objects in Rust requires acquiring the GIL and calling back into the Python VM for comparisons, which adds significant FFI overhead.
*Note: For primitive key types (like integers, floats, and strings), the Rust BTreeMap/BTreeSet uses native type-extraction fast-paths in `PyObjectOrd::cmp` to avoid calling back into CPython's rich comparison system.*

### 2. Algorithms Suite

@@ -422,6 +428,18 @@ pytest && mypy pythonstl/ && flake8 pythonstl/
- **No Customizable Priority Queue:** Python’s `heapq` is strictly a min-heap, and custom comparators are difficult to write. `PythonSTL` provides max/min heaps and custom sorting keys out-of-the-box.
- **Engineering Showcase:** The Rust backend built via Maturin and PyO3 demonstrates a hybrid performance architecture. In real-world projects (like Polars, Pydantic, or cryptography libraries), performance-critical loops are written in compiled languages and bound to Python. This library serves as an educational blueprint for that pattern.
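
For context on the priority-queue point above, this is roughly what the standard-library workaround looks like when only the min-heap functions of `heapq` are used. It is a hedged sketch: `KeyedHeap` is a hypothetical class written for illustration and is not PythonSTL's API.

```python
import heapq
import itertools


class KeyedHeap:
    """Priority queue over `heapq` ordered by a key function (illustrative only)."""

    def __init__(self, key=lambda x: x, reverse=False):
        self._key = key
        self._reverse = reverse
        self._heap = []
        # The counter breaks ties so the stored items are never compared directly.
        self._counter = itertools.count()

    def push(self, item):
        priority = self._key(item)
        if self._reverse:
            priority = -priority  # negation only works for numeric keys
        heapq.heappush(self._heap, (priority, next(self._counter), item))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


# Max-heap behaviour via negated priorities.
max_heap = KeyedHeap(reverse=True)
for value in [3, 1, 4, 1, 5]:
    max_heap.push(value)
largest_first = [max_heap.pop() for _ in range(len(max_heap))]

# Custom ordering via a key function: shortest word first.
by_length = KeyedHeap(key=len)
for word in ["pear", "fig", "banana"]:
    by_length.push(word)
shortest = by_length.pop()
```

The wrapper, the tie-breaking counter, and the negation trick are the boilerplate that a container with built-in max/min and key support saves the caller from writing.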

### Myth 3: "Since there is a Rust backend, every operation must be faster than pure Python."
* **Reality:** Incorrect. As detailed in the performance benchmarks, granular $O(1)$ operations like single-element pushes/pops on `stack` or `queue` are dominated by the overhead of crossing the FFI (Foreign Function Interface) boundary. The Rust backend excels in **computation-intensive algorithms** (like sorting, partitioning, or binary searching large arrays) where the FFI boundary is crossed only once or twice and where the type-extraction fast paths let the work stay in native Rust.
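
The effect of call granularity can be illustrated without any Rust at all. The sketch below compares many small calls into C-implemented code (`list.append` once per element) with a single bulk call (`list.extend`). It is an analogy for boundary-crossing cost, not a measurement of PyO3 overhead, and the function names are made up for this example.

```python
import timeit

DATA = list(range(100_000))


def per_element_calls():
    out = []
    for x in DATA:
        out.append(x)  # one call into C-implemented code per element
    return out


def single_bulk_call():
    out = []
    out.extend(DATA)  # one call; the loop over DATA runs inside C
    return out


if __name__ == "__main__":
    many = timeit.timeit(per_element_calls, number=20)
    bulk = timeit.timeit(single_bulk_call, number=20)
    print(f"per-element: {many:.4f}s, bulk: {bulk:.4f}s")
```

Both functions build the same list; only the number of calls differs, which is the variable the myth above overlooks.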

### Myth 4: "PySTL's Rust backend makes the containers thread-safe."
* **Reality:** Absolutely not. Even with the Rust backend, PySTL containers are **not thread-safe**. They store Python objects (`PyObject`), so the Rust code still runs under the Python GIL (Global Interpreter Lock), and the GIL does not make a sequence of container operations atomic. Simultaneous mutations of the same container from multiple Python threads can lead to data races or undefined behavior unless they are synchronized, for example with Python's `threading.Lock`.
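
One way to apply the synchronization mentioned above is to route every operation through a single lock. This is a minimal sketch under stated assumptions: `LockedStack` is a hypothetical wrapper, and a plain `list` stands in for a PythonSTL stack so the example is self-contained.

```python
import threading


class LockedStack:
    """Wraps a stack-like object so every operation holds one lock (illustrative)."""

    def __init__(self, inner=None):
        # Assumption: a plain list stands in for the real container here.
        self._inner = inner if inner is not None else []
        self._lock = threading.Lock()

    def push(self, item):
        with self._lock:
            self._inner.append(item)

    def pop(self):
        with self._lock:
            return self._inner.pop()

    def size(self):
        with self._lock:
            return len(self._inner)


def worker(stack, count):
    for i in range(count):
        stack.push(i)


shared = LockedStack()
threads = [threading.Thread(target=worker, args=(shared, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Compound operations such as "check size, then pop" still need the caller to hold a lock across both steps; wrapping individual methods only protects each call on its own.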

### Myth 5: "PySTL's `stl_set` and `stl_map` are drop-in performance replacements for Python `set` and `dict`."
* **Reality:** No. They serve fundamentally different algorithmic needs. Python `set`/`dict` are hash tables ($O(1)$ average complexity, no sorted key order). PySTL's set/map are tree-based sorted containers ($O(\log N)$ complexity, keys kept in sorted order). They should only be used when keys must be kept sorted or when range query capabilities (like `lower_bound`/`upper_bound`) are needed.
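
To make the range-query point concrete, the sketch below reproduces the semantics of C++'s `lower_bound` and `upper_bound` on a sorted Python list with the standard `bisect` module. The helper functions are written for this illustration and are not PythonSTL's API; a hash-based `set` or `dict` has no equivalent without sorting first.

```python
import bisect

keys = [10, 20, 20, 30, 40]  # must already be sorted


def lower_bound(sorted_keys, key):
    """Index of the first element >= key."""
    return bisect.bisect_left(sorted_keys, key)


def upper_bound(sorted_keys, key):
    """Index of the first element > key."""
    return bisect.bisect_right(sorted_keys, key)


def keys_in_range(sorted_keys, low, high):
    """All keys k with low <= k <= high: two binary searches plus one slice."""
    start = lower_bound(sorted_keys, low)
    stop = upper_bound(sorted_keys, high)
    return sorted_keys[start:stop]
```

For example, `keys_in_range(keys, 15, 30)` locates both ends of the range by binary search and never scans the keys outside it.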

### Myth 6: "Using a Rust backend avoids all Python memory and reference counting issues."
* **Reality:** False. Because PySTL containers store arbitrary Python objects, they hold `PyObject` references. They participate in CPython's reference counting and garbage collection. If you create circular references, CPython's GC still has to clean them up.
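
The two behaviours named above can be observed from plain Python. In this hedged sketch a built-in `list` stands in for any container that holds object references, and `Node` is a throwaway class defined only for the demonstration; the immediate-release behaviour shown is specific to CPython's reference counting.

```python
import gc
import weakref


class Node:
    """Small object that can participate in a reference cycle."""

    def __init__(self):
        self.other = None


# A container keeps its elements alive by holding references to them.
container = []
item = Node()
alive = weakref.ref(item)
container.append(item)
del item
held_by_container = alive() is not None  # the list still references the object

container.clear()
freed_after_clear = alive() is None  # CPython frees it once the last reference is gone

# Reference counting alone cannot free a cycle; the cycle collector handles it.
a, b = Node(), Node()
a.other, b.other = b, a
cycle_ref = weakref.ref(a)
del a, b
gc.collect()
cycle_collected = cycle_ref() is None
```

The same rules apply whichever language the container itself is written in, as long as its elements are ordinary Python objects.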

## License

MIT License - see LICENSE file for details.
114 changes: 75 additions & 39 deletions benchmarks/benchmark_map.py
@@ -1,16 +1,21 @@
"""
Benchmark for map operations.

Compares pystl.stl_map performance against Python's built-in dict.
"""
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parents[1]))

import timeit
from pythonstl import stl_map


def benchmark_pystl_map_insert():
"""Benchmark insert operations on pystl.stl_map."""
m = stl_map()
def benchmark_pystl_rust_map_insert():
"""Benchmark insert operations on pystl.stl_map (Rust)."""
m = stl_map(use_rust=True)
for i in range(10000):
m.insert(f"key_{i}", i)


def benchmark_pystl_python_map_insert():
"""Benchmark insert operations on pystl.stl_map (Pure Python AVL tree)."""
m = stl_map(use_rust=False)
for i in range(10000):
m.insert(f"key_{i}", i)

@@ -22,9 +27,20 @@ def benchmark_dict_insert():
d[f"key_{i}"] = i


def benchmark_pystl_map_find():
"""Benchmark find operations on pystl.stl_map."""
m = stl_map()
def benchmark_pystl_rust_map_find():
"""Benchmark find operations on pystl.stl_map (Rust)."""
m = stl_map(use_rust=True)
for i in range(10000):
m.insert(f"key_{i}", i)
count = 0
for i in range(10000):
if m.find(f"key_{i}"):
count += 1


def benchmark_pystl_python_map_find():
"""Benchmark find operations on pystl.stl_map (Pure Python AVL tree)."""
m = stl_map(use_rust=False)
for i in range(10000):
m.insert(f"key_{i}", i)
count = 0
@@ -42,9 +58,19 @@ def benchmark_dict_in():
count += 1


def benchmark_pystl_map_at():
"""Benchmark at() operations on pystl.stl_map."""
m = stl_map()
def benchmark_pystl_rust_map_at():
"""Benchmark at() operations on pystl.stl_map (Rust)."""
m = stl_map(use_rust=True)
for i in range(10000):
m.insert(f"key_{i}", i)
total = 0
for i in range(10000):
total += m.at(f"key_{i}")


def benchmark_pystl_python_map_at():
"""Benchmark at() operations on pystl.stl_map (Pure Python AVL tree)."""
m = stl_map(use_rust=False)
for i in range(10000):
m.insert(f"key_{i}", i)
total = 0
@@ -62,51 +88,61 @@ def benchmark_dict_access():

def run_benchmarks():
"""Run all map benchmarks and display results."""
print("=" * 60)
print("Map Benchmark: pystl.stl_map vs Python dict")
print("=" * 60)
print("=" * 80)
print("Map Benchmark: pystl.stl_map (Rust B-Tree vs Python AVL) vs Python dict (Hash)")
print("=" * 80)
print()

# Insert benchmark
print("Insert Operations (10,000 key-value pairs):")
print("-" * 60)
print("-" * 80)

pystl_insert_time = timeit.timeit(benchmark_pystl_map_insert, number=100)
dict_insert_time = timeit.timeit(benchmark_dict_insert, number=100)
rust_insert_time = timeit.timeit(benchmark_pystl_rust_map_insert, number=10)
python_insert_time = timeit.timeit(benchmark_pystl_python_map_insert, number=10)
dict_insert_time = timeit.timeit(benchmark_dict_insert, number=10)

print(f"pystl.stl_map.insert(): {pystl_insert_time:.4f} seconds")
print(f"dict[key] = value: {dict_insert_time:.4f} seconds")
print(f"Ratio (pystl/dict): {pystl_insert_time/dict_insert_time:.2f}x")
print(f"pystl.stl_map(use_rust=True).insert(): {rust_insert_time:.4f} seconds")
print(f"pystl.stl_map(use_rust=False).insert(): {python_insert_time:.4f} seconds")
print(f"dict[key] = value [Unordered Hash]: {dict_insert_time:.4f} seconds")
print(f"Rust Speedup vs. Pure Python AVL: {python_insert_time/rust_insert_time:.2f}x")
print(f"Ratio (Rust vs. Unordered Hash): {rust_insert_time/dict_insert_time:.2f}x")
print()

# Find benchmark
print("Find/Contains Operations (10,000 lookups):")
print("-" * 60)
print("-" * 80)

pystl_find_time = timeit.timeit(benchmark_pystl_map_find, number=100)
dict_in_time = timeit.timeit(benchmark_dict_in, number=100)
rust_find_time = timeit.timeit(benchmark_pystl_rust_map_find, number=10)
python_find_time = timeit.timeit(benchmark_pystl_python_map_find, number=10)
dict_in_time = timeit.timeit(benchmark_dict_in, number=10)

print(f"pystl.stl_map.find(): {pystl_find_time:.4f} seconds")
print(f"key in dict: {dict_in_time:.4f} seconds")
print(f"Ratio (pystl/dict): {pystl_find_time/dict_in_time:.2f}x")
print(f"pystl.stl_map(use_rust=True).find(): {rust_find_time:.4f} seconds")
print(f"pystl.stl_map(use_rust=False).find(): {python_find_time:.4f} seconds")
print(f"key in dict [Unordered Hash]: {dict_in_time:.4f} seconds")
print(f"Rust Speedup vs. Pure Python AVL: {python_find_time/rust_find_time:.2f}x")
print(f"Ratio (Rust vs. Unordered Hash): {rust_find_time/dict_in_time:.2f}x")
print()

# Access benchmark
print("Access Operations (10,000 accesses):")
print("-" * 60)
print("-" * 80)

pystl_at_time = timeit.timeit(benchmark_pystl_map_at, number=100)
dict_access_time = timeit.timeit(benchmark_dict_access, number=100)
rust_at_time = timeit.timeit(benchmark_pystl_rust_map_at, number=10)
python_at_time = timeit.timeit(benchmark_pystl_python_map_at, number=10)
dict_access_time = timeit.timeit(benchmark_dict_access, number=10)

print(f"pystl.stl_map.at(): {pystl_at_time:.4f} seconds")
print(f"dict[key]: {dict_access_time:.4f} seconds")
print(f"Ratio (pystl/dict): {pystl_at_time/dict_access_time:.2f}x")
print(f"pystl.stl_map(use_rust=True).at(): {rust_at_time:.4f} seconds")
print(f"pystl.stl_map(use_rust=False).at(): {python_at_time:.4f} seconds")
print(f"dict[key] [Unordered Hash]: {dict_access_time:.4f} seconds")
print(f"Rust Speedup vs. Pure Python AVL: {python_at_time/rust_at_time:.2f}x")
print(f"Ratio (Rust vs. Unordered Hash): {rust_at_time/dict_access_time:.2f}x")
print()

print("=" * 60)
print("Note: pystl.stl_map wraps Python dict with STL-style API.")
print("The facade pattern adds minimal overhead for type safety.")
print("=" * 60)
print("=" * 80)
print("Note: Python's dict is a hash table without sorted key order (O(1) average lookup/insert).")
print("PySTL's stl_map is a sorted tree container (O(log N) lookup/insert).")
print("Comparing Rust map vs. Pure Python map isolates the actual FFI/Rust library speedup.")
print("=" * 80)


if __name__ == "__main__":
76 changes: 48 additions & 28 deletions benchmarks/benchmark_stack.py
@@ -1,16 +1,21 @@
"""
Benchmark for stack operations.

Compares pystl.stack performance against Python's built-in list.
"""
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parents[1]))

import timeit
from pythonstl import stack


def benchmark_pystl_stack_push():
"""Benchmark push operations on pystl.stack."""
s = stack()
def benchmark_pystl_rust_push():
"""Benchmark push operations on pystl.stack (Rust)."""
s = stack(use_rust=True)
for i in range(10000):
s.push(i)


def benchmark_pystl_python_push():
"""Benchmark push operations on pystl.stack (Pure Python)."""
s = stack(use_rust=False)
for i in range(10000):
s.push(i)

@@ -22,9 +27,18 @@ def benchmark_list_append():
lst.append(i)


def benchmark_pystl_stack_pop():
"""Benchmark pop operations on pystl.stack."""
s = stack()
def benchmark_pystl_rust_pop():
"""Benchmark pop operations on pystl.stack (Rust)."""
s = stack(use_rust=True)
for i in range(10000):
s.push(i)
for _ in range(10000):
s.pop()


def benchmark_pystl_python_pop():
"""Benchmark pop operations on pystl.stack (Pure Python)."""
s = stack(use_rust=False)
for i in range(10000):
s.push(i)
for _ in range(10000):
@@ -40,39 +54,45 @@ def benchmark_list_pop():

def run_benchmarks():
"""Run all stack benchmarks and display results."""
print("=" * 60)
print("Stack Benchmark: pystl.stack vs Python list")
print("=" * 60)
print("=" * 70)
print("Stack Benchmark: pystl.stack (Rust vs Python) vs Python list")
print("=" * 70)
print()

# Push/Append benchmark
print("Push/Append Operations (10,000 elements):")
print("-" * 60)
print("-" * 70)

pystl_push_time = timeit.timeit(benchmark_pystl_stack_push, number=100)
rust_push_time = timeit.timeit(benchmark_pystl_rust_push, number=100)
python_push_time = timeit.timeit(benchmark_pystl_python_push, number=100)
list_append_time = timeit.timeit(benchmark_list_append, number=100)

print(f"pystl.stack.push(): {pystl_push_time:.4f} seconds")
print(f"list.append(): {list_append_time:.4f} seconds")
print(f"Ratio (pystl/list): {pystl_push_time/list_append_time:.2f}x")
print(f"pystl.stack(use_rust=True).push(): {rust_push_time:.4f} seconds")
print(f"pystl.stack(use_rust=False).push(): {python_push_time:.4f} seconds")
print(f"list.append() [Native List]: {list_append_time:.4f} seconds")
print(f"Rust Speedup vs. Pure Python: {python_push_time/rust_push_time:.2f}x")
print(f"Ratio (Rust/Native List): {rust_push_time/list_append_time:.2f}x")
print()

# Pop benchmark
print("Pop Operations (10,000 elements):")
print("-" * 60)
print("-" * 70)

pystl_pop_time = timeit.timeit(benchmark_pystl_stack_pop, number=100)
rust_pop_time = timeit.timeit(benchmark_pystl_rust_pop, number=100)
python_pop_time = timeit.timeit(benchmark_pystl_python_pop, number=100)
list_pop_time = timeit.timeit(benchmark_list_pop, number=100)

print(f"pystl.stack.pop(): {pystl_pop_time:.4f} seconds")
print(f"list.pop(): {list_pop_time:.4f} seconds")
print(f"Ratio (pystl/list): {pystl_pop_time/list_pop_time:.2f}x")
print(f"pystl.stack(use_rust=True).pop(): {rust_pop_time:.4f} seconds")
print(f"pystl.stack(use_rust=False).pop(): {python_pop_time:.4f} seconds")
print(f"list.pop() [Native List]: {list_pop_time:.4f} seconds")
print(f"Rust Speedup vs. Pure Python: {python_pop_time/rust_pop_time:.2f}x")
print(f"Ratio (Rust/Native List): {rust_pop_time/list_pop_time:.2f}x")
print()

print("=" * 60)
print("Note: pystl.stack wraps Python list, so overhead is minimal.")
print("The facade pattern adds a small constant-factor overhead.")
print("=" * 60)
print("=" * 70)
print("Note: Native list is a direct C implementation in Python (no FFI).")
print("Comparing Rust stack vs. Pure Python stack isolates the FFI/Rust library speedup.")
print("=" * 70)


if __name__ == "__main__":