snmalloc

snmalloc is a high-performance allocator. It can be used directly in a project as a header-only C++ library, it can be LD_PRELOADed on ELF platforms (e.g. Linux, BSD), and there is a crate to use it from Rust.

Its key design features are:

Memory that is freed by the same thread that allocated it does not require any synchronising operations.
Freeing memory on a different thread from the one that allocated it does not take any locks. Instead, it uses a novel message-passing scheme to return the memory to the original allocator, where it is recycled. This allows thousands of remote deallocations to be performed with only a single atomic operation, which scales well with core count.
The allocator uses large ranges of pages to reduce the amount of meta-data required.
The fast paths are highly optimised, with just two branches on the fast path for malloc (on Linux, compiled with Clang).
The platform dependencies are abstracted away to enable porting to other platforms.
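
The batching behind the remote-free claim above can be sketched as follows. This is a minimal illustration, not snmalloc's actual data structure: `FreeNode`, `RemoteBatch`, and `Inbox` are hypothetical names, and the inbox is a simple lock-free stack chosen for brevity. A thread collects frees destined for another allocator in a private list, then publishes the whole list with one successful compare-exchange; the owner detaches everything with one exchange.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical in-band link stored inside a freed object.
struct FreeNode
{
  FreeNode* next = nullptr;
};

// Private, unsynchronised list of frees destined for one remote allocator.
struct RemoteBatch
{
  FreeNode* head = nullptr;
  FreeNode* tail = nullptr;
  std::size_t count = 0;

  void add(FreeNode* n)
  {
    n->next = head;
    if (tail == nullptr)
      tail = n; // the first node added stays at the end of the list
    head = n;
    ++count;
  }
};

// Per-allocator inbox: many posting threads, one draining owner.
struct Inbox
{
  std::atomic<FreeNode*> head{nullptr};

  // Splice an entire batch onto the inbox. However many nodes the batch
  // holds, publication is one successful compare-exchange.
  void post(RemoteBatch& batch)
  {
    assert(batch.head != nullptr && batch.tail != nullptr);
    FreeNode* old_head = head.load(std::memory_order_relaxed);
    do
    {
      batch.tail->next = old_head;
    } while (!head.compare_exchange_weak(
      old_head,
      batch.head,
      std::memory_order_release,
      std::memory_order_relaxed));
    batch = RemoteBatch{};
  }

  // Owner side: detach everything with one exchange, then walk it locally.
  // Returns the number of objects that would be recycled.
  std::size_t drain()
  {
    FreeNode* n = head.exchange(nullptr, std::memory_order_acquire);
    std::size_t recycled = 0;
    while (n != nullptr)
    {
      FreeNode* next = n->next;
      // A real allocator would return `n` to its size-class free list here.
      ++recycled;
      n = next;
    }
    return recycled;
  }
};
```

Building the batch touches no shared state, so the cost of synchronisation is paid once per batch rather than once per freed object.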

snmalloc's design is particularly well suited to the following two difficult scenarios that can be problematic for other allocators:

Allocations on one thread are freed by a different thread
Deallocations occur in large batches

Both of these can cause massive reductions in the performance of other allocators, but not of snmalloc.

The implementation of snmalloc has evolved significantly since the initial paper. The mechanism for returning memory to remote threads has remained, but most of the meta-data layout has changed. We recommend you read docs/security to find out about the current design. If you want to dive into the code, docs/AddressSpace.md provides a good overview of the allocation and deallocation paths.

Hardening

There is a hardened version of snmalloc. It provides:

randomisation of the allocations' relative locations,
most meta-data stored separately from allocations and protected with guard pages,
all in-band meta-data protected with a novel encoding that can detect corruption, and
a memcpy that automatically checks the bounds relative to the underlying malloc.
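
To make the idea of a corruption-detecting in-band encoding concrete, here is a minimal sketch. It is not snmalloc's actual scheme (docs/security describes that); `FreeObject`, `store_next`, `load_next`, and the mixing function are all hypothetical. Each free object stores its link together with a check value derived from a secret key, its own address, and the link, so an overwritten link is very likely to be noticed before it is followed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical free-list node living inside freed memory.
struct FreeObject
{
  std::uintptr_t next; // in-band link to the next free object
  std::uintptr_t check; // signature over (key, own address, next)
};

// Illustrative mixing function only; unsigned arithmetic wraps on overflow.
inline std::uintptr_t
signature(std::uintptr_t key, const FreeObject* self, std::uintptr_t next)
{
  return (reinterpret_cast<std::uintptr_t>(self) + key) * (next + key);
}

// Write the link and the signature that protects it.
inline void store_next(FreeObject* self, FreeObject* next, std::uintptr_t key)
{
  assert(self != nullptr);
  self->next = reinterpret_cast<std::uintptr_t>(next);
  self->check = signature(key, self, self->next);
}

// Returns false if the stored link no longer matches its signature;
// otherwise writes the decoded link to `out` and returns true.
inline bool
load_next(const FreeObject* self, std::uintptr_t key, FreeObject*& out)
{
  assert(self != nullptr);
  if (self->check != signature(key, self, self->next))
    return false;
  out = reinterpret_cast<FreeObject*>(self->next);
  return true;
}
```

Because the signature depends on the object's own address, copying a valid link and check pair from one free object to another would also fail the check in this sketch.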

A more comprehensive write-up is in docs/security.

Further documentation

Heap Profiling

snmalloc ships with an opt-in, low-overhead statistical heap profiler. When enabled at build time, the allocator captures a Poisson-distributed sample of every allocation with its call stack, suitable for offline analysis with the same tooling (flamegraphs, pprof) commonly used for CPU profiles.

Enabling at build time

The profiler is gated behind a single CMake option, off by default:

cmake -B build -DSNMALLOC_PROFILE=ON
cmake --build build

With SNMALLOC_PROFILE=OFF (the default) every profiling code path is compiled out — the sampler countdown, the per-allocation branch, and the FFI export bodies all degrade to empty stubs. There is no runtime cost for builds that do not opt in.

What it samples

Each allocation has an independent probability of being recorded, governed by a single tunable: the mean sampling interval, expressed in bytes. The default is 524 288 bytes (512 KiB), meaning the sampler captures roughly one allocation per 512 KiB of total request volume. Per-sample weights are unbiased Poisson estimators, so summing weight across the snapshot yields an unbiased estimate of total bytes requested (or, scaled by allocated_size / requested_size, of total bytes the allocator actually handed back).
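
A byte-countdown sampler of this kind, together with the usual unbiased weight, might look like the sketch below. The class and function names are hypothetical and this is not snmalloc's code. The weight formula is the standard Horvitz-Thompson estimator for Poisson sampling, which is an assumption about what "unbiased Poisson estimator" means here: an allocation of s bytes is recorded with probability 1 - exp(-s / interval), so each recorded allocation stands for s divided by that probability.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>

// Probability that an allocation of `size` bytes contains at least one sample
// point, when sample points form a Poisson process with the given mean
// spacing in bytes.
inline double sample_probability(std::size_t size, double mean_interval)
{
  assert(mean_interval > 0.0);
  return -std::expm1(-static_cast<double>(size) / mean_interval);
}

// Bytes that one recorded allocation stands for (assumed weight formula).
inline double sample_weight_bytes(std::size_t size, double mean_interval)
{
  assert(size > 0);
  return static_cast<double>(size) / sample_probability(size, mean_interval);
}

// Hypothetical per-thread sampler: counts down an exponentially distributed
// number of bytes and records the allocation that crosses zero.
class ByteCountdownSampler
{
public:
  ByteCountdownSampler(double mean_interval, std::uint64_t seed)
  : mean_interval_(mean_interval), rng_(seed)
  {
    assert(mean_interval > 0.0);
    reset();
  }

  // Returns true if this allocation should be recorded.
  bool on_alloc(std::size_t size)
  {
    remaining_ -= static_cast<double>(size);
    if (remaining_ > 0.0)
      return false; // common case: one subtraction and one branch
    reset();
    return true;
  }

private:
  void reset()
  {
    // Uniform in (0, 1], then the inverse CDF of the exponential distribution.
    const double u =
      (static_cast<double>(rng_() >> 11) + 1.0) / 9007199254740992.0;
    remaining_ = -std::log(u) * mean_interval_;
  }

  double mean_interval_;
  std::mt19937_64 rng_;
  double remaining_ = 0.0;
};
```

Summing `sample_weight_bytes` over the recorded allocations gives the estimate of total requested bytes; small allocations are rarely recorded but carry a weight close to the mean interval, while allocations much larger than the interval are almost always recorded with a weight close to their own size.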

The sampling interval can be adjusted at runtime: lowering it (e.g. to 64 KiB) samples more often, giving higher resolution at roughly 1.5% throughput overhead; raising it (e.g. to 1 MiB) reduces overhead further at the cost of fidelity. See docs/profile-weight.md for guidance on choosing an interval for your workload.

C ABI for embedding

The C++ build exposes a small set of extern "C" symbols for embedders that want to drive the profiler from a non-Rust host:

Symbol	Purpose
`sn_rust_profile_supported`	Returns `true` iff built with `SNMALLOC_PROFILE=ON`.
`sn_rust_profile_set_sampling_rate`	Set the mean sampling interval in bytes. `0` disables.
`sn_rust_profile_get_sampling_rate`	Read the current sampling interval.
`sn_rust_profile_snapshot_begin` / `_count` / `_get` / `_end`	RAII-style enumeration of currently-live sampled allocations.
`sn_rust_profile_streaming_start` / `_stop`	Register a `void (*)(const SnRustProfileRawSample*)` callback that receives every sample as it occurs.

Each SnRustProfileRawSample carries a kind byte (SN_RUST_PROFILE_KIND_ALLOC / SN_RUST_PROFILE_KIND_RESIZE) that tells streaming consumers whether the broadcast describes a fresh sampled allocation or an in-place realloc that updated the size of an already-sampled allocation. Resize events carry the post-resize requested_size / allocated_size and preserve the original sample's stack and Poisson weight; the sampler is not re-rolled on resize. Out-of-place realloc (alloc + memcpy + dealloc) is reported via the existing alloc and dealloc paths -- there is no synthetic Resize event for it. Snapshot mode always reports kind == ALLOC; the persisted slot is updated in place but its kind tag is not re-stamped.
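
One way a streaming consumer might apply these rules is sketched below. The `Event` struct is a hypothetical stand-in, not the layout of `SnRustProfileRawSample` (see src/snmalloc/override/rust.h for that); in particular, the `address` key and the field types are assumptions made so that the example is self-contained. Alloc events insert a record; Resize events update only the two sizes and leave the recorded stack and weight untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Stand-ins for SN_RUST_PROFILE_KIND_ALLOC / SN_RUST_PROFILE_KIND_RESIZE.
enum class EventKind : std::uint8_t
{
  Alloc,
  Resize
};

// Hypothetical event shape; the real struct may differ.
struct Event
{
  EventKind kind;
  std::uintptr_t address; // assumed key identifying the allocation
  std::size_t requested_size;
  std::size_t allocated_size;
  double weight;
  std::string stack; // already symbolised, for brevity
};

struct LiveRecord
{
  std::size_t requested_size;
  std::size_t allocated_size;
  double weight;
  std::string stack;
};

// Consumer-side view of the sampled allocations seen so far.
class LiveTable
{
public:
  void on_event(const Event& e)
  {
    if (e.kind == EventKind::Alloc)
    {
      records_[e.address] =
        LiveRecord{e.requested_size, e.allocated_size, e.weight, e.stack};
      return;
    }
    assert(e.kind == EventKind::Resize);
    // Resize: only the sizes change; the original stack and weight are kept.
    auto it = records_.find(e.address);
    if (it == records_.end())
      return; // resize of an allocation this consumer never saw
    it->second.requested_size = e.requested_size;
    it->second.allocated_size = e.allocated_size;
  }

  // Returns the record for `address`, or nullptr if none is tracked.
  const LiveRecord* find(std::uintptr_t address) const
  {
    auto it = records_.find(address);
    return it == records_.end() ? nullptr : &it->second;
  }

private:
  std::unordered_map<std::uintptr_t, LiveRecord> records_;
};
```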

These are the same exports the Rust crate calls into; see src/snmalloc/override/rust.cc for the full ABI surface and src/snmalloc/override/rust.h for the header layout.

Rust crate

For Rust applications, the snmalloc-rs crate provides a fully safe wrapper around the C ABI: an RAII snapshot type (HeapProfile), an RAII streaming session (ProfilingSession), and an env-var-driven initializer (SnMalloc::init_profiling_from_env) that lets operators turn profiling on at the command line without recompiling. See snmalloc-rs/README.md for the full Rust API and code samples.

Output formats

Two viewer formats are supported out of the box from the Rust crate:

Folded / collapsed flame-graph format — one line per unique stack, summed weights, consumable by Brendan Gregg's flamegraph.pl, the pure-Rust inferno-flamegraph, and the Speedscope viewer (via its "Brendan Gregg's collapsed stack format" importer).
Google pprof Profile protobuf — consumable by go tool pprof, Pyroscope, Polar Signals Cloud, Parca, and the Datadog continuous profiler. Emitted with two sample axes (alloc_objects/count and alloc_space/bytes).
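
The folded format is simple enough to produce by hand. The sketch below is an illustration in C++, not the Rust crate's implementation, and `WeightedStack` and `to_folded` are hypothetical names. It joins each stack's frames with `;` (root first), sums the weights of identical stacks, and writes one `stack weight` line per unique stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One sampled allocation: call stack (root frame first) and its weight.
struct WeightedStack
{
  std::vector<std::string> frames;
  std::uint64_t weight;
};

// Collapse samples into folded flame-graph text, sorted by stack string.
inline std::string to_folded(const std::vector<WeightedStack>& samples)
{
  std::map<std::string, std::uint64_t> totals;
  for (const auto& sample : samples)
  {
    assert(!sample.frames.empty());
    std::string key;
    for (std::size_t i = 0; i < sample.frames.size(); ++i)
    {
      if (i != 0)
        key += ';';
      key += sample.frames[i];
    }
    totals[key] += sample.weight; // identical stacks share one line
  }

  std::ostringstream out;
  for (const auto& entry : totals)
    out << entry.first << ' ' << entry.second << '\n';
  return out.str();
}
```

For example, two samples with the stack main, parse, malloc and weights 100 and 25 collapse to the single line `main;parse;malloc 125`.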

Overhead

At the default 512 KiB sampling rate, the profiler adds <1% throughput overhead on the criterion micro-benchmark suite shipped in snmalloc-rs/benches/profile_bench.rs (Phase 7 of the heap-profiling design). The bench measures three configurations — profile-off, profile-on-inactive, and profile-on-active — and verifies that even the active configuration stays within the 1% budget on the standard sizes. Builds with SNMALLOC_PROFILE=OFF are bit-for-bit identical on the hot path to those without any profiling code at all.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
