Neuron Explorer

Streamlit GUI for exploring neuron activations with a streaming concept labeler, active probing, and a realtime map view inspired by the real‑time neuron labeling demo.

Features

Live neuron labeling with confidence tracking, trends, and auto‑explore
Active probing to focus on under‑sampled concepts
Unit inspection + activation dictionary (top prompts per unit)
Interactive map with prompt projection; 3D map in All layers mode
Token‑level Sparse Autoencoder decoding (offline + realtime)
Storage tools: rebuild indices, purge decoded databases
Works with a built‑in toy model, TorchScript models, and Transformers safetensors models

Requirements

Windows 10
Python 3.10+

Quick start (Windows 10)

Use the bundled runner (installs requirements and launches the app):

.\run.bat

Manual setup:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
pip install -r requirements-hf.txt
streamlit run app.py

Optional, only needed for TorchScript model support:

pip install -r requirements-torch.txt

Configuration

All settings live in the Streamlit sidebar and are saved to settings.json on exit. If you publish the repository, keep settings.json out of Git and copy defaults from settings.example.json instead.

Model selection, device, dtype, and attention implementation
Concept list or generated concept dictionary
Active probing and auto-explore controls
Map settings and prompt mapper options
Token‑level Sparse Autoencoder decoding options and storage paths

Run and development

streamlit run app.py

Models

Transformers: enter a Hugging Face model id or a local folder path containing .safetensors, tokenizer, and config files. Local model folders are ignored in Git via .gitignore.

Token‑level Sparse Autoencoder decoding

Use the Token Sparse Autoencoder tab to:

Run offline dataset passes (batch decode, top‑K per token, sqlite output)
Run realtime decoding during generation (newest token only)
Build indices for feature → contexts and span → features

Dataset formats:

.jsonl with { "text": "..." } or { "prompt": "..." }
.txt with one prompt per line
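
The two formats above can be read with a few lines of standard-library Python. This is a minimal sketch, not the app's actual loader: the name load_prompts is hypothetical, and it assumes that "text" takes precedence over "prompt" when a row has both, which this README does not specify.

```python
import json
from pathlib import Path
from typing import List


def load_prompts(path: str) -> List[str]:
    """Read prompts from a .jsonl or .txt dataset (hypothetical helper)."""
    p = Path(path)
    if p.suffix not in {".jsonl", ".txt"}:
        raise ValueError(f"unsupported dataset format: {p.suffix}")
    prompts: List[str] = []
    with p.open(encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line:
                continue  # skip blank lines in either format
            if p.suffix == ".txt":
                prompts.append(line)  # one prompt per line
                continue
            row = json.loads(line)
            # Assumption: prefer "text", fall back to "prompt".
            value = row.get("text", row.get("prompt"))
            if value is not None:
                prompts.append(value)
    return prompts
```

Rows with neither key are skipped silently here; a stricter loader might raise instead.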

Decoded data lives under token_out_dir/decoded.sqlite and can be purged from the Storage tab.

Troubleshooting

Start by reading the Streamlit console output.

HFValidationError: Repo id must use alphanumeric...
Hidden characters or an invalid local path. Re-type the model path and delete settings.json if it persists.
CUDA error: an illegal memory access was encountered
Switch Device to CPU, set Attention to eager, lower batch size / max tokens, or disable All layers.
expected mat1 and mat2 to have the same dtype
Set Torch dtype to float16 or float32 and keep it consistent with the device.
TypeError: set_autocast_dtype(): ... must be torch.dtype, not str
Ensure Torch dtype is set to auto, float16, bfloat16, or float32 in the UI.
SVD did not converge (map)
Reduce map jitter/spread or run a few more steps to stabilize activations.
Map/graphs empty
Click Run to generate data; the map updates after steps are collected.

Research

TokenSensor (Experimental)

TokenSensor is an experimental interpretability mode that treats a language model like a running system and exposes per-token telemetry.

Every token that flows through the model produces hidden states at each layer. TokenSensor attaches a small recorder at a chosen hook point (usually a residual stream) and translates each token hidden state into a sparse, human-inspectable feature readout using a pretrained Sparse Autoencoder (SAE). This reframes mapping as building an index over token events, more like profiling and debugging than after-the-fact inspection.

What it records

For each token position, TokenSensor stores a compact record:

token id and decoded token string
position in the sequence
hook point (layer and stream)
top K SAE features and activation values
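
As an illustration, one such record could be modeled as a small dataclass. The class and field names below are hypothetical and are not the schema the app writes to sqlite; the sketch only shows that the four listed items fit in a compact, serializable structure.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class TokenRecord:
    """One token event. Field names are illustrative, not the app's schema."""
    token_id: int
    token: str              # decoded token string
    pos: int                # position in the sequence
    layer: int              # hook point: layer index
    stream: str             # hook point: stream name, e.g. "resid"
    feature_ids: List[int]  # top-K SAE feature ids, strongest first
    values: List[float]     # activation value for each id above

    def to_json(self) -> str:
        return json.dumps(asdict(self))


rec = TokenRecord(token_id=464, token=" The", pos=0, layer=6,
                  stream="resid", feature_ids=[1021, 77], values=[3.2, 1.1])
line = rec.to_json()  # one JSON object per token event
```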

From this stream you can build:

feature to top activating contexts
prompt span to dominant features
feature co-occurrence graphs
feature exemplars for labeling and verification
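
A rough sketch of how two of these indices could be derived from a record stream. The tuple layout and the name build_indices are assumptions made for illustration, and co-occurrence is counted per token here; counting per span or per prompt would be an equally valid choice.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# Assumed record layout: (prompt_id, pos, token, [(feature_id, value), ...]).
Record = Tuple[int, int, str, List[Tuple[int, float]]]


def build_indices(records: List[Record], top_n: int = 3):
    """Return (feature -> top contexts, feature-pair co-occurrence counts)."""
    contexts: Dict[int, List[Tuple[float, int, int]]] = defaultdict(list)
    cooc: Counter = Counter()
    for prompt_id, pos, _token, feats in records:
        for fid, val in feats:
            contexts[fid].append((val, prompt_id, pos))
        # Count each unordered pair of features that fire on the same token.
        for a, b in combinations(sorted(fid for fid, _ in feats), 2):
            cooc[(a, b)] += 1
    # Keep only the top_n strongest contexts per feature.
    top = {fid: sorted(hits, reverse=True)[:top_n]
           for fid, hits in contexts.items()}
    return top, cooc


records = [
    (0, 0, "The", [(7, 2.0), (3, 0.5)]),
    (0, 1, " cat", [(7, 4.0), (9, 1.0)]),
    (1, 0, "A", [(3, 1.5), (7, 0.2)]),
]
top, cooc = build_indices(records, top_n=2)
```

Each entry in top is a (value, prompt_id, pos) triple, so the surrounding context can be looked up from the original prompt when labeling.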

How it works

At a chosen hook point, for each token hidden vector h with size D:

Run the model to obtain h
Decode SAE activations a = f(W_enc · h + b), where f is the SAE's nonlinearity and b is its encoder bias
Keep only top K activations for that token
Store (token, pos, topK feature ids, topK values)
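
The steps above can be sketched for a single token in dependency-free Python. This assumes f is a ReLU and that W_enc stores one row of length D per feature; neither is stated in this README, and a real implementation would use batched tensor operations rather than Python lists. The name sae_top_k is hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def sae_top_k(h: Sequence[float],
              w_enc: Sequence[Sequence[float]],
              b: Sequence[float],
              k: int) -> List[Tuple[int, float]]:
    """Return up to k (feature_id, activation) pairs, strongest first."""
    # a = ReLU(W_enc · h + b), computed one feature row at a time.
    acts = [max(0.0, sum(w * x for w, x in zip(row, h)) + bias)
            for row, bias in zip(w_enc, b)]
    top = heapq.nlargest(k, enumerate(acts), key=lambda item: item[1])
    # Drop features that did not fire at all.
    return [(fid, val) for fid, val in top if val > 0.0]


h = [1.0, -2.0]                                             # D = 2
w_enc = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5], [2.0, 0.0]]   # 4 features
b = [0.0, 0.0, 0.0, 0.0]
print(sae_top_k(h, w_enc, b, k=2))  # [(3, 2.0), (2, 1.5)]
```

Only the returned pairs need to be stored alongside the token and its position.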

The point is sparsity. You do not store everything; you store the most informative slice.

Operating modes

Offline mapping mode (best signal, fastest overall)

Run a dataset of prompts in batches
Collect hidden states for all tokens
Decode SAE with a single large batched matmul
Build indices for browsing and labeling
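
A sketch of the batched decode step using NumPy, assuming the hidden states for all tokens have already been collected into one (N, D) array and that the SAE nonlinearity is a ReLU. The function name batched_top_k is hypothetical; the app itself may use PyTorch on the GPU instead.

```python
import numpy as np


def batched_top_k(hidden: np.ndarray, w_enc: np.ndarray,
                  b: np.ndarray, k: int):
    """hidden: (N, D) token vectors; w_enc: (F, D); b: (F,).

    Returns (ids, vals), each of shape (N, k), strongest first per token.
    """
    # One (N, D) x (D, F) matmul covers every token in the batch.
    acts = np.maximum(hidden @ w_enc.T + b, 0.0)
    # argpartition finds the k largest per row without a full sort.
    part = np.argpartition(acts, -k, axis=1)[:, -k:]
    part_vals = np.take_along_axis(acts, part, axis=1)
    # Sort only the k kept columns, in descending order.
    order = np.argsort(-part_vals, axis=1)
    ids = np.take_along_axis(part, order, axis=1)
    vals = np.take_along_axis(part_vals, order, axis=1)
    return ids, vals
```

Partitioning before sorting keeps the cost per token close to linear in the number of features, which matters when F is in the tens of thousands.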

Realtime sensing mode (best interactivity)

During generation, hook only the newest token position
Decode SAE for that one token
Display top K features live with a rolling context window
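
The rolling context window can be illustrated with a bounded deque. This is a sketch under assumptions: the class name RollingContext and the event layout are hypothetical, and the hook that produces each token's top K features is left out.

```python
from collections import deque
from typing import Deque, List, Tuple

TokenEvent = Tuple[str, List[Tuple[int, float]]]  # (token, top-K features)


class RollingContext:
    """Keep the most recent `size` token events for live display."""

    def __init__(self, size: int = 32) -> None:
        self.events: Deque[TokenEvent] = deque(maxlen=size)

    def push(self, token: str, top_k: List[Tuple[int, float]]) -> None:
        # deque(maxlen=...) discards the oldest event once the window is full.
        self.events.append((token, top_k))

    def text(self) -> str:
        return "".join(tok for tok, _ in self.events)


window = RollingContext(size=3)
for tok, feats in [("The", [(7, 2.0)]), (" cat", [(9, 1.0)]),
                   (" sat", [(7, 0.4)]), (" down", [(3, 1.2)])]:
    window.push(tok, feats)  # would run once per generated token
```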

Speed notes

TokenSensor cannot be free. You must run tokens to get hidden states, and SAE decode adds compute. Overhead stays bounded by design:

decode only one or a few layers
batch decode whenever possible
keep only top K per token
prefer GPU half precision for SAE decode when available
in realtime mode, decode only the newest token

Labeling and truth

Feature names are hypotheses, not truth. Treat labels as metadata backed by evidence:

label from many top contexts, not a single example
look for clusters inside a feature’s contexts
collect counterexamples where the label should apply but does not
use causal tests (feature ablation or steering) when it matters
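
One common way to run such a causal test is to edit the hidden state along a feature's decoder direction and compare model outputs before and after. The sketch below assumes the SAE also has a decoder matrix with one row per feature, which this README does not describe; both function names are hypothetical.

```python
from typing import List, Sequence


def ablate_feature(h: Sequence[float], activation: float,
                   dec_row: Sequence[float]) -> List[float]:
    """Remove one feature's contribution: h' = h - a_i * W_dec[i]."""
    return [x - activation * d for x, d in zip(h, dec_row)]


def steer_feature(h: Sequence[float], dec_row: Sequence[float],
                  strength: float) -> List[float]:
    """Push the hidden state along a feature: h' = h + strength * W_dec[i]."""
    return [x + strength * d for x, d in zip(h, dec_row)]


h = [1.0, 2.0]
dec_row = [0.5, 0.0]  # assumed decoder direction for the feature under test
ablated = ablate_feature(h, activation=2.0, dec_row=dec_row)
steered = steer_feature(h, dec_row=dec_row, strength=2.0)
```

If the label is right, ablating should weaken the labeled behavior and steering should strengthen it; no change in either direction is evidence against the label.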

Coaxing features for exploration

TokenSensor supports targeted exploration, with a strict evidence mindset:

Probe search: find where a feature fires in real datasets and read contexts (evidence)
Feature-targeted generation: bias sampling toward continuations that increase a chosen feature (microscope, not proof)
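
Biasing the sampler could look roughly like the following: each candidate token's log-probability is shifted by a multiple of the feature activation it would produce, then renormalized. This is an illustrative scheme, not necessarily what the app does; biased_probs and alpha are hypothetical names, and measuring the per-candidate activation is left out.

```python
import math
from typing import List, Sequence


def biased_probs(logprobs: Sequence[float],
                 feature_acts: Sequence[float],
                 alpha: float) -> List[float]:
    """Softmax over logprob + alpha * activation for each candidate token."""
    scores = [lp + alpha * a for lp, a in zip(logprobs, feature_acts)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Two equally likely candidates; only the second activates the feature.
plain = biased_probs([-1.0, -1.0], [0.0, 1.0], alpha=0.0)
pushed = biased_probs([-1.0, -1.0], [0.0, 1.0], alpha=1.0)
```

With alpha set to 0 the distribution is unchanged. Larger values trade fluency for activation, which is why the output is a microscope for exploration rather than evidence of what the feature means.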

Data contract

Datasets

.jsonl with { "text": "..." } or { "prompt": "..." }
.txt with one prompt per line

SAEs

pretrained SAE weights compatible with the chosen hook point
config describing D, number of features, nonlinearity, and target layer
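
As a sketch of what such a config might carry, the dataclass below holds the four listed items and checks that an encoder weight shape matches them. The field names and the (features, D) weight layout are assumptions, not the format the app reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SAEConfig:
    """Hypothetical SAE config; field names are illustrative only."""
    d_model: int        # D, the hidden vector size at the hook point
    n_features: int     # number of SAE features
    nonlinearity: str   # e.g. "relu"
    layer: int          # target layer of the hook point

    def check_encoder_shape(self, rows: int, cols: int) -> None:
        # Assumed layout: one encoder row per feature, D columns.
        if (rows, cols) != (self.n_features, self.d_model):
            raise ValueError(
                f"encoder is {rows}x{cols}, expected "
                f"{self.n_features}x{self.d_model}")


cfg = SAEConfig(d_model=768, n_features=24576, nonlinearity="relu", layer=6)
cfg.check_encoder_shape(24576, 768)  # passes silently when shapes agree
```

A shape check like this catches an SAE trained for a different model or hook point before any tokens are decoded.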

Outputs

per token top K feature records
feature and span indices for fast browsing and search
