Streamlit GUI for exploring neuron activations with a streaming concept labeler, active probing, and a realtime map view inspired by the real‑time neuron labeling demo.
- Live neuron labeling with confidence tracking, trends, and auto‑explore
- Active probing to focus on under‑sampled concepts
- Unit inspection + activation dictionary (top prompts per unit)
- Interactive map with prompt projection; 3D map in All layers mode
- Token‑level Sparse Autoencoder decoding (offline + realtime)
- Storage tools: rebuild indices, purge decoded databases
- Works with a built‑in toy model, TorchScript models, and Transformers safetensors models
- Windows 10
- Python 3.10+
Use the bundled runner (installs requirements and launches the app):
.\run.bat
Manual setup:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
pip install -r requirements-hf.txt
streamlit run app.py
Optional TorchScript support only:
pip install -r requirements-torch.txt
All settings live in the Streamlit sidebar and are saved to settings.json on exit.
To publish, keep settings.json out of Git and copy defaults from settings.example.json.
- Model selection, device, dtype, and attention implementation
- Concept list or generated concept dictionary
- Active probing and auto-explore controls
- Map settings and prompt mapper options
- Token‑level Sparse Autoencoder decoding options and storage paths
streamlit run app.py
Transformers: enter a Hugging Face model id or a local folder path containing .safetensors, tokenizer, and config files.
Local model folders are ignored in Git via .gitignore.
Use the Token Sparse Autoencoder tab to:
- Run offline dataset passes (batch decode, top‑K per token, sqlite output)
- Run realtime decoding during generation (newest token only)
- Build indices for feature → contexts and span → features
Dataset formats:
- .jsonl with { "text": "..." } or { "prompt": "..." }
- .txt with one prompt per line
Decoded data lives under token_out_dir/decoded.sqlite and can be purged from the Storage tab.
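As an illustration of the two dataset formats above, here is a minimal sketch of a prompt reader. The function names `parse_prompts` and `iter_prompts` are hypothetical and not taken from the app, and preferring "text" over "prompt" when both keys are present is an assumption.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def parse_prompts(lines: Iterable[str], suffix: str) -> Iterator[str]:
    """Yield one prompt per non-empty line, for '.jsonl' or '.txt' input."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if suffix == ".jsonl":
            row = json.loads(line)
            # Assumption: "text" takes priority when both keys are present.
            prompt = row.get("text") or row.get("prompt")
            if prompt:
                yield prompt
        elif suffix == ".txt":
            yield line
        else:
            raise ValueError(f"unsupported dataset format: {suffix}")


def iter_prompts(path: str) -> Iterator[str]:
    """Open a dataset file and stream its prompts."""
    suffix = Path(path).suffix.lower()
    with open(path, "r", encoding="utf-8") as fh:
        yield from parse_prompts(fh, suffix)
```

Streaming line by line keeps memory use flat for large datasets, which matters when the prompts are later consumed in batches.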
Start by reading the Streamlit console output.
- HFValidationError: Repo id must use alphanumeric...
  Hidden characters or an invalid local path. Re-type the model path and delete settings.json if it persists.
- CUDA error: an illegal memory access was encountered
  Switch Device to CPU, set Attention to eager, lower batch size / max tokens, or disable All layers.
- expected mat1 and mat2 to have the same dtype
  Set Torch dtype to float16 or float32 and keep it consistent with the device.
- TypeError: set_autocast_dtype(): ... must be torch.dtype, not str
  Ensure Torch dtype is set to auto, float16, bfloat16, or float32 in the UI.
- SVD did not converge (map)
  Reduce map jitter/spread or run a few more steps to stabilize activations.
- Map/graphs empty
  Click Run to generate data; the map updates after steps are collected.
TokenSensor (Experimental)
TokenSensor is an experimental interpretability mode that treats a language model like a running system and exposes per-token telemetry.
Every token that flows through the model produces hidden states at each layer. TokenSensor attaches a small recorder at a chosen hook point (usually a residual stream) and translates each token hidden state into a sparse, human-inspectable feature readout using a pretrained Sparse Autoencoder (SAE). This reframes mapping as building an index over token events, more like profiling and debugging than after-the-fact inspection.
For each token position, TokenSensor stores a compact record:
- token id and decoded token string
- position in the sequence
- hook point (layer and stream)
- top K SAE features and activation values
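A compact record of this shape could be sketched as a dataclass stored in sqlite. The table name, column names, and the hook string format below are hypothetical; the actual schema of decoded.sqlite is not documented here.

```python
import json
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class TokenRecord:
    token_id: int
    token_str: str
    pos: int                   # position in the sequence
    hook: str                  # e.g. "layer12.resid" (naming is an assumption)
    feature_ids: List[int]     # top K SAE feature ids
    feature_vals: List[float]  # matching activation values


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) record table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS token_records ("
        "token_id INTEGER, token_str TEXT, pos INTEGER, hook TEXT, "
        "feature_ids TEXT, feature_vals TEXT)"
    )
    return conn


def save_record(conn: sqlite3.Connection, rec: TokenRecord) -> None:
    """Store one record; the top K lists are serialized as JSON text."""
    conn.execute(
        "INSERT INTO token_records VALUES (?, ?, ?, ?, ?, ?)",
        (rec.token_id, rec.token_str, rec.pos, rec.hook,
         json.dumps(rec.feature_ids), json.dumps(rec.feature_vals)),
    )
```

Storing the top K lists as JSON text keeps the sketch short; a schema with one row per (token, feature) pair would be easier to index by feature.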
From this stream you can build:
- feature to top activating contexts
- prompt span to dominant features
- feature co-occurrence graphs
- feature exemplars for labeling and verification
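As a rough sketch of how two of these structures could be derived from the record stream, the example below builds a feature to top contexts index and a co-occurrence count. The record layout and the sample values are invented for illustration only.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical records: (prompt_id, pos, token, [(feature_id, activation), ...])
records = [
    (0, 0, "The", [(7, 0.9), (3, 0.2)]),
    (0, 1, " cat", [(7, 1.4), (11, 0.8)]),
    (1, 0, "A", [(3, 0.5), (11, 0.1)]),
]


def build_feature_index(records, top_n=2):
    """Map feature id -> its top_n strongest contexts, strongest first."""
    index = defaultdict(list)
    for prompt_id, pos, token, feats in records:
        for fid, act in feats:
            index[fid].append((act, prompt_id, pos, token))
    return {fid: sorted(hits, reverse=True)[:top_n] for fid, hits in index.items()}


def build_cooccurrence(records):
    """Count how often two features share the same token's top K set."""
    counts = Counter()
    for _, _, _, feats in records:
        ids = sorted({fid for fid, _ in feats})
        for a, b in combinations(ids, 2):
            counts[(a, b)] += 1
    return counts
```

The co-occurrence counts can serve as edge weights for a feature graph, and the per-feature context lists double as exemplars for labeling.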
At a chosen hook point, for each token hidden vector h with size D:
- Run the model to obtain h
- Decode SAE activations a = f(W_enc · h + b)
- Keep only top K activations for that token
- Store (token, pos, topK feature ids, topK values)
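The steps above can be sketched in plain Python for a single token. This is a toy illustration, not the app's implementation: f is assumed to be ReLU (the real nonlinearity comes from the SAE config), W_enc is assumed to hold one row of length D per feature, and a real decoder would use batched tensor operations rather than Python lists.

```python
import heapq
from typing import Callable, List, Sequence, Tuple


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def sae_encode(
    h: Sequence[float],
    W_enc: Sequence[Sequence[float]],
    b: Sequence[float],
    f: Callable[[float], float] = relu,  # assumption: ReLU nonlinearity
) -> List[float]:
    """Compute a = f(W_enc · h + b), one activation per feature."""
    return [
        f(sum(w * x for w, x in zip(row, h)) + bias)
        for row, bias in zip(W_enc, b)
    ]


def top_k(a: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return up to k positive activations as (feature_id, value), largest first."""
    best = heapq.nlargest(k, enumerate(a), key=lambda pair: pair[1])
    return [(fid, val) for fid, val in best if val > 0.0]


# Toy example with D = 2 and 4 features.
h = [1.0, 2.0]
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [0.5, 0.5]]
b = [0.0, 0.0, 0.0, 0.25]

a = sae_encode(h, W_enc, b)
kept = top_k(a, k=2)
ids = [fid for fid, _ in kept]
vals = [val for _, val in kept]
record = ("cat", 0, ids, vals)  # (token, pos, topK feature ids, topK values)
```

Here `a` is `[1.0, 2.0, 0.0, 1.75]`, so the stored record keeps only features 1 and 3 and drops the rest.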
The point is sparsity. You do not store everything; you store only the most informative slice.
- Run a dataset of prompts in batches
- Collect hidden states for all tokens
- Decode SAE with a single large batched matmul
- Build indices for browsing and labeling
- During generation, hook only the newest token position
- Decode SAE for that one token
- Display top K features live with a rolling context window
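The rolling context window in realtime mode could be kept with a bounded deque, as in the sketch below. The class name `RollingFeatureView` and its methods are hypothetical, and the feature values would come from decoding only the newest token's hidden state.

```python
from collections import deque
from typing import List, Tuple


class RollingFeatureView:
    """Keep the last `window` tokens with their top K features for display."""

    def __init__(self, window: int = 8) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.window = deque(maxlen=window)

    def push(self, token: str, top_features: List[Tuple[int, float]]) -> None:
        """Record the newest token and its (feature_id, value) pairs."""
        self.window.append((token, top_features))

    def render(self) -> List[str]:
        """Return one display line per token, oldest first."""
        return [
            f"{tok!r}: " + ", ".join(f"f{fid}={val:.2f}" for fid, val in feats)
            for tok, feats in self.window
        ]


view = RollingFeatureView(window=2)
view.push("The", [(3, 0.5)])
view.push(" cat", [(7, 1.4), (11, 0.8)])
view.push(" sat", [(7, 1.2)])  # "The" falls out of the window
lines = view.render()
```

Because the window is bounded, per-step display cost stays constant no matter how long generation runs.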
TokenSensor cannot be free. You must run tokens to get hidden states, and SAE decode adds compute. Overhead stays bounded by design:
- decode only one or a few layers
- batch decode whenever possible
- keep only top K per token
- prefer GPU half precision for SAE decode when available
- in realtime decode only the newest token
Feature names are hypotheses, not truth. Treat labels as metadata backed by evidence:
- label from many top contexts, not a single example
- look for clusters inside a feature’s contexts
- collect counterexamples where the label should apply but does not
- use causal tests (feature ablation or steering) when it matters
TokenSensor supports targeted exploration, with a strict evidence mindset:
- Probe search: find where a feature fires in real datasets and read contexts (evidence)
- Feature-targeted generation: bias sampling toward continuations that increase a chosen feature (microscope, not proof)
- .jsonl with { "text": "..." } or { "prompt": "..." }
- .txt with one prompt per line
- pretrained SAE weights compatible with the chosen hook point
- config describing D, number of features, nonlinearity, and target layer
- per-token top K feature records
- feature and span indices for fast browsing and search
