Discourse Explorer

Scrape, analyze, and explore discussions from any Discourse forum.

What it does

1. Scrape: download all topics, posts, and metadata as structured JSON. Supports delta sync.
2. Analyze: DuckDB analytics (tag distribution, top contributors, activity trends, keyword search, SQL REPL).
3. Discover: derive an entity-type vocabulary tailored to your forum by sampling topics with an LLM. Drives extraction quality in step 4.
4. Query: ask natural-language questions using a local GraphRAG knowledge graph (LightRAG + OpenAI/Ollama).
5. Visualize: interactive HTML graph explorer showing entities, relationships, and communities.
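
As an illustration of what "delta sync" in step 1 can mean, here is a minimal sketch of deciding which topics to re-download. It is not taken from this project's scraper: the function name `topics_needing_refresh`, the local index shape, and the choice of the `bumped_at` timestamp as the change marker are all assumptions made for the example.

```python
from datetime import datetime
from typing import Iterable, List, Mapping


def _parse(ts: str) -> datetime:
    # Timestamps like "2024-01-02T00:00:00.000Z"; normalise the trailing "Z"
    # so datetime.fromisoformat accepts it on Python 3.10.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def topics_needing_refresh(
    remote_topics: Iterable[Mapping[str, object]],
    local_bumped_at: Mapping[int, str],
) -> List[int]:
    """Return ids of topics that are new or changed since the last sync.

    remote_topics: topic summaries with "id" and "bumped_at" (assumed fields).
    local_bumped_at: hypothetical local index, topic id -> last seen bumped_at.
    """
    stale = []
    for topic in remote_topics:
        topic_id = int(topic["id"])
        seen = local_bumped_at.get(topic_id)
        # Unknown topic, or the forum reports newer activity than we stored.
        if seen is None or _parse(str(topic["bumped_at"])) > _parse(seen):
            stale.append(topic_id)
    return stale
```

Only the returned ids would need to be fetched again; everything else can be reused from the previous run.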

Everything runs locally. The only network access is the initial scrape and, if you opt for OpenAI, the LLM calls.

Setup

Requires Python ≥ 3.10 and uv. For GraphRAG features, also install Ollama or provide an OpenAI API key.

uv sync

Try the demo (no Discourse, no LLM)

A 588 KB committed fixture under sample/fixtures/seed42-tiny/ carries a full deterministic forum (33 topics / 116 posts) plus the minimum GraphRAG artefacts the offline tools need. Try the analyzer + visualizer end-to-end without scraping anything:

uv run discourse-explorer stats --path sample/fixtures/seed42-tiny categories
uv run discourse-explorer visualize sample/fixtures/seed42-tiny --open

The fixture comes from the synthetic-forum seeder under sample/ — see sample/README.md for the Docker-stack path that lets you regenerate it locally and test the live init / extend paths against a real Discourse instance.

Configuration in two tiers

A single checkout supports multiple forums: the project root holds a one-line selector, and each forum has its own config directory.

# 1. Selector at project root (one line, points at whichever forum is "active")
echo 'DISCOURSE_DATA_DIR=./data/my-forum' > .env

# 2. Per-forum config (URL, auth, models, gleaning — all env vars for this corpus)
mkdir -p ./data/my-forum/config
cp discourse_explorer/config/env.example ./data/my-forum/config/.env
# edit ./data/my-forum/config/.env

Priority when both dotenv files set the same key: data-dir wins. Shell exports override both. CLI flags override everything.
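
The precedence rules above can be sketched as a lookup over four layers, highest priority first. This is only an illustration of the stated ordering, not the project's loader: the function `resolve_setting` and its parameters are hypothetical, and each layer is modelled as a plain mapping.

```python
from typing import Mapping, Optional


def resolve_setting(
    key: str,
    root_env: Mapping[str, str],               # project-root .env (hypothetical model)
    data_dir_env: Mapping[str, str],           # <data-dir>/config/.env
    shell_env: Mapping[str, str],              # variables exported in the shell
    cli_flags: Mapping[str, Optional[str]],    # flags given on the command line
) -> Optional[str]:
    """Return the highest-priority value for key, or None if no layer sets it."""
    # Order mirrors the README: CLI flags > shell exports > data-dir .env > root .env.
    for layer in (cli_flags, shell_env, data_dir_env, root_env):
        value = layer.get(key)
        if value is not None:
            return value
    return None


root = {"DISCOURSE_URL": "https://root.example.com"}
forum = {"DISCOURSE_URL": "https://forum.example.com"}
# Both dotenv files set the key, so the data-dir value wins.
print(resolve_setting("DISCOURSE_URL", root, forum, {}, {}))
```

Modelling the layers as an ordered tuple keeps the rule in one place: changing the precedence means reordering one line.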

Full env-var reference and layering rules: docs/analysis/vocabulary-and-config.md.

Authentication

Edit <data-dir>/config/.env and pick one:

| Method | Env vars | Notes |
| --- | --- | --- |
| API key (preferred) | `DISCOURSE_API_KEY` + `DISCOURSE_API_USERNAME` | Generate at Discourse Admin → API → New API Key. |
| Session cookie (fallback) | `DISCOURSE_COOKIE` | F12 → Cookies → copy `_t` value. Expires in a few weeks. |
| OIDC / Keycloak | `DISCOURSE_USERNAME` + `DISCOURSE_PASSWORD` | Automated SSO. May not work with all setups. |

Priority at runtime: API key ≻ cookie ≻ OIDC.
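
A minimal sketch of that runtime priority, assuming a method counts as configured only when all of its variables are non-empty. The function name `pick_auth_method` and the returned labels are hypothetical; only the environment variable names come from the table above.

```python
from typing import Mapping, Optional


def pick_auth_method(env: Mapping[str, str]) -> Optional[str]:
    """Return "api_key", "cookie", "oidc", or None, following API key > cookie > OIDC."""
    # API key auth needs both the key and the username it acts as.
    if env.get("DISCOURSE_API_KEY") and env.get("DISCOURSE_API_USERNAME"):
        return "api_key"
    # The session cookie is the fallback when no complete API key pair is set.
    if env.get("DISCOURSE_COOKIE"):
        return "cookie"
    # OIDC login is tried last and needs both username and password.
    if env.get("DISCOURSE_USERNAME") and env.get("DISCOURSE_PASSWORD"):
        return "oidc"
    return None
```

With this ordering, leaving a stale `DISCOURSE_COOKIE` in the file is harmless once a complete API key pair is present.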

Also set DISCOURSE_URL=https://discourse.example.com in the same file so the scraper can run without the forum URL being passed on the command line.

Tools at a glance

| Tool | Purpose | Reference |
| --- | --- | --- |
| `scrape` | Download topics + posts + metadata; delta sync | Manual §1 |
| `stats` | DuckDB analytics + SQL REPL | Manual §2 |
| `discover-types` | Distill an entity-type vocabulary from sampled topics | Manual §3 (Discover) |
| `query` | Build the knowledge graph (`--index`) and ask questions | Manual §3 (Build · Ask) |
| `visualize` | Render the interactive HTML graph explorer | Manual §4 |
| Claude Code skills | Slash commands for end-to-end workflows | Manual (Guided workflows) |

Documentation

docs/MANUAL.md — per-tool usage reference: CLI flags, env vars, examples, the end-to-end workflow.
CLAUDE.md — maintainer-facing map of the codebase and invariants.
docs/analysis/ — deep-dives on indexing, canonicalization, visualization, configuration.
docs/lightrag/ — read before editing query.py or discover_types.py.
docs/discourse/ — Discourse JSON shape + terminology.
docs/ideas/ — forward-looking proposals.
