TinyDocy uses a multi-layer testing strategy covering schema correctness, feature behavior, performance, long-running stability, and real-time collaboration at scale.
Recent additions:
- Configurable performance tests: `make test-perf` (single-user typing latency; `PERF_HEADINGS`, `PERF_SHAPE`) and `make test-perf-collab` (multi-user concurrent typing on a shared document; `PERF_COLLAB_USERS`, `PERF_COLLAB_HEADINGS`, `PERF_COLLAB_SHAPE`). See the Performance Tests and Multi-User Performance Tests sections below.
- All test layers write JSON reports to `test-reports/` when run via Make targets.
- Collaboration tests use `retries: 2` and extended waits to reduce flakiness.
- The load harness uses `WebSocketPolyfill` (with a type assertion) for Node.js compatibility.
Prerequisites: Dev servers must be running for Playwright tests.
make dev

| Command | What it runs | Duration |
|---|---|---|
| `make test` | Vitest unit + fuzz + schema + plugin tests | ~2s |
| `make test-fuzz` | Chaos and structured fuzz tests | ~1s |
| `make test-stress` | Headless stress probe (binary search for heading ceiling) | ~30s |
| `make test-e2e` | Playwright chromium E2E suite (features, performance, collaboration) | ~90s |
| `make test-perf` | Typing latency tests only (default 10, 50 headings; `PERF_HEADINGS=200` for custom) | ~15s |
| `make test-perf-collab` | Multi-user typing latency on shared doc (default 2 users, 50 headings) | ~30s |
| `make test-load` | Yjs load harness (100 headless clients, convergence) | ~40s |
| `make test-soak-quick` | Single-user soak (5 min, 50 headings, rich content) | ~6 min |
| `make test-soak-collab-quick` | Multi-user soak (30s, 3 users, 10 headings) | ~50s |
| `make test-soak` | Full soak suite (single + multi-user, 30 min) | ~35 min |
| `make test-yjs-soak` | Yjs reconnection + rapid tab switch | ~20s |

tests/
├── setup.ts # Vitest global setup (jsdom, localStorage mock)
├── helpers/
│ ├── create-test-editor.ts # Headless Tiptap Editor factory
│ ├── document-builders.ts # ProseMirror node builders (via prosemirror-test-builder)
│ ├── document-generators.ts # Random and large document generators
│ ├── env-parsers.ts # Generic parseEnvNumber (?? semantics)
│ ├── perf-config.ts # PERF_* env parsing (parsePerfShape, parsePerfHeadings)
│ ├── report-writer.ts # writeReport(filename, data) → test-reports/
│ ├── soak-config.ts # SOAK_* env parsing (parseSoakDuration, parseSoakHeadings, etc.)
│ ├── assert-invariants.ts # Schema invariant assertions (Vitest)
│ └── assert-invariants-json.ts # Schema invariant assertions (Playwright-safe, no vitest dep)
├── fixtures/ # Static JSON document fixtures
├── unit/
│ ├── schema/ # TitleDocument enforcement, paste handling, schema invariants
│ ├── plugins/ # HeadingScale, HeadingFold, HeadingDrag, HeadingFilter state
│ ├── helpers/ # canMapDecorations, computeSection, fold-storage, filter-url, move-section, tab-api
│ ├── fuzz/ # Chaos fuzz (5000 ops) + structured schema fuzz (5000 ops)
│ └── stress/ # Headless stress probe (binary search for heading ceiling)
├── e2e/
│ ├── helpers/
│ │ ├── editor-page.ts # Playwright Page Object Model for the editor
│ │ ├── playwright-modifier-key.ts # Meta vs Control for keyboard shortcuts
│ │ ├── perf-observer.ts # PerformanceObserver injection + latency stats
│ │ ├── soak-bot.ts # Weighted-random action generator (10 action types)
│ │ └── soak-journeys.ts # 2 scripted critical-path sequences
│ ├── collaboration.spec.ts # 2-user collaboration (join + real-time sync)
│ ├── drag.spec.ts # Heading drag handle visibility
│ ├── filter.spec.ts # Heading filter open + query + highlight
│ ├── fold.spec.ts # Fold chevron + crinkle unfold
│ ├── performance.spec.ts # Keystroke-to-paint latency (10 + 50 headings)
│ ├── performance-collab.spec.ts # Multi-user typing latency on shared doc
│ ├── toc-sidebar.spec.ts # TOC rendering + scroll-to-heading
│ ├── toc-drag.spec.ts # TOC drag-and-drop reorder + level change
│ ├── tabs.spec.ts # Tab bar: regenerate, new tab, close-all
│ ├── soak.spec.ts # Single-user soak (journeys + bot + memory tracking)
│ ├── soak-collab.spec.ts # N-user collaboration soak (dynamic, configurable)
│ └── yjs-soak/
│ ├── reconnection.spec.ts # Kill/restart Hocuspocus mid-session
│ └── rapid-tab-switch.spec.ts # Mount/unmount Yjs providers via tab switching
└── load/
└── yjs-load-harness.ts # Standalone Bun script: N-client Yjs convergence test
All test layers write JSON reports when run via Make targets:
| Report file | Source | Contents |
|---|---|---|
| `unit-report.json` | `make test` | Vitest JSON (suites, tests, durations) |
| `fuzz-report.json` | `make test-fuzz` | Vitest JSON |
| `stress-report.json` | `make test-stress` | Vitest JSON |
| `e2e-report.json` | `make test-e2e` | Playwright JSON (tests, outcomes) |
| `perf-report-{ts}.json` | `make test-perf` | Latency stats per heading count |
| `perf-collab-report-{ts}.json` | `make test-perf-collab` | Multi-user latency stats |
| `load-report-{ts}.json` | `make test-load` | Clients, edits, throughput, convergence |
| `soak-report-{ts}.json` | `make test-soak-quick` | Journeys, memory, latency, verdict |
| `soak-collab-report-{ts}.json` | `make test-soak-collab-quick` | Users, actions, errors |
| `*-playwright-report.json` | soak targets | Playwright JSON for soak runs |
| `yjs-soak-report.json` | `make test-yjs-soak` | Playwright JSON for yjs-soak |

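A minimal sketch of what a `writeReport(filename, data)` helper could look like. The name and the `test-reports/` target come from `tests/helpers/report-writer.ts`; the `{ts}` substitution, the ISO-based timestamp format, and the optional `dir` parameter are assumptions made for illustration, not the project's actual implementation.

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Assumed output directory, matching the test-reports/ folder named in this document.
const REPORT_DIR = "test-reports";

/** Replace a `{ts}` placeholder with a filesystem-safe timestamp (assumed format). */
function resolveReportName(template: string, now: Date = new Date()): string {
  const ts = now.toISOString().replace(/[:.]/g, "-");
  return template.replace("{ts}", ts);
}

/** Write `data` as pretty-printed JSON under the report directory and return the path. */
function writeReport(filename: string, data: unknown, dir: string = REPORT_DIR): string {
  mkdirSync(dir, { recursive: true });
  const path = join(dir, resolveReportName(filename));
  writeFileSync(path, JSON.stringify(data, null, 2) + "\n", "utf8");
  return path;
}
```

A call such as `writeReport("perf-report-{ts}.json", stats)` would then produce one file per run instead of overwriting the previous report.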
Fast, headless tests that validate the document model, ProseMirror plugin state, and utility functions without a browser.
make test # all unit tests
make test-fuzz # fuzz tests only
make test-stress # stress probe only

Schema tests verify the `heading block*` schema (TitleDocument extension): the first node is always an H1, heading levels stay within 1-6, and paste handling preserves these invariants. Each run generates and validates 10 random documents.
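The invariants named above (first node is an H1, heading levels 1-6) can be checked on a document's JSON form without any test framework. The sketch below assumes the default Tiptap node names `doc` and `heading` and only inspects top-level nodes; the function name `assertDocInvariants` is hypothetical.

```typescript
// Minimal shape of a ProseMirror/Tiptap JSON node, reduced to what the checks need.
interface JsonNode {
  type: string;
  attrs?: { level?: number };
  content?: JsonNode[];
}

/** Throw a plain Error when a document breaks the `heading block*` invariants. */
function assertDocInvariants(doc: JsonNode): void {
  if (doc.type !== "doc") {
    throw new Error(`expected root type "doc", got "${doc.type}"`);
  }
  const children = doc.content ?? [];
  const first = children[0];
  if (!first || first.type !== "heading" || first.attrs?.level !== 1) {
    throw new Error("first node must be an H1 heading");
  }
  children.forEach((node, index) => {
    if (node.type !== "heading") return; // only headings carry a level
    const level = node.attrs?.level;
    if (typeof level !== "number" || !Number.isInteger(level) || level < 1 || level > 6) {
      throw new Error(`heading at index ${index} has invalid level ${String(level)}`);
    }
  });
}
```

Throwing a plain `Error` rather than calling an assertion library keeps the check usable from both Vitest and Playwright.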
Plugin tests assert the internal state of HeadingScale, HeadingFold, HeadingFilter, and HeadingDrag after dispatching specific transactions. They validate decoration rebuilds, fold toggling, filter matching, canMapDecorations fast-path behavior, and moveSection atomicity (reorder, level change, combined, undo).
- Chaos fuzz: 5000 random malformed operations (invalid positions, null content, out-of-range levels) — verifies the editor never crashes
- Schema fuzz: 5000 random valid operations (insert heading, delete, split, join) — verifies invariants hold after each
The stress probe runs a binary search for the heading-count ceiling at which ProseMirror transaction time exceeds 16ms (one frame budget). It tests 5 transaction types (text insert, heading insert, heading delete, fold toggle, filter apply) on flat and deep document shapes and reports the ceiling; the expected minimum is 200 headings.
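The search itself can be sketched independently of the editor. The version below assumes transaction cost does not decrease as headings are added, and takes the measurement as a callback; `findHeadingCeiling` and the synthetic cost model are hypothetical stand-ins for the real probe.

```typescript
const FRAME_BUDGET_MS = 16;

/**
 * Find the largest heading count in [low, high] whose measured cost stays within budget.
 * Assumes cost is non-decreasing in the heading count.
 * Returns low - 1 when even `low` is over budget.
 */
function findHeadingCeiling(
  measureMs: (headings: number) => number,
  low: number,
  high: number,
  budgetMs: number = FRAME_BUDGET_MS,
): number {
  let best = low - 1;
  let lo = low;
  let hi = high;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (measureMs(mid) <= budgetMs) {
      best = mid; // mid fits in the frame budget, try larger documents
      lo = mid + 1;
    } else {
      hi = mid - 1; // mid is too slow, try smaller documents
    }
  }
  return best;
}

// Synthetic cost model standing in for a real transaction timing: 1 ms per 20 headings.
const syntheticCeiling = findHeadingCeiling((headings) => headings / 20, 1, 2000);
console.log(syntheticCeiling); // 320
```

In a real probe the callback would build a document of the given size, dispatch one transaction, and return the elapsed time; taking a median over several runs helps against timer noise.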
Browser-based feature verification using Chromium. Tests interact with the real editor via the EditorPage page object model.
make test-e2e

| Test | Validates |
|---|---|
| `drag.spec.ts` | Hovering a heading shows the drag handle |
| `filter.spec.ts` | CMD+SHIFT+F opens filter; typing highlights matches |
| `fold.spec.ts` | Fold chevron hides content; crinkle click unfolds |
| `toc-sidebar.spec.ts` | TOC lists all headings; clicking scrolls to heading |
| `toc-drag.spec.ts` | TOC drag handle visibility; title H1 not draggable; drag reorder moves editor sections; folded section drag handle |

Measure keystroke-to-paint latency using the Event Timing API via PerformanceObserver. Reports p50/p95/mean/max. Config is via env vars; the Makefile serves as the CLI interface (e.g. make test-perf PERF_HEADINGS=200).
make test-perf # default: 10, 50 headings, flat shape
make test-perf PERF_HEADINGS=200 # single size
make test-perf PERF_HEADINGS=10,50,200 # multiple sizes
make test-perf PERF_SHAPE=mixed # random heading hierarchy (H2–H6)
make test-perf PERF_SHAPE=deep PERF_HEADINGS=50 # deep shape + 50 headings

| Variable | Default | Description |
|---|---|---|
| `PERF_HEADINGS` | `10,50` | Comma-separated heading counts to test |
| `PERF_SHAPE` | `flat` | Document hierarchy: `flat` (all H2), `deep` (cycling H2–H6), `mixed` (random H2–H6) |

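A minimal sketch of how the two `PERF_*` variables could be parsed. The helper names `parsePerfHeadings` and `parsePerfShape` appear in `tests/helpers/perf-config.ts`, but the signatures and the fall-back-on-invalid-input behavior shown here are assumptions.

```typescript
type PerfShape = "flat" | "deep" | "mixed";
const PERF_SHAPES: readonly PerfShape[] = ["flat", "deep", "mixed"];

/** Parse a list such as "10,50,200"; use the fallback when unset or invalid. */
function parsePerfHeadings(raw: string | undefined, fallback: number[] = [10, 50]): number[] {
  if (raw === undefined || raw.trim() === "") return fallback;
  const counts = raw.split(",").map((part) => Number(part.trim()));
  const valid = counts.every((n) => Number.isInteger(n) && n > 0);
  return valid ? counts : fallback;
}

/** Type guard: is the string one of the documented shapes? */
function isPerfShape(value: string): value is PerfShape {
  return (PERF_SHAPES as readonly string[]).indexOf(value) !== -1;
}

/** Accept only the documented shapes; anything else falls back to "flat". */
function parsePerfShape(raw: string | undefined, fallback: PerfShape = "flat"): PerfShape {
  return raw !== undefined && isPerfShape(raw) ? raw : fallback;
}
```

In a spec file these would typically be called as `parsePerfHeadings(process.env.PERF_HEADINGS)` and `parsePerfShape(process.env.PERF_SHAPE)`. Whether invalid input should fall back silently or fail the run is a design choice this sketch does not settle.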
Measures keystroke-to-paint latency when multiple users type concurrently in the same document. Each user types in a different paragraph to avoid conflicts. Reports per-user p50/p95 and aggregate stats.
make test-perf-collab # default: 2 users, 50 headings
make test-perf-collab PERF_COLLAB_USERS=3 # 3 concurrent users
make test-perf-collab PERF_COLLAB_HEADINGS=100 # larger document
make test-perf-collab PERF_COLLAB_SHAPE=mixed # random heading hierarchy

| Variable | Default | Description |
|---|---|---|
| `PERF_COLLAB_USERS` | `2` | Number of concurrent users typing |
| `PERF_COLLAB_HEADINGS` | `50` | Document heading count |
| `PERF_COLLAB_SHAPE` | `flat` | Document hierarchy: `flat`, `deep`, `mixed` |

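The p50/p95/mean/max figures, per user and in aggregate, can be derived from raw latency samples roughly as follows. This sketch uses the nearest-rank percentile method, which is an assumption; the actual `perf-observer.ts` helper may interpolate instead. All names are hypothetical.

```typescript
interface LatencyStats {
  p50: number;
  p95: number;
  mean: number;
  max: number;
}

/** Nearest-rank percentile over an ascending-sorted, non-empty array. */
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

/** Summarize one user's keystroke-to-paint samples (milliseconds). */
function summarize(samples: number[]): LatencyStats {
  if (samples.length === 0) throw new Error("no latency samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const sum = sorted.reduce((acc, value) => acc + value, 0);
  return {
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    mean: sum / sorted.length,
    max: sorted[sorted.length - 1],
  };
}

/** Per-user stats plus an aggregate computed over all samples pooled together. */
function summarizeUsers(perUser: number[][]): { users: LatencyStats[]; aggregate: LatencyStats } {
  const users = perUser.map((samples) => summarize(samples));
  const pooled = perUser.reduce<number[]>((acc, samples) => acc.concat(samples), []);
  return { users, aggregate: summarize(pooled) };
}
```

Pooling the samples before computing the aggregate avoids averaging percentiles, which would understate tail latency when one user is much slower than the others.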
`collaboration.spec.ts` contains two tests with distinct strategies:
- "user B sees content created by user A": User A types first and waits 4s for Hocuspocus persistence, then User B joins. Uses the precondition `expect(.tiptap).not.toHaveText("")` before the main assertion. May be flaky; `retries: 2`.
- "real-time sync between two connected users": both users are connected simultaneously. User A types, and the test waits 2s before asserting on User B. Inherently timing-sensitive; `retries: 2`.

Each user gets a unique identity via window.__HOCUS_TOKEN injected through page.addInitScript().
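A sketch of the injection step. Playwright's `page.addInitScript(script, arg)` runs the given function in the page before any application script, which is what lets the editor read the token on startup. To keep the example free of imports it uses a structural stand-in for `Page`; the interface name, helper names, and token format are hypothetical.

```typescript
// Structural stand-in for the one Playwright `Page` method used here, so the sketch
// type-checks without importing @playwright/test.
interface InitScriptTarget {
  addInitScript<Arg>(script: (arg: Arg) => void, arg: Arg): Promise<void>;
}

/** Build a per-user token; this format is a hypothetical choice for illustration. */
function makeUserToken(userIndex: number, docId: string): string {
  return `e2e-user-${userIndex}-${docId}`;
}

/** Register a script that sets window.__HOCUS_TOKEN before the app boots. */
async function injectHocusToken(page: InitScriptTarget, token: string): Promise<void> {
  await page.addInitScript((value: string) => {
    // In the browser, globalThis is the window object.
    (globalThis as unknown as { __HOCUS_TOKEN?: string }).__HOCUS_TOKEN = value;
  }, token);
}
```

In a spec this would be called once per browser context, before `page.goto()`, for example `await injectHocusToken(pageA, makeUserToken(0, docId))`.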
Sustained editing sessions that detect memory leaks, latency degradation, and schema corruption over time. Documents use realistic content (paragraphs, bullet lists, ordered lists, task lists, code blocks, blockquotes per section).
make test-soak-collab-quick # 30s, 3 users — fast verification
make test-soak-quick # 5 min single-user
make test-soak # full suite (30 min)

Single-user soak (`soak.spec.ts`):
- Runs 2 scripted journeys (typing + structural edits, filter lifecycle)
- Warm-up phase with stochastic bot
- Sustained editing with memory sampling every 30s
- Reports: JS heap growth, p50/p95 latency, action counts, error count
- Writes JSON report to `test-reports/soak-report-{timestamp}.json`

Multi-user soak (`soak-collab.spec.ts`):
- User 0 seeds a rich document, waits for rendering
- Users 1..N-1 join sequentially (each verifies initial sync)
- All N users run stochastic bots concurrently
- Schema invariants checked every 60s per user
- Final convergence assertion: all N documents must be identical
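The stochastic bot is described as a weighted-random action generator. A minimal sketch of weighted selection follows; the action names and weights in the usage note are hypothetical, and the random source is passed in so a run can be made reproducible.

```typescript
interface WeightedAction {
  name: string;
  weight: number;
}

/**
 * Pick one action with probability proportional to its weight.
 * `random` must return a value in [0, 1); it defaults to Math.random.
 */
function pickAction(
  actions: WeightedAction[],
  random: () => number = Math.random,
): WeightedAction {
  const total = actions.reduce((sum, action) => sum + action.weight, 0);
  if (actions.length === 0 || total <= 0) {
    throw new Error("need at least one action with positive weight");
  }
  let threshold = random() * total;
  for (const action of actions) {
    threshold -= action.weight;
    if (threshold < 0) return action;
  }
  // Only reachable through floating-point drift; fall back to the last action.
  return actions[actions.length - 1];
}
```

With `[{ name: "type", weight: 6 }, { name: "fold", weight: 1 }]`, typing would be chosen about six times as often as folding, which keeps the session dominated by realistic edits while still exercising rarer paths.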
Environment variables:
| Variable | Default | Description |
|---|---|---|
| `SOAK_DURATION` | `1800000` (30 min) | How long bots run |
| `SOAK_HEADINGS` | `200` (single) / `20` (collab) | Document heading count |
| `SOAK_USERS` | `3` | Number of concurrent browser users |
| `SOAK_WARMUP` | auto (30s if <10 min, 120s otherwise) | Warm-up duration before memory baseline |
| `SOAK_MEMORY_GROWTH_LIMIT` | `50` | Max heap growth % before fail (single-user soak) |

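The automatic warm-up rule and the heap-growth limit can be expressed as two small functions. This is a sketch under assumptions: it treats the "<10 min" condition as applying to the configured soak duration, and it measures growth against a baseline taken after warm-up. Function names are hypothetical.

```typescript
const TEN_MINUTES_MS = 10 * 60 * 1000;

/** Explicit override wins; otherwise 30s for runs under 10 min, 120s for longer runs. */
function resolveWarmupMs(durationMs: number, overrideMs?: number): number {
  if (overrideMs !== undefined) return overrideMs;
  return durationMs < TEN_MINUTES_MS ? 30_000 : 120_000;
}

/** Heap growth relative to the post-warm-up baseline, in percent. */
function heapGrowthPercent(baselineBytes: number, currentBytes: number): number {
  if (baselineBytes <= 0) throw new Error("baseline must be positive");
  return ((currentBytes - baselineBytes) / baselineBytes) * 100;
}

/** True when growth is strictly above the limit (default 50%). */
function exceedsGrowthLimit(
  baselineBytes: number,
  currentBytes: number,
  limitPercent: number = 50,
): boolean {
  return heapGrowthPercent(baselineBytes, currentBytes) > limitPercent;
}
```

Taking the baseline after warm-up matters because initial allocations (editor setup, caches, JIT) would otherwise count as growth and trip the limit on a healthy run.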
Targeted tests for collaboration-layer resilience.
make test-yjs-soak

`reconnection.spec.ts` spawns a dedicated Hocuspocus instance on port 1235 (`bun run hocus`), types content, kills the server, restarts it with SQLite persistence, and verifies that no data was lost. It uses `window.__HOCUS_URL` to redirect the editor's WebSocket.
`rapid-tab-switch.spec.ts` creates multiple tabs via the tab bar UI, switches between them 20 times, types in each, and verifies that content persists. It exercises the Y.Doc cache lifecycle in `useYjsDocument`.
Headless N-client Yjs convergence test — no browser required. Uses Node.js ws package via WebSocketPolyfill (type assertion required; Hocuspocus runtime supports it when passing url directly, but TypeScript types omit it). CLI args override env vars (industry pattern: k6 K6_*, Benchmark BENCHMARK_*).
make test-load # 100 clients, 30s, distributed
make test-load LOAD_CLIENTS=50 LOAD_DURATION=10000 # env vars
bun tests/load/yjs-load-harness.ts --clients 50 --duration 10000 --scenario conflict

| Variable | Default | Description |
|---|---|---|
| `LOAD_CLIENTS` | `100` | Concurrent Yjs clients |
| `LOAD_DURATION` | `30000` | Steady-state duration (ms) |
| `LOAD_RATE` | `2` | Operations/second per client |
| `LOAD_SCENARIO` | `distributed` | `distributed` or `conflict` |
| `LOAD_URL` | `ws://127.0.0.1:1234` | Hocuspocus WebSocket URL |

Phases: connect → seed (realistic document with paragraphs, lists, task lists) → warm-up → steady-state editing → drain → byte-level + JSON-level convergence verification → structured report.
Each client gets a unique identity token (load-client-{i}-{docId}).
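The precedence described for the harness (CLI argument, then environment variable, then default) can be sketched as below. The `--clients`, `--duration`, and `--scenario` flags appear in the example command; `--rate` and `--url` are assumed by analogy, and all function names are hypothetical.

```typescript
type Scenario = "distributed" | "conflict";

interface LoadOptions {
  clients: number;
  durationMs: number;
  ratePerClient: number;
  scenario: Scenario;
  url: string;
}

/** Collect `--name value` pairs from an argv-style array. */
function parseFlags(argv: string[]): Record<string, string | undefined> {
  const flags: Record<string, string | undefined> = {};
  for (let i = 0; i + 1 < argv.length; i += 1) {
    if (argv[i].slice(0, 2) === "--") {
      flags[argv[i].slice(2)] = argv[i + 1];
      i += 1; // skip the value just consumed
    }
  }
  return flags;
}

/** CLI flag wins over env var, which wins over the documented default. */
function resolveLoadOptions(
  argv: string[],
  env: Record<string, string | undefined>,
): LoadOptions {
  const flags = parseFlags(argv);
  const pick = (flag: string, envName: string, fallback: string): string =>
    flags[flag] ?? env[envName] ?? fallback;
  const scenario = pick("scenario", "LOAD_SCENARIO", "distributed");
  return {
    clients: Number(pick("clients", "LOAD_CLIENTS", "100")),
    durationMs: Number(pick("duration", "LOAD_DURATION", "30000")),
    ratePerClient: Number(pick("rate", "LOAD_RATE", "2")),
    scenario: scenario === "conflict" ? "conflict" : "distributed",
    url: pick("url", "LOAD_URL", "ws://127.0.0.1:1234"),
  };
}

/** Identity token in the documented `load-client-{i}-{docId}` form. */
function clientToken(index: number, docId: string): string {
  return `load-client-${index}-${docId}`;
}
```

In the harness entry point this would be invoked as `resolveLoadOptions(process.argv.slice(2), process.env)`.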
- Rich document content: `generateLargeDocument` produces 3-5 paragraphs + mixed structured blocks per heading section by default. Pass `{ richContent: false }` for lightweight documents. Playground content includes tables, syntax-highlighted code blocks (multi-language), and highlighted text.
- Test editor extensions: `create-test-editor.ts` registers `TableKit`, `CodeBlockLowlight` (lowlight with common languages), `TaskList`/`TaskItem`, `Image`, `Highlight`, `TextAlign`, and `UniqueID` alongside schema enforcement extensions.
- User identity: all multi-user tests inject a unique `window.__HOCUS_TOKEN` per browser context. The `useYjsDocument` hook reads this and passes it to `HocuspocusProvider`.
- Y.Doc caching: `useYjsDocument` uses a module-level `docCache` with reference counting. Y.Doc instances persist across component unmounts (tab switches) to prevent content loss.
- Platform-specific keys: the soak bot uses `Meta` (macOS) or `Control` (others) for keyboard shortcuts. Never use `Mod`; it is ProseMirror-only and unknown to Playwright.
- Test isolation: each test generates a unique `docId` with timestamp + random suffix to prevent cross-test interference on the shared Hocuspocus server. Tab state is isolated via `window.__GLOBAL_TABS_DOC` (set per test in `EditorPage.goto()`) so parallel tests use separate `global-tabs-{docId}` Yjs documents.
- Assertion split: `assert-invariants.ts` (imports vitest `expect`) for unit tests; `assert-invariants-json.ts` (pure `throw`) for Playwright tests.

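The test-isolation convention (a `docId` built from a timestamp plus a random suffix, and a matching `global-tabs-{docId}` document) could be implemented along these lines. The exact id layout, the 8-digit hex suffix, and both function names are assumptions for illustration.

```typescript
/**
 * Build a document id from a prefix, a millisecond timestamp, and a random hex suffix.
 * `now` and `random` are injectable so the result can be made deterministic in tests.
 */
function makeDocId(
  prefix: string = "test",
  now: number = Date.now(),
  random: () => number = Math.random,
): string {
  const hex = Math.floor(random() * 0x100000000).toString(16);
  const suffix = ("00000000" + hex).slice(-8); // left-pad to 8 hex digits
  return `${prefix}-${now}-${suffix}`;
}

/** Name of the per-test tabs document, following the `global-tabs-{docId}` pattern. */
function globalTabsDocName(docId: string): string {
  return `global-tabs-${docId}`;
}
```

The timestamp keeps ids roughly sortable when inspecting the server's persisted documents, while the random suffix separates tests that start within the same millisecond in parallel workers.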
- E2E Testing Strategy Plan — original infrastructure plan (completed)
- Test Realism & Soak Plan — soak, stress, and collaboration layers (completed)
- E2E Testing Brainstorm — initial requirements exploration
- Test Realism Brainstorm — soak and performance requirements
Historical-doc note: some completed plans/brainstorms predate the `extensions/` refactor and may reference legacy paths under `components/tiptap-node/{heading-node,document-node}`.