docs: proposal — URL-persist selected cluster identity #182

Merged
rdhyee merged 2 commits into isamplesorg:main from rdhyee:explorer-cluster-url-proposal
May 9, 2026

rdhyee commented May 9, 2026

Summary

Doc-only proposal. Companion to EXPLORER_STATE.md. Audits what selection state the Explorer URL captures today and proposes adding cluster selection as a single &h3=<cell> hash token. No code changes.

Goal: enable URLs that replay "I clicked this specific dot" — both for sample dots (already works via &pid=) and cluster dots (currently in-memory only). The use case is debugging conversations: "tell me about this dot, why does it disappear when I zoom to res 6?".

Audit findings

| Selection type | URL-persisted today? | Mechanism |
| --- | --- | --- |
| Sample (single PID) | ✅ Yes | #pid=<urlencoded>; round-trips through readHash → _globeState.selectedPid → side-panel sample card + lazy wide_url description fetch (explorer.qmd:892, 1813, 2167) |
| Cluster (H3 cell aggregation) | ❌ No | Cluster-click clears selectedPid (:919) and writes nothing else; the cluster card + nearby-samples list are DOM-only and lost on reload |

Recommended encoding (revised after Codex review)

Option C: &h3=<cell> — H3 cell index

#h3=841a067ffffffff

Codex review of v1 confirmed h3_cell is already SELECTed at explorer.qmd:973 (phase1) and :1316 (loadRes); only the cluster .id object needs to carry it forward. So:

  • Exact lookup: WHERE h3_cell = ? is an exact match on the cell key. The earlier Option B (lat/lng-tuple) recommendation was lossy and is rejected.
  • Single token, 15 hex chars, resolution implicit in the cell.
  • No data-pipeline change — purely a runtime fix to thread h3_cell through.
  • ?sources= already URL-persisted — the only filter that affects cluster aggregation (per :1706-1710).
  • No backwards-compat tax: &pid= URLs keep working unchanged; mutual exclusion preserved at runtime.
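
As a rough illustration of how the two selection tokens could coexist in the hash, here is a minimal sketch. The function names `parseSelection` and `buildSelection` are hypothetical stand-ins (the Explorer's own readHash/buildHash also handle camera state and are not reproduced here), and the rule that a sample selection wins when both tokens are present is taken from the mutual-exclusion note in the follow-up commit.

```javascript
// Hypothetical stand-ins for the selection part of readHash/buildHash.
// Only the pid/h3 tokens are handled; camera fields are out of scope.
function parseSelection(hash) {
  const params = new URLSearchParams(hash.replace(/^#/, ''));
  const pid = params.get('pid');
  const h3 = params.get('h3');
  // If both tokens are present, the sample selection takes precedence.
  if (pid) return { selectedPid: pid, selectedH3: null };
  if (h3) return { selectedPid: null, selectedH3: h3 };
  return { selectedPid: null, selectedH3: null };
}

function buildSelection(state) {
  const params = new URLSearchParams();
  // else-if keeps the two tokens mutually exclusive in the emitted hash.
  if (state.selectedPid) params.set('pid', state.selectedPid);
  else if (state.selectedH3) params.set('h3', state.selectedH3);
  const encoded = params.toString();
  return encoded ? '#' + encoded : '';
}
```

With this shape, an existing `#pid=...` URL parses exactly as before, and a state holding only `selectedH3` serializes to a single `#h3=<cell>` token.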

Scope of cluster reproducibility

Cluster identity depends on:

  • The H3 cell (encoded in &h3=)
  • The ?sources= filter (already URL-persisted)

It does not depend on ?material= / ?context= / ?object_type= — H3 summary parquets only carry dominant_source, and the code at :1706-1710 explicitly documents that those facet filters cannot affect cluster counts.

Open questions for review

  1. Cross-resolution behavior on URL load: should &h3=...&alt=... force the camera/altitude to match the cell's natural resolution, or just populate the side panel? Recommendation in the doc: populate the side panel only.
  2. Does the side panel need a "stale cluster" hint if the recipient's filters cause the on-globe rendering to differ from what the sender saw? Defer until we see whether it's confusing in practice.

Phasing

  • Phase 1 (this proposal): ship &h3= per Option C. ~25-line patch in explorer.qmd. Update EXPLORER_STATE.md §2 to add the row.
  • Phase 2 (only if needed): unified &sel= field for a hypothetical third selection type (e.g., region/polygon).

The previously-planned Phase 2 compat shim (&cluster= → &h3=) is dropped: going straight to the canonical encoding means there is no lossy intermediate form to migrate from.

Review history

  • ffc744d — initial v1 doc (recommended Option B, treated H3 cell index as an open question).
  • 877afcd — revised per Codex review: switched to Option C, narrowed filter-dependence section to source-filter only, simplified phasing.

Refs EXPLORER_STATE.md, #163 (UX rework umbrella).

🤖 Generated with Claude Code

rdhyee and others added 2 commits May 9, 2026 08:34
Companion to EXPLORER_STATE.md. Audits what selection state the explorer
URL captures today (sample selection via &pid is already wired; cluster
selection is in-memory only) and proposes adding &cluster= as a single
packed token (source:res:lat,lng) with no backwards-compat tax for
existing &pid URLs.

Doc-only — no code changes.

Open questions called out for review:
- Whether the parquet schema already carries an H3 cell index (would let
  us prefer &h3= over the lat/lng-tuple encoding)
- Cross-resolution UX on load (force camera vs populate side-panel only)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two material findings from Codex review of v1:

1. P2 — H3 cell index is already in the data, not unknown.
   The explorer's H3 summary queries already SELECT h3_cell at
   explorer.qmd:973 (phase1) and :1316 (loadRes); the cluster .id
   object simply drops it. The v1 doc treated this as an open
   question requiring a parquet schema check before committing to
   Option C. It isn't — Option C is feasible today.

2. P2 — Filter dependence was overstated.
   The H3 summary parquets only carry dominant_source. The code at
   :1706-1710 documents that material/context/object_type filters
   cannot affect cluster counts. Cluster identity reproduction
   only depends on ?sources=, which is already URL-persisted.

Changes:
- Recommendation switches from Option B (lossy lat/lng tuple) to
  Option C (&h3=<cell>, exact key join).
- Filter-dependence section narrowed to source-filter only.
- Phasing simplified: Phase 1 ships Option C directly; the
  previously-planned Phase 2 compat shim is dropped (no lossy
  intermediate form to migrate from).
- Acceptance criteria updated to reflect h3_cell threading and
  &h3= rather than &cluster=.

Doc-only — no code changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rdhyee commented May 9, 2026

Pushed revision 877afcd integrating Codex's two P2 findings:

  1. H3 cell index is not unknown — already in the parquet and SELECTed at explorer.qmd:973 and :1316. Switched recommendation from Option B (&cluster=SESAR:4:lat,lng, lossy tuple) to Option C (&h3=<cell>, exact key join). No data-pipeline change required; only thread h3_cell into the runtime cluster id object.

  2. Filter dependence was overstated — H3 summary parquets only carry dominant_source; the code at :1706-1710 documents that material/context/object_type filters cannot affect cluster counts. Tightened §2 to reflect this; cluster reproducibility only depends on ?sources= which is already URL-persisted.

Phasing simplified: Phase 1 now ships Option C directly. The previously-planned Phase 2 compat shim (migrating &cluster= → &h3=) is dropped; by going straight to the canonical encoding we never ship the intermediate lossy form.

Doc-only PR. Ready for re-review whenever convenient.

rdhyee merged commit 24a7e17 into isamplesorg:main May 9, 2026
1 check passed
rdhyee added a commit that referenced this pull request May 9, 2026
…ase 1) (#186)

* explorer: persist selected cluster identity in URL via &h3=<cell>

Phase 1 of EXPLORER_CLUSTER_URL_PROPOSAL.md (#182). Cluster selection now
round-trips through the URL hash, complementing the existing &pid= for
samples. Use case: share or bookmark a specific cluster you clicked, and
have collaborators land on the same H3 cell with side-panel populated.

Encoding:

  #h3=843f6d3ffffffff

H3 cell index in canonical 15-char hex (no 0x prefix). The cell index
encodes its own resolution; no separate &res= or &cluster_source= field
needed. The existing &sources= filter (already URL-persisted) covers the
only filter that affects cluster aggregation — material/context/object_type
filters can't, per the comment at :1706-1710.

Mechanics:

- h3_cell carried into the runtime cluster .id at both add() sites
  (:992, :1335) as a hex string via row.h3_cell.toString(16). The parquet
  column is UBIGINT; converting to hex once at ingestion keeps the URL
  representation canonical.
- _globeState.selectedH3 added; mutated by cluster-click (:923) and
  cleared by sample-click (:895) for mutual exclusion. Same pattern as
  selectedPid.
- readHash parses h3 (:626); buildHash emits h3 when set (:645).
- fetchClusterByH3 helper at :1791 looks up the row across res4/res6/res8
  parquets via UNION ALL. DuckDB-WASM doesn't accept 0x... literals, so
  hex is converted to decimal in JS via BigInt and CAST AS UBIGINT in SQL.
- hydrateClusterUI helper at :1827 mirrors the cluster-click side-panel +
  nearby-samples query, called from both the boot deep-link (:2266) and
  the back/forward hashchange handler (:1899).
- Mutual-exclusion at hydration time: &pid= wins if both are present, per
  the proposal §4.
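
The hex-to-decimal step described for fetchClusterByH3 can be sketched roughly as follows. This is an illustration only: `h3HexToDecimal`, `clusterLookupSQL`, and the `parquetUrl` parameter are hypothetical names, and the query shape is an assumption based on the description above (decimal literal, CAST AS UBIGINT), not the helper's actual text.

```javascript
// Convert a 15-char hex cell into a decimal string without going through Number.
function h3HexToDecimal(hex) {
  // BigInt parses the 0x-prefixed string exactly, even above 2^53.
  return BigInt('0x' + hex).toString(10);
}

// Hypothetical query builder; parquetUrl is a placeholder supplied by the caller.
function clusterLookupSQL(parquetUrl, hex) {
  // Only interpolate after validation, so the literal can contain digits only.
  if (!/^[0-9a-f]{15}$/i.test(hex)) throw new Error('malformed H3 cell');
  const dec = h3HexToDecimal(hex);
  return `SELECT * FROM read_parquet('${parquetUrl}') ` +
         `WHERE h3_cell = CAST('${dec}' AS UBIGINT)`;
}
```

Converting in JavaScript keeps the SQL free of hex literals, and the BigInt route avoids rounding the 60-bit cell index.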

EXPLORER_STATE.md §2 updated with the new h3 row.

Verified locally:
- URL #h3=843f6d3ffffffff (a known res4 cell with 151,334 OpenContext
  samples in central Turkey) round-trips: side panel shows 'Selected
  Cluster / OpenContext / H3 res4 / 151,334 samples / 37.6619, 32.8334'
  with 30 nearby samples loaded.
- Empty hash + hash without h3/pid both load without errors.

Closes Phase 1 of #182. Phase 2 (unified &sel=) deferred unless a third
selection type appears.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* explorer: address Codex review of #186

Five fixes from Codex review:

1. Race-safe hash hydration (BLOCKER)
   Both pid and h3 hashchange branches now use a monotonic `viewer._selGen`
   token, bumped per hashchange and rechecked after every await. Fast
   back/forward across pid/h3/empty no longer lets stale fetch results
   repaint the side panel.

2. Strict h3 validation
   Replaced `replace(/[^0-9a-fA-F]/g, '')` with `/^[0-9a-f]{15}$/i.test()`
   over a lowercased input. Reject malformed input rather than silently
   strip — `h3=xxx843f...` no longer becomes a different lookup key.

3. Canonical lowercase normalization
   After successful lookup, runtime `selectedH3` is set from the parquet
   row's `h3_cell.toString(16)` (always lowercase), not the raw URL token.
   Subsequent `buildHash` writes always emit canonical form regardless of
   what the user typed. Boot deep-link applies the same normalization.

4. Resolution routing instead of UNION ALL
   Canonical H3 cells encode resolution in the 2nd hex char (after the
   leading-zero strip). `RES_TO_H3_URL[parseInt(lower[1], 16)]` picks the
   right parquet directly — one fetch instead of three on every &h3=
   load.

5. Mutual-exclusion in buildHash
   Changed independent `if`s for `selectedPid` / `selectedH3` to `else if`,
   making the runtime invariant load-bearing in one place.

Also: unknown / malformed h3 now actively clears the cluster card and
nearby-samples list, matching the empty-hash and missing-pid paths
(previously left stale content).
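
Fixes 2 and 4 can be pictured together with a small sketch. The regex and the `parseInt(lower[1], 16)` lookup come from the text above; the `routeH3` function name and the parquet file names in the table are placeholders, not the values in explorer.qmd.

```javascript
// Placeholder resolution -> parquet table; the real RES_TO_H3_URL values differ.
const RES_TO_H3_URL = {
  4: 'h3_res4.parquet',
  6: 'h3_res6.parquet',
  8: 'h3_res8.parquet',
};

// Hypothetical helper: validate the token, then pick the one parquet to query.
function routeH3(token) {
  const lower = String(token).toLowerCase();
  // Reject malformed input outright instead of stripping bad characters.
  if (!/^[0-9a-f]{15}$/.test(lower)) return null;
  // In the 15-char form, the second hex digit carries the H3 resolution.
  const res = parseInt(lower[1], 16);
  const url = RES_TO_H3_URL[res];
  return url ? { h3: lower, res, url } : null;
}
```

For example, `routeH3('843F6D3FFFFFFFF')` yields resolution 4 and the lowercased cell, while `routeH3('zzz_NOT_HEX_zzz')` and a well-formed cell at a resolution with no parquet both return `null`.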

Verified locally:
- Uppercase #h3=843F6D3FFFFFFFF — hydrates, then runtime canonicalizes.
- Unknown well-formed cell #h3=843ffffffffffff — side panel clears, no
  errors.
- Non-hex #h3=zzz_NOT_HEX_zzz — silent reject, no JS errors.
- Known #h3=843f6d3ffffffff — round-trips identically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* explorer: thread freshness check into hydrateClusterUI; bump _selGen earlier

Codex's second review found the previous race fix was incomplete:
hydrateClusterUI has its own internal `await db.query(...)` for the nearby-
samples list, then calls updateSamples(samples). The hashchange-handler-
side selGen check happened only AFTER hydrateClusterUI returned, so a
stale fetch INSIDE hydrateClusterUI could still repaint the side panel
with samples for an older h3 selection.

Fix: hydrateClusterUI now accepts an optional `isStale` predicate and
checks it after its inner await, before updateSamples (and before the
catch-path's "Query failed" message). The hashchange caller passes
`() => selGen !== viewer._selGen`. The cluster-click and boot-deep-link
callers leave it undefined — clicks are user-serialized and there's only
one boot, so no race possible there.
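
The generation-token pattern described above can be sketched as follows. This is a simplified model, not the handler in explorer.qmd: `hydrate`, `onHashChange`, `fetchNearby`, and `render` are hypothetical stand-ins, and only the `_selGen` counter and the `isStale` predicate shape are taken from the commit message.

```javascript
// Minimal model of the _selGen guard; viewer is a plain object here.
const viewer = { _selGen: 0 };

// Stand-in for hydrateClusterUI: one inner await, then a repaint.
async function hydrate(cell, fetchNearby, render, isStale) {
  const samples = await fetchNearby(cell);
  // A newer selection may have started while we were waiting.
  if (isStale && isStale()) return false;
  render(cell, samples);
  return true;
}

// Stand-in for the hashchange handler.
function onHashChange(cell, fetchNearby, render) {
  // Bump the token before any await so in-flight work becomes stale at once.
  const selGen = ++viewer._selGen;
  return hydrate(cell, fetchNearby, render, () => selGen !== viewer._selGen);
}
```

If two hash changes arrive back to back and the first fetch resolves last, the first call sees a changed token after its await and skips the repaint, so only the newer selection reaches the side panel.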

Also (Codex non-blocking nits):
- Bump `_selGen` at the very top of the hashchange handler, before the
  lat/lng early return — so even hashchanges that lack lat/lng invalidate
  any in-flight stale work.
- Reject non-cell H3 modes (`lower[0] !== '8'`) in fetchClusterByH3 —
  defensive guard against edges/vertices/etc. ever ending up in `&h3=`.

Verified locally: known-good `#h3=843f6d3ffffffff` round-trips identically
(151,334 OpenContext samples, 30 nearby rendered).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* explorer: address Codex review v3 — boot race + source-filter consistency

Two P2 findings from Codex's third pass on PR #186:

1. Boot deep-link could still race with a later hashchange.
   The hashchange listener registers earlier in the same OJS cell, so a
   slow initial &h3= or &pid= lookup can be superseded by browser
   back/forward (or a manual hash edit) during the await — the boot path
   would then finish later and repaint stale data. Apply the same _selGen
   guard to the boot path: bump the token at boot start, capture
   bootSelGen, define isBootStale = () => bootSelGen !== viewer._selGen,
   and check it after every await (pid lookup, wide-parquet description
   fetch, h3 lookup, and inside hydrateClusterUI via the existing
   isStale-predicate parameter).

2. fetchClusterByH3 bypassed the active source filter.
   The cluster lookup did `WHERE h3_cell = ?` without sourceFilterSQL —
   so an &h3= URL whose dominant_source is currently unchecked in
   ?sources= would still hydrate a cluster card for a dot the user can't
   see on the globe. Worse, hydrateClusterUI's nearby-samples query DOES
   apply source filter, producing a mismatched panel: full unfiltered
   cluster card with a filtered-down samples list. Add
   sourceFilterSQL('dominant_source') to the lookup; an excluded source
   now returns null and the side panel stays empty (matching what the
   globe shows).
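
To make the consistency point concrete, here is a hedged sketch of a clause builder shared by both queries. `sourceFilterClause` and its explicit `activeSources` argument are hypothetical; the real `sourceFilterSQL` takes only a column name and its output format is not shown in this PR.

```javascript
// Hypothetical stand-in for sourceFilterSQL: restrict `column` to the active sources.
function sourceFilterClause(column, activeSources) {
  // Double any single quotes so a source name cannot break out of the literal.
  const quoted = activeSources.map((s) => `'${String(s).replace(/'/g, "''")}'`);
  // With nothing checked, no row should match.
  return quoted.length ? ` AND ${column} IN (${quoted.join(', ')})` : ' AND FALSE';
}

// Both queries draw on the same active set, so card and list cannot disagree.
const active = ['SESAR', 'GEOME', 'SMITHSONIAN'];
const clusterWhere = 'WHERE h3_cell = ?' + sourceFilterClause('dominant_source', active);
const nearbyWhere = 'WHERE 1 = 1' + sourceFilterClause('source', active);
```

Under this active set, a cluster whose dominant_source is OPENCONTEXT matches no row in the lookup, which is the empty-panel behavior described above.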

Verified locally:
- ?sources=SESAR,GEOME,SMITHSONIAN#h3=843f6d3ffffffff (the cluster's
  dominant_source OPENCONTEXT is excluded) → side panel stays empty.
- ?sources= default (all checked) #h3=843f6d3ffffffff → hydrates as
  before with 151,334 OpenContext samples and 30 nearby rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* explorer: source-filter invalidates selection; boot finalize in try/finally

Two findings from Codex's fourth review:

1. Source-filter changes don't invalidate selection state.
   When the user unchecked the source for an already-hydrated cluster
   (or sample), the globe correctly hid the dot but the side panel and
   `&h3=` / `&pid=` URL stayed stale. Source filter changes also raced
   against in-flight selection lookups since they didn't bump `_selGen`.

   Fix: in the source-filter change handler (`:1690`), bump `_selGen`
   immediately, then after the existing globe-data reload, re-validate
   the current selection under the new filter:
     - Cluster (selectedH3): re-run fetchClusterByH3 (already honors
       sourceFilterSQL after v3); if returns null, clear selectedH3,
       cluster card, samples list, and rewrite the URL via replaceState.
     - Sample (selectedPid): probe lite_url with the same source filter;
       if no match, clear selectedPid + side panel + URL.
   Both branches re-check `_selGen` after the await to bail if a newer
   filter change has fired.

2. Boot's stale-abort early-returns skipped `_suppressHashWrite = false`.
   A no-lat/lng hashchange during boot's awaits could leave hash writes
   suppressed forever (the lat/lng path clears it via _suppressTimer; a
   stale-aborted boot leaves it set with no later cleanup).

   Fix: wrap the boot deep-link block in try/finally; move the
   `_suppressHashWrite = false` assignment into the finally so it runs
   on every path, including stale-abort early returns.
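
The try/finally fix can be illustrated with a small model. The `state` object, `bootDeepLink`, and `lookup` are hypothetical stand-ins; only the idea of resetting the suppress flag in `finally` comes from the commit message.

```javascript
// Simplified model of the boot deep-link block.
const state = { suppressHashWrite: false };

async function bootDeepLink(lookup, isStale) {
  state.suppressHashWrite = true;
  try {
    const row = await lookup();
    // Early return on stale-abort: the finally block below still runs.
    if (isStale()) return 'stale';
    return row ? 'hydrated' : 'empty';
  } finally {
    // Runs on normal return, early return, and thrown errors alike.
    state.suppressHashWrite = false;
  }
}
```

Whatever path the boot code takes, the flag ends up cleared, so a stale-aborted boot cannot leave hash writes suppressed.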

Verified locally:
- Load #h3=843f6d3ffffffff (OpenContext cluster); side panel hydrates.
- Uncheck OPENCONTEXT in the source filter → `&h3=` drops from URL,
  cluster card returns to empty state, samples list clears, ?sources=
  written with the remaining 3 sources. Globe also re-renders without
  OpenContext clusters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* explorer: fix UBIGINT precision-loss in h3_cell + rehydrate cluster on filter

Two issues from Codex's fifth review:

1. (P2 NEW) Selected cluster surviving the filter wasn't being rehydrated.
   When the user toggled a non-cluster source (e.g. unchecked SESAR while
   the selected cluster's dominant_source = OPENCONTEXT), the cluster
   stayed in URL but the nearby-samples list could now show stale rows
   from unchecked sources or miss newly-checked ones (hydrateClusterUI's
   nearby query uses sourceFilterSQL('source')).

   Fix: in the source-filter handler's revalidate branch, when meta is
   truthy (cluster still valid), call hydrateClusterUI(meta, isStale) to
   refresh the side panel under the new filter — not just leave it.

2. (UBIGINT precision regression, surfaced by testing #1) DuckDB-WASM
   returns h3_cell (UBIGINT > 2^53) as a JS Number, which cannot represent
   it exactly, so .toString(16) yields the wrong hex. Boot worked because
   the SQL WHERE matched at the
   parquet level, but `selectedH3 = meta.h3_cell` (lossy roundtrip)
   stored a corrupted hex; subsequent revalidations against the corrupted
   key would never match and the panel would clear. The bug was latent
   in PR-as-of-ebd7978; the rehydrate branch above made it visible.

   Fix: SQL SELECT now CASTs h3_cell to VARCHAR (decimal string), and JS
   converts to hex via BigInt(decString).toString(16) — no precision
   loss. Applied at the two cluster-render sites (phase1, loadRes).
   fetchClusterByH3's return now uses the validated input `lower` as the
   canonical hex so the helper is also lossless.

   `to_hex()` was not available in the DuckDB-WASM build in use (tried
   first; it errored with "Catalog Error: Scalar Function with name to_hex
   does not exist!"). The VARCHAR cast + JS BigInt route is portable
   across versions.
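
The precision loss is easy to reproduce outside DuckDB. The sketch below uses the cell from the verification notes; `decString` stands in for what a `CAST(h3_cell AS VARCHAR)` column would hand to JavaScript, which is an assumption about the result shape rather than captured output.

```javascript
const hex = '843f6d3ffffffff';

// Stand-in for the decimal string a VARCHAR cast of h3_cell would return.
const decString = BigInt('0x' + hex).toString(10);

// Lossy path: Number keeps 53 bits of mantissa, so the low bits are rounded.
const lossy = Number(decString).toString(16);     // "843f6d400000000"

// Lossless path: BigInt parses the decimal string exactly.
const lossless = BigInt(decString).toString(16);  // "843f6d3ffffffff"

console.log({ lossy, lossless, matches: lossless === hex });
```

The rounded value is a different H3 key, which is why revalidating against it never matched.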

Verified locally:
- Boot at #h3=843f6d3ffffffff hydrates correctly.
- Uncheck SESAR (OPENCONTEXT survives): cluster card unchanged, samples
  re-rendered with only OpenContext rows, &h3= preserved.
- Uncheck OPENCONTEXT (cluster's own source): card + samples cleared,
  &h3= dropped from URL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: sync EXPLORER_STATE.md h3 row to current implementation

The only finding from Codex's sixth review (P3, non-runtime): the EXPLORER_STATE.md
description still reflected the original v1 implementation:
- "regex `[^0-9a-fA-F]` strip" → now strict `/^[0-9a-f]{15}$/i` reject-not-strip
- "UNION ALL across all 3 parquets" → now resolution-routed via RES_TO_H3_URL
- Missing: cell-mode guard (`lower[0] === '8'`)
- Missing: source filter applied (sourceFilterSQL('dominant_source'))
- Missing: UBIGINT precision-loss workaround (CAST AS VARCHAR + BigInt)
- Missing: source-filter change re-validation
- Missing: _selGen race guard

Updated the h3 row to describe current behavior so future URL-state work
finds accurate docs.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>