Fix structure search and add 3-tier bundled molecule library #17
Merged
Introduce a smart structure resolver and harden PubChem client: add classify_query, fetch_structure and student_friendly_resolve to route SMILES/InChI locally (RDKit) and CID/InChIKey/name to PubChem. Add inchi_to_xyz and search_cid_by_inchikey helpers, URL-encode queries, and implement client-side throttling plus exponential back-off retries for 503 responses. Move PubChem network parameters into config constants and expose new symbols from the package; update the UI to call the new resolver. Add tests covering query classification, URL encoding, throttle/backoff behavior, routing, and provenance metadata.
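A minimal sketch of the query routing and 503 retry ideas described above. The regexes, the function names pubchem_cids_url and with_backoff, and the 0.5 s base delay with 3 retries are all hypothetical and are not taken from the actual classify_query or PubChem client. The URL layout follows PubChem's public PUG REST compound/{namespace}/{identifier}/cids/JSON pattern.

```python
import re
import time
from typing import Callable
from urllib.parse import quote

# Hypothetical patterns: the real rules inside classify_query() are not shown in this PR.
_INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
_CID = re.compile(r"^\d+$")
_CAS = re.compile(r"^\d{2,7}-\d{2}-\d$")
_SMILES_ALPHABET = re.compile(r"^[A-Za-z0-9@+\-\[\]()=#$/\\%.:*]+$")
_BRACKET_ATOM = re.compile(r"\[[^\]]*\]")

PUBCHEM_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def classify_query(query: str) -> str:
    """Heuristically label a query as 'inchi', 'inchikey', 'cid', 'smiles' or 'name'."""
    q = query.strip()
    if q.startswith("InChI="):
        return "inchi"
    if _INCHIKEY.match(q):
        return "inchikey"
    if _CID.match(q):
        return "cid"
    if _CAS.match(q):
        return "name"  # CAS numbers go through PubChem's name namespace
    if _SMILES_ALPHABET.match(q):
        # Outside [...] atoms, lowercase SMILES letters are limited to aromatic
        # b/c/n/o/p/s plus the l and r of Cl/Br, so "ethanol" fails this test.
        bare = _BRACKET_ATOM.sub("", q)
        if all(ch in "bcnopslr" for ch in bare if ch.islower()):
            return "smiles"
    return "name"


def pubchem_cids_url(namespace: str, identifier: str) -> str:
    # safe="" also escapes "/" and "#", which occur in SMILES and InChI strings.
    return f"{PUBCHEM_REST}/compound/{namespace}/{quote(identifier, safe='')}/cids/JSON"


def with_backoff(send: Callable[[], int], max_retries: int = 3,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() (returns an HTTP status) and retry 503s with exponential back-off."""
    status = send()
    for attempt in range(max_retries):
        if status != 503:
            break
        sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
        status = send()
    return status
```

Injecting sleep lets a test record the delays instead of waiting. Client-side throttling (a minimum interval between requests) is a separate mechanism and is not shown here.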
Introduce an NCI CACTUS resolver (quantui.cactus) to fetch SDF/3D structures as a chained fallback after PubChem, with robust error mapping to existing PubChem exception types. Add a unified provider chain (quantui.structure_providers) that normalizes results into a ResolvedStructure dataclass and implements resolution order: local RDKit → bundled library exact → PubChem → CACTUS → bundled-library fuzzy offline fallback. Expose resolve_structure, ResolvedStructure and fetch_from_cactus via package __init__, wire student_friendly_resolve usage in app, and add CACTUS_TIMEOUT_S config. Add comprehensive tests for CACTUS and the provider chain (tests/test_struct_providers.py).
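The provider chain can be pictured as an ordered list of fetchers where the first hit wins and failures fall through. This sketch assumes a simplified ResolvedStructure with hypothetical fields and a hypothetical StructureNotFound error; the real dataclass and exception mapping in quantui.structure_providers may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ResolvedStructure:
    # Hypothetical fields; the real dataclass in quantui.structure_providers may differ.
    xyz: str
    source: str  # provenance: which provider answered
    query: str
    warnings: Tuple[str, ...] = ()


class StructureNotFound(LookupError):
    pass


# A provider is (name, fetch): fetch returns XYZ text, None for "no match",
# or raises OSError/ValueError for network or parsing failures.
Provider = Tuple[str, Callable[[str], Optional[str]]]


def resolve_structure(query: str, providers: Sequence[Provider]) -> ResolvedStructure:
    """Try providers in order and return the first hit, tagged with its source."""
    failures = []
    for name, fetch in providers:
        try:
            xyz = fetch(query)
        except (OSError, ValueError) as exc:  # ConnectionError/TimeoutError are OSErrors
            failures.append(f"{name}: {exc}")
            continue
        if xyz is not None:
            return ResolvedStructure(xyz=xyz, source=name, query=query,
                                     warnings=tuple(failures))
    tried = ", ".join(name for name, _ in providers)
    raise StructureNotFound(f"no provider resolved {query!r} (tried {tried})")
```

Passing the chain in as data (local RDKit, library exact, PubChem, CACTUS, library fuzzy) keeps the ordering in one place and makes the offline fallback easy to test with fakes.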
Move the hard-coded MOLECULE_LIBRARY into a packaged, indexed library and provide a lazy loader/back-compat shim.
- Add quantui/data/manifests/presets.json and quantui/data/library/library.sqlite as packaged data.
- Implement quantui/molecule_library.py: compact coord encoder/decoder, deterministic SQLite store builder, read-only query API (get/search/categories/count), JSON-manifest fallback, and get_preset_dict() that returns the legacy MOLECULE_LIBRARY shape while excluding bulk categories.
- Replace the inline MOLECULE_LIBRARY in quantui/config.py with a PEP 562 __getattr__ shim that lazily loads presets via get_preset_dict() to preserve existing consumers.
- Add scripts/build_library.py to rebuild the SQLite store from manifests (with a 10 MB budget check) and tests/test_molecule_library.py covering codec, store, queries, back-compat, and size governance.
- Include package-data in pyproject.toml so the manifest and SQLite store are bundled in distributions.
This keeps imports fast, preserves backwards compatibility, supports offline use, and provides a compact, indexed store that can scale to larger manifest sets.
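A sketch of what a deterministic SQLite builder with a small query API might look like, run against an in-memory database. The schema, column names and function signatures are assumptions; the actual layout of library.sqlite is not described in this PR. A shipped file can be opened read-only with sqlite3.connect("file:" + path + "?mode=ro", uri=True).

```python
import sqlite3
from typing import Iterable, List, Mapping, Optional

# Hypothetical schema; the real layout of library.sqlite is not described in this PR.
SCHEMA = """
CREATE TABLE entries (
    id       TEXT PRIMARY KEY,
    name     TEXT NOT NULL,
    category TEXT NOT NULL,
    formula  TEXT NOT NULL,
    coords   TEXT NOT NULL
);
CREATE INDEX idx_entries_category ON entries(category);
CREATE INDEX idx_entries_name ON entries(name COLLATE NOCASE);
"""


def build_store(conn: sqlite3.Connection, entries: Iterable[Mapping[str, str]]) -> None:
    conn.executescript(SCHEMA)
    # Inserting rows sorted by id keeps the build order identical across runs.
    rows = sorted((e["id"], e["name"], e["category"], e["formula"], e["coords"])
                  for e in entries)
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def search(conn: sqlite3.Connection, text: str, category: Optional[str] = None) -> List[str]:
    # SQLite's LIKE is case-insensitive for ASCII by default.
    sql = "SELECT id FROM entries WHERE (name LIKE ? OR formula LIKE ?)"
    params = [f"%{text}%", f"%{text}%"]
    if category is not None:
        sql += " AND category = ?"
        params.append(category)
    sql += " ORDER BY name COLLATE NOCASE"
    return [row[0] for row in conn.execute(sql, params)]


def categories(conn: sqlite3.Connection) -> List[str]:
    sql = "SELECT DISTINCT category FROM entries ORDER BY category"
    return [row[0] for row in conn.execute(sql)]


def count(conn: sqlite3.Connection, category: Optional[str] = None) -> int:
    if category is None:
        return conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
    sql = "SELECT COUNT(*) FROM entries WHERE category = ?"
    return conn.execute(sql, (category,)).fetchone()[0]
```

The category and name indexes are what keep browsing responsive as manifests grow, since filters and searches no longer scan a Python dict.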
Introduce a bundled curated molecule library and integrate it into the app. Changes include:
- Add a large curated manifest (quantui/data/manifests/curated.json) plus a curated seed and build script (scripts/curated_seed.json, scripts/build_curated_library.py) and tests (tests/test_curated_library.py).
- Update the packaged library database (quantui/data/library/library.sqlite) and adjust molecule library code/tests (quantui/molecule_library.py, tests/test_molecule_library.py) to use the curated content.
- Add configuration limits to keep bundled content classroom-friendly: LIBRARY_SIZE_BUDGET_BYTES = 10 * 1024 * 1024, LIBRARY_HEAVY_ATOM_CEILING_CURATED = 30, LIBRARY_HEAVY_ATOM_CEILING_BULK = 9.
- Small UI copy change in build_molecule_section to reference the bundled library of curated molecules.
These changes add curated molecule data, enforce size and atom ceilings for usability, and provide tooling/tests to build and validate the bundled library.
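The ceilings above can be applied as simple predicates in a build script. The constant values are quoted from this change; the tier keys "curated" and "bulk" and the helper names are hypothetical.

```python
from typing import Sequence

# Constant values quoted from this PR's config changes.
LIBRARY_SIZE_BUDGET_BYTES = 10 * 1024 * 1024
LIBRARY_HEAVY_ATOM_CEILING_CURATED = 30
LIBRARY_HEAVY_ATOM_CEILING_BULK = 9

# Hypothetical tier keys.
_CEILINGS = {"curated": LIBRARY_HEAVY_ATOM_CEILING_CURATED,
             "bulk": LIBRARY_HEAVY_ATOM_CEILING_BULK}


def heavy_atom_count(symbols: Sequence[str]) -> int:
    # Heavy atoms = every non-hydrogen atom.
    return sum(1 for s in symbols if s != "H")


def within_ceiling(symbols: Sequence[str], tier: str) -> bool:
    return heavy_atom_count(symbols) <= _CEILINGS[tier]


def within_budget(size_bytes: int) -> bool:
    # A build script would pass os.path.getsize() of the finished store.
    return size_bytes <= LIBRARY_SIZE_BUDGET_BYTES
```

The bulk ceiling of 9 lines up with QM9, whose molecules contain at most nine heavy atoms (C, N, O, F).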
Add a bulk QM9 dataset: new manifest (quantui/data/manifests/bulk_qm9.json) and provenance file (quantui/data/library/QM9-PROVENANCE.md). Include a build script (scripts/build_bulk_library.py) and tests (tests/test_bulk_library.py), and update tests/test_curated_library.py accordingly. The repository's SQLite store (quantui/data/library/library.sqlite) was updated to include the new entries. The provenance file records the selection criteria (1956 entries, QM9/GDB-9 source, B3LYP/6-31G(2df,p) geometries, CC0 license) and provides the regeneration command.
Replace the old preset dropdown with a browsable, searchable molecule library UI and wire handlers to load selections. app_builders.py: add category labels, a helper to build dropdown options (library_result_options), and new widgets (category dropdown, search text, results dropdown, count label); adjust PubChem placeholder and tab titles. app.py: import molecule_library, build and observe the new widgets, add _refresh_lib_results/_on_lib_filter_changed/_on_lib_select handlers to fetch entries and set the active molecule. Add tests (tests/test_struct_ui.py) covering option building, widget presence/wiring, and handler behavior (including refresh-guard and bulk entries). Improves offline browsing of bundled molecules and UX for selecting library entries.
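A widget-free sketch of the option building and the refresh guard. The category labels, the label format, the 200-entry limit and the LibraryBrowser class are hypothetical stand-ins for library_result_options and the _refresh_lib_results/_on_lib_select handlers; only the (label, value) options shape is what ipywidgets.Dropdown accepts.

```python
from typing import Callable, Iterable, List, Mapping, Optional, Tuple

# Hypothetical category keys and display labels.
CATEGORY_LABELS = {"preset": "Preset", "curated": "Curated", "bulk": "QM9 (bulk)"}


def library_result_options(entries: Iterable[Mapping[str, str]],
                           limit: int = 200) -> List[Tuple[str, str]]:
    """Build (label, value) pairs, the options shape ipywidgets.Dropdown accepts."""
    options: List[Tuple[str, str]] = []
    for entry in entries:
        if len(options) >= limit:
            break
        category = CATEGORY_LABELS.get(entry["category"], entry["category"])
        options.append((f"{entry['name']} ({entry['formula']}) [{category}]", entry["id"]))
    return options


class LibraryBrowser:
    """Toy stand-in for the filter/select handlers; no widgets involved."""

    def __init__(self, search: Callable[[str, Optional[str]], Iterable[Mapping[str, str]]],
                 load: Callable[[str], None]) -> None:
        self._search = search
        self._load = load
        self._refreshing = False
        self.options: List[Tuple[str, str]] = []

    def refresh(self, text: str, category: Optional[str]) -> None:
        self._refreshing = True
        try:
            self.options = library_result_options(self._search(text, category))
            # Replacing a Dropdown's options can fire its value observer; simulate that.
            if self.options:
                self.on_select(self.options[0][1])
        finally:
            self._refreshing = False

    def on_select(self, entry_id: Optional[str]) -> None:
        # Guard: ignore selection events caused by refresh() itself.
        if self._refreshing or entry_id is None:
            return
        self._load(entry_id)
```

Without the guard, every keystroke in the search box would silently replace the active molecule with the first search hit.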
Support disambiguating ambiguous name/formula PubChem queries. Adds search_cids_by_name and search_pubchem_candidates in pubchem.py to fetch lightweight candidate descriptors (cid, title, formula, mw) while preserving PubChem order. Exposes search_candidates in structure_providers.py (only for name/formula queries; network errors are swallowed to let the caller fall back). UI changes: adds a hidden dropdown for candidate pick-list, handlers to show/hide it, and a selection handler that resolves the chosen CID in a background thread and applies the result. Updates app wiring and __init__ exports. Includes unit tests (tests/test_struct_disambiguation.py) that mock network calls and exercise the backend and UI pick-list behavior.
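Candidate descriptors can be gathered with two PUG REST calls: name to CIDs, then a property table for those CIDs. The endpoint paths below are PubChem's public ones; whether search_pubchem_candidates uses exactly these calls, and the default limit of 10, are assumptions. Parsing is kept separate from the network so it can be tested offline.

```python
from typing import Any, Dict, List, Mapping, Optional
from urllib.parse import quote

PUG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_to_cids_url(name: str) -> str:
    return f"{PUG}/compound/name/{quote(name, safe='')}/cids/JSON"


def cid_properties_url(cids: List[int]) -> str:
    ids = ",".join(str(c) for c in cids)
    return f"{PUG}/compound/cid/{ids}/property/Title,MolecularFormula,MolecularWeight/JSON"


def parse_cids(payload: Mapping[str, Any], limit: int = 10) -> List[int]:
    return list(payload.get("IdentifierList", {}).get("CID", []))[:limit]


def parse_candidates(cids: List[int], payload: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Join property rows back onto the CID list so PubChem's ranking is preserved."""
    rows = payload.get("PropertyTable", {}).get("Properties", [])
    by_cid = {row["CID"]: row for row in rows}
    candidates = []
    for cid in cids:
        row = by_cid.get(cid)
        if row is None:
            continue
        # MolecularWeight may arrive as a string, so normalise it with float().
        mw: Optional[float] = float(row["MolecularWeight"]) if "MolecularWeight" in row else None
        candidates.append({"cid": cid, "title": row.get("Title", ""),
                           "formula": row.get("MolecularFormula", ""), "mw": mw})
    return candidates
```

Iterating over the CID list rather than the property rows is what preserves PubChem's relevance order in the pick-list.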
Add whole-library governance tests and support for efficient iteration. A new tests/test_library_governance.py validates every shipped library entry (well-formed atoms/coords, unique ids, coordinate round-trip stability), enforces the ≤10MB library budget, checks the preset/bulk contract, and ensures build scripts are present in-dev. molecule_library.iter_entries() was added to yield full decoded entries from the store (or fall back to the JSON manifest) for efficient whole-library checks. Also update help content: clarify Molecule Input instructions and add a "Finding a molecule" help topic explaining Library, XYZ input, and Online Search behaviors and provenance.
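A sketch of the whole-library checks (unique ids, atom/coordinate agreement, round-trip stability) over any iterable of entries, including a lazy iter_entries()-style generator. The codec here (fixed-point int32 at 1e-4 Å, base64-encoded) is a hypothetical stand-in; the PR only says the real encoder is "compact".

```python
import base64
import struct
from typing import Iterable, List, Mapping, Sequence, Tuple

SCALE = 10_000  # hypothetical fixed-point resolution: 1e-4 Angstrom

Coord = Tuple[float, float, float]


def encode_coords(coords: Sequence[Coord]) -> str:
    flat = [round(c * SCALE) for xyz in coords for c in xyz]
    return base64.b64encode(struct.pack(f"<{len(flat)}i", *flat)).decode("ascii")


def decode_coords(blob: str) -> List[Coord]:
    raw = base64.b64decode(blob)
    flat = struct.unpack(f"<{len(raw) // 4}i", raw)
    if len(flat) % 3:
        raise ValueError("coordinate blob is not a multiple of 3 values")
    return [(flat[i] / SCALE, flat[i + 1] / SCALE, flat[i + 2] / SCALE)
            for i in range(0, len(flat), 3)]


def validate_entries(entries: Iterable[Mapping]) -> List[str]:
    """Collect problems across the whole library in a single pass."""
    problems: List[str] = []
    seen = set()
    for entry in entries:
        eid = entry["id"]
        if eid in seen:
            problems.append(f"duplicate id {eid}")
        seen.add(eid)
        coords = decode_coords(entry["coords"])
        if len(coords) != len(entry["atoms"]):
            problems.append(f"{eid}: {len(entry['atoms'])} atoms but {len(coords)} coordinates")
        # Decoding then re-encoding must reproduce the stored blob exactly.
        if encode_coords(coords) != entry["coords"]:
            problems.append(f"{eid}: coordinates are not round-trip stable")
    return problems
```

Returning a list of problems rather than failing on the first one gives a single test report that names every bad entry.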
UI: install a stronger scroll guard that disables browser scroll-anchoring (overflow-anchor: none) on the log and its scrollable ancestors to prevent the page from jumping when new output lines arrive, while keeping the existing stick-to-bottom behavior. Chemistry: improve sdf_to_xyz robustness with RDKit by preserving explicit Hs (removeHs=False), adding hydrogens with coordinates (AddHs(..., addCoords=True)) to avoid Hs defaulting to the origin, detecting flat/2D conformers and re-embedding to 3D, and using MMFF optimization with a UFF fallback (and an Embed fallback using random coords) to avoid degenerate geometries that caused STRUCT.12 valence errors. Tests: add a test that a 2D SDF is re-embedded to a genuine 3D structure and that atoms are not coincident (prevents Hs piling at 0,0,0).
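The RDKit steps named above (Chem.MolFromMolBlock with removeHs=False, Chem.AddHs with addCoords=True, AllChem.EmbedMolecule with useRandomCoords as a fallback, then MMFFOptimizeMolecule or UFFOptimizeMolecule) need a trigger that decides when to re-embed. This is a pure-Python sketch of such a check; the thresholds and function names are assumptions, not the actual sdf_to_xyz logic.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def is_flat(coords: Sequence[Coord], tol: float = 1e-3) -> bool:
    """True when all z values are (nearly) identical, as in a 2D SDF depiction."""
    zs = [z for _, _, z in coords]
    return max(zs) - min(zs) < tol


def has_coincident_atoms(coords: Sequence[Coord], min_dist: float = 0.5) -> bool:
    """True when two atoms are closer than min_dist Angstrom (e.g. Hs stacked at the origin)."""
    return any(math.dist(a, b) < min_dist for a, b in combinations(coords, 2))


def needs_reembed(coords: Sequence[Coord]) -> bool:
    return is_flat(coords) or has_coincident_atoms(coords)
```

A genuinely planar molecule that happens to lie in the xy-plane would also be flagged; re-embedding it is harmless, only slower.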
Fix several UI, network, and molecule-embedding issues:
- app.py: improve log auto-scroll logic by tracking lastScrollHeight and handling mutations that indicate the log was cleared so the view correctly "follows" new output (prevents scrollbar sticking / BUG-SCROLL).
- app_builders.py: set fixed heights for the visualization and run-output containers to avoid layout reflows and scrollbar jumps when swapping content or streaming output.
- config.py / cactus.py: add a separate CACTUS connect timeout and use a (connect, read) timeout tuple for requests; lower the CACTUS read timeout to avoid slow/hanging fallback requests.
- pubchem.py: add _separate_fragments to radially separate disconnected fragments (e.g. salts/counterions) after embedding so bond perception doesn't treat counterions as bonded (resolves STRUCT.14). Call this helper from sdf_to_xyz, smiles_to_xyz, and inchi_to_xyz.
- tests: add a unit test ensuring a chloride counterion is placed well separated from an ammonium cation.
These changes prevent UI scrolling regressions, make CACTUS lookups fail fast on network problems, and avoid embedding-related rendering failures for salts.
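One way to separate fragments radially is to translate each fragment rigidly away from the overall centroid. This is a hypothetical sketch, not the actual _separate_fragments: the 4 Å gap and the circle fallback for fragments sitting on the centre are assumptions, and splitting a molecule into fragments (for example with RDKit's Chem.GetMolFrags) is not shown.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def _centroid(points: Sequence[Coord]) -> Coord:
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def separate_fragments(fragments: Sequence[Sequence[Coord]],
                       gap: float = 4.0) -> List[List[Coord]]:
    """Shift each fragment radially away from the overall centroid by `gap` Angstrom."""
    center = _centroid([p for frag in fragments for p in frag])
    moved: List[List[Coord]] = []
    for k, frag in enumerate(fragments):
        c = _centroid(frag)
        d = [c[i] - center[i] for i in range(3)]
        norm = math.sqrt(sum(x * x for x in d))
        if norm < 1e-6:
            # Fragment sits on the centre (e.g. an ion embedded at the origin):
            # spread such fragments around a circle in the xy-plane instead.
            angle = 2 * math.pi * k / len(fragments)
            d, norm = [math.cos(angle), math.sin(angle), 0.0], 1.0
        shift = [gap * x / norm for x in d]
        moved.append([(p[0] + shift[0], p[1] + shift[1], p[2] + shift[2]) for p in frag])
    return moved
```

Each fragment moves as a rigid body, so internal bond lengths are untouched. For two fragments the overall centroid lies between them, so both shifts increase their separation; layouts with many fragments are not guaranteed to clear every pair.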
Replace the MutationObserver/install-once scroll-guard with a requestAnimationFrame polling approach. The new code re-queries the current .quantui-run-output element each frame, sets overflowAnchor:none, and pins the scroller while output is streaming (stops after ~600ms idle) to avoid flicker and broken observers caused by ipywidgets replacing nodes. This fixes BUG-SCROLL where observers watched stale nodes or produced visible scroll jumps.
Add a session-scoped autouse fixture that redirects QUANTUI_RESULTS_DIR to a temporary directory for the whole test run, preventing tests from creating a cwd-relative results/ directory and restoring the previous env var afterwards. Also update docstring and inline JS comments in quantui/app.py to better explain the run-output scroll guard: it re-queries the output node each animation frame, pins on requestAnimationFrame to avoid flicker while output streams, and stops pinning after the log is idle.
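The built-in monkeypatch fixture is function-scoped, so a session-wide redirect needs its own set-and-restore logic. This sketch puts that logic in a hypothetical context manager, redirected_env; the fixture name in the comment is also hypothetical.

```python
import os
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def redirected_env(var: str, value: str) -> Iterator[str]:
    """Temporarily set an environment variable, restoring (or removing) it afterwards."""
    previous: Optional[str] = os.environ.get(var)
    os.environ[var] = value
    try:
        yield value
    finally:
        if previous is None:
            os.environ.pop(var, None)
        else:
            os.environ[var] = previous


# A conftest.py could wrap this in a session-scoped autouse fixture (requires pytest):
#
# @pytest.fixture(scope="session", autouse=True)
# def _isolated_results_dir(tmp_path_factory):
#     path = str(tmp_path_factory.mktemp("results"))
#     with redirected_env("QUANTUI_RESULTS_DIR", path):
#         yield path
```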
Add a _MAX_ISOSURFACE_POINTS cap (48,000) and downsample the cube volume by a computed stride so the plotted isosurface payload stays bounded. Adjust coordinate grids to match the strided volume. Replace separate positive/negative Isosurface traces with a single trace rendering both lobes (surface_count=2) using a stepped two-color colorscale and cmin/cmax, reducing payload and draw overhead while preserving opacity and caps. This keeps saved .cube full-resolution while improving browser/Plotly rendering performance.
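The stride can be chosen as the smallest integer that brings the sliced volume under the cap, and a single trace can show both lobes via a stepped colorscale. The 48,000 cap is quoted from this change; the search loop, the colours and the opacity are assumptions, and the kwargs are meant for plotly's go.Isosurface.

```python
import math
from typing import Any, Dict, Sequence

_MAX_ISOSURFACE_POINTS = 48_000  # cap quoted from this PR


def isosurface_stride(shape: Sequence[int], max_points: int = _MAX_ISOSURFACE_POINTS) -> int:
    """Smallest stride s for which volume[::s, ::s, ::s] has at most max_points voxels."""
    s = 1
    # Slicing an axis of length n with step s keeps ceil(n / s) samples.
    while math.prod(math.ceil(n / s) for n in shape) > max_points:
        s += 1
    return s


def two_lobe_isosurface_kwargs(iso: float, opacity: float = 0.6) -> Dict[str, Any]:
    """Keyword arguments for one go.Isosurface trace that draws both orbital lobes."""
    return dict(
        isomin=-iso, isomax=iso, surface_count=2,  # surfaces at -iso and +iso
        cmin=-iso, cmax=iso,
        # Stepped two-colour scale: negative lobe blue, positive lobe red (colours hypothetical).
        colorscale=[[0.0, "blue"], [0.5, "blue"], [0.5, "red"], [1.0, "red"]],
        opacity=opacity,
    )
```

With s in hand, the volume and the coordinate grids are sliced the same way (for example vol[::s, ::s, ::s] and x[::s]) before being flattened into the trace, while the saved .cube file keeps full resolution.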
Clean up internal architecture tags and inline TODO markers in docstrings and comments across multiple modules (app.py, app_analysis.py, app_builders.py, cactus.py, config.py, molecule_library.py, pubchem.py, structure_providers.py). These edits remove legacy STRUCT./BUG. annotations and tighten wording without changing runtime behavior. Update tests/test_orbital_visualization.py: tidy fixture string formatting, rename/clarify a test, and add assertions ensuring the isosurface is drawn as a single go.Isosurface trace (surface_count=2) and that large voxel grids are downsampled to avoid OOM in the renderer. No other functional changes were introduced.
Persist MP2/CCSD/(T) correlation values and render a post-HF breakdown in saved-result cards, including HF reference + correlation rows. Prefix geometry-optimization seed options with a ⚙ "Geom-opt" label so users can distinguish optimized geometries from raw inputs. Update tests to use substring matching for labels (more resilient than startswith) and add unit tests verifying MP2/CCSD breakdown rendering.
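A post-HF breakdown reduces to "reference + correlation (+ triples) = total", with one correlation term per method rather than summing MP2 and CCSD. The function name, row labels and the assumption that energies are in Hartree are hypothetical.

```python
from typing import List, Optional, Tuple


def post_hf_rows(method: str, e_hf: float, e_corr: float,
                 e_triples: Optional[float] = None) -> List[Tuple[str, float]]:
    """Result-card rows (label, energy in Hartree): reference, correlation, total."""
    if method not in ("MP2", "CCSD", "CCSD(T)"):
        raise ValueError(f"unsupported method {method!r}")
    base = "MP2" if method == "MP2" else "CCSD"  # CCSD(T) reuses the CCSD correlation
    rows = [("HF reference", e_hf), (f"{base} correlation", e_corr)]
    total = e_hf + e_corr
    if method == "CCSD(T)":
        if e_triples is None:
            raise ValueError("CCSD(T) needs the (T) correction")
        rows.append(("(T) correction", e_triples))
        total += e_triples
    rows.append((f"{method} total", total))
    return rows
```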
Bump package version to 0.3.0 in pyproject.toml and quantui/__init__.py and add the 0.3.0 changelog entry. The changelog documents the 'structure-sourcing' release: external structure search (name/CID/InChI/SMILES/CAS) with PubChem/NCI fallbacks and an offline bundled library; disambiguation pick-list; a three-tier bundled molecule library and new browse/search UI; migration of presets into the indexed library store; labeling clarification for seed geometries. Also notes fixes for orbital isosurface memory use, CCSD/MP2 result-card fields, live-log scrolling, and 3D re-embedding/salt handling.
This pull request refactors and enhances the molecule input and search experience in the QuantUI app. The main changes include replacing the old preset molecule dropdown with a more powerful and searchable library browser, improving PubChem structure search with candidate disambiguation, and modernizing the live calculation log's scroll behavior. Several internal APIs and imports are updated to support these features.
Molecule library and search improvements:
- Replaces the preset_dd dropdown with a new molecule library browser, including category filters, a search box, and a result dropdown, all powered by the new molecule_library backend. The UI and callback logic in quantui/app.py are updated accordingly, and helper functions are added in quantui/app_builders.py for building result options with friendly category labels.
- The bundled library data (.sqlite and .json files) is now included in the package via pyproject.toml, and excluded from the pre-commit large-file check.
PubChem and structure search enhancements:
- The PubChem search in the UI now uses the new student_friendly_resolve function (from structure_providers), supporting CIDs and InChI as well as names/SMILES. When multiple candidates are found, a disambiguation dropdown is presented for user selection. The UI and callback logic are updated to handle this flow.
- The quantui/__init__.py API surface is expanded to expose new structure and search utilities, including fetch_structure, resolve_structure, and candidate search helpers.
UI/UX improvements:
Code cleanup and maintenance:
- Removes the inline MOLECULE_LIBRARY from quantui/config.py (now served lazily through a back-compat shim) and updates related logic to use the new library backend.
These changes collectively modernize the molecule input experience, improve structure search reliability, and enhance the app's usability and maintainability.