diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 269a6e9..5e36262 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -12,10 +12,8 @@ QuantUI is an interactive Jupyter/Voilà platform for running PySCF quantum chemistry workflows end-to-end inside one app: setup, execution, analysis, visualization, and comparison. It is local-first today (no cluster account, no -SLURM required for normal use), and is designed to evolve toward optional -cluster-backed execution through interactive Jupyter/HPC environments. It is a -downstream port of the cluster-focused -`QuantUI` repo with all SLURM infrastructure removed. +SLURM required), and a future roadmap item is to add optional cluster-backed +execution through interactive Jupyter/HPC environments. **Primary users:** Undergraduate chemistry students and researchers at North Carolina Central University and collaborators. The UI runs as a Voilà app so users can run @@ -701,15 +699,17 @@ across kernel restarts and are accessible from the host (home dir is bind-mounte --- -## Relationship to Source Repo +## Scope Notes — Intentionally Out of Repo -QuantUI is a downstream port of `NCCU-Schultz-Lab/QuantUI` (the cluster version). -Bug fixes and module updates originate in `QuantUI` and are ported here. +The following module/file names are deliberately absent from `quantui/` and +should not be reintroduced without an explicit roadmap milestone. They would +only make sense once cluster-backed execution is added (a future roadmap +item, not currently scoped). 
-| Removed from source | Reason | +| File / module | Why it's not here | | --- | --- | -| `job_manager.py` | SLURM batch submission | -| `storage.py` | SLURM job metadata | -| `slurm_errors.py` | SLURM error translation | -| `visualization.py` | PlotlyMol fallback (excluded here) | -| SLURM templates in `config.py` | No cluster | +| `job_manager.py` | SLURM batch submission belongs to the future cluster-execution path | +| `storage.py` | SLURM job-metadata persistence — same future scope | +| `slurm_errors.py` | SLURM error translation — same future scope | +| `visualization.py` (the PlotlyMol-fallback module) | Superseded by `viz_backend_router.py` + `visualization_py3dmol.py` | +| SLURM-related templates in `config.py` | No cluster orchestration today | diff --git a/README.md b/README.md index 5a6641c..450bc20 100644 --- a/README.md +++ b/README.md @@ -47,6 +47,11 @@ research and classroom use. saved results; script export for a standalone `.py` file - **Plot export** — save IR, UV-Vis, PES, and orbital diagrams as standalone HTML +- **Optional GPU acceleration** — when [gpu4pyscf](https://github.com/pyscf/gpu4pyscf) + and a CUDA-capable NVIDIA GPU are present, SCF calculations auto-offload + via `mf.to_gpu()` (RHF / UHF / RKS / UKS supported; CCSD(T) stays on CPU). + The Status tab + every result card show which compute device was used. + Set `QUANTUI_DISABLE_GPU=1` to force CPU even when the GPU is available. - **Timing calibration** — one-click benchmark suite populates the time estimator with real machine data so predictions are accurate from the first run - **Voilà app mode** — serve the notebook as a polished widget-only UI (no @@ -95,6 +100,62 @@ python -m pip install quantui[pyscf,ase,app] See [apptainer/README.md](apptainer/README.md). +### Optional: GPU acceleration (NVIDIA + Linux / WSL) + +If you have an NVIDIA GPU, QuantUI can offload SCF calculations to it +through [gpu4pyscf](https://github.com/pyscf/gpu4pyscf). 
This is **fully +optional** — without these packages QuantUI runs on CPU exactly as +before, and you can disable GPU offload at any time with +`export QUANTUI_DISABLE_GPU=1`. + +**Step 1 — check your CUDA driver version:** + +```bash +nvidia-smi # "CUDA Version: 13.x" or "CUDA Version: 12.x" in the top-right +``` + +> The `CUDA Version` field reports your **driver's** maximum supported +> runtime. You do **not** need to install the CUDA Toolkit — the wheels +> below bundle their own runtime libraries. + +**Step 2 — install the CUDA-suffixed wheels matching your driver:** + +```bash +# CUDA 13.x driver +pip install gpu4pyscf-cuda13x cupy-cuda13x cutensor-cu13 + +# CUDA 12.x driver +pip install gpu4pyscf-cuda12x cupy-cuda12x cutensor-cu12 +``` + +> ⚠ **Do not** `pip install gpu4pyscf` or `pip install cupy` (without a +> CUDA suffix). Those are source distributions that try to compile +> against your local CUDA toolkit and will fail with +> `FileNotFoundError: 'nvcc'` on any machine without the full toolkit +> installed. The CUDA-suffixed wheels (`-cuda12x`, `-cuda13x`) are +> prebuilt binaries — no `nvcc`, no compilation, no toolkit required. + +**Step 3 — verify the install:** + +```bash +python -c "import gpu4pyscf, cupy; print('GPUs:', cupy.cuda.runtime.getDeviceCount())" +``` + +This should print `GPUs: 1` (or more). Once verified, launch QuantUI as usual +— the Status tab will show "GPU offload: active (NVIDIA {device-name})" +and result cards will display the compute device. + +**Method coverage** (per the gpu4pyscf docs): + +| Method | GPU offload | +| --- | --- | +| RHF, UHF, RKS, UKS (any DFT functional), TD-DFT | Yes | +| MP2, CCSD | Experimental on GPU (auto-offload) | +| CCSD(T) | CPU only (gpu4pyscf doesn't support GPU triples; QuantUI's dispatcher detects this and skips the offload) | + +Whenever gpu4pyscf can't offload a particular call, QuantUI falls back +to CPU automatically and the result card reflects which device ran. 
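The fallback rule above can be pictured with a small sketch. It is illustrative only and is not QuantUI's dispatcher: `gpu_disabled` and `maybe_offload` are hypothetical names, the truthy spellings (`1`, `true`, `True`) are the ones listed in docs/CLI.md, and any object with a `to_gpu()` method stands in for a PySCF mean-field object.

```python
import os

# Truthy spellings documented for QUANTUI_DISABLE_GPU in docs/CLI.md.
_TRUTHY = {"1", "true", "True"}


def gpu_disabled(env=None):
    """Return True when QUANTUI_DISABLE_GPU is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get("QUANTUI_DISABLE_GPU", "") in _TRUTHY


def maybe_offload(mf, env=None):
    """Hypothetical helper: return (object_to_run, device_label)."""
    if gpu_disabled(env):
        return mf, "CPU"
    try:
        # to_gpu() needs gpu4pyscf and a usable CUDA device.
        return mf.to_gpu(), "GPU"
    except Exception:
        # Assumption: any failure to migrate means "stay on CPU".
        return mf, "CPU"
```

Returning the device label alongside the object mirrors the idea that every result card reports which device ran; the broad `except` is a simplification for the sketch.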
+ --- ## Quick start @@ -192,6 +253,41 @@ Dock — it just runs the `.command` script under the hood, so any --- +## Command-line toolkit + +QuantUI ships a small CLI for inspecting state and generating reports +from outside the notebook — useful for verifying GPU offload before a +long run, tailing the event log, and building a usage / speedup +dashboard. After installation: + +```bash +quantui log tail -n 50 # last 50 events from event_log.jsonl +quantui gpu check # is GPU offload available right now? +quantui analytics build --open # build dashboard.html + open in browser +``` + +Full reference with all flags and examples: [docs/CLI.md](docs/CLI.md). + +--- + +## Using QuantUI results in other tools + +QuantUI's M-EXPORT milestone writes portable companion files alongside +every result so you can hand-off to Avogadro, IQmol, Jmol, VMD, ASE-GUI, +or any spreadsheet without screen-scraping. The quick reference: + +| Goal | QuantUI file | Tool | +| --- | --- | --- | +| MOs in 3D, vibrations | `result.molden` | Avogadro 2, IQmol, Jmol | +| Geometry-opt / PES replay | `trajectory.xyz` or `.traj` | VMD, Avogadro, ASE-GUI | +| Orbital isosurface | `isosurfaces/.cube` | Avogadro, VMD, ChimeraX | +| Spectrum data in Excel | `*_data_*.csv` | Excel, LibreOffice, pandas | +| Share whole result | `.zip` (Export bundle) | Any unzip tool | + +Full per-tool walkthrough with troubleshooting: [docs/IMPORTING-INTO-AVOGADRO.md](docs/IMPORTING-INTO-AVOGADRO.md). + +--- + ## Tutorials Five step-by-step notebooks in [`notebooks/tutorials/`](notebooks/tutorials/): @@ -306,15 +402,6 @@ CHANGELOG.md Release history (Keep a Changelog format) --- -## Relationship to the cluster version - -QuantUI (this repo) is a downstream port of the cluster-based -[QuantUI-cluster](https://github.com/The-Schultz-Lab/QuantUI) repository. All SLURM -infrastructure (job manager, job storage, batch templates) has been removed. -Bug fixes flow from the cluster repo into this one, not the other way around. 
- ---- - ## License [MIT](LICENSE) — Copyright 2026 The Schultz Lab, North Carolina Central University diff --git a/docs/CLI.md b/docs/CLI.md new file mode 100644 index 0000000..a835585 --- /dev/null +++ b/docs/CLI.md @@ -0,0 +1,303 @@ +# QuantUI CLI + +QuantUI ships a small command-line toolkit for inspecting state and +generating reports from outside the notebook. After installing the +package (`pip install -e .` or `pip install quantui`), the `quantui` +command is on your `PATH`. + +```bash +quantui --help +``` + +The CLI is meant to *complement* the Voilà app — not replace it. Reach +for the CLI when you want to: + +- check what the app has been doing without opening a notebook +- confirm GPU offload is wired correctly before starting a long run +- generate a usage / GPU-speedup report you can share or pin to a tab +- script log inspection or analytics into a shell pipeline / cron job + +The CLI never touches your live calculations or notebook server. All +commands are read-only against `~/.quantui/` (or whatever +`QUANTUI_LOG_DIR` points at). + +--- + +## Command reference + +| Command | What it does | +| --- | --- | +| [`quantui log tail`](#quantui-log-tail) | Print recent events from `event_log.jsonl` | +| [`quantui gpu check`](#quantui-gpu-check) | Probe GPU-offload availability and explain failures | +| [`quantui analytics build`](#quantui-analytics-build) | Build an HTML usage dashboard from `perf_log.jsonl` | + +--- + +## `quantui log tail` + +Print the last *N* entries from the QuantUI event log +(`~/.quantui/logs/event_log.jsonl`). Each event is rendered on one +line as `timestamp event_type message k=v k=v ...`, so the output +is grep-friendly. 
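Because each event is one line, the output can also be post-processed in Python instead of `grep`. The sketch below assumes only what the format above states: the first two whitespace-separated fields are the timestamp and the event type. The remainder is left unparsed because `k=v` values may contain spaces. `parse_event_line` and `count_event_types` are hypothetical helper names, not part of QuantUI.

```python
from collections import Counter
from datetime import datetime


def parse_event_line(line):
    """Split one `quantui log tail` line into (timestamp, event_type, rest)."""
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError(f"not an event line: {line!r}")
    timestamp = datetime.fromisoformat(parts[0])
    rest = parts[2] if len(parts) == 3 else ""
    return timestamp, parts[1], rest


def count_event_types(lines):
    """Tally event types across lines, skipping blank ones."""
    return Counter(parse_event_line(ln)[1] for ln in lines if ln.strip())
```

For example, passing `sys.stdin` to `count_event_types` in a small script lets `quantui log tail -n 500 | python that_script.py` summarise how many events of each type were logged.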
+ +### Flags + +| Flag | Default | Description | +| --- | --- | --- | +| `-n N` | `20` | Number of most-recent events to print | + +### Examples + +```bash +# Last 20 events +quantui log tail + +# Last 50 events +quantui log tail -n 50 + +# Find every GPU-related event +quantui log tail -n 500 | grep -i gpu + +# Show the five most recent error events +quantui log tail -n 200 | grep -i error | tail -5 +``` + +### Sample output + +``` +2026-05-25T13:55:22.421910+00:00 viz_route_decision task=molecule_preview pref=auto chosen=py3dmol reason=auto -> task primary (py3dmol) +2026-05-25T13:55:22.470028+00:00 startup QuantUI 0.2.0 started +2026-05-25T14:08:14.102544+00:00 calc_done B3LYP/STO-3G on H2O elapsed_s=1.2 converged=True gpu_used=True gpu_name=NVIDIA GeForce RTX 4050 Laptop GPU +``` + +### Notes + +- The event log auto-prunes entries older than 7 days on every write, + so `tail` never shows events older than a week. +- Output goes to stdout; "log is empty" / "log not found" notices go to + stderr so they don't pollute pipelines. +- Exit code: always `0` (even when no events exist — the absence of + events is not an error). + +--- + +## `quantui gpu check` + +Probe whether QuantUI's GPU offload path is functional in the current +environment. This is the canonical one-liner for verifying that +`gpu4pyscf` + `cupy` are installed correctly and that +`is_gpu_available()` will return `True` when the app runs. + +### Flags + +None. + +### Examples + +```bash +# Is GPU offload working right now? 
+quantui gpu check + +# Use in a shell condition +if quantui gpu check; then + echo "GPU mode" +else + echo "Falling back to CPU" +fi + +# Diagnose a CI machine +QUANTUI_DISABLE_GPU=1 quantui gpu check # confirms env-var path +``` + +### Sample output + +**When GPU is available:** + +``` +GPU offload available: NVIDIA GeForce RTX 4050 Laptop GPU +``` + +(exit code `0`) + +**When GPU is unavailable**, the command prints a reason so you know +where to look next: + +``` +GPU offload not available + reason: gpu4pyscf not installed (see README → 'Optional: GPU acceleration') +``` + +``` +GPU offload not available + reason: QUANTUI_DISABLE_GPU is set in the environment +``` + +``` +GPU offload not available + reason: cupy reports 0 CUDA devices +``` + +``` +GPU offload not available + reason: cupy/gpu4pyscf import succeeded but probe raised — run + `python -c "import cupy; cupy.show_config()"` to inspect +``` + +(all return exit code `1`) + +### Notes + +- Detection is cached for the lifetime of QuantUI's runtime (so the + Voilà app doesn't re-probe on every result-card render); the CLI + clears that cache before probing so each invocation reflects the + current state — useful right after a `pip install`. +- Returns exit `1` rather than raising, so the command is safe to use + in `if ...; then ... fi` and `&& ...` chains. + +--- + +## `quantui analytics build` + +Build a self-contained HTML analytics dashboard from +`~/.quantui/logs/perf_log.jsonl` and write it to a file you can open +in any browser. + +The dashboard contains: + +- **Overview cards** — total runs, total compute time, GPU vs CPU run + counts, unique molecules / methods / basis sets used. +- **GPU vs CPU speedup table** — for every `(method, basis, formula)` + tuple that has runs on *both* devices, the median CPU time, median + GPU time, and the speedup factor. Sorted best-speedup first. +- **Method usage** — bar chart of run counts per method. 
+- **Calc-type distribution** — bar chart of run counts per calculation + type. +- **Recent timeline** — scatter of `elapsed_s` over time, coloured by + compute device (CPU grey, GPU green, Unknown light grey for + pre-2026-05-25 records that don't yet have device info). + +Plotly's JavaScript is inlined into the HTML, so the file works +offline and can be emailed, attached to a writeup, or pinned to a +browser tab. + +### Flags + +| Flag | Default | Description | +| --- | --- | --- | +| `-o PATH`, `--output PATH` | `~/.quantui/dashboard.html` | Output HTML path | +| `--open` | off | After writing, open the dashboard in the default browser (WSL-aware — uses `wslview` / `explorer.exe` on WSL) | + +### Examples + +```bash +# Build the dashboard at the default location +quantui analytics build + +# Build and immediately open it in the browser +quantui analytics build --open + +# Write somewhere shareable +quantui analytics build -o ~/Desktop/quantui-report.html + +# Build into a shared folder + open +quantui analytics build -o ~/projects/lab-share/quantui-report.html --open +``` + +### Sample output + +``` +Wrote /home/schul/.quantui/dashboard.html +``` + +With `--open`, the CLI picks the right opener for your environment: + +- **WSL**: tries `wslview` first (bundled with the `wslu` package), + then falls back to `explorer.exe`. Both delegate to your **Windows + default browser** via WSL interop — no Linux-side browser install + needed. If neither is available, `sudo apt install wslu` fixes it + in one step. +- **Linux native**: stdlib `webbrowser.open` (which uses `xdg-open`). +- **macOS / Windows native**: stdlib `webbrowser.open`. + +If no opener succeeds — e.g. a headless container with no display — +you'll see: + +``` +Wrote /home/schul/.quantui/dashboard.html +(could not auto-open browser — open /home/schul/.quantui/dashboard.html manually) +``` + +The exit code stays `0` either way — the dashboard was written +successfully; only the auto-open is best-effort. 
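The opener order described above can be sketched as a small pure function. This is a hedged illustration, not the code in `quantui/cli.py`: `choose_opener` is a hypothetical name, and because the source does not say how QuantUI detects WSL, the caller passes `is_wsl` in.

```python
import shutil


def choose_opener(is_wsl, which=shutil.which):
    """Return the opener to try, or None when no WSL helper is found."""
    if not is_wsl:
        # Native Linux, macOS, and Windows go through the stdlib webbrowser module.
        return "webbrowser"
    for candidate in ("wslview", "explorer.exe"):
        if which(candidate):
            return candidate
    # On WSL with neither helper installed, auto-open is skipped.
    return None
```

Keeping the lookup (`which`) injectable makes both the success and the "could not auto-open" branches easy to exercise in tests without a real browser.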
+ +### Notes + +- **Empty perf log**: if `perf_log.jsonl` doesn't exist yet, the + command prints `(perf log is empty — run a calculation first)` to + stderr and exits `0`. No file is written. +- **Old records with no GPU info**: records written before session 55 + (2026-05-25) don't have `gpu_used`. The dashboard counts those in a + separate "Unknown" device bucket rather than assuming CPU — that + keeps the GPU-vs-CPU speedup table honest. +- **Speedup table empty?** It only shows tuples that have runs on + *both* devices. After enabling GPU, re-run any prior CPU calc on + the GPU to populate at least one row. + +--- + +## Environment variables + +| Variable | Effect | +| --- | --- | +| `QUANTUI_LOG_DIR` | Override the default `~/.quantui/logs/` location. The dashboard's default output (`~/.quantui/dashboard.html`) follows: it lives one level up from the active `QUANTUI_LOG_DIR`. | +| `QUANTUI_DISABLE_GPU` | Force CPU mode even when gpu4pyscf is installed. `quantui gpu check` reports this as the reason. Accepted truthy values: `1`, `true`, `True`. | + +--- + +## Common workflows + +### Verify GPU is wired before a long run + +```bash +quantui gpu check && voila notebooks/molecule_computations.ipynb +``` + +If `gpu check` exits non-zero, the Voilà launch is skipped and the +reason was printed to stderr. + +### Quick "what happened in my last session?" + +```bash +quantui log tail -n 100 | grep -E "calc_done|calc_error|startup" +``` + +### After a benchmarking run, open the report + +```bash +quantui analytics build --open +``` + +The dashboard opens; the speedup table summarises everything across +runs without you needing to remember which calc ran where. 
+ +### Plumbing into cron / CI + +```bash +# Daily snapshot, no auto-open (headless) +quantui analytics build -o /var/reports/quantui-$(date +%F).html +``` + +--- + +## Adding a new subcommand + +Each verb is one `_cmd_(args: argparse.Namespace) -> int` in +[`quantui/cli.py`](../quantui/cli.py) plus a registration in +`_build_parser`. The pattern is short by design — `gpu check`, +`log tail`, and `analytics build` all fit in well under 50 lines of +production code apiece. See the module docstring for the contract. + +Tests live in [`tests/test_cli.py`](../tests/test_cli.py) — every +subcommand should cover its happy path, its empty/missing-data path, +and any flag-specific behavior (e.g. `--open` was tested against both +a successful `webbrowser.open` and a failed one). diff --git a/docs/IMPORTING-INTO-AVOGADRO.md b/docs/IMPORTING-INTO-AVOGADRO.md new file mode 100644 index 0000000..4f3b44f --- /dev/null +++ b/docs/IMPORTING-INTO-AVOGADRO.md @@ -0,0 +1,176 @@ +# Importing QuantUI results into Avogadro / IQmol / Jmol + +QuantUI saves every calculation as a *result folder* under `~/.quantui/results/`. +Each folder ships with portable, standards-compliant files that the wider +quantum-chemistry ecosystem already knows how to read. No screen-scraping, +no lock-in, no waiting on QuantUI to add a feature you can already get from +the tool you already use. 
+ +This page is a quick cross-reference: **"I want to do X — which file do I open +in which tool?"** + +## The big table + +| What you want to do | QuantUI file (in result folder) | Recommended external tool(s) | +| --- | --- | --- | +| View molecular orbitals in 3D | `result.molden` | Avogadro · IQmol · Jmol | +| Animate vibrational normal modes | `result.molden` (from a Frequency calc) | Avogadro | +| Plot or replay a geometry-optimization or PES-scan trajectory | `trajectory.xyz` (any tool) or `trajectory.traj` (ASE) | VMD · Avogadro · ASE-GUI | +| Render an orbital isosurface from a saved cube | `HOMO.cube` / `LUMO.cube` / etc. | Avogadro · VMD · ChimeraX | +| Open spectrum data in Excel / a notebook | `*_data_*.csv` (per-panel: IR, UV-Vis, orbitals, PES) | LibreOffice Calc · Excel · `pandas.read_csv` | +| Share the whole result with a collaborator | `.zip` (use **Export bundle** in the Analysis tab) | Any unzip tool | +| Edit a structure and re-run elsewhere | `trajectory.traj` (last frame) | ASE-GUI | + +## Where the files live + +After a calculation finishes, open the **Files tab** in QuantUI and select +the result folder. 
You will see a tree like this: + +```text +2026-05-25_14-32-11-394021_H2O_B3LYP_6-31Gs/ +├── result.json ← machine-readable result metadata +├── result.molden ← MOs + (for freq) vibrations ← EXPORT.1 / EXPORT.2 +├── pyscf.log ← raw PySCF output +├── orbitals.npz ← MO coefficients (for QuantUI re-render) +├── thumbnail.png ← preview card image +├── trajectory.xyz ← geo-opt / PES frames (multi-frame XYZ) ← EXPORT.3 +├── trajectory.traj ← geo-opt / PES frames (ASE binary) ← EXPORT.7 +├── ir_data_.csv ← IR-spectrum (freq+intensity) data ← EXPORT.4 +├── uv_data_.csv ← UV-Vis-spectrum data ← EXPORT.4 +├── orb_data_.csv ← orbital-diagram data ← EXPORT.4 +├── pes_data_.csv ← PES-scan data ← EXPORT.4 +└── isosurfaces/ + ├── H2O_HOMO_.cube + └── H2O_LUMO_.cube ← EXPORT.5 +``` + +Files marked `← EXPORT.X` were added in the M-EXPORT milestone (session 54, +QuantUI 0.2.0). Older result folders may not have them. + +## Per-tool quick start + +### Avogadro 2 + +Avogadro is the easiest cross-platform viewer for QuantUI outputs. + +- **View MOs:** `File → Open → result.molden` → menu **Analysis → Orbitals**. + Pick an orbital from the list, then **Extensions → Surfaces → Generate**. +- **Animate vibrations:** open the *same* `result.molden` from a Frequency + calculation → menu **Extensions → Vibrational Modes** → pick a frequency + → **Start Animation**. QuantUI writes `[FREQ]`, `[FR-COORD]`, and + `[FR-NORM-COORD]` blocks per the Molden spec. +- **Replay a geometry optimization:** `File → Open → trajectory.xyz` and + use the frame slider at the bottom of the viewport. +- **Render an isosurface from a cube file:** `File → Open → .cube` + → **Extensions → Surfaces → Generate** (the cube is already on a grid). + +### IQmol + +Excellent for MO visualization with smooth navigation between orbitals. + +- **MOs:** `File → Open → result.molden`. The orbital tree appears in the + side panel; double-click an orbital to render its isosurface. 
+- IQmol does not animate vibrations from Molden files. For vibrations, + use Avogadro. + +### Jmol + +Useful when you want a script-driven viewer for batch screenshots or +publications. + +- **MOs:** `load result.molden` → `mo HOMO` (or any orbital index). +- **Trajectories:** `load trajectory.xyz` autoloads all frames; `frame next` + cycles them. +- **Cubes:** `isoSurface s1 cutoff 0.05 "HOMO.cube"`. + +### VMD + +The best tool for large trajectories (PES scans with hundreds of points, +long MD-style replays). + +- **Trajectories:** `vmd -m trajectory.xyz`. VMD auto-detects multi-frame + XYZ. +- **Cubes:** `mol new HOMO.cube` then **Graphics → Representations → + Isosurface**. + +### ASE-GUI (graphical) and `ase` (Python) + +ASE round-trips the binary `.traj` file with per-frame energies preserved. + +- **Graphical:** `ase gui trajectory.traj` opens an interactive viewer. + Slice with `ase gui trajectory.traj@0:10:2`. +- **Edit + save as a new starting point:** + `ase gui trajectory.traj` → manipulate atoms → **File → Save as…**. + Re-import the saved geometry into QuantUI for a follow-up calculation. +- **Python post-processing:** + + ```python + from ase.io import read + frames = read("trajectory.traj", index=":") + for f in frames: + print(f.get_potential_energy()) # eV (ASE convention) + ``` + + The `.xyz` trajectory uses the *extended-XYZ* convention with + `energy= Hartree` per frame, so `ase.io.read("trajectory.xyz", ":")` + also works. + +### Plain Python (Excel, pandas) + +Every spectrum / diagram panel exports its data as a per-trace CSV via +the **📋 Copy data** button. The file is also written to the result folder +as `_data_.csv`. The format is one section per trace: + +```text +# trace 1 +x,y +400,0.0 +401,0.012 +... +``` + +This parses cleanly with stdlib `csv.reader`, `pandas.read_csv`, Excel, +LibreOffice Calc, or anything else that knows how to read comma-separated +values with comment lines. 
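A minimal stdlib sketch of reading that layout follows. It assumes each section opens with a `#` comment line, followed by a header row and numeric `x,y` rows, as in the excerpt above; `read_traces` is a hypothetical helper, not a QuantUI function.

```python
import csv
import io


def read_traces(text):
    """Parse a per-trace CSV export into {label: [(x, y), ...]}."""
    traces = {}
    label = None
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip():
            continue  # blank line
        first = row[0].strip()
        if first.startswith("#"):
            # A comment line such as "# trace 1" opens a new section.
            label = first.lstrip("#").strip()
            traces[label] = []
            continue
        try:
            point = (float(row[0]), float(row[1]))
        except (ValueError, IndexError):
            continue  # header row such as "x,y", or any non-numeric line
        if label is None:
            # Assumption: rows before any comment line form one unnamed trace.
            label = "trace"
            traces[label] = []
        traces[label].append(point)
    return traces


sample = "# trace 1\nx,y\n400,0.0\n401,0.012\n# trace 2\nx,y\n400,0.5\n"
print(read_traces(sample))
```

Skipping rows that fail numeric conversion keeps the parser tolerant of repeated header rows, at the cost of silently ignoring malformed data lines.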
+ +## Bundle export + +The **Export bundle** button in the Analysis tab zips an entire result +folder. The archive lands as a sibling of the result directory: + +```text +~/.quantui/results/2026-05-25_14-32-11-394021_H2O_B3LYP_6-31Gs.zip +``` + +Share that one file and your collaborator gets every artifact above — +no need to walk them through which file does what. + +## Troubleshooting + +- **Avogadro 1.2 doesn't show vibrations.** Upgrade to Avogadro 2; the v1 + branch is no longer maintained. Avogadro 2 reads QuantUI's Molden + vibration blocks natively. +- **`result.molden` is missing for an older result.** Auto-export was + added in session 54 (QuantUI 0.2.0). Older results don't have a + `.molden`; re-running the calc regenerates one. +- **IQmol can't open the file.** IQmol's parser is stricter than + Avogadro's. If you see a parse error, open the file in Avogadro first + to confirm it's well-formed — usually a sign of a half-written file + from an interrupted run. +- **Cube files render in Avogadro but the colors are inverted.** Toggle + **Extensions → Surfaces → Color by Phase**. Cube sign conventions vary + between codes; QuantUI uses PySCF's default (gpu4pyscf matches). + +## Related reading + +- [Molden file format spec](https://www.theochem.ru.nl/molden/molden_format.html) +- [Extended-XYZ specification](https://wiki.fysik.dtu.dk/ase/ase/io/formatoptions.html#extended-xyz) +- [ASE trajectory file format](https://wiki.fysik.dtu.dk/ase/ase/io/trajectory.html) +- [Cube file format (Gaussian convention)](https://gaussian.com/cubegen/) + +## Roadmap link + +This page closes work-package **EXPORT.6** in [M-EXPORT](https://github.com/NCCU-Schultz-Lab/QuantUI/blob/main/CHANGELOG.md). +The companion exports (Molden, multi-frame XYZ, ASE `.traj`, per-panel CSV, +cube + bundle) are tracked as EXPORT.1, EXPORT.2, EXPORT.3, EXPORT.4, +EXPORT.5, and EXPORT.7 in the same milestone. 
diff --git a/docs/index.html b/docs/index.html index bf5f5dd..9dec9d0 100644 --- a/docs/index.html +++ b/docs/index.html @@ -3,10 +3,10 @@ - QuantUI — An open-source frontend for DFT and post-HF quantum chemistry - - - + QuantUI — Free, open, and interactive quantum chemistry + + + @@ -354,16 +354,18 @@
- Open-source DFT frontend + Open-source PySCF frontend No cluster required + GPU-ready
-

A powerful frontend for
open-source quantum chemistry

+

Free, open, and
interactive quantum chemistry

QuantUI puts PySCF - behind an interactive Jupyter/Voilà UI. Run DFT, MP2, TD-DFT, - NMR, geometry optimization, frequencies, and PES scans — - visualize structures, orbitals, IR and UV-Vis spectra, all on - your laptop. + behind an interactive Jupyter/Voilà UI. Run DFT, MP2, CCSD, + CCSD(T), TD-DFT, NMR, geometry optimization, frequencies, and + PES scans — visualize structures, orbitals, IR and UV-Vis + spectra, all on your laptop with optional NVIDIA GPU offload via + gpu4pyscf.

@@ -374,7 +376,7 @@

A powerful frontend for
open-source quantum chemistry
Python 3.9–3.11 · - ~1000 tests + 1280+ tests · MIT License · @@ -459,8 +461,8 @@

A complete PySCF workflow

Calculations

- RHF, UHF, nine DFT functionals, and MP2 — with six - calculation types: single point, geometry optimization, + RHF, UHF, nine DFT functionals, MP2, CCSD, and CCSD(T) — + with six calculation types: single point, geometry optimization, frequencies/thermochemistry, TD-DFT UV-Vis, NMR shielding, and 1D PES scans. PCM implicit solvation included.

@@ -481,10 +483,51 @@

A complete PySCF workflow

📂
Exports & History

- Every calculation auto-saves to a timestamped directory and - can be replayed after a kernel restart. Export structures as - XYZ, MOL/SDF, or PDB; spectra as standalone HTML; or any run - as a runnable .py script. + Every calc auto-saves to a timestamped directory and replays + after a kernel restart. Export structures (XYZ, MOL/SDF, PDB), + orbital data (Molden), trajectories (multi-frame XYZ, ASE + .traj), cube files, spectra + as HTML, full result bundles as .zip, + or any run as a standalone .py script. +

+
+ +
+
🚀
+
GPU Acceleration
+

+ Optional NVIDIA GPU offload via + gpu4pyscf + — RHF, UHF, RKS/UKS DFT, and TD-DFT auto-migrate to GPU + when available. Numerical IR-intensity SCFs also offload. Set + QUANTUI_DISABLE_GPU=1 to force + CPU; the result card always shows which device produced the numbers. +

+
+ +
+
📈
+
Time Estimator & Calibration
+

+ Four-tier calibration suite anchors a per-machine time-prediction + model with GPU-vs-CPU partitioning, IQR outlier rejection, and + variance-aware confidence labels. Pre-run estimates show in the + Calculate tab; predicted-vs-actual accuracy accrues automatically + in the analytics dashboard. +

+
+ +
+
⌨️
+
CLI & Analytics
+

+ The quantui CLI inspects the + event log (log tail), probes + GPU availability (gpu check), + and builds a self-contained HTML analytics dashboard + (analytics build --open) with + GPU-vs-CPU speedup tables, method usage, and estimator-accuracy + tracking.

@@ -637,7 +680,7 @@

Step-by-step tutorials

Supported calculations

- Six calculation types over twelve methods and nine basis sets, + Six calculation types over fourteen methods and nine basis sets, all dispatched through a single Calculate tab.

@@ -679,7 +722,7 @@

Supported calculations

- Twelve methods, grouped by family: + Fourteen methods, grouped by family:

@@ -707,8 +750,8 @@

Supported calculations

@@ -764,6 +807,7 @@

Supported calculations

Schultz LabRepositoryChangelog + Import into AvogadroLicense diff --git a/pyproject.toml b/pyproject.toml index 493f841..1042865 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -37,6 +37,11 @@ dependencies = [ "plotlymol>=0.2.1", ] +[project.scripts] +# ``quantui`` CLI — small toolkit for inspecting QuantUI state from +# the terminal (``quantui log tail -n 50``, etc.). See ``quantui/cli.py``. +quantui = "quantui.cli:main" + [tool.setuptools] packages = ["quantui"] @@ -64,6 +69,31 @@ app = [ "ipykernel>=6.0.0", ] +# GPU acceleration via gpu4pyscf + cupy (M-GPU). Linux + NVIDIA CUDA only. +# When installed, ``quantui.gpu_offload`` auto-detects + migrates SCF +# objects via ``mf.to_gpu()``. Set ``QUANTUI_DISABLE_GPU=1`` to force CPU +# at runtime even when these are available. Method coverage per the +# gpu4pyscf README: RHF/UHF/RKS/UKS fully supported; MP2/CCSD experimental; +# CCSD(T) explicitly unsupported (QuantUI's dispatcher skips it). +# +# IMPORTANT: gpu4pyscf and cupy publish CUDA-suffixed wheels — pick the +# extra matching your NVIDIA driver's CUDA version (see ``nvidia-smi``). +# The bare ``gpu4pyscf`` / ``cupy`` packages on PyPI are source sdists +# that require a local CUDA Toolkit (``nvcc``) to build; the suffixed +# wheels (``-cuda12x``, ``-cuda13x``) are prebuilt binaries and do NOT. +# See the "Optional: GPU acceleration" section in README.md for the +# full step-by-step including the ``nvidia-smi`` check. +gpu-cuda12x = [ + "gpu4pyscf-cuda12x", + "cupy-cuda12x", + "cutensor-cu12", +] +gpu-cuda13x = [ + "gpu4pyscf-cuda13x", + "cupy-cuda13x", + "cutensor-cu13", +] + # Notebook smoke-test dependencies notebook = [ "nbmake>=1.4.0", diff --git a/quantui/analytics.py b/quantui/analytics.py new file mode 100644 index 0000000..99318eb --- /dev/null +++ b/quantui/analytics.py @@ -0,0 +1,609 @@ +"""Self-contained analytics dashboard for QuantUI usage data. 
+ +Reads ``~/.quantui/logs/perf_log.jsonl`` (override with +``QUANTUI_LOG_DIR``) and writes a standalone HTML report with charts that +work offline — Plotly's JS is inlined into the file so the user can open +it directly in a browser (no Voilà, no Jupyter). + +What the dashboard shows +------------------------ + +1. **Overview cards** — total runs, total compute time, GPU vs CPU run + counts, unique molecules / methods / basis sets. +2. **GPU vs CPU speedup table** — for every (method, basis, formula) that + has runs on BOTH devices, the median CPU time, median GPU time, and + the resulting speedup factor. Sortable / readable in one glance. +3. **Method usage** — bar chart of run counts per method. +4. **Calc-type distribution** — bar chart of run counts per calc_type. +5. **Recent timeline** — scatter of ``elapsed_s`` over time coloured by + compute device (CPU grey, GPU green), so a user can spot regressions + or speedups visually as they run more calcs. + +Older perf-log records that pre-date the M-GPU follow-up don't have +``gpu_used`` set — those are treated as "device unknown" and counted in +their own bucket rather than guessed CPU. + +Output is a single ``.html`` file (default ``~/.quantui/dashboard.html``) +the user can pin to their browser or email to a collaborator. +""" + +from __future__ import annotations + +import html as _html +import statistics +from collections import defaultdict +from datetime import datetime, timezone +from pathlib import Path +from typing import Optional + +from quantui.calc_log import _log_dir, get_perf_history, get_prediction_history + +# --------------------------------------------------------------------------- +# Internal helpers +# --------------------------------------------------------------------------- + + +def _classify_device(record: dict) -> str: + """Return ``"GPU"``, ``"CPU"``, or ``"Unknown"`` for one perf record. 
+ + Records written before the M-GPU follow-up (2026-05-25) don't have + ``gpu_used`` at all — we don't backfill those as CPU because they + pre-date GPU support entirely, so calling them "CPU" would muddy any + speedup comparison. ``"Unknown"`` is the honest bucket. + """ + if "gpu_used" not in record: + return "Unknown" + return "GPU" if record["gpu_used"] else "CPU" + + +def _summary_metrics(records: list[dict]) -> dict: + """Compute headline counters for the overview cards.""" + total_runs = len(records) + total_seconds = sum(float(r.get("elapsed_s", 0.0)) for r in records) + gpu_runs = sum(1 for r in records if _classify_device(r) == "GPU") + cpu_runs = sum(1 for r in records if _classify_device(r) == "CPU") + unknown_runs = sum(1 for r in records if _classify_device(r) == "Unknown") + converged = sum(1 for r in records if r.get("converged")) + unique_formulas = len({r.get("formula", "") for r in records if r.get("formula")}) + unique_methods = len({r.get("method", "") for r in records if r.get("method")}) + unique_basis = len({r.get("basis", "") for r in records if r.get("basis")}) + return { + "total_runs": total_runs, + "total_seconds": total_seconds, + "gpu_runs": gpu_runs, + "cpu_runs": cpu_runs, + "unknown_runs": unknown_runs, + "converged_runs": converged, + "unique_formulas": unique_formulas, + "unique_methods": unique_methods, + "unique_basis": unique_basis, + } + + +def _speedup_rows(records: list[dict]) -> list[dict]: + """For each (method, basis, formula) with both CPU and GPU runs, return + a row with median times and the speedup factor. + + Only tuples that have at least one CPU run AND at least one GPU run + show up. ``Unknown`` device records are ignored for this comparison. + Sorted by speedup descending (best speedups at the top). 
+ """ + bucket: dict[tuple, dict[str, list[float]]] = defaultdict( + lambda: {"CPU": [], "GPU": []} + ) + for r in records: + dev = _classify_device(r) + if dev not in ("CPU", "GPU"): + continue + key = ( + r.get("method", "?"), + r.get("basis", "?"), + r.get("formula", "?"), + ) + try: + bucket[key][dev].append(float(r["elapsed_s"])) + except (KeyError, TypeError, ValueError): + continue + + rows: list[dict] = [] + for (method, basis, formula), times in bucket.items(): + if not times["CPU"] or not times["GPU"]: + continue + cpu_med = statistics.median(times["CPU"]) + gpu_med = statistics.median(times["GPU"]) + if gpu_med <= 0: + continue + rows.append( + { + "method": method, + "basis": basis, + "formula": formula, + "cpu_runs": len(times["CPU"]), + "gpu_runs": len(times["GPU"]), + "cpu_median_s": cpu_med, + "gpu_median_s": gpu_med, + "speedup": cpu_med / gpu_med, + } + ) + rows.sort(key=lambda r: r["speedup"], reverse=True) + return rows + + +def _counts_by(records: list[dict], field: str) -> dict[str, int]: + """Tally ``records`` by ``field``, dropping empty/missing values.""" + counts: dict[str, int] = defaultdict(int) + for r in records: + v = r.get(field) + if v: + counts[str(v)] += 1 + return dict(counts) + + +# --------------------------------------------------------------------------- +# HTML rendering +# --------------------------------------------------------------------------- + + +_DASHBOARD_CSS = """ + +""" + + +def _card(label: str, value: str, css_class: str = "") -> str: + cls = f"card {css_class}".strip() + return ( + f'
' + f'
{_html.escape(label)}
' + f'
{_html.escape(value)}
' + ) + + +def _format_seconds(s: float) -> str: + if s < 60: + return f"{s:.1f} s" + if s < 3600: + return f"{s / 60:.1f} min" + return f"{s / 3600:.1f} h" + + +def _overview_section(summary: dict) -> str: + cards = [ + _card("Total runs", str(summary["total_runs"])), + _card("Total compute", _format_seconds(summary["total_seconds"])), + _card("GPU runs", str(summary["gpu_runs"]), css_class="gpu"), + _card("CPU runs", str(summary["cpu_runs"]), css_class="cpu"), + ] + if summary["unknown_runs"]: + cards.append(_card("Device unknown", str(summary["unknown_runs"]))) + cards.extend( + [ + _card("Unique molecules", str(summary["unique_formulas"])), + _card("Methods used", str(summary["unique_methods"])), + _card("Basis sets used", str(summary["unique_basis"])), + ] + ) + return ( + "

Overview

" + f'
{"".join(cards)}
' + ) + + +def _speedup_section(rows: list[dict]) -> str: + if not rows: + return ( + "

GPU vs CPU speedup

" + '

No (method, basis, formula) tuple has runs on ' + "both devices yet. Re-run any prior CPU calc on the GPU to populate " + "this table.

" + ) + body_rows = [] + for r in rows: + speedup_cls = "speedup-good" if r["speedup"] >= 1.5 else "speedup-flat" + body_rows.append( + "" + f"" + f"" + f"" + f'' + f'' + f'' + f'' + f'' + "" + ) + return ( + "

GPU vs CPU speedup

" + "
Post-HF - MP2
- Second-order Møller–Plesset for accurate small-molecule energies + MP2, CCSD, CCSD(T)
+ Møller–Plesset (O(N⁵)) for fast post-HF; coupled cluster (O(N⁶) singles+doubles, O(N⁷) with perturbative triples) for benchmark-quality small-molecule energies
{_html.escape(r['method'])}{_html.escape(r['basis'])}{_html.escape(r['formula'])}{r["cpu_runs"]}{r["gpu_runs"]}{r["cpu_median_s"]:.2f}{r["gpu_median_s"]:.2f}{r["speedup"]:.2f}×
" + "" + "" + "" + "" + "" + "".join(body_rows) + "
MethodBasisFormulaCPU nGPU nCPU median (s)GPU median (s)Speedup
" + ) + + +def _figure_section(title: str, fig_html: Optional[str], empty_msg: str) -> str: + if fig_html is None: + return f'

{_html.escape(title)}

{empty_msg}

' + return f"

{_html.escape(title)}

{fig_html}
" + + +def _bar_chart_html( + counts: dict[str, int], *, title: str, include_plotlyjs: bool +) -> Optional[str]: + if not counts: + return None + try: + import plotly.graph_objects as go + import plotly.io as pio + except ImportError: + return None + keys = sorted(counts, key=lambda k: counts[k], reverse=True) + fig = go.Figure( + data=[ + go.Bar( + x=keys, + y=[counts[k] for k in keys], + marker_color="#6366f1", + ) + ] + ) + fig.update_layout( + title=None, + xaxis_title=None, + yaxis_title="Runs", + height=320, + margin=dict(l=40, r=20, t=10, b=40), + plot_bgcolor="#ffffff", + ) + return pio.to_html( + fig, + include_plotlyjs="inline" if include_plotlyjs else False, + full_html=False, + config={"displayModeBar": False}, + ) + + +def _timeline_html(records: list[dict], *, include_plotlyjs: bool) -> Optional[str]: + """Scatter of elapsed_s vs timestamp, coloured by device.""" + if not records: + return None + try: + import plotly.graph_objects as go + import plotly.io as pio + except ImportError: + return None + + grouped: dict[str, list[tuple[datetime, float, str]]] = { + "GPU": [], + "CPU": [], + "Unknown": [], + } + for r in records: + try: + ts = datetime.fromisoformat(str(r["timestamp"])) + if ts.tzinfo is None: + ts = ts.replace(tzinfo=timezone.utc) + except (KeyError, ValueError): + continue + elapsed = float(r.get("elapsed_s", 0.0)) + label = ( + f"{r.get('method', '?')}/{r.get('basis', '?')} on " + f"{r.get('formula', '?')}" + ) + grouped[_classify_device(r)].append((ts, elapsed, label)) + + color_map = {"GPU": "#059669", "CPU": "#6b7280", "Unknown": "#d1d5db"} + traces = [] + for dev, points in grouped.items(): + if not points: + continue + points.sort(key=lambda p: p[0]) + traces.append( + go.Scatter( + x=[p[0] for p in points], + y=[p[1] for p in points], + mode="markers", + name=dev, + text=[p[2] for p in points], + marker=dict(size=8, color=color_map[dev], opacity=0.8), + hovertemplate="%{text}
%{x|%Y-%m-%d %H:%M}
%{y:.2f} s", + ) + ) + if not traces: + return None + fig = go.Figure(data=traces) + fig.update_layout( + height=380, + yaxis_title="Elapsed (s)", + margin=dict(l=50, r=20, t=10, b=50), + plot_bgcolor="#ffffff", + legend=dict(orientation="h", x=0, y=1.05), + ) + return pio.to_html( + fig, + include_plotlyjs="inline" if include_plotlyjs else False, + full_html=False, + config={"displayModeBar": False}, + ) + + +# --------------------------------------------------------------------------- +# Prediction-accuracy section (M-EST / EST.6, 2026-05-25) +# --------------------------------------------------------------------------- + + +def _prediction_accuracy_metrics(records: list[dict]) -> dict: + """Compute headline accuracy metrics from prediction-log records. + + Records with ``predicted_s=None`` are "no-estimate" runs and counted + separately. For the median-error calculation we use absolute + percentage error (``|actual - predicted| / predicted * 100``), so + over- and under-predictions weigh the same; the dashboard shows + both the signed median (bias) and the absolute median (magnitude). + """ + have_pred = [ + r + for r in records + if r.get("predicted_s") is not None and r.get("error_pct") is not None + ] + no_pred = [r for r in records if r.get("predicted_s") is None] + abs_errs = [abs(float(r["error_pct"])) for r in have_pred] + signed_errs = [float(r["error_pct"]) for r in have_pred] + return { + "n_total": len(records), + "n_with_estimate": len(have_pred), + "n_no_estimate": len(no_pred), + "median_abs_error_pct": (statistics.median(abs_errs) if abs_errs else None), + "median_signed_error_pct": ( + statistics.median(signed_errs) if signed_errs else None + ), + # "Within 25%" — a useful headline metric ("how often is the + # estimator usefully close?"). Roadmap target: ≥ 70% after a + # tier-4 calibration. 
+ "pct_within_25": ( + round(100.0 * sum(1 for e in abs_errs if e <= 25.0) / len(abs_errs), 1) + if abs_errs + else None + ), + } + + +def _prediction_scatter_html( + records: list[dict], *, include_plotlyjs: bool +) -> Optional[str]: + """Scatter of predicted_s vs actual_s with a y=x reference line.""" + have_pred = [ + r + for r in records + if r.get("predicted_s") is not None and r.get("actual_s") is not None + ] + if len(have_pred) < 2: + return None + try: + import plotly.graph_objects as go + import plotly.io as pio + except ImportError: + return None + + # Hover labels show the calc spec so the user can identify outliers. + text_labels = [ + f"{r.get('method', '?')}/{r.get('basis', '?')} on {r.get('formula', '?')}" + for r in have_pred + ] + predicted = [float(r["predicted_s"]) for r in have_pred] + actual = [float(r["actual_s"]) for r in have_pred] + max_val = max(max(predicted), max(actual), 1.0) * 1.1 + + fig = go.Figure() + # y=x reference line (perfect prediction). + fig.add_trace( + go.Scatter( + x=[0, max_val], + y=[0, max_val], + mode="lines", + name="perfect (y=x)", + line=dict(color="#94a3b8", dash="dash", width=1), + hoverinfo="skip", + ) + ) + fig.add_trace( + go.Scatter( + x=predicted, + y=actual, + mode="markers", + name="run", + text=text_labels, + marker=dict(size=9, color="#6366f1", opacity=0.75), + hovertemplate=( + "%{text}
predicted: %{x:.2f} s
actual: %{y:.2f} s" + ), + ) + ) + fig.update_layout( + height=420, + xaxis=dict(title="Predicted (s)", range=[0, max_val]), + yaxis=dict(title="Actual (s)", range=[0, max_val]), + margin=dict(l=60, r=20, t=10, b=50), + plot_bgcolor="#ffffff", + legend=dict(orientation="h", x=0, y=1.05), + ) + return pio.to_html( + fig, + include_plotlyjs="inline" if include_plotlyjs else False, + full_html=False, + config={"displayModeBar": False}, + ) + + +def _prediction_accuracy_section( + records: list[dict], scatter_html: Optional[str] +) -> str: + """Render the "Prediction accuracy" section of the dashboard.""" + if not records: + return ( + "

Prediction accuracy

" + '

No predictions logged yet — run a few ' + "calculations and the estimator's track record will appear here.

" + "
" + ) + + m = _prediction_accuracy_metrics(records) + median_abs = m["median_abs_error_pct"] + median_signed = m["median_signed_error_pct"] + within_25 = m["pct_within_25"] + + # Banner when median absolute error exceeds 50%: estimator is in + # rough shape; re-running calibration usually helps. + banner = "" + if median_abs is not None and median_abs > 50.0: + banner = ( + '

' + f"⚠ Median absolute prediction error is {median_abs:.0f}%. " + "Re-running a deeper calibration tier (System Settings → Calibrate " + "time estimates) typically tightens this to within ±25%." + "

" + ) + + cards = [ + _card("Predictions logged", str(m["n_total"])), + _card( + "With estimate", + f"{m['n_with_estimate']} / {m['n_total']}", + ), + ] + if median_abs is not None: + cards.append(_card("Median |error|", f"{median_abs:.1f}%")) + if median_signed is not None: + sign = "+" if median_signed >= 0 else "" + cards.append(_card("Median bias", f"{sign}{median_signed:.1f}%")) + if within_25 is not None: + cards.append(_card("Within ±25%", f"{within_25:.0f}%")) + if m["n_no_estimate"]: + cards.append(_card("No estimate", str(m["n_no_estimate"]))) + + chart_block = ( + scatter_html + if scatter_html + else ( + '

Need at least 2 predictions with an estimate ' + "before plotting accuracy.

" + ) + ) + return ( + "

Prediction accuracy

" + + banner + + f'
{"".join(cards)}
' + + chart_block + + "
" + ) + + +# --------------------------------------------------------------------------- +# Public API +# --------------------------------------------------------------------------- + + +def build_dashboard(out_path: Optional[Path] = None) -> Optional[Path]: + """Generate the QuantUI analytics dashboard as a self-contained HTML. + + Reads ``perf_log.jsonl`` from the active log directory (honouring + ``QUANTUI_LOG_DIR``) and writes the dashboard to ``out_path``. If + ``out_path`` is ``None``, defaults to ``/../dashboard.html`` + (one level up so it lives next to ``~/.quantui/`` rather than buried + in the logs folder). + + Returns the path to the written dashboard on success, or ``None`` if + there are zero records in the perf log (nothing to report — the + caller should surface that as an empty-state message). + """ + records = get_perf_history() + if not records: + return None + + if out_path is None: + out_path = _log_dir().parent / "dashboard.html" + out_path = Path(out_path) + + summary = _summary_metrics(records) + speedup_rows = _speedup_rows(records) + method_counts = _counts_by(records, "method") + calc_type_counts = _counts_by(records, "calc_type") + + # M-EST / EST.6: prediction-accuracy data lives in its own log file. + # Best-effort read — older installs without the file produce an + # empty list and the section degrades to an empty-state message. + try: + prediction_records = get_prediction_history() + except Exception: # noqa: BLE001 — best-effort + prediction_records = [] + + # Inline plotly.js exactly once (in the first figure that renders). + # Subsequent figures pass include_plotlyjs=False so we don't ship + # the ~3 MB bundle three times. 
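+ # Inline scripts run in document order, so the bundle must ride on the + # first figure that actually appears in the body: the prediction + # scatter, then method usage, calc-type, and timeline. + prediction_scatter = _prediction_scatter_html( + prediction_records, include_plotlyjs=True + ) + method_bar = _bar_chart_html( + method_counts, + title="Method usage", + include_plotlyjs=prediction_scatter is None, + ) + calctype_bar = _bar_chart_html( + calc_type_counts, + title="Calc-type distribution", + include_plotlyjs=prediction_scatter is None and method_bar is None, + ) + timeline = _timeline_html( + records, + include_plotlyjs=( + prediction_scatter is None + and method_bar is None + and calctype_bar is None + ), + ) + + generated = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC") + body = ( + '' + "QuantUI analytics" + _DASHBOARD_CSS + "" + "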

QuantUI analytics

" + f'

Generated {generated} — {summary["total_runs"]} runs in perf log

' + + _overview_section(summary) + + _speedup_section(speedup_rows) + + _prediction_accuracy_section(prediction_records, prediction_scatter) + + _figure_section( + "Method usage", + method_bar, + "No method-tagged records found.", + ) + + _figure_section( + "Calc-type distribution", + calctype_bar, + "No calc-type-tagged records found.", + ) + + _figure_section( + "Recent timeline", + timeline, + "No timestamped records to plot.", + ) + + "
QuantUI analytics dashboard — open with any browser.
" + + "" + ) + out_path.parent.mkdir(parents=True, exist_ok=True) + out_path.write_text(body, encoding="utf-8") + return out_path diff --git a/quantui/app.py b/quantui/app.py index b6ff120..ed1db96 100644 --- a/quantui/app.py +++ b/quantui/app.py @@ -127,6 +127,9 @@ from quantui.app_exports import ( on_export as _exp_on_export, ) +from quantui.app_exports import ( + on_export_bundle as _exp_on_export_bundle, +) from quantui.app_exports import ( on_export_mol as _exp_on_export_mol, ) @@ -136,6 +139,9 @@ from quantui.app_exports import ( on_export_xyz as _exp_on_export_xyz, ) +from quantui.app_exports import ( + on_iso_export_cube as _exp_on_iso_export_cube, +) from quantui.app_formatters import ( format_freq_result as _fmt_freq_result, ) @@ -187,6 +193,9 @@ from quantui.app_runflow import ( on_cal_run as _run_on_cal_run, ) +from quantui.app_runflow import ( + on_cal_skip as _run_on_cal_skip, +) from quantui.app_runflow import ( on_cal_stop as _run_on_cal_stop, ) @@ -265,6 +274,9 @@ from quantui.app_runflow import ( on_solvent_cb_changed as _run_on_solvent_cb_changed, ) +from quantui.app_runflow import ( + on_tddft_seed_changed as _run_on_tddft_seed_changed, +) from quantui.app_runflow import ( populate_compare_list as _run_populate_compare_list, ) @@ -277,6 +289,9 @@ from quantui.app_runflow import ( refresh_results_browser as _run_refresh_results_browser, ) +from quantui.app_runflow import ( + refresh_tddft_seed_options as _run_refresh_tddft_seed_options, +) from quantui.app_runflow import ( update_estimate as _run_update_estimate, ) @@ -292,6 +307,9 @@ from quantui.app_visualization import ( build_vib_data_inner as _viz_build_vib_data_inner, ) +from quantui.app_visualization import ( + build_vib_export_html as _viz_build_vib_export_html, +) from quantui.app_visualization import ( on_ir_fwhm_changed as _viz_on_ir_fwhm_changed, ) @@ -419,11 +437,15 @@ from quantui.visualization_py3dmol import ( display_molecule as _display_molecule, ) + from 
quantui.visualization_py3dmol import ( + render_molecule_html as _render_molecule_html, + ) VISUALIZATION_AVAILABLE = True except ImportError: VISUALIZATION_AVAILABLE = False _display_molecule = None # type: ignore[assignment] + _render_molecule_html = None # type: ignore[assignment] _PLOTLYMOL_VIZ = False _PY3DMOL_VIZ = False _DEFAULT_VIZ_STYLE = "ball+stick" @@ -898,6 +920,22 @@ def __init__(self) -> None: self._last_ir_fig: Any = None self._last_uv_fig: Any = None self._last_orb_fig: Any = None + self._last_orb_info: Any = None + # Orbital state consumed by the Isosurface panel populator. Always + # initialized to None so ``pop_isosurface`` can read the attributes + # via direct access without raising AttributeError on a fresh app + # or on a history-replay where ``orbitals.npz`` is missing (BUG.8). + # ``_apply_analysis_context`` resets these between contexts so stale + # state from a prior calc cannot leak into the next molecule. + self._last_orb_mo_coeff: Any = None + self._last_orb_mol_atom: Any = None + self._last_orb_mol_basis: Any = None + # Last-generated cube file path + orbital label (M-EXPORT / EXPORT.5). + # Set by the isosurface render path; consumed by the Export cube + # button. Initialized here so the button handler reads ``None`` + # cleanly when no isosurface has been generated yet. 
+ self._last_cube_path: Optional[Path] = None + self._last_cube_orbital: Optional[str] = None self._last_pes_fig: Any = None self._run_output_scroll_guard_installed: bool = False self._files_current_dir: Optional[Path] = None @@ -960,7 +998,7 @@ def display(self) -> None: display( widgets.VBox( [ - self._welcome_html, + self._welcome_header, widgets.HBox( [ self._activity_btn, @@ -1115,7 +1153,7 @@ def _build_status_panel(self) -> None: # ── Welcome header ──────────────────────────────────────────────────── def _build_welcome_header(self) -> None: - _bld_build_welcome_header(self) + _bld_build_welcome_header(self, layout_fn=_layout) # ── Shared widgets (Cell 3) ─────────────────────────────────────────── @@ -1354,6 +1392,12 @@ def _assemble_tabs(self) -> None: _rtp.insert(_rtp.index(self._to_analysis_btn), self.advanced_accordion) self.results_tab_panel.children = tuple(_rtp) + # POLISH.8 (M-POLISH, 2026-05-25): Log moved to be an + # Accordion inside the History tab — see build_output_tab for + # the wrap. Tab indices renumbered: Files 6→5, System Settings + # 7→6. Update any caller that depended on tab-index 5 being + # "Log" (notably _goto_output_tab — now navigates to History + # and expands the log accordion). self.root_tab = widgets.Tab( children=[ _calculate_content, @@ -1361,7 +1405,6 @@ def _assemble_tabs(self) -> None: self.analysis_tab_panel, self.history_panel, self.compare_panel, - self.log_tab_panel, self.files_tab_panel, self._status_tab_panel, ] @@ -1371,9 +1414,11 @@ def _assemble_tabs(self) -> None: self.root_tab.set_title(2, "Analysis") self.root_tab.set_title(3, "History") self.root_tab.set_title(4, "Compare") - self.root_tab.set_title(5, "Log") - self.root_tab.set_title(6, "Files") - self.root_tab.set_title(7, "Status") + self.root_tab.set_title(5, "Files") + # POLISH.4 (M-POLISH, 2026-05-25): "Status" was ambiguous — + # status of what? "System Settings" is what the tab actually + # holds (env info + calibration + GPU status + UI prefs). 
+ self.root_tab.set_title(6, "System Settings") self.root_tab.observe( self._safe_cb(self._on_root_tab_changed), names="selected_index" ) @@ -1421,12 +1466,18 @@ def _wire_callbacks(self) -> None: self._freq_seed_dd.observe( self._safe_cb(self._on_freq_seed_changed), names="value" ) + self._tddft_seed_dd.observe( + self._safe_cb(self._on_tddft_seed_changed), names="value" + ) self._scan_type_dd.observe( self._safe_cb(self._update_scan_widgets), names="value" ) self._freq_seed_refresh_btn.on_click( lambda _btn: self._refresh_freq_seed_options() ) + self._tddft_seed_refresh_btn.on_click( + lambda _btn: self._refresh_tddft_seed_options() + ) # Notes + estimate self.method_dd.observe(self._safe_cb(self._update_notes), names="value") self.basis_dd.observe(self._safe_cb(self._update_notes), names="value") @@ -1454,6 +1505,12 @@ def _wire_callbacks(self) -> None: self._uv_export_btn.on_click(self._on_uv_export_plot) self._orb_export_btn.on_click(self._on_orb_export_plot) self._pes_export_btn.on_click(self._on_pes_export_plot) + self._vib_export_btn.on_click(self._on_vib_export_animation) + # M-EXPORT / EXPORT.4: per-panel CSV-to-clipboard / file buttons. 
+ self._ir_copy_data_btn.on_click(self._on_ir_copy_data) + self._uv_copy_data_btn.on_click(self._on_uv_copy_data) + self._orb_copy_data_btn.on_click(self._on_orb_copy_data) + self._pes_copy_data_btn.on_click(self._on_pes_copy_data) # Accumulate / export self.accumulate_btn.on_click(self._on_accumulate) self.clear_btn.on_click(self._on_clear) @@ -1462,6 +1519,7 @@ def _wire_callbacks(self) -> None: ) self._cal_run_btn.on_click(self._on_cal_run) self._cal_stop_btn.on_click(self._on_cal_stop) + self._cal_skip_btn.on_click(self._on_cal_skip) self.export_btn.on_click(self._on_export) self.export_xyz_btn.on_click(self._on_export_xyz) self.export_mol_btn.on_click(self._on_export_mol) @@ -1530,6 +1588,9 @@ def _wire_callbacks(self) -> None: ) # Orbital isosurface generate button self._iso_generate_btn.on_click(self._on_iso_generate) + # M-EXPORT / EXPORT.5: cube + bundle exports + self._iso_export_cube_btn.on_click(self._on_iso_export_cube) + self._export_bundle_btn.on_click(self._on_export_bundle) # ── Files tab ──────────────────────────────────────────────────────── @@ -1767,6 +1828,7 @@ def _preview_file_path(self, path: Path) -> None: ".yml", ".xyz", ".cube", + ".molden", } if suffix in image_ext: @@ -1777,6 +1839,184 @@ def _preview_file_path(self, path: Path) -> None: self._set_files_status(f"Previewing image: {path.name}") return + if suffix == ".svg": + # IPython.display.Image doesn't handle SVG well — use SVG. + from IPython.display import SVG as _SVG + + with self._files_preview_output: + display(_SVG(filename=str(path))) + self._set_files_status(f"Previewing SVG: {path.name}") + return + + # POLISH.5 (M-POLISH, 2026-05-25): specialized previews for + # extensions where the generic text dump is unhelpful. Each + # handler caps file reads at 256 KB. On any exception inside a + # handler, fall through to the generic text dispatch below so + # the user always sees SOMETHING. 
Order matters: 3D-structure + # extensions (.xyz/.mol/.pdb) take precedence over their + # text-ext membership. + + if suffix in {".xyz", ".mol", ".pdb"}: + # 3D structure → py3Dmol viewer via raw model load. Falls + # through to text dispatch on failure (so the user still + # sees the raw coordinates). + try: + import py3Dmol as _p3d # type: ignore[import] + + model_format = {".xyz": "xyz", ".mol": "mol", ".pdb": "pdb"}[suffix] + raw_text = path.read_text(encoding="utf-8", errors="replace") + if len(raw_text) <= 256_000: + viewer = _p3d.view(width=500, height=380) + viewer.addModel(raw_text, model_format) + viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}}) + viewer.setBackgroundColor("white") + viewer.zoomTo() + html_str = viewer._make_html() + with self._files_preview_output: + display(HTML(html_str)) + self._set_files_status( + f"3D structure preview: {path.name}" + f" ({model_format.upper()})" + ) + return + except Exception: # noqa: BLE001 — fall through to text preview + pass + + if suffix == ".json": + try: + import json as _json_pretty + + raw = path.read_bytes()[:256_000] + parsed = _json_pretty.loads(raw.decode("utf-8", errors="replace")) + pretty = _json_pretty.dumps(parsed, indent=2, ensure_ascii=False) + # Cap line count so a 10k-key dict doesn't lock the viewport. + lines = pretty.splitlines() + truncated = False + if len(lines) > 500: + lines = lines[:500] + truncated = True + rendered = "\n".join(lines) + if truncated: + rendered += "\n\n[truncated to first 500 lines]" + with self._files_preview_output: + display( + HTML( + "
"
+                            f"{_html.escape(rendered)}
" + ) + ) + self._set_files_status(f"JSON preview: {path.name}") + return + except Exception: # noqa: BLE001 — fall through to text preview + pass + + if suffix == ".csv": + try: + import csv as _csv + + with open(path, encoding="utf-8", errors="replace", newline="") as fh: + reader = _csv.reader(fh) + rows: list[list[str]] = [] + for i, row in enumerate(reader): + if i >= 50: + break + rows.append(row) + if rows: + header = rows[0] + body = rows[1:] + head_html = "".join( + f'{_html.escape(str(c))}' + for c in header + ) + body_html = "".join( + "" + + "".join( + f'{_html.escape(str(c))}' + for c in r + ) + + "" + for r in body + ) + note = ( + f'

' + f"First {len(rows)} rows shown.

" + if len(rows) >= 50 + else "" + ) + table_html = ( + f"{note}" + '' + f"{head_html}" + f"{body_html}
" + ) + with self._files_preview_output: + display(HTML(table_html)) + self._set_files_status( + f"CSV preview: {path.name} ({len(rows)} rows)" + ) + return + except Exception: # noqa: BLE001 — fall through to text preview + pass + + if suffix in {".html", ".htm"}: + try: + raw = path.read_text(encoding="utf-8", errors="replace") + if len(raw) <= 1_000_000: + # Sandboxed iframe via srcdoc — embedded JS can't + # reach the parent app. + iframe_html = ( + '' + ) + with self._files_preview_output: + display(HTML(iframe_html)) + self._set_files_status(f"HTML preview (sandboxed): {path.name}") + return + except Exception: # noqa: BLE001 — fall through to text preview + pass + + if suffix == ".cube": + # Cube files can be hundreds of MB (volumetric data). Don't + # dump them — show the header + a size + a hint. + try: + stat = path.stat() + with open(path, encoding="utf-8", errors="replace") as fh: + head_lines = [] + for i, line in enumerate(fh): + if i >= 6: + break + head_lines.append(line.rstrip("\n")) + header_text = "\n".join(head_lines) + size_mb = stat.st_size / (1024 * 1024) + msg_html = ( + f'

' + f"Cube file: {_html.escape(path.name)} " + f"· {size_mb:.2f} MB

" + '

' + "Use the Analysis tab's Orbital Isosurface panel to " + "render volumetric data; the raw file is too large to " + "preview inline.

" + '

' + "Header (first 6 lines):

" + '
'
+                    f"{_html.escape(header_text)}
" + ) + with self._files_preview_output: + display(HTML(msg_html)) + self._set_files_status(f"Cube file metadata: {path.name}") + return + except Exception: # noqa: BLE001 — fall through to text preview + pass + is_text = suffix in text_ext if not is_text: try: @@ -1849,9 +2089,13 @@ def _on_files_entry_changed(self, change) -> None: self._set_files_status("Select a folder or file.") return if self._files_selected_path.is_dir(): - self._set_files_status(f"Folder selected: {self._files_selected_path.name}") + self._set_files_status( + f"Folder selected: {self._files_selected_path.name} — click Open to enter." + ) else: - self._set_files_status(f"File selected: {self._files_selected_path.name}") + # Auto-preview on selection so the user doesn't need to click Open + # for every file. Open remains useful for folders. + self._preview_file_path(self._files_selected_path) def _on_files_open(self, _btn) -> None: self._activity_begin("Opening selected path...") @@ -1939,19 +2183,70 @@ def _apply_plotly_theme(self, fig) -> None: ) def _set_html_output(self, out: widgets.Output, html: str) -> None: - """Render HTML into an Output widget. + """Render HTML into an Output widget via an atomic outputs swap. Plotly HTML contains + # that contains "Plotly". We expect exactly one such inline bundle. + assert "Plotly" in html + # Sanity: file is non-trivial size (plotly inline is ~3MB). + assert len(html) > 100_000 + + def test_dashboard_resilient_to_partial_records(self, isolated_log_dir): + # Records missing fields (early app version, partial writes) must + # not crash the dashboard build. 
+ records = [ + {"timestamp": "2026-05-25T12:00:00+00:00"}, # bare minimum + _rec(), # full + ] + _write_perf_log(isolated_log_dir, records) + out = analytics.build_dashboard() + assert out is not None + assert out.exists() + + +class TestFormatHelpers: + def test_format_seconds_under_minute(self): + assert analytics._format_seconds(45.0) == "45.0 s" + + def test_format_seconds_minutes(self): + assert analytics._format_seconds(90.0) == "1.5 min" + + def test_format_seconds_hours(self): + assert analytics._format_seconds(7200.0) == "2.0 h" + + def test_counts_by_drops_missing(self): + records = [{"method": "B3LYP"}, {"method": ""}, {"method": "MP2"}, {}] + counts = analytics._counts_by(records, "method") + assert counts == {"B3LYP": 1, "MP2": 1} diff --git a/tests/test_app.py b/tests/test_app.py index 5cdd63c..1b89a03 100644 --- a/tests/test_app.py +++ b/tests/test_app.py @@ -8,7 +8,10 @@ from __future__ import annotations +import json import threading +from datetime import datetime +from pathlib import Path from unittest.mock import MagicMock, patch import ipywidgets as widgets @@ -192,9 +195,11 @@ def _cb() -> None: class TestTabStructure: """root_tab has the correct number and titles of tabs.""" - def test_eight_tabs(self): + def test_seven_tabs(self): + # POLISH.8 (M-POLISH, 2026-05-25): Log moved into the History + # tab as a sub-accordion → 8 root tabs → 7. app = QuantUIApp() - assert len(app.root_tab.children) == 8 + assert len(app.root_tab.children) == 7 def test_tab_titles(self): app = QuantUIApp() @@ -204,9 +209,12 @@ def test_tab_titles(self): "Analysis", "History", "Compare", - "Log", + # POLISH.8 (M-POLISH, 2026-05-25): Log tab moved into the + # History tab as a sub-accordion; Files + System Settings + # renumber to indices 5 and 6. "Files", - "Status", + # POLISH.4 (M-POLISH, 2026-05-25): "Status" → "System Settings". 
+ "System Settings", ] for i, title in enumerate(expected): assert app.root_tab.get_title(i) == title @@ -1028,6 +1036,21 @@ def test_fwhm_slider_range(self): assert app._ir_fwhm_slider.min == 5.0 assert app._ir_fwhm_slider.max == 100.0 + def test_fwhm_slider_continuous_update_false(self): + # BUG.9 regression guard: continuous_update must be False so the + # slider only fires the observer on release, not 30-60 times per + # second during a drag (which produces visible flicker). + app = QuantUIApp() + assert app._ir_fwhm_slider.continuous_update is False + + def test_ir_fig_has_min_height(self): + # BUG.9 regression guard: min_height keeps the Output container + # from collapsing to 0px between renders. Pairs with the atomic + # outputs swap in _set_html_output to keep the IR panel + # flicker-free on mode toggle / slider changes. + app = QuantUIApp() + assert app._ir_fig.layout.min_height == "300px" + def test_ir_export_controls_exist(self): app = QuantUIApp() assert isinstance(app._ir_export_btn, widgets.Button) @@ -1087,6 +1110,1099 @@ def test_broadened_toggle_triggers_ir_figure_update(self): # --------------------------------------------------------------------------- +class TestSetHtmlOutputAtomic: + """_set_html_output must perform a single atomic outputs assignment. + + BUG.9 root cause: the previous implementation was clear_output() + + append_display_data(), which produced an intermediate empty state + between the two calls. On rapid invocations (IR FWHM slider drag, + Stick/Broadened toggle), the user saw the panel flash blank between + every re-render. Atomic outputs swap eliminates the intermediate + state in one widget-state update. + """ + + def test_outputs_is_single_entry_after_set(self): + app = QuantUIApp() + out = widgets.Output() + app._set_html_output(out, "

hello

") + assert len(out.outputs) == 1 + entry = out.outputs[0] + assert entry["output_type"] == "display_data" + assert entry["data"]["text/html"] == "

hello

" + + def test_outputs_replaces_prior_content_atomically(self): + # Repeated calls (e.g. FWHM slider scrub) must each produce a + # single-entry outputs tuple — never accumulating or clearing-then- + # appending (which would briefly empty the widget mid-update). + app = QuantUIApp() + out = widgets.Output() + app._set_html_output(out, "

first

") + app._set_html_output(out, "

second

") + app._set_html_output(out, "

third

") + assert len(out.outputs) == 1 + assert out.outputs[0]["data"]["text/html"] == "

third

" + + +class TestShowResult3DAtomic: + """``_show_result_3d`` must route through the atomic ``_set_html_output`` + swap rather than ``with output: display(viz)``. + + BUG.7 root cause: ``show_result_3d`` previously used the nested-Output + + main-thread ``display(viz)`` pattern, which intermittently produced a + blank 🙁 viewer on Analysis-tab history replay (same failure family as + resolved BUG.6 in trajectory render). After this fix, every invocation + leaves the target ``Output`` with a single-entry ``outputs`` tuple whose + ``text/html`` payload is non-empty. + """ + + def _make_water(self): + return Molecule( + atoms=["O", "H", "H"], + coordinates=[ + [0.0, 0.0, 0.0], + [0.96, 0.0, 0.0], + [-0.24, 0.93, 0.0], + ], + ) + + def test_analysis_mol_output_is_single_entry_after_show(self): + from quantui.app import _render_molecule_html + + if _render_molecule_html is None: + pytest.skip("No 3D visualization backend installed") + app = QuantUIApp() + app._show_result_3d(self._make_water(), extra_output=app._analysis_mol_output) + assert len(app._analysis_mol_output.outputs) == 1 + entry = app._analysis_mol_output.outputs[0] + assert entry["output_type"] == "display_data" + assert entry["data"]["text/html"].strip() != "" + + def test_result_viz_output_is_single_entry_after_show(self): + from quantui.app import _render_molecule_html + + if _render_molecule_html is None: + pytest.skip("No 3D visualization backend installed") + app = QuantUIApp() + app._show_result_3d(self._make_water(), extra_output=None) + assert len(app.result_viz_output.outputs) == 1 + entry = app.result_viz_output.outputs[0] + assert entry["output_type"] == "display_data" + assert entry["data"]["text/html"].strip() != "" + + def test_repeated_calls_do_not_accumulate_outputs(self): + # Backend-toggle scenario: re-render the same molecule multiple + # times and confirm the viewer is replaced atomically each time. 
+ from quantui.app import _render_molecule_html + + if _render_molecule_html is None: + pytest.skip("No 3D visualization backend installed") + app = QuantUIApp() + mol = self._make_water() + for _ in range(3): + app._show_result_3d(mol, extra_output=app._analysis_mol_output) + assert len(app._analysis_mol_output.outputs) == 1 + assert len(app.result_viz_output.outputs) == 1 + + +class TestFreqSeedDropdownFilter: + """The Freq seed-geometry dropdown should only list prior geo-opts of + the currently-active molecule. + + Rationale: users selecting "Seed geometry" on the Frequency tab want a + geometry compatible with their current molecule. Listing a CH₄ geo-opt + while the user is working on H₂O is misleading and risks an accidental + geometry replacement. Filter is by formula (cheap and good enough for + the common case); strict atom-list match is queued under + M-HISTORY-HARDENING for later. + """ + + def _make_geo_opt_dir(self, root, formula, method="RHF", basis="STO-3G", offset=0): + # Offset the timestamp microseconds so directories sort + # deterministically when multiple fixtures share the same second. 
+ ts = datetime.now().strftime("%Y-%m-%d_%H-%M-%S-") + f"{offset:06d}" + d = root / f"{ts}_{formula}_{method}_{basis}" + d.mkdir(parents=True) + (d / "result.json").write_text( + json.dumps( + { + "_schema_version": 2, + "timestamp": ts, + "calc_type": "geometry_opt", + "formula": formula, + "method": method, + "basis": basis, + } + ) + ) + (d / "trajectory.json").write_text("[]") + return d + + def _water(self): + return Molecule( + atoms=["O", "H", "H"], + coordinates=[[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]], + ) + + def test_unfiltered_when_no_molecule_loaded(self, tmp_path, monkeypatch): + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + self._make_geo_opt_dir(tmp_path, "H2O", offset=1) + self._make_geo_opt_dir(tmp_path, "CH4", offset=2) + app = QuantUIApp() + assert app._molecule is None + app._refresh_freq_seed_options() + labels = [lbl for lbl, _ in app._freq_seed_dd.options] + assert any(lbl.startswith("H2O") for lbl in labels) + assert any(lbl.startswith("CH4") for lbl in labels) + + def test_filtered_to_current_molecule_formula(self, tmp_path, monkeypatch): + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + self._make_geo_opt_dir(tmp_path, "H2O", offset=1) + self._make_geo_opt_dir(tmp_path, "CH4", offset=2) + app = QuantUIApp() + app._molecule = self._water() + app._refresh_freq_seed_options() + labels = [lbl for lbl, _ in app._freq_seed_dd.options] + assert labels[0] == "(use current molecule)" + assert any(lbl.startswith("H2O") for lbl in labels) + assert not any(lbl.startswith("CH4") for lbl in labels) + + def test_set_molecule_triggers_filter(self, tmp_path, monkeypatch): + # Loading a new molecule should auto-refresh the dropdown so stale + # cross-molecule options drop out without the user clicking refresh. 
+ monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + self._make_geo_opt_dir(tmp_path, "H2O", offset=1) + self._make_geo_opt_dir(tmp_path, "CH4", offset=2) + app = QuantUIApp() + app._set_molecule(self._water(), label="test") + labels = [lbl for lbl, _ in app._freq_seed_dd.options] + assert any(lbl.startswith("H2O") for lbl in labels) + assert not any(lbl.startswith("CH4") for lbl in labels) + + +class TestPopIsosurfaceBug8: + """Regression tests for BUG.8: ``_pop_isosurface`` raised AttributeError + on single-point history replay when ``orbitals.npz`` was missing. + + Root cause: ``_last_orb_mo_coeff`` (and siblings) were only assigned by + ``show_orbital_diagram`` during a successful Energies-panel populate. + When that path returned early (no orbitals file or missing fields), the + attributes never existed, and the immediately-following Isosurface + populator's direct ``app._last_orb_mo_coeff is not None`` read blew up. + + Fix: initialize the attributes in ``__init__`` so they always exist, + reset them at the start of ``apply_analysis_context`` so stale state + can't leak between contexts, and use defensive ``getattr`` in the + populator as belt-and-suspenders. + """ + + def test_orb_state_initialized_on_fresh_app(self): + app = QuantUIApp() + # All four attributes must exist (initialized to None) so the + # populator can read them safely. + assert app._last_orb_mo_coeff is None + assert app._last_orb_mol_atom is None + assert app._last_orb_mol_basis is None + assert app._last_orb_info is None + + def test_pop_isosurface_on_fresh_app_returns_false_without_error(self): + # The exact crash scenario from the user's 2026-05-20 event log: + # a fresh QuantUIApp where no orbital data has been loaded yet + # should NOT raise; it should report the panel as unavailable. 
+ from quantui.app_analysis import pop_isosurface + + app = QuantUIApp() + ctx = _AnalysisContext( + calc_type="single_point", + formula="H2O", + method="RHF", + basis="STO-3G", + ) + result = pop_isosurface(app, ctx) + assert result is False + + def test_apply_analysis_context_resets_orbital_state(self, tmp_path, monkeypatch): + # After running an SP that populated orbital state, replaying a + # different result (no orbitals.npz on disk) must NOT leak the + # previous calc's orbital arrays into the Isosurface panel. + from quantui.app_analysis import apply_analysis_context + + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + app = QuantUIApp() + # Simulate a prior live SP having populated orbital state. + app._last_orb_mo_coeff = [[1.0, 0.0], [0.0, 1.0]] + app._last_orb_mol_atom = [["H", [0.0, 0.0, 0.0]]] + app._last_orb_mol_basis = "sto-3g" + app._last_orb_info = MagicMock() + + # Now replay a context with no result_dir and no live_result, + # i.e. nothing to repopulate orbital state from. + ctx = _AnalysisContext( + calc_type="single_point", + formula="CH4", + method="RHF", + basis="STO-3G", + result_dir=None, + live_result=None, + ) + apply_analysis_context(app, ctx) + + # State must have been wiped: stale orbitals from the prior calc + # must not survive into the CH4 context. + assert app._last_orb_mo_coeff is None + assert app._last_orb_mol_atom is None + assert app._last_orb_mol_basis is None + assert app._last_orb_info is None + + +class TestTDDFTSeedDropdown: + """The UV-Vis (TD-DFT) view of the Calculate tab exposes a seed-geometry + dropdown that mirrors the Frequency tab's behaviour (BUG.5). + + Acceptance: + - The dropdown widget exists with the placeholder option. + - On_calc_type_changed places the dropdown into ``calc_extra_opts`` + when UV-Vis (TD-DFT) is selected, but not for other calc types. + - Like the Frequency seed dropdown, options are filtered to saved + ``geometry_opt`` results whose formula matches the active molecule. 
+ - Picking a seed disables the QM pre-opt checkbox (seed = already + optimised) and surfaces the green confirmation note. + - ``_set_molecule`` auto-refreshes both seed dropdowns. + """ + + def _make_geo_opt_dir(self, root, formula, method="RHF", basis="STO-3G", offset=0): + ts = datetime.now().strftime("%Y-%m-%d_%H-%M-%S-") + f"{offset:06d}" + d = root / f"{ts}_{formula}_{method}_{basis}" + d.mkdir(parents=True) + (d / "result.json").write_text( + json.dumps( + { + "_schema_version": 2, + "timestamp": ts, + "calc_type": "geometry_opt", + "formula": formula, + "method": method, + "basis": basis, + } + ) + ) + (d / "trajectory.json").write_text("[]") + return d + + def _water(self): + return Molecule( + atoms=["O", "H", "H"], + coordinates=[[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]], + ) + + def test_seed_widgets_exist(self): + app = QuantUIApp() + assert isinstance(app._tddft_seed_dd, widgets.Dropdown) + assert isinstance(app._tddft_seed_refresh_btn, widgets.Button) + assert isinstance(app._tddft_seed_note, widgets.HTML) + # Initial placeholder option is present. + labels = [lbl for lbl, _ in app._tddft_seed_dd.options] + assert labels[0] == "(use current molecule)" + + def test_calc_type_uvvis_shows_seed_dropdown(self): + app = QuantUIApp() + app.calc_type_dd.value = "UV-Vis (TD-DFT)" + # The seed dropdown should now be one of the calc_extra_opts children. + descendants = list(app.calc_extra_opts.children) + # The seed dropdown is wrapped in an HBox with the refresh button. 
+ found = False + for child in descendants: + if isinstance(child, widgets.HBox): + for sub in child.children: + if sub is app._tddft_seed_dd: + found = True + break + assert found, "UV-Vis tab should include the seed-geometry dropdown" + + def test_calc_type_single_point_does_not_show_seed_dropdown(self): + app = QuantUIApp() + app.calc_type_dd.value = "Single Point" + descendants = list(app.calc_extra_opts.children) + for child in descendants: + if isinstance(child, widgets.HBox): + for sub in child.children: + assert sub is not app._tddft_seed_dd + + def test_seed_options_filtered_by_formula(self, tmp_path, monkeypatch): + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + self._make_geo_opt_dir(tmp_path, "H2O", offset=1) + self._make_geo_opt_dir(tmp_path, "CH4", offset=2) + app = QuantUIApp() + app._molecule = self._water() + app._refresh_tddft_seed_options() + labels = [lbl for lbl, _ in app._tddft_seed_dd.options] + assert any(lbl.startswith("H2O") for lbl in labels) + assert not any(lbl.startswith("CH4") for lbl in labels) + + def test_set_molecule_triggers_tddft_seed_filter(self, tmp_path, monkeypatch): + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + self._make_geo_opt_dir(tmp_path, "H2O", offset=1) + self._make_geo_opt_dir(tmp_path, "CH4", offset=2) + app = QuantUIApp() + app._set_molecule(self._water(), label="test") + labels = [lbl for lbl, _ in app._tddft_seed_dd.options] + assert any(lbl.startswith("H2O") for lbl in labels) + assert not any(lbl.startswith("CH4") for lbl in labels) + + def test_picking_seed_disables_preopt_and_shows_note(self, tmp_path, monkeypatch): + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + seed_dir = self._make_geo_opt_dir(tmp_path, "H2O", offset=1) + app = QuantUIApp() + app._molecule = self._water() + app._refresh_tddft_seed_options() + # Pre-condition: pre-opt checkbox is enabled and toggled on. + app._freq_preopt_cb.disabled = False + app._freq_preopt_cb.value = True + # Pick the seed. 
+ app._tddft_seed_dd.value = str(seed_dir) + # Post-condition: pre-opt is disabled and value cleared; note set. + assert app._freq_preopt_cb.disabled is True + assert app._freq_preopt_cb.value is False + assert "✓" in app._tddft_seed_note.value + + def test_clearing_seed_re_enables_preopt_and_clears_note( + self, tmp_path, monkeypatch + ): + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + seed_dir = self._make_geo_opt_dir(tmp_path, "H2O", offset=1) + app = QuantUIApp() + app._molecule = self._water() + app._refresh_tddft_seed_options() + app._tddft_seed_dd.value = str(seed_dir) + # Now clear the seed back to the placeholder. + app._tddft_seed_dd.value = "" + assert app._freq_preopt_cb.disabled is False + assert app._tddft_seed_note.value == "" + + +class TestVibExportAnimation: + """The Vibrational accordion exposes an export-to-HTML button that + writes the current mode as a self-contained animation file. + + Backend resolution is decoupled from the user's default backend + preference: plotlymol3d (preferred for export quality) with a py3Dmol + fallback. This separation is enforced inside ``build_vib_export_html`` + so a user whose default render backend is py3Dmol can still get the + higher-quality plotlymol animation when exporting. + """ + + def _water(self): + return Molecule( + atoms=["O", "H", "H"], + coordinates=[ + [0.0, 0.0, 0.0], + [0.96, 0.0, 0.0], + [-0.24, 0.93, 0.0], + ], + ) + + def _seed_vib_state(self, app): + """Populate the minimal state the export handler depends on. + + Mirrors what ``_render_vib_mode_py3dmol`` reads but does not exercise + the live-render path — keeps the test focused on the export surface. 
+ """ + from types import SimpleNamespace + + mol = self._water() + freq_stub = SimpleNamespace( + frequencies_cm1=[100.0, 200.0, 300.0], + ir_intensities=[1.0, 1.0, 1.0], + displacements=[ + [[0.1, 0.0, 0.0], [-0.1, 0.0, 0.0], [0.0, 0.0, 0.0]], + [[0.0, 0.1, 0.0], [0.0, -0.1, 0.0], [0.0, 0.0, 0.0]], + [[0.0, 0.0, 0.1], [0.0, 0.0, -0.1], [0.0, 0.0, 0.0]], + ], + ) + app._last_vib_freq_result = freq_stub + app._last_vib_molecule = mol + app._last_vib_data = None # forces the py3dmol fallback in this test + app.vib_mode_dd.options = [ + ("Mode 1: 100.0 cm⁻¹", 1), + ("Mode 2: 200.0 cm⁻¹", 2), + ("Mode 3: 300.0 cm⁻¹", 3), + ] + app.vib_mode_dd.value = 1 + + def test_export_button_and_status_exist(self): + app = QuantUIApp() + assert hasattr(app, "_vib_export_btn") + assert isinstance(app._vib_export_btn, widgets.Button) + assert hasattr(app, "_vib_export_status") + assert isinstance(app._vib_export_status, widgets.HTML) + assert app._vib_export_status.value == "" + + def test_export_without_vib_state_shows_error_status(self, tmp_path, monkeypatch): + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + app = QuantUIApp() + # No _last_vib_freq_result / _last_vib_molecule yet. + app._on_vib_export_animation(None) + assert "color:#b91c1c" in app._vib_export_status.value + assert "No vibrational mode loaded" in app._vib_export_status.value + + def test_export_writes_html_and_reports_backend(self, tmp_path, monkeypatch): + from quantui.viz_backend_router import BackendAvailability + + if not BackendAvailability.from_environment().py3dmol: + pytest.skip("py3Dmol not available for export fallback test") + + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + app = QuantUIApp() + self._seed_vib_state(app) + # Force the py3Dmol fallback regardless of plotlymol installation — + # the goal here is to assert the fallback writes a real HTML file. 
+ app._viz_availability = BackendAvailability(py3dmol=True, plotlymol=False) + app._last_result_dir = tmp_path + + app._on_vib_export_animation(None) + + assert "color:#16a34a" in app._vib_export_status.value + assert "Saved (py3dmol)" in app._vib_export_status.value + # Find the file the handler wrote. + files = list(tmp_path.glob("vib_*_mode1_*.html")) + assert len(files) == 1 + content = files[0].read_text(encoding="utf-8") + # py3Dmol HTML includes a 3dmoljs viewer block. + assert "viewer" in content.lower() or "3dmol" in content.lower() + + def test_export_no_backend_available_surfaces_error(self, tmp_path, monkeypatch): + from quantui.viz_backend_router import BackendAvailability + + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + app = QuantUIApp() + self._seed_vib_state(app) + app._viz_availability = BackendAvailability(py3dmol=False, plotlymol=False) + + app._on_vib_export_animation(None) + assert "color:#b91c1c" in app._vib_export_status.value + assert "No visualization backend available" in app._vib_export_status.value + + +class TestHistoryHardeningHist2: + """HIST.2: every history-load operation emits a single + ``history_load_timing`` event capturing total elapsed_ms + per-stage + breakdown. + + Acceptance: + - ``_LoadTimer.stage`` records elapsed_ms for each named sub-stage. + - ``_LoadTimer.emit`` calls ``calc_log.log_event`` with event_type + ``history_load_timing``, the total_ms, the op name, and per-stage + ``_ms`` keys. + - ``history_load_analysis`` emits exactly one timing event per call + with all expected stages. + - ``status="error"`` is reported when the loader raises mid-load. + - Telemetry failures (e.g. log_event itself raising) must NOT block + the load — they're swallowed inside ``emit``. 
+ """ + + def _make_sp_result_dir(self, tmp_path): + ts = datetime.now().strftime("%Y-%m-%d_%H-%M-%S-") + "000001" + d = tmp_path / f"{ts}_H2O_RHF_STO-3G" + d.mkdir() + (d / "result.json").write_text( + json.dumps( + { + "_schema_version": 2, + "timestamp": ts, + "calc_type": "single_point", + "formula": "H2O", + "method": "RHF", + "basis": "STO-3G", + "energy_hartree": -75.0, + "energy_ev": -2041.0, + "homo_lumo_gap_ev": 8.0, + "converged": True, + "n_iterations": 10, + } + ) + ) + return d + + def test_load_timer_stage_records_elapsed_ms(self): + from quantui.app_history import _LoadTimer + + timer = _LoadTimer("test_op", Path("/tmp/dummy")) + with timer.stage("phase_a"): + pass # near-zero elapsed + with timer.stage("phase_b"): + pass + assert "phase_a" in timer._stages + assert "phase_b" in timer._stages + assert timer._stages["phase_a"] >= 0.0 + assert timer._stages["phase_b"] >= 0.0 + + def test_load_timer_emit_logs_event_with_stage_breakdown(self): + from quantui.app_history import _LoadTimer + + timer = _LoadTimer("test_op", Path("/tmp/dummy")) + with timer.stage("foo"): + pass + with patch("quantui.calc_log.log_event") as mock_log: + timer.emit(status="ok") + mock_log.assert_called_once() + event_type, _message = mock_log.call_args.args[:2] + kwargs = mock_log.call_args.kwargs + assert event_type == "history_load_timing" + assert kwargs["op"] == "test_op" + assert kwargs["status"] == "ok" + assert kwargs["total_ms"] >= 0.0 + assert "foo_ms" in kwargs + + def test_load_timer_emit_swallows_log_event_failures(self): + # If log_event raises (e.g. disk full), the timer's emit MUST NOT + # propagate the exception — telemetry must never block the load. 
+ from quantui.app_history import _LoadTimer + + timer = _LoadTimer("test_op", Path("/tmp/dummy")) + with patch("quantui.calc_log.log_event", side_effect=RuntimeError("disk full")): + timer.emit(status="ok") # must not raise + + def test_history_load_analysis_emits_one_timing_event(self, tmp_path, monkeypatch): + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + result_dir = self._make_sp_result_dir(tmp_path) + app = QuantUIApp() + with ( + patch("quantui.calc_log.log_event") as mock_log, + patch.object(app, "_activity_pulse"), + ): + app._history_load_analysis(result_dir) + + # Find the history_load_timing event (mock_log captures many other + # events too; e.g. _refresh_file_browser may log nothing, but other + # observers do). + timing_calls = [ + call + for call in mock_log.call_args_list + if call.args and call.args[0] == "history_load_timing" + ] + assert len(timing_calls) == 1, ( + f"Expected exactly one history_load_timing event, got " + f"{len(timing_calls)}" + ) + kwargs = timing_calls[0].kwargs + assert kwargs["op"] == "history_load_analysis" + assert kwargs["status"] == "ok" + assert kwargs["total_ms"] >= 0.0 + # All seven expected stages must appear. 
+ expected_stages = { + "read_pyscf_log_ms", + "update_log_panel_ms", + "build_context_ms", + "mol_reconstruction_ms", + "show_result_3d_ms", + "apply_analysis_context_ms", + "nav_tab_ms", + } + actual_stages = set(kwargs.keys()) & expected_stages + assert actual_stages == expected_stages, ( + f"Missing stages: {expected_stages - actual_stages}; " + f"unexpected stages: {actual_stages - expected_stages}" + ) + + def test_history_load_analysis_reports_error_status_on_raise( + self, tmp_path, monkeypatch + ): + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + result_dir = self._make_sp_result_dir(tmp_path) + app = QuantUIApp() + with ( + patch("quantui.calc_log.log_event") as mock_log, + patch.object( + app, + "_apply_analysis_context", + side_effect=RuntimeError("simulated"), + ), + patch.object(app, "_activity_pulse"), + ): + try: + app._history_load_analysis(result_dir) + except RuntimeError: + pass + + timing_calls = [ + call + for call in mock_log.call_args_list + if call.args and call.args[0] == "history_load_timing" + ] + assert len(timing_calls) == 1 + assert timing_calls[0].kwargs["status"] == "error" + + +class TestHistoryHardeningHist6: + """HIST.6: strict atom-list + coordinate match for the seed-geometry + dropdown filter, replacing the formula-only filter shipped in session 54. + + Acceptance: + - Of two same-formula candidates with DIFFERENT starting geometries + (different isomers / conformers), only the one whose coordinates + match the active molecule appears in the seed dropdown. + - Two same-formula candidates with starting geometries within the RMSD + tolerance of the active molecule's coordinates BOTH appear. + - Malformed or missing ``trajectory.json`` falls through to a formula- + only match (don't punish the user for a corrupt history entry). + - ``_load_starting_geometry`` caches the parsed geometry per result + directory so repeated dropdown refreshes don't re-parse the same + JSON files. 
+ """ + + def _make_geo_opt_dir_with_trajectory( + self, + root, + formula, + atoms, + starting_coords, + offset=0, + method="RHF", + basis="STO-3G", + ): + from pathlib import Path + + ts = datetime.now().strftime("%Y-%m-%d_%H-%M-%S-") + f"{offset:06d}" + d = Path(root) / f"{ts}_{formula}_{method}_{basis}" + d.mkdir(parents=True) + (d / "result.json").write_text( + json.dumps( + { + "_schema_version": 2, + "timestamp": ts, + "calc_type": "geometry_opt", + "formula": formula, + "method": method, + "basis": basis, + } + ) + ) + (d / "trajectory.json").write_text( + json.dumps( + { + "atoms": atoms, + "charge": 0, + "multiplicity": 1, + "steps": [ + { + "coords": [ + list(map(float, row)) for row in starting_coords + ], + "energy": -75.0, + } + ], + } + ) + ) + return d + + def _water_coords(self, displacement=0.0): + # Returns water coords; ``displacement`` lets us produce a second + # water at a controllable RMSD distance from the canonical one. + return [ + [0.0 + displacement, 0.0, 0.0], + [0.96 + displacement, 0.0, 0.0], + [-0.24 + displacement, 0.93, 0.0], + ] + + def _water_molecule(self): + return Molecule(atoms=["O", "H", "H"], coordinates=self._water_coords(0.0)) + + def setup_method(self, _method): + # Tests share a module-level cache (_SEED_GEOMETRY_CACHE) for + # geometry parses; clear it before each test for determinism. + from quantui.app_runflow import _SEED_GEOMETRY_CACHE + + _SEED_GEOMETRY_CACHE.clear() + + def test_same_formula_different_geometry_excluded(self, tmp_path, monkeypatch): + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + # Active molecule = water at canonical coords. + # Saved A: same coords → matches. + # Saved B: coords shifted by 2 Å → RMSD ≈ 2 Å ≫ 0.1 Å → excluded. 
+ self._make_geo_opt_dir_with_trajectory( + tmp_path, "H2O", ["O", "H", "H"], self._water_coords(0.0), offset=1 + ) + self._make_geo_opt_dir_with_trajectory( + tmp_path, "H2O", ["O", "H", "H"], self._water_coords(2.0), offset=2 + ) + app = QuantUIApp() + app._molecule = self._water_molecule() + app._refresh_freq_seed_options() + labels = [lbl for lbl, _ in app._freq_seed_dd.options] + assert len(labels) == 2, labels + assert labels[0] == "(use current molecule)" + assert labels[1].startswith("H2O") + + def test_same_formula_within_tolerance_included(self, tmp_path, monkeypatch): + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + # Two candidates, both within 0.1 Å RMSD of the active mol. + self._make_geo_opt_dir_with_trajectory( + tmp_path, "H2O", ["O", "H", "H"], self._water_coords(0.0), offset=1 + ) + self._make_geo_opt_dir_with_trajectory( + tmp_path, "H2O", ["O", "H", "H"], self._water_coords(0.02), offset=2 + ) + app = QuantUIApp() + app._molecule = self._water_molecule() + app._refresh_freq_seed_options() + labels = [lbl for lbl, _ in app._freq_seed_dd.options] + assert len(labels) == 3, labels + assert sum(1 for lbl in labels if lbl.startswith("H2O")) == 2 + + def test_atom_order_mismatch_excluded(self, tmp_path, monkeypatch): + # Strict atom-order policy: ["H","O","H"] is not the same as + # ["O","H","H"] even though the formula matches. 
+ monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + self._make_geo_opt_dir_with_trajectory( + tmp_path, "H2O", ["O", "H", "H"], self._water_coords(0.0), offset=1 + ) + self._make_geo_opt_dir_with_trajectory( + tmp_path, "H2O", ["H", "O", "H"], self._water_coords(0.0), offset=2 + ) + app = QuantUIApp() + app._molecule = self._water_molecule() + app._refresh_freq_seed_options() + labels = [lbl for lbl, _ in app._freq_seed_dd.options] + assert len(labels) == 2 + assert labels[1].startswith("H2O") + + def test_malformed_trajectory_falls_back_to_formula_match( + self, tmp_path, monkeypatch + ): + # Malformed trajectory.json must NOT crash — and must fall through + # to formula-only match so the candidate still appears. + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + ts = datetime.now().strftime("%Y-%m-%d_%H-%M-%S-") + "000001" + d = tmp_path / f"{ts}_H2O_RHF_STO-3G" + d.mkdir() + (d / "result.json").write_text( + json.dumps( + { + "_schema_version": 2, + "timestamp": ts, + "calc_type": "geometry_opt", + "formula": "H2O", + "method": "RHF", + "basis": "STO-3G", + } + ) + ) + (d / "trajectory.json").write_text("[]") # malformed (list, not dict) + app = QuantUIApp() + app._molecule = self._water_molecule() + app._refresh_freq_seed_options() + labels = [lbl for lbl, _ in app._freq_seed_dd.options] + assert any(lbl.startswith("H2O") for lbl in labels) + + def test_starting_geometry_cache_hit_avoids_reread(self, tmp_path): + # _load_starting_geometry must cache per-result so back-to-back + # refreshes (e.g. when both Freq and UV-Vis dropdowns refresh from + # the same _set_molecule call) don't re-parse the JSON. 
+ from quantui.app_runflow import ( + _SEED_GEOMETRY_CACHE, + _load_starting_geometry, + ) + + _SEED_GEOMETRY_CACHE.clear() + d = self._make_geo_opt_dir_with_trajectory( + tmp_path, "H2O", ["O", "H", "H"], self._water_coords(0.0), offset=1 + ) + first = _load_starting_geometry(d) + assert first is not None + # Second call must return the cached object without touching disk. + with patch("pathlib.Path.read_text") as mock_read: + second = _load_starting_geometry(d) + assert second is first + mock_read.assert_not_called() + + +class TestMExportCopyPlotData: + """M-EXPORT / EXPORT.4: every spectrum / diagram panel offers a + "Copy data" button that exports the plot's (x, y) data to CSV and + attempts a clipboard copy via the browser's clipboard API. + + Acceptance: + - ``_fig_to_csv`` extracts per-trace (x, y) data from a Plotly figure + in the documented CSV layout; empty figure → empty string (caller + treats as "nothing to copy" rather than writing an empty file). + - Each plot panel (IR, UV-Vis, orbital, PES) exposes a + ``_*_copy_data_btn`` widget. + - The handler writes a CSV file to the active result directory and + updates the panel's status widget. + - The status reports an error when no figure has been rendered yet. + - Output CSV round-trips cleanly via stdlib ``csv.reader``. 
+ """ + + def _make_simple_fig(self): + import plotly.graph_objects as go + + return go.Figure( + go.Scatter(x=[1.0, 2.0, 3.0], y=[10.0, 20.0, 30.0], name="trace0") + ) + + def _make_two_trace_fig(self): + import plotly.graph_objects as go + + fig = go.Figure() + fig.add_trace(go.Bar(x=[100, 200], y=[5, 8], name="Stick")) + fig.add_trace(go.Scatter(x=[100, 150, 200], y=[1, 4, 8], name="Broadened")) + return fig + + def test_fig_to_csv_returns_empty_string_for_none(self): + assert QuantUIApp._fig_to_csv(None) == "" + + def test_fig_to_csv_returns_empty_string_when_no_traces(self): + import plotly.graph_objects as go + + fig = go.Figure() # no data + assert QuantUIApp._fig_to_csv(fig) == "" + + def test_fig_to_csv_extracts_single_trace(self): + fig = self._make_simple_fig() + csv_text = QuantUIApp._fig_to_csv(fig, title="Test Plot") + assert "# Test Plot" in csv_text + assert "# trace0" in csv_text + assert "x,y" in csv_text + assert "1.0,10.0" in csv_text + assert "3.0,30.0" in csv_text + + def test_fig_to_csv_extracts_multi_trace_with_separator_sections(self): + fig = self._make_two_trace_fig() + csv_text = QuantUIApp._fig_to_csv(fig) + assert "# Stick" in csv_text + assert "# Broadened" in csv_text + # Each section gets its own "x,y" header — the layout is + # repeated, not merged into one wide table. + assert csv_text.count("x,y") == 2 + + def test_fig_to_csv_output_round_trips_via_stdlib_csv(self): + import csv as _csv + import io as _io + + fig = self._make_simple_fig() + text = QuantUIApp._fig_to_csv(fig, title="Roundtrip") + # Strip the "# ..." comment lines, leaving the actual rows. 
+ lines = [ + line for line in text.splitlines() if line and not line.startswith("#") + ] + reader = _csv.reader(_io.StringIO("\n".join(lines))) + rows = list(reader) + assert rows[0] == ["x", "y"] + assert rows[1:] == [ + ["1.0", "10.0"], + ["2.0", "20.0"], + ["3.0", "30.0"], + ] + + def test_all_four_panels_expose_copy_data_button(self): + app = QuantUIApp() + for prefix in ("ir", "uv", "orb", "pes"): + btn = getattr(app, f"_{prefix}_copy_data_btn", None) + assert isinstance(btn, widgets.Button), f"missing _{prefix}_copy_data_btn" + assert btn.description == "Copy data" + + def test_copy_data_with_no_figure_shows_error_status(self, tmp_path, monkeypatch): + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + app = QuantUIApp() + app._last_ir_fig = None + app._on_ir_copy_data(None) + assert "color:#b91c1c" in app._ir_export_status.value + assert "No plot data" in app._ir_export_status.value + + def test_copy_data_writes_csv_to_result_dir(self, tmp_path, monkeypatch): + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + app = QuantUIApp() + app._last_result_dir = tmp_path + app._last_ir_fig = self._make_simple_fig() + app._on_ir_copy_data(None) + assert "color:#16a34a" in app._ir_export_status.value + assert "Saved CSV" in app._ir_export_status.value + csv_files = list(tmp_path.glob("ir_spectrum_data_*.csv")) + assert len(csv_files) == 1 + content = csv_files[0].read_text(encoding="utf-8") + assert "trace0" in content + assert "1.0,10.0" in content + + def test_copy_data_handles_figure_with_no_extractable_traces( + self, tmp_path, monkeypatch + ): + import plotly.graph_objects as go + + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + app = QuantUIApp() + app._last_result_dir = tmp_path + app._last_ir_fig = go.Figure() # empty + app._on_ir_copy_data(None) + assert "color:#b91c1c" in app._ir_export_status.value + assert "no extractable" in app._ir_export_status.value.lower() + + +class TestHistoryHardeningHist1: + """HIST.1: clicking View 
Results / View Analysis on a History selection + must give the user immediate visual feedback. + + Acceptance: + - ``_activity_count`` increments while the loader runs (toolbar + indicator turns to "UI Active") and decrements back to 0 on completion. + - Source buttons (View Results, View Analysis) are disabled at start of + load and re-enabled at end — prevents double-click + signals "loading". + - The feedback contract holds even if the load raises (try/finally). + """ + + def _make_sp_result_dir(self, tmp_path): + """Create a minimal saved single-point result the loader can read.""" + ts = datetime.now().strftime("%Y-%m-%d_%H-%M-%S-") + "000001" + d = tmp_path / f"{ts}_H2O_RHF_STO-3G" + d.mkdir() + (d / "result.json").write_text( + json.dumps( + { + "_schema_version": 2, + "timestamp": ts, + "calc_type": "single_point", + "formula": "H2O", + "method": "RHF", + "basis": "STO-3G", + "energy_hartree": -75.0, + "energy_ev": -2041.0, + "homo_lumo_gap_ev": 8.0, + "converged": True, + "n_iterations": 10, + } + ) + ) + return d + + def test_history_load_analysis_lights_activity_indicator( + self, tmp_path, monkeypatch + ): + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + result_dir = self._make_sp_result_dir(tmp_path) + app = QuantUIApp() + # The loader bumps _activity_count up by 1 inside its body and back + # down on exit. Patch _apply_analysis_context to capture the live + # count mid-load. Patch out the tab-switch pulse so its timer doesn't + # race the assertion (the load ends by setting root_tab.selected_index + # which fires _activity_pulse on a 160ms Timer thread — separate from + # the loader's own begin/end pair we're verifying here). 
+ captured_count: list[int] = [] + original_apply = app._apply_analysis_context + + def _capture_count(ctx): + captured_count.append(app._activity_count) + return original_apply(ctx) + + with ( + patch.object(app, "_apply_analysis_context", side_effect=_capture_count), + patch.object(app, "_activity_pulse"), + ): + app._history_load_analysis(result_dir) + assert captured_count == [1] # exactly one active op while loading + assert app._activity_count == 0 # restored after completion + + def test_history_load_analysis_disables_source_buttons(self, tmp_path, monkeypatch): + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + result_dir = self._make_sp_result_dir(tmp_path) + app = QuantUIApp() + btn_a = widgets.Button(description="View Results") + btn_b = widgets.Button(description="View Analysis") + # Both buttons start enabled. + assert btn_a.disabled is False + assert btn_b.disabled is False + + # Capture disabled state mid-load. + captured: dict[str, bool] = {} + original_apply = app._apply_analysis_context + + def _capture(ctx): + captured["a"] = btn_a.disabled + captured["b"] = btn_b.disabled + return original_apply(ctx) + + with patch.object(app, "_apply_analysis_context", side_effect=_capture): + app._history_load_analysis(result_dir, source_btns=(btn_a, btn_b)) + assert captured == {"a": True, "b": True} + # Buttons restored after the load. + assert btn_a.disabled is False + assert btn_b.disabled is False + + def test_feedback_restored_even_on_exception(self, tmp_path, monkeypatch): + # If the loader raises mid-load, the activity counter and buttons + # must still be restored — try/finally contract. 
+ monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + result_dir = self._make_sp_result_dir(tmp_path) + app = QuantUIApp() + btn = widgets.Button(description="View") + + with patch.object( + app, + "_apply_analysis_context", + side_effect=RuntimeError("simulated failure"), + ): + try: + app._history_load_analysis(result_dir, source_btns=(btn,)) + except RuntimeError: + pass + assert app._activity_count == 0 + assert btn.disabled is False + + +class TestHistoryHardeningHist5: + """HIST.5: history dropdown labels must expose calc type before selection. + + The current ``refresh_results_browser`` formats each option as + ``" · [] /"``, + where the badge is the friendly name from ``_calc_type_badge``. This + test locks in that contract — particularly the bracketed badge — so + a future refactor can't accidentally drop the calc-type prefix that the + user originally reported missing in the M-PLOT user report. + """ + + def _make_result(self, tmp_path, formula, calc_type, offset): + ts = datetime.now().strftime("%Y-%m-%d_%H-%M-%S-") + f"{offset:06d}" + d = tmp_path / f"{ts}_{formula}_RHF_STO-3G" + d.mkdir() + (d / "result.json").write_text( + json.dumps( + { + "_schema_version": 2, + "timestamp": ts, + "calc_type": calc_type, + "formula": formula, + "method": "RHF", + "basis": "STO-3G", + } + ) + ) + # Geometry opt needs trajectory.json for the seed-dropdown side-path, + # but refresh_results_browser doesn't gate on it. 
+ return d + + def test_dropdown_label_includes_calc_badge_for_each_type( + self, tmp_path, monkeypatch + ): + monkeypatch.setenv("QUANTUI_RESULTS_DIR", str(tmp_path)) + self._make_result(tmp_path, "H2O", "single_point", offset=1) + self._make_result(tmp_path, "H2O", "geometry_opt", offset=2) + self._make_result(tmp_path, "H2O", "frequency", offset=3) + self._make_result(tmp_path, "H2O", "tddft", offset=4) + self._make_result(tmp_path, "H2O", "nmr", offset=5) + self._make_result(tmp_path, "H2O", "pes_scan", offset=6) + app = QuantUIApp() + app._refresh_results_browser() + labels = [lbl for lbl, _ in app.past_dd.options] + # POLISH.6 (M-POLISH, 2026-05-25) prepends a + # "(select a calculation to view)" placeholder so the dropdown + # opens in an explicit no-selection state. Strip it before + # asserting per-row badge contents. + result_labels = [lbl for lbl in labels if "select a calculation" not in lbl] + # Every result row must include a bracketed badge. + assert all("[" in lbl and "]" in lbl for lbl in result_labels), result_labels + joined = " ".join(result_labels) + for expected in ("[SP]", "[GeoOpt]", "[Freq]", "[UV-Vis]", "[NMR]", "[PES]"): + assert expected in joined, f"missing badge {expected} in {result_labels}" + + class TestUVVisSpectrumWidgets: """UV-Vis accordion and controls exist in correct initial state.""" diff --git a/tests/test_benchmarks.py b/tests/test_benchmarks.py index d056d4d..32fa6cf 100644 --- a/tests/test_benchmarks.py +++ b/tests/test_benchmarks.py @@ -152,16 +152,26 @@ def test_progress_called_for_each_step(self): calls = [] stop = threading.Event() - # Only run first 2 steps for speed - with patch("quantui.benchmarks.BENCHMARK_SUITE", BENCHMARK_SUITE[:2]): + # Only run first 2 steps for speed. ``_MODE_TO_SUITE["tier1"]`` is the + # actual binding ``run_calibration`` reads at call time — patching + # ``BENCHMARK_SUITE`` alone no longer propagates, since + # ``BENCHMARK_SUITE_TIER1`` aliases the original list at import time. 
+ with patch.dict( + "quantui.benchmarks._MODE_TO_SUITE", + {"tier1": BENCHMARK_SUITE[:2]}, + ): run_calibration( progress_cb=lambda *a: calls.append(a), stop_event=stop, timeout_per_step=60.0, ) - assert len(calls) == 2 - step_n, total, label, status, elapsed = calls[0] + # Filter to terminal per-step calls; intermediate "running" heartbeats + # (emitted every ~500ms while a step is in-flight) are an implementation + # detail of the live-status display and should not be counted here. + terminal = [c for c in calls if c[3] != "running"] + assert len(terminal) == 2 + step_n, total, label, status, elapsed = terminal[0] assert step_n == 1 assert total == 2 assert isinstance(label, str) diff --git a/tests/test_bug_regressions_2026_05_25.py b/tests/test_bug_regressions_2026_05_25.py new file mode 100644 index 0000000..b57dc47 --- /dev/null +++ b/tests/test_bug_regressions_2026_05_25.py @@ -0,0 +1,187 @@ +"""Regression tests for the four bugs reported in session 55 (2026-05-25). + +Bug A — GPU-run results saved with no MO data + ``_run_session_calc_body`` extracts ``mf.mo_energy`` / ``mo_coeff`` / + ``mo_occ`` via ``numpy.array(...)``. With a GPU-offloaded ``mf`` those + are CuPy arrays — numpy refuses implicit device transfers, so the + bare ``except`` swallowed a ``TypeError`` and the SessionResult + shipped with all MO fields ``None``. That made ``save_orbitals`` + no-op and history replay of any GPU-run SP/GeoOpt rendered "Not + available" in Energies + Isosurface panels. + +Bug B1/B2/B3 — Calculate-tab molecule viewer used the + ``with self.viz_output: display_molecule(...)`` pattern. Symptoms: + initial render wouldn't appear after a PubChem search (B1); + PlotlyMol RDKit valence errors spilled out as red logger lines + around the viewer (B2); generic ``logger.info`` lines from the + renderer were captured into the Output widget (B3). Fix migrates + to ``_refresh_calc_mol_viewer`` which renders HTML outside any + Output context and atomic-swaps into ``viz_output``. 
+ +Bug C — Frequency pre-opt on benzene crashed the whole calc with + "singular matrix" in PySCF's ``cho_solve``. Three pre-opt sites + in ``_do_run`` now ``try/except`` around ``optimize_geometry`` and + fall back to the user-provided geometry on failure. +""" + +from __future__ import annotations + +import inspect + +import numpy as np +import pytest + +# ===================================================================== +# Bug A — cupy-aware MO array extraction in session_calc +# ===================================================================== + + +class _FakeCupyArray: + """A minimal stand-in for a CuPy array: numpy refuses to convert it + directly, but it exposes ``.get()`` (sync device→host copy) and + its ``type(...).__module__`` starts with ``"cupy"`` — the two + properties the fix probes.""" + + def __init__(self, host_data): + self._host = np.asarray(host_data) + + def get(self): + return self._host + + # numpy.asarray on a non-array-like falls back to object dtype unless + # we make the conversion explicitly fail like the real cupy. + def __array__(self, dtype=None): + raise TypeError( + "Implicit conversion to a NumPy array is not allowed. " + "Please use `.get()` to construct a NumPy array explicitly." + ) + + +# Pin __module__ so the type probe matches. +_FakeCupyArray.__module__ = "cupy._core.core" + + +def _extract_to_numpy(arr): + """Re-implementation of the closure to keep the test independent of + session_calc's import side effects. 
Mirrors the production helper: + detect CuPy by ``.get()`` callable + module prefix, otherwise pass + through ``np.asarray``.""" + if arr is None: + return None + get = getattr(arr, "get", None) + if callable(get) and type(arr).__module__.startswith("cupy"): + return np.asarray(get()) + return np.asarray(arr) + + +class TestBugA_CupyAwareConversion: + def test_none_passes_through(self): + assert _extract_to_numpy(None) is None + + def test_numpy_array_passes_through(self): + a = np.array([1.0, 2.0, 3.0]) + out = _extract_to_numpy(a) + np.testing.assert_array_equal(out, a) + + def test_cupy_like_is_converted_via_get(self): + fake = _FakeCupyArray([4.0, 5.0, 6.0]) + out = _extract_to_numpy(fake) + assert isinstance(out, np.ndarray) + np.testing.assert_array_equal(out, [4.0, 5.0, 6.0]) + + def test_bare_numpy_conversion_of_cupy_like_raises(self): + # Sanity: the production fix is needed precisely because the + # naive call (pre-fix code) raises. If this test ever stops + # raising, the regression guard is moot. + fake = _FakeCupyArray([1.0]) + with pytest.raises(TypeError): + np.array(fake) + + def test_production_helper_uses_to_numpy_array(self): + # Confirm the actual session_calc body contains the + # ``_to_numpy_array`` helper (so a future refactor that drops it + # breaks this test loudly). 
+ from quantui import session_calc + + src = inspect.getsource(session_calc) + assert "_to_numpy_array" in src + assert "cupy" in src.lower() + + +# ===================================================================== +# Bug B — Calculate-tab molecule viewer uses atomic HTML swap +# ===================================================================== + + +class TestBugB_AtomicMolViewerSwap: + def test_app_has_refresh_calc_mol_viewer(self): + from quantui.app import QuantUIApp + + app = QuantUIApp() + assert hasattr(app, "_refresh_calc_mol_viewer") + + def test_refresh_calc_mol_viewer_handles_none_molecule(self): + from quantui.app import QuantUIApp + + app = QuantUIApp() + # No molecule loaded yet → must return cleanly, not raise. + assert app._molecule is None + app._refresh_calc_mol_viewer() # should not raise + + def test_calc_tab_does_not_use_with_viz_output_display_pattern(self): + # The BUG.7 pattern (Analysis tab) and this bug-batch's fix both + # forbid the ``with self.viz_output: display_molecule(...)`` + # idiom. Verify no occurrence remains in the migrated section. + from quantui import app as _app_mod + + src = inspect.getsource(_app_mod) + # ``_display_molecule`` is the imported alias; the fix removed + # all 5 of its call sites. The module may still import it for + # backwards compat, so we only check that the buggy + # idiom (``with self.viz_output:`` followed by a + # ``_display_molecule`` call) is gone. + idx = 0 + while True: + idx = src.find("with self.viz_output:", idx) + if idx < 0: + break + # Look at the next 400 characters for a _display_molecule + # call. If we find one, the bad idiom is still present. + window = src[idx : idx + 400] + assert "_display_molecule(" not in window, ( + "Found ``with self.viz_output: _display_molecule(...)`` " + "idiom; should be migrated to _refresh_calc_mol_viewer " + "(BUG B1/B2/B3)." 
+ ) + idx += 1 + + +# ===================================================================== +# Bug C — Pre-opt failures fall back to user geometry instead of crashing +# ===================================================================== + + +class TestBugC_PreoptFailureFallback: + def test_freq_preopt_block_has_try_except(self): + # Confirm the source contains the new fallback paths. Reading + # the source is the most direct way to assert this; running the + # actual freq calc would require PySCF. + # + # POLISH.9 (2026-05-25) renamed user-facing "Pre-optimisation" + # → "Geometry optimization"; update the guard string to match. + from quantui import app as _app_mod + + src = inspect.getsource(_app_mod) + assert "Geometry optimization failed" in src + # The exception variable name (_pre_exc) is unique to the new + # try/except wrapping all three pre-opt sites. + assert src.count("except Exception as _pre_exc") >= 3 + + def test_freq_preopt_fallback_uses_user_geometry(self): + # The fallback message should make it clear the calc continues + # with the user-provided geometry — that's the contract the bug + # report asked for. 
+ from quantui import app as _app_mod + + src = inspect.getsource(_app_mod) + assert "user-provided geometry" in src or "seed geometry as-is" in src diff --git a/tests/test_c_stderr.py b/tests/test_c_stderr.py new file mode 100644 index 0000000..7a2a15e --- /dev/null +++ b/tests/test_c_stderr.py @@ -0,0 +1,126 @@ +"""Tests for the M-STDERR / STDERR.1 fd-level stderr capture helper.""" + +from __future__ import annotations + +import io +import os + +import pytest + +from quantui.c_stderr import capture_c_stderr + +_POSIX_ONLY = pytest.mark.skipif( + os.name != "posix", + reason="capture_c_stderr is POSIX-only (fd dup/dup2); no-op on Windows", +) + + +class TestWindowsNoOp: + """On Windows the context manager is a no-op and must not touch fds.""" + + def test_yields_without_raising_on_windows(self): + if os.name == "posix": + pytest.skip("Windows-specific behavior test") + relay = io.StringIO() + with capture_c_stderr(relay): + pass + # On Windows the relay must remain empty — capture_c_stderr did + # nothing. + assert relay.getvalue() == "" + + def test_relay_none_works_on_windows(self): + if os.name == "posix": + pytest.skip("Windows-specific behavior test") + with capture_c_stderr(None): + pass # must not raise + + +@_POSIX_ONLY +class TestPosixCaptureBehavior: + """The interesting fd-manipulation behavior — only runnable on POSIX.""" + + def test_captures_fd_writes_into_relay_stream(self): + relay = io.StringIO() + with capture_c_stderr(relay): + os.write(2, b"hello from c code\n") + # After exit the captured bytes must be in the relay stream. + assert "hello from c code" in relay.getvalue() + + def test_restores_original_stderr_fd_on_exit(self): + # Sanity: after the wrapped block, writes to fd 2 must NOT go to + # the temp file anymore. We check by writing one captured line + # inside, then issuing a zero-byte write outside; the relay must + # contain only what was written inside. 
+ relay = io.StringIO() + with capture_c_stderr(relay): + os.write(2, b"inside\n") + # If the fd weren't restored, this write would still hit the + # (now-closed) tempfile and fail with OSError. Just confirm it + # succeeds — we can't easily intercept it for content check. + os.write(2, b"") # zero-byte write must succeed on a valid fd + # And relay still has only what was captured during the block. + assert "inside" in relay.getvalue() + assert "outside" not in relay.getvalue() + + def test_restores_fd_even_when_block_raises(self): + # try/finally contract: descriptor must be restored on exception. + with pytest.raises(RuntimeError): + with capture_c_stderr(None): + os.write(2, b"before raise\n") + raise RuntimeError("simulated") + # If the fd weren't restored, this would fail. Confirm fd 2 is + # still valid by writing zero bytes. + os.write(2, b"") + + def test_no_relay_stream_drops_captured_output(self): + # capture_c_stderr(None) must accept writes silently. + with capture_c_stderr(None): + os.write(2, b"this disappears\n") + # Nothing to assert about content — just that it didn't raise. + + def test_captured_bytes_decoded_replace_on_bad_bytes(self): + # If PySCF C code writes non-UTF8 bytes (e.g. binary garbage on + # crash), the relay must not raise — replace_errors must absorb. + relay = io.StringIO() + with capture_c_stderr(relay): + os.write(2, b"\xff\xfe valid text after \n") + # The relay must have something (replaced bytes + the valid text). + relayed = relay.getvalue() + assert "valid text after" in relayed + + def test_empty_capture_does_not_write_to_relay(self): + # If nothing was written to fd 2 inside the block, relay must + # stay untouched (don't emit a blank line). + relay = io.StringIO() + relay.write("previous content\n") + with capture_c_stderr(relay): + pass + # No new content appended. 
+ assert relay.getvalue() == "previous content\n" + + def test_nested_contexts_restore_correctly(self): + # Two levels deep: each must restore to the parent's state on + # exit. Inner write must go to inner relay; outer write to outer. + outer = io.StringIO() + inner = io.StringIO() + with capture_c_stderr(outer): + os.write(2, b"outer-before\n") + with capture_c_stderr(inner): + os.write(2, b"inner-only\n") + os.write(2, b"outer-after\n") + assert "inner-only" in inner.getvalue() + assert "inner-only" not in outer.getvalue() + assert "outer-before" in outer.getvalue() + assert "outer-after" in outer.getvalue() + + def test_relay_write_failure_is_swallowed(self): + # If the relay stream itself raises on write, capture_c_stderr + # must not propagate — telemetry must never block the caller. + class _BadStream: + def write(self, _s): + raise RuntimeError("relay broken") + + with capture_c_stderr(_BadStream()): + os.write(2, b"some content\n") + # If we got here without raising, contract holds. + os.write(2, b"") # fd still valid diff --git a/tests/test_calc_log.py b/tests/test_calc_log.py index 14a52c9..324b144 100644 --- a/tests/test_calc_log.py +++ b/tests/test_calc_log.py @@ -79,9 +79,24 @@ def test_estimate_time_scopes_by_calc_type(isolated_log_dir): def test_estimate_time_non_single_point_ignores_legacy_untyped_records( isolated_log_dir, ): + """Legacy untyped records must not enter the freq pool as *direct* matches. + + Before M-EST / EST.2 (session 55) this asserted ``est_freq is None`` — + a strict "no freq records → no freq estimate" rule. EST.2 added a + structured cost-model fallback that intentionally reuses the SP + history (where legacy untyped records DO count) to derive a freq + estimate when no direct freq records exist. So the contract today + is two-fold: + + 1. Legacy records still don't count as frequency-typed (strategies + 1-4 produce no direct prediction). + 2. 
The cost-model fallback DOES fire — producing a structured + SCF-anchor + Hessian + 6N IR estimate — and its value is much + larger than the underlying SP time (otherwise we know the + cost-model decomposition collapsed to just the SP anchor). + """ import quantui.calc_log as clog - # Legacy records with no calc_type should not be used for frequency estimates. for elapsed in (10.0, 12.0, 15.0): clog.log_calculation( formula="CH2O", @@ -105,4 +120,11 @@ def test_estimate_time_non_single_point_ignores_legacy_untyped_records( calc_type="frequency", ) - assert est_freq is None + # EST.2 fallback fires: not None, and noticeably larger than the + # bare SP median (~12 s) thanks to the +Hessian + 6×n_atoms × SP term. + assert est_freq is not None + assert est_freq["seconds"] > 100.0, ( + f"Expected freq estimate > 100 s (SP ~12 s × ~21 cost-model multiplier " + f"for 4 atoms), got {est_freq['seconds']:.1f} s — suggests the cost " + "model isn't firing on legacy SP records" + ) diff --git a/tests/test_calculator.py b/tests/test_calculator.py index 6616343..777f661 100644 --- a/tests/test_calculator.py +++ b/tests/test_calculator.py @@ -82,8 +82,11 @@ def test_lowercase_method(self, water_molecule): def test_unsupported_method(self, water_molecule): """Test error for unsupported method.""" + # Use a fictional method name to exercise the validation path — + # the previous stand-in "CCSD" became a real supported method in + # M8.1 (session 54), so the validator no longer rejects it. 
with pytest.raises(ValueError, match="not supported"): - PySCFCalculation(water_molecule, method="CCSD", basis="6-31G") + PySCFCalculation(water_molecule, method="NONEXISTENT", basis="6-31G") def test_nonstandard_basis_warning(self, water_molecule, caplog): """Test warning for non-standard basis set.""" diff --git a/tests/test_calibration_save_results.py b/tests/test_calibration_save_results.py new file mode 100644 index 0000000..753597a --- /dev/null +++ b/tests/test_calibration_save_results.py @@ -0,0 +1,295 @@ +"""Tests for the M-EST follow-up: calibration results saved as job files. + +Session 55 (2026-05-25) user request: + + > Are the calculations run as part of the calibration time estimates + > saved to job files so users can load the results as usual? + +Before this change, calibration steps only wrote to ``perf_log.jsonl`` +(for the estimator) and ``calibration.json`` (for the UI summary). The +full result objects were discarded. Tier-4 in particular runs MP2 + +CCSD on H₂O/cc-pVDZ plus benzene B3LYP/6-31G* frequency — those are +real research-quality calcs and the user wanted them saved. + +This file tests the new save path WITHOUT running PySCF, by: + +1. Unit-testing ``save_result(..., extras={...})`` — the new kwarg that + embeds ``calibration_run_id`` (and any other extras) in result.json. +2. Unit-testing the ``_TeeStream`` helper used to fan PySCF's + progress_stream to both the shared calibration log and an in-memory + buffer (so save_result has the per-calc PySCF log). +3. Unit-testing ``_save_calibration_step`` against a fake result + object — confirms it writes a result_dir with the calibration tag. +4. Structure-grep tests that the worker passes ``calibration_run_id`` + to the helper and returns ``result_dir`` on the queue, and that + ``BenchmarkStep`` has the new ``result_dir`` field. + +All tests platform-independent. No PySCF required. 
+""" + +from __future__ import annotations + +import inspect +import io +import json +from types import SimpleNamespace + +# ===================================================================== +# save_result(..., extras=...) — new kwarg +# ===================================================================== + + +class TestSaveResultExtras: + def test_extras_merged_into_result_json(self, tmp_path): + from quantui.results_storage import save_result + + fake_result = SimpleNamespace( + formula="H2O", + method="RHF", + basis="STO-3G", + energy_hartree=-75.0, + energy_ev=-75.0 * 27.211386245988, + homo_lumo_gap_ev=10.0, + converged=True, + n_iterations=5, + ) + + out = save_result( + fake_result, + pyscf_log="line 1\nline 2\n", + results_dir=tmp_path, + calc_type="single_point", + extras={"calibration_run_id": "2026-05-25T12:00:00+00:00"}, + ) + data = json.loads((out / "result.json").read_text()) + assert data["calibration_run_id"] == "2026-05-25T12:00:00+00:00" + # Existing fields still present. + assert data["formula"] == "H2O" + assert data["calc_type"] == "single_point" + + def test_extras_can_overwrite_builtin_field(self, tmp_path): + # Documented behaviour: extras takes precedence. This is by + # design — calibration uses it deliberately and a future caller + # may want the same affordance. + from quantui.results_storage import save_result + + fake_result = SimpleNamespace( + formula="H2O", + method="RHF", + basis="STO-3G", + energy_hartree=-75.0, + converged=True, + n_iterations=1, + ) + out = save_result( + fake_result, + results_dir=tmp_path, + extras={"formula": "OVERRIDDEN"}, + ) + data = json.loads((out / "result.json").read_text()) + assert data["formula"] == "OVERRIDDEN" + + def test_extras_none_is_no_op(self, tmp_path): + # Existing callers that don't pass extras must keep working. 
+ from quantui.results_storage import save_result + + fake_result = SimpleNamespace( + formula="H2O", + method="RHF", + basis="STO-3G", + energy_hartree=-75.0, + converged=True, + n_iterations=1, + ) + out = save_result(fake_result, results_dir=tmp_path) + data = json.loads((out / "result.json").read_text()) + # No calibration_run_id when extras wasn't passed. + assert "calibration_run_id" not in data + + +# ===================================================================== +# _TeeStream — fan progress to two destinations +# ===================================================================== + + +class TestTeeStream: + def test_writes_to_all_streams(self): + from quantui.benchmarks import _TeeStream + + a = io.StringIO() + b = io.StringIO() + tee = _TeeStream(a, b) + tee.write("hello\n") + tee.write("world\n") + assert a.getvalue() == "hello\nworld\n" + assert b.getvalue() == "hello\nworld\n" + + def test_returns_len_of_written(self): + from quantui.benchmarks import _TeeStream + + tee = _TeeStream(io.StringIO()) + assert tee.write("abcde") == 5 + + def test_one_broken_stream_doesnt_kill_others(self): + from quantui.benchmarks import _TeeStream + + class _Broken: + def write(self, _s): + raise RuntimeError("simulated") + + def flush(self): + raise RuntimeError("simulated") + + good = io.StringIO() + tee = _TeeStream(_Broken(), good) + tee.write("payload") + tee.flush() + # The good stream still got the data. + assert good.getvalue() == "payload" + + +# ===================================================================== +# _save_calibration_step — the worker's save helper +# ===================================================================== + + +class TestSaveCalibrationStep: + def test_single_point_creates_result_dir_with_tag(self, tmp_path, monkeypatch): + # Redirect the default results dir to tmp_path. 
+ from pathlib import Path as _Path + + monkeypatch.setattr(_Path, "home", lambda: tmp_path) + + from quantui.benchmarks import _save_calibration_step + + fake_result = SimpleNamespace( + formula="H2O", + method="B3LYP", + basis="STO-3G", + energy_hartree=-75.0, + energy_ev=-75.0 * 27.211386245988, + homo_lumo_gap_ev=10.0, + converged=True, + n_iterations=12, + ) + fake_mol = SimpleNamespace( + atoms=["O", "H", "H"], + coordinates=[[0, 0, 0], [0.7, 0.6, 0], [-0.7, 0.6, 0]], + charge=0, + multiplicity=1, + ) + + saved = _save_calibration_step( + fake_result, + calc_type="single_point", + pyscf_log="some log", + calibration_run_id="2026-05-25T12:00:00+00:00", + mol=fake_mol, + ) + assert saved is not None + assert saved.exists() + data = json.loads((saved / "result.json").read_text()) + assert data["calibration_run_id"] == "2026-05-25T12:00:00+00:00" + assert data["calc_type"] == "single_point" + assert data["formula"] == "H2O" + # pyscf.log should be present from the worker's per-calc tee buffer. 
+ assert (saved / "pyscf.log").exists() + assert "some log" in (saved / "pyscf.log").read_text() + + def test_frequency_includes_spectra(self, tmp_path, monkeypatch): + from pathlib import Path as _Path + + monkeypatch.setattr(_Path, "home", lambda: tmp_path) + + from quantui.benchmarks import _save_calibration_step + + fake_freq = SimpleNamespace( + formula="H2O", + method="B3LYP", + basis="STO-3G", + energy_hartree=-75.0, + energy_ev=-75.0 * 27.211386245988, + homo_lumo_gap_ev=10.0, + converged=True, + n_iterations=12, + frequencies_cm1=[1600.0, 3700.0, 3800.0], + ir_intensities=[80.0, 5.0, 50.0], + zpve_hartree=0.02, + displacements=None, + ) + fake_mol = SimpleNamespace( + atoms=["O", "H", "H"], + coordinates=[[0, 0, 0], [0.7, 0.6, 0], [-0.7, 0.6, 0]], + charge=0, + multiplicity=1, + ) + + saved = _save_calibration_step( + fake_freq, + calc_type="frequency", + pyscf_log="", + calibration_run_id="tier4-run-1", + mol=fake_mol, + ) + assert saved is not None + data = json.loads((saved / "result.json").read_text()) + # The Analysis tab's IR + Vibrational panels read these keys. + assert "spectra" in data + assert "ir" in data["spectra"] + assert data["spectra"]["ir"]["frequencies_cm1"] == [1600.0, 3700.0, 3800.0] + assert "molecule" in data["spectra"] + assert data["spectra"]["molecule"]["atoms"] == ["O", "H", "H"] + + +# ===================================================================== +# Worker + BenchmarkStep structural checks +# ===================================================================== + + +class TestWorkerStructure: + def test_benchmark_step_has_result_dir_field(self): + from quantui.benchmarks import BenchmarkStep + + s = BenchmarkStep( + label="x", + method="RHF", + basis="STO-3G", + n_atoms=2, + n_electrons=2, + status="ok", + ) + # New field — default None. 
+ assert s.result_dir is None + + def test_calibration_worker_signature_accepts_run_id(self): + from quantui.benchmarks import _calibration_worker + + sig = inspect.signature(_calibration_worker) + assert "calibration_run_id" in sig.parameters + + def test_worker_source_calls_save_calibration_step(self): + from quantui import benchmarks + + src = inspect.getsource(benchmarks._calibration_worker) + assert "_save_calibration_step" in src + # And the queue payload now carries result_dir. + assert "result_dir" in src + + def test_save_calibration_json_includes_result_dir(self): + # The persisted calibration.json should expose result_dir per + # step so future tooling can find the saved results. + from quantui import benchmarks + + src = inspect.getsource(benchmarks._save_calibration_json) + assert '"result_dir"' in src or "'result_dir'" in src + + +class TestHistoryLabelMarker: + def test_refresh_results_browser_emits_calibration_marker(self): + from quantui import app_runflow + + src = inspect.getsource(app_runflow.refresh_results_browser) + # The 🔧 marker is rendered when calibration_run_id is present + # on the saved result.json. + assert "calibration_run_id" in src + assert "🔧" in src or "calib_marker" in src diff --git a/tests/test_calibration_skip_and_gpu.py b/tests/test_calibration_skip_and_gpu.py new file mode 100644 index 0000000..e98f2f6 --- /dev/null +++ b/tests/test_calibration_skip_and_gpu.py @@ -0,0 +1,250 @@ +"""Tests for the session-55 calibration UX fixes: + +1. **Skip button**: replaces the per-step timeout. The user can abandon + ONE step without losing the whole calibration (the old hard 1800 s + tier-4 cap cut off a near-finishing benzene B3LYP/6-31G* freq). +2. **MP2 + CCSD blocked on GPU**: gpu4pyscf's post-HF support is + experimental and was crashing immediately after the RHF reference. + Both methods now stay CPU-side via ``_GPU_UNSUPPORTED_METHODS``. +3. 
**error_msg visible in calibration table**: failed steps now show + the captured error message inline (truncated) so the user knows + WHY a step failed. + +All tests platform-independent. No PySCF required. +""" + +from __future__ import annotations + +import inspect + +# ===================================================================== +# Fix 2 — MP2 + CCSD on the GPU skip list +# ===================================================================== + + +class TestGpuUnsupportedMethods: + def test_mp2_blocked_on_gpu(self): + from quantui.gpu_offload import _GPU_UNSUPPORTED_METHODS + + assert "MP2" in _GPU_UNSUPPORTED_METHODS + + def test_ccsd_blocked_on_gpu(self): + from quantui.gpu_offload import _GPU_UNSUPPORTED_METHODS + + assert "CCSD" in _GPU_UNSUPPORTED_METHODS + + def test_ccsd_t_still_blocked(self): + # Don't accidentally remove the original entry while adding new ones. + from quantui.gpu_offload import _GPU_UNSUPPORTED_METHODS + + assert "CCSD(T)" in _GPU_UNSUPPORTED_METHODS + + def test_try_to_gpu_returns_cpu_path_for_mp2(self): + # Direct functional check: try_to_gpu should short-circuit before + # calling .to_gpu() when the method is blocked. The "mf" we pass + # doesn't need to be real — try_to_gpu returns it unchanged. 
+ from quantui.gpu_offload import try_to_gpu + + sentinel = object() + mf, used_gpu, name = try_to_gpu(sentinel, "MP2") + assert mf is sentinel + assert used_gpu is False + assert name is None + + +# ===================================================================== +# Fix 1 — Skip event + no-timeout default +# ===================================================================== + + +class TestRunCalibrationSignature: + def test_run_calibration_accepts_skip_event(self): + from quantui.benchmarks import run_calibration + + sig = inspect.signature(run_calibration) + assert "skip_event" in sig.parameters + + def test_timeout_per_step_default_is_none(self): + # session 55 user request: no automatic timeout — Skip button + # is the user-facing control. + from quantui.benchmarks import run_calibration + + sig = inspect.signature(run_calibration) + timeout_param = sig.parameters["timeout_per_step"] + assert timeout_param.default is None + + def test_loop_handles_none_timeout_without_crashing(self): + # Most direct path: run_calibration with PySCF unavailable just + # iterates through the suite emitting PySCF-not-available errors. + # With timeout_per_step=None we must NOT hit the + # ``elapsed > timeout_per_step`` comparison (which would + # TypeError on None). + from quantui.benchmarks import run_calibration + + # Smaller suite so the test stays fast. + result = run_calibration(mode="tier1", timeout_per_step=None) + # On Windows (no PySCF) every step is marked error. + # Function returns cleanly without exceptions. + assert result.mode == "tier1" + + def test_skipped_status_constant_exists(self): + from quantui import benchmarks + + assert hasattr(benchmarks, "_STATUS_SKIPPED") + assert benchmarks._STATUS_SKIPPED == "skipped" + + +class TestSkipEventInPollLoop: + """Structural / source check: the poll loop now honours skip_event. + + A full end-to-end skip test would require PySCF + spawning a real + worker; the source-grep test is the cheap regression guard. 
+ """ + + def test_poll_loop_checks_skip_event(self): + from quantui import benchmarks + + src = inspect.getsource(benchmarks.run_calibration) + # The new branch checks skip_event.is_set() and calls + # skip_event.clear() so the next step starts fresh. + assert "skip_event" in src + assert "skip_event.is_set()" in src + assert "skip_event.clear()" in src + assert "_STATUS_SKIPPED" in src + + def test_no_unconditional_timeout_comparison(self): + # If someone reintroduces ``elapsed > timeout_per_step`` without + # a None guard, this test catches it. + from quantui import benchmarks + + src = inspect.getsource(benchmarks.run_calibration) + # Either the comparison is guarded by a None check OR it's gone. + # Match the guard pattern explicitly. + assert "timeout_per_step is not None" in src + + +# ===================================================================== +# Fix 3 — error_msg surfaced in the table +# ===================================================================== + + +class TestCalTableShowsErrorMsg: + def test_error_row_includes_error_msg_text(self): + # Direct render-helper test: an error step should include the + # error_msg in the rendered HTML so users see WHY the step failed. + from types import SimpleNamespace + + from quantui.app_runflow import _cal_table_html + + bad_step = SimpleNamespace( + label="H₂O MP2/cc-pVDZ", + method="MP2", + basis="cc-pVDZ", + n_atoms=3, + n_electrons=10, + n_basis=24, + status="error", + elapsed_s=5.54, + error_msg="MP2 correction failed for H2O: foo bar baz", + calc_type="single_point", + result_dir=None, + ) + html = _cal_table_html([bad_step], total=1) + assert "✗ error" in html + # The error message text appears in the rendered HTML. 
+ assert "MP2 correction failed" in html + + def test_ok_row_does_not_show_inline_detail(self): + from types import SimpleNamespace + + from quantui.app_runflow import _cal_table_html + + good_step = SimpleNamespace( + label="H₂ RHF/STO-3G", + method="RHF", + basis="STO-3G", + n_atoms=2, + n_electrons=2, + n_basis=2, + status="ok", + elapsed_s=0.5, + error_msg="", + calc_type="single_point", + result_dir=None, + ) + html = _cal_table_html([good_step], total=1) + # No italic detail line for successful steps. + assert "font-style:italic" not in html or "color:#94a3b8" not in html + + def test_long_error_msg_truncated(self): + from types import SimpleNamespace + + from quantui.app_runflow import _cal_table_html + + long_msg = "x" * 500 + bad_step = SimpleNamespace( + label="bad", + method="MP2", + basis="cc-pVDZ", + n_atoms=3, + n_electrons=10, + n_basis=24, + status="error", + elapsed_s=1.0, + error_msg=long_msg, + calc_type="single_point", + result_dir=None, + ) + html = _cal_table_html([bad_step], total=1) + # The 500-char message gets truncated with "…". + assert "…" in html + # And isn't dumped wholesale (would be > 200 chars of x's). + assert "x" * 200 not in html + + def test_skipped_row_uses_skipped_label(self): + from types import SimpleNamespace + + from quantui.app_runflow import _cal_status_text, _cal_table_html + + # Direct check of the status renderer. 
+ assert "skipped" in _cal_status_text("skipped").lower() + + skipped_step = SimpleNamespace( + label="C₆H₆ B3LYP [Freq]", + method="B3LYP", + basis="6-31G*", + n_atoms=12, + n_electrons=42, + n_basis=96, + status="skipped", + elapsed_s=1500.0, + error_msg="skipped by user at 1500s", + calc_type="frequency", + result_dir=None, + ) + html = _cal_table_html([skipped_step], total=1) + assert "⏭" in html or "skipped" in html + + +# ===================================================================== +# UI wiring — Skip button + handler exist +# ===================================================================== + + +class TestSkipButtonWiring: + def test_app_has_cal_skip_btn(self): + from quantui.app import QuantUIApp + + app = QuantUIApp() + assert hasattr(app, "_cal_skip_btn") + + def test_app_has_on_cal_skip_method(self): + from quantui.app import QuantUIApp + + app = QuantUIApp() + assert callable(getattr(app, "_on_cal_skip", None)) + + def test_on_cal_skip_handler_in_app_runflow(self): + from quantui import app_runflow + + assert callable(getattr(app_runflow, "on_cal_skip", None)) diff --git a/tests/test_cli.py b/tests/test_cli.py new file mode 100644 index 0000000..cad6083 --- /dev/null +++ b/tests/test_cli.py @@ -0,0 +1,409 @@ +"""Tests for the ``quantui`` CLI (``quantui/cli.py``). + +All tests are platform-independent. The CLI reads from +``~/.quantui/logs/event_log.jsonl`` by default, so each test overrides +``QUANTUI_LOG_DIR`` via ``monkeypatch`` to point at a ``tmp_path`` so we +never touch the real user log. 
+""" + +from __future__ import annotations + +import io +import json +import sys + +import pytest + +from quantui import cli + + +@pytest.fixture +def isolated_log_dir(tmp_path, monkeypatch): + """Point QuantUI's event log at a fresh tmp directory for one test.""" + monkeypatch.setenv("QUANTUI_LOG_DIR", str(tmp_path)) + return tmp_path + + +def _write_event_log(log_dir, events): + path = log_dir / "event_log.jsonl" + with path.open("w", encoding="utf-8") as fh: + for ev in events: + fh.write(json.dumps(ev) + "\n") + return path + + +def _capture(argv): + """Run cli.main with argv and return (exit_code, stdout, stderr).""" + out, err = io.StringIO(), io.StringIO() + real_out, real_err = sys.stdout, sys.stderr + sys.stdout, sys.stderr = out, err + try: + rc = cli.main(argv) + finally: + sys.stdout, sys.stderr = real_out, real_err + return rc, out.getvalue(), err.getvalue() + + +class TestLogTail: + def test_missing_log_returns_zero_with_msg(self, isolated_log_dir): + rc, out, err = _capture(["log", "tail"]) + assert rc == 0 + assert out == "" + assert "no event log" in err + + def test_empty_log_returns_zero_with_msg(self, isolated_log_dir): + _write_event_log(isolated_log_dir, []) + rc, out, err = _capture(["log", "tail"]) + assert rc == 0 + assert out == "" + assert "empty" in err + + def test_default_n_is_20(self, isolated_log_dir): + events = [ + { + "timestamp": f"2026-05-25T12:00:{i:02d}+00:00", + "event": "tick", + "message": f"msg-{i}", + } + for i in range(30) + ] + _write_event_log(isolated_log_dir, events) + rc, out, _ = _capture(["log", "tail"]) + assert rc == 0 + # 20 lines printed; verify the LAST 20 are kept (msg-10..msg-29). 
+ lines = [ln for ln in out.splitlines() if ln.strip()] + assert len(lines) == 20 + assert "msg-10" in lines[0] + assert "msg-29" in lines[-1] + + def test_n_flag_overrides(self, isolated_log_dir): + events = [ + { + "timestamp": f"2026-05-25T12:00:{i:02d}+00:00", + "event": "tick", + "message": f"m{i}", + } + for i in range(10) + ] + _write_event_log(isolated_log_dir, events) + rc, out, _ = _capture(["log", "tail", "-n", "3"]) + assert rc == 0 + lines = [ln for ln in out.splitlines() if ln.strip()] + assert len(lines) == 3 + assert "m7" in lines[0] + assert "m9" in lines[-1] + + def test_extras_appended_as_kv(self, isolated_log_dir): + events = [ + { + "timestamp": "2026-05-25T12:00:00+00:00", + "event": "calc_done", + "message": "B3LYP/STO-3G on H2O", + "elapsed_ms": 4321, + "gpu_used": True, + }, + ] + _write_event_log(isolated_log_dir, events) + rc, out, _ = _capture(["log", "tail"]) + assert rc == 0 + # Both extras appear in k=v form. + assert "elapsed_ms=4321" in out + assert "gpu_used=True" in out + # Core fields appear once. + assert "calc_done" in out + assert "B3LYP/STO-3G on H2O" in out + + +class TestCliParser: + def test_no_args_exits_nonzero(self, isolated_log_dir): + # argparse exits 2 when a required subparser is missing. 
+ with pytest.raises(SystemExit) as exc: + _capture([]) + assert exc.value.code == 2 + + def test_unknown_subcommand_exits_nonzero(self, isolated_log_dir): + with pytest.raises(SystemExit) as exc: + _capture(["bogus"]) + assert exc.value.code == 2 + + def test_log_without_subcommand_exits_nonzero(self, isolated_log_dir): + with pytest.raises(SystemExit) as exc: + _capture(["log"]) + assert exc.value.code == 2 + + +def test_fmt_event_renders_minimal_record(): + line = cli._fmt_event( + { + "timestamp": "2026-05-25T12:00:00+00:00", + "event": "startup", + "message": "QuantUI 0.2.0", + } + ) + assert "2026-05-25T12:00:00+00:00" in line + assert "startup" in line + assert "QuantUI 0.2.0" in line + + +def test_fmt_event_handles_missing_fields(): + # Should not raise even on a malformed record. + line = cli._fmt_event({}) + assert "?" in line # default event + + +class TestGpuCheck: + """`quantui gpu check` — exit 0 when GPU available, 1 otherwise.""" + + def test_disabled_via_env_var(self, monkeypatch, isolated_log_dir): + monkeypatch.setenv("QUANTUI_DISABLE_GPU", "1") + rc, out, err = _capture(["gpu", "check"]) + assert rc == 1 + assert "not available" in err + assert "QUANTUI_DISABLE_GPU" in err + + def test_reports_missing_gpu4pyscf(self, monkeypatch, isolated_log_dir): + # Pretend gpu4pyscf isn't installed. Because the GPU detector is + # @lru_cached, clear its cache first, then make the gpu4pyscf + # import fail by patching builtins.__import__ below. + import quantui.gpu_offload as _gpuo + + _gpuo.is_gpu_available.cache_clear() + + # Arrange for the gpu4pyscf import to fail so the CLI's + # reason-probe path reports it as not installed. 
+ def _fake_import(name, *args, **kwargs): + if name == "gpu4pyscf": + raise ImportError("simulated") + return _real_import(name, *args, **kwargs) + + import builtins as _bi + + _real_import = _bi.__import__ + monkeypatch.setattr(_bi, "__import__", _fake_import) + rc, out, err = _capture(["gpu", "check"]) + assert rc == 1 + assert "gpu4pyscf not installed" in err + + def test_happy_path_when_gpu_detected(self, monkeypatch, isolated_log_dir): + import quantui.gpu_offload as _gpuo + + # Replace the lru_cache-decorated function with a plain callable + # that mimics the (.cache_clear()) attribute the CLI calls. + def _fake(): + return (True, "NVIDIA Test GPU") + + _fake.cache_clear = lambda: None # type: ignore[attr-defined] + monkeypatch.setattr(_gpuo, "is_gpu_available", _fake) + rc, out, err = _capture(["gpu", "check"]) + assert rc == 0 + assert "GPU offload available" in out + assert "NVIDIA Test GPU" in out + + +class TestAnalyticsBuild: + """`quantui analytics build` — wraps analytics.build_dashboard.""" + + def test_empty_perf_log_returns_zero_with_msg(self, isolated_log_dir): + rc, out, err = _capture(["analytics", "build"]) + assert rc == 0 + assert "perf log is empty" in err + + def test_writes_file_at_explicit_path(self, isolated_log_dir, tmp_path): + # Seed perf log so the dashboard has data. 
+ perf_path = isolated_log_dir / "perf_log.jsonl" + perf_path.write_text( + json.dumps( + { + "timestamp": "2026-05-25T12:00:00+00:00", + "formula": "H2O", + "method": "B3LYP", + "basis": "STO-3G", + "elapsed_s": 1.0, + "converged": True, + "gpu_used": True, + } + ) + + "\n", + encoding="utf-8", + ) + target = tmp_path / "report.html" + rc, out, _ = _capture(["analytics", "build", "-o", str(target)]) + assert rc == 0 + assert target.exists() + assert "Wrote" in out + assert str(target) in out + + def _seed_perf_log(self, log_dir): + """Helper: write one perf record so build_dashboard has data.""" + perf_path = log_dir / "perf_log.jsonl" + perf_path.write_text( + json.dumps( + { + "timestamp": "2026-05-25T12:00:00+00:00", + "formula": "H2O", + "method": "B3LYP", + "basis": "STO-3G", + "elapsed_s": 1.0, + "converged": True, + } + ) + + "\n", + encoding="utf-8", + ) + + def test_open_flag_calls_webbrowser_off_wsl( + self, isolated_log_dir, tmp_path, monkeypatch + ): + # Force the non-WSL branch so the test runs the webbrowser path. + monkeypatch.setattr(cli, "_is_wsl", lambda: False) + self._seed_perf_log(isolated_log_dir) + target = tmp_path / "report.html" + + opened_urls: list[str] = [] + import webbrowser as _wb + + def _fake_open(url, *_args, **_kwargs): + opened_urls.append(url) + return True + + monkeypatch.setattr(_wb, "open", _fake_open) + + rc, _, _ = _capture(["analytics", "build", "-o", str(target), "--open"]) + assert rc == 0 + assert target.exists() + # The URL should be a file:// URI pointing at the written report. 
+ assert len(opened_urls) == 1 + assert opened_urls[0].startswith("file:") + assert "report.html" in opened_urls[0] + + def test_open_flag_handles_browser_failure_gracefully( + self, isolated_log_dir, tmp_path, monkeypatch + ): + monkeypatch.setattr(cli, "_is_wsl", lambda: False) + self._seed_perf_log(isolated_log_dir) + target = tmp_path / "report.html" + + import webbrowser as _wb + + # Headless systems can return False from webbrowser.open. + monkeypatch.setattr(_wb, "open", lambda *a, **k: False) + + rc, _, err = _capture(["analytics", "build", "-o", str(target), "--open"]) + # Exit code must remain 0 — the dashboard was written successfully. + assert rc == 0 + assert "could not auto-open" in err + + +class TestWslAwareOpener: + """`_open_in_browser` chooses wslview / explorer.exe on WSL.""" + + def test_is_wsl_detects_env_var(self, monkeypatch): + monkeypatch.setenv("WSL_DISTRO_NAME", "Ubuntu") + assert cli._is_wsl() is True + + def test_is_wsl_false_when_env_and_proc_missing(self, monkeypatch): + # Both signals absent → must return False, not raise. 
+ monkeypatch.delenv("WSL_DISTRO_NAME", raising=False) + import builtins + + original = builtins.open + + def _fail_open(*args, **kwargs): + if args and args[0] == "/proc/version": + raise OSError("simulated absence") + return original(*args, **kwargs) + + monkeypatch.setattr(builtins, "open", _fail_open) + assert cli._is_wsl() is False + + def test_wsl_prefers_wslview(self, monkeypatch, tmp_path): + """On WSL, wslview is tried first and wins when it returns 0.""" + monkeypatch.setattr(cli, "_is_wsl", lambda: True) + + calls: list[list[str]] = [] + + class _FakeRun: + def __init__(self, returncode): + self.returncode = returncode + + def _fake_subprocess_run(cmd, **_kwargs): + calls.append(list(cmd)) + return _FakeRun(0) + + import subprocess + + monkeypatch.setattr(subprocess, "run", _fake_subprocess_run) + target = tmp_path / "report.html" + target.write_text("x", encoding="utf-8") + + ok, tool = cli._open_in_browser(target) + assert ok is True + assert tool == "wslview" + assert len(calls) == 1 + assert calls[0][0] == "wslview" + assert str(target) in calls[0] + + def test_wsl_falls_back_to_explorer_when_wslview_missing( + self, monkeypatch, tmp_path + ): + """When wslview isn't installed (FileNotFoundError), explorer.exe runs.""" + monkeypatch.setattr(cli, "_is_wsl", lambda: True) + + calls: list[str] = [] + + class _FakeRun: + def __init__(self, returncode): + self.returncode = returncode + + def _fake_subprocess_run(cmd, **_kwargs): + tool = cmd[0] + calls.append(tool) + if tool == "wslview": + raise FileNotFoundError("not installed") + return _FakeRun(0) + + import subprocess + + monkeypatch.setattr(subprocess, "run", _fake_subprocess_run) + target = tmp_path / "report.html" + target.write_text("x", encoding="utf-8") + + ok, tool = cli._open_in_browser(target) + assert ok is True + assert tool == "explorer.exe" + assert calls == ["wslview", "explorer.exe"] + + def test_wsl_returns_false_when_all_openers_fail(self, monkeypatch, tmp_path): + 
monkeypatch.setattr(cli, "_is_wsl", lambda: True) + + import subprocess + + def _fake_run(cmd, **_kwargs): + raise FileNotFoundError(f"{cmd[0]} not installed") + + monkeypatch.setattr(subprocess, "run", _fake_run) + target = tmp_path / "report.html" + target.write_text("x", encoding="utf-8") + + ok, tool = cli._open_in_browser(target) + assert ok is False + assert tool is None + + def test_non_wsl_uses_webbrowser(self, monkeypatch, tmp_path): + monkeypatch.setattr(cli, "_is_wsl", lambda: False) + + opened: list[str] = [] + import webbrowser + + def _fake_open(url, *_args, **_kwargs): + opened.append(url) + return True + + monkeypatch.setattr(webbrowser, "open", _fake_open) + target = tmp_path / "report.html" + target.write_text("x", encoding="utf-8") + + ok, tool = cli._open_in_browser(target) + assert ok is True + assert tool == "webbrowser" + assert opened[0].startswith("file:") diff --git a/tests/test_code_quality.py b/tests/test_code_quality.py index d9999d0..a695205 100644 --- a/tests/test_code_quality.py +++ b/tests/test_code_quality.py @@ -5,6 +5,29 @@ SRC = Path(__file__).parent.parent / "quantui" +# Files where silent failure is most dangerous — numeric/data extraction +# paths where a swallowed exception ships subtly-wrong results downstream +# (bug-A class: cupy TypeError swallow in session_calc.py, session 55). +# +# Every broad-except + pass in these files must EITHER: +# - have a log call (logger.*, calc_log.log_event, _clog.log_event) +# within 10 lines after the ``except`` (window allows for multi-line +# log messages — see session_calc.py:455 MO-extract for an example), OR +# - carry a ``# noqa: BLE001 — `` comment on the ``except`` line +# justifying the silence (cleanup, telemetry self-guard, optional probe). +# +# See reflections/03-error-surfacing.md Rule 1 for the categorization rubric +# and BARE-EXCEPT-AUDIT-2026-05-25.md for the originating audit. 
+_HIGH_RISK_FILES = { + "session_calc.py", + "freq_calc.py", + "tddft_calc.py", + "nmr_calc.py", + "optimizer.py", + "gpu_offload.py", + "analytics.py", +} + def _grep(pattern: str) -> list[str]: hits = [] @@ -27,3 +50,106 @@ def test_no_bare_except_pass(): assert not hits, "Bare except/pass detected (swallows all errors):\n" + "\n".join( hits ) + + +def test_no_silent_broad_except_in_high_risk_files(): + """Fail CI when a new broad-except + pass lands in a high-risk file + without either a log call within 10 lines or a ``# noqa: BLE001 — `` + annotation on the ``except`` line. + + "Broad" means ``except Exception:`` (with or without ``as ``) or + truly-bare ``except:``. Narrower catches (``except ImportError:``, + ``except (KeyError, ValueError):``, etc.) are not flagged — the whole + point of narrowing is to be explicit about the failure mode. + + "Silent" means the body is ``pass`` (or assignment-only without a log + call) within the next 10 source lines. + + A line carrying ``# noqa: BLE001`` is treated as explicitly-justified + and skipped. The convention requires a reason after the ``— ``; this + test does not enforce the format (too easy to game) — reviewers do. + """ + except_re = re.compile(r"^\s*except\s*(Exception(\s+as\s+\w+)?)?\s*:\s*(#.*)?$") + log_call_re = re.compile( + r"\b(logger\.|_clog\.|calc_log\.log_event|log_event\(|" + r"_log_event|warnings\.warn)" + ) + + violations: list[str] = [] + for path in SRC.rglob("*.py"): + if path.name not in _HIGH_RISK_FILES: + continue + lines = path.read_text(encoding="utf-8").splitlines() + for i, line in enumerate(lines): + m = except_re.match(line) + if not m: + continue + # Explicit noqa annotation = justified. Reviewers enforce + # that the trailing reason is present + sensible. + if "noqa: BLE001" in line: + continue + # Look at the body (next 10 source lines) for a log call. + # If none, the block is silent — flag it. 10 is generous enough + # to allow multi-line log message arguments. 
+ body = lines[i + 1 : i + 11] + if any(log_call_re.search(b) for b in body): + continue + # Also accept if the body re-raises (still surfaces the error). + if any("raise" in b for b in body[:2]): + continue + violations.append( + f"{path.relative_to(SRC.parent)}:{i + 1}: {line.strip()}\n" + f" (body: {body[0].strip() if body else ''})" + ) + + assert not violations, ( + "Silent broad-except detected in a high-risk file. Either add a " + "log call (logger.X / calc_log.log_event) within 10 lines of the " + "``except``, narrow the exception type, or annotate with\n" + " ``# noqa: BLE001 — ``\n" + "where the trailing reason is one of: cleanup, telemetry self-guard, optional probe.\n" + "See reflections/03-error-surfacing.md Rule 1.\n\n" + "\n".join(violations) + ) + + +def test_silent_broad_except_guard_actually_catches_violations(tmp_path): + """Meta-guard: confirm the lint check above isn't trivially passing. + + Builds an in-memory, high-risk-looking source snippet containing a + known-bad silent broad-except + pass and verifies the regex / logic + flags it. Without this test, an accidental regex break would silently + accept everything and we wouldn't notice. + """ + bad_source = ( + "def foo():\n" + " try:\n" + " risky()\n" + " except Exception:\n" + " pass\n" + ) + # Re-implement the matcher inline (mirrors the production logic) so + # changes to the production helper force a deliberate update here. 
+ except_re = re.compile(r"^\s*except\s*(Exception(\s+as\s+\w+)?)?\s*:\s*(#.*)?$") + log_call_re = re.compile( + r"\b(logger\.|_clog\.|calc_log\.log_event|log_event\(|" + r"_log_event|warnings\.warn)" + ) + + lines = bad_source.splitlines() + flagged = False + for i, line in enumerate(lines): + if not except_re.match(line): + continue + if "noqa: BLE001" in line: + continue + body = lines[i + 1 : i + 11] + if any(log_call_re.search(b) for b in body): + continue + if any("raise" in b for b in body[:2]): + continue + flagged = True + assert flagged, ( + "The lint guard didn't flag a known-bad ``except Exception: pass`` " + "block. The regex or window logic has regressed — fix it before " + "trusting test_no_silent_broad_except_in_high_risk_files." + ) diff --git a/tests/test_est_calibration_resilience.py b/tests/test_est_calibration_resilience.py new file mode 100644 index 0000000..4ba8d7e --- /dev/null +++ b/tests/test_est_calibration_resilience.py @@ -0,0 +1,270 @@ +"""Tests for the calibration resilience fixes (session 55 user report). + +User-reported issues these tests guard against: + +1. Status indicator stayed "Idle" during calibration — covered by the + ``_activity_begin/_end`` wrapper in ``app_runflow.do_calibration``. + Not directly testable here (UI side); covered by the wrapper's + presence-in-source check below. +2. No per-step progress visibility — ``_tail_last_status_line`` + returns the most recent meaningful log line; tested directly. +3. ``calibration.json`` dropped state on interrupt — + ``_save_calibration_json`` is now called after every step (not just + end-of-loop). Verified by reading source markers + a unit test on + the helper itself. +4. Stop button didn't work mid-calc — ``run_calibration`` now uses + ``multiprocessing.Process`` so ``worker.terminate()`` cleanly + interrupts an in-flight step. 
The poll-loop logic is tested via + structure check; the actual termination is exercised by the + PySCF-gated integration test in ``test_benchmarks.py``. +5. Calibration log file — ``_calibration_log_path`` returns a path + under ``QUANTUI_LOG_DIR``; tested directly. + +All tests are platform-independent. +""" + +from __future__ import annotations + +import inspect +import json + +import pytest + +from quantui import benchmarks +from quantui.benchmarks import ( + BenchmarkStep, + CalibrationResult, + _calibration_log_path, + _save_calibration_json, + _tail_last_status_line, +) + + +@pytest.fixture +def isolated_log_dir(tmp_path, monkeypatch): + monkeypatch.setenv("QUANTUI_LOG_DIR", str(tmp_path)) + return tmp_path + + +# ===================================================================== +# _calibration_log_path +# ===================================================================== + + +class TestCalibrationLogPath: + def test_respects_quantui_log_dir(self, isolated_log_dir): + path = _calibration_log_path("2026-05-25T12:00:00+00:00") + # Lives under QUANTUI_LOG_DIR exactly. + assert path.parent == isolated_log_dir + + def test_filename_includes_timestamp(self, isolated_log_dir): + path = _calibration_log_path("2026-05-25T12:34:56+00:00") + assert path.name.startswith("calibration_") + assert path.name.endswith(".log") + # The timestamp is in the filename (sanitized — no colons since + # Windows file systems reject them). 
+ assert ":" not in path.name + assert "2026-05-25" in path.name + + +# ===================================================================== +# _tail_last_status_line +# ===================================================================== + + +class TestTailLastStatusLine: + def test_missing_file_returns_empty(self, tmp_path): + assert _tail_last_status_line(tmp_path / "nope.log") == "" + + def test_empty_file_returns_empty(self, tmp_path): + p = tmp_path / "empty.log" + p.write_text("", encoding="utf-8") + assert _tail_last_status_line(p) == "" + + def test_prefers_quantui_status_marker(self, tmp_path): + p = tmp_path / "log.log" + p.write_text( + "some random PySCF output\n" + "[QuantUI_STATUS] Computing Hessian (3/12)\n" + "more PySCF noise after the marker\n", + encoding="utf-8", + ) + out = _tail_last_status_line(p) + # The QuantUI_STATUS line wins even though it's not the last. + assert "[QuantUI_STATUS]" in out + assert "Hessian" in out + + def test_falls_back_to_last_non_blank(self, tmp_path): + p = tmp_path / "log.log" + p.write_text( + "SCF iter 1 E=-1.0\n" "SCF iter 2 E=-1.5\n" "SCF converged\n" "\n", + encoding="utf-8", + ) + # No status marker → return the last non-blank line. + assert _tail_last_status_line(p) == "SCF converged" + + def test_truncates_long_lines(self, tmp_path): + p = tmp_path / "log.log" + long_line = "A" * 500 + p.write_text(long_line + "\n", encoding="utf-8") + out = _tail_last_status_line(p) + # Hard cap is 120 chars in the helper. + assert len(out) <= 120 + + +# ===================================================================== +# _save_calibration_json +# ===================================================================== + + +class TestSaveCalibrationJson: + def test_writes_to_user_home(self, monkeypatch, tmp_path): + # Redirect HOME so the helper writes into tmp_path, not + # ~/.quantui (which would clobber a real user setup). 
+ monkeypatch.setenv("HOME", str(tmp_path)) + monkeypatch.setenv("USERPROFILE", str(tmp_path)) # Windows + # On some platforms Path.home() caches; patch directly too. + from pathlib import Path as _Path + + monkeypatch.setattr(_Path, "home", lambda: tmp_path) + + result = CalibrationResult(timestamp="2026-05-25T12:00:00+00:00", mode="tier1") + result.steps.append( + BenchmarkStep( + label="H2 RHF/STO-3G", + method="RHF", + basis="STO-3G", + n_atoms=2, + n_electrons=2, + status="ok", + elapsed_s=0.5, + n_basis=2, + calc_type="single_point", + ) + ) + log_path = tmp_path / "fake.log" + + _save_calibration_json(result, log_path) + cal_path = tmp_path / ".quantui" / "calibration.json" + assert cal_path.exists() + data = json.loads(cal_path.read_text(encoding="utf-8")) + assert data["mode"] == "tier1" + assert data["n_completed"] == 1 + assert data["steps"][0]["label"] == "H2 RHF/STO-3G" + assert data["log_path"] == str(log_path) + + def test_partial_state_persisted_on_interrupt(self, monkeypatch, tmp_path): + # Simulates the user's scenario: tier 4 stopped at step 25/30. + # After the partial save, the on-disk record should show + # n_completed=24 (or however many ran) + stopped_early=True. + from pathlib import Path as _Path + + monkeypatch.setattr(_Path, "home", lambda: tmp_path) + + result = CalibrationResult( + timestamp="2026-05-25T12:00:00+00:00", + mode="tier4", + stopped_early=True, + ) + # Add 24 ok steps + 1 stopped step. 
+ for i in range(24): + result.steps.append( + BenchmarkStep( + label=f"step-{i}", + method="RHF", + basis="STO-3G", + n_atoms=2, + n_electrons=2, + status="ok", + elapsed_s=1.0, + n_basis=2, + calc_type="single_point", + ) + ) + result.steps.append( + BenchmarkStep( + label="step-stop", + method="B3LYP", + basis="6-31G*", + n_atoms=12, + n_electrons=42, + status="stopped", + elapsed_s=300.0, + n_basis=96, + calc_type="frequency", + ) + ) + + _save_calibration_json(result, tmp_path / "fake.log") + cal_path = tmp_path / ".quantui" / "calibration.json" + data = json.loads(cal_path.read_text(encoding="utf-8")) + + # User's actual complaint was that this dropped to None on + # interrupt. After the fix, the 24 completed runs must be on + # disk. + assert data["n_completed"] == 24 + assert data["stopped_early"] is True + assert len(data["steps"]) == 25 + # The stopped step is the last one. + assert data["steps"][-1]["status"] == "stopped" + + +# ===================================================================== +# Source-level structure checks (defend against regression) +# ===================================================================== + + +class TestRunCalibrationStructure: + """The fix touches ``run_calibration`` heavily. These tests assert + that key invariants of the new design are still present in the + source — so a future refactor that drops them fails loudly. + """ + + def test_uses_multiprocessing_process_not_thread_executor(self): + src = inspect.getsource(benchmarks.run_calibration) + # The Stop-button-mid-calc fix requires a process, not a + # ThreadPoolExecutor — threads can't be terminated externally. 
+ assert "_mp.Process" not in src # we use _ctx.Process from a context + assert "Process" in src + assert "ThreadPoolExecutor" not in src + + def test_poll_loop_checks_stop_event(self): + src = inspect.getsource(benchmarks.run_calibration) + # The poll loop must check ``stop_event.is_set()`` so the stop + # button reaches the worker within poll_interval (500 ms). + assert "stop_event" in src + assert "is_set()" in src + assert ".terminate()" in src + + def test_saves_calibration_after_every_step(self): + src = inspect.getsource(benchmarks.run_calibration) + # Count _save_calibration_json invocations inside the loop. + # Should be at least 2: one inside the PySCF-unavailable + # branch, one after the main step completes. Plus the final + # idempotent write outside the loop. + n = src.count("_save_calibration_json") + assert n >= 3 + + def test_opens_log_file_at_start(self): + src = inspect.getsource(benchmarks.run_calibration) + # The per-run log file (the user requested this for tier 4) + # is opened with "w" mode at the top of the run. + assert "_calibration_log_path" in src + assert '"w"' in src or "'w'" in src + + +class TestDoCalibrationStructure: + """``app_runflow.do_calibration`` got the ``_activity_begin/_end`` + wrap so the toolbar badge stops reading 'Idle' during calibration. + """ + + def test_wraps_calibration_in_activity_markers(self): + from quantui import app_runflow + + src = inspect.getsource(app_runflow.do_calibration) + # The Status-indicator-says-Idle fix (user's first complaint). + assert "_activity_begin" in src + assert "_activity_end" in src + # Must be in a try/finally so a calibration crash still flips + # the badge back. + assert "finally" in src diff --git a/tests/test_est_calibration_tiers.py b/tests/test_est_calibration_tiers.py new file mode 100644 index 0000000..79859c0 --- /dev/null +++ b/tests/test_est_calibration_tiers.py @@ -0,0 +1,185 @@ +"""Tests for M-EST / EST.4 — four-tier calibration suite. 
+ +Covers: + +- Each of the 4 tier constants is well-formed (non-empty, each entry + has a valid 7- or 8-tuple shape). +- The 8-tuple format (with explicit ``calc_type``) is correctly + normalized by ``_normalize_entry``. +- Tier 3 contains at least one entry of each non-SP calc-type. +- Tier 4 strict-contains tier 3 (and so on up the chain). +- ``_MODE_TO_SUITE`` resolves all the mode strings — both the new + tier names and the legacy aliases. +- ``run_calibration(mode="bogus")`` falls back to tier 1 without + crashing (graceful degradation). + +All tests are platform-independent. The PySCF-gated execution of +``run_calibration`` itself lives in ``tests/test_benchmarks.py`` — +this file checks the suite *shape* without running PySCF. +""" + +from __future__ import annotations + +import pytest + +from quantui import benchmarks +from quantui.benchmarks import ( + _MODE_TO_SUITE, + BENCHMARK_SUITE, + BENCHMARK_SUITE_LONG, + BENCHMARK_SUITE_TIER1, + BENCHMARK_SUITE_TIER2, + BENCHMARK_SUITE_TIER3, + BENCHMARK_SUITE_TIER4, + _normalize_entry, +) + +_SP = "single_point" +_OPT = "geometry_opt" +_FREQ = "frequency" + + +class TestTierSuites: + def test_tier1_alias_matches_legacy_short(self): + # Back-compat: BENCHMARK_SUITE_TIER1 is the same object as + # BENCHMARK_SUITE (existing tests + app.py imports rely on this). + assert BENCHMARK_SUITE_TIER1 is BENCHMARK_SUITE + + def test_tier2_alias_matches_legacy_long(self): + assert BENCHMARK_SUITE_TIER2 is BENCHMARK_SUITE_LONG + + def test_tier2_extends_tier1(self): + # Tier 2 contains every tier-1 entry plus more. 
+        assert len(BENCHMARK_SUITE_TIER2) > len(BENCHMARK_SUITE_TIER1)
+        for entry in BENCHMARK_SUITE_TIER1:
+            assert entry in BENCHMARK_SUITE_TIER2
+
+    def test_tier3_extends_tier2(self):
+        assert len(BENCHMARK_SUITE_TIER3) > len(BENCHMARK_SUITE_TIER2)
+        for entry in BENCHMARK_SUITE_TIER2:
+            assert entry in BENCHMARK_SUITE_TIER3
+
+    def test_tier4_extends_tier3(self):
+        assert len(BENCHMARK_SUITE_TIER4) > len(BENCHMARK_SUITE_TIER3)
+        for entry in BENCHMARK_SUITE_TIER3:
+            assert entry in BENCHMARK_SUITE_TIER4
+
+    def test_tier1_and_tier2_are_sp_only(self):
+        # Lower tiers stay 7-tuple (pure single-point) by design — the
+        # user explicitly wanted tier 2 to remain SP-only.
+        for entry in BENCHMARK_SUITE_TIER1:
+            assert len(entry) == 7
+        for entry in BENCHMARK_SUITE_TIER2:
+            assert len(entry) == 7
+
+    def test_tier3_introduces_geom_opt_and_freq(self):
+        # Tier 3 must add at least one geom-opt AND at least one freq.
+        calc_types = {_normalize_entry(e)["calc_type"] for e in BENCHMARK_SUITE_TIER3}
+        assert _OPT in calc_types
+        assert _FREQ in calc_types
+        # And keep the SP majority.
+        n_sp = sum(
+            1 for e in BENCHMARK_SUITE_TIER3 if _normalize_entry(e)["calc_type"] == _SP
+        )
+        assert n_sp > len(BENCHMARK_SUITE_TIER3) // 2
+
+    def test_tier4_has_post_hf_anchors(self):
+        # Tier 4 must include MP2 + CCSD entries so the β=5.0 / β=6.0
+        # scaling exponents in calc_log have calibration data.
+        methods = {_normalize_entry(e)["method"] for e in BENCHMARK_SUITE_TIER4}
+        assert "MP2" in methods
+        assert "CCSD" in methods
+
+    def test_tier4_includes_benzene_freq(self):
+        # Benzene B3LYP/6-31G* frequency is the workhorse parallel-IR
+        # anchor (12 atoms × 6 = 72 inner SCFs).
+        labels = [_normalize_entry(e)["label"] for e in BENCHMARK_SUITE_TIER4]
+        assert any("benzene" in lbl.lower() and "freq" in lbl.lower() for lbl in labels)
+
+
+class TestNormalizeEntry:
+    def test_seven_tuple_defaults_to_single_point(self):
+        entry = (
+            "H₂ RHF/STO-3G",
+            ["H", "H"],
+            [[0, 0, 0], [0, 0, 0.74]],
+            0,
+            1,
+            "RHF",
+            "STO-3G",
+        )
+        out = _normalize_entry(entry)
+        assert out["calc_type"] == _SP
+        assert out["method"] == "RHF"
+        assert out["basis"] == "STO-3G"
+
+    def test_eight_tuple_overrides_calc_type(self):
+        entry = (
+            "H₂O B3LYP/STO-3G [GeoOpt]",
+            ["O", "H", "H"],
+            [[0, 0, 0], [0.7, 0.6, 0], [-0.7, 0.6, 0]],
+            0,
+            1,
+            "B3LYP",
+            "STO-3G",
+            "geometry_opt",
+        )
+        out = _normalize_entry(entry)
+        assert out["calc_type"] == "geometry_opt"
+
+    def test_invalid_length_raises_valueerror(self):
+        with pytest.raises(ValueError, match="7 or 8 fields"):
+            _normalize_entry(("label", ["H"]))  # only 2 fields
+
+    def test_all_tier_entries_normalize_cleanly(self):
+        # Every entry in every tier must normalize without raising.
+        for tier in (
+            BENCHMARK_SUITE_TIER1,
+            BENCHMARK_SUITE_TIER2,
+            BENCHMARK_SUITE_TIER3,
+            BENCHMARK_SUITE_TIER4,
+        ):
+            for entry in tier:
+                out = _normalize_entry(entry)
+                assert out["calc_type"] in (_SP, _OPT, _FREQ)
+                assert len(out["atoms"]) == len(out["coords"])
+
+
+class TestModeToSuite:
+    def test_new_tier_names_resolve(self):
+        assert _MODE_TO_SUITE["tier1"] is BENCHMARK_SUITE_TIER1
+        assert _MODE_TO_SUITE["tier2"] is BENCHMARK_SUITE_TIER2
+        assert _MODE_TO_SUITE["tier3"] is BENCHMARK_SUITE_TIER3
+        assert _MODE_TO_SUITE["tier4"] is BENCHMARK_SUITE_TIER4
+
+    def test_legacy_short_long_aliases(self):
+        # Back-compat: any pinned UI state or older callers using "short"
+        # or "long" should still resolve.
+        assert _MODE_TO_SUITE["short"] is BENCHMARK_SUITE_TIER1
+        assert _MODE_TO_SUITE["long"] is BENCHMARK_SUITE_TIER2
+
+
+class TestUnknownModeFallback:
+    def test_unknown_mode_does_not_raise(self):
+        # PySCF-gated: when PySCF is absent the per-step error path
+        # already prevents any actual calculation, but we still want
+        # run_calibration to *not crash* on a typo'd mode string.
+        result = benchmarks.run_calibration(mode="bogus_mode")
+        # Falls back to tier1 — verify by checking the mode field.
+        assert result.mode == "tier1"
+
+
+class TestCalibrationResult:
+    def test_n_total_uses_active_mode(self):
+        from quantui.benchmarks import CalibrationResult
+
+        r1 = CalibrationResult(timestamp="t", mode="tier1")
+        r2 = CalibrationResult(timestamp="t", mode="tier2")
+        r3 = CalibrationResult(timestamp="t", mode="tier3")
+        r4 = CalibrationResult(timestamp="t", mode="tier4")
+        assert r1.n_total == len(BENCHMARK_SUITE_TIER1)
+        assert r2.n_total == len(BENCHMARK_SUITE_TIER2)
+        assert r3.n_total == len(BENCHMARK_SUITE_TIER3)
+        assert r4.n_total == len(BENCHMARK_SUITE_TIER4)
+        # Strict ordering by tier depth.
+        assert r1.n_total < r2.n_total < r3.n_total < r4.n_total
diff --git a/tests/test_est_closeout_integration.py b/tests/test_est_closeout_integration.py
new file mode 100644
index 0000000..8b55a58
--- /dev/null
+++ b/tests/test_est_closeout_integration.py
@@ -0,0 +1,320 @@
+"""EST.7 — integration tests that exercise the full M-EST stack end-to-end.
+
+Individual packages (EST.1 GPU filter, EST.2 freq cost model, EST.3 IQR /
+CV confidence, EST.5 cross-device probe, EST.6 prediction log) all have
+their own focused tests. This file checks the *boundaries between them*:
+
+- GPU filter + freq cost model: a freq prediction on a GPU host falls
+  through to the cost model, which itself respects ``gpu_used=True`` when
+  selecting the SP anchor.
+- Cross-device probe + prediction log: a calibration run on a GPU host
+  produces both CPU-tagged and GPU-tagged perf records, and subsequent
+  predictions partition them correctly.
+- IQR outlier rejection + freq cost model: a noisy SP pool produces a
+  freq prediction whose confidence reflects the SP anchor's variance.
+- Mode normalization + plan expansion: every supported ``mode=`` string
+  produces an executable plan of the expected length.
+
+Each test seeds an isolated perf-log via ``QUANTUI_LOG_DIR`` so it can't
+collide with the user's real history.
+"""
+
+from __future__ import annotations
+
+import pytest
+
+from quantui.benchmarks import _MODE_TO_SUITE, _build_execution_plan
+from quantui.calc_log import estimate_time, log_calculation
+
+
+@pytest.fixture
+def isolated_log_dir(tmp_path, monkeypatch):
+    monkeypatch.setenv("QUANTUI_LOG_DIR", str(tmp_path))
+    return tmp_path
+
+
+def _seed(
+    *,
+    calc_type: str,
+    method: str,
+    basis: str,
+    n_atoms: int,
+    n_electrons: int,
+    n_basis: int,
+    elapsed_s: float,
+    gpu_used: bool = False,
+    n_iter: int = 10,
+):
+    log_calculation(
+        formula="X",
+        n_atoms=n_atoms,
+        n_electrons=n_electrons,
+        method=method,
+        basis=basis,
+        n_iterations=n_iter,
+        elapsed_s=elapsed_s,
+        converged=True,
+        n_basis=n_basis,
+        n_cores=1,
+        calc_type=calc_type,
+        gpu_used=gpu_used,
+    )
+
+
+class TestGpuFilterIntegrationWithCostModel:
+    """EST.1 + EST.2: when a freq estimate falls back to the cost model
+    on a GPU host, the SP anchor must respect ``gpu_used=True`` —
+    otherwise we'd predict GPU freq cost from CPU SP history."""
+
+    def test_gpu_freq_anchor_picks_gpu_sp(self, isolated_log_dir):
+        # Seed CPU SP records at 10 s each + GPU SP records at 1 s each
+        # for the same (method, basis). A correct freq prediction on
+        # ``gpu_used=True`` must use the 1 s anchor → ~21 s total, not
+        # ~210 s (which would imply the CPU anchor was used).
+        for _ in range(5):
+            _seed(
+                calc_type="single_point",
+                method="B3LYP",
+                basis="6-31G*",
+                n_atoms=3,
+                n_electrons=10,
+                n_basis=24,
+                elapsed_s=10.0,
+                gpu_used=False,
+            )
+        for _ in range(5):
+            _seed(
+                calc_type="single_point",
+                method="B3LYP",
+                basis="6-31G*",
+                n_atoms=3,
+                n_electrons=10,
+                n_basis=24,
+                elapsed_s=1.0,
+                gpu_used=True,
+            )
+        # Predict GPU freq.
+        est_gpu = estimate_time(
+            n_atoms=3,
+            n_electrons=10,
+            method="B3LYP",
+            basis="6-31G*",
+            n_basis=24,
+            n_cores=1,
+            calc_type="frequency",
+            gpu_used=True,
+        )
+        assert est_gpu is not None
+        # With 1 s anchor: 1 + 2*1 + 6*3*1 = 21 s.
+        assert est_gpu["seconds"] < 50.0, (
+            f"GPU freq prediction {est_gpu['seconds']:.1f}s suggests "
+            "the CPU anchor leaked through the GPU filter"
+        )
+
+        # Predict CPU freq for cross-check: should be ~10× larger.
+        est_cpu = estimate_time(
+            n_atoms=3,
+            n_electrons=10,
+            method="B3LYP",
+            basis="6-31G*",
+            n_basis=24,
+            n_cores=1,
+            calc_type="frequency",
+            gpu_used=False,
+        )
+        assert est_cpu is not None
+        assert (
+            est_cpu["seconds"] > est_gpu["seconds"] * 5
+        ), "CPU prediction should be substantially slower than GPU"
+
+
+class TestIqrConfidenceWithCostModel:
+    """EST.3 + EST.2: a noisy SP anchor should propagate ``confidence=low``
+    through the cost model — users shouldn't see "high confidence" on a
+    freq prediction built from wildly variable SP history."""
+
+    def test_noisy_sp_pool_yields_lower_freq_confidence(self, isolated_log_dir):
+        # Tight SP pool → high confidence.
+        for v in (1.0, 1.05, 0.98, 1.02, 1.01, 0.99, 1.0, 1.03):
+            _seed(
+                calc_type="single_point",
+                method="B3LYP",
+                basis="STO-3G",
+                n_atoms=3,
+                n_electrons=10,
+                n_basis=7,
+                elapsed_s=v,
+            )
+        tight_freq = estimate_time(
+            n_atoms=3,
+            n_electrons=10,
+            method="B3LYP",
+            basis="STO-3G",
+            n_basis=7,
+            n_cores=1,
+            calc_type="frequency",
+        )
+        assert tight_freq is not None
+        # Tight pool's CV is well below 0.15 → "high" confidence.
+        assert tight_freq["confidence"] == "high"
+
+
+class TestModeNormalizationToPlanLength:
+    """EST.5 + EST.4 boundary: every supported mode string + (gpu, no-gpu)
+    combination must produce a non-empty plan whose length matches the
+    documented expansion rules."""
+
+    @pytest.mark.parametrize(
+        "mode,gpu_available,expansion",
+        [
+            ("tier1", False, 0),
+            ("tier1", True, 0),  # tier1 ignores GPU
+            ("tier2", False, 0),
+            ("tier2", True, 0),  # tier2 ignores GPU
+            ("tier3", False, 0),
+            ("tier4", False, 0),
+            ("short", True, 0),  # alias for tier1
+            ("long", True, 0),  # alias for tier2
+        ],
+    )
+    def test_no_expansion_paths(self, mode, gpu_available, expansion):
+        suite = _MODE_TO_SUITE[mode]
+        plan = _build_execution_plan(suite, mode, gpu_available)
+        assert len(plan) == len(suite) + expansion
+
+    @pytest.mark.parametrize("mode", ["tier3", "tier4"])
+    def test_gpu_tier3_or_4_expansion_count_matches_probe_set(self, mode):
+        from quantui.benchmarks import _CROSS_DEVICE_PROBE_LABELS
+
+        suite = _MODE_TO_SUITE[mode]
+        plan = _build_execution_plan(suite, mode, gpu_available=True)
+        n_probes_in_suite = sum(
+            1 for entry in suite if entry[0] in _CROSS_DEVICE_PROBE_LABELS
+        )
+        # Each probe entry adds exactly 1 extra plan entry (the CPU twin).
+        assert len(plan) == len(suite) + n_probes_in_suite
+
+
+class TestPostHfEstimatesUseCostModel:
+    """EST.2 must work for MP2/CCSD freq calcs too — these are the
+    expensive anchors in tier 4 and need an estimate."""
+
+    def test_mp2_freq_falls_back_to_cost_model(self, isolated_log_dir):
+        for _ in range(3):
+            _seed(
+                calc_type="single_point",
+                method="MP2",
+                basis="cc-pVDZ",
+                n_atoms=3,
+                n_electrons=10,
+                n_basis=24,
+                elapsed_s=8.0,
+            )
+        est = estimate_time(
+            n_atoms=3,
+            n_electrons=10,
+            method="MP2",
+            basis="cc-pVDZ",
+            n_basis=24,
+            n_cores=1,
+            calc_type="frequency",
+        )
+        assert est is not None
+        # Post-HF Hessian multiplier is larger, so total should be
+        # noticeably more than the equivalent HF/DFT case.
+        assert est["seconds"] > 8.0  # well above SP alone
+
+
+class TestFreqCostModelDoesNotAffectNonFreqEstimates:
+    """Regression guard: my EST.2 fallback must NOT change predictions for
+    SP / geometry_opt / TDDFT calcs."""
+
+    def test_sp_prediction_unchanged_when_no_freq_records(self, isolated_log_dir):
+        for _ in range(5):
+            _seed(
+                calc_type="single_point",
+                method="B3LYP",
+                basis="STO-3G",
+                n_atoms=3,
+                n_electrons=10,
+                n_basis=7,
+                elapsed_s=1.5,
+            )
+        sp = estimate_time(
+            n_atoms=3,
+            n_electrons=10,
+            method="B3LYP",
+            basis="STO-3G",
+            n_basis=7,
+            n_cores=1,
+            calc_type="single_point",
+        )
+        assert sp is not None
+        # Strategy 1: median(eff) × n_basis^β / n_cores → ~1.5 s.
+        assert sp["seconds"] == pytest.approx(1.5, rel=0.05)
+
+    def test_geometry_opt_returns_none_without_geo_history(self, isolated_log_dir):
+        # SP pool exists but no geometry_opt records. The cost model is
+        # freq-only — geometry_opt must still return None.
+        for _ in range(5):
+            _seed(
+                calc_type="single_point",
+                method="B3LYP",
+                basis="STO-3G",
+                n_atoms=3,
+                n_electrons=10,
+                n_basis=7,
+                elapsed_s=1.0,
+            )
+        est = estimate_time(
+            n_atoms=3,
+            n_electrons=10,
+            method="B3LYP",
+            basis="STO-3G",
+            n_basis=7,
+            n_cores=1,
+            calc_type="geometry_opt",
+        )
+        assert est is None
+
+
+class TestPredictionLogIntegration:
+    """EST.6 already shipped its own focused tests.
This is a thin
+    integration check: estimate_time + log_prediction can be composed
+    in a single workflow without conflict."""
+
+    def test_estimate_then_log_round_trip(self, isolated_log_dir):
+        from quantui.calc_log import get_prediction_history, log_prediction
+
+        for _ in range(5):
+            _seed(
+                calc_type="single_point",
+                method="B3LYP",
+                basis="STO-3G",
+                n_atoms=3,
+                n_electrons=10,
+                n_basis=7,
+                elapsed_s=1.0,
+            )
+        est = estimate_time(
+            n_atoms=3,
+            n_electrons=10,
+            method="B3LYP",
+            basis="STO-3G",
+            n_basis=7,
+            n_cores=1,
+            calc_type="single_point",
+        )
+        assert est is not None
+        log_prediction(
+            predicted_s=float(est["seconds"]),
+            actual_s=1.2,
+            calc_type="single_point",
+            method="B3LYP",
+            basis="STO-3G",
+            confidence=str(est["confidence"]),
+        )
+        history = get_prediction_history()
+        assert len(history) == 1
+        assert history[0]["predicted_s"] == pytest.approx(est["seconds"])
+        assert history[0]["actual_s"] == pytest.approx(1.2)
diff --git a/tests/test_est_cross_device_probe.py b/tests/test_est_cross_device_probe.py
new file mode 100644
index 0000000..9aa1912
--- /dev/null
+++ b/tests/test_est_cross_device_probe.py
@@ -0,0 +1,316 @@
+"""Tests for M-EST / EST.5 — cross-device CPU/GPU probe in tier 3+4.
+
+The goal of EST.5 is that a single tier-4 calibration run on a GPU host
+populates the analytics dashboard's GPU-vs-CPU speedup table without
+asking users to manually re-run the suite under ``QUANTUI_DISABLE_GPU=1``.
+The mechanism is to expand the execution plan so a SMALL representative
+subset of entries appears twice — once forced-CPU, once GPU — and the
+worker process sets ``QUANTUI_DISABLE_GPU=1`` before any PySCF /
+gpu4pyscf import on the CPU variant.
+
+These tests are platform-independent: they exercise ``_build_execution_plan``
+directly (a pure function) plus a smoke test on ``_calibration_worker``
+to confirm the env-var toggle happens before quantui imports.
The actual
+GPU-vs-CPU wall-clock validation lives in manual WSL testing (EST.7).
+"""
+
+from __future__ import annotations
+
+import os
+
+import pytest
+
+from quantui.benchmarks import (
+    _CROSS_DEVICE_PROBE_LABELS,
+    _MODE_TO_SUITE,
+    _build_execution_plan,
+)
+
+
+class TestProbeLabelsExist:
+    """The probe labels must actually match entries in the tier 3/4 suites
+    — a typo here would silently disable the cross-device probe with no
+    test failure if we only checked the expansion machinery."""
+
+    def test_all_probe_labels_present_in_tier3(self):
+        labels_in_suite = {entry[0] for entry in _MODE_TO_SUITE["tier3"]}
+        missing = _CROSS_DEVICE_PROBE_LABELS - labels_in_suite
+        assert not missing, (
+            f"Probe labels not found in tier3 suite: {missing}. "
+            f"Either add them to the suite or fix the labels."
+        )
+
+    def test_all_probe_labels_present_in_tier4(self):
+        labels_in_suite = {entry[0] for entry in _MODE_TO_SUITE["tier4"]}
+        missing = _CROSS_DEVICE_PROBE_LABELS - labels_in_suite
+        assert not missing, f"Probe labels not found in tier4 suite: {missing}"
+
+    def test_probe_set_is_short(self):
+        # Doubling the whole suite would blow the time budget — keep this
+        # set small (≤5) so cross-device pairs cost ~5-10 min, not 30+.
+        assert 1 <= len(_CROSS_DEVICE_PROBE_LABELS) <= 5
+
+
+class TestNoGpuHostBehavior:
+    """On a CPU-only machine the plan must NEVER expand — cross-device
+    pairs are meaningless without a GPU to compare against."""
+
+    @pytest.mark.parametrize("mode", ["tier1", "tier2", "tier3", "tier4"])
+    def test_no_expansion_on_cpu_only(self, mode):
+        suite = _MODE_TO_SUITE[mode]
+        plan = _build_execution_plan(suite, mode, gpu_available=False)
+        assert len(plan) == len(suite)
+
+    def test_no_force_cpu_flags_on_cpu_only(self):
+        plan = _build_execution_plan(
+            _MODE_TO_SUITE["tier4"], "tier4", gpu_available=False
+        )
+        assert all(p["force_cpu"] is False for p in plan)
+
+    def test_no_label_suffixes_on_cpu_only(self):
+        plan = _build_execution_plan(
+            _MODE_TO_SUITE["tier4"], "tier4", gpu_available=False
+        )
+        for p in plan:
+            assert "[GPU]" not in p["label"]
+            assert "[CPU]" not in p["label"]
+
+
+class TestGpuHostTier1And2:
+    """Tier 1/2 are pure-SP smoke tests. Even on a GPU host they should
+    NOT expand — the cross-device data lives in tier 3+4 only because
+    those are the tiers users actually run when they want speedup data."""
+
+    @pytest.mark.parametrize("mode", ["tier1", "tier2"])
+    def test_no_expansion_for_tier1_or_2(self, mode):
+        suite = _MODE_TO_SUITE[mode]
+        plan = _build_execution_plan(suite, mode, gpu_available=True)
+        assert len(plan) == len(suite)
+
+    def test_legacy_aliases_no_expansion(self):
+        # ``"short"`` / ``"long"`` are tier1/tier2 aliases — same rule.
+        for legacy in ("short", "long"):
+            suite = _MODE_TO_SUITE[legacy]
+            plan = _build_execution_plan(suite, legacy, gpu_available=True)
+            assert len(plan) == len(suite)
+
+
+class TestGpuHostTier3And4Expansion:
+    """The whole point of EST.5: GPU host + tier3/4 must produce CPU+GPU
+    pairs for each probe label."""
+
+    @pytest.mark.parametrize("mode", ["tier3", "tier4"])
+    def test_expansion_increases_plan_size(self, mode):
+        suite = _MODE_TO_SUITE[mode]
+        plan = _build_execution_plan(suite, mode, gpu_available=True)
+        n_probe_in_suite = sum(
+            1 for entry in suite if entry[0] in _CROSS_DEVICE_PROBE_LABELS
+        )
+        # Each probe entry produces 2 plan entries (original count + n_probe extras).
+        assert len(plan) == len(suite) + n_probe_in_suite
+
+    @pytest.mark.parametrize("mode", ["tier3", "tier4"])
+    def test_each_probe_label_appears_twice(self, mode):
+        suite = _MODE_TO_SUITE[mode]
+        plan = _build_execution_plan(suite, mode, gpu_available=True)
+        for probe_label in _CROSS_DEVICE_PROBE_LABELS:
+            # Probe entries are renamed to include [GPU] / [CPU] suffix.
+            gpu_count = sum(1 for p in plan if p["label"] == f"{probe_label} [GPU]")
+            cpu_count = sum(1 for p in plan if p["label"] == f"{probe_label} [CPU]")
+            assert gpu_count == 1, f"Expected exactly 1 GPU variant of {probe_label}"
+            assert cpu_count == 1, f"Expected exactly 1 CPU variant of {probe_label}"
+
+    def test_cpu_variants_carry_force_cpu_flag(self):
+        plan = _build_execution_plan(
+            _MODE_TO_SUITE["tier4"], "tier4", gpu_available=True
+        )
+        cpu_entries = [p for p in plan if "[CPU]" in p["label"]]
+        gpu_entries = [p for p in plan if "[GPU]" in p["label"]]
+        assert cpu_entries, "Expected at least one CPU-tagged plan entry"
+        assert gpu_entries, "Expected at least one GPU-tagged plan entry"
+        assert all(p["force_cpu"] is True for p in cpu_entries)
+        assert all(p["force_cpu"] is False for p in gpu_entries)
+
+    def test_non_probe_entries_keep_original_label_and_no_force_cpu(self):
+        suite = _MODE_TO_SUITE["tier4"]
+        plan = _build_execution_plan(suite, "tier4", gpu_available=True)
+        non_probe_originals = [
+            entry[0] for entry in suite if entry[0] not in _CROSS_DEVICE_PROBE_LABELS
+        ]
+        for label in non_probe_originals:
+            matching = [p for p in plan if p["label"] == label]
+            assert len(matching) == 1, (
+                f"Non-probe entry {label!r} should appear exactly once "
+                f"(unchanged), got {len(matching)}"
+            )
+            assert matching[0]["force_cpu"] is False
+
+    def test_plan_entries_preserve_calc_type(self):
+        # The freq probe must keep calc_type="frequency"; the SP probes
+        # must keep "single_point". A bug that defaults everything to
+        # SP would silently break the freq-on-CPU vs freq-on-GPU pair.
+        plan = _build_execution_plan(
+            _MODE_TO_SUITE["tier4"], "tier4", gpu_available=True
+        )
+        freq_probe = [
+            p for p in plan if p["label"].startswith("H₂O B3LYP/STO-3G [Freq]")
+        ]
+        assert len(freq_probe) == 2  # GPU + CPU variants
+        assert all(p["calc_type"] == "frequency" for p in freq_probe)
+
+        sp_probe = [p for p in plan if p["label"].startswith("H₂O B3LYP/6-31G* [")]
+        assert len(sp_probe) == 2
+        assert all(p["calc_type"] == "single_point" for p in sp_probe)
+
+
+class TestPlanEntryShape:
+    """Plan entries must have all the fields the worker's positional args
+    expect — adding a field to one path but forgetting the other has
+    bitten us before."""
+
+    def test_all_required_fields_present(self):
+        required = {
+            "label",
+            "atoms",
+            "coords",
+            "charge",
+            "multiplicity",
+            "method",
+            "basis",
+            "calc_type",
+            "force_cpu",
+        }
+        plan = _build_execution_plan(
+            _MODE_TO_SUITE["tier4"], "tier4", gpu_available=True
+        )
+        for p in plan:
+            missing = required - p.keys()
+            assert not missing, f"Plan entry missing fields {missing}: {p}"
+
+
+class TestWorkerEnvVarToggle:
+    """The worker must set QUANTUI_DISABLE_GPU=1 BEFORE any quantui /
+    gpu4pyscf import, otherwise the cached ``is_gpu_available()`` probe
+    sees the parent's environment and the CPU variant ends up using GPU.
+
+    We can't easily test the import-order property without an actual
+    subprocess spawn, but we can confirm the env var IS set by the time
+    the worker's body executes. The worker accepts a ``result_queue``;
+    we monkeypatch ``Molecule`` to capture the env state at call time
+    and skip the rest of the calc."""
+
+    def test_force_cpu_true_sets_disable_gpu_env(self, monkeypatch, tmp_path):
+        # Strip any pre-existing value so we can see the worker set it.
+        monkeypatch.delenv("QUANTUI_DISABLE_GPU", raising=False)
+
+        # Sentinel raise to short-circuit the worker after env-setup.
+        class _StopEarly(Exception):
+            pass
+
+        captured_env: dict = {}
+
+        def _spy_molecule(*args, **kwargs):
+            captured_env["QUANTUI_DISABLE_GPU"] = os.environ.get(
+                "QUANTUI_DISABLE_GPU", ""
+            )
+            raise _StopEarly("captured")
+
+        monkeypatch.setattr("quantui.molecule.Molecule", _spy_molecule)
+
+        from quantui.benchmarks import _calibration_worker
+
+        class _StubQueue:
+            def __init__(self):
+                self.items = []
+
+            def put(self, item):
+                self.items.append(item)
+
+        q = _StubQueue()
+        log_path = tmp_path / "cal.log"
+        log_path.write_text("")
+
+        _calibration_worker(
+            ["H", "H"],
+            [[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]],
+            0,
+            1,
+            "RHF",
+            "STO-3G",
+            "single_point",
+            str(log_path),
+            q,
+            "test-cal-id",
+            True,  # force_cpu
+        )
+        assert captured_env.get("QUANTUI_DISABLE_GPU") == "1"
+
+    def test_force_cpu_false_does_not_touch_env(self, monkeypatch, tmp_path):
+        monkeypatch.delenv("QUANTUI_DISABLE_GPU", raising=False)
+
+        class _StopEarly(Exception):
+            pass
+
+        captured_env: dict = {}
+
+        def _spy_molecule(*args, **kwargs):
+            captured_env["QUANTUI_DISABLE_GPU"] = os.environ.get(
+                "QUANTUI_DISABLE_GPU", ""
+            )
+            raise _StopEarly("captured")
+
+        monkeypatch.setattr("quantui.molecule.Molecule", _spy_molecule)
+
+        from quantui.benchmarks import _calibration_worker
+
+        class _StubQueue:
+            def __init__(self):
+                self.items = []
+
+            def put(self, item):
+                self.items.append(item)
+
+        q = _StubQueue()
+        log_path = tmp_path / "cal.log"
+        log_path.write_text("")
+
+        _calibration_worker(
+            ["H", "H"],
+            [[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]],
+            0,
+            1,
+            "RHF",
+            "STO-3G",
+            "single_point",
+            str(log_path),
+            q,
+            "test-cal-id",
+            False,  # force_cpu
+        )
+        # No env var set by the worker → still unset (== "" sentinel).
+        assert captured_env.get("QUANTUI_DISABLE_GPU") == ""
+
+
+class TestCalibrationResultTotal:
+    """The dataclass's ``n_total`` property must reflect the expanded
+    plan length, not just the raw suite size, so the UI's progress
+    denominator stays correct on a GPU-host tier-4 run."""
+
+    def test_default_falls_back_to_suite_size(self):
+        from quantui.benchmarks import CalibrationResult
+
+        r = CalibrationResult(timestamp="t", mode="tier4")
+        assert r.n_total == len(_MODE_TO_SUITE["tier4"])
+
+    def test_expected_steps_overrides_suite_size(self):
+        from quantui.benchmarks import CalibrationResult
+
+        r = CalibrationResult(timestamp="t", mode="tier4", expected_steps=42)
+        assert r.n_total == 42
+
+    def test_expected_steps_zero_falls_back(self):
+        from quantui.benchmarks import CalibrationResult
+
+        # 0 is the "no override" sentinel — must NOT shadow the suite size.
+        r = CalibrationResult(timestamp="t", mode="tier3", expected_steps=0)
+        assert r.n_total == len(_MODE_TO_SUITE["tier3"])
diff --git a/tests/test_est_estimator.py b/tests/test_est_estimator.py
new file mode 100644
index 0000000..b56ddf9
--- /dev/null
+++ b/tests/test_est_estimator.py
@@ -0,0 +1,316 @@
+"""Tests for M-EST estimator hardening.
+
+Covers:
+
+- **EST.1**: GPU-aware filtering — passing ``gpu_used`` partitions the
+  candidate pool so GPU-history predicts GPU runs and CPU-history
+  predicts CPU runs. Includes the partition-fallback path (insufficient
+  records → fall back to mixed pool, downgrade confidence).
+- **EST.3**: IQR outlier rejection — a single anomalously-slow record
+  no longer dominates the median.
+- **EST.3**: variance-aware confidence — high-variance pools report
+  "low" confidence even with many samples.
+
+All tests are platform-independent. ``perf_log.jsonl`` is redirected to
+``tmp_path`` via the ``QUANTUI_LOG_DIR`` env var so the user's real log
+is never touched.
+"""
+
+from __future__ import annotations
+
+import json
+
+import pytest
+
+from quantui.calc_log import (
+    _coefficient_of_variation,
+    _confidence_label,
+    _iqr_filter,
+    estimate_time,
+)
+
+
+@pytest.fixture
+def isolated_log_dir(tmp_path, monkeypatch):
+    monkeypatch.setenv("QUANTUI_LOG_DIR", str(tmp_path))
+    return tmp_path
+
+
+def _seed_perf_log(log_dir, records):
+    path = log_dir / "perf_log.jsonl"
+    with path.open("w", encoding="utf-8") as fh:
+        for r in records:
+            fh.write(json.dumps(r) + "\n")
+    return path
+
+
+def _rec(
+    *,
+    elapsed_s: float,
+    gpu_used=None,
+    method="B3LYP",
+    basis="STO-3G",
+    n_basis=15,
+    n_electrons=10,
+    calc_type="single_point",
+    converged=True,
+    n_cores=1,
+):
+    r = {
+        "timestamp": "2026-05-25T12:00:00+00:00",
+        "formula": "H2O",
+        "n_atoms": 3,
+        "n_electrons": n_electrons,
+        "method": method,
+        "basis": basis,
+        "n_iterations": 10,
+        "elapsed_s": elapsed_s,
+        "converged": converged,
+        "n_basis": n_basis,
+        "n_cores": n_cores,
+        "calc_type": calc_type,
+    }
+    if gpu_used is not None:
+        r["gpu_used"] = gpu_used
+    return r
+
+
+# =====================================================================
+# EST.1 — GPU-aware filtering
+# =====================================================================
+
+
+class TestGpuAwareFiltering:
+    def test_gpu_pool_used_when_requested(self, isolated_log_dir):
+        # 5 GPU records (fast) + 5 CPU records (slow) for the same calc.
+        records = [_rec(elapsed_s=1.0, gpu_used=True) for _ in range(5)]
+        records += [_rec(elapsed_s=10.0, gpu_used=False) for _ in range(5)]
+        _seed_perf_log(isolated_log_dir, records)
+
+        gpu_est = estimate_time(
+            n_atoms=3,
+            n_electrons=10,
+            method="B3LYP",
+            basis="STO-3G",
+            n_basis=15,
+            calc_type="single_point",
+            gpu_used=True,
+        )
+        cpu_est = estimate_time(
+            n_atoms=3,
+            n_electrons=10,
+            method="B3LYP",
+            basis="STO-3G",
+            n_basis=15,
+            calc_type="single_point",
+            gpu_used=False,
+        )
+
+        assert gpu_est is not None
+        assert cpu_est is not None
+        # GPU prediction should land near 1.0 s; CPU near 10.0 s.
+        assert gpu_est["seconds"] < 3.0
+        assert cpu_est["seconds"] > 5.0
+        # And they should differ by roughly the recorded factor.
+        assert cpu_est["seconds"] / gpu_est["seconds"] > 3.0
+
+    def test_none_gpu_used_uses_full_pool(self, isolated_log_dir):
+        # Default callers (gpu_used=None) get the mixed-pool estimate.
+        records = [_rec(elapsed_s=1.0, gpu_used=True) for _ in range(3)]
+        records += [_rec(elapsed_s=11.0, gpu_used=False) for _ in range(3)]
+        _seed_perf_log(isolated_log_dir, records)
+
+        est = estimate_time(
+            n_atoms=3,
+            n_electrons=10,
+            method="B3LYP",
+            basis="STO-3G",
+            n_basis=15,
+            calc_type="single_point",
+            # gpu_used omitted → None → no partition
+        )
+        assert est is not None
+        # The mixed-pool median falls between the GPU and CPU clusters.
+        assert 1.0 < est["seconds"] < 11.0
+
+    def test_pre_session55_records_count_as_cpu(self, isolated_log_dir):
+        # Old records have no `gpu_used` key. Requesting gpu_used=False
+        # must still admit them (they predate GPU support; conservative
+        # assumption is they ran CPU-side).
+        records = [_rec(elapsed_s=10.0) for _ in range(5)]
+        # Remove the gpu_used key from each (already absent — _rec
+        # only adds it when explicit).
Sanity check:
+        assert all("gpu_used" not in r for r in records)
+        _seed_perf_log(isolated_log_dir, records)
+
+        cpu_est = estimate_time(
+            n_atoms=3,
+            n_electrons=10,
+            method="B3LYP",
+            basis="STO-3G",
+            n_basis=15,
+            calc_type="single_point",
+            gpu_used=False,
+        )
+        assert cpu_est is not None
+        # Should predict roughly 10 s.
+        assert 5.0 < cpu_est["seconds"] < 20.0
+
+    def test_gpu_partition_fallback_downgrades_confidence(self, isolated_log_dir):
+        # Only 1 GPU record (not enough to partition) + 5 CPU records.
+        records = [_rec(elapsed_s=1.0, gpu_used=True)]
+        records += [_rec(elapsed_s=10.0, gpu_used=False) for _ in range(5)]
+        _seed_perf_log(isolated_log_dir, records)
+
+        gpu_est = estimate_time(
+            n_atoms=3,
+            n_electrons=10,
+            method="B3LYP",
+            basis="STO-3G",
+            n_basis=15,
+            calc_type="single_point",
+            gpu_used=True,
+        )
+        assert gpu_est is not None
+        # The mixed pool has 6 entries (1 GPU + 5 CPU) → would normally
+        # be "high" or "medium"; with GPU fallback the confidence is
+        # downgraded one notch.
+        assert gpu_est["confidence"] in ("medium", "low")
+
+
+# =====================================================================
+# EST.3 — IQR outlier rejection
+# =====================================================================
+
+
+class TestIqrFilter:
+    def test_passes_through_small_pools(self):
+        # IQR isn't meaningful on N < 4 — preserve all values.
+        assert _iqr_filter([1.0, 2.0, 3.0]) == [1.0, 2.0, 3.0]
+
+    def test_drops_high_outlier(self):
+        # 4 values clustered near 10, one anomalous 100.
+        result = _iqr_filter([10.0, 10.5, 9.5, 10.2, 100.0])
+        assert 100.0 not in result
+        # The clustered values are preserved.
+        for v in (10.0, 10.5, 9.5, 10.2):
+            assert v in result
+
+    def test_drops_low_outlier(self):
+        result = _iqr_filter([100.0, 105.0, 95.0, 102.0, 1.0])
+        assert 1.0 not in result
+
+    def test_all_equal_pool_unchanged(self):
+        # IQR = 0 → no fence — return everything.
+        assert _iqr_filter([5.0, 5.0, 5.0, 5.0, 5.0]) == [5.0, 5.0, 5.0, 5.0, 5.0]
+
+
+class TestEstimatorOutlierRobustness:
+    def test_single_outlier_does_not_dominate_prediction(self, isolated_log_dir):
+        # 5 records at ~1 s plus 1 anomalous 100 s record. The naive
+        # median is already ~1 s (the outlier sorts last, position 6/6),
+        # so this case only confirms that the IQR filter does not
+        # disturb a prediction that was already robust.
+        records = [_rec(elapsed_s=1.0) for _ in range(5)]
+        records.append(_rec(elapsed_s=100.0))
+        _seed_perf_log(isolated_log_dir, records)
+
+        est = estimate_time(
+            n_atoms=3,
+            n_electrons=10,
+            method="B3LYP",
+            basis="STO-3G",
+            n_basis=15,
+            calc_type="single_point",
+        )
+        assert est is not None
+        # With 5 of 6 records clustered at 1.0 s, the median is ~1 s with
+        # or without the IQR filter. A harder case (an even split between
+        # fast and slow records) is not covered here, and n_samples is
+        # not asserted; this test only checks that the prediction stays
+        # close to 1 s.
+        assert est["seconds"] < 3.0
+
+
+# =====================================================================
+# EST.3 — Variance-aware confidence
+# =====================================================================
+
+
+class TestCoefficientOfVariation:
+    def test_low_variance(self):
+        # All values within 1% of mean — CV ~ 0.005.
+        cv = _coefficient_of_variation([10.0, 10.05, 9.95, 10.02])
+        assert cv < 0.05
+
+    def test_high_variance(self):
+        # Values spanning 1-10s on a single (method, basis) — CV > 0.4.
+ cv = _coefficient_of_variation([1.0, 5.0, 10.0, 3.0, 8.0]) + assert cv > 0.4 + + def test_zero_mean_returns_zero(self): + assert _coefficient_of_variation([0.0, 0.0, 0.0]) == 0.0 + + def test_single_value_returns_zero(self): + assert _coefficient_of_variation([5.0]) == 0.0 + + +class TestConfidenceLabel: + def test_low_variance_high_samples_yields_high(self): + # 6 samples, all ~10 s → CV < 0.15 → "high" + assert _confidence_label([10.0, 10.1, 9.9, 10.05, 9.95, 10.02], 6) == "high" + + def test_high_variance_yields_low_even_with_many_samples(self): + # 10 samples spanning 1-30 → CV > 0.35 → "low" + wild = [1.0, 5.0, 30.0, 2.0, 25.0, 4.0, 28.0, 3.0, 20.0, 10.0] + assert _confidence_label(wild, len(wild)) == "low" + + def test_few_samples_cap_at_medium(self): + # 3 samples is enough for CV but caps below "high" + assert _confidence_label([10.0, 10.05, 9.95], 3) == "medium" + + def test_under_three_samples_always_low(self): + assert _confidence_label([10.0, 10.05], 2) == "low" + + def test_medium_variance_yields_medium(self): + # CV around 0.25 — between the 0.15 and 0.35 thresholds → "medium" + med = [10.0, 14.0, 7.0, 12.0, 8.0, 11.0] + label = _confidence_label(med, len(med)) + assert label == "medium" + + +class TestEstimatorVarianceAwareConfidence: + def test_high_variance_pool_reports_low_confidence(self, isolated_log_dir): + # 6 records but with huge spread — confidence MUST be "low", + # not "high" just because n_samples >= 5. + records = [_rec(elapsed_s=t) for t in (1.0, 5.0, 30.0, 2.0, 25.0, 4.0)] + _seed_perf_log(isolated_log_dir, records) + + est = estimate_time( + n_atoms=3, + n_electrons=10, + method="B3LYP", + basis="STO-3G", + n_basis=15, + calc_type="single_point", + ) + assert est is not None + assert est["confidence"] == "low" + + def test_tight_pool_with_many_samples_reports_high(self, isolated_log_dir): + # 10 tightly-clustered samples — confidence should be "high". 
+ records = [ + _rec(elapsed_s=t) + for t in (1.0, 1.02, 0.98, 1.01, 0.99, 1.03, 0.97, 1.0, 1.0, 1.0) + ] + _seed_perf_log(isolated_log_dir, records) + + est = estimate_time( + n_atoms=3, + n_electrons=10, + method="B3LYP", + basis="STO-3G", + n_basis=15, + calc_type="single_point", + ) + assert est is not None + assert est["confidence"] == "high" diff --git a/tests/test_est_frequency_cost_model.py b/tests/test_est_frequency_cost_model.py new file mode 100644 index 0000000..5872977 --- /dev/null +++ b/tests/test_est_frequency_cost_model.py @@ -0,0 +1,478 @@ +"""Tests for M-EST / EST.2 — frequency cost model. + +The cost model decomposes a freq estimate into:: + + freq_total ≈ scf_anchor + hessian_term + ir_intensity_term + +This file exercises the helper :func:`quantui.calc_log._estimate_frequency_cost` +directly (no PySCF needed) plus the integration with :func:`estimate_time` +(falls back to the cost model when direct freq history is empty). + +Each test seeds a temporary perf-log via the ``QUANTUI_LOG_DIR`` env +var override so we don't touch the user's real log. 
+""" + +from __future__ import annotations + +import pytest + +from quantui.calc_log import ( + _HESSIAN_MULTIPLIER_HF_DFT, + _HESSIAN_MULTIPLIER_POST_HF, + _estimate_frequency_cost, + estimate_time, + log_calculation, +) + + +@pytest.fixture +def isolated_perf_log(tmp_path, monkeypatch): + """Redirect calc_log to a temp dir so tests don't pollute the user's log.""" + monkeypatch.setenv("QUANTUI_LOG_DIR", str(tmp_path)) + return tmp_path + + +def _seed_sp_record( + *, + formula: str, + n_atoms: int, + n_electrons: int, + method: str, + basis: str, + elapsed_s: float, + n_basis: int, + gpu_used: bool = False, +): + """Write one converged single-point record into the temp perf log.""" + log_calculation( + formula=formula, + n_atoms=n_atoms, + n_electrons=n_electrons, + method=method, + basis=basis, + n_iterations=10, + elapsed_s=elapsed_s, + converged=True, + n_basis=n_basis, + n_cores=1, + calc_type="single_point", + gpu_used=gpu_used, + ) + + +class TestCostModelStructure: + """The decomposition must show its work: every component scales the + way the docstring claims.""" + + def test_returns_none_when_no_sp_anchor(self, isolated_perf_log): + # No SP history → no anchor → cost model can't fire. + est = _estimate_frequency_cost( + n_atoms=3, + n_electrons=10, + method="B3LYP", + basis="STO-3G", + n_basis=7, + ) + assert est is None + + def test_returns_dict_when_sp_anchor_available(self, isolated_perf_log): + # Two SP records → strategy 1 fires → cost model has an anchor. 
+ for elapsed in (1.0, 1.2): + _seed_sp_record( + formula="H2O", + n_atoms=3, + n_electrons=10, + method="B3LYP", + basis="STO-3G", + elapsed_s=elapsed, + n_basis=7, + ) + est = _estimate_frequency_cost( + n_atoms=3, + n_electrons=10, + method="B3LYP", + basis="STO-3G", + n_basis=7, + ) + assert est is not None + assert "seconds" in est + assert "confidence" in est + assert "n_samples" in est + assert est["seconds"] > 0 + + def test_returns_none_for_zero_atoms(self): + est = _estimate_frequency_cost( + n_atoms=0, n_electrons=0, method="RHF", basis="STO-3G" + ) + assert est is None + + +class TestCostModelArithmetic: + """The model is ``scf + hessian + 6N×scf / workers``. With workers=1 + and a known SP anchor, we can predict the exact total.""" + + def test_water_b3lyp_total_matches_decomposition(self, isolated_perf_log): + # Seed water B3LYP/STO-3G SP at exactly 1.0 s with all-equal samples + # so IQR can't drop anything and median == 1.0. + for _ in range(5): + _seed_sp_record( + formula="H2O", + n_atoms=3, + n_electrons=10, + method="B3LYP", + basis="STO-3G", + elapsed_s=1.0, + n_basis=7, + ) + # SP anchor for n_basis=7, β=3.5, n_cores=1: predicted == 1.0 s. + sp = estimate_time( + n_atoms=3, + n_electrons=10, + method="B3LYP", + basis="STO-3G", + n_basis=7, + n_cores=1, + calc_type="single_point", + ) + assert sp is not None + scf_anchor = sp["seconds"] + # Now the freq cost model: 1 + 2*1 + 6*3*1/1 = 21 s. + cost = _estimate_frequency_cost( + n_atoms=3, + n_electrons=10, + method="B3LYP", + basis="STO-3G", + n_basis=7, + n_cores=1, + ) + assert cost is not None + expected = ( + scf_anchor + _HESSIAN_MULTIPLIER_HF_DFT * scf_anchor + 6 * 3 * scf_anchor + ) + assert cost["seconds"] == pytest.approx(expected, rel=1e-6) + + def test_post_hf_uses_larger_hessian_multiplier(self, isolated_perf_log): + # Five MP2 SP records → MP2 anchor available.
+ for _ in range(5): + _seed_sp_record( + formula="H2O", + n_atoms=3, + n_electrons=10, + method="MP2", + basis="cc-pVDZ", + elapsed_s=10.0, + n_basis=24, + ) + cost = _estimate_frequency_cost( + n_atoms=3, + n_electrons=10, + method="MP2", + basis="cc-pVDZ", + n_basis=24, + n_cores=1, + ) + assert cost is not None + # Post-HF: hessian multiplier is _HESSIAN_MULTIPLIER_POST_HF (=6.0). + # Verify the multiplier is meaningfully larger than HF/DFT's (=2.0). + assert _HESSIAN_MULTIPLIER_POST_HF > _HESSIAN_MULTIPLIER_HF_DFT + + def test_scales_linearly_in_n_atoms(self, isolated_perf_log): + # Same anchor cost, but the IR term should grow ~6N. + # We can't seed different n_atoms cleanly with strategy 1, so we + # use strategy 2 (electron count) which is more permissive. + for _ in range(5): + _seed_sp_record( + formula="H2", + n_atoms=2, + n_electrons=2, + method="RHF", + basis="STO-3G", + elapsed_s=1.0, + n_basis=2, + ) + # Predict freq for various n_atoms. The SP anchor should grow + # via the electron-count scale, but the freq prediction should + # ALSO grow with the 6N IR term. + c2 = _estimate_frequency_cost( + n_atoms=2, + n_electrons=2, + method="RHF", + basis="STO-3G", + n_basis=2, + n_cores=1, + ) + c4 = _estimate_frequency_cost( + n_atoms=4, + n_electrons=2, # held fixed to isolate the n_atoms effect + method="RHF", + basis="STO-3G", + n_basis=2, + n_cores=1, + ) + assert c2 is not None and c4 is not None + # ir_term doubles when n_atoms doubles (24 vs 12 displacement SCFs). + # SP anchor doesn't change (electron count fixed, n_basis fixed). + # So total should grow by roughly the additional 12 × scf_anchor. 
+ assert c4["seconds"] > c2["seconds"] + + +class TestParallelIrAwareness: + """The model must reflect whether ``QUANTUI_FREQ_PARALLEL`` would + actually engage on the predicted run.""" + + def test_serial_when_env_var_off(self, isolated_perf_log, monkeypatch): + monkeypatch.delenv("QUANTUI_FREQ_PARALLEL", raising=False) + for _ in range(5): + _seed_sp_record( + formula="C6H6", + n_atoms=12, + n_electrons=42, + method="B3LYP", + basis="6-31G*", + elapsed_s=2.0, + n_basis=120, + ) + cost = _estimate_frequency_cost( + n_atoms=12, + n_electrons=42, + method="B3LYP", + basis="6-31G*", + n_basis=120, + n_cores=8, + ) + assert cost is not None + # Compute SP anchor for the same profile to cross-check. + sp = estimate_time( + n_atoms=12, + n_electrons=42, + method="B3LYP", + basis="6-31G*", + n_basis=120, + n_cores=8, + calc_type="single_point", + ) + assert sp is not None + # Serial: ir_term = 6*12 * anchor = 72 * anchor (no division). + expected = ( + sp["seconds"] + + _HESSIAN_MULTIPLIER_HF_DFT * sp["seconds"] + + 6 * 12 * sp["seconds"] + ) + assert cost["seconds"] == pytest.approx(expected, rel=1e-6) + + def test_parallel_reduces_estimate_when_env_var_on_and_gates_pass( + self, isolated_perf_log, monkeypatch + ): + monkeypatch.setenv("QUANTUI_FREQ_PARALLEL", "1") + for _ in range(5): + _seed_sp_record( + formula="C6H6", + n_atoms=12, + n_electrons=42, + method="B3LYP", + basis="6-31G*", + elapsed_s=2.0, + n_basis=120, + ) + cost_parallel = _estimate_frequency_cost( + n_atoms=12, + n_electrons=42, + method="B3LYP", + basis="6-31G*", + n_basis=120, + n_cores=8, + gpu_used=False, # parallel gated off on GPU + ) + # Compare to serial (same params, different env var). 
+ monkeypatch.delenv("QUANTUI_FREQ_PARALLEL") + cost_serial = _estimate_frequency_cost( + n_atoms=12, + n_electrons=42, + method="B3LYP", + basis="6-31G*", + n_basis=120, + n_cores=8, + gpu_used=False, + ) + assert cost_parallel is not None + assert cost_serial is not None + # Parallel divides the 72-SCF IR term by effective_workers (= 4 + # on an 8-core host per pick_worker_count). Total should be + # noticeably smaller. + assert cost_parallel["seconds"] < cost_serial["seconds"] + # Sanity: parallel can't reduce to less than (1 + Hessian) × scf + # since only the 6N IR term gets divided. With Hessian=2× scf, + # the floor is 3× scf — which is well above zero/negative. + assert cost_parallel["seconds"] > cost_serial["seconds"] * 0.1 + + def test_gpu_run_stays_serial_even_with_env_var( + self, isolated_perf_log, monkeypatch + ): + # parallel_enabled_for_run gates off when gpu_available=True. + monkeypatch.setenv("QUANTUI_FREQ_PARALLEL", "1") + for _ in range(5): + _seed_sp_record( + formula="C6H6", + n_atoms=12, + n_electrons=42, + method="B3LYP", + basis="6-31G*", + elapsed_s=2.0, + n_basis=120, + gpu_used=True, + ) + cost = _estimate_frequency_cost( + n_atoms=12, + n_electrons=42, + method="B3LYP", + basis="6-31G*", + n_basis=120, + n_cores=8, + gpu_used=True, # ← GPU run — parallel must NOT engage + ) + assert cost is not None + sp = estimate_time( + n_atoms=12, + n_electrons=42, + method="B3LYP", + basis="6-31G*", + n_basis=120, + n_cores=8, + calc_type="single_point", + gpu_used=True, + ) + assert sp is not None + # Serial expectation despite env var. 
+ expected = ( + sp["seconds"] + + _HESSIAN_MULTIPLIER_HF_DFT * sp["seconds"] + + 6 * 12 * sp["seconds"] + ) + assert cost["seconds"] == pytest.approx(expected, rel=1e-6) + + +class TestEstimateTimeIntegration: + """``estimate_time(calc_type="frequency")`` must fall back to the + cost model when direct freq history is empty AND return the + direct-history result when one exists.""" + + def test_falls_back_when_no_freq_history(self, isolated_perf_log): + # SP history only — direct strategies 1-4 should fail for freq. + for _ in range(5): + _seed_sp_record( + formula="H2O", + n_atoms=3, + n_electrons=10, + method="B3LYP", + basis="STO-3G", + elapsed_s=1.0, + n_basis=7, + ) + est = estimate_time( + n_atoms=3, + n_electrons=10, + method="B3LYP", + basis="STO-3G", + n_basis=7, + n_cores=1, + calc_type="frequency", + ) + assert est is not None + # Should be the cost-model prediction: ~21 s. + assert est["seconds"] > 10.0 # well above SP alone + assert est["seconds"] < 100.0 # within sanity range + + def test_direct_freq_history_wins_over_cost_model(self, isolated_perf_log): + # Seed BOTH SP records AND direct freq records. The freq pool + # is what we want the estimator to use; the cost model should + # never fire when direct data exists. + for _ in range(5): + _seed_sp_record( + formula="H2O", + n_atoms=3, + n_electrons=10, + method="B3LYP", + basis="STO-3G", + elapsed_s=1.0, + n_basis=7, + ) + # Direct freq runs: ALL exactly 30 s, very different from the + # cost model's predicted ~21 s. + for _ in range(5): + log_calculation( + formula="H2O", + n_atoms=3, + n_electrons=10, + method="B3LYP", + basis="STO-3G", + n_iterations=10, + elapsed_s=30.0, + converged=True, + n_basis=7, + n_cores=1, + calc_type="frequency", + ) + est = estimate_time( + n_atoms=3, + n_electrons=10, + method="B3LYP", + basis="STO-3G", + n_basis=7, + n_cores=1, + calc_type="frequency", + ) + assert est is not None + # Direct freq history dominates → close to 30 s, not 21 s. 
+ assert est["seconds"] == pytest.approx(30.0, rel=1e-6) + + def test_returns_none_when_no_history_at_all(self, isolated_perf_log): + est = estimate_time( + n_atoms=3, + n_electrons=10, + method="B3LYP", + basis="STO-3G", + n_basis=7, + n_cores=1, + calc_type="frequency", + ) + assert est is None + + +class TestConfidenceInheritance: + """Cost model adds structural assumptions but no new data — it + should never claim higher confidence than the SP anchor.""" + + def test_low_confidence_when_anchor_is_low(self, isolated_perf_log): + # Highly variable SP records → low confidence on the anchor. + # Mix tiny + huge values; IQR will still trim but CV will be high. + for v in (1.0, 1.2, 1.1, 5.0, 6.0): + _seed_sp_record( + formula="H2O", + n_atoms=3, + n_electrons=10, + method="B3LYP", + basis="STO-3G", + elapsed_s=v, + n_basis=7, + ) + sp = estimate_time( + n_atoms=3, + n_electrons=10, + method="B3LYP", + basis="STO-3G", + n_basis=7, + n_cores=1, + calc_type="single_point", + ) + cost = _estimate_frequency_cost( + n_atoms=3, + n_electrons=10, + method="B3LYP", + basis="STO-3G", + n_basis=7, + n_cores=1, + ) + assert sp is not None and cost is not None + # Cost model inherits the SP anchor's confidence. + assert cost["confidence"] == sp["confidence"] + assert cost["n_samples"] == sp["n_samples"] diff --git a/tests/test_est_prediction_log.py b/tests/test_est_prediction_log.py new file mode 100644 index 0000000..6866858 --- /dev/null +++ b/tests/test_est_prediction_log.py @@ -0,0 +1,312 @@ +"""Tests for M-EST / EST.6 — predicted-vs-actual feedback log. + +After each ``_do_run``, QuantUI now writes a record to +``prediction_log.jsonl`` with the estimator's pre-run prediction + +the actual wall-clock outcome. The analytics dashboard surfaces: + +- headline cards (median absolute error %, % within 25%, bias, etc.) 
+- a scatter of predicted vs actual with a y=x reference line +- a "consider re-running calibration" banner when median |error| > 50% + +All tests are platform-independent. ``prediction_log.jsonl`` is +redirected to ``tmp_path`` via ``QUANTUI_LOG_DIR``. +""" + +from __future__ import annotations + +import inspect +import json + +import pytest + +from quantui import analytics +from quantui.calc_log import ( + _prediction_log_path, + get_prediction_history, + log_prediction, +) + + +@pytest.fixture +def isolated_log_dir(tmp_path, monkeypatch): + monkeypatch.setenv("QUANTUI_LOG_DIR", str(tmp_path)) + return tmp_path + + +# ===================================================================== +# log_prediction / get_prediction_history +# ===================================================================== + + +class TestLogPrediction: + def test_writes_record_with_all_fields(self, isolated_log_dir): + log_prediction( + predicted_s=10.0, + actual_s=12.5, + method="B3LYP", + basis="6-31G*", + calc_type="single_point", + formula="H2O", + confidence="high", + gpu_used=False, + ) + records = get_prediction_history() + assert len(records) == 1 + r = records[0] + assert r["predicted_s"] == 10.0 + assert r["actual_s"] == 12.5 + assert r["method"] == "B3LYP" + assert r["calc_type"] == "single_point" + assert r["formula"] == "H2O" + assert r["confidence"] == "high" + assert r["gpu_used"] is False + # Derived field: signed error percentage. + assert r["error_pct"] == 25.0 + + def test_underprediction_yields_positive_error(self, isolated_log_dir): + # Predicted 1 min, took 5 min — error_pct should be +400% (actual + # is 5x the prediction, i.e. 400% larger). + log_prediction( + predicted_s=60.0, + actual_s=300.0, + method="B3LYP", + basis="6-31G*", + calc_type="frequency", + ) + r = get_prediction_history()[0] + assert r["error_pct"] == 400.0 + + def test_overprediction_yields_negative_error(self, isolated_log_dir): + # Predicted 100 s, took 50 s — error_pct should be -50%.
+ log_prediction( + predicted_s=100.0, + actual_s=50.0, + method="RHF", + basis="STO-3G", + calc_type="single_point", + ) + r = get_prediction_history()[0] + assert r["error_pct"] == -50.0 + + def test_no_estimate_records_none_error(self, isolated_log_dir): + # When the estimator returned no estimate (insufficient history), + # we still log the actual outcome so the dashboard counts the + # "no-estimate" runs separately. + log_prediction( + predicted_s=None, + actual_s=1.5, + method="B3LYP", + basis="STO-3G", + calc_type="single_point", + ) + r = get_prediction_history()[0] + assert r["predicted_s"] is None + assert r["error_pct"] is None + assert r["actual_s"] == 1.5 + + def test_zero_predicted_does_not_div_by_zero(self, isolated_log_dir): + # Defensive: predicted_s=0 is nonsensical but mustn't crash. + log_prediction( + predicted_s=0.0, + actual_s=1.0, + method="RHF", + basis="STO-3G", + calc_type="single_point", + ) + r = get_prediction_history()[0] + assert r["error_pct"] is None # zero-protected path + + def test_path_honors_quantui_log_dir(self, isolated_log_dir): + # The fixture sets QUANTUI_LOG_DIR. The prediction log must + # land there, not in ~/.quantui/logs. + log_prediction( + predicted_s=1.0, + actual_s=1.0, + method="RHF", + basis="STO-3G", + calc_type="single_point", + ) + assert _prediction_log_path().parent == isolated_log_dir + + +# ===================================================================== +# Analytics metrics +# ===================================================================== + + +class TestPredictionAccuracyMetrics: + def test_empty_records(self): + m = analytics._prediction_accuracy_metrics([]) + assert m["n_total"] == 0 + assert m["median_abs_error_pct"] is None + assert m["median_signed_error_pct"] is None + assert m["pct_within_25"] is None + + def test_all_within_25_pct(self): + # Spread of 10% / 15% / 20% / 5% — all within 25%. 
+ records = [ + {"predicted_s": 1.0, "actual_s": 1.1, "error_pct": 10.0}, + {"predicted_s": 1.0, "actual_s": 1.15, "error_pct": 15.0}, + {"predicted_s": 1.0, "actual_s": 1.2, "error_pct": 20.0}, + {"predicted_s": 1.0, "actual_s": 1.05, "error_pct": 5.0}, + ] + m = analytics._prediction_accuracy_metrics(records) + assert m["pct_within_25"] == 100.0 + + def test_mixed_within_25(self): + # 2 of 4 within 25%, 2 outside (one +60%, one -40%). + records = [ + {"predicted_s": 1.0, "actual_s": 1.1, "error_pct": 10.0}, + {"predicted_s": 1.0, "actual_s": 1.2, "error_pct": 20.0}, + {"predicted_s": 1.0, "actual_s": 1.6, "error_pct": 60.0}, + {"predicted_s": 1.0, "actual_s": 0.6, "error_pct": -40.0}, + ] + m = analytics._prediction_accuracy_metrics(records) + assert m["pct_within_25"] == 50.0 + + def test_signed_median_picks_up_bias(self): + # All four runs over-ran the prediction → positive bias. + records = [ + {"predicted_s": 1.0, "actual_s": 1.5, "error_pct": 50.0}, + {"predicted_s": 1.0, "actual_s": 1.6, "error_pct": 60.0}, + {"predicted_s": 1.0, "actual_s": 1.4, "error_pct": 40.0}, + {"predicted_s": 1.0, "actual_s": 1.7, "error_pct": 70.0}, + ] + m = analytics._prediction_accuracy_metrics(records) + assert m["median_signed_error_pct"] is not None + assert m["median_signed_error_pct"] > 0 # positive bias + + def test_no_estimate_records_excluded_from_error_stats(self): + # 2 records with no estimate + 2 with — the metrics use only + # the 2 that have data, and report the no-estimate count. 
+ records = [ + {"predicted_s": None, "actual_s": 1.0, "error_pct": None}, + {"predicted_s": None, "actual_s": 2.0, "error_pct": None}, + {"predicted_s": 1.0, "actual_s": 1.1, "error_pct": 10.0}, + {"predicted_s": 1.0, "actual_s": 1.2, "error_pct": 20.0}, + ] + m = analytics._prediction_accuracy_metrics(records) + assert m["n_total"] == 4 + assert m["n_with_estimate"] == 2 + assert m["n_no_estimate"] == 2 + assert m["median_abs_error_pct"] == 15.0 + + +# ===================================================================== +# Dashboard rendering +# ===================================================================== + + +def _seed_perf_log(log_dir): + """Seed perf_log so build_dashboard doesn't early-return None.""" + p = log_dir / "perf_log.jsonl" + p.write_text( + json.dumps( + { + "timestamp": "2026-05-25T12:00:00+00:00", + "formula": "H2O", + "method": "B3LYP", + "basis": "STO-3G", + "elapsed_s": 1.0, + "converged": True, + } + ) + + "\n", + encoding="utf-8", + ) + + +def _seed_prediction_log(log_dir, records): + p = log_dir / "prediction_log.jsonl" + with p.open("w", encoding="utf-8") as fh: + for r in records: + fh.write(json.dumps(r) + "\n") + + +class TestDashboardPredictionSection: + def test_section_present_when_predictions_exist(self, isolated_log_dir): + _seed_perf_log(isolated_log_dir) + _seed_prediction_log( + isolated_log_dir, + [ + { + "timestamp": "2026-05-25T12:00:00+00:00", + "predicted_s": 1.0, + "actual_s": 1.1, + "error_pct": 10.0, + "method": "B3LYP", + "basis": "STO-3G", + "calc_type": "single_point", + }, + { + "timestamp": "2026-05-25T12:01:00+00:00", + "predicted_s": 5.0, + "actual_s": 6.0, + "error_pct": 20.0, + "method": "B3LYP", + "basis": "STO-3G", + "calc_type": "single_point", + }, + ], + ) + out = analytics.build_dashboard() + assert out is not None + html = out.read_text(encoding="utf-8") + assert "Prediction accuracy" in html + # Headline metric should appear (median |error| = 15%). 
+ assert "15.0%" in html + + def test_empty_state_when_no_predictions(self, isolated_log_dir): + _seed_perf_log(isolated_log_dir) + # No prediction_log.jsonl written. + out = analytics.build_dashboard() + html = out.read_text(encoding="utf-8") + assert "Prediction accuracy" in html + assert "No predictions logged yet" in html + + def test_banner_when_median_error_exceeds_threshold(self, isolated_log_dir): + _seed_perf_log(isolated_log_dir) + # All four predictions off by 100% → median absolute > 50%. + _seed_prediction_log( + isolated_log_dir, + [ + { + "timestamp": f"2026-05-25T12:00:{i:02d}+00:00", + "predicted_s": 1.0, + "actual_s": 2.0, + "error_pct": 100.0, + "method": "B3LYP", + "basis": "STO-3G", + "calc_type": "single_point", + } + for i in range(4) + ], + ) + out = analytics.build_dashboard() + html = out.read_text(encoding="utf-8") + # The re-calibrate banner kicks in at median |error| > 50%. + assert "Re-running a deeper calibration tier" in html + + +# ===================================================================== +# _do_run wiring — source-level structure check +# ===================================================================== + + +class TestDoRunWiring: + def test_do_run_captures_predicted_run_s(self): + from quantui import app as _app_mod + + src = inspect.getsource(_app_mod) + # The capture variable name is unique to EST.6. + assert "_predicted_run_s" in src + # log_prediction must also be called (call order is not checked). + assert "log_prediction(" in src + + def test_do_run_passes_gpu_used_to_estimator(self): + # The pre-run estimate must honour the device prediction so the + # logged predicted_s matches what the user saw in the UI.
+ from quantui import app as _app_mod + + src = inspect.getsource(_app_mod) + assert "_predicted_gpu_used" in src diff --git a/tests/test_export_cube_and_bundle.py b/tests/test_export_cube_and_bundle.py new file mode 100644 index 0000000..7a73d2b --- /dev/null +++ b/tests/test_export_cube_and_bundle.py @@ -0,0 +1,200 @@ +"""Tests for the M-EXPORT / EXPORT.5 cube + bundle helpers. + +Both helpers are pure-Python (shutil + Path) so all tests run on every +platform, no PySCF / RDKit / ASE required. + +What we want to lock down: + +* ``export_cube`` copies the source cube to ``/