Live Demo: https://124-creator.github.io/ScholarLoop/ | GitHub: https://github.com/124-creator/ScholarLoop
ScholarLoop is a competition-grade AI Agent prototype for complex academic paper search. It decomposes research questions, combines BM25, dense retrieval and cross-encoder reranking, then presents recommendations as verifiable evidence cards and evidence matrices.
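BM25 serves both as the lexical first-stage retriever and as the baseline in the table below. The following is a minimal Okapi BM25 sketch. It assumes pre-tokenized whitespace text, the common defaults `k1=1.5` and `b=0.75`, and the non-negative IDF variant. The project's `retrieval/` helpers may use a library or different parameters, and `bm25_scores` is a hypothetical name.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every tokenized doc against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of docs that contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (the form Lucene uses).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Length normalization: long docs need more matches for the same score.
            norm = freq + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "dense retrieval for scientific papers".split(),
    "bm25 baseline for keyword search".split(),
    "cross encoder reranking of retrieved papers".split(),
]
print(bm25_scores("dense retrieval".split(), docs))
```

Only the first document shares terms with the query, so it is the only one with a non-zero score. Note that `retrieved` does not match `retrieval` without stemming.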
Status: public-safe competition snapshot, updated through M180. The GitHub Pages demo is a static Studio walkthrough. The source includes realtime endpoints for local execution, but the public page does not use private API keys and does not fabricate live results.
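The "no fabricated live results" rule can be expressed as a wrapper that returns an explicit unavailable state instead of placeholder rows. The sketch below is an assumption, not the project's realtime wrapper. The environment variable `SCHOLARLOOP_API_KEY`, the `fetch` callable and `LiveSearchResult` are all hypothetical names.

```python
import os
from dataclasses import dataclass, field


@dataclass
class LiveSearchResult:
    status: str                                # "ok" or "unavailable"
    rows: list = field(default_factory=list)   # stays empty unless a real fetch succeeded
    reason: str = ""


def live_search(query, fetch=None, api_key_env="SCHOLARLOOP_API_KEY"):
    """Run a live search only when a key and a fetcher exist; never invent rows."""
    api_key = os.environ.get(api_key_env)
    if not api_key or fetch is None:
        return LiveSearchResult(status="unavailable",
                                reason=f"{api_key_env} not set or no fetcher configured")
    try:
        rows = fetch(query, api_key)
    except Exception as exc:  # network or provider failure is reported, not hidden
        return LiveSearchResult(status="unavailable", reason=str(exc))
    return LiveSearchResult(status="ok", rows=list(rows))
```

Under this design, the caller renders the `reason` text when `status == "unavailable"`. It never shows a list that merely looks plausible.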
A typical paper search page returns a list of titles. ScholarLoop instead focuses on three stricter questions:
- Is the user question decomposed into searchable objects, methods, datasets, metrics and controversy points?
- Are the recommendations better than BM25 / single-pass retrieval under a reproducible benchmark protocol?
- Can the reasoning behind each recommendation be traced back to a title, abstract, source span or external metadata record, rather than being invented by a model?
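The first question implies a structured intermediate form for a decomposed query. The sketch below shows one such container, using the facets named above plus a naive expansion into searchable subqueries. The class, its fields and the pairing rule are assumptions for illustration. How ScholarLoop's `query/` module fills these facets (LLM prompt, rules or both) is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class DecomposedQuery:
    """Hypothetical container for the facets of a decomposed research question."""
    question: str
    objects: list = field(default_factory=list)
    methods: list = field(default_factory=list)
    datasets: list = field(default_factory=list)
    metrics: list = field(default_factory=list)
    controversies: list = field(default_factory=list)

    def subqueries(self):
        """Pair each research object with every other facet; fall back gracefully."""
        facets = self.methods + self.datasets + self.metrics + self.controversies
        if not self.objects:
            return [self.question]
        if not facets:
            return list(self.objects)
        return [f"{obj} {facet}" for obj in self.objects for facet in facets]


q = DecomposedQuery(
    question="Does contrastive pretraining help citation recommendation?",
    objects=["citation recommendation"],
    methods=["contrastive pretraining"],
    metrics=["recall@20"],
)
print(q.subqueries())
# ['citation recommendation contrastive pretraining', 'citation recommendation recall@20']
```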
The following metrics come from saved evaluation artifacts under reports/.
| Module | Evidence |
|---|---|
| Retrieval benchmark | LitSearch 597 queries |
| A-v2 ranking | F1 = 0.1312, Recall@20 = 0.7564, NDCG@20 = 0.5657 |
| BM25 baseline | F1 = 0.0964, Recall@20 = 0.5683, NDCG@20 = 0.3931 |
| Significance (LitSearch) | A-v2 vs BM25 delta-F1 = 0.0348, 95% CI = [0.0287, 0.0409] |
| Cross-benchmark zero-shot | RealScholarQuery/PaSa: A-v2 F1 = 0.1972 vs BM25 F1 = 0.1058, delta-F1 = 0.0914, 95% CI = [0.0657, 0.1176], permutation p < 1e-4 under the frozen M040 config |
| Evidence matrix | 30 rendered query docs, 0 fabricated citation fields in the public report |
| External metadata | OpenAlex / Crossref resolver layer; 82 / 90 sample cards resolved in M050 |
| Click-to-verify demo | 1170 span checks; 989 highlightable fields; mismatch = 0; 120 trace steps; fabrication = 0 |
| Public smoke tests | 6 lightweight public-snapshot tests pass; the original full local suite recorded 77 passing tests in reports/m180/pytest.txt |
| Latest presentation polish | M180 keeps live-query author/year/DOI as "to be verified" instead of guessing |
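The table reports Recall@20, NDCG@20, a 95% CI on the F1 difference and a permutation p-value. The sketch below implements those pieces under common assumptions: binary relevance, per-query scores compared in pairs, a paired bootstrap over queries for the CI, and a two-sided sign-flip permutation test for p. The README does not state which resampling scheme produced the reported numbers. F1 is omitted because its retrieval cutoff is not specified here. All function names are hypothetical.

```python
import math
import random


def recall_at_k(ranked, relevant, k=20):
    """Fraction of relevant papers that appear in the top-k results."""
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked, relevant, k=20):
    """Binary-gain NDCG: 0-based position i is discounted by log2(i + 2)."""
    dcg = sum(1 / math.log2(i + 2) for i, d in enumerate(ranked[:k]) if d in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal


def paired_bootstrap_ci(a, b, n_boot=10_000, alpha=0.05, seed=0):
    """Mean of per-query differences a - b and a percentile bootstrap CI."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    means = sorted(sum(rng.choice(diffs) for _ in range(n)) / n for _ in range(n_boot))
    lo = means[int(n_boot * alpha / 2)]
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]
    return sum(diffs) / n, (lo, hi)


def paired_permutation_p(a, b, n_perm=10_000, seed=0):
    """Two-sided sign-flip permutation test on per-query differences."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = sum(
        abs(sum(d if rng.random() < 0.5 else -d for d in diffs)) >= observed
        for _ in range(n_perm)
    )
    # +1 smoothing keeps p strictly positive; with 10,000 permutations the floor is ~1e-4.
    return (hits + 1) / (n_perm + 1)
```

Each system is evaluated on the same queries, so the per-query scores are paired. Resampling the paired differences, rather than each system's scores independently, is what makes the CI and p-value reflect the A-v2 vs BM25 gap.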
Open: https://124-creator.github.io/ScholarLoop/
The public page is designed for recruiter and judge review:
- Search Loop: query decomposition -> hybrid retrieval -> reranking -> evidence cards.
- Trust Loop: source-span verification -> artifact trace -> human-review boundary.
- Click-to-verify: a field is highlighted only when `source_text[char_span]` exactly equals the field value; otherwise it remains marked for manual review.
- Realtime honesty: live search is optional in the local runtime; unavailable states stay explicit and never fabricate recommendation rows.
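The click-to-verify rule above is an exact string comparison over a character span. The sketch below shows it with Python slicing, assuming `char_span` is a half-open `(start, end)` offset pair. `FieldCheck` and `verify_field` are hypothetical names, and out-of-range spans are treated as needing review rather than raising an error.

```python
from dataclasses import dataclass


@dataclass
class FieldCheck:
    name: str
    value: str
    span: tuple  # (start, end) character offsets into the source text, end exclusive


def verify_field(source_text, check):
    """Return 'verified' only on an exact span match; never guess or fuzzy-match."""
    start, end = check.span
    if not (0 <= start < end <= len(source_text)):
        return "needs_review"
    return "verified" if source_text[start:end] == check.value else "needs_review"


abstract = "We evaluate on LitSearch with 597 queries."
print(verify_field(abstract, FieldCheck("dataset", "LitSearch", (15, 24))))  # verified
print(verify_field(abstract, FieldCheck("dataset", "LitSearch", (14, 23))))  # needs_review
```

An off-by-one span fails the check instead of being silently repaired. This is the behavior the "mismatch = 0" row in the table depends on.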
Verified offline demo endpoints in the local runtime: `/`, `/pro`, `/studio`, `/api/search`, `/api/verify_span`, `/api/trail`.
```text
Complex research question
  -> query decomposition
  -> candidate retrieval: keyword / BM25 / dense embedding
  -> reranking and feature fusion
  -> evidence card generation
  -> source / DOI / author-year verification
  -> Web evidence matrix and review artifacts
```
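The README does not specify how the BM25 and dense candidate lists are fused before reranking. Reciprocal rank fusion (RRF) with `k=60` is one common choice and is shown here purely as an assumption. In a full pipeline, a cross-encoder would then rescore the fused top candidates; that step is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list by summed 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_top = ["p3", "p1", "p7"]
dense_top = ["p1", "p4", "p3"]
print(reciprocal_rank_fusion([bm25_top, dense_top]))  # ['p1', 'p3', 'p4', 'p7']
```

RRF uses only ranks, so it sidesteps calibrating BM25 scores against cosine similarities. Papers found by both retrievers (`p1`, `p3`) rise above papers found by only one.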
Main source tree:

```text
src/scholarloop/
  connectors/  OpenAlex, Crossref, Semantic Scholar, arXiv metadata connectors
  corpus/      benchmark and corpus loading helpers
  evidence/    evidence cards, matrix rendering, field status verification
  query/       query decomposition
  rank/        fusion and reranking logic
  retrieval/   BM25 and dense retrieval helpers
  demo/        offline Studio, realtime wrapper, graph and click-to-verify views
  web/         lightweight stdlib Web demo renderer
reports/m040/results.json                   A-v2 ranking and significance artifacts
reports/m060/results.json                   second-benchmark zero-shot generalization artifacts
reports/m100/                               expert-score demo and graph/realtime additions
reports/m120/validation_summary.json        interactive demo verification summary
reports/m130..m180/validation_summary.json  flagship Studio, bilingual, a11y and realtime-polish checks
docs/submission/                            competition submission materials
docs/dev/plans/                             original module plans through M180
```
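The `connectors/` layer resolves card metadata against OpenAlex and Crossref. The sketch below parses a single OpenAlex work record. It assumes the public OpenAlex Works schema (`title`, `publication_year`, `doi`, `authorships[].author.display_name`). The function name and every fixture value (ID, DOI, title, author) are made up for illustration. Following the M180 rule, missing fields fall back to "to be verified" instead of being guessed. `str.removeprefix` requires Python 3.9+.

```python
import json

TO_VERIFY = "to be verified"


def parse_openalex_work(record):
    """Extract card metadata from an OpenAlex work JSON; never fill gaps by guessing."""
    authors = [
        a.get("author", {}).get("display_name")
        for a in record.get("authorships", [])
    ]
    authors = [name for name in authors if name]
    doi = record.get("doi") or ""
    return {
        "title": record.get("title") or TO_VERIFY,
        "year": record.get("publication_year") or TO_VERIFY,
        # OpenAlex stores DOIs as full https://doi.org/... URLs.
        "doi": doi.removeprefix("https://doi.org/") or TO_VERIFY,
        "authors": authors or TO_VERIFY,
    }


# Made-up fixture in the OpenAlex shape; not a real record.
fixture = json.loads("""{
  "id": "https://openalex.org/W0000000000",
  "doi": "https://doi.org/10.1234/example",
  "title": "An Example Paper",
  "publication_year": 2023,
  "authorships": [{"author": {"display_name": "A. Author"}}]
}""")
print(parse_openalex_work(fixture))
```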
Large files are excluded on purpose: raw corpora, model caches, .npy, .parquet, .zip, .omx, and secrets are not part of this public snapshot. See PUBLIC_SNAPSHOT.json for the snapshot manifest.
Install lightweight test dependencies:

```bash
python -m pip install -r requirements.txt
```

Run public smoke tests:

```bash
python -m pytest -q
```

These tests cover the static Studio page, verified metric artifacts, M180 validation summaries, OpenAlex fixture parsing, public-snapshot exclusions, and high-risk secret scanning.
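The following sketch shows what a public-snapshot exclusion and secret check could look like. It assumes the excluded suffixes listed in the large-files note and uses two deliberately small, hypothetical secret patterns. The project's actual smoke tests are not reproduced here, and real secret scanners use far more rules.

```python
import re
from pathlib import Path

# Suffixes the README lists as excluded from the public snapshot.
FORBIDDEN_SUFFIXES = {".npy", ".parquet", ".zip", ".omx"}
# Hypothetical minimal patterns: an "sk-"-prefixed API key shape and a PEM private key header.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def file_violation(name, text):
    """Return a reason string if one file breaks the snapshot rules, else None."""
    if Path(name).suffix.lower() in FORBIDDEN_SUFFIXES:
        return "excluded file type"
    if any(p.search(text) for p in SECRET_PATTERNS):
        return "possible secret"
    return None


def snapshot_violations(root):
    """Walk root and collect every rule violation as a readable message."""
    problems = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() in FORBIDDEN_SUFFIXES:
            problems.append(f"{path}: excluded file type")  # skip reading binaries
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        reason = file_violation(path.name, text)
        if reason:
            problems.append(f"{path}: {reason}")
    return problems
```

In a pytest module, a single `assert snapshot_violations(".") == []` would turn this into a smoke test.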
The original full local suite recorded 77 passing tests in reports/m180/pytest.txt, but reproducing that run requires the excluded raw/parquet corpora and optional model dependencies. Full LitSearch / RealScholarQuery evaluation is not included for the same reason: it depends on excluded benchmark/corpus artifacts and runtime caches.
ScholarLoop is an applied case study of ResearchLoop: the project was planned and reviewed with a dual-loop workflow covering problem freezing, route selection, a Test Oracle, implementation review and retrospective artifacts.
Tian Zhongfei - AI Agent / LLM application engineering portfolio