RUMI — Research & Unified Machine Intelligence

Autonomous Scientific Discovery Engine
21-Phase Pipeline · Theory Tournament · Adversarial Testing · Critical Evaluation · 17 Domains · Cross-Run Memory

What is RUMI?

RUMI (Research & Unified Machine Intelligence) is an autonomous scientific discovery engine. Give it a topic and it reads papers, builds knowledge graphs, finds gaps and anomalies, generates hypotheses with mechanisms and equations, runs a tournament of 20 competing theories, attacks every discovery with adversarial testing, scores the survivors in a 6-dimension critical evaluation, designs concrete experiments, fetches real datasets, and improves itself after each run. Unlike conventional AI assistants that search and summarize, RUMI runs a 21-phase discovery pipeline designed to produce novel, testable hypotheses. It uses 29 API enrichment sources, contradiction-driven hypothesis generation, GFlowNet-inspired diversity selection, and a 6-category math verification engine.

RUMI addresses three fundamental limitations of current AI-assisted research:

Statelessness — Conventional assistants begin each session from zero. RUMI maintains 9 types of persistent memory with Hebbian learning, episodic recall, and semantic vector search.
Reactivity — Most tools wait for commands. RUMI implements curiosity-driven exploration, autonomous research goal pursuit, and proactive hypothesis generation.
Shallow reasoning — Single-pass generation produces correlations, not mechanisms. RUMI implements multi-pass causal reasoning (Pearl's hierarchy), analogical reasoning (Gentner's structure mapping), neurosymbolic verification, and first-principles derivation.

Motivation

Contemporary AI assistants share three related limitations: they are stateless, reactive, and single-model. Each session begins from zero, with no memory of prior interactions, no model of the user, and no awareness of the assistant's own capabilities. They wait for commands rather than anticipating needs, and they route everything through one inference call regardless of task complexity.

RUMI addresses these limitations by implementing a cognitive architecture that mirrors aspects of human cognition, purpose-built for the scientific research lifecycle:

Dimension	Conventional Assistants	RUMI
Memory	Stateless per session	9-type persistent memory with Hebbian learning, episodic recall, semantic vector search, and procedural templates
Initiative	Reactive — waits for commands	Proactive — curiosity-driven exploration, autonomous research goal pursuit
Reasoning	Single-pass generation	Multi-pass: cognitive gating, causal (Pearl), analogical (Gentner), neurosymbolic, first-principles
Self-awareness	None	Self-model with confidence calibration, introspection engine, metacognitive monitoring
Learning	No feedback loop	Error-driven updates, experience replay, dreaming-based consolidation, meta-learning
Discovery	Search-and-summarize	21-phase pipeline, including: literature → citation walk → graph → gaps → anomalies → hidden variables → mechanisms → predictions → competition → experimental design → data analysis → scoring
Theory Selection	Generate 1, accept it	Tournament: 20 candidates, 3 rounds, GFlowNet-inspired diversity selection, winner override for known science
Quality Control	None	Adversarial testing (attack every discovery), critical evaluation (6-dimension assessment), skeptic review, mathematical consistency checking
Literature	Single search	Adaptive multi-round: analyze gaps, refine queries, targeted re-search
Self-Improvement	None	Reflexion: analyzes weaknesses, generates patches, tests in sandbox, applies fixes

Discovery Pipeline v2

RUMI is not a research assistant; it is a discovery engine. The pipeline runs 21 phases across 4 stages, each with algorithmic fallbacks and LLM-powered analysis:

Phase 0:  Curious Questioning The Newton Step: observations → questions → hypothesis (NEW)
Phase 0.5 Cause Mode          Transform simple observations into research topics (NEW)
Phase 1:  Literature          7 sources: arXiv + PubMed + Semantic Scholar + CrossRef + INSPIRE HEP + CORE + OpenAlex
Phase 1.5 Citation Network    2-hop citation walk — follows references of top papers
Phase 2:  Knowledge Graph     Entity extraction + link prediction + 29 API enrichment sources (PubChem, NIST, NASA, AlphaFold, LIGO, etc.)
Phase 3:  Gap Detection       Structural holes, orphan observations, missing mechanisms
Phase 3.5 Adaptive Literature Gap-targeted multi-round search (refines queries based on gaps)
Phase 4:  Anomaly Detection   Conflicting evidence, outliers, prediction violations
Phase 5:  Hidden Variables    What-If engine + contradiction-driven + novelty filter (rejects known science)
Phase 6:  Mechanisms          Causal pathways with equations, not just correlations
Phase 6.5 Domain Calculations Domain-specific + general computations (SymPy/NumPy verification)
Phase 7:  Predictions         Testable predictions with falsification criteria
Phase 7.5 Falsification Engine Try to destroy theories: constraints, counterfactuals, adversarial
Phase 8:  Theory Tournament   20 candidates, 3 rounds, GFlowNet-inspired diversity selection, winner override for known science
Phase 8.1 Discovery Tournament Evolutionary theory refinement — mutate, cross-validate, evolve
Phase 8.3 Novelty Check       Are these discoveries genuinely new? Literature comparison
Phase 8.5 Adversarial Test    5 questions: existing theory, variable removal, falsification, known science check, operational definition
Phase 8.6 Critical Evaluation 6-dimension assessment: novelty, methodology, significance, clarity, limits, reproducibility
Phase 9:  Computational       SymPy equation verification + domain calculations + math engine (6 categories: equations, dimensions, derivations, simulation, Monte Carlo, cross-equation)
Phase 9.1 EMPC Pipeline       Evidence → Mechanism → Equation → Prediction grounding chain (NEW)
Phase 9.1b Observability      Check if predictions are measurable with current instruments (NEW)
Phase 9.1c Completeness       Check if mechanisms are derived, not hand-waving (NEW)
Phase 9.1d Kinetic Validator  Validate rate equations for dimensional consistency (NEW)
Phase 9.2 Scientific Simulator Mechanism simulation: equation extraction, parameter sweeps, consistency checks
Phase 9.5 Experimental Design Concrete validation plans: variables, controls, timeline, cost
Phase 9.7 Data Analysis       Domain-specific datasets (NASA Exoplanets, NIST CODATA, GBIF, WHO, etc.)
Phase 10: Contradictions      Deep LLM-based + algorithmic contradiction detection (finds claims that cannot both be true)
Phase 10.5 Literature Contradiction Scoring  Score contradictions against published literature
Phase 11: Skeptic Review      Adversarial critique via SkepticAgent with strengths/weaknesses/failure conditions
Phase 11.5 Literature Validation  Ground predictions in real papers; compare RUMI's numbers vs published data
Phase 11.6 Cross-Validation   Statistical rigor assessment, multi-theory consistency check
Phase 12: Discovery Scoring   7 dimensions + adversarial penalties + known-science penalties + domain weighting
Post:     Claim Labeling      Epistemic status labels (cited/derived/estimated/speculative)
Post:     Provenance Tracking Trace every claim back to its source paper

Intelligence Features

| Feature | Description |
|---------|-------------|
| Hypothesis Memory | SQLite-backed cross-run persistence: remembers past hypotheses and builds on them |
| Phase-Specific LLM Routing | Cerebras (fast) for extraction, Gemini (best reasoning) for mechanisms/theories |
| Resilient LLM Fallback | 4-layer LLM fallback: phase provider → auto → cooldown wait → ResilientLLM |
| Citation Network Traversal | 2-hop citation walk via Semantic Scholar to find foundational papers |
| Experimental Validation | Concrete experiment plans with variables, controls, timeline, cost |
| Knowledge Graph Persistence | Graph accumulates across runs with staleness pruning |
| Data Analysis Integration | 17 domain-specific APIs: NASA, PubChem, NIST, PDB, GBIF, NOAA, WHO, etc. |
| Quality-Weighted Scoring | Papers scored by citations, recency, and network centrality, not just count |
| Domain Templates | 17 domain-specific templates with methodology preferences and scoring weights |
| Literature Validation | Grounds predictions in real papers and compares RUMI's numbers vs published data |
| Bayesian Scoring | Prior/likelihood/posterior scoring alongside discovery scoring |
| Novelty Checking | Verifies discoveries are genuinely new against existing literature |
| Falsification Engine | Generates specific falsification conditions for every theory |
| Evolutionary Tournament | Discovery tournament: mutate, cross-validate, evolve theory populations |
| Claim Provenance | Traces every claim in the report back to its source paper |
| Epistemic Labeling | Labels every claim as cited/derived/estimated/speculative |
| Scientific Simulator | Extracts equations, runs parameter sweeps, checks consistency |
| Literature Contradiction | Scores contradictions against published literature |
| Cross-Validation | Statistical rigor assessment across multiple theories |
| Contradiction-Driven Discovery | Generates hypotheses that resolve contradictions between papers |
| What-If Counterfactual Engine | Asks what would surprise domain experts to generate novel hypotheses |
| Novelty Enforcement | 5-layer filter: literature check, known science, exotic soup penalty, operational definitions, winner override |
| Curious Questioning | Phase 0, the Newton Step: extracts surprising observations, generates WHY questions, produces testable hypothesis (NEW) |
| Cause Mode | `--cause "Why did the apple fall?"` transforms simple observations into full discovery runs (NEW) |
| EMPC Pipeline | Evidence → Mechanism → Equation → Prediction grounding chain; ensures each stage is grounded in the previous (NEW) |
| Observability Check | Blocks predictions below instrument sensitivity floor; 15 instrument profiles (NEW) |
| Mechanism Completeness | Checks if mechanisms are derived from first principles or just hand-waving (NEW) |
| Kinetic Equation Validator | Validates rate equations for dimensional consistency, detects unbounded factors and singularities (NEW) |
| Null Hypothesis Comparison | Requires winning theory to beat conventional explanations; prevents "novelty theater" (NEW) |
| Review Severity Calibration | Reviews distinguish fatal flaws from expected research gaps; no more "everything is fatal" (NEW) |
| Evidence-Grounded Scoring | Bayesian scorer and literature matcher now count topic-relevant papers as implicit support (NEW) |
| GFlowNet Diversity Selection | Theory selection rewards diversity, not just quality |
| Abstraction Compression | Finds simplest unifying principle (100 observations to 1 principle) |
| Math Engine (6 categories) | Equation solving, dimensional analysis, derivation verification, simulation, Monte Carlo, cross-equation |
| Physical Reasonableness | Catches values above speed of light, below Planck length |
| 37 API Enrichment | PubChem, NIST, UniProt, PDB, NASA, INSPIRE HEP, LIGO, AlphaFold, ClinicalTrials, KEGG, CORE, OpenAlex, OpenFDA, NOAA, USGS, World Bank, GitHub, DrugBank, CIR |
| Constructed Variable Bonus | Rewards hypotheses that introduce NEW named parameters |
| Exotic Physics Penalty | Penalizes combining popular speculative ideas without new insight |
| Operational Definition Check | Every new variable must specify how to measure it |
| Reflexion | Recursive self-improvement after every run |

Post-Processing Pipeline

After the discovery pipeline completes (Phase 12: Discovery Scoring), RUMI runs two additional processing layers:

Refinement Pipeline (13 stages):

Knowledge Foundation Audit — structured map of current knowledge
First Principles Reconstruction — dependency trees back to axioms
Mathematical Formalization — cap confidence at 20% if no equations
Derivation Engine — no free parameters, every variable justified
Multi-Model Competition — 5 hypotheses with weighted scoring
Adversarial Scientists — 5 reviewer personas (mathematician, experimentalist, domain expert, statistician, skeptic)
Causal Reasoning Layer — force causal graphs, not correlations
Uncertainty Decomposition — data/model/assumption/measurement
Prediction Generator — near/medium/long-term with measurements
Simulation Layer — expected behavior, edge cases, failure modes
Discovery Classifier — replication/synthesis/extension/novel_theory
Researcher-Grade Scoring — 7 metrics (evidence, math rigor, testability, novelty, contradiction handling, reproducibility, confidence)
Scientific Courtroom — Prosecutor/Defense/Judge/Jury with self-critique

Reflexion (Recursive Self-Improvement):

PostDiscoveryAnalyzer: identifies weaknesses in discovery runs
CodePatchGenerator: LLM-powered code fix generation
SandboxTester: syntax/compile/import checks before applying
RecursiveImprover: max 3 patches/cycle, confidence > 0.7 to apply
Git-backed rollback, forbidden files list, full history tracking
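
The gating rule described above (at most 3 patches per cycle, applied only when confidence exceeds 0.7 and the sandbox checks pass) can be sketched as follows. This is an illustration under stated assumptions, not RUMI's actual code: the `Patch` dataclass, `select_patches`, and the contents of `FORBIDDEN_FILES` are hypothetical, and the sandbox is reduced to a syntax check with Python's built-in `compile`.

```python
from dataclasses import dataclass

MAX_PATCHES_PER_CYCLE = 3   # from the text: max 3 patches/cycle
MIN_CONFIDENCE = 0.7        # from the text: confidence > 0.7 to apply
FORBIDDEN_FILES = {"reflexion.py"}  # hypothetical forbidden-files list


@dataclass
class Patch:
    path: str
    new_source: str
    confidence: float


def passes_syntax_check(patch: Patch) -> bool:
    """Stand-in for the sandbox: only verifies that the source compiles."""
    try:
        compile(patch.new_source, patch.path, "exec")
        return True
    except SyntaxError:
        return False


def select_patches(candidates: list[Patch]) -> list[Patch]:
    """Return the patches that would be applied in one improvement cycle."""
    eligible = [
        p for p in candidates
        if p.confidence > MIN_CONFIDENCE
        and p.path not in FORBIDDEN_FILES
        and passes_syntax_check(p)
    ]
    # Highest-confidence patches first, capped at the per-cycle limit.
    eligible.sort(key=lambda p: p.confidence, reverse=True)
    return eligible[:MAX_PATCHES_PER_CYCLE]
```

Capping after sorting means a cycle with many eligible patches keeps only the ones the generator is most confident in; the rest would have to wait for a later run.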

Example Output

Topic: Dark energy decay signatures in the cosmic microwave background
Domain: physics | Mode: full | Provider: CEREBRAS | Duration: ~40 min

Phase 1:  48 papers from 3 sources (arXiv + PubMed + Semantic Scholar)
Phase 2:  68 entities, 53 relationships (LLM-enhanced knowledge graph)
Phase 5:  3 hidden variables:
          - Decaying Dark Energy Scalar (φ)
          - Effective Fine-Structure Variation (α_eff)
          - Late-Time Dark Radiation from Sterile Neutrino Decay (ΔN_eff)

Phase 6:  4 mechanisms with equations:
          [causal_pathway] Scalar Decay → CMB μ-distortion
            → ρ_φ evolves as... produces two photons E_γ≈m_φ/2
          [cascade] Dark-energy-induced α variation → acoustic peak shift
            → L_int = -(ξ/4)(φ/M_Pl)F_μνF^μν, σ_T ∝ α²
          [feedback_loop] Sterile-neutrino decay → ΔN_eff, σ_8 suppression
            → Γ_s = 1/τ_s ≈ (θ²G_F²m_s...

Phase 7:  6 predictions accepted:
          [novel] If φ decays with rate β = 1×10⁻⁶ → CMB μ-distortion
          [interventional] If Δα/α = +1×10⁻³ at recombination...
          [counterfactual] If β = 0 → no μ-distortion

Phase 8:  5 theories compared:
          Early Dark Energy Phase Transition (0.73)
          Decaying Dark-Energy Scalar (0.71)
          Modified Gravity f(R) (0.60)

Phase 11: Skeptic: REVISE (62% confidence)
          Strengths: concrete mechanism, testable signatures
          Weaknesses: requires precise timing, no natural particle-physics model

Score: 80/100 — Grade: B
Classification: extension

17 Supported Domains

| Domain | Key | Enrichment APIs |
|--------|-----|-----------------|
| Drug Discovery | drug_discovery | PubChem + OpenFDA + PDB |
| Materials Science | materials_science | PubChem + Materials Project |
| Neuroscience | neuroscience | UniProt + PDB |
| Molecular Biology | molecular_biology | UniProt + PDB |
| Climate & Energy | climate_energy | NASA POWER |
| Space & Astronomy | space_astronomy | NASA Exoplanet Archive + NASA Images + arXiv |
| Computer Science | computer_science | GitHub |
| Earth Science | earth_science | USGS |
| Oceanography | oceanography | NOAA |
| Economics | economics | World Bank |
| Public Health | public_health | WHO |
| Mathematics | mathematics | OEIS + arXiv |
| Social Sciences | social_science | OpenAlex |
| Chemistry | chemistry | CIR + PubChem |
| Ecology | ecology | GBIF |
| Physics | physics | NIST WebBook + NIST ASD + arXiv |
| General Science | general | Semantic Scholar |

Key Discovery Modules

| Module | Purpose |
|--------|---------|
| knowledge_gap_detector | Find structural holes, orphan observations, missing mechanisms |
| anomaly_detector | Find conflicting evidence, outliers, prediction violations |
| missing_variable_generator | Propose hidden variables (dark matter style reasoning) |
| mechanism_generator | Generate causal pathways with equations |
| mechanism_discovery | Search for conservation laws, intermediate variables, energy flow |
| prediction_engine | Generate testable predictions with falsification criteria |
| falsification_engine | Try to destroy theories: constraints, counterfactuals, adversarial |
| theory_competition | Tournament: 20 candidates, 3 rounds, GFlowNet-inspired diversity selection, winner override for known science |
| discovery_tournament | Evolutionary theory refinement: mutate, cross-validate, evolve |
| novelty_checker | Verify discoveries are genuinely new against existing literature |
| test_stage | Adversarial challenge: existing theory? removable variables? falsification? |
| peer_review | Critical evaluation: 6-dimension assessment with recommendation |
| skeptic_agent | Structured adversarial critique with evidence ratings |
| discovery_scorer | 7-dimension quality gate with mathematical rigor |
| bayesian_scorer | Prior/likelihood/posterior scoring for theory comparison |
| computational_verification | Real graph analysis, Monte Carlo, parameter sweeps, statistics |
| scientific_simulator | Equation extraction, parameter sweeps, consistency checks |
| computation_engine | SymPy/NumPy verification of mechanism numbers and derivation chains |
| prediction_literature_validator | Ground predictions in real papers; compare numbers vs published data |
| literature_contradiction_scorer | Score contradictions against published literature |
| cross_validator | Statistical rigor assessment, multi-theory consistency check |
| domain_ontologies | Real physics for 17 domains: equations, mechanisms, constraints |
| math_consistency_checker | Verify theories: equation parsing, parameter ranges, unit checking |
| simulation_pipeline | Monte Carlo testing: 1000 runs, confidence intervals |
| multi_agent_debate | 4-role debate: Proposer, Critic, Advocate, Synthesizer |
| cross_domain_transfer | 7 built-in analogies + LLM-powered new analogy discovery |
| continuous_operation | Autonomous loop: curiosity-driven topic selection |
| refinement_pipeline | 13-stage post-processing: audit → formalization → scoring |
| claim_provenance | Trace every claim back to its source paper |
| claim_labeler | Epistemic status labels (cited/derived/estimated/speculative) |
| contradiction_miner | Deep LLM-based + algorithmic contradiction detection (finds claims that cannot both be true) |
| resilient_llm | Independent LLM wrapper with its own retry/cooldown state |
| reflexion | Recursive self-improvement: analyze, patch, test, apply |
| curious_questioning | Phase 0, the Newton Step: observations → questions → hypothesis (NEW) |
| empc_pipeline | Evidence → Mechanism → Equation → Prediction grounding chain (NEW) |
| observability_checker | Check if predictions are measurable with current instruments (NEW) |
| mechanism_completeness | Check if mechanisms are derived from first principles, not hand-waving (NEW) |

Scientific Rigor Framework

RUMI implements multiple layers of quality control to ensure discoveries are scientifically rigorous, not just plausible-sounding:

1. Mechanism Validation

Every mechanism must include:

Causal chain (minimum 3 steps)
Inputs and outputs with expected magnitudes
State variables that change during the mechanism
Observables that can be measured
Conservation laws where applicable

Generic mechanisms like "Hidden mechanism connecting X and Y" are automatically rejected.
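
The requirements above amount to a structural check on each mechanism. The sketch below shows one way such a check could look. The `Mechanism` dataclass, its field names, and the phrase pattern used to spot generic placeholders are assumptions made for illustration; the "conservation laws where applicable" item is left out because the text does not say how applicability is decided.

```python
import re
from dataclasses import dataclass, field

# Assumption: generic placeholders are caught by a simple phrase pattern.
GENERIC_PATTERN = re.compile(r"\bhidden mechanism connecting\b", re.IGNORECASE)


@dataclass
class Mechanism:
    description: str
    causal_chain: list[str] = field(default_factory=list)
    inputs: dict[str, float] = field(default_factory=dict)    # name -> expected magnitude
    outputs: dict[str, float] = field(default_factory=dict)   # name -> expected magnitude
    state_variables: list[str] = field(default_factory=list)
    observables: list[str] = field(default_factory=list)


def validate_mechanism(m: Mechanism) -> list[str]:
    """Return a list of problems; an empty list means the mechanism passes."""
    problems = []
    if GENERIC_PATTERN.search(m.description):
        problems.append("generic placeholder description")
    if len(m.causal_chain) < 3:
        problems.append("causal chain has fewer than 3 steps")
    if not m.inputs or not m.outputs:
        problems.append("missing inputs or outputs with magnitudes")
    if not m.state_variables:
        problems.append("no state variables")
    if not m.observables:
        problems.append("no measurable observables")
    return problems
```

Returning the full list of problems, rather than a single boolean, lets a later stage report exactly why a mechanism was rejected.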

2. Prediction Validation

Every prediction must be:

Quantitative (include numbers, not just direction)
Testable (specify measurement method)
Falsifiable (state what would disprove it)

Predictions that make no concrete, checkable statement are skipped. Predictions with an "if...then" structure and concrete values are preferred.
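
A minimal sketch of these three checks, assuming a prediction is stored as a statement plus two free-text fields. The `Prediction` dataclass and the regular expressions are illustrative assumptions; in particular, "contains a digit" is only a crude proxy for "quantitative".

```python
import re
from dataclasses import dataclass

NUMBER = re.compile(r"\d")
IF_THEN = re.compile(r"\bif\b.+\bthen\b", re.IGNORECASE | re.DOTALL)


@dataclass
class Prediction:
    statement: str
    measurement_method: str = ""   # how the quantity would be measured
    falsified_by: str = ""         # what observation would disprove it


def check_prediction(p: Prediction) -> dict[str, bool]:
    """Evaluate the three required properties plus the preferred form."""
    has_number = bool(NUMBER.search(p.statement))
    return {
        "quantitative": has_number,
        "testable": bool(p.measurement_method.strip()),
        "falsifiable": bool(p.falsified_by.strip()),
        "preferred_form": bool(IF_THEN.search(p.statement)) and has_number,
    }


def is_accepted(p: Prediction) -> bool:
    checks = check_prediction(p)
    return checks["quantitative"] and checks["testable"] and checks["falsifiable"]
```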

3. Theory Competition

Multiple competing explanations are generated and scored on:

Evidence strength
Mathematical rigor
Predictive power
Falsifiability
Simplicity (Occam's razor)
Novelty
Contradiction handling

Theories that are only correlational (no causal claims) receive a penalty.

4. Skeptic Review

Every theory undergoes adversarial critique that requires:

Strengths (what it explains well)
Weaknesses (specific, not vague)
Failure conditions (what would disprove it)
Destroying evidence (existing contradictions)
Competing explanations (better alternatives)

The skeptic recommends "reject" only for fundamental logical flaws.

5. Scientific Courtroom

Final evaluation uses a Prosecutor/Defense/Judge/Jury structure:

Prosecutor: Attempts to destroy the hypothesis (3 strongest objections)
Defense: Counters each objection with evidence
Judge: Weighs prosecution vs defense, identifies missing evidence
Jury: 5 domain experts vote (theoretical physicist, experimentalist, mathematician, philosopher of science, interdisciplinary researcher)
Self-Critique: The hypothesis critiques itself (weakest assumption, destroying evidence, falsification experiment)

6. EMPC Pipeline (Evidence → Mechanism → Equation → Prediction)

Every discovery must pass a grounding chain:

Evidence: What do the papers actually say? (quantitative findings extracted)
Mechanism: Does the mechanism reference the evidence? (word overlap check)
Equation: Do equations have numeric content? (not just symbols)
Prediction: Are predictions testable? (numbers + measurement methods)

Chain integrity score: 0-100% — how well each stage is grounded in the previous one.
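
The grounding chain can be illustrated with a small sketch. The word-overlap check follows the description above; everything else is an assumption made for the example: the stopword list, the 0.3 overlap threshold, and the choice to compute chain integrity as the share of stages that pass. The text does not specify how RUMI combines the stages.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "with", "by"}
HAS_NUMBER = re.compile(r"\d")


def content_words(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def word_overlap(evidence: str, mechanism: str) -> float:
    """Fraction of the mechanism's content words that also occur in the evidence."""
    mech = content_words(mechanism)
    if not mech:
        return 0.0
    return len(mech & content_words(evidence)) / len(mech)


def chain_integrity(evidence: str, mechanism: str, equation: str,
                    prediction: str, method: str,
                    min_overlap: float = 0.3) -> float:
    """Percentage of the four EMPC stages that pass their grounding check."""
    stages = [
        bool(HAS_NUMBER.search(evidence)),                  # quantitative finding present
        word_overlap(evidence, mechanism) >= min_overlap,   # mechanism references evidence
        bool(HAS_NUMBER.search(equation)),                  # equation has numeric content
        bool(HAS_NUMBER.search(prediction)) and bool(method.strip()),
    ]
    return 100.0 * sum(stages) / len(stages)
```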

7. Observability Check

Every prediction is checked against known instrument sensitivity limits:

HST/Euclid weak lensing: floor 0.005 kappa
VLT/Keck spectroscopy: floor 0.1 km/s
Chandra X-ray: floor 0.001 counts
LIGO gravitational waves: floor 1e-21 strain
PPMS resistivity: floor 1e-9 ohm*cm
...and 10 more instruments

Predictions below the floor are flagged as "observationally inaccessible."
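
As a sketch, the check reduces to a lookup and a comparison. The floors below are copied from the list above; only those five are given in the text, so the remaining profiles are not reproduced. The function name and the "unknown instrument" result are assumptions.

```python
# Sensitivity floors from the list above: instrument -> (floor, unit).
SENSITIVITY_FLOORS = {
    "HST/Euclid weak lensing": (0.005, "kappa"),
    "VLT/Keck spectroscopy": (0.1, "km/s"),
    "Chandra X-ray": (0.001, "counts"),
    "LIGO gravitational waves": (1e-21, "strain"),
    "PPMS resistivity": (1e-9, "ohm*cm"),
}


def check_observability(instrument: str, predicted_magnitude: float) -> str:
    """Compare a predicted effect size with the instrument's sensitivity floor.

    The magnitude is assumed to be expressed in the unit listed for the instrument.
    """
    if instrument not in SENSITIVITY_FLOORS:
        return "unknown instrument"
    floor, _unit = SENSITIVITY_FLOORS[instrument]
    if abs(predicted_magnitude) < floor:
        return "observationally inaccessible"
    return "observable"
```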

8. Mechanism Completeness Check

Every mechanism is scored on derivation completeness:

Assumptions stated (0.2 points)
Step-by-step derivation (0.3 points)
Transport coefficients derived (0.2 points)
Numerical validation (0.15 points)
Key parameters with values (0.15 points)

Mechanisms that "assume the result" without deriving it are flagged as hand-waving.
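
The weights above sum to 1.0, so the score can be read as a fraction of full derivation completeness. A minimal sketch, using hypothetical criterion keys and assuming that a mechanism lacking a step-by-step derivation is what gets flagged as hand-waving:

```python
# Weights from the list above; the key names are illustrative.
COMPLETENESS_WEIGHTS = {
    "assumptions_stated": 0.20,
    "step_by_step_derivation": 0.30,
    "transport_coefficients_derived": 0.20,
    "numerical_validation": 0.15,
    "key_parameters_with_values": 0.15,
}


def completeness_score(criteria_met: set[str]) -> float:
    """Sum the weights of the satisfied criteria (result lies in [0, 1])."""
    unknown = criteria_met - set(COMPLETENESS_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    total = sum(w for name, w in COMPLETENESS_WEIGHTS.items() if name in criteria_met)
    return round(total, 2)


def is_hand_waving(criteria_met: set[str]) -> bool:
    # Assumption: no explicit derivation means the result was assumed.
    return "step_by_step_derivation" not in criteria_met
```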

9. Kinetic Equation Validator

Rate equations are validated for:

Dimensional consistency — rate constant units match reaction order
Unbounded factors — dimensionless ratios that can exceed 1.0
Singularities — division by expressions that can be zero
Negative concentrations — differential equations that go below zero
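
Two of these checks are easy to illustrate. For a rate law of the form rate = k·[A]^n, with the rate in concentration per time, the rate constant k must carry units of concentration^(1−n)·time^(−1). The sketch below checks that relation and uses explicit Euler integration to detect a concentration that goes negative. The function names, and the choice of Euler stepping, are assumptions made for the example.

```python
from fractions import Fraction


def expected_k_exponents(order):
    """Exponents (concentration, time) of k for rate = k * [A]**order."""
    return (1 - Fraction(order), Fraction(-1))


def is_dimensionally_consistent(order, conc_exponent, time_exponent) -> bool:
    """True if the declared units of k match the reaction order."""
    declared = (Fraction(conc_exponent), Fraction(time_exponent))
    return declared == expected_k_exponents(order)


def goes_negative(rate_fn, c0: float, dt: float, steps: int) -> bool:
    """Integrate dc/dt = rate_fn(c) with explicit Euler; True if c drops below zero."""
    c = c0
    for _ in range(steps):
        c += dt * rate_fn(c)
        if c < 0:
            return True
    return False
```

For example, a second-order rate constant in M⁻¹·s⁻¹ passes as `is_dimensionally_consistent(2, -1, -1)`, while a zero-order decay `lambda c: -1.0` is caught by `goes_negative` because nothing stops it at zero.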

10. Null Hypothesis Comparison

The theory competition now requires the winner to beat conventional explanations:

Gap > 0.1: "Winner beats conventional explanation by X"
Gap > 0: "Winner narrowly beats conventional explanation"
Gap < 0: "WARNING: Winner does NOT beat conventional explanation"
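
The thresholds above translate directly into a small function. The gap is taken here to be the winner's score minus the conventional explanation's score, which is an assumption, as is the handling of an exact tie, which the text does not cover.

```python
def null_hypothesis_verdict(winner_score: float, conventional_score: float) -> str:
    """Map the score gap to the messages listed above."""
    gap = winner_score - conventional_score
    if gap > 0.1:
        return f"Winner beats conventional explanation by {gap:.2f}"
    if gap > 0:
        return "Winner narrowly beats conventional explanation"
    if gap < 0:
        return "WARNING: Winner does NOT beat conventional explanation"
    # gap == 0 is not specified in the text; reported as a tie (assumption).
    return "Winner ties with conventional explanation"
```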

11. Discovery Classification

Every output is classified to prevent inflation:

Replication: Confirms existing knowledge
Synthesis: Combines existing ideas
Extension: New application of known mechanisms
Novel Theory: Requires new mechanism, new prediction, new mathematics, and not present in literature

12. Mathematical Rigor Scoring

Theories without equations receive a penalty. Scoring checks for:

Equations/formulas present
Quantitative content (numbers)
Derivation stated (derived from, follows from)
Assumptions stated

13. Recursive Self-Improvement (Reflexion)

After every discovery run, RUMI analyzes its own performance:

Identifies which modules underperformed
Generates concrete code patches via LLM
Tests patches in sandbox (syntax, compile, import)
Applies safe patches with git-backed rollback
Maximum 3 patches per cycle to prevent runaway

14. Theory Tournament

RUMI generates 20 competing theories and forces them to compete:

Round 1: 20 candidates (10 standard + 10 creative/cross-domain)
Round 2: Score all on 7 dimensions, eliminate bottom half
Round 3: Head-to-head elimination ranking (pairwise matchups)
Final: Top 7 survivors with discriminating experiments
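
The round structure above can be sketched as follows. The scoring details are assumptions: the overall score is taken as the unweighted mean of the 7 dimension scores, and a pairwise matchup is won by the theory that scores higher on more dimensions. Diversity selection and the design of discriminating experiments are not modeled.

```python
from itertools import combinations

FINALISTS = 7  # from the text: top 7 survivors


def overall(scores: list[float]) -> float:
    return sum(scores) / len(scores)


def run_tournament(candidates: dict[str, list[float]]) -> list[str]:
    """candidates maps a theory name to its 7 dimension scores."""
    # Round 2: rank by overall score and eliminate the bottom half.
    ranked = sorted(candidates, key=lambda n: overall(candidates[n]), reverse=True)
    survivors = ranked[: len(ranked) // 2]

    # Round 3: head-to-head matchups between every pair of survivors.
    wins = {name: 0 for name in survivors}
    for a, b in combinations(survivors, 2):
        a_dims = sum(x > y for x, y in zip(candidates[a], candidates[b]))
        b_dims = sum(y > x for x, y in zip(candidates[a], candidates[b]))
        if a_dims > b_dims:
            wins[a] += 1
        elif b_dims > a_dims:
            wins[b] += 1

    # Final: order by matchup wins, breaking ties with the overall score.
    final = sorted(survivors, key=lambda n: (wins[n], overall(candidates[n])),
                   reverse=True)
    return final[:FINALISTS]
```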

15. Adversarial Testing (Phase 8.5)

Every discovery gets attacked with five questions:

What existing theory already explains this?
Can any new variable be removed?
What observation would falsify it?
Is it already known science?
Does every new variable have an operational definition?

Each gets a verdict: validated / challenged / superseded

16. Critical Evaluation (Phase 8.6)

Formal 6-dimension assessment of the top discovery:

Novelty, Methodology, Significance, Clarity, Limitations, Reproducibility
Overall score (0-10) with recommendation: accept / minor_revision / major_revision / reject
Major issues, minor issues, and questions for authors

Cognitive Architecture

RUMI routes inputs through a layered pipeline inspired by dual-process theory and cognitive neuroscience:

┌─────────────────────────────────────────────────────────────────────┐
│                        PERCEPTION LAYER                              │
│     Voice Input ──► Text ──► Gemini Live API ──► Audio Out           │
└───────────────────────────────────┬──────────────────────────────────┘
                                    │
┌───────────────────────────────────▼──────────────────────────────────┐
│                         MEMORY LAYER                                 │
│  ┌──────────┐ ┌───────────┐ ┌──────────┐ ┌───────────────────┐      │
│  │  Neural  │ │  Episodic │ │  Vector  │ │    Procedural     │      │
│  │ (Hebbian)│ │  (Events) │ │ (Search) │ │  (Skill Memory)   │      │
│  └──────────┘ └───────────┘ └──────────┘ └───────────────────┘      │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │           Memory Coordinator (unified recall)                 │   │
│  └──────────────────────────────────────────────────────────────┘   │
└───────────────────────────────────┬──────────────────────────────────┘
                                    │
┌───────────────────────────────────▼──────────────────────────────────┐
│                       INFERENCE LAYER                                │
│  Active Inference ──► Prediction-Error Minimization (FEP)            │
│  Curiosity Engine ──► Novelty Detection ──► Exploration Drive        │
│  Cognitive Gating ──► System 1 (fast) vs System 2 (deliberate)      │
└───────────────────────────────────┬──────────────────────────────────┘
                                    │
┌───────────────────────────────────▼──────────────────────────────────┐
│                       REASONING LAYER                                │
│  Causal (Pearl) ──► Analogy (Gentner) ──► Neurosymbolic               │
│  Narrative ──► Creativity ──► Intuition (Recognition-Primed)         │
└───────────────────────────────────┬──────────────────────────────────┘
                                    │
┌───────────────────────────────────▼──────────────────────────────────┐
│                      REFLECTION LAYER                                │
│  Dreaming ──► Experience Replay ──► Pattern Extraction               │
│  Meta-Reflection ──► Decision Journal ──► Strategy Scoring           │
└───────────────────────────────────┬──────────────────────────────────┘
                                    │
┌───────────────────────────────────▼──────────────────────────────────┐
│                      IDENTITY LAYER                                  │
│  Self-Model ──► Self-Awareness ──► Integrated Information (IIT-Φ)    │
│  Theory of Mind ──► Emotional Regulation ──► Metacognitive Monitor   │
│  Global Workspace (Thalamus) ──► Multi-Module Coordination           │
└───────────────────────────────────┬──────────────────────────────────┘
                                    │
┌───────────────────────────────────▼──────────────────────────────────┐
│                      ACTION LAYER                                    │
│  40+ Tool Actions ──► Execution ──► Verification ──► Learning        │
└──────────────────────────────────────────────────────────────────────┘

Research Foundations

RUMI's architecture is grounded in peer-reviewed research:

Research Area	Researcher(s)	Core Idea	RUMI Implementation
Global Workspace Theory	Bernard Baars (1988)	Consciousness as a broadcast mechanism	`global_workspace.py` — multi-module coordination
Integrated Information Theory	Giulio Tononi (2004)	Consciousness as integrated information (Φ)	`integrated_info.py` — Φ approximation
Free Energy Principle	Karl Friston (2010)	All adaptive systems minimize prediction error	`active_inference.py` — Bayesian updating
Dual Process Theory	Daniel Kahneman (2011)	System 1 (fast) vs System 2 (slow) reasoning	`cognitive_load.py` — gating between systems
Recognition-Primed Decisions	Gary Klein (1998)	Experts decide by pattern matching	`intuition_engine.py` — fast pattern matching
Structure Mapping Theory	Dedre Gentner (1983)	Analogical reasoning as core intelligence	`analogy_engine.py` — structure mapping
Causal Hierarchy	Judea Pearl (2018)	Association → Intervention → Counterfactual	`causal_reasoner.py` — three-level causal inference
Society of Mind	Marvin Minsky (1986)	Intelligence as emergent competition	`module_competition.py` — bidding for processing
Metacognition	John Flavell (1979)	Thinking about thinking	`metacognitive_monitor.py` — quality tracking
Computational Creativity	Margaret Boden (2004)	Exploration, combination, transformation	`creativity_engine.py` — conceptual blending
World Models	Ha & Schmidhuber (2018)	Mental simulation before action	`world_model.py` — latent dynamics
Self-Determination Theory	Deci & Ryan (1985)	Autonomy, competence, relatedness	`intrinsic_motivation.py` — drive system
Free Energy Principle (hierarchical)	Friston (2010)	Meta → Subgoal → Action levels	`hierarchical_active_inference.py` — 3-level FEP

Brain Systems

Memory (8 modules)

Module	File	Purpose
Neural Memory	`neural_memory.py`	Long-term facts with Hebbian learning, synaptic decay, pattern completion
Episodic Memory	`episodic_memory.py`	Timestamped events with importance scoring and retrieval
Vector Memory	`vector_memory.py`	Semantic search via embeddings for fast retrieval
Procedural Memory	`procedural_memory.py`	Learns successful tool chains as reusable skill templates
Associative Memory	`associative_memory.py`	Spreading activation networks for context-dependent recall
Predictive Memory	`predictive_memory.py`	Anticipatory recall — pre-loads relevant memories before request
Memory Consolidation	`memory_consolidation.py`	Sleep-like compression of episodic → semantic knowledge
Memory Coordinator	`memory_coordinator.py`	Unified recall across all memory stores
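
To make "Hebbian learning, synaptic decay, pattern completion" concrete, here is a generic sketch of the three ideas on a graph of concept associations. It is not taken from `neural_memory.py`: the `HebbianStore` class, its update rule (a bounded increment toward 1.0), and the default rates are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


class HebbianStore:
    def __init__(self, learning_rate: float = 0.2, decay: float = 0.05):
        self.learning_rate = learning_rate
        self.decay = decay
        self.weights = defaultdict(float)  # frozenset({a, b}) -> strength in [0, 1)

    def co_activate(self, concepts):
        """Hebbian step: concepts that appear together get a stronger link."""
        for a, b in combinations(sorted(set(concepts)), 2):
            key = frozenset((a, b))
            self.weights[key] += self.learning_rate * (1.0 - self.weights[key])

    def decay_all(self):
        """Synaptic decay: every link weakens a little when not reinforced."""
        for key in self.weights:
            self.weights[key] *= 1.0 - self.decay

    def complete(self, cue, top_k: int = 3):
        """Pattern completion: return the concepts most strongly tied to the cue."""
        linked = []
        for key, weight in self.weights.items():
            if cue in key and weight > 0:
                (other,) = key - {cue}
                linked.append((weight, other))
        linked.sort(reverse=True)
        return [other for _, other in linked[:top_k]]
```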

Learning & Adaptation (7 modules)

Module	File	Purpose
Active Inference	`active_inference.py`	Free Energy Principle — minimizes prediction error through Bayesian updating
Learning Engine	`learning.py`	Error-driven updates, Q-learning for tool selection, user feedback integration
Curiosity Engine	`curiosity.py`	Information-seeking behavior, novelty detection, uncertainty-driven exploration
Dreaming System	`dreaming.py`	Offline experience replay, pattern extraction, memory consolidation
Meta-Learner	`meta_learner.py`	Learning to learn — extracts transferable learning strategies
Transfer Learning	`transfer_learning.py`	Cross-domain pattern transfer and abstraction
Self-Improve Engine	`self_improve_engine.py`	RLHF-inspired: stores action-outcome pairs, extracts lessons from failures
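
The Learning Engine row mentions Q-learning for tool selection. The sketch below shows the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α·(r + γ·max Q(s′, a′) − Q(s, a)), with epsilon-greedy choice. It is a textbook illustration rather than the contents of `learning.py`; the `ToolSelector` class, the state labels, and the default hyperparameters are assumptions.

```python
import random
from collections import defaultdict


class ToolSelector:
    def __init__(self, tools, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.tools = list(tools)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.q = defaultdict(float)  # (state, tool) -> estimated value
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy: usually pick the best-valued tool, sometimes explore."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.tools)
        return max(self.tools, key=lambda t: self.q[(state, t)])

    def update(self, state, tool, reward, next_state):
        """Move Q(state, tool) toward reward + gamma * best value of next_state."""
        best_next = max(self.q[(next_state, t)] for t in self.tools)
        target = reward + self.gamma * best_next
        self.q[(state, tool)] += self.alpha * (target - self.q[(state, tool)])
```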

Reasoning (8 modules)

Module	File	Purpose
Causal Reasoner	`causal_reasoner.py`	Pearl's Causal Hierarchy — Association → Intervention → Counterfactual
Analogy Engine	`analogy_engine.py`	Gentner's Structure Mapping Theory for fluid intelligence
Neurosymbolic Reasoner	`neurosymbolic_reasoner.py`	Combines LLM reasoning with SymPy formal logic verification
Narrative Intelligence	`narrative_intelligence.py`	Turns experiences into stories, identity evolution tracking
Creativity Engine	`creativity_engine.py`	Conceptual blending, constraint relaxation, bisociation for novel ideas
Intuition Engine	`intuition_engine.py`	Fast pattern matching — Recognition-Primed Decision Making (System 1)
Cognitive Integration	`cognitive_integration.py`	Orchestrates all reasoning modules into a unified cognitive pipeline
Module Competition	`module_competition.py`	Minsky's Society of Mind: modules bid for processing rights

Metacognitive Systems (10 modules)

Module	File	Purpose
Self-Awareness	`self_awareness.py`	Consciousness state tracking, emotional state management
Self-Model	`self_model.py`	Capability awareness, confidence calibration, growth tracking
Theory of Mind	`theory_of_mind.py`	User expertise modeling, intent inference, emotional state tracking
Metacognitive Monitor	`metacognitive_monitor.py`	Thinking quality tracking, calibration, strategy effectiveness
Introspection Engine	`introspection_engine.py`	Confidence calibration, cognitive bias detection (12 types), epistemic humility
Integrated Information	`integrated_info.py`	Φ (phi) approximation inspired by Tononi's Integrated Information Theory (IIT)
Self-Narrative	`narrative_intelligence.py`	Evolving story of identity, growth, and experience
Global Workspace	`global_workspace.py`	Thalamus-inspired multi-module coordination and broadcast
Workspace Context	`workspace_context.py`	Context injection from global workspace for situational awareness
Workspace Events	`workspace_events.py`	Event types and publishing for inter-module communication

Planning & Autonomy (8 modules)

Module	File	Purpose
Autonomous Planner	`autonomous_planner.py`	MCTS-inspired plan decomposition with dependency tracking
Goal Engine	`goal_engine.py`	Hierarchical goal management — life goals → project goals → tasks
Intrinsic Motivation	`intrinsic_motivation.py`	Self-Determination Theory: autonomy, competence, relatedness drives
Hierarchical Active Inference	`hierarchical_active_inference.py`	3-level FEP hierarchy: Meta → Subgoal → Action
Proactive Engine	`proactive_engine.py`	Anticipates needs, idle check-ins, returning-user greetings
Cognitive Load Manager	`cognitive_load.py`	Working memory monitoring (7±2 slots), overload detection
AGI Orchestrator	`agi_orchestrator.py`	Master coordinator wiring all cognitive modules into a unified loop
Multi-Agent Orchestrator	`multi_agent_orchestrator.py`	Parallel, debate, pipeline, voting, specialist, swarm execution modes

Scientific Reasoning & World Models (6 modules)

Module	File	Purpose
Scientific Reasoning	`scientific_reasoning.py`	Multi-pass scientific reasoning cycle with hypothesis testing
Discovery Orchestrator	`discovery_orchestrator.py`	Coordinates discovery pipeline across brain modules
Theory Formation	`theory_formation.py`	Bengio-inspired theory engine for forming theories from observations
World Model	`world_model.py`	DreamerV3-inspired latent dynamics for outcome prediction
Enhanced World Model	`enhanced_world_model.py`	Non-linear MLP transitions, ensemble prediction, causal integration
Abstraction Engine	`abstraction_engine.py`	First principles reasoning, cross-domain transfer, emergent insight

Results

Performance Metrics (After All Fixes)

Metric	Before Fixes	After Fixes
Average discovery score	65-70/100	78-79/100
Novelty score	50.0 (always refinement)	72.0 (novel)
Supporting papers	0	14-15
Bayes factor	0.64 (favors alternatives)	1.80 (favors theory)
Fatal reviews	5/5	0/5
Papers per run	30-50 (3 sources)	80-120 (7 sources)
Entities per graph	50-75	370-475
Theory competition	5 competing theories	17-20 competing theories
EMPC chain integrity	N/A	10-42%
Observability check	N/A	100% predictions measurable
Mechanism completeness	N/A	0-90% derived
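
The Bayes factor row can be read as a likelihood ratio: values above 1 favor the theory and values below 1 favor the alternatives. The sketch below shows only that reading. How RUMI estimates the two likelihoods is not described here, so they are taken as given inputs, and the function names are hypothetical.

```python
def bayes_factor(p_data_given_theory, p_data_given_alternative):
    """Ratio of the two likelihoods; both must be positive."""
    if p_data_given_theory <= 0 or p_data_given_alternative <= 0:
        raise ValueError("likelihoods must be positive")
    return p_data_given_theory / p_data_given_alternative

def favored_side(bf):
    """Report which side a Bayes factor favors."""
    if bf > 1.0:
        return "theory"
    if bf < 1.0:
        return "alternatives"
    return "neither"

before = favored_side(0.64)  # value from the "Before Fixes" column
after = favored_side(1.80)   # value from the "After Fixes" column
```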

Score Progression (This Session)

Run	Domain	Score	Novelty	Evidence	Bayes
KRAS G12D	drug_discovery	65.8	50.0	48.8	0.92
Dark Matter	space_astronomy	68.9	50.0	49.6	0.64
Consciousness	neuroscience	68.8	56.2	56.2	1.45
Superconductor	materials_science	69.0	56.9	59.3	1.31
Topological QC	physics	77.6	72.0	56.9	1.80
Homochirality	chemistry	78.6	72.0	58.7	1.96
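
The averages in the Performance Metrics table can be cross-checked against these rows. The snippet assumes the first four runs predate the fixes and the last two follow them, which the table itself does not state.

```python
from statistics import mean

# (run, score) pairs copied from the Score Progression table
runs = [
    ("KRAS G12D", 65.8),
    ("Dark Matter", 68.9),
    ("Consciousness", 68.8),
    ("Superconductor", 69.0),
    ("Topological QC", 77.6),
    ("Homochirality", 78.6),
]

early = mean(score for _, score in runs[:4])  # assumed "before fixes"
late = mean(score for _, score in runs[4:])   # assumed "after fixes"
print(f"first four runs: {early:.1f}, last two runs: {late:.1f}")
```

Under that assumption the means fall inside the 65-70 and 78-79 bands reported above.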

Current Limitations

Mechanism Completeness: LLM generates plausible mechanisms but does not derive them from first principles. Score: 0-90%.
EMPC Chain Integrity: Evidence extraction works (25 findings) but equation and prediction grounding still needs improvement. Score: 10-42%.
Counterfactual Reasoning: Variable scoping bug causes 0 counterfactuals in some runs. Fix in progress.
Simulation Engine: Only 1-2 equations solved per run. Unicode parsing issues with SymPy.
Domain Specificity: Works best for physics, chemistry, and biology. Social sciences have fewer domain ontologies.
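
For the Unicode parsing limitation, one possible mitigation is to normalize equation strings to ASCII before they reach the parser. The sketch below is an assumption about what such a step could look like. The README does not say which characters fail, so the character set is a guess, and `normalize_equation` is a hypothetical helper. The output could then be passed to a parser such as `sympy.sympify`; that call is left out so the example needs only the standard library.

```python
import unicodedata

# Operators that NFKC normalization leaves untouched (assumed problem set)
_OPERATORS = str.maketrans({
    "\u2212": "-",  # minus sign
    "\u00d7": "*",  # multiplication sign
    "\u00b7": "*",  # middle dot
    "\u22c5": "*",  # dot operator
    "\u00f7": "/",  # division sign
})

# Superscript digits 0-9, in order, so the index equals the digit value
_SUPERSCRIPTS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"

def normalize_equation(text):
    """Rewrite common Unicode math notation as ASCII operators."""
    out = []
    in_power = False
    for ch in text:
        if ch in _SUPERSCRIPTS:
            # Start a power once per run of superscript digits
            if not in_power:
                out.append("**")
            out.append(str(_SUPERSCRIPTS.index(ch)))
            in_power = True
        else:
            in_power = False
            out.append(ch)
    # Superscripts are handled first because NFKC would flatten them to digits
    ascii_like = unicodedata.normalize("NFKC", "".join(out))
    return ascii_like.translate(_OPERATORS)

cleaned = normalize_equation("x\u00b2 + 2\u00b7x \u2212 1")
```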

Installation

Prerequisites

Requirement	Details
Python	3.11+
Git	Any recent version
OS	Windows (primary), Linux/macOS (partial)
RAM	4GB+ (8GB recommended)
API Keys	Cerebras (free) at cloud.cerebras.ai + Gemini (free) at aistudio.google.com + Groq (free) at console.groq.com/keys

Quick Start

git clone https://github.com/subhansh-dev/rumi
cd rumi
pip install -e .
playwright install chromium
rumi

On first launch, RUMI prompts for your Cerebras, Gemini, and Groq API keys and saves them to config/api_keys.json.

Troubleshooting

Problem	Solution
`ModuleNotFoundError`	Run `pip install -e .` from project root
First-launch key prompt doesn't appear	Delete `config/api_keys.json` and restart
`playwright not found`	Run `playwright install chromium`
`sounddevice` fails on Linux	`sudo apt install portaudio19-dev`
`pip install -e .` fails on macOS	Run `xcode-select --install`, then `pip3 install -e .`

Usage

Discovery

# Topic mode (existing — broad literature search)
python run_discovery.py "topological quantum error correction" --domain physics --mode full

# Cause mode (NEW — The Newton Step)
python run_discovery.py --cause "Why did the apple fall?"
python run_discovery.py --cause "Why does mold kill bacteria?" --mode full
python run_discovery.py --cause "Why do stars twinkle?" --domain physics

# Quick/standard/full modes
python run_discovery.py "your topic" --mode quick     # phases 1-5
python run_discovery.py "your topic" --mode standard   # phases 1-8
python run_discovery.py "your topic" --mode full       # all phases

# Iterative refinement (runs twice with weakness analysis)
python run_discovery.py "your topic" --iterate

# Python API
from discovery.discovery_pipeline_v2 import run_discovery_pipeline
result = run_discovery_pipeline("anomalous stellar dimming", mode="full")

Cause Mode — The Newton Step

Give RUMI a simple observation and let the curious engine transform it into a full discovery:

# Newton's apple
python run_discovery.py --cause "Why did the apple fall?"
# → Core Question: "Is there a universal force between masses?"
# → Full discovery on gravitational attraction

# Fleming's mold
python run_discovery.py --cause "Why does mold kill bacteria?"
# → Core Question: "What biochemical mechanism enables mold to neutralize bacteria?"
# → Full discovery on antimicrobial mechanisms

# Curie's fog film
python run_discovery.py --cause "Why does uranium fog photographic film?"
# → Core Question: "What radiation does uranium emit?"
# → Full discovery on radioactivity

The curious engine generates:

Observations — surprising findings from literature
Questions — Newton-style "WHY" questions
Core Question — the one deep question that drives discovery
Hypothesis — best guess at the answer
Generated Topic — research topic for the full pipeline
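
The five outputs listed above map naturally onto a small record type. This is a hypothetical container for illustration; the class and field names are assumptions, and the observation and hypothesis strings are placeholders rather than real RUMI output.

```python
from dataclasses import dataclass, field

@dataclass
class CauseModeResult:
    """Hypothetical record mirroring the five cause-mode outputs."""
    observations: list[str] = field(default_factory=list)
    questions: list[str] = field(default_factory=list)
    core_question: str = ""
    hypothesis: str = ""
    generated_topic: str = ""

example = CauseModeResult(
    observations=["(placeholder) surprising finding from the literature"],
    questions=["Why did the apple fall?"],
    core_question="Is there a universal force between masses?",
    hypothesis="(placeholder) best guess at the answer",
    generated_topic="gravitational attraction",
)
```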

Dashboard

After running a discovery, open the interactive dashboard to explore results:

/dashboard

The dashboard shows:

Overview — metric cards, pipeline phase strip, discovery score gauge
Phases — all 12+ pipeline phases with status and data
Theories — theory competition results with scores
Gaps — knowledge gaps detected in the literature
Anomalies — anomalies and outlier entities
Predictions — testable predictions with confidence
Knowledge Graph — interactive vis-network graph
Papers — searchable paper list
Run History — all past discovery runs with scores

Slash Commands

Command	Description
`/discover <topic>`	Full 12-phase discovery pipeline
`/search <query>`	Quick PubMed search
`/dashboard`	Open interactive web dashboard
`/contradictions`	Detect contradictions in knowledge graph
`/simulate <hypothesis>`	Monte Carlo simulation
`/debate <hypothesis>`	4-agent debate
`/continuous [N]`	N autonomous research cycles
`/transfer <domain>:<mech> to <domain>`	Cross-domain transfer
`/curiosity`	RUMI's research frontier
`/evolve`	Theory evolution status
`/consistency`	Math consistency check
`/reflexion stats`	Self-improvement statistics
`/reflexion history`	Self-improvement history
`/status`	System status and uptime
`/stats`	Session statistics
`/help`	Show all commands
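
To make the `/transfer <domain>:<mech> to <domain>` syntax concrete, here is a minimal parsing sketch. The regular expression and the `parse_transfer` helper are assumptions based only on the pattern in the table; RUMI's own command parser may behave differently.

```python
import re

# source domain, a colon, free-text mechanism, the word "to", target domain
_TRANSFER = re.compile(
    r"^/transfer\s+(?P<source>\w+):(?P<mechanism>.+?)\s+to\s+(?P<target>\w+)\s*$"
)

def parse_transfer(line):
    """Return the three parts of a /transfer command, or None if malformed."""
    match = _TRANSFER.match(line.strip())
    if match is None:
        return None
    return match.groupdict()

parsed = parse_transfer("/transfer biology:immune adaptation to computer_science")
```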

Cognitive Tools

# Multi-module cognitive reasoning
cognitive_reason(query="What are the implications of category theory for neural network generalization?", depth="deep")

# Analogy reasoning
analogy_reason(source_domain="biology", target_domain="software_engineering",
               query="How does immune system adaptation inform microservice architecture?")

# Causal analysis
causal_analyze(events="The model performed well on training but failed on test.",
               question="what caused the generalization gap?")

# Creative problem solving
creative_solve(problem="Design a new activation function", constraints="differentiable, efficient", num_ideas=5)

Scientist Agents

agency_agent(agent_name="literature_reviewer", task="Review mechanistic interpretability of transformers")
agency_agent(agent_name="hypothesis_generator", task="Generate hypotheses about scale and emergent abilities")
agency_agent(agent_name="experiment_designer", task="Design experiment to test chain-of-thought emergence")
agency_agent(agent_name="peer_reviewer", task="Review this paper for methodological rigor")

Memory & Learning

save_memory(category="identity", key="name", value="Sir")
brain_memory(action="search", query="preferred programming language")
record_learning(insight="Users prefer direct answers", domain="communication")
reflect_learning(force=True)

Configuration

File	Purpose
`config/api_keys.json`	Cerebras + Gemini + Groq API keys (auto-generated on first launch)
`core/prompt.txt`	System personality prompt
`RUMI.md`	Identity and behavioral guidelines
`SOUL.md`	Core directives and red lines
`USER.md`	User profile
`memory/`	Persistent memory (long-term + daily logs)

Environment Variables

Variable	Description
`RUMI_TELEGRAM_BOT_TOKEN`	Telegram bot token
`RUMI_TELEGRAM_ALLOWED_USER`	Allowed Telegram user ID

Telegram Integration

Open Telegram → search @BotFather → send /newbot → save the token
Search @userinfobot → send any message → save your numeric User ID
Add to config/api_keys.json:

{
    "GOOGLE_API_KEY": "your-gemini-key",
    "telegram_bot_token": "7234567890:AAH...",
    "telegram_allowed_user": 123456789
}

Launch RUMI → send a message to your bot → RUMI responds in terminal and Telegram

Only the configured telegram_allowed_user can communicate with RUMI via Telegram.
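
The allow-list rule can be sketched as follows. The two key names come from the JSON example above; the helper functions are hypothetical and do not reproduce `telegram_bot.py`.

```python
import json

def load_telegram_settings(raw_json):
    """Extract the bot token and the single allowed user ID from config JSON."""
    config = json.loads(raw_json)
    token = config.get("telegram_bot_token")
    allowed = config.get("telegram_allowed_user")
    if not token or allowed is None:
        raise ValueError("telegram_bot_token and telegram_allowed_user are required")
    return token, int(allowed)

def is_allowed(sender_id, allowed_user):
    """Accept a message only when it comes from the configured user."""
    return int(sender_id) == allowed_user

# Placeholder values, not real credentials
raw = '{"telegram_bot_token": "123:abc", "telegram_allowed_user": 123456789}'
token, allowed_user = load_telegram_settings(raw)
```

Comparing numeric IDs rather than usernames avoids accepting a different account that later takes over the same handle.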

Project Structure

rumi/
├── main.py                      # Entry point (~9000 lines)
├── run_discovery.py             # Standalone discovery runner (CLI)
├── ui.py
├── rumi_launcher.py             # Console entry point
├── rumi_llm.py                  # Unified LLM helper (Cerebras→Groq→Gemini)
├── thinking_loop.py             # Multi-pass reasoning engine
├── telegram_bot.py              # Telegram bridge
├── RUMI.md                      # Identity
├── SOUL.md                      # Core directives
├── USER.md                      # User profile
│
├── discovery/                   # Scientific Discovery Engine (55 modules)
│   ├── discovery_pipeline_v2.py #   16-phase discovery pipeline (ACTIVE)
│   ├── domains.py               #   17 domain configurations
│   ├── graph.py                 #   Knowledge graph + metrics
│   ├── hypothesis_engine.py     #   Hypothesis generation
│   ├── hypothesis_tournament.py #   GFlowNet-style evolution
│   ├── knowledge_gap_detector.py#   Structural holes, orphan observations
│   ├── anomaly_detector.py      #   Conflicting evidence, outliers
│   ├── mechanism_generator.py   #   Causal pathways with equations
│   ├── mechanism_discovery.py   #   Conservation laws, energy flow
│   ├── prediction_engine.py     #   Testable predictions
│   ├── theory_competition.py    #   Multi-theory scoring
│   ├── falsification_engine.py  #   Try to destroy theories
│   ├── computational_verification.py # Real computations
│   ├── discovery_scorer.py      #   7-dimension quality gate
│   ├── refinement_pipeline.py   #   13-stage post-processing
│   ├── multi_agent_debate.py    #   4-role adversarial debate
│   ├── simulation_pipeline.py   #   Monte Carlo (1000 runs)
│   ├── math_consistency_checker.py # Equation validation
│   ├── domain_ontologies.py     #   Real physics for 17 domains
│   ├── cross_domain_transfer.py #   Cross-field analogies
│   ├── continuous_operation.py  #   Autonomous research loop
│   ├── citation_grounding.py    #   Multi-source paper fetch + citation network traversal
│   ├── contradiction_miner.py
│   ├── novelty_detector.py      #   PubMed novelty estimation
│   ├── skeptic_agent.py         #   Adversarial critique
│   ├── claim_provenance.py      #   Claim source tracking
│   ├── hypothesis_memory.py     #   Cross-run hypothesis persistence (SQLite)
│   ├── discovery_archive.py     #   Discovery memory across runs (JSON)
│   ├── experiment_planner.py    #   Concrete experiment validation plans
│   ├── data_analysis.py         #   Real dataset fetching & statistical analysis
│   ├── domain_templates.py      #   Domain-specific research templates
│   ├── link_predictor.py
│   ├── llm_client.py            #   Cerebras→Groq→Gemini routing
│   ├── pubchem.py               #   PubChem enrichment
│   ├── openfda.py               #   OpenFDA enrichment
│   ├── uniprot.py               #   UniProt enrichment
│   ├── pdb.py                   #   Protein Data Bank
│   ├── semantic_scholar.py      #   Paper citations + citation network traversal
│   ├── materials_project.py
│   ├── nasa_api.py              #   NASA data
│   ├── arxiv_api.py             #   arXiv papers
│   ├── gbif_api.py              #   Biodiversity data
│   ├── molecule.py              #   Molecule design (RDKit)
│   └── dashboard/
│       └── index.html           #   Interactive web dashboard
│
├── brain/                       # Cognitive Architecture (44 modules)
│   ├── neural_memory.py         #   Hebbian learning
│   ├── episodic_memory.py       #   Event recording
│   ├── vector_memory.py         #   Semantic search
│   ├── active_inference.py      #   Free Energy Principle
│   ├── curiosity.py             #   Novelty detection
│   ├── dreaming.py              #   Experience replay
│   ├── causal_reasoner.py       #   Pearl's causal hierarchy
│   ├── analogy_engine.py        #   Gentner structure mapping
│   ├── creativity_engine.py     #   Conceptual blending
│   ├── self_awareness.py        #   Consciousness tracking
│   ├── self_model.py            #   Capability awareness
│   ├── theory_of_mind.py        #   User modeling
│   ├── metacognitive_monitor.py #   Thinking quality
│   ├── global_workspace.py      #   Thalamus coordination
│   ├── agi_orchestrator.py      #   Master cognitive loop
│   ├── self_improve_engine.py   #   RLHF-inspired improvement
│   ├── reflexion.py             #   Recursive self-improvement
│   ├── scientific_reasoning.py  #   Multi-pass scientific reasoning
│   ├── discovery_orchestrator.py#   Discovery coordination
│   ├── theory_formation.py      #   Theory engine
│   ├── abstraction_engine.py    #   First principles
│   └── ... (30 more modules)
│
├── scientist/                   # Scientist AI (20 files)
│   ├── discovery_engine.py      #   Full discovery pipeline
│   ├── experiment_designer.py   #   Experiment design
│   ├── paper_generator.py       #   Academic paper generation
│   ├── research_team.py         #   5-role multi-agent debate
│   └── pipeline.py              #   12-phase enhanced research pipeline
│
├── actions/                     # Tool actions (14 files)
├── security/                    # Security (7 files)
├── skills/                      # Skill engine (12 files)
├── agent/                       # Task execution (4 files)
├── agents/scientist/            # 11 research agent personas
├── config/                      # Configuration
├── memory/                      # Persistent memory
└── data/                        # Runtime data + discovery reports

Run RUMI with AI Assistants

RUMI can be driven by any AI agent (Hermes / Claude Code / Codex / Cursor / etc.) that can run shell commands. Same command for all of them.

The Command

cd /path/to/rumi
python run_discovery.py "YOUR TOPIC" --domain DOMAIN_KEY --mode full

Example Prompt (works for any agent)

I have a scientific discovery engine at C:\Users\Admin\Desktop\rumi.
Run a discovery on "anomalous stellar dimming and technosignature
detection" in the space_astronomy domain.

Command: cd C:\Users\Admin\Desktop\rumi && python run_discovery.py "anomalous stellar dimming and technosignature detection" --domain space_astronomy --mode full

Read the JSON report from data/ and summarize the top theories,
discovery score, and key gaps found.

CLI Options

Flag	Default	Description
`--domain`	auto-detect	Domain key (see below)
`--mode`	`full`	`quick` (phases 1-5), `standard` (1-8), `full` (all 16)
`--iterate`	off	Run twice: first pass, analyze weaknesses, second pass, merge
`--skip-refinement`	off	Skip the 13-stage refinement pipeline
`--skip-reflexion`	off	Skip reflexion self-improvement
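
For agents that want to validate arguments before invoking the script, the flags above can be mirrored with `argparse`. This is a reconstruction from the table and the usage examples, not the parser inside `run_discovery.py`; the positional `topic` argument and the help strings are assumptions.

```python
import argparse

def build_parser():
    """Build a parser that mirrors the documented run_discovery.py flags."""
    parser = argparse.ArgumentParser(prog="run_discovery.py")
    parser.add_argument("topic", nargs="?", help="research topic (topic mode)")
    parser.add_argument("--cause", help="observation or question (cause mode)")
    parser.add_argument("--domain", default=None,
                        help="domain key; auto-detected when omitted")
    parser.add_argument("--mode", choices=["quick", "standard", "full"],
                        default="full")
    parser.add_argument("--iterate", action="store_true")
    parser.add_argument("--skip-refinement", action="store_true")
    parser.add_argument("--skip-reflexion", action="store_true")
    return parser

args = build_parser().parse_args(
    ["topological quantum error correction", "--domain", "physics", "--mode", "quick"]
)
```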

Domain Keys

space_astronomy · drug_discovery · physics · neuroscience · molecular_biology · climate_energy · computer_science · earth_science · oceanography · economics · public_health · mathematics · social_science · chemistry · ecology · materials_science · general

What the Agent Gets

Phase-by-phase progress in stdout
JSON report at data/discovery_<topic>_<timestamp>.json
Theories, gaps, anomalies, predictions, experiments, data analysis all in the report
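
An agent can pick up the newest report with a few lines of Python. The sketch assumes reports follow the `data/discovery_<topic>_<timestamp>.json` pattern above and uses file modification time to find the latest one; `latest_report` is a hypothetical helper.

```python
import json
from pathlib import Path

def latest_report(data_dir="data"):
    """Load the most recently modified discovery report, or None if absent."""
    reports = sorted(
        Path(data_dir).glob("discovery_*.json"),
        key=lambda path: path.stat().st_mtime,
    )
    if not reports:
        return None
    with reports[-1].open(encoding="utf-8") as handle:
        return json.load(handle)
```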

Python API

from discovery.discovery_pipeline_v2 import run_discovery_pipeline
result = run_discovery_pipeline("topic", domain="physics", mode="full")
theories = result["phases"]["theory_competition"]["theories"]
experiments = result["phases"].get("experimental_validation", {}).get("plans", [])
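
A result dictionary shaped like the one above can be condensed for a quick summary. The `"score"` key on each theory is an assumption for illustration, since the README does not document the theory schema, and `summarize` is a hypothetical helper.

```python
def summarize(result, top_n=3):
    """Return the top-scoring theories and the number of experiment plans."""
    phases = result.get("phases", {})
    theories = phases.get("theory_competition", {}).get("theories", [])
    plans = phases.get("experimental_validation", {}).get("plans", [])
    # "score" is an assumed field name; theories without it sort last
    ranked = sorted(theories, key=lambda t: t.get("score", 0.0), reverse=True)
    return {"top_theories": ranked[:top_n], "num_experiments": len(plans)}

# Stand-in for a real pipeline result
fake_result = {
    "phases": {
        "theory_competition": {
            "theories": [{"name": "A", "score": 0.4}, {"name": "B", "score": 0.9}]
        }
    }
}
summary = summarize(fake_result, top_n=1)
```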

Contributing

Contributions welcome. Please read CONTRIBUTING.md first.

git clone https://github.com/subhansh-dev/rumi.git
cd rumi
pip install -r requirements.txt
python main.py

License

Built by Subhansh · RUMI v2.2 — 16-Phase Discovery Engine with Cross-Run Memory
