Recursive Formal Alignment Factory — Autonomous Vericoding Trajectory Generation & Code Model Training
RFA Factory is an end-to-end pipeline that generates ultra-high-density vericoding trajectories using a Multi-Agent Logic Loop (MALL) and then trains a small but powerful code model (1B-3B parameters) that is competitive with much larger models on real-world software engineering benchmarks.
The entire pipeline — data generation, training, and evaluation — runs on a single T4 GPU (16 GB VRAM) via Google Colab Pro.
```
Seed Problem → Architect (formal spec) → Developer (implementation)
             → Verification Loop (Python contracts → Dafny → Lean4)
             → Tester (with LDB runtime tracing) → Security Audit
             → Cross-Language (Rust + C++) → VeRPO Rewards → Trajectory
```
Each trajectory captures the complete correction process — from specification through multiple failure/critique/repair cycles to verified solution — producing data with dramatically higher information density than standard code datasets.
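For illustration, a single trajectory record might look roughly like this (a hypothetical sketch; the field names are illustrative, not the serializer's exact schema):

```python
# Hypothetical trajectory record (illustrative field names, not the exact schema)
trajectory = {
    "seed": "Return the minimum element of a non-empty list",
    "formal_spec": {"preconditions": ["len(xs) > 0"], "postconditions": ["result in xs"]},
    "attempts": [
        {"code": "...", "verdict": "fail", "critique": "postcondition violated on []"},
        {"code": "...", "verdict": "fail", "critique": "Hypothesis found a counterexample"},
        {"code": "...", "verdict": "pass"},
    ],
    "verification": {"contracts": True, "dafny": True, "lean4": False},
    "reward": 0.87,  # VeRPO dense score for the final solution
}
```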
To get started, clone the repo and install:

```bash
git clone https://github.com/Edmon02/rfa-factory.git
cd rfa-factory
python -m venv .venv && source .venv/bin/activate
pip install -e ".[all]"
```

Or install components separately:

```bash
pip install -e ".[llm]"           # LLM providers (Anthropic, OpenAI)
pip install -e ".[verification]"  # Hypothesis, sympy
pip install -e ".[training]"      # PyTorch, Unsloth, QLoRA
pip install -e ".[security]"      # Bandit, Semgrep
```

Configure credentials via the env file or environment variables:

```bash
cp .env.example .env
# Edit .env with your API keys
export RFA_LLM_PROVIDER=anthropic
export RFA_LLM_API_KEY=your-key-here
export RFA_LLM_MODEL=claude-sonnet-4-20250514
```

Then run the pipeline:

```python
import asyncio

from rfa_factory.config import RFAConfig
from rfa_factory.llm import create_llm
from rfa_factory.seeds.miner import KnowledgeMiner
from rfa_factory.topology.mall import MALLOrchestrator
from rfa_factory.trajectory.serializer import TrajectorySerializer


async def main():
    config = RFAConfig()
    llm = create_llm(config.llm)

    # Mine seed problems and run the full MALL pipeline over them
    miner = KnowledgeMiner(llm)
    seeds = await miner.mine_all()

    orchestrator = MALLOrchestrator(config, llm)
    trajectories = await orchestrator.run_batch(seeds)

    # Export trajectories as JSONL / HuggingFace Dataset
    serializer = TrajectorySerializer(config.trajectory_output_dir)
    serializer.save_trajectories(trajectories)


asyncio.run(main())
```

The MALL orchestrates four specialized agents that simulate a complete software development lifecycle with formal verification:
| Agent | Role | Output |
|---|---|---|
| Architect | Decomposes problems into formal specifications | Pydantic types, pre/postconditions, Dafny/Lean4 specs |
| Developer | Implements code, performs structured repair | Working implementation + repair traces |
| Tester | Generates tests, captures LDB runtime traces | Test suites + variable-state traces at basic blocks |
| Security Auditor | SAST scanning + OWASP review | Vulnerability findings with CWE IDs |
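For orientation, the Architect's output might look roughly like the following (a hypothetical Pydantic sketch; the repo's actual model lives in agents/architect.py and may differ):

```python
from pydantic import BaseModel, Field

# Hypothetical shape of an Architect specification (illustrative, not the repo's exact model)
class FormalSpec(BaseModel):
    problem: str = Field(description="Natural-language problem statement")
    preconditions: list[str]        # e.g. ["len(xs) > 0"]
    postconditions: list[str]       # e.g. ["result in xs"]
    dafny_spec: str | None = None   # optional Dafny requires/ensures block
    lean4_spec: str | None = None   # optional Lean4 theorem statement

spec = FormalSpec(
    problem="Return the minimum element of a non-empty list",
    preconditions=["len(xs) > 0"],
    postconditions=["result in xs", "all(result <= x for x in xs)"],
)
```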
- Python Contracts — Pydantic type validation → Hypothesis property testing → sympy symbolic invariants (a minimal example follows this list)
- Dafny — Formal proofs with postcondition checking
- Lean4 — Theorem-level formal verification
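To make the contracts tier concrete, here is a minimal, self-contained example of the kind of property check that silo performs (standard Hypothesis usage; the silo's real orchestration lives in verification/python_contracts.py):

```python
from hypothesis import given, strategies as st

def min_element(xs: list[int]) -> int:
    """Candidate implementation under test."""
    m = xs[0]
    for x in xs[1:]:
        if x < m:
            m = x
    return m

# Property test derived from the spec's postconditions: the result is a
# member of the input and is <= every element.
@given(st.lists(st.integers(), min_size=1))
def test_min_element_postconditions(xs):
    result = min_element(xs)
    assert result in xs
    assert all(result <= x for x in xs)
```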
- Knowledge Miner — 170+ built-in seeds from RFCs, GoF/enterprise patterns, popular APIs, ADR templates
- OSS-Instruct — 74 curated code snippets across Python, Rust, C++, Go, TypeScript → teacher LLM → novel problems
- Evol-Instruct — 5 evolution heuristics + Tree-of-Evolution branching with quality gates
- VeRPO Dense Rewards — 5-dimensional scoring: test pass rate, property satisfaction, proof strength, security compliance, efficiency (an aggregation sketch follows this list)
- CodeScaler — Execution-free reward model for pre-filtering (predicts VeRPO score without running verification)
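A minimal sketch of how the five VeRPO dimensions could be combined into one dense reward (the weights and encodings here are illustrative assumptions; the real scoring lives in rewards/verpo.py):

```python
from dataclasses import dataclass

# Illustrative weights; the actual values are defined in rewards/verpo.py
WEIGHTS = {
    "test_pass_rate": 0.30,
    "property_satisfaction": 0.25,
    "proof_strength": 0.20,
    "security_compliance": 0.15,
    "efficiency": 0.10,
}

@dataclass
class VeRPOScores:
    test_pass_rate: float         # fraction of generated tests passing
    property_satisfaction: float  # fraction of Hypothesis properties holding
    proof_strength: float         # e.g. 0 = none, 0.5 = Dafny, 1.0 = Lean4
    security_compliance: float    # 1 - normalized SAST finding severity
    efficiency: float             # runtime/memory relative to a reference

def verpo_reward(s: VeRPOScores) -> float:
    return sum(w * getattr(s, k) for k, w in WEIGHTS.items())
```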
Every verified Python solution is translated to Rust and C++ from the same formal spec, with verification applied to each translation.
Every trajectory chunk is tagged with source, confidence, verification_level, and a provenance_chain tracking how the data was generated and verified.
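A hypothetical example of such a tagged chunk (field values and naming are illustrative):

```python
# Hypothetical epistemic metadata for one trajectory chunk (illustrative values)
chunk_metadata = {
    "source": "oss_instruct",          # where the seed problem came from
    "confidence": 0.92,                # generator's epistemic confidence
    "verification_level": "dafny",     # strongest silo this chunk passed
    "provenance_chain": [
        "seed:oss_instruct/rust_snippet",
        "agent:architect/formal_spec",
        "agent:developer/repair_iter_2",
        "verify:python_contracts:pass",
        "verify:dafny:pass",
        "crosslang:rust:verified",     # cross-language translation also checked
    ],
}
```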
Training is optimized for the Google Colab T4 GPU (a 16 GB card, budgeted at 14 GB VRAM):
| Stage | Method | Details |
|---|---|---|
| SFT | Unsloth + QLoRA | 4-bit NF4, rank=64, DoRA, gradient checkpointing, FlashAttention-2 |
| RLVR | GRPO | VeRPO dense rewards, KL-constrained policy optimization |
The base model is Qwen2.5-Coder-3B-Instruct, selected for delivering the highest logic density per parameter after training on RFA data.
| Component | Memory (GB) |
|---|---|
| Base model (4-bit NF4) | 1.8 |
| LoRA adapters (r=64) | 0.4 |
| Optimizer (AdamW 8-bit) | 0.8 |
| Gradients (checkpointed) | 0.8 |
| Activations (batch=1, seq=16384) | 8.0 |
| KV cache | 1.5 |
| Overhead | 0.7 |
| Total | 14.0 |
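For orientation, the settings in the tables above correspond roughly to the following sketch using Hugging Face transformers and peft (the repo's trainer wraps Unsloth, and the exact hyperparameters may differ):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NF4: ~1.8 GB for a 3B model
    bnb_4bit_compute_dtype=torch.float16,  # T4 has no bfloat16 support
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-3B-Instruct", quantization_config=bnb, device_map="auto"
)
model.gradient_checkpointing_enable()      # trade compute for activation memory

lora = LoraConfig(
    r=64, lora_alpha=128, use_dora=True,   # rank-64 adapters with DoRA
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
```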
After initial training on 10k trajectories, the trained model replaces the teacher LLM to generate the next 100k+ (the loop is sketched in pseudocode after this list):
- Load trained model as local vLLM backend
- Generate new trajectories using the model as teacher
- Pre-filter with CodeScaler reward model (threshold > 0.7)
- Full verification only on top candidates
- Merge verified trajectories with original dataset
- Retrain on combined data
- Repeat until convergence
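In Python-flavored pseudocode, the loop looks roughly like this (all function names are placeholders, not the repo's API):

```python
# Sketch of the recursive self-improvement loop (placeholder functions throughout)
dataset = load_initial_trajectories()          # the original 10k verified trajectories

while not converged(dataset):
    teacher = load_local_vllm(trained_model)   # trained model becomes the teacher
    candidates = generate_trajectories(teacher, n=100_000)

    # Cheap pre-filter: CodeScaler predicts VeRPO scores without execution
    promising = [c for c in candidates if codescaler.predict(c) > 0.7]

    # Run expensive formal verification only on the surviving candidates
    verified = [c for c in promising if verify(c)]

    dataset = merge(dataset, verified)
    trained_model = retrain(trained_model, dataset)
```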
The trained model is evaluated on the following benchmarks:

| Benchmark | Category |
|---|---|
| SWE-bench Verified | Real-world issue resolution |
| SWE-bench Pro | Advanced software engineering |
| CyberGym | Security-focused coding |
| Terminal-Bench 2.0 | Terminal/CLI tasks |
| SWE-bench Multimodal | Visual + code reasoning |
| SWE-bench Multilingual | Multi-language SE |
```
rfa_factory/
├── rfa_factory/
│   ├── config.py                  # Central configuration (Pydantic)
│   ├── llm/                       # Multi-provider LLM abstraction
│   │   ├── base.py                # BaseLLM, CostTracker, RateLimiter
│   │   ├── anthropic_provider.py
│   │   ├── openai_provider.py
│   │   └── local.py               # vLLM/Ollama
│   ├── agents/                    # Multi-agent system
│   │   ├── base.py                # BaseAgent, AgentMessage, EpistemicTag
│   │   ├── architect.py           # FormalSpec generation
│   │   ├── developer.py           # Implementation + repair
│   │   ├── tester.py              # Test generation + LDB traces
│   │   └── security_auditor.py    # bandit + semgrep + OWASP
│   ├── topology/
│   │   └── mall.py                # Multi-Agent Logic Loop orchestrator
│   ├── seeds/                     # Knowledge mining + evolution
│   │   ├── miner.py               # 170+ built-in seed templates
│   │   ├── oss_instruct.py        # 74 code snippets → novel problems
│   │   └── evol_instruct.py       # 5 heuristics + Tree-of-Evolution
│   ├── verification/              # Formal verification silos
│   │   ├── python_contracts.py    # Pydantic + Hypothesis + sympy
│   │   ├── dafny_verifier.py      # Dafny formal proofs
│   │   ├── lean4_verifier.py      # Lean4 formal proofs
│   │   └── sandbox.py             # Execution sandbox + LDB tracing
│   ├── rewards/
│   │   ├── verpo.py               # VeRPO 5-dimensional dense rewards
│   │   └── codescaler.py          # Execution-free reward model
│   ├── trajectory/
│   │   ├── serializer.py          # JSONL + HuggingFace Dataset export
│   │   └── epistemic.py           # Epistemic provenance tagging
│   ├── crosslang/
│   │   └── translator.py          # Python → Rust/C++ translation
│   └── training/
│       ├── data_loader.py         # SFT/DPO/RLVR dataset preparation
│       ├── trainer.py             # Unsloth QLoRA + GRPO
│       └── evaluator.py           # Benchmark runners
├── notebooks/
│   ├── 01_Data_Generation.ipynb
│   ├── 02_Training.ipynb
│   └── 03_Evaluation.ipynb
├── tests/
├── pyproject.toml
├── requirements.txt
├── CHANGELOG.md
├── CONTRIBUTING.md
├── CODE_OF_CONDUCT.md
└── LICENSE
```
Supported LLM providers:

| Provider | Models | Notes |
|---|---|---|
| Anthropic | Claude Sonnet 4, Claude Opus 4 | Recommended for data generation |
| OpenAI | GPT-4o, GPT-4o-mini | Alternative provider |
| Local | Any vLLM/Ollama model | Free — used in recursive self-improvement |
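For the recursive self-improvement stage, switching to the local backend might look like this (an assumption based on the variable pattern above; check .env.example for the authoritative names):

```bash
# Assumed configuration; see .env.example for the exact variable names
export RFA_LLM_PROVIDER=local
export RFA_LLM_MODEL=./checkpoints/rfa-qwen2.5-coder-3b   # path to the trained model
```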
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
If you use RFA Factory in your research, please cite:
```bibtex
@software{rfa_factory_2026,
  title  = {RFA Factory: Recursive Formal Alignment Factory for Code Model Training},
  author = {Edmon02},
  year   = {2026},
  url    = {https://github.com/Edmon02/rfa-factory}
}
```

This project is licensed under the MIT License — see LICENSE for details.