
RFA Factory

Recursive Formal Alignment Factory — Autonomous Vericoding Trajectory Generation & Code Model Training

CI Release License Python 3.10+ Colab T4

Open Data Generation in Colab Open Training in Colab Open Evaluation in Colab


Overview

RFA Factory is an end-to-end pipeline that generates ultra-high-density vericoding trajectories using a Multi-Agent Logic Loop (MALL), then trains a compact code model (1B-3B parameters) designed to be competitive with much larger models on real-world software engineering benchmarks.

The entire pipeline — data generation, training, and evaluation — runs on a single T4 GPU (16 GB VRAM) via Google Colab Pro.

Seed Problem → Architect (formal spec) → Developer (implementation)
    → Verification Loop (Python contracts → Dafny → Lean4)
    → Tester (with LDB runtime tracing) → Security Audit
    → Cross-Language (Rust + C++) → VeRPO Rewards → Trajectory

Each trajectory captures the complete correction process — from specification through multiple failure/critique/repair cycles to verified solution — producing data with dramatically higher information density than standard code datasets.
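As a concrete illustration, a single trajectory might serialize to a JSONL record like the one below. This is a hypothetical sketch: the field names are illustrative, not the actual `TrajectorySerializer` schema.

```python
import json

# Hypothetical trajectory record; field names are illustrative only,
# not the real serializer schema.
trajectory = {
    "seed_id": "rfc-793-sliding-window",
    "spec": "def send(window: int, acked: int) -> int  # post: result <= window",
    "cycles": [
        {"attempt": 1, "verdict": "fail", "critique": "postcondition violated for window=0"},
        {"attempt": 2, "verdict": "pass", "critique": None},
    ],
    "verification_level": "dafny",
}

line = json.dumps(trajectory)  # one JSONL line per trajectory
```

The `cycles` list is what carries the extra information density: every failed attempt and its critique survive into the training data instead of being discarded.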

Quick Start

1-Click Colab (Recommended)

| Notebook | Description | Link |
|---|---|---|
| 01 — Data Generation | Run the full RFA pipeline to generate 50k-150k trajectories | Open In Colab |
| 02 — Training | QLoRA fine-tune + RLVR stage on generated data | Open In Colab |
| 03 — Evaluation | Run all benchmarks automatically | Open In Colab |

Local Installation

git clone https://github.com/Edmon02/rfa-factory.git
cd rfa-factory
python -m venv .venv && source .venv/bin/activate
pip install -e ".[all]"

Or install components separately:

pip install -e ".[llm]"          # LLM providers (Anthropic, OpenAI)
pip install -e ".[verification]" # Hypothesis, sympy
pip install -e ".[training]"     # PyTorch, Unsloth, QLoRA
pip install -e ".[security]"     # Bandit, Semgrep

Configuration

cp .env.example .env
# Edit .env with your API keys
export RFA_LLM_PROVIDER=anthropic
export RFA_LLM_API_KEY=your-key-here
export RFA_LLM_MODEL=claude-sonnet-4-20250514

Python API

import asyncio
from rfa_factory.config import RFAConfig
from rfa_factory.llm import create_llm
from rfa_factory.seeds.miner import KnowledgeMiner
from rfa_factory.topology.mall import MALLOrchestrator
from rfa_factory.trajectory.serializer import TrajectorySerializer

async def main():
    config = RFAConfig()
    llm = create_llm(config.llm)
    miner = KnowledgeMiner(llm)
    seeds = await miner.mine_all()
    orchestrator = MALLOrchestrator(config, llm)
    trajectories = await orchestrator.run_batch(seeds)
    serializer = TrajectorySerializer(config.trajectory_output_dir)
    serializer.save_trajectories(trajectories)

asyncio.run(main())

Architecture

Multi-Agent Logic Loop (MALL)

The MALL orchestrates four specialized agents that simulate a complete software development lifecycle with formal verification:

| Agent | Role | Output |
|---|---|---|
| Architect | Decomposes problems into formal specifications | Pydantic types, pre/postconditions, Dafny/Lean4 specs |
| Developer | Implements code, performs structured repair | Working implementation + repair traces |
| Tester | Generates tests, captures LDB runtime traces | Test suites + variable-state traces at basic blocks |
| Security Auditor | SAST scanning + OWASP review | Vulnerability findings with CWE IDs |
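The control flow through the four agents can be sketched as a simple repair loop. This is a minimal illustration of the MALL idea, assuming stand-in callables; the class and method names are not the real `rfa_factory` API.

```python
# Minimal sketch of the MALL control flow; the agent callables and
# return shapes here are illustrative, not the real orchestrator API.
def run_mall(seed, architect, developer, tester, auditor, max_repairs=3):
    spec = architect(seed)              # formal specification
    code = developer(spec)              # first implementation
    trace = []
    for _ in range(max_repairs):
        report = tester(spec, code)     # tests + runtime traces
        trace.append(report)
        if report["passed"]:
            break
        # feed the critique back for a structured repair attempt
        code = developer(spec, critique=report["critique"])
    findings = auditor(code)            # security audit on final code
    return {"spec": spec, "code": code, "trace": trace, "security": findings}
```

The key property is that every failed `report` stays in `trace`, so the repair history becomes training data rather than being thrown away.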

Verification Stack (3 Silos)

  1. Python Contracts — Pydantic type validation → Hypothesis property testing → sympy symbolic invariants
  2. Dafny — Formal proofs with postcondition checking
  3. Lean4 — Theorem-level formal verification
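The idea behind silo 1 can be shown with a stdlib-only sketch: a decorator that enforces pre/postconditions at call time. The real silo uses Pydantic, Hypothesis, and sympy; this decorator is an assumption-free stand-in for the contract-checking concept only.

```python
import functools

def contract(pre, post):
    """Attach a precondition and postcondition to a function
    (a stdlib-only stand-in for the Pydantic/Hypothesis silo)."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            assert pre(*args), "precondition violated"
            result = fn(*args)
            assert post(result, *args), "postcondition violated"
            return result
        return wrapper
    return decorate

@contract(pre=lambda xs: all(x >= 0 for x in xs),
          post=lambda r, xs: r >= max(xs, default=0))
def total(xs):
    # sum of non-negative numbers is at least the maximum element
    return sum(xs)
```

A property tester like Hypothesis would then hammer `total` with generated inputs, turning any contract violation into a critique for the Developer agent.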

Seed Generation Pipeline

  • Knowledge Miner — 170+ built-in seeds from RFCs, GoF/enterprise patterns, popular APIs, ADR templates
  • OSS-Instruct — 74 curated code snippets across Python, Rust, C++, Go, TypeScript → teacher LLM → novel problems
  • Evol-Instruct — 5 evolution heuristics + Tree-of-Evolution branching with quality gates
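The Evol-Instruct branching can be sketched as a recursive tree of prompt transformations. The heuristics below are illustrative placeholders, not the pipeline's actual five heuristics, and the quality gates are omitted.

```python
import random

# Illustrative evolution heuristics; the real pipeline's five
# heuristics and its quality gates are more elaborate.
HEURISTICS = [
    lambda p: p + " Add input-validation edge cases.",
    lambda p: p + " Require O(n log n) or better.",
    lambda p: p + " Generalize to streaming input.",
]

def evolve(problem, depth, rng):
    """Branch a seed problem into a small tree of evolved variants."""
    if depth == 0:
        return [problem]
    children = []
    for h in rng.sample(HEURISTICS, 2):  # branch factor 2 per level
        children.extend(evolve(h(problem), depth - 1, rng))
    return children

variants = evolve("Implement an LRU cache.", depth=2, rng=random.Random(0))
```

With branch factor 2 and depth 2, one seed yields four harder variants, each carrying the full chain of applied constraints.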

Reward System

  • VeRPO Dense Rewards — 5-dimensional scoring: test pass rate, property satisfaction, proof strength, security compliance, efficiency
  • CodeScaler — Execution-free reward model for pre-filtering (predicts VeRPO score without running verification)
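A VeRPO-style dense reward boils down to a weighted combination of the five dimensions. The weights below are illustrative assumptions, not the values the project actually uses.

```python
# Sketch of a 5-dimensional dense reward in the spirit of VeRPO;
# the weights here are illustrative assumptions.
WEIGHTS = {
    "test_pass_rate": 0.30,
    "property_satisfaction": 0.25,
    "proof_strength": 0.20,
    "security_compliance": 0.15,
    "efficiency": 0.10,
}

def verpo_reward(scores):
    """Combine per-dimension scores in [0, 1] into one scalar reward."""
    assert set(scores) == set(WEIGHTS), "all five dimensions required"
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

r = verpo_reward({
    "test_pass_rate": 1.0,
    "property_satisfaction": 0.8,
    "proof_strength": 0.5,
    "security_compliance": 1.0,
    "efficiency": 0.6,
})
```

Because the reward is dense rather than binary, a trajectory that passes tests but lacks a proof still earns partial credit, which is what GRPO optimizes against in the RLVR stage.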

Cross-Language Alignment

Every verified Python solution is translated to Rust and C++ from the same formal spec, with verification applied to each translation.

Epistemic Provenance

Every trajectory chunk is tagged with source, confidence, verification_level, and a provenance_chain tracking how the data was generated and verified.
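A tag of that shape might look like the following dataclass. This is a hypothetical sketch; the real `epistemic.py` module may use different names and fields.

```python
from dataclasses import dataclass, field

# Hypothetical shape of an epistemic tag; the actual module
# may differ in names and fields.
@dataclass
class EpistemicTag:
    source: str                 # e.g. "teacher_llm", "dafny", "human_seed"
    confidence: float           # 0.0 to 1.0
    verification_level: str     # "contracts" | "dafny" | "lean4"
    provenance_chain: list = field(default_factory=list)

tag = EpistemicTag(source="teacher_llm", confidence=0.9,
                   verification_level="dafny")
tag.provenance_chain.append("seed:rfc-793 -> architect -> developer")
```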

Training Pipeline

Optimized for Google Colab T4 GPU (14 GB VRAM budget):

| Stage | Method | Details |
|---|---|---|
| SFT | Unsloth + QLoRA | 4-bit NF4, rank=64, DoRA, gradient checkpointing, FlashAttention-2 |
| RLVR | GRPO | VeRPO dense rewards, KL-constrained policy optimization |

Base Model

Qwen2.5-Coder-3B-Instruct — selected because it offers the highest logic density per parameter after training on RFA data.

VRAM Budget

| Component | GB |
|---|---|
| Base model (4-bit NF4) | 1.8 |
| LoRA adapters (r=64) | 0.4 |
| Optimizer (AdamW 8-bit) | 0.8 |
| Gradients (checkpointed) | 0.8 |
| Activations (batch=1, seq=16384) | 8.0 |
| KV cache | 1.5 |
| Overhead | 0.7 |
| **Total** | **14.0** |
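The budget can be sanity-checked in a couple of lines; the component figures are the ones from the table above.

```python
# The VRAM budget from the table above, as a quick sanity check
# that the components sum to the 14 GB target.
budget_gb = {
    "base_model_4bit": 1.8,
    "lora_adapters_r64": 0.4,
    "optimizer_adamw_8bit": 0.8,
    "gradients_checkpointed": 0.8,
    "activations_seq16384": 8.0,
    "kv_cache": 1.5,
    "overhead": 0.7,
}
total = round(sum(budget_gb.values()), 1)
```

Activations dominate at long sequence lengths, which is why gradient checkpointing and batch size 1 are non-negotiable on a T4.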

Recursive Self-Improvement

After initial training on 10k trajectories, the trained model replaces the teacher LLM to generate the next 100k+ trajectories:

  1. Load trained model as local vLLM backend
  2. Generate new trajectories using the model as teacher
  3. Pre-filter with CodeScaler reward model (threshold > 0.7)
  4. Full verification only on top candidates
  5. Merge verified trajectories with original dataset
  6. Retrain on combined data
  7. Repeat until convergence
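One round of the loop above can be sketched as follows; `generate`, `score`, `verify`, and `retrain` are stand-in callables for illustration, not the real API.

```python
# Sketch of one self-improvement round; the four callables are
# stand-ins for the real pipeline stages.
def self_improve_round(dataset, generate, score, verify, retrain,
                       threshold=0.7):
    candidates = generate(dataset)
    # CodeScaler-style pre-filter: cheap predicted reward before
    # paying for full verification
    promising = [c for c in candidates if score(c) > threshold]
    # full verification only on the top candidates
    verified = [c for c in promising if verify(c)]
    merged = dataset + verified
    return retrain(merged), merged
```

The pre-filter is what makes the loop affordable: full Dafny/Lean4 verification only runs on candidates CodeScaler already rates above the threshold.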

Target Benchmarks

| Benchmark | Category |
|---|---|
| SWE-bench Verified | Real-world issue resolution |
| SWE-bench Pro | Advanced software engineering |
| CyberGym | Security-focused coding |
| Terminal-Bench 2.0 | Terminal/CLI tasks |
| SWE-bench Multimodal | Visual + code reasoning |
| SWE-bench Multilingual | Multi-language SE |

Project Structure

rfa-factory/
├── rfa_factory/
│   ├── config.py                 # Central configuration (Pydantic)
│   ├── llm/                      # Multi-provider LLM abstraction
│   │   ├── base.py               # BaseLLM, CostTracker, RateLimiter
│   │   ├── anthropic_provider.py
│   │   ├── openai_provider.py
│   │   └── local.py              # vLLM/Ollama
│   ├── agents/                   # Multi-agent system
│   │   ├── base.py               # BaseAgent, AgentMessage, EpistemicTag
│   │   ├── architect.py          # FormalSpec generation
│   │   ├── developer.py          # Implementation + repair
│   │   ├── tester.py             # Test generation + LDB traces
│   │   └── security_auditor.py   # bandit + semgrep + OWASP
│   ├── topology/
│   │   └── mall.py               # Multi-Agent Logic Loop orchestrator
│   ├── seeds/                    # Knowledge mining + evolution
│   │   ├── miner.py              # 170+ built-in seed templates
│   │   ├── oss_instruct.py       # 74 code snippets → novel problems
│   │   └── evol_instruct.py      # 5 heuristics + Tree-of-Evolution
│   ├── verification/             # Formal verification silos
│   │   ├── python_contracts.py   # Pydantic + Hypothesis + sympy
│   │   ├── dafny_verifier.py     # Dafny formal proofs
│   │   ├── lean4_verifier.py     # Lean4 formal proofs
│   │   └── sandbox.py            # Execution sandbox + LDB tracing
│   ├── rewards/
│   │   ├── verpo.py              # VeRPO 5-dimensional dense rewards
│   │   └── codescaler.py         # Execution-free reward model
│   ├── trajectory/
│   │   ├── serializer.py         # JSONL + HuggingFace Dataset export
│   │   └── epistemic.py          # Epistemic provenance tagging
│   ├── crosslang/
│   │   └── translator.py         # Python → Rust/C++ translation
│   └── training/
│       ├── data_loader.py        # SFT/DPO/RLVR dataset preparation
│       ├── trainer.py            # Unsloth QLoRA + GRPO
│       └── evaluator.py          # Benchmark runners
├── notebooks/
│   ├── 01_Data_Generation.ipynb
│   ├── 02_Training.ipynb
│   └── 03_Evaluation.ipynb
├── tests/
├── pyproject.toml
├── requirements.txt
├── CHANGELOG.md
├── CONTRIBUTING.md
├── CODE_OF_CONDUCT.md
└── LICENSE

LLM Provider Support

| Provider | Models | Notes |
|---|---|---|
| Anthropic | Claude Sonnet 4, Claude Opus 4 | Recommended for data generation |
| OpenAI | GPT-4o, GPT-4o-mini | Alternative provider |
| Local | Any vLLM/Ollama model | Free — used in recursive self-improvement |

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Citation

If you use RFA Factory in your research, please cite:

@software{rfa_factory_2026,
  title  = {RFA Factory: Recursive Formal Alignment Factory for Code Model Training},
  author = {Edmon02},
  year   = {2026},
  url    = {https://github.com/Edmon02/rfa-factory}
}

License

This project is licensed under the MIT License — see LICENSE for details.
