
RFA Factory

Recursive Formal Alignment Factory — Autonomous Vericoding Trajectory Generation & Code Model Training

CI Release License Python 3.10+ Colab T4

Open Data Generation in Colab Open Training in Colab Open Evaluation in Colab


Overview

RFA Factory is an end-to-end pipeline that generates ultra-high-density vericoding trajectories using a Multi-Agent Logic Loop (MALL), then trains a compact code model (1B-3B parameters) designed to be competitive with much larger models on real-world software engineering benchmarks.

The entire pipeline — data generation, training, and evaluation — runs on a single T4 GPU (16 GB VRAM) via Google Colab Pro.

Seed Problem → Architect (formal spec) → Developer (implementation)
    → Verification Loop (Python contracts → Dafny → Lean4)
    → Tester (with LDB runtime tracing) → Security Audit
    → Cross-Language (Rust + C++) → VeRPO Rewards → Trajectory

Each trajectory captures the complete correction process — from specification through multiple failure/critique/repair cycles to verified solution — producing data with dramatically higher information density than standard code datasets.
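As a concrete illustration, a single trajectory might serialize to a JSONL record like the one below. This is a hypothetical sketch: the field names are illustrative, not the actual `TrajectorySerializer` schema.

```python
import json

# Hypothetical trajectory record; field names are illustrative only,
# not the real serializer schema.
trajectory = {
    "seed_id": "rfc-793-sliding-window",
    "spec": "def send(window: int, acked: int) -> int  # post: result <= window",
    "cycles": [
        {"attempt": 1, "verdict": "fail", "critique": "postcondition violated for window=0"},
        {"attempt": 2, "verdict": "pass", "critique": None},
    ],
    "verification_level": "dafny",
}

line = json.dumps(trajectory)  # one JSONL line per trajectory
```

The `cycles` list is what carries the extra information density: every failed attempt and its critique survive into the training data instead of being discarded.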

Quick Start

1-Click Colab (Recommended)

| Notebook | Description | Link |
|---|---|---|
| 01 — Data Generation | Run the full RFA pipeline to generate 50k-150k trajectories | Open In Colab |
| 02 — Training | QLoRA fine-tune + RLVR stage on generated data | Open In Colab |
| 03 — Evaluation | Run all benchmarks automatically | Open In Colab |

Local Installation

git clone https://github.com/Edmon02/rfa-factory.git
cd rfa-factory
python -m venv .venv && source .venv/bin/activate
pip install -e ".[all]"

Or install components separately:

pip install -e ".[llm]"          # LLM providers (Anthropic, OpenAI)
pip install -e ".[verification]" # Hypothesis, sympy
pip install -e ".[training]"     # PyTorch, Unsloth, QLoRA
pip install -e ".[security]"     # Bandit, Semgrep

Configuration

cp .env.example .env
# Edit .env with your API keys
export RFA_LLM_PROVIDER=anthropic
export RFA_LLM_API_KEY=your-key-here
export RFA_LLM_MODEL=claude-sonnet-4-20250514

Python API

import asyncio
from rfa_factory.config import RFAConfig
from rfa_factory.llm import create_llm
from rfa_factory.seeds.miner import KnowledgeMiner
from rfa_factory.topology.mall import MALLOrchestrator
from rfa_factory.trajectory.serializer import TrajectorySerializer

async def main():
    config = RFAConfig()
    llm = create_llm(config.llm)
    miner = KnowledgeMiner(llm)
    seeds = await miner.mine_all()
    orchestrator = MALLOrchestrator(config, llm)
    trajectories = await orchestrator.run_batch(seeds)
    serializer = TrajectorySerializer(config.trajectory_output_dir)
    serializer.save_trajectories(trajectories)

asyncio.run(main())

Architecture

Multi-Agent Logic Loop (MALL)

The MALL orchestrates four specialized agents that simulate a complete software development lifecycle with formal verification:

| Agent | Role | Output |
|---|---|---|
| Architect | Decomposes problems into formal specifications | Pydantic types, pre/postconditions, Dafny/Lean4 specs |
| Developer | Implements code, performs structured repair | Working implementation + repair traces |
| Tester | Generates tests, captures LDB runtime traces | Test suites + variable-state traces at basic blocks |
| Security Auditor | SAST scanning + OWASP review | Vulnerability findings with CWE IDs |
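The control flow through the four agents can be sketched as a simple repair loop. This is a minimal illustration of the MALL idea, assuming stand-in callables; the class and method names are not the real `rfa_factory` API.

```python
# Minimal sketch of the MALL control flow; the agent callables and
# return shapes here are illustrative, not the real orchestrator API.
def run_mall(seed, architect, developer, tester, auditor, max_repairs=3):
    spec = architect(seed)              # formal specification
    code = developer(spec)              # first implementation
    trace = []
    for _ in range(max_repairs):
        report = tester(spec, code)     # tests + runtime traces
        trace.append(report)
        if report["passed"]:
            break
        # feed the critique back for a structured repair attempt
        code = developer(spec, critique=report["critique"])
    findings = auditor(code)            # security audit on final code
    return {"spec": spec, "code": code, "trace": trace, "security": findings}
```

The key property is that every failed `report` stays in `trace`, so the repair history becomes training data rather than being thrown away.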

Verification Stack (3 Silos)

  1. Python Contracts — Pydantic type validation → Hypothesis property testing → sympy symbolic invariants
  2. Dafny — Formal proofs with postcondition checking
  3. Lean4 — Theorem-level formal verification
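The idea behind silo 1 can be shown with a stdlib-only sketch: a decorator that enforces pre/postconditions at call time. The real silo uses Pydantic, Hypothesis, and sympy; this decorator is an assumption-free stand-in for the contract-checking concept only.

```python
import functools

def contract(pre, post):
    """Attach a precondition and postcondition to a function
    (a stdlib-only stand-in for the Pydantic/Hypothesis silo)."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            assert pre(*args), "precondition violated"
            result = fn(*args)
            assert post(result, *args), "postcondition violated"
            return result
        return wrapper
    return decorate

@contract(pre=lambda xs: all(x >= 0 for x in xs),
          post=lambda r, xs: r >= max(xs, default=0))
def total(xs):
    # sum of non-negative numbers is at least the maximum element
    return sum(xs)
```

A property tester like Hypothesis would then hammer `total` with generated inputs, turning any contract violation into a critique for the Developer agent.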

Seed Generation Pipeline

  • Knowledge Miner — 170+ built-in seeds from RFCs, GoF/enterprise patterns, popular APIs, ADR templates
  • OSS-Instruct — 74 curated code snippets across Python, Rust, C++, Go, TypeScript → teacher LLM → novel problems
  • Evol-Instruct — 5 evolution heuristics + Tree-of-Evolution branching with quality gates
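The Evol-Instruct branching can be sketched as a recursive tree of prompt transformations. The heuristics below are illustrative placeholders, not the pipeline's actual five heuristics, and the quality gates are omitted.

```python
import random

# Illustrative evolution heuristics; the real pipeline's five
# heuristics and its quality gates are more elaborate.
HEURISTICS = [
    lambda p: p + " Add input-validation edge cases.",
    lambda p: p + " Require O(n log n) or better.",
    lambda p: p + " Generalize to streaming input.",
]

def evolve(problem, depth, rng):
    """Branch a seed problem into a small tree of evolved variants."""
    if depth == 0:
        return [problem]
    children = []
    for h in rng.sample(HEURISTICS, 2):  # branch factor 2 per level
        children.extend(evolve(h(problem), depth - 1, rng))
    return children

variants = evolve("Implement an LRU cache.", depth=2, rng=random.Random(0))
```

With branch factor 2 and depth 2, one seed yields four harder variants, each carrying the full chain of applied constraints.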

Reward System

  • VeRPO Dense Rewards — 5-dimensional scoring: test pass rate, property satisfaction, proof strength, security compliance, efficiency
  • CodeScaler — Execution-free reward model for pre-filtering (predicts VeRPO score without running verification)
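A VeRPO-style dense reward boils down to a weighted combination of the five dimensions. The weights below are illustrative assumptions, not the values the project actually uses.

```python
# Sketch of a 5-dimensional dense reward in the spirit of VeRPO;
# the weights here are illustrative assumptions.
WEIGHTS = {
    "test_pass_rate": 0.30,
    "property_satisfaction": 0.25,
    "proof_strength": 0.20,
    "security_compliance": 0.15,
    "efficiency": 0.10,
}

def verpo_reward(scores):
    """Combine per-dimension scores in [0, 1] into one scalar reward."""
    assert set(scores) == set(WEIGHTS), "all five dimensions required"
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

r = verpo_reward({
    "test_pass_rate": 1.0,
    "property_satisfaction": 0.8,
    "proof_strength": 0.5,
    "security_compliance": 1.0,
    "efficiency": 0.6,
})
```

Because the reward is dense rather than binary, a trajectory that passes tests but lacks a proof still earns partial credit, which is what GRPO optimizes against in the RLVR stage.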

Cross-Language Alignment

Every verified Python solution is translated to Rust and C++ from the same formal spec, with verification applied to each translation.

Epistemic Provenance

Every trajectory chunk is tagged with source, confidence, verification_level, and a provenance_chain tracking how the data was generated and verified.
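A tag of that shape might look like the following dataclass. This is a hypothetical sketch; the real `epistemic.py` module may use different names and fields.

```python
from dataclasses import dataclass, field

# Hypothetical shape of an epistemic tag; the actual module
# may differ in names and fields.
@dataclass
class EpistemicTag:
    source: str                 # e.g. "teacher_llm", "dafny", "human_seed"
    confidence: float           # 0.0 to 1.0
    verification_level: str     # "contracts" | "dafny" | "lean4"
    provenance_chain: list = field(default_factory=list)

tag = EpistemicTag(source="teacher_llm", confidence=0.9,
                   verification_level="dafny")
tag.provenance_chain.append("seed:rfc-793 -> architect -> developer")
```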

Training Pipeline

Optimized for Google Colab T4 GPU (14 GB VRAM budget):

| Stage | Method | Details |
|---|---|---|
| SFT | Unsloth + QLoRA | 4-bit NF4, rank=64, DoRA, gradient checkpointing, FlashAttention-2 |
| RLVR | GRPO | VeRPO dense rewards, KL-constrained policy optimization |

Base Model

Qwen2.5-Coder-3B-Instruct — selected because it offers the highest logic density per parameter after training on RFA data.

VRAM Budget

| Component | GB |
|---|---|
| Base model (4-bit NF4) | 1.8 |
| LoRA adapters (r=64) | 0.4 |
| Optimizer (AdamW 8-bit) | 0.8 |
| Gradients (checkpointed) | 0.8 |
| Activations (batch=1, seq=16384) | 8.0 |
| KV cache | 1.5 |
| Overhead | 0.7 |
| **Total** | **14.0** |
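The budget can be sanity-checked in a couple of lines; the component figures are the ones from the table above.

```python
# The VRAM budget from the table above, as a quick sanity check
# that the components sum to the 14 GB target.
budget_gb = {
    "base_model_4bit": 1.8,
    "lora_adapters_r64": 0.4,
    "optimizer_adamw_8bit": 0.8,
    "gradients_checkpointed": 0.8,
    "activations_seq16384": 8.0,
    "kv_cache": 1.5,
    "overhead": 0.7,
}
total = round(sum(budget_gb.values()), 1)
```

Activations dominate at long sequence lengths, which is why gradient checkpointing and batch size 1 are non-negotiable on a T4.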

Recursive Self-Improvement

After initial training on 10k trajectories, the trained model replaces the teacher LLM to generate the next 100k+ trajectories:

  1. Load trained model as local vLLM backend
  2. Generate new trajectories using the model as teacher
  3. Pre-filter with CodeScaler reward model (threshold > 0.7)
  4. Full verification only on top candidates
  5. Merge verified trajectories with original dataset
  6. Retrain on combined data
  7. Repeat until convergence
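One round of the loop above can be sketched as follows; `generate`, `score`, `verify`, and `retrain` are stand-in callables for illustration, not the real API.

```python
# Sketch of one self-improvement round; the four callables are
# stand-ins for the real pipeline stages.
def self_improve_round(dataset, generate, score, verify, retrain,
                       threshold=0.7):
    candidates = generate(dataset)
    # CodeScaler-style pre-filter: cheap predicted reward before
    # paying for full verification
    promising = [c for c in candidates if score(c) > threshold]
    # full verification only on the top candidates
    verified = [c for c in promising if verify(c)]
    merged = dataset + verified
    return retrain(merged), merged
```

The pre-filter is what makes the loop affordable: full Dafny/Lean4 verification only runs on candidates CodeScaler already rates above the threshold.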

Target Benchmarks

| Benchmark | Category |
|---|---|
| SWE-bench Verified | Real-world issue resolution |
| SWE-bench Pro | Advanced software engineering |
| CyberGym | Security-focused coding |
| Terminal-Bench 2.0 | Terminal/CLI tasks |
| SWE-bench Multimodal | Visual + code reasoning |
| SWE-bench Multilingual | Multi-language SE |

Project Structure

rfa-factory/
├── rfa_factory/
│   ├── config.py                 # Central configuration (Pydantic)
│   ├── llm/                      # Multi-provider LLM abstraction
│   │   ├── base.py               # BaseLLM, CostTracker, RateLimiter
│   │   ├── anthropic_provider.py
│   │   ├── openai_provider.py
│   │   └── local.py              # vLLM/Ollama
│   ├── agents/                   # Multi-agent system
│   │   ├── base.py               # BaseAgent, AgentMessage, EpistemicTag
│   │   ├── architect.py          # FormalSpec generation
│   │   ├── developer.py          # Implementation + repair
│   │   ├── tester.py             # Test generation + LDB traces
│   │   └── security_auditor.py   # bandit + semgrep + OWASP
│   ├── topology/
│   │   └── mall.py               # Multi-Agent Logic Loop orchestrator
│   ├── seeds/                    # Knowledge mining + evolution
│   │   ├── miner.py              # 170+ built-in seed templates
│   │   ├── oss_instruct.py       # 74 code snippets → novel problems
│   │   └── evol_instruct.py      # 5 heuristics + Tree-of-Evolution
│   ├── verification/             # Formal verification silos
│   │   ├── python_contracts.py   # Pydantic + Hypothesis + sympy
│   │   ├── dafny_verifier.py     # Dafny formal proofs
│   │   ├── lean4_verifier.py     # Lean4 formal proofs
│   │   └── sandbox.py            # Execution sandbox + LDB tracing
│   ├── rewards/
│   │   ├── verpo.py              # VeRPO 5-dimensional dense rewards
│   │   └── codescaler.py         # Execution-free reward model
│   ├── trajectory/
│   │   ├── serializer.py         # JSONL + HuggingFace Dataset export
│   │   └── epistemic.py          # Epistemic provenance tagging
│   ├── crosslang/
│   │   └── translator.py         # Python → Rust/C++ translation
│   └── training/
│       ├── data_loader.py        # SFT/DPO/RLVR dataset preparation
│       ├── trainer.py            # Unsloth QLoRA + GRPO
│       └── evaluator.py          # Benchmark runners
├── notebooks/
│   ├── 01_Data_Generation.ipynb
│   ├── 02_Training.ipynb
│   └── 03_Evaluation.ipynb
├── tests/
├── pyproject.toml
├── requirements.txt
├── CHANGELOG.md
├── CONTRIBUTING.md
├── CODE_OF_CONDUCT.md
└── LICENSE

LLM Provider Support

| Provider | Models | Notes |
|---|---|---|
| Anthropic | Claude Sonnet 4, Claude Opus 4 | Recommended for data generation |
| OpenAI | GPT-4o, GPT-4o-mini | Alternative provider |
| Local | Any vLLM/Ollama model | Free — used in recursive self-improvement |

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Citation

If you use RFA Factory in your research, please cite:

@software{rfa_factory_2026,
  title  = {RFA Factory: Recursive Formal Alignment Factory for Code Model Training},
  author = {Edmon02},
  year   = {2026},
  url    = {https://github.com/Edmon02/rfa-factory}
}

License

This project is licensed under the MIT License — see LICENSE for details.
