This repository validates a Simula-style synthetic data framework before production integration. The goal is to reproduce the core mechanism-design idea of the Simula research line: dataset construction treated as a controllable system with independent axes for coverage, complexity, and quality.
This phase is research-first and validation-focused:
- Build and evaluate the generation mechanism, not a full production platform.
- Verify that decomposition into independent control axes produces measurable gains.
- Establish reproducible experimentation and decision gates for promotion.
The validation pipeline is designed around four generation stages and one evaluation stage:
- Global diversification: build hierarchical taxonomies to map domain coverage.
- Local diversification: produce diverse instantiations within each taxonomy concept.
- Complexification: raise difficulty for a controlled fraction of samples.
- Dual-critic quality checks: independently verify correctness and reject low-quality samples.
- Evaluation: compute coverage, complexity calibration, and quality metrics for run decisions.
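The five stages above can be sketched as a chain of plain functions. This is only an illustrative skeleton, not the repository's implementation: every name here (`Sample`, `build_taxonomy`, `instantiate`, `complexify`, `dual_critic`, `evaluate`, `run`) is hypothetical, the taxonomy is a uniform tree rather than a model-generated one, and the two critics are trivial stand-in predicates where a real run would presumably call model-based critics.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    concept: str               # taxonomy leaf this sample was generated for
    text: str                  # placeholder for the generated content
    complexified: bool = False


def build_taxonomy(domain: str, depth: int, branching: int) -> list[str]:
    """Global diversification: enumerate the leaf paths of a uniform tree."""
    paths = [domain]
    for _ in range(depth):
        paths = [f"{p}/{i}" for p in paths for i in range(branching)]
    return paths


def instantiate(leaves: list[str], per_leaf: int) -> list[Sample]:
    """Local diversification: several instantiations per taxonomy leaf."""
    return [Sample(leaf, f"{leaf}#v{k}") for leaf in leaves for k in range(per_leaf)]


def complexify(samples: list[Sample], fraction: float, rng: random.Random) -> None:
    """Complexification: mark a controlled fraction of samples as harder."""
    k = round(fraction * len(samples))
    for s in rng.sample(samples, k):
        s.complexified = True
        s.text += " [harder]"


def dual_critic(sample: Sample) -> bool:
    """Keep a sample only if two independent stand-in checks both pass."""
    non_empty = bool(sample.text.strip())                 # stand-in critic A
    on_concept = sample.text.startswith(sample.concept)   # stand-in critic B
    return non_empty and on_concept


def evaluate(kept: list[Sample], leaves: list[str], generated: int) -> dict[str, float]:
    """Evaluation: coverage, realized complexity fraction, acceptance rate."""
    covered = {s.concept for s in kept}
    return {
        "coverage": len(covered) / len(leaves) if leaves else 0.0,
        "complex_fraction": sum(s.complexified for s in kept) / len(kept) if kept else 0.0,
        "acceptance_rate": len(kept) / generated if generated else 0.0,
    }


def run(domain: str, depth: int, branching: int, per_leaf: int,
        fraction: float, seed: int) -> dict[str, float]:
    rng = random.Random(seed)  # a seeded RNG keeps the run repeatable
    leaves = build_taxonomy(domain, depth, branching)
    samples = instantiate(leaves, per_leaf)
    complexify(samples, fraction, rng)
    kept = [s for s in samples if dual_critic(s)]
    return evaluate(kept, leaves, generated=len(samples))
```

Calling `run("math", depth=2, branching=3, per_leaf=2, fraction=0.5, seed=0)` builds 9 leaves and 18 samples, then returns the three metrics as a dictionary. Keeping each stage a separate function is what makes the axes independently controllable in this sketch: `depth` and `branching` affect only coverage, `fraction` only complexity, and the critics only quality.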
```mermaid
flowchart TD
    domainObjective[DomainObjective] --> globalDiversification[GlobalDiversificationTaxonomy]
    globalDiversification --> localDiversification[LocalDiversificationMetaPrompts]
    localDiversification --> complexification[ComplexificationStage]
    complexification --> dualCritic[DualCriticQualityChecks]
    dualCritic --> curatedDataset[CuratedSyntheticDataset]
    curatedDataset --> metricsEval[CoverageComplexityQualityEvaluation]
    metricsEval --> validationDecision[ValidationGateDecision]
    validationDecision --> iterationLoop[IterationOrPromotion]
```
Use the structured docs index first:
Domain language anchors:
The exact scripts and commands will be added as code lands. Use this staged flow for the first end-to-end validation cycle:
- Define target domain and taxonomy depth/branching policy.
- Generate taxonomy and inspect node coverage map.
- Generate local instantiations from taxonomy nodes.
- Apply complexification policy to the configured sample fraction.
- Run dual-critic checks and regenerate rejected samples.
- Compute evaluation metrics and compare against baseline/ablations.
- Fill run report template and decide: iterate or promote.
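The "run dual-critic checks and regenerate rejected samples" step in the flow above could look roughly like the bounded retry loop below. This is a sketch under assumptions, since the actual scripts have not landed: `generate_until_accepted`, `curate`, and the `max_attempts` cap are hypothetical names and choices, and the generator and critics are passed in as plain callables so that any model client can be substituted.

```python
from typing import Callable, Optional

Generator = Callable[[str, int], str]   # (taxonomy node, attempt index) -> candidate
Critic = Callable[[str], bool]          # candidate -> accept?


def generate_until_accepted(
    node: str,
    generate: Generator,
    critics: list[Critic],
    max_attempts: int = 3,
) -> tuple[Optional[str], int]:
    """Return (accepted candidate or None, number of attempts used)."""
    for attempt in range(max_attempts):
        candidate = generate(node, attempt)
        # A candidate is kept only if every critic accepts it.
        if all(critic(candidate) for critic in critics):
            return candidate, attempt + 1
    return None, max_attempts


def curate(
    nodes: list[str],
    generate: Generator,
    critics: list[Critic],
    max_attempts: int = 3,
) -> tuple[dict[str, str], list[str]]:
    """Return accepted samples keyed by node, plus nodes that never passed."""
    accepted: dict[str, str] = {}
    dropped: list[str] = []
    for node in nodes:
        sample, _ = generate_until_accepted(node, generate, critics, max_attempts)
        if sample is None:
            dropped.append(node)
        else:
            accepted[node] = sample
    return accepted, dropped
```

Capping retries keeps a node that the generator cannot satisfy from stalling the run, and the `dropped` list makes those coverage holes visible in the run report instead of letting them disappear silently.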
Initial validation is complete when all of the following are true:
- Coverage, complexity, and quality are each measurable with explicit metrics.
- At least one full baseline and one ablation matrix are executed and reported.
- Run artifacts are reproducible from stored config, seed, and model metadata.
- A validation gate decision is made with documented evidence and trade-offs.
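Two of these criteria, reproducibility from stored config, seed, and model metadata, and an evidence-backed gate decision, lend themselves to a small sketch. Everything below is an assumption for illustration: `RunManifest`, `fingerprint`, `gate_decision`, the metric names, and the threshold values are hypothetical and not taken from the repository or the referenced research.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunManifest:
    config: dict   # generation settings, e.g. taxonomy depth and branching
    seed: int      # RNG seed used for the run
    model: str     # model identifier and version string

    def fingerprint(self) -> str:
        """Stable SHA-256 over a canonical JSON encoding of the manifest."""
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def gate_decision(
    metrics: dict[str, float],
    minimums: dict[str, float],
) -> tuple[str, list[str]]:
    """Return ("promote" or "iterate", names of metrics below their minimum)."""
    failing = sorted(
        name
        for name, minimum in minimums.items()
        # A metric that was never measured counts as failing.
        if metrics.get(name, float("-inf")) < minimum
    )
    return ("promote" if not failing else "iterate"), failing
```

For example, `gate_decision({"coverage": 0.9, "quality": 0.7}, {"coverage": 0.8, "quality": 0.8})` returns `("iterate", ["quality"])`. Storing the manifest fingerprint next to each run's artifacts gives a cheap check that two reports really came from the same config, seed, and model, and returning the failing metric names alongside the decision keeps the evidence attached to the gate outcome.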
The primary research reference for this repository is: