
S4b: fair MO cross-library harness — shared ported SBX + aligned NSGA-II (milestone-12a)#16

Merged: CooperBigFoot merged 2 commits into main from milestone-12a/s4b-mo-harness on Jun 26, 2026.


@CooperBigFoot (Contributor)

s4b-mo-harness (milestone-12a) — fair MO cross-library harness

Compares ctrl-freak nsga2, pymoo NSGA2, and DEAP (selNSGA2), all configured to run the IDENTICAL algorithm. The harness returns normalized raw output (the objectives of the extracted NON-DOMINATED front plus per-generation history) and computes NO metrics; that is s5's job.

Files

  • benchmarks/harness/multi_objective.py (NEW): the three NSGA-II adapters + run_mo. Imports the shared ported SBX from benchmarks.harness.operators (created by s4a — read-only here).
  • tests/benchmarks/test_mo_harness.py (NEW).

Alignment (Option C — see PLAN.md decision)

  • SBX: all three use ctrl-freak's exact SBX. pymoo runs the ported operator via make_pymoo_ctrl_freak_sbx (a custom pymoo Crossover), DEAP via deap_ctrl_freak_sbx (as the DEAP mate), and ctrl-freak uses its native sbx_crossover. Identical crossover means the comparison isolates the NSGA-II loop.
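  • PM: every offspring is mutated with per-variable probability 1/n_var, as in ctrl-freak: pymoo uses PM(prob=1.0, prob_var=1/n_var) and DEAP applies mutPolynomialBounded(indpb=1/n_var) to every offspring. (The legacy per-individual setting PM(prob=1/n_var), which mutated only ≈3% of offspring, was the real cause of the old "pymoo ZDT3 scatter"; now fixed.)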
  • Eval budget identical: exactly 25,100 evaluate() calls for all three (100 initial + 250 generations × 100 offspring), verified by an independent counter. pymoo is driven through its object-oriented interface (setup / NoTermination / next()) to fix the gen-1 off-by-one.
  • Front extraction: all three use a shared non_dominated_sort == 0 (first-front) mask, so every reported front is non-dominated; ZDT3 is clean.
  • The DEAP creator uses distinct FitnessMinMO/IndividualMO classes (weights sized per n_obj), which coexist with s4a's single-objective classes.
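For readers less familiar with SBX, here is a minimal NumPy sketch of textbook simulated binary crossover on one parent pair: Deb and Agrawal's unbounded spread distribution, followed by clipping into the bounds. It is an illustration under stated assumptions, not ctrl-freak's ported operator. The name `sbx_pair`, the `eta=15.0` default, and the clip-to-bounds step are hypothetical choices; ctrl-freak's sbx_crossover may handle bounds and per-variable probabilities differently.

```python
import numpy as np


def sbx_pair(p1, p2, lower, upper, eta=15.0, rng=None):
    """Textbook SBX on one parent pair (hypothetical helper, not ctrl-freak's API).

    Draws one spread factor beta per variable from the SBX distribution,
    builds two children placed symmetrically about the parents' midpoint,
    then clips both children into [lower, upper] (an assumed bound policy).
    """
    rng = np.random.default_rng() if rng is None else rng
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    u = rng.random(p1.shape)  # uniform in [0, 1), so 1 - u is never zero
    # Spread factor: beta <= 1 contracts toward the parents, beta > 1 expands.
    beta = np.where(
        u <= 0.5,
        (2.0 * u) ** (1.0 / (eta + 1.0)),
        (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0)),
    )
    c1 = 0.5 * ((1.0 + beta) * p1 + (1.0 - beta) * p2)
    c2 = 0.5 * ((1.0 - beta) * p1 + (1.0 + beta) * p2)
    # Before clipping, the children preserve the parents' sum: c1 + c2 == p1 + p2.
    return np.clip(c1, lower, upper), np.clip(c2, lower, upper)
```

A larger `eta` concentrates children closer to their parents. Because every library's stock SBX makes its own choices about `eta`, bounds, and per-variable probability, routing all three through one shared function removes crossover as a variable.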

Parity payoff (IGD+, seeds 0–2 median, full 25,100 budget)

problem   ctrl-freak   pymoo    DEAP
zdt1      0.0060       0.0060   0.0059
zdt2      0.0056       0.0061   0.0056
zdt3      0.0032       0.0033   0.0030
dtlz2     0.0562       0.0582   0.0540

With the identical ported SBX, the three coincide within seed noise on the convex/standard fronts (the ~70% gap seen under stock-0.5 SBX is gone). zdt4 and zdt6 remain far from the front and high-variance for all three (multimodal, n=30); s5's 30-seed overlapping-variance test will adjudicate those.
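The table reports IGD+, which s5 computes (this harness does not). As a reminder of what those numbers measure, here is a minimal NumPy sketch of IGD+ for minimization. The name `igd_plus` and the sampled ZDT1 reference front in the usage below are assumptions for illustration, not s5's implementation. Each reference point z is matched to its nearest approximation point under the modified distance d+(z, a) = ||max(a - z, 0)||, so a point is only penalized where it is worse than the reference. Lower is better.

```python
import numpy as np


def igd_plus(front, reference):
    """IGD+ for minimization (hypothetical helper, not s5's code).

    For each reference point z, take the smallest modified distance
    d+(z, a) = ||max(a - z, 0)|| over the approximation points a,
    then average over all reference points.
    """
    A = np.asarray(front, dtype=float)      # shape (n_points, n_obj)
    Z = np.asarray(reference, dtype=float)  # shape (n_ref, n_obj)
    # diff[i, j, k] = max(A[j, k] - Z[i, k], 0): only components where a is worse count.
    diff = np.maximum(A[None, :, :] - Z[:, None, :], 0.0)
    d_plus = np.sqrt((diff ** 2).sum(axis=2))  # shape (n_ref, n_points)
    return d_plus.min(axis=1).mean()
```

For example, with a reference built from ZDT1's true front (f2 = 1 - sqrt(f1)), the reference scores exactly 0, a copy shifted up by 0.1 in f2 scores at most 0.1, and a copy shifted down by 0.1 (dominating the reference) also scores 0.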

Acceptance (verified twice — critic + orchestrator)

  • pytest test_mo_harness.py --no-cov: 22 passed. Doctests: 5 passed.
  • ruff check and ty check src/: clean. Full uv run pytest: 506 passed, 98.89% coverage.

The plan converged over 2 planner↔critic rounds; the critic reproduced, by execution, the IGD+ collapse, the live ported operator (100 core calls per run), the eval counts, the front extraction, and the creator coexistence.

@CooperBigFoot (Contributor, Author)

Adversarial review — APPROVE ✅ (after F1 fix)

A fresh reviewer verified by execution:

  • The ported SBX is live in both adapters (_CtrlFreakSBX / deap_ctrl_freak_sbx, 100 core calls each); the stock operators are absent.
  • 25,100-eval equality, via an independent counter.
  • The IGD+ parity collapse, reproduced to the digit (zdt1 0.0060/0.0060/0.0059).
  • The ZDT3 front is 100% non-dominated (N6 confirmed: not a plotting artifact).
  • DEAP FitnessMinMO/IndividualMO with per-n_obj weights coexist with s4a's classes.
  • CI-safe (ty gate on src/ only; importorskip).
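Checking that a reported front is 100% non-dominated can be done by brute force. Below is a minimal O(n²) NumPy sketch, assuming minimization and one row of objectives per point. The name `non_dominated_mask` is hypothetical; this is not the harness's shared non_dominated_sort, only an independent cross-check of the same first-front property.

```python
import numpy as np


def non_dominated_mask(F):
    """True where a row of F is not dominated by any other row (minimization).

    Hypothetical brute-force helper: row j dominates row i when it is no
    worse in every objective and strictly better in at least one.
    """
    F = np.asarray(F, dtype=float)
    # Axis 0 indexes the candidate dominator j, axis 1 the point i being tested.
    no_worse = (F[:, None, :] <= F[None, :, :]).all(axis=2)  # F[j] <= F[i] everywhere
    better = (F[:, None, :] < F[None, :, :]).any(axis=2)     # F[j] < F[i] somewhere
    dominates = no_worse & better                            # j dominates i
    # A point is non-dominated if no j dominates it (duplicates do not dominate each other).
    return ~dominates.any(axis=0)
```

With such a mask, "every reported front is non-dominated" reduces to `non_dominated_mask(front).all()` on the extracted front.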

F1 (resolved in 144378d): the unplanned full-budget test test_default_eval_counts_equal_25100 (~4.8 s per CI run) was removed, since test_eval_counts_equal_and_match_formula already pins 25,100 via the config-fixed defaults. The subset now shows 21 passed in 0.29 s; the full suite, 505 passed at 98.89% coverage.
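The 25,100 figure follows from one initial population plus one offspring batch per generation: pop_size + n_gen × pop_size = 100 + 250 × 100. Here is a minimal sketch of the independent-counter idea, assuming the objective is a plain Python callable. `CountingObjective` and `expected_evals` are hypothetical names, not the harness's helpers.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


def expected_evals(pop_size: int, n_gen: int) -> int:
    """Initial population plus one full offspring batch per generation."""
    return pop_size + n_gen * pop_size


@dataclass
class CountingObjective:
    """Wraps an objective and counts every call, independently of any library."""

    fn: Callable[[Sequence[float]], tuple]
    calls: int = 0

    def __call__(self, x: Sequence[float]) -> tuple:
        self.calls += 1
        return self.fn(x)
```

Passing the same wrapped objective to each adapter and comparing `calls` against `expected_evals(100, 250)` after a run catches off-by-one drivers, such as the gen-1 issue fixed for pymoo above.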

Recommend merge.

CooperBigFoot merged commit b93780b into main on Jun 26, 2026.
4 checks passed
CooperBigFoot deleted the milestone-12a/s4b-mo-harness branch on June 26, 2026 at 21:15.