Claude/code review discussion 9w5pnt #136

Closed
50thycal wants to merge 24 commits into cobusgreyling:main from 50thycal:claude/code-review-discussion-9w5pnt

Conversation

@50thycal 50thycal commented Jul 3, 2026


Summary

Changes

  • New pattern or starter (followed templates/pattern-template.md + updated registry.yaml)
  • Doc / example improvement
  • Tool change (loop-audit)
  • Story (includes real failure or surprise + lesson)

Checklist (from CONTRIBUTING)

  • All required sections present for patterns
  • Links work from README, patterns/README, starters/README, docs/index
  • No secrets, tokens, internal company URLs
  • STATE.md* examples use .example suffix
  • Safety-related content references docs/safety.md
  • Ran node tools/loop-audit/dist/cli.js . (or on the starter) and addressed findings

Testing / Dogfood

  • loop-audit passes on affected starters or this repo
  • Manual review of generated state / skill output

Screenshots / Examples (if UI or command output)


This template enforces the high bar this reference is known for.

claude added 22 commits June 26, 2026 02:58
…ne safely

Rebuilds the viral 'loop engineering for quant trading' architecture the way
this repo insists: paper-only, report-first, and with a verifier that is real
math instead of an LLM opining on a backtest.

The article's fatal flaw was framing the maker/checker verifier as a second
agent asked whether a backtest looks good. A backtest's failure mode is
overfitting, which a second opinion cannot catch. This starter's checker is
numerical and non-overridable: out-of-sample split, deflated Sharpe vs n_trials,
Probabilistic Sharpe >= 0.95, drawdown cap, and an IS->OOS degradation guard.

Includes:
- engine/ — runnable pure-stdlib five-stage loop (zero deps, offline-capable)
- skills/ — maker + checker procedure manuals
- test_engine.py — 9 passing correctness tests
- LOOP.md, README, state example wired into starters/ index
- stories/quant-loop-the-verifier-problem.md teaching artifact

On default synthetic (random-walk) data the loop correctly REJECTS and refuses
to trade: IS Sharpe +2.54, OOS Sharpe -4.33. The refusal is the product.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
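
The commit above names the gates but does not show their math. As a rough sketch only, the Probabilistic Sharpe Ratio and the deflated-Sharpe benchmark can be written with the standard library, using the commonly published Bailey and López de Prado forms. The function names, the per-period Sharpe convention, and the `sr_std` input (the spread of Sharpe ratios across trials) are assumptions of this example and may differ from the starter's engine/ code.

```python
import math
from statistics import NormalDist, mean, pstdev

_N = NormalDist()  # standard normal


def sharpe(returns):
    """Per-period (non-annualized) Sharpe ratio of a return series."""
    sd = pstdev(returns)
    return mean(returns) / sd if sd > 0 else 0.0


def probabilistic_sharpe(returns, benchmark_sr=0.0):
    """P(true Sharpe > benchmark_sr), adjusted for skew and kurtosis.

    benchmark_sr must be in the same per-period units as the returns.
    """
    n = len(returns)
    mu, sd = mean(returns), pstdev(returns)
    if n < 3 or sd == 0:
        return 0.0
    sr = mu / sd
    skew = sum((r - mu) ** 3 for r in returns) / (n * sd ** 3)
    kurt = sum((r - mu) ** 4 for r in returns) / (n * sd ** 4)
    denom = 1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2
    if denom <= 0:
        return 0.0
    z = (sr - benchmark_sr) * math.sqrt(n - 1) / math.sqrt(denom)
    return _N.cdf(z)


def expected_max_sharpe(n_trials, sr_std):
    """Expected best Sharpe among n_trials skill-less trials (the deflated bar)."""
    if n_trials < 2:
        return 0.0
    gamma = 0.5772156649015329  # Euler-Mascheroni constant
    a = _N.inv_cdf(1.0 - 1.0 / n_trials)
    b = _N.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return sr_std * ((1.0 - gamma) * a + gamma * b)
```

A candidate would then clear the gate only if `probabilistic_sharpe(oos_returns, expected_max_sharpe(n_trials, sr_std)) >= 0.95`; the bar rises as `n_trials` grows, which is why an honest trial count matters.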
Adds campaign mode (--search) with the two structural guards that make an
auto-search loop safe to turn on — the fixes for self-deception when you are
less in the loop:

#1 Enforced trial counting (engine/search.py + ledger.py)
   Every candidate the grid evaluates ticks a counter; it is persisted in
   research-ledger.json and accumulates ACROSS cycles, then feeds the deflated
   Sharpe gate. You cannot search 1,000 configs and claim n_trials=1 — the loop
   counts for you, permanently.

#2 Three-way split + write-once lockbox (engine/split.py + ledger.py + verifier.py)
   Data splits train/validation/lockbox. Search optimizes on train, ranks on
   validation; the lockbox is opened exactly once, on the winner only. The
   ledger fingerprints the lockbox and BLOCKS any re-open — re-peeking is
   self-deception, so the loop refuses.

Demonstration on no-edge synthetic data: search finds a winner with validation
Sharpe 6.27 (overfit), lockbox opens once and shows Sharpe -9.22 -> REJECT;
a second cycle on the same data is BLOCKED; cumulative trials rise across cycles
so the deflated bar keeps climbing. The search will always find a beautiful
in-sample winner; the lockbox is what stops you believing it.

Tests: 14/14 pass (added split, ledger accumulation, write-once, enforced
counter, and overfit-winner-rejection). Repo validate gates still pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
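
The two guards described above (a trial counter that persists across cycles, and a lockbox that can be opened only once) might look roughly like the following. This is a minimal sketch, not the starter's ledger.py: the `Ledger` class, its JSON layout, and the SHA-256 fingerprint over rounded returns are all hypothetical choices made for illustration.

```python
import hashlib
import json
import os


class Ledger:
    """Hypothetical persistent ledger: cumulative trials + write-once lockbox."""

    def __init__(self, path):
        self.path = path
        self.state = {"n_trials": 0, "opened_lockboxes": []}
        if os.path.exists(path):
            with open(path) as fh:
                self.state = json.load(fh)

    def _save(self):
        with open(self.path, "w") as fh:
            json.dump(self.state, fh, indent=2)

    def tick(self, n=1):
        """Count n evaluated candidates; the total survives across runs."""
        self.state["n_trials"] += n
        self._save()
        return self.state["n_trials"]

    @staticmethod
    def fingerprint(lockbox_returns):
        """Stable hash of the lockbox data (assumed scheme, for illustration)."""
        payload = json.dumps([round(r, 12) for r in lockbox_returns]).encode()
        return hashlib.sha256(payload).hexdigest()

    def open_lockbox(self, lockbox_returns):
        """Return the lockbox data once; refuse any later re-open."""
        fp = self.fingerprint(lockbox_returns)
        if fp in self.state["opened_lockboxes"]:
            raise PermissionError("lockbox already opened; re-peeking is blocked")
        self.state["opened_lockboxes"].append(fp)
        self._save()
        return lockbox_returns
```

Because the state is reloaded from disk on every construction, a second process or a later cycle sees both the accumulated trial count and the spent lockbox.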
Trading target: BTC/USDT spot, daily (1d) bars.
- BTC/USDT: deepest liquidity keeps the 5+5bps cost assumption realistic
- spot: no funding/leverage/liquidation; strategy is long/flat only
- daily: the Donchian 20/55 breakout is the classic Turtle system, built for 1d

Fixes a real bug: annualization was hardcoded to hourly (24*365) everywhere, so
daily bars would inflate every Sharpe ~5x. Adds PERIODS_PER_YEAR map and a
--timeframe flag (1h/4h/1d) that drives both the data fetch interval and Sharpe
annualization together. Daily campaign run now yields sane numbers (validation
Sharpe 1.28, lockbox -1.88 -> REJECT) instead of the inflated hourly ones.

Docs note the US Binance geo-block (point data.py at Binance.US/Coinbase).
Tests 14/14, repo validate gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
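
The "~5x" figure follows from how Sharpe annualizes: it scales with the square root of the number of periods per year, so treating daily bars as hourly multiplies every Sharpe by sqrt(24). A small sketch of that arithmetic, where the map values (a 365-day crypto year) and the helper names are assumptions rather than the starter's actual code:

```python
import math

# Assumed values: crypto trades every day, so a year is 365 daily bars.
PERIODS_PER_YEAR = {"1h": 24 * 365, "4h": 6 * 365, "1d": 365}


def annualized_sharpe(per_period_sharpe, timeframe):
    """Scale a per-bar Sharpe to annual terms for the given bar size."""
    return per_period_sharpe * math.sqrt(PERIODS_PER_YEAR[timeframe])


def inflation_factor(actual_tf, assumed_tf):
    """How much Sharpe is overstated when actual_tf bars are annualized as assumed_tf."""
    return math.sqrt(PERIODS_PER_YEAR[assumed_tf] / PERIODS_PER_YEAR[actual_tf])


print(round(inflation_factor("1d", "1h"), 2))  # sqrt(24), roughly 4.9
```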
Exchange APIs (Binance.US, Coinbase) are blocked by this environment's egress
policy, but raw.githubusercontent.com is allowed — so the real-data source is the
Coin Metrics community dataset (daily reference price). Adds:

- engine/data.py: from_coinmetrics() with robust chunked read (tolerates the
  egress size cap via IncompleteRead salvage); to_csv() writer; get_ohlcv source
  'coinmetrics' with quote-suffix normalization (BTCUSDT -> btc)
- sample-data/btc_1d_coinmetrics.csv: committed real BTC daily snapshot (~2010-2020)
  so the loop runs on real data offline / in CI
- --source coinmetrics wired into the CLI
- test + docs

Coin Metrics gives daily close, not OHLC, so the breakout runs as Donchian-on-close
(standard daily variant). On a user's own machine, --source live uses Binance
OHLCV (US users point data.py at Binance.US/Coinbase).

Result on REAL BTC daily: Turtle 20/10 shows validation Sharpe 2.4 and +335%
lockbox return, but the lockbox REJECTS it — after the 35-config deflated-Sharpe
penalty (bar 1.73 vs 1.47) and a 47% drawdown it fails the honest gates. A naive
backtest ships it; the lockbox does not.

Tests 15/15, repo validate gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
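
For readers unfamiliar with the close-only variant, a Donchian-on-close rule can be sketched as below. The function name and the convention that `position[t]` is decided at the close of bar t (and so applies to the following bar) are assumptions of this example; the 20/10 defaults mirror the "Turtle 20/10" configuration mentioned above.

```python
def donchian_on_close(closes, entry=20, exit_=10):
    """Long/flat positions from closing prices only.

    Enter when the close exceeds the highest of the prior `entry` closes;
    exit when it falls below the lowest of the prior `exit_` closes.
    position[t] is known at the close of bar t, so it should be applied
    to the return of bar t+1 to avoid look-ahead.
    """
    position, held = [], 0
    for t, price in enumerate(closes):
        if t >= entry and not held and price > max(closes[t - entry:t]):
            held = 1
        elif t >= exit_ and held and price < min(closes[t - exit_:t]):
            held = 0
        position.append(held)
    return position
```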
Adds --walkforward mode: the gold-standard time-series test. Re-optimizes the
grid on a rolling (or anchored) in-sample window and scores each next
out-of-sample fold across the whole history. Two gates, both required:

- Consistency (K-of-N): at least K folds clear a per-fold Sharpe/drawdown gate,
  so a strategy that only worked in one regime fails.
- Aggregate honesty: pool all folds' OOS returns and require the combined curve
  to beat the deflated benchmark for ALL trials (N folds x grid), clear PSR >=
  0.95, and stay under the drawdown cap.

Every fold's search ticks the enforced trial counter and accumulates in the
ledger, so re-optimizing N times raises the deflated bar N-fold.

On real BTC daily the Turtle breakout shows pooled OOS Sharpe 1.91 (beats the
deflated bar, PSR 1.0) yet is REJECTED: only 2/5 folds pass because 3 folds had
36-65% drawdowns. An aggregate-only test green-lights it; the K-of-N gate vetoes
it. That disagreement is exactly the added value over a single lockbox.

Tests 18/18, repo validate gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
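
The fold construction and the consistency gate can be illustrated with a short sketch. The names `rolling_folds` and `k_of_n_pass`, and the choice to step forward by one test window per fold, are hypothetical; the actual engine may slice history differently.

```python
def rolling_folds(n_bars, train_len, test_len, anchored=False):
    """Yield (train_range, test_range) index pairs covering the history.

    Rolling: the train window slides forward. Anchored: it always starts at 0.
    Test windows never overlap their own train window.
    """
    folds, start = [], 0
    while start + train_len + test_len <= n_bars:
        train_start = 0 if anchored else start
        split = start + train_len
        folds.append((range(train_start, split), range(split, split + test_len)))
        start += test_len
    return folds


def k_of_n_pass(fold_sharpes, fold_drawdowns, k, min_sharpe, max_drawdown):
    """Consistency gate: at least k folds must clear both per-fold limits."""
    passed = sum(
        1 for s, d in zip(fold_sharpes, fold_drawdowns)
        if s >= min_sharpe and d <= max_drawdown
    )
    return passed >= k, passed
```

For instance, feeding the per-fold drawdowns reported later in this PR (65/52/24/36/20%) with placeholder Sharpes and an illustrative 30% per-fold cap gives 2 passing folds, which fails any K above 2 regardless of how good the pooled curve looks.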
…ifier)

Adds --vol-target: size the position by target_vol/realized_vol so risk is
roughly constant (hold less in violent regimes, more in calm ones), capped at
--max-leverage (1.0 = spot, no borrow). Threaded through every mode via a merged
base_params, and generate_signals now accepts periods_per_year for correct
annualization of the vol target. No look-ahead: realized vol at bar t uses
returns ending at t.

On real BTC daily walk-forward this is a structural win: consistency 2/5 -> 5/5,
pooled OOS Sharpe 1.91 -> 2.34, pooled drawdown 65% -> 28%, per-fold drawdowns
65/52/24/36/20% -> 28/24/14/13/12%. Lower risk targeting generalizes to any
future data, so it is not curve-fit.

But it is kept HONEST: at the a-priori 0.40 default it is still REJECTED, missing
the aggregate drawdown cap by 3 points (28% vs 25%). A lower target passes, but
sweeping target_vol by hand and reporting the value that clears the gate is
uncounted multiple testing — the enforced counter tracks the grid, not the
researcher's own experimentation. Documented as the trap it is; the real verdict
can only come from forward data (#5).

Tests 20/20, repo validate gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
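
A volatility-target overlay of the kind described above could be sketched as follows. The 30-bar window, the function name, and the choice to stay flat until a full window exists are assumptions of this example; only the `target_vol / realized_vol` ratio, the leverage cap, and the no-look-ahead rule come from the commit message.

```python
import math
from statistics import pstdev


def vol_target_exposure(returns, target_vol=0.40, window=30,
                        max_leverage=1.0, periods_per_year=365):
    """Exposure per bar sized as target_vol / realized_vol, capped at max_leverage.

    exposure[t] uses only returns ending at bar t, so it should be applied
    to bar t+1 (no look-ahead).
    """
    exposure = []
    for t in range(len(returns)):
        if t + 1 < window:
            exposure.append(0.0)  # not enough history yet: stay flat
            continue
        recent = returns[t + 1 - window:t + 1]
        realized = pstdev(recent) * math.sqrt(periods_per_year)
        scale = target_vol / realized if realized > 0 else max_leverage
        exposure.append(min(max_leverage, scale))
    return exposure
```

With `max_leverage=1.0` the overlay can only reduce exposure, which matches the spot, no-borrow setting: calm regimes are capped at fully invested and violent regimes are scaled down.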
… + forward quarantine (#5)

#4 --trial-budget N: the loop halts searching once cumulative trials reach N.
An autonomous loop that searches forever turns the whole dataset into in-sample
data; the budget is the alpha-spending cap that forces a stop. Checked before
each run (a run may overshoot); once spent, further searches halt and point to
forward-testing or new data. engine/ledger.py budget_exhausted().

#5 --forward-test: carve the newest slice into a quarantine window the search,
walk-forward, and lockbox never touch. Research on the earlier window, then
forward-test the survivor on the held-out tail. Forward performance gates
capital, not the backtest. engine/quarantine.py. Approval requires research AND
forward to pass. Each forward window is spent after --max-forward-evals tests
(the lockbox lesson, applied to forward data: testing 100 strategies on one tail
just relocates the multiple-testing problem).

Real BTC demonstration (vol-targeted breakout, 0.40 default): research REJECTs
(aggregate drawdown), but the forward out-of-time window actually PASSes cleanly
(Sharpe 1.38, +94%, 18% DD on unseen data) — yet the strategy is NOT approved,
because approval needs both gates. No single lucky result is sufficient.

All five hardening steps (#1-#5) now implemented. Tests 23/23, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
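
The budget check, the quarantine split, and the two-gate approval rule are simple enough to sketch in a few lines. `budget_exhausted` is named in the commit, but its signature here is a guess, and `split_quarantine`, `approve`, and the 20% forward fraction are purely illustrative.

```python
def split_quarantine(bars, forward_fraction=0.2):
    """Hold out the newest slice; research code must only ever see the first part."""
    n_forward = round(len(bars) * forward_fraction)
    cut = len(bars) - n_forward
    return bars[:cut], bars[cut:]


def budget_exhausted(cumulative_trials, trial_budget):
    """True once the cumulative trial count has reached the budget (None = no cap)."""
    return trial_budget is not None and cumulative_trials >= trial_budget


def approve(research_passed, forward_passed):
    """Approval needs both gates; a single lucky result is never sufficient."""
    return research_passed and forward_passed
```

The BTC demonstration above is the `approve(False, True)` case: a clean forward pass does not rescue a research reject.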
Adds --strategy selector and a pluggable registry (engine/strategy.py). Every
strategy is daily-close, long/flat, no look-ahead; the vol-target overlay,
walk-forward, lockbox, forward quarantine, and budget all work on top unchanged.

New hypotheses alongside the donchian breakout:
- tsmom:   time-series momentum (long when price > trailing R-bar mean) — trend
- meanrev: short-term mean reversion (long when oversold, exit at mean) — counter-trend
- regime:  trend gated by a calm-volatility regime (price > long SMA AND short vol
           < long vol) — conditional trend, aimed at the drawdown constraint

Bake-off on real BTC (vol-targeted, full gauntlet):
- meanrev fails everywhere (Sharpe -0.58, 51% forward DD) — falling knives, as
  predicted; short-term reversion is not a standalone edge in BTC.
- donchian/tsmom pass forward, fail research on drawdown (~37-38%); correlated.
- regime is the standout: pooled DD 37%->26%, forward Sharpe 1.66 / 9% DD, but
  still misses honest research approval by one point (26% vs 25% cap).

Honesty preserved: testing 4 strategies is 4x selection on top of each grid; no
tuning to force a pass. Grids kept small (DOF discipline, enforced by a test).

Tests 26/26, repo validate gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
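
One way a pluggable registry like this can be structured is with a decorator that maps names to signal functions. The sketch below is an assumption about shape, not the contents of engine/strategy.py: `STRATEGIES`, `register`, the `lookback` default, and the `generate_signals` signature are hypothetical. The tsmom rule follows the description above (long when price is above its trailing mean).

```python
STRATEGIES = {}


def register(name):
    """Decorator that adds a signal function to the registry under `name`."""
    def wrap(fn):
        if name in STRATEGIES:
            raise ValueError(f"strategy {name!r} already registered")
        STRATEGIES[name] = fn
        return fn
    return wrap


@register("tsmom")
def tsmom(closes, lookback=50):
    """Long (1) when the close is above the mean of the prior `lookback` closes."""
    signals = []
    for t, price in enumerate(closes):
        if t < lookback:
            signals.append(0)  # warm-up: no signal until the window is full
        else:
            trailing_mean = sum(closes[t - lookback:t]) / lookback
            signals.append(1 if price > trailing_mean else 0)
    return signals


def generate_signals(strategy, closes, **params):
    """Dispatch to a registered strategy by name."""
    return STRATEGIES[strategy](closes, **params)
```

Keeping every strategy behind the same call shape is what lets overlays such as vol targeting and walk-forward run on top unchanged.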
…test

Fetches the complete Coin Metrics BTC history via chunked HTTP Range requests,
defeating the egress size cap that previously truncated the download at ~2020.
engine/data.py from_coinmetrics now stitches ranged chunks (falls back to a
single GET if the server ignores Range). Bundled snapshot refreshed to daily
close 2010-07-18 -> 2026-05-23 (5789 bars).

This enables the capstone: research each strategy on 2010-2020, then forward-test
on 2020-2026 — data NO research, tuning, or bake-off ever touched.

Result (the whole project in one table):
- regime PASSED honest research for the first time (5/5 folds, 14% DD) but FAILED
  the true out-of-time test (37% DD on 2020-2026). Research success did not
  survive out-of-time.
- Every strategy made big returns 2020-2026 (tsmom +531%) but with 37-41%
  drawdowns and sub-bar Sharpes: beta to a bull market, not alpha.
- meanrev dead everywhere; trend family correlated.
- Verdict: none approvable. The harness refused to dress beta as alpha.

Docs updated to the committed 2010-2026 snapshot (walk-forward, vol-target A/B,
forward quarantine, capstone all reproduce). Tests 26/26, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
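
The range-stitching idea can be shown without any network code by injecting the fetcher. In this sketch `fetch_in_ranges`, `range_header`, and the `fetch(start, end)` callable are hypothetical; the only fixed fact is the HTTP `Range: bytes=start-end` header form, whose bounds are inclusive. The single-GET fallback mentioned above is not shown.

```python
def range_header(start, end):
    """HTTP Range header for an inclusive byte range."""
    return {"Range": f"bytes={start}-{end}"}


def fetch_in_ranges(fetch, total_size, chunk_size):
    """Download total_size bytes in chunks and stitch them back together.

    fetch(start, end) must return the bytes for the inclusive range
    [start, end]; in real use it would issue a request with range_header().
    """
    parts, start = [], 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        parts.append(fetch(start, end))
        start = end + 1
    return b"".join(parts)
```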
Teaching artifact capturing the full arc: build the anti-self-deception guards,
try to beat them on real BTC, and get honestly rejected by out-of-time data. Key
lesson: a strategy (regime) passed honest walk-forward research on 2010-2020 yet
failed on unseen 2020-2026; every strategy 'made money' in the bull market but
none had risk-adjusted edge. The harness's value is the 'no' you wouldn't have
said yourself.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
…t negative)

Investigation into signals orthogonal to price. Plumbs Coin Metrics on-chain
features (mvrv, adract, txcnt) through the data layer: Bar gains a features dict,
from_csv/to_csv carry extra numeric columns, from_coinmetrics attaches them, and
the bundled snapshot is re-generated with them. stats.median added.

Two MVRV hypotheses (market value / realized value = price vs network cost basis):
- mvrv: contrarian valuation timing. Result: bad — 0/5 walk-forward, 65-81%
  drawdowns. 'Cheap' MVRV in a crash gets cheaper, so it buys falling knives and
  holds down. Real information used badly.
- trendval: regime trend + an MVRV euphoria brake (step aside when overvalued).
  The principled use of orthogonal info against the drawdown constraint. Brake is
  wired (fires ~18% of days) but out-of-time produced the SAME 37% drawdown as
  plain regime — no measurable edge added.

Honest negative result: on-chain valuation sounds like it must help, but neither
formulation cleared the bar or improved the best price strategy on unseen
2020-2026 data. That is the harness working: a compelling narrative is not
evidence. Stopping at 6 strategies rather than tuning MVRV until something passes.

Tests 28/28, repo validate gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
Adds multi-asset infrastructure and the first hypothesis with genuine statistical
signal:
- engine/multi_data.py: fetch/align a multi-asset price panel from Coin Metrics
  (12-coin universe), cache to sample-data/crypto_panel.csv
- engine/xsectional.py: long-only top-K cross-sectional momentum portfolio with
  costs, optional portfolio-level vol targeting, and its own K-of-N walk-forward
  (enforced trial counting, deflated Sharpe, PSR, drawdown gates)

Result on 2010-2026: cross-sectional momentum BEATS the deflated-Sharpe bar with
PSR 1.0 — real relative-strength edge, unlike every single-asset trend/on-chain
strategy. But it is REJECTED on drawdown (93% raw, 77% vol-targeted): crypto
momentum crashes are too violent for trailing-vol scaling to cap at 25%.

Two caveats dominate and are documented prominently:
1. Survivorship bias — the universe is coins that survived to 2026; dead coins and
   rugs are absent, inflating results, and the forward test cannot fix it.
2. The 25% drawdown cap is a risk preference; relaxing it post-hoc to force a pass
   is goalpost-moving.

Real edge found; approvable strategy not yet. Tests 30/30, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
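
Since the drawdown cap decides this and most other verdicts in the PR, it helps to pin down what is being measured. A minimal sketch of maximum drawdown on a compounded equity curve, with hypothetical function names:

```python
def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def within_drawdown_cap(returns, cap=0.25):
    """Drawdown gate: the cap is a risk preference fixed before looking at results."""
    return max_drawdown(returns) <= cap
```

A strategy can clear every Sharpe-based gate and still fail this one, which is the situation described above.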
…rget)

The prior synthetic vol was low enough that target_vol/realized_vol capped at 1.0,
making vol targeting a no-op and the assertion fail. Use target_vol=0.1 (well
below the basket's realized vol) so the overlay must scale exposure. 30/30 pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
…est landing)

Under a pre-registered 40% drawdown mandate, refines the cross-sectional momentum
signal two ways:
- long_short: market-neutral (short weakest, long strongest). Result: BACKFIRED
  (Sharpe 0.96 -> 0.07, forward -62%). Crypto's short leg is toxic — weakest coins
  short-squeeze; momentum edge is asymmetric and long-only.
- market_filter: go to cash when the market proxy (BTC) is below its trend SMA.
  Result: HELPED — drawdown 68% -> 44-47%, walk-forward Sharpe up to ~1.2, PSR 1.0.
  Out-of-time ~36-43% DD depending on window.

Landing: cross-sectional momentum + market risk-off + vol targeting is real,
repeatable edge (beats deflated bar, PSR 1.0) that lands CLOSE to a 40% drawdown
mandate but does not cleanly clear it out-of-sample. Stopped tweaking here: more
knob-twisting until one clears 40% is uncounted multiple testing, the exact trap
the engine exists to expose. Honest next steps documented (fix survivorship;
forward paper-trade one pre-registered strategy).

Walk-forward drawdown mandate defaults set to the pre-registered 40%/50%.
Tests 31/31, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
The 12-coin cross-sectional universe was hand-picked survivors. Corrects it by
expanding to a 32-coin universe that includes coins which pumped and collapsed
(FTT -100% in the FTX blowup, BTG -100%, BSV -97%, XVG -99%), with point-in-time
eligibility (a coin is only rankable on days it has a price).

- engine/data.py + multi_data.py: defensive loaders skip assets lacking a USD
  price series instead of crashing.
- sample-data/crypto_panel_expanded.csv: 32-coin panel including the collapses.

Same strategy (long-only x-sec momentum + market risk-off + vol target), both
universes:
  survivor-12  : walk-fwd Sharpe 1.28, 4/5, out-of-time Sharpe 1.09 / 36% DD / +691%
  expanded-32  : walk-fwd Sharpe 0.94, 2/5, out-of-time Sharpe 0.78 / 51% DD / +304%

Survivorship inflated Sharpe ~30-40% and hid ~15 points of drawdown. Momentum did
buy the collapses and eat them. PSR stays 1.0 (the relative-strength signal is
real) but on a realistic universe it is a 51%-drawdown reject — and still an upper
bound, since truly delisted coins are excluded. Real edge, honestly deflated, not
approvable.

Tests 32/32, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
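
Point-in-time eligibility combined with top-K ranking might be sketched like this. The function name, the dict-of-prices input, and equal weighting are assumptions for illustration; the real xsectional.py also applies costs, the market filter, and vol targeting, none of which appear here.

```python
def top_k_weights(prices_today, prices_past, k):
    """Equal-weight the k coins with the strongest trailing return.

    prices_*: dict of coin -> price, with None where no price exists.
    A coin is rankable only if it has a positive price on both dates,
    so coins not yet listed (or already gone) cannot be selected.
    """
    momentum = {}
    for coin, now in prices_today.items():
        then = prices_past.get(coin)
        if now is not None and then is not None and now > 0 and then > 0:
            momentum[coin] = now / then - 1.0
    winners = sorted(momentum, key=momentum.get, reverse=True)[:k]
    return {coin: 1.0 / len(winners) for coin in winners} if winners else {}
```

Nothing in this rule protects against a ranked coin collapsing after it is bought, which is how the expanded universe exposes drawdown the survivor-only panel hid.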
The research is over. Freezes the one signal with real edge (cross-sectional
momentum + market risk-off + vol targeting, survivorship-corrected 32-coin
universe) and paper-trades it forward — the honest end of the road.

- forward-registration.json: the frozen contract, written ONCE (committed).
- engine/forward_paper.py: --register (write-once; refuses to overwrite =
  pre-registration integrity), --run (marks the frozen strategy to market, no
  re-optimization; forward paper equity is the verdict), --since (illustrative
  replay, clearly labelled, not the live record). Proper loop-engineering loop:
  scheduled, stateful (quant-forward-state.md), paper-only.
- PREREGISTRATION.md: the human-readable commitment device and rules.

Committed data ends 2026-05-23, so a today-dated registration is correctly
'awaiting forward data' until a live feed adds bars. Sobering illustrative check:
replaying the frozen config on 2024-2025 lost money (~0.85x equity, negative
Sharpe, ~48% drawdown) — recent regimes have been hard for momentum, which is
exactly why forward (not backtest) decides.

Mandate max drawdown 40%; corrected historical was ~51%, so forward is the
unbiased tiebreaker, not a foregone pass. Tests 33/33, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
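
Write-once registration can lean on the filesystem itself: Python's exclusive-creation mode fails if the file already exists. The sketch below shows the idea only; `register_once` and the contract fields are hypothetical and not the actual forward_paper.py interface.

```python
import json


def register_once(path, contract):
    """Write the frozen contract; refuse to overwrite an existing registration.

    Returns True if the file was created, False if one already existed.
    """
    try:
        # Mode "x" is exclusive creation: it raises if the path exists.
        with open(path, "x") as fh:
            json.dump(contract, fh, indent=2, sort_keys=True)
    except FileExistsError:
        return False
    return True
```

Committing the resulting file adds a second layer: any later edit to the contract shows up in version history.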
…strategy

The consolidated scorecard showed the humble single-asset regime trend is the
better HONEST bet than the exotic cross-sectional strategy once survivorship is
accounted for. Generalizes the forward paper harness from one strategy to a
registry supporting two kinds:
- xsectional  (multi-asset panel via xsectional.portfolio_returns)
- single_asset (BTC via strategy.generate_signals + backtest)

Both are frozen write-once in forward-registration.json (per-name pre-registration
integrity; --run marks all to market; no re-optimization). Old flat registration
format migrates automatically.

Illustrative 2024-2025 head-to-head (NOT the live record) vindicates the review:
  xsectional : 0.85x equity, -0.02 Sharpe, 48% DD -> mandate BREACHED
  regime     : 1.48x equity,  0.79 Sharpe, 27% DD -> WITHIN mandate
The simple strategy crushed the complex one on recent unseen data. Forward time
still decides, but registering regime alongside was clearly right.

Tests 33/33, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
Adds engine/blotter.py — a magnifying glass on the equity curve. Slices the exact
daily returns of a single-asset strategy into round-trips (entry when exposure
goes from flat to on, exit when it returns to flat), attributing PnL by compounding
each span's daily contribution net of costs. No re-simulation: the product of all
trade PnLs reconciles to the backtest's final equity to 1e-9 (tested).

Per trade: entry/exit date+price, holding days, avg exposure, raw price move, net
PnL (%), and $ on a fixed 10k stake. Roll-up stats: win rate, avg win/loss, profit
factor, expectancy, avg hold, best/worst. Reads the frozen forward config so the
blotter matches what is paper-traded.

On regime-trend it exposes the trend-follower profile the curve hides: 33% win
rate but profit factor 1.82 (avg win +17.5% vs avg loss -4.75%, best +71%) — a few
big BTC trends carry it, most trades are small scratches. Essential context before
deploying: you must be wrong 2/3 of the time and still hold.

Cross-sectional per-coin blotter is the next addition. Tests 36/36, gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
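
The slicing-and-reconciling idea can be sketched as below. This simplified version assumes net returns are exactly zero on flat bars (so all costs fall inside a trade's span); the function names and trade fields are hypothetical and much thinner than what blotter.py is described as producing.

```python
import math


def round_trips(exposure, net_returns):
    """Slice a daily return stream into round-trip trades.

    exposure[t] > 0 means invested during bar t; net_returns[t] is the
    strategy's net return for that bar. A trade spans consecutive invested bars.
    """
    trades, growth, start = [], 1.0, None
    for t, (e, r) in enumerate(zip(exposure, net_returns)):
        if e > 0 and start is None:
            start, growth = t, 1.0
        if start is not None:
            growth *= 1.0 + r
            is_last = t == len(exposure) - 1
            if is_last or exposure[t + 1] == 0:
                trades.append({"entry": start, "exit": t, "pnl": growth - 1.0})
                start = None
    return trades


def reconciles(trades, net_returns, tol=1e-9):
    """Product of (1 + trade PnL) should equal the backtest's final equity."""
    from_trades = math.prod(1.0 + tr["pnl"] for tr in trades)
    from_curve = math.prod(1.0 + r for r in net_returns)
    return abs(from_trades - from_curve) <= tol
```

Roll-up statistics such as win rate or profit factor then follow directly from the `pnl` values.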
Adds xsectional.portfolio_trades — instruments the basket portfolio to emit each
coin's holding spans and its contribution to the portfolio's arithmetic return,
net of the market-filter and vol-target overlays. Reconciles exactly:
sum(contributions) - total_costs == sum(portfolio_returns(...)) to 1e-9 (tested).

blotter.py now dispatches by strategy kind (single_asset -> round-trips,
xsectional -> per-coin roll-up) and shows which coins made or lost the money.

The per-coin view makes the survivorship drag concrete: on the 32-coin basket,
XEM +103%, DASH +62%, ETH +53% carried it while XVG (the -99% collapse) cost the
portfolio -27% and ALGO -13% -- momentum bought the pumps and rode them down.

Tests 37/37, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
Adds the recurring 'is it actually making money since we started' loop:
- forward_paper --refresh: pulls latest prices from the live source into a
  gitignored data/ cache before evaluating, so the loop checks prices on a cadence
  without churning the committed snapshots (falls back to snapshot on network error).
- .github/workflows/quant-forward-track.yml: daily cron that runs --run --refresh,
  marks both frozen strategies to market, and commits the forward P&L record. Self-
  suppresses (no commit) until new price data exists. Activates on merge to default.
- quant-forward-state.md / quant-forward-log.md are now committed (durable in-git
  track record); data/ live caches stay gitignored.

Honest current state: the free source (Coin Metrics) and the exchange-blocked
sandbox both end 2026-05-23 = the registration date, so the record reads 'awaiting
data' until newer bars publish. The machinery is correct and starts recording real
forward P&L the moment data flows (a merge, or a live exchange feed on the user's
own infra).

Tests 37/37, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
…hesis tracking

Turns the forward loop into a hosted, always-running platform that checks prices
on a schedule, scores every thesis, and persists the record — so you can iterate,
add strategies, and keep tracking until something is truly profitable.

- engine/service.py: stdlib web service + background scheduler. Runs the forward
  check every CHECK_INTERVAL_SECONDS, persists state/log/scoreboard to
  QUANT_DATA_DIR, serves a live status page (/), /scoreboard.json, and /health.
  Statuses per thesis: within_mandate / breached / awaiting_data.
- forward_paper.py: QUANT_DATA_DIR env so the record persists to a Volume (not the
  repo); committed registration acts as a seed; auto_register_pending() write-once
  registers any new thesis on boot.
- Dockerfile + railway.json + DEPLOY-RAILWAY.md: one-click-ish Railway deploy
  (mount a Volume at /data). Pure stdlib, no DB required.

Extensibility: add a thesis to FROZEN_STRATEGIES, redeploy -> the service
auto-registers it (write-once, stamped at the latest data date) and tracks it
alongside the others. Registration stays write-once per name; revising a thesis
means a NEW name with a NEW start date.

Tests 38/38, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
…ilder

The Railway deploy failed with 'Railpack could not determine how to build the app'
because the merged commit predated the Dockerfile and Railpack (Railway's default
builder) found no Python marker. Rather than depend on the Dockerfile being picked
up (fragile: needs the right commit + builder setting + root dir), add the two
signals the default builder needs:
- requirements.txt: pure-stdlib marker (no deps) so Railpack/Nixpacks detects Python
- Procfile: 'web: python -m engine.service' start command

Now it builds natively under Railpack OR the Dockerfile. DEPLOY-RAILWAY.md updated:
set QUANT_DATA_DIR=/data (required for persistence without the Dockerfile ENV) and
mount a Volume there; redeploy latest main if a build predates these files.

Tests 38/38, service boots via the Procfile command, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
@50thycal 50thycal requested a review from cobusgreyling as a code owner July 3, 2026 16:25
claude added 2 commits July 3, 2026 16:38
…a adapter

Adds engine/coinbase.py — the public Coinbase Exchange candles adapter (no API
key): hourly (or any 1m..1d granularity) with backward pagination and a to_daily
resampler. This is the current, granular feed that unblocks the forward tracker
(the daily/lagging Coin Metrics source cannot) and is the foundation for intraday
strategies.

- data.get_ohlcv gains source='coinbase' (+ research --source coinbase); graceful
  synthetic fallback where exchange egress is blocked.
- forward_paper: regime-trend now carries source=coinbase (BTC-USD, daily candles),
  so on Railway its forward record fills with live prices; falls back to the
  committed snapshot in sandboxes that block exchanges (reads 'awaiting data').
- xsectional stays on the Coin Metrics panel for now (its survivorship-corrected
  universe includes delisted coins an exchange won't serve).

Verified with fixtures/mocks (parse order, daily resample, pagination + dedup);
exchange APIs are blocked in this sandbox so live calls fall back cleanly. Hourly
data is ready for the next (intraday) strategy build.

Tests 41/41, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
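
An hourly-to-daily resampler with dedup could look roughly like this. The tuple layout `(unix_ts, open, high, low, close, volume)` is an assumption of the sketch and is not claimed to match Coinbase's wire format or the adapter's actual `to_daily`; days are bucketed in UTC.

```python
from datetime import datetime, timezone


def to_daily(candles):
    """Aggregate intraday candles into daily OHLCV bars (UTC days).

    candles: iterable of (unix_ts, open, high, low, close, volume) in any
    order; duplicate timestamps (e.g. from overlapping pages) are dropped.
    """
    unique = {c[0]: c for c in candles}  # dedup by timestamp
    days = {}
    for ts in sorted(unique):
        _, o, h, l, c, v = unique[ts]
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        if day not in days:
            days[day] = [o, h, l, c, v]   # first candle sets the open
        else:
            bar = days[day]
            bar[1] = max(bar[1], h)       # running high
            bar[2] = min(bar[2], l)       # running low
            bar[3] = c                    # latest candle sets the close
            bar[4] += v                   # volume accumulates
    return [(day, *bar) for day, bar in sorted(days.items())]
```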
PR #3's audit check failed because the new starter scored 31 (L0), under the
starter L1 gate of 38 — loop-audit runs only on PRs to main, so this was its first
audit. Adds the standard loop-engineering scaffolding it scores for (all genuine
for this loop, not padding):
- STATE.md: live loop state (+18, and required for L1)
- AGENTS.md: build/test/layout/review norms (+9)
- loop-budget.md + loop-run-log.md: cost caps, kill switches, run history
- docs/safety.md: paper-only, kill switches, no-relitigation, egress scope
- LOOP.md: worktree isolation + MCP-not-required notes

Starter now scores 74 (L1); full CI audit gate passes (all starters >= 38,
reference 100). Tests 41/41, validate gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
@cobusgreyling

Owner

Triage — thanks, not mergeable as-is

@50thycal This is substantial, well-written work. The two stories are exactly the kind of honest failure content this repo wants, and the quant starter’s numerical verifier discipline is a strong dogfooding example. Closing for scope and checklist gaps — not a rejection of the idea.

What’s strong

  • Stories (quant-loop-the-verifier-problem.md, quant-loop-out-of-time.md) — real surprises, metrics, actionable lessons. These fit CONTRIBUTING.md step 2 and could merge on their own.
  • Starter README — clear paper-only posture, explains why LLM-as-verifier fails for backtests.
  • Engine design — OOS split, trial counting, walk-forward; aligns with loop-engineering safety principles.

Blockers (why we’re closing)

  1. PR template unfilled — no summary, checklist unchecked, no loop-audit evidence.
  2. No pattern registration — new domain starters that illustrate a loop pattern need a patterns/*.md entry + patterns/registry.yaml update (see pattern-template.md).
  3. STATE.md committed live — repo rule: state examples use .example suffix; live STATE.md should be gitignored in starters.
  4. Repo bloat — three ~5.8k-line CSV snapshots (~17k lines). Prefer a fetch script + one small fixture, or document download steps (see how other starters stay lean).
  5. .github/workflows/quant-forward-track.yml: a daily cron with contents: write permission that auto-commits to main, arriving via a fork PR. That’s a maintainer/security decision we can’t take from an unsolicited workflow. If we add this later, it needs a separate maintainer-owned PR with explicit gates.
  6. Not wired into loop-init — other starters are synced via tools/loop-init/ and scripts/check-loop-init-sync.mjs.
  7. CI never ran — fork PR workflows stayed action_required (needs maintainer approval); no audit/validate signal.

Suggested path (smallest merges first)

| PR | Contents | Effort |
|----|----------|--------|
| A | Stories only + stories/README.md row | ~15 min review |
| B | Starter without workflow, quant-state.md.example, trim CSVs/fetch script | ~half day |
| C | Pattern doc + registry.yaml + optional loop-init wiring | maintainer pairs with you |

If you want to continue, open PR A first (stories-only) and comment here — happy to fast-track per maintainer response.

Thanks again — the verifier-problem framing is a great addition to the corpus.

@cobusgreyling

Owner

Closing per triage above — please resubmit stories-only (PR A) when ready.

@cobusgreyling

Owner

Stories landed in #137 (merged) — thanks @50thycal! Starter remains on your fork; open a trimmed PR B whenever you're ready.

cobusgreyling added a commit that referenced this pull request Jul 3, 2026
Cherry-pick stories-only from PR #136 (closed for scope). Links point
to the contributor's reference starter on their fork until a trimmed
starter lands in-repo.

Co-authored-by: Cobus Greyling <cobusgreyling@Cobuss-MacBook-Pro-2.local>