Claude/code review discussion 9w5pnt by 50thycal · Pull Request #136 · cobusgreyling/loop-engineering

50thycal · 2026-07-03T16:25:32Z

Summary

Changes

New pattern or starter (followed templates/pattern-template.md + updated registry.yaml)
Doc / example improvement
Tool change (loop-audit)
Story (includes real failure or surprise + lesson)

Checklist (from CONTRIBUTING)

All required sections present for patterns
Links work from README, patterns/README, starters/README, docs/index
No secrets, tokens, internal company URLs
STATE.md* examples use .example suffix
Safety-related content references docs/safety.md
Ran node tools/loop-audit/dist/cli.js . (or on the starter) and addressed findings

Testing / Dogfood

loop-audit passes on affected starters or this repo
Manual review of generated state / skill output

Screenshots / Examples (if UI or command output)

This template enforces the high bar this reference is known for.

…ne safely

Rebuilds the viral 'loop engineering for quant trading' architecture the way this repo insists: paper-only, report-first, and with a verifier that is real math instead of an LLM opining on a backtest.

The article's fatal flaw was framing the maker/checker verifier as a second agent asked whether a backtest looks good. A backtest's failure mode is overfitting, which a second opinion cannot catch. This starter's checker is numerical and non-overridable: out-of-sample split, deflated Sharpe vs n_trials, Probabilistic Sharpe >= 0.95, drawdown cap, and an IS->OOS degradation guard.

Includes:
- engine/: runnable pure-stdlib five-stage loop (zero deps, offline-capable)
- skills/: maker + checker procedure manuals
- test_engine.py: 9 passing correctness tests
- LOOP.md, README, state example wired into starters/ index
- stories/quant-loop-the-verifier-problem.md teaching artifact

On default synthetic (random-walk) data the loop correctly REJECTS and refuses to trade: IS Sharpe +2.54, OOS Sharpe -4.33. The refusal is the product.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
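The numerical gates named in this commit can be illustrated with a short stdlib sketch. It is not the starter's code: the function names, the `sharpe_std` input (the spread of Sharpe ratios across trials), and the choice to feed the deflated bar into the PSR as its benchmark are all assumptions made for illustration. The formulas are the standard Probabilistic Sharpe Ratio and the expected-maximum-Sharpe approximation used for deflation, both in per-period (non-annualized) units.

```python
import math
from statistics import NormalDist, mean, pstdev

STD_NORMAL = NormalDist()
EULER_GAMMA = 0.5772156649015329


def per_period_sharpe(returns):
    """Non-annualized Sharpe ratio (mean / population std) of a return series."""
    sd = pstdev(returns)
    return mean(returns) / sd if sd > 0 else 0.0


def skew_kurtosis(returns):
    """Skewness and non-excess kurtosis (3.0 for a normal distribution)."""
    m, sd = mean(returns), pstdev(returns)
    if sd == 0:
        return 0.0, 3.0
    z = [(r - m) / sd for r in returns]
    return mean([x ** 3 for x in z]), mean([x ** 4 for x in z])


def probabilistic_sharpe(returns, benchmark_sr=0.0):
    """Probability that the true Sharpe exceeds benchmark_sr (per-period units)."""
    n = len(returns)
    sr = per_period_sharpe(returns)
    skew, kurt = skew_kurtosis(returns)
    variance_term = 1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2
    z = (sr - benchmark_sr) * math.sqrt(n - 1) / math.sqrt(max(variance_term, 1e-12))
    return STD_NORMAL.cdf(z)


def expected_max_sharpe(n_trials, sharpe_std):
    """Best Sharpe expected from n_trials no-skill trials: the deflated bar."""
    if n_trials <= 1:
        return 0.0
    a = STD_NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
    b = STD_NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return sharpe_std * ((1.0 - EULER_GAMMA) * a + EULER_GAMMA * b)


def verifier_gate(returns, n_trials, sharpe_std, psr_floor=0.95):
    """Hypothetical combined gate: PSR against the deflated bar must reach the floor."""
    bar = expected_max_sharpe(n_trials, sharpe_std)
    return probabilistic_sharpe(returns, benchmark_sr=bar) >= psr_floor
```

The bar rises with `n_trials`, which is why an honest trial count matters: the same return series can pass at one trial and fail at a thousand.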

Adds campaign mode (--search) with the two structural guards that make an auto-search loop safe to turn on, the fixes for self-deception when you are less in the loop:

#1 Enforced trial counting (engine/search.py + ledger.py)
Every candidate the grid evaluates ticks a counter; it is persisted in research-ledger.json and accumulates ACROSS cycles, then feeds the deflated Sharpe gate. You cannot search 1,000 configs and claim n_trials=1: the loop counts for you, permanently.

#2 Three-way split + write-once lockbox (engine/split.py + ledger.py + verifier.py)
Data splits train/validation/lockbox. Search optimizes on train, ranks on validation; the lockbox is opened exactly once, on the winner only. The ledger fingerprints the lockbox and BLOCKS any re-open: re-peeking is self-deception, so the loop refuses.

Demonstration on no-edge synthetic data: search finds a winner with validation Sharpe 6.27 (overfit), lockbox opens once and shows Sharpe -9.22 -> REJECT; a second cycle on the same data is BLOCKED; cumulative trials rise across cycles so the deflated bar keeps climbing. The search will always find a beautiful in-sample winner; the lockbox is what stops you believing it.

Tests: 14/14 pass (added split, ledger accumulation, write-once, enforced counter, and overfit-winner-rejection). Repo validate gates still pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
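A minimal sketch of the two guards, assuming a JSON ledger with a cumulative counter and a list of lockbox fingerprints. The `Ledger` class, its method names, and the JSON layout are hypothetical; only the file name research-ledger.json and the behaviours (counts accumulate across cycles, a fingerprinted lockbox cannot be re-opened) come from the commit message.

```python
import hashlib
import json
import os


class Ledger:
    """Hypothetical persistent ledger: cumulative trials plus opened-lockbox fingerprints."""

    def __init__(self, path):
        self.path = path
        self.state = {"n_trials": 0, "opened_lockboxes": []}
        if os.path.exists(path):
            with open(path) as f:
                self.state = json.load(f)

    def _save(self):
        with open(self.path, "w") as f:
            json.dump(self.state, f, indent=2)

    def count_trials(self, k=1):
        """Tick the counter for every evaluated candidate; persists across cycles."""
        self.state["n_trials"] += k
        self._save()
        return self.state["n_trials"]

    def open_lockbox(self, lockbox_returns):
        """Record a fingerprint of the lockbox data; refuse if it was opened before."""
        payload = json.dumps(lockbox_returns).encode()
        fingerprint = hashlib.sha256(payload).hexdigest()
        if fingerprint in self.state["opened_lockboxes"]:
            raise PermissionError("lockbox already opened; re-peeking is blocked")
        self.state["opened_lockboxes"].append(fingerprint)
        self._save()
        return fingerprint
```

Because the state is reloaded from disk on construction, a fresh process in a later cycle sees the earlier trial count and the earlier lockbox fingerprint.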

Trading target: BTC/USDT spot, daily (1d) bars.
- BTC/USDT: deepest liquidity keeps the 5+5bps cost assumption realistic
- spot: no funding/leverage/liquidation; strategy is long/flat only
- daily: the Donchian 20/55 breakout is the classic Turtle system, built for 1d

Fixes a real bug: annualization was hardcoded to hourly (24*365) everywhere, so daily bars would inflate every Sharpe ~5x. Adds PERIODS_PER_YEAR map and a --timeframe flag (1h/4h/1d) that drives both the data fetch interval and Sharpe annualization together. Daily campaign run now yields sane numbers (validation Sharpe 1.28, lockbox -1.88 -> REJECT) instead of the inflated hourly ones.

Docs note the US Binance geo-block (point data.py at Binance.US/Coinbase). Tests 14/14, repo validate gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
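The size of the annualization bug follows from the square-root-of-time rule: applying the hourly factor to daily bars multiplies every Sharpe by sqrt(24*365 / 365) = sqrt(24), roughly 4.9, which matches the "~5x" above. A sketch, assuming the map uses 365-day years for all three timeframes (only the hourly value 24*365 is stated in the commit; the other two entries and the function name are assumptions):

```python
import math
from statistics import mean, pstdev

# Assumed contents; the commit only confirms the hourly value 24 * 365.
PERIODS_PER_YEAR = {"1h": 24 * 365, "4h": 6 * 365, "1d": 365}


def annualized_sharpe(returns, timeframe):
    """Per-period Sharpe scaled by sqrt(periods per year) for the bar size in use."""
    sd = pstdev(returns)
    if sd == 0:
        return 0.0
    return mean(returns) / sd * math.sqrt(PERIODS_PER_YEAR[timeframe])


daily_returns = [0.01, -0.005, 0.012, 0.003, -0.002]
correct = annualized_sharpe(daily_returns, "1d")
buggy = annualized_sharpe(daily_returns, "1h")  # hourly factor applied to daily bars
inflation = buggy / correct                     # equals sqrt(24)
```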

Exchange APIs (Binance.US, Coinbase) are blocked by this environment's egress policy, but raw.githubusercontent.com is allowed, so the real-data source is the Coin Metrics community dataset (daily reference price).

Adds:
- engine/data.py: from_coinmetrics() with robust chunked read (tolerates the egress size cap via IncompleteRead salvage); to_csv() writer; get_ohlcv source 'coinmetrics' with quote-suffix normalization (BTCUSDT -> btc)
- sample-data/btc_1d_coinmetrics.csv: committed real BTC daily snapshot (~2010-2020) so the loop runs on real data offline / in CI
- --source coinmetrics wired into the CLI
- test + docs

Coin Metrics gives daily close, not OHLC, so the breakout runs as Donchian-on-close (standard daily variant). On a user's own machine, --source live uses Binance OHLCV (US users point data.py at Binance.US/Coinbase).

Result on REAL BTC daily: Turtle 20/10 shows validation Sharpe 2.4 and +335% lockbox return, but the lockbox REJECTS it: after the 35-config deflated-Sharpe penalty (bar 1.73 vs 1.47) and a 47% drawdown it fails the honest gates. A naive backtest ships it; the lockbox does not.

Tests 15/15, repo validate gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX

Adds --walkforward mode: the gold-standard time-series test. Re-optimizes the grid on a rolling (or anchored) in-sample window and scores each next out-of-sample fold across the whole history.

Two gates, both required:
- Consistency (K-of-N): at least K folds clear a per-fold Sharpe/drawdown gate, so a strategy that only worked in one regime fails.
- Aggregate honesty: pool all folds' OOS returns and require the combined curve to beat the deflated benchmark for ALL trials (N folds x grid), clear PSR >= 0.95, and stay under the drawdown cap.

Every fold's search ticks the enforced trial counter and accumulates in the ledger, so re-optimizing N times raises the deflated bar N-fold.

On real BTC daily the Turtle breakout shows pooled OOS Sharpe 1.91 (beats the deflated bar, PSR 1.0) yet is REJECTED: only 2/5 folds pass because 3 folds had 36-65% drawdowns. An aggregate-only test green-lights it; the K-of-N gate vetoes it. That disagreement is exactly the added value over a single lockbox.

Tests 18/18, repo validate gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
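The fold layout and the K-of-N consistency gate can be sketched as below. All function names and thresholds are hypothetical; in particular the per-fold drawdown cap of 0.30, K = 3, and the placeholder fold Sharpes are assumptions chosen so the example reproduces the 2/5 outcome reported above from the per-fold drawdowns quoted later in this PR (65/52/24/36/20%).

```python
def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve, as a fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def rolling_folds(n_bars, train_len, test_len):
    """Yield ((train_start, train_end), (test_start, test_end)) index ranges."""
    start = 0
    while start + train_len + test_len <= n_bars:
        split = start + train_len
        yield (start, split), (split, split + test_len)
        start += test_len  # roll forward by one out-of-sample fold


def consistency_gate(fold_sharpes, fold_drawdowns, k, min_sharpe, max_dd):
    """K-of-N: at least k folds must clear both per-fold thresholds."""
    passed = sum(
        1 for s, d in zip(fold_sharpes, fold_drawdowns)
        if s >= min_sharpe and d <= max_dd
    )
    return passed >= k, passed


# Drawdowns from the PR; Sharpes are placeholders, thresholds are assumed.
fold_drawdowns = [0.65, 0.52, 0.24, 0.36, 0.20]
fold_sharpes = [1.0] * 5
consistent, n_passed = consistency_gate(fold_sharpes, fold_drawdowns,
                                        k=3, min_sharpe=0.0, max_dd=0.30)
aggregate_ok = True                       # the pooled curve passed in the PR's example
approved = consistent and aggregate_ok    # both gates are required
```

With these inputs only two folds clear the cap, so the strategy is rejected even though the aggregate gate passes.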

…ifier)

Adds --vol-target: size the position by target_vol/realized_vol so risk is roughly constant (hold less in violent regimes, more in calm ones), capped at --max-leverage (1.0 = spot, no borrow). Threaded through every mode via a merged base_params, and generate_signals now accepts periods_per_year for correct annualization of the vol target. No look-ahead: realized vol at bar t uses returns ending at t.

On real BTC daily walk-forward this is a structural win: consistency 2/5 -> 5/5, pooled OOS Sharpe 1.91 -> 2.34, pooled drawdown 65% -> 28%, per-fold drawdowns 65/52/24/36/20% -> 28/24/14/13/12%. Lower risk targeting generalizes to any future data, so it is not curve-fit.

But it is kept HONEST: at the a-priori 0.40 default it is still REJECTED, missing the aggregate drawdown cap by 3 points (28% vs 25%). A lower target passes, but sweeping target_vol by hand and reporting the value that clears the gate is uncounted multiple testing: the enforced counter tracks the grid, not the researcher's own experimentation. Documented as the trap it is; the real verdict can only come from forward data (#5).

Tests 20/20, repo validate gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
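A sketch of the sizing rule described here, assuming a 30-bar trailing window and population standard deviation (neither is stated in the commit; the function names are also hypothetical). The weight computed from returns ending at bar t is applied to the return of bar t+1, which is one way to honour the no-look-ahead constraint.

```python
import math
from statistics import pstdev


def vol_target_weights(returns, target_vol=0.40, lookback=30,
                       periods_per_year=365, max_leverage=1.0):
    """Weight at bar t = target_vol / annualized realized vol of returns ending at t."""
    weights = []
    for t in range(len(returns)):
        window = returns[max(0, t - lookback + 1): t + 1]
        if len(window) < 2:
            weights.append(0.0)  # not enough history to estimate volatility
            continue
        realized = pstdev(window) * math.sqrt(periods_per_year)
        if realized == 0:
            weights.append(max_leverage)
        else:
            weights.append(min(max_leverage, target_vol / realized))
    return weights


def apply_weights(returns, weights):
    """Scale the return of bar t by the weight known at the close of bar t-1."""
    return [weights[t - 1] * returns[t] for t in range(1, len(returns))]


calm = [0.001 if i % 2 == 0 else -0.001 for i in range(60)]
wild = [0.05 if i % 2 == 0 else -0.05 for i in range(60)]
calm_weight = vol_target_weights(calm)[-1]   # hits the 1.0 cap
wild_weight = vol_target_weights(wild)[-1]   # scaled down to about 0.42
```

In the calm series the raw ratio is far above 1.0, so the cap makes the overlay a no-op there; it only reduces exposure once realized volatility exceeds the target.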

… + forward quarantine (#5)

#4 --trial-budget N: the loop halts searching once cumulative trials reach N. An autonomous loop that searches forever turns the whole dataset into in-sample data; the budget is the alpha-spending cap that forces a stop. Checked before each run (a run may overshoot); once spent, further searches halt and point to forward-testing or new data. engine/ledger.py budget_exhausted().

#5 --forward-test: carve the newest slice into a quarantine window the search, walk-forward, and lockbox never touch. Research on the earlier window, then forward-test the survivor on the held-out tail. Forward performance gates capital, not the backtest. engine/quarantine.py. Approval requires research AND forward to pass. Each forward window is spent after --max-forward-evals tests (the lockbox lesson, applied to forward data: testing 100 strategies on one tail just relocates the multiple-testing problem).

Real BTC demonstration (vol-targeted breakout, 0.40 default): research REJECTs (aggregate drawdown), but the forward out-of-time window actually PASSes cleanly (Sharpe 1.38, +94%, 18% DD on unseen data), yet the strategy is NOT approved, because approval needs both gates. No single lucky result is sufficient.

All five hardening steps (#1-#5) now implemented. Tests 23/23, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
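The budget check, the quarantine split, and the two-gate approval rule reduce to a few lines. This is a hedged sketch: the function names, the 20% forward fraction, and the argument shapes are assumptions, and the real `budget_exhausted()` in engine/ledger.py may have a different signature.

```python
def budget_spent(cumulative_trials, trial_budget):
    """Checked before a run starts, so a single run may still overshoot the budget."""
    return trial_budget is not None and cumulative_trials >= trial_budget


def split_quarantine(bars, forward_fraction=0.2):
    """Hold out the newest slice; research code only ever receives the first part."""
    n_forward = int(len(bars) * forward_fraction)
    cut = len(bars) - n_forward
    return bars[:cut], bars[cut:]


def forward_window_spent(evals_so_far, max_forward_evals):
    """A forward window stops being out-of-sample after too many strategies test on it."""
    return evals_so_far >= max_forward_evals


def approve(research_pass, forward_pass):
    """Approval needs both gates; one lucky result is never sufficient."""
    return research_pass and forward_pass


research, forward = split_quarantine(list(range(10)))
# The PR's BTC demonstration: research REJECT, forward PASS, so not approved.
demo_verdict = approve(research_pass=False, forward_pass=True)
```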

Adds --strategy selector and a pluggable registry (engine/strategy.py). Every strategy is daily-close, long/flat, no look-ahead; the vol-target overlay, walk-forward, lockbox, forward quarantine, and budget all work on top unchanged.

New hypotheses alongside the donchian breakout:
- tsmom: time-series momentum (long when price > trailing R-bar mean); trend
- meanrev: short-term mean reversion (long when oversold, exit at mean); counter-trend
- regime: trend gated by a calm-volatility regime (price > long SMA AND short vol < long vol); conditional trend, aimed at the drawdown constraint

Bake-off on real BTC (vol-targeted, full gauntlet):
- meanrev fails everywhere (Sharpe -0.58, 51% forward DD): falling knives, as predicted; short-term reversion is not a standalone edge in BTC.
- donchian/tsmom pass forward, fail research on drawdown (~37-38%); correlated.
- regime is the standout: pooled DD 37%->26%, forward Sharpe 1.66 / 9% DD, but still misses honest research approval by one point (26% vs 25% cap).

Honesty preserved: testing 4 strategies is 4x selection on top of each grid; no tuning to force a pass. Grids kept small (DOF discipline, enforced by a test).

Tests 26/26, repo validate gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX

…test

Fetches the complete Coin Metrics BTC history via chunked HTTP Range requests, defeating the egress size cap that previously truncated the download at ~2020. engine/data.py from_coinmetrics now stitches ranged chunks (falls back to a single GET if the server ignores Range). Bundled snapshot refreshed to daily close 2010-07-18 -> 2026-05-23 (5789 bars).

This enables the capstone: research each strategy on 2010-2020, then forward-test on 2020-2026, data NO research, tuning, or bake-off ever touched.

Result (the whole project in one table):
- regime PASSED honest research for the first time (5/5 folds, 14% DD) but FAILED the true out-of-time test (37% DD on 2020-2026). Research success did not survive out-of-time.
- Every strategy made big returns 2020-2026 (tsmom +531%) but with 37-41% drawdowns and sub-bar Sharpes: beta to a bull market, not alpha.
- meanrev dead everywhere; trend family correlated.
- Verdict: none approvable. The harness refused to dress beta as alpha.

Docs updated to the committed 2010-2026 snapshot (walk-forward, vol-target A/B, forward quarantine, capstone all reproduce). Tests 26/26, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX

Teaching artifact capturing the full arc: build the anti-self-deception guards, try to beat them on real BTC, and get honestly rejected by out-of-time data.

Key lesson: a strategy (regime) passed honest walk-forward research on 2010-2020 yet failed on unseen 2020-2026; every strategy 'made money' in the bull market but none had risk-adjusted edge. The harness's value is the 'no' you wouldn't have said yourself.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX

…t negative)

Investigation into signals orthogonal to price. Plumbs Coin Metrics on-chain features (mvrv, adract, txcnt) through the data layer: Bar gains a features dict, from_csv/to_csv carry extra numeric columns, from_coinmetrics attaches them, and the bundled snapshot is re-generated with them. stats.median added.

Two MVRV hypotheses (market value / realized value = price vs network cost basis):
- mvrv: contrarian valuation timing. Result: bad, with 0/5 walk-forward and 65-81% drawdowns. 'Cheap' MVRV in a crash gets cheaper, so it buys falling knives and holds down. Real information used badly.
- trendval: regime trend + an MVRV euphoria brake (step aside when overvalued). The principled use of orthogonal info against the drawdown constraint. Brake is wired (fires ~18% of days) but out-of-time produced the SAME 37% drawdown as plain regime; no measurable edge added.

Honest negative result: on-chain valuation sounds like it must help, but neither formulation cleared the bar or improved the best price strategy on unseen 2020-2026 data. That is the harness working: a compelling narrative is not evidence. Stopping at 6 strategies rather than tuning MVRV until something passes.

Tests 28/28, repo validate gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX

Adds multi-asset infrastructure and the first hypothesis with genuine statistical signal:
- engine/multi_data.py: fetch/align a multi-asset price panel from Coin Metrics (12-coin universe), cache to sample-data/crypto_panel.csv
- engine/xsectional.py: long-only top-K cross-sectional momentum portfolio with costs, optional portfolio-level vol targeting, and its own K-of-N walk-forward (enforced trial counting, deflated Sharpe, PSR, drawdown gates)

Result on 2010-2026: cross-sectional momentum BEATS the deflated-Sharpe bar with PSR 1.0: real relative-strength edge, unlike every single-asset trend/on-chain strategy. But it is REJECTED on drawdown (93% raw, 77% vol-targeted): crypto momentum crashes are too violent for trailing-vol scaling to cap at 25%.

Two caveats dominate and are documented prominently:
1. Survivorship bias: the universe is coins that survived to 2026; dead coins and rugs are absent, inflating results, and the forward test cannot fix it.
2. The 25% drawdown cap is a risk preference; relaxing it post-hoc to force a pass is goalpost-moving.

Real edge found; approvable strategy not yet. Tests 30/30, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX

…rget)

The prior synthetic vol was low enough that target_vol/realized_vol capped at 1.0, making vol targeting a no-op and the assertion fail. Use target_vol=0.1 (well below the basket's realized vol) so the overlay must scale exposure. 30/30 pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX

…est landing)

Under a pre-registered 40% drawdown mandate, refines the cross-sectional momentum signal two ways:
- long_short: market-neutral (short weakest, long strongest). Result: BACKFIRED (Sharpe 0.96 -> 0.07, forward -62%). Crypto's short leg is toxic: weakest coins short-squeeze; momentum edge is asymmetric and long-only.
- market_filter: go to cash when the market proxy (BTC) is below its trend SMA. Result: HELPED, with drawdown 68% -> 44-47%, walk-forward Sharpe up to ~1.2, PSR 1.0. Out-of-time ~36-43% DD depending on window.

Landing: cross-sectional momentum + market risk-off + vol targeting is real, repeatable edge (beats deflated bar, PSR 1.0) that lands CLOSE to a 40% drawdown mandate but does not cleanly clear it out-of-sample. Stopped tweaking here: more knob-twisting until one clears 40% is uncounted multiple testing, the exact trap the engine exists to expose. Honest next steps documented (fix survivorship; forward paper-trade one pre-registered strategy).

Walk-forward drawdown mandate defaults set to the pre-registered 40%/50%. Tests 31/31, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX

The 12-coin cross-sectional universe was hand-picked survivors. Corrects it by expanding to a 32-coin universe that includes coins which pumped and collapsed (FTT -100% in the FTX blowup, BTG -100%, BSV -97%, XVG -99%), with point-in-time eligibility (a coin is only rankable on days it has a price).
- engine/data.py + multi_data.py: defensive loaders skip assets lacking a USD price series instead of crashing.
- sample-data/crypto_panel_expanded.csv: 32-coin panel including the collapses.

Same strategy (long-only x-sec momentum + market risk-off + vol target), both universes:
- survivor-12: walk-fwd Sharpe 1.28, 4/5, out-of-time Sharpe 1.09 / 36% DD / +691%
- expanded-32: walk-fwd Sharpe 0.94, 2/5, out-of-time Sharpe 0.78 / 51% DD / +304%

Survivorship inflated Sharpe ~30-40% and hid ~15 points of drawdown. Momentum did buy the collapses and eat them. PSR stays 1.0 (the relative-strength signal is real) but on a realistic universe it is a 51%-drawdown reject, and still an upper bound, since truly delisted coins are excluded. Real edge, honestly deflated, not approvable.

Tests 32/32, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX

The research is over. Freezes the one signal with real edge (cross-sectional momentum + market risk-off + vol targeting, survivorship-corrected 32-coin universe) and paper-trades it forward, the honest end of the road.
- forward-registration.json: the frozen contract, written ONCE (committed).
- engine/forward_paper.py: --register (write-once; refuses to overwrite = pre-registration integrity), --run (marks the frozen strategy to market, no re-optimization; forward paper equity is the verdict), --since (illustrative replay, clearly labelled, not the live record). Proper loop-engineering loop: scheduled, stateful (quant-forward-state.md), paper-only.
- PREREGISTRATION.md: the human-readable commitment device and rules.

Committed data ends 2026-05-23, so a today-dated registration is correctly 'awaiting forward data' until a live feed adds bars.

Sobering illustrative check: replaying the frozen config on 2024-2025 lost money (~0.85x equity, negative Sharpe, ~48% drawdown): recent regimes have been hard for momentum, which is exactly why forward (not backtest) decides. Mandate max drawdown 40%; corrected historical was ~51%, so forward is the unbiased tiebreaker, not a foregone pass.

Tests 33/33, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
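One simple way to get write-once behaviour in the standard library is exclusive-creation mode: `open(path, "x")` raises `FileExistsError` if the file already exists, so a second registration cannot silently overwrite the first. The sketch below assumes that approach; the `register` function and the config fields are hypothetical and may not match how engine/forward_paper.py enforces it.

```python
import json


def register(path, frozen_config):
    """Write the frozen strategy contract exactly once; refuse to overwrite."""
    # Mode "x" is exclusive creation: it fails if the file already exists.
    with open(path, "x") as f:
        json.dump(frozen_config, f, indent=2, sort_keys=True)


def load_registration(path):
    """Read the frozen contract back for a mark-to-market run (no re-optimization)."""
    with open(path) as f:
        return json.load(f)
```

Revising a strategy then means registering under a new file or name with a new start date, leaving the original record untouched.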

…strategy

The consolidated scorecard showed the humble single-asset regime trend is the better HONEST bet than the exotic cross-sectional strategy once survivorship is accounted for. Generalizes the forward paper harness from one strategy to a registry supporting two kinds:
- xsectional (multi-asset panel via xsectional.portfolio_returns)
- single_asset (BTC via strategy.generate_signals + backtest)

Both are frozen write-once in forward-registration.json (per-name pre-registration integrity; --run marks all to market; no re-optimization). Old flat registration format migrates automatically.

Illustrative 2024-2025 head-to-head (NOT the live record) vindicates the review:
- xsectional: 0.85x equity, -0.02 Sharpe, 48% DD -> mandate BREACHED
- regime: 1.48x equity, 0.79 Sharpe, 27% DD -> WITHIN mandate

The simple strategy crushed the complex one on recent unseen data. Forward time still decides, but registering regime alongside was clearly right.

Tests 33/33, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX

Adds engine/blotter.py: a magnifying glass on the equity curve. Slices the exact daily returns of a single-asset strategy into round-trips (entry when exposure goes from flat to on, exit when it returns to flat), attributing PnL by compounding each span's daily contribution net of costs. No re-simulation: the product of all trade PnLs reconciles to the backtest's final equity to 1e-9 (tested).

Per trade: entry/exit date+price, holding days, avg exposure, raw price move, net PnL (%), and $ on a fixed 10k stake. Roll-up stats: win rate, avg win/loss, profit factor, expectancy, avg hold, best/worst. Reads the frozen forward config so the blotter matches what is paper-traded.

On regime-trend it exposes the trend-follower profile the curve hides: 33% win rate but profit factor 1.82 (avg win +17.5% vs avg loss -4.75%, best +71%): a few big BTC trends carry it, most trades are small scratches. Essential context before deploying: you must be wrong 2/3 of the time and still hold.

Cross-sectional per-coin blotter is the next addition. Tests 36/36, gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
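The reconciliation property is easy to see in a sketch. Assumptions here, none confirmed by the commit: `exposure[i]` is the position held over the bar that produced `strategy_returns[i]`, any exit cost is booked on the first flat bar after a trade, and returns on other flat bars are zero. The function names are hypothetical.

```python
import math


def round_trips(strategy_returns, exposure):
    """Compound daily net returns into one PnL figure per flat-to-flat span."""
    trades, growth = [], None
    for r, e in zip(strategy_returns, exposure):
        if e > 0:
            growth = (growth if growth is not None else 1.0) * (1.0 + r)
        elif growth is not None:
            growth *= 1.0 + r          # exit bar: captures any exit cost
            trades.append(growth - 1.0)
            growth = None
    if growth is not None:             # trade still open at the end of the data
        trades.append(growth - 1.0)
    return trades


def blotter_stats(trades):
    """Win rate and profit factor (gross wins / gross losses) over closed trades."""
    wins = [t for t in trades if t > 0]
    gross_loss = -sum(t for t in trades if t <= 0)
    return {
        "win_rate": len(wins) / len(trades) if trades else 0.0,
        "profit_factor": sum(wins) / gross_loss if gross_loss > 0 else math.inf,
    }


# Illustrative inputs only, not data from the PR.
returns = [0.0, 0.1, -0.05, -0.001, 0.0, -0.2, 0.0]
exposure = [0, 1, 1, 0, 0, 1, 0]
trades = round_trips(returns, exposure)
final_equity = math.prod(1.0 + r for r in returns)
reconciled = math.prod(1.0 + t for t in trades)
```

A low win rate with a profit factor above 1, as reported for regime-trend, simply means the average win is several times the average loss.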

Adds xsectional.portfolio_trades: instruments the basket portfolio to emit each coin's holding spans and its contribution to the portfolio's arithmetic return, net of the market-filter and vol-target overlays. Reconciles exactly: sum(contributions) - total_costs == sum(portfolio_returns(...)) to 1e-9 (tested).

blotter.py now dispatches by strategy kind (single_asset -> round-trips, xsectional -> per-coin roll-up) and shows which coins made or lost the money.

The per-coin view makes the survivorship drag concrete: on the 32-coin basket, XEM +103%, DASH +62%, ETH +53% carried it while XVG (the -99% collapse) cost the portfolio -27% and ALGO -13% -- momentum bought the pumps and rode them down.

Tests 37/37, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX

Adds the recurring 'is it actually making money since we started' loop:
- forward_paper --refresh: pulls latest prices from the live source into a gitignored data/ cache before evaluating, so the loop checks prices on a cadence without churning the committed snapshots (falls back to snapshot on network error).
- .github/workflows/quant-forward-track.yml: daily cron that runs --run --refresh, marks both frozen strategies to market, and commits the forward P&L record. Self-suppresses (no commit) until new price data exists. Activates on merge to default.
- quant-forward-state.md / quant-forward-log.md are now committed (durable in-git track record); data/ live caches stay gitignored.

Honest current state: the free source (Coin Metrics) and the exchange-blocked sandbox both end 2026-05-23 = the registration date, so the record reads 'awaiting data' until newer bars publish. The machinery is correct and starts recording real forward P&L the moment data flows (a merge, or a live exchange feed on the user's own infra).

Tests 37/37, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX

…hesis tracking

Turns the forward loop into a hosted, always-running platform that checks prices on a schedule, scores every thesis, and persists the record, so you can iterate, add strategies, and keep tracking until something is truly profitable.
- engine/service.py: stdlib web service + background scheduler. Runs the forward check every CHECK_INTERVAL_SECONDS, persists state/log/scoreboard to QUANT_DATA_DIR, serves a live status page (/), /scoreboard.json, and /health. Statuses per thesis: within_mandate / breached / awaiting_data.
- forward_paper.py: QUANT_DATA_DIR env so the record persists to a Volume (not the repo); committed registration acts as a seed; auto_register_pending() write-once registers any new thesis on boot.
- Dockerfile + railway.json + DEPLOY-RAILWAY.md: one-click-ish Railway deploy (mount a Volume at /data). Pure stdlib, no DB required.

Extensibility: add a thesis to FROZEN_STRATEGIES, redeploy -> the service auto-registers it (write-once, stamped at the latest data date) and tracks it alongside the others. Registration stays write-once per name; revising a thesis means a NEW name with a NEW start date.

Tests 38/38, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX

…ilder

The Railway deploy failed with 'Railpack could not determine how to build the app' because the merged commit predated the Dockerfile and Railpack (Railway's default builder) found no Python marker. Rather than depend on the Dockerfile being picked up (fragile: needs the right commit + builder setting + root dir), add the two signals the default builder needs:
- requirements.txt: pure-stdlib marker (no deps) so Railpack/Nixpacks detects Python
- Procfile: 'web: python -m engine.service' start command

Now it builds natively under Railpack OR the Dockerfile. DEPLOY-RAILWAY.md updated: set QUANT_DATA_DIR=/data (required for persistence without the Dockerfile ENV) and mount a Volume there; redeploy latest main if a build predates these files.

Tests 38/38, service boots via the Procfile command, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX

…a adapter

Adds engine/coinbase.py: the public Coinbase Exchange candles adapter (no API key): hourly (or any 1m..1d granularity) with backward pagination and a to_daily resampler. This is the current, granular feed that unblocks the forward tracker (the daily/lagging Coin Metrics source cannot) and is the foundation for intraday strategies.
- data.get_ohlcv gains source='coinbase' (+ research --source coinbase); graceful synthetic fallback where exchange egress is blocked.
- forward_paper: regime-trend now carries source=coinbase (BTC-USD, daily candles), so on Railway its forward record fills with live prices; falls back to the committed snapshot in sandboxes that block exchanges (reads 'awaiting data').
- xsectional stays on the Coin Metrics panel for now (its survivorship-corrected universe includes delisted coins an exchange won't serve).

Verified with fixtures/mocks (parse order, daily resample, pagination + dedup); exchange APIs are blocked in this sandbox so live calls fall back cleanly. Hourly data is ready for the next (intraday) strategy build.

Tests 41/41, repo gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX
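The dedup and daily resample steps can be sketched without touching the network. The candle shape used here (a dict with `ts` in UTC epoch seconds plus `open`, `high`, `low`, `close`, `volume`) is an assumption for illustration, not the adapter's actual record type or the exchange's wire format.

```python
from datetime import datetime, timezone


def to_daily(candles):
    """Collapse intraday candles into one OHLCV bar per UTC calendar day."""
    # Overlapping pagination pages can repeat a candle; keep one per timestamp.
    by_ts = {c["ts"]: c for c in candles}
    days = {}
    for ts in sorted(by_ts):
        c = by_ts[ts]
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        bar = days.get(day)
        if bar is None:
            days[day] = {"date": day, "open": c["open"], "high": c["high"],
                         "low": c["low"], "close": c["close"], "volume": c["volume"]}
        else:
            bar["high"] = max(bar["high"], c["high"])
            bar["low"] = min(bar["low"], c["low"])
            bar["close"] = c["close"]      # last candle of the day wins
            bar["volume"] += c["volume"]
    return [days[d] for d in sorted(days)]


# Illustrative candles only: two on 1970-01-01 (one duplicated), one on 1970-01-02.
hourly = [
    {"ts": 3600, "open": 11, "high": 15, "low": 10, "close": 14, "volume": 2.0},
    {"ts": 0, "open": 10, "high": 12, "low": 9, "close": 11, "volume": 1.0},
    {"ts": 3600, "open": 11, "high": 15, "low": 10, "close": 14, "volume": 2.0},
    {"ts": 86400, "open": 14, "high": 16, "low": 13, "close": 15, "volume": 3.0},
]
daily = to_daily(hourly)
```

Sorting by timestamp before aggregating keeps the result independent of page order, which matters when pagination walks backward in time.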

PR #3's audit check failed because the new starter scored 31 (L0), under the starter L1 gate of 38; loop-audit runs only on PRs to main, so this was its first audit.

Adds the standard loop-engineering scaffolding it scores for (all genuine for this loop, not padding):
- STATE.md: live loop state (+18, and required for L1)
- AGENTS.md: build/test/layout/review norms (+9)
- loop-budget.md + loop-run-log.md: cost caps, kill switches, run history
- docs/safety.md: paper-only, kill switches, no-relitigation, egress scope
- LOOP.md: worktree isolation + MCP-not-required notes

Starter now scores 74 (L1); full CI audit gate passes (all starters >= 38, reference 100). Tests 41/41, validate gates pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UcE4n3gQdVXJtD2z3mBrZX

cobusgreyling · 2026-07-03T19:00:55Z

Triage — thanks, not mergeable as-is

@50thycal This is substantial, well-written work. The two stories are exactly the kind of honest failure content this repo wants, and the quant starter’s numerical verifier discipline is a strong dogfooding example. Closing for scope and checklist gaps — not a rejection of the idea.

What’s strong

Stories (quant-loop-the-verifier-problem.md, quant-loop-out-of-time.md) — real surprises, metrics, actionable lessons. These fit CONTRIBUTING.md step 2 and could merge on their own.
Starter README — clear paper-only posture, explains why LLM-as-verifier fails for backtests.
Engine design — OOS split, trial counting, walk-forward; aligns with loop-engineering safety principles.

Blockers (why we’re closing)

PR template unfilled — no summary, checklist unchecked, no loop-audit evidence.
No pattern registration — new domain starters that illustrate a loop pattern need a patterns/*.md entry + patterns/registry.yaml update (see pattern-template.md).
STATE.md committed live — repo rule: state examples use .example suffix; live STATE.md should be gitignored in starters.
Repo bloat — three ~5.8k-line CSV snapshots (~17k lines). Prefer a fetch script + one small fixture, or document download steps (see how other starters stay lean).
.github/workflows/quant-forward-track.yml — daily contents: write auto-commits to main from a fork PR. That’s a maintainer/security decision we can’t take from an unsolicited workflow. If we add this later, it needs a separate maintainer-owned PR with explicit gates.
Not wired into loop-init — other starters are synced via tools/loop-init/ and scripts/check-loop-init-sync.mjs.
CI never ran — fork PR workflows stayed action_required (needs maintainer approval); no audit/validate signal.

Suggested path (smallest merges first)

| PR | Contents | Effort |
|----|----------|--------|
| A | Stories only + `stories/README.md` row | ~15 min review |
| B | Starter without workflow, `quant-state.md.example`, trim CSVs/fetch script | ~half day |
| C | Pattern doc + `registry.yaml` + optional `loop-init` wiring | maintainer pairs with you |

If you want to continue, open PR A first (stories-only) and comment here — happy to fast-track per maintainer response.

Thanks again — the verifier-problem framing is a great addition to the corpus.

cobusgreyling · 2026-07-03T19:01:03Z

Closing per triage above — please resubmit stories-only (PR A) when ready.

cobusgreyling · 2026-07-03T19:04:54Z

Stories landed in #137 (merged) — thanks @50thycal! Starter remains on your fork; open a trimmed PR B whenever you're ready.

Cherry-pick stories-only from PR #136 (closed for scope). Links point to the contributor's reference starter on their fork until a trimmed starter lands in-repo.

Co-authored-by: Cobus Greyling <cobusgreyling@Cobuss-MacBook-Pro-2.local>

claude added 22 commits June 26, 2026 02:58

50thycal requested a review from cobusgreyling as a code owner July 3, 2026 16:25

claude added 2 commits July 3, 2026 16:38

cobusgreyling closed this Jul 3, 2026

cobusgreyling mentioned this pull request Jul 3, 2026

stories: quant loop failure stories from @50thycal #137

Merged

6 tasks

50thycal wants to merge 24 commits into cobusgreyling:main from 50thycal:claude/code-review-discussion-9w5pnt

3 participants
