The research ladder

The Research tab is the honest scoreboard. It takes every backtest you’ve run and rolls the gauntlet stages into a single view: for each strategy × symbol × timeframe, it shows a 0–100 readiness score and a clear picture of which hard gates that combination has or hasn’t cleared. The view is served by GET /api/research/confidence, and the response is cached in Redis.
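To read the same data outside the tab, a minimal sketch follows. It assumes the endpoint returns JSON and that you supply your own base URL; the response schema is not described here, so inspect the payload before relying on any field. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Endpoint path as given in the text; everything else here is an assumption.
CONFIDENCE_PATH = "/api/research/confidence"


def confidence_url(base_url: str) -> str:
    """Join a base URL (hypothetical; use your deployment's address) with the path."""
    return base_url.rstrip("/") + CONFIDENCE_PATH


def fetch_confidence(base_url: str, timeout: float = 10.0):
    """GET the ladder payload and decode it, assuming the body is JSON."""
    with urlopen(confidence_url(base_url), timeout=timeout) as response:
        return json.load(response)
```

For example, fetch_confidence("http://localhost:8000") would request http://localhost:8000/api/research/confidence; the host and port are placeholders.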

What a ladder row shows

Strategy / symbol / timeframe — the unit being judged (Daily or Intraday).
Readiness score (0–100) — a combined score across the gauntlet rungs.
A ⭐ — awarded only when the strategy has cleared all the hard gates (below).
Buy & hold % and excess — always shown, so a high score that doesn’t beat buying the instrument is obvious.
The stages — the status of each rung (rule test, held-out Sharpe, walk-forward OOS, deflated Sharpe), plus visible-but-not-gating signals (Monte Carlo, cross-sectional breadth).
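The fields above can be pictured as one record per row. This is only an illustrative sketch: the class and field names are hypothetical, and it assumes "excess" means the strategy's return minus the buy & hold return over the same period.

```python
from dataclasses import dataclass


@dataclass
class LadderRow:
    # Hypothetical field names; the real payload may differ.
    strategy: str
    symbol: str
    timeframe: str            # "Daily" or "Intraday"
    readiness_score: float    # 0 to 100
    strategy_return_pct: float
    buy_hold_pct: float

    @property
    def excess_pct(self) -> float:
        """Assumed definition: strategy return minus buy & hold return."""
        return self.strategy_return_pct - self.buy_hold_pct

    @property
    def beats_buy_and_hold(self) -> bool:
        return self.excess_pct > 0


# A high score that still loses to simply holding the instrument.
row = LadderRow("example_strategy", "SPY", "Daily", 85.0, 12.0, 18.0)
print(row.excess_pct, row.beats_buy_and_hold)
```

Showing the excess next to the score is what makes a row like this one easy to spot.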

The hard gates (the ⭐)

A strategy is “ready” only when it clears, in order:

Rule test significant — the real Sharpe beats the 95th percentile of random-entry twins.
Held-out Sharpe > 0.5 — on the optimizer’s untouched test window.
Walk-forward out-of-sample Sharpe > 0.5 — stitched across rolling windows.
Deflated Sharpe > ~0.95 — corrected for how many variants were tried.

These bars are deliberately strict — a positive rule test and a positive held-out Sharpe together are not enough (the “ORB lesson”). Each rung documents, on the tab itself, exactly what it protects against; the full explanation is in the gauntlet.
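The ordered gate check can be sketched as below. The thresholds come from the list above (the deflated Sharpe bar is approximate); the class, the function names, and the nearest-rank percentile method are assumptions made for illustration, not the project's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class GauntletEvidence:
    # Hypothetical container for the four rungs' results.
    real_sharpe: float
    random_twin_sharpes: Sequence[float]
    held_out_sharpe: float
    walk_forward_oos_sharpe: float
    deflated_sharpe: float


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def first_failed_gate(e: GauntletEvidence) -> Optional[str]:
    """Return the first gate that fails, in ladder order, or None if all pass."""
    if e.real_sharpe <= percentile(e.random_twin_sharpes, 95):
        return "rule test"
    if e.held_out_sharpe <= 0.5:
        return "held-out Sharpe"
    if e.walk_forward_oos_sharpe <= 0.5:
        return "walk-forward OOS Sharpe"
    if e.deflated_sharpe <= 0.95:  # approximate bar, per the text
        return "deflated Sharpe"
    return None


twins = [i / 100 for i in range(100)]  # made-up random-entry Sharpes
weak = GauntletEvidence(1.2, twins, 0.8, 0.4, 0.97)
print(first_failed_gate(weak))
```

Here the strategy passes the rule test and the held-out check but stops at the walk-forward rung, so it would not earn the ⭐.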

Breadth is a bonus, not a gate

Cross-sectional breadth — does the edge show up across many independent names? — strengthens the score but never gates it. So a genuinely asset-specific edge (a structural pair, a calendar effect) can still qualify, while a one-name “winner” is treated with suspicion. See multi-instrument engines.
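One way to picture "bonus, not gate" is below. The scoring formula is not documented here, so the additive bonus, its 10-point cap, and the function name are all hypothetical; the only point being illustrated is that breadth moves the score while the ⭐ depends on the hard gates alone.

```python
from typing import Tuple


def readiness(base_score: float, gates_cleared: bool,
              breadth_fraction: float, max_bonus: float = 10.0) -> Tuple[float, bool]:
    """Return (score, starred).

    breadth_fraction is the assumed share of names (0.0 to 1.0) where the
    edge also shows up. It raises the score but never changes the star.
    """
    score = min(100.0, base_score + max_bonus * breadth_fraction)
    starred = gates_cleared  # breadth plays no part in this decision
    return score, starred


print(readiness(80.0, True, 0.0))   # asset-specific edge, still starred
print(readiness(95.0, False, 1.0))  # broad edge, but gates not cleared
```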

From the ladder to action

Each row is actionable: it deep-links to the next sensible step — open the relevant run, launch a missing gauntlet stage pre-filled, or, once a strategy is ready, promote it to paper trading. The ladder is where you decide what’s worth deploying.

Generating ladder evidence quickly

One strategy, full ladder: python -m ats.research --strategy <key> --symbols SPY,TLT,GLD.
The whole basket, unattended: python -m ats.sweep runs a cheap rule-test screen, then the deep gauntlet on the survivors, and prints a ready-first leaderboard.

Both write every stage as a normal run, so the ladder can be rebuilt from those runs. See command-line research. The strategies and their honest outcomes are catalogued in the strategy library.
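The screen-then-deepen flow of the sweep can be sketched as follows. This is not the ats.sweep code: the function and type names are hypothetical, and ordering ready strategies by score is an assumption about what "ready-first" means beyond putting ready rows on top.

```python
from typing import Callable, Iterable, List, NamedTuple


class Result(NamedTuple):
    name: str
    ready: bool
    score: float


def sweep(candidates: Iterable[str],
          cheap_screen: Callable[[str], bool],
          deep_gauntlet: Callable[[str], Result]) -> List[Result]:
    """Run the cheap screen on everything, the deep gauntlet on survivors only."""
    survivors = [c for c in candidates if cheap_screen(c)]
    results = [deep_gauntlet(c) for c in survivors]
    # Ready rows first, then higher scores first within each group.
    return sorted(results, key=lambda r: (not r.ready, -r.score))


# Made-up stand-ins for the two stages.
fake_results = {
    "a": Result("a", False, 70.0),
    "b": Result("b", True, 65.0),
    "c": Result("c", True, 90.0),
}
board = sweep(["a", "b", "c", "d"],
              cheap_screen=lambda name: name in fake_results,
              deep_gauntlet=lambda name: fake_results[name])
print([r.name for r in board])
```

Candidate "d" never reaches the expensive stage, which is the reason for screening cheaply first.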