The research ladder
The Research tab is the honest scoreboard. It takes every backtest you’ve run and rolls the
gauntlet stages into a single view: for each strategy × symbol × timeframe, a
0–100 readiness score and a clear picture of which hard gates it has — or hasn’t — cleared. It’s
served by GET /api/research/confidence and cached in Redis.
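As a minimal sketch of reading the ladder programmatically, the snippet below issues the GET request and decodes the body. Only the path comes from this page. The base URL, the assumption that the response is JSON, and the helper names `confidence_url` and `fetch_confidence` are illustrative, so adjust them to your deployment.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

CONFIDENCE_PATH = "/api/research/confidence"  # endpoint path named on this page


def confidence_url(base_url: str) -> str:
    """Join a base URL (an assumption: it depends on where the app is served) with the path."""
    return urljoin(base_url.rstrip("/") + "/", CONFIDENCE_PATH.lstrip("/"))


def fetch_confidence(base_url: str, opener=urlopen, timeout: float = 10.0):
    """GET the ladder payload and decode it, assuming a JSON body.

    The payload's shape is not documented here, so it is returned as-is.
    `opener` is injectable so the call can be exercised without a live server.
    """
    with opener(confidence_url(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_confidence("http://localhost:8000")` would request `http://localhost:8000/api/research/confidence`; the host and port here are placeholders.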
What a ladder row shows
- Strategy / symbol / timeframe — the unit being judged (Daily or Intraday).
- Readiness score (0–100) — a combined score across the gauntlet rungs.
- A ⭐ — awarded only when the strategy has cleared all the hard gates (below).
- Buy & hold % and excess — always shown, so a high score that doesn’t beat buying the instrument is obvious.
- The stages — the status of each rung (rule test, held-out Sharpe, walk-forward OOS, deflated Sharpe), plus visible-but-not-gating signals (Monte Carlo, cross-sectional breadth).
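The fields listed above can be pictured as a small record. This is a hypothetical sketch, not the application's actual schema: the class name `LadderRow`, its field names, and the definition of excess as strategy return minus buy-and-hold return are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LadderRow:
    """Hypothetical shape of one ladder row; names are illustrative only."""
    strategy: str
    symbol: str
    timeframe: str                # "Daily" or "Intraday"
    readiness: int                # combined score, 0-100
    strategy_return_pct: float
    buy_hold_pct: float
    stages: Dict[str, str] = field(default_factory=dict)  # rung name -> status
    starred: bool = False         # True only when every hard gate is cleared

    @property
    def excess_pct(self) -> float:
        # Assumption: "excess" means strategy return minus buy-and-hold return.
        return self.strategy_return_pct - self.buy_hold_pct

    @property
    def beats_buy_hold(self) -> bool:
        return self.excess_pct > 0
```

Keeping buy and hold next to the score is what makes a row like `readiness=85` with a negative `excess_pct` stand out immediately.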
The hard gates (the ⭐)
A strategy is “ready” only when it clears, in order:
- Rule test significant — the real Sharpe beats the 95th percentile of random-entry twins.
- Held-out Sharpe > 0.5 — on the optimizer’s untouched test window.
- Walk-forward out-of-sample Sharpe > 0.5 — stitched across rolling windows.
- Deflated Sharpe > ~0.95 — corrected for how many variants were tried.
These bars are deliberately strict — a positive rule test and a positive held-out Sharpe together are not enough (the “ORB lesson”). Each rung documents, on the tab itself, exactly what it protects against; the full explanation is in the gauntlet.
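The four gates can be sketched as an ordered checklist. The thresholds are the ones stated above, with the approximate deflated-Sharpe bar taken as exactly 0.95. The `Evidence` record, the function names, and the nearest-rank percentile method are assumptions for illustration, not the application's implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


def percentile_95(values: Sequence[float]) -> float:
    """Nearest-rank 95th percentile (one of several common definitions)."""
    if not values:
        raise ValueError("need at least one random-entry twin")
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


@dataclass
class Evidence:
    """Hypothetical container for the numbers each gate needs."""
    real_sharpe: float
    twin_sharpes: Sequence[float]     # Sharpe ratios of random-entry twins
    held_out_sharpe: float
    walk_forward_oos_sharpe: float
    deflated_sharpe: float


def hard_gates(e: Evidence) -> List[Tuple[str, bool]]:
    """Return each gate, in ladder order, with whether it was cleared."""
    return [
        ("rule test significant", e.real_sharpe > percentile_95(e.twin_sharpes)),
        ("held-out Sharpe > 0.5", e.held_out_sharpe > 0.5),
        ("walk-forward OOS Sharpe > 0.5", e.walk_forward_oos_sharpe > 0.5),
        ("deflated Sharpe > 0.95", e.deflated_sharpe > 0.95),
    ]


def is_ready(e: Evidence) -> bool:
    """The star requires every gate, not just the first two."""
    return all(passed for _, passed in hard_gates(e))


def first_failed_gate(e: Evidence) -> Optional[str]:
    """Name of the first gate not cleared, or None when all are cleared."""
    for name, passed in hard_gates(e):
        if not passed:
            return name
    return None
```

A strategy that passes the rule test and the held-out check but has a walk-forward out-of-sample Sharpe of 0.2 gets no star here; `first_failed_gate` would report the walk-forward rung.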
Breadth is a bonus, not a gate
Cross-sectional breadth — does the edge show up across many independent names? — strengthens the score but never gates it. So a genuinely asset-specific edge (a structural pair, a calendar effect) can still qualify, while a one-name “winner” is treated with suspicion. See multi-instrument engines.
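To make the distinction between a bonus and a gate concrete, here is a toy scoring sketch. The real scoring formula is not given on this page: the 20-points-per-gate weighting, the 20-point breadth bonus, and the function names are invented purely to show that breadth can move the score without ever affecting the star.

```python
NUM_HARD_GATES = 4  # rule test, held-out Sharpe, walk-forward OOS, deflated Sharpe


def readiness_score(gates_passed: int, breadth_fraction: float) -> int:
    """Toy 0-100 score: illustrative weights, not the application's formula.

    breadth_fraction is the assumed share of tested names (0.0 to 1.0)
    on which the edge also shows up.
    """
    if not 0 <= gates_passed <= NUM_HARD_GATES:
        raise ValueError("gates_passed must be between 0 and 4")
    breadth = min(max(breadth_fraction, 0.0), 1.0)  # clamp to [0, 1]
    base = 20 * gates_passed    # up to 80 points from the hard gates
    bonus = 20 * breadth        # up to 20 points from breadth
    return round(base + bonus)


def starred(gates_passed: int) -> bool:
    """The star depends on the hard gates only; breadth is never consulted."""
    return gates_passed == NUM_HARD_GATES
```

Under these made-up weights, an asset-specific edge with all four gates cleared and no breadth scores the same as a broad edge that missed one gate, but only the former earns the star.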
From the ladder to action
Each row is actionable: it deep-links to the next sensible step — open the relevant run, launch a missing gauntlet stage pre-filled, or, once a strategy is ready, promote it to paper trading. The ladder is where you decide what’s worth deploying.
Generating ladder evidence quickly
- One strategy, full ladder:
  python -m ats.research --strategy <key> --symbols SPY,TLT,GLD
- The whole basket, unattended:
  python -m ats.sweep
  This runs a cheap rule-test screen, then the deep gauntlet on the survivors, and prints a ready-first leaderboard.
Both write every stage as a normal run, so the ladder rebuilds from them. See command-line research. The strategies and their honest outcomes are catalogued in the strategy library.