Writing a strategy
Prefer to describe an idea in plain English? The Studio tab generates a strategy that follows this same contract (your own Anthropic key, validated + sandboxed) and drops it straight into the Backtests picker. This guide is the contract it — and you — write to.
Strategies are code in this repo (ADR-001) — one file per hypothesis under
backend/ats/strategies/, registered at import time. There are 19 built-in
strategies to read for patterns; the cleanest skeletons to copy are
donchian.py (breakout, with the
previous-bar discipline), rsi_pullback.py
(oscillator mean-reversion), and
ibs_reversion.py (an OHLC signal
computed straight from the bar, no indicator).
The contract
Every strategy file provides three things:
- A frozen config (StrategyConfig, frozen=True) starting with instrument_id: InstrumentId and bar_type: BarType, then your parameters with sensible defaults, then, verbatim, the shared blocks:

      trade_size: int = 100
      sizing_mode: str = "fixed"  # "fixed" | "vol_target"
      risk_pct: float = 0.5
      sizing_atr_period: int = 14
      max_position_quantity: int = 10_000
      max_gross_notional: float = 1_000_000.0
      # Rule-significance machinery (hidden from the UI form)
      entry_mode: str = "rule"
      entry_probability: float = 0.05
      random_seed: int = 0

  These names are load-bearing: the sizing helper, the rule test, and the UI form generator all read them (engine/hidden fields are filtered via _NON_PARAM_FIELDS in registry.py).
- The @register(...) decorator with a unique key, display name, one-line description, and a search space; without one the optimization stage of the research ladder has nothing to do. Keep it to 2–3 dimensions; every extra dimension is more room to overfit.
- The Strategy class with on_start (register indicators + subscribe), on_bar, and the standard on_stop (cancel_all_orders → close_all_positions → unsubscribe_bars).
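To make the "load-bearing names" point concrete, here is a minimal sketch that uses a standard-library frozen dataclass as a stand-in for the real StrategyConfig base. ExampleConfig, its lookback parameter, form_fields, and the contents of NON_PARAM_FIELDS below are assumptions for illustration only; the real hidden-field list lives in registry.py and may differ.

```python
from dataclasses import dataclass, fields

# Assumption: an illustrative hidden-field set, not the repo's _NON_PARAM_FIELDS.
NON_PARAM_FIELDS = {"instrument_id", "bar_type", "entry_mode", "entry_probability", "random_seed"}


@dataclass(frozen=True)
class ExampleConfig:  # stand-in for a StrategyConfig subclass
    instrument_id: str  # InstrumentId in the real contract
    bar_type: str       # BarType in the real contract
    lookback: int = 20  # hypothetical strategy parameter
    # Shared blocks, copied verbatim from the contract
    trade_size: int = 100
    sizing_mode: str = "fixed"  # "fixed" | "vol_target"
    risk_pct: float = 0.5
    sizing_atr_period: int = 14
    max_position_quantity: int = 10_000
    max_gross_notional: float = 1_000_000.0
    # Rule-significance machinery (hidden from the UI form)
    entry_mode: str = "rule"
    entry_probability: float = 0.05
    random_seed: int = 0


def form_fields(config_cls) -> list[str]:
    """Names a form generator would show: every field not in the hidden set."""
    return [f.name for f in fields(config_cls) if f.name not in NON_PARAM_FIELDS]
```

Because helpers like this look fields up by name, renaming trade_size or random_seed in one strategy would silently drop it from whatever reads the shared blocks.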
Indicators are not a fixed set. Any class in nautilus_trader.indicators is
available — import it from its group module: momentum (RSI, Stochastics, CCI,
RateOfChange…), trend (MACD, Aroon, DirectionalMovement, Ichimoku…),
volatility (BollingerBands, ATR, Keltner, Donchian…), averages
(SMA/EMA/Hull/Wilder…), volume (VWAP, OnBalanceVolume…) — or compute your own
signal inline from math + the bar OHLCV. (The Studio’s AI generator gets this
same palette; it’s bounded only by the sandbox import allow-list.)
The entry decision MUST go through the rule-test switch, and sizing through the shared helper:

    if is_flat and should_enter(self, <your entry signal as a bool>):
        self._buy(close)  # uses entry_quantity(self, price)

Exits stay outside should_enter: the rule test holds exits fixed and
randomizes only entry timing.
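The switch can be pictured roughly as follows. This is a hedged sketch, not the repo's should_enter: the function name should_enter_sketch, the make_strategy helper, the rng attribute, and the assumption that any non-"rule" mode means "enter at random" are all hypothetical. Only the config field names come from the contract above.

```python
import random
from types import SimpleNamespace


def should_enter_sketch(strategy, signal: bool) -> bool:
    """Hypothetical rule-test switch: pass the signal through, or swap in a seeded coin flip."""
    cfg = strategy.config
    if cfg.entry_mode == "rule":
        return bool(signal)
    # Assumed behaviour for any other mode: ignore the signal and enter
    # with probability entry_probability, drawn from the seeded generator.
    return strategy.rng.random() < cfg.entry_probability


def make_strategy(entry_mode: str, seed: int = 0):
    """Build a tiny stand-in object carrying a config and a seeded RNG."""
    cfg = SimpleNamespace(entry_mode=entry_mode, entry_probability=0.05, random_seed=seed)
    # All randomness flows from random_seed, so two runs with the same seed match.
    return SimpleNamespace(config=cfg, rng=random.Random(cfg.random_seed))
```

Keeping exits out of the switch means a random-entry twin differs from the real strategy only in when it gets in, which is what lets the comparison isolate the entry rule.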
Traps (each of these cost real debugging time)
- Indicators update BEFORE on_bar fires. Comparing this bar’s close to a channel/extreme that already includes this bar can never trigger, so store previous-bar values (see _prev_upper in donchian.py). Trend filters and oscillators (SMA, RSI) are fine to read directly: the value is known at the close and the market order fills on the next bar, so there’s no lookahead either way.
- Never name an attribute _stop: it shadows Component._stop and breaks the Nautilus FSM with 'float' object is not callable.
- Nautilus scales: RelativeStrengthIndex.value is 0..1 (not 0..100), RateOfChange.value is fractional. Keep config thresholds in conventional units and convert once in on_bar.
- Vol-target sizing needs self._sizing_atr: an AverageTrueRange indicator attribute with exactly that name, registered for bars.
- Determinism: no wall clocks, no unseeded randomness. Any randomness must flow from random_seed (the rule test depends on this).
- Register the module in ats/strategies/__init__.py or nothing sees it.
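The first trap is easy to reproduce without Nautilus at all. The sketch below uses a plain deque as a stand-in for a Donchian-style upper channel; breakout_signals and its arguments are hypothetical names, and the real donchian.py may structure this differently.

```python
from collections import deque


def breakout_signals(highs, closes, period=3):
    """Return (naive, disciplined) entry flags per bar for an upper-channel breakout."""
    window = deque(maxlen=period)
    prev_upper = None
    naive, disciplined = [], []
    for high, close in zip(highs, closes):
        window.append(high)          # the indicator updates BEFORE the bar handler runs
        upper = max(window)          # so the channel already includes this bar's high
        naive.append(close > upper)  # close <= high <= upper: this can never be True
        disciplined.append(prev_upper is not None and close > prev_upper)
        if len(window) == period:    # only trust the channel once it is warmed up
            prev_upper = upper       # stored for comparison on the NEXT bar
    return naive, disciplined
```

On the sample series in the test below, the naive comparison never fires while the previous-bar comparison fires on the final breakout bar.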
Testing it
tests/test_strategies.py parametrizes over all registry keys, so a new
strategy is automatically checked for: schema hygiene, “it actually trades on
the fixture” (add fixture-appropriate params to FIXTURE_PARAMS if your
defaults need more than ~300 warmup bars — the fixture is 520 bars), and
“entry_mode='rule' is byte-identical to default”. Run the suite from
backend/: uv run pytest -q.
Researching it
The full evidence ladder, one command (imports fresh, no worker restart):

    cd backend
    uv run python -m ats.research --strategy your_key --symbols SPY,TLT,GLD

Stages: rule test → split optimize (train window only) → WFO → a final run
that feeds the deflated Sharpe. Results land in the Backtests tab and the
Research-tab ladder, where each strategy/symbol/timeframe gets a 0–100
readiness score and a ⭐ once it clears the hard gauntlet: rule-test
significant → held-out Sharpe > 0.5 → WFO out-of-sample Sharpe > 0.5 →
deflated Sharpe > 0.95. Those bars are deliberately strict — a positive rule
test plus a positive held-out Sharpe is not enough (the ORB lesson). Monte
Carlo and cross-sectional breadth are visible signals; breadth is a score
bonus, never a gate, so an asset-specific edge (a structural pair, a sector
anomaly) still qualifies. Always judge against Buy & hold % too. To test
lots of strategies at once — across the whole basket and both timeframes,
unattended — run python -m ats.sweep (a cheap rule-test screen, then the deep
gauntlet on survivors); it writes every stage as a normal run and prints a
ready-first leaderboard. What each rung protects against is documented on the
Research tab itself and in the README.
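The ⭐ gauntlet reads as a strict conjunction of four gates. A minimal sketch, assuming a hypothetical LadderResult container (the repo's actual result objects and the 0–100 readiness score are not shown here); the thresholds are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LadderResult:  # hypothetical container for one strategy/symbol/timeframe
    rule_test_significant: bool
    held_out_sharpe: float
    wfo_oos_sharpe: float
    deflated_sharpe: float


def clears_gauntlet(r: LadderResult) -> bool:
    """All four gates must pass; breadth and Monte Carlo are deliberately not gates."""
    return (
        r.rule_test_significant
        and r.held_out_sharpe > 0.5
        and r.wfo_oos_sharpe > 0.5
        and r.deflated_sharpe > 0.95
    )
```

A result with a significant rule test and a positive held-out Sharpe but a weak walk-forward still returns False, which is the "not enough" case the text warns about.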
For an intraday strategy (1-minute bars), point the same command at the intraday catalog and charge the measured spread per fill, so the whole ladder then runs net of honest costs:

    uv run python -m ats.research --strategy intraday_vwap_exhaustion --symbols QQQ,SPY,DIA \
      --bar-spec 1-MINUTE-LAST --slippage-bps auto --start 2023-04-01 --end 2026-06-12

--slippage-bps auto resolves to each name’s measured half-spread from the
ingested sidecar (daily runs default to 0, byte-identical). Metrics annualize with
the right periods_per_year for the cadence automatically.
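Two pieces of arithmetic sit behind this paragraph: a per-fill cost in basis points, and Sharpe annualization that depends on bar cadence. The sketch below is an illustration only. The PERIODS_PER_YEAR values (252 trading days, 390 regular-session minutes per day) and both function names are assumptions, not the repo's actual constants or helpers.

```python
import math
import statistics

# Assumed cadence table; the repo's own periods_per_year values may differ.
PERIODS_PER_YEAR = {"1-DAY": 252, "1-MINUTE": 252 * 390}


def annualized_sharpe(returns, periods_per_year):
    """Per-period mean over sample stdev, scaled by sqrt(periods per year)."""
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)
    return mean / stdev * math.sqrt(periods_per_year)


def spread_cost(notional, half_spread_bps):
    """Cost of one fill when charged a half-spread quoted in basis points."""
    return notional * half_spread_bps / 10_000
```

Using the daily factor on 1-minute returns would understate the annualized figure by a factor of sqrt(390) under these assumptions, which is why the cadence lookup matters.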
Intraday strategies (a session-aware shape)
The intraday_* keys are still single-instrument registry strategies (same contract,
same should_enter switch), but they subclass a small session harness in
intraday.py instead of Strategy directly:
_Session rebuilds per-RTH-session state from the 1-minute bars (VWAP, opening range,
minutes-since-open n, deviation dispersion, rolling volume), and _IntradayStrategy
handles the lifecycle every intraday idea shares — flat by the close (no overnight
risk on a cash account), long-only market entries, and an _observe() per-bar hook for
strategies that need to latch state (e.g. a measure-time Z-score). You override
_entry_signal / _exit_signal and, if needed, _observe. Design them for
1-MINUTE-LAST bars; they need intraday data ingested (ats.intraday) and are judged
net of the measured spread — the dominant intraday cost — via the backtest’s
SpreadCostFeeModel (--slippage-bps auto), not by anything inside the strategy.
A relative-value bet must see the whole basket at once, which a single-instrument
strategy can’t — so the long-only intraday relative-value engine is vectorized
(ats/workers/intraday_relval.py, ats.intraday_relval), the session-aware sibling of
the cross-sectional engine below. Honest result so far: every intraday archetype (the
four intraday_* strategies + the relative-value engine) is a negative net of
spread — the capability is the deliverable, not yet an edge. See the README’s Intraday
section for the numbers.
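The per-session state idea can be sketched in a few lines. SessionState below is a hypothetical, much smaller cousin of the _Session harness: it tracks only VWAP and minutes-since-open, uses the bar close as the VWAP price (an assumption), and takes the session date as an argument rather than deriving it from timestamps.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SessionState:
    session: Optional[date] = None
    minutes: int = 0    # bars seen since the session open
    pv: float = 0.0     # running sum of price * volume
    volume: float = 0.0

    def update(self, session: date, close: float, volume: float) -> None:
        """Fold one 1-minute bar into the state, resetting on a new session."""
        if session != self.session:
            self.session, self.minutes, self.pv, self.volume = session, 0, 0.0, 0.0
        self.minutes += 1
        self.pv += close * volume
        self.volume += volume

    @property
    def vwap(self) -> Optional[float]:
        """Session VWAP so far, or None before any volume has traded."""
        return self.pv / self.volume if self.volume > 0 else None
```

Resetting on the session boundary is what keeps yesterday's volume out of today's VWAP; an entry or exit signal would then read vwap and minutes from this object.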
Cross-sectional strategies (a different shape)
Everything above is single-instrument: a strategy sees one symbol’s bars and
decides long/flat. A cross-sectional strategy is a different shape — it ranks
a whole universe each rebalance and holds the top-K (long the strongest =
momentum, or the most oversold = reversion). That’s portfolio construction, not
an entry rule, so it doesn’t use the Strategy contract or the registry: it’s a
vectorized engine (ats/workers/cross_sectional.py) over the catalog’s aligned
closes, and you reach for it from the Backtests → Cross-sectional builder or
the CLI:

    cd backend
    uv run python -m ats.cross_sectional --universe SPY,QQQ,IWM,TLT,GLD,EEM,EFA \
      --signal momentum --lookback 208 --top-k 4 --rebalance 50 --mode rank_test

It has its own anti-overfit ladder. The one that matters is the rank test:
real ranking vs N random-selection twins at the same turnover — does the
ranking beat picking names at random? (same 95th-percentile bar as the
single-instrument rule test). Then optimize (held-out), walk_forward, and a
2× cost-stress button on the single-mode detail. The benchmark is holding the
whole universe equal-weight, so judge against excess vs equal-weight, not raw
return. Signals: momentum, reversion, low_vol (rank by lowest trailing
realized vol). Lessons from the research so far: cross-asset / country momentum
is the one cross-sectional edge that clears the bar (significant on a broad
14-country ETF universe — breadth strengthened it from marginal); the low-vol
factor is refuted (fails the rank test on every equity universe — selecting the
calmest names is a beta tilt, not a timing edge); sector reversion shows no edge.
Same “simple edges win, filters remove edge” pattern as the single-instrument
strategies. (Cross-sectional has no live order path — even a significant result
is a research finding until bridged to a rebalance strategy.)
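The rank test described above can be sketched as follows. This is a simplified illustration, not the engine in ats/workers/cross_sectional.py: rank_test and its inputs are hypothetical, returns are summed rather than compounded, costs are ignored, and re-drawing the random picks at every rebalance is only a rough stand-in for matching turnover.

```python
import random


def rank_test(scores, forward_returns, top_k, n_twins=200, seed=0):
    """Compare top-K-by-score selection against random K-name twins.

    scores / forward_returns: one {name: value} dict per rebalance period.
    Returns (real_total, fraction_of_twins_beaten).
    """
    names = sorted(scores[0])

    def total(picks_per_period):
        # Equal-weight the picked names each period and sum across periods.
        return sum(
            sum(rets[n] for n in picks) / top_k
            for picks, rets in zip(picks_per_period, forward_returns)
        )

    real_picks = [sorted(s, key=s.get, reverse=True)[:top_k] for s in scores]
    real = total(real_picks)

    rng = random.Random(seed)  # seeded, so the test is reproducible
    twins = [
        total([rng.sample(names, top_k) for _ in scores])
        for _ in range(n_twins)
    ]
    beaten = sum(real > t for t in twins)
    return real, beaten / n_twins
```

Under the 95th-percentile bar quoted above, a ranking would count as significant when the returned fraction is at least 0.95.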
The same engine shape has a session-aware intraday variant,
ats/workers/intraday_relval.py (ats.intraday_relval): it loads the aligned
1-minute grid (the shared _load_aligned is now bar_spec-parameterized — daily
default unchanged), each session buys the intraday laggard (most-negative
residual vs the basket) and goes flat by the close — the long-only, no-shorting
reframe of an intraday pair. Its rank test is the gate, judged net of the measured
spread. Lesson, consistent with everything above: it’s a no-edge — laggard
ranking doesn’t beat random selection (full 15-ETF basket only marginal, the
correlated “broken hedge” pairs are a coin-flip) and barely beats, or underperforms,
just holding the basket. The single-instrument intraday_vwap_exhaustion (a
time-windowed VWAP fade conditioned on volume exhaustion) is also a negative — a
couple of names pass the rule test but collapse under WFO + deflated Sharpe, with
breadth positive on only 3 of 15. See the README’s intraday section for the numbers;
the takeaway is that cross-sectional breadth is the load-bearing anti-overfit rung
— a real edge shows on many independent names, not two or three lucky ones.
Pairs and portfolio construction (combining edges)
Single edges are weak on their own; the durable gains come from combining low-correlation edges. Two tools help here, both vectorized (return-level, no order-engine shorting) like the cross-sectional engine:
Pairs (ats/workers/pairs.py, python -m ats.pairs --legs LQD,HYG) trade the
spread between two cointegrated instruments dollar-neutral. They’re near-zero
correlation to the long book, so a validated pair (e.g. LQD/HYG credit) is the
ideal diversifier. Same ladder as everything else — the load-bearing rung is the
rank test vs random entry timing (“does waiting for a stretched spread beat
entering at random?”), and the 2× cost stress matters more because both legs trade.
The spread is log(A) − β·log(B), and --hedge sets β (workers/kalman.py):
fixed (β=1, equal dollars — the default and validated config), rolling_ols
(trailing Cov/Var), or kalman (a state-space random-walk filter that lets β drift).
A time-varying β is meant to strip residual directional beta out of the spread, and
should in theory raise rank-test significance. It didn’t, on this daily US-ETF
universe — same-parameter kalman halved LQD/HYG’s Sharpe (0.85 → 0.44, β stayed
in [1.07, 1.16]) and rescued no previously-rejected pair into significance. These
legs are already near dollar-for-dollar (β≈1, stable), so a wandering hedge mostly adds
noise and tightens the spread, collapsing exposure (~38% → ~8%). fixed stays the
default (byte-identical to the validated result); kalman/rolling_ols are opt-in
ingredients for future pairs whose legs have genuinely different betas (a stock vs its
sector). Yet another instance of the recurring lesson: simple wins; added machinery
rarely adds rule-test edge — and the ladder, not intuition, is what told us.
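For reference, the spread and the rolling_ols hedge can be written out directly. A minimal sketch under stated assumptions: spread and rolling_ols_beta are hypothetical names, not the functions in workers/kalman.py, and the β at each bar here includes that bar's own observation, whereas a backtest would normally lag it by one bar. The Kalman variant is not shown.

```python
import math


def spread(price_a, price_b, beta=1.0):
    """log(A) - beta * log(B); beta=1 corresponds to the fixed, equal-dollar hedge."""
    return math.log(price_a) - beta * math.log(price_b)


def rolling_ols_beta(log_a, log_b, window):
    """Trailing Cov(log A, log B) / Var(log B) per bar; None until the window fills."""
    betas = []
    for i in range(len(log_a)):
        if i + 1 < window:
            betas.append(None)
            continue
        a = log_a[i + 1 - window : i + 1]
        b = log_b[i + 1 - window : i + 1]
        mean_a, mean_b = sum(a) / window, sum(b) / window
        cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
        var = sum((y - mean_b) ** 2 for y in b)
        betas.append(cov / var if var > 0 else None)
    return betas
```

When the two legs already move nearly dollar-for-dollar, the estimated β hovers near 1 and mostly adds estimation noise, which is consistent with the finding reported above.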
Discovering pairs (ats/workers/pair_screen.py, python -m ats.pair_screen --universe …) finds candidates instead of hand-picking: it scans every pair, runs
the Engle-Granger cointegration test (OLS hedge + Dickey-Fuller t-stat on the residual)
+ OU half-life + hedge-β stability, and ranks by ADF t-stat. The top-N go into the pairs ladder above; the rank test, not the screen, is the gate. Two lessons: it works (found new rank-test-significant pairs like IWF/QQQ and EWA/VWO beyond LQD/HYG), but ADF rank ≠ tradeable-edge rank, because the most “cointegrated” hits are near-redundant fund twins (AGG/BND: same index, different sponsor), tight but low-capacity. Discovered pairs are still pairs (no live path) until a live order-placing pairs strategy is built.
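One ingredient of the screen, the OU half-life, has a common textbook estimate: regress the one-step change of the residual on its lagged level and convert the implied AR(1) coefficient into a half-life. The sketch below shows that estimate only. ou_half_life is a hypothetical name, the formula in pair_screen.py may differ, and the Engle-Granger / Dickey-Fuller step is not reproduced here.

```python
import math


def ou_half_life(series):
    """Half-life of mean reversion from regressing (x_t - x_{t-1}) on x_{t-1}."""
    lagged = series[:-1]
    delta = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(lagged)
    mean_x, mean_y = sum(lagged) / n, sum(delta) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    if sxx == 0:
        return math.inf  # constant series: nothing to estimate
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, delta))
    phi = 1.0 + sxy / sxx  # AR(1) coefficient implied by the OLS slope
    if not 0.0 < phi < 1.0:
        return math.inf  # no usable mean reversion detected
    return -math.log(2.0) / math.log(phi)
```

A residual that halves every bar gives a half-life of one bar; a steadily trending series gives infinity, which a screen would treat as "not mean-reverting".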
Breadth beats cleverness (campaign finding). Widening the catalog to ~63 ETFs and re-running the existing ladder is higher-EV than inventing strategies: the IBS edge proved structural (8/8 new equity-ETF candidates cleared the rule test), and adding the two genuinely-uncorrelated survivors — IBS/EWY (Korea) and IBS/GDXJ (gold miners) — lifted the faithful deployable book from Sharpe ~1.24 to ~1.46 (OOS walk-forward ~1.4, 100% windows profitable) at higher capital efficiency. The diversification came from new instruments/asset-classes, not new signals — exactly where the research record said it would.
Portfolio search (ats/workers/portfolio_search.py,
python -m ats.portfolio_search) takes a pool of validated sleeves (strategies
and pairs) and searches subsets × allocation methods (equal, inverse-vol,
risk-parity, min-variance, max-diversification) for the best held-out Sharpe.
The discipline is the whole point: weights are fit on a train window and scored
out-of-sample, and the winner’s deflated Sharpe discounts for how many
combinations were tried — a search for high Sharpe is mass multiple-testing, and
without that correction you crown noise. Add a --deployment-floor to require
capital is actually deployed, and --target-vol / --core to explore deploying
idle cash.
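The fit-on-train, score-out-of-sample discipline looks roughly like this for one allocation method. A minimal sketch, assuming hypothetical helpers inverse_vol_weights and held_out_sharpe and a 252-period year. It covers inverse-vol only and does not implement the subset search or the deflated-Sharpe correction.

```python
import math
import statistics


def inverse_vol_weights(train_returns):
    """train_returns: {sleeve: [returns]}. Weight proportional to 1/stdev, summing to 1."""
    inv = {k: 1.0 / statistics.stdev(v) for k, v in train_returns.items()}
    total = sum(inv.values())
    return {k: x / total for k, x in inv.items()}


def held_out_sharpe(weights, test_returns, periods_per_year=252):
    """Score weights fitted on the train window against returns they never saw."""
    n = len(next(iter(test_returns.values())))
    blended = [sum(weights[k] * test_returns[k][t] for k in weights) for t in range(n)]
    return statistics.fmean(blended) / statistics.stdev(blended) * math.sqrt(periods_per_year)
```

The weights never touch the held-out window, so the Sharpe returned by held_out_sharpe is an out-of-sample figure for that one combination; trying many combinations still calls for the multiple-testing discount described above.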
Five lessons paid for here, all worth internalizing:
- Diversification is the only real Sharpe lever. Combining uncorrelated edges
lifts the portfolio Sharpe above the best single sleeve (diversification ratio
> 1). The biggest gains come from sleeves that are uncorrelated by construction (a bond reversion, a market-neutral pair, a calendar effect), not from another copy of the same reversion edge on a correlated instrument.
- Capital deployment trades against Sharpe; there is no free lunch. Forcing exposure pulls in lower-Sharpe ingredients, and a directional “core” gets rejected by the search (beta isn’t alpha). Vol-targeting can deploy more capital at ~constant Sharpe, but a low-vol book needs heavy leverage to hit a vol target, which amplifies tail risk a daily backtest can’t see. Treat leverage as a deliberate risk choice, never a Sharpe upgrade.
- Walk-forward the process, not just the weights. A single train/held-out split validates the chosen weights but not the selection, that is, the act of picking which sleeves + allocator. --mode walk_forward re-runs the whole search per rolling window and trades its pick out-of-sample, reporting a selection stability metric. On our pool it was the reality check: the single-split “winner” showed Sharpe 1.60, but walk-forward delivered ~1.1–1.3 OOS and picked a different portfolio almost every window (25–50% stability). The lesson: validate the construction process, deploy a stable diversified core sized to its walk-forward Sharpe, and distrust any single “best” portfolio the search hands you, because the edge family is more robust than any one combination of it.
- Validate the deployment, not just the blend. run_portfolio and the search blend independently-run single-strategy backtests (each sleeve on the full cash, growth-normalized, weighted). That is great for selecting a mix, but it is NOT how a live book deploys. A book backtest (python -m ats.book, the Book builder, or “Validate faithfully” on a search winner) runs every sleeve as a real strategy on ONE BacktestEngine/cash account, exactly as the live node does, and reports the real per-bar capital deployment (Σ position notionals ÷ equity), the true Sharpe, a capital-efficiency figure, a book walk-forward, and a modeled-vs-faithful gap. The honest finding on our validated core: the blend reported ~33% exposure but the faithful book deploys only ~9.5% (the blend’s time-weighted exposure overstated real deployment ~3.4×) at a comparable Sharpe. Judge a multi-strategy deployment on the faithful book; raise per-sleeve trade_size to actually deploy more capital. Order-placing registry strategies only: pairs/cross-sectional have no live order path.
- Stress the correlation, not just the average. The full-period correlation matrix hides time: a book can average 0.25 and still spike to 0.8 in a crisis, which is exactly when its diversification was supposed to pay. Every portfolio run reports a stress block: rolling 63-day pairwise correlation, behaviour in named crisis windows (COVID, the 2022 rate shock, and the book’s own worst 21-day window), and a ρ=0.8 correlation-spike vol estimate. The honest finding on our holistic book: diversification held in the textbook crises (COVID +1.7% / corr 0.30, 2022 flat / corr 0.16; the LQD/HYG credit pair did not blow up), the worst correlation breakdown was a recent regime (the 2025 tariff shock, mean corr ~0.56), and even a forced ρ=0.8 only raises vol ~1.4× because the sleeves are individually low-vol. Read the spike number as a risk bound, not a forecast, and remember the next regime that breaks diversification rarely resembles the last one.
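Two quantities from the lessons above, the diversification ratio and a correlation-spike vol estimate, reduce to short formulas when every pair of sleeves is assumed to share one correlation ρ. The sketch below makes exactly that simplifying assumption; portfolio_vol and diversification_ratio are hypothetical names, and the repo's stress block may compute its spike estimate differently.

```python
import math


def portfolio_vol(weights, vols, rho):
    """Portfolio vol when every pair of distinct sleeves has the same correlation rho."""
    var = 0.0
    for i, (wi, si) in enumerate(zip(weights, vols)):
        for j, (wj, sj) in enumerate(zip(weights, vols)):
            corr = 1.0 if i == j else rho
            var += wi * wj * si * sj * corr
    return math.sqrt(var)


def diversification_ratio(weights, vols, rho):
    """Weighted-average sleeve vol divided by portfolio vol; equals 1 when rho = 1."""
    weighted_avg_vol = sum(w * s for w, s in zip(weights, vols))
    return weighted_avg_vol / portfolio_vol(weights, vols, rho)
```

As a made-up example, five equal-weight sleeves with identical vol see portfolio vol rise by sqrt(0.84 / 0.40), about 1.45×, when ρ jumps from 0.25 to 0.8. That figure illustrates the formula only and is not the book's reported number.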