Writing a strategy

Prefer to describe an idea in plain English? The Studio tab generates a strategy that follows this same contract (your own Anthropic key, validated + sandboxed) and drops it straight into the Backtests picker. This guide is the contract it — and you — write to.

Strategies are code in this repo (ADR-001) — one file per hypothesis under backend/ats/strategies/, registered at import time. There are 19 built-in strategies to read for patterns; the cleanest skeletons to copy are donchian.py (breakout, with the previous-bar discipline), rsi_pullback.py (oscillator mean-reversion), and ibs_reversion.py (an OHLC signal computed straight from the bar, no indicator).

The contract

Every strategy file provides three things:

A frozen config (StrategyConfig, frozen=True) starting with instrument_id: InstrumentId and bar_type: BarType, then your parameters with sensible defaults, then — verbatim — the shared blocks:
```
trade_size: int = 100
sizing_mode: str = "fixed"  # "fixed" | "vol_target"
risk_pct: float = 0.5
sizing_atr_period: int = 14
max_position_quantity: int = 10_000
max_gross_notional: float = 1_000_000.0
# Rule-significance machinery (hidden from the UI form)
entry_mode: str = "rule"
entry_probability: float = 0.05
random_seed: int = 0
```
These names are load-bearing: the sizing helper, the rule test, and the UI form generator all read them (engine/hidden fields are filtered via _NON_PARAM_FIELDS in registry.py).
The @register(...) decorator with a unique key, display name, one-line description, and a search space — without one the optimization stage of the research ladder has nothing to do. Keep it to 2–3 dimensions; every extra dimension is more room to overfit.
The Strategy class with on_start (register indicators + subscribe), on_bar, and the standard on_stop (cancel_all_orders → close_all_positions → unsubscribe_bars).

Indicators are not a fixed set. Any class in nautilus_trader.indicators is available — import it from its group module: momentum (RSI, Stochastics, CCI, RateOfChange…), trend (MACD, Aroon, DirectionalMovement, Ichimoku…), volatility (BollingerBands, ATR, Keltner, Donchian…), averages (SMA/EMA/Hull/Wilder…), volume (VWAP, OnBalanceVolume…) — or compute your own signal inline from math + the bar OHLCV. (The Studio’s AI generator gets this same palette; it’s bounded only by the sandbox import allow-list.)

The entry decision MUST go through the rule-test switch, and sizing through the shared helper:

if is_flat and should_enter(self, <your entry signal as a bool>):
    self._buy(close)          # uses entry_quantity(self, price)

Exits stay outside should_enter — the rule test holds exits fixed and randomizes only entry timing.
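
To make the switch concrete, here is a minimal sketch of how a rule-test switch of this kind could behave. It assumes that a non-rule mode ignores the signal and enters on a seeded coin flip at entry_probability. SketchConfig, EntrySwitch, and the "random" mode label are hypothetical stand-ins; the real should_enter helper may have a different signature and internals.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchConfig:
    # Field names mirror the shared config block; the class itself is a stand-in.
    entry_mode: str = "rule"          # "random" is an assumed label for the twin mode
    entry_probability: float = 0.05
    random_seed: int = 0


class EntrySwitch:
    """Hypothetical stand-in for the rule-test switch."""

    def __init__(self, config: SketchConfig) -> None:
        self.config = config
        # All randomness flows from random_seed, so repeated runs are identical.
        self._rng = random.Random(config.random_seed)

    def should_enter(self, signal: bool) -> bool:
        if self.config.entry_mode == "rule":
            return bool(signal)       # pass the real rule through untouched
        # Assumed twin behavior: ignore the signal, enter at a fixed rate.
        return self._rng.random() < self.config.entry_probability


rule = EntrySwitch(SketchConfig())
twin_a = EntrySwitch(SketchConfig(entry_mode="random", random_seed=7))
twin_b = EntrySwitch(SketchConfig(entry_mode="random", random_seed=7))

signals = [i % 10 == 0 for i in range(200)]
rule_entries = [rule.should_enter(s) for s in signals]
twin_a_entries = [twin_a.should_enter(s) for s in signals]
twin_b_entries = [twin_b.should_enter(s) for s in signals]
```

In rule mode the signal passes through unchanged, and two twins built with the same seed produce the same entry sequence. Because exits never touch the switch, only entry timing differs between the rule and its twins.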

Traps (each of these cost real debugging time)

Indicators update BEFORE on_bar fires. Comparing this bar’s close to a channel/extreme that already includes this bar can never trigger — store previous-bar values (see _prev_upper in donchian.py). Trend filters and oscillators (SMA, RSI) are fine to read directly: the value is known at the close and the market order fills on the next bar, so there’s no lookahead either way.
Never name an attribute _stop — it shadows Component._stop and breaks the Nautilus FSM with 'float' object is not callable.
Nautilus scales: RelativeStrengthIndex.value is 0..1 (not 0..100), RateOfChange.value is fractional. Keep config thresholds in conventional units and convert once in on_bar.
Vol-target sizing needs self._sizing_atr — an AverageTrueRange indicator attribute with exactly that name, registered for bars.
Determinism: no wall clocks, no unseeded randomness. Any randomness must flow from random_seed (the rule test depends on this).
Register the module in ats/strategies/__init__.py or nothing sees it.
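
The first trap above (indicators update before on_bar) is easy to reproduce without Nautilus. The sketch below uses a plain rolling max of highs as a stand-in for a Donchian upper band; breakout_flags and the sample prices are illustrative, not code from donchian.py.

```python
from collections import deque


def breakout_flags(highs, closes, period):
    """Return (naive, disciplined) breakout flags for an N-bar high channel."""
    window = deque(maxlen=period)
    prev_upper = None
    naive, disciplined = [], []
    for high, close in zip(highs, closes):
        window.append(high)            # the indicator updates before the bar handler runs
        upper = max(window)            # so the channel already includes this bar's high
        naive.append(close > upper)    # close <= high <= upper: this can never be True
        disciplined.append(prev_upper is not None and close > prev_upper)
        # Store the channel for the next bar, but only once the window is full.
        prev_upper = upper if len(window) == period else None
    return naive, disciplined


highs = [10.0, 10.5, 10.2, 11.5, 11.0]
closes = [9.8, 10.3, 10.0, 11.2, 10.9]
naive, disciplined = breakout_flags(highs, closes, period=3)
```

The naive comparison never fires on any bar, while the previous-bar comparison fires on the fourth bar, where the close of 11.2 exceeds the prior three-bar high of 10.5.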

Testing it

tests/test_strategies.py parametrizes over all registry keys, so a new strategy is automatically checked for: schema hygiene, “it actually trades on the fixture” (add fixture-appropriate params to FIXTURE_PARAMS if your defaults need more than ~300 warmup bars — the fixture is 520 bars), and “entry_mode='rule' is byte-identical to default”. Run the suite from backend/: uv run pytest -q.

Researching it

The full evidence ladder, one command (imports fresh — no worker restart):

cd backend
uv run python -m ats.research --strategy your_key --symbols SPY,TLT,GLD

Stages: rule test → split optimize (train window only) → WFO → a final run that feeds the deflated Sharpe. Results land in the Backtests tab and the Research-tab ladder, where each strategy/symbol/timeframe gets a 0–100 readiness score and a ⭐ once it clears the hard gauntlet: rule-test significant → held-out Sharpe > 0.5 → WFO out-of-sample Sharpe > 0.5 → deflated Sharpe > 0.95. Those bars are deliberately strict — a positive rule test plus a positive held-out Sharpe is not enough (the ORB lesson). Monte Carlo and cross-sectional breadth are visible signals; breadth is a score bonus, never a gate, so an asset-specific edge (a structural pair, a sector anomaly) still qualifies. Always judge against Buy & hold % too. To test lots of strategies at once — across the whole basket and both timeframes, unattended — run python -m ats.sweep (a cheap rule-test screen, then the deep gauntlet on survivors); it writes every stage as a normal run and prints a ready-first leaderboard. What each rung protects against is documented on the Research tab itself and in the README.
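
The gauntlet quoted above is a plain conjunction of four gates, which a short sketch makes explicit. LadderResult and its field names are hypothetical and not the repo's result schema; the thresholds are the ones stated in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LadderResult:
    # Hypothetical container; field names are illustrative only.
    rule_test_significant: bool
    held_out_sharpe: float
    wfo_oos_sharpe: float
    deflated_sharpe: float


def clears_gauntlet(result: LadderResult) -> bool:
    """Every gate must pass; a strong result on one rung does not offset another."""
    return (
        result.rule_test_significant
        and result.held_out_sharpe > 0.5
        and result.wfo_oos_sharpe > 0.5
        and result.deflated_sharpe > 0.95
    )


# A positive rule test plus a positive held-out Sharpe is not enough on its own.
partial = LadderResult(True, held_out_sharpe=0.9, wfo_oos_sharpe=0.2, deflated_sharpe=0.60)
full = LadderResult(True, held_out_sharpe=0.9, wfo_oos_sharpe=0.7, deflated_sharpe=0.97)
```

Breadth is deliberately absent from the check, since the text treats it as a score bonus rather than a gate.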

For an intraday strategy (1-minute bars), point the same command at the intraday catalog and charge the measured spread per fill — the whole ladder then runs net of honest costs:

uv run python -m ats.research --strategy intraday_vwap_exhaustion --symbols QQQ,SPY,DIA \
    --bar-spec 1-MINUTE-LAST --slippage-bps auto --start 2023-04-01 --end 2026-06-12

--slippage-bps auto resolves to each name’s measured half-spread from the ingested sidecar (daily runs default to 0, byte-identical). Metrics annualize with the right periods_per_year for the cadence automatically.
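
Two pieces of arithmetic sit behind that paragraph: a half-spread expressed in basis points, and a Sharpe ratio scaled by the square root of periods_per_year. The sketch below shows both in generic form. The function names are illustrative, and the 1-minute constant assumes 390 regular-session minutes per trading day, which may not match the value the backtest uses.

```python
import math
import statistics


def half_spread_bps(bid: float, ask: float) -> float:
    """Half the quoted spread, in basis points of the mid price."""
    mid = (bid + ask) / 2.0
    return (ask - bid) / 2.0 / mid * 10_000.0


def annualized_sharpe(returns, periods_per_year: int) -> float:
    """Per-period mean over stdev, scaled by sqrt(periods_per_year)."""
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)
    return mean / stdev * math.sqrt(periods_per_year)


DAILY = 252
ONE_MINUTE_RTH = 252 * 390   # assumption: 390 regular-session minutes per day

cost_bps = half_spread_bps(bid=99.99, ask=100.01)   # about 1 bp per fill
sample = [0.001, -0.002, 0.003, 0.000, 0.002, -0.001]
daily_sharpe = annualized_sharpe(sample, DAILY)
minute_sharpe = annualized_sharpe(sample, ONE_MINUTE_RTH)
```

The same per-period returns annualize to very different Sharpe values depending on the cadence, which is why the right periods_per_year matters when comparing daily and 1-minute runs.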

Intraday strategies (a session-aware shape)

The intraday_* keys are still single-instrument registry strategies (same contract, same should_enter switch), but they subclass a small session harness in intraday.py instead of Strategy directly: _Session rebuilds per-RTH-session state from the 1-minute bars (VWAP, opening range, minutes-since-open n, deviation dispersion, rolling volume), and _IntradayStrategy handles the lifecycle every intraday idea shares — flat by the close (no overnight risk on a cash account), long-only market entries, and an _observe() per-bar hook for strategies that need to latch state (e.g. a measure-time Z-score). You override _entry_signal / _exit_signal and, if needed, _observe. Design them for 1-MINUTE-LAST bars; they need intraday data ingested (ats.intraday) and are judged net of the measured spread — the dominant intraday cost — via the backtest’s SpreadCostFeeModel (--slippage-bps auto), not by anything inside the strategy.

A relative-value bet must see the whole basket at once, which a single-instrument strategy can’t, so the long-only intraday relative-value engine is vectorized (ats/workers/intraday_relval.py, ats.intraday_relval), the session-aware sibling of the cross-sectional engine below. Honest result so far: every intraday archetype (the four intraday_* strategies + the relative-value engine) is negative net of spread; the capability is the deliverable, not yet an edge. See the README’s Intraday section for the numbers.

Cross-sectional strategies (a different shape)

Everything above is single-instrument: a strategy sees one symbol’s bars and decides long/flat. A cross-sectional strategy is a different shape — it ranks a whole universe each rebalance and holds the top-K (long the strongest = momentum, or the most oversold = reversion). That’s portfolio construction, not an entry rule, so it doesn’t use the Strategy contract or the registry: it’s a vectorized engine (ats/workers/cross_sectional.py) over the catalog’s aligned closes, and you reach for it from the Backtests → Cross-sectional builder or the CLI:

cd backend
uv run python -m ats.cross_sectional --universe SPY,QQQ,IWM,TLT,GLD,EEM,EFA \
    --signal momentum --lookback 208 --top-k 4 --rebalance 50 --mode rank_test

It has its own anti-overfit ladder. The one that matters is the rank test: real ranking vs N random-selection twins at the same turnover — does the ranking beat picking names at random? (same 95th-percentile bar as the single-instrument rule test). Then optimize (held-out), walk_forward, and a 2× cost-stress button on the single-mode detail. The benchmark is holding the whole universe equal-weight, so judge against excess vs equal-weight, not raw return. Signals: momentum, reversion, low_vol (rank by lowest trailing realized vol). Lessons from the research so far: cross-asset / country momentum is the one cross-sectional edge that clears the bar (significant on a broad 14-country ETF universe — breadth strengthened it from marginal); the low-vol factor is refuted (fails the rank test on every equity universe — selecting the calmest names is a beta tilt, not a timing edge); sector reversion shows no edge. Same “simple edges win, filters remove edge” pattern as the single-instrument strategies. (Cross-sectional has no live order path — even a significant result is a research finding until bridged to a rebalance strategy.)
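
The rank test described above can be sketched in a few lines. This is a simplified illustration, not the engine in cross_sectional.py: the twins here re-pick K names at random on every rebalance, which only approximates "same turnover", and all function names and data are made up.

```python
import random


def top_k_return(signal_row, fwd_row, k):
    """Mean forward return of the k names with the highest signal."""
    ranked = sorted(range(len(signal_row)), key=lambda i: signal_row[i], reverse=True)
    return sum(fwd_row[i] for i in ranked[:k]) / k


def rank_test(signals, fwd_returns, k, n_twins=500, seed=0):
    """Fraction of random-selection twins that the real ranking beats."""
    rng = random.Random(seed)            # seeded, so the test is reproducible
    real = sum(top_k_return(s, f, k) for s, f in zip(signals, fwd_returns))
    n_names = len(fwd_returns[0])
    beaten = 0
    for _ in range(n_twins):
        twin = 0.0
        for fwd_row in fwd_returns:
            picks = rng.sample(range(n_names), k)   # k random names per rebalance
            twin += sum(fwd_row[i] for i in picks) / k
        if real > twin:
            beaten += 1
    return beaten / n_twins


# Synthetic data: 20 rebalances, 6 names.
data_rng = random.Random(1)
fwd_returns = [[data_rng.gauss(0.0, 0.01) for _ in range(6)] for _ in range(20)]
informative = fwd_returns                # perfect foresight, for illustration only
noise = [[data_rng.random() for _ in range(6)] for _ in range(20)]

p_informative = rank_test(informative, fwd_returns, k=2)
p_noise = rank_test(noise, fwd_returns, k=2)
```

A ranking counts as significant under the 95th-percentile bar when the returned fraction is at least 0.95. The perfect-foresight signal clears it by construction; a pure-noise signal has no reason to.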

The same engine shape has a session-aware intraday variant, ats/workers/intraday_relval.py (ats.intraday_relval): it loads the aligned 1-minute grid (the shared _load_aligned is now bar_spec-parameterized — daily default unchanged), each session buys the intraday laggard (most-negative residual vs the basket) and goes flat by the close — the long-only, no-shorting reframe of an intraday pair. Its rank test is the gate, judged net of the measured spread. Lesson, consistent with everything above: it’s a no-edge — laggard ranking doesn’t beat random selection (full 15-ETF basket only marginal, the correlated “broken hedge” pairs are a coin-flip) and barely beats, or underperforms, just holding the basket. The single-instrument intraday_vwap_exhaustion (a time-windowed VWAP fade conditioned on volume exhaustion) is also a negative — a couple of names pass the rule test but collapse under WFO + deflated Sharpe, with breadth positive on only 3 of 15. See the README’s intraday section for the numbers; the takeaway is that cross-sectional breadth is the load-bearing anti-overfit rung — a real edge shows on many independent names, not two or three lucky ones.

Pairs and portfolio construction (combining edges)

Single edges are weak on their own; the durable gains come from combining low-correlation edges. Two tools help here, both vectorized (return-level, no order-engine shorting) like the cross-sectional engine:

Pairs (ats/workers/pairs.py, python -m ats.pairs --legs LQD,HYG) trade the spread between two cointegrated instruments dollar-neutral. They’re near-zero correlation to the long book, so a validated pair (e.g. LQD/HYG credit) is the ideal diversifier. Same ladder as everything else — the load-bearing rung is the rank test vs random entry timing (“does waiting for a stretched spread beat entering at random?”), and the 2× cost stress matters more because both legs trade.

The spread is log(A) − β·log(B), and --hedge sets β (workers/kalman.py): fixed (β=1, equal dollars — the default and validated config), rolling_ols (trailing Cov/Var), or kalman (a state-space random-walk filter that lets β drift). A time-varying β is meant to strip residual directional beta out of the spread, and should in theory raise rank-test significance. It didn’t, on this daily US-ETF universe — same-parameter kalman halved LQD/HYG’s Sharpe (0.85 → 0.44, β stayed in [1.07, 1.16]) and rescued no previously-rejected pair into significance. These legs are already near dollar-for-dollar (β≈1, stable), so a wandering hedge mostly adds noise and tightens the spread, collapsing exposure (~38% → ~8%). fixed stays the default (byte-identical to the validated result); kalman/rolling_ols are opt-in ingredients for future pairs whose legs have genuinely different betas (a stock vs its sector). Yet another instance of the recurring lesson: simple wins; added machinery rarely adds rule-test edge — and the ladder, not intuition, is what told us.
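
The spread definition and two of the hedge choices can be written out directly. The sketch below is not the code in workers/kalman.py: it assumes the rolling regression runs on log price levels over a trailing window that includes the current bar, and the helper names and prices are illustrative. The kalman variant is omitted.

```python
import math


def rolling_ols_beta(log_a, log_b, window):
    """Trailing Cov(log A, log B) / Var(log B); None until the window fills."""
    betas = []
    for t in range(len(log_a)):
        if t + 1 < window:
            betas.append(None)
            continue
        a = log_a[t + 1 - window : t + 1]
        b = log_b[t + 1 - window : t + 1]
        mean_a, mean_b = sum(a) / window, sum(b) / window
        cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
        var = sum((y - mean_b) ** 2 for y in b)
        betas.append(cov / var)
    return betas


def spread(log_a, log_b, betas):
    """log(A) - beta * log(B), bar by bar; None where beta is not yet defined."""
    return [None if beta is None else a - beta * b
            for a, b, beta in zip(log_a, log_b, betas)]


# Made-up prices: A is built so that log A = 0.3 + 1.5 * log B exactly.
prices_b = [100.0, 101.0, 99.5, 102.0, 103.5, 101.0, 104.0, 105.5]
log_b = [math.log(p) for p in prices_b]
log_a = [0.3 + 1.5 * x for x in log_b]

fixed_spread = spread(log_a, log_b, [1.0] * len(log_a))   # the "fixed" hedge, beta = 1
ols_betas = rolling_ols_beta(log_a, log_b, window=5)      # the "rolling_ols" hedge
ols_spread = spread(log_a, log_b, ols_betas)
```

On this constructed pair the rolling estimate recovers β = 1.5 and the hedged spread collapses to the constant 0.3, while the fixed β = 1 spread still carries half of B's moves. That is the situation where a fitted hedge should help; the LQD/HYG result above is the opposite case, where β was already close to 1.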

Discovering pairs (ats/workers/pair_screen.py, python -m ats.pair_screen --universe …) finds candidates instead of hand-picking: it scans every pair, runs the Engle-Granger cointegration test (OLS hedge + Dickey-Fuller t-stat on the residual) + OU half-life + hedge-β stability, and ranks by ADF t-stat. The top-N go into the pairs ladder above; the rank test, not the screen, is the gate. Two lessons: it works (found new rank-test-significant pairs like IWF/QQQ and EWA/VWO beyond LQD/HYG), but ADF rank ≠ tradeable-edge rank: the most “cointegrated” hits are near-redundant fund twins (AGG/BND: same index, different sponsor), tight but low-capacity. Discovered pairs are still pairs (no live path) until a live order-placing pairs strategy is built.
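
For readers who have not met the screen's statistics, here is a textbook-style sketch of the Engle-Granger two-step and an AR(1)-based half-life. It is not the code in pair_screen.py: the no-constant Dickey-Fuller regression, the half-life formula, and every name below are assumptions, and hedge-β stability is left out.

```python
import math
import random


def ols(x, y):
    """Intercept and slope of y = a + b * x by ordinary least squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def engle_granger_stats(log_a, log_b):
    """Return (hedge beta, Dickey-Fuller t-stat, half-life in bars)."""
    alpha, beta = ols(log_b, log_a)                    # step 1: OLS hedge
    resid = [a - alpha - beta * b for a, b in zip(log_a, log_b)]
    lagged = resid[:-1]
    delta = [r1 - r0 for r0, r1 in zip(resid[:-1], resid[1:])]
    # Step 2: delta_t = gamma * resid_{t-1} + error, no constant
    # (OLS residuals with an intercept are mean zero).
    sxx = sum(e * e for e in lagged)
    gamma = sum(e * d for e, d in zip(lagged, delta)) / sxx
    errors = [d - gamma * e for e, d in zip(lagged, delta)]
    sigma2 = sum(u * u for u in errors) / (len(delta) - 1)
    t_stat = gamma / math.sqrt(sigma2 / sxx)
    # AR(1) persistence is 1 + gamma; the half-life needs it to lie in (0, 1).
    half_life = -math.log(2) / math.log(1 + gamma) if -1 < gamma < 0 else math.inf
    return beta, t_stat, half_life


# Synthetic cointegrated pair: B is a random walk, A tracks B plus an AR(1) residual.
rng = random.Random(0)
log_b, resid = [4.0], [0.0]
for _ in range(499):
    log_b.append(log_b[-1] + rng.gauss(0.0, 0.01))
    resid.append(0.5 * resid[-1] + rng.gauss(0.0, 0.005))
log_a = [0.1 + b + e for b, e in zip(log_b, resid)]

beta, t_stat, half_life = engle_granger_stats(log_a, log_b)
```

A more negative t-stat means stronger evidence of mean reversion in the residual. Note that the Engle-Granger statistic is compared against cointegration-specific critical values, which are more negative than the plain Dickey-Fuller ones, because the hedge was itself estimated.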

Breadth beats cleverness (campaign finding). Widening the catalog to ~63 ETFs and re-running the existing ladder is higher-EV than inventing strategies: the IBS edge proved structural (8/8 new equity-ETF candidates cleared the rule test), and adding the two genuinely-uncorrelated survivors — IBS/EWY (Korea) and IBS/GDXJ (gold miners) — lifted the faithful deployable book from Sharpe ~1.24 to ~1.46 (OOS walk-forward ~1.4, 100% windows profitable) at higher capital efficiency. The diversification came from new instruments/asset-classes, not new signals — exactly where the research record said it would.

Portfolio search (ats/workers/portfolio_search.py, python -m ats.portfolio_search) takes a pool of validated sleeves (strategies and pairs) and searches subsets × allocation methods (equal, inverse-vol, risk-parity, min-variance, max-diversification) for the best held-out Sharpe. The discipline is the whole point: weights are fit on a train window and scored out-of-sample, and the winner’s deflated Sharpe discounts for how many combinations were tried. A search for high Sharpe is mass multiple-testing, and without that correction you crown noise. Add a --deployment-floor to require that capital is actually deployed, and --target-vol / --core to explore deploying idle cash.
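
The fit-on-train, score-held-out discipline is simple to show for one allocator. The sketch below uses inverse-vol weights on two made-up sleeves; it is not the search in portfolio_search.py, the helper names are illustrative, and the deflated-Sharpe correction is not included.

```python
import statistics


def inverse_vol_weights(train):
    """Weights proportional to 1 / stdev of each sleeve's train-window returns."""
    inv = {name: 1.0 / statistics.stdev(r) for name, r in train.items()}
    total = sum(inv.values())
    return {name: v / total for name, v in inv.items()}


def blend(weights, returns):
    """Weighted sum of sleeve returns, period by period."""
    n = len(next(iter(returns.values())))
    return [sum(weights[name] * returns[name][t] for name in weights) for t in range(n)]


def sharpe(returns):
    """Per-period Sharpe (no annualization, zero risk-free rate)."""
    return statistics.fmean(returns) / statistics.stdev(returns)


# Made-up per-period returns for two sleeves.
sleeves = {
    "reversion": [0.010, -0.008, 0.012, -0.004, 0.009, -0.006, 0.011, -0.003],
    "pair": [0.002, 0.001, -0.001, 0.003, 0.000, 0.002, -0.001, 0.002],
}
split = 4
train = {name: r[:split] for name, r in sleeves.items()}
held_out = {name: r[split:] for name, r in sleeves.items()}

weights = inverse_vol_weights(train)                  # fit on the train window only
held_out_sharpe = sharpe(blend(weights, held_out))    # scored out-of-sample
```

The weights never see the held-out window, so the held-out Sharpe is an honest score for this one combination. Once many subsets and allocators are tried, the best held-out score is itself selected, which is why the winner still needs the deflation described above.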

Five lessons paid for here, all worth internalizing:

Diversification is the only real Sharpe lever. Combining uncorrelated edges lifts the portfolio Sharpe above the best single sleeve (diversification ratio > 1). The biggest gains come from sleeves that are uncorrelated by construction (a bond reversion, a market-neutral pair, a calendar effect), not from another copy of the same reversion edge on a correlated instrument.
Capital deployment trades against Sharpe; there is no free lunch. Forcing exposure pulls in lower-Sharpe ingredients, and a directional “core” gets rejected by the search (beta isn’t alpha). Vol-targeting can deploy more capital at ~constant Sharpe, but a low-vol book needs heavy leverage to hit a vol target, which amplifies tail risk a daily backtest can’t see. Treat leverage as a deliberate risk choice, never a Sharpe upgrade.
Walk-forward the process, not just the weights. A single train/held-out split validates the chosen weights but not the selection — the act of picking which sleeves + allocator. --mode walk_forward re-runs the whole search per rolling window and trades its pick out-of-sample, reporting a selection stability metric. On our pool it was the reality check: the single-split “winner” showed Sharpe 1.60, but walk-forward delivered ~1.1–1.3 OOS and picked a different portfolio almost every window (25–50% stability). The lesson: validate the construction process, deploy a stable diversified core sized to its walk-forward Sharpe, and distrust any single “best” portfolio the search hands you — the edge family is more robust than any one combination of it.
Validate the deployment, not just the blend. run_portfolio and the search blend independently-run single-strategy backtests (each sleeve on the full cash, growth-normalized, weighted) — great for selecting a mix, but it is NOT how a live book deploys. A book backtest (python -m ats.book, the Book builder, or “Validate faithfully” on a search winner) runs every sleeve as a real strategy on ONE BacktestEngine/cash account — exactly as the live node — and reports the real per-bar capital deployment (Σ position notionals ÷ equity), the true Sharpe, a capital-efficiency figure, a book walk-forward, and a modeled-vs-faithful gap. The honest finding on our validated core: the blend reported ~33% exposure but the faithful book deploys only ~9.5% (the blend’s time-weighted exposure overstated real deployment ~3.4×) at a comparable Sharpe. Judge a multi-strategy deployment on the faithful book; raise per-sleeve trade_size to actually deploy more capital. Order-placing registry strategies only — pairs/cross-sectional have no live order path.
Stress the correlation, not just the average. The full-period correlation matrix hides time: a book can average 0.25 and still spike to 0.8 in a crisis, which is exactly when its diversification was supposed to pay. Every portfolio run reports a stress block — rolling 63-day pairwise correlation, behaviour in named crisis windows (COVID, the 2022 rate shock, and the book’s own worst 21-day window), and a ρ=0.8 correlation-spike vol estimate. The honest finding on our holistic book: diversification held in the textbook crises (COVID +1.7% / corr 0.30, 2022 flat / corr 0.16 — the LQD/HYG credit pair did not blow up), the worst correlation breakdown was a recent regime (the 2025 tariff shock, mean corr ~0.56), and even a forced ρ=0.8 only raises vol ~1.4× because the sleeves are individually low-vol. Read the spike number as a risk bound, not a forecast — and remember the next regime that breaks diversification rarely resembles the last one.
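
Two of the quantities in these lessons, the diversification ratio and the correlation-spike vol estimate, follow from the standard portfolio variance formula. The sketch below assumes a single uniform pairwise correlation, which is a simplification; the function names and the four-sleeve numbers are illustrative and are not taken from the repo's stress block or its book.

```python
import math


def portfolio_vol(weights, vols, rho):
    """Portfolio vol when every pair of sleeves shares one correlation rho."""
    n = len(weights)
    variance = 0.0
    for i in range(n):
        for j in range(n):
            corr = 1.0 if i == j else rho
            variance += weights[i] * weights[j] * vols[i] * vols[j] * corr
    return math.sqrt(variance)


def diversification_ratio(weights, vols, rho):
    """Weighted-average sleeve vol divided by portfolio vol; above 1 means diversification."""
    weighted_avg_vol = sum(w * v for w, v in zip(weights, vols))
    return weighted_avg_vol / portfolio_vol(weights, vols, rho)


# Illustrative book: four equal-weight sleeves at 5% vol each.
weights = [0.25, 0.25, 0.25, 0.25]
vols = [0.05, 0.05, 0.05, 0.05]

normal_vol = portfolio_vol(weights, vols, rho=0.25)
spike_vol = portfolio_vol(weights, vols, rho=0.80)
spike_multiple = spike_vol / normal_vol               # vol multiple under the forced spike
ratio_normal = diversification_ratio(weights, vols, rho=0.25)
ratio_perfect = diversification_ratio(weights, vols, rho=1.0)
```

The ratio falls to 1 when every sleeve is perfectly correlated, which is the sense in which another copy of the same edge adds nothing. The spike multiple depends only on weights, vols, and the two correlations, so it is a bound under those inputs rather than a forecast.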