Why Backtest Profits Disappear Live

Last updated: 2026-06-10

In short

When live trading undershoots the backtest, audit in this order: (1) uncounted costs (spread, commission, swap; the cause in most cases), (2) data granularity (candle-based fills guessed in your favor), (3) execution differences (slippage, missed entries), (4) you (rule deviations under pressure), (5) regime change (the market conditions the edge relied on are gone). Only conclude #5 after ruling out 1–4: most “dead edges” are arithmetic, not markets.

First, Compare Like with Like

Before diagnosing anything, put both records in the same units: expectancy in R (or pips) per trade, net of costs, over a defined trade count. Comparing a 200-trade backtest expectancy against three weeks of live P&L in euros tells you nothing — 20 live trades is inside normal variance for almost any strategy. If the live sample is small, the honest answer may be “too early to tell.”
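As a minimal sketch of putting both records in the same units, the snippet below converts each trade's net P&L into R by dividing by that trade's initial risk, then averages. The function name and the sample numbers are hypothetical, and it assumes P&L and risk are recorded in the same currency for each trade.

```python
from statistics import mean

def expectancy_r(pnl, risk):
    """Average result per trade in R: net P&L divided by initial risk, per trade."""
    if len(pnl) != len(risk) or not pnl:
        raise ValueError("need equal-length, non-empty lists")
    return mean(p / r for p, r in zip(pnl, risk))

# Hypothetical records: net P&L and initial risk per trade, same currency.
backtest_r = expectancy_r([200.0, -100.0, 150.0, -100.0], [100.0] * 4)
live_r = expectancy_r([90.0, -50.0, -50.0, 60.0], [50.0] * 4)
print(f"backtest {backtest_r:+.3f}R, live {live_r:+.3f}R")
```

Because each trade is scaled by its own risk, a live account trading half the backtest's size still compares directly.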

Suspect 1: Uncounted Costs (Check This First)

The most common gap, and the most fixable. If the backtest counted gross pips, live trading runs the full cost stack against you: the spread on every round trip (at the level typical for the session you trade, including news-time spikes), commission if you’re on a raw-spread account, and swap on every overnight hold. Re-run the backtest’s numbers through the cost audit; in our worked example that single step explained a 59% performance gap. If net backtest expectancy ≈ live expectancy, the mystery is solved: your backtest was optimistic, and your execution was not the problem.
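The cost stack can be sketched as a per-trade subtraction. This is an illustration under stated assumptions: all costs are expressed in pips and entered as positive numbers, swap is treated as a cost per night held, and the trade list and cost levels are invented for the example rather than taken from any broker.

```python
def net_pips(gross_pips, spread_pips, commission_pips=0.0,
             swap_pips_per_night=0.0, nights_held=0):
    """Gross result minus the cost stack, all in pips (costs entered as positive)."""
    return (gross_pips - spread_pips - commission_pips
            - swap_pips_per_night * nights_held)

# Hypothetical trades: (gross pips, nights held).
trades = [(30.0, 0), (-20.0, 1), (30.0, 0), (-20.0, 0), (30.0, 2)]

gross = sum(g for g, _ in trades) / len(trades)
net = sum(net_pips(g, 1.5, 0.7, 0.4, n) for g, n in trades) / len(trades)
print(f"gross expectancy {gross:.2f} pips, net expectancy {net:.2f} pips")
```

If the net figure lands close to the live expectancy, costs alone account for the gap.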

Suspect 2: Data Granularity Guessed for You

Candle-based backtests resolve same-bar stop/target conflicts by assumption, and the assumptions systematically favor the trader. Tight-stop intraday strategies are most exposed: the backtest credited wins that live trading books as stop-outs. Telltale signature: live win rate is several points below backtest while average win/loss sizes match. Fix: re-validate the strategy on tick-level replay where the intrabar path is real, not guessed.
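To see how much the same-bar assumption matters, a candle backtest can be run twice: once resolving conflicts in the trader's favor and once against. The sketch below handles a long position on a single bar; the function name and return labels are hypothetical, and it ignores gaps and spread.

```python
def resolve_long_bar(high, low, stop, target, pessimistic=True):
    """Outcome of a long trade on one candle when the intrabar path is unknown."""
    hit_stop = low <= stop
    hit_target = high >= target
    if hit_stop and hit_target:
        # Both levels sit inside the bar: the candle cannot say which came first.
        return "stop" if pessimistic else "target"
    if hit_stop:
        return "stop"
    if hit_target:
        return "target"
    return "open"

# A bar that spans both levels flips from win to loss under the pessimistic rule.
print(resolve_long_bar(1.1050, 1.0990, stop=1.1000, target=1.1040, pessimistic=False))
print(resolve_long_bar(1.1050, 1.0990, stop=1.1000, target=1.1040, pessimistic=True))
```

The share of trades whose outcome changes between the two runs is a rough measure of how exposed the strategy is to candle granularity.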

Suspect 3: Execution Differences

Slippage and fill asymmetries: stops filling beyond the level in fast markets, limits touched but not filled, requotes, entry delay. Signature: live losses run slightly larger than backtested stops, and some backtested winners are missing entirely from the live record. Fix: apply pessimistic fill rules in the backtest (stops filled on the gap side of the level, limits counted as filled only when price trades through them, not when it merely touches them), and avoid holding tight stops through high-impact (“red”) news.
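The two pessimistic fill rules can be sketched for the long side as follows. The function names are hypothetical, and this is one plausible reading of the rules rather than a definitive model: a stop fills at the worse of the bar's open and the stop level, and a buy limit counts as filled only if the bar's low trades strictly below it.

```python
def long_stop_fill(bar_open, bar_low, stop):
    """Exit price for a long position's stop on this bar, or None if not hit."""
    if bar_low > stop:
        return None
    # If the bar opened below the stop (a gap), fill at the open, not the level.
    return min(bar_open, stop)

def buy_limit_filled(bar_low, limit):
    """Count a buy limit as filled only when price pierces it, not on a touch."""
    return bar_low < limit

print(long_stop_fill(bar_open=1.0980, bar_low=1.0970, stop=1.1000))  # gap fill
print(buy_limit_filled(bar_low=1.1000, limit=1.1000))                # touch only
```

Short-side versions mirror these with the bar's high and `max` in place of the low and `min`.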

Suspect 4: You

The backtest assumed every rule was followed. Live trading adds hesitation on valid entries, early exits on winners, “one more try” on losers, and oversizing after wins. Signature: the live trade list differs from what the rules would have produced, with skipped setups and off-plan exits. This is checkable: replay the live period afterward and trade it by the rules; the gap between that result and your live result is the discipline gap, precisely measured. (A journal makes this audit possible, and practicing under prop-firm-style rules is the training that closes it.)
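One way to put a number on the discipline gap is to key both records by setup and compare totals in R. The sketch below assumes a journal that tags each live trade with a setup identifier; the identifiers, results, and function name are hypothetical.

```python
def discipline_gap(by_rules, live):
    """Compare a by-the-rules replay with the live record (both: setup id -> R)."""
    skipped = sorted(set(by_rules) - set(live))    # valid setups never taken
    off_plan = sorted(set(live) - set(by_rules))   # trades the rules never called
    gap_r = sum(by_rules.values()) - sum(live.values())
    return gap_r, skipped, off_plan

# Hypothetical period: s1 exited early, s3 skipped, x1 was an off-plan trade.
by_rules = {"s1": 2.0, "s2": -1.0, "s3": 2.0}
live = {"s1": 1.0, "s2": -1.0, "x1": -1.0}

gap_r, skipped, off_plan = discipline_gap(by_rules, live)
print(f"discipline gap {gap_r:+.1f}R, skipped {skipped}, off-plan {off_plan}")
```

The total includes early exits on shared setups as well, since those show up as a smaller R on the same identifier.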

Suspect 5: Regime Change (Last, Not First)

Edges have habitats — a breakout edge needs volatility, a mean-reversion edge needs range. If suspects 1–4 are clean and live expectancy has genuinely degraded over a meaningful sample, check whether the regime that hosted the edge is still present. Honest responses: a regime filter (trade it only in its weather), reduced size while monitoring, or retirement. What’s not honest: declaring regime death after eight losing trades — that’s normal variance wearing a costume.
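A regime filter can be as simple as a gate on recent volatility. The sketch below is one illustrative form for a breakout edge, not a recommended setting: it allows trading only when the average bar range over a lookback window meets a threshold. The function name, lookback, and threshold are all assumptions.

```python
def breakout_regime_on(bar_ranges, lookback, min_avg_range):
    """True when the average range of the last `lookback` bars meets the threshold."""
    if lookback <= 0 or len(bar_ranges) < lookback:
        return False  # not enough history to judge the regime
    recent = bar_ranges[-lookback:]
    return sum(recent) / lookback >= min_avg_range

# Hypothetical daily ranges in pips: quiet early, more volatile recently.
ranges = [40, 45, 38, 42, 80, 95, 90]
print(breakout_regime_on(ranges, lookback=3, min_avg_range=70))
print(breakout_regime_on(ranges[:4], lookback=3, min_avg_range=70))
```

A mean-reversion edge would invert the comparison and trade only when the average range stays below a ceiling.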

The Ten-Minute Audit, In Order

  1. Net-of-costs expectancy on both records, same units.
  2. Cost stack present in the backtest? If not, add and recompare.
  3. Tight stops + candle data? Re-validate on ticks.
  4. Live fills worse than levels? Apply pessimistic fill rules and recompare.
  5. Live trades match the rules? Replay the period to measure the discipline gap.
  6. All clean and the gap persists past ~50 live trades? Now investigate regime.

Frequently Asked Questions

How many live trades before comparing against the backtest?

Treat 30 as the bare minimum and 50+ as meaningful — below that, normal variance around the backtest's expectancy can fully explain large P&L gaps. The exception: structural problems (every fill slightly worse than the level, swaps you didn't expect) show up immediately and are worth fixing at trade five.
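A rough way to see why small samples say little is the standard error of the mean, which shrinks with the square root of the trade count. The sketch below assumes independent trades and a known per-trade standard deviation in R; the expectancy of 0.3R and deviation of 1.5R are hypothetical inputs, not figures from any strategy.

```python
from math import sqrt

def expectancy_band(expectancy_r, stdev_r, n_trades, z=2.0):
    """Approximate range for the observed average after n trades (mean ± z * SE)."""
    half_width = z * stdev_r / sqrt(n_trades)
    return expectancy_r - half_width, expectancy_r + half_width

for n in (20, 50, 200):
    low, high = expectancy_band(0.3, 1.5, n)
    print(f"{n:>3} trades: observed expectancy plausibly {low:+.2f}R to {high:+.2f}R")
```

With these inputs the band at 20 trades still spans well below zero, so a losing start on its own does not contradict a positive backtest expectancy.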

My live win rate is lower but my winners are the same size. What does that point to?

That signature points to granularity or execution: candle-based backtests crediting same-bar wins that live trading resolves as stop-outs, or live stops slipping to fills the backtest never charged. Re-validate the strategy on tick-level data with pessimistic fill rules and compare again.

Could the difference just be a bad month?

Yes — and the backtest itself tells you how bad a normal bad month is. Check its worst 20-30 trade stretch and longest losing streak; if your live experience sits inside those bounds, you're observing variance the test predicted, not failure. That's exactly why drawdown and streak stats matter as much as expectancy.
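Both bounds can be read straight off the backtest's trade list. The sketch below assumes results are stored in R in chronological order; the function names and sample data are hypothetical, and a breakeven trade is treated as ending a losing streak.

```python
def worst_stretch(results_r, window):
    """Lowest total R over any run of `window` consecutive trades."""
    if not 0 < window <= len(results_r):
        raise ValueError("window must be between 1 and the number of trades")
    return min(sum(results_r[i:i + window])
               for i in range(len(results_r) - window + 1))

def longest_losing_streak(results_r):
    """Longest run of consecutive trades with a negative result."""
    longest = current = 0
    for r in results_r:
        current = current + 1 if r < 0 else 0
        longest = max(longest, current)
    return longest

backtest = [1.0, -1.0, -1.0, 2.0, -1.0, -1.0, -1.0, 1.5]
print(worst_stretch(backtest, window=3), longest_losing_streak(backtest))
```

On a real backtest the window would be set to the size of the live sample, for example 20 to 30 trades, so the live total can be compared against the worst comparable stretch.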

Is forward testing supposed to catch all this earlier?

Mostly, yes — a 2-4 week demo forward test surfaces execution friction and discipline gaps before money is at risk, which is why it belongs between backtest and live. It cannot fully reproduce real-money psychology, but it isolates suspects 1-3 cheaply.

