Backtest Metrics That Matter
Last updated: 2026-06-11
In short
Judge a backtest by expectancy (the headline: average R per trade, net of costs), max drawdown and longest losing streak (whether you could survive it), and profit factor. Win rate alone is the metric that lies — meaningless without the reward-to-risk ratio beside it.
Expectancy — the Headline Number
Expectancy = (win rate × average win) − (loss rate × average loss)
It folds frequency and magnitude into one figure: the average R (or pips) you earn per trade. It must be positive after the full cost stack or there’s no edge. Example: 45% winners at +15 pips, 55% losers at −8 pips → 0.45×15 − 0.55×8 = +2.35 pips/trade gross; subtract ~1 pip costs → +1.35 net. Everything else is context for this number.
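The formula is small enough to check by hand, but a short function makes it easy to rerun as the journal grows. This is a minimal sketch that reproduces the worked example above. The function name, its parameters, and the flat per-trade cost are illustrative assumptions, not part of any tool mentioned here.

```python
def expectancy(win_rate, avg_win, avg_loss, cost_per_trade=0.0):
    """Average result per trade, in the same unit as avg_win and avg_loss.

    avg_loss is a positive magnitude (8 means an 8-pip loss).
    cost_per_trade is a flat amount charged on every trade (assumption).
    """
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must be between 0 and 1")
    loss_rate = 1.0 - win_rate
    gross = win_rate * avg_win - loss_rate * avg_loss
    return gross - cost_per_trade


# Figures from the example: 45% winners at +15 pips, 55% losers at -8 pips.
gross = expectancy(0.45, 15, 8)
net = expectancy(0.45, 15, 8, cost_per_trade=1.0)
print(f"gross {gross:+.2f} pips/trade, net {net:+.2f} pips/trade")
```

Passing R-multiples instead of pips gives the same figure in R per trade, provided the cost is expressed in R as well.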
Win Rate — the Metric That Lies Alone
A 40% win rate is excellent or terrible depending entirely on reward-to-risk. The breakeven table:
| Reward : Risk | Breakeven win rate |
|---|---|
| 0.5 : 1 | 66.7% |
| 1 : 1 | 50% |
| 1.5 : 1 | 40% |
| 2 : 1 | 33.3% |
| 3 : 1 | 25% |
Costs push every line up: at 1:1, costs of ~10% of the target cut the net reward to 0.9:1 and lift breakeven to ≈ 52.6% (55% if the same cost is also added to each loser). Quote win rate only alongside R:R, or not at all.
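Each row of the table follows from setting expectancy to zero, which gives breakeven win rate = risk ÷ (reward + risk). The sketch below regenerates the table and shows the effect of costs. The function and its `cost_on_losses` flag are hypothetical names introduced for illustration. The 52.6% figure corresponds to deducting the cost from winners only (0.9:1); charging the same cost on losers as well gives 55%.

```python
def breakeven_win_rate(reward, risk=1.0, cost=0.0, cost_on_losses=True):
    """Win rate at which expectancy is exactly zero.

    reward, risk and cost share one unit (R or pips).
    cost always shrinks each win; it also enlarges each loss
    when cost_on_losses is True.
    """
    net_win = reward - cost
    net_loss = risk + cost if cost_on_losses else risk
    if net_win <= 0:
        raise ValueError("cost consumes the whole reward")
    return net_loss / (net_win + net_loss)


# Regenerate the cost-free table.
for reward in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"{reward}:1 -> {breakeven_win_rate(reward):.1%}")

# 1:1 with costs equal to 10% of the target, under both cost models.
wins_only = breakeven_win_rate(1.0, cost=0.1, cost_on_losses=False)
both_sides = breakeven_win_rate(1.0, cost=0.1)
print(f"cost on winners only: {wins_only:.1%}, on every trade: {both_sides:.1%}")
```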
Max Drawdown — the Survival Number
The largest peak-to-trough equity decline. This decides whether you’d actually have kept trading the strategy — and it’s the number prop firms test you against. A strategy with great expectancy and a 40% drawdown is untradeable by most humans: you’d quit (or breach a firm’s limit) before the edge paid off. Read it on the equity curve, and compare it to any drawdown limit you must respect.
Longest Losing Streak — the Psychology Number
If the backtest contains 9 consecutive losers, live trading will too. The question isn’t whether it’ll happen; it’s whether you’ll keep following the rules when it does. Streak length also sets your sizing ceiling: streak × risk% must stay inside your drawdown tolerance (and any prop limit). An 8-loss streak at 3% risk costs about 22% of equity once the losses compound (24% by simple multiplication); at 1% it costs about 7.7%.
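Both numbers in this section can be pulled from a list of trade results. The sketch below counts the longest run of losers and converts a streak into an equity loss. It assumes each trade risks a fixed fraction of current equity, so losses compound; that is why 8 losers at 3% come to about 22% rather than the 24% that streak × risk% suggests. The function names and the sample results are made up for illustration.

```python
def longest_losing_streak(results):
    """Longest run of consecutive losing trades (result < 0)."""
    longest = current = 0
    for r in results:
        current = current + 1 if r < 0 else 0
        longest = max(longest, current)
    return longest


def streak_drawdown(streak, risk_fraction):
    """Fraction of equity lost after `streak` straight full-size losers,
    risking `risk_fraction` of current equity on each trade."""
    return 1.0 - (1.0 - risk_fraction) ** streak


# Hypothetical journal results in R-multiples.
results = [1.5, -1, -1, 2, -1, -1, -1, 1, -1]
print("longest losing streak:", longest_losing_streak(results))

for risk in (0.03, 0.01):
    print(f"8 losers at {risk:.0%} risk: -{streak_drawdown(8, risk):.1%}")
```

The simple product streak × risk% slightly overstates the loss, so using it as the sizing ceiling errs on the cautious side.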
Profit Factor
Profit factor = gross wins ÷ gross losses
Above 1.0 is profitable; above ~1.3 after costs is respectable for a discretionary system. A figure above ~2.5 on a small sample is suspicious: it usually means overfitting or too few trades. Profit factor is a single number for how much winners outweigh losers in total.
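As a rough illustration, profit factor can be computed straight from a list of per-trade results. The sketch below uses a hypothetical 100-trade sample built from the expectancy example earlier (45 winners of +15 pips, 55 losers of −8 pips), before costs. Raising an error when there are no losing trades is a design choice made here, since the ratio is undefined in that case.

```python
def profit_factor(results):
    """Gross wins divided by gross losses for a list of trade results."""
    gross_wins = sum(r for r in results if r > 0)
    gross_losses = -sum(r for r in results if r < 0)
    if gross_losses == 0:
        raise ValueError("profit factor is undefined without losing trades")
    return gross_wins / gross_losses


# Hypothetical sample: 45 winners of +15 pips and 55 losers of -8 pips.
sample = [15] * 45 + [-8] * 55
pf = profit_factor(sample)
print(f"profit factor: {pf:.2f}")
```

Feeding in net-of-cost results instead of gross ones gives the after-cost figure that the thresholds above refer to.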
Putting Them Together
Read them as a set, never alone:
| Metric | Answers | Danger if ignored |
|---|---|---|
| Expectancy (net) | Is there an edge? | Trading a negative-edge system |
| Win rate + R:R | What kind of edge? | Misjudging viability from win rate |
| Max drawdown | Could I hold it? | Quitting / breaching at the worst moment |
| Longest streak | Can I size it safely? | Over-sizing into ruin |
| Profit factor | How efficient? | Mistaking a fragile fit for an edge |
All five fall out of a complete journal: a few spreadsheet formulas, or the built-in stats of replay tools that track P&L. Tick tools like StrategyTune compute win/loss, expectancy and streaks automatically; keep the cost columns in your own sheet, since no tool models swap.
Frequently Asked Questions
What's a good expectancy for a trading strategy?
Any reliably positive net-of-cost expectancy is tradeable; the magnitude matters less than the reliability and the drawdown it comes with. As a rough guide, +0.2R to +0.5R per trade is a solid discretionary result over 200+ trades. Be suspicious of much higher figures on small samples; they usually shrink with more data.
Is a high win rate good or bad?
Neither, on its own: it's only meaningful next to reward-to-risk. A 70% win rate at 0.4:1 R:R loses money (0.70×0.4 − 0.30×1 = −0.02R per trade); a 35% win rate at 3:1 is solidly profitable (0.35×3 − 0.65×1 = +0.40R per trade). High win rates also tend to pair with occasional large losers, so always check the loss distribution and drawdown rather than celebrating the percentage.
How do I calculate max drawdown from a trade list?
Build a running cumulative-equity column, then a running-maximum column. Drawdown at each trade is running equity minus running max (zero or negative); max drawdown is the most negative value. Express it as a percentage of the peak for comparability across account sizes and against prop-firm limits.
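The same running-equity and running-maximum columns can be sketched in a few lines of code. This is one possible implementation, not the method of any particular tool; the function name, the starting balance, and the P&L figures are invented for illustration.

```python
def max_drawdown(pnl, starting_equity):
    """Worst peak-to-trough decline over a sequence of per-trade P&L.

    Returns (worst_absolute, worst_fraction_of_peak), both zero or negative.
    The two can occur at different trades.
    """
    if starting_equity <= 0:
        raise ValueError("starting_equity must be positive")
    equity = peak = starting_equity
    worst_abs = 0.0
    worst_pct = 0.0
    for p in pnl:
        equity += p                  # running cumulative equity
        peak = max(peak, equity)     # running maximum
        drawdown = equity - peak     # zero or negative
        worst_abs = min(worst_abs, drawdown)
        worst_pct = min(worst_pct, drawdown / peak)
    return worst_abs, worst_pct


# Hypothetical per-trade P&L in account currency.
pnl = [200, -100, -150, 300, -400, 100]
worst_abs, worst_pct = max_drawdown(pnl, starting_equity=10_000)
print(f"max drawdown: {worst_abs:.0f} ({worst_pct:.2%} of peak)")
```

Measuring against the running peak matches the definition above. Some prop firms measure against the starting balance instead, so check which basis a given limit uses before comparing.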
Which metric matters most for prop firm challenges?
Max drawdown and longest losing streak, because challenges fail on rule breaches, not on weak expectancy. Your worst drawdown must fit comfortably inside the firm's limit at your planned risk per trade, and your worst streak must not breach the daily loss limit — simulate both before paying.