Backtest Metrics That Matter

Last updated: 2026-06-11

In short

Judge a backtest by expectancy (the headline: average R per trade, net of costs), max drawdown and longest losing streak (whether you could survive it), and profit factor. Win rate alone is the metric that lies — meaningless without the reward-to-risk ratio beside it.

Expectancy — the Headline Number

Expectancy = (win rate × average win) − (loss rate × average loss)

It folds frequency and magnitude into one figure: the average R (or pips) you earn per trade. It must be positive after the full cost stack or there’s no edge. Example: 45% winners at +15 pips, 55% losers at −8 pips → 0.45×15 − 0.55×8 = +2.35 pips/trade gross; subtract ~1 pip costs → +1.35 net. Everything else is context for this number.
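The arithmetic in the example can be checked with a few lines of Python. This is a minimal sketch: the function name `expectancy` and its `cost` parameter are illustrative choices, and it assumes the average loss is passed as a positive magnitude and that costs are a flat per-trade charge.

```python
def expectancy(win_rate: float, avg_win: float, avg_loss: float,
               cost: float = 0.0) -> float:
    """Average result per trade, in the same unit as avg_win and avg_loss.

    avg_loss is a positive magnitude (8 means "lose 8 pips").
    cost is an assumed flat per-trade charge in the same unit.
    """
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must be between 0 and 1")
    loss_rate = 1.0 - win_rate
    gross = win_rate * avg_win - loss_rate * avg_loss
    return gross - cost


gross = expectancy(0.45, 15, 8)
net = expectancy(0.45, 15, 8, cost=1.0)
print(f"gross {gross:+.2f} pips/trade, net {net:+.2f} pips/trade")
# prints: gross +2.35 pips/trade, net +1.35 pips/trade
```

The same function works in R units: pass the average win and loss as multiples of the risk per trade.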

Win Rate — the Metric That Lies Alone

A 40% win rate is excellent or terrible depending entirely on reward-to-risk. The breakeven table:

Reward : Risk | Breakeven win rate
0.5 : 1 | 66.7%
1 : 1 | 50%
1.5 : 1 | 40%
2 : 1 | 33.3%
3 : 1 | 25%

Costs push every line up. At 1:1, if costs shave 10% off each winner, true breakeven is about 52.6% (1 ÷ 1.9); if they also add 10% to each loser, it is 55% (1.1 ÷ 2.0). Quote win rate only alongside R:R, or not at all.
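The breakeven figures come from setting expectancy to zero: win rate × net win = loss rate × net loss. The sketch below works in R units (risk per trade = 1). The function name and the two cost parameters are assumptions made for illustration; they let you choose whether costs come off winners only or are also added to losers.

```python
def breakeven_win_rate(reward_to_risk: float,
                       cost_on_win: float = 0.0,
                       cost_on_loss: float = 0.0) -> float:
    """Win rate at which expectancy is exactly zero.

    All inputs are in R (multiples of the risk per trade).
    cost_on_win reduces each winner; cost_on_loss enlarges each loser.
    """
    net_win = reward_to_risk - cost_on_win
    net_loss = 1.0 + cost_on_loss
    if net_win <= 0:
        raise ValueError("costs consume the whole target; no breakeven exists")
    return net_loss / (net_win + net_loss)


# Cost-free breakeven for the ratios in the table
for rr in (0.5, 1, 1.5, 2, 3):
    print(f"{rr}:1 -> {breakeven_win_rate(rr):.1%}")

# 1:1 with costs equal to 10% of the target
print(f"{breakeven_win_rate(1, cost_on_win=0.1):.1%}")                    # 52.6%
print(f"{breakeven_win_rate(1, cost_on_win=0.1, cost_on_loss=0.1):.1%}")  # 55.0%
```

The loop reproduces the cost-free table; the last two lines show how much the answer depends on where the cost is assumed to land.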

Max Drawdown — the Survival Number

The largest peak-to-trough equity decline. This decides whether you’d actually have kept trading the strategy — and it’s the number prop firms test you against. A strategy with great expectancy and a 40% drawdown is untradeable by most humans: you’d quit (or breach a firm’s limit) before the edge paid off. Read it on the equity curve, and compare it to any drawdown limit you must respect.

Longest Losing Streak — the Psychology Number

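If the backtest contains 9 consecutive losers, expect live trading to produce a streak at least that long sooner or later. The question isn’t whether it’ll happen; it’s whether you’ll keep following the rules when it does. Streak length also sets your sizing ceiling: the loss from a full streak at your risk% (roughly streak × risk%, slightly less when risk is a percentage of current equity) must stay inside your drawdown tolerance and any prop limit. An 8-loss streak at 3% risk compounds to about −22%; at 1%, about −7.7%.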
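The streak figures above can be reproduced, and inverted into a sizing ceiling, with the sketch below. It assumes every loser costs the full risk amount and that risk is a fixed fraction of current equity, so losses compound. Both function names are hypothetical.

```python
def streak_drawdown(losses: int, risk_per_trade: float) -> float:
    """Fractional equity decline after `losses` consecutive full-risk losers.

    Assumes risk_per_trade is a fraction of current equity (0.03 = 3%).
    """
    return 1.0 - (1.0 - risk_per_trade) ** losses


def max_risk_for_streak(losses: int, drawdown_tolerance: float) -> float:
    """Largest risk fraction that keeps a streak inside the tolerance."""
    return 1.0 - (1.0 - drawdown_tolerance) ** (1.0 / losses)


print(f"{streak_drawdown(8, 0.03):.1%}")  # 21.6%
print(f"{streak_drawdown(8, 0.01):.1%}")  # 7.7%

# Sizing ceiling if a 9-loss streak must stay inside a 10% drawdown
print(f"{max_risk_for_streak(9, 0.10):.2%}")
```

If you risk a fixed amount of the starting balance instead, the simple product streak × risk% applies.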

Profit Factor

Profit factor = gross wins ÷ gross losses

Above 1.0 is profitable; above ~1.3 after costs is respectable for a discretionary system. A figure above ~2.5 on a small sample is suspicious: it usually means overfitting or too few trades. Profit factor is a single number for how much winners outweigh losers in total.
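As a rough illustration, profit factor can be computed straight from a list of per-trade results. The sample data below is hypothetical: 45 winners of +15 pips and 55 losers of −8 pips, with an assumed flat cost of 1 pip per trade for the net figure.

```python
def profit_factor(results: list[float]) -> float:
    """Gross wins divided by gross losses for a list of per-trade results."""
    gross_wins = sum(r for r in results if r > 0)
    gross_losses = -sum(r for r in results if r < 0)
    if gross_losses == 0:
        # No losing trades: the ratio is undefined (or unbounded)
        return float("inf") if gross_wins > 0 else float("nan")
    return gross_wins / gross_losses


gross = [15] * 45 + [-8] * 55
net = [r - 1 for r in gross]  # assumed 1 pip cost on every trade

print(f"gross {profit_factor(gross):.2f}, net {profit_factor(net):.2f}")
# prints: gross 1.53, net 1.27
```

Costs hit the ratio from both sides, shrinking the numerator and growing the denominator, which is why the after-cost figure is the one to quote.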

Putting Them Together

Read them as a set, never alone:

Metric | Answers | Danger if ignored
Expectancy (net) | Is there an edge? | Trading a negative-edge system
Win rate + R:R | What kind of edge? | Misjudging viability from win rate
Max drawdown | Could I hold it? | Quitting / breaching at the worst moment
Longest streak | Can I size it safely? | Over-sizing into ruin
Profit factor | How efficient? | Mistaking a fragile fit for an edge

All five fall out of a complete journal — a few spreadsheet formulas, or the built-in stats of replay tools that track P&L (tick tools like StrategyTune compute win/loss, expectancy and streaks automatically; keep the cost columns in your own sheet since no tool models swap).
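For readers who keep the journal in code rather than a spreadsheet, the streak column is the one metric without an obvious one-line formula. A minimal sketch follows, assuming the journal is a list of per-trade results in trade order and that a breakeven trade (exactly 0) ends a losing streak; the function name and sample data are illustrative.

```python
def longest_losing_streak(results: list[float]) -> int:
    """Length of the longest run of consecutive losing trades."""
    longest = current = 0
    for r in results:
        if r < 0:
            current += 1
            longest = max(longest, current)
        else:
            # A win or a breakeven trade resets the run (assumption)
            current = 0
    return longest


journal = [1.5, -1, -1, 2, -1, -1, -1, 0, -1]  # hypothetical results in R
print(longest_losing_streak(journal))  # 3
```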

Frequently Asked Questions

What's a good expectancy for a trading strategy?

Any reliably positive net-of-cost expectancy is tradeable — the magnitude matters less than the reliability and the drawdown it comes with. As a feel: +0.2R to +0.5R per trade is a solid discretionary result over 200+ trades. Be suspicious of much higher figures on small samples; they usually shrink with more data.

Is a high win rate good or bad?

Neither, on its own: it's only meaningful next to reward-to-risk. A 70% win rate at 0.4:1 R:R loses money (0.7 × 0.4 − 0.3 × 1 = −0.02R per trade); a 35% win rate at 3:1 earns +0.40R per trade. High win rates also tend to pair with occasional large losers, so always check the loss distribution and drawdown rather than celebrating the percentage.

How do I calculate max drawdown from a trade list?

Build a running cumulative-equity column, then a running-maximum column. Drawdown at each trade is running equity minus running max (zero or negative); max drawdown is the most negative value. Express it as a percentage of the peak for comparability across account sizes and against prop-firm limits.
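The column recipe above translates directly into code. This sketch assumes per-trade P&L in account currency and a known, positive starting balance; the function name and sample numbers are hypothetical. It uses `itertools.accumulate` with the `initial` argument, which needs Python 3.8 or later.

```python
from itertools import accumulate


def max_drawdown(pnl: list[float], starting_equity: float) -> tuple[float, float]:
    """Return (worst drawdown in currency, same drawdown as a fraction of its peak).

    Both values are zero or negative. starting_equity must be positive.
    """
    equity = list(accumulate(pnl, initial=starting_equity))   # running equity
    running_max = list(accumulate(equity, max))               # running peak
    drawdown = [e - peak for e, peak in zip(equity, running_max)]
    worst = min(range(len(drawdown)), key=drawdown.__getitem__)
    return drawdown[worst], drawdown[worst] / running_max[worst]


trades = [200, -150, 300, -400, -250, 100, 500]  # hypothetical P&L per trade
amount, fraction = max_drawdown(trades, starting_equity=10_000)
print(f"{amount} ({fraction:.2%} of peak)")  # -650 (-6.28% of peak)
```

Here the peak is 10,350 after the third trade and the trough is 9,700 after the fifth, so the drawdown is −650, or about −6.28% of the peak. Note that the largest currency drawdown and the largest percentage drawdown can fall at different points on a long equity curve; this sketch reports the percentage for the largest currency decline, as the text describes.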

Which metric matters most for prop firm challenges?

Max drawdown and longest losing streak, because challenges fail on rule breaches, not on weak expectancy. Your worst drawdown must fit comfortably inside the firm's limit at your planned risk per trade, and your worst streak must not breach the daily loss limit — simulate both before paying.

