12 Manual Backtesting Mistakes That Invalidate Results

Last updated: 2026-06-11

In short

Twelve mistakes account for nearly every untrustworthy manual backtest. The big four: scrolling instead of replaying (hindsight bias), cherry-picking setups, ignoring costs, and tiny samples. Each mistake below has a one-line fix — the full list doubles as a pre-flight checklist.

The Twelve

1. Testing on a scrolled chart. With the future visible, every read is contaminated — hindsight bias isn’t a willpower problem, it’s perception. Fix: hidden-future replay, always.

2. Unwritten rules. “I know my setup when I see it” guarantees the setup quietly changes to fit results. Fix: if-then rules on paper before the first bar.

3. Cherry-picking. Skipping valid-but-ugly setups inflates every statistic. Fix: if it meets the written conditions, it’s a trade.

4. Changing rules mid-test. Improvement ideas applied on the fly blend two strategies into one meaningless log. Fix: parking list now, new test later.

5. Ignoring costs. Gross-pip results overstate reality by 40–70% for intraday styles — the worked audit shows the arithmetic. Fix: charge spread, commission, swap before believing anything.

6. Stopping at 20–30 trades. Small samples are luck in a lab coat (see how many trades you need). Fix: 100 trades minimum, 200 preferred, analyzed once at the end.

7. Candle data under tight stops. Same-bar stop/target conflicts get resolved by optimistic guessing. Fix: tick-level replay when trade geometry fits inside single bars.

8. One regime. A trend strategy tested in a trend flatters itself. Fix: windows spanning trending, ranging and news regimes, segmented results.

9. Wrong clock. Time-based rules on an unverified server offset fire an hour off for half the year. Fix: the Sunday-candle test before any session rule.

10. Optimistic fills. Stops filled at the level through gaps, limits filled on a touch — both systematically wrong in your favor. Fix: pessimistic fill rules — gap-side stops, pierced-not-touched limits.

11. No journal. Without per-trade records there’s no expectancy, no streaks, no segmentation — just a feeling. Fix: the minimal journal, filled in immediately.

12. Skipping the forward test. Replay can’t measure real-time you. Fix: 2–4 weeks of demo between backtest and money.
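Fixes 1 through 4 and 11 describe one discipline: the rule is fixed before the first bar, it only ever sees bars revealed so far, and every bar where it fires goes into the journal immediately. The minimal Python sketch below expresses that discipline as code, assuming bars arrive as OHLC records. The `Bar` type, `replay` function, entry-at-close convention and example breakout rule are all hypothetical illustrations, not any replay tool's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Bar:
    time: str
    open: float
    high: float
    low: float
    close: float


# A written rule: sees only the bars revealed so far; returns "long", "short" or None.
Rule = Callable[[List[Bar]], Optional[str]]


def replay(bars: List[Bar], rule: Rule) -> List[dict]:
    """Reveal bars one at a time and journal every signal the fixed rule produces."""
    journal = []
    for i in range(len(bars)):
        visible = bars[: i + 1]  # bars[i + 1:] (the future) is never passed to the rule
        side = rule(visible)
        if side is not None:  # meets the written conditions, so it's a trade: no skipping
            journal.append({"time": visible[-1].time, "side": side,
                            "entry_ref": visible[-1].close})  # assumed: enter at bar close
    return journal


def breakout_rule(visible: List[Bar]) -> Optional[str]:
    """Hypothetical example rule: go long when a bar closes above the prior bar's high."""
    if len(visible) < 2:
        return None
    return "long" if visible[-1].close > visible[-2].high else None


bars = [
    Bar("t0", 1.10, 1.12, 1.09, 1.11),
    Bar("t1", 1.11, 1.13, 1.10, 1.125),
    Bar("t2", 1.125, 1.126, 1.11, 1.12),
]
print(replay(bars, breakout_rule))
```

Because the rule is a plain function passed in once, changing it mid-test means starting a new run, which is exactly the "parking list now, new test later" fix for mistake 4.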

The Pattern Behind All Twelve

Every entry on the list is a way of making the test easier to pass. That's the tell worth internalizing: any methodological shortcut that feels convenient almost certainly biases results upward, because the convenient direction and the optimistic direction are the same direction. A valid backtest is adversarial by design: you're trying to make the strategy fail, and it passes only if you can't. (Replay tooling automates several of the fixes, such as hidden futures, real bid/ask fills and automatic logs; see the comparison. But no tool enforces the fixes for mistakes 2, 3, 4 or 6. Those stay yours.)

Frequently Asked Questions

Which single mistake causes the most damage?

By expected damage, cost-blindness (mistake 5): it's near-universal, it biases intraday results 40–70% upward, and it survives otherwise careful testing. Hindsight bias (mistake 1) is more fundamental, but replay tooling has made it easy to avoid; nothing automatically charges your swap.
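The arithmetic behind cost-blindness is simple enough to sketch. The Python example below charges each journaled trade for spread, commission and swap and compares gross with net. Every number in it is a hypothetical placeholder, not a broker quote, and all costs are expressed in pips per round turn for simplicity (converting commission to pips depends on lot size and pip value). Swap is treated as a pure cost here, though real swap can be positive or negative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trade:
    gross_pips: float  # result at chart prices, before any costs
    nights_held: int   # rollovers crossed, used to charge swap


# Hypothetical cost assumptions, in pips per round-turn trade.
SPREAD_PIPS = 1.0
COMMISSION_PIPS = 0.7
SWAP_PIPS_PER_NIGHT = 0.3


def net_pips(t: Trade) -> float:
    """Gross result minus spread, commission and swap for one trade."""
    return t.gross_pips - SPREAD_PIPS - COMMISSION_PIPS - SWAP_PIPS_PER_NIGHT * t.nights_held


def cost_audit(trades: List[Trade]) -> dict:
    gross = sum(t.gross_pips for t in trades)
    net = sum(net_pips(t) for t in trades)
    # How much the gross figure overstates the net one; undefined if net is not positive.
    overstatement: Optional[float] = 100 * (gross - net) / net if net > 0 else None
    return {"gross": gross, "net": net, "costs": gross - net,
            "gross_overstates_net_by_pct": overstatement}


# Hypothetical six-trade journal: four intraday trades, one held overnight.
journal = [Trade(20, 0), Trade(-10, 0), Trade(25, 1),
           Trade(-10, 0), Trade(18, 0), Trade(-10, 0)]
print(cost_audit(journal))
```

Note that costs are charged on losers as well as winners, which is why a strategy with many small trades feels the drag far more than one with a few large ones.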

I've already run a backtest with several of these mistakes. Is it salvageable?

Partially: cost mistakes are repairable after the fact (re-run the journal through the cost audit), and fill-rule mistakes can be approximated by re-resolving contested trades pessimistically. Hindsight contamination and cherry-picking are not repairable — those require retesting, ideally on a different date window than the one you've now seen.
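Re-resolving contested trades can be done mechanically. The sketch below, assuming a long position and OHLC candles, applies the pessimistic rules from mistakes 7 and 10: a bar that opens beyond the stop fills at the open (gap side), a bar containing both stop and target is resolved as a stop, and a limit target counts only if price pierces it by at least one tick. The function name and the `tick` default (one point on a 5-digit quote) are hypothetical; how strict "pierced" should be is an assumption you may want to tighten. Mirror the comparisons for shorts.

```python
from typing import Optional, Tuple


def resolve_long_bar(open_: float, high: float, low: float,
                     stop: float, target: float,
                     tick: float = 0.00001) -> Optional[Tuple[str, float]]:
    """Pessimistically resolve one candle for an open long trade.

    Returns ("stop", fill), ("target", fill), or None if neither triggers.
    """
    # Gap through the stop: the bar opens at or below it, so the fill is the open.
    if open_ <= stop:
        return ("stop", open_)
    stop_hit = low <= stop
    # Pierced, not touched: the limit fills only if price traded a tick beyond it.
    target_hit = high >= target + tick
    if stop_hit:
        # Also covers the contested case: both inside one bar means assume the stop came first.
        return ("stop", stop)
    if target_hit:
        return ("target", target)
    return None


print(resolve_long_bar(1.1000, 1.1050, 1.0950, stop=1.0960, target=1.1040))
```

The design choice is deliberate asymmetry: every ambiguity is resolved against the trade, so a strategy that survives this pass is not leaning on lucky guesses about intrabar order.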

Is there a mistake list specific to prop-firm preparation?

The twelve all apply, plus one: testing the strategy without simulating the firm's rules — daily loss limits, trailing drawdown, consistency — which is where most challenges actually fail. The prop-firms guide covers that simulation layer.
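Firms define these rules differently, so the following is only a sketch under assumed definitions: a daily loss limit measured from each day's starting balance, and a trailing drawdown that trails the highest closed balance. It replays a journal of closed-trade P&L grouped by day and reports the first breach. The function name, parameters and thresholds are hypothetical, consistency rules are left out, and because it tracks closed balances only it is more lenient than firms that measure open equity.

```python
from typing import List, Optional


def first_breach(daily_pnls: List[List[float]], start_balance: float,
                 daily_loss_limit: float, trailing_dd: float) -> Optional[str]:
    """Walk closed trades day by day; describe the first rule breach, or return None."""
    balance = start_balance
    peak = start_balance  # highest closed balance so far (assumed trailing reference)
    for day, trades in enumerate(daily_pnls, start=1):
        day_start = balance  # assumed: daily limit measured from the day's opening balance
        for n, pnl in enumerate(trades, start=1):
            balance += pnl
            peak = max(peak, balance)
            if day_start - balance >= daily_loss_limit:
                return f"day {day}, trade {n}: daily loss limit"
            if peak - balance >= trailing_dd:
                return f"day {day}, trade {n}: trailing drawdown"
    return None


# Hypothetical $10,000 account, $1,000 daily limit, $1,500 trailing drawdown.
print(first_breach([[200, -300], [-400, -700]], 10_000, 1_000, 1_500))
```

Running the same journal through this check often changes the verdict: a strategy with a fine overall expectancy can still fail on one bad cluster of trades.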

How do I use this as a checklist?

Before the test: rules written (2), window regime-diverse (8), clock verified (9), granularity matched (7).

During: replay only (1), every setup (3), no edits (4), journal live (11), pessimistic fills (10).

After: costs (5), sample target met (6), forward test scheduled (12).
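For the "sample target met" item, one rough way to see why 20 to 30 trades say little is the uncertainty around an observed win rate. The sketch below uses the normal approximation, with standard error sqrt(p(1 - p) / n), and a hypothetical 50% win rate. It ignores payoff size and streakiness, so treat it as an illustration of scale, not a full significance test.

```python
import math


def win_rate_interval(p: float, n: int, z: float = 1.96) -> tuple:
    """Approximate 95% interval for the true win rate, given observed rate p over n trades."""
    se = math.sqrt(p * (1 - p) / n)
    return (max(0.0, p - z * se), min(1.0, p + z * se))


for n in (30, 100, 200):
    lo, hi = win_rate_interval(0.5, n)
    print(f"n={n:>3}: observed 50%, plausible true rate {lo:.0%} to {hi:.0%}")
```

At 30 trades the interval spans roughly ±18 percentage points, at 100 about ±10, and at 200 about ±7, which is why the minimum sits at 100 and the preferred target at 200.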