12 Manual Backtesting Mistakes That Invalidate Results
Last updated: 2026-06-11
In short
Twelve mistakes account for nearly every untrustworthy manual backtest. The big four: scrolling instead of replaying (hindsight bias), cherry-picking setups, ignoring costs, and tiny samples. Each mistake below has a one-line fix — the full list doubles as a pre-flight checklist.
The Twelve
1. Testing on a scrolled chart. With the future visible, every read is contaminated — hindsight bias isn’t a willpower problem, it’s perception. Fix: hidden-future replay, always.
2. Unwritten rules. “I know my setup when I see it” guarantees the setup quietly changes to fit results. Fix: if-then rules on paper before the first bar.
3. Cherry-picking. Skipping valid-but-ugly setups inflates every statistic. Fix: if it meets the written conditions, it’s a trade.
4. Changing rules mid-test. Improvement ideas applied on the fly blend two strategies into one meaningless log. Fix: parking list now, new test later.
5. Ignoring costs. Gross-pip results overstate reality by 40–70% for intraday styles; the worked cost audit shows the arithmetic. Fix: charge spread, commission and swap before believing anything.
6. Stopping at 20–30 trades. Small samples are luck in a lab coat (see the guide on how many trades you need). Fix: 100 minimum, 200 preferred, analyze once at the end.
7. Candle data under tight stops. Same-bar stop/target conflicts get resolved by optimistic guessing. Fix: tick-level replay when trade geometry fits inside single bars.
8. One regime. A trend strategy tested in a trend flatters itself. Fix: windows spanning trending, ranging and news regimes, segmented results.
9. Wrong clock. Time-based rules on an unverified server offset fire an hour off for half the year. Fix: the Sunday-candle test before any session rule.
10. Optimistic fills. Stops filled at the level through gaps, limits filled on a touch — both systematically wrong in your favor. Fix: pessimistic fill rules — gap-side stops, pierced-not-touched limits.
11. No journal. Without per-trade records there’s no expectancy, no streaks, no segmentation — just a feeling. Fix: the minimal journal, filled in immediately.
12. Skipping the forward test. Replay can’t measure real-time you. Fix: 2–4 weeks of demo between backtest and money.
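To make mistake 6 concrete, here is a minimal sketch of how wide the uncertainty on a win rate stays at small sample sizes. It uses the Wilson score interval, which is a choice made for this illustration and not something the list prescribes; the function name and the 50% example win rate are likewise assumptions.

```python
import math

def wilson_interval(wins, trades, z=1.96):
    """Approximate 95% confidence interval for the true win rate.

    Uses the Wilson score interval (an illustrative choice); z=1.96
    corresponds to roughly 95% coverage.
    """
    if trades <= 0:
        raise ValueError("trades must be positive")
    p = wins / trades
    denom = 1 + z * z / trades
    center = (p + z * z / (2 * trades)) / denom
    half = z * math.sqrt(p * (1 - p) / trades + z * z / (4 * trades * trades)) / denom
    return center - half, center + half

# Same observed 50% win rate, three different sample sizes.
for n in (30, 100, 200):
    lo, hi = wilson_interval(round(0.5 * n), n)
    print(f"{n:>3} trades at 50% wins: true rate plausibly {lo:.0%} to {hi:.0%}")
```

At 30 trades the plausible range spans roughly a third to two thirds, which is wide enough to hide both a losing and a winning strategy. The range narrows noticeably at 100 and again at 200 trades.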
The Pattern Behind All Twelve
Every entry on the list is a way of making the test easier to pass. That’s the tell worth internalizing: any methodological shortcut that feels convenient almost certainly biases results upward, because the convenient direction and the optimistic direction are the same direction. A valid backtest is adversarial by design: you try to make the strategy fail, and it earns trust only when you can’t. (Replay tooling automates several of the fixes, including hidden futures, real bid/ask fills and automatic logs, as the tool comparison shows, but no tool prevents mistakes 2, 3, 4 or 6. Those stay yours.)
Frequently Asked Questions
Which single mistake causes the most damage?
Cost-blindness (mistake 5), by expected damage: it's near-universal, it biases results 40–70% upward for intraday styles, and it survives otherwise-careful testing. Hindsight bias is more fundamental, but replay tooling has made it easy to avoid; nothing automatically charges your swap.
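The arithmetic behind cost-blindness can be sketched in a few lines. Everything here is hypothetical: the `Trade` fields, the five journal rows and the cost figures are made up for illustration, and "overstatement" is taken to mean how far gross expectancy exceeds net expectancy, relative to net.

```python
from dataclasses import dataclass

@dataclass
class Trade:
    gross_pips: float       # result measured chart price to chart price
    spread_pips: float      # spread paid over the round trip
    commission_pips: float  # commission converted to pips
    swap_pips: float = 0.0  # overnight financing, positive means a cost

def net_pips(trade):
    """Result after charging spread, commission and swap."""
    return (trade.gross_pips - trade.spread_pips
            - trade.commission_pips - trade.swap_pips)

def expectancy(values):
    """Average result per trade."""
    values = list(values)
    return sum(values) / len(values)

# Hypothetical journal rows, not real results.
journal = [
    Trade(18.0, 1.0, 0.6),
    Trade(-10.0, 1.0, 0.6),
    Trade(25.0, 1.2, 0.6, 0.3),
    Trade(-10.0, 1.0, 0.6),
    Trade(5.0, 0.9, 0.6),
]

gross = expectancy(t.gross_pips for t in journal)
net = expectancy(net_pips(t) for t in journal)
# How much the gross figure flatters the strategy, relative to net.
overstatement = (gross - net) / net if net > 0 else float("inf")
print(f"gross {gross:.2f} pips, net {net:.2f} pips, overstated by {overstatement:.0%}")
```

With these made-up numbers, gross expectancy is 5.6 pips per trade and net is 3.92, so the gross figure overstates the net by roughly 43%. Costs are charged on losers as well as winners, which is why the gap grows as the average trade gets smaller.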
I've already run a backtest with several of these mistakes. Is it salvageable?
Partially: cost mistakes are repairable after the fact (re-run the journal through the cost audit), and fill-rule mistakes can be approximated by re-resolving contested trades pessimistically. Hindsight contamination and cherry-picking are not repairable — those require retesting, ideally on a different date window than the one you've now seen.
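Re-resolving contested trades pessimistically might look like the sketch below, for the exit of a long position on a single bar. The `Bar` type, the function name and the rule ordering are assumptions made for illustration. Prices are integers in points, a simplification chosen here to avoid floating-point comparison problems.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Bar:
    open: int
    high: int
    low: int
    close: int

def resolve_long_exit(bar: Bar, stop: int, target: int,
                      tick: int = 1) -> Optional[Tuple[str, int]]:
    """Pessimistic exit for a long position on one bar (illustrative rules).

    Returns (reason, fill_price), or None if the trade stays open.
    """
    # Gap through the stop: fill at the open, not at the stop level.
    if bar.open <= stop:
        return ("stop", bar.open)
    # Stop inside the bar's range: assume it was hit, even if the target
    # was also reachable on the same bar (same-bar conflict goes to the stop).
    if bar.low <= stop:
        return ("stop", stop)
    # Limit target counts only if price traded through it by at least
    # one tick; a bare touch is treated as no fill.
    if bar.high >= target + tick:
        return ("target", target)
    return None

# Same-bar conflict: both levels lie inside the range, so the stop wins.
print(resolve_long_exit(Bar(1010, 1050, 995, 1040), stop=1000, target=1040))
```

A short position would mirror every comparison. The point of the ordering is that each ambiguous case resolves against the trader, so any edge that survives does not depend on lucky fills.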
Is there a mistake list specific to prop-firm preparation?
The twelve all apply, plus one: testing the strategy without simulating the firm's rules — daily loss limits, trailing drawdown, consistency — which is where most challenges actually fail. The prop-firms guide covers that simulation layer.
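As a rough illustration of that simulation layer, the sketch below replays a journal against two of the rule types named above. Real firms define these rules differently (for example on intraday equity rather than closed trades), so the thresholds, the function name and the closed-trade simplification are all assumptions; consistency rules are left out.

```python
from itertools import groupby

def first_breach(trades, start_balance, daily_loss_limit, trailing_drawdown):
    """Return (day, rule) for the first rule breach, or None if none occurs.

    trades: chronological list of (day, pnl) tuples in account currency.
    Both limits are checked on closed-trade equity, a simplification.
    """
    equity = start_balance
    peak = start_balance
    for day, group in groupby(trades, key=lambda t: t[0]):
        day_start = equity
        for _, pnl in group:
            equity += pnl
            peak = max(peak, equity)
            if day_start - equity >= daily_loss_limit:
                return (day, "daily loss limit")
            if peak - equity >= trailing_drawdown:
                return (day, "trailing drawdown")
    return None

# Hypothetical journal: net positive overall, yet it breaches a rule.
journal = [("d1", 3000), ("d2", -900), ("d3", -900), ("d4", -900)]
print(first_breach(journal, 100_000, daily_loss_limit=1_000, trailing_drawdown=2_000))
```

The example journal ends 300 ahead and never loses 1,000 in a day, but the trailing drawdown from its early peak still ends the run on the fourth day. A plain expectancy figure would not reveal that.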
How do I use this as a checklist?
Before the test: rules written (2), window regime-diverse (8), clock verified (9), granularity matched (7). During: replay only (1), every setup (3), no edits (4), journal live (11), pessimistic fills (10). After: costs (5), sample target met (6), forward test scheduled (12).