12 Manual Backtesting Mistakes That Invalidate Results

In short

Twelve mistakes account for nearly every untrustworthy manual backtest. The big four: scrolling instead of replaying (hindsight bias), cherry-picking setups, ignoring costs, and tiny samples. Each mistake below has a one-line fix — the full list doubles as a pre-flight checklist.

The Twelve

1. Testing on a scrolled chart. With the future visible, every read is contaminated — hindsight bias isn’t a willpower problem, it’s perception. Fix: hidden-future replay, always.

2. Unwritten rules. “I know my setup when I see it” guarantees the setup quietly changes to fit results. Fix: if-then rules on paper before the first bar.

3. Cherry-picking. Skipping valid-but-ugly setups inflates every statistic. Fix: if it meets the written conditions, it’s a trade.

4. Changing rules mid-test. Improvement ideas applied on the fly blend two strategies into one meaningless log. Fix: parking list now, new test later.

5. Ignoring costs. Gross-pip results overstate reality by 40–70% for intraday styles; the worked cost audit shows the arithmetic. Fix: charge spread, commission and swap before believing any result.

6. Stopping at 20–30 trades. Small samples are luck in a lab coat (see How Many Trades Makes a Valid Backtest?). Fix: 100 trades minimum, 200 preferred, analyzed once at the end.

7. Candle data under tight stops. Same-bar stop/target conflicts get resolved by optimistic guessing. Fix: tick-level replay when trade geometry fits inside single bars.

8. One regime. A trend strategy tested in a trend flatters itself. Fix: windows spanning trending, ranging and news regimes, segmented results.

9. Wrong clock. Time-based rules on an unverified server offset fire an hour off for half the year. Fix: the Sunday-candle test before any session rule.

10. Optimistic fills. Stops filled at the level through gaps, limits filled on a touch — both systematically wrong in your favor. Fix: pessimistic fill rules — gap-side stops, pierced-not-touched limits.

11. No journal. Without per-trade records there’s no expectancy, no streaks, no segmentation — just a feeling. Fix: the minimal journal, filled in immediately.

12. Skipping the forward test. Replay can’t measure real-time you. Fix: 2–4 weeks of demo between backtest and money.
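
To make mistake 6 concrete, here is a minimal sketch of how wide the uncertainty on a win rate is at different sample sizes. It uses the Wilson score interval, which is a standard statistical tool rather than anything this guide prescribes, and the 60% observed win rate is a made-up figure for illustration.

```python
import math


def wilson_interval(wins: int, trades: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for the true win rate."""
    if trades <= 0:
        raise ValueError("trades must be positive")
    p = wins / trades
    denom = 1 + z * z / trades
    centre = (p + z * z / (2 * trades)) / denom
    half = z * math.sqrt(p * (1 - p) / trades + z * z / (4 * trades * trades)) / denom
    return centre - half, centre + half


# Same observed 60% win rate (hypothetical), three different sample sizes.
for wins, trades in [(15, 25), (60, 100), (120, 200)]:
    low, high = wilson_interval(wins, trades)
    print(f"{trades:>3} trades: true win rate plausibly {low:.0%} to {high:.0%}")
```

At 25 trades the interval runs from roughly 41% to 77%, which cannot separate a losing strategy from a strong one. At 200 trades it narrows to roughly 53% to 67%. Win rate is only one input to expectancy, so treat this as a lower bound on how much data you need, not a full validity test.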

The Pattern Behind All Twelve

Every entry on the list is a way of making the test easier to pass. That’s the tell worth internalizing: any methodological shortcut that feels convenient almost certainly biases results upward, because the convenient direction and the optimistic direction are the same direction. A valid backtest is adversarial by design: you try to make the strategy fail, and it earns trust only when you can’t. (Replay tooling automates several of the fixes, including hidden futures, real bid/ask fills and automatic logs; see the comparison. But no tool enforces the fixes for mistakes 2, 3, 4 or 6. Those stay yours.)

Frequently Asked Questions

Which single mistake causes the most damage?

Cost-blindness (mistake 5), by expected damage: it's near-universal, it biases results 40–70% upward for intraday styles, and it survives otherwise-careful testing. Hindsight bias is more fundamental, but replay tooling has made it easy to avoid; nothing automatically charges your swap.
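
The cost deduction itself is simple arithmetic, and a rough sketch shows the shape of it. Everything here is an assumption for illustration: the `Trade` record, the `net_pips` helper, the five-trade journal and the cost figures are all hypothetical, and swap is modelled as a flat per-night charge even though it can be a credit on some pairs and directions.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    gross_pips: float     # result measured on the chart, before any cost
    nights_held: int = 0  # rollovers the position was held through


def net_pips(trade: Trade, spread_pips: float, commission_pips: float,
             swap_pips_per_night: float) -> float:
    """Deduct round-trip spread, commission and swap from one trade's gross result."""
    cost = spread_pips + commission_pips + swap_pips_per_night * trade.nights_held
    return trade.gross_pips - cost


# Hypothetical journal and hypothetical cost figures, not broker quotes.
journal = [Trade(12.0), Trade(-8.0), Trade(15.0), Trade(-8.0), Trade(9.0, nights_held=1)]

gross = sum(t.gross_pips for t in journal)
net = sum(
    net_pips(t, spread_pips=0.8, commission_pips=0.6, swap_pips_per_night=0.4)
    for t in journal
)
print(f"gross: {gross:.1f} pips, net: {net:.1f} pips")
```

With these made-up figures a gross total of 20.0 pips nets out to about 12.6. Costs are charged on losers as well as winners, which is why small-target intraday styles feel the deduction most.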

I've already run a backtest with several of these mistakes. Is it salvageable?

Partially: cost mistakes are repairable after the fact (re-run the journal through the cost audit), and fill-rule mistakes can be approximated by re-resolving contested trades pessimistically. Hindsight contamination and cherry-picking are not repairable — those require retesting, ideally on a different date window than the one you've now seen.
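
Re-resolving contested trades pessimistically can be written down as three small rules. The sketch below is one reading of the fixes for mistakes 7 and 10, not a definitive specification: the function names are hypothetical, "gap-side" is taken to mean filling a stop at the bar's open when price gaps through it, and any bar that contains both the stop and the target is scored as a stop.

```python
def stop_fill_price(side: str, stop: float, bar_open: float) -> float:
    """Fill a triggered stop at the worse of the stop level and the bar's open."""
    if side not in ("long", "short"):
        raise ValueError("side must be 'long' or 'short'")
    # A long is stopped out by selling, so a lower price is worse; a short is the mirror.
    return min(stop, bar_open) if side == "long" else max(stop, bar_open)


def limit_filled(side: str, limit: float, bar_high: float, bar_low: float) -> bool:
    """Count a limit order as filled only if price traded through it, not merely touched it."""
    if side not in ("long", "short"):
        raise ValueError("side must be 'long' or 'short'")
    return bar_low < limit if side == "long" else bar_high > limit


def resolve_bar(side: str, stop: float, target: float,
                bar_high: float, bar_low: float) -> str:
    """Resolve one bar for an open trade; a bar holding both levels counts as a stop."""
    if side == "long":
        stop_hit, target_hit = bar_low <= stop, bar_high >= target
    elif side == "short":
        stop_hit, target_hit = bar_high >= stop, bar_low <= target
    else:
        raise ValueError("side must be 'long' or 'short'")
    if stop_hit:
        return "stop"  # includes contested bars where the target was also inside the range
    return "target" if target_hit else "open"


# A long whose stop (1.0990) and target (1.1020) both sit inside one bar's range.
print(resolve_bar("long", stop=1.0990, target=1.1020, bar_high=1.1025, bar_low=1.0985))
```

The contested bar above is scored as a stop. This only approximates what happened inside the bar; tick-level data is what actually settles the order of events.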

Is there a mistake list specific to prop-firm preparation?

The twelve all apply, plus one: testing the strategy without simulating the firm's rules — daily loss limits, trailing drawdown, consistency — which is where most challenges actually fail. The prop-firms guide covers that simulation layer.
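
As a rough illustration of that simulation layer, the sketch below replays a journal against a daily loss limit and a trailing drawdown. Every number is hypothetical and firm rules differ: some measure drawdown on intraday equity rather than closed balance, and consistency rules are not modelled here. The `first_breach` function and its parameters are assumptions made for this example.

```python
from collections import defaultdict
from datetime import date


def first_breach(trades, start_balance, daily_loss_limit, trailing_drawdown):
    """Return (trade_index, rule_name) for the first rule breach, or None if none occurs.

    trades is a list of (day, closed_pnl) pairs in chronological order.
    """
    balance = peak = start_balance
    day_pnl = defaultdict(float)
    for i, (day, pnl) in enumerate(trades):
        balance += pnl
        day_pnl[day] += pnl
        if day_pnl[day] <= -daily_loss_limit:
            return i, "daily loss limit"
        if balance <= peak - trailing_drawdown:
            return i, "trailing drawdown"
        peak = max(peak, balance)  # the drawdown floor trails the highest closed balance
    return None


# Hypothetical journal: net profitable overall, yet it breaches on the second day.
journal = [
    (date(2024, 3, 4), 900.0),
    (date(2024, 3, 4), -700.0),
    (date(2024, 3, 5), -1200.0),
    (date(2024, 3, 5), -900.0),
    (date(2024, 3, 6), 3000.0),
]
print(first_breach(journal, start_balance=100_000.0,
                   daily_loss_limit=2_000.0, trailing_drawdown=4_000.0))
```

The journal nets +1,100 overall, but the fourth trade takes the day's loss to 2,100 and ends the challenge before the winning trade arrives. A strategy-level backtest alone would not show that.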

How do I use this as a checklist?

Before the test: rules written (2), window regime-diverse (8), clock verified (9), granularity matched (7). During: replay only (1), every setup (3), no edits (4), journal live (11), pessimistic fills (10). After: costs (5), sample target met (6), forward test scheduled (12).
