sim, dry-run, and paper-trade modes are all live-forward — they test a
strategy against today’s prices going forward. Backtesting is the missing
historical leg: replay your skill against past prediction-market data to see how
it would have performed before you commit anything. It’s the first rung on the
graduation ladder.
Self-serve backtesting needs simmer-sdk >= 0.19.0 and the [backtest] extra.
Data currently covers Nov 2022 → ~May 5 2026; pick a window in that range.

1. Install and try the demo
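After installing, one way to confirm the version requirement is met is a small check with the standard library. This is a minimal sketch: it assumes the distribution is published under the name simmer-sdk as written above, the helper names are hypothetical, and it only checks the version (it cannot tell whether the [backtest] extra was installed).

```python
from importlib import metadata

MIN_VERSION = (0, 19, 0)  # minimum stated in the requirement above


def parse_version(text):
    """Turn '0.19.2' into (0, 19, 2).

    Non-numeric suffixes are ignored, so '0.19.0rc1' parses as (0, 19, 0).
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_VERSION):
    """True when the installed version string is at least the minimum."""
    return parse_version(installed) >= minimum


def check_install(dist_name="simmer-sdk"):
    """Return a short human-readable status for the installed distribution."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return f"{dist_name} is not installed"
    if meets_minimum(installed):
        return f"{dist_name} {installed} is new enough"
    return f"{dist_name} {installed} is older than 0.19.0"


if __name__ == "__main__":
    print(check_install())
```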
The engine ships as an optional extra; a bundled demo lets you see a full run with zero setup or network.

2. Backtest your own skill
Give the CLI your skill bundle and a window. The historical tape is fetched and cached for you, no data hunting.

The first run for a window fetches a small slice (tens of MB) from Simmer’s tape
service and caches it under ~/.simmer/tapes/; repeat runs of the same window are
instant. The fetch uses your SIMMER_API_KEY, so there is no separate signup. Prefer
your own data? Pass --tape <dir> with a local markets.parquet + quant.parquet
(details).

/api/sdk/* backtests without code changes. The replay clock never serves data
dated after the current tick, so a skill can’t accidentally “see the future.”
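The no-lookahead guarantee can be pictured with a toy replay clock. This is a sketch of the general technique, not Simmer's implementation: the PricePoint and ReplayClock names, their fields, and the in-memory list are all hypothetical. The only rule it demonstrates is that a read never returns a row dated after the current tick.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PricePoint:
    ts: datetime        # when the price was observed
    market_id: str
    yes_price: float


class ReplayClock:
    """Hypothetical replay clock: serves only rows dated at or before `now`."""

    def __init__(self, rows, start):
        self._rows = sorted(rows, key=lambda r: r.ts)
        self.now = start

    def advance(self, step):
        """Move the clock forward by one cadence step."""
        self.now += step

    def visible(self):
        """Rows the skill is allowed to see at the current tick."""
        return [r for r in self._rows if r.ts <= self.now]

    def latest_price(self, market_id):
        """Most recent visible YES price for a market, or None if none yet."""
        seen = [r for r in self.visible() if r.market_id == market_id]
        return seen[-1].yes_price if seen else None


t0 = datetime(2024, 1, 1)
rows = [
    PricePoint(t0, "m1", 0.40),
    PricePoint(t0 + timedelta(hours=1), "m1", 0.45),
    PricePoint(t0 + timedelta(hours=2), "m1", 0.60),
]
clock = ReplayClock(rows, start=t0)
first = clock.latest_price("m1")      # only the t0 row is visible
clock.advance(timedelta(hours=1))
second = clock.latest_price("m1")     # the 02:00 row is still hidden
```

Because every read goes through visible(), a skill that asks for the latest price at 01:00 cannot be handed the 02:00 row, no matter how the tape is stored.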
3. Read the report — skill vs. luck
The summary prints to stdout; --out writes the full JSON. The numbers that matter:
- pnl / hit_rate / max_drawdown: did it make money, how often was it right, how deep was the worst drawdown.
- baselines: the same entries and notionals run under buy-and-hold-YES and a seeded random side rule. This is the most important line. If your skill doesn’t clearly beat both baselines, you’re looking at luck or beta, not edge.
- realism_gaps: what the model does not capture (see below).
- reproducibility.config_hash: a deterministic hash of the run inputs. Same (bundle, window, cadence, args) → same hash → identical results.
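To make the baselines idea concrete, here is a rough sketch of scoring one set of entries three ways: with the side the skill chose, always YES, and a seeded coin flip. Everything in it is an assumption for illustration: the Entry fields, the function names, holding each position to resolution, and ignoring spread and fees. The engine's actual accounting may differ.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    yes_price: float    # YES price at entry, strictly between 0 and 1
    notional: float     # amount staked on this entry
    side: str           # "yes" or "no": the side the skill chose
    resolved_yes: bool  # how the market eventually resolved


def entry_pnl(entry, side):
    """P&L of one entry held to resolution on the given side (no costs)."""
    price = entry.yes_price if side == "yes" else 1.0 - entry.yes_price
    won = entry.resolved_yes if side == "yes" else not entry.resolved_yes
    shares = entry.notional / price
    return shares * (1.0 if won else 0.0) - entry.notional


def skill_pnl(entries):
    return sum(entry_pnl(e, e.side) for e in entries)


def buy_and_hold_yes_pnl(entries):
    """Same entries and notionals, but always on the YES side."""
    return sum(entry_pnl(e, "yes") for e in entries)


def random_side_pnl(entries, seed=0):
    """Same entries and notionals, side picked by a seeded coin flip."""
    rng = random.Random(seed)
    return sum(entry_pnl(e, rng.choice(["yes", "no"])) for e in entries)


def beats_both(entries, seed=0):
    skill = skill_pnl(entries)
    return (skill > buy_and_hold_yes_pnl(entries)
            and skill > random_side_pnl(entries, seed))


entries = [
    Entry(yes_price=0.40, notional=10.0, side="yes", resolved_yes=True),
    Entry(yes_price=0.70, notional=10.0, side="no", resolved_yes=False),
]
print(skill_pnl(entries), buy_and_hold_yes_pnl(entries), random_side_pnl(entries))
```

Holding the entries and notionals fixed is the point: the only thing that varies between the three numbers is the side decision, so any gap is attributable to that decision rather than to timing or sizing. Seeding the random rule keeps the comparison repeatable from run to run.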
4. Iterate
Backtesting is a tight loop: change a threshold, re-run, compare. Because the config_hash changes whenever the bundle or inputs change, you can tell a real
improvement from a re-run of the same thing. A few honest practices:
- Beat the baselines by a margin, not a hair. Real venues have 1–5% spreads plus fees — a backtest that edges buy-and-hold by 1% is a loss live.
- Vary the window. A strategy that only works on one month is overfit. Run a few windows across different regimes.
- Watch --cadence. Too coarse and you miss entries; too fine and you over-trade. Match it to how often your skill actually decides.
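The config_hash comparison that drives this loop can be illustrated by hashing a canonical serialization of the run inputs. The sketch below is an assumption about how such a hash could be built, not the SDK's actual scheme: the function name, the payload layout, the "1h" cadence string, and the example args are all hypothetical.

```python
import hashlib
import json


def config_hash(bundle_bytes, window, cadence, args):
    """Deterministic hash over (bundle, window, cadence, args).

    The bundle is reduced to its own SHA-256 first, then everything is
    serialized as JSON with sorted keys so dict ordering cannot change
    the result.
    """
    payload = {
        "bundle_sha256": hashlib.sha256(bundle_bytes).hexdigest(),
        "window": list(window),
        "cadence": cadence,
        "args": args,
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


window = ("2024-01-01", "2024-02-01")

# Re-running the same bundle with the same inputs gives the same hash.
run_a = config_hash(b"threshold = 0.60", window, "1h", {"max_positions": 5})
run_b = config_hash(b"threshold = 0.60", window, "1h", {"max_positions": 5})

# Changing a threshold inside the bundle gives a different hash.
run_c = config_hash(b"threshold = 0.65", window, "1h", {"max_positions": 5})

print(run_a == run_b, run_a == run_c)
```

When two reports carry the same hash, identical numbers are expected and say nothing new; a difference in P&L only counts as evidence when the hashes differ.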
5. Graduate
A backtest is a filter for bad ideas, not a promise of live P&L. Once a skill beats its baselines across windows, walk it up the ladder:

Paper trade in $SIM
Run live-forward against real prices with virtual currency: venue="sim". Confirm the live behavior matches what the backtest implied.

Real prices, no money
venue="polymarket", live=False: real prices with spread modeled, still no USDC at risk.

Go live
venue="polymarket" (or kalshi) with safety rails on. See the Trading Guide.

What backtests do and don’t model
Next steps
Backtesting reference
Every flag, the programmatic run_backtest() API, and the full report schema.
Building Skills
Build the skill you want to backtest.
Trading Guide
The live-forward workflow you graduate into.
Risk Management
Stops, caps, and monitors for when you go live.
