Self-serve window download is available in simmer-sdk >= 0.19.0.
Backtesting ships as an optional extra: it pulls in a few heavier dependencies
(duckdb, fastapi, uvicorn) that most SDK users don’t need.
Install
simmer command:
Try it offline
The SDK bundles a tiny demo slice, so you can run a complete backtest with no data download and no network by passing --demo.
Backtest your own skill
Point the CLI at a skill bundle and a window. The historical tape is fetched for
you and cached in ~/.simmer/tapes/, so no data hunting is required and repeat
runs of the same window are instant. The fetch needs your SIMMER_API_KEY (set it
in the environment; it is the same key you use to trade), and there’s no
separate signup.
The engine runs your unmodified skill once per tick as a subprocess against
a frozen, look-ahead-safe replay server — the same wire shapes as production, so
anything that calls /api/sdk/* can be backtested. State files the skill writes
(daily-spend counters, etc.) are sandboxed in a temp copy.
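The per-tick execution model described above can be illustrated with a minimal sketch. This is not the SDK's engine: `run_ticks` and `is_clean` are hypothetical names, the replay server is left out entirely, and the only behaviours shown are the two stated in the text, namely one subprocess per tick and state files confined to a temporary copy of the bundle.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_ticks(bundle_dir, entrypoint, ticks, args=("--live", "--quiet")):
    """Run the entrypoint once per tick inside a throwaway copy of the bundle.

    Returns a list of (tick, returncode) pairs. Hypothetical helper for
    illustration only; the real engine also serves replayed market data.
    """
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        sandbox = Path(tmp) / "bundle"
        # Copy the bundle so files the skill writes never touch the original.
        shutil.copytree(bundle_dir, sandbox)
        for tick in ticks:
            # The skill runs unmodified, as its own process, from the sandbox.
            proc = subprocess.run(
                [sys.executable, entrypoint, *args],
                cwd=sandbox,
                capture_output=True,
                text=True,
            )
            results.append((tick, proc.returncode))
    return results


def is_clean(results):
    """A run counts as clean only if every tick exited with status 0."""
    return all(code == 0 for _, code in results)
```

Because the sandbox lives for the whole loop, state written on one tick (a daily-spend counter, for instance) is still visible on the next, while the original bundle directory stays untouched.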
| Flag | Meaning |
|---|---|
| bundle | Path to the skill bundle directory (positional). |
| --entrypoint | Script filename inside the bundle to run each tick. |
| --t0 / --t1 | Window bounds (ISO, e.g. 2026-03-01). Required (or use --window). |
| --window | Window duration to fetch, e.g. 30d / 12h; an alternative to --t0/--t1. |
| --max-markets | Cap on markets in a fetched slice (default 300, max 1000). |
| --min-volume | Minimum market volume to include (default 1000). |
| --cadence | Tick spacing: 15m / 12h / 30d / minutes (default 15m). |
| --balance | Starting balance (default 1000). |
| --tape | Use a local tape slice instead of fetching (BYO; see Getting a tape). |
| --args | Entrypoint CLI args, space-separated (default --live --quiet). |
| --out | Write the full report JSON here. |
| --demo | Run the bundled offline demo (no key, no tape, no network). |
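To get a feel for how --window and --cadence interact, the duration strings in the table can be turned into a rough tick count. The sketch below is an assumption about the string format (a number followed by m, h, or d, with a bare number read as minutes); `parse_duration` and `tick_count` are hypothetical helpers, not part of the SDK, and the real engine may treat window endpoints differently.

```python
import re
from datetime import timedelta

# Map the unit suffixes shown in the flag table to timedelta keywords.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text):
    """Parse strings such as '15m', '12h' or '30d' into a timedelta."""
    text = text.strip().lower()
    if text.isdigit():
        # Assumption: a bare number is a count of minutes.
        return timedelta(minutes=int(text))
    match = re.fullmatch(r"(\d+)([mhd])", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def tick_count(window, cadence):
    """Number of whole cadence intervals that fit inside the window."""
    return parse_duration(window) // parse_duration(cadence)
```

Under these assumptions a 30d window at the default 15m cadence is 2880 intervals, which is a useful order-of-magnitude check since the skill is launched as a subprocess on every tick.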
Programmatic API
Reading the report
The report (stdout summary + full JSON via --out) includes:
- summary: pnl, hit rate, max drawdown, trades, decisions, settlements, ticks.
- baselines: the same entries/notionals under buy-and-hold-YES and a seeded random side rule, so you can tell skill from luck.
- decisions / fills / equity_curve: the full per-tick trace.
- realism_gaps: what the model does not capture (see below).
- reproducibility.config_hash: a deterministic hash of the run inputs. Same (bundle, tape, window, cadence, args) → same config_hash → identical results.
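A report written with --out can be inspected with the standard library alone. The sketch below assumes the names listed above appear as top-level keys of the JSON and that config_hash sits under a reproducibility object; that layout is inferred from the list, not confirmed, and the helper names are hypothetical.

```python
import json
from pathlib import Path

# Section names taken from the list above; assumed to be top-level JSON keys.
SECTIONS = (
    "summary",
    "baselines",
    "decisions",
    "fills",
    "equity_curve",
    "realism_gaps",
    "reproducibility",
)


def load_report(path):
    """Load a report JSON file written via --out."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def missing_sections(report):
    """Return the expected sections that are absent from the report."""
    return [name for name in SECTIONS if name not in report]


def same_config(report_a, report_b):
    """True when both reports carry the same non-empty config_hash."""
    hash_a = report_a.get("reproducibility", {}).get("config_hash")
    hash_b = report_b.get("reproducibility", {}).get("config_hash")
    return hash_a is not None and hash_a == hash_b
```

Comparing hashes first is a cheap way to confirm that two reports describe the same run inputs before comparing their results.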
What backtests do and don’t model
A run is only trustworthy if it’s clean: bundle.clean == true means the
skill executed successfully on every tick. A run with failed ticks under-reports
the strategy (the skill didn’t actually run on those ticks), and the CLI exits
non-zero.
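When reports are consumed by another script, the same rule can be enforced before any numbers are read. This sketch assumes the report JSON exposes the flag as a `clean` field inside a `bundle` object, which is inferred from the `bundle.clean` notation above; `require_clean` is a hypothetical helper.

```python
def require_clean(report):
    """Return the report unchanged, or raise if any tick failed.

    Assumes the flag lives at report["bundle"]["clean"]; a missing flag is
    treated the same as a failed run.
    """
    bundle = report.get("bundle") or {}
    if bundle.get("clean") is not True:
        raise RuntimeError(
            "backtest is not clean: the skill failed on at least one tick, "
            "so the results under-report the strategy"
        )
    return report
```

Treating a missing flag as a failure keeps the check conservative: a report that cannot prove it is clean is not trusted.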
Getting a tape
Most users don’t need to: pass --t0/--t1 (or --window) and the slice is
fetched and cached automatically (see above).
Data coverage currently ends ~2026-05-05. Pick a window inside that range;
a window starting after it returns an error. (The dataset is a snapshot of
public on-chain Polymarket history; a freshness updater is planned.)
Bring your own (--tape). If you’d rather supply your own data (a different
window, your own source, or a fully offline workflow), point --tape at a
local directory containing markets.parquet + quant.parquet. The public,
MIT-licensed dataset and the toolkit to regenerate it live at
SII-WANGZJ/Polymarket_data;
--tape lets power users slice their own and skip the hosted fetch entirely.
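Before pointing --tape at a directory, it can help to confirm that both files named above are present. The check below only looks at file names; it does not validate the parquet schema, and `missing_tape_files` is a hypothetical helper, not part of the SDK.

```python
from pathlib import Path

# The two files a local tape directory is expected to contain.
REQUIRED_FILES = ("markets.parquet", "quant.parquet")


def missing_tape_files(tape_dir):
    """Return the required tape files that are absent from tape_dir."""
    root = Path(tape_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means the directory has the expected layout; anything else names the files still to be generated.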
