Autonomous skill optimization — your agent mutates skill config, measures P&L, and keeps what works.
Pro & Elite feature. Autoresearch is available on Simmer Pro and Elite plans (Elite includes everything in Pro). Free users get a 403 when calling autoresearch API endpoints.
Package renamed. simmer-autoresearch (npm) has been renamed to simmer-mcp. Update your install command and MCP config; see Install below. The v1 OpenClaw plugin (openclaw plugins install simmer-autoresearch) is unaffected and documented in the Legacy section.
Autoresearch lets your agent optimize its own trading skills. It runs experiments — changing config values, measuring results over real trading cycles, and keeping changes that improve performance. Think of it as automated A/B testing for your trading strategy.
simmer-sdk installed with at least one trading skill running on sim venue
Node.js 18+ (for the MCP server)
Git initialized in your skill workspace (autoresearch uses git for commit/revert)
Start with sim venue. Always run autoresearch against the simulated venue first. Autoresearch mutates your skill’s code and config — running against a real-money venue risks unexpected losses from untested changes.
Then install the behavioral skill (tells your agent how to run the experiment loop):
npx simmer-mcp install-skill
This auto-detects your runtime (OpenClaw, Hermes) and copies the skill instructions to the right directory. For Claude Code, add the skill content to your project’s CLAUDE.md.
Hypothesize — what change might improve the metric?
Mutate — change the skill’s code or config
Run — call run_experiment to execute the skill
Log — call log_experiment to record the result (keep, discard, or crash)
keep auto-commits to git. discard and crash auto-revert the working directory.
Use backtest_experiment for fast config exploration (seconds) before committing to live runs (minutes).
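The loop above can be sketched in a few lines of Python. This is only an illustration of the keep/discard/crash bookkeeping, not the MCP server's implementation: `LoopState`, `run_loop`, `propose`, and `evaluate` are hypothetical names, `evaluate` stands in for a call to run_experiment, and the in-memory config swap stands in for the git commit/revert.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LoopState:
    """In-memory stand-in for the skill workspace (hypothetical)."""
    config: Dict[str, float]
    best_metric: float                      # baseline metric from the first run
    log: List[dict] = field(default_factory=list)


def run_loop(
    state: LoopState,
    propose: Callable[[Dict[str, float]], Dict[str, float]],
    evaluate: Callable[[Dict[str, float]], float],
    steps: int,
) -> LoopState:
    for _ in range(steps):
        candidate = propose(state.config)   # hypothesize + mutate
        try:
            metric = evaluate(candidate)    # stand-in for run_experiment
        except Exception as exc:
            # crash: log it and leave the current config untouched (revert)
            state.log.append({"status": "crash", "config": candidate, "error": str(exc)})
            continue
        if metric > state.best_metric:
            # keep: the candidate becomes the new reference (commit)
            state.config, state.best_metric = candidate, metric
            status = "keep"
        else:
            # discard: the current config stays as it was (revert)
            status = "discard"
        state.log.append({"status": status, "config": candidate, "metric": metric})
    return state
```

Every iteration appends a log entry, including crashes, which mirrors the requirement that each run ends with a log_experiment call.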
Never skip the baseline run. The first experiment establishes the reference point for all comparisons.
Always log — even crashes. Crash data matters for confidence scoring and crash detection.
Check confidence scores. At 2× the noise floor or more, the improvement is likely real. Between 1× and 2× it is marginal, so re-run to confirm. Under 1× it is within noise.
Code mutations beat config tuning. Structural changes (new data sources, different models, alternative strategies) find bigger wins than parameter tweaks.
Keep ideas in autoresearch.ideas.md. Promising but deferred optimizations go here.
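The confidence thresholds from the tips above can be written as a small helper. This is a sketch that assumes the score is already expressed as a multiple of the noise floor; `classify_confidence` and its return labels are hypothetical and not part of the Simmer API.

```python
def classify_confidence(score: float) -> str:
    """Map a confidence score (in multiples of the noise floor) to a verdict."""
    if score < 0:
        raise ValueError("score must be non-negative")
    if score >= 2.0:
        return "likely real"      # 2x the noise floor or more
    if score < 1.0:
        return "within noise"     # under 1x
    return "marginal"             # between 1x and 2x: re-run to confirm
```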
Review the autoresearch git branch. Experiments that were keep-ed are committed with result metadata in the commit message. Merge the branch (or cherry-pick specific experiments) into your main skill branch to lock in the improvements.
Replay historical trades against new config thresholds without live execution. Returns simulated P&L in seconds — use this for fast config tuning before committing to live experiments.
Backtest requires trades with signal_data. Skills must pass structured signal data on client.trade() calls (SDK 0.9.17+). All official Simmer skills include signal_data as of March 2026.
| Parameter | Required | Description |
| --- | --- | --- |
| skill_slug | Yes | Skill to backtest |
| config | Yes | Config overrides to test (e.g., {"min_edge": 0.05}) |
| days | No | Days of history to replay (default: 7, max: 30) |
| venue | No | sim or polymarket (default: sim) |
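As an illustration of the parameters above, the following sketch builds and validates an argument dict for backtest_experiment before it is sent. `build_backtest_args` is a hypothetical helper, the minimum of 1 day is an assumption (only the default and maximum are documented), and the skill slug in the usage note is made up.

```python
from typing import Any, Dict

ALLOWED_VENUES = {"sim", "polymarket"}
MAX_DAYS = 30


def build_backtest_args(
    skill_slug: str,
    config: Dict[str, Any],
    days: int = 7,
    venue: str = "sim",
) -> Dict[str, Any]:
    """Validate and assemble arguments for a backtest_experiment call."""
    if not skill_slug:
        raise ValueError("skill_slug is required")
    if not isinstance(config, dict):
        raise ValueError("config must be a dict of overrides")
    # Lower bound of 1 is an assumption; the documented maximum is 30.
    if not isinstance(days, int) or not 1 <= days <= MAX_DAYS:
        raise ValueError(f"days must be an integer between 1 and {MAX_DAYS}")
    if venue not in ALLOWED_VENUES:
        raise ValueError("venue must be 'sim' or 'polymarket'")
    return {"skill_slug": skill_slug, "config": config, "days": days, "venue": venue}
```

For example, `build_backtest_args("my-weather-skill", {"min_edge": 0.05})` fills in the documented defaults of 7 days on the sim venue.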
Config threshold convention:
min_edge: 0.05 → only include trades where signal_data.edge >= 0.05
max_probability: 0.85 → only include trades where signal_data.probability <= 0.85
Bare keys (e.g., edge: 0.10) → treated as min threshold
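The convention above can be read as a filter over historical trades. The sketch below is one possible interpretation, not the server's replay code: `trade_passes` and `filter_trades` are hypothetical names, signal values are assumed to be numeric, and excluding trades that lack the referenced field is an assumption.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple


def _rule(key: str) -> Tuple[str, Callable[[Any, Any], bool]]:
    """Translate a config key into (signal_data field, comparison)."""
    if key.startswith("min_"):
        return key[len("min_"):], operator.ge
    if key.startswith("max_"):
        return key[len("max_"):], operator.le
    return key, operator.ge  # bare keys are treated as a min threshold


def trade_passes(signal_data: Dict[str, Any], config: Dict[str, float]) -> bool:
    for key, threshold in config.items():
        field_name, compare = _rule(key)
        value = signal_data.get(field_name)
        # Assumption: a trade without the field cannot be evaluated, so drop it.
        if value is None or not compare(value, threshold):
            return False
    return True


def filter_trades(trades: List[Dict[str, Any]], config: Dict[str, float]) -> List[Dict[str, Any]]:
    """Keep only the trades whose signal_data satisfies every threshold."""
    return [t for t in trades if trade_passes(t.get("signal_data", {}), config)]
```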
Skills can include structured signal data on each trade to enable backtest replay. This is optional — trades work fine without it — but required for the backtest_experiment tool.
Additional skill-specific fields are freeform. Values must be strings or numbers (flat dict, no nesting).
Signal data is private: only visible to the trade owner via authenticated API calls. Never exposed publicly.
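A skill can check the flat-dict rule locally before passing signal data to client.trade(). The helper below is a sketch: `validate_signal_data` is a hypothetical name and not part of simmer-sdk, and rejecting booleans (which Python treats as integers) is an assumption.

```python
from numbers import Real
from typing import Any, Dict


def validate_signal_data(signal_data: Dict[str, Any]) -> Dict[str, Any]:
    """Raise if signal_data is not a flat dict of string/number values."""
    if not isinstance(signal_data, dict):
        raise TypeError("signal_data must be a dict")
    for key, value in signal_data.items():
        if not isinstance(key, str):
            raise ValueError(f"key {key!r} must be a string")
        # Assumption: bools are rejected even though bool subclasses int.
        if isinstance(value, bool) or not isinstance(value, (str, Real)):
            raise ValueError(f"value for {key!r} must be a string or number, not nested")
    return signal_data
```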
Your agent manages its own session state using the SKILL.md behavioral instructions (installed via npx simmer-mcp install-skill). There is no /autoresearch command interface — the agent drives the loop autonomously.
Resume: The agent reads autoresearch.jsonl on startup and resumes where it left off.
New session: Call init_experiment with a new name to start a fresh segment (previous results are archived, not deleted).
Context compaction: If the agent’s context resets, it should re-read autoresearch.md and autoresearch.jsonl to restore state.
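Restoring state from autoresearch.jsonl amounts to reading one JSON object per line. The sketch below assumes each record carries a `status` field with the values keep, discard, or crash; the actual record schema is not documented here, so treat the field name and both helper names as hypothetical.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict, List


def load_experiments(path: str = "autoresearch.jsonl") -> List[Dict]:
    """Read experiment records, one JSON object per line."""
    file = Path(path)
    if not file.exists():
        return []  # nothing to resume from
    records = []
    for line in file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip a partially written line instead of failing
        if isinstance(record, dict):
            records.append(record)
    return records


def summarize(records: List[Dict]) -> Counter:
    """Count records per status (assumed field name: 'status')."""
    return Counter(r.get("status", "unknown") for r in records)
```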
Experiments are capped at AUTORESEARCH_MAX_EXPERIMENTS (default 50) per session. At 80% of the cap, your agent gets a warning. At the limit, run_experiment is blocked.
Set AUTORESEARCH_MAX_EXPERIMENTS=0 to disable the cap (not recommended for unattended agents).
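The cap behaviour described above can be modelled as follows. This is a sketch of the documented rules, not the server's code: `budget_status` and its return values are hypothetical, and the warning is assumed to start at exactly 80% of the cap.

```python
import os
from typing import Mapping, Optional


def budget_status(experiments_run: int, env: Optional[Mapping[str, str]] = None) -> str:
    """Return 'ok', 'warning', or 'blocked' for the current session."""
    env = os.environ if env is None else env
    cap = int(env.get("AUTORESEARCH_MAX_EXPERIMENTS", "50"))
    if cap == 0:
        return "ok"          # cap disabled
    if experiments_run >= cap:
        return "blocked"     # run_experiment is refused at the limit
    # Integer form of experiments_run >= 0.8 * cap, avoiding float rounding.
    if experiments_run * 5 >= cap * 4:
        return "warning"
    return "ok"
```

With the default cap of 50, the warning starts at experiment 40 and run_experiment is blocked once 50 have run.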
The server cross-checks self-reported P&L metrics against the Simmer API. If the agent-reported metric diverges significantly from actual trade data, a warning is logged. This prevents metric gaming — the agent can’t inflate results by changing how metrics are calculated.
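A cross-check of this kind might look like the sketch below. The tolerances are purely illustrative assumptions, since the divergence the server treats as significant is not documented, and `check_reported_pnl` is a hypothetical name.

```python
from typing import Optional


def check_reported_pnl(
    reported: float,
    actual: float,
    rel_tol: float = 0.10,   # assumed: 10% relative divergence allowed
    abs_tol: float = 1.0,    # assumed: absolute slack for near-zero P&L
) -> Optional[str]:
    """Return a warning message if the reported P&L diverges from trade data."""
    allowed = max(abs_tol, rel_tol * abs(actual))
    diff = abs(reported - actual)
    if diff > allowed:
        return (
            f"reported P&L {reported:.2f} diverges from actual {actual:.2f} "
            f"by {diff:.2f} (allowed {allowed:.2f})"
        )
    return None
```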
Current skill, experiment count, keep rate, budget remaining, pause state
/autoresearch reset: Clear state and start fresh (clears pause if paused)
Upgrade to v2 — Install simmer-mcp via npm (npm install -g simmer-mcp) and switch to the MCP config above. v2 works with OpenClaw, Hermes, and Claude Code.