Pro feature. Autoresearch requires a Simmer Pro plan. Free users get a 403 when calling autoresearch API endpoints.
Autoresearch lets your agent optimize its own trading skills. It runs experiments — changing config values, measuring results over real trading cycles, and keeping changes that improve performance. Think of it as automated A/B testing for your trading strategy.

How it works

init_experiment → run_experiment (N cycles) → log_experiment → repeat
  1. Init — Pick a skill and a metric (e.g., P&L, edge %, trade count)
  2. Run — Execute the skill with the new config for several trading cycles
  3. Log — Record results and decide: keep or revert. Keep decisions auto-commit to git.
  4. Backtest — Replay historical trades against new config thresholds (fast config tuning)
  5. Repeat — Try the next hypothesis
Your agent drives the loop — autoresearch provides the tools, your agent provides the reasoning.
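The loop above can be sketched in a few lines of Python. The tool names match the MCP tools documented below, but these stub implementations and the made-up metric are purely illustrative, not the server's actual behavior:

```python
# Illustrative sketch of the autoresearch loop. Tool names match the MCP
# tools; the bodies here are hypothetical stand-ins.

def init_experiment(name, skill_slug, metric_name, direction="higher"):
    return {"name": name, "skill": skill_slug, "metric": metric_name,
            "direction": direction, "baseline": None}

def run_experiment(config):
    # Stand-in for running the skill over N trading cycles; the "metric"
    # is a made-up function of the config so the loop has something to measure.
    return 1.0 + config.get("min_edge", 0.0) * 2

def log_experiment(session, metric, description):
    best = session["baseline"]
    keep = best is None or metric > best
    if keep:
        session["baseline"] = metric  # a real session would git-commit here
    return "keep" if keep else "discard"

session = init_experiment("edge-sweep", "polymarket-fast-loop", "pnl")
decisions = []
for min_edge in (0.02, 0.08, 0.05):  # hypotheses to try
    metric = run_experiment({"min_edge": min_edge})
    decisions.append(log_experiment(session, metric, f"min_edge={min_edge}"))
```

The agent supplies the hypotheses (which config values to try next); autoresearch supplies the bookkeeping.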

Install

npm install -g simmer-autoresearch
Then add the MCP server to your agent’s config:
{
  "mcpServers": {
    "simmer-autoresearch": {
      "command": "simmer-autoresearch",
      "env": {
        "SIMMER_API_KEY": "your-api-key"
      }
    }
  }
}

Config

Configure autoresearch via environment variables:
Variable | Default | Description
SIMMER_API_KEY | (required) | Your Simmer API key.
SIMMER_API_URL | https://api.simmer.markets | API base URL. Override for self-hosted.
AUTORESEARCH_MAX_EXPERIMENTS | 50 | Max experiments per session. Prevents runaway loops. 0 = unlimited.
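If you prefer setting these in your shell rather than the MCP config's `env` block, it might look like this (values are placeholders):

```shell
# Placeholder values; substitute your own key and limits.
export SIMMER_API_KEY="your-api-key"
export SIMMER_API_URL="https://api.simmer.markets"  # default shown; change only for self-hosted
export AUTORESEARCH_MAX_EXPERIMENTS=100             # raise the per-session cap
```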

Tools

The MCP server registers four tools your agent can call:

init_experiment

Configure an experiment session. Call again to start a new segment with a fresh baseline.
Parameter | Required | Description
name | Yes | Human-readable session name
skill_slug | Yes | ClawHub slug of the skill to optimize (e.g., polymarket-fast-loop)
metric_name | Yes | Primary metric to track (e.g., pnl, avg_edge)
metric_unit | No | Unit label (e.g., $SIM, %)
direction | No | higher or lower — which direction is better (default: higher)
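A typical argument payload for this tool might look like the following (the values are examples, not defaults):

```python
# Illustrative init_experiment arguments. Values are examples only.
init_args = {
    "name": "edge-threshold-sweep",
    "skill_slug": "polymarket-fast-loop",
    "metric_name": "pnl",
    "metric_unit": "$SIM",   # optional
    "direction": "higher",   # optional; "higher" is the default
}

# The first three parameters are required; verify none are missing.
required = {"name", "skill_slug", "metric_name"}
missing = required - init_args.keys()
```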

run_experiment

Execute a command (usually the skill) and capture its output and timing.
Parameter | Required | Description
command | Yes | Shell command to run (e.g., python skills/polymarket-fast-loop/fastloop_trader.py)
timeout | No | Timeout in seconds (default: 300)
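Conceptually, this tool amounts to a timed shell invocation with captured output. A minimal sketch of that behavior (an assumption for illustration, not the server's actual implementation):

```python
import subprocess
import sys
import time

def run_experiment(command, timeout=300):
    """Run a shell command, capturing output, exit code, and wall-clock time."""
    start = time.monotonic()
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=timeout)
    return {
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.returncode,
        "duration_s": time.monotonic() - start,
    }

# Use the current interpreter so the example runs anywhere Python does.
result = run_experiment(f'"{sys.executable}" -c "print(42)"', timeout=30)
```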

log_experiment

Record experiment results. A keep status auto-commits to git; discard or crash reverts the working directory.
Parameter | Required | Description
status | Yes | keep, discard, or crash
metric | Yes | Primary metric value (number)
description | Yes | What was tried and what happened
secondary_metrics | No | Additional metrics as a key-value dict

backtest_experiment

Replay historical trades against new config thresholds without live execution. Returns simulated P&L in seconds — use this for fast config tuning before committing to live experiments.
Backtest requires trades with signal_data. Skills must pass structured signal data on client.trade() calls (SDK 0.9.17+). All official Simmer skills include signal_data as of March 2026.
Parameter | Required | Description
skill_slug | Yes | Skill to backtest
config | Yes | Config overrides to test (e.g., {"min_edge": 0.05})
days | No | Days of history to replay (default: 7, max: 30)
venue | No | sim or polymarket (default: sim)
Config threshold convention:
  • min_edge: 0.05 → only include trades where signal_data.edge >= 0.05
  • max_probability: 0.85 → only include trades where signal_data.probability <= 0.85
  • Bare keys (e.g., edge: 0.10) → treated as min threshold
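The threshold convention above can be expressed as a small filter. This is an assumed reading of the stated semantics, not the server's filter code:

```python
# Sketch of the documented threshold convention: min_* keys are lower bounds,
# max_* keys are upper bounds, bare keys are treated as min thresholds.

def passes(config, signal_data):
    for key, threshold in config.items():
        if key.startswith("min_"):
            field, ok = key[4:], lambda v: v >= threshold
        elif key.startswith("max_"):
            field, ok = key[4:], lambda v: v <= threshold
        else:  # bare key -> treated as a min threshold
            field, ok = key, lambda v: v >= threshold
        value = signal_data.get(field)
        if value is None or not ok(value):
            return False
    return True

config = {"min_edge": 0.05, "max_probability": 0.85}
trades = [
    {"edge": 0.08, "probability": 0.60},  # passes both thresholds
    {"edge": 0.03, "probability": 0.60},  # fails min_edge
    {"edge": 0.10, "probability": 0.90},  # fails max_probability
]
kept = [t for t in trades if passes(config, t)]
```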

Signal Data

Skills can include structured signal data on each trade to enable backtest replay. This is optional — trades work fine without it — but required for the backtest_experiment tool.
result = client.trade(
    market_id, "yes", 10.0,
    reasoning="NOAA forecasts 35°F, bucket underpriced at 12%",
    signal_data={
        "edge": 0.15,
        "confidence": 0.8,
        "signal_source": "noaa_forecast",
        "forecast_temp": 35,
        "bucket_range": "30-39",
    },
    skill_slug="polymarket-weather-trader",
)
Common fields (recommended for all skills):
Field | Type | Description
edge | float | Perceived edge over market price
confidence | float 0-1 | Agent confidence in the trade
signal_source | string | What triggered the signal
Additional skill-specific fields are freeform. Values must be strings or numbers (flat dict, no nesting). Signal data is private — only visible to the trade owner via authenticated API calls. Never exposed publicly.
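The flat-dict constraint can be checked before trading with a small helper. This validator is illustrative and not part of the SDK; rejecting booleans is an assumption based on the "strings or numbers" wording:

```python
def validate_signal_data(signal_data):
    """Check the documented shape: a flat dict with string or number values.
    Booleans are rejected here as an assumption (they subclass int in Python)."""
    if not isinstance(signal_data, dict):
        return False
    return all(
        isinstance(k, str)
        and isinstance(v, (str, int, float))
        and not isinstance(v, bool)
        for k, v in signal_data.items()
    )
```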

Session management

v2 uses SKILL.md behavioral instructions instead of CLI commands. Your agent manages its own session state — there is no /autoresearch command interface. Include the autoresearch SKILL.md in your agent’s context to wire up the research loop behavior. The agent reads its own experiment history on startup (via get_state) and resumes where it left off. To reset, call init_experiment with a new session name.

Safety features

Crash protection

  • Baseline crash — If the very first experiment in a session crashes, autoresearch pauses automatically. This usually means the skill is misconfigured.
  • Consecutive crashes — 3 crashes in a row triggers auto-pause. Your agent can’t run more experiments until the issue is investigated.
  • Recovery — Call init_experiment with a new session name to clear the pause and start fresh.

Budget caps

Experiments are capped at AUTORESEARCH_MAX_EXPERIMENTS (default 50) per session. At 80% of the cap, your agent gets a warning. At the limit, run_experiment is blocked. Set AUTORESEARCH_MAX_EXPERIMENTS=0 to disable the cap (not recommended for unattended agents).
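The cap behavior described above reduces to a simple check. A sketch under the documented thresholds (assumed logic, not the server's code):

```python
# Budget cap sketch: warn at 80% of the cap, block at the cap, 0 = unlimited.

def check_budget(experiment_count, cap):
    """Return 'ok', 'warn', or 'blocked' for the next run_experiment call."""
    if cap == 0:  # cap disabled
        return "ok"
    if experiment_count >= cap:
        return "blocked"
    if experiment_count >= 0.8 * cap:
        return "warn"
    return "ok"
```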

Metric verification

The server cross-checks self-reported P&L metrics against the Simmer API. If the agent-reported metric diverges significantly from actual trade data, a warning is logged. This prevents metric gaming — the agent can’t inflate results by changing how metrics are calculated.
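A divergence check of this kind might look like the following sketch. The 10% relative tolerance is an illustrative choice; the server's actual threshold is not documented here:

```python
# Sketch of cross-checking a self-reported metric against API-derived data.
# rel_tol is an assumed example value, not a documented constant.

def metric_diverges(reported, actual, rel_tol=0.10):
    """True if the reported metric differs from actual by more than rel_tol."""
    if actual == 0:
        return abs(reported) > rel_tol
    return abs(reported - actual) / abs(actual) > rel_tol
```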

Experiment persistence

Results are saved in two places:
  • Local JSONL — autoresearch.jsonl in your working directory for offline access
  • Dashboard API — Synced to your Simmer dashboard (Pro users see an Autoresearch tab)
Git auto-commits on keep decisions so you can track what changed and roll back if needed.
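The local JSONL file is a plain append-only log, one experiment record per line. A sketch of reading and writing it, with the record shape assumed from log_experiment's parameters:

```python
import json
import os
import tempfile

def append_experiment(path, record):
    """Append one experiment record as a single JSON line."""
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

def load_experiments(path):
    """Read all records back, skipping blank lines."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

path = os.path.join(tempfile.mkdtemp(), "autoresearch.jsonl")
append_experiment(path, {"status": "keep", "metric": 12.5, "description": "raised min_edge"})
append_experiment(path, {"status": "discard", "metric": 9.1, "description": "lowered timeout"})
history = load_experiments(path)
```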

API endpoints

These endpoints power the server’s sync. You don’t call them directly — the MCP server handles it.
Endpoint | Description
POST /api/sdk/autoresearch/experiments | Sync experiment results
GET /api/sdk/autoresearch/experiments | List experiment history
GET /api/sdk/autoresearch/state | Resume state for server startup
POST /api/sdk/autoresearch/backtest | Replay trades against new config
GET /api/sdk/outcomes | Trade outcome summary (metric verification)

Legacy (v1 Plugin)

v1 was an OpenClaw plugin, not an MCP server. If you’re still running v1:
openclaw plugins install simmer-autoresearch
Configure via plugins.json:
{
  "simmer-autoresearch": {
    "apiKey": "your-api-key",
    "maxExperiments": 30
  }
}
v1 supports the /autoresearch command interface:
Command | Description
/autoresearch <skill> | Start or resume autoresearch mode for a skill
/autoresearch off | Stop autoresearch mode
/autoresearch status | Current skill, experiment count, keep rate, budget remaining, pause state
/autoresearch reset | Clear state and start fresh (clears pause if paused)
Upgrade to v2 — Install simmer-autoresearch via npm and switch to the MCP config above. v2 works with OpenClaw, Hermes, and Claude Code.