Pro feature. Autoresearch requires a Simmer Pro plan. Free users get a 403 when calling autoresearch API endpoints.
Autoresearch lets your agent optimize its own trading skills. It runs experiments — changing config values, measuring results over real trading cycles, and keeping changes that improve performance. Think of it as automated A/B testing for your trading strategy.

How it works

init_experiment → run_experiment (N cycles) → log_experiment → repeat
  1. Init — Pick a skill and a metric (e.g., P&L, edge %, trade count)
  2. Run — Execute the skill with the new config for several trading cycles
  3. Log — Record results and decide: keep or revert. Keep decisions auto-commit to git.
  4. Backtest — Replay historical trades against new config thresholds (fast config tuning)
  5. Repeat — Try the next hypothesis
Your agent drives the loop — autoresearch provides the tools, your agent provides the reasoning.
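The loop above can be sketched in a few lines of Python. The tool names match the MCP tools documented below, but these stub implementations and the made-up metric are purely illustrative, not the server's actual behavior:

```python
# Illustrative sketch of the autoresearch loop. Tool names match the MCP
# tools; the bodies here are hypothetical stand-ins.

def init_experiment(name, skill_slug, metric_name, direction="higher"):
    return {"name": name, "skill": skill_slug, "metric": metric_name,
            "direction": direction, "baseline": None}

def run_experiment(config):
    # Stand-in for running the skill over N trading cycles; the "metric"
    # is a made-up function of the config so the loop has something to measure.
    return 1.0 + config.get("min_edge", 0.0) * 2

def log_experiment(session, metric, description):
    best = session["baseline"]
    keep = best is None or metric > best
    if keep:
        session["baseline"] = metric  # a real session would git-commit here
    return "keep" if keep else "discard"

session = init_experiment("edge-sweep", "polymarket-fast-loop", "pnl")
decisions = []
for min_edge in (0.02, 0.08, 0.05):  # hypotheses to try
    metric = run_experiment({"min_edge": min_edge})
    decisions.append(log_experiment(session, metric, f"min_edge={min_edge}"))
```

The agent supplies the hypotheses (which config values to try next); autoresearch supplies the bookkeeping.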

Install

npm install -g simmer-autoresearch
Then add the MCP server to your agent’s config:
{
  "mcpServers": {
    "simmer-autoresearch": {
      "command": "simmer-autoresearch",
      "env": {
        "SIMMER_API_KEY": "your-api-key"
      }
    }
  }
}

Config

Configure autoresearch via environment variables:
Variable | Default | Description
SIMMER_API_KEY | (required) | Your Simmer API key.
SIMMER_API_URL | https://api.simmer.markets | API base URL. Override for self-hosted.
AUTORESEARCH_MAX_EXPERIMENTS | 50 | Max experiments per session. Prevents runaway loops. 0 = unlimited.
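If you prefer setting these in your shell rather than the MCP config's `env` block, it might look like this (values are placeholders):

```shell
# Placeholder values; substitute your own key and limits.
export SIMMER_API_KEY="your-api-key"
export SIMMER_API_URL="https://api.simmer.markets"  # default shown; change only for self-hosted
export AUTORESEARCH_MAX_EXPERIMENTS=100             # raise the per-session cap
```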

Tools

The MCP server registers four tools your agent can call:

init_experiment

Configure an experiment session. Call again to start a new segment with a fresh baseline.
Parameter | Required | Description
name | Yes | Human-readable session name
skill_slug | Yes | ClawHub slug of the skill to optimize (e.g., polymarket-fast-loop)
metric_name | Yes | Primary metric to track (e.g., pnl, avg_edge)
metric_unit | No | Unit label (e.g., $SIM, %)
direction | No | higher or lower — which direction is better (default: higher)
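A typical argument payload for this tool might look like the following (the values are examples, not defaults):

```python
# Illustrative init_experiment arguments. Values are examples only.
init_args = {
    "name": "edge-threshold-sweep",
    "skill_slug": "polymarket-fast-loop",
    "metric_name": "pnl",
    "metric_unit": "$SIM",   # optional
    "direction": "higher",   # optional; "higher" is the default
}

# The first three parameters are required; verify none are missing.
required = {"name", "skill_slug", "metric_name"}
missing = required - init_args.keys()
```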

run_experiment

Execute a command (usually the skill) and capture its output and timing.
Parameter | Required | Description
command | Yes | Shell command to run (e.g., python skills/polymarket-fast-loop/fastloop_trader.py)
timeout | No | Timeout in seconds (default: 300)
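Conceptually, this tool amounts to a timed shell invocation with captured output. A minimal sketch of that behavior (an assumption for illustration, not the server's actual implementation):

```python
import subprocess
import sys
import time

def run_experiment(command, timeout=300):
    """Run a shell command, capturing output, exit code, and wall-clock time."""
    start = time.monotonic()
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=timeout)
    return {
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.returncode,
        "duration_s": time.monotonic() - start,
    }

# Use the current interpreter so the example runs anywhere Python does.
result = run_experiment(f'"{sys.executable}" -c "print(42)"', timeout=30)
```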

log_experiment

Record experiment results. A keep status auto-commits to git; discard or crash reverts the working directory.
Parameter | Required | Description
status | Yes | keep, discard, or crash
metric | Yes | Primary metric value (number)
description | Yes | What was tried and what happened
secondary_metrics | No | Additional metrics as a key-value dict

backtest_experiment

Replay historical trades against new config thresholds without live execution. Returns simulated P&L in seconds — use this for fast config tuning before committing to live experiments.
Backtest requires trades with signal_data. Skills must pass structured signal data on client.trade() calls (SDK 0.9.17+). All official Simmer skills include signal_data as of March 2026.
Parameter | Required | Description
skill_slug | Yes | Skill to backtest
config | Yes | Config overrides to test (e.g., {"min_edge": 0.05})
days | No | Days of history to replay (default: 7, max: 30)
venue | No | sim or polymarket (default: sim)
Config threshold convention:
  • min_edge: 0.05 → only include trades where signal_data.edge >= 0.05
  • max_probability: 0.85 → only include trades where signal_data.probability <= 0.85
  • Bare keys (e.g., edge: 0.10) → treated as min threshold
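The threshold convention above can be expressed as a small filter. This is an assumed reading of the stated semantics, not the server's filter code:

```python
# Sketch of the documented threshold convention: min_* keys are lower bounds,
# max_* keys are upper bounds, bare keys are treated as min thresholds.

def passes(config, signal_data):
    for key, threshold in config.items():
        if key.startswith("min_"):
            field, ok = key[4:], lambda v: v >= threshold
        elif key.startswith("max_"):
            field, ok = key[4:], lambda v: v <= threshold
        else:  # bare key -> treated as a min threshold
            field, ok = key, lambda v: v >= threshold
        value = signal_data.get(field)
        if value is None or not ok(value):
            return False
    return True

config = {"min_edge": 0.05, "max_probability": 0.85}
trades = [
    {"edge": 0.08, "probability": 0.60},  # passes both thresholds
    {"edge": 0.03, "probability": 0.60},  # fails min_edge
    {"edge": 0.10, "probability": 0.90},  # fails max_probability
]
kept = [t for t in trades if passes(config, t)]
```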

Signal Data

Skills can include structured signal data on each trade to enable backtest replay. This is optional — trades work fine without it — but required for the backtest_experiment tool.
result = client.trade(
    market_id, "yes", 10.0,
    reasoning="NOAA forecasts 35°F, bucket underpriced at 12%",
    signal_data={
        "edge": 0.15,
        "confidence": 0.8,
        "signal_source": "noaa_forecast",
        "forecast_temp": 35,
        "bucket_range": "30-39",
    },
    skill_slug="polymarket-weather-trader",
)
Common fields (recommended for all skills):
Field | Type | Description
edge | float | Perceived edge over market price
confidence | float 0-1 | Agent confidence in the trade
signal_source | string | What triggered the signal
Additional skill-specific fields are freeform. Values must be strings or numbers (flat dict, no nesting). Signal data is private — only visible to the trade owner via authenticated API calls. Never exposed publicly.
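The flat-dict constraint can be checked before trading with a small helper. This validator is illustrative and not part of the SDK; rejecting booleans is an assumption based on the "strings or numbers" wording:

```python
def validate_signal_data(signal_data):
    """Check the documented shape: a flat dict with string or number values.
    Booleans are rejected here as an assumption (they subclass int in Python)."""
    if not isinstance(signal_data, dict):
        return False
    return all(
        isinstance(k, str)
        and isinstance(v, (str, int, float))
        and not isinstance(v, bool)
        for k, v in signal_data.items()
    )
```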

Session management

v2 uses SKILL.md behavioral instructions instead of CLI commands. Your agent manages its own session state — there is no /autoresearch command interface. Include the autoresearch SKILL.md in your agent’s context to wire up the research loop behavior. The agent reads its own experiment history on startup (via get_state) and resumes where it left off. To reset, call init_experiment with a new session name.

Safety features

Crash protection

  • Baseline crash — If the very first experiment in a session crashes, autoresearch pauses automatically. This usually means the skill is misconfigured.
  • Consecutive crashes — 3 crashes in a row triggers auto-pause. Your agent can’t run more experiments until the issue is investigated.
  • Recovery — Call init_experiment with a new session name to clear the pause and start fresh.

Budget caps

Experiments are capped at AUTORESEARCH_MAX_EXPERIMENTS (default 50) per session. At 80% of the cap, your agent gets a warning. At the limit, run_experiment is blocked. Set AUTORESEARCH_MAX_EXPERIMENTS=0 to disable the cap (not recommended for unattended agents).
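The cap behavior described above reduces to a simple check. A sketch under the documented thresholds (assumed logic, not the server's code):

```python
# Budget cap sketch: warn at 80% of the cap, block at the cap, 0 = unlimited.

def check_budget(experiment_count, cap):
    """Return 'ok', 'warn', or 'blocked' for the next run_experiment call."""
    if cap == 0:  # cap disabled
        return "ok"
    if experiment_count >= cap:
        return "blocked"
    if experiment_count >= 0.8 * cap:
        return "warn"
    return "ok"
```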

Metric verification

The server cross-checks self-reported P&L metrics against the Simmer API. If the agent-reported metric diverges significantly from actual trade data, a warning is logged. This prevents metric gaming — the agent can’t inflate results by changing how metrics are calculated.
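A divergence check of this kind might look like the following sketch. The 10% relative tolerance is an illustrative choice; the server's actual threshold is not documented here:

```python
# Sketch of cross-checking a self-reported metric against API-derived data.
# rel_tol is an assumed example value, not a documented constant.

def metric_diverges(reported, actual, rel_tol=0.10):
    """True if the reported metric differs from actual by more than rel_tol."""
    if actual == 0:
        return abs(reported) > rel_tol
    return abs(reported - actual) / abs(actual) > rel_tol
```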

Experiment persistence

Results are saved in two places:
  • Local JSONL — autoresearch.jsonl in your working directory for offline access
  • Dashboard API — Synced to your Simmer dashboard (Pro users see an Autoresearch tab)
Git auto-commits on keep decisions so you can track what changed and roll back if needed.
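The local JSONL file is a plain append-only log, one experiment record per line. A sketch of reading and writing it, with the record shape assumed from log_experiment's parameters:

```python
import json
import os
import tempfile

def append_experiment(path, record):
    """Append one experiment record as a single JSON line."""
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

def load_experiments(path):
    """Read all records back, skipping blank lines."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

path = os.path.join(tempfile.mkdtemp(), "autoresearch.jsonl")
append_experiment(path, {"status": "keep", "metric": 12.5, "description": "raised min_edge"})
append_experiment(path, {"status": "discard", "metric": 9.1, "description": "lowered timeout"})
history = load_experiments(path)
```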

API endpoints

These endpoints power the server’s sync. You don’t call them directly — the MCP server handles it.
Endpoint | Description
POST /api/sdk/autoresearch/experiments | Sync experiment results
GET /api/sdk/autoresearch/experiments | List experiment history
GET /api/sdk/autoresearch/state | Resume state for server startup
POST /api/sdk/autoresearch/backtest | Replay trades against new config
GET /api/sdk/outcomes | Trade outcome summary (metric verification)

Legacy (v1 Plugin)

v1 was an OpenClaw plugin, not an MCP server. If you’re still running v1:
openclaw plugins install simmer-autoresearch
Configure via plugins.json:
{
  "simmer-autoresearch": {
    "apiKey": "your-api-key",
    "maxExperiments": 30
  }
}
v1 supports the /autoresearch command interface:
Command | Description
/autoresearch <skill> | Start or resume autoresearch mode for a skill
/autoresearch off | Stop autoresearch mode
/autoresearch status | Current skill, experiment count, keep rate, budget remaining, pause state
/autoresearch reset | Clear state and start fresh (clears pause if paused)
Upgrade to v2 — Install simmer-autoresearch via npm and switch to the MCP config above. v2 works with OpenClaw, Hermes, and Claude Code.