Pro & Elite feature. Autoresearch is available on Simmer Pro and Elite plans (Elite includes everything in Pro). Free users get a 403 when calling autoresearch API endpoints.
Package renamed. simmer-autoresearch (npm) has been renamed to simmer-mcp. Update your install command and MCP config — see Install below. The v1 OpenClaw plugin (openclaw plugins install simmer-autoresearch) is unaffected and documented in the Legacy section.
Autoresearch lets your agent optimize its own trading skills. It runs experiments — changing config values, measuring results over real trading cycles, and keeping changes that improve performance. Think of it as automated A/B testing for your trading strategy.

Prerequisites

  • Simmer Pro or Elite plan with a valid SIMMER_API_KEY
  • simmer-sdk installed, with at least one trading skill running on the sim venue
  • Node.js 18+ (for the MCP server)
  • Git initialized in your skill workspace (autoresearch uses git for commit/revert)
Start with the sim venue. Always run autoresearch against the simulated venue first. Autoresearch mutates your skill’s code and config, so running it against a real-money venue risks unexpected losses from untested changes.

How it works

init_experiment → run_experiment (N cycles) → log_experiment → repeat
  1. Init: Pick a skill and a metric (e.g., P&L, edge %, trade count)
  2. Run: Execute the skill with the new config for several trading cycles
  3. Log: Record results and decide whether to keep or revert. Kept changes are auto-committed to git.
  4. Backtest (optional): Replay historical trades against new config thresholds for fast config tuning
  5. Repeat: Try the next hypothesis
Your agent drives the loop — autoresearch provides the tools, your agent provides the reasoning.
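The call order of that loop can be sketched in Python. This is only an illustration: in practice your agent drives these tools through MCP, and `call_tool` and `evaluate` here are hypothetical helpers, not part of simmer-mcp. The session name and slug are example values.

```python
from typing import Any, Callable

# Hypothetical: however your agent's MCP client invokes a tool by name.
ToolCaller = Callable[[str, dict], Any]
# Hypothetical: turns run_experiment output into a (status, metric) pair.
Evaluator = Callable[[Any], tuple]


def run_session(call_tool: ToolCaller, evaluate: Evaluator,
                command: str, hypotheses: list) -> list:
    """Baseline first, then one run/log pair per hypothesis."""
    call_tool("init_experiment", {
        "name": "fast-loop tuning",             # example values only
        "skill_slug": "polymarket-fast-loop",
        "metric_name": "pnl",
    })
    history = []
    for description in ["baseline"] + list(hypotheses):
        # For real hypotheses, the agent mutates code/config before this run.
        output = call_tool("run_experiment", {"command": command})
        status, metric = evaluate(output)
        entry = {"status": status, "metric": metric,
                 "description": description}
        call_tool("log_experiment", entry)
        history.append(entry)
    return history
```

The baseline is always the first run/log pair, which matches the rule that the baseline must never be skipped.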

Install

npm install -g simmer-mcp
Then add the MCP server to your agent’s config:
{
  "mcpServers": {
    "simmer": {
      "command": "npx",
      "args": ["-y", "simmer-mcp"],
      "env": {
        "SIMMER_API_KEY": "your-api-key"
      }
    }
  }
}
Then install the behavioral skill (tells your agent how to run the experiment loop):
npx simmer-mcp install-skill
This auto-detects your runtime (OpenClaw, Hermes) and copies the skill instructions to the right directory. For Claude Code, add the skill content to your project’s CLAUDE.md.

Config

Configure autoresearch via environment variables:
| Variable | Default | Description |
|---|---|---|
| SIMMER_API_KEY | (none) | Required. Your Simmer API key. |
| SIMMER_API_URL | https://api.simmer.markets | API base URL. Override for self-hosted deployments. |
| AUTORESEARCH_MAX_EXPERIMENTS | 50 | Maximum experiments per session; prevents runaway loops. 0 = unlimited. |
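As a minimal sketch, assuming you want to mirror these settings in your own tooling, the defaults above resolve like this. This is not simmer-mcp's own code; representing "unlimited" as `None` is a choice made for the example.

```python
import os
from typing import Mapping

DEFAULT_API_URL = "https://api.simmer.markets"
DEFAULT_MAX_EXPERIMENTS = 50


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve autoresearch settings using the documented defaults."""
    api_key = env.get("SIMMER_API_KEY")
    if not api_key:
        raise ValueError("SIMMER_API_KEY is required")
    max_experiments = int(env.get("AUTORESEARCH_MAX_EXPERIMENTS",
                                  DEFAULT_MAX_EXPERIMENTS))
    return {
        "api_key": api_key,
        "api_url": env.get("SIMMER_API_URL", DEFAULT_API_URL),
        # 0 means unlimited; represent that as None (no cap).
        "max_experiments": None if max_experiments == 0 else max_experiments,
    }
```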

Running Autoresearch

Setup (once per optimization target)

  1. Pick a skill to optimize and a primary metric (usually P&L)
  2. Create a git branch: git checkout -b autoresearch/<skill>-<date>
  3. Read the skill source code thoroughly — understand what it does before mutating
  4. Write autoresearch.md — a session spec describing the goal, metrics, how to run, and constraints
  5. Write autoresearch.sh — a single command that runs the skill for one cycle
  6. Commit both files
  7. Call init_experiment → run the baseline with run_experiment → log_experiment → start looping
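The git parts of setup (steps 2 and 6) can be expressed as argv lists, for example to run them with `subprocess.run(argv, check=True)`. This is a sketch under two assumptions: `<date>` is written in ISO format (YYYY-MM-DD), and the commit message is a hypothetical placeholder.

```python
import datetime


def setup_commands(skill: str, day: datetime.date) -> list:
    """Return the git commands for setup steps 2 and 6 as argv lists."""
    # Assumption: <date> uses ISO format, e.g. 2026-03-15.
    branch = f"autoresearch/{skill}-{day.isoformat()}"
    return [
        ["git", "checkout", "-b", branch],
        # Write autoresearch.md and autoresearch.sh (steps 4-5)
        # before running the two commands below.
        ["git", "add", "autoresearch.md", "autoresearch.sh"],
        # Hypothetical commit message; any message works.
        ["git", "commit", "-m", f"autoresearch: session spec for {skill}"],
    ]
```

Run the first command, write both files, then run the remaining two.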

The experiment loop

Each iteration:
  1. Hypothesize — what change might improve the metric?
  2. Mutate — change the skill’s code or config
  3. Run — call run_experiment to execute the skill
  4. Log — call log_experiment to record the result (keep, discard, or crash)
keep auto-commits to git. discard and crash auto-revert the working directory. Use backtest_experiment for fast config exploration (seconds) before committing to live runs (minutes).
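A minimal sketch of how an agent might pick the status for log_experiment, assuming a run with no metric counts as a crash and the baseline is logged as keep (both are assumptions, not documented rules). A real decision should also weigh the confidence score before keeping.

```python
from typing import Optional


def choose_status(metric: Optional[float], best: Optional[float],
                  direction: str = "higher") -> str:
    """Pick "keep", "discard", or "crash" for one experiment run."""
    if direction not in ("higher", "lower"):
        raise ValueError("direction must be 'higher' or 'lower'")
    if metric is None:
        return "crash"      # assumption: no metric means the run crashed
    if best is None:
        return "keep"       # assumption: the baseline has nothing to beat
    improved = metric > best if direction == "higher" else metric < best
    return "keep" if improved else "discard"
```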

Key rules

  • Never skip the baseline run. The first experiment establishes the reference point for all comparisons.
  • Always log — even crashes. Crash data matters for confidence scoring and crash detection.
  • Check confidence scores. A score of ≥2× the noise floor means the improvement is likely real; 1–2× is marginal, so re-run to confirm; under 1× is within noise.
  • Code mutations beat config tuning. Structural changes (new data sources, different models, alternative strategies) find bigger wins than parameter sweeps.
  • Keep ideas in autoresearch.ideas.md. Promising but deferred optimizations go here.
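The confidence rule of thumb can be sketched as follows. How the server computes its confidence score is not described here, so this sketch assumes the score is the size of the change divided by a noise floor, for example the spread of the metric across repeated baseline runs.

```python
def confidence_ratio(metric: float, baseline: float, noise_floor: float) -> float:
    """Size of the change, in multiples of the noise floor."""
    # Assumption: noise_floor is a positive spread estimate. abs() measures
    # magnitude only; whether the change is in the right direction is
    # checked separately.
    if noise_floor <= 0:
        raise ValueError("noise_floor must be positive")
    return abs(metric - baseline) / noise_floor


def interpret_confidence(ratio: float) -> str:
    """Map a ratio onto the >=2x / 1-2x / <1x rule of thumb."""
    if ratio >= 2:
        return "likely real"
    if ratio >= 1:
        return "marginal: re-run to confirm"
    return "within noise"
```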

When you’re done

Review the autoresearch git branch. Experiments logged as keep are committed with result metadata in the commit message. Merge the branch (or cherry-pick specific experiments) into your main skill branch to lock in the improvements.

Tools

The MCP server registers four tools your agent can call:

init_experiment

Configure an experiment session. Call again to start a new segment with a fresh baseline.
| Parameter | Required | Description |
|---|---|---|
| name | Yes | Human-readable session name |
| skill_slug | Yes | ClawHub slug of the skill to optimize (e.g., polymarket-fast-loop) |
| metric_name | Yes | Primary metric to track (e.g., pnl, avg_edge) |
| metric_unit | No | Unit label (e.g., $SIM, %) |
| direction | No | higher or lower: which direction is better (default: higher) |
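For reference, here is a sketch of an init_experiment argument dict built from the table above. The helper `init_args` is hypothetical; it only shows which keys are required, which are optional, and the `higher` default.

```python
from typing import Optional


def init_args(name: str, skill_slug: str, metric_name: str,
              metric_unit: Optional[str] = None,
              direction: str = "higher") -> dict:
    """Build init_experiment arguments (hypothetical helper)."""
    if direction not in ("higher", "lower"):
        raise ValueError("direction must be 'higher' or 'lower'")
    args = {"name": name, "skill_slug": skill_slug,
            "metric_name": metric_name, "direction": direction}
    if metric_unit is not None:
        args["metric_unit"] = metric_unit   # optional display label
    return args
```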

run_experiment

Execute a command (usually the skill), capture output and timing.
| Parameter | Required | Description |
|---|---|---|
| command | Yes | Shell command to run (e.g., python skills/polymarket-fast-loop/fastloop_trader.py) |
| timeout | No | Timeout in seconds (default: 300) |
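To make "capture output and timing" concrete, here is a sketch of the kind of result a run produces, using Python's standard `subprocess` module. This is an illustration, not the server's implementation; the result field names are hypothetical.

```python
import subprocess
import time


def run_command(command: str, timeout: float = 300) -> dict:
    """Run a shell command, capturing output and wall-clock seconds."""
    start = time.monotonic()
    try:
        proc = subprocess.run(command, shell=True, capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child is killed on timeout; report no exit code.
        return {"exit_code": None, "stdout": "", "stderr": "",
                "seconds": time.monotonic() - start, "timed_out": True}
    return {"exit_code": proc.returncode, "stdout": proc.stdout,
            "stderr": proc.stderr, "seconds": time.monotonic() - start,
            "timed_out": False}
```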

log_experiment

Record experiment results. keep auto-commits to git. discard/crash reverts working directory.
| Parameter | Required | Description |
|---|---|---|
| status | Yes | keep, discard, or crash |
| metric | Yes | Primary metric value (number) |
| description | Yes | What was tried and what happened |
| secondary_metrics | No | Additional metrics as a key-value dict |
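A sketch of validating log_experiment arguments against the table before sending them. The helper `log_args` is hypothetical; rejecting booleans as metrics is an assumption, made because `bool` is a subclass of `int` in Python.

```python
from numbers import Real
from typing import Optional

STATUSES = {"keep", "discard", "crash"}


def log_args(status: str, metric: float, description: str,
             secondary_metrics: Optional[dict] = None) -> dict:
    """Validate and assemble log_experiment arguments (hypothetical helper)."""
    if status not in STATUSES:
        raise ValueError(f"status must be one of {sorted(STATUSES)}")
    if isinstance(metric, bool) or not isinstance(metric, Real):
        raise TypeError("metric must be a number")
    if not description.strip():
        raise ValueError("describe what was tried and what happened")
    args = {"status": status, "metric": metric, "description": description}
    if secondary_metrics:
        args["secondary_metrics"] = dict(secondary_metrics)
    return args
```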

backtest_experiment

Replay historical trades against new config thresholds without live execution. Returns simulated P&L in seconds — use this for fast config tuning before committing to live experiments.
Backtest requires trades with signal_data. Skills must pass structured signal data on client.trade() calls (SDK 0.9.17+). All official Simmer skills include signal_data as of March 2026.
| Parameter | Required | Description |
|---|---|---|
| skill_slug | Yes | Skill to backtest |
| config | Yes | Config overrides to test (e.g., {"min_edge": 0.05}) |
| days | No | Days of history to replay (default: 7, max: 30) |
| venue | No | sim or polymarket (default: sim) |
Config threshold convention:
  • min_edge: 0.05 → only include trades where signal_data.edge >= 0.05
  • max_probability: 0.85 → only include trades where signal_data.probability <= 0.85
  • Bare keys (e.g., edge: 0.10) → treated as min threshold
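The threshold convention can be sketched as a filter over trades' signal_data. This mirrors the three rules above but is not the server's replay code. It makes two assumptions: each trade dict carries a realized `pnl` value, and trades missing a thresholded field are excluded.

```python
def passes(signal_data: dict, config: dict) -> bool:
    """Apply the min_/max_/bare-key convention to one trade."""
    for key, threshold in config.items():
        if key.startswith("max_"):
            field, is_max = key[len("max_"):], True
        elif key.startswith("min_"):
            field, is_max = key[len("min_"):], False
        else:
            field, is_max = key, False      # bare key: min threshold
        value = signal_data.get(field)
        if value is None:
            return False   # assumption: missing field excludes the trade
        if is_max and value > threshold:
            return False
        if not is_max and value < threshold:
            return False
    return True


def replay(trades: list, config: dict) -> tuple:
    """Count surviving trades and sum their P&L."""
    kept = [t for t in trades if passes(t["signal_data"], config)]
    # Assumption: each trade dict has a realized "pnl" value.
    return len(kept), sum(t["pnl"] for t in kept)
```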

Signal Data

Skills can include structured signal data on each trade to enable backtest replay. This is optional — trades work fine without it — but required for the backtest_experiment tool.
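# `client` is an authenticated Simmer SDK client; `market_id` identifies the market to trade.
result = client.trade(
    market_id, "yes", 10.0,
    reasoning="NOAA forecasts 35°F, bucket underpriced at 12%",
    # Flat dict of string/number values; enables backtest replay.
    signal_data={
        "edge": 0.15,                      # common field: perceived edge over market price
        "confidence": 0.8,                 # common field: 0-1
        "signal_source": "noaa_forecast",  # common field: what triggered the signal
        "forecast_temp": 35,               # skill-specific fields are freeform
        "bucket_range": "30-39",
    },
    skill_slug="polymarket-weather-trader",
)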
Common fields (recommended for all skills):
| Field | Type | Description |
|---|---|---|
| edge | float | Perceived edge over market price |
| confidence | float (0–1) | Agent confidence in the trade |
| signal_source | string | What triggered the signal |
Additional skill-specific fields are freeform. Values must be strings or numbers (a flat dict with no nesting). Signal data is private: it is visible only to the trade owner via authenticated API calls and is never exposed publicly.
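Before calling `client.trade()`, a skill could check its signal_data against the flat string-or-number rule. A minimal sketch; whether booleans are accepted is not stated, so it conservatively rejects them.

```python
from numbers import Real


def check_signal_data(signal_data: dict) -> list:
    """Return a list of problems; an empty list means the dict looks valid."""
    problems = []
    for key, value in signal_data.items():
        if not isinstance(key, str):
            problems.append(f"key {key!r} is not a string")
        # bool is a subclass of int in Python; treat it as not-a-number here.
        if isinstance(value, bool) or not isinstance(value, (str, Real)):
            problems.append(
                f"{key!r}: {type(value).__name__} is not a string or number")
    return problems
```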

Session management

Your agent manages its own session state using the SKILL.md behavioral instructions (installed via npx simmer-mcp install-skill). There is no /autoresearch command interface — the agent drives the loop autonomously.
  • Resume: The agent reads autoresearch.jsonl on startup and resumes where it left off.
  • New session: Call init_experiment with a new name to start a fresh segment (previous results are archived, not deleted).
  • Context compaction: If the agent’s context resets, it should re-read autoresearch.md and autoresearch.jsonl to restore state.
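A sketch of restoring minimal loop state from autoresearch.jsonl. The file's real schema is not documented here, so this assumes each line is a JSON object with at least `status` and `metric` keys, and it ignores session segments.

```python
import json
from typing import Iterable


def resume_state(lines: Iterable[str], direction: str = "higher") -> dict:
    """Count logged experiments and find the best kept metric."""
    runs, best = 0, None
    for line in lines:
        if not line.strip():
            continue                      # skip blank lines
        record = json.loads(line)         # assumed: one JSON object per line
        runs += 1
        if record.get("status") != "keep":
            continue
        metric = record["metric"]
        better = best is None or (metric > best if direction == "higher"
                                  else metric < best)
        if better:
            best = metric
    return {"experiments": runs, "best_kept_metric": best}
```

Usage: `with open("autoresearch.jsonl") as f: state = resume_state(f)`.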

Safety features

Crash protection

  • Baseline crash — If the very first experiment in a session crashes, autoresearch pauses automatically. This usually means the skill is misconfigured.
  • Consecutive crashes — 3 crashes in a row triggers auto-pause. Your agent can’t run more experiments until the issue is investigated.
  • Recovery — Call init_experiment with a new session name to clear the pause and start fresh.
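The two auto-pause triggers can be modelled as a small state machine. This is a sketch of the documented behavior, not the server's code; `reset()` stands in for starting a new session with init_experiment.

```python
class CrashGuard:
    """Track the baseline-crash and consecutive-crash pause triggers."""

    MAX_CONSECUTIVE = 3

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # Mirrors recovery: a new init_experiment session clears the pause.
        self.runs = 0
        self.consecutive_crashes = 0
        self.paused = False

    def record(self, status: str) -> bool:
        """Record one logged status; return True if the session is paused."""
        self.runs += 1
        if status == "crash":
            self.consecutive_crashes += 1
            if self.runs == 1 or self.consecutive_crashes >= self.MAX_CONSECUTIVE:
                self.paused = True
        else:
            self.consecutive_crashes = 0
        return self.paused
```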

Budget caps

Experiments are capped at AUTORESEARCH_MAX_EXPERIMENTS (default 50) per session. At 80% of the cap, your agent gets a warning. At the limit, run_experiment is blocked. Set AUTORESEARCH_MAX_EXPERIMENTS=0 to disable the cap (not recommended for unattended agents).
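The budget thresholds reduce to a simple check. A sketch, assuming `completed` counts experiments already run in the session; integer arithmetic avoids floating-point edge cases at the 80% mark.

```python
from typing import Optional


def budget_status(completed: int, cap: Optional[int] = 50) -> str:
    """Return "ok", "warning" (at 80% of the cap), or "blocked"."""
    if not cap:
        return "ok"                  # cap of 0 (or None) means unlimited
    if completed >= cap:
        return "blocked"             # run_experiment is refused at the limit
    if completed * 5 >= cap * 4:     # completed >= 80% of cap
        return "warning"
    return "ok"
```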

Metric verification

The server cross-checks self-reported P&L metrics against the Simmer API. If the agent-reported metric diverges significantly from actual trade data, a warning is logged. This prevents metric gaming — the agent can’t inflate results by changing how metrics are calculated.
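What counts as "significantly" is not specified. The sketch below models it as a relative tolerance, with a hypothetical 10% default and a 1.0 floor so that P&L values near zero are not flagged for rounding noise. Both numbers are illustrative, not the server's thresholds.

```python
def metric_diverges(reported: float, actual: float,
                    tolerance: float = 0.10) -> bool:
    """Flag a self-reported P&L that is far from the P&L in trade data."""
    # Hypothetical thresholds: 10% relative, with an absolute floor of 1.0.
    scale = max(abs(actual), 1.0)
    return abs(reported - actual) > tolerance * scale
```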

Experiment persistence

Results are saved in two places:
  • Local JSONL: autoresearch.jsonl in your working directory, for offline access
  • Dashboard API: synced to your Simmer dashboard (Pro and Elite users see an Autoresearch tab)
Git auto-commits on keep decisions so you can track what changed and roll back if needed.

API endpoints

These endpoints power the server’s sync. You don’t call them directly — the MCP server handles it.
| Endpoint | Description |
|---|---|
| POST /api/sdk/autoresearch/experiments | Sync experiment results |
| GET /api/sdk/autoresearch/experiments | List experiment history |
| GET /api/sdk/autoresearch/state | Resume state for server startup |
| POST /api/sdk/autoresearch/backtest | Replay trades against a new config |
| GET /api/sdk/outcomes | Trade outcome summary (used for metric verification) |

Legacy (v1 Plugin)

v1 was an OpenClaw plugin, not an MCP server. If you’re still running v1:
openclaw plugins install simmer-autoresearch
Configure via plugins.json:
{
  "simmer-autoresearch": {
    "apiKey": "your-api-key",
    "maxExperiments": 30
  }
}
v1 supports the /autoresearch command interface:
| Command | Description |
|---|---|
| /autoresearch <skill> | Start or resume autoresearch mode for a skill |
| /autoresearch off | Stop autoresearch mode |
| /autoresearch status | Show current skill, experiment count, keep rate, budget remaining, and pause state |
| /autoresearch reset | Clear state and start fresh (clears the pause if paused) |
Upgrade to v2 — Install simmer-mcp via npm (npm install -g simmer-mcp) and switch to the MCP config above. v2 works with OpenClaw, Hermes, and Claude Code.