Guide

Autoresearch is a loop controller for agents: define a measurable goal, modify one thing, verify mechanically, keep or discard, and repeat.
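Here is a minimal Python sketch of that loop, for illustration only. `run_loop`, the `verify` callback, and the keep/discard log are hypothetical names, not the binary's API. The sketch assumes `verify` evaluates one candidate change on top of everything kept so far and returns a single number.

```python
from typing import Callable, Iterable


def run_loop(
    baseline: float,
    candidates: Iterable[str],
    verify: Callable[[str], float],
    direction: str = "lower",
) -> tuple[float, list[tuple[str, float, str]]]:
    """Try one change per iteration and keep it only if the metric improves."""
    if direction not in ("lower", "higher"):
        raise ValueError("direction must be 'lower' or 'higher'")
    best = baseline
    log: list[tuple[str, float, str]] = []
    for change in candidates:
        # Mechanical check: in practice, run a command and parse one number.
        metric = verify(change)
        improved = metric < best if direction == "lower" else metric > best
        if improved:
            best = metric
            log.append((change, metric, "keep"))
        else:
            # A real loop would revert the working tree here.
            log.append((change, metric, "discard"))
    return best, log
```

For example, starting from a baseline of 10 with candidate scores 9, 12, and 7 and `direction="lower"`, the loop keeps the first change, discards the second, keeps the third, and ends at 7.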

Core Commands

Need | Use
Improve a metric | /autoresearch or $autoresearch
Pick a metric from a vague goal | /autoresearch:plan or $autoresearch plan
Find a root cause | /autoresearch:debug or $autoresearch debug
Reduce errors to zero | /autoresearch:fix or $autoresearch fix
Run a security audit | /autoresearch:security or $autoresearch security
Ship through gates | /autoresearch:ship or $autoresearch ship
Analyze prior runs | /autoresearch:evals or $autoresearch evals

Binary Operations

The agent-facing protocols delegate stateful work to the autoresearch binary:

autoresearch init --verify "cat metric.txt" --direction lower
autoresearch verify --command "cat metric.txt"
autoresearch verify --command "cat metric.txt" --repeat 3 --aggregate median
autoresearch plan --goal "reduce any types" --format json
autoresearch debug --symptom "API returns 500" --scope "src/**/*.rs"
autoresearch fix --target "npx tsc --noEmit" --scope "src/**/*.ts" --category type
autoresearch improve --goal "Improve onboarding activation" --icp "Developer tools teams"
autoresearch prd --title "Improve onboarding" --problem "New users stall before first run"
autoresearch security --scope "src/**/*.rs" --focus auth
autoresearch ship --target "Release v1.2.0" --type code-release --dry-run
autoresearch scenario --target "Checkout flow" --domain web --format test-scenarios --scope "src/checkout/**"
autoresearch predict --proposal "Add cache warming to search results" --scope "src/search/**"
autoresearch predict --proposal "Find product improvements for onboarding" --scope "src/**" --improve
autoresearch reason --question "Should we replace the storage layer" --mode debate --domain software
autoresearch probe --subject "Payment retry workflow" --scope "src/payments/**"
autoresearch probe --subject "Onboarding activation workflow" --scope "src/**" --improve
autoresearch learn --mode summarize --scope "src/**/*.rs"
autoresearch decide --decision auto --metric 4 --commit abc1234 --description "improved"
autoresearch status --summary
autoresearch progress
autoresearch cost --per-iteration-usd 0.25 --format json
autoresearch dashboard --once
autoresearch health --strict
autoresearch env --format json
autoresearch init --environment-summary auto --verify "cat metric.txt" --direction lower
autoresearch checkpoint --format json
autoresearch reanchor --format json
autoresearch watch --lines 20 --format jsonl
autoresearch watch --websocket --websocket-addr 127.0.0.1:8765
autoresearch lessons --add "Prefer fixture-level assertions" --context "reduced flaky tests"
autoresearch search --from-state --provider-command 'exa "$AUTORESEARCH_SEARCH_QUERY"' --log
autoresearch parallel prepare --workers 3
autoresearch parallel run --manifest autoresearch-results/parallel-manifest.json --timeout-seconds 1200
autoresearch parallel template --workers 3 --output autoresearch-results/parallel-workers.json
autoresearch parallel compare --a "simplify parser" --b "cache scan results"
autoresearch parallel closeout --batch-file autoresearch-results/parallel-workers.json --merge-strategy cherry-pick
autoresearch parallel cleanup --manifest autoresearch-results/parallel-manifest.json
autoresearch evals --file autoresearch-results/results.tsv --format json --recommend --plateau-window 5 --target 90 --fail-on goal-not-met --chain ship
autoresearch evals --file autoresearch-results/results.tsv --compare autoresearch-results/previous-results.tsv --format json
autoresearch api --format json
autoresearch mcp serve
autoresearch mcp call --server-command "autoresearch mcp serve" --tool autoresearch_status
autoresearch scope expand --format json
autoresearch workspace exec --command "cargo test" --rollback-on-failure
autoresearch guard-presets --format json
autoresearch lessons --workspace-context --last 5
autoresearch plugin list
autoresearch plugin validate --path .autoresearch/plugins/example.toml
autoresearch plugin marketplace
autoresearch completions zsh > ~/.zfunc/_autoresearch
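The `cost --per-iteration-usd 0.25` call in the list above turns a per-iteration price into spend figures. The sketch below is a rough model that assumes a flat rate per iteration. The binary also accepts token/rate flags, which the sketch ignores. `estimate_cost` is a hypothetical helper, not part of autoresearch.

```python
from dataclasses import dataclass


@dataclass
class CostEstimate:
    completed_usd: float
    remaining_usd: float
    projected_usd: float


def estimate_cost(
    completed_iterations: int, planned_iterations: int, per_iteration_usd: float
) -> CostEstimate:
    """Flat-rate spend model: every iteration is assumed to cost the same."""
    if min(completed_iterations, planned_iterations, per_iteration_usd) < 0:
        raise ValueError("inputs must be non-negative")
    # Runs that overshoot the plan have nothing left to spend.
    remaining = max(planned_iterations - completed_iterations, 0)
    completed = completed_iterations * per_iteration_usd
    pending = remaining * per_iteration_usd
    # Round to cents for display.
    return CostEstimate(
        completed_usd=round(completed, 2),
        remaining_usd=round(pending, 2),
        projected_usd=round(completed + pending, 2),
    )
```

For example, `estimate_cost(10, 25, 0.25)` gives $2.50 completed, $3.75 remaining, and $6.25 projected.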

Use autoresearch runtime run for supervised background Codex sessions, and autoresearch runtime status / autoresearch runtime stop for control.

Use autoresearch env --format json to capture CPU, disk, container, toolchain, and recommended parallel-worker context before planning long or parallel runs; pass --environment-summary auto to init to persist that probe summary in results.tsv.

Use autoresearch status --summary for compact monitor-friendly counters. Use autoresearch progress for the current metric, trend, counters, escalation state, and a terminal sparkline of metric history.

Use autoresearch verify --repeat <n> --aggregate <median|mean|min|max|last> for noisy scalar metrics; repeated verification returns the aggregate metric plus the raw samples.

Use autoresearch cost --per-iteration-usd <usd> or token/rate flags to estimate completed, remaining, and projected run spend.

Use autoresearch dashboard --once for a combined terminal view of status, trend, metric history, escalation, and recent rows; omit --once for live refresh.

Use autoresearch checkpoint --format json inside long loops to run evals only when the active iteration reaches the configured or adaptive checkpoint interval.

Use autoresearch reanchor --format json every 10 iterations or after context compaction to print the protocol fingerprint, the references to reload, and the [RE-ANCHOR] logging tag.

Use autoresearch watch --format <tsv|jsonl> for human-readable tails or machine-readable JSON Lines. Use autoresearch watch --websocket --websocket-addr <host:port> to serve snapshot and row-update payloads to real-time dashboards; add --once to print the initial WebSocket snapshot envelope without starting a server.

Use autoresearch lessons --add <strategy> --context <note> to append reusable lessons without editing lessons.md by hand.

Use autoresearch search --from-state with --provider-command or AUTORESEARCH_SEARCH_CMD to run cached, run-aware web searches. Add --log to append a search meta-iteration.
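Repeated verification matters because a single noisy sample can flip a keep/discard decision. The sketch below shows what each `--aggregate` choice computes over the raw samples. The function and the `metric`/`samples` keys are assumptions for illustration, not the binary's output schema.

```python
import statistics
from typing import Sequence

# One reducer per --aggregate value named in the docs.
AGGREGATORS = {
    "median": statistics.median,
    "mean": statistics.fmean,
    "min": min,
    "max": max,
    "last": lambda samples: samples[-1],
}


def aggregate_samples(samples: Sequence[float], method: str = "median") -> dict:
    """Reduce repeated verify samples to one metric, keeping the raw values."""
    if not samples:
        raise ValueError("need at least one sample")
    try:
        reducer = AGGREGATORS[method]
    except KeyError:
        raise ValueError(f"unknown aggregate: {method}") from None
    values = list(samples)
    return {"metric": reducer(values), "samples": values}
```

With samples `[4.1, 3.9, 12.0]`, `median` yields 4.1. `mean` is pulled up to about 6.67 by the single outlier, which makes median a sensible default for noisy timings.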
When decide escalates to Web Search, it automatically runs the same cached helper with AUTORESEARCH_SEARCH_CMD and logs the result when timing/cooldown limits allow it.

Use autoresearch parallel closeout --merge-strategy <cherry-pick|fast-forward|squash|rebase> to select how the retained worker commit is merged. Use autoresearch parallel compare --a <hypothesis> --b <hypothesis> to prepare a two-arm A/B batch that reuses parallel run and verified parallel closeout.

Use autoresearch evals --file <path> --format json --recommend --plateau-window 5 --target <metric-threshold> --fail-on goal-not-met --chain ship after parallel closeout to include worker improvement counts, a sign-test summary, anomaly detection, goal-achieved status, CI-friendly exit gating, next-step guidance, and downstream handoff metadata. Use autoresearch evals --file <path> --compare <other-results.tsv> --format json to compare run improvement, efficiency, and plateau length before choosing the next strategy.

Use autoresearch completions <bash|zsh|fish|elvish|powershell> to generate shell completions. Use autoresearch manpages --output-dir man/man1 to generate a local autoresearch.1 manual page.

Use autoresearch config template --output .autoresearch.toml to write a starter project defaults file. Use autoresearch config validate to parse defaults, validate options, and screen configured commands without running them.

Use autoresearch plan --goal <goal> --format json to get a launch-ready suggested scope, metric, direction, verify, guard, and iteration count from detected repo tooling. Use autoresearch plan --goal <goal> --debug to write the derived config into a downstream debug handoff.

Native artifact generators default to ignored autoresearch-results/<mode>/ paths; pass --output or --output-dir only when you intentionally want a different artifact location.
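The following sketch makes two `evals` summary terms concrete: an exact two-sided sign test over per-iteration deltas, and a plateau counter. Both reflect assumptions about what such a summary could compute, and autoresearch's own statistics and thresholds may differ. Deltas are taken as positive for improvements, so for a `lower` metric use `previous - current`.

```python
from math import comb
from typing import Sequence


def sign_test(deltas: Sequence[float]) -> dict:
    """Exact two-sided sign test; ties (zero deltas) are dropped."""
    pos = sum(1 for d in deltas if d > 0)
    neg = sum(1 for d in deltas if d < 0)
    n = pos + neg
    if n == 0:
        return {"improved": 0, "regressed": 0, "p_value": 1.0}
    k = min(pos, neg)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return {"improved": pos, "regressed": neg, "p_value": min(1.0, 2 * tail)}


def plateau_length(metrics: Sequence[float], direction: str = "lower") -> int:
    """Count iterations since the best metric was last improved."""
    if direction not in ("lower", "higher"):
        raise ValueError("direction must be 'lower' or 'higher'")
    best = None
    since = 0
    for m in metrics:
        better = best is None or (m < best if direction == "lower" else m > best)
        if better:
            best, since = m, 0
        else:
            since += 1
    return since
```

Eight improvements and one regression give p = 20/512, about 0.039. If the window is read as "iterations without a new best" (an assumption), then a `plateau_length` of at least 5 would count as a plateau under `--plateau-window 5`.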
Use autoresearch debug --symptom <failure> --scope <glob> to write a hypothesis-driven investigation bundle with summary, findings, eliminated hypotheses, TSV, and handoff JSON. Add --fix or --chain <targets> to autoresearch debug to record downstream chain metadata in the debug handoff. Use autoresearch debug --depth deep --iterations 12 --severity high to override the investigation budget and record severity filter metadata.

Use autoresearch fix --target <verify-command> --scope <glob> --iterations 7 to write a repair-plan bundle under autoresearch-results/fix with priority order, results TSV, iteration budget, and handoff JSON. Use autoresearch fix --from-debug to import the latest debug handoff scope, symptom, and finding count into the repair plan. Use autoresearch fix --learn --evals to record downstream learn handoff and checkpoint propagation metadata.

Use autoresearch improve --goal <product-area> --icp <persona> to write an improve-mode artifact bundle: research findings, ranked plan, summary, TSV, and handoff JSON. Use autoresearch improve --goal <product-area> --icp <persona> --depth deep --iterations 24 --evals to override the research budget and record active category count plus checkpoint metadata. Use autoresearch improve --goal <product-area> --seeds 5 --no-discover --learn to record seed volume, discovery posture, and downstream learn handoff metadata.

Use autoresearch prd --title <title> --problem <problem> to write a focused improve-mode PRD with DECISION NEEDED markers, acceptance criteria, risks, success metrics, and an autoresearch config block.

Use autoresearch security --scope <glob> --focus <area> to write a STRIDE + OWASP audit bundle with overview, threat model, attack surface, coverage, findings, recommendations, TSV, and handoff JSON. Add --fail-on <severity> and --fix to autoresearch security to record CI gate and repair-chain metadata for confirmed findings.
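One way to picture the `--fail-on <severity>` gate is as an exit-code rule over confirmed findings. The sketch below is hypothetical: the severity scale and the `severity`/`confirmed` field names are assumptions, not the handoff JSON schema.

```python
from typing import Iterable

# Assumed ordering; the binary's actual severity scale may differ.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def gate_exit_code(findings: Iterable[dict], fail_on: str) -> int:
    """Return 1 if any confirmed finding is at or above the threshold, else 0."""
    threshold = SEVERITY_RANK[fail_on]
    blocking = [
        f
        for f in findings
        # Unconfirmed findings and unknown severities never block.
        if f.get("confirmed")
        and SEVERITY_RANK.get(f.get("severity", ""), -1) >= threshold
    ]
    return 1 if blocking else 0
```

A CI step would pass the result to `sys.exit(...)`, so any confirmed finding at or above the threshold fails the job.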
Use autoresearch security --scope <glob> --depth deep --iterations 18 --diff --fix --evals to override the audit budget and record delta mode, downstream fix handoff, and checkpoint metadata.

Use autoresearch ship --target <thing> --type <kind> --dry-run to write an 8-phase ship checklist, summary, ship log, and handoff JSON without external side effects. Use autoresearch ship --target <thing> --auto --force --rollback --monitor 15 --learn to record approval, rollback, monitoring, and downstream learn handoff metadata.

Use autoresearch scenario --target <feature> --domain <general|web|mobile|api|cli|data-pipeline|infrastructure> --format <test-scenarios|threat-scenarios|use-cases|user-stories> to write a 12-dimension scenario matrix for tests, threat modeling, or debug follow-up. Use autoresearch scenario --target <feature> --domain web --depth deep --iterations 16 --evals --debug to override the exploration budget and record domain, checkpoint metadata, and downstream debug handoff.

Use autoresearch predict --proposal <change> to write a five-persona review covering architecture, security, performance, UX, and adversarial risks. Use autoresearch predict --proposal <change> --depth deep --adversarial --fail-on high to record review profile and CI gate metadata. Use autoresearch predict --proposal <change> --debug to record the review as handoff context for downstream investigation. Use autoresearch predict --proposal <product-area> --improve to pass expert findings into product improvement research.

Use autoresearch reason --question <decision> to write an adversarial debate artifact with candidate solutions, blind judge rubric, and convergence criteria. Use autoresearch reason --question <decision> --predict to pass the selected debate context into downstream review. Use autoresearch reason --question <decision> --iterations 11 --judges 7 --convergence 4 --temperature 0.2 to record debate budget, judge panel, convergence, synthesis, and generation hints.
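One reading of `--judges 7 --convergence 4` works like this. Each round the blind judges vote, and the majority candidate wins the round. The debate converges once the same candidate has won `convergence` consecutive rounds. That interpretation is an assumption, and the helpers below are hypothetical; the binary's convergence rule may differ.

```python
from collections import Counter
from typing import Optional, Sequence


def round_winner(votes: Sequence[str]) -> str:
    """Majority of judge votes; ties go to the alphabetically first candidate."""
    counts = Counter(votes)
    top = max(counts.values())
    return min(c for c, n in counts.items() if n == top)


def converged_at(rounds: Sequence[Sequence[str]], convergence: int) -> Optional[int]:
    """1-based round where one winner has held for `convergence` straight rounds."""
    streak = 0
    previous = None
    for i, votes in enumerate(rounds, start=1):
        winner = round_winner(votes)
        streak = streak + 1 if winner == previous else 1
        previous = winner
        if streak >= convergence:
            return i
    # Budget exhausted without convergence.
    return None
```

With seven judges, if candidate A wins round 1 and B wins rounds 2 through 5, `converged_at(rounds, 4)` returns 5.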
Use autoresearch probe --subject <requirement> to write eight persona-driven questions, constraint slots, and a saturation rule before implementation. Use autoresearch probe --subject <requirement> --mode autonomous --depth deep --iterations 9 --adversarial to override the interrogation round budget and record saturation metadata. Use autoresearch probe --subject <requirement> --plan to pass discovered constraints into planning through handoff metadata. Use autoresearch probe --subject <product-area> --improve to pass discovered constraints into product improvement research.

Use autoresearch learn --mode <init|update|check|summarize> --scope <glob> to write documentation summary, validation, TSV, and handoff artifacts. Use autoresearch learn --mode check --file <path> --depth overview --iterations 14 --topics architecture,api --no-fix --evals to record learn profile, specific-file scope, validation behavior, chain, and checkpoint metadata.

Use autoresearch api --format json to inspect the stable command/flag manifest and semver policy used by wrappers and agents.

Use autoresearch mcp serve as a stdio MCP server exposing read-only autoresearch_status and autoresearch_watch_snapshot tools. Use autoresearch mcp call --server-command <cmd> --tool <name> --arguments '{}' to call a tool on an external stdio MCP server from an iteration script.

Use autoresearch scope expand --format json to resolve active primary and companion repo scopes, with package roots inferred from Cargo.toml, package.json, pyproject.toml, and go.mod. Use autoresearch workspace exec --command <cmd> --rollback-on-failure to run one screened command across primary and companion repo targets, restoring attempted repos if any target fails. Use autoresearch guard-presets --format json to suggest per-repo guard commands for primary and companion repositories. Use autoresearch lessons --workspace-context --last 5 from any managed repo to show the shared workspace lessons path and repo targets.
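Package-root inference for `scope expand` can be approximated by searching for the four manifest files named above. This sketch assumes every directory containing one of them is a package root, and it skips vendored or build directories (`node_modules`, `target`, `.git`). The skip list and `find_package_roots` are illustrative, not the binary's rules.

```python
from pathlib import Path

MANIFESTS = ("Cargo.toml", "package.json", "pyproject.toml", "go.mod")
# Assumed skip list for dependency and build output directories.
SKIP_DIRS = ("node_modules", "target", ".git")


def find_package_roots(repo: Path) -> list[Path]:
    """Return sorted, de-duplicated directories that contain a known manifest."""
    roots = set()
    for manifest in MANIFESTS:
        for path in repo.rglob(manifest):
            rel = path.relative_to(repo)
            if any(part in SKIP_DIRS for part in rel.parts):
                continue
            roots.add(path.parent)
    return sorted(roots)
```

A repo with `Cargo.toml` at the top, `web/package.json`, and `tools/pyproject.toml` yields the root, `tools`, and `web`. Any `package.json` under `web/node_modules` is ignored.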
Use autoresearch plugin list and autoresearch plugin validate --path <file> to load local TOML mode plugin manifests with command safety screening. Use autoresearch plugin marketplace to validate .autoresearch/plugins/marketplace.toml and every referenced community mode manifest before installing or sharing it.

Use ./install.sh --yes --vscode to install the lightweight VS Code package from integrations/vscode; it opens status --summary, dashboard --once, and watch --format jsonl from editor commands.

Codex packages keep .agents/skills/autoresearch/SKILL.md as a thin router and load references/binary-operations.md only when native command details are needed.

Use .github/actions/autoresearch in GitHub Actions to run exec mode with a checked-in goal, scope, metric, and verify command.

steps:
  - uses: actions/checkout@v4
  - uses: ./.github/actions/autoresearch
    with:
      goal: Reduce lint failures
      scope: '["src/**/*.rs", "tests/**/*.rs"]'
      metric: lint failure count
      verify: cargo clippy --all-targets --all-features -- -D warnings 2>&1 | tail -1
      direction: lower
      iterations: "3"

Project Defaults

autoresearch init reads .autoresearch.toml from the workspace root when present. CLI flags override file values.

goal = "Reduce failing tests"
scope = ["src/**/*.rs", "tests/**/*.rs"]
metric = "failing test count"
direction = "lower"
verify = "cargo test 2>&1 | tail -1"
guard = "cargo fmt -- --check"
iterations = 25
run_tag = "nightly"

Run with defaults:

autoresearch init

Generate a starter file and validate it:

autoresearch config template --output .autoresearch.toml
autoresearch config validate

Run Artifacts

All run state lives under autoresearch-results/:

results.tsv
state.json
context.json
lessons.md
handoff.json
launch.json
runtime.json
runtime.log

Do not commit autoresearch-results/ or .codex-autoresearch/.

Detailed Guides