Documentation
Start here when browsing the repository documentation directly.
| Doc | Purpose |
|---|---|
| Installation | Install paths for Claude Code, Codex, OpenCode, and source builds |
| Guide | Command map, binary operations, artifacts, and guide links |
| Examples | Copy-paste configs for common autoresearch goals |
| System Architecture | Binary, agent package, runtime, and artifact architecture |
| Project Changelog | Release history entrypoint and current development track |
| Detailed Changelog | Versioned release notes |
| Development Roadmap | Current and planned runtime, search, MCP, workspace, and release work |
The documentation set also builds as an mdBook site from book.toml and docs/SUMMARY.md.
Installation
Autoresearch ships as a Rust binary plus agent-specific skill or command packages.
Agent-Driven Install
The primary path is to give this prompt to the agent that should use Autoresearch:
Install Autoresearch in this environment.
Use the installer from:
https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh
Pick the install flag for the current agent:
- Claude Code: --claude
- Codex: --codex
- OpenCode: --opencode
- If you cannot infer the agent, use --all.
Run the installer non-interactively with bash, verify `autoresearch --help`, then tell me the command I should use to start Autoresearch in this agent.
Start commands are `/autoresearch` for Claude Code, `$autoresearch` for Codex, and `/autoresearch` for OpenCode.
Use a global install unless I explicitly asked for a project-local install.
Raw GitHub Installer
Use the raw installer when you want the source build plus agent package without cloning first. Pick the exact command for your agent:
Claude Code:
curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --claude
Codex:
curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --codex
OpenCode:
curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --opencode
All packages:
curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --all
The raw script downloads a source archive, builds the Rust binary, and runs the same install.sh used by a local clone. Set AUTORESEARCH_INSTALL_REF to use a different branch, AUTORESEARCH_INSTALL_REPO to use a fork, or AUTORESEARCH_INSTALL_ARCHIVE_URL to provide an explicit archive URL.
Pre-Built Binaries
Tagged releases publish .tar.gz archives for Linux x86_64, Linux aarch64, macOS x86_64, macOS aarch64, and Windows x86_64. Download the archive for your platform from the GitHub release, verify the adjacent .sha256 file, and place the autoresearch binary on your PATH.
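The checksum step can be scripted. The sketch below assumes the .sha256 file stores the hex digest as its first whitespace-separated token (the usual sha256sum layout); the function names are illustrative and not part of the release tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(archive: Path, checksum_file: Path) -> bool:
    """Compare the archive digest with the first token of the checksum file."""
    # Assumption: the checksum file looks like "<hex digest>  <file name>".
    expected = checksum_file.read_text().split()[0].lower()
    return sha256_of(archive) == expected
```

Place the binary on your PATH only after `matches_checksum` returns True for the downloaded archive.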
Cargo Binstall
Cargo.toml includes cargo-binstall metadata for the same target-named release archives:
cargo binstall autoresearch
Homebrew
Homebrew tap maintainers can render packaging/homebrew/autoresearch.rb.template with the release version and SHA-256 values from GitHub release assets, then publish it as Formula/autoresearch.rb in a tap.
Claude Code
curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --claude
This builds autoresearch, installs it on your PATH, and installs the Claude Code plugin hooks.
If the binary is already installed:
claude plugin add coder-company/agent-autoresearch
Restart Claude Code after installing the plugin.
Manual local Claude package:
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
# From the project where you want local commands/skills:
mkdir -p /path/to/project/.claude
cp -R .claude/commands /path/to/project/.claude/commands
cp -R .claude/skills/autoresearch /path/to/project/.claude/skills/autoresearch
The .claude/ package is generated from the same canonical command and reference files as the plugin package.
Codex
curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --codex
This builds autoresearch, installs it on your PATH, and installs the Codex skill package.
Then start Codex from your project and invoke:
$autoresearch
For the smoothest foreground and background runs, start Codex with full workspace access:
codex --dangerously-bypass-approvals-and-sandbox
If you only want the Codex skill package and not the source-built binary:
$skill-installer install https://github.com/coder-company/agent-autoresearch
For a project-local Codex skill install, run the raw installer from the target project:
curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --codex --local
That installs to ./.codex/skills/autoresearch in the current project. Use --global for the default user-wide target, or --codex-dir for an explicit destination.
From a local clone:
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --codex
The installer copies .agents/skills/autoresearch/ and validates the target path before replacing the installed skill directory.
Project-local install from a local clone:
/path/to/agent-autoresearch/install.sh --yes --codex --local
To install the local Codex plugin package through the installer:
./install.sh --yes --codex-plugin
Local plugin package:
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
codex plugin marketplace add .agents/plugins/marketplace.json
codex plugin install autoresearch@autoresearch-local
The marketplace entry points at plugins/autoresearch/, which packages the same maintained Codex skill plus plugin metadata.
OpenCode
curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --opencode
This builds autoresearch, installs it on your PATH, and installs the OpenCode command and skill package.
Start OpenCode from your project and invoke:
/autoresearch
OpenCode mode commands use underscore names such as /autoresearch_debug, /autoresearch_fix, and /autoresearch_security.
The package also installs the hidden docs-manager helper agent for focused documentation updates.
For a project-local OpenCode install, run the raw installer from the target project:
curl -fsSL https://raw.githubusercontent.com/coder-company/agent-autoresearch/main/install.sh | bash -s -- --yes --opencode --local
That installs to ./.opencode in the current project. Use --global for the default user-wide target, or --opencode-dir for an explicit OpenCode config root.
Project-local install from a local clone:
/path/to/agent-autoresearch/install.sh --yes --opencode --local
The installer refuses empty, home, and parent config paths before replacing skills/autoresearch.
VS Code
./install.sh --yes --vscode
That copies integrations/vscode into your VS Code extensions directory and keeps the installed extension delegated to the autoresearch binary on PATH. Use --vscode-dir for an explicit extensions directory.
From Source
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --all
Use ./install.sh without flags for the guided installer.
Verify The Install
autoresearch --help
autoresearch screen --command "npm test"
autoresearch completions zsh >/tmp/_autoresearch
autoresearch manpages --output-dir /tmp/autoresearch-manpages
autoresearch config template >/tmp/autoresearch.toml
autoresearch config validate --path /tmp/autoresearch.toml
For repository contributors:
./scripts/validate_distribution.sh
./scripts/run_skill_e2e.sh binary-smoke --clean
./scripts/run_skill_e2e.sh runtime-smoke --clean
./scripts/run_skill_e2e.sh parallel-smoke --clean
./scripts/run_contributor_gate.sh
See Getting Started and Codex usage for first-run examples.
Guide
Autoresearch is a loop controller for agents: define a measurable goal, modify one thing, verify mechanically, keep or discard, and repeat.
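The loop described above can be pictured with a small model. This is a sketch under stated assumptions, not the binary's implementation: `run_loop` and `is_improvement` are hypothetical names, and each experiment is represented by a callable that returns the verified metric.

```python
from typing import Callable, Iterable, List, Tuple


def is_improvement(candidate: float, best: float, direction: str) -> bool:
    """True when the candidate metric beats the current best."""
    if direction == "lower":
        return candidate < best
    if direction == "higher":
        return candidate > best
    raise ValueError(f"unknown direction: {direction!r}")


def run_loop(
    baseline: float,
    experiments: Iterable[Tuple[str, Callable[[], float]]],
    direction: str,
) -> Tuple[float, List[Tuple[str, float, str]]]:
    """Try each experiment once and keep it only if the metric improves."""
    best = baseline
    log = []
    for description, verify in experiments:
        metric = verify()  # mechanical verification: run a command, read a number
        if is_improvement(metric, best, direction):
            best = metric
            log.append((description, metric, "keep"))
        else:
            # A real run would also revert the change here.
            log.append((description, metric, "discard"))
    return best, log
```

Because every decision compares one number against the best so far, the loop needs no judgment from the agent about whether a change "felt" better.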
Core Commands
| Need | Use |
|---|---|
| Improve a metric | /autoresearch or $autoresearch |
| Pick a metric from a vague goal | /autoresearch:plan or $autoresearch plan |
| Find a root cause | /autoresearch:debug or $autoresearch debug |
| Reduce errors to zero | /autoresearch:fix or $autoresearch fix |
| Run a security audit | /autoresearch:security or $autoresearch security |
| Ship through gates | /autoresearch:ship or $autoresearch ship |
| Analyze prior runs | /autoresearch:evals or $autoresearch evals |
Binary Operations
The agent-facing protocols delegate stateful work to the autoresearch binary:
autoresearch init --verify "cat metric.txt" --direction lower
autoresearch verify --command "cat metric.txt"
autoresearch verify --command "cat metric.txt" --repeat 3 --aggregate median
autoresearch plan --goal "reduce any types" --format json
autoresearch debug --symptom "API returns 500" --scope "src/**/*.rs"
autoresearch fix --target "npx tsc --noEmit" --scope "src/**/*.ts" --category type
autoresearch improve --goal "Improve onboarding activation" --icp "Developer tools teams"
autoresearch prd --title "Improve onboarding" --problem "New users stall before first run"
autoresearch security --scope "src/**/*.rs" --focus auth
autoresearch ship --target "Release v1.2.0" --type code-release --dry-run
autoresearch scenario --target "Checkout flow" --domain web --format test-scenarios --scope "src/checkout/**"
autoresearch predict --proposal "Add cache warming to search results" --scope "src/search/**"
autoresearch predict --proposal "Find product improvements for onboarding" --scope "src/**" --improve
autoresearch reason --question "Should we replace the storage layer" --mode debate --domain software
autoresearch probe --subject "Payment retry workflow" --scope "src/payments/**"
autoresearch probe --subject "Onboarding activation workflow" --scope "src/**" --improve
autoresearch learn --mode summarize --scope "src/**/*.rs"
autoresearch decide --decision auto --metric 4 --commit abc1234 --description "improved"
autoresearch status --summary
autoresearch progress
autoresearch cost --per-iteration-usd 0.25 --format json
autoresearch dashboard --once
autoresearch health --strict
autoresearch env --format json
autoresearch init --environment-summary auto --verify "cat metric.txt" --direction lower
autoresearch checkpoint --format json
autoresearch reanchor --format json
autoresearch watch --lines 20 --format jsonl
autoresearch watch --websocket --websocket-addr 127.0.0.1:8765
autoresearch lessons --add "Prefer fixture-level assertions" --context "reduced flaky tests"
autoresearch search --from-state --provider-command 'exa "$AUTORESEARCH_SEARCH_QUERY"' --log
autoresearch parallel prepare --workers 3
autoresearch parallel run --manifest autoresearch-results/parallel-manifest.json --timeout-seconds 1200
autoresearch parallel template --workers 3 --output autoresearch-results/parallel-workers.json
autoresearch parallel compare --a "simplify parser" --b "cache scan results"
autoresearch parallel closeout --batch-file autoresearch-results/parallel-workers.json --merge-strategy cherry-pick
autoresearch parallel cleanup --manifest autoresearch-results/parallel-manifest.json
autoresearch evals --file autoresearch-results/results.tsv --format json --recommend --plateau-window 5 --target 90 --fail-on goal-not-met --chain ship
autoresearch evals --file autoresearch-results/results.tsv --compare autoresearch-results/previous-results.tsv --format json
autoresearch api --format json
autoresearch mcp serve
autoresearch mcp call --server-command "autoresearch mcp serve" --tool autoresearch_status
autoresearch scope expand --format json
autoresearch workspace exec --command "cargo test" --rollback-on-failure
autoresearch guard-presets --format json
autoresearch lessons --workspace-context --last 5
autoresearch plugin list
autoresearch plugin validate --path .autoresearch/plugins/example.toml
autoresearch plugin marketplace
autoresearch completions zsh > ~/.zfunc/_autoresearch
Use autoresearch runtime run for supervised background Codex sessions and autoresearch runtime status / autoresearch runtime stop for control.
Use autoresearch env --format json to capture CPU, disk, container, toolchain, and recommended parallel-worker context before planning long or parallel runs; pass --environment-summary auto to init to persist that probe summary in results.tsv.
Use autoresearch status --summary for compact monitor-friendly counters.
Use autoresearch progress for the current metric, trend, counters, escalation state, and terminal metric history sparkline.
Use autoresearch verify --repeat <n> --aggregate <median|mean|min|max|last> for noisy scalar metrics; repeated verification returns the aggregate metric plus the raw samples.
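To see why an aggregate helps with noisy metrics, here is a minimal sketch of the five aggregate modes named above. The `aggregate` function is illustrative only; how the binary computes each mode internally is not documented here.

```python
import statistics
from typing import Sequence


def aggregate(samples: Sequence[float], how: str) -> float:
    """Collapse repeated verify samples into one metric."""
    if not samples:
        raise ValueError("at least one sample is required")
    reducers = {
        "median": statistics.median,
        "mean": statistics.fmean,
        "min": min,
        "max": max,
        "last": lambda values: values[-1],
    }
    if how not in reducers:
        raise ValueError(f"unknown aggregate: {how!r}")
    return float(reducers[how](samples))


samples = [12.0, 9.0, 30.0]  # one noisy outlier
print(aggregate(samples, "median"))  # 12.0
print(aggregate(samples, "mean"))    # 17.0
```

The median ignores the single outlier, while the mean is pulled toward it; that difference is the usual reason to prefer median for flaky measurements.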
Use autoresearch cost --per-iteration-usd <usd> or token/rate flags to estimate completed, remaining, and projected run spend.
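For the flat per-iteration rate, the estimate reduces to simple arithmetic. The sketch below assumes a fixed planned iteration count; the function and key names are hypothetical and do not mirror the command's JSON output.

```python
def estimate_cost(completed: int, planned: int, per_iteration_usd: float) -> dict:
    """Estimate spend so far, spend still to come, and the projected total."""
    remaining = max(planned - completed, 0)
    spent = completed * per_iteration_usd
    to_go = remaining * per_iteration_usd
    return {
        "completed_usd": spent,
        "remaining_usd": to_go,
        "projected_usd": spent + to_go,
    }


# 10 of 25 iterations done at $0.25 each.
print(estimate_cost(completed=10, planned=25, per_iteration_usd=0.25))
```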
Use autoresearch dashboard --once for a combined terminal view of status, trend, metric history, escalation, and recent rows; omit --once for live refresh.
Use autoresearch checkpoint --format json inside long loops to run evals only when the active iteration reaches the configured or adaptive checkpoint interval.
Use autoresearch reanchor --format json every 10 iterations or after context compaction to print the protocol fingerprint, reload references, and [RE-ANCHOR] logging tag.
Use autoresearch watch --format <tsv|jsonl> for human-readable tails or machine-readable JSON Lines.
Use autoresearch watch --websocket --websocket-addr <host:port> to serve snapshot and row update payloads to real-time dashboards. Add --once to print the initial WebSocket snapshot envelope without starting a server.
Use autoresearch lessons --add <strategy> --context <note> to append reusable lessons without editing lessons.md by hand.
Use autoresearch search --from-state with --provider-command or AUTORESEARCH_SEARCH_CMD to run cached, run-aware web searches. Add --log to append a search meta-iteration. When decide escalates to Web Search, it automatically runs the same cached helper with AUTORESEARCH_SEARCH_CMD and logs the result when timing/cooldown limits allow it.
Use autoresearch parallel closeout --merge-strategy <cherry-pick|fast-forward|squash|rebase> to select how the retained worker commit is merged.
Use autoresearch parallel compare --a <hypothesis> --b <hypothesis> to prepare a two-arm A/B batch that reuses parallel run and verified parallel closeout.
Use autoresearch evals --file <path> --format json --recommend --plateau-window 5 --target <metric-threshold> --fail-on goal-not-met --chain ship after parallel closeout to include worker improvement counts, a sign-test summary, anomaly detection, goal-achieved status, CI-friendly exit gating, next-step guidance, and downstream handoff metadata.
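As background for the sign-test summary, the following sketch shows a standard two-sided exact sign test. It assumes the inputs are counts of workers that improved or regressed against the baseline, with ties already dropped; which rows the binary actually feeds into its test is not specified here.

```python
from math import comb


def sign_test_p_value(wins: int, losses: int) -> float:
    """Two-sided exact sign test p-value under a fair-coin null hypothesis."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(sign_test_p_value(5, 0))  # 0.0625
print(sign_test_p_value(3, 3))  # 1.0
```

Even five improvements out of five give p = 0.0625, so small worker batches rarely reach conventional significance on their own.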
Use autoresearch evals --file <path> --compare <other-results.tsv> --format json to compare run improvement, efficiency, and plateau length before choosing the next strategy.
Use autoresearch completions <bash|zsh|fish|elvish|powershell> to generate shell completions.
Use autoresearch manpages --output-dir man/man1 to generate a local autoresearch.1 manual page.
Use autoresearch config template --output .autoresearch.toml to write a starter project defaults file.
Use autoresearch config validate to parse defaults, validate options, and screen configured commands without running them.
Use autoresearch plan --goal <goal> --format json to get a launch-ready suggested scope, metric, direction, verify, guard, and iteration count from detected repo tooling.
Use autoresearch plan --goal <goal> --debug to write the derived config into a downstream debug handoff.
Native artifact generators default to ignored autoresearch-results/<mode>/ paths; pass --output or --output-dir only when you intentionally want a different artifact location.
Use autoresearch debug --symptom <failure> --scope <glob> to write a hypothesis-driven investigation bundle with summary, findings, eliminated hypotheses, TSV, and handoff JSON.
Add --fix or --chain <targets> to autoresearch debug to record downstream chain metadata in the debug handoff.
Use autoresearch debug --depth deep --iterations 12 --severity high to override the investigation budget and record severity filter metadata.
Use autoresearch fix --target <verify-command> --scope <glob> --iterations 7 to write a repair-plan bundle under autoresearch-results/fix with priority order, results TSV, iteration budget, and handoff JSON.
Use autoresearch fix --from-debug to import the latest debug handoff scope, symptom, and finding count into the repair plan.
Use autoresearch fix --learn --evals to record downstream learn handoff and checkpoint propagation metadata.
Use autoresearch improve --goal <product-area> --icp <persona> to write an improve-mode artifact bundle: research findings, ranked plan, summary, TSV, and handoff JSON.
Use autoresearch improve --goal <product-area> --icp <persona> --depth deep --iterations 24 --evals to override the research budget and record active category count plus checkpoint metadata.
Use autoresearch improve --goal <product-area> --seeds 5 --no-discover --learn to record seed volume, discovery posture, and downstream learn handoff metadata.
Use autoresearch prd --title <title> --problem <problem> to write a focused improve-mode PRD with DECISION NEEDED markers, acceptance criteria, risks, success metrics, and an autoresearch config block.
Use autoresearch security --scope <glob> --focus <area> to write a STRIDE + OWASP audit bundle with overview, threat model, attack surface, coverage, findings, recommendations, TSV, and handoff JSON.
Add --fail-on <severity> and --fix to autoresearch security to record CI gate and repair-chain metadata for confirmed findings.
Use autoresearch security --scope <glob> --depth deep --iterations 18 --diff --fix --evals to override the audit budget and record delta mode, downstream fix handoff, and checkpoint metadata.
Use autoresearch ship --target <thing> --type <kind> --dry-run to write an 8-phase ship checklist, summary, ship log, and handoff JSON without external side effects.
Use autoresearch ship --target <thing> --auto --force --rollback --monitor 15 --learn to record approval, rollback, monitoring, and downstream learn handoff metadata.
Use autoresearch scenario --target <feature> --domain <general|web|mobile|api|cli|data-pipeline|infrastructure> --format <test-scenarios|threat-scenarios|use-cases|user-stories> to write a 12-dimension scenario matrix for tests, threat modeling, or debug follow-up.
Use autoresearch scenario --target <feature> --domain web --depth deep --iterations 16 --evals --debug to override the exploration budget and record domain, checkpoint metadata, and downstream debug handoff.
Use autoresearch predict --proposal <change> to write a five-persona review covering architecture, security, performance, UX, and adversarial risks.
Use autoresearch predict --proposal <change> --depth deep --adversarial --fail-on high to record review profile and CI gate metadata.
Use autoresearch predict --proposal <change> --debug to record the review as handoff context for downstream investigation.
Use autoresearch predict --proposal <product-area> --improve to pass expert findings into product improvement research.
Use autoresearch reason --question <decision> to write an adversarial debate artifact with candidate solutions, blind judge rubric, and convergence criteria.
Use autoresearch reason --question <decision> --predict to pass the selected debate context into downstream review.
Use autoresearch reason --question <decision> --iterations 11 --judges 7 --convergence 4 --temperature 0.2 to record debate budget, judge panel, convergence, synthesis, and generation hints.
Use autoresearch probe --subject <requirement> to write eight persona-driven questions, constraint slots, and a saturation rule before implementation.
Use autoresearch probe --subject <requirement> --mode autonomous --depth deep --iterations 9 --adversarial to override the interrogation round budget and record saturation metadata.
Use autoresearch probe --subject <requirement> --plan to pass discovered constraints into planning through handoff metadata.
Use autoresearch probe --subject <product-area> --improve to pass discovered constraints into product improvement research.
Use autoresearch learn --mode <init|update|check|summarize> --scope <glob> to write documentation summary, validation, TSV, and handoff artifacts.
Use autoresearch learn --mode check --file <path> --depth overview --iterations 14 --topics architecture,api --no-fix --evals to record learn profile, specific-file scope, validation behavior, chain, and checkpoint metadata.
Use autoresearch api --format json to inspect the stable command/flag manifest and semver policy used by wrappers and agents.
Use autoresearch mcp serve as a stdio MCP server exposing read-only autoresearch_status and autoresearch_watch_snapshot tools.
Use autoresearch mcp call --server-command <cmd> --tool <name> --arguments '{}' to call a tool on an external stdio MCP server from an iteration script.
Use autoresearch scope expand --format json to resolve active primary and companion repo scopes, with package roots inferred from Cargo.toml, package.json, pyproject.toml, and go.mod.
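Package-root inference of this kind is often done by walking up from a file until a manifest appears. The sketch below uses that nearest-ancestor rule as an assumption; `find_package_root` is a hypothetical helper, and the binary's actual resolution order may differ.

```python
from pathlib import Path
from typing import Optional

MANIFESTS = ("Cargo.toml", "package.json", "pyproject.toml", "go.mod")


def find_package_root(start: Path, stop_at: Path) -> Optional[Path]:
    """Walk upward from start and return the nearest directory holding a manifest."""
    stop_at = stop_at.resolve()
    current = (start if start.is_dir() else start.parent).resolve()
    while True:
        if any((current / name).is_file() for name in MANIFESTS):
            return current
        # Give up at the repository boundary or the filesystem root.
        if current == stop_at or current.parent == current:
            return None
        current = current.parent
```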
Use autoresearch workspace exec --command <cmd> --rollback-on-failure to run one screened command across primary and companion repo targets, restoring attempted repos if any target fails.
Use autoresearch guard-presets --format json to suggest per-repo guard commands for primary and companion repositories.
Use autoresearch lessons --workspace-context --last 5 from any managed repo to show the shared workspace lessons path and repo targets.
Use autoresearch plugin list and autoresearch plugin validate --path <file> to load local TOML mode plugin manifests with command safety screening.
Use autoresearch plugin marketplace to validate .autoresearch/plugins/marketplace.toml and every referenced community mode manifest before installing or sharing it.
Use ./install.sh --yes --vscode to install the lightweight VS Code package from integrations/vscode; it opens status --summary, dashboard --once, and watch --format jsonl from editor commands.
Codex packages keep .agents/skills/autoresearch/SKILL.md as a thin router and load references/binary-operations.md only when native command details are needed.
Use .github/actions/autoresearch in GitHub Actions to run exec mode with a checked-in goal, scope, metric, and verify command.
steps:
- uses: actions/checkout@v4
- uses: ./.github/actions/autoresearch
with:
goal: Reduce lint failures
scope: '["src/**/*.rs", "tests/**/*.rs"]'
metric: lint failure count
verify: cargo clippy --all-targets --all-features -- -D warnings 2>&1 | tail -1
direction: lower
iterations: "3"
Project Defaults
autoresearch init reads .autoresearch.toml from the workspace root when present.
CLI flags override file values.
goal = "Reduce failing tests"
scope = ["src/**/*.rs", "tests/**/*.rs"]
metric = "failing test count"
direction = "lower"
verify = "cargo test 2>&1 | tail -1"
guard = "cargo fmt -- --check"
iterations = 25
run_tag = "nightly"
Run with defaults:
autoresearch init
Generate a starter file:
autoresearch config template --output .autoresearch.toml
autoresearch config validate
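The precedence rule (CLI flags override file values) can be modeled as a two-layer merge. This is an illustrative sketch: `resolve_config` is a hypothetical name, and `None` stands in for a flag that was not passed.

```python
from typing import Any, Dict, Optional


def resolve_config(
    file_values: Dict[str, Any],
    cli_flags: Dict[str, Optional[Any]],
) -> Dict[str, Any]:
    """Start from file values, then let every CLI flag that was passed win."""
    resolved = dict(file_values)
    for key, value in cli_flags.items():
        if value is not None:  # None means the flag was not given
            resolved[key] = value
    return resolved


file_values = {"direction": "lower", "iterations": 25, "verify": "cargo test 2>&1 | tail -1"}
cli_flags = {"iterations": 5, "verify": None}
print(resolve_config(file_values, cli_flags))
```

Here only iterations changes, because verify was not supplied on the command line and so falls back to the file value.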
Run Artifacts
All run state lives under autoresearch-results/:
results.tsv
state.json
context.json
lessons.md
handoff.json
launch.json
runtime.json
runtime.log
Do not commit autoresearch-results/ or .codex-autoresearch/.
Detailed Guides
- Docs Index
- Getting Started
- Examples
- System Architecture
- Project Changelog
- Core Loop
- Codex
- OpenCode
- Examples by Domain
- Chains & Combinations
- Advanced Patterns
- Hooks
- Full Guide Index
Autoresearch Examples
Copy one block into your agent prompt after installing Autoresearch. Adjust the scope and commands to match your project.
TypeScript: Remove any
/autoresearch
Goal: Remove all explicit any usage
Scope: src/**/*.ts src/**/*.tsx
Metric: explicit any count
Direction: lower
Verify: rg -n ": any| as any|<any>" src 2>/dev/null | wc -l
Guard: npm test && npm run typecheck
Iterations: 30
Python: Raise Coverage
/autoresearch
Goal: Raise test coverage to 90%
Scope: src/**/*.py tests/**/*.py
Metric: coverage percent
Direction: higher
Verify: pytest --cov=src --cov-report=term | awk '/TOTAL/ {gsub("%", "", $4); print $4}'
Guard: pytest
Iterations: 25
Rust: Reduce Clippy Warnings
/autoresearch
Goal: Reduce clippy warnings to zero
Scope: src/**/*.rs tests/**/*.rs
Metric: clippy warning count
Direction: lower
Verify: cargo clippy --message-format short 2>&1 | tee /tmp/autoresearch-clippy.txt >/dev/null; rg -c "warning:" /tmp/autoresearch-clippy.txt || echo 0
Guard: cargo test
Iterations: 20
Web App: Shrink Bundle
/autoresearch
Goal: Reduce production JavaScript bundle size
Scope: src/**/* package.json vite.config.* webpack.config.*
Metric: bundle bytes
Direction: lower
Verify: npm run build -- --json > /tmp/autoresearch-stats.json && node -e "const s=require('/tmp/autoresearch-stats.json'); console.log(s.assets.filter(a => a.name.endsWith('.js')).reduce((n, a) => n + a.size, 0))"
Guard: npm test
Iterations: 20
API: Lower Latency
/autoresearch
Goal: Lower p95 latency for the health endpoint
Scope: src/**/* routes/**/* handlers/**/*
Metric: p95 latency milliseconds
Direction: lower
Verify: hey -z 30s -c 10 http://localhost:3000/health | awk '/95%/ {print $3 * 1000}'
Guard: npm test
Iterations: 15
Parallel Experiments
Use this when several hypotheses are plausible and the run has enough CPU, RAM, and disk for isolated worker worktrees:
autoresearch parallel prepare --workers 3
autoresearch parallel run --manifest autoresearch-results/parallel-manifest.json --timeout-seconds 1200
# Fill in each worker metric, guard status, commit, and description.
autoresearch parallel closeout --batch-file autoresearch-results/parallel-workers.json
autoresearch parallel cleanup --manifest autoresearch-results/parallel-manifest.json
More domain-specific examples are in Examples by Domain.
Autoresearch — Product Design Review
Problem Statement
AI coding agents (Claude Code, Codex CLI, Cursor, etc.) need autonomous iteration to improve codebases against measurable metrics. Today, agents do one of three things:
- Ask after every change — breaking flow, requiring human attention for mechanical decisions
- Use heavyweight orchestration — Python/Node scripts with complex dependency chains, slow startup, runtime dependencies
- Have no memory across turns — repeat failed experiments, lose context on compaction
There is no lightweight, compiled infrastructure that gives agents a tight modify→verify→keep/discard loop with git as memory, automatic rollback on failure, and escalation when stuck.
Solution
A single compiled Rust binary (about 3MB) that provides:
- Hook handler: sub-5ms responses for Claude Code’s plugin hook system (PreToolUse, PostToolUse, UserPromptSubmit, Stop, etc.)
- CLI operations: `init`, `verify`, `guard`, `decide`, `resume`, `health`, `progress`, `watch`, `lessons`, `handoff`, `exec`, plus `runtime run/start/status/supervise/stop` and `parallel prepare/run/closeout/cleanup`
- Agent packages: Claude plugin commands, Codex `.agents` skill/plugin package, OpenCode command/skill/helper-agent package, and shared markdown protocols for iteration loops, security audits, debugging, shipping, product improvement research, and more
The binary handles the mechanical infrastructure. The agent handles the intelligence. Clean separation.
Target Users
| User | Integration |
|---|---|
| Claude Code users | Installer builds the binary and installs the plugin hooks |
| Codex CLI users | $skill-installer skill plus local .agents/plugins/marketplace.json plugin package |
| OpenCode users | Generated .opencode/ commands, skill, and helper agent |
| Any LLM agent | CLI called directly, skill markdown parsed by agent |
Architecture
┌─────────────────┐     ┌──────────────┐     ┌───────────────┐
│ Agent (Claude/  │────▶│ autoresearch │────▶│ Git repo      │
│ Codex/other)    │     │ binary       │     │ (experiments) │
└─────────────────┘     └──────────────┘     └───────────────┘
         │                     │
         │ reads               │ writes
         ▼                     ▼
┌─────────────────┐     ┌───────────────────────┐
│ SKILL.md /      │     │ autoresearch-results/ │
│ commands/*.md   │     │ ├── results.tsv       │
│ agent packages  │     │ ├── state.json        │
└─────────────────┘     │ ├── context.json      │
                        │ ├── lessons.md        │
                        │ ├── handoff.json      │
                        │ ├── launch.json       │
                        │ ├── runtime.json      │
                        │ └── runtime.log       │
                        └───────────────────────┘
Key Metrics
| Metric | Target | Rationale |
|---|---|---|
| Hook response latency | <5ms p99 | Hooks fire on every tool use; must be invisible |
| Binary size | <5MB | Single-file distribution, no extraction needed |
| Runtime dependencies | Zero | No Node, Python, Docker. Just the binary. |
| Cold start | <10ms | First invocation must feel instant |
| Memory usage | <5MB RSS | Runs alongside the agent, not competing for resources |
Non-Goals
- Not a replacement for the agent itself: the binary doesn’t make decisions about what to change. It handles verification, logging, rollback, and state management.
- Not a CI/CD system: it runs locally alongside the agent. The `exec` mode supports CI but is not a pipeline orchestrator.
- Not a test framework: it calls your existing test/lint/build commands and parses their output.
- Not a package manager: it doesn’t manage dependencies, just detects dangerous ones during security audits.
Modes
| Mode | Purpose |
|---|---|
| Core loop | Iterate against any numeric metric |
| Debug | Scientific bug hunting with hypothesis testing |
| Fix | Crush errors one-by-one until zero |
| Security | STRIDE + OWASP audit with red-team personas |
| Scenario | Edge case generation across 12 dimensions |
| Predict | Multi-persona expert debate |
| Reason | Adversarial refinement with blind judges |
| Probe | Requirements interrogation until saturation |
| Learn | Auto-generate documentation |
| Ship | 8-phase ship workflow |
| Improve | Research ICP needs and generate product improvement PRDs |
| Evals | Analyze iteration results |
| Plan | Convert goal → validated config |
Success Criteria
- Agent can iterate 25+ times without human intervention
- Failed experiments are automatically reverted (zero pollution)
- Cross-session memory via lessons.md survives compaction
- Hook latency is imperceptible to the agent/user
- Background `autoresearch runtime run` can relaunch Codex turns without corrupting artifacts
- Parallel worker closeout produces one authoritative retained result after verification
- Installation is one command for Claude, Codex, and OpenCode paths
Architecture
Binary Dual-Use
The autoresearch binary serves two roles from the same executable:
- CLI tool: direct invocation via `autoresearch init`, `autoresearch decide`, etc.
- Hook handler: invoked by the Claude Code plugin system via `autoresearch hook <name>`
The entry point in main.rs dispatches through clap subcommands. Hook mode parses
the hook name and delegates to the corresponding handler in src/hooks/.
╭──────────────╮     ╭──────────────────╮     ╭───────────────╮
│ Agent call   │────▶│ autoresearch     │────▶│ CLI dispatch  │
│ (or hook)    │     │ binary           │     │ (clap)        │
╰──────────────╯     ╰──────────────────╯     ╰───────┬───────╯
                                                      │
                     ╭────────────────────────────────┼────────────────╮
                     │                                │                │
                     ▼                                ▼                ▼
              ╭──────────────╮                 ╭──────────────╮ ╭─────────────╮
              │ CLI command  │                 │ Hook handler │ │ Exec mode   │
              │ (init/log/   │                 │ (scout,      │ │ (CI/CD      │
              │ decide/..)   │                 │ stop,..)     │ │ JSON-line)  │
              ╰──────┬───────╯                 ╰──────┬───────╯ ╰──────┬──────╯
                     │                                │                │
                     ▼                                ▼                ▼
              ╭──────────────────────────────────────────────────────────────╮
              │                          src/core/                           │
              │      config.rs  state.rs  results.rs  git.rs  verify.rs      │
              ╰──────────────────────────────────────────────────────────────╯
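The dual-use dispatch can be illustrated with a small model. The real binary is Rust and uses clap; the Python sketch below only mirrors the shape of the routing, and every handler here is a hypothetical stand-in.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for CLI subcommands and hook handlers.
COMMANDS: Dict[str, Callable[[List[str]], str]] = {
    "init": lambda args: "cli:init",
    "decide": lambda args: "cli:decide",
}
HOOKS: Dict[str, Callable[[List[str]], str]] = {
    "stop_check": lambda args: "hook:stop_check",
    "dangerous_cmd": lambda args: "hook:dangerous_cmd",
}


def dispatch(argv: List[str]) -> str:
    """Route `hook <name>` to a hook handler and anything else to a CLI command."""
    if not argv:
        raise ValueError("missing subcommand")
    command, rest = argv[0], argv[1:]
    if command == "hook":
        if not rest or rest[0] not in HOOKS:
            raise ValueError("unknown or missing hook name")
        return HOOKS[rest[0]](rest[1:])
    if command in COMMANDS:
        return COMMANDS[command](rest)
    raise ValueError(f"unknown subcommand: {command!r}")
```

Keeping both roles behind one entry point means a single installed executable serves the agent's direct calls and the plugin's hook calls.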
Module Breakdown
src/core/ — Shared foundation
| File | Purpose |
|---|---|
| `config.rs` | RunConfig, Direction, Mode, VerifyFormat, RollbackStrategy |
| `state.rs` | RunState, RunPhase (state machine), IterationStatus, StopReason |
| `results.rs` | ResultRow, ResultsLog (TSV append/read), completion summary |
| `git.rs` | GitRepo wrapper around libgit2: HEAD, revert, reset, worktree status |
| `verify.rs` | Run verify/guard commands, parse scalar or JSON output, screen for danger |
| `metrics.rs` | Metric parsing utilities, decimal handling |
src/hooks/ — Claude Code hook handlers
Each hook is a function that reads minimal state, makes a decision, and prints output. Hooks must complete in <5ms. No network calls, no heavy I/O.
| Hook | Fires on | Purpose |
|---|---|---|
| `session_init` | Session start | Detect interrupted runs, load state |
| `session_end` | Session end | Write final state, cleanup |
| `iteration_context` | UserPromptSubmit | Inject iteration number + last result |
| `stop_check` | Stop | Check if iteration cap reached |
| `scout_block` | PreToolUse: Write/Edit/MultiEdit/Bash/Glob/Grep/Read | Block generated/vendor/sensitive paths, Bash reads, and out-of-scope writes |
| `dangerous_cmd` | PreToolUse: Bash | Screen for rm -rf, DROP TABLE, etc. |
| `simplify_gate` | UserPromptSubmit | Enforce “equal metric + less code = keep” |
| `compaction_reanchor` | Context compaction | Re-inject critical state after compaction |
| `privacy_block` | PreToolUse: Write/Edit/MultiEdit/Bash/Glob/Grep/Read | Block credential paths and secret-looking inputs; warn on sensitive Bash paths |
| `dev_rules_reminder` | UserPromptSubmit | Remind agent of project conventions |
| `subagent_context` | Subagent spawn | Inject autoresearch state into subagent prompt |
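A hook such as `dangerous_cmd` has to decide quickly and without I/O. The sketch below shows one way a pure, in-memory command screen could look. The function name `screen_command`, the pattern list, and the whitespace normalization are assumptions made for illustration; they are not taken from `src/hooks/`.

```rust
/// Hypothetical pattern list; the real hook's patterns are not shown in this document.
const DANGEROUS_PATTERNS: &[&str] = &["rm -rf", "drop table", ":(){ :|:& };:"];

/// Returns the first dangerous pattern found in `command`, or `None` if it looks safe.
/// Whitespace runs are collapsed and the text is lowercased before matching,
/// so "rm   -rf" and "DROP TABLE" are both caught.
pub fn screen_command(command: &str) -> Option<&'static str> {
    let normalized = command
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase();
    DANGEROUS_PATTERNS
        .iter()
        .copied()
        .find(|pattern| normalized.contains(*pattern))
}
```

A plain substring scan like this does no network or disk access, which fits the latency budget stated above, at the cost of being easy to evade with shell quoting tricks.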
src/escalation/ — Failure recovery
| File | Purpose |
|---|---|
| `pivot.rs` | EscalationState: tracks consecutive discards, triggers refine/pivot/search |
| `lessons.rs` | LessonsLog: append/search/read lessons.md |
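To make the escalation idea concrete, here is a minimal sketch of a counter that maps consecutive discards to an action. The field names follow the `EscalationState` fields listed in this documentation; the thresholds (3, 5, 7, 9), the `EscalationAction` enum, and the method names are assumptions chosen for the example, not values from `pivot.rs`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EscalationAction {
    Continue,
    Refine,
    Pivot,
    Search,
    Stop,
}

// Hypothetical thresholds, picked only to illustrate the ladder.
const REFINE_AT: u32 = 3;
const PIVOT_AT: u32 = 5;
const SEARCH_AT: u32 = 7;
const STOP_AT: u32 = 9;

#[derive(Debug, Default)]
pub struct EscalationState {
    pub consecutive_discards: u32,
    pub pivots: u32,
    pub last_action: Option<EscalationAction>,
}

impl EscalationState {
    /// A kept experiment resets the ladder.
    pub fn record_keep(&mut self) {
        self.consecutive_discards = 0;
        self.last_action = Some(EscalationAction::Continue);
    }

    /// A discarded experiment climbs the ladder by one step.
    pub fn record_discard(&mut self) -> EscalationAction {
        self.consecutive_discards += 1;
        let action = match self.consecutive_discards {
            n if n >= STOP_AT => EscalationAction::Stop,
            n if n == SEARCH_AT => EscalationAction::Search,
            n if n == PIVOT_AT => EscalationAction::Pivot,
            n if n == REFINE_AT => EscalationAction::Refine,
            _ => EscalationAction::Continue,
        };
        if action == EscalationAction::Pivot {
            self.pivots += 1;
        }
        self.last_action = Some(action);
        action
    }
}
```

Each tier fires exactly once per streak in this sketch, so the agent gets one nudge per level rather than a repeated prompt on every discard.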
src/modes/ — Mode-specific logic
Each mode file contains the structured output types and validation logic for that subcommand. The actual iteration orchestration is done by the agent reading the corresponding command markdown file.
src/agents/ — Multi-agent support
Agent detection and context injection for the supported agent runtimes (Claude Code, Codex CLI).
State Machine
The RunPhase enum enforces valid transitions at the type level:
Setup → Baseline { metric }
Baseline → Iterating { iteration, current, best, best_iteration }
Iterating → Iterating (on keep/discard/crash/no-op)
Iterating → Complete { reason }
Iterating → Blocked { reason }
Blocked → Iterating (on resume)
RunState persists to autoresearch-results/state.json after every iteration.
On resume, the binary reads state.json and reconstructs the full context.
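The transition list above can be sketched as an enum with a single transition function. This is an illustration only: the `Event` and `Reason` types are hypothetical, metrics are plain `i64` values in scaled units instead of the `Decimal` type the codebase uses, a kept metric is treated as the new best, and invalid transitions are rejected at runtime here rather than at the type level.

```rust
/// Hypothetical stop/block reasons for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Reason {
    IterationCap,
    GoalReached,
    NeedsHuman,
}

#[derive(Debug, Clone, PartialEq)]
pub enum RunPhase {
    Setup,
    Baseline { metric: i64 },
    Iterating { iteration: u32, current: i64, best: i64, best_iteration: u32 },
    Complete { reason: Reason },
    Blocked { reason: Reason },
}

/// Hypothetical events that drive the state machine.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    BaselineMeasured(i64),
    Start,
    Keep(i64),
    Discard,
    Finish(Reason),
    Block(Reason),
    Resume { iteration: u32, current: i64, best: i64, best_iteration: u32 },
}

impl RunPhase {
    /// Applies one event, returning the next phase or an error for an invalid transition.
    pub fn next(self, event: Event) -> Result<RunPhase, String> {
        match (self, event) {
            (RunPhase::Setup, Event::BaselineMeasured(metric)) => Ok(RunPhase::Baseline { metric }),
            (RunPhase::Baseline { metric }, Event::Start) => Ok(RunPhase::Iterating {
                iteration: 0,
                current: metric,
                best: metric,
                best_iteration: 0,
            }),
            // A keep advances the counter and becomes the new best.
            (RunPhase::Iterating { iteration, .. }, Event::Keep(metric)) => Ok(RunPhase::Iterating {
                iteration: iteration + 1,
                current: metric,
                best: metric,
                best_iteration: iteration + 1,
            }),
            // A discard advances the counter but leaves the metrics untouched.
            (RunPhase::Iterating { iteration, current, best, best_iteration }, Event::Discard) => {
                Ok(RunPhase::Iterating { iteration: iteration + 1, current, best, best_iteration })
            }
            (RunPhase::Iterating { .. }, Event::Finish(reason)) => Ok(RunPhase::Complete { reason }),
            (RunPhase::Iterating { .. }, Event::Block(reason)) => Ok(RunPhase::Blocked { reason }),
            (RunPhase::Blocked { .. }, Event::Resume { iteration, current, best, best_iteration }) => {
                Ok(RunPhase::Iterating { iteration, current, best, best_iteration })
            }
            (phase, event) => Err(format!("invalid transition from {:?} on {:?}", phase, event)),
        }
    }
}
```

In this sketch the `Resume` event carries the restored counters, mirroring the idea that the context is rebuilt from `state.json` rather than held in the `Blocked` variant.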
Data Flow
Agent decides to modify code
│
▼
autoresearch verify --command "..." → metric (Decimal)
│
▼
autoresearch guard --command "..." → pass/fail (optional)
│
▼
autoresearch decide --decision auto --metric N --metrics-json '{...}'
│
├── keep: state.record_keep() → update state.json, append TSV
└── discard: state.record_discard() → rollback, update state.json, append TSV
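The keep/discard branch at the end of this flow can be illustrated with a small pure function. This is a sketch under stated assumptions, not the logic behind `autoresearch decide --decision auto`: the `decide` signature, the `lines_delta` parameter, and the use of `i64` metrics in scaled units (rather than `Decimal`) are all hypothetical. The tie-break follows the "equal metric + less code = keep" rule described for the simplify gate.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    Higher,
    Lower,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Keep,
    Discard,
}

/// Decides whether a trial is kept.
/// `best` and `candidate` are metrics in scaled integer units;
/// `lines_delta` is the net change in lines of code (negative means less code).
pub fn decide(
    direction: Direction,
    best: i64,
    candidate: i64,
    guard_passed: bool,
    lines_delta: i64,
) -> Decision {
    // A failed guard always discards, no matter how good the metric is.
    if !guard_passed {
        return Decision::Discard;
    }
    let improved = match direction {
        Direction::Higher => candidate > best,
        Direction::Lower => candidate < best,
    };
    // Simplicity rule: an equal metric with less code is still a keep.
    let simpler_tie = candidate == best && lines_delta < 0;
    if improved || simpler_tie {
        Decision::Keep
    } else {
        Decision::Discard
    }
}
```

Integer units keep the equality check exact, which is the same reason the codebase prefers a decimal type over floating point for metric values.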
System Architecture
Autoresearch is a Rust binary plus agent-facing instruction packs. The binary owns mechanical state transitions; agents own reasoning, code edits, and hypothesis selection.
Components
| Component | Role |
|---|---|
| `autoresearch` binary | CLI, hook dispatcher, verifier, rollback controller, runtime supervisor |
| `.claude-plugin/marketplace.json` | Claude marketplace manifest pointing at the repo-root plugin package |
| `commands/` | Claude Code slash command instructions |
| `skills/autoresearch/` | Claude/OpenCode skill package and shared references |
| `.agents/skills/autoresearch/` | Codex/generic agent skill package |
| `.agents/plugins/marketplace.json` | Local Codex marketplace root for the packaged plugin |
| `plugins/autoresearch/` | Codex plugin package generated from .agents/skills/autoresearch/ |
| `.opencode/` | OpenCode command, skill, and helper-agent distribution |
| `references/` | Protocol source docs copied into installable packages |
| `autoresearch-results/` | Runtime artifacts created inside the user’s target repo |
Runtime Flow
agent chooses one hypothesis
|
v
edits scoped files and creates a trial commit
|
v
autoresearch verify runs the metric command
|
v
autoresearch guard runs the regression command when configured
|
v
autoresearch decide keeps, discards, logs, and updates state
The binary writes results.tsv and state.json after each decision. A discarded
experiment is rolled back automatically, while a kept experiment remains in git
history as the next baseline.
Parallel Flow
Parallel work is recorded as a batch:
autoresearch parallel prepare --workers 3
autoresearch parallel run --manifest autoresearch-results/parallel-manifest.json --timeout-seconds 1200
autoresearch parallel closeout --batch-file autoresearch-results/parallel-workers.json
autoresearch parallel cleanup --manifest autoresearch-results/parallel-manifest.json
Prepare creates branch-backed worker worktrees, prompt files, a manifest, and the editable batch file. Run executes the prepared worker prompts in those worktrees and records crashed or timed-out workers in the manifest. Closeout cherry-picks the best worker, re-runs verify and guard in the main worktree, falls back to the next worker on merge or verification failure, then writes worker audit rows and one authoritative retained batch row. Cleanup removes worker worktrees and branches.
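The closeout ordering described above (best worker first, fall back on failure, retain exactly one result) can be sketched as follows. The `WorkerResult` struct, the `pick_winner` function, and the `verify_in_main` callback are hypothetical names for illustration; the callback stands in for the cherry-pick plus verify/guard re-run in the main worktree, and metrics are `i64` values in scaled units.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct WorkerResult {
    pub id: String,
    pub metric: i64,
    /// True when the worker crashed or timed out.
    pub crashed: bool,
}

/// Returns the single retained worker, or `None` if no candidate survives verification.
/// `verify_in_main` is a stand-in for merging the worker and re-running verify and guard.
pub fn pick_winner<F>(
    workers: &[WorkerResult],
    higher_is_better: bool,
    mut verify_in_main: F,
) -> Option<WorkerResult>
where
    F: FnMut(&WorkerResult) -> bool,
{
    // Crashed or timed-out workers are never candidates.
    let mut candidates: Vec<&WorkerResult> = workers.iter().filter(|w| !w.crashed).collect();
    // Best metric first, according to the configured direction.
    candidates.sort_by(|a, b| {
        if higher_is_better {
            b.metric.cmp(&a.metric)
        } else {
            a.metric.cmp(&b.metric)
        }
    });
    // Walk down the ranking until one candidate passes verification.
    candidates
        .into_iter()
        .find(|w| verify_in_main(*w))
        .cloned()
}
```

Because the search stops at the first candidate that passes, at most one worker is ever retained, which matches the "one authoritative retained batch row" guarantee in spirit.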
Background Runtime
autoresearch runtime run supervises Codex execution through persisted artifacts:
| Artifact | Purpose |
|---|---|
| `launch.json` | Command, cwd, repo targets, goal, iteration limit, and stop criteria |
| `runtime.json` | Current status and supervisor recommendation |
| `runtime.log` | Detached runtime output |
Manual controls remain available through runtime start, runtime status,
runtime supervise, and runtime stop.
More Detail
See Architecture for module-level internals and Guide for user-facing command flow.
Autoresearch — Codebase Summary
AI-friendly reference for agents working on this codebase.
Entry Points
| Path | Purpose |
|---|---|
| `src/main.rs` | CLI entry: Clap-based command dispatch for run setup, verify/guard/decide, health, resume/status/progress/watch, lessons, handoff, exec, runtime, parallel, screen, and hooks |
| `.claude-plugin/marketplace.json` | Claude marketplace manifest for the repo-root plugin package |
| `hooks/hooks.json` | Claude Code plugin hook definitions: maps lifecycle events to binary invocations |
| `skills/autoresearch/SKILL.md` | Agent skill file: iteration protocol, subcommand table, references |
| `.agents/skills/autoresearch/` | Maintained Codex skill package used by direct Codex installs |
| `plugins/autoresearch/` | Codex plugin package generated from the .agents skill package |
| `.agents/plugins/marketplace.json` | Local Codex marketplace entry pointing at plugins/autoresearch/ |
| `.opencode/` | Generated OpenCode commands and skill package, plus the maintained docs-manager helper agent |
| `commands/autoresearch.md` | Root command protocol (core iteration loop) |
| `commands/autoresearch/*.md` | Subcommand protocols (debug, fix, security, scenario, etc.) |
Core Modules (src/core/)
| File | Responsibility |
|---|---|
| `config.rs` | RunConfig, Direction, VerifyFormat, RollbackStrategy types |
| `git.rs` | GitRepo wrapper: status, head, revert, worktree checks |
| `verify.rs` | Run verify commands, parse scalar/JSON output, safety screening |
| `results.rs` | ResultsLog: append TSV rows, read history |
| `state.rs` | RunState: iteration count, metrics, keeps/discards, phase tracking |
| `metrics.rs` | Metric parsing, delta calculation, direction comparison |
| `context.rs` | Canonical context.json and repo-local pointer writing |
| `health.rs` | Native preflight checks for runtime launch safety |
| `runtime.rs` | Background launch/runtime manifests, supervisor snapshots, and stop control |
Escalation (src/escalation/)
| File | Responsibility |
|---|---|
| `pivot.rs` | EscalationState: track consecutive discards, trigger REFINE/PIVOT/SEARCH/STOP |
| `lessons.rs` | LessonsLog: read/write/search cross-run learning entries |
Hooks (src/hooks/)
| File | Hook | Fires On |
|---|---|---|
| `scout_block.rs` | scout-block | PreToolUse (Write/Edit/MultiEdit/Bash/Glob/Grep/Read): blocks generated paths, Bash reads, and out-of-scope writes |
| `privacy_block.rs` | privacy-block | PreToolUse: blocks access to sensitive paths |
| `dangerous_cmd.rs` | dangerous-cmd-block | PreToolUse (Bash): blocks rm -rf, fork bombs, etc. |
| `iteration_context.rs` | iteration-context | UserPromptSubmit: injects run state into agent context |
| `dev_rules_reminder.rs` | dev-rules-reminder | UserPromptSubmit: re-injects active protocol and code standards |
| `simplify_gate.rs` | simplify-gate | UserPromptSubmit: reminds agent of simplicity rule |
| `stop_check.rs` | stop-check | Stop: detects premature stop during active run |
| `compaction_reanchor.rs` | compaction-reanchor | PostCompact: re-injects critical state after context compaction |
| `session_init.rs` | session-init | SessionStart: detects interrupted runs |
| `session_end.rs` | session-end | SessionEnd: emits terminal notification and optional webhook summary |
| `subagent_context.rs` | subagent-context | SubagentStart: passes run context to subagents |
Modes (src/modes/)
Thin logic for mode-specific state (most protocol lives in markdown commands):
loop_mode.rs, debug.rs, fix.rs, security.rs, scenario.rs, predict.rs, reason.rs, probe.rs, learn.rs, ship.rs, evals.rs, improve.rs, plan.rs
Agents (src/agents/)
| File | Purpose |
|---|---|
| `claude.rs` | Claude Code-specific integration helpers |
| `codex.rs` | Codex CLI-specific integration helpers |
Data Flow
User prompt → [hook: iteration-context injects state]
→ Agent reads state + TSV + git log
→ Agent makes ONE change
→ Agent calls: autoresearch verify --command "..."
→ Binary runs command, returns metric + metrics JSON
→ Agent calls: autoresearch decide --decision auto --metric N --metrics-json '{...}'
→ Binary: evaluates criteria, updates state.json, appends TSV, reverts if discard
→ [hook: stop-check ensures agent doesn't quit early]
→ Next iteration
Background runs route the same state machine through autoresearch runtime run,
which writes launch.json, runtime.json, and runtime.log, runs the native
health preflight at each relaunch boundary, and supervises detached Codex turns.
Parallel batches use autoresearch parallel prepare/run/closeout/cleanup to run
worker worktrees and retain only one verified batch winner.
Key Types
| Type | Location | Fields |
|---|---|---|
| `RunConfig` | core/config.rs | verify, direction, format, scope, guard, primary_metric_key |
| `RunState` | core/state.rs | iteration, baseline_metric, current_metric, best_metric, keeps, discards, crashes, consecutive_discards, phase |
| `ResultRow` | core/results.rs | iteration, commit, metric, delta, guard, status, description |
| `LaunchManifest` | core/runtime.rs | workspace_root, execution_policy, codex_bin, repo_targets, config |
| `EscalationState` | escalation/pivot.rs | consecutive_discards, pivots, last_action |
| `Direction` | core/config.rs | Higher, Lower |
| `IterationStatus` | core/state.rs | Baseline, Keep, Discard, Crash, NoOp, Blocked, Pivot, Refine, Search |
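As an illustration of how a `ResultRow` with the fields listed above could map onto one TSV line, here is a minimal sketch. The field types (decimal values kept as text), the `HEADER` constant, and the `to_tsv_line` method are assumptions for this example and may differ from `core/results.rs`.

```rust
/// Column order taken from the ResultRow field list; the constant itself is hypothetical.
pub const HEADER: &str = "iteration\tcommit\tmetric\tdelta\tguard\tstatus\tdescription";

#[derive(Debug, Clone)]
pub struct ResultRow {
    pub iteration: u32,
    pub commit: String,
    /// Decimal values are kept as text in this sketch to avoid float rounding.
    pub metric: String,
    pub delta: String,
    pub guard: String,
    pub status: String,
    pub description: String,
}

/// Replaces tabs and line breaks so a field cannot break the row layout.
fn clean(field: &str) -> String {
    field
        .chars()
        .map(|c| if matches!(c, '\t' | '\n' | '\r') { ' ' } else { c })
        .collect()
}

impl ResultRow {
    /// Renders the row as a single tab-separated line without a trailing newline.
    pub fn to_tsv_line(&self) -> String {
        [
            self.iteration.to_string(),
            clean(&self.commit),
            clean(&self.metric),
            clean(&self.delta),
            clean(&self.guard),
            clean(&self.status),
            clean(&self.description),
        ]
        .join("\t")
    }
}
```

Sanitizing free-text fields matters most for `description`, since an agent-written summary can easily contain a tab or newline that would otherwise shift every later column.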
How to Add…
A new CLI command
- Add a variant to the `Commands` enum in `src/main.rs`
- Add a match arm in `main()` dispatching to a `cmd_<name>()` function
- Implement the function at the bottom of `main.rs` (or extract it to a module if >100 lines)
A new hook
- Add a handler in `src/hooks/<name>.rs`
- Register it in `src/hooks/mod.rs`
- Add a hook entry in `hooks/hooks.json` under the appropriate lifecycle event
- The hook receives JSON on stdin, returns JSON on stdout, and must complete in <5ms
A new mode/subcommand
- Add the command protocol in `commands/autoresearch/<mode>.md`
- Add mode-specific state logic in `src/modes/<mode>.rs` (if needed)
- Register it in `src/modes/mod.rs`
- Update the SKILL.md subcommand table
Code Standards
Rust conventions for the autoresearch codebase.
Error Handling
- Use `anyhow::Result` for all fallible functions.
- Use `thiserror` for custom error types in library code that callers need to match on.
- Use `.context("descriptive message")` on every `?` so that errors are traceable.
- Never `unwrap()` in library code. `unwrap()` is acceptable only in tests.
- `expect()` is acceptable for provably infallible operations (e.g., regex compilation).
Serialization
- All persistent types derive `Serialize, Deserialize`.
- Use `#[serde(rename_all = "snake_case")]` for enum variants.
- Use `#[serde(tag = "phase")]` for internally tagged enums (like `RunPhase`).
- Use `#[serde(default)]` for optional fields added in later versions, so files written by older versions still deserialize.
- Use `#[serde(skip_serializing_if = "Option::is_none")]` to keep JSON clean.
Documentation
- Every public type and function has a `///` doc comment.
- Module-level `//!` doc comments describe the module’s role.
- Use `# Examples` in doc comments for non-obvious APIs.
Testing
- Unit tests live in `#[cfg(test)] mod tests` at the bottom of each file.
- Integration tests live in `tests/`.
- E2E fixtures live in `tests/e2e/fixtures/`.
- Every new CLI subcommand gets a test in `tests/cli_test.rs`.
- Every state transition gets a test in `tests/state_test.rs`.
- Target: 80%+ line coverage on `src/core/`.
Style
- Run `cargo clippy -- -D warnings` before every commit. Zero warnings.
- Run `cargo fmt` before every commit.
- Run `./scripts/run_contributor_gate.sh` before opening a PR.
- Run `./scripts/validate_distribution.sh` after changing skill, command, reference, or agent metadata files.
- Run `./scripts/run_skill_e2e.sh binary-smoke --clean` after changing core run closeout or result-monitoring behavior.
- Run `./scripts/run_skill_e2e.sh runtime-smoke --clean` after changing runtime launch, status, or stop behavior.
- Run `./scripts/run_skill_e2e.sh parallel-smoke --clean` after changing parallel worker prepare/run/cleanup behavior.
- Max line length: 100 characters (soft), 120 characters (hard).
- Prefer `match` over `if let` chains for exhaustive enum handling.
- Prefer `&str` over `String` in function parameters when ownership isn’t needed.
Performance
- Hooks must complete in <5ms. No network calls, no heavy I/O in hook handlers.
- Use `Decimal` (not `f64`) for all metric values, for financial-grade precision.
- Release builds use `opt-level = "z"`, LTO, strip, `panic = "abort"`.
Naming
- Types: `PascalCase` (e.g., `RunState`, `ResultRow`)
- Functions: `snake_case` (e.g., `record_keep`, `run_verify`)
- CLI subcommands: lowercase single words (e.g., `init`, `decide`, `evals`)
- Constants: `SCREAMING_SNAKE_CASE`
- Files: `snake_case.rs`
Dependencies
- Minimize dependency count. Current deps are intentional:
clap— CLI parsingserde+serde_json— serializationtokio— async runtime (for exec mode)rust_decimal— precise metric valueschrono— timestampsgit2— libgit2 bindingsregex— pattern matchinganyhow+thiserror— error handlingglob— file pattern matching
- Do not add dependencies without justification in the PR description.
Project Changelog
This page is the high-level release history entrypoint. The canonical Keep-a-Changelog file is changelog.md.
Current Development Track
Recent work has focused on catching the binary and installable agent packages up to the stronger autoresearch implementations:
- Background runtime control through `autoresearch runtime run` and `runtime start/status/supervise/stop`
- Live log monitoring through `autoresearch watch`
- Native parallel worker support through `autoresearch parallel prepare`, `run`, verified `closeout`, and `cleanup`, including worker crash/timeout recording
- Codex, Claude Code, and OpenCode installation paths
- Distribution validation for generated command and skill packages
- Binary smoke tests for installed skill instructions
- Direct documentation entrypoints for installation, usage, examples, and system architecture
Release Notes
See changelog.md for versioned release notes and development-roadmap.md for planned work.
Changelog
All notable changes to this project will be documented in this file.
Format based on Keep a Changelog.
[0.1.0] — 2025-05-27
Initial release.
Added
- Core engine: `init`, `verify`, `guard`, `log`, `decide`, `status`, `resume`, `progress`, `watch` CLI commands
- State machine: `RunPhase` enum (Setup → Baseline → Iterating → Complete/Blocked) with typed transitions
- Results logging: TSV format with iteration, commit, metric, delta, guard, status, description columns
- State persistence: `state.json` with full run context, resume support for interrupted sessions
- Git integration: libgit2-based revert and hard-reset rollback strategies, worktree status detection
- Verify system: scalar and `metrics_json` output formats, command screening for dangerous patterns
- Escalation protocol: three escalation tiers (refine → pivot → web search) followed by stop, triggered by consecutive discards
- Lessons log: Markdown-based learnings that persist across sessions, with search and tail queries
- 12 subcommands: improve, debug, fix, security, scenario, predict, learn, reason, probe, evals, ship, plan
- Exec mode: non-interactive CI/CD mode that reads config from stdin and emits JSON lines
- Background runtime: `runtime run` managed relaunch loop plus `start/status/supervise/stop` artifacts, detached launch control, and relaunch/stop/needs_human supervisor recommendations
- Parallel runtime: `parallel prepare/run/closeout/cleanup` manages worker worktrees, records crashes/timeouts, cherry-picks verified winners, and logs one authoritative retained batch row
- Handoff system: Structured JSON handoff between modes for chained workflows
- 11 hook handlers: session_init, session_end, iteration_context, stop_check, scout_block, dangerous_cmd, simplify_gate, compaction_reanchor, privacy_block, dev_rules_reminder, subagent_context
- Claude Code plugin: `.claude-plugin/plugin.json` manifest with hook definitions
- Codex skill: `.agents/skills/autoresearch/` for direct Codex installs, plus `plugins/autoresearch/` for the local Codex plugin marketplace
- OpenCode package: `.opencode/` commands, skill package, and hidden `docs-manager` helper agent
- Agent commands: `commands/autoresearch.md` root + 12 subcommand files
- Reference docs: 27 protocol and workflow reference documents
- Release profile: `opt-level = "z"`, LTO, strip, `panic = "abort"`, giving an about 3MB binary with a 5MB contributor-gate ceiling
Development Roadmap
v0.1.0 — Foundation (current)
- Core iteration engine (init, verify, guard, decide, log)
- State machine with typed transitions
- TSV results + JSON state persistence
- Git rollback (revert + hard-reset)
- Noise-aware scalar verification repeats with aggregation
- 12 subcommands with full reference docs
- Exec mode for CI/CD
- 11 hook handlers
- Claude Code plugin + Codex skill
- Codex plugin package + local marketplace entry
- Thin Codex skill router with detailed binary operations in references
- Escalation protocol (refine → pivot → search → stop)
- Lessons log with search
v0.2.0 — Background Mode + Parallel Experiments
- Background runtime artifacts + detached Codex launch control (`autoresearch runtime start/status/supervise/stop`)
- Background supervisor recommendation (`autoresearch runtime supervise`) with iteration cap, criteria, stop-condition, soft-blocker, and stagnation decisions
- Background supervisor relaunch loop that automatically executes recommended relaunches (`autoresearch runtime run`)
- Parallel batch templates (`autoresearch parallel template`) for editable worker result JSON
- Parallel worker preparation (`autoresearch parallel prepare`) with branch-backed git worktrees, prompts, manifest, and batch file
- Parallel worker launch (`autoresearch parallel run`) for prepared `codex exec` workers, including timeout/crash recording
- Parallel batch closeout (`autoresearch parallel closeout`) with cherry-pick, post-merge verify/guard, fallback, worker audit rows, and one authoritative retained-state update
- Parallel cleanup (`autoresearch parallel cleanup`) for worker worktrees and branches
- Experiment branching: each trial on its own git branch
- Branch merge strategy selection (fast-forward, squash, rebase)
- `autoresearch watch`: tail results in real time
- Progress websocket for real-time monitoring
- Improved evals: statistical significance testing on parallel results
v0.3.0 — Web Search + MCP Integration
- Built-in web search escalation (configurable provider command)
- MCP tool server mode — expose autoresearch as an MCP tool
- MCP client mode — call external MCP tools during iteration
- Structured search queries from escalation context
- Search result caching to avoid redundant queries
- `autoresearch search`: standalone web search for the current problem
v0.4.0 — Multi-Repo + Workspace Support
- Workspace-owned artifacts (`autoresearch-results/`) and repo-local pointers for managed repos
- Companion repo registration through `--companion-repo-scope PATH=SCOPE`
- Companion repo preflight, health, and runtime dirty-worktree safeguards
- Cross-repo change execution and rollback across companion repos
- Workspace-aware scope expansion (monorepo package boundaries)
- Cross-repo guard command presets
- Native environment probe command for CPU, disk, container, toolchain context, and init metadata
- Shared lessons across repos in a workspace
v1.0.0 — Stable API + Ecosystem
- Stable CLI API — semver guarantees on commands, flags, and output formats
- Native plan command for repo-aware launch config suggestions
- Native debug generator for hypothesis-driven investigation bundles
- Native fix generator for one-error-at-a-time repair-plan bundles
- Native improve artifact bundle for research findings, ranked plan, TSV, summary, and handoff
- Native PRD generator for selected improve-mode ideas
- Native security generator for STRIDE + OWASP audit bundles
- Native ship generator for 8-phase checklist bundles
- Native scenario generator for 12-dimension edge-case artifacts
- Native predict generator for five-persona review artifacts
- Native reason generator for adversarial candidate debate artifacts
- Native probe generator for eight-persona constraint artifacts
- Native learn generator for documentation summary artifacts
- Adaptive eval checkpoint command for long-running loops
- Native protocol re-anchor command for long-running Codex sessions
- Plugin system — loadable mode definitions (TOML or YAML)
- Plugin marketplace — community-contributed modes
- Configuration file (`.autoresearch.toml`) for project-level defaults
- Shell completions (bash, zsh, fish, elvish, PowerShell)
- Man pages generation
- Pre-built binaries for Linux (x86_64, aarch64), macOS (x86_64, aarch64), Windows
- Homebrew formula and cargo-binstall support
- Comprehensive documentation site
- GitHub Action for autoresearch in CI
- Metric history graphing (sparklines in terminal)
- Cost tracking — estimate token/API spend per iteration
- A/B experiment mode — compare two approaches head-to-head
- Interactive TUI dashboard for monitoring runs
- VS Code extension for run visualization with source installer support
Future Ideas (unscheduled)
- Re-check upstream autoresearch projects before the next feature milestone
autoresearch
Autonomous goal-directed iteration engine for coding agents. Written in Rust.
“Set the goal → the agent runs the loop → you wake up to results”
English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский
How It Works
You describe the goal → Agent confirms the configuration → You say "go"
│
┌────────┴────────┐
│ Loop active     │
│                 │
│ 1. Read context │
│ 2. Hypothesis   │
│ 3. ONE change   │
│ 4. Git commit   │
│ 5. Verify       │
│ 6. Improved?    │
│    → keep       │
│    → revert     │
│ 7. Log          │
│ 8. Next round   │
└─────────────────┘
Every improvement compounds. Every failure is reverted automatically. Progress is logged in TSV format. The escalation ladder (Refine → Pivot → Web search → Stop) prevents endless retries.
Commands
| Command | Function | Default iterations |
|---|---|---|
| `/autoresearch` | Core loop: modify → verify → keep/discard | 25 |
| `/autoresearch:plan` | Interactive wizard → validated configuration | one-shot |
| `/autoresearch:debug` | Bug hunting through hypothesis iteration | 15 |
| `/autoresearch:fix` | Fix errors one at a time until none remain | 20 |
| `/autoresearch:security` | STRIDE + OWASP security audit | 15 |
| `/autoresearch:ship` | 8-phase release workflow | linear |
| `/autoresearch:scenario` | Generate edge cases across 12 dimensions | 20 |
| `/autoresearch:predict` | Debate among 5 expert personas | one-shot |
| `/autoresearch:learn` | Explore → generate docs → validate → fix | 10 |
| `/autoresearch:reason` | Adversarial debate with blind judges | 8 |
| `/autoresearch:probe` | 8 personas challenge the requirements | 15 |
| `/autoresearch:improve` | Product improvement research | 20 |
| `/autoresearch:evals` | Results analysis: trends and plateaus | one-shot |
Quick Start
Claude Code (plugin install)
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude
Restart your session. All 13 commands are available immediately.
Codex CLI
$skill-installer install https://github.com/coder-company/agent-autoresearch
Then: $autoresearch
OpenCode
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode
Use: /autoresearch or /autoresearch_debug.
From Source
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh
Requires the Rust toolchain (rustup.rs). Produces a binary of about 3 MB with no runtime dependencies.
Key Rules
- One change per round: atomic experiments establish causality
- Read before you write: check git log and the TSV before making a change
- Mechanical verification only: run the command, evaluate the number
- Automatic rollback: `git revert HEAD --no-edit` on failure
- Simplicity wins: same metric + less code = keep
Full documentation (English)
autoresearch
Autonomous goal-directed iteration engine for coding agents. Written in Rust.
“Define the GOAL → The agent runs the LOOP → You wake up to results”
English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский
How It Works
You describe the goal → The agent confirms the config → You say "go ahead"
│
┌────────┴────────┐
│ Loop active     │
│                 │
│ 1. Read context │
│ 2. Hypothesis   │
│ 3. ONE change   │
│ 4. Git commit   │
│ 5. Verify       │
│ 6. Improved?    │
│    → keep       │
│    → revert     │
│ 7. Log          │
│ 8. Next round   │
└─────────────────┘
Every improvement accumulates. Every failure is reverted automatically. Progress is logged in TSV format. The escalation ladder (Refine → Pivot → Web search → Stop) prevents infinite retries.
Commands
| Command | Function | Default iterations |
|---|---|---|
| `/autoresearch` | Main loop: modify → verify → keep/discard | 25 |
| `/autoresearch:plan` | Interactive wizard → validated configuration | one-shot |
| `/autoresearch:debug` | Bug hunting through hypothesis iteration | 15 |
| `/autoresearch:fix` | Fix errors one by one until none remain | 20 |
| `/autoresearch:security` | STRIDE + OWASP audit with red-team | 15 |
| `/autoresearch:ship` | 8-phase release workflow | linear |
| `/autoresearch:scenario` | Generate edge cases across 12 dimensions | 20 |
| `/autoresearch:predict` | Debate among 5 experts | one-shot |
| `/autoresearch:learn` | Explore → generate docs → validate → fix | 10 |
| `/autoresearch:reason` | Adversarial debate with blind judges | 8 |
| `/autoresearch:probe` | 8 personas interrogate the requirements | 15 |
| `/autoresearch:improve` | Product improvement research | 20 |
| `/autoresearch:evals` | Results analysis: trends and plateaus | one-shot |
Quick Start
Claude Code (plugin install)
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude
Restart your session. All 13 commands are available.
Codex CLI
$skill-installer install https://github.com/coder-company/agent-autoresearch
Then: $autoresearch
OpenCode
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode
Use: /autoresearch or /autoresearch_debug.
From Source
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh
Requires the Rust toolchain (rustup.rs). Produces a binary of about 3 MB with no runtime dependencies.
Core Rules
- One change per turn: atomic experiments establish causality
- Read before writing: review git log and the TSV before modifying anything
- Mechanical verification only: run the command, extract the number
- Automatic rollback: `git revert HEAD --no-edit` on failure
- Simplicity wins: same metric + less code = keep
Full documentation (English)
autoresearch
Autonomous goal-directed iteration engine for coding agents. Written in Rust.
“Define the GOAL → The agent runs the LOOP → You wake up to results”
English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский
How It Works
You describe the goal → The agent confirms the config → You say "go"
│
┌────────┴────────┐
│ Loop active     │
│                 │
│ 1. Read context │
│ 2. Hypothesis   │
│ 3. ONE change   │
│ 4. Git commit   │
│ 5. Verify       │
│ 6. Improved?    │
│    → keep       │
│    → revert     │
│ 7. Log          │
│ 8. Next round   │
└─────────────────┘
Every improvement stacks up. Every failure is reverted automatically. Progress is recorded in TSV format. The escalation ladder (Refine → Pivot → Web search → Stop) prevents infinite retries.
Commands
| Command | Function | Default iterations |
|---|---|---|
| `/autoresearch` | Main loop: modify → verify → keep/reject | 25 |
| `/autoresearch:plan` | Interactive wizard → validated configuration | one-shot |
| `/autoresearch:debug` | Bug hunting through hypothesis iteration | 15 |
| `/autoresearch:fix` | Fix errors one by one until none remain | 20 |
| `/autoresearch:security` | STRIDE + OWASP audit with red-team | 15 |
| `/autoresearch:ship` | 8-phase delivery workflow | linear |
| `/autoresearch:scenario` | Generate edge cases across 12 dimensions | 20 |
| `/autoresearch:predict` | Debate among 5 experts | one-shot |
| `/autoresearch:learn` | Explore → generate docs → validate → fix | 10 |
| `/autoresearch:reason` | Adversarial debate with blind judges | 8 |
| `/autoresearch:probe` | 8 personas question the requirements | 15 |
| `/autoresearch:improve` | Product improvement research | 20 |
| `/autoresearch:evals` | Results analysis: trends and plateaus | one-shot |
Quick Start
Claude Code (plugin install)
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude
Restart your session. All 13 commands are available.
Codex CLI
$skill-installer install https://github.com/coder-company/agent-autoresearch
Then: $autoresearch
OpenCode
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode
Use: /autoresearch or /autoresearch_debug.
From Source
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh
Requires the Rust toolchain (rustup.rs). Produces a binary of about 3 MB with no runtime dependencies.
Essential Rules
- One change per turn: atomic experiments establish causality
- Read before writing: consult git log and the TSV before modifying anything
- Mechanical verification only: run the command, extract the number
- Automatic rollback: `git revert HEAD --no-edit` on failure
- Simplicity wins: identical metric + less code = keep
Full documentation (English)
autoresearch
コーディングエージェント向け自律型目標駆動イテレーションエンジン。Rust 製。
「目標を設定 → エージェントがループを実行 → 目覚めたら結果が出ている」
English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский
仕組み
目標を記述 → エージェントが設定を確認 → 「開始」と伝える
│
┌────────┴────────┐
│ ループ実行中 │
│ │
│ 1. コンテキスト読取 │
│ 2. 仮説を立てる │
│ 3. 1箇所を変更 │
│ 4. Git コミット │
│ 5. 検証を実行 │
│ 6. 改善した? │
│ → 保持 │
│ → 元に戻す │
│ 7. 結果を記録 │
│ 8. 次のターン │
└─────────────────┘
改善は積み重なり、失敗は自動的にリバートされます。進捗は TSV 形式で記録されます。エスカレーション(改良 → 方針転換 → Web 検索 → 停止)により無限リトライを防止します。
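Commands
| Command | What it does | Default iterations |
|---|---|---|
| /autoresearch | Core loop: change → verify → keep/discard | 25 |
| /autoresearch:plan | Interactive wizard → validated config | once |
| /autoresearch:debug | Hypothesis-driven bug hunting | 15 |
| /autoresearch:fix | Fix errors one at a time until none remain | 20 |
| /autoresearch:security | STRIDE + OWASP security audit | 15 |
| /autoresearch:ship | 8-phase release flow | linear |
| /autoresearch:scenario | Edge-case generation across 12 dimensions | 20 |
| /autoresearch:predict | Discussion among 5 expert personas | once |
| /autoresearch:learn | Scout → generate docs → validate → fix | 10 |
| /autoresearch:reason | Adversarial debate with blind judging | 8 |
| /autoresearch:probe | 8 personas question the requirements in depth | 15 |
| /autoresearch:improve | Product improvement research | 20 |
| /autoresearch:evals | Analysis of iteration results: trends and plateaus | once |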
Quick start
Claude Code (plugin install)
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude
Restart the session. All 13 commands become available.
Codex CLI
$skill-installer install https://github.com/coder-company/agent-autoresearch
Usage: $autoresearch
OpenCode
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode
Use: /autoresearch or /autoresearch_debug
Build from source
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh
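Requires the Rust toolchain (rustup.rs). Produces a binary of about 3 MB with zero runtime dependencies.
Key rules
- One change per turn: atomic experiments establish causality
- Read before you write: check git log and the results TSV before making changes
- Mechanical verification only: run the command, parse the number
- Automatic rollback: `git revert HEAD --no-edit` on failure
- Simplicity wins: same metric + less code = keep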
autoresearch
An autonomous, goal-driven iteration engine for coding agents. Written in Rust.
"Set the goal → the agent runs the loop → you check the results"
English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский
How it works
Describe the goal → Agent confirms the config → You say "start"
│
┌────────┴────────┐
│  Loop running   │
│                 │
│ 1. Read context │
│ 2. Hypothesize  │
│ 3. Change one   │
│ 4. Git commit   │
│ 5. Verify       │
│ 6. Improved?    │
│    → keep       │
│    → roll back  │
│ 7. Log result   │
│ 8. Next turn    │
└─────────────────┘
Every improvement accumulates. Every failure is reverted automatically. Progress is recorded in TSV format. The escalation ladder (refine → pivot → web search → stop) prevents endless retries.
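One way to picture the escalation ladder is as a mapping from the number of consecutive failed turns to a stage. The text only names the four stages; the threshold of three failures per stage and the function name `escalation_stage` are assumptions chosen for this sketch.

```python
from typing import List

# Stage names follow the ladder in the text; the identifiers themselves are illustrative.
LADDER: List[str] = ["refine", "pivot", "web_search", "stop"]


def escalation_stage(consecutive_failures: int, failures_per_stage: int = 3) -> str:
    """Map a run of consecutive failures to a ladder stage (threshold is an assumption)."""
    if consecutive_failures < 0:
        raise ValueError("consecutive_failures must be non-negative")
    index = min(consecutive_failures // failures_per_stage, len(LADDER) - 1)
    return LADDER[index]
```

Because the index is capped at the last stage, any sufficiently long failure streak ends at "stop" instead of retrying forever.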
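Commands
| Command | What it does | Default iterations |
|---|---|---|
| /autoresearch | Core iteration loop: modify → verify → keep/discard | 25 |
| /autoresearch:plan | Interactive wizard → validated config | once |
| /autoresearch:debug | Bug hunting through hypothesis iteration | 15 |
| /autoresearch:fix | Fix errors one by one down to zero | 20 |
| /autoresearch:security | STRIDE + OWASP security audit | 15 |
| /autoresearch:ship | 8-step deployment workflow | linear |
| /autoresearch:scenario | Edge-case generation across 12 dimensions | 20 |
| /autoresearch:predict | Debate among 5 expert personas | once |
| /autoresearch:learn | Explore → generate docs → validate → fix | 10 |
| /autoresearch:reason | Adversarial debate with blind judging | 8 |
| /autoresearch:probe | 8 personas interrogate the requirements | 15 |
| /autoresearch:improve | Product improvement research | 20 |
| /autoresearch:evals | Analysis of iteration results: trends and plateaus | once |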
Quick start
Claude Code (plugin install)
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude
Restart the session. All 13 commands are available.
Codex CLI
$skill-installer install https://github.com/coder-company/agent-autoresearch
Usage: $autoresearch
OpenCode
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode
Usage: /autoresearch or /autoresearch_debug
Build from source
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh
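Requires the Rust toolchain (rustup.rs). Produces a binary of about 3 MB with no runtime dependencies.
Core rules
- One change per turn: atomic experiments establish causality
- Read before you write: check git log and the results TSV before modifying anything
- Mechanical verification only: run the command, parse the number
- Automatic rollback: `git revert HEAD --no-edit` on failure
- Simplicity wins: same metric + less code = keep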
autoresearch
An autonomous, goal-driven iteration engine for coding agents. Written in Rust.
"Set the GOAL → The agent runs the LOOP → You wake up to results"
English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский
How it works
You describe the goal → Agent confirms the config → You say "go"
│
┌────────┴────────┐
│  Loop active    │
│                 │
│ 1. Read context │
│ 2. Hypothesis   │
│ 3. Modify ONE   │
│ 4. Git commit   │
│ 5. Verify       │
│ 6. Improved?    │
│    → keep       │
│    → revert     │
│ 7. Record       │
│ 8. Next turn    │
└─────────────────┘
Every improvement accumulates. Every failure is reverted automatically. Progress is logged in TSV format. The escalation ladder (Refine → Pivot → Web search → Stop) prevents endless retries.
Commands
| Command | Function | Default iterations |
|---|---|---|
| /autoresearch | Main loop: modify → verify → keep/discard | 25 |
| /autoresearch:plan | Interactive wizard → validated configuration | one-shot |
| /autoresearch:debug | Bug hunting by iterating on hypotheses | 15 |
| /autoresearch:fix | Fix errors one by one until none remain | 20 |
| /autoresearch:security | STRIDE + OWASP audit with red-teaming | 15 |
| /autoresearch:ship | 8-phase release flow | linear |
| /autoresearch:scenario | Generate edge cases across 12 dimensions | 20 |
| /autoresearch:predict | Debate among 5 experts | one-shot |
| /autoresearch:learn | Explore → generate docs → validate → fix | 10 |
| /autoresearch:reason | Adversarial debate with blind judges | 8 |
| /autoresearch:probe | 8 personas interrogate the requirements | 15 |
| /autoresearch:improve | Product improvement research | 20 |
| /autoresearch:evals | Results analysis: trends and plateaus | one-shot |
Quick start
Claude Code (plugin install)
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude
Restart your session. All 13 commands become available.
Codex CLI
$skill-installer install https://github.com/coder-company/agent-autoresearch
Then: $autoresearch
OpenCode
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode
Use: /autoresearch or /autoresearch_debug.
From source
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh
Requires the Rust toolchain (rustup.rs). Produces a ~3 MB binary with no runtime dependencies.
Fundamental rules
- One change per turn: atomic experiments establish causality
- Read before you write: check git log and the TSV before modifying anything
- Mechanical verification only: run the command, extract the number
- Automatic rollback: `git revert HEAD --no-edit` on failure
- Simplicity wins: same metric + less code = keep
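The "simplicity wins" rule above amounts to a small decision function. The sketch below is one possible reading, assuming a higher metric is better and that code size is measured in lines; the name `should_keep` and the floating-point tolerance are illustrative choices, not part of the tool.

```python
def should_keep(
    old_metric: float,
    new_metric: float,
    old_lines: int,
    new_lines: int,
    tolerance: float = 1e-9,
) -> bool:
    """Decide whether a change is kept; assumes a higher metric is better."""
    if new_metric > old_metric + tolerance:
        return True                      # a real improvement is always kept
    if abs(new_metric - old_metric) <= tolerance:
        return new_lines < old_lines     # same metric: keep only if the code shrank
    return False                         # a worse metric is reverted
```

For example, `should_keep(0.9, 0.9, 100, 80)` returns `True`, because the metric is unchanged while the code got shorter.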
Full documentation (English)
autoresearch
An autonomous, goal-driven iteration engine for coding agents. Written in Rust.
"Set the GOAL → The agent runs the LOOP → You wake up to results"
English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский
How it works
You describe the goal → Agent confirms the configuration → You say "let's go"
│
┌────────┴────────┐
│  Loop active    │
│                 │
│ 1. Read context │
│ 2. Hypothesis   │
│ 3. Change ONE   │
│ 4. Git commit   │
│ 5. Verify       │
│ 6. Improved?    │
│    → keep       │
│    → roll back  │
│ 7. Record       │
│ 8. Next turn    │
└─────────────────┘
Every improvement accumulates. Every failure is rolled back automatically. Progress is recorded in TSV format. The escalation ladder (Refine → Change approach → Web search → Stop) prevents endless retries.
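To make the TSV progress log concrete, here is a small sketch of appending and reading rows with Python's standard `csv` module. The text says only that progress is recorded as TSV; the column names in `FIELDS` and both helper functions are hypothetical.

```python
import csv
from typing import Dict, List, TextIO

# Hypothetical column layout; the real log may use different fields.
FIELDS = ["iteration", "commit", "metric", "status", "description"]


def append_result(stream: TextIO, iteration: int, commit: str,
                  metric: float, status: str, description: str) -> None:
    """Write one tab-separated row describing the outcome of a turn."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow([iteration, commit, f"{metric:.4f}", status, description])


def read_results(stream: TextIO) -> List[Dict[str, str]]:
    """Read all rows back as dictionaries keyed by the assumed column names."""
    reader = csv.DictReader(stream, fieldnames=FIELDS, delimiter="\t")
    return list(reader)
```

Reading the log back before proposing the next change is what allows a new turn to avoid repeating an experiment that was already reverted.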
Commands
| Command | Function | Default iterations |
|---|---|---|
| /autoresearch | Main loop: change → verify → keep/discard | 25 |
| /autoresearch:plan | Interactive wizard → validated configuration | one-shot |
| /autoresearch:debug | Bug hunting by iterating on hypotheses | 15 |
| /autoresearch:fix | Fix errors one at a time down to zero | 20 |
| /autoresearch:security | STRIDE + OWASP audit with red-teaming | 15 |
| /autoresearch:ship | 8-phase release process | linear |
| /autoresearch:scenario | Edge-case generation across 12 dimensions | 20 |
| /autoresearch:predict | Debate among 5 expert personas | one-shot |
| /autoresearch:learn | Scout → generate documentation → validate → fix | 10 |
| /autoresearch:reason | Adversarial debate with blind judges | 8 |
| /autoresearch:probe | 8 personas interrogate the requirements | 15 |
| /autoresearch:improve | Product improvement research | 20 |
| /autoresearch:evals | Results analysis: trends and plateaus | one-shot |
Quick start
Claude Code (plugin install)
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude
Restart the session. All 13 commands are available.
Codex CLI
$skill-installer install https://github.com/coder-company/agent-autoresearch
Then: $autoresearch
OpenCode
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode
Use: /autoresearch or /autoresearch_debug.
Build from source
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh
Requires the Rust toolchain (rustup.rs). The output is a ~3 MB binary with no runtime dependencies.
Key rules
- One change per turn: atomic experiments establish causality
- Read before you write: check git log and the TSV before changing anything
- Mechanical verification only: run the command, extract the number
- Automatic rollback: `git revert HEAD --no-edit` on failure
- Simplicity wins: same metric + less code = keep
autoresearch
An autonomous, goal-driven iteration engine for coding agents. Written in Rust.
"Set the goal → the agent runs the loop → you wake up to results"
English · 中文 · 日本語 · 한국어 · Français · Deutsch · Español · Português · Русский
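How it works
You describe the goal → Agent confirms the config → You say "start"
│
┌────────┴────────┐
│  Loop running   │
│                 │
│ 1. Read context │
│ 2. Hypothesize  │
│ 3. Change one   │
│ 4. Git commit   │
│ 5. Verify       │
│ 6. Improved?    │
│    → keep       │
│    → roll back  │
│ 7. Log result   │
│ 8. Next round   │
└─────────────────┘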
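Every improvement accumulates. Every failure is rolled back automatically. Progress is recorded in TSV format. The escalation strategy (refine → pivot → web search → stop) prevents endless brute-force retries.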
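Commands
| Command | Function | Default iterations |
|---|---|---|
| /autoresearch | Core iteration loop: modify → verify → keep/discard | 25 |
| /autoresearch:plan | Interactive wizard → validated config | one-shot |
| /autoresearch:debug | Track down defects by iterating on hypotheses | 15 |
| /autoresearch:fix | Fix errors one by one until none remain | 20 |
| /autoresearch:security | STRIDE + OWASP security audit | 15 |
| /autoresearch:ship | 8-stage release flow | linear |
| /autoresearch:scenario | Generate edge cases across 12 dimensions | 20 |
| /autoresearch:predict | Debate among 5 expert personas | one-shot |
| /autoresearch:learn | Scout → generate docs → validate → fix | 10 |
| /autoresearch:reason | Adversarial debate with blind judging | 8 |
| /autoresearch:probe | 8 personas interrogate the requirements | 15 |
| /autoresearch:improve | Product improvement research | 20 |
| /autoresearch:evals | Analysis of iteration results: trends and plateaus | one-shot |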
Quick start
Claude Code (plugin install)
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --claude
Restart the session. All 13 commands are available immediately.
Codex CLI
$skill-installer install https://github.com/coder-company/agent-autoresearch
Then use: $autoresearch
OpenCode
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh --yes --opencode
Use: /autoresearch or /autoresearch_debug
Build from source
git clone https://github.com/coder-company/agent-autoresearch.git
cd agent-autoresearch
./install.sh
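Requires the Rust toolchain (rustup.rs). Produces a binary of about 3 MB with zero dependencies.
Core rules
- One change per round: only atomic experiments establish causality
- Read before you write: check git log and the results TSV before modifying anything
- Mechanical verification: run the command, parse the number
- Automatic rollback: run `git revert HEAD --no-edit` on failure
- Simplicity wins: same metric + less code = keep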
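The automatic rollback rule can be sketched as a thin wrapper around the `git revert HEAD --no-edit` command quoted above. The wrapper itself is an assumption for illustration: `rollback_on_failure`, `run_git`, and the injectable `runner` parameter are hypothetical names, not part of Autoresearch.

```python
import subprocess
from typing import Callable, List

# The revert command quoted in the rules above.
REVERT_COMMAND: List[str] = ["git", "revert", "HEAD", "--no-edit"]


def run_git(command: List[str]) -> int:
    """Run the command in the current repository and return its exit code."""
    return subprocess.run(command, check=False).returncode


def rollback_on_failure(improved: bool,
                        runner: Callable[[List[str]], int] = run_git) -> bool:
    """Revert the latest commit when the turn did not improve the metric.

    Returns True if a rollback ran, False if the change was kept.
    """
    if improved:
        return False
    exit_code = runner(REVERT_COMMAND)
    if exit_code != 0:
        raise RuntimeError(f"rollback failed with exit code {exit_code}")
    return True
```

Using `git revert` instead of resetting history means the failed experiment stays visible in `git log`, which fits the rule of reading the log before the next change.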