An AI coding agent that reads your code, edits files, runs commands, and reasons through bugs — across a terminal REPL, a full-screen TUI, and a browser UI, while you keep your hand on every permission.
npm i -g wrongstack && wrongstackWrongStack drives autonomous goal loops, parallel subagent fan-out, multi-agent Director orchestration, Brain-governed policy decisions, and collaborative debugging — and walks you through full Spec-Driven Development cycles. It ships with 36 built-in tools, 20 skills, 8 core plugins + 10 bundled in @wrongstack/plugins, and ~110 providers pulled live from models.dev — no hardcoded model names, no hardcoded pricing, no hardcoded lists. Secrets are AES-256-GCM encrypted at rest with a per-machine key; every tool call clears a per-tool permission policy. Everything lives under ~/.wrongstack/ — the only thing you'd ever commit is .wrongstack/AGENTS.md.
- 🧠 Three surfaces, one brain — a plain readline REPL, an Ink/React TUI (
--tui), and a standalone web UI. - 🤖 A fleet, not a lone agent — a 47-role roster + smart dispatcher fan out under a Director, each subagent fully isolated with its own budget and JSONL transcript.
- 🧠 Brain as an authority seam — risky AutoPhase and Director choices can be auto-decided by policy, denied, or escalated to the human through the TUI.
- ♾️ Set a goal, walk away —
/goallocks in a contract and the eternal / parallel engines grind until it's verifiably done. - 🔌 ~110 providers, zero lock-in — Anthropic, OpenAI, Google, and ~100 OpenAI-compatible endpoints, catalog refreshed from models.dev at boot.
- 🔑 Sign in with a subscription — authenticate with a ChatGPT (Codex), Claude Pro/Max, or GitHub Copilot subscription over OAuth, alongside (not instead of) API keys. See
docs/oauth-signin.md. - 🔎 Fast model switching — the TUI
/modelpicker supports type-to-search filtering with scroll-window navigation, andwstack modelssupports search + pagination. - 🔐 Locked down by default — encrypted secrets, SSRF guards on every redirect hop, fail-closed subagents, symlink containment, plugin trust tiers, WebUI redaction, and cloud-sync path guards.
- 🪶 A compact kernel —
Container · Pipeline · EventBus · RunController(~1670 lines including the full event type catalog). Everything above it is swappable;--no-featuresboots it fully offline.
- Node.js ≥ 22.0.0
- pnpm ≥ 9.0.0 (recommended) or npm
npm install -g wrongstack
# or
pnpm install -g wrongstackThis pulls in the full stack — @wrongstack/core, @wrongstack/runtime, @wrongstack/providers, @wrongstack/tools, @wrongstack/mcp, @wrongstack/plug-lsp, and @wrongstack/tui. The TUI is shipped but lazy-loaded behind --tui, so plain-REPL users pay no React/Ink import cost at startup. The web-based UI (@wrongstack/webui) is available as a separate binary (webui).
After install, wrongstack is on your PATH. (wstack works too — it's an alias.)
Plain REPL (default): readline-based, multiline heredoc, slash commands, streaming text. Works everywhere a terminal works.
TUI (--tui): Ink + React frontend, lazy-loaded. Key features:
- Multi-line paste collapse,
@<query>fuzzy file picker, clipboard image paste (Alt+V) - Live status bar: model · tokens · cache hit · cost ·
running: <tool>while tools execute - LiveActivityStrip: tool in flight + elapsed timer per running subagent
- Esc-to-steer: aborts run, terminates fleet, prepends STEERING preamble to your next message
/goal <description>: locks in full-autonomy mode — no implicit budget cap- Signal-safe cleanup, non-TTY guard, re-entrancy guard on Enter, resize ghost mitigation
- Real-time stage chip:
⟳ DECIDE/⚡ EXECUTE/◎ REFLECTupdates every tick during eternal/parallel runs
Web UI (@wrongstack/webui): React + Radix + Tailwind frontend with a Node ws backend. Standalone webui binary serves on 3456/3457; CLI can opt in with wrongstack --webui. Highlights:
- Topbar status bar: ctx% · tokens · cache hit · cost · elapsed · iteration
- Per-message footer: token usage, Pin / Edit & resend / Retry
- Tool bubbles: live
tool.progressstream, collapsible gutter, Download/Copy on hover - Sidebar: live TODO snapshot, Pinned panel, History with grouping + search
- Operations panels: Goal, Process Monitor, Checkpoint Timeline, AutoPhase, phase agents, task board, and worktree lanes
- Autonomy picker: switch
off/suggest/auto/eternal/eternal-parallelfrom the UI - Overlays:
Ctrl+Kcommand palette,Ctrl+Mmodel switcher,Ctrl+Fchat search,?shortcuts - Slash commands with keyboard nav, day-separator dividers, dynamic tab title
# Standalone (recommended for the full experience)
webui # binds backend to 127.0.0.1:3457, serves UI on 3456
WS_HOST=0.0.0.0 webui # expose on the LAN
# Or piggy-back on the CLI process
wrongstack --webuiAll tools are registered out of the box — no plugin required.
| Tool | What it does |
|---|---|
read |
Read file contents with offset/limit |
write |
Write or overwrite a file |
edit |
Surgical string replacement in existing files |
replace |
Batch regex replacement across matched files |
glob |
Find files matching a pattern |
grep |
Search file contents with regex |
bash |
Run shell commands |
exec |
Restricted shell with an allowlist (node, npm, pnpm, …) |
fetch |
HTTP fetch with HTML→markdown (localhost blocked by default) |
search |
Web search (DuckDuckGo / Google / Bing) |
patch |
Apply unified diff patches |
json |
Parse and query JSON with dot notation |
diff |
Show differences between files or commits |
tree |
Display directory structure as ASCII tree |
lint |
Run linter (Biome / ESLint / TSLint) |
format |
Format code with Biome / Prettier |
typecheck |
TypeScript type checking |
test |
Run tests with Vitest / Jest / Mocha |
install |
Install npm packages |
audit |
Security vulnerability audit |
outdated |
Check for outdated packages |
logs |
Stream or fetch service log files |
document |
Generate JSDoc/TSDoc comments |
scaffold |
Generate boilerplate from templates |
tool_search / tool_use / batch_tool_use / tool_help |
Meta-tooling for tool discovery and orchestration |
todo |
Track multi-step tasks |
git |
Common git operations |
task |
Structured work items with dependencies, types, and priorities |
context_manager |
Inspect / trim / compact the in-flight context window |
remember / forget |
Persist notes across sessions (project- or user-scoped, gated by features.memory) |
codebase-index |
Build / update the SQLite symbol index (incremental; multi-language) |
codebase-search |
BM25-ranked search over indexed symbol names, signatures, and doc comments |
codebase-stats |
Summary of the current symbol index |
The engine has two modes, both launched via /autonomy:
eternal — single-leader goal-driven loop. Runs decide → execute → reflect → sleep cycles against the persistent /goal until you stop it (Esc / /steer / Ctrl+C / /autonomy stop). Force-enables YOLO. Hybrid decide pipeline walks pending todos → dirty git → LLM brainstorm, so the loop produces useful work even with no queued task. Each phase transition calls the onStage callback — the TUI renders the live stage chip (⟳ DECIDE, ⚡ EXECUTE, ◎ REFLECT) updating every tick. TUI shows a red ETERNAL chip; WebUI receives a live eternal.iteration WS broadcast per cycle.
parallel — leader drives, N subagents fan out simultaneously. ParallelEternalEngine decomposes the goal into up to parallelSlots tasks per tick (default 4, max 16), spawns that many subagents via DefaultMultiAgentCoordinator, awaits all results, aggregates, and writes a journal entry. [GOAL_COMPLETE] in any subagent's output stops the engine cleanly. maxConcurrent: 8 supports the higher fan-out density. TUI shows an amber ⟳ PARALLEL chip with the live iteration count.
Both engines persist state to goal.json. Both can be paused and resumed without losing work: /goal pause sets goalState: 'paused' — the engine exits gracefully after the current iteration finishes (no AbortController kill, no work torn mid-task); /goal resume flips goalState back to 'active'.
/goal <text> persists to ~/.wrongstack/projects/<hash>/goal.json and injects a full-autonomy preamble into the next turn:
[GOAL — LOCKED IN. You will work on this until it is verifiably done.
YOUR GOAL: <user text>
AUTHORITY: Spawn unlimited subagents · Use any provider/model · Unlimited tool calls + iterations
Agent loop auto-extends every 100 iterations. Retry failed tools; switch providers on 429.
WHAT "DONE" MEANS: Named artifact (passing test, file at a path, fixed bug verified by re-running)
A 10-second user verification recipe. No hedges.
WHAT IS NOT DONE: Unhandled error · empty result · "should I continue?" · partial progress as success
PERSISTENCE: Blocked? Try 3 angles. Tool failed? Read error, alter input, retry.
Subagent useless? Respawn with tighter prompt.
BEGIN.]
/goal shows status + journal. /goal clear stops the engine. /goal journal [N] prints the FIFO ring (default 25, cap 500). Goal state lives in goalState: 'active' | 'paused' | 'done'.
Fleet tools (14 on the Director's belt from first message): spawn_subagent, assign_task, await_tasks, ask_subagent, ask_result, roll_up, terminate_subagent, terminate_all, fleet_status, fleet_usage, fleet_health, fleet_session, fleet_emit, collab_debug. Large ask_subagent responses (>2K chars) are stashed in a per-Director LargeAnswerStore and returned as a summary + key — the Director pulls the full text back with ask_result only when it needs it, so its context stays bounded no matter how many big asks happen.
/fleet command: status — task progress per subagent · usage — token + cost breakdown · kill <id> — stop one subagent · kill — stop all · manifest — full fleet snapshot · log <id> — transcript summary · log <id> raw — full JSONL dump · journal — recent parallel engine entries · spawn <role> [count] — spawn N subagents of a role · terminate <subagentId> — stop one · retry <id> — re-spawn a failed subagent · stream on|off — toggle live output streaming.
/spawn [--provider --model --name --tools] <task> — launch a single subagent. No implicit budget cap; runs until done.
/director — promote the session to Director mode at runtime (must be called before any subagent is spawned).
/autonomy parallel — LLM-driven fan-out mode described above.
Subagent failure taxonomy (14-kind discriminated union): budget_timeout (✓ retryable), budget_tool_calls, budget_iterations, provider_rate_limit (✓), provider_5xx (✓), provider_auth, tool_failed, empty_response, aborted_by_parent, context_overflow. Every failure includes cause (error name + message + stack). The delegate tool exposes errorKind / retryable / backoffMs so the calling LLM can branch on classification.
Architecture: Host EventBus (always-on bridge) → Leader Agent (Director) + FleetBus (director-only fan-in) → DefaultMultiAgentCoordinator → AgentSubagentRunner per task (fresh Agent + Context + EventBus, full isolation) → per-subagent JSONL transcripts on disk.
47-agent roster + smart dispatcher. The Director draws from a 47-role agent catalog; a smart dispatcher routes each task to the best-matching role instead of spawning generic clones. The TUI fleet monitor (Ctrl+F) shows per-subagent status and a fleet-wide token gauge, and auto-extended budgets surface as a ⚡ extended ×N badge across all fleet UIs. Spawned subagents take a memorable scientist nickname (Turing, Shannon, Gauss, …) so you can track them across the fleet at a glance.
Collaborative debugging. Director.spawnCollab() runs BugHunter, RefactorPlanner, and Critic in parallel on one shared, immutable file snapshot. Findings flow through the FleetBus as structured events (bug.found → refactor.plan → critic.evaluation); the Director routes each output to its dependents through a shared scratchpad — so agents build on each other's conclusions without exchanging full transcripts — and returns a single structured CollabDebugReport. Subagents signal upward with the fleet_emit tool.
--director flag launches the full fleet roster from the CLI directly:
wrongstack --director "audit src/ for security issues"Brain is a small authority layer for decisions that are bigger than one tool
permission prompt. Director and AutoPhase can ask a BrainArbiter whether to
continue, deny, or escalate a risky choice — for example extending a subagent's
budget or attempting an automatic worktree merge-conflict resolution.
The default Brain is conservative: low-risk requests with an explicit
recommended option can be answered automatically; higher-risk choices route to a
human decision prompt in the TUI. The prompt appears in chat history, the status
bar shows a compact 🧠 chip, and the answer flows back through typed
brain.* EventBus events.
The /sdd <path-to-spec.md> slash command guides the agent through the SDD loop: parse → analyze → generate → track → execute. Built on SpecParser, TaskTracker, TaskGenerator, and TaskFlow from @wrongstack/core/sdd. Reads a markdown spec file, generates tasks via TaskGenerator, and displays task status inline.
const parser = new SpecParser();
const spec = parser.parse(markdownSpec);
const analysis = parser.analyze(spec);
const tracker = new TaskTracker({ store });
const generator = new TaskGenerator({ taskTracker: tracker });
await generator.generateFromSpec(spec);
const flow = new TaskFlow({ tracker });
await flow.execute({ executeTask: async (task) => { /* ... */ } });Ten ready-to-use plugins ship in one package, each available via a subpath export (@wrongstack/plugins/<name>):
| Plugin | Tools | Notes |
|---|---|---|
auto-doc |
auto_doc, auto_doc_preview |
JSDoc / TSDoc comment generation |
git-autocommit |
git_autocommit, git_stage, git_status_summary |
Conventional-commit messages |
shell-check |
shellcheck_run, shellcheck_scan |
ShellCheck wrapper |
cost-tracker |
cost_summary, cost_reset, cost_export |
Token usage + cost per model via provider.response events |
file-watcher |
watch_start, watch_stop, watch_list |
Emits file-watcher:changed events |
web-search |
web_search, web_fetch |
Cached DuckDuckGo + URL→markdown |
json-path |
jsonpath_query, jsonpath_mutate |
JSONPath-style queries and mutations |
cron |
cron_schedule, cron_list, cron_cancel |
Recurring actions via beforeIteration / afterIteration hooks |
template-engine |
render_template, template_variables |
{{var}} / {{#if}} / {{#each}} expansion + system-prompt contributor |
semver-bump |
semver_bump, changelog_update |
Conventional-commit-driven version bumps |
All plugins type-check under strict + noUncheckedIndexedAccess, use the real plugin API (api.onEvent not pipeline mutation), register AgentExtension for iteration hooks, and ship Record<string, unknown> typings on every tool execute.
Seven plugins ship enabled by default and load before any user plugin — they wire core infrastructure and claim bare slash-command names (only official first-party plugins may do so; external plugins stay namespaced). Opt a specific one out with { "name": "wstack-git", "enabled": false } in config.plugins, or disable all with features.plugins: false.
| Built-in plugin | Slash commands | What it adds |
|---|---|---|
wstack-prompts |
/prompts list|view|add|delete|edit|extend |
Personal prompt library with LLM-powered enhancement |
wstack-sync |
/sync status|enable|disable|push|pull|categories |
GitHub cloud sync for settings, skills, prompts, memory, and history — token encrypted via the secret vault, no git CLI needed |
wstack-git |
/commit, /gitcheck, /push |
LLM-written conventional commits, pre-commit sanity check, push |
wstack-security |
/security scan|audit|report |
Security scanning surface |
wstack-skills |
/skill, /skill-gen, /skill-install, /skill-update, /skill-uninstall |
Skill discovery, generation, and lifecycle |
wstack-plan |
/plan show|add|start|done|remove|clear |
Per-session strategic roadmap (chip in the TUI status bar) |
wstack-observability |
/metrics, /health |
Prometheus metrics + health snapshot |
Cloud sync (/sync) pushes/pulls user-selected ~/.wrongstack categories — settings, skills, prompts, memory, history — to a private GitHub repo over the REST API. State lives in ~/.wrongstack/sync.json (token encrypted) + sync-state.json; pick categories with /sync categories. Pulls validate every remote tree entry and reject traversal (.., absolute paths, or paths resolving outside the category root); file-backed categories such as settings also reject nested paths.
Manage from CLI or REPL:
wstack plugin list
wstack plugin install telegram
wstack plugin official
wstack plugin disable telegram
wstack plugin enable telegram
wstack plugin remove telegramtelegram and lsp are bundled aliases for @wrongstack/telegram and @wrongstack/plug-lsp.
Authenticate against a vendor subscription instead of a metered API key — an orthogonal credential layer that sits next to the API-key providers, not in place of them:
wstack auth login chatgpt # ChatGPT Plus/Pro/Team (Codex) → provider openai-codex
wstack auth login claude # Claude Pro/Max → provider anthropic-oauth
wstack auth login copilot # GitHub Copilot → provider github-copilot
# or: wstack auth → "s) Sign in with a subscription"Codex and Claude use a PKCE loopback flow; Copilot uses GitHub's device flow.
Tokens self-refresh (near-expiry + once on 401) and are AES-256-GCM encrypted
at rest like any other secret. After login, select the provider/model from the
/model picker or with --provider … --model … — modern Claude (Opus 4.8,
Sonnet 4.6) serves its full 1M-token context on this path automatically.
⚠️ Using a subscription outside its official client is a ToS gray area and can get your account rate-limited or banned. The sanctioned path for programmatic use is an API key; sign in with a subscription only if you accept that risk. Full reference + per-provider details:docs/oauth-signin.md.
| Family | Transport | Providers |
|---|---|---|
anthropic |
Native Claude API + SSE | Anthropic, MiniMax, Kimi, Google Vertex (Anthropic) |
openai |
Native OpenAI Chat Completions + SSE | OpenAI, Perplexity Agent, Vivgrid |
openai-compatible |
OpenAI-spec endpoints + SSE | ~100 providers: Mistral, Groq, DeepSeek, OpenRouter, Together, xAI, Cerebras, Ollama, Fireworks, Moonshot, GLM, Alibaba, … |
google |
Gemini :streamGenerateContent?alt=sse |
Google AI Studio |
anthropic-oauth |
Claude Messages API + Bearer (OAuth) | Claude Pro/Max — Sign in with Claude |
openai-codex |
ChatGPT Responses API (OAuth) | ChatGPT Plus/Pro/Team — Sign in with ChatGPT |
github-copilot |
Copilot proxy, OpenAI wire (OAuth) | GitHub Copilot — Sign in with Copilot |
unsupported |
Needs plugin | Cohere, Bedrock, Vertex (non-Anthropic), Azure |
All four supported families implement real streaming end-to-end: provider.stream() is the source of truth, complete() is aggregateStream(stream(...)). Mid-stream aborts preserve any partial assistant text already received. Catalog comes from models.dev/api.json — no hardcoded pricing, no hardcoded model names — and is refreshed synchronously on boot so provider resolution, model capabilities, and the TUI picker see fresh data before the app starts. The refresh has a 15-second timeout and falls back to cache with a warning; use --no-models-refresh for offline or CI runs.
Vision MCP adapters: text-only models work with images via MCP server adapters:
wstack mcp add zai-vision --enable
wstack mcp add minimax-vision --enableWhen the active model lacks native vision, WrongStack writes clipboard images to a temp file, invokes the adapter, replaces the image with the returned text, then removes the temp file.
Every run writes a <id>.jsonl append-only event log under ~/.wrongstack/projects/<sha256>/sessions/. On close, a tiny <id>.summary.json manifest is written alongside — now analytics-grade: title, model, provider, tokenTotal, endedAt, iterationCount, toolCallCount, toolErrorCount, fileChangeCount, compactionCount, a per-tool toolBreakdown, and an outcome (completed / error / timeout / aborted). wrongstack sessions lists hundreds of past runs without re-parsing each JSONL (O(N) stats, not O(N) full parses). session_resumed marker written on resume. Orphan tool_result events (missing matching tool_use) emit session.damaged event so the session can be flagged for repair. Housekeeping: /prune deletes stale sessions (and their summary / plan / todos sidecars) by age, and /prune --rebuild-index rebuilds the session index from disk.
API keys and MCP auth tokens encrypted with AES-256-GCM using a 32-byte key kept at ~/.wrongstack/.key (mode 0600 on POSIX). Format: enc:v1:<iv>:<tag>:<ciphertext>. Random IV per encryption — same plaintext yields different ciphertexts. The CLI auto-migrates plaintext keys it finds in config.json on every boot. Field detection is regex-based (/apikey|authtoken|bearer|secret|password|refreshtoken|sessionkey|access[_-]?token|private[_-]?key/i); publicKey is on a hard-coded override list.
Agents can operate in different modes that inject role-specific system prompts: default, code-reviewer, code-auditor, architect, debugger, tester, devops, refactorer. Modes live in ~/.wrongstack/modes/ and can be extended with custom prompts.
import { DefaultModeStore } from '@wrongstack/core';
import { createModeTool } from '@wrongstack/tools';
const modeStore = new DefaultModeStore({ directory: '~/.wrongstack/modes' });
const modeTool = createModeTool(modeStore);
await modeTool.execute({ action: 'set', mode: 'code-reviewer' });Switch how aggressively the session trims history: balanced (default), frugal (most token-friendly), deep (preserves more recent turns), archival (steady decision-preserving compaction). Use /context mode to list policies and switch at runtime. repair to fix damaged tool-call adjacency.
EventBus carries events across Session, Iteration, Provider, Tool, Token/compaction, Subagent lifecycle, MCP, and Error categories. Subscribe with events.on(name, fn) or events.once(name, fn); listeners that throw are caught and logged, never re-thrown.
Four-layer observability:
- Human view (TUI): LiveActivityStrip (top-4), FleetPanel, lifecycle chips on failures
- Host EventBus (always-on): lifecycle + per-tool bridge
- FleetBus (director-only): full per-subagent event stream —
tool.*,iteration.*,provider.{text,thinking}_*,compaction.*,token.* - Per-subagent JSONL on disk:
/fleet log <id>summarises,/fleet log <id> rawdumps full transcript
- Permission policy (
trust.json): per-tool allow/deny, persisted to disk, applies to subagents - YOLO mode (
--yoloor/yolo): skips all permission prompts — for CI and trusted workflows - Bash tool env allowlist:
WRONGSTACK_BASH_ENV_PASSTHROUGH=1disables the allowlist (legacy unsafe mode — seeSECURITY.md) WRONGSTACK_FETCH_ALLOW_PRIVATE=1: enables localhost/private IPs in thefetchtool- AES-256-GCM encryption for all secrets at rest
- SSRF guard everywhere: the
fetchtool and the builtinsearchtool resolve + re-validate every redirect hop against private/IMDS ranges; MCP transport URL validation blocks IPv4 and IPv6 IMDS (fd00:ec2::254, link-localfe80::/10) - Fail-closed subagent guard: the non-interactive
AutoApprovePermissionPolicyis allowlist-by-default — it approves only an explicit set of safe tools, so newly-added mutating tools (and allmcp__*tools) are denied to prompt-injected delegates by default instead of slipping through a denylist gap - Session-log scrubbing: user- and model-turn text is scrubbed before it hits the
0o600JSONL (and before it rides along in cloud sync), not just tool output - WebUI broadcast redaction:
tool.startedandtool.executedpayloads are scrubbed before WebSocket broadcast, so API keys and bearer tokens don't leak to connected tabs - Cloud-sync path containment: pulled remote entries are rejected if they traverse outside their category root; file-backed categories reject nested paths
- Symlink containment:
read/edit/writeresolve symlinks and re-check the target is inside the project root - Plugin trust tiers + capability gating: only first-party (
official) plugins may register bare slash-command names; toolwrap/override/unregisteris gated on declared capabilities in addition to the officiality tier, so a plugin can only mutate tools it is actually authorized for - Threat model and adversary trust assumptions in
SECURITY.md; audit findings and verification insecurity-report/
api-design, audit-log, bug-hunter, docker-deploy, git-flow, multi-agent, node-modern, observability, prompt-engineering, react-modern, refactor-planner, sdd, security-scanner, skill-creator, tech-stack, testing, typescript-strict — all following one structure (Overview → Rules → Patterns → Skills in scope). Discovered in order: project → user → bundled, with first-seen winning on name collisions.
A lean operating mode that shrinks the per-request prompt without going fully offline. Three levers compound:
- Tier-1 tool belt — the 90+ built-in tools collapse to 10 essentials
(
read,write,edit,bash,grep,glob,diff,patch,json,search), trimming ~4000–6000 tokens of tool-definition overhead. - Compact skills — each in-scope skill renders only its Overview + Rules
sections (or a dedicated
SKILL.save.mdvariant when the skill ships one). - Lazy MCP — MCP server tools are no longer expanded into the tool list at
startup; the model reaches any of them on demand through the
mcp_use({ server, tool, input })meta-tool, so the registered surface stays bounded no matter how many MCP servers are connected.
Toggle it at launch with --token-saving-mode (or features.tokenSavingMode in
config), or live from the TUI settings panel — the status bar shows a token-
saving indicator and the current registered-tool count.
Flips off MCP, plugins, memory tools, models.dev fetch, and skill discovery. What's left: kernel (Container + Pipeline + EventBus + RunController, ~1670 lines incl. events) + agent (525 lines) + 36 tools + permission policy + curated system prompt. The minimal-viable WrongStack runs offline with no network calls at startup. Provider family must be declared explicitly in config when using this mode.
Current release: 0.268.0. The HQ command center hardening release. It
ships the Phase 1 hq.welcome handshake, validates HQ client frames with
parseHqFrame(), documents the browser/client protocol and Phase 2 auth
roadmap, expands mailbox drawer/live-feed jsdom coverage, and includes the
release-check cleanup needed to keep audit, typecheck, tests, and build green.
Additive only — no breaking changes.
See CHANGELOG.md for the full, versioned history.
# First run — interactive auth/setup
wrongstack auth
# Sign in with ChatGPT/Codex subscription OAuth
wstack auth login chatgpt
# No config? Interactive picker launches automatically:
wrongstack # provider list → model list → save prompt → REPL
wrongstack --tui # same, then enters TUI
# TUI + YOLO (skip all permission prompts)
wrongstack --tui --yolo
# Specific provider/model — skip the picker
wrongstack --provider groq --model llama-3.3-70b-versatile
# Director fleet orchestration
wrongstack --director "audit src/ for security issues"
# Single-shot query
wrongstack "refactor src/auth.ts to async/await"
# Resume a saved session
wrongstack --resume <session-id>
wrongstack resume <session-id> # sameThree ways to configure:
wrongstack auth— interactive credential manager, saves encrypted credentials to~/.wrongstack/config.json- Shortcut:
wstack auth login chatgptstarts the ChatGPT/Codex OAuth flow directly.
- Shortcut:
- Automatic picker — just run
wrongstackwith no config; saves after selection - CLI flags —
wrongstack --provider <id> --model <id>— skips all interactivity
Add a key later: wrongstack auth groq (prompts, encrypts, stores).
In the TUI, /model opens a two-step provider → model picker with type-to-search filtering in step 2 — no restart needed. After picking a provider, type printable characters to filter models live, use Backspace to edit the filter, and navigate long lists through a centered 10-row window with ▲ N above / ▼ N below indicators. In the plain REPL, relaunch with --provider / --model:
wrongstack --provider openai --model gpt-5.5
wrongstack --provider deepseek --model deepseek-v4-pro
wrongstack --provider openrouter --model anthropic/claude-opus-4-7--provider <id> Override provider (e.g. anthropic, openai, groq)
--model <id> Override model
--cwd <path> Project root (default: process.cwd())
--resume <id> Resume a saved session
--record Record provider request/response to a replay log
--replay <id> Replay a recorded session deterministically (no API calls)
--tui Use the Ink TUI instead of readline REPL
--no-tui Force-disable the TUI (overrides --tui)
--no-banner Suppress the startup banner
--no-features Minimal kernel — no MCP, plugins, memory, models.dev, skills
--no-models-refresh Skip the boot-time models.dev catalog refresh (offline/CI)
--token-saving-mode Lean prompt: 10 Tier-1 tools, compact skills, lazy MCP (mcp_use)
--yolo Auto-allow all tool calls (don't ask for confirmation)
--director Enable Director-based fleet orchestration (LLM-driven subagent planning)
--goal "<task>" Boot directly into goal mode — GOAL preamble injected, TUI auto-enabled
--ask "<text>" Submit one turn verbatim on TUI boot (no preamble)
--verbose / -v Log level → debug
--trace Log level → trace
--log-level <lvl> Explicit log level
--help / --version Standard
Core (both the plain REPL and the TUI): /init /help /clear /compact /context /codebase-reindex /dev /diag /stats /tools /plugin /mcp /auth /memory /todos /tasks /mode /yolo /autonomy /interrupt /btw /next /enhance /fix /autophase /worktree /settings /sdd /save /load /prune /exit
Every built-in command is tagged with a category (Run · Session · Inspect · Agent · Config · App); the TUI slash picker groups matches under category headers, and the WebUI surfaces 55 commands in its slash list.
Multi-agent: /spawn /fleet /agents /goal /director /collab /setmodel /models /fallback
TUI-only (need --tui): /model (provider → model picker) · /steer (mid-flight redirect — the plain REPL uses Esc instead) · /queue
Built-in plugins (enabled by default): /prompts /sync /commit /gitcheck /push /security /skill /skill-gen /skill-install /skill-update /skill-uninstall /plan /metrics /health
/telegram becomes available once the Telegram plugin is enabled — wstack plugin install telegram.
| Command | Effect |
|---|---|
/init |
Create .wrongstack/AGENTS.md — auto-detects build system (package.json / pyproject.toml / go.mod / Cargo.toml / Makefile) and pre-fills build/test/lint/run commands |
/dev <shell command> |
Run a shell command from the chat input and see the output. The LLM does NOT see the result — this is a developer convenience shortcut. Timeout: 60 s. Max output: 500 lines |
/spawn [--provider --model --name --tools] <task> |
Launch a single subagent with optional overrides. No implicit budget cap |
/director |
Promote session to Director mode at runtime (must be before any subagent spawns) |
/fleet status|usage|kill|manifest|retry|log|stream on|off|journal|spawn|terminate |
Inspect and control the subagent fleet. log <id> summarises; log <id> raw dumps full JSONL |
/agents |
Print fleet roster (running, idle, completed) with kind chips for failures |
/steer <text> |
(TUI; in the plain REPL use Esc) Mid-flight redirect — aborts iteration, terminates fleet, drops queue, prepends STEERING preamble |
/goal <text> |
Lock in a goal — auto-refines it into deliverables, persists to ~/.wrongstack/projects/<hash>/goal.json, tracks progress/trends, and injects the full-autonomy preamble. Subcommands: /goal (status + journal), /goal refine, /goal clear, /goal pause, /goal resume, /goal journal [N] |
/tasks add|start|done|fail|status|depends|assign|promote|clear |
Structured task management between /plan and /todos: dependencies, types, priorities, estimates, agent assignment, and promote-to-todos flow |
/queue |
(TUI) Show, clear, or delete entries from the in-flight message queue |
/plan show|add|start|done|remove|clear |
Per-session plan JSON. Mirrored to disk; surfaces 📋 ⌛N ☐N ✓N chip in TUI status bar |
/autonomy off|suggest|on|eternal|parallel|stop|toggle |
Self-driving mode. suggest shows next steps without executing; on auto-continues; eternal runs goal-driven loop; parallel fans out 4-8 subagents per tick. TUI shows ∞ AUTO / ∞ SUGGEST / ETERNAL / ⟳ PARALLEL chip |
/yolo on|off|toggle |
Flip YOLO mode (auto-approve all tool calls). /yolo alone shows status. TUI shows ⚠ YOLO chip |
/interrupt (aliases /stop, /int) |
Stop the in-flight leader run and terminate the whole fleet — for when Esc is eaten by tmux or you're driving from the WebUI. REPL Ctrl+C now also stops subagents |
/mode |
Switch persona: default, code-reviewer, code-auditor, architect, debugger, tester, devops, refactorer. Custom modes in ~/.wrongstack/modes/ |
/model |
(TUI) Two-step provider → model picker. In the plain REPL, relaunch with --provider / --model |
/setmodel <key> <provider/model> |
Set per-role or per-phase model in the model matrix (e.g. /setmodel security-scanner openai/gpt-4o). Also supports resolve <role> and doctor for matrix diagnostics |
/fallback |
View or edit the rate-limit fallback chain. On a 429/529/5xx after retries, the agent rotates to the next model in the chain instead of failing; each hop prints ↻ switched to <provider/model> |
/auth [status <provider>|open|help] |
In-session credential dashboard. Shows saved provider/key status without blocking the REPL/TUI; run wstack auth for the full interactive key manager |
/image or /paste-image |
Attach clipboard PNG. TUI also Alt+V |
/context mode <policy> |
Switch context-window mode: balanced, frugal, deep, archival. repair fixes damaged tool-call adjacency |
/plugin install|disable|enable|remove|official [name] |
Manage plugins. install adds bundled package to config (no npm). Restart to load/unload |
/telegram send|read|chat|attach |
Telegram plugin (enable with wstack plugin install telegram): send <chatId> <message>, read <chatId> [limit], chat list recent, attach <file> send file |
/sdd <path-to-spec.md> |
Spec-Driven Development workflow: parse → analyze → generate → track → execute. Built on SpecParser, TaskTracker, TaskGenerator, TaskFlow |
/settings |
View or change settings (non-blocking, works in REPL + TUI): /settings (show), /settings delay <seconds>, /settings mode <off|suggest|auto>, /settings defaults; persists to ~/.wrongstack/config.json |
/prune [days] [--dry-run] [--rebuild-index] |
Delete sessions older than N days (default 30, clamped 1–365). --dry-run previews; --rebuild-index rebuilds the session index from disk. Sessions referenced by active.json are never pruned |
/compact, /tools, /skill, /save, /resume, /help, /clear, /stats, /diag, /exit |
Compact context, list tools/skills, save/resume session, help, clear, token+cost stats, diagnostics, exit |
| Key / Command | What it does |
|---|---|
| Esc (while busy) | Soft interrupt — abort agent, terminate fleet, drop queue, set "steering pending". Next message carries a STEERING preamble |
/steer <text> |
Same as Esc + typing, in one shot. Works when Esc is eaten by tmux |
/goal <text> |
"No force stops this" — full autonomy contract. Only Esc / Ctrl+C interrupt |
| Ctrl+C × 1 | Cancel current iteration + terminate fleet (1.5s cap) |
| Ctrl+C × 2 | Force-exit Ink loop |
| Ctrl+C × 3 | Hard process.exit(130) |
| Ctrl+F | Toggle the fleet monitor — per-subagent status + fleet-wide token gauge |
| Ctrl+G | Toggle the agents monitor — live per-agent context (current tool, streaming tail, sparkline) |
| F9 | Toggle the goal panel — refined mission, deliverables checklist, progress bar, trend, state, and last task |
| Ctrl+T | Toggle the worktree monitor — AutoPhase isolation branches |
| Ctrl+P | Toggle the phase monitor — active AutoPhase phases and tasks |
| Ctrl+T | Close worktree monitor (when open); otherwise delete word before cursor |
/fleet kill <id> |
Stop one specific subagent |
wrongstack init # First-run setup wizard
wrongstack auth <prov> # Store an API key (prompted, encrypted at rest)
wrongstack sessions # List saved sessions for this project
wrongstack resume <id> # Continue a saved session
wrongstack replay <id> # Inspect a recorded replay log (see --replay)
wrongstack audit <id> # Verify the SHA-256-chained tool-call audit trail
wrongstack config # Show / edit config
wrongstack tools # List registered tools
wrongstack skills # List discovered skills
wrongstack providers # ~110 providers grouped by wire family
wrongstack models [prov] [--search <term>] [--page N] [--per-page N] # searchable, paginated model list
wrongstack mcp # Inspect connected MCP servers
wrongstack plugin # Plugin manifest commands
wrongstack diag # Diagnostics: provider, tokens, paths
wrongstack usage # Token + cost totals across sessions
wrongstack projects # List known project hashes → paths
wrongstack help # Help text
wrongstack version # Version| Variable | Description |
|---|---|
<PROVIDER>_API_KEY |
API key for the provider (e.g. ANTHROPIC_API_KEY, OPENAI_API_KEY) |
WRONGSTACK_FETCH_ALLOW_PRIVATE |
Set to 1 to allow localhost / private IPs in the fetch tool |
WRONGSTACK_BASH_ENV_PASSTHROUGH |
Set to 1 to disable the bash-tool env allowlist (legacy unsafe mode — see SECURITY.md) |
apiKey-like fields are auto-encrypted on first contact. Plaintext keys in older config files get migrated transparently on boot.
Commit this file to share project conventions with the agent across all developers:
// Conventions for this repo
- Always run tests after editing
- Use pnpm, not npm
- PR titles follow conventionalcommits.org
1. Minimal kernel. Container, Pipeline, EventBus, RunController, and the token table total ~1670 lines (including the full event type catalog). The agent loop adds another 525 lines. Everything else is replaceable.
2. Zero non-overridable behavior. 16 services bound through Container (Logger, TokenCounter, SessionStore, MemoryStore, PermissionPolicy, Compactor, PathResolver, ConfigLoader, Renderer, InputReader, ErrorHandler, RetryPolicy, SkillLoader, SystemPromptBuilder, SecretScrubber, ModelsRegistry). 6 pipelines as middleware chains. Tools, providers, MCP servers, and slash commands all live in registries.
3. Standalone sufficiency. Works with 36 built-in tools, 4 wire-family transports, permission policy, and a curated system prompt — no plugins required.
4. Layered, not monolithic. --no-features flips off MCP, plugins, memory tools, models.dev fetch, and skill discovery. The minimal-viable WrongStack runs offline with no network calls at startup.
| Package | Purpose |
|---|---|
@wrongstack/core |
Kernel, agent, types, registries, plugin contract |
@wrongstack/runtime |
Default runtime implementations, host composition helpers, extension pack contracts |
@wrongstack/providers |
Anthropic/OpenAI/OpenAI-compatible/Google wire adapters + SSE |
@wrongstack/tools |
36 built-in tools (incl. SQLite codebase index) |
@wrongstack/mcp |
MCP server registry + reconnection logic |
@wrongstack/cli |
REPL, subcommands, slash commands, terminal renderer |
@wrongstack/tui |
Ink-based TUI (lazy-loaded behind --tui) |
@wrongstack/plug-lsp |
LSP plugin (wrongstack-lsp-setup binary) |
@wrongstack/telegram |
Telegram plugin: send/read/notifications, /telegram:* slash commands |
@wrongstack/webui |
Standalone web UI — webui binary, also via wrongstack --webui |
@wrongstack/plugins |
Official plugin collection — 10 plugins via subpath exports |
CLI → REPL, renderer, slash commands, subcommands
TUI → Ink frontend (lazy-loaded behind --tui)
Steering → Esc / /steer / /goal — mid-flight redirect + autonomous lock-in
Director → Fleet orchestration (LLM-driven, opt-in via --director)
Agent → loop, context, system prompt, permission, compaction, autoExtendLimit
Tools → ToolExecutor (parallel/sequential/smart strategies, abort-safe)
Runtime → Default host assembly + WrongStackPack extension composition
Kernel → Container · Pipeline · EventBus · RunController (the 4 primitives)
Provider → 4 wire families, factories built from ModelsRegistry, real SSE
Models → models.dev/api.json fetched + cached + classified
For the full walk-through — including the L1-A reactive ConversationState, how the six pipelines fire per turn, and how plugins / MCP / observability plug in — see docs/architecture.md.
- 6700+ tests passing across 500+ test files in the 0.257.0 release gate
- Coverage thresholds: ≥85 % lines / ≥85 % functions / ≥70 % branches / ≥82 % statements
- All workspace packages build clean with TypeScript strict +
noUncheckedIndexedAccess - Node 22+ only, ESM-only, no CommonJS bundles
- Release gate verified locally:
pnpm audit --audit-level=moderate+pnpm typecheck+pnpm test+pnpm build - Threat model:
SECURITY.md
| Doc | What it covers |
|---|---|
docs/architecture.md |
Package layout, kernel primitives, agent lifecycle, L1-A reactive split |
docs/slash/ |
Full reference for every built-in slash command |
docs/subcommands/ |
Full reference for every wstack <subcommand> |
docs/director-architecture.md |
Director: FleetBus, prompt layering, safety caps, per-subagent JSONL, shared scratchpad |
docs/plugin-author-guide.md |
Building a plugin: capabilities, dependencies, configSchema, teardown, testing |
docs/plugin-management.md |
User-facing plugin workflows: list/add/remove/enable/disable, config layout |
docs/provider-author-guide.md |
Adding an LLM provider via WireFormatConfig, stream-state design, vendor quirks |
docs/tool-author-guide.md |
Writing a tool: streaming executeStream, permission semantics, cleanup vs registerAbortHook |
docs/yolo-mode.md |
YOLO mode: permission pipeline, runtime toggle, trust file, subagent policy |
docs/skills.md |
Writing skills: frontmatter format, discovery layers, description quality, token budget |
docs/oauth-signin.md |
Sign in with a ChatGPT/Codex, Claude, or GitHub Copilot subscription (OAuth); flows, models, token storage, ToS caveat |
docs/configuration.md |
Full config reference: every field with type, default, and example |
docs/troubleshooting.md |
Common issues, diagnosis steps, wstack diag, exit codes, reset commands |
pnpm bench runs all *.bench.ts files via a separate vitest config and writes results to bench-results.json (gitignored). Current suite covers compactor hot paths, token estimation, JSON-schema validation, and the system prompt builder. See vitest.bench.config.ts.
See examples/ for 6 categories of working examples:
MIT © 2026 ECOSTACK TECHNOLOGY OÜ — see LICENSE.

{ "version": 1, "provider": "anthropic", "model": "claude-opus-4-7", "providers": { "anthropic": { "apiKey": "enc:v1:<iv>:<tag>:<ciphertext>" } }, "features": { "mcp": true, "plugins": true, "memory": true, "modelsRegistry": true, "skills": true }, "plugins": [] }