GitHub - HoneyAdam/WrongStack · GitHub
Skip to content

HoneyAdam/WrongStack

 
 

Folders and files

Repository files navigation

WrongStack

Built on the wrong stack. Shipped anyway.

An AI coding agent that reads your code, edits files, runs commands, and reasons through bugs — across a terminal REPL, a full-screen TUI, and a browser UI, while you keep your hand on every permission.

npm downloads node TypeScript tests license

npm i -g wrongstack && wrongstack

WrongStack drives autonomous goal loops, parallel subagent fan-out, multi-agent Director orchestration, Brain-governed policy decisions, and collaborative debugging — and walks you through full Spec-Driven Development cycles. It ships with 36 built-in tools, 20 skills, 8 core plugins + 10 bundled in @wrongstack/plugins, and ~110 providers pulled live from models.dev — no hardcoded model names, no hardcoded pricing, no hardcoded lists. Secrets are AES-256-GCM encrypted at rest with a per-machine key; every tool call clears a per-tool permission policy. Everything lives under ~/.wrongstack/ — the only thing you'd ever commit is .wrongstack/AGENTS.md.

✨ Why it slaps

  • 🧠 Three surfaces, one brain — a plain readline REPL, an Ink/React TUI (--tui), and a standalone web UI.
  • 🤖 A fleet, not a lone agent — a 47-role roster + smart dispatcher fan out under a Director, each subagent fully isolated with its own budget and JSONL transcript.
  • 🧠 Brain as an authority seam — risky AutoPhase and Director choices can be auto-decided by policy, denied, or escalated to the human through the TUI.
  • ♾️ Set a goal, walk away/goal locks in a contract and the eternal / parallel engines grind until it's verifiably done.
  • 🔌 ~110 providers, zero lock-in — Anthropic, OpenAI, Google, and ~100 OpenAI-compatible endpoints, catalog refreshed from models.dev at boot.
  • 🔑 Sign in with a subscription — authenticate with a ChatGPT (Codex), Claude Pro/Max, or GitHub Copilot subscription over OAuth, alongside (not instead of) API keys. See docs/oauth-signin.md.
  • 🔎 Fast model switching — the TUI /model picker supports type-to-search filtering with scroll-window navigation, and wstack models supports search + pagination.
  • 🔐 Locked down by default — encrypted secrets, SSRF guards on every redirect hop, fail-closed subagents, symlink containment, plugin trust tiers, WebUI redaction, and cloud-sync path guards.
  • 🪶 A compact kernelContainer · Pipeline · EventBus · RunController (~1670 lines including the full event type catalog). Everything above it is swappable; --no-features boots it fully offline.

Requirements

  • Node.js ≥ 22.0.0
  • pnpm ≥ 9.0.0 (recommended) or npm

Install

npm install -g wrongstack
# or
pnpm install -g wrongstack

This pulls in the full stack — @wrongstack/core, @wrongstack/runtime, @wrongstack/providers, @wrongstack/tools, @wrongstack/mcp, @wrongstack/plug-lsp, and @wrongstack/tui. The TUI is shipped but lazy-loaded behind --tui, so plain-REPL users pay no React/Ink import cost at startup. The web-based UI (@wrongstack/webui) is available as a separate binary (webui).

After install, wrongstack is on your PATH. (wstack works too — it's an alias.)


Features

Three interactive surfaces

Plain REPL (default): readline-based, multiline heredoc, slash commands, streaming text. Works everywhere a terminal works.

TUI (--tui): Ink + React frontend, lazy-loaded. Key features:

  • Multi-line paste collapse, @<query> fuzzy file picker, clipboard image paste (Alt+V)
  • Live status bar: model · tokens · cache hit · cost · running: <tool> while tools execute
  • LiveActivityStrip: tool in flight + elapsed timer per running subagent
  • Esc-to-steer: aborts run, terminates fleet, prepends STEERING preamble to your next message
  • /goal <description>: locks in full-autonomy mode — no implicit budget cap
  • Signal-safe cleanup, non-TTY guard, re-entrancy guard on Enter, resize ghost mitigation
  • Real-time stage chip: ⟳ DECIDE / ⚡ EXECUTE / ◎ REFLECT updates every tick during eternal/parallel runs

Web UI (@wrongstack/webui): React + Radix + Tailwind frontend with a Node ws backend. Standalone webui binary serves on 3456/3457; CLI can opt in with wrongstack --webui. Highlights:

  • Topbar status bar: ctx% · tokens · cache hit · cost · elapsed · iteration
  • Per-message footer: token usage, Pin / Edit & resend / Retry
  • Tool bubbles: live tool.progress stream, collapsible gutter, Download/Copy on hover
  • Sidebar: live TODO snapshot, Pinned panel, History with grouping + search
  • Operations panels: Goal, Process Monitor, Checkpoint Timeline, AutoPhase, phase agents, task board, and worktree lanes
  • Autonomy picker: switch off / suggest / auto / eternal / eternal-parallel from the UI
  • Overlays: Ctrl+K command palette, Ctrl+M model switcher, Ctrl+F chat search, ? shortcuts
  • Slash commands with keyboard nav, day-separator dividers, dynamic tab title
# Standalone (recommended for the full experience)
webui                          # binds backend to 127.0.0.1:3457, serves UI on 3456
WS_HOST=0.0.0.0 webui          # expose on the LAN

# Or piggy-back on the CLI process
wrongstack --webui

36 built-in tools

All tools are registered out of the box — no plugin required.

Tool What it does
read Read file contents with offset/limit
write Write or overwrite a file
edit Surgical string replacement in existing files
replace Batch regex replacement across matched files
glob Find files matching a pattern
grep Search file contents with regex
bash Run shell commands
exec Restricted shell with an allowlist (node, npm, pnpm, …)
fetch HTTP fetch with HTML→markdown (localhost blocked by default)
search Web search (DuckDuckGo / Google / Bing)
patch Apply unified diff patches
json Parse and query JSON with dot notation
diff Show differences between files or commits
tree Display directory structure as ASCII tree
lint Run linter (Biome / ESLint / TSLint)
format Format code with Biome / Prettier
typecheck TypeScript type checking
test Run tests with Vitest / Jest / Mocha
install Install npm packages
audit Security vulnerability audit
outdated Check for outdated packages
logs Stream or fetch service log files
document Generate JSDoc/TSDoc comments
scaffold Generate boilerplate from templates
tool_search / tool_use / batch_tool_use / tool_help Meta-tooling for tool discovery and orchestration
todo Track multi-step tasks
git Common git operations
task Structured work items with dependencies, types, and priorities
context_manager Inspect / trim / compact the in-flight context window
remember / forget Persist notes across sessions (project- or user-scoped, gated by features.memory)
codebase-index Build / update the SQLite symbol index (incremental; multi-language)
codebase-search BM25-ranked search over indexed symbol names, signatures, and doc comments
codebase-stats Summary of the current symbol index

Autonomy engine

The engine has two modes, both launched via /autonomy:

eternal — single-leader goal-driven loop. Runs decide → execute → reflect → sleep cycles against the persistent /goal until you stop it (Esc / /steer / Ctrl+C / /autonomy stop). Force-enables YOLO. Hybrid decide pipeline walks pending todos → dirty git → LLM brainstorm, so the loop produces useful work even with no queued task. Each phase transition calls the onStage callback — the TUI renders the live stage chip (⟳ DECIDE, ⚡ EXECUTE, ◎ REFLECT) updating every tick. TUI shows a red ETERNAL chip; WebUI receives a live eternal.iteration WS broadcast per cycle.

parallel — leader drives, N subagents fan out simultaneously. ParallelEternalEngine decomposes the goal into up to parallelSlots tasks per tick (default 4, max 16), spawns that many subagents via DefaultMultiAgentCoordinator, awaits all results, aggregates, and writes a journal entry. [GOAL_COMPLETE] in any subagent's output stops the engine cleanly. maxConcurrent: 8 supports the higher fan-out density. TUI shows an amber ⟳ PARALLEL chip with the live iteration count.

Both engines persist state to goal.json. Both can be paused and resumed without losing work: /goal pause sets goalState: 'paused' — the engine exits gracefully after the current iteration finishes (no AbortController kill, no work torn mid-task); /goal resume flips goalState back to 'active'.

Goal system

/goal <text> persists to ~/.wrongstack/projects/<hash>/goal.json and injects a full-autonomy preamble into the next turn:

[GOAL — LOCKED IN. You will work on this until it is verifiably done.

YOUR GOAL: <user text>

AUTHORITY: Spawn unlimited subagents · Use any provider/model · Unlimited tool calls + iterations
           Agent loop auto-extends every 100 iterations. Retry failed tools; switch providers on 429.

WHAT "DONE" MEANS: Named artifact (passing test, file at a path, fixed bug verified by re-running)
                   A 10-second user verification recipe. No hedges.

WHAT IS NOT DONE: Unhandled error · empty result · "should I continue?" · partial progress as success

PERSISTENCE: Blocked? Try 3 angles. Tool failed? Read error, alter input, retry.
             Subagent useless? Respawn with tighter prompt.

BEGIN.]

/goal shows status + journal. /goal clear stops the engine. /goal journal [N] prints the FIFO ring (default 25, cap 500). Goal state lives in goalState: 'active' | 'paused' | 'done'.

Multi-agent fleet + Director

Fleet tools (14 on the Director's belt from first message): spawn_subagent, assign_task, await_tasks, ask_subagent, ask_result, roll_up, terminate_subagent, terminate_all, fleet_status, fleet_usage, fleet_health, fleet_session, fleet_emit, collab_debug. Large ask_subagent responses (>2K chars) are stashed in a per-Director LargeAnswerStore and returned as a summary + key — the Director pulls the full text back with ask_result only when it needs it, so its context stays bounded no matter how many big asks happen.

/fleet command: status — task progress per subagent · usage — token + cost breakdown · kill <id> — stop one subagent · kill — stop all · manifest — full fleet snapshot · log <id> — transcript summary · log <id> raw — full JSONL dump · journal — recent parallel engine entries · spawn <role> [count] — spawn N subagents of a role · terminate <subagentId> — stop one · retry <id> — re-spawn a failed subagent · stream on|off — toggle live output streaming.

/spawn [--provider --model --name --tools] <task> — launch a single subagent. No implicit budget cap; runs until done.

/director — promote the session to Director mode at runtime (must be called before any subagent is spawned).

/autonomy parallel — LLM-driven fan-out mode described above.

Subagent failure taxonomy (14-kind discriminated union): budget_timeout (✓ retryable), budget_tool_calls, budget_iterations, provider_rate_limit (✓), provider_5xx (✓), provider_auth, tool_failed, empty_response, aborted_by_parent, context_overflow. Every failure includes cause (error name + message + stack). The delegate tool exposes errorKind / retryable / backoffMs so the calling LLM can branch on classification.

Architecture: Host EventBus (always-on bridge) → Leader Agent (Director) + FleetBus (director-only fan-in) → DefaultMultiAgentCoordinatorAgentSubagentRunner per task (fresh Agent + Context + EventBus, full isolation) → per-subagent JSONL transcripts on disk.

47-agent roster + smart dispatcher. The Director draws from a 47-role agent catalog; a smart dispatcher routes each task to the best-matching role instead of spawning generic clones. The TUI fleet monitor (Ctrl+F) shows per-subagent status and a fleet-wide token gauge, and auto-extended budgets surface as a ⚡ extended ×N badge across all fleet UIs. Spawned subagents take a memorable scientist nickname (Turing, Shannon, Gauss, …) so you can track them across the fleet at a glance.

Collaborative debugging. Director.spawnCollab() runs BugHunter, RefactorPlanner, and Critic in parallel on one shared, immutable file snapshot. Findings flow through the FleetBus as structured events (bug.found → refactor.plan → critic.evaluation); the Director routes each output to its dependents through a shared scratchpad — so agents build on each other's conclusions without exchanging full transcripts — and returns a single structured CollabDebugReport. Subagents signal upward with the fleet_emit tool.

--director flag launches the full fleet roster from the CLI directly:

wrongstack --director "audit src/ for security issues"

Brain-governed decisions

Brain is a small authority layer for decisions that are bigger than one tool permission prompt. Director and AutoPhase can ask a BrainArbiter whether to continue, deny, or escalate a risky choice — for example extending a subagent's budget or attempting an automatic worktree merge-conflict resolution.

The default Brain is conservative: low-risk requests with an explicit recommended option can be answered automatically; higher-risk choices route to a human decision prompt in the TUI. The prompt appears in chat history, the status bar shows a compact 🧠 chip, and the answer flows back through typed brain.* EventBus events.

Spec-Driven Development (/sdd)

The /sdd <path-to-spec.md> slash command guides the agent through the SDD loop: parseanalyzegeneratetrackexecute. Built on SpecParser, TaskTracker, TaskGenerator, and TaskFlow from @wrongstack/core/sdd. Reads a markdown spec file, generates tasks via TaskGenerator, and displays task status inline.

const parser = new SpecParser();
const spec = parser.parse(markdownSpec);
const analysis = parser.analyze(spec);

const tracker = new TaskTracker({ store });
const generator = new TaskGenerator({ taskTracker: tracker });
await generator.generateFromSpec(spec);

const flow = new TaskFlow({ tracker });
await flow.execute({ executeTask: async (task) => { /* ... */ } });

Plugin ecosystem — @wrongstack/plugins

Ten ready-to-use plugins ship in one package, each available via a subpath export (@wrongstack/plugins/<name>):

Plugin Tools Notes
auto-doc auto_doc, auto_doc_preview JSDoc / TSDoc comment generation
git-autocommit git_autocommit, git_stage, git_status_summary Conventional-commit messages
shell-check shellcheck_run, shellcheck_scan ShellCheck wrapper
cost-tracker cost_summary, cost_reset, cost_export Token usage + cost per model via provider.response events
file-watcher watch_start, watch_stop, watch_list Emits file-watcher:changed events
web-search web_search, web_fetch Cached DuckDuckGo + URL→markdown
json-path jsonpath_query, jsonpath_mutate JSONPath-style queries and mutations
cron cron_schedule, cron_list, cron_cancel Recurring actions via beforeIteration / afterIteration hooks
template-engine render_template, template_variables {{var}} / {{#if}} / {{#each}} expansion + system-prompt contributor
semver-bump semver_bump, changelog_update Conventional-commit-driven version bumps

All plugins type-check under strict + noUncheckedIndexedAccess, use the real plugin API (api.onEvent not pipeline mutation), register AgentExtension for iteration hooks, and ship Record<string, unknown> typings on every tool execute.

Built-in (first-party) plugins

Seven plugins ship enabled by default and load before any user plugin — they wire core infrastructure and claim bare slash-command names (only official first-party plugins may do so; external plugins stay namespaced). Opt a specific one out with { "name": "wstack-git", "enabled": false } in config.plugins, or disable all with features.plugins: false.

Built-in plugin Slash commands What it adds
wstack-prompts /prompts list|view|add|delete|edit|extend Personal prompt library with LLM-powered enhancement
wstack-sync /sync status|enable|disable|push|pull|categories GitHub cloud sync for settings, skills, prompts, memory, and history — token encrypted via the secret vault, no git CLI needed
wstack-git /commit, /gitcheck, /push LLM-written conventional commits, pre-commit sanity check, push
wstack-security /security scan|audit|report Security scanning surface
wstack-skills /skill, /skill-gen, /skill-install, /skill-update, /skill-uninstall Skill discovery, generation, and lifecycle
wstack-plan /plan show|add|start|done|remove|clear Per-session strategic roadmap (chip in the TUI status bar)
wstack-observability /metrics, /health Prometheus metrics + health snapshot

Cloud sync (/sync) pushes/pulls user-selected ~/.wrongstack categories — settings, skills, prompts, memory, history — to a private GitHub repo over the REST API. State lives in ~/.wrongstack/sync.json (token encrypted) + sync-state.json; pick categories with /sync categories. Pulls validate every remote tree entry and reject traversal (.., absolute paths, or paths resolving outside the category root); file-backed categories such as settings also reject nested paths.

Manage from CLI or REPL:

wstack plugin list
wstack plugin install telegram
wstack plugin official
wstack plugin disable telegram
wstack plugin enable telegram
wstack plugin remove telegram

telegram and lsp are bundled aliases for @wrongstack/telegram and @wrongstack/plug-lsp.

Sign in with a subscription (OAuth)

Authenticate against a vendor subscription instead of a metered API key — an orthogonal credential layer that sits next to the API-key providers, not in place of them:

wstack auth login chatgpt     # ChatGPT Plus/Pro/Team (Codex) → provider openai-codex
wstack auth login claude      # Claude Pro/Max               → provider anthropic-oauth
wstack auth login copilot     # GitHub Copilot               → provider github-copilot
# or: wstack auth → "s) Sign in with a subscription"

Codex and Claude use a PKCE loopback flow; Copilot uses GitHub's device flow. Tokens self-refresh (near-expiry + once on 401) and are AES-256-GCM encrypted at rest like any other secret. After login, select the provider/model from the /model picker or with --provider … --model … — modern Claude (Opus 4.8, Sonnet 4.6) serves its full 1M-token context on this path automatically.

⚠️ Using a subscription outside its official client is a ToS gray area and can get your account rate-limited or banned. The sanctioned path for programmatic use is an API key; sign in with a subscription only if you accept that risk. Full reference + per-provider details: docs/oauth-signin.md.

Provider catalog (~110 providers, 4 API-key families + 3 subscription families + 1 stub)

Family Transport Providers
anthropic Native Claude API + SSE Anthropic, MiniMax, Kimi, Google Vertex (Anthropic)
openai Native OpenAI Chat Completions + SSE OpenAI, Perplexity Agent, Vivgrid
openai-compatible OpenAI-spec endpoints + SSE ~100 providers: Mistral, Groq, DeepSeek, OpenRouter, Together, xAI, Cerebras, Ollama, Fireworks, Moonshot, GLM, Alibaba, …
google Gemini :streamGenerateContent?alt=sse Google AI Studio
anthropic-oauth Claude Messages API + Bearer (OAuth) Claude Pro/Max — Sign in with Claude
openai-codex ChatGPT Responses API (OAuth) ChatGPT Plus/Pro/Team — Sign in with ChatGPT
github-copilot Copilot proxy, OpenAI wire (OAuth) GitHub Copilot — Sign in with Copilot
unsupported Needs plugin Cohere, Bedrock, Vertex (non-Anthropic), Azure

All four supported families implement real streaming end-to-end: provider.stream() is the source of truth, complete() is aggregateStream(stream(...)). Mid-stream aborts preserve any partial assistant text already received. Catalog comes from models.dev/api.json — no hardcoded pricing, no hardcoded model names — and is refreshed synchronously on boot so provider resolution, model capabilities, and the TUI picker see fresh data before the app starts. The refresh has a 15-second timeout and falls back to cache with a warning; use --no-models-refresh for offline or CI runs.

Vision MCP adapters: text-only models work with images via MCP server adapters:

wstack mcp add zai-vision --enable
wstack mcp add minimax-vision --enable

When the active model lacks native vision, WrongStack writes clipboard images to a temp file, invokes the adapter, replaces the image with the returned text, then removes the temp file.

Session persistence + resume

Every run writes a <id>.jsonl append-only event log under ~/.wrongstack/projects/<sha256>/sessions/. On close, a tiny <id>.summary.json manifest is written alongside — now analytics-grade: title, model, provider, tokenTotal, endedAt, iterationCount, toolCallCount, toolErrorCount, fileChangeCount, compactionCount, a per-tool toolBreakdown, and an outcome (completed / error / timeout / aborted). wrongstack sessions lists hundreds of past runs without re-parsing each JSONL (O(N) stats, not O(N) full parses). session_resumed marker written on resume. Orphan tool_result events (missing matching tool_use) emit session.damaged event so the session can be flagged for repair. Housekeeping: /prune deletes stale sessions (and their summary / plan / todos sidecars) by age, and /prune --rebuild-index rebuilds the session index from disk.

Encrypted secrets

API keys and MCP auth tokens encrypted with AES-256-GCM using a 32-byte key kept at ~/.wrongstack/.key (mode 0600 on POSIX). Format: enc:v1:<iv>:<tag>:<ciphertext>. Random IV per encryption — same plaintext yields different ciphertexts. The CLI auto-migrates plaintext keys it finds in config.json on every boot. Field detection is regex-based (/apikey|authtoken|bearer|secret|password|refreshtoken|sessionkey|access[_-]?token|private[_-]?key/i); publicKey is on a hard-coded override list.

Mode system (8 personas)

Agents can operate in different modes that inject role-specific system prompts: default, code-reviewer, code-auditor, architect, debugger, tester, devops, refactorer. Modes live in ~/.wrongstack/modes/ and can be extended with custom prompts.

import { DefaultModeStore } from '@wrongstack/core';
import { createModeTool } from '@wrongstack/tools';

const modeStore = new DefaultModeStore({ directory: '~/.wrongstack/modes' });
const modeTool = createModeTool(modeStore);
await modeTool.execute({ action: 'set', mode: 'code-reviewer' });

Context window policies

Switch how aggressively the session trims history: balanced (default), frugal (most token-friendly), deep (preserves more recent turns), archival (steady decision-preserving compaction). Use /context mode to list policies and switch at runtime. repair to fix damaged tool-call adjacency.

Observability — 28 typed events

EventBus carries events across Session, Iteration, Provider, Tool, Token/compaction, Subagent lifecycle, MCP, and Error categories. Subscribe with events.on(name, fn) or events.once(name, fn); listeners that throw are caught and logged, never re-thrown.

Four-layer observability:

  • Human view (TUI): LiveActivityStrip (top-4), FleetPanel, lifecycle chips on failures
  • Host EventBus (always-on): lifecycle + per-tool bridge
  • FleetBus (director-only): full per-subagent event stream — tool.*, iteration.*, provider.{text,thinking}_*, compaction.*, token.*
  • Per-subagent JSONL on disk: /fleet log <id> summarises, /fleet log <id> raw dumps full transcript

Security

  • Permission policy (trust.json): per-tool allow/deny, persisted to disk, applies to subagents
  • YOLO mode (--yolo or /yolo): skips all permission prompts — for CI and trusted workflows
  • Bash tool env allowlist: WRONGSTACK_BASH_ENV_PASSTHROUGH=1 disables the allowlist (legacy unsafe mode — see SECURITY.md)
  • WRONGSTACK_FETCH_ALLOW_PRIVATE=1: enables localhost/private IPs in the fetch tool
  • AES-256-GCM encryption for all secrets at rest
  • SSRF guard everywhere: the fetch tool and the builtin search tool resolve + re-validate every redirect hop against private/IMDS ranges; MCP transport URL validation blocks IPv4 and IPv6 IMDS (fd00:ec2::254, link-local fe80::/10)
  • Fail-closed subagent guard: the non-interactive AutoApprovePermissionPolicy is allowlist-by-default — it approves only an explicit set of safe tools, so newly-added mutating tools (and all mcp__* tools) are denied to prompt-injected delegates by default instead of slipping through a denylist gap
  • Session-log scrubbing: user- and model-turn text is scrubbed before it hits the 0o600 JSONL (and before it rides along in cloud sync), not just tool output
  • WebUI broadcast redaction: tool.started and tool.executed payloads are scrubbed before WebSocket broadcast, so API keys and bearer tokens don't leak to connected tabs
  • Cloud-sync path containment: pulled remote entries are rejected if they traverse outside their category root; file-backed categories reject nested paths
  • Symlink containment: read/edit/write resolve symlinks and re-check the target is inside the project root
  • Plugin trust tiers + capability gating: only first-party (official) plugins may register bare slash-command names; tool wrap/override/unregister is gated on declared capabilities in addition to the officiality tier, so a plugin can only mutate tools it is actually authorized for
  • Threat model and adversary trust assumptions in SECURITY.md; audit findings and verification in security-report/

Bundled skills (17)

api-design, audit-log, bug-hunter, docker-deploy, git-flow, multi-agent, node-modern, observability, prompt-engineering, react-modern, refactor-planner, sdd, security-scanner, skill-creator, tech-stack, testing, typescript-strict — all following one structure (Overview → Rules → Patterns → Skills in scope). Discovered in order: project → user → bundled, with first-seen winning on name collisions.

Token-saving mode (--token-saving-mode)

A lean operating mode that shrinks the per-request prompt without going fully offline. Three levers compound:

  • Tier-1 tool belt — the 90+ built-in tools collapse to 10 essentials (read, write, edit, bash, grep, glob, diff, patch, json, search), trimming ~4000–6000 tokens of tool-definition overhead.
  • Compact skills — each in-scope skill renders only its Overview + Rules sections (or a dedicated SKILL.save.md variant when the skill ships one).
  • Lazy MCP — MCP server tools are no longer expanded into the tool list at startup; the model reaches any of them on demand through the mcp_use({ server, tool, input }) meta-tool, so the registered surface stays bounded no matter how many MCP servers are connected.

Toggle it at launch with --token-saving-mode (or features.tokenSavingMode in config), or live from the TUI settings panel — the status bar shows a token- saving indicator and the current registered-tool count.

--no-features minimal kernel

Flips off MCP, plugins, memory tools, models.dev fetch, and skill discovery. What's left: kernel (Container + Pipeline + EventBus + RunController, ~1670 lines incl. events) + agent (525 lines) + 36 tools + permission policy + curated system prompt. The minimal-viable WrongStack runs offline with no network calls at startup. Provider family must be declared explicitly in config when using this mode.


Recent changes

Current release: 0.268.0. The HQ command center hardening release. It ships the Phase 1 hq.welcome handshake, validates HQ client frames with parseHqFrame(), documents the browser/client protocol and Phase 2 auth roadmap, expands mailbox drawer/live-feed jsdom coverage, and includes the release-check cleanup needed to keep audit, typecheck, tests, and build green. Additive only — no breaking changes.

See CHANGELOG.md for the full, versioned history.

Quick start

# First run — interactive auth/setup
wrongstack auth

# Sign in with ChatGPT/Codex subscription OAuth
wstack auth login chatgpt

# No config? Interactive picker launches automatically:
wrongstack          # provider list → model list → save prompt → REPL
wrongstack --tui    # same, then enters TUI

# TUI + YOLO (skip all permission prompts)
wrongstack --tui --yolo

# Specific provider/model — skip the picker
wrongstack --provider groq --model llama-3.3-70b-versatile

# Director fleet orchestration
wrongstack --director "audit src/ for security issues"

# Single-shot query
wrongstack "refactor src/auth.ts to async/await"

# Resume a saved session
wrongstack --resume <session-id>
wrongstack resume <session-id>       # same

First-run setup

Three ways to configure:

  1. wrongstack auth — interactive credential manager, saves encrypted credentials to ~/.wrongstack/config.json
    • Shortcut: wstack auth login chatgpt starts the ChatGPT/Codex OAuth flow directly.
  2. Automatic picker — just run wrongstack with no config; saves after selection
  3. CLI flagswrongstack --provider <id> --model <id> — skips all interactivity

Add a key later: wrongstack auth groq (prompts, encrypts, stores).

Switching providers at runtime

In the TUI, /model opens a two-step provider → model picker with type-to-search filtering in step 2 — no restart needed. After picking a provider, type printable characters to filter models live, use Backspace to edit the filter, and navigate long lists through a centered 10-row window with ▲ N above / ▼ N below indicators. In the plain REPL, relaunch with --provider / --model:

wrongstack --provider openai --model gpt-5.5
wrongstack --provider deepseek --model deepseek-v4-pro
wrongstack --provider openrouter --model anthropic/claude-opus-4-7

CLI flags

--provider <id>      Override provider (e.g. anthropic, openai, groq)
--model <id>         Override model
--cwd <path>         Project root (default: process.cwd())
--resume <id>        Resume a saved session
--record             Record provider request/response to a replay log
--replay <id>        Replay a recorded session deterministically (no API calls)
--tui                Use the Ink TUI instead of readline REPL
--no-tui             Force-disable the TUI (overrides --tui)
--no-banner          Suppress the startup banner
--no-features        Minimal kernel — no MCP, plugins, memory, models.dev, skills
--no-models-refresh  Skip the boot-time models.dev catalog refresh (offline/CI)
--token-saving-mode  Lean prompt: 10 Tier-1 tools, compact skills, lazy MCP (mcp_use)
--yolo               Auto-allow all tool calls (don't ask for confirmation)
--director           Enable Director-based fleet orchestration (LLM-driven subagent planning)
--goal "<task>"      Boot directly into goal mode — GOAL preamble injected, TUI auto-enabled
--ask "<text>"       Submit one turn verbatim on TUI boot (no preamble)
--verbose / -v       Log level → debug
--trace              Log level → trace
--log-level <lvl>    Explicit log level
--help / --version   Standard

Slash commands

Core (both the plain REPL and the TUI): /init /help /clear /compact /context /codebase-reindex /dev /diag /stats /tools /plugin /mcp /auth /memory /todos /tasks /mode /yolo /autonomy /interrupt /btw /next /enhance /fix /autophase /worktree /settings /sdd /save /load /prune /exit

Every built-in command is tagged with a category (Run · Session · Inspect · Agent · Config · App); the TUI slash picker groups matches under category headers, and the WebUI surfaces 55 commands in its slash list.

Multi-agent: /spawn /fleet /agents /goal /director /collab /setmodel /models /fallback

TUI-only (need --tui): /model (provider → model picker) · /steer (mid-flight redirect — the plain REPL uses Esc instead) · /queue

Built-in plugins (enabled by default): /prompts /sync /commit /gitcheck /push /security /skill /skill-gen /skill-install /skill-update /skill-uninstall /plan /metrics /health

/telegram becomes available once the Telegram plugin is enabled — wstack plugin install telegram.

Command Effect
/init Create .wrongstack/AGENTS.md — auto-detects build system (package.json / pyproject.toml / go.mod / Cargo.toml / Makefile) and pre-fills build/test/lint/run commands
/dev <shell command> Run a shell command from the chat input and see the output. The LLM does NOT see the result — this is a developer convenience shortcut. Timeout: 60 s. Max output: 500 lines
/spawn [--provider --model --name --tools] <task> Launch a single subagent with optional overrides. No implicit budget cap
/director Promote session to Director mode at runtime (must be before any subagent spawns)
/fleet status|usage|kill|manifest|retry|log|stream on|off|journal|spawn|terminate Inspect and control the subagent fleet. log <id> summarises; log <id> raw dumps full JSONL
/agents Print fleet roster (running, idle, completed) with kind chips for failures
/steer <text> (TUI; in the plain REPL use Esc) Mid-flight redirect — aborts iteration, terminates fleet, drops queue, prepends STEERING preamble
/goal <text> Lock in a goal — auto-refines it into deliverables, persists to ~/.wrongstack/projects/<hash>/goal.json, tracks progress/trends, and injects the full-autonomy preamble. Subcommands: /goal (status + journal), /goal refine, /goal clear, /goal pause, /goal resume, /goal journal [N]
/tasks add|start|done|fail|status|depends|assign|promote|clear Structured task management between /plan and /todos: dependencies, types, priorities, estimates, agent assignment, and promote-to-todos flow
/queue (TUI) Show, clear, or delete entries from the in-flight message queue
/plan show|add|start|done|remove|clear Per-session plan JSON. Mirrored to disk; surfaces 📋 ⌛N ☐N ✓N chip in TUI status bar
/autonomy off|suggest|on|eternal|parallel|stop|toggle Self-driving mode. suggest shows next steps without executing; on auto-continues; eternal runs goal-driven loop; parallel fans out 4-8 subagents per tick. TUI shows ∞ AUTO / ∞ SUGGEST / ETERNAL / ⟳ PARALLEL chip
/yolo on|off|toggle Flip YOLO mode (auto-approve all tool calls). /yolo alone shows status. TUI shows ⚠ YOLO chip
/interrupt (aliases /stop, /int) Stop the in-flight leader run and terminate the whole fleet — for when Esc is eaten by tmux or you're driving from the WebUI. REPL Ctrl+C now also stops subagents
/mode Switch persona: default, code-reviewer, code-auditor, architect, debugger, tester, devops, refactorer. Custom modes in ~/.wrongstack/modes/
/model (TUI) Two-step provider → model picker. In the plain REPL, relaunch with --provider / --model
/setmodel <key> <provider/model> Set per-role or per-phase model in the model matrix (e.g. /setmodel security-scanner openai/gpt-4o). Also supports resolve <role> and doctor for matrix diagnostics
/fallback View or edit the rate-limit fallback chain. On a 429/529/5xx after retries, the agent rotates to the next model in the chain instead of failing; each hop prints ↻ switched to <provider/model>
/auth [status <provider>|open|help] In-session credential dashboard. Shows saved provider/key status without blocking the REPL/TUI; run wstack auth for the full interactive key manager
/image or /paste-image Attach clipboard PNG. TUI also Alt+V
/context mode <policy> Switch context-window mode: balanced, frugal, deep, archival. repair fixes damaged tool-call adjacency
/plugin install|disable|enable|remove|official [name] Manage plugins. install adds bundled package to config (no npm). Restart to load/unload
/telegram send|read|chat|attach Telegram plugin (enable with wstack plugin install telegram): send <chatId> <message>, read <chatId> [limit], chat list recent, attach <file> send file
/sdd <path-to-spec.md> Spec-Driven Development workflow: parse → analyze → generate → track → execute. Built on SpecParser, TaskTracker, TaskGenerator, TaskFlow
/settings View or change settings (non-blocking, works in REPL + TUI): /settings (show), /settings delay <seconds>, /settings mode <off|suggest|auto>, /settings defaults; persists to ~/.wrongstack/config.json
/prune [days] [--dry-run] [--rebuild-index] Delete sessions older than N days (default 30, clamped 1–365). --dry-run previews; --rebuild-index rebuilds the session index from disk. Sessions referenced by active.json are never pruned
/compact, /tools, /skill, /save, /resume, /help, /clear, /stats, /diag, /exit Compact context, list tools/skills, save/resume session, help, clear, token+cost stats, diagnostics, exit

Mid-flight controls

Key / Command What it does
Esc (while busy) Soft interrupt — abort agent, terminate fleet, drop queue, set "steering pending". Next message carries a STEERING preamble
/steer <text> Same as Esc + typing, in one shot. Works when Esc is eaten by tmux
/goal <text> "No force stops this" — full autonomy contract. Only Esc / Ctrl+C interrupt
Ctrl+C × 1 Cancel current iteration + terminate fleet (1.5s cap)
Ctrl+C × 2 Force-exit Ink loop
Ctrl+C × 3 Hard process.exit(130)
Ctrl+F Toggle the fleet monitor — per-subagent status + fleet-wide token gauge
Ctrl+G Toggle the agents monitor — live per-agent context (current tool, streaming tail, sparkline)
F9 Toggle the goal panel — refined mission, deliverables checklist, progress bar, trend, state, and last task
Ctrl+T Toggle the worktree monitor — AutoPhase isolation branches
Ctrl+P Toggle the phase monitor — active AutoPhase phases and tasks
Ctrl+T Close worktree monitor (when open); otherwise delete word before cursor
/fleet kill <id> Stop one specific subagent

Subcommands

wrongstack init           # First-run setup wizard
wrongstack auth <prov>    # Store an API key (prompted, encrypted at rest)
wrongstack sessions       # List saved sessions for this project
wrongstack resume <id>    # Continue a saved session
wrongstack replay <id>    # Inspect a recorded replay log (see --replay)
wrongstack audit <id>     # Verify the SHA-256-chained tool-call audit trail
wrongstack config         # Show / edit config
wrongstack tools          # List registered tools
wrongstack skills         # List discovered skills
wrongstack providers      # ~110 providers grouped by wire family
wrongstack models [prov] [--search <term>] [--page N] [--per-page N]  # searchable, paginated model list
wrongstack mcp            # Inspect connected MCP servers
wrongstack plugin         # Plugin manifest commands
wrongstack diag           # Diagnostics: provider, tokens, paths
wrongstack usage          # Token + cost totals across sessions
wrongstack projects       # List known project hashes → paths
wrongstack help           # Help text
wrongstack version        # Version

Configuration

Environment variables

Variable Description
<PROVIDER>_API_KEY API key for the provider (e.g. ANTHROPIC_API_KEY, OPENAI_API_KEY)
WRONGSTACK_FETCH_ALLOW_PRIVATE Set to 1 to allow localhost / private IPs in the fetch tool
WRONGSTACK_BASH_ENV_PASSTHROUGH Set to 1 to disable the bash-tool env allowlist (legacy unsafe mode — see SECURITY.md)

Config file (~/.wrongstack/config.json)

{
  "version": 1,
  "provider": "anthropic",
  "model": "claude-opus-4-7",
  "providers": {
    "anthropic": { "apiKey": "enc:v1:<iv>:<tag>:<ciphertext>" }
  },
  "features": {
    "mcp": true,
    "plugins": true,
    "memory": true,
    "modelsRegistry": true,
    "skills": true
  },
  "plugins": []
}

apiKey-like fields are auto-encrypted on first contact. Plaintext keys in older config files get migrated transparently on boot.

Project-level (<project>/.wrongstack/AGENTS.md)

Commit this file to share project conventions with the agent across all developers:

// Conventions for this repo
- Always run tests after editing
- Use pnpm, not npm
- PR titles follow conventionalcommits.org

Four contracts

1. Minimal kernel. Container, Pipeline, EventBus, RunController, and the token table total ~1670 lines (including the full event type catalog). The agent loop adds another 525 lines. Everything else is replaceable.

2. Zero non-overridable behavior. 16 services bound through Container (Logger, TokenCounter, SessionStore, MemoryStore, PermissionPolicy, Compactor, PathResolver, ConfigLoader, Renderer, InputReader, ErrorHandler, RetryPolicy, SkillLoader, SystemPromptBuilder, SecretScrubber, ModelsRegistry). 6 pipelines as middleware chains. Tools, providers, MCP servers, and slash commands all live in registries.

3. Standalone sufficiency. Works with 36 built-in tools, 4 wire-family transports, permission policy, and a curated system prompt — no plugins required.

4. Layered, not monolithic. --no-features flips off MCP, plugins, memory tools, models.dev fetch, and skill discovery. The minimal-viable WrongStack runs offline with no network calls at startup.

Packages

Package Purpose
@wrongstack/core Kernel, agent, types, registries, plugin contract
@wrongstack/runtime Default runtime implementations, host composition helpers, extension pack contracts
@wrongstack/providers Anthropic/OpenAI/OpenAI-compatible/Google wire adapters + SSE
@wrongstack/tools 36 built-in tools (incl. SQLite codebase index)
@wrongstack/mcp MCP server registry + reconnection logic
@wrongstack/cli REPL, subcommands, slash commands, terminal renderer
@wrongstack/tui Ink-based TUI (lazy-loaded behind --tui)
@wrongstack/plug-lsp LSP plugin (wrongstack-lsp-setup binary)
@wrongstack/telegram Telegram plugin: send/read/notifications, /telegram:* slash commands
@wrongstack/webui Standalone web UI — webui binary, also via wrongstack --webui
@wrongstack/plugins Official plugin collection — 10 plugins via subpath exports

Architecture

CLI       → REPL, renderer, slash commands, subcommands
TUI       → Ink frontend (lazy-loaded behind --tui)
Steering  → Esc / /steer / /goal — mid-flight redirect + autonomous lock-in
Director  → Fleet orchestration (LLM-driven, opt-in via --director)
Agent     → loop, context, system prompt, permission, compaction, autoExtendLimit
Tools     → ToolExecutor (parallel/sequential/smart strategies, abort-safe)
Runtime   → Default host assembly + WrongStackPack extension composition
Kernel    → Container · Pipeline · EventBus · RunController (the 4 primitives)
Provider  → 4 wire families, factories built from ModelsRegistry, real SSE
Models    → models.dev/api.json fetched + cached + classified

For the full walk-through — including the L1-A reactive ConversationState, how the six pipelines fire per turn, and how plugins / MCP / observability plug in — see docs/architecture.md.

Status

  • 6700+ tests passing across 500+ test files in the 0.257.0 release gate
  • Coverage thresholds: ≥85 % lines / ≥85 % functions / ≥70 % branches / ≥82 % statements
  • All workspace packages build clean with TypeScript strict + noUncheckedIndexedAccess
  • Node 22+ only, ESM-only, no CommonJS bundles
  • Release gate verified locally: pnpm audit --audit-level=moderate + pnpm typecheck + pnpm test + pnpm build
  • Threat model: SECURITY.md

Contributor docs

Doc What it covers
docs/architecture.md Package layout, kernel primitives, agent lifecycle, L1-A reactive split
docs/slash/ Full reference for every built-in slash command
docs/subcommands/ Full reference for every wstack <subcommand>
docs/director-architecture.md Director: FleetBus, prompt layering, safety caps, per-subagent JSONL, shared scratchpad
docs/plugin-author-guide.md Building a plugin: capabilities, dependencies, configSchema, teardown, testing
docs/plugin-management.md User-facing plugin workflows: list/add/remove/enable/disable, config layout
docs/provider-author-guide.md Adding an LLM provider via WireFormatConfig, stream-state design, vendor quirks
docs/tool-author-guide.md Writing a tool: streaming executeStream, permission semantics, cleanup vs registerAbortHook
docs/yolo-mode.md YOLO mode: permission pipeline, runtime toggle, trust file, subagent policy
docs/skills.md Writing skills: frontmatter format, discovery layers, description quality, token budget
docs/oauth-signin.md Sign in with a ChatGPT/Codex, Claude, or GitHub Copilot subscription (OAuth); flows, models, token storage, ToS caveat
docs/configuration.md Full config reference: every field with type, default, and example
docs/troubleshooting.md Common issues, diagnosis steps, wstack diag, exit codes, reset commands

Benchmarks

pnpm bench runs all *.bench.ts files via a separate vitest config and writes results to bench-results.json (gitignored). Current suite covers compactor hot paths, token estimation, JSON-schema validation, and the system prompt builder. See vitest.bench.config.ts.

Examples

See examples/ for 6 categories of working examples:

# Example What it demonstrates
01 Basic usage Single-shot, REPL, session resume, YOLO
02 Tool usage File editing, code search, git, tests
03 Multi-provider Switching providers, custom endpoints
04 MCP integration Connecting MCP servers, using MCP tools
05 Multi-agent Director fleet, delegation, subagents
06 Real-world workflows Refactoring, testing, debugging, audits

License

MIT © 2026 ECOSTACK TECHNOLOGY OÜ — see LICENSE.

About

No description, website, or topics provided.

Resources

License

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

Contributors

Languages

  • TypeScript 99.2%
  • Other 0.8%