A tiny safety black box for AI agents and worker automation.
Most automation logs what happened. Agent Black Box logs what almost happened: the moment an agent paused, reduced scope, requested approval, or refused to auto-fix something risky.
It is intentionally boring infrastructure: append-only JSONL, a generated summary file, and a small CLI.
Personal and team agents increasingly run scripts, inspect systems, draft external messages, and propose fixes. The most useful safety signal is often not the final action — it is the hesitation point:
- “I almost restarted a live service, but asked first.”
- “I detected public exposure, but did not auto-close firewall rules.”
- “I reduced a broad log dump to a bounded, redacted query.”
- “I refused to send a message externally without confirmation.”
Agent Black Box turns those moments into structured, shareable, sanitized records.
npm install -g agent-blackbox
Or run from a checkout:
node bin/agent-blackbox.js --help
export AGENT_BLACKBOX_WORKSPACE="$PWD"
agent-blackbox \
--kind approval_gate \
--source service-restart-workflow \
--agent ops-worker \
--risk medium \
--trigger "Config validation passed but applying it requires a restart" \
--intended-action "Restart the gateway service" \
--pause-reason "Restarting a live service can interrupt active sessions" \
--approval-needed "Explicit operator approval for service restart" \
--outcome "Asked for approval and did not restart automatically" \
--tag service --tag approval
This writes:
blackbox/events.jsonl
blackbox/summary.json
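The append-only log plus generated summary can be pictured with a minimal sketch like the one below. It is not the package's implementation: the function names `appendEvent` and `rebuildSummary`, the `timestamp` field, and the summary shape (`total`, `byKind`, `byRisk`) are assumptions made for illustration, and the `kind` and `risk` field names are only guessed from the CLI flags.

```javascript
// Sketch only: field names and summary shape are assumptions, not the real schema.
const fs = require("node:fs");
const path = require("node:path");

function appendEvent(workspace, event) {
  const dir = path.join(workspace, "blackbox");
  fs.mkdirSync(dir, { recursive: true });
  const record = { timestamp: new Date().toISOString(), ...event };
  // Append-only: one JSON object per line; existing lines are never rewritten.
  fs.appendFileSync(path.join(dir, "events.jsonl"), JSON.stringify(record) + "\n");
  return record;
}

function rebuildSummary(workspace) {
  const dir = path.join(workspace, "blackbox");
  const lines = fs
    .readFileSync(path.join(dir, "events.jsonl"), "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "");
  const summary = { total: 0, byKind: {}, byRisk: {} };
  for (const line of lines) {
    const event = JSON.parse(line);
    summary.total += 1;
    summary.byKind[event.kind] = (summary.byKind[event.kind] || 0) + 1;
    summary.byRisk[event.risk] = (summary.byRisk[event.risk] || 0) + 1;
  }
  // The summary is derived data: it can always be regenerated from the log.
  fs.writeFileSync(path.join(dir, "summary.json"), JSON.stringify(summary, null, 2) + "\n");
  return summary;
}
```

Keeping the summary fully derivable from the log means the JSONL file stays the single source of truth, and a corrupted summary can simply be rebuilt.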
Event kinds:
- near_miss: agent almost took a risky action but stopped.
- approval_gate: action required human approval.
- blocked_action: action was not allowed under policy/boundaries.
- alert_gate: monitor decided whether a drift deserved alerting.
- scope_reduction: agent narrowed a broad/risky operation.
- manual_note: operator-authored safety note.
Risk levels:
- low
- medium
- high
- critical
The logger performs basic redaction of long token-like strings and common token=..., password=..., secret=..., and api_key=... patterns.
Still, the intended pattern is: log summaries, not raw data.
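To make the redaction behavior concrete, here is a rough sketch of what pattern-based redaction of this kind can look like. The regular expressions, the `[REDACTED]` placeholder, and the 32-character threshold for "token-like" strings are all assumptions; the logger's actual rules may differ.

```javascript
// Illustrative patterns only; not the package's actual redaction rules.
const KEY_VALUE = /\b(token|password|secret|api_key)=\S+/gi;
// Assumption: "token-like" means 32 or more consecutive key-style characters.
const LONG_TOKEN = /\b[A-Za-z0-9_\-]{32,}\b/g;

function redact(text) {
  return text
    // Keep the key name so the record still explains what was hidden.
    .replace(KEY_VALUE, (_match, key) => `${key}=[REDACTED]`)
    .replace(LONG_TOKEN, "[REDACTED]");
}
```

Pattern matching like this is a safety net and will miss secrets that do not fit the patterns, which is why summaries are preferred over raw data.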
Do not log:
- secrets, tokens, passwords, private keys
- full environment variables
- full configs
- raw private transcripts
- sensitive customer/user data
Prefer:
- short summaries
- file paths
- hashed/fingerprinted values
- “approval required” explanations
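One way to log a fingerprint instead of a raw value is to record a truncated hash. The sketch below uses Node's built-in `crypto` module; the helper name `fingerprint`, the `sha256:` prefix, and the 12-character length are illustrative choices, not something the package defines.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical helper: returns a short, stable identifier for a value
// so two events can be correlated without storing the value itself.
function fingerprint(value, length = 12) {
  const digest = createHash("sha256").update(String(value), "utf8").digest("hex");
  return `sha256:${digest.slice(0, length)}`;
}
```

A plain hash of a short or guessable value can be reversed by brute force, so this suits identifiers such as config contents or hostnames better than passwords, which should not be logged in any form.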
Dashboards can read:
blackbox/events.jsonl
blackbox/summary.json
Each JSONL row follows schema/agent-blackbox-event.schema.json.
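A dashboard could consume the event log along the lines of the following sketch. It assumes each row carries a `risk` field named after the `--risk` flag; check the schema file for the real field names. `parseEvents` and `atOrAbove` are hypothetical helpers.

```javascript
// Assumption: each event has a "risk" field mirroring the --risk CLI flag.
const RISK_ORDER = ["low", "medium", "high", "critical"];

function parseEvents(jsonlText) {
  const events = [];
  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue;
    try {
      events.push(JSON.parse(line));
    } catch {
      // Skip malformed rows rather than failing the whole dashboard.
    }
  }
  return events;
}

function atOrAbove(events, minRisk) {
  const floor = RISK_ORDER.indexOf(minRisk);
  if (floor < 0) throw new Error(`Unknown risk level: ${minRisk}`);
  // Events with a missing or unrecognized risk value are excluded.
  return events.filter((event) => RISK_ORDER.indexOf(event.risk) >= floor);
}
```

For example, `atOrAbove(parseEvents(text), "high")`, where `text` is the contents of blackbox/events.jsonl, would return only the high and critical events.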
Run the tests:
npm test
Log when this sentence is true:
A future operator would want to know why the agent did not just do the obvious thing.
