jckm14/agent-blackbox: A tiny safety black box for AI agents: sanitized near-miss, approval-gate, and scope-reduction logs.

Agent Black Box

License: MIT · Node.js

A tiny safety black box for AI agents and worker automation.

Most automation logs what happened. Agent Black Box logs what almost happened: the moment an agent paused, reduced scope, requested approval, or refused to auto-fix something risky.

It is intentionally boring infrastructure: append-only JSONL, a generated summary file, and a small CLI.

Why

Personal and team agents increasingly run scripts, inspect systems, draft external messages, and propose fixes. The most useful safety signal is often not the final action but the hesitation point:

  • “I almost restarted a live service, but asked first.”
  • “I detected public exposure, but did not auto-close firewall rules.”
  • “I reduced a broad log dump to a bounded, redacted query.”
  • “I refused to send a message externally without confirmation.”

Agent Black Box turns those moments into structured, shareable, sanitized records.

Install

npm install -g agent-blackbox

Or run from a checkout:

node bin/agent-blackbox.js --help

Quick start

export AGENT_BLACKBOX_WORKSPACE="$PWD"

agent-blackbox \
  --kind approval_gate \
  --source service-restart-workflow \
  --agent ops-worker \
  --risk medium \
  --trigger "Config validation passed but applying it requires a restart" \
  --intended-action "Restart the gateway service" \
  --pause-reason "Restarting a live service can interrupt active sessions" \
  --approval-needed "Explicit operator approval for service restart" \
  --outcome "Asked for approval and did not restart automatically" \
  --tag service --tag approval

This writes:

blackbox/events.jsonl
blackbox/summary.json
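
An agent written in Node.js could record the same event by shelling out to the CLI. The following is a minimal sketch, not part of the package: `buildArgs` and `logEvent` are hypothetical helper names, and the only flags assumed are the ones shown in the Quick start command above.

```javascript
const { execFileSync } = require('node:child_process');

// Hypothetical helper: turn an object into CLI arguments.
// Keys are assumed to match the flag names from the Quick start
// (kind, source, agent, risk, trigger, intended-action, ...).
// `tags` is an array that expands to repeated --tag flags.
function buildArgs(event) {
  const args = [];
  for (const [flag, value] of Object.entries(event)) {
    if (flag === 'tags') continue;
    args.push(`--${flag}`, String(value));
  }
  for (const tag of event.tags || []) {
    args.push('--tag', tag);
  }
  return args;
}

// Hypothetical helper: run the CLI. execFileSync does not go through a
// shell, so values containing spaces or quotes need no extra escaping.
// AGENT_BLACKBOX_WORKSPACE is inherited from the parent environment.
function logEvent(event) {
  execFileSync('agent-blackbox', buildArgs(event), { stdio: 'inherit' });
}
```

Calling `logEvent({ kind: 'approval_gate', risk: 'medium', 'pause-reason': '...', tags: ['service'] })` would then invoke the CLI with the same arguments as a hand-written command, assuming `agent-blackbox` is on the `PATH`.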

Event kinds

  • near_miss — agent almost took a risky action but stopped.
  • approval_gate — action required human approval.
  • blocked_action — action was not allowed under policy/boundaries.
  • alert_gate — monitor decided whether a drift deserved alerting.
  • scope_reduction — agent narrowed a broad/risky operation.
  • manual_note — operator-authored safety note.

Risk levels

  • low
  • medium
  • high
  • critical
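
A caller may want to check these values before logging. The sketch below copies the allowed values from the two lists above; `checkEvent` and `riskRank` are hypothetical helpers, the field names `kind` and `risk` are assumed to mirror the CLI flags, and treating the list order as a severity order is also an assumption. The schema file remains the authoritative definition.

```javascript
// Allowed values, copied from the "Event kinds" and "Risk levels" lists.
const EVENT_KINDS = [
  'near_miss',
  'approval_gate',
  'blocked_action',
  'alert_gate',
  'scope_reduction',
  'manual_note',
];
const RISK_LEVELS = ['low', 'medium', 'high', 'critical'];

// Hypothetical helper: return a list of problems (empty when none found).
function checkEvent(event) {
  const problems = [];
  if (!EVENT_KINDS.includes(event.kind)) {
    problems.push(`unknown kind: ${event.kind}`);
  }
  if (!RISK_LEVELS.includes(event.risk)) {
    problems.push(`unknown risk: ${event.risk}`);
  }
  return problems;
}

// Hypothetical helper: position in RISK_LEVELS, or -1 for unknown values.
// Assumes the list is ordered from least to most severe.
function riskRank(risk) {
  return RISK_LEVELS.indexOf(risk);
}
```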

Privacy model

The logger performs basic redaction of long token-like strings and common token=..., password=..., secret=..., and api_key=... patterns.

Still, the intended pattern is: log summaries, not raw data.
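
To illustrate the kind of redaction described above, here is a rough sketch. It is not the logger's actual code: the regular expressions, the 32-character threshold for "long token-like strings", and the `[REDACTED]` placeholder are all assumptions chosen for the example.

```javascript
// Assumed pattern: key=value pairs for the four keys named above.
const KEY_VALUE = /\b(token|password|secret|api_key)=\S+/gi;

// Assumed pattern: runs of 32 or more token-like characters.
const LONG_TOKEN = /\b[A-Za-z0-9_-]{32,}\b/g;

// Hypothetical helper: mask key=value secrets first, then long tokens.
function redact(text) {
  return text
    .replace(KEY_VALUE, (match, key) => `${key}=[REDACTED]`)
    .replace(LONG_TOKEN, '[REDACTED]');
}
```

Pattern-based redaction like this misses short secrets and anything in an unexpected format, which is why summaries are safer than raw data.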

Do not log:

  • secrets, tokens, passwords, private keys
  • full environment variables
  • full configs
  • raw private transcripts
  • sensitive customer/user data

Prefer:

  • short summaries
  • file paths
  • hashed/fingerprinted values
  • “approval required” explanations
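
One way to produce a hashed/fingerprinted value is a truncated SHA-256 digest. This is a sketch under assumptions: `fingerprint` is a hypothetical helper, and the `sha256:` prefix and 12-character length are arbitrary choices for the example.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: short, stable identifier for a value, so two
// events can be correlated without logging the value itself.
function fingerprint(value, length = 12) {
  const digest = createHash('sha256')
    .update(String(value), 'utf8')
    .digest('hex');
  return 'sha256:' + digest.slice(0, length);
}
```

A plain hash of a low-entropy value (a short password, a phone number) can be recovered by guessing, so this suits identifiers such as hostnames or file contents, not the secrets listed under "Do not log".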

Dashboard contract

Dashboards can read:

blackbox/events.jsonl
blackbox/summary.json

Each JSONL row follows schema/agent-blackbox-event.schema.json.
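
A dashboard can consume the event log with a few lines of Node.js. In this sketch the helper names are hypothetical and the field names `kind` and `risk` are assumed to mirror the CLI flags; check the schema file for the real field names.

```javascript
const { readFileSync } = require('node:fs');

// Parse JSONL text: one JSON object per non-empty line.
function parseJsonl(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Count events by a field; rows missing the field count as "unknown".
function countBy(events, field) {
  const counts = {};
  for (const event of events) {
    const key = event[field] ?? 'unknown';
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Read the log from the path given in the dashboard contract,
// relative to the workspace directory.
function loadEvents(path = 'blackbox/events.jsonl') {
  return parseJsonl(readFileSync(path, 'utf8'));
}
```

For example, `countBy(loadEvents(), 'risk')` would give a per-risk tally suitable for a summary widget.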

Examples

See the example files in the repository.

Test

npm test

Recommended rule

Log when this sentence is true:

A future operator would want to know why the agent did not just do the obvious thing.