feat: CLI telemetry for all commands with centralized lifecycle by nicknisi · Pull Request #122 · workos/cli · GitHub
Skip to content

feat: CLI telemetry for all commands with centralized lifecycle#122

Merged
nicknisi merged 45 commits into
mainfrom
nicknisi/telemetry
Jun 1, 2026
Merged

feat: CLI telemetry for all commands with centralized lifecycle#122
nicknisi merged 45 commits into
mainfrom
nicknisi/telemetry

Conversation

@nicknisi

@nicknisi nicknisi commented Apr 14, 2026

Copy link
Copy Markdown
Member

Summary

Adds telemetry coverage to every CLI command with a centralized lifecycle that owns timing, success/failure classification, and event emission from one place.

Telemetry infrastructure

  • Command events for every yargs command (name, duration, success/failure, flags, termination reason, error code, API context)
  • Crash reporting via global uncaughtException/unhandledRejection handlers with sanitized stack traces
  • Store-and-forward persistence to PID-based temp files on exit; next invocation recovers and sends
  • WORKOS_DEBUG=1 env var for verbose debug logging on all commands

Centralized command lifecycle (runCli())

  • Uses yargs.exitProcess(false) + parseAsync() with a single try/catch
  • exitWithCode() / exitWithError() throw CliExit (typed error carrying exit code + telemetry context) instead of calling process.exit()
  • One emitCommandEvent() call per command outcome (success, structured exit, crash)
  • Eliminates per-handler wrapCommandHandler() wrappers, provisional events, and event patching
  • Command handlers are plain async functions with no telemetry awareness

What this replaces

The previous design required every handler to be wrapped with wrapCommandHandler() and used a provisional-event/replace/patch chain across 4 layers (middleware, wrapper, exit helpers, analytics patching). That design was fragile (forgotten wrappers produced misleading success=true/duration=0 events) and required a regex guardrail test to enforce.

Design decisions

  • install, dashboard, and $0 are excluded from command telemetry (own session-based telemetry)
  • Non-installer telemetry only works for JWT-authenticated users (API-key-only users' events are silently dropped by the gateway guard)
  • Long-running commands (dev, emulate) keep the event loop alive via server/child listeners; their signal handlers call process.exit() directly
  • Pre-lifecycle validation (--mode) uses outputError() + process.exit() directly since runCli() doesn't exist yet at that point

Test plan

  • pnpm typecheck passes
  • pnpm test passes (146 files, 1926 tests)
  • pnpm build passes
  • workos doctor --json --skip-ai --skip-api exits 0 with clean JSON (no CliExit leak)
  • workos emulate --json --port 0 stays alive and serves /health
  • workos org list (no API key) produces structured no_api_key error with correct termination.reason
  • workos --mode robot doctor exits 1 with structured error, no crash event in store-forward

Summary by CodeRabbit

  • New Features

    • CLI telemetry: command/session/crash events with alias-aware command names, stable device ID, environment/device fingerprint, auth-mode selection, and command-duration/flag telemetry.
    • Reliability: store-and-forward persistence with recovery, queued flush semantics, and sanitized crash reporting.
  • Bug Fixes

    • Standardized structured exit/error handling that preserves exit classifications and API context.
  • Documentation

    • Updated telemetry docs and README, including disable instructions.

nicknisi added a commit that referenced this pull request Apr 15, 2026
Closes security-audit finding #1 on PR #122 (telemetry message
sanitization). `error.message` was flowing into 4 capture sites
unsanitized, leaking homedir paths (and rarely, credentials) to the
WorkOS gateway.

- Add `sanitizeMessage()` in crash-reporter.ts: homedir strip + Bearer/
  sk_*/JWT redaction + 1KB truncation.
- Factor secret redaction into shared `redactSecrets()` used by both
  `sanitizeMessage` and `sanitizeStack` (Node echoes `.message` into the
  leading `Error.stack` line, so message-only sanitization was
  insufficient).
- Add private `extractErrorFields()` chokepoint on `Analytics`; route
  all 4 capture sites through it (`captureException`, `stepCompleted`,
  `commandExecuted`, `captureUnhandledCrash`). `replaceLastCommandEvent`
  inherits sanitization via its delegation to `commandExecuted`.
- `captureUnhandledCrash` now uses `sanitizeStack` instead of inline
  truncation, providing defense-in-depth for callers that bypass the
  crash-reporter wrapper.
- Add regression guard test (`telemetry-sanitize.spec.ts`): poisons
  every capture method with homedir + Bearer + sk_live_ + JWT, asserts
  no marker reaches the serialized queue.

Reviewed: ideation:reviewer cycle 1 PASS (0 critical, 0 high).
@coderabbitai

coderabbitai Bot commented May 12, 2026

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (4)
src/utils/telemetry-types.ts (1)

58-58: ⚡ Quick win

Align startTimestamp optionality with backward-compat contract.

telemetry-schema.spec.ts explicitly accepts step/agent.tool without startTimestamp, but these interfaces currently require it. Making both optional avoids contract drift.

Suggested patch
 export interface StepEvent extends TelemetryEvent {
   type: 'step';
   name: string;
-  startTimestamp: string;
+  startTimestamp?: string;
   durationMs: number;
   success: boolean;
   error?: {
@@
 export interface AgentToolEvent extends TelemetryEvent {
   type: 'agent.tool';
   toolName: string;
-  startTimestamp: string;
+  startTimestamp?: string;
   durationMs: number;
   success: boolean;
 }

Also applies to: 70-70

src/utils/telemetry-client.spec.ts (1)

60-105: ⚡ Quick win

Add coverage for claim-token auth headers.

setClaimTokenAuth() is new behavior in this PR, but there’s no test asserting x-workos-claim-token + x-workos-client-id emission (and bearer-token precedence when both exist).

Also applies to: 124-218

src/utils/telemetry-store-forward.ts (1)

1-3: ⚡ Quick win

Use async fs APIs for startup recovery path.

recoverPendingEvents() (Line 25+) is async but uses sync disk calls (Lines 27-56), which blocks startup and conflicts with the project rule for TS files.

As per coding guidelines "Avoid Node-specific sync APIs (crypto, fs sync) unless necessary".

Also applies to: 27-56

src/utils/output.ts (1)

123-124: ⚡ Quick win

Protect process termination from telemetry failures

analytics.recordTermination(...) on Line 123 can throw and prevent Line 124 from executing. Error exits should remain deterministic even when telemetry is unhealthy.

🔧 Proposed fix
   const reason = error.apiContext ? 'api_error' : codeReason;
-  analytics.recordTermination(reason, error.code, error.apiContext);
-  process.exit(exit);
+  try {
+    analytics.recordTermination(reason, error.code, error.apiContext);
+  } finally {
+    process.exit(exit);
+  }
 }

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c45afda9-659c-43a0-9646-cd08dd0cabe3

📥 Commits

Reviewing files that changed from the base of the PR and between 524c709 and 31829ec.

📒 Files selected for processing (27)
  • CLAUDE.md
  • src/bin.ts
  • src/commands/debug.ts
  • src/lib/api-error-handler.spec.ts
  • src/lib/api-error-handler.ts
  • src/lib/command-aliases.ts
  • src/lib/device-id.spec.ts
  • src/lib/device-id.ts
  • src/lib/run-with-core.ts
  • src/utils/analytics.spec.ts
  • src/utils/analytics.ts
  • src/utils/command-telemetry.spec.ts
  • src/utils/command-telemetry.ts
  • src/utils/crash-reporter.spec.ts
  • src/utils/crash-reporter.ts
  • src/utils/exit-codes.spec.ts
  • src/utils/exit-codes.ts
  • src/utils/output.spec.ts
  • src/utils/output.ts
  • src/utils/register-subcommand.ts
  • src/utils/telemetry-client.spec.ts
  • src/utils/telemetry-client.ts
  • src/utils/telemetry-sanitize.spec.ts
  • src/utils/telemetry-schema.spec.ts
  • src/utils/telemetry-store-forward.spec.ts
  • src/utils/telemetry-store-forward.ts
  • src/utils/telemetry-types.ts

Comment thread src/lib/device-id.ts Outdated
Comment thread src/lib/device-id.ts
Comment on lines +31 to +43
try {
if (fs.existsSync(filePath)) {
const raw = fs.readFileSync(filePath, 'utf8').trim();
if (UUID_REGEX.test(raw)) {
cached = raw;
return raw;
}
}

const id = crypto.randomUUID();
fs.mkdirSync(path.dirname(filePath), { recursive: true, mode: 0o700 });
fs.writeFileSync(filePath, id, { encoding: 'utf8', mode: 0o600 });
cached = id;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Replace sync fs usage in the command path.

Line 31–43 and Line 49 use synchronous fs/crypto APIs on a hot CLI path. Please move this to async (node:fs/promises) and cache a pending promise to keep call sites simple.

As per coding guidelines src/**/*.ts: “Avoid Node-specific sync APIs (crypto, fs sync) unless necessary”.

Also applies to: 49-50

Comment thread src/utils/analytics.ts
Comment thread src/utils/command-telemetry.ts
Comment thread src/utils/command-telemetry.ts Outdated
Comment thread src/utils/crash-reporter.ts Outdated
Comment thread src/utils/exit-codes.ts Outdated
Comment on lines +85 to 94
async flush(): Promise<boolean> {
if (this.events.length === 0) return true;
if (!this.gatewayUrl) {
debug('[Telemetry] No gateway URL configured, skipping flush');
return;
return false;
}

const payload: TelemetryRequest = { events: [...this.events] };
this.events = [];
const count = this.events.length;
const payload: TelemetryRequest = { events: this.events.slice(0, count) };

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Guard against concurrent flush() calls to prevent event loss/duplication.

Line 85 currently allows overlapping flushes. Two concurrent calls can send the same snapshot twice, and later splice(0, count) can remove events queued after the first flush.

Suggested patch
 export class TelemetryClient {
   private events: TelemetryEvent[] = [];
+  private flushInFlight: Promise<boolean> | null = null;
@@
   async flush(): Promise<boolean> {
+    if (this.flushInFlight) return this.flushInFlight;
+    this.flushInFlight = this.flushInternal();
+    try {
+      return await this.flushInFlight;
+    } finally {
+      this.flushInFlight = null;
+    }
+  }
+
+  private async flushInternal(): Promise<boolean> {
     if (this.events.length === 0) return true;
@@
-  }
+  }
 }

Also applies to: 135-137, 143-144

Comment thread src/utils/telemetry-client.ts Outdated
@greptile-apps

greptile-apps Bot commented May 12, 2026

Copy link
Copy Markdown

Greptile Summary

This PR replaces a fragile per-handler wrapCommandHandler() telemetry chain with a single centralized lifecycle (runCli()) that owns timing, success/failure classification, and event emission. All command handlers are now plain async functions with no telemetry awareness; a single try/catch/finally in runCli() emits one command event per invocation.

  • New telemetry infrastructure: CliExit typed exception, TelemetryClient flush coalescing (flushInFlight), PID-based store-and-forward persistence, global crash reporter with secret-sanitized stack traces, and a stable device-id for correlation across sessions.
  • Richer event schema: CommandEvent and CrashEvent added to the discriminated union; no_api_key correctly mapped to auth_required.
  • Auth improvements: configureAuthFromAvailableSources() cascades through JWT → claim-token → API-key; expired JWTs fall through rather than 401-ing the telemetry gateway.

Confidence Score: 4/5

The centralized runCli() lifecycle is a clear improvement, but several edge cases from earlier review rounds remain unresolved.

apiContext fields are still serialized into user-visible JSON error output; the global crash-reporter handlers still exit silently for crashes escaping runCli(); recoverPendingEvents() lacks a telemetry-enabled guard causing a bounce loop when disabled; and exitWithCode() in async dev.ts listeners bypasses the finally flush. The new infrastructure — flushInFlight coalescing, store-forward, CliExit typed exception, and secret sanitization — is well-designed.

src/utils/output.ts (apiContext in JSON output), src/utils/crash-reporter.ts (silent exit on real crashes), src/utils/telemetry-store-forward.ts and src/bin.ts (bounce loop when telemetry disabled), src/commands/dev.ts (async listener bypass).

Important Files Changed

Filename Overview
src/bin.ts Central refactor: yargs chain wrapped in runCli() with exitProcess(false) + single try/catch/finally for telemetry. recoverPendingEvents() call lacks a WORKOS_TELEMETRY_ENABLED guard (bounce loop when disabled).
src/utils/crash-reporter.ts New file — global handlers correctly distinguish CliExit from real crashes. Handlers call process.exit(1) without writing to stderr, so genuine crashes beyond runCli()'s catch remain silent to the user.
src/utils/telemetry-store-forward.ts New file — PID-based pending-file recovery with age/count caps. No WORKOS_TELEMETRY_ENABLED guard; events bounce between disk and memory when telemetry is disabled.
src/utils/telemetry-client.ts flushInFlight coalescing correctly prevents concurrent duplicate sends; splice(0,count) snapshot preserves events added during in-flight flush.
src/utils/analytics.ts extractErrorFields() sanitizes before debug logging (previous thread fixed). configureAuthFromAvailableSources() cascades auth sources correctly; expired JWTs fall through to claim-token/api-key.
src/utils/output.ts exitWithError() passes the full object including apiContext to outputError(); JSON.stringify serializes apiContext into stderr visible to automation consumers.
src/commands/dev.ts exitWithCode() in fire-and-forget async listeners throws CliExit as an unhandled rejection, bypassing runCli()'s finally block; command event reaches the backend only via store-forward on the next run.
src/lib/api-error-handler.ts normalizeApiError() deduplicates error handling cleanly. apiContext propagated to CliExit for telemetry, but also appears in user-visible JSON output via outputError().
src/utils/command-telemetry.ts New file — resolveCommandNameFromRawArgs() safely matches argv tokens against the command registry. extractUserFlags() captures flag names only. SKIP_TELEMETRY_COMMANDS correctly guards install/dashboard/root.
src/lib/device-id.ts New file — async loadDeviceId() pre-warms cache read by synchronous getDeviceId(). UUID v4 format validated; falls back to session-scoped UUID on IO failure.

Sequence Diagram

sequenceDiagram
    participant bin as bin.ts
    participant crash as CrashReporter
    participant sf as StoreForward
    participant recover as recoverPendingEvents
    participant cli as runCli()
    participant yargs as yargs
    participant handler as CommandHandler
    participant analytics as Analytics
    participant tc as TelemetryClient
    bin->>crash: installCrashReporter()
    bin->>sf: installStoreForward()
    bin->>analytics: initForNonInstaller()
    bin->>recover: recoverPendingEvents() no-await
    recover->>tc: queueEvents + flush()
    bin->>cli: await runCli()
    cli->>yargs: parseAsync(rawArgs)
    yargs->>handler: dispatch
    handler-->>cli: resolve or throw CliExit
    cli->>analytics: emitCommandEvent()
    cli->>tc: flush() coalesced
    sf-->>tc: persistToFile on exit if events remain
Loading

Reviews (15): Last reviewed commit: "refactor(install): exit via exitWithCode..." | Re-trigger Greptile

Comment thread src/utils/output.ts
Comment thread src/bin.ts
Comment thread src/utils/command-telemetry.ts Outdated
Comment thread src/utils/telemetry-client.ts Outdated
nicknisi added a commit that referenced this pull request May 13, 2026
Closes security-audit finding #1 on PR #122 (telemetry message
sanitization). `error.message` was flowing into 4 capture sites
unsanitized, leaking homedir paths (and rarely, credentials) to the
WorkOS gateway.

- Add `sanitizeMessage()` in crash-reporter.ts: homedir strip + Bearer/
  sk_*/JWT redaction + 1KB truncation.
- Factor secret redaction into shared `redactSecrets()` used by both
  `sanitizeMessage` and `sanitizeStack` (Node echoes `.message` into the
  leading `Error.stack` line, so message-only sanitization was
  insufficient).
- Add private `extractErrorFields()` chokepoint on `Analytics`; route
  all 4 capture sites through it (`captureException`, `stepCompleted`,
  `commandExecuted`, `captureUnhandledCrash`). `replaceLastCommandEvent`
  inherits sanitization via its delegation to `commandExecuted`.
- `captureUnhandledCrash` now uses `sanitizeStack` instead of inline
  truncation, providing defense-in-depth for callers that bypass the
  crash-reporter wrapper.
- Add regression guard test (`telemetry-sanitize.spec.ts`): poisons
  every capture method with homedir + Bearer + sk_live_ + JWT, asserts
  no marker reaches the serialized queue.

Reviewed: ideation:reviewer cycle 1 PASS (0 critical, 0 high).
@nicknisi nicknisi force-pushed the nicknisi/telemetry branch from 31829ec to 0475ea8 Compare May 13, 2026 03:20

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e6b7b384-44b3-47fe-9030-070bd92bf44e

📥 Commits

Reviewing files that changed from the base of the PR and between 31829ec and 0475ea8.

📒 Files selected for processing (39)
  • CLAUDE.md
  • src/bin.ts
  • src/commands/api/index.spec.ts
  • src/commands/api/index.ts
  • src/commands/api/interactive.spec.ts
  • src/commands/api/interactive.ts
  • src/commands/debug.ts
  • src/commands/dev.ts
  • src/commands/doctor.ts
  • src/commands/emulate.ts
  • src/commands/env.ts
  • src/commands/install-skill.ts
  • src/commands/login.ts
  • src/commands/uninstall-skill.ts
  • src/lib/api-error-handler.spec.ts
  • src/lib/api-error-handler.ts
  • src/lib/command-aliases.ts
  • src/lib/device-id.spec.ts
  • src/lib/device-id.ts
  • src/lib/run-with-core.ts
  • src/utils/analytics.spec.ts
  • src/utils/analytics.ts
  • src/utils/command-telemetry.spec.ts
  • src/utils/command-telemetry.ts
  • src/utils/crash-reporter.spec.ts
  • src/utils/crash-reporter.ts
  • src/utils/exit-codes.spec.ts
  • src/utils/exit-codes.ts
  • src/utils/help-json.ts
  • src/utils/output.spec.ts
  • src/utils/output.ts
  • src/utils/register-subcommand.ts
  • src/utils/telemetry-client.spec.ts
  • src/utils/telemetry-client.ts
  • src/utils/telemetry-sanitize.spec.ts
  • src/utils/telemetry-schema.spec.ts
  • src/utils/telemetry-store-forward.spec.ts
  • src/utils/telemetry-store-forward.ts
  • src/utils/telemetry-types.ts
✅ Files skipped from review due to trivial changes (1)
  • src/commands/emulate.ts
🚧 Files skipped from review as they are similar to previous changes (22)
  • CLAUDE.md
  • src/lib/command-aliases.ts
  • src/utils/register-subcommand.ts
  • src/utils/telemetry-client.spec.ts
  • src/utils/crash-reporter.spec.ts
  • src/commands/debug.ts
  • src/utils/telemetry-schema.spec.ts
  • src/utils/telemetry-client.ts
  • src/utils/telemetry-store-forward.spec.ts
  • src/lib/device-id.spec.ts
  • src/utils/output.spec.ts
  • src/utils/telemetry-store-forward.ts
  • src/utils/command-telemetry.spec.ts
  • src/lib/api-error-handler.spec.ts
  • src/lib/device-id.ts
  • src/lib/api-error-handler.ts
  • src/utils/telemetry-sanitize.spec.ts
  • src/utils/crash-reporter.ts
  • src/utils/analytics.ts
  • src/utils/telemetry-types.ts
  • src/utils/exit-codes.ts
  • src/utils/analytics.spec.ts

Comment thread src/commands/doctor.ts
Comment thread src/lib/run-with-core.ts Outdated
Comment thread src/utils/crash-reporter.ts
nicknisi added a commit that referenced this pull request May 27, 2026
Closes security-audit finding #1 on PR #122 (telemetry message
sanitization). `error.message` was flowing into 4 capture sites
unsanitized, leaking homedir paths (and rarely, credentials) to the
WorkOS gateway.

- Add `sanitizeMessage()` in crash-reporter.ts: homedir strip + Bearer/
  sk_*/JWT redaction + 1KB truncation.
- Factor secret redaction into shared `redactSecrets()` used by both
  `sanitizeMessage` and `sanitizeStack` (Node echoes `.message` into the
  leading `Error.stack` line, so message-only sanitization was
  insufficient).
- Add private `extractErrorFields()` chokepoint on `Analytics`; route
  all 4 capture sites through it (`captureException`, `stepCompleted`,
  `commandExecuted`, `captureUnhandledCrash`). `replaceLastCommandEvent`
  inherits sanitization via its delegation to `commandExecuted`.
- `captureUnhandledCrash` now uses `sanitizeStack` instead of inline
  truncation, providing defense-in-depth for callers that bypass the
  crash-reporter wrapper.
- Add regression guard test (`telemetry-sanitize.spec.ts`): poisons
  every capture method with homedir + Bearer + sk_live_ + JWT, asserts
  no marker reaches the serialized queue.

Reviewed: ideation:reviewer cycle 1 PASS (0 critical, 0 high).
@nicknisi nicknisi force-pushed the nicknisi/telemetry branch from 0475ea8 to 8003788 Compare May 27, 2026 21:34

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6ce40944-6658-4efe-aa8c-e9db5c81063a

📥 Commits

Reviewing files that changed from the base of the PR and between 0475ea8 and 8003788.

📒 Files selected for processing (39)
  • CLAUDE.md
  • src/bin.ts
  • src/commands/api/index.spec.ts
  • src/commands/api/index.ts
  • src/commands/api/interactive.spec.ts
  • src/commands/api/interactive.ts
  • src/commands/debug.ts
  • src/commands/dev.ts
  • src/commands/doctor.ts
  • src/commands/emulate.ts
  • src/commands/env.ts
  • src/commands/install-skill.ts
  • src/commands/login.ts
  • src/commands/uninstall-skill.ts
  • src/lib/api-error-handler.spec.ts
  • src/lib/api-error-handler.ts
  • src/lib/command-aliases.ts
  • src/lib/device-id.spec.ts
  • src/lib/device-id.ts
  • src/lib/run-with-core.ts
  • src/utils/analytics.spec.ts
  • src/utils/analytics.ts
  • src/utils/command-telemetry.spec.ts
  • src/utils/command-telemetry.ts
  • src/utils/crash-reporter.spec.ts
  • src/utils/crash-reporter.ts
  • src/utils/exit-codes.spec.ts
  • src/utils/exit-codes.ts
  • src/utils/help-json.ts
  • src/utils/output.spec.ts
  • src/utils/output.ts
  • src/utils/register-subcommand.ts
  • src/utils/telemetry-client.spec.ts
  • src/utils/telemetry-client.ts
  • src/utils/telemetry-sanitize.spec.ts
  • src/utils/telemetry-schema.spec.ts
  • src/utils/telemetry-store-forward.spec.ts
  • src/utils/telemetry-store-forward.ts
  • src/utils/telemetry-types.ts
✅ Files skipped from review due to trivial changes (3)
  • src/commands/api/index.spec.ts
  • CLAUDE.md
  • src/commands/api/interactive.spec.ts
🚧 Files skipped from review as they are similar to previous changes (31)
  • src/commands/debug.ts
  • src/commands/doctor.ts
  • src/commands/emulate.ts
  • src/utils/telemetry-store-forward.spec.ts
  • src/commands/uninstall-skill.ts
  • src/commands/install-skill.ts
  • src/lib/command-aliases.ts
  • src/commands/api/index.ts
  • src/utils/output.ts
  • src/utils/help-json.ts
  • src/utils/exit-codes.spec.ts
  • src/commands/env.ts
  • src/commands/login.ts
  • src/utils/command-telemetry.ts
  • src/utils/telemetry-types.ts
  • src/utils/crash-reporter.spec.ts
  • src/commands/api/interactive.ts
  • src/commands/dev.ts
  • src/utils/exit-codes.ts
  • src/lib/run-with-core.ts
  • src/utils/telemetry-schema.spec.ts
  • src/utils/telemetry-store-forward.ts
  • src/utils/crash-reporter.ts
  • src/lib/api-error-handler.ts
  • src/utils/telemetry-client.spec.ts
  • src/lib/device-id.spec.ts
  • src/utils/telemetry-client.ts
  • src/lib/api-error-handler.spec.ts
  • src/lib/device-id.ts
  • src/utils/analytics.ts
  • src/bin.ts

Comment thread src/utils/command-telemetry.spec.ts Outdated
Comment thread src/utils/analytics.ts Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/utils/command-telemetry.ts (1)

12-17: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Harden flag extraction to avoid invalid telemetry flags.

The current filter accepts -- (which becomes empty string after processing) and numeric short flags like -1, -2 (which become '1', '2') that pollute the telemetry command.flags attribute with invalid entries.

🛡️ Suggested fix from previous review
 export function extractUserFlags(rawArgs: string[]): string[] {
   const passedFlags = rawArgs
-    .filter((arg) => arg.startsWith('--') || (arg.startsWith('-') && arg.length === 2))
-    .map((arg) => arg.replace(/^-+/, '').split('=')[0]);
+    .filter((arg) => {
+      if (arg === '--') return false;
+      if (/^--[A-Za-z][\w-]*(=.*)?$/.test(arg)) return true;
+      if (/^-[A-Za-z]$/.test(arg)) return true;
+      return false;
+    })
+    .map((arg) => arg.replace(/^-+/, '').split('=')[0])
+    .filter(Boolean);
   return [...new Set(passedFlags)];
 }

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a84b559e-a9da-49e6-a70b-1b60ba4b833d

📥 Commits

Reviewing files that changed from the base of the PR and between c7804a2 and 6a80d3f.

📒 Files selected for processing (27)
  • CLAUDE.md
  • src/bin.ts
  • src/commands/api/index.spec.ts
  • src/commands/api/interactive.spec.ts
  • src/commands/connection.spec.ts
  • src/commands/directory.spec.ts
  • src/commands/doctor.ts
  • src/commands/env.spec.ts
  • src/commands/install.spec.ts
  • src/commands/install.ts
  • src/commands/membership.spec.ts
  • src/commands/seed.spec.ts
  • src/commands/uninstall-skill.spec.ts
  • src/lib/api-error-handler.spec.ts
  • src/utils/analytics.spec.ts
  • src/utils/analytics.ts
  • src/utils/cli-exit.spec.ts
  • src/utils/cli-exit.ts
  • src/utils/command-telemetry.spec.ts
  • src/utils/command-telemetry.ts
  • src/utils/exit-codes.spec.ts
  • src/utils/exit-codes.ts
  • src/utils/output.spec.ts
  • src/utils/output.ts
  • src/utils/telemetry-client.spec.ts
  • src/utils/telemetry-client.ts
  • src/utils/telemetry-sanitize.spec.ts
💤 Files with no reviewable changes (2)
  • src/utils/telemetry-client.spec.ts
  • src/utils/telemetry-client.ts
✅ Files skipped from review due to trivial changes (1)
  • CLAUDE.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/utils/command-telemetry.spec.ts

Comment thread src/commands/install.ts Outdated
Comment thread src/commands/dev.ts
@nicknisi nicknisi changed the title feat: CLI telemetry for all commands + crash reporting feat: CLI telemetry for all commands with centralized lifecycle May 28, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4dc74551-d649-4729-a366-dc7e375c36f4

📥 Commits

Reviewing files that changed from the base of the PR and between c7804a2 and 41379b1.

📒 Files selected for processing (28)
  • CLAUDE.md
  • README.md
  • src/bin.ts
  • src/commands/api/index.spec.ts
  • src/commands/api/interactive.spec.ts
  • src/commands/connection.spec.ts
  • src/commands/directory.spec.ts
  • src/commands/doctor.ts
  • src/commands/env.spec.ts
  • src/commands/install.spec.ts
  • src/commands/install.ts
  • src/commands/membership.spec.ts
  • src/commands/seed.spec.ts
  • src/commands/uninstall-skill.spec.ts
  • src/lib/api-error-handler.spec.ts
  • src/utils/analytics.spec.ts
  • src/utils/analytics.ts
  • src/utils/cli-exit.spec.ts
  • src/utils/cli-exit.ts
  • src/utils/command-telemetry.spec.ts
  • src/utils/command-telemetry.ts
  • src/utils/exit-codes.spec.ts
  • src/utils/exit-codes.ts
  • src/utils/output.spec.ts
  • src/utils/output.ts
  • src/utils/telemetry-client.spec.ts
  • src/utils/telemetry-client.ts
  • src/utils/telemetry-sanitize.spec.ts
💤 Files with no reviewable changes (2)
  • src/utils/telemetry-client.spec.ts
  • src/utils/telemetry-client.ts
✅ Files skipped from review due to trivial changes (1)
  • CLAUDE.md
🚧 Files skipped from review as they are similar to previous changes (22)
  • src/commands/seed.spec.ts
  • src/utils/cli-exit.spec.ts
  • src/commands/directory.spec.ts
  • src/commands/doctor.ts
  • src/utils/telemetry-sanitize.spec.ts
  • src/commands/membership.spec.ts
  • src/commands/api/interactive.spec.ts
  • src/utils/command-telemetry.spec.ts
  • src/commands/uninstall-skill.spec.ts
  • src/utils/command-telemetry.ts
  • src/commands/install.spec.ts
  • src/utils/cli-exit.ts
  • src/commands/install.ts
  • src/utils/output.ts
  • src/utils/exit-codes.spec.ts
  • src/lib/api-error-handler.spec.ts
  • src/commands/connection.spec.ts
  • src/utils/analytics.spec.ts
  • src/utils/output.spec.ts
  • src/utils/analytics.ts
  • src/commands/api/index.spec.ts
  • src/commands/env.spec.ts

Comment thread README.md Outdated
Comment thread src/utils/exit-codes.ts

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/utils/analytics.spec.ts (1)

518-548: ⚡ Quick win

Improve type safety in mock call assertions.

The any type annotation in the find() callbacks bypasses TypeScript's type checking. Consider defining a type for the queued event structure or using a more specific type.

♻️ Proposed improvement

Define a helper type at the top of the test file:

+type QueuedEvent = {
+  type: string;
+  attributes: Record<string, unknown>;
+  [key: string]: unknown;
+};

Then update the find callbacks:

-        const event = mockQueueEvent.mock.calls.find((c: any) => c[0].type === 'command')[0];
+        const event = mockQueueEvent.mock.calls.find((c: [QueuedEvent]) => c[0].type === 'command')?.[0];

Apply the same pattern to lines 528, 539, and 548.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a3ddcfb0-a669-4a6b-a9d4-f54c043cb0bc

📥 Commits

Reviewing files that changed from the base of the PR and between c7804a2 and 849f632.

📒 Files selected for processing (29)
  • CLAUDE.md
  • README.md
  • src/bin.ts
  • src/commands/api/index.spec.ts
  • src/commands/api/interactive.spec.ts
  • src/commands/connection.spec.ts
  • src/commands/directory.spec.ts
  • src/commands/doctor.ts
  • src/commands/env.spec.ts
  • src/commands/install.spec.ts
  • src/commands/install.ts
  • src/commands/membership.spec.ts
  • src/commands/seed.spec.ts
  • src/commands/uninstall-skill.spec.ts
  • src/lib/api-error-handler.spec.ts
  • src/utils/analytics.spec.ts
  • src/utils/analytics.ts
  • src/utils/cli-exit.spec.ts
  • src/utils/cli-exit.ts
  • src/utils/command-telemetry.spec.ts
  • src/utils/command-telemetry.ts
  • src/utils/crash-reporter.ts
  • src/utils/exit-codes.spec.ts
  • src/utils/exit-codes.ts
  • src/utils/output.spec.ts
  • src/utils/output.ts
  • src/utils/telemetry-client.spec.ts
  • src/utils/telemetry-client.ts
  • src/utils/telemetry-sanitize.spec.ts
✅ Files skipped from review due to trivial changes (1)
  • README.md
🚧 Files skipped from review as they are similar to previous changes (26)
  • src/utils/cli-exit.spec.ts
  • CLAUDE.md
  • src/utils/output.ts
  • src/commands/doctor.ts
  • src/commands/env.spec.ts
  • src/commands/api/interactive.spec.ts
  • src/utils/exit-codes.ts
  • src/commands/connection.spec.ts
  • src/utils/command-telemetry.spec.ts
  • src/utils/cli-exit.ts
  • src/commands/seed.spec.ts
  • src/commands/install.ts
  • src/commands/directory.spec.ts
  • src/utils/telemetry-client.spec.ts
  • src/utils/crash-reporter.ts
  • src/commands/install.spec.ts
  • src/utils/output.spec.ts
  • src/lib/api-error-handler.spec.ts
  • src/utils/exit-codes.spec.ts
  • src/utils/telemetry-sanitize.spec.ts
  • src/commands/uninstall-skill.spec.ts
  • src/commands/membership.spec.ts
  • src/utils/command-telemetry.ts
  • src/commands/api/index.spec.ts
  • src/utils/analytics.ts
  • src/utils/telemetry-client.ts

@coderabbitai

coderabbitai Bot commented May 28, 2026

Copy link
Copy Markdown

Actionable comments posted: 0

1 similar comment
@coderabbitai

coderabbitai Bot commented May 28, 2026

Copy link
Copy Markdown

Comment thread src/utils/telemetry-store-forward.ts
nicknisi added 8 commits June 1, 2026 11:39
…amps, and new event types

Add environment fingerprint fields (OS, Node version, CI detection, shell)
to session.start and session.end events. Add startTimestamp to step and
agent.tool events for span reconstruction. Define command and crash event
types with stub emission methods. Add discriminated union Zod schema
validation tests mirroring the API schema.
… persistence

Wire up yargs middleware that emits a provisional command event before each
handler runs, then replaces it with actual duration/success on completion.
This covers the ~25 process.exit() call sites without modifying them.

- Command telemetry middleware with canonical name resolution and flag extraction
- Crash reporter with sanitized stack traces (sync handlers, no async)
- Store-forward: persist unsent events to temp file on exit, recover on next run
- Fix flush() to retain events until HTTP success (was clearing before fetch)
- Auto-wrap handlers in registerSubcommand() (single change point)
- Shared COMMAND_ALIASES map for telemetry and help-json
- analytics.initForNonInstaller() sets gatewayUrl + JWT from stored creds
- Enable debug output for non-installer commands via env var
- Log telemetry event details (type, name, duration, attributes) on flush
- Register in debug command's env var catalog
- Wrap inline command handlers (seed, setup-org, doctor, etc.) with
  wrapCommandHandler so they report real duration/success
- Skip provisional telemetry event for install command (has own session telemetry)
- Add claim -> env.claim to canonical alias map
- Defer store-forward file deletion until after successful flush
Client errors (401, 403) are permanent failures that won't succeed on
retry. Only retain events for 5xx (transient server errors) and network
failures where store-forward retry is meaningful.
- flush() returns true (sent/dropped) or false (retryable) so callers
  can act on the result
- Use splice(0, count) instead of clearing all events, protecting
  events queued concurrently during the fetch
- wrapCommandHandler flushes in-process so events are sent immediately
  instead of always deferring to next invocation via store-forward
- Store-forward recovery deletes files after loading into memory
  (events are re-persisted by exit handler if flush fails)
- Skip provisional events for dashboard and $0 (installer entry points)
- Add 4xx drop test coverage
Add a section to CLAUDE.md explaining which commands auto-emit telemetry
(registerSubcommand) versus which need manual wrapCommandHandler wrapping
(inline top-level .command() calls). Add a pointer comment in bin.ts near
the workflow commands block.

Prevents new top-level commands from silently emitting duration=0 telemetry.
- Add workos.user_id to command and crash events (from stored credentials
  or unclaimed environment clientId) so dashboards can count unique users
- Add cli.version to command and crash events for release adoption tracking
- Support claim-token auth path on the telemetry client, so unclaimed
  environments' telemetry reaches the API (guard accepts this path too)
- Rename CrashEvent's installer.version to cli.version (crashes happen
  outside the installer too)
- initForNonInstaller() now wires up user_id and claim-token auth
nicknisi added 13 commits June 1, 2026 11:41
Wraps the yargs chain in runCli(), captures the canonical command name via
middleware, emits a single command telemetry event from the lifecycle
try/catch/finally (success, CliExit, or crash), and flushes the telemetry
client before exiting. Removes per-handler wrapCommandHandler() wrappers and
the legacy commandTelemetryMiddleware. Also drops process.exit(0) calls in
handlers that no longer need to short-circuit since exit is now centralized.
Migrate 12 spec files from process.exit spies to CliExit throw
assertions, matching the exitWithCode/exitWithError refactor.
- .fail() now re-throws CliExit so handler exits preserve their
  context (reason, errorCode, apiContext) instead of being replaced
  with generic validation_error
- Remove unconditional process.exit() from finally block so
  long-running commands (dev, emulate) stay alive after wiring
  listeners; their signal handlers still call process.exit() directly
- doctor.ts catch block re-throws CliExit so exitWithCode(SUCCESS)
  doesn't get caught and re-classified as GENERAL_ERROR
exitWithError outside runCli() threw CliExit into the crash reporter,
recording intentional validation errors as crash events. Now caught
and exited cleanly before the crash reporter sees it.
…dation

exitWithError outside runCli() threw CliExit into the crash reporter,
recording intentional validation errors as crash events. Use
outputError + process.exit directly since this runs before the
centralized lifecycle exists.
…API caps

- crash-reporter: cap sanitized stack (marker included) at 4096 so the API's
  per-attribute Zod limit can't silently drop oversized crash events; also
  collapse Windows node_modules/dist/src paths; treat a CliExit that reaches
  the global handlers as an intentional exit rather than a crash (fixes false
  crash telemetry from dev's fire-and-forget child `error` listener)
- exit-codes: map `no_api_key` to `auth_required`, not `validation_error`
- analytics: sanitize error message before the WORKOS_DEBUG log line
- doctor: emit structured `{ error: { code, message } }` in JSON mode
- command-telemetry: harden extractUserFlags (ignore `--` and negative values)
- telemetry-client: persist store-forward file with 0700/0600 modes
- README: scope command-event wording to telemetry-enabled commands
Telemetry now posts to a dedicated /cli/telemetry endpoint (WORKOS_TELEMETRY_URL,
default https://api.workos.com/cli) rather than sharing the LLM gateway route.
WORKOS_LLM_GATEWAY_URL stays scoped to doctor/install LLM proxy traffic.

The client sends x-workos-api-key when auth mode is API key and no JWT or
claim-token transport is available, so API-key-only cohorts (CI, headless)
finally deliver telemetry. Auth precedence: JWT > claim token > API key.
Record the detected framework on session.end (read from the final state
machine snapshot) so the API can break install metrics down by framework
and compute success rate per integration. Absent when a session aborts
before detection runs.
@nicknisi nicknisi force-pushed the nicknisi/telemetry branch from c36368d to cfed292 Compare June 1, 2026 16:46
Comment thread src/bin.ts
nicknisi added 3 commits June 1, 2026 14:00
An expired access token 401s against the telemetry guard, which drops the
event permanently (4xx). Gate the JWT auth branch on token validity so an
expired token falls through to claim-token / api-key auth, while still
preserving the user identity (distinctId) for attribution. The client's
flush() applies the same check when reading fresh credentials mid-session.
clearCredentials()/saveCredentials() operate on the live `workos-cli`
keychain entry regardless of file-fallback settings, so any spec exercising
them wiped the developer's real CLI login (mocking node:os homedir only
isolates the file path, not the OS keychain).

Add a global setup file that swaps @napi-rs/keyring for an in-memory mock for
every spec, plus a guardrail test that fails if the setup is ever removed
(asserts the __IS_TEST_MOCK__ sentinel) rather than silently re-arming the
footgun.
…al command

yargs runs demand/strict validation before dispatching middleware, so the
command-name middleware never ran on a validation failure and the event was
recorded as 'root' (which SKIP_TELEMETRY_COMMANDS drops). We were blind to
every misused command. Recover the top-level command from raw argv in the
fail handler. Only the top-level token is used: later positionals can be user
values (org names, emails, IDs), so recording them would leak data and
explode cardinality.

Add 'debug simulate --crash' (throws a plain Error) to exercise the
unhandled-crash path, and an integration test driving the real CLI lifecycle
as a subprocess that asserts both the validation_error attribution and the
crash event (with a redacted stack), observed via the store-forward payload.
@nicknisi nicknisi force-pushed the nicknisi/telemetry branch from ab47f6c to 199b9d9 Compare June 1, 2026 19:43
nicknisi added 6 commits June 1, 2026 15:02
The validation-failure command-name recovery selected the first non-flag
token, so a value-taking option before the command (e.g.
`workos --api-key sk_live_… organization create` or `--mode ci organization`)
made command.name the option value — leaking a credential into telemetry,
which is then sent to the backend. Match tokens against the known top-level
command registry instead: return the first token that resolves to a real
command, else 'root' (skipped). Option values, secrets, and typos can never
be recorded, which also bounds command.name cardinality.
A command handler throwing a plain Error recorded crash telemetry and set
exit code 1 but printed nothing, so a real crash (or
`WORKOS_TELEMETRY=false workos debug simulate --crash`) left the user with an
empty stderr and no diagnostic. Emit a sanitized structured error (secrets and
paths stripped) on the crash path; full details remain in the crash log and
telemetry.

Also harden the CLI-lifecycle integration test: isolate the spawned process's
HOME so it can't read/write the developer's real ~/.workos (device-id,
credentials), and force WORKOS_TELEMETRY=true so an inherited opt-out can't
make the test silently pass with no event.
Harden the lifecycle integration test per review: build an explicit minimal
env for the child (no inherited host env) and preload a force-insecure-storage
module so the subprocess startup uses file-only credential/config storage
inside the temp HOME instead of reading the real OS keychain. Removes the last
non-hermetic path in the spawned-CLI test.
Two overlapping flush() calls snapshot and POST the same events twice, and a
later splice() could drop events queued after the first flush began. Share a
single in-flight promise so concurrent callers await the same flush.
Add an async, memoized loadDeviceId() (fs/promises) and prewarm it at startup
so the synchronous telemetry event path reads the id from cache instead of
doing blocking fs IO. getDeviceId() stays synchronous as a cold fallback and
still returns the persisted id, so device.id never fragments.
…directly

Command handlers should terminate through the exit helpers so lifecycle
classification stays consistent (per CLAUDE.md handler guidance).
@nicknisi nicknisi merged commit 25604bd into main Jun 1, 2026
6 checks passed
@nicknisi nicknisi deleted the nicknisi/telemetry branch June 1, 2026 20:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant