feat: CLI telemetry for all commands with centralized lifecycle#122
Conversation
Closes security-audit finding #1 on PR #122 (telemetry message sanitization). `error.message` was flowing into 4 capture sites unsanitized, leaking homedir paths (and rarely, credentials) to the WorkOS gateway. - Add `sanitizeMessage()` in crash-reporter.ts: homedir strip + Bearer/ sk_*/JWT redaction + 1KB truncation. - Factor secret redaction into shared `redactSecrets()` used by both `sanitizeMessage` and `sanitizeStack` (Node echoes `.message` into the leading `Error.stack` line, so message-only sanitization was insufficient). - Add private `extractErrorFields()` chokepoint on `Analytics`; route all 4 capture sites through it (`captureException`, `stepCompleted`, `commandExecuted`, `captureUnhandledCrash`). `replaceLastCommandEvent` inherits sanitization via its delegation to `commandExecuted`. - `captureUnhandledCrash` now uses `sanitizeStack` instead of inline truncation, providing defense-in-depth for callers that bypass the crash-reporter wrapper. - Add regression guard test (`telemetry-sanitize.spec.ts`): poisons every capture method with homedir + Bearer + sk_live_ + JWT, asserts no marker reaches the serialized queue. Reviewed: ideation:reviewer cycle 1 PASS (0 critical, 0 high).
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (4)
src/utils/telemetry-types.ts (1)
58-58: ⚡ Quick winAlign
startTimestampoptionality with backward-compat contract.
telemetry-schema.spec.tsexplicitly acceptsstep/agent.toolwithoutstartTimestamp, but these interfaces currently require it. Making both optional avoids contract drift.Suggested patch
export interface StepEvent extends TelemetryEvent { type: 'step'; name: string; - startTimestamp: string; + startTimestamp?: string; durationMs: number; success: boolean; error?: { @@ export interface AgentToolEvent extends TelemetryEvent { type: 'agent.tool'; toolName: string; - startTimestamp: string; + startTimestamp?: string; durationMs: number; success: boolean; }Also applies to: 70-70
src/utils/telemetry-client.spec.ts (1)
60-105: ⚡ Quick winAdd coverage for claim-token auth headers.
setClaimTokenAuth()is new behavior in this PR, but there’s no test assertingx-workos-claim-token+x-workos-client-idemission (and bearer-token precedence when both exist).Also applies to: 124-218
src/utils/telemetry-store-forward.ts (1)
1-3: ⚡ Quick winUse async fs APIs for startup recovery path.
recoverPendingEvents()(Line 25+) is async but uses sync disk calls (Lines 27-56), which blocks startup and conflicts with the project rule for TS files.As per coding guidelines "Avoid Node-specific sync APIs (crypto, fs sync) unless necessary".
Also applies to: 27-56
src/utils/output.ts (1)
123-124: ⚡ Quick winProtect process termination from telemetry failures
analytics.recordTermination(...)on Line 123 can throw and prevent Line 124 from executing. Error exits should remain deterministic even when telemetry is unhealthy.🔧 Proposed fix
const reason = error.apiContext ? 'api_error' : codeReason; - analytics.recordTermination(reason, error.code, error.apiContext); - process.exit(exit); + try { + analytics.recordTermination(reason, error.code, error.apiContext); + } finally { + process.exit(exit); + } }
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c45afda9-659c-43a0-9646-cd08dd0cabe3
📒 Files selected for processing (27)
CLAUDE.mdsrc/bin.tssrc/commands/debug.tssrc/lib/api-error-handler.spec.tssrc/lib/api-error-handler.tssrc/lib/command-aliases.tssrc/lib/device-id.spec.tssrc/lib/device-id.tssrc/lib/run-with-core.tssrc/utils/analytics.spec.tssrc/utils/analytics.tssrc/utils/command-telemetry.spec.tssrc/utils/command-telemetry.tssrc/utils/crash-reporter.spec.tssrc/utils/crash-reporter.tssrc/utils/exit-codes.spec.tssrc/utils/exit-codes.tssrc/utils/output.spec.tssrc/utils/output.tssrc/utils/register-subcommand.tssrc/utils/telemetry-client.spec.tssrc/utils/telemetry-client.tssrc/utils/telemetry-sanitize.spec.tssrc/utils/telemetry-schema.spec.tssrc/utils/telemetry-store-forward.spec.tssrc/utils/telemetry-store-forward.tssrc/utils/telemetry-types.ts
| try { | ||
| if (fs.existsSync(filePath)) { | ||
| const raw = fs.readFileSync(filePath, 'utf8').trim(); | ||
| if (UUID_REGEX.test(raw)) { | ||
| cached = raw; | ||
| return raw; | ||
| } | ||
| } | ||
|
|
||
| const id = crypto.randomUUID(); | ||
| fs.mkdirSync(path.dirname(filePath), { recursive: true, mode: 0o700 }); | ||
| fs.writeFileSync(filePath, id, { encoding: 'utf8', mode: 0o600 }); | ||
| cached = id; |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift
Replace sync fs usage in the command path.
Line 31–43 and Line 49 use synchronous fs/crypto APIs on a hot CLI path. Please move this to async (node:fs/promises) and cache a pending promise to keep call sites simple.
As per coding guidelines src/**/*.ts: “Avoid Node-specific sync APIs (crypto, fs sync) unless necessary”.
Also applies to: 49-50
| async flush(): Promise<boolean> { | ||
| if (this.events.length === 0) return true; | ||
| if (!this.gatewayUrl) { | ||
| debug('[Telemetry] No gateway URL configured, skipping flush'); | ||
| return; | ||
| return false; | ||
| } | ||
|
|
||
| const payload: TelemetryRequest = { events: [...this.events] }; | ||
| this.events = []; | ||
| const count = this.events.length; | ||
| const payload: TelemetryRequest = { events: this.events.slice(0, count) }; | ||
|
|
There was a problem hiding this comment.
Guard against concurrent flush() calls to prevent event loss/duplication.
Line 85 currently allows overlapping flushes. Two concurrent calls can send the same snapshot twice, and later splice(0, count) can remove events queued after the first flush.
Suggested patch
export class TelemetryClient {
private events: TelemetryEvent[] = [];
+ private flushInFlight: Promise<boolean> | null = null;
@@
async flush(): Promise<boolean> {
+ if (this.flushInFlight) return this.flushInFlight;
+ this.flushInFlight = this.flushInternal();
+ try {
+ return await this.flushInFlight;
+ } finally {
+ this.flushInFlight = null;
+ }
+ }
+
+ private async flushInternal(): Promise<boolean> {
if (this.events.length === 0) return true;
@@
- }
+ }
}Also applies to: 135-137, 143-144
Greptile SummaryThis PR replaces a fragile per-handler
Confidence Score: 4/5The centralized runCli() lifecycle is a clear improvement, but several edge cases from earlier review rounds remain unresolved. apiContext fields are still serialized into user-visible JSON error output; the global crash-reporter handlers still exit silently for crashes escaping runCli(); recoverPendingEvents() lacks a telemetry-enabled guard causing a bounce loop when disabled; and exitWithCode() in async dev.ts listeners bypasses the finally flush. The new infrastructure — flushInFlight coalescing, store-forward, CliExit typed exception, and secret sanitization — is well-designed. src/utils/output.ts (apiContext in JSON output), src/utils/crash-reporter.ts (silent exit on real crashes), src/utils/telemetry-store-forward.ts and src/bin.ts (bounce loop when telemetry disabled), src/commands/dev.ts (async listener bypass). Important Files Changed
|
Closes security-audit finding #1 on PR #122 (telemetry message sanitization). `error.message` was flowing into 4 capture sites unsanitized, leaking homedir paths (and rarely, credentials) to the WorkOS gateway. - Add `sanitizeMessage()` in crash-reporter.ts: homedir strip + Bearer/ sk_*/JWT redaction + 1KB truncation. - Factor secret redaction into shared `redactSecrets()` used by both `sanitizeMessage` and `sanitizeStack` (Node echoes `.message` into the leading `Error.stack` line, so message-only sanitization was insufficient). - Add private `extractErrorFields()` chokepoint on `Analytics`; route all 4 capture sites through it (`captureException`, `stepCompleted`, `commandExecuted`, `captureUnhandledCrash`). `replaceLastCommandEvent` inherits sanitization via its delegation to `commandExecuted`. - `captureUnhandledCrash` now uses `sanitizeStack` instead of inline truncation, providing defense-in-depth for callers that bypass the crash-reporter wrapper. - Add regression guard test (`telemetry-sanitize.spec.ts`): poisons every capture method with homedir + Bearer + sk_live_ + JWT, asserts no marker reaches the serialized queue. Reviewed: ideation:reviewer cycle 1 PASS (0 critical, 0 high).
31829ec to
0475ea8
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e6b7b384-44b3-47fe-9030-070bd92bf44e
📒 Files selected for processing (39)
CLAUDE.mdsrc/bin.tssrc/commands/api/index.spec.tssrc/commands/api/index.tssrc/commands/api/interactive.spec.tssrc/commands/api/interactive.tssrc/commands/debug.tssrc/commands/dev.tssrc/commands/doctor.tssrc/commands/emulate.tssrc/commands/env.tssrc/commands/install-skill.tssrc/commands/login.tssrc/commands/uninstall-skill.tssrc/lib/api-error-handler.spec.tssrc/lib/api-error-handler.tssrc/lib/command-aliases.tssrc/lib/device-id.spec.tssrc/lib/device-id.tssrc/lib/run-with-core.tssrc/utils/analytics.spec.tssrc/utils/analytics.tssrc/utils/command-telemetry.spec.tssrc/utils/command-telemetry.tssrc/utils/crash-reporter.spec.tssrc/utils/crash-reporter.tssrc/utils/exit-codes.spec.tssrc/utils/exit-codes.tssrc/utils/help-json.tssrc/utils/output.spec.tssrc/utils/output.tssrc/utils/register-subcommand.tssrc/utils/telemetry-client.spec.tssrc/utils/telemetry-client.tssrc/utils/telemetry-sanitize.spec.tssrc/utils/telemetry-schema.spec.tssrc/utils/telemetry-store-forward.spec.tssrc/utils/telemetry-store-forward.tssrc/utils/telemetry-types.ts
✅ Files skipped from review due to trivial changes (1)
- src/commands/emulate.ts
🚧 Files skipped from review as they are similar to previous changes (22)
- CLAUDE.md
- src/lib/command-aliases.ts
- src/utils/register-subcommand.ts
- src/utils/telemetry-client.spec.ts
- src/utils/crash-reporter.spec.ts
- src/commands/debug.ts
- src/utils/telemetry-schema.spec.ts
- src/utils/telemetry-client.ts
- src/utils/telemetry-store-forward.spec.ts
- src/lib/device-id.spec.ts
- src/utils/output.spec.ts
- src/utils/telemetry-store-forward.ts
- src/utils/command-telemetry.spec.ts
- src/lib/api-error-handler.spec.ts
- src/lib/device-id.ts
- src/lib/api-error-handler.ts
- src/utils/telemetry-sanitize.spec.ts
- src/utils/crash-reporter.ts
- src/utils/analytics.ts
- src/utils/telemetry-types.ts
- src/utils/exit-codes.ts
- src/utils/analytics.spec.ts
Closes security-audit finding #1 on PR #122 (telemetry message sanitization). `error.message` was flowing into 4 capture sites unsanitized, leaking homedir paths (and rarely, credentials) to the WorkOS gateway. - Add `sanitizeMessage()` in crash-reporter.ts: homedir strip + Bearer/ sk_*/JWT redaction + 1KB truncation. - Factor secret redaction into shared `redactSecrets()` used by both `sanitizeMessage` and `sanitizeStack` (Node echoes `.message` into the leading `Error.stack` line, so message-only sanitization was insufficient). - Add private `extractErrorFields()` chokepoint on `Analytics`; route all 4 capture sites through it (`captureException`, `stepCompleted`, `commandExecuted`, `captureUnhandledCrash`). `replaceLastCommandEvent` inherits sanitization via its delegation to `commandExecuted`. - `captureUnhandledCrash` now uses `sanitizeStack` instead of inline truncation, providing defense-in-depth for callers that bypass the crash-reporter wrapper. - Add regression guard test (`telemetry-sanitize.spec.ts`): poisons every capture method with homedir + Bearer + sk_live_ + JWT, asserts no marker reaches the serialized queue. Reviewed: ideation:reviewer cycle 1 PASS (0 critical, 0 high).
0475ea8 to
8003788
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6ce40944-6658-4efe-aa8c-e9db5c81063a
📒 Files selected for processing (39)
CLAUDE.mdsrc/bin.tssrc/commands/api/index.spec.tssrc/commands/api/index.tssrc/commands/api/interactive.spec.tssrc/commands/api/interactive.tssrc/commands/debug.tssrc/commands/dev.tssrc/commands/doctor.tssrc/commands/emulate.tssrc/commands/env.tssrc/commands/install-skill.tssrc/commands/login.tssrc/commands/uninstall-skill.tssrc/lib/api-error-handler.spec.tssrc/lib/api-error-handler.tssrc/lib/command-aliases.tssrc/lib/device-id.spec.tssrc/lib/device-id.tssrc/lib/run-with-core.tssrc/utils/analytics.spec.tssrc/utils/analytics.tssrc/utils/command-telemetry.spec.tssrc/utils/command-telemetry.tssrc/utils/crash-reporter.spec.tssrc/utils/crash-reporter.tssrc/utils/exit-codes.spec.tssrc/utils/exit-codes.tssrc/utils/help-json.tssrc/utils/output.spec.tssrc/utils/output.tssrc/utils/register-subcommand.tssrc/utils/telemetry-client.spec.tssrc/utils/telemetry-client.tssrc/utils/telemetry-sanitize.spec.tssrc/utils/telemetry-schema.spec.tssrc/utils/telemetry-store-forward.spec.tssrc/utils/telemetry-store-forward.tssrc/utils/telemetry-types.ts
✅ Files skipped from review due to trivial changes (3)
- src/commands/api/index.spec.ts
- CLAUDE.md
- src/commands/api/interactive.spec.ts
🚧 Files skipped from review as they are similar to previous changes (31)
- src/commands/debug.ts
- src/commands/doctor.ts
- src/commands/emulate.ts
- src/utils/telemetry-store-forward.spec.ts
- src/commands/uninstall-skill.ts
- src/commands/install-skill.ts
- src/lib/command-aliases.ts
- src/commands/api/index.ts
- src/utils/output.ts
- src/utils/help-json.ts
- src/utils/exit-codes.spec.ts
- src/commands/env.ts
- src/commands/login.ts
- src/utils/command-telemetry.ts
- src/utils/telemetry-types.ts
- src/utils/crash-reporter.spec.ts
- src/commands/api/interactive.ts
- src/commands/dev.ts
- src/utils/exit-codes.ts
- src/lib/run-with-core.ts
- src/utils/telemetry-schema.spec.ts
- src/utils/telemetry-store-forward.ts
- src/utils/crash-reporter.ts
- src/lib/api-error-handler.ts
- src/utils/telemetry-client.spec.ts
- src/lib/device-id.spec.ts
- src/utils/telemetry-client.ts
- src/lib/api-error-handler.spec.ts
- src/lib/device-id.ts
- src/utils/analytics.ts
- src/bin.ts
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/utils/command-telemetry.ts (1)
12-17:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winHarden flag extraction to avoid invalid telemetry flags.
The current filter accepts
--(which becomes empty string after processing) and numeric short flags like-1,-2(which become'1','2') that pollute the telemetrycommand.flagsattribute with invalid entries.🛡️ Suggested fix from previous review
export function extractUserFlags(rawArgs: string[]): string[] { const passedFlags = rawArgs - .filter((arg) => arg.startsWith('--') || (arg.startsWith('-') && arg.length === 2)) - .map((arg) => arg.replace(/^-+/, '').split('=')[0]); + .filter((arg) => { + if (arg === '--') return false; + if (/^--[A-Za-z][\w-]*(=.*)?$/.test(arg)) return true; + if (/^-[A-Za-z]$/.test(arg)) return true; + return false; + }) + .map((arg) => arg.replace(/^-+/, '').split('=')[0]) + .filter(Boolean); return [...new Set(passedFlags)]; }
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a84b559e-a9da-49e6-a70b-1b60ba4b833d
📒 Files selected for processing (27)
CLAUDE.mdsrc/bin.tssrc/commands/api/index.spec.tssrc/commands/api/interactive.spec.tssrc/commands/connection.spec.tssrc/commands/directory.spec.tssrc/commands/doctor.tssrc/commands/env.spec.tssrc/commands/install.spec.tssrc/commands/install.tssrc/commands/membership.spec.tssrc/commands/seed.spec.tssrc/commands/uninstall-skill.spec.tssrc/lib/api-error-handler.spec.tssrc/utils/analytics.spec.tssrc/utils/analytics.tssrc/utils/cli-exit.spec.tssrc/utils/cli-exit.tssrc/utils/command-telemetry.spec.tssrc/utils/command-telemetry.tssrc/utils/exit-codes.spec.tssrc/utils/exit-codes.tssrc/utils/output.spec.tssrc/utils/output.tssrc/utils/telemetry-client.spec.tssrc/utils/telemetry-client.tssrc/utils/telemetry-sanitize.spec.ts
💤 Files with no reviewable changes (2)
- src/utils/telemetry-client.spec.ts
- src/utils/telemetry-client.ts
✅ Files skipped from review due to trivial changes (1)
- CLAUDE.md
🚧 Files skipped from review as they are similar to previous changes (1)
- src/utils/command-telemetry.spec.ts
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4dc74551-d649-4729-a366-dc7e375c36f4
📒 Files selected for processing (28)
CLAUDE.mdREADME.mdsrc/bin.tssrc/commands/api/index.spec.tssrc/commands/api/interactive.spec.tssrc/commands/connection.spec.tssrc/commands/directory.spec.tssrc/commands/doctor.tssrc/commands/env.spec.tssrc/commands/install.spec.tssrc/commands/install.tssrc/commands/membership.spec.tssrc/commands/seed.spec.tssrc/commands/uninstall-skill.spec.tssrc/lib/api-error-handler.spec.tssrc/utils/analytics.spec.tssrc/utils/analytics.tssrc/utils/cli-exit.spec.tssrc/utils/cli-exit.tssrc/utils/command-telemetry.spec.tssrc/utils/command-telemetry.tssrc/utils/exit-codes.spec.tssrc/utils/exit-codes.tssrc/utils/output.spec.tssrc/utils/output.tssrc/utils/telemetry-client.spec.tssrc/utils/telemetry-client.tssrc/utils/telemetry-sanitize.spec.ts
💤 Files with no reviewable changes (2)
- src/utils/telemetry-client.spec.ts
- src/utils/telemetry-client.ts
✅ Files skipped from review due to trivial changes (1)
- CLAUDE.md
🚧 Files skipped from review as they are similar to previous changes (22)
- src/commands/seed.spec.ts
- src/utils/cli-exit.spec.ts
- src/commands/directory.spec.ts
- src/commands/doctor.ts
- src/utils/telemetry-sanitize.spec.ts
- src/commands/membership.spec.ts
- src/commands/api/interactive.spec.ts
- src/utils/command-telemetry.spec.ts
- src/commands/uninstall-skill.spec.ts
- src/utils/command-telemetry.ts
- src/commands/install.spec.ts
- src/utils/cli-exit.ts
- src/commands/install.ts
- src/utils/output.ts
- src/utils/exit-codes.spec.ts
- src/lib/api-error-handler.spec.ts
- src/commands/connection.spec.ts
- src/utils/analytics.spec.ts
- src/utils/output.spec.ts
- src/utils/analytics.ts
- src/commands/api/index.spec.ts
- src/commands/env.spec.ts
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/utils/analytics.spec.ts (1)
518-548: ⚡ Quick winImprove type safety in mock call assertions.
The
anytype annotation in thefind()callbacks bypasses TypeScript's type checking. Consider defining a type for the queued event structure or using a more specific type.♻️ Proposed improvement
Define a helper type at the top of the test file:
+type QueuedEvent = { + type: string; + attributes: Record<string, unknown>; + [key: string]: unknown; +};Then update the find callbacks:
- const event = mockQueueEvent.mock.calls.find((c: any) => c[0].type === 'command')[0]; + const event = mockQueueEvent.mock.calls.find((c: [QueuedEvent]) => c[0].type === 'command')?.[0];Apply the same pattern to lines 528, 539, and 548.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a3ddcfb0-a669-4a6b-a9d4-f54c043cb0bc
📒 Files selected for processing (29)
CLAUDE.mdREADME.mdsrc/bin.tssrc/commands/api/index.spec.tssrc/commands/api/interactive.spec.tssrc/commands/connection.spec.tssrc/commands/directory.spec.tssrc/commands/doctor.tssrc/commands/env.spec.tssrc/commands/install.spec.tssrc/commands/install.tssrc/commands/membership.spec.tssrc/commands/seed.spec.tssrc/commands/uninstall-skill.spec.tssrc/lib/api-error-handler.spec.tssrc/utils/analytics.spec.tssrc/utils/analytics.tssrc/utils/cli-exit.spec.tssrc/utils/cli-exit.tssrc/utils/command-telemetry.spec.tssrc/utils/command-telemetry.tssrc/utils/crash-reporter.tssrc/utils/exit-codes.spec.tssrc/utils/exit-codes.tssrc/utils/output.spec.tssrc/utils/output.tssrc/utils/telemetry-client.spec.tssrc/utils/telemetry-client.tssrc/utils/telemetry-sanitize.spec.ts
✅ Files skipped from review due to trivial changes (1)
- README.md
🚧 Files skipped from review as they are similar to previous changes (26)
- src/utils/cli-exit.spec.ts
- CLAUDE.md
- src/utils/output.ts
- src/commands/doctor.ts
- src/commands/env.spec.ts
- src/commands/api/interactive.spec.ts
- src/utils/exit-codes.ts
- src/commands/connection.spec.ts
- src/utils/command-telemetry.spec.ts
- src/utils/cli-exit.ts
- src/commands/seed.spec.ts
- src/commands/install.ts
- src/commands/directory.spec.ts
- src/utils/telemetry-client.spec.ts
- src/utils/crash-reporter.ts
- src/commands/install.spec.ts
- src/utils/output.spec.ts
- src/lib/api-error-handler.spec.ts
- src/utils/exit-codes.spec.ts
- src/utils/telemetry-sanitize.spec.ts
- src/commands/uninstall-skill.spec.ts
- src/commands/membership.spec.ts
- src/utils/command-telemetry.ts
- src/commands/api/index.spec.ts
- src/utils/analytics.ts
- src/utils/telemetry-client.ts
|
Actionable comments posted: 0 |
1 similar comment
…amps, and new event types Add environment fingerprint fields (OS, Node version, CI detection, shell) to session.start and session.end events. Add startTimestamp to step and agent.tool events for span reconstruction. Define command and crash event types with stub emission methods. Add discriminated union Zod schema validation tests mirroring the API schema.
… persistence Wire up yargs middleware that emits a provisional command event before each handler runs, then replaces it with actual duration/success on completion. This covers the ~25 process.exit() call sites without modifying them. - Command telemetry middleware with canonical name resolution and flag extraction - Crash reporter with sanitized stack traces (sync handlers, no async) - Store-forward: persist unsent events to temp file on exit, recover on next run - Fix flush() to retain events until HTTP success (was clearing before fetch) - Auto-wrap handlers in registerSubcommand() (single change point) - Shared COMMAND_ALIASES map for telemetry and help-json - analytics.initForNonInstaller() sets gatewayUrl + JWT from stored creds
- Enable debug output for non-installer commands via env var - Log telemetry event details (type, name, duration, attributes) on flush - Register in debug command's env var catalog
- Wrap inline command handlers (seed, setup-org, doctor, etc.) with wrapCommandHandler so they report real duration/success - Skip provisional telemetry event for install command (has own session telemetry) - Add claim -> env.claim to canonical alias map - Defer store-forward file deletion until after successful flush
Client errors (401, 403) are permanent failures that won't succeed on retry. Only retain events for 5xx (transient server errors) and network failures where store-forward retry is meaningful.
- flush() returns true (sent/dropped) or false (retryable) so callers can act on the result - Use splice(0, count) instead of clearing all events, protecting events queued concurrently during the fetch - wrapCommandHandler flushes in-process so events are sent immediately instead of always deferring to next invocation via store-forward - Store-forward recovery deletes files after loading into memory (events are re-persisted by exit handler if flush fails) - Skip provisional events for dashboard and $0 (installer entry points) - Add 4xx drop test coverage
Add a section to CLAUDE.md explaining which commands auto-emit telemetry (registerSubcommand) versus which need manual wrapCommandHandler wrapping (inline top-level .command() calls). Add a pointer comment in bin.ts near the workflow commands block. Prevents new top-level commands from silently emitting duration=0 telemetry.
- Add workos.user_id to command and crash events (from stored credentials or unclaimed environment clientId) so dashboards can count unique users - Add cli.version to command and crash events for release adoption tracking - Support claim-token auth path on the telemetry client, so unclaimed environments' telemetry reaches the API (guard accepts this path too) - Rename CrashEvent's installer.version to cli.version (crashes happen outside the installer too) - initForNonInstaller() now wires up user_id and claim-token auth
…ndler, and auto-wrapping
Wraps the yargs chain in runCli(), captures the canonical command name via middleware, emits a single command telemetry event from the lifecycle try/catch/finally (success, CliExit, or crash), and flushes the telemetry client before exiting. Removes per-handler wrapCommandHandler() wrappers and the legacy commandTelemetryMiddleware. Also drops process.exit(0) calls in handlers that no longer need to short-circuit since exit is now centralized.
Migrate 12 spec files from process.exit spies to CliExit throw assertions, matching the exitWithCode/exitWithError refactor.
- .fail() now re-throws CliExit so handler exits preserve their context (reason, errorCode, apiContext) instead of being replaced with generic validation_error - Remove unconditional process.exit() from finally block so long-running commands (dev, emulate) stay alive after wiring listeners; their signal handlers still call process.exit() directly - doctor.ts catch block re-throws CliExit so exitWithCode(SUCCESS) doesn't get caught and re-classified as GENERAL_ERROR
exitWithError outside runCli() threw CliExit into the crash reporter, recording intentional validation errors as crash events. Now caught and exited cleanly before the crash reporter sees it.
…dation exitWithError outside runCli() threw CliExit into the crash reporter, recording intentional validation errors as crash events. Use outputError + process.exit directly since this runs before the centralized lifecycle exists.
…API caps
- crash-reporter: cap sanitized stack (marker included) at 4096 so the API's
per-attribute Zod limit can't silently drop oversized crash events; also
collapse Windows node_modules/dist/src paths; treat a CliExit that reaches
the global handlers as an intentional exit rather than a crash (fixes false
crash telemetry from dev's fire-and-forget child `error` listener)
- exit-codes: map `no_api_key` to `auth_required`, not `validation_error`
- analytics: sanitize error message before the WORKOS_DEBUG log line
- doctor: emit structured `{ error: { code, message } }` in JSON mode
- command-telemetry: harden extractUserFlags (ignore `--` and negative values)
- telemetry-client: persist store-forward file with 0700/0600 modes
- README: scope command-event wording to telemetry-enabled commands
Telemetry now posts to a dedicated /cli/telemetry endpoint (WORKOS_TELEMETRY_URL, default https://api.workos.com/cli) rather than sharing the LLM gateway route. WORKOS_LLM_GATEWAY_URL stays scoped to doctor/install LLM proxy traffic. The client sends x-workos-api-key when auth mode is API key and no JWT or claim-token transport is available, so API-key-only cohorts (CI, headless) finally deliver telemetry. Auth precedence: JWT > claim token > API key.
Record the detected framework on session.end (read from the final state machine snapshot) so the API can break install metrics down by framework and compute success rate per integration. Absent when a session aborts before detection runs.
c36368d to
cfed292
Compare
An expired access token 401s against the telemetry guard, which drops the event permanently (4xx). Gate the JWT auth branch on token validity so an expired token falls through to claim-token / api-key auth, while still preserving the user identity (distinctId) for attribution. The client's flush() applies the same check when reading fresh credentials mid-session.
clearCredentials()/saveCredentials() operate on the live `workos-cli` keychain entry regardless of file-fallback settings, so any spec exercising them wiped the developer's real CLI login (mocking node:os homedir only isolates the file path, not the OS keychain). Add a global setup file that swaps @napi-rs/keyring for an in-memory mock for every spec, plus a guardrail test that fails if the setup is ever removed (asserts the __IS_TEST_MOCK__ sentinel) rather than silently re-arming the footgun.
…al command yargs runs demand/strict validation before dispatching middleware, so the command-name middleware never ran on a validation failure and the event was recorded as 'root' (which SKIP_TELEMETRY_COMMANDS drops). We were blind to every misused command. Recover the top-level command from raw argv in the fail handler. Only the top-level token is used: later positionals can be user values (org names, emails, IDs), so recording them would leak data and explode cardinality. Add 'debug simulate --crash' (throws a plain Error) to exercise the unhandled-crash path, and an integration test driving the real CLI lifecycle as a subprocess that asserts both the validation_error attribution and the crash event (with a redacted stack), observed via the store-forward payload.
ab47f6c to
199b9d9
Compare
The validation-failure command-name recovery selected the first non-flag token, so a value-taking option before the command (e.g. `workos --api-key sk_live_… organization create` or `--mode ci organization`) made command.name the option value — leaking a credential into telemetry, which is then sent to the backend. Match tokens against the known top-level command registry instead: return the first token that resolves to a real command, else 'root' (skipped). Option values, secrets, and typos can never be recorded, which also bounds command.name cardinality.
A command handler throwing a plain Error recorded crash telemetry and set exit code 1 but printed nothing, so a real crash (or `WORKOS_TELEMETRY=false workos debug simulate --crash`) left the user with an empty stderr and no diagnostic. Emit a sanitized structured error (secrets and paths stripped) on the crash path; full details remain in the crash log and telemetry. Also harden the CLI-lifecycle integration test: isolate the spawned process's HOME so it can't read/write the developer's real ~/.workos (device-id, credentials), and force WORKOS_TELEMETRY=true so an inherited opt-out can't make the test silently pass with no event.
Harden the lifecycle integration test per review: build an explicit minimal env for the child (no inherited host env) and preload a force-insecure-storage module so the subprocess startup uses file-only credential/config storage inside the temp HOME instead of reading the real OS keychain. Removes the last non-hermetic path in the spawned-CLI test.
Two overlapping flush() calls snapshot and POST the same events twice, and a later splice() could drop events queued after the first flush began. Share a single in-flight promise so concurrent callers await the same flush.
Add an async, memoized loadDeviceId() (fs/promises) and prewarm it at startup so the synchronous telemetry event path reads the id from cache instead of doing blocking fs IO. getDeviceId() stays synchronous as a cold fallback and still returns the persisted id, so device.id never fragments.
…directly Command handlers should terminate through the exit helpers so lifecycle classification stays consistent (per CLAUDE.md handler guidance).

Summary
Adds telemetry coverage to every CLI command with a centralized lifecycle that owns timing, success/failure classification, and event emission from one place.
Telemetry infrastructure
uncaughtException/unhandledRejectionhandlers with sanitized stack tracesWORKOS_DEBUG=1env var for verbose debug logging on all commandsCentralized command lifecycle (
runCli())yargs.exitProcess(false)+parseAsync()with a single try/catchexitWithCode()/exitWithError()throwCliExit(typed error carrying exit code + telemetry context) instead of callingprocess.exit()emitCommandEvent()call per command outcome (success, structured exit, crash)wrapCommandHandler()wrappers, provisional events, and event patchingWhat this replaces
The previous design required every handler to be wrapped with
wrapCommandHandler()and used a provisional-event/replace/patch chain across 4 layers (middleware, wrapper, exit helpers, analytics patching). That design was fragile (forgotten wrappers produced misleading success=true/duration=0 events) and required a regex guardrail test to enforce.Design decisions
install,dashboard, and$0are excluded from command telemetry (own session-based telemetry)dev,emulate) keep the event loop alive via server/child listeners; their signal handlers callprocess.exit()directly--mode) usesoutputError()+process.exit()directly sincerunCli()doesn't exist yet at that pointTest plan
pnpm typecheckpassespnpm testpasses (146 files, 1926 tests)pnpm buildpassesworkos doctor --json --skip-ai --skip-apiexits 0 with clean JSON (no CliExit leak)workos emulate --json --port 0stays alive and serves/healthworkos org list(no API key) produces structuredno_api_keyerror with correct termination.reasonworkos --mode robot doctorexits 1 with structured error, no crash event in store-forwardSummary by CodeRabbit
New Features
Bug Fixes
Documentation