feat(observability): extend audit log, PostHog, and storage metering coverage by waleedlatif1 · Pull Request #5269 · simstudioai/sim · GitHub
Skip to content

feat(observability): extend audit log, PostHog, and storage metering coverage#5269

Open
waleedlatif1 wants to merge 30 commits into
stagingfrom
worktree-audit-posthog-coverage
Open

feat(observability): extend audit log, PostHog, and storage metering coverage#5269
waleedlatif1 wants to merge 30 commits into
stagingfrom
worktree-audit-posthog-coverage

Conversation

@waleedlatif1

Copy link
Copy Markdown
Collaborator

Summary

Extends the three observability layers — audit log, PostHog product analytics, and usage metering — to cover resources and actions that were previously uninstrumented. Found via a codebase-wide coverage audit; every gap below was a real blind spot.

Audit log

  • Exfiltration trail (none existed): file downloads (workspace, public-share, v1 API), and table / workflow / workspace exports.
  • Credential access audited at the token-issuance boundary (CREDENTIAL_ACCESSED), success-only.
  • Full revenue/financial trail: invoice paid/failed, overage billed, charge disputes, credit fulfillment, subscription create/cancel/transfer, enterprise provisioning, plan/seat changes.
  • Session/account lifecycle via Better Auth hooks: login, blocked sign-in, logout, session revoke, account delete.
  • The entire v1/admin programmatic surface (member/role/export ops) and copilot tool handlers (skills, custom tools, tables) — both previously bypassed audit entirely.
  • Wired several defined-but-dead AuditAction constants (lock/unlock, table update/delete, etc.).

PostHog

  • Revenue events (payments, overage, credits, disputes, plan conversion, enterprise), org/seat lifecycle, KB search, copilot/skill/tool events.
  • Client-side workspace + organization group association so events roll up by account.

Metering

  • Storage now counted for KB documents and copilot files (was workspace-files only). Connector-synced KB docs remain correctly unmetered on both ingest and delete.

Hardening

  • Audit package nulls the actor FK for system actors (admin-api) with a readable label instead of silently FK-failing; adds an awaitable recordAuditNow for pre-delete hooks.
  • Every billing-webhook actor lookup is guarded so instrumentation can never fail payment processing.
  • All instrumentation is fire-and-forget and never blocks or breaks the primary operation. Ephemeral execution-file storage is intentionally not metered (no decrement path → would leak).

Type of Change

  • New feature (observability coverage)

Testing

  • Full typecheck, biome, and check:api-validation pass.
  • 2,617 touched-area tests + 22 audit-package tests pass.
  • Validated by a multi-agent adversarial review across every changed domain (FK safety, fire-and-forget, no double-emit, storage symmetry, behavior-preserving refactors).

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

…coverage

Instruments previously-uncaptured resources and actions across the three
observability layers (audit log, PostHog product analytics, usage metering):

- Exfiltration audit: file downloads (workspace/public-share/v1), table,
  workflow, and workspace exports.
- Credential access audited at the token-issuance boundary (success-only).
- Full revenue trail: invoice paid/failed, overage billed, disputes, credit
  fulfillment, subscription lifecycle, plan/seat changes.
- Session/login lifecycle audit via Better Auth hooks (login, blocked sign-in,
  logout, session revoke, account delete).
- v1/admin programmatic surface and copilot tool handlers instrumented.
- Dead audit/PostHog constants wired (lock/unlock, table, custom tool, etc.).
- Storage metering extended to KB documents and copilot files.
- PostHog person identify + workspace/organization group hygiene.

Audit package hardened to null the actor FK for system actors (admin-api) and
adds an awaitable recordAuditNow for pre-delete hooks. All instrumentation is
fire-and-forget and never blocks or breaks the primary operation.
@vercel

vercel Bot commented Jun 29, 2026

Copy link
Copy Markdown

@cursor

cursor Bot commented Jun 29, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Wide instrumentation across auth, billing webhooks, and storage quotas introduces new failure modes if audit/PostHog misbehave, though billing paths are guarded; KB storage metering changes ingest/delete behavior and limits.

Overview
Broadens observability by wiring recordAudit and captureServerEvent into paths that previously had no trail: file downloads/exports (workspace, public share, v1 API), table exports (sync and async), OAuth/service-account token issuance, org/workspace membership and billing actions, copilot chat and management tools, workflow lock/public-API toggles, and the full v1/admin surface.

Billing and revenue get explicit audit + analytics on threshold overage settlement (including credit-only paths), invoice pay/fail, credit purchases, charge disputes (with Stripe webhook idempotency), enterprise provisioning, subscription create/cancel, plan conversion, and subscription transfer. Seat reconcile now accepts an optional actorId and emits seat provision/deprovision audit from a single place (invite acceptance passes the invitee).

Realtime workflow variables track the latest writer on debounced updates and audit WORKFLOW_VARIABLES_UPDATED after a successful flush.

Storage metering for knowledge-base documents: quota check inside ingest transactions, incrementStorageUsage after commit, and decrementStorageUsageInTx on hard delete so counters stay aligned with deleted rows (connector-synced docs still excluded).

Client analytics: WorkspaceScopeSync waits for workspace metadata, sets organization + workspace PostHog groups together, and clears stale org groups when needed. PostHog event types gain optional workspace_id / scope fields for personal vs workspace actions.

Table lifecycle threads actingUserId into rename/delete (and copilot paths) for table audit entries; column-add audits run post-commit on imports.

Reviewed by Cursor Bugbot for commit 2b7ae68. Configure here.

Comment thread apps/sim/app/api/auth/oauth/token/route.ts
Comment thread apps/sim/app/api/files/export/[id]/route.ts Outdated
Comment thread apps/sim/app/api/files/export/[id]/route.ts Outdated
@greptile-apps

greptile-apps Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR extends three observability layers — audit log, PostHog analytics, and storage metering — to cover a large number of previously uninstrumented paths. All six previously raised review issues (actor FK nulling, catch-branch actor leakage, fire-and-forget blocking, TOCTOU in delete, copilot file delete order, and the payment-failed unguarded await) have been resolved in the developer's follow-up commits.

  • Audit package hardening: actorId is now string | null; both the not-found and the catch branches correctly null the FK and apply a readable label, preventing FK violations from system actors or transient DB failures.
  • Storage metering for KB and copilot: decrementStorageUsageInTx makes the decrement atomic with the hard-delete transaction via RETURNING, so counter drift on concurrent deletes is eliminated; connector-synced documents are correctly excluded from billing on both ingest and delete.
  • Billing webhook instrumentation: handleInvoicePaymentFailed wraps its audit block in try/catch so instrumentation can never interrupt user-blocking; handleSubscriptionCreated gains Stripe-event idempotency to prevent double-reset on redelivery; handleManualEnterpriseSubscription similarly gates audit/PostHog behind idempotency.

Confidence Score: 5/5

All six issues raised in the prior review round have been addressed; the remaining code is a well-structured observability expansion with no net behavioral changes to core paths.

Every critical concern from the previous review cycle — actor FK nulling in the catch branch, audit order in the GET credential handler, payment-failed unguarded await, copilot file delete ordering, and public-share actor misattribution — is resolved in the follow-up commits incorporated here. The new instrumentation is uniformly fire-and-forget (or wrapped in try/catch where awaited), the storage decrement is made atomic with the hard-delete transaction via RETURNING, and idempotency guards prevent double-emit on Stripe redelivery. No new bugs were identified.

No files require special attention. The most complex changes — invoices.ts, knowledge/documents/service.ts, and log.ts — were re-verified and are correct.

Important Files Changed

Filename Overview
packages/audit/src/log.ts Accepts null actorId; both not-found and catch branches now null the FK and label it 'Admin API'/'System', matching the PR's stated hardening goal.
apps/sim/lib/knowledge/documents/service.ts Storage quota check added at document ingest; decrementStorageUsageInTx makes delete counter atomic with the hard-delete tx via RETURNING; connector-synced docs correctly excluded.
apps/sim/lib/billing/storage/tracking.ts New decrementStorageUsageInTx accepts a pre-resolved subscription and runs inside the caller's transaction, enabling atomic decrement-with-delete.
apps/sim/lib/billing/webhooks/invoices.ts Payment-failed audit block wrapped in try/catch so a transient DB error cannot skip user-blocking; payment-succeeded and credit-purchase audit correctly fire after the primary DB operations.
apps/sim/lib/billing/webhooks/subscription.ts handleSubscriptionCreated wrapped in Stripe-event idempotency to prevent double usage-reset on redelivery; SUBSCRIPTION_CANCELLED audit added to both enterprise and team cancellation paths.
apps/sim/lib/billing/webhooks/enterprise.ts Enterprise provisioning audit and PostHog event gated behind Stripe-event idempotency; actor resolved from stripeCustomerId with a safe fallback to referenceId.
apps/sim/app/api/auth/oauth/token/route.ts CREDENTIAL_ACCESSED audit correctly fires after successful token retrieval in all three branches; emitServiceAccountAccess closure ensures audit only fires on the success path.
apps/sim/lib/table/service.ts deleteTable now accepts actingUserId and emits TABLE_DELETED audit; renameTable similarly emits TABLE_UPDATED; auditTableColumnsAdded helper avoids auditing inside the import transaction.
apps/sim/lib/billing/organizations/seats.ts reconcileOrganizationSeats accepts optional actorId; audit for seat provisioning/deprovisioning correctly gated on actorId presence and only emits when seats actually changed.
apps/sim/lib/posthog/events.ts Adds 19 new event type definitions covering revenue, org lifecycle, credential access, and copilot tool events; workspace_id correctly made optional on events where it may be absent.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Client
    participant Route as API Route
    participant Primary as Primary Operation (DB/Stripe)
    participant Audit as recordAudit (fire-and-forget)
    participant PH as PostHog captureServerEvent

    Note over Route,PH: OAuth Token GET/POST — credential access
    Client->>Route: GET /api/auth/oauth/token
    Route->>Primary: refreshTokenIfNeeded()
    Primary-->>Route: accessToken ✓
    Route->>Audit: CREDENTIAL_ACCESSED (after success)
    Route->>PH: credential_used (after success)
    Route-->>Client: 200 accessToken

    Note over Route,PH: KB Document hard-delete — atomic decrement
    Client->>Route: DELETE /api/knowledge/documents
    Route->>Primary: getHighestPrioritySubscription (pre-tx)
    Route->>Primary: db.transaction DELETE + RETURNING
    Primary->>Primary: decrementStorageUsageInTx (same tx)
    Primary-->>Route: deletedDocs ✓
    Route->>Primary: deleteDocumentStorageFiles (S3)

    Note over Route,PH: Billing webhook — payment failed
    Client->>Route: POST /billing/webhook (invoice.payment_failed)
    Route->>Primary: isSubscriptionOrgScoped (guarded try/catch)
    Route->>Primary: resolveBillingActorId (guarded try/catch)
    Route->>Audit: INVOICE_PAYMENT_FAILED
    Route->>PH: payment_failed
    Route->>Primary: block users (independent of audit)
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Client
    participant Route as API Route
    participant Primary as Primary Operation (DB/Stripe)
    participant Audit as recordAudit (fire-and-forget)
    participant PH as PostHog captureServerEvent

    Note over Route,PH: OAuth Token GET/POST — credential access
    Client->>Route: GET /api/auth/oauth/token
    Route->>Primary: refreshTokenIfNeeded()
    Primary-->>Route: accessToken ✓
    Route->>Audit: CREDENTIAL_ACCESSED (after success)
    Route->>PH: credential_used (after success)
    Route-->>Client: 200 accessToken

    Note over Route,PH: KB Document hard-delete — atomic decrement
    Client->>Route: DELETE /api/knowledge/documents
    Route->>Primary: getHighestPrioritySubscription (pre-tx)
    Route->>Primary: db.transaction DELETE + RETURNING
    Primary->>Primary: decrementStorageUsageInTx (same tx)
    Primary-->>Route: deletedDocs ✓
    Route->>Primary: deleteDocumentStorageFiles (S3)

    Note over Route,PH: Billing webhook — payment failed
    Client->>Route: POST /billing/webhook (invoice.payment_failed)
    Route->>Primary: isSubscriptionOrgScoped (guarded try/catch)
    Route->>Primary: resolveBillingActorId (guarded try/catch)
    Route->>Audit: INVOICE_PAYMENT_FAILED
    Route->>PH: payment_failed
    Route->>Primary: block users (independent of audit)
Loading

Reviews (29): Last reviewed commit: "fix(billing): wrap new dispute/enterpris..." | Re-trigger Greptile

Comment thread apps/sim/app/api/files/public/[token]/content/route.ts
Address Cursor Bugbot review:
- GET OAuth token: emit CREDENTIAL_ACCESSED/credential_used after
  refreshTokenIfNeeded succeeds (matches POST), not before.
- File export: audit on each actual success exit via a shared helper —
  including the non-markdown serve redirect (previously unaudited) and
  after the zip is generated (previously before asset fetch).
Address Greptile P1: setting actorId to the file owner made every anonymous
external download read as a self-download, undermining the exfiltration trail.
recordAudit now accepts a null actor; the public content/inline routes record
actorId=null with the owner in metadata.sharedByUserId and rely on ip/user-agent
for the forensic trail. The misleading owner-attributed file_downloaded PostHog
event is dropped on these anonymous paths.
@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@cursor review

Comment thread apps/sim/app/api/v1/admin/workflows/export/route.ts Outdated
Comment thread apps/sim/lib/knowledge/documents/service.ts
- Admin workflow/workspace ZIP exports now audit only after the archive is
  built (via a local helper), so a zip/build failure no longer logs an export.
- Remove redundant 'success-only placement' inline comments and tighten the
  remaining design-rationale notes (export helper now TSDoc).
@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@cursor review

Comment thread apps/sim/lib/billing/webhooks/disputes.ts Outdated
Comment thread apps/sim/lib/billing/threshold-billing.ts
Comment thread apps/sim/lib/uploads/contexts/copilot/copilot-file-manager.ts Outdated
Address Cursor review:
- handleDisputeClosed now records CHARGE_DISPUTE_CLOSED for every closed
  dispute (won/lost/warning_closed), unblocking only on favorable outcomes.
  dispute.status in the metadata distinguishes the outcome, so lost
  chargebacks are no longer missing from the trail.
- Threshold overage now emits OVERAGE_BILLED + overage_billed even when credits
  fully cover the overage (settledVia: 'credits' vs 'stripe'), so credit-settled
  overages are audited instead of silently returning null.
Address Greptile P1: deleting the metadata row before the decrement meant a
decrement failure left the quota permanently inflated with no record to retry
from. Decrement first; only remove the metadata row once it succeeds.
@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@cursor review

Comment thread apps/sim/app/api/table/[tableId]/export/route.ts Outdated
Comment thread apps/sim/lib/auth/auth.ts Outdated
Comment thread apps/sim/lib/uploads/contexts/copilot/copilot-file-manager.ts Outdated
…tion

- Copilot file delete now releases storage via a single transaction
  (releaseDeletedFileStorage): the soft-delete is the idempotency claim and the
  decrement shares the transaction, so neither a partial failure (inflated
  counter) nor a retry (double-decrement) can desync the quota. Resolves the
  conflicting Greptile/Cursor ordering findings.
- Policy-blocked sign-ups now record USER_SIGNUP_BLOCKED instead of
  USER_SIGNIN_BLOCKED, so account-lifecycle events aren't mislabeled.
@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@cursor review

Per design decision: copilot files are working/conversational artifacts, so
gating their materialization on storage quota would fail an agent operation
mid-flow when a user is over limit, and metering them would inflate usage enough
to block KB/workspace uploads indirectly. Remove copilot quota gates + copilot
ingest metering entirely (revert copilot-file-manager, metadata, and the upload
route's copilot branch to baseline) and drop the now-unused releaseDeletedFileStorage.
KB document metering + quota enforcement (the deliberate, persistent storage path)
is unchanged.
@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@cursor review

Address Cursor review: gate the group-sync effect on workspace metadata being
loaded (activeWorkspace present) so the workspace and organization groups always
update atomically. Acting during the load window paired the new workspace group
with the previous workspace's org group; until metadata loads, events stay
consistently attributed to the previous workspace.
@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@cursor review

Comment thread apps/sim/app/api/table/[tableId]/export-async/route.ts
Address Cursor review: async exports only emitted TABLE_EXPORTED when the
background job reached 'ready', so an authorized export whose job later failed or
was abandoned left no audit trail — inconsistent with the sync export route,
which audits before streaming. Move the audit + analytics to the async route's
authorization point (after the job is claimed/dispatched) and remove it from the
runner. Drop the now-unused userId from TableExportPayload.
@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@cursor review

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit f5f8a00. Configure here.

Comment thread apps/sim/lib/billing/webhooks/invoices.ts Outdated
…king

Address Greptile P1: the payment_failed audit hoisted an unguarded
isSubscriptionOrgScoped DB read (for entity_type) directly before the
attempt-count user-blocking block. A transient failure of that read would throw
out of the handler and skip blocking. Wrap the whole audit/analytics block in
try/catch (best-effort), and let the blocking compute its own isSubscriptionOrgScoped
as it did originally — instrumentation can no longer abort payment processing.
@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@cursor review

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 2afd114. Configure here.

@waleedlatif1 waleedlatif1 deleted the branch staging July 1, 2026 05:43
@waleedlatif1 waleedlatif1 reopened this Jul 1, 2026
…n in Stripe webhook idempotency

recordAudit/captureServerEvent calls added for charge disputes, enterprise
subscription provisioning, and free->paid subscription creation ran
unconditionally with no idempotency guard, unlike their sibling handlers
in the same files. Stripe redelivers webhooks at-least-once, so a retry
would double-record the audit row and PostHog event even though the
underlying DB writes were already idempotent.
@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@greptileai review

@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 2b7ae68. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant