fix(mothership): stop inlining full execution traces for the logs context by waleedlatif1 · Pull Request #5353 · simstudioai/sim · GitHub

fix(mothership): stop inlining full execution traces for the logs context#5353

Merged
waleedlatif1 merged 2 commits into staging from fix/logs-context-lightweight-tool
Jul 2, 2026

Conversation

@waleedlatif1

Collaborator

Summary

  • "Troubleshoot in Chat" (and any logs @-mention) resolved the tagged run through processExecutionLogFromDb, which materialized the entire execution trace — every block's input/output, nested tool-call spans — and inlined it directly into the prompt. For any non-trivial run this repeatedly blew the context window, forcing multiple "Context Compaction" cycles and eventually auto-stopping the agent before it investigated anything.
  • Every other context resolver in process-contents.ts already avoids this by sending a lightweight pointer instead of a full inline dump (workflow/blocks/workflow_block point into the VFS). Logs contexts had no VFS materialization to point at, but the equivalent lightweight mechanism already exists as a tool: query_logs supports incremental disclosure (overview for the timing/cost tree, full for a scoped block's input/output, pattern to grep the trace) and is already registered for the mothership agent — no new tool needed.
  • processExecutionLogFromDb now sends a compact summary (id, workflow, level, trigger, timing, cost) plus a note pointing the model at query_logs with the executionId, instead of materializing and embedding the trace.
  • Also drops the now-unused executionData column from the select projection — resolving a logs context no longer fetches a potentially large JSONB blob it never reads.
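
The compact summary described above might look roughly like the following. This is a minimal sketch, not the code in process-contents.ts: the `LogRow` shape, its field names, the `buildLogSummary` helper, and the note wording are all assumptions made for illustration. Only the summary fields (id, workflow, level, trigger, timing, cost) and the query_logs + executionId pointer come from the description.

```typescript
// Hypothetical row shape: these field names are illustrative, not the real schema.
interface LogRow {
  id: string
  executionId: string
  workflowName: string
  level: string
  trigger: string
  startedAt: string
  endedAt: string | null
  totalDurationMs: number | null
  costTotal: number | null
}

interface LogSummary {
  id: string
  executionId: string
  workflow: string
  level: string
  trigger: string
  timing: { startedAt: string; endedAt: string | null; totalDurationMs: number | null }
  cost: number | null
  note: string
}

// Builds a small, fixed-shape summary. The trace itself is never read or
// embedded; the note tells the model how to fetch detail on demand.
function buildLogSummary(row: LogRow): LogSummary {
  return {
    id: row.id,
    executionId: row.executionId,
    workflow: row.workflowName,
    level: row.level,
    trigger: row.trigger,
    timing: {
      startedAt: row.startedAt,
      endedAt: row.endedAt,
      totalDurationMs: row.totalDurationMs,
    },
    cost: row.costTotal,
    note:
      'Execution trace is not inlined. Call query_logs with executionId ' +
      `"${row.executionId}" to inspect block input/output or search the trace.`,
  }
}
```

Because the summary has a fixed set of scalar fields, its size no longer grows with the number of blocks or the size of their payloads.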

Type of Change

  • Bug fix

Testing

  • New unit tests in process-contents.test.ts: a tagged run resolves to a compact summary that explicitly does not contain traceSpans/errorDetails/executionData, and does contain a query_logs + executionId pointer; cross-workspace and unauthorized-access cases still drop the context as before.
  • type-check, biome, check:client-boundary, check:api-validation all pass.
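
The summary-shape assertion described above can be approximated by a small checker. This is a hedged sketch, not the test in process-contents.test.ts: `checkCompactSummary` is a hypothetical helper, and matching on quoted key names is an assumed (and deliberately crude) detection strategy. Only the forbidden keys and the required query_logs + executionId pointer are taken from the description.

```typescript
// Keys that must never appear in the serialized summary (per the PR description).
const FORBIDDEN_KEYS = ['traceSpans', 'errorDetails', 'executionData']

// Hypothetical helper: returns a list of problems, empty when the summary is compact.
function checkCompactSummary(serialized: string, executionId: string): string[] {
  const problems: string[] = []
  for (const key of FORBIDDEN_KEYS) {
    // Look for the key as a JSON property name, e.g. "traceSpans".
    if (serialized.includes(`"${key}"`)) problems.push(`leaked key: ${key}`)
  }
  if (!serialized.includes('query_logs')) problems.push('missing query_logs pointer')
  if (!serialized.includes(executionId)) problems.push('missing executionId')
  return problems
}

const compact = JSON.stringify({
  executionId: 'exec-1',
  note: 'Call query_logs with executionId exec-1',
})
console.log(checkCompactSummary(compact, 'exec-1')) // prints []
```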

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

…text

Tagging a run via "Troubleshoot in Chat" (or any @-mention of a logs
context) resolved through processExecutionLogFromDb, which materialized
the ENTIRE execution trace (every block's input/output, nested tool-call
spans) and inlined it directly into the prompt. For any non-trivial run
this repeatedly blew the context window, forcing multiple compactions and
eventually auto-stopping the agent before it could investigate anything.

Every other context resolver in this file already avoids this by sending
a lightweight pointer instead of a full inline dump (workflow/blocks/
workflow_block contexts point into the VFS). Logs contexts have no VFS
materialization to point at, but the equivalent lightweight mechanism
already exists as a tool: query_logs supports incremental disclosure
(overview for timing/cost, full for a scoped block's input/output, or
pattern to grep the trace) and is already registered for the mothership
agent.
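
The incremental-disclosure modes of query_logs could be modeled as a discriminated union. This is only a sketch under assumptions: the overview and full views and the blockId scope are mentioned in this PR, but the exact argument schema is not shown here, so the `view: 'pattern'` literal, the `pattern` field, and the `planCalls` helper are hypothetical.

```typescript
// Assumed argument shapes; the real tool schema may differ.
type QueryLogsArgs =
  | { executionId: string; view: 'overview' } // timing/cost tree
  | { executionId: string; view: 'full'; blockId: string } // one block's input/output
  | { executionId: string; view: 'pattern'; pattern: string } // grep the trace

// Hypothetical helper: orders calls from cheapest to most detailed, so the
// expensive views are only requested for a specific block or search term.
function planCalls(executionId: string, failedBlockId?: string, grep?: string): QueryLogsArgs[] {
  const calls: QueryLogsArgs[] = [{ executionId, view: 'overview' }]
  if (failedBlockId) calls.push({ executionId, view: 'full', blockId: failedBlockId })
  if (grep) calls.push({ executionId, view: 'pattern', pattern: grep })
  return calls
}
```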

Now processExecutionLogFromDb sends a compact summary (id, workflow,
level, trigger, timing, cost) plus a note pointing the model at
query_logs with the executionId, instead of materializing and embedding
the trace. Also drops the now-unused executionData column from the
select projection, so resolving a logs context no longer fetches a
potentially large JSONB blob it never reads.

@cursor

cursor Bot commented Jul 2, 2026


PR Summary

Medium Risk
Changes what execution data is exposed in chat prompts (less sensitive I/O inlined by default) while still loading executionData from the DB for overview materialization; behavior shift for mothership troubleshooting flows.

Overview
Logs @-mentions (e.g. Troubleshoot in Chat) no longer embed the full materialized execution trace in the copilot prompt. processExecutionLogFromDb now builds a compact JSON summary (run metadata, timing, cost) plus a block overview from toOverview (status/timing only—no span input/output) and a note steering the model to query_logs for scoped full traces or grep.

A 64KB cap drops the overview if the serialized summary is still too large. Workspace and read-permission checks are unchanged; new tests cover summary shape, redaction of raw I/O, size-cap behavior, and dropped contexts.

Reviewed by Cursor Bugbot for commit dcf884e.

@greptile-apps

greptile-apps Bot commented Jul 2, 2026

Contributor

Greptile Summary

This PR fixes a context-window exhaustion bug in the mothership copilot: the "Troubleshoot in Chat" / @logs context resolver previously inlined the entire execution trace (every block's input/output, nested tool-call spans) into the prompt, repeatedly blowing the context budget. The fix replaces that full dump with a compact block-level overview (name/type/status/timing/cost, no input/output) plus a note directing the model to call query_logs for per-block detail or trace grepping.

  • processExecutionLogFromDb now emits a JSON summary bounded by MAX_LOG_SUMMARY_BYTES (64 KB), falling back to no overview if the span count is pathological, and always includes a query_logs + executionId pointer so the model can fetch scoped detail on demand.
  • Four new unit tests in process-contents.test.ts verify: compact summary without raw input/output, size-cap fallback that drops the overview, cross-workspace rejection, and unauthorized-access rejection.

Confidence Score: 5/5

Safe to merge — the change is narrowly scoped to the logs context resolver, replaces an unbounded trace dump with a compact projection, and is covered by four targeted tests.

The core fix is straightforward: toOverview strips all input/output data from the trace before it reaches the prompt, the 64 KB size cap guards against pathological span counts, and the existing query_logs tool provides a clean on-demand escape hatch. Authorization checks are unchanged. The only gap is a mismatch between what the PR description claims (that the executionData DB column was dropped) and what the code actually does (the column is still selected and used to build the overview), but this has no runtime impact.

No files require special attention — both changed files are internally consistent and the tests exercise the critical paths.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/lib/copilot/chat/process-contents.ts | Replaces full trace inlining with a compact overview + query_logs pointer; PR description overstates the DB-fetch optimization (executionData column is still selected and used for the overview) |
| apps/sim/lib/copilot/chat/process-contents.test.ts | Adds four well-structured tests covering happy-path compact summary, size-cap fallback, cross-workspace rejection, and unauthorized-access rejection |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant U as User (Chat)
    participant PC as processExecutionLogFromDb
    participant DB as Database
    participant OS as Object Storage (optional)
    participant LM as LLM Agent

    U->>PC: "@logs context (executionId)"
    PC->>DB: "SELECT id, workflowId, ..., executionData WHERE executionId=?"
    DB-->>PC: log row (incl. executionData JSONB)
    PC->>PC: authorizeWorkflowByWorkspacePermission
    PC->>OS: materializeExecutionData (resolve pointer if offloaded)
    OS-->>PC: "{ traceSpans[] }"
    PC->>PC: toOverview(traceSpans) → compact tree (no input/output)
    PC->>PC: "size cap: if JSON > 64KB, drop overview"
    PC-->>LM: "compact summary { id, level, trigger, timing, cost, overview, note→query_logs }"
    LM->>LM: call query_logs(executionId, view:'full', blockId) on demand

Reviews (2): Last reviewed commit: "improvement(mothership): send a bounded ..."

…are tool pointer

Follow-up to the previous commit's fix (stop inlining full execution
traces). With a pure text pointer to query_logs, the agent's first useful
action against a tagged run depended on it noticing, and correctly acting
on, prose inside a JSON blob it may only skim. Every sibling resolver in
this file instead returns a deterministic mechanism (a VFS path) that the
model reads on demand.

There's no VFS materialization for individual execution logs, but the
same deterministic signal is available cheaply: toOverview() (the exact
projection query_logs's own "overview" view already returns) walks the
raw trace spans and produces a compact tree — block name/type/status/
timing/cost, no input or output — without touching large-value refs at
all. The summary now includes that tree, so the model can see which
block failed on the first turn, and the note narrows to what still
requires a tool call: a block's actual input/output/error, or a grep.
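
The projection described above can be sketched as a recursive map over the span tree. This is an illustration under assumptions, not the real toOverview(): the `TraceSpan` and `OverviewNode` shapes and the `toOverviewSketch` name are hypothetical. The only property carried over from the description is that name/type/status/timing/cost are copied while input and output are never read.

```typescript
// Hypothetical span shape; the real trace span type is not shown in this PR.
interface TraceSpan {
  name: string
  type: string
  status: string
  durationMs: number
  cost?: number
  input?: unknown
  output?: unknown
  children?: TraceSpan[]
}

interface OverviewNode {
  name: string
  type: string
  status: string
  durationMs: number
  cost?: number
  children?: OverviewNode[]
}

// Copies an explicit allow-list of fields, so payload fields (input/output)
// cannot leak into the result even if new ones are added to the span later.
function toOverviewSketch(spans: TraceSpan[]): OverviewNode[] {
  return spans.map((span) => {
    const node: OverviewNode = {
      name: span.name,
      type: span.type,
      status: span.status,
      durationMs: span.durationMs,
    }
    if (span.cost !== undefined) node.cost = span.cost
    if (span.children && span.children.length > 0) {
      node.children = toOverviewSketch(span.children)
    }
    return node
  })
}
```

Building each node from an allow-list, rather than deleting known-large keys from a copy, is a design choice in this sketch: it keeps the output bounded per span regardless of what the span carries.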

materializeExecutionData is still called. It is a no-op for the common
inline case (it only unwraps a top-level object-storage pointer for runs
whose whole trace was offloaded as one blob), and it is needed to reach
traceSpans at all for those heavier runs, which are exactly the runs most
worth an overview.

A serialized-size cap (mirroring query-logs.ts's own truncation
fallback, scaled down since this lands in the prompt unconditionally)
drops the overview if a pathological span count pushes it over budget,
falling back to the note alone.
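
The size-cap fallback can be sketched as follows. This is a minimal illustration, not the shipped code: the constant name MAX_LOG_SUMMARY_BYTES and the 64 KB value come from the review summaries on this PR, while the `CappedSummary` type, `utf8ByteLength`, and `serializeWithCap` are hypothetical names, and measuring with TextEncoder is an assumption about how bytes are counted.

```typescript
// 64 KB budget for the serialized summary (value taken from the PR discussion).
const MAX_LOG_SUMMARY_BYTES = 64 * 1024

// Hypothetical summary shape: only `overview` is optional and droppable.
type CappedSummary = {
  executionId: string
  note: string
  overview?: unknown[]
}

// Measures UTF-8 bytes rather than string length, since non-ASCII block
// names take more than one byte per character.
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length
}

// Serializes the summary; if it exceeds the budget, drops the overview and
// falls back to the metadata and note alone.
function serializeWithCap(summary: CappedSummary, maxBytes = MAX_LOG_SUMMARY_BYTES): string {
  const full = JSON.stringify(summary)
  if (utf8ByteLength(full) <= maxBytes) return full
  const withoutOverview: CappedSummary = { ...summary }
  delete withoutOverview.overview
  return JSON.stringify(withoutOverview)
}
```

Dropping the overview wholesale, instead of truncating it, keeps the output valid JSON and leaves the query_logs note intact, so the model still has a way to reach the trace.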

Extends the tests: the happy path now asserts the overview tree is
present and that no raw input/output payload leaks into the serialized
summary, plus a new test for the size-cap fallback.
@waleedlatif1

Collaborator Author

@greptile

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!


@waleedlatif1 waleedlatif1 merged commit 220da44 into staging Jul 2, 2026
18 checks passed
@waleedlatif1 waleedlatif1 deleted the fix/logs-context-lightweight-tool branch July 2, 2026 05:50