{{ message }}
fix: tolerate stderr-polluted moduleGraphJson in ModuleGraphParser#336
Merged
tinder-maxwellelliott merged 1 commit intoMay 13, 2026
Merged
Conversation
bazel-diff 17.0.1..18.0.5 configured BazelModService.getModuleGraphJson()
with stdout=CAPTURE, stderr=CAPTURE. In Process.kt, captureAll triggers
ProcessBuilder.redirectErrorStream(true), physically merging stderr into
stdout. The captured moduleGraphJson therefore contained bazel's INFO
lines ("INFO: Invocation ID: ...", "Loading: 0 packages loaded", etc.)
prepended to the actual JSON output.
PR Tinder#330 (shipped in v18.1.0) correctly switched stderr to SILENT but
broke format compatibility: any CI pipeline re-using a base graph from
17.0.1..18.0.5 and a head graph from 18.1.0+ hits parseModuleGraph's
try/catch on the polluted side, gets emptyMap(), and cascades into
findChangedModules reporting every head-side module as "added". The
downstream queryTargetsDependingOnModules then spawns thousands of
bazel query rdeps(...) subprocesses.
Make parseModuleGraph tolerant: attempt the whole-string parse first,
and on failure retry from the first '{' to end-of-string. On second
failure fall through to the existing emptyMap() behaviour. Clean input
continues to parse in one attempt with no extra allocation.
Tests:
- ModuleGraphParserTest: positive case for stderr-prefixed input.
- StderrPollutionRegressionTest: end-to-end regression guard covering
parse, findChangedModules, and the CalculateImpactedTargetsInteractor
dispatch path. Confirms the cross-version comparison now short-circuits
to computeSimpleImpactedTargets instead of the rdeps fan-out.
65f988e to
a3cb64f
Compare
This was referenced May 16, 2026
tinder-maxwellelliott
added a commit
that referenced
this pull request
May 18, 2026
* Reproducer for #335 fix #2: parse-asymmetry fallback to simple hash diff When the two module-graph JSON payloads passed to get-impacted-targets are not byte-equal but one of them fails to parse, findChangedModules(emptyMap, fullMap) reports every module in the populated graph as "added". The impacted set then explodes -- either fanning out into one rdeps subprocess per matched repo (the multi-hour fan-out in #335) or, when no BazelQueryService is bound, falling through to allTargets.keys so every hashed label is reported as impacted. Both outcomes are far worse than a per-target hash diff. PR #336 made the parser tolerant of stderr-polluted JSON, which covers the historical 18.0.x base graph case, but the asymmetry remains a real failure mode for genuinely unparseable input (corrupted base graphs, truncation, future serialization format changes). The reproducer adds two @ignore'd tests in CalculateImpactedTargetsInteractorTest covering both code paths that branch on changedModules.isNotEmpty(): execute() and executeWithDistances(). Both feed a non-JSON fromGraph and a parseable toGraph and assert that only the actually-hash-changed target appears in the impacted set. Today both tests fail (every target is reported); the @ignore'd state keeps CI green until fix #2 lands. Generated with Claude Code (https://claude.com/claude-code) * Fix #335 fix #2: fall back to per-target hash diff on parse asymmetry `bazel mod graph --output=json` always emits at least the root module, so a parsed module-graph map that is empty really does mean parse failure (a truncated/corrupted base graph from object storage, a future serialisation change, an old stderr-polluted 18.0.x capture from before #336). When the two JSON payloads are not byte-equal but exactly one side parses to empty, the historical behaviour was for `findChangedModules(emptyMap, fullMap)` to report every module on the parseable side as "added" and the impacted set exploded: * With a `BazelQueryService` bound, every "added" module fanned out into an `rdeps` query against its canonical repo(s). On the workspace in #335 that produced ~5,000 serial subprocesses and the run took multiple hours. * With no `BazelQueryService` bound (or one that errored), the failure-tolerant fallback path returned `allTargets.keys` -- every hashed label was reported as impacted, which on a large workspace defeats the point of running bazel-diff. The fix detects this asymmetry in `detectChangedModules` and returns an empty changed-modules set, so callers fall through to `computeSimpleImpactedTargets` (the per-target hash diff that would have run if no module graph had been supplied). A per-target hash diff is bounded by the size of the hash set; the module-fan-out path is not. The two regression tests added in the parent reproducer commit have their `@Ignore` annotations dropped and now pass: 21/21 tests in `CalculateImpactedTargetsInteractorTest` (19 pre-existing + 2 new). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Partial fix for #335
bazel-diff 17.0.1..18.0.5 configured BazelModService.getModuleGraphJson() with stdout=CAPTURE, stderr=CAPTURE. In Process.kt, captureAll triggers ProcessBuilder.redirectErrorStream(true), physically merging stderr into stdout. The captured moduleGraphJson therefore contained bazel's INFO lines ("INFO: Invocation ID: ...", "Loading: 0 packages loaded", etc.) prepended to the actual JSON output.
PR #330 (shipped in v18.1.0) correctly switched stderr to SILENT but broke format compatibility: any CI pipeline re-using a base graph from 17.0.1..18.0.5 and a head graph from 18.1.0+ hits parseModuleGraph's try/catch on the polluted side, gets emptyMap(), and cascades into findChangedModules reporting every head-side module as "added". The downstream queryTargetsDependingOnModules then spawns thousands of bazel query rdeps(...) subprocesses.
Make parseModuleGraph tolerant: attempt the whole-string parse first, and on failure retry from the first '{' to end-of-string. On second failure fall through to the existing emptyMap() behaviour. Clean input continues to parse in one attempt with no extra allocation.
Tests: