feat(dataflow): P0 schema foundation for interprocedural variable-level model #1608
} catch {
  // table absent in pre-v18 DBs; empty multiset = no diffs
  vertices.set('__TOTAL_ROWS__', 0);
}
Bare catch silences all query errors, not just "table absent"
readDataflowVerticesMultiset swallows every exception — including genuine SQL errors (malformed query, schema drift, corrupt DB). If the same bug hits both the WASM and native builds, both return an empty multiset and the comparison reports no diffs, hiding a real divergence. Consider re-throwing any error whose message doesn't include "no such table".
Fixed in commit 38589c5. The catch now checks the error message for 'no such table' and re-throws anything else, so genuine SQL errors (schema drift, corrupt DB, malformed query) are no longer silently swallowed.
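For readers following the thread, the narrowed catch might look roughly like the sketch below. It is not the code from 38589c5: the `RowQuery` callback, the row shape, and the way the total is accumulated are assumptions made so the example runs on its own.

```typescript
// Hypothetical stand-in for the real SQL query; the actual driver call is not shown in this thread.
type RowQuery = () => Array<{ key: string; n: number }>;

function readVerticesMultiset(query: RowQuery): Map<string, number> {
  const vertices = new Map<string, number>();
  try {
    let total = 0;
    for (const row of query()) {
      vertices.set(row.key, row.n);
      total += row.n;
    }
    vertices.set('__TOTAL_ROWS__', total);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Only a missing table (pre-v18 DB) counts as "empty multiset".
    // Malformed SQL, schema drift, or a corrupt DB must surface to the caller.
    if (!message.includes('no such table')) throw err;
    vertices.set('__TOTAL_ROWS__', 0);
  }
  return vertices;
}
```

Matching on the message text is brittle if the driver ever rewords its errors, so a check on an error code would be sturdier where the driver exposes one.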
dfVertexCount: dataflow ? base.dfVertices.get('__TOTAL_ROWS__') : undefined,
nodeDiffs,
edgeDiffs,
dfVertexDiffs,
dfVertexDiffs is always emitted in JSON; dfVertexCount is not — inconsistent shape
When --dataflow is not passed, dfVertexCount is set to undefined (serialised as absent in JSON) while dfVertexDiffs is always present as []. JSON consumers can't use the presence/absence of dfVertexDiffs to detect whether a dataflow comparison was actually attempted.
- dfVertexCount: dataflow ? base.dfVertices.get('__TOTAL_ROWS__') : undefined,
- nodeDiffs,
- edgeDiffs,
- dfVertexDiffs,
+ dfVertexCount: dataflow ? base.dfVertices.get('__TOTAL_ROWS__') : undefined,
+ nodeDiffs,
+ edgeDiffs,
+ dfVertexDiffs: dataflow ? dfVertexDiffs : undefined,
Fixed in commit 38589c5. dfVertexDiffs is now undefined (not []) when --dataflow is not passed, matching the convention already used by dfVertexCount. The ok check was updated to treat undefined as 'no comparison attempted'.
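A small sketch of the resulting output shape, for anyone consuming the JSON. The `buildReport` helper and its parameters are hypothetical; only the field names come from the diff above. It relies on `JSON.stringify` omitting properties whose value is `undefined`.

```typescript
interface Diff { key: string; base: number; other: number }

interface ParityReport {
  ok: boolean;
  nodeDiffs: Diff[];
  edgeDiffs: Diff[];
  dfVertexCount: number | undefined;
  dfVertexDiffs: Diff[] | undefined;
}

// Hypothetical helper; the real script assembles this object inline.
function buildReport(
  nodeDiffs: Diff[],
  edgeDiffs: Diff[],
  dataflow: boolean,
  dfTotal: number,
  dfVertexDiffs: Diff[],
): ParityReport {
  const dfDiffs = dataflow ? dfVertexDiffs : undefined;
  return {
    // undefined means "no comparison attempted", which must not fail the run.
    ok: nodeDiffs.length === 0 && edgeDiffs.length === 0 && (dfDiffs === undefined || dfDiffs.length === 0),
    nodeDiffs,
    edgeDiffs,
    dfVertexCount: dataflow ? dfTotal : undefined,
    dfVertexDiffs: dfDiffs,
  };
}

const without = JSON.parse(JSON.stringify(buildReport([], [], false, 0, [])));
const withFlag = JSON.parse(JSON.stringify(buildReport([], [], true, 7, [])));
console.log('dfVertexDiffs' in without);  // false
console.log('dfVertexDiffs' in withFlag); // true
```

With this shape a consumer can treat the presence of `dfVertexDiffs` as the signal that a dataflow comparison ran.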
CREATE VIEW IF NOT EXISTS dataflow_fn AS
SELECT
  sv.func_id AS source_id,
  tv.func_id AS target_id,
  d.kind,
  d.param_index,
  d.expression,
  d.line,
  d.confidence
FROM dataflow d
JOIN dataflow_vertices sv ON d.source_vertex = sv.id
JOIN dataflow_vertices tv ON d.target_vertex = tv.id
WHERE sv.func_id != tv.func_id;
dataflow_fn view is described as "backward-compat" but excludes all pre-v18 rows
The INNER JOINs on dataflow_vertices mean any dataflow row where source_vertex IS NULL (i.e., every row inserted before v18) is invisible in this view. Code that migrates from querying dataflow.source_id/target_id to querying dataflow_fn.source_id/target_id will silently drop all historical inter-procedural flows. If the view is purely for the new vertex model (not a bridge for legacy data), the word "backward-compat" in the PR and commit messages will mislead future maintainers.
Fixed in commit 38589c5. Added a comment to the CREATE VIEW statement clarifying that dataflow_fn intentionally uses INNER JOINs and is NOT a backward-compat replacement for direct dataflow queries — legacy consumers that migrate to it would silently drop all pre-v18 rows.
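To make the dropped-rows behaviour concrete, here is an in-memory sketch that mimics the view's join semantics. It does not touch a database; the row and vertex shapes are simplified assumptions, and `projectDataflowFn` is a hypothetical name.

```typescript
interface Vertex { id: number; funcId: number }
interface DataflowRow {
  sourceId: number;
  targetId: number;
  sourceVertex: number | null; // NULL on every pre-v18 row
  targetVertex: number | null;
  kind: string;
}

// Mimics the view: INNER JOIN on both vertex columns, then keep cross-function rows only.
function projectDataflowFn(rows: DataflowRow[], vertices: Vertex[]) {
  const byId = new Map(vertices.map((v) => [v.id, v] as const));
  const out: Array<{ sourceId: number; targetId: number; kind: string }> = [];
  for (const row of rows) {
    // A NULL join key never matches, so legacy rows fall out here.
    if (row.sourceVertex === null || row.targetVertex === null) continue;
    const sv = byId.get(row.sourceVertex);
    const tv = byId.get(row.targetVertex);
    if (!sv || !tv) continue;
    if (sv.funcId === tv.funcId) continue; // WHERE sv.func_id != tv.func_id
    out.push({ sourceId: sv.funcId, targetId: tv.funcId, kind: row.kind });
  }
  return out;
}

const projected = projectDataflowFn(
  [
    { sourceId: 1, targetId: 2, sourceVertex: null, targetVertex: null, kind: 'flows_to' }, // legacy row
    { sourceId: 1, targetId: 2, sourceVertex: 10, targetVertex: 20, kind: 'arg_in' },       // v18+ row
  ],
  [{ id: 10, funcId: 1 }, { id: 20, funcId: 2 }],
);
console.log(projected.length); // 1
```

Only the vertex-linked row survives, which is why the view cannot stand in for direct queries against the dataflow table.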
Codegraph Impact Analysis
63 functions changed → 21 callers affected across 12 files
…el model
docs check acknowledged
Establishes the database schema and wire-format plumbing for the
interprocedural dataflow expansion (plan §8 P0).
Migration v18 (TS + Rust, mirrored):
- dataflow_vertices table: addressable data locations (param/local/return/receiver) keyed to an enclosing function node
- dataflow_summary table: per-function transfer function (param→return reachability, mutation flag) for inter-procedural stitching
- Nullable source_vertex/target_vertex/scope/call_edge_id columns on the existing dataflow table (additive; old rows keep source_id/target_id)
- dataflow_fn view: backward-compat function-level projection over the new vertex-linked inter-scope edges (empty until P1 populates vertices)
Also fixes missing v17 migration in the Rust engine (edges.technique column was never added; tracked as issue #1607).
Supporting changes:
- DataflowVertex type in types.ts; new DataflowEdgeKind values (def_use, arg_in, return_out) for the vertex-level edge taxonomy
- dataflowVertices field wired into SerializedExtractorOutput (worker protocol, populated in P1)
- build-stmts.ts: purge statements for dataflow_vertices/summary/vertex-linked dataflow rows (correct cascade order)
- hasDataflowVertices helper + export chain (db/repository → db/index)
- parity-compare.mjs --dataflow flag: enables dataflow build + compares dataflow_vertices multisets between WASM and native engines
All existing integration tests pass (23/23).
…ra def_use edges
docs check acknowledged
Implements the WASM/JS-path vertex extraction phase of the interprocedural
dataflow plan (§8 P1).
buildDataflowVerticesAndEdges (new, in features/dataflow.ts):
- Creates 'param' vertices from each function parameter (name + index)
- Creates one 'return' vertex per function that has a return statement
- Creates 'local' vertices for variables assigned from call-return results
- Emits 'def_use' / 'intra' edges from param/local → return when the variable name appears in the return expression's referencedNames set
- All new rows use the source_vertex/target_vertex/scope columns added in v18; source_id/target_id are set to the enclosing function for backward compatibility with existing queries
Internal cast types (VisitorParam, VisitorReturn, VisitorAssignment) allow safe access to the richer visitor output (paramName, paramIndex, referencedNames) without changing the public DataflowResult contract.
Existing flows_to/returns/mutates edges are unchanged. The native bulk-insert fast path is left untouched; native vertex emission is tracked separately.
Tests: 9 new assertions in tests/integration/dataflow-vertices.test.ts:
- param/return/local vertex creation
- def_use edge creation
- negative test (param not in return expression → no edge)
- backward-compat flows_to edge
- dataflow_fn view empty pre-P2
32/32 integration tests pass (23 existing + 9 new).
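The vertex and def_use rules described in this commit can be sketched as follows. This is a simplified illustration, not the body of buildDataflowVerticesAndEdges: the `FnFacts` input, the `'<return>'` placeholder name, and the in-memory id counter are all assumptions.

```typescript
// Hypothetical, simplified shapes; the real visitor output and row types are richer.
interface FnFacts {
  funcId: number;
  params: Array<{ paramName: string; paramIndex: number }>;
  callResultLocals: string[];                // variables assigned from call-return results
  returnReferencedNames: Set<string> | null; // null when the function has no return statement
}
interface Vertex { id: number; funcId: number; kind: 'param' | 'local' | 'return'; name: string }
interface Edge { sourceVertex: number; targetVertex: number; kind: 'def_use'; scope: 'intra' }

function buildVerticesAndEdges(fn: FnFacts): { vertices: Vertex[]; edges: Edge[] } {
  let nextId = 1;
  const vertices: Vertex[] = [];
  const edges: Edge[] = [];
  for (const p of fn.params) {
    vertices.push({ id: nextId++, funcId: fn.funcId, kind: 'param', name: p.paramName });
  }
  for (const local of fn.callResultLocals) {
    vertices.push({ id: nextId++, funcId: fn.funcId, kind: 'local', name: local });
  }
  const referenced = fn.returnReferencedNames;
  if (referenced !== null) {
    const ret: Vertex = { id: nextId++, funcId: fn.funcId, kind: 'return', name: '<return>' };
    // One def_use edge per param/local whose name appears in the return expression.
    for (const v of vertices) {
      if (referenced.has(v.name)) {
        edges.push({ sourceVertex: v.id, targetVertex: ret.id, kind: 'def_use', scope: 'intra' });
      }
    }
    vertices.push(ret);
  }
  return { vertices, edges };
}

// Models: function helper(x, y) { return y; }
const helperFacts = buildVerticesAndEdges({
  funcId: 1,
  params: [{ paramName: 'x', paramIndex: 0 }, { paramName: 'y', paramIndex: 1 }],
  callResultLocals: [],
  returnReferencedNames: new Set(['y']),
});
console.log(helperFacts.edges.length); // 1
```

Here only `y` reaches the return vertex, which corresponds to the negative test for a param that does not appear in the return expression.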
…mmaries
docs check acknowledged
Implements the interprocedural stitch post-pass and summary computation
for the variable-level dataflow model (plan §8 P2).
buildInterproceduralStitch (new):
- Post-pass that runs after all per-file vertices + summaries are committed
- For each resolved argFlow (A calls B with arg x → B.param[j]):
- Finds source vertex x in caller (via binding.type='param'|'local')
- Finds B.param[j] vertex in callee
- Emits 'arg_in' scope='inter' edge: x → B.param[j]
- If B's summary shows B.param[j] flows_to_return: emits 'return_out'
edge: B.return → A's capture local (if any)
- Resolves call_edge_id from the edges table for each stitch site
buildDataflowVerticesAndEdges (updated):
- Now also computes dataflow_summary (flows_to_return, is_mutated per param)
using the def_use edges just committed (same transaction)
- Collects and returns StitchCandidate[] + ReturnCapture[] for the post-pass
buildDataflowEdges (updated):
- Accumulates stitch candidates across all files
- Calls buildInterproceduralStitch as a second-pass transaction
Tests: 11 P1+P2 assertions passing:
- param/return/local vertex creation
- def_use intra edges (positive + negative cases)
- summary computation: helper.param[y] flows_to_return=1, param[x]=0
- arg_in inter edge verified in both dataflow_fn view and raw query
- scope='inter', correct vertex kinds (param→param)
34/34 integration tests pass (23 original + 11 new).
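The stitch post-pass described in this commit might be sketched like this. It is an in-memory approximation under stated assumptions: the capture local is folded into the candidate rather than carried in a separate ReturnCapture list, summaries are a set of hypothetical `"funcId:paramIndex"` keys, and call_edge_id resolution is left out.

```typescript
interface Vertex { id: number; funcId: number; kind: 'param' | 'local' | 'return'; name: string; paramIndex?: number }
// Hypothetical shape: one resolved argFlow (caller passes argName to callee.param[paramIndex]).
interface StitchCandidate { callerId: number; calleeId: number; argName: string; paramIndex: number; captureLocal?: string }
interface InterEdge { sourceVertex: number; targetVertex: number; kind: 'arg_in' | 'return_out'; scope: 'inter' }

function stitch(candidates: StitchCandidate[], vertices: Vertex[], flowsToReturn: Set<string>): InterEdge[] {
  const edges: InterEdge[] = [];
  for (const c of candidates) {
    // Source vertex x in the caller (a param or a local) and param[j] in the callee.
    const src = vertices.find((v) => v.funcId === c.callerId && v.kind !== 'return' && v.name === c.argName);
    const param = vertices.find((v) => v.funcId === c.calleeId && v.kind === 'param' && v.paramIndex === c.paramIndex);
    if (!src || !param) continue;
    edges.push({ sourceVertex: src.id, targetVertex: param.id, kind: 'arg_in', scope: 'inter' });

    // return_out only when the callee summary says param[j] flows to the return.
    if (!flowsToReturn.has(`${c.calleeId}:${c.paramIndex}`)) continue;
    const ret = vertices.find((v) => v.funcId === c.calleeId && v.kind === 'return');
    const capture = c.captureLocal === undefined
      ? undefined
      : vertices.find((v) => v.funcId === c.callerId && v.kind === 'local' && v.name === c.captureLocal);
    if (ret && capture) {
      edges.push({ sourceVertex: ret.id, targetVertex: capture.id, kind: 'return_out', scope: 'inter' });
    }
  }
  return edges;
}

// Models: function main(a) { const r = helper(0, a); }  with  function helper(x, y) { return y; }
const stitched = stitch(
  [{ callerId: 1, calleeId: 2, argName: 'a', paramIndex: 1, captureLocal: 'r' }],
  [
    { id: 1, funcId: 1, kind: 'param', name: 'a', paramIndex: 0 },
    { id: 2, funcId: 1, kind: 'local', name: 'r' },
    { id: 3, funcId: 2, kind: 'param', name: 'x', paramIndex: 0 },
    { id: 4, funcId: 2, kind: 'param', name: 'y', paramIndex: 1 },
    { id: 5, funcId: 2, kind: 'return', name: '<return>' },
  ],
  new Set(['2:1']),
);
console.log(stitched.map((e) => `${e.kind}:${e.sourceVertex}->${e.targetVertex}`));
```

Running the stitch as a second pass matters because the callee's vertices and summary may come from a file processed after the caller.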
…r extension
docs check acknowledged
Adds DATAFLOW_RULES for the C-family batch (B1) of the 26 new languages.
Infrastructure (needed by C/C++ and future complex languages):
- DataflowRulesConfig.nameExtractor optional override: when present, used by functionName() in visitor-utils.ts before the nameField path; handles languages where the function name is nested inside declarators
- DataflowRulesConfig/DATAFLOW_DEFAULTS/LanguageRules updated consistently
src/ast-analysis/rules/c.ts (new):
- extractCFunctionName: unwraps C/C++ function_definition.declarator → function_declarator.declarator → identifier, handling pointer/array/reference/parenthesized/qualified_identifier wrappers
- extractCParamName: extracts identifier from parameter_declaration, unwrapping pointer/reference declarators
- dataflow (C): covers function_definition, call_expression, field_expression (member access), return_statement, init_declarator
- dataflowCpp (C++): extends C with function_declaration and STL mutating method names
DATAFLOW_RULES additions:
  'c' → c.dataflow
  'cpp' → c.dataflowCpp
  'objc' → c.dataflow (C-compatible functions; ObjC message sends TODO)
  'cuda' → c.dataflowCpp (CUDA inherits C++ grammar)
34/34 integration tests pass.
Three bugs in the new C/C++ dataflow extraction path (P5 B1):
1. extractCFunctionName dropped pointer/reference-returning functions (int *foo(), T &bar()): the direct declarator child is a pointer_declarator wrapper, now unwrapped one level before checking for function_declarator.
2. Parameter list was unreachable for all C/C++ functions: params live on function_definition→declarator→parameters (nested), not directly on function_definition. Added getParamListNode optional override to DataflowRulesConfig/DATAFLOW_DEFAULTS/enterFunctionScope; C rules use getCParamListNode, which traverses through optional wrappers to reach function_declarator.parameters.
3. dataflowCpp.functionNodes included function_declaration (forward declarations without bodies): these produce spurious param vertices with flows_to_return=0 and can overwrite correct dataflow_summary rows via INSERT OR REPLACE when processed after the definition.
Adds 29 passing tests covering all three paths for C and C++.
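The declarator unwrapping behind the first fix can be illustrated with a toy node model. `SyntaxNodeLike` is a hypothetical stand-in for a parser node in which every wrapper exposes its inner node as `declarator`; real grammars differ in detail, and this sketch loops over wrappers rather than unwrapping exactly one level as the commit describes.

```typescript
// Hypothetical minimal node shape, not the parser's real API.
interface SyntaxNodeLike { type: string; text?: string; declarator?: SyntaxNodeLike }

const WRAPPERS = new Set([
  'pointer_declarator',
  'reference_declarator',
  'parenthesized_declarator',
  'array_declarator',
]);

// Skip wrapper declarators until something else (or nothing) is reached.
function skipWrappers(node: SyntaxNodeLike | undefined): SyntaxNodeLike | undefined {
  let cur = node;
  while (cur && WRAPPERS.has(cur.type)) cur = cur.declarator;
  return cur;
}

function extractFunctionName(def: SyntaxNodeLike): string | undefined {
  // function_definition.declarator may be wrapped, e.g. int *foo().
  const fnDeclarator = skipWrappers(def.declarator);
  if (fnDeclarator?.type !== 'function_declarator') return undefined;
  const nameNode = skipWrappers(fnDeclarator.declarator);
  return nameNode?.type === 'identifier' ? nameNode.text : undefined;
}

// Models: int *foo() { ... }
const pointerReturning: SyntaxNodeLike = {
  type: 'function_definition',
  declarator: {
    type: 'pointer_declarator',
    declarator: { type: 'function_declarator', declarator: { type: 'identifier', text: 'foo' } },
  },
};
console.log(extractFunctionName(pointerReturning)); // "foo"
```

Without the wrapper skip, the pointer_declarator case would fail the function_declarator check and the function would be dropped, which matches the bug described in item 1.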
…y-compare
Two issues in scripts/parity-compare.mjs:
- readDataflowVerticesMultiset swallowed all SQL exceptions, hiding schema drift, malformed queries, and corrupt-DB errors; now only suppresses "no such table" (pre-v18 DBs), re-throwing everything else.
- dfVertexDiffs was always [] in JSON output when --dataflow was not passed, while dfVertexCount was absent (undefined). JSON consumers couldn't distinguish "comparison attempted, no diffs" from "comparison not attempted". dfVertexDiffs is now undefined when --dataflow is off, matching the dfVertexCount convention; ok computation updated to match.
Also clarifies the dataflow_fn view comment in src/db/migrations.ts: the view's INNER JOINs are intentional (only vertex-linked v18+ rows), not a backward-compat bridge; migrating code from direct dataflow queries to dataflow_fn would silently drop pre-v18 rows.
6c8d044 to
38589c5
for (const d of dfVertexDiffs) {
  console.log(` [df-vertex] ${d.key} wasm=${d.base} ${variantName}=${d.other}`);
}
Fixed in 1e2e254 — wrapped the loop in an 'if (dfVertexDiffs)' guard so runs without --dataflow no longer throw when iterating.
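A sketch of the guard, assuming dfVertexDiffs is undefined whenever --dataflow is off. The `formatDfVertexDiffs` helper is hypothetical and returns the lines instead of logging them so the behaviour is easy to check.

```typescript
interface Diff { key: string; base: number; other: number }

// Hypothetical helper; the script itself logs inside the loop.
function formatDfVertexDiffs(dfVertexDiffs: Diff[] | undefined, variantName: string): string[] {
  const lines: string[] = [];
  // Iterating undefined with for...of throws a TypeError, hence the guard.
  if (dfVertexDiffs) {
    for (const d of dfVertexDiffs) {
      lines.push(` [df-vertex] ${d.key} wasm=${d.base} ${variantName}=${d.other}`);
    }
  }
  return lines;
}

console.log(formatDfVertexDiffs(undefined, 'native').length); // 0
```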

Summary
Implementation of the interprocedural dataflow plan (phases P0–P2 + P5 B1).
P0: Schema foundation (migration v18)
- dataflow_vertices table: param/local/return/receiver data locations
- dataflow_summary table: per-function transfer functions
- dataflow table augmented with nullable source_vertex/target_vertex/scope/call_edge_id
- dataflow_fn backward-compat view (cross-function vertex-linked edges)
- DataflowVertex type, new DataflowEdgeKind values (def_use, arg_in, return_out)
- dataflowVertices field wired into SerializedExtractorOutput
- hasDataflowVertices helper + export chain
- parity-compare.mjs --dataflow flag for vertex multiset comparison
P1: Variable model for JS/TS/TSX
- buildDataflowVerticesAndEdges: creates param/return/local vertices + intra def_use edges
- paramName, paramIndex, referencedNames
- flows_to/returns/mutates edges unchanged (backward compat)
P2: Interprocedural stitching
- buildInterproceduralStitch: post-pass over all stitch candidates after per-file processing
- arg_in inter edge: caller's source vertex → callee's param[j] vertex
- return_out inter edge: callee's return → caller's capture local (if summary confirms flow)
- dataflow_summary computation per (func, param): flows_to_return + is_mutated
P5 B1: C/C++/ObjC/CUDA dataflow rules
- src/ast-analysis/rules/c.ts: C + C++ rules with nameExtractor for nested declarators
- DataflowRulesConfig.nameExtractor extension to handle complex function name structures
- c, cpp, objc, cuda (4 new languages)
Tests
- tests/integration/dataflow-vertices.test.ts)
Issues filed for remaining phases
Test plan
- npx vitest run tests/integration/dataflow.test.ts tests/integration/dataflow-vertices.test.ts: 34/34 pass
- npm run lint: clean
- npx tsc --noEmit: clean
- node scripts/parity-compare.mjs --langs javascript --dataflow: requires built dist + native addon