feat: louvain community detection + fix complexity build regression by carlos-alm · Pull Request #134 · optave/ops-codegraph-tool · GitHub
feat: louvain community detection + fix complexity build regression#134

Merged
carlos-alm merged 2 commits into main from feat/community-detection on Feb 26, 2026
@carlos-alm

Contributor

Summary

  • Community detection: Add Louvain-based community detection for module boundary analysis (codegraph communities CLI, MCP tool, programmatic API). Uses jlouvain to partition the function-level dependency graph into communities, revealing natural module boundaries and cross-community coupling
  • Complexity perf fix: Eliminate redundant file re-parsing in buildComplexityMetrics by caching WASM parse trees from parseFilesAuto and passing them through. Addresses the ~2x build regression from PR #130 (feat: cognitive & cyclomatic complexity metrics): native 2.1→4.7 ms/file, WASM 6.6→9.4 ms/file
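For readers unfamiliar with Louvain: the algorithm greedily moves nodes between communities to increase modularity. The sketch below only scores a partition that is already given; it is not the PR's implementation and does not run Louvain itself. It assumes an undirected, unweighted edge list, and the names `modularity`, `twoTriangles`, `split`, and `merged` are illustrative.

```javascript
// Modularity of a fixed partition on an undirected, unweighted graph:
// Q = sum over communities c of [ L_c / m - resolution * (d_c / (2m))^2 ]
// where m is the edge count, L_c the edges inside c, d_c the summed degree of c.
function modularity(edges, communityOf, resolution = 1) {
  const m = edges.length;
  if (m === 0) return 0;
  const intra = new Map();  // community -> edges with both ends inside it
  const degree = new Map(); // community -> summed degree of its nodes
  const bump = (map, key, by) => map.set(key, (map.get(key) ?? 0) + by);
  for (const [u, v] of edges) {
    const cu = communityOf[u];
    const cv = communityOf[v];
    bump(degree, cu, 1);
    bump(degree, cv, 1);
    if (cu === cv) bump(intra, cu, 1);
  }
  let q = 0;
  for (const [c, d] of degree) {
    q += (intra.get(c) ?? 0) / m - resolution * (d / (2 * m)) ** 2;
  }
  return q;
}

// Two triangles joined by a single edge (c-d).
const twoTriangles = [
  ['a', 'b'], ['b', 'c'], ['a', 'c'],
  ['d', 'e'], ['e', 'f'], ['d', 'f'],
  ['c', 'd'],
];
const split = { a: 0, b: 0, c: 0, d: 1, e: 1, f: 1 };
const merged = { a: 0, b: 0, c: 0, d: 0, e: 0, f: 0 };

console.log(modularity(twoTriangles, split));  // positive: one community per triangle
console.log(modularity(twoTriangles, merged)); // zero: everything in one community
```

Raising `resolution` penalizes large communities more heavily, which is why a configurable resolution tends to yield more, smaller communities.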

Changes

Community detection (cc28daa)

  • New src/communities.js — Louvain partitioning, community stats, bridge edge detection
  • CLI: codegraph communities command with --min-size, --json, --no-tests flags
  • MCP: detect_communities tool exposed in both single-repo and multi-repo modes
  • Programmatic API: exported from src/index.js
  • 13 integration tests covering partitioning, filtering, JSON output, bridge edges
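As a rough illustration of the community stats and bridge edge detection listed above, the following sketch summarizes a partition that some Louvain run has already produced. The function name `summarizePartition`, its argument shapes, and the `minSize` parameter are assumptions made for the example, not the contents of src/communities.js.

```javascript
// Hypothetical helper: group nodes by community, drop small communities,
// and list "bridge" edges whose endpoints sit in different communities.
function summarizePartition(edges, communityOf, minSize = 1) {
  const members = new Map(); // community id -> node names
  for (const [node, community] of Object.entries(communityOf)) {
    if (!members.has(community)) members.set(community, []);
    members.get(community).push(node);
  }

  const bridges = edges.filter(([from, to]) => communityOf[from] !== communityOf[to]);

  const communities = [...members.entries()]
    .filter(([, nodes]) => nodes.length >= minSize)
    .map(([id, nodes]) => ({ id, size: nodes.length, nodes: [...nodes].sort() }))
    .sort((a, b) => b.size - a.size); // largest community first

  return { communities, bridges };
}

// Illustrative call graph; function names are made up.
const callEdges = [
  ['parse', 'tokenize'],
  ['parse', 'emit'],
  ['emit', 'write'],
  ['cli', 'parse'],
];
const assignment = { parse: 0, tokenize: 0, emit: 1, write: 1, cli: 2 };

const summary = summarizePartition(callEdges, assignment, 2);
console.log(summary.communities.map((c) => c.nodes));
console.log(summary.bridges);
```

A high ratio of bridge edges to total edges is one plausible signal of cross-community coupling.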

Complexity perf fix (62e48db)

  • src/parser.js: wasmExtractSymbols returns { symbols, tree, langId }; parseFilesAuto attaches _tree/_langId to symbols objects
  • src/complexity.js: buildComplexityMetrics uses cached trees, only initializes WASM parsers when fallback is needed (native engine path)
  • src/builder.js: Nulls out tree references after complexity analysis for prompt GC
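The caching pattern described above can be sketched as follows. This is a simplified stand-in, not the code from src/parser.js or src/complexity.js: `fakeParse`, `extractSymbolsWasm`, `parseFile`, and `complexityFor` are hypothetical names, and the "tree" is a plain object rather than a real parse tree. Only the `{ symbols, tree, langId }` return shape and the `_tree`/`_langId` fields come from the PR description.

```javascript
// Stand-in for an expensive WASM parse; counts calls so reuse is visible.
let parseCalls = 0;
function fakeParse(source) {
  parseCalls += 1;
  return { type: 'program', text: source };
}

// Returns symbols together with the tree and language id, as described.
function extractSymbolsWasm(source, langId) {
  const tree = fakeParse(source);
  return { symbols: { definitions: [] }, tree, langId };
}

// WASM path attaches the tree to the symbols object; native path has none.
function parseFile(source, langId, engine) {
  if (engine === 'native') return { definitions: [] };
  const { symbols, tree, langId: id } = extractSymbolsWasm(source, langId);
  symbols._tree = tree;
  symbols._langId = id;
  return symbols;
}

// Reuse the cached tree when present, otherwise fall back to re-parsing.
function complexityFor(symbols, source) {
  const tree = symbols._tree ?? fakeParse(source);
  const result = { rootType: tree.type, sourceLength: tree.text.length };
  // Drop references so the tree can be garbage collected promptly.
  symbols._tree = null;
  symbols._langId = null;
  return result;
}

const source = 'function f() { return 1; }';

const wasmSymbols = parseFile(source, 'javascript', 'wasm');
complexityFor(wasmSymbols, source);
const callsAfterWasm = parseCalls; // one parse, shared by both stages

const nativeSymbols = parseFile(source, 'javascript', 'native');
complexityFor(nativeSymbols, source);
const callsAfterNative = parseCalls; // fallback parse added one more

console.log(callsAfterWasm, callsAfterNative);
```

Attaching the tree to the symbols object keeps the two stages decoupled, at the cost of having to null the field explicitly once the tree is no longer needed.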

Test plan

  • npx vitest run tests/integration/communities.test.js — 13 community detection tests
  • npx vitest run tests/unit/complexity.test.js tests/integration/complexity.test.js — 30 complexity tests
  • npm test — full suite (710 pass, 13 skipped)
  • node src/cli.js build . — verify build performance regression is eliminated

feat: louvain community detection for module boundary analysis

Add `codegraph communities` command that runs Louvain clustering on the
dependency graph, compares discovered communities against directory
structure, and surfaces architectural drift (split/merge candidates,
drift score). Supports file-level (default) and function-level modes,
configurable resolution, and drift-only output. Integrated into stats,
MCP, and programmatic API.

Impact: 9 functions changed, 8 affected

fix: eliminate redundant file re-parsing in complexity metrics

Cache WASM parse trees from parseFilesAuto and pass them to
buildComplexityMetrics, avoiding redundant parser init, file I/O,
and AST re-parsing. Native engine falls back to re-parsing as before.
Trees are nulled after use to allow prompt GC.

Impact: 5 functions changed, 7 affected
@claude

claude Bot commented Feb 26, 2026


@greptile-apps

greptile-apps Bot commented Feb 26, 2026

Contributor

Greptile Summary

This PR adds Louvain-based community detection for module boundary analysis and eliminates a ~2x performance regression in complexity metrics by caching WASM parse trees.

Community Detection (cc28daa)

  • Introduces src/communities.js with Louvain algorithm integration via graphology libraries
  • Builds function-level or file-level dependency graphs from the SQLite database
  • Detects natural module boundaries and compares them against directory structure
  • Provides drift analysis showing directories split across communities and communities spanning multiple directories
  • Exposed via CLI (codegraph communities), MCP tool, and programmatic API
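One way to picture the drift analysis is to cross-tabulate files by directory and by detected community. The sketch below assumes a file-to-community mapping is already available; `analyzeDrift`, `dirOf`, and the output field names are hypothetical, and the PR's drift score is not reproduced because its formula is not given here.

```javascript
// Directory portion of a POSIX-style relative path ('.' for top-level files).
function dirOf(file) {
  const i = file.lastIndexOf('/');
  return i === -1 ? '.' : file.slice(0, i);
}

// Hypothetical drift summary:
// - splitCandidates: directories whose files land in more than one community
// - mergeCandidates: communities whose files span more than one directory
function analyzeDrift(communityOfFile) {
  const communitiesByDir = new Map();
  const dirsByCommunity = new Map();
  const add = (map, key, value) => {
    if (!map.has(key)) map.set(key, new Set());
    map.get(key).add(value);
  };

  for (const [file, community] of Object.entries(communityOfFile)) {
    const dir = dirOf(file);
    add(communitiesByDir, dir, community);
    add(dirsByCommunity, community, dir);
  }

  const splitCandidates = [...communitiesByDir]
    .filter(([, communities]) => communities.size > 1)
    .map(([dir, communities]) => ({ dir, communities: [...communities] }));
  const mergeCandidates = [...dirsByCommunity]
    .filter(([, dirs]) => dirs.size > 1)
    .map(([community, dirs]) => ({ community, dirs: [...dirs] }));

  return { splitCandidates, mergeCandidates };
}

// Illustrative file assignment; paths are made up.
const drift = analyzeDrift({
  'src/a.js': 0,
  'src/b.js': 1,
  'lib/c.js': 1,
  'lib/d.js': 1,
});
console.log(JSON.stringify(drift, null, 2));
```

Here `src` is reported as split across two communities, and community 1 is reported as spanning `src` and `lib`.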
  • Comprehensive test coverage with 13 integration tests

Complexity Performance Fix (62e48db)

  • Addresses the build regression introduced in PR #130 (feat: cognitive & cyclomatic complexity metrics), where files were parsed twice (once for symbols, once for complexity)
  • parser.js: wasmExtractSymbols now returns { symbols, tree, langId } and parseFilesAuto caches trees on symbols objects
  • complexity.js: Reuses cached trees when available, only initializes WASM parsers for fallback when native engine was used
  • builder.js: Nulls out tree references after complexity analysis for garbage collection
  • Performance improvement: the regression from #130 (native 2.1→4.7 ms/file, WASM 6.6→9.4 ms/file) is reversed, bringing parsing back to ~2.1 ms/file native and ~6.6 ms/file WASM

Confidence Score: 5/5

  • Safe to merge - well-tested feature addition with important performance fix
  • Both changes are well-architected with proper error handling, comprehensive test coverage, and no breaking changes. The complexity fix correctly handles both engine paths with appropriate fallback logic. Community detection is properly integrated across CLI, MCP, and API surfaces with graceful degradation.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/communities.js | New file implementing Louvain community detection with graph construction, drift analysis, and CLI/API integration |
| src/parser.js | Modified wasmExtractSymbols to return tree/langId; parseFilesAuto caches trees on symbols for complexity reuse |
| src/complexity.js | Uses cached WASM trees from parser, only initializes parsers for fallback when native engine was used |
| src/builder.js | Nulls out cached tree references after complexity analysis for prompt GC |
| src/cli.js | Added communities command with resolution/drift/json options; made stats command async for community summary |
| src/mcp.js | Added communities tool with proper schema and handler for MCP integration |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[buildGraph: parseFilesAuto] --> B{Engine Type?}
    B -->|Native| C[Native parser: no tree]
    B -->|WASM| D[WASM parser: extract tree]
    C --> E[symbols without _tree/_langId]
    D --> F[symbols._tree = tree<br/>symbols._langId = langId]
    E --> G[buildComplexityMetrics]
    F --> G
    G --> H{Has cached tree?}
    H -->|Yes| I[Use cached tree directly]
    H -->|No| J[Fallback: re-parse with WASM]
    I --> K[Compute complexity metrics]
    J --> K
    K --> L[symbols._tree = null]
    L --> M[Continue to next file]
    M --> N[builder.js: final cleanup<br/>null all _tree/_langId]

Last reviewed commit: 62e48db

@greptile-apps greptile-apps Bot left a comment


12 files reviewed, no comments

@carlos-alm carlos-alm merged commit 0337318 into main Feb 26, 2026
18 checks passed
@carlos-alm carlos-alm deleted the feat/community-detection branch February 26, 2026 23:44
carlos-alm pushed a commit that referenced this pull request Feb 27, 2026
Update README, CLAUDE.md, BACKLOG, titan-paradigm, recommended-practices,
and CLI/MCP examples to reflect today's merged PRs: complexity metrics
(#130/#139), Louvain community detection (#133/#134), and manifesto rule
engine (#138). Updates MCP tool count from 21 to 24 (25 in multi-repo),
marks backlog items 6/11/21/22 as done, and adds real CLI output examples.
carlos-alm added a commit that referenced this pull request Feb 27, 2026
* fix: strict type validation for threshold values in complexity queries

Replace loose `!= null` checks with `typeof === 'number' && Number.isFinite()`
to prevent `Number("")`, `Number(null)`, and `Number(true)` from silently
coercing into valid SQL values. Add integration test verifying exceeds
arrays and summary.aboveWarn are correctly computed.
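The coercion problem is easy to reproduce in isolation. The following sketch contrasts a loose `!= null` guard with the strict check named in the commit message; `looseCheck`, `strictCheck`, and the sample values are illustrative and not taken from the repository.

```javascript
// Loose guard: lets through anything that is not null or undefined.
const looseCheck = (value) => value != null;

// Strict guard from the commit message: only finite numbers pass.
const strictCheck = (value) => typeof value === 'number' && Number.isFinite(value);

// Values that could plausibly arrive as a threshold from CLI or JSON input.
const candidates = ['', true, '10', NaN, Infinity, 15];

const report = candidates.map((value) => ({
  value: String(value),
  type: typeof value,
  coerced: Number(value), // what would end up in the query after coercion
  loose: looseCheck(value),
  strict: strictCheck(value),
}));

console.table(report);
```

Every candidate passes the loose guard, including the empty string (coerced to 0) and `true` (coerced to 1), while only `15` passes the strict one.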

Addresses Greptile review feedback on #136.

Impact: 2 functions changed, 3 affected

* docs: add complexity, communities, and manifesto to all docs

Update README, CLAUDE.md, BACKLOG, titan-paradigm, recommended-practices,
and CLI/MCP examples to reflect today's merged PRs: complexity metrics
(#130/#139), Louvain community detection (#133/#134), and manifesto rule
engine (#138). Updates MCP tool count from 21 to 24 (25 in multi-repo),
marks backlog items 6/11/21/22 as done, and adds real CLI output examples.

* fix: remove redundant condition in paginate guard clauses

When limit === undefined, limit !== 0 is always true — the && check
was dead code. Simplified to just check limit === undefined.
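A quick check of that reasoning, assuming the original guard had the form `limit === undefined && limit !== 0` (the exact expression is not shown here, so `guardBefore` and `guardAfter` are hypothetical reconstructions):

```javascript
// Assumed shape of the guard before the fix.
const guardBefore = (limit) => limit === undefined && limit !== 0;

// Simplified guard after the fix.
const guardAfter = (limit) => limit === undefined;

// undefined is never strictly equal to 0, so the second operand can only
// be evaluated when it is already true; the two guards agree everywhere.
const samples = [undefined, null, 0, 1, 50, '10', NaN];
const allAgree = samples.every((limit) => guardBefore(limit) === guardAfter(limit));

console.log(allAgree);
```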

Impact: 2 functions changed, 18 affected

* docs: update dogfood report with fix statuses

All 4 bugs now fixed (PR #117 merged, #116 closed via reverse-dep
cascade). 3 of 4 suggestions addressed. MCP tool counts updated
18→23 / 19→24. Rating upgraded 7/10 → 9/10 post-fix.

* fix: rename misleading test to match actual behavior

Test was named "handles non-numeric thresholds gracefully" but only
validated baseline exceeds/aboveWarn with valid thresholds. Actual
non-numeric threshold tests exist separately. Renamed to "produces
correct exceeds and aboveWarn with valid thresholds".

* fix: update stale MCP tool count in dogfood skill (21→24)

* feat: add complexity analysis for Python, Go, Rust, Java, C#, Ruby, PHP

Parameterize the complexity algorithm to support all 10 languages instead
of just JS/TS/TSX. Add per-language COMPLEXITY_RULES, HALSTEAD_RULES, and
COMMENT_PREFIXES with three else-if detection patterns (else-wraps-if,
explicit elif, alternative field). Guard against tree-sitter keyword leaf
tokens that share node type names with their parent constructs.

Impact: 4 functions changed, 4 affected

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Zeeeepa pushed a commit to Zeeeepa/codegraph that referenced this pull request Jun 22, 2026
…ptave#134)

* feat: louvain community detection for module boundary analysis

Add `codegraph communities` command that runs Louvain clustering on the
dependency graph, compares discovered communities against directory
structure, and surfaces architectural drift (split/merge candidates,
drift score). Supports file-level (default) and function-level modes,
configurable resolution, and drift-only output. Integrated into stats,
MCP, and programmatic API.

Impact: 9 functions changed, 8 affected

* fix: eliminate redundant file re-parsing in complexity metrics

Cache WASM parse trees from parseFilesAuto and pass them to
buildComplexityMetrics, avoiding redundant parser init, file I/O,
and AST re-parsing. Native engine falls back to re-parsing as before.
Trees are nulled after use to allow prompt GC.

Impact: 5 functions changed, 7 affected

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Zeeeepa pushed a commit to Zeeeepa/codegraph that referenced this pull request Jun 22, 2026
* fix: strict type validation for threshold values in complexity queries

Replace loose `!= null` checks with `typeof === 'number' && Number.isFinite()`
to prevent `Number("")`, `Number(null)`, and `Number(true)` from silently
coercing into valid SQL values. Add integration test verifying exceeds
arrays and summary.aboveWarn are correctly computed.

Addresses Greptile review feedback on optave#136.

Impact: 2 functions changed, 3 affected

* docs: add complexity, communities, and manifesto to all docs

Update README, CLAUDE.md, BACKLOG, titan-paradigm, recommended-practices,
and CLI/MCP examples to reflect today's merged PRs: complexity metrics
(optave#130/optave#139), Louvain community detection (optave#133/optave#134), and manifesto rule
engine (optave#138). Updates MCP tool count from 21 to 24 (25 in multi-repo),
marks backlog items 6/11/21/22 as done, and adds real CLI output examples.

* fix: remove redundant condition in paginate guard clauses

When limit === undefined, limit !== 0 is always true — the && check
was dead code. Simplified to just check limit === undefined.

Impact: 2 functions changed, 18 affected

* docs: update dogfood report with fix statuses

All 4 bugs now fixed (PR optave#117 merged, optave#116 closed via reverse-dep
cascade). 3 of 4 suggestions addressed. MCP tool counts updated
18→23 / 19→24. Rating upgraded 7/10 → 9/10 post-fix.

* fix: rename misleading test to match actual behavior

Test was named "handles non-numeric thresholds gracefully" but only
validated baseline exceeds/aboveWarn with valid thresholds. Actual
non-numeric threshold tests exist separately. Renamed to "produces
correct exceeds and aboveWarn with valid thresholds".

* fix: update stale MCP tool count in dogfood skill (21→24)

* feat: add complexity analysis for Python, Go, Rust, Java, C#, Ruby, PHP

Parameterize the complexity algorithm to support all 10 languages instead
of just JS/TS/TSX. Add per-language COMPLEXITY_RULES, HALSTEAD_RULES, and
COMMENT_PREFIXES with three else-if detection patterns (else-wraps-if,
explicit elif, alternative field). Guard against tree-sitter keyword leaf
tokens that share node type names with their parent constructs.

Impact: 4 functions changed, 4 affected

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>