feat: performance improvement in GraphX CDLP by SemyonSinchenko · Pull Request #681 · graphframes/graphframes · GitHub
Skip to content

feat: performance improvement in GraphX CDLP#681

Merged
SemyonSinchenko merged 1 commit intographframes:masterfrom
SemyonSinchenko:360-graphx-cdlp
Sep 13, 2025
Merged

feat: performance improvement in GraphX CDLP#681
SemyonSinchenko merged 1 commit intographframes:masterfrom
SemyonSinchenko:360-graphx-cdlp

Conversation

@SemyonSinchenko
Copy link
Copy Markdown
Collaborator

@SemyonSinchenko SemyonSinchenko commented Sep 12, 2025

What changes were proposed in this pull request?

Instead of merging mutable maps on each reduce step (that leads to the complexity $O(n^2)$ ) we can just concatenate vectors ( $O(n)$ overall) and do a groupby once ( $O(n)$ ). Memory usage will also decreased, because on the first iteration of the current implementation we have Map(vertex -> 1L) for all the vertices, but with a new implementation the peak memory usage is Vector of all vertices. Concatenations of vectors is (almost) constant, vectors required 5x less memory for the same amount of elements inside compared to Maps. And there is no need to create mutable Map on each step, concatenate all the keys and iterate ( $O(2n)$ ). On my tests it is 70x performance boost and less memory consumption (see #360)

Why are the changes needed?

Potentially resolve #360

@SemyonSinchenko SemyonSinchenko self-assigned this Sep 12, 2025
@SemyonSinchenko SemyonSinchenko added scala graphx Anything related to GraphX labels Sep 12, 2025
@codecov-commenter
Copy link
Copy Markdown

@SemyonSinchenko SemyonSinchenko merged commit e25d8b8 into graphframes:master Sep 13, 2025
5 checks passed
@SemyonSinchenko SemyonSinchenko deleted the 360-graphx-cdlp branch September 13, 2025 04:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

graphx Anything related to GraphX scala

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: Graphframes with Structured Streaming results in OOM even with very low data volumes

3 participants