feat: 012 release blogpost #857
Merged
7 commits
- ebb91da feat: 012 release blogpost (SemyonSinchenko)
- 3c91e63 Merge remote-tracking branch 'graphframes/main' into 856-012-release (SemyonSinchenko)
- b97bdad feat: add a note about the pregel breaking change (SemyonSinchenko)
- c742252 feat: typo (SemyonSinchenko)
- f6912eb feat: mention Databricks (SemyonSinchenko)
- d963292 Merge remote-tracking branch 'graphframes/main' into 856-012-release (SemyonSinchenko)
- 9d6eedf feat: mention HyperANF in the post (SemyonSinchenko)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| # GraphFrames 0.12.0 release | ||
|
|
||
| - **Published:** 2026-06-12T00:00:00Z | ||
| - **Title:** GraphFrames 0.12.0 release | ||
| - **Summary:** This release brings a new community detection algorithm, a new API for finding all simple paths between a subset of vertices, approximate neighbor functions, and significant performance improvements for the Two-Phase Connected Components algorithm. | ||
|
|
||
| ## New Contributors | ||
|
|
||
| - [@slavlotski](https://github.com/slavlotski) -- `asReversed` helper API to reverse all the edges of the graph | ||
|
|
||
| ## New Community Detection Algorithm | ||
|
|
||
| Previous versions of GraphFrames relied entirely on the most naive implementation of the Label Propagation algorithm. While this implementation is fast and well-known, the quality of the output clusters is questionable, and the algorithm itself is unstable. Even small changes in the local structure can alter the output. | ||
|
|
||
| The new algorithm significantly modifies the original Label Propagation algorithm. While it follows the same idea that allows for efficient implementation on distributed graphs, it also provides more flexibility. The inspiration came from [Xie, Jierui, and Boleslaw K. Szymanski. "Community detection using a neighborhood strength driven label propagation algorithm." 2011 IEEE Network Science Workshop. IEEE, 2011.](https://arxiv.org/abs/1105.3264) | ||
|
|
||
| The core idea is that, during propagation, vertices choose a community based not only on their local neighborhood, but also on the number of neighbors they have in common with other community members. Compared to existing label propagation, the new algorithm also supports passing initial labels, which allows it to be used incrementally or for semi-supervised community detection. | ||
|
|
||
| Credits to [@SemyonSinchenko](https://github.com/SemyonSinchenko). | ||
|
|
||
| ## New all paths API | ||
|
|
||
| After introducing the `AggregateNeighbors` API in version `0.11.0`, which is a generic, multi-hop aggregation API, GraphFrames is receiving built-in implementations based on neighbor aggregation. The first is the long-awaited API that finds all simple paths between a subset of vertices. | ||
|
|
||
| Credits to [@SemyonSinchenko](https://github.com/SemyonSinchenko). | ||
|
|
||
| ## Approximate Neighbor Functions | ||
|
|
||
| This release brings a foundational API for approximate neighbor functions. Users can use it to compute an approximate graph diameter, HyperBALL, or approximate closeness centrality. | ||
|
|
||
| Credits to [@SemyonSinchenko](https://github.com/SemyonSinchenko). | ||
|
|
||
| ## Performance optimizations in Connected Components | ||
|
|
||
| The Two-Phase algorithm is based on the idea of rewiring edges to end up with a star-like graph structure. However, during the rewiring process, a large number of leaves, or vertices with no outgoing edges, appear. Although determining components for these vertices is trivial, and they do not participate in the main algorithm loop, they still shuffle and join until full convergence. The new optimization adds an efficient way to determine the optimal time to remove such leaves and offset the cost of rejoining them after convergence. Based on initial benchmarks, the optimization delivers a ~25% performance boost. | ||
|
|
||
| This optimization was originally part of Databricks' internal fork of GraphFrames. The company donated it to the open-source GraphFrames project. | ||
|
|
||
| Credits to [@WeichenXu123](https://github.com/WeichenXu123) and [Databricks](https://www.databricks.com/). | ||
|
|
||
| ## Important note | ||
|
|
||
| Previous versions of GraphFrames had an unspecified contract within the Pregel API regarding the handling of edge attributes. All edge attributes, including the IDs of the source (`src`) and destination (`dst`) vertices, were implicitly packed into a `StructType` and persisted. Although persisting was required for performance, it prevented the Catalyst optimizer from eliminating these columns when they were not used. This resulted in an almost twofold increase in peak memory load in all scenarios and was considered a bug. Starting with version 0.12.0, users who want to use edge attributes in the low-level Pregel API must specify them explicitly using `requiredEdgeColumns(...)` in Scala or `required_edge_columns(...)` in the Python API. | ||
|
|
||
| ## Future steps | ||
|
|
||
| - Moving in the direction of support of full-featured graph queries | ||
| - Improving GraphFrames capabilities in Graph ML | ||
| - Adding features useful in Spatial Graphs analysis | ||
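
To make the "all simple paths between a subset of vertices" feature from the post more concrete, here is a minimal single-machine sketch in plain Python. It is not the GraphFrames implementation, which builds on `AggregateNeighbors` and works on distributed DataFrames. The function name `all_simple_paths` and its parameters (`sources`, `targets`, `max_length`) are hypothetical and chosen only for illustration. The sketch assumes a directed graph given as `(src, dst)` pairs and bounds the search by a maximum number of hops.

```python
from collections import defaultdict


def all_simple_paths(edges, sources, targets, max_length):
    """Enumerate simple paths (no repeated vertex) from any source to any target.

    Illustrative only: hypothetical helper, not part of the GraphFrames API.
    edges: iterable of (src, dst) pairs of a directed graph.
    max_length: maximum number of edges allowed in a path.
    """
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)

    targets = set(targets)
    found = []

    def extend(path, visited):
        last = path[-1]
        # Record the path once it reaches a target (ignore zero-length paths).
        if last in targets and len(path) > 1:
            found.append(list(path))
        # Stop expanding when the hop budget is exhausted.
        if len(path) - 1 == max_length:
            return
        for nxt in adjacency[last]:
            if nxt not in visited:  # keeps the path simple
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for source in sources:
        extend([source], {source})
    return found


edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("b", "d")]
for path in all_simple_paths(edges, sources=["a"], targets=["d"], max_length=3):
    print(" -> ".join(path))
```

The hop limit matters because the number of simple paths can grow exponentially with path length, so any practical version of this search, distributed or not, is likely to need a similar bound.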
@WeichenXu123 feel free to provide a better text or any suggestion.