You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Abdullah Al Raqibul Islam edited this page Aug 19, 2021
·
8 revisions
Introduction to the example applications.
The best way to get familiar with GraphChi is to run and study the example applications. You should start with the simple applications, Pagerank and Connected Components prior to looking at matrix factorization or triangle counting.
Input data
To run the examples, you need an input graph (network). Currently !GraphChi can read graphs in the standard [Edge-List-Format] or [Adjacency-List-Format], but it is fairly easy to write your own parsers. Remember that GraphChi can run even very big graphs on a normal PC or laptop!
A good set of graphs of different sizes are available below:
All example programs automatically convert the input graph to GraphChi's internal shards (see Introduction-To-GraphChi). That is, you just need to compile and run the application. This conversion needs to be done only once for each application, and the system will detect whether preprocessing is needed. Also, if the value type stored in edges is of the same size, different applications can use the same shards.
To build and run:
make example_apps/pagerank
bin/example_apps/pagerank file GRAPH-NAME
Above, replace GRAPH with the input graph. If the graph has not been preprocessed, the program will ask for the format of the graph (edgelist or adjacency-list).
You may adjust the number of iterations by specifying parameter "niters" as follows:
The application will print the ids of the top 20 vertices with the highest PageRank score.
Analyzing the output
GraphChi will write the values of the edges in a binary file, which is easy to handle in other programs. The name of the file containing vertex values is GRAPH-NAME.4B.vout. Here "4B" refers to the vertex-value being a 4-byte type (float).
You can also analyze the vertex-data programmatically, in memory-efficient manner, by using VertexAggregators.
Remark
While this Pagerank example is simple, it is not the fastest way to compute Pagerank with GraphChi. Particularly, it unnecessarily loads the values for out-edges, while the value is never read, it is just written. There is another version of Pagerank available, example_apps/pagerank_functional.cpp, which uses an experimental alternative "functional" API, which is roughly 50% more efficient for Pagerank.
> make example_apps/connectedcomponents
> bin/example_apps/connectedcomponents file GRAPH-NAME
Output
The program will produce output GRAPHNAME_components.txt, which on each line has component id, num-of-vertices. As an interesting analysis, you can plot a histogram of the sizes of the connected components.
Community Detection (Easy+)
What you will learn:
How to exchange information via edges in both directions
> make example_apps/communitydetection
> bin/example_apps/communitydetection file GRAPH-NAME
Output
Same output as connected components, but instead of components, it lists community-ids and their sizes.
Matrix factorization: Alternative Least Squares - ALS (Moderate)
What you will learn:
How to integrate linear algebra package (Eigen)
How to deal with a bipartite graph
Special graph input / output
Two strategies of reading neighbor vertex value (via edges, and in-memory)
There are two versions of the program. The first version, edge-version, writes a vertex's value (a latent factor vector) to its adjacent edges; the vertices-in-memory version uses a separate array to store vertices in-memory. See the code for more details.
Dependency: Eigen
You need the Eigen matrix library for compiling the matrix factorization programs. It is available here, as a headers-only installation.
For example, if you are running ALS for the Netflix movie rating dataset, the matrix is a sparse matrix containing the ratings that users have given each movie.
Output
The matrix factors are output in the Matrix Market format with names
GRAPH-NAME_U.mm
GRAPH-NAME_V.mm
Triangle Counting (Advanced)
What you will learn:
A special design pattern, which requires working outside the basic abstraction
Writing a special preprocessor that orders the vertices by their degree
This application starts with a base-graph, and streams additional edges to the graph from a streaming graph file (which must be in the EdgeListFormat), while simultaneously computing pagerank. The source code contains also an example of how to use an HTTP-based admin dashboard for !GraphChi applications. More information is added later, and be warned that the code in the example application is not finished and contains a lot of demonstration code.