Bayan Project

Bayan Project

What is Bayan project about?

Aim

  • Developing a reliable and accurate algorithm for descriptive community detection in small and mid-sized networks, with optimality guarantees.

Community Detection

  • Descriptive community detection is the process of clustering the nodes of a network (graph) based on the structural patterns. It is an unsupervised task with a wide range of applications in life sciences (e.g., biological systems), social sciences (offline/online social networks), physical sciences (VLSI design, complex systems), mathematical sciences (agent-based models, computer vision), and even health sciences (drug discovery) (Wikipedia).
  • There are several existing methods (algorithms) for approaching this computational problem. One family of algorithms attempts to find communities by maximizing modularity: a network-level quantity indicating the fraction of intra-community edges minus the expected fraction under a degree-preserving null model.

Do we need yet another community detection algorithm? Yes, and here's why:

  • In the first part of this project, we solved the following exact Integer Programming (IP) model for a testbed of over 100 real and random networks with reasonably modular structures. We evaluated nine existing modularity-based community detection methods, CNM (Greedy), Louvain, Combo, Belief, Leiden, Paris, EdMot, Leicht Newman (LN), and Graph neural network (GNN).

  • Notation for the IP model: The input is graph G=(V,E) with |V|=n nodes, |E|=m edges, and adjacency matrix entries a_{ij}. d_i is the degree of node i, gamma is the resolution parameter. The term associated with each pair (i,j) is b_{ij} and referred to as the modularity matrix entry for (i,j). Using the binary decision variable x_{ij} for each pair of distinct nodes (i,j), their cluster membership is either the same (represented by x_{ij}=0) or different (represented by x_{ij}=1). K(i,j) indicates a minimum-cardinality separating set for the nodes i,j. Accordingly, the problem of maximizing the modularity of input graph G can be formulated as the IP model above (Dinh and Thai 2015).

  • We investigated the extent to which current modularity-based heuristics succeeds in maximizing modularity by evaluating
    (1) the ratio of their output modularity to the maximum modularity for a given graph (GOP) and
    (2) the similarity between their output partition and all modularity-maximizing partitions of that graph using Adjusted Mutual Information (AMI).
    The following figure shows a scatterplot of GOP on the y-axis and AMI on the x-axis for each algorithm (and their variations).

  • Looking at the y-axis values in the figure, we observe that there is a substantial variation in the values of GOP (i.e., extent of sub-optimality) returned by the nine heuristic algorithms. The x-axis values in the figure show the considerable dissimilarity between the sub-optimal partitions and any optimal partition for these networks. Focusing on the position of data points, we observe that they are mostly located above their corresponding 45-degree line. This indicates that sub-optimal partitions tend to be disproportionally dissimilar to any optimal partition. This result goes against the naive viewpoint that close-to-maximum modularity partitions are also close to an optimal partition.
  • The average modularity-based heuristic algorithm returns optimal partitions for only 43.9% of these reasonably modular networks. Taken together with the low success rates of heuristic algorithms in maximizing modularity, our results indicate a crucial mismatch between the design philosophy of modularity maximization heuristics for community detection and their performance: Heuristic modularity maximization algorithms rarely return an optimal partition or a partition resembling an optimal partition, even on networks with a reasonably modular structure.
  • The Bayan algorithm solves this fundamental problem and provides optimal partitions and partitions within a guaranteed proximity to an optimal partition (exact solutions and approximate solutions for modularity maximization).

How does Bayan compare with other community detection methods w.r.t. retrieving planted partitions?

Interested to learn more? Check out our three peer-reviewed papers below.

  • Samin Aref, Mahdi Mostajabdaveh, and Hriday Chheda. "Heuristic modularity maximization algorithms for community detection rarely return an optimal partition or anything similar." in Computational Science - ICCS 2023, LNCS 10476, Springer, 2023 doi.org/10.1007/978-3-031-36027-5_48 Preprint
  • Samin Aref and Mahdi Mostajabdaveh. "Analyzing modularity maximization in approximation, heuristic, and graph neural network algorithms for community detection." Journal of Computational Science, 2024 doi.org/10.1016/j.jocs.2024.102283 Preprint
  • Samin Aref, Mahdi Mostajabdaveh, and Hriday Chheda. "The Bayan algorithm: detecting communities in networks through exact and approximate optimization of modularity." Physical Review E, 2024 Preprint
  • You can start using Bayan in Python today.

    • The Bayan algorithm is publicly available through the package installer for Python (pip) .
    • Resources for the Bayan algorithm in Python:
      - Minimal working examples of Bayan in Google Colab
      - Bayan GitHub Repository
      - Bayan library on pypi
    • Running Bayan for optimization models with more than 2000 variables or constraints (input graphs with more than 44 nodes) requires a free academic license to be installed for the mathematical solver Gurobi (which solves the linear programming relaxation problems within the Bayan algorithm). You may follow the instructions in the GitHub Repository to request and install a free academic license for Gurobi if your usage is not commercial and you are affiliated with an academic institution.
    • The Bayan algorithm can be used as follows in Jupyter Python:
    • Project Members

      Prof. Samin Aref

      Department of Mechanical & Industrial Engineering, University of Toronto

      Mahdi Mostajabdaveh, Ph.D.

      Department of Mathematical & Industrial Engineering, Polytechnique Montreal

      Hriday Chheda

      Department of Mechanical & Industrial Engineering, University of Toronto

      Origin of the algorithm name "Bayan"?

      • Bayan is a village in the Southeast of Iran located at the midpoint of the hometowns of the developers of this project.
      • Bayan is a small "community" of 22 people in 5 families as of the 2016 census.

      Acknowledgments


        The Bayan project is supported by the Data Sciences Institute at the University of Toronto.