Bad communities with high modularity
Faculty of Engineering, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
Received: 27 February 2013
Received in final form: 10 May 2013
Published online: 24 July 2013
In this paper we discuss some problematic aspects of Newman and Girvan’s modularity function QN. Given a graph G, the modularity of G can be written as QN = Qf − Q0, where Qf is the intracluster edge fraction of G and Q0 is the expected intracluster edge fraction of the null model, i.e., a randomly connected graph with same expected degree distribution as G. It follows that the maximization of QN must accomodate two factors pulling in opposite directions:Qf favors a small number of clusters and Q0 favors many balanced (i.e., with approximately equal degrees) clusters. In certain cases the Q0 term can cause overestimation of the true cluster number; this is the opposite of the well-known underestimation effect caused by the “resolution limit” of modularity. We illustrate the overestimation effect by constructing families of graphs with a “natural” community structure which, however, does not maximize modularity. In fact, we show there exist graphs G with a “natural clustering” V of G and another, balanced clustering U of G such that (i) the pair (G,U) has higher modularity than (G,V) and (ii) V and U are arbitrarily different.
Key words: Computational Methods
© EDP Sciences, Società Italiana di Fisica and Springer-Verlag, 2013