Detectability Thresholds and Optimal Algorithms for Community Structure in Dynamic Networks
The detection of communities within a dynamic network is a common means for obtaining a coarse-grained view of a complex system and for investigating its underlying processes. While a number of methods have been proposed in the machine learning and physics literature, we lack a theoretical analysis of their strengths and weaknesses, or of the ultimate limits on when communities can be detected. In this project, we study the fundamental limits of detecting community structure in dynamic networks. Specifically, we analyze the limits of detectability for a dynamic stochastic block model where nodes change their community memberships over time, but where edges are generated independently at each time step. Using the cavity method, we derive a precise detectability threshold as a function of the rate of change and the strength of the communities. Below this sharp threshold, we claim that no efficient algorithm can identify the communities better than chance. We then give two algorithms that are optimal in the sense that they succeed all the way down to this threshold. The first uses belief propagation, which gives asymptotically optimal accuracy, and the second is a fast spectral clustering algorithm, based on linearizing the belief propagation equations. These results (see Fig. below) extend our understanding of the limits of community detection in an important direction, and introduce new mathematical tools for similar extensions to networks with other types of auxiliary information.
The detectability limit: \begin{equation} \label{eq:threshold} c \lambda^2 = \frac{1-\eta^2}{1+\eta^2} \quad \text{or} \quad |c_{\textrm{in}} - c_{\textrm{out}}| = k \sqrt{c \,\frac{1-\eta^2}{1+\eta^2}} \, , \end{equation}
The overlap for (top) belief propagation and (bottom) our spectral algorithm. The detectability transition in Eq. \eqref{eq:threshold} for $T=\infty$ is shown as a solid line. The dashed curve shows the detectability transition for $T=40$; the magenta curve shows the transition for $T=\infty$. Each point shows the average over 100 dynamic networks generated by our model with $n=512$, $T=40$, $k=2$ groups, and average degree $c=16$. The overlap here is calculated by averaging the maximum overlap at each time slot over all permutations. This maximization step implies that the expected overlap in the undetectable region is $O(n^{-1/2})$, and this produces a small deviation away from overlap = 0 in our numerical experiments.
References
2016
Detectability thresholds and optimal algorithms for community structure in dynamic networks
Amir Ghasemian, Pan Zhang, Aaron Clauset, and 2 more authors
The detection of communities within a dynamic network is a common means for obtaining a coarse-grained view of a complex system and for investigating its underlying processes. While a number of methods have been proposed in the machine learning and physics literature, we lack a theoretical analysis of their strengths and weaknesses, or of the ultimate limits on when communities can be detected. Here, we study the fundamental limits of detecting community structure in dynamic networks. Specifically, we analyze the limits of detectability for a dynamic stochastic block model where nodes change their community memberships over time, but where edges are generated independently at each time step. Using the cavity method, we derive a precise detectability threshold as a function of the rate of change and the strength of the communities. Below this sharp threshold, we claim that no efficient algorithm can identify the communities better than chance. We then give two algorithms that are optimal in the sense that they succeed all the way down to this threshold. The first uses belief propagation, which gives asymptotically optimal accuracy, and the second is a fast spectral clustering algorithm, based on linearizing the belief propagation equations. These results extend our understanding of the limits of community detection in an important direction, and introduce new mathematical tools for similar extensions to networks with other types of auxiliary information.