AMOS: AN AUTOMATED MODEL ORDER SELECTION ALGORITHM FOR SPECTRAL GRAPH CLUSTERING
By Pin-Yu Chen, Thibaut Gensollen, and Alfred O. Hero III, Fellow, IEEE
One of the longstanding problems in spectral graph clustering (SGC) is the so-called model order selection problem: automated selection of the correct number of clusters. This is equivalent to the problem of finding the number of connected components or communities in an undirected graph. In this paper, we propose AMOS, an automated model order selection algorithm for SGC. Based on a recent develop- ment of clustering reliability for SGC under the random interconnec- tion model, AMOS works by incrementally increasing the number of clusters, estimating the quality of identified clusters, and providing a series of clustering reliability tests. Consequently, AMOS outputs clusters of minimal model order with statistical clustering reliabil- ity guarantees. Comparing to three other automated graph clustering methods on real-world datasets, AMOS shows superior performance in terms of multiple external and internal clustering metrics.
AMOS is using some external libraries for working, be sure to install them before :
$ python examples.py --A=hibernia
import amos
model = amos.AMOS(k_init,Kmax)
best_k, labels = model.predict(A)
You can set some parameters:
-
A (required), the adjacency matrix of an undirected, weighted, and connected graph. The matrix is then converted to a Scipy Sparse Matrix (csr).
-
k_init (optional, Default : 2), number of clusters - start with 2 partitions by default.
-
Kmax (optional, Default : 10), maximum number of allowed clusters, smaller value of Kmax can speed up the computation process.
-
Kmeans_replicate (optional), Number of time the k-means algorithm will be run. (Default : 50).
-
alpha (optional), confidence interval parameter for homogeneous RIM test (Default : 0.05)
-
beta (optional), confidence interval parameter for inhomogeneous RIM test (Default : 0.05)
-
pth (optional), significance level for V-test (Default : 10^-5)
-
n_jobs (optional), the number of jobs to run in parallel. If -1, then the number of jobs is set to the number of cores. (Default : 1).
-
verbose (optional), for displaying some processing informations.
- Number of clusters K,
- Identified clusters (labels)
- hibernia, should find 2
- cogent, should find 3
- ieee, should find 4
- facebook, should find 5
- minnesota, should find something between 45 and 55
More details are accessible here.
A test file test.py
will test the examples to be sure the code is working properly.
Please cite the arXiv and the ICASSP papers when using the code.
MIT