Huang Dong; Wang Chang-Dong; Wu Jian-Sheng; Lai Jian-Huang; Kwoh Chee-Keong

Abstract
This paper focuses on the scalability and robustness of spectral clustering for extremely large-scale datasets with limited resources. Two novel algorithms are proposed, namely, ultra-scalable spectral clustering (U-SPEC) and ultra-scalable ensemble clustering (U-SENC). In U-SPEC, a hybrid representative selection strategy and a fast approximation method for K-nearest representatives are proposed for the construction of a sparse affinity sub-matrix. By interpreting the sparse sub-matrix as a bipartite graph, the transfer cut is then utilized to efficiently partition the graph and obtain the clustering result. In U-SENC, multiple U-SPEC clusterers are further integrated into an ensemble clustering framework to enhance the robustness of U-SPEC while maintaining high efficiency. Based on the ensemble generated by multiple U-SPEC runs, a new bipartite graph is constructed between objects and base clusters and then efficiently partitioned to obtain the consensus clustering result. Notably, both U-SPEC and U-SENC have nearly linear time and space complexity, and are capable of robustly and efficiently partitioning nonlinearly separable datasets at the ten-million scale on a PC with 64GB of memory. Experiments on various large-scale datasets have demonstrated the scalability and robustness of our algorithms. The MATLAB code and experimental data are available at https://www.researchgate.net/publication/330760669.
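The abstract names the building blocks without spelling them out, so the following is a minimal NumPy sketch of how they could fit together, not the authors' MATLAB implementation. All helper names (`kmeans`, `select_representatives`, `approx_knr`, `affinity`, `transfer_cut`, `uspec`, `usenc`) are hypothetical, and several details are assumptions: a random candidate pool of 10p objects refined by k-means for the hybrid selection, √p rep-clusters and a 10K-representative neighbourhood for the approximate K-nearest-representative search, a Gaussian kernel whose bandwidth is the mean KNR distance, base-cluster counts drawn from [k, 3k] for the ensemble, and a dense affinity matrix in place of the sparse one. The transfer cut solves L_Y v = λ D_Y v on the small representative-side graph W_Y = Bᵀ D_X⁻¹ B and lifts each eigenvector to the objects via u = D_X⁻¹ B v / (1 − γ), with γ(2 − γ) = λ.

```python
# Minimal sketch of a U-SPEC / U-SENC style pipeline. Helper names, defaults and the
# selection/search heuristics are assumptions, not the authors' code.
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers


def select_representatives(X, p, oversample=10, seed=0):
    """Hybrid selection (assumed): random candidate pool, then k-means on the pool only."""
    rng = np.random.default_rng(seed)
    pool = X[rng.choice(len(X), size=min(len(X), oversample * p), replace=False)]
    return kmeans(pool, p, seed=seed)[1]


def approx_knr(X, reps, K, seed=0):
    """Coarse-to-fine search for each object's K nearest representatives (assumed scheme)."""
    z = max(1, int(np.sqrt(len(reps))))                  # number of rep-clusters (assumed)
    rep_lab, _ = kmeans(reps, z, seed=seed)
    groups = [np.flatnonzero(rep_lab == c) for c in np.unique(rep_lab)]
    centers = np.array([reps[g].mean(axis=0) for g in groups])
    rr = ((reps[:, None, :] - reps[None]) ** 2).sum(axis=2)
    nbrs = np.argsort(rr, axis=1)[:, : 10 * K]            # each rep's 10K nearest reps
    idx = np.empty((len(X), K), dtype=int)
    d2 = np.empty((len(X), K))
    for i, x in enumerate(X):
        g = groups[((centers - x) ** 2).sum(axis=1).argmin()]  # nearest rep-cluster
        r = g[((reps[g] - x) ** 2).sum(axis=1).argmin()]       # nearest rep inside it
        cand = nbrs[r]                                         # search only r's neighbours
        d = ((reps[cand] - x) ** 2).sum(axis=1)
        order = np.argsort(d)[:K]
        idx[i], d2[i] = cand[order], d[order]
    return idx, d2


def affinity(idx, d2, p):
    """N x p affinity sub-matrix: Gaussian kernel on the K retained entries per row."""
    sigma = max(np.sqrt(d2).mean(), 1e-12)                # bandwidth: mean KNR distance
    B = np.zeros((len(idx), p))                           # dense for clarity; sparse in practice
    rows = np.repeat(np.arange(len(idx)), idx.shape[1])
    B[rows, idx.ravel()] = np.exp(-d2.ravel() / (2 * sigma**2))
    return B


def transfer_cut(B, k, seed=0):
    """Partition the bipartite graph with N x m cross-affinity B into k groups."""
    B = B[:, B.sum(axis=0) > 0]                           # drop isolated Y-side nodes
    dx = np.maximum(B.sum(axis=1), 1e-12)                 # degrees of the X side
    P = B / dx[:, None]                                   # D_X^{-1} B
    Wy = B.T @ P
    Wy = (Wy + Wy.T) / 2                                  # W_Y = B^T D_X^{-1} B, symmetrised
    s = 1.0 / np.sqrt(Wy.sum(axis=1))                     # diagonal of D_Y^{-1/2}
    # L_Y v = lam D_Y v  <=>  (I - D_Y^{-1/2} W_Y D_Y^{-1/2}) w = lam w,  v = D_Y^{-1/2} w
    lam, w = np.linalg.eigh(np.eye(len(s)) - s[:, None] * Wy * s[None, :])
    lam = np.clip(lam[:k], 0.0, 1.0 - 1e-10)
    v = s[:, None] * w[:, :k]
    # gamma(2 - gamma) = lam  =>  1 - gamma = sqrt(1 - lam);  u = P v / (1 - gamma)
    u = (P @ v) / np.sqrt(1.0 - lam)
    return kmeans(u, k, seed=seed)[0]


def uspec(X, k, p=50, K=5, seed=0):
    """U-SPEC-style pipeline: representatives -> approximate KNR -> B -> transfer cut."""
    reps = select_representatives(X, p, seed=seed)
    idx, d2 = approx_knr(X, reps, K, seed=seed)
    return transfer_cut(affinity(idx, d2, p), k, seed=seed)


def usenc(X, k, m=5, seed=0):
    """U-SENC-style ensemble: m U-SPEC runs, then transfer cut on the object-cluster graph."""
    rng = np.random.default_rng(seed)
    blocks = []
    for i in range(m):
        ki = int(rng.integers(k, 3 * k + 1))              # base cluster count (range assumed)
        labels = uspec(X, ki, seed=seed + i + 1)
        blocks.append(np.eye(ki)[labels])                 # one-hot object-to-cluster links
    return transfer_cut(np.hstack(blocks), k, seed=seed)
```

Given an `(N, d)` array `X`, `uspec(X, k)` returns one label per object, and `usenc(X, k)` reuses the same `transfer_cut` on the one-hot object-to-base-cluster matrix, mirroring the consensus step described above. The eigenproblem only involves the p representatives (or the base clusters), which is where the near-linear cost comes from; the dense `B` and the per-object Python loop are simplifications for readability and would not scale to the dataset sizes the abstract reports.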
Benchmarks
| Benchmark | Method | NMI | Runtime (s) |
|---|---|---|---|
| image-document-clustering-on-pendigits | U-SPEC | 0.803 | 1.01 |