https://doi.org/10.1140/epjb/e2008-00381-8
Unsupervised and semi-supervised clustering by message passing: soft-constraint affinity propagation
Institute for Scientific Interchange, Viale Settimio Severo 65, Villa Gualino, 10133 Torino, Italy
Corresponding author: weigt@isiosf.isi.it
Received: 1 April 2008
Revised: 11 September 2008
Published online: 8 October 2008
Soft-constraint affinity propagation (SCAP) is a new statistical-physics-based clustering technique [M. Leone, Sumedha, M. Weigt, Bioinformatics 23, 2708 (2007)]. We first derive a simplified version of the algorithm and discuss time- and memory-efficient implementations. We then analyze the performance of SCAP on artificial data in detail, showing that the algorithm efficiently unveils clustered and hierarchical data structures. Finally, we generalize the algorithm to semi-supervised clustering, where the data are already partially labeled and clustering assigns labels to the previously unlabeled points. SCAP exploits both the geometrical organization of the data and the labels available for a few points in a computationally efficient way, as shown on artificial and biological benchmark data.
PACS: 02.50.Tt – Inference methods / 05.20.-y – Classical statistical mechanics / 89.75.Fb – Structures and organization in complex systems
© EDP Sciences, Società Italiana di Fisica, Springer-Verlag, 2008