SAHN Clustering in Arbitrary Metric Spaces Using Heuristic Nearest Neighbor Search
Sequential agglomerative hierarchical non-overlapping (SAHN) clustering techniques are among the classical clustering methods and are heavily used in many application domains, e.g., in cheminformatics. Asymptotically optimal SAHN clustering algorithms are known for arbitrary dissimilarity measures, but their quadratic time and space complexity, even in the best case, still limits their applicability to small data sets. We present a new pivot-based heuristic SAHN clustering algorithm that exploits the properties of metric distance measures to obtain a best-case running time of O(n log n) for input size n. Our approach requires only linear space and supports median and centroid linkage. It is especially suitable for expensive distance measures, as it needs only a linear number of exact distance computations. In extensive experimental evaluations on real-world and synthetic data sets, we compare our approach to exact state-of-the-art SAHN algorithms in terms of quality and running time. The evaluations show a subquadratic running time in practice and a very low memory footprint.
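The abstract does not spell out how pivots exploit a metric. The following is a minimal sketch of the general idea only, not the paper's algorithm: by the triangle inequality, |d(x,p) - d(y,p)| <= d(x,y) for any pivot p, so precomputed pivot distances give a cheap lower bound that can rule out candidates before an expensive exact distance is computed. The function names, the Euclidean distance, and the choice of pivots are assumptions made for illustration.

```python
import math

def euclidean(a, b):
    # Stand-in for an expensive metric; any true metric works here.
    return math.dist(a, b)

def pivot_table(points, pivots, dist):
    # One exact distance per (point, pivot) pair: linear in len(points)
    # for a constant number of pivots.
    return [[dist(x, p) for p in pivots] for x in points]

def lower_bound(row_x, row_y):
    # Triangle inequality: |d(x,p) - d(y,p)| <= d(x,y) for every pivot p,
    # so the maximum over all pivots is still a valid lower bound.
    return max(abs(a - b) for a, b in zip(row_x, row_y))

def nearest_neighbor(i, points, table, dist):
    # Exact nearest neighbor of points[i]; a candidate is skipped when its
    # lower bound shows it cannot be closer than the best match so far.
    best_j, best_d, exact_calls = None, math.inf, 0
    for j in range(len(points)):
        if j == i:
            continue
        if lower_bound(table[i], table[j]) >= best_d:
            continue  # pruned without an exact distance computation
        d = dist(points[i], points[j])
        exact_calls += 1
        if d < best_d:
            best_j, best_d = j, d
    return best_j, best_d, exact_calls
```

This sketch keeps the search exact and only saves distance computations. A heuristic in the spirit of the abstract would instead accept an approximate answer derived from the pivot distances in exchange for speed.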
- Kriege, Nils M.
- Mutzel, Petra
- Schäfer, Till
Category | Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title | Algorithms and Computation - 8th International Workshop (WALCOM)
Divisions | Data Mining and Machine Learning
Event Location | Chennai, India
Event Type | Workshop
Event Dates | 13.-15.02.2014
Series Name | Lecture Notes in Computer Science
ISSN/ISBN | 978-3-319-04656-3
Publisher | Springer
Page Range | pp. 90-101
Date | 13 February 2014
Official URL | https://doi.org/10.1007/978-3-319-04657-0_11