Utilizing Structure-rich Features to improve Clustering

Abstract

For successful clustering, an algorithm needs to find the boundaries between clusters. This is comparatively easy if the clusters are compact and non-overlapping, so that the boundaries are clearly defined. Features in which the clusters blend into each other, however, hinder clustering methods from correctly estimating these boundaries. Therefore, we aim to extract features that show clear cluster boundaries and thus enhance the cluster structure in the data. Our novel technique creates a condensed version of the data set that contains the structure important for clustering, but without the noise. We demonstrate that this transformed data set is much easier to cluster, both for k-means and for various other algorithms. Furthermore, we introduce a deterministic initialisation strategy for k-means based on these structure-rich features.

Authors
  • Schelling, Benjamin
  • Bauer, Lena G. M.
  • Behzadi, Sahar
  • Plant, Claudia
Shortfacts
Category
Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title
The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2020
Divisions
Data Mining
Event Location
Ghent, Belgium
Event Type
Conference
Event Dates
14-18 September 2020
Date
14 September 2020