Non-linear Cluster Enhancement: Forcing Clusters into a Compact Shape
K-means is one of the most widely used clustering algorithms and is applied in a wide range of settings, but how do we know that a data set is suited for it? K-means makes relatively strict assumptions about the data: clusters should be Gaussian distributed with uniform variance in all directions. These assumptions are rarely satisfied in practice. While clusters that do not deviate too far from these assumptions can be separated with sufficient precision, the farther the data departs from them, the more likely k-means is to fail. Instead of testing whether the assumptions are met before applying k-means, we make them hold. Our goal is to improve the suitability of data sets for k-means and to widen the range of data sets it can be applied to. Our algorithm changes the positions of data points so that the clusters become more compact and thus better fit the requirements of k-means. Based on cluster-wise PCA and local Z-transformation, we estimate the shape of the correct clusters and move the data points so that these clusters become more compact with each iteration and, in the end, have uniform variance, while the distance between clusters increases. We explain the theory behind our approach and validate it with extensive experiments on various real-world data sets.
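The core idea of combining cluster-wise PCA with a local Z-transformation can be illustrated with a minimal sketch. The function below is an assumption-laden reconstruction, not the authors' published algorithm: given points and (estimated) cluster labels, it rotates each cluster into its principal axes and rescales every axis to unit variance, yielding the compact, uniform-variance clusters that k-means assumes.

```python
import numpy as np

def whiten_clusters(X, labels):
    """Illustrative sketch: per-cluster PCA plus local Z-transformation.

    For each cluster, project its points onto the principal axes of the
    cluster covariance (PCA) and rescale every axis to unit variance
    (Z-transformation), so the cluster becomes spherical and compact.
    Hypothetical helper, not the paper's exact iterative procedure.
    """
    X_new = X.astype(float).copy()
    for c in np.unique(labels):
        idx = labels == c
        pts = X_new[idx]
        mean = pts.mean(axis=0)
        centered = pts - mean
        # Cluster-wise PCA: eigendecomposition of the cluster covariance
        cov = np.cov(centered, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        # Rotate into the principal axes, rescale each axis to unit
        # variance (guarding against degenerate directions), shift back
        rotated = centered @ eigvecs
        scaled = rotated / np.sqrt(np.maximum(eigvals, 1e-12))
        X_new[idx] = scaled + mean
    return X_new
```

Applied to an elongated Gaussian cluster, this transformation equalizes the spread along all directions, which is exactly the property k-means' uniform-variance assumption requires.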
Authors |
Schelling, Benjamin; Miklautz, Lukas; Plant, Claudia |
Category | Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title | 24th European Conference on Artificial Intelligence
Divisions | Data Mining and Machine Learning
Event Location | Santiago de Compostela, Spain (Online Conference)
Event Type | Conference
Event Dates | 29 Aug - 08 Sep 2020
Page Range | pp. 1451-1458
Date | 2020