Non-linear Cluster Enhancement: Forcing Clusters into a Compact Shape
K-means is one of the most widely used clustering algorithms and is applied in a wide range of settings, but how do we know that a data set is suited for it? K-means makes relatively strict assumptions about the data: clusters should be Gaussian distributed with uniform variance in all directions. These assumptions are rarely satisfied in practice. While clusters that do not deviate too far from these assumptions can be separated with sufficient precision, the farther the data departs from them, the more likely k-means is to fail. Instead of testing whether the assumptions are met before applying k-means, we make them hold. Our goal is to improve the suitability of data sets for k-means and to widen the range of data sets it can be applied to. Our algorithm changes the positions of data points so that the clusters become more compact and thus better fit the requirements of k-means. Based on cluster-wise PCA and local Z-transformation, we estimate the shape of the correct clusters and move the data points so that these clusters become more compact with each iteration and, in the end, have uniform variance, while the distance between clusters increases. We explain the theory behind our approach and validate it with extensive experiments on various real-world data sets.
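The core idea of combining cluster-wise PCA with a local Z-transformation can be illustrated with a minimal sketch. The function below is an assumption-laden reconstruction, not the authors' published algorithm: given points and (estimated) cluster labels, it rotates each cluster into its principal axes and rescales every axis to unit variance, yielding the compact, uniform-variance clusters that k-means assumes.

```python
import numpy as np

def whiten_clusters(X, labels):
    """Illustrative sketch: per-cluster PCA plus local Z-transformation.

    For each cluster, project its points onto the principal axes of the
    cluster covariance (PCA) and rescale every axis to unit variance
    (Z-transformation), so the cluster becomes spherical and compact.
    Hypothetical helper, not the paper's exact iterative procedure.
    """
    X_new = X.astype(float).copy()
    for c in np.unique(labels):
        idx = labels == c
        pts = X_new[idx]
        mean = pts.mean(axis=0)
        centered = pts - mean
        # Cluster-wise PCA: eigendecomposition of the cluster covariance
        cov = np.cov(centered, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        # Rotate into the principal axes, rescale each axis to unit
        # variance (guarding against degenerate directions), shift back
        rotated = centered @ eigvecs
        scaled = rotated / np.sqrt(np.maximum(eigvals, 1e-12))
        X_new[idx] = scaled + mean
    return X_new
```

Applied to an elongated Gaussian cluster, this transformation equalizes the spread along all directions, which is exactly the property k-means' uniform-variance assumption requires.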
Authors |
Schelling, Benjamin; Miklautz, Lukas; Plant, Claudia |
Category | Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title | 24th European Conference on Artificial Intelligence
Divisions | Data Mining and Machine Learning
Event Location | Santiago de Compostela, Spain (Online Conference)
Event Type | Conference
Event Dates | 29 Aug - 08 Sep 2020
Page Range | pp. 1451-1458
Date | 2020