Non-linear Cluster Enhancement: Forcing Clusters into a compact shape

Abstract

K-means is one of the most widely used clustering algorithms and is applied in a wide range of settings, but how do we know whether a data set is suited for it? K-means makes relatively strict assumptions about the data: clusters should be Gaussian distributed with uniform variance in all directions. These assumptions are rarely satisfied in real data sets. While clusters that do not deviate too far from these assumptions can still be extracted with sufficient precision, the farther the data departs from them, the more likely k-means is to fail. Instead of testing whether the assumptions are met and k-means can be applied, we make it so. Our goal is to improve the suitability of data sets for k-means and to widen the range of data sets it can be applied to. Our algorithm changes the positions of data points so that the clusters become more compact and thus better fit the requirements of k-means. Based on cluster-wise PCA and a local Z-transformation, we estimate the shape of the correct clusters and move the data points so that these clusters become more compact with each iteration and, in the end, have uniform variance, while the distance between clusters increases. We explain the theory behind our approach and validate it with extensive experiments on various real-world data sets.
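
The core idea in the abstract (cluster-wise PCA combined with a Z-transformation that equalizes variance along the principal axes) can be sketched roughly as below. This is only an illustrative assumption, not the authors' published procedure: the function compact_clusters_once, the shrink parameter, and the use of a plain k-means assignment for the tentative clusters are hypothetical stand-ins for the paper's iterative, local update.

    # Illustrative sketch only: reshape each tentative cluster toward uniform
    # variance by standardizing it along its principal components, then move
    # the points part of the way toward that compact shape.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    def compact_clusters_once(X, n_clusters, shrink=0.5, random_state=0):
        """One hypothetical enhancement step: cluster-wise PCA plus a
        Z-transformation along the principal axes."""
        X = np.asarray(X, dtype=float)
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=random_state).fit_predict(X)
        X_new = X.copy()
        for c in np.unique(labels):
            members = labels == c
            Xc = X[members]
            if Xc.shape[0] < 2:
                continue
            center = Xc.mean(axis=0)
            pca = PCA().fit(Xc)
            # Project onto the principal axes and divide by the per-axis
            # standard deviation (Z-transformation): unit variance everywhere.
            Z = pca.transform(Xc) / (np.sqrt(pca.explained_variance_) + 1e-12)
            target = Z @ pca.components_ + center
            # Move points only part of the way, mimicking a gradual iteration.
            X_new[members] = (1.0 - shrink) * Xc + shrink * target
        return X_new, labels

    # Usage sketch: repeat the step a few times so clusters grow more compact.
    # X_enh = X
    # for _ in range(10):
    #     X_enh, labels = compact_clusters_once(X_enh, n_clusters=3)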

Authors
  • Schelling, Benjamin
  • Miklautz, Lukas
  • Plant, Claudia
Shortfacts
Category
Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title
24th European Conference on Artificial Intelligence
Divisions
Data Mining and Machine Learning
Event Location
Santiago de Compostela, Spain (Online Conference)
Event Type
Conference
Event Dates
29 Aug - 08 Sep 2020
Page Range
pp. 1451-1458
Date
2020