k-SubMix: Common Subspace Clustering on Mixed-Type Data

Abstract

Clustering heterogeneous data is an ongoing challenge in the data mining community. The most prevalent clustering methods are designed to process datasets with numerical features only, but often datasets consist of mixed numerical and categorical features. This requires new approaches capable of handling both kinds of data types. Further, the most relevant cluster structures are often hidden in only a few features. Thus, another key challenge is to detect those specific features automatically and abandon features not relevant for clustering. This paper proposes the subspace mixed-type clustering algorithm k-SubMix, which tackles both challenges. Its cost function can handle both numerical and categorical features while simultaneously identifying those with the biggest impact for a high-quality clustering result. Unlike other subspace mixed-type clustering methods, k-SubMix preserves inter-cluster comparability, as it is the first mixed-type approach that defines a common subspace for all clusters. Extensive experiments show that k-SubMix outperforms competitive methods and reduces the data’s complexity by a simultaneous dimensionality reduction.

Authors

Klein, Mauritius
Leiber, Collin
Böhm, Christian

Shortfacts

Category	Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title	Machine Learning and Knowledge Discovery in Databases: Research Track - European Conference, ECML PKDD 2023, Turin, Italy, September 18-22, 2023, Proceedings, Part I
Divisions	Data Mining and Machine Learning
Event Location	Torino, Italy
Event Type	Conference
Event Dates	18-22 Sep 2023
Series Name	Lecture Notes in Computer Science
Publisher	Springer
Page Range	pp. 662-677
Date	2023
Official URL	https://doi.org/10.1007/978-3-031-43412-9_39