An Execution Framework for Grid-Clustering Methods
Cluster analysis methods have proven extremely valuable for exploratory data analysis and are fundamental to data mining methods. The goal of cluster analysis is to find correlations in the value space and to separate the data values into a priori unknown subgroups based on a similarity metric. For high-dimensional data, i.e. data with a large number of describing attributes, clustering can become a very time-consuming task, which in practice often limits the number of observations that can be clustered. To overcome this problem, grid clustering methods have been developed. They do not calculate similarity values between each pair of data values, but instead organize the value space surrounding the data values, e.g. by means of specific index data structures. In this paper we present a framework for evaluating different data structures for generating a multi-dimensional grid structure that groups the data values into blocks. The values are then clustered by a topological neighbor search algorithm on the basis of the block structure. As the first data structure to be evaluated, we present the BANG file structure and show its feasibility as a clustering index. We plan to contribute the developed framework as a package to the WEKA software.
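The two steps named in the abstract, grouping values into blocks and then clustering by a neighbor search over the block structure, can be illustrated with a minimal sketch. It is not the paper's framework and does not implement the BANG file: it assumes a uniform grid with a fixed cell size as a stand-in for the index structure, and it treats two occupied blocks as neighbors when their cell coordinates differ by at most one in every dimension. The names `assign_blocks`, `cluster_blocks`, `grid_cluster` and the parameter `cell_size` are hypothetical.

```python
from collections import deque
from itertools import product
from math import floor


def assign_blocks(points, cell_size):
    """Map each grid cell (tuple of integer coordinates) to the indices of its points."""
    blocks = {}
    for idx, point in enumerate(points):
        cell = tuple(floor(x / cell_size) for x in point)
        blocks.setdefault(cell, []).append(idx)
    return blocks


def cluster_blocks(blocks):
    """Label occupied cells; adjacent occupied cells receive the same label (BFS)."""
    labels = {}
    next_label = 0
    for start in blocks:
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            # Assumption: neighbors are cells whose coordinates differ by at most 1.
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbor = tuple(c + o for c, o in zip(cell, offset))
                if neighbor in blocks and neighbor not in labels:
                    labels[neighbor] = next_label
                    queue.append(neighbor)
        next_label += 1
    return labels


def grid_cluster(points, cell_size):
    """Return one cluster label per point, derived from the label of its block."""
    blocks = assign_blocks(points, cell_size)
    cell_labels = cluster_blocks(blocks)
    point_labels = [None] * len(points)
    for cell, members in blocks.items():
        for idx in members:
            point_labels[idx] = cell_labels[cell]
    return point_labels


if __name__ == "__main__":
    sample = [(0.1, 0.2), (0.4, 0.3), (1.2, 0.9), (5.0, 5.1), (5.6, 5.4)]
    print(grid_cluster(sample, cell_size=1.0))
```

No pairwise similarity between data values is computed here; only the occupied blocks are compared. Enumerating all neighbor offsets grows as 3^d with the number of dimensions d, so this sketch is only practical for low-dimensional data.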
- Schikuta, Erich
- Fritz, Florian
Category: Paper in Conference Proceedings or in Workshop Proceedings (Full Paper in Proceedings)
Event Title: International Conference on Computational Science (ICCS 2016)
Divisions: Workflow Systems and Technology
Subjects: Applied Computer Science (Angewandte Informatik)
Event Location: San Diego, USA
Event Type: Conference
Event Dates: 6-8 June, 2016
Date: June 2016