An Execution Framework for Grid-Clustering Methods

Abstract

Cluster analysis methods have proven extremely valuable for exploratory data analysis and are also fundamental to data mining. The goal of cluster analysis is to find correlations in the value space and to separate the data values into an a priori unknown set of subgroups based on a similarity metric. For high-dimensional data, i.e. data with a large number of describing attributes, clustering can become a very time-consuming task, which in practice often limits the number of observations that can be clustered. To overcome this problem, grid clustering methods have been developed, which do not calculate similarity values between the individual data values but instead organize the value space surrounding the data values, e.g. by means of specific index data structures. In this paper we present a framework that allows different data structures to be evaluated for the generation of a multi-dimensional grid structure that groups the data values into blocks. The values are then clustered by a topological neighbor search algorithm on the basis of the block structure. As the first data structure to be evaluated, we present the BANG file structure and show its feasibility as a clustering index. The developed framework is planned to be contributed as a package to the WEKA software.
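
The two steps named in the abstract (grouping values into blocks, then clustering by a neighbor search over the blocks) can be illustrated with a minimal Python sketch. It is not the framework described in the paper: it assumes a fixed, regular grid instead of the BANG file structure, and the function name `grid_cluster` and the parameters `cell_size` and `min_points` are hypothetical choices made for this illustration only.

```python
from collections import defaultdict, deque
from itertools import product


def grid_cluster(points, cell_size, min_points=1):
    """Label points by merging adjacent, sufficiently dense grid blocks.

    Assumption: a regular grid with equal cell width in every dimension,
    used here as a simple stand-in for an index-based block structure.
    Returns one label per point; -1 marks points in sparse blocks.
    """
    # Step 1: assign each point to the block (grid cell) that contains it.
    blocks = defaultdict(list)
    for idx, point in enumerate(points):
        key = tuple(int(coord // cell_size) for coord in point)
        blocks[key].append(idx)

    # Only blocks holding at least min_points values take part in clustering.
    dense = {key for key, members in blocks.items() if len(members) >= min_points}

    # Step 2: breadth-first neighbor search over the blocks, so that
    # adjacent dense blocks end up in the same cluster.
    labels = [-1] * len(points)
    visited = set()
    cluster_id = 0
    for start in sorted(dense):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for idx in blocks[cell]:
                labels[idx] = cluster_id
            # Visit every block that touches the current one (including diagonals).
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbor = tuple(c + o for c, o in zip(cell, offset))
                if neighbor in dense and neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
        cluster_id += 1
    return labels
```

No pairwise similarity between data values is computed here; the cost depends on the number of occupied blocks. Enumerating all 3^d neighbor offsets is only practical for a small number of dimensions d, which is one reason an index structure over the blocks is attractive for high-dimensional data.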

Authors

Schikuta, Erich
Fritz, Florian

Shortfacts

Category	Paper in Conference Proceedings or in Workshop Proceedings (Full Paper in Proceedings)
Event Title	International Conference on Computational Science (ICCS 2016)
Divisions	Workflow Systems and Technology
Subjects	Applied Computer Science
Event Location	San Diego, USA
Event Type	Conference
Event Dates	6-8 June, 2016
Date	June 2016