An Execution Framework for Grid-Clustering Methods


Abstract

Cluster analysis methods have proven extremely valuable for exploratory data analysis and are also fundamental to data mining methods. The goal of cluster analysis is to find correlations in the value space and to separate the data values into an a priori unknown set of subgroups based on a similarity metric. For high-dimensional data, i.e. data with a large number of describing attributes, clustering can become very time-consuming, which in practice often limits the number of observations that can be clustered. To overcome this problem, grid clustering methods have been developed. Instead of computing similarity values between every pair of data values, they organize the value space surrounding the data values, e.g. by means of specific index data structures. In this paper we present a framework for evaluating different data structures that generate a multi-dimensional grid structure grouping the data values into blocks. The values are then clustered by a topological neighbor search algorithm on the basis of this block structure. As the first data structure to be evaluated, we present the BANG file structure and show its feasibility as a clustering index. The developed framework is planned to be contributed as a package to the WEKA software.
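The abstract describes grid clustering in two steps: the value space is partitioned into blocks, and the values are then clustered by a topological neighbor search over those blocks. The following is a minimal Python sketch of that idea under simplifying assumptions. It uses a fixed uniform grid (block width `cell_size`) rather than the adaptive BANG file partitioning, treats a block as dense when it holds at least `min_points` values, and considers only face-adjacent blocks as neighbors. The function names and parameters (`assign_to_grid`, `face_neighbors`, `grid_cluster`) are hypothetical and are not part of the framework's or WEKA's API.

```python
import math
from collections import defaultdict, deque
from typing import Dict, List, Sequence, Set, Tuple

Cell = Tuple[int, ...]  # integer coordinates of a grid block (hypothetical uniform grid)


def assign_to_grid(points: Sequence[Sequence[float]], cell_size: float) -> Dict[Cell, List[int]]:
    """Map every point to the uniform grid block that contains it (block -> point indices)."""
    blocks: Dict[Cell, List[int]] = defaultdict(list)
    for i, p in enumerate(points):
        cell = tuple(math.floor(x / cell_size) for x in p)
        blocks[cell].append(i)
    return dict(blocks)


def face_neighbors(cell: Cell) -> List[Cell]:
    """Return the 2*d blocks that share a face with `cell` in d dimensions."""
    result: List[Cell] = []
    for dim in range(len(cell)):
        for step in (-1, 1):
            n = list(cell)
            n[dim] += step
            result.append(tuple(n))
    return result


def grid_cluster(points: Sequence[Sequence[float]], cell_size: float, min_points: int) -> List[int]:
    """Label each point with a cluster id; points in sparse blocks get -1 (noise)."""
    blocks = assign_to_grid(points, cell_size)
    # All blocks have equal volume here, so a count threshold equals a density threshold.
    dense: Set[Cell] = {c for c, idx in blocks.items() if len(idx) >= min_points}
    labels = [-1] * len(points)
    visited: Set[Cell] = set()
    cluster_id = 0
    for start in sorted(dense):
        if start in visited:
            continue
        # Breadth-first neighbor search: every dense block reachable from `start`
        # through face-adjacent dense blocks joins the same cluster.
        visited.add(start)
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for i in blocks[cell]:
                labels[i] = cluster_id
            for n in face_neighbors(cell):
                if n in dense and n not in visited:
                    visited.add(n)
                    queue.append(n)
        cluster_id += 1
    return labels


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2),   # block (0, 0)
           (1.1, 0.2), (1.3, 0.4), (1.2, 0.1),   # block (1, 0), adjacent to (0, 0)
           (5.0, 5.0), (5.1, 5.2), (5.2, 5.1),   # block (5, 5), isolated
           (9.0, 0.0)]                            # block (9, 0), sparse
    print(grid_cluster(pts, cell_size=1.0, min_points=2))
    # [0, 0, 0, 0, 0, 0, 1, 1, 1, -1]
```

Restricting the search to the 2d face-adjacent blocks keeps the per-block work linear in the dimension d, whereas also visiting diagonal neighbors would mean examining up to 3^d - 1 blocks per cell, which matters for the high-dimensional data the paper targets. Note that no pairwise distances between data values are computed; only block occupancy and block adjacency are used.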

Authors
  • Schikuta, Erich
  • Fritz, Florian
Short Facts
Category: Paper in Conference Proceedings or in Workshop Proceedings (Full Paper in Proceedings)
Event Title: International Conference on Computational Science (ICCS 2016)
Divisions: Workflow Systems and Technology
Subjects: Applied Computer Science
Event Location: San Diego, USA
Event Type: Conference
Event Dates: 6-8 June 2016
Date: June 2016