Modeling and optimizing large-scale data flows

Modeling and optimizing large-scale data flows

Abstract

Modern scientific collaborations require the large-scale integration of diverse processes. Higher-level dataflow languages are used on top of parallel and distributed dataflow systems to speed up the development of data-intensive workflow programs, ease their optimization, and make the code more maintainable. In this paper, we present the rationale, design, and application of the advanced support needed for modeling and optimizing data flows in data mining and integration processes. Our optimization research and development is based on pre-execution modeling of data flows and on extending the registry of process activities with advanced annotations. We also describe the overall process of transforming a dynamic model into a static model that serves as input to the optimization algorithms. This novel approach is implemented both within an advanced graphical user interface, called the Process Designer, to support semi-automatic optimization, and within a dataflow execution platform, called the Gateway; it can be adapted to any dataflow language implementation. The Process Designer architecture, based on modern (meta-)modeling concepts, naturally supports validated transformations between the external textual and internal graphical representations of the targeted dataflow language, and thereby significantly increases the productivity and robustness of the implementation processes.

Authors
  • Wöhrer, Alexander
  • Brezany, Peter
  • Janciak, Ivan
  • Mehofer, Eduard
Shortfacts
Category
Journal Paper
Divisions
Scientific Computing
Journal or Publication Title
Future Generation Computer Systems: the international journal of grid computing: theory, methods and applications
ISSN
0167-739X
Publisher
Elsevier
Page Range
pp. 12-27
Volume
31
Date
February 2014