A Flexible Algorithmic Approach for Identifying Conflicting/Deviating Data on the Web


Abstract

Information on the Web often contains contradictions and conflicting data. This degrades the quality of data sources and the quality of search and retrieval results. Appropriate techniques therefore need to be developed and integrated into the infrastructure used for retrieving and browsing data sources. Such techniques should detect conflicting data so that it can be removed, blocked, or highlighted to users, improving the quality of the content they consume. This paper proposes an approach for detecting conflicting data by investigating deviations between values available from structured data on the Web. The approach consists of three phases. First, data from the targeted sources are pre-processed so that they become comparable. Second, the Levenshtein distance is computed between data elements to represent their degree of conflict. Third, the cosine similarity is computed between vectors of Levenshtein distance values and a user-configurable sensitivity vector, which encodes the characteristics of the specific kind of conflict under investigation; this yields a ranked detection of the conflicting data. The algorithm has been applied to and tested on a collection of movie data from the Web, illustrating how the technique can detect conflicting information on the Web.
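The three phases described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The normalization in `preprocess`, the layout of one distance per attribute, and the helper names `preprocess`, `levenshtein`, `cosine_similarity` and `rank_conflicts` are all hypothetical. Each entity present in both sources yields one vector of Levenshtein distances. That vector is scored against the sensitivity vector by cosine similarity, and entities are ranked by score.

```python
import math
from typing import Dict, List, Sequence, Tuple


def preprocess(value: str) -> str:
    """Phase 1 (assumed normalization): lowercase and collapse whitespace."""
    return " ".join(value.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Phase 2: classic dynamic-programming edit distance (unit costs)."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Phase 3: cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        # No deviation at all (or an all-zero sensitivity vector):
        # cosine is undefined, so this sketch scores it as 0.0.
        return 0.0
    return dot / (norm_u * norm_v)


def rank_conflicts(
    source_a: Dict[str, Dict[str, str]],
    source_b: Dict[str, Dict[str, str]],
    attributes: List[str],
    sensitivity: Sequence[float],
) -> List[Tuple[str, float, List[int]]]:
    """Hypothetical driver: score every entity found in both sources.

    Returns (entity_key, score, distance_vector) tuples, highest score first.
    """
    results = []
    # Iterate over shared keys in sorted order so ties rank deterministically.
    for key in sorted(source_a.keys() & source_b.keys()):
        distances = [
            levenshtein(preprocess(source_a[key][attr]),
                        preprocess(source_b[key][attr]))
            for attr in attributes
        ]
        results.append((key, cosine_similarity(distances, sensitivity), distances))
    results.sort(key=lambda r: r[1], reverse=True)  # stable sort keeps key order on ties
    return results
```

For example, consider a sensitivity vector of `[0, 1, 0]` over the attributes `["title", "year", "director"]`. A movie whose two records differ only in the year scores 1.0, while one that differs only in the title scores 0.0. Because cosine similarity ignores vector length, the score reflects which attributes deviate rather than by how much. A variant that cares about magnitude would need an additional threshold or normalization.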

Authors
  • Jnoub, Nour
  • Klas, Wolfgang
  • Kalchgruber, Peter
  • Momeni Roochi, Elaheh
Shortfacts
Category
Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title
8th International Conference on Computer, Information and Telecommunication Systems 2018
Divisions
Multimedia Information Systems
Subjects
Web development, Web applications
Multimedia
Event Location
France, Colmar
Event Type
Conference
Event Dates
July 2018
Series Name
International Conference on Computer, Information and Telecommunication Systems
Page Range
-5
Date
11 July 2018