Representative Skylines using Threshold-based Preference Distributions

Abstract

The study of skylines and their variants has received considerable attention in recent years. Skylines are essentially sets of most interesting (undominated) tuples in a database. However, since the skyline is often very large, much research effort has been devoted to identifying a smaller subset of (say k) “representative skyline” points. Several different definitions of representative skylines have been considered. Most of these formulations are intuitive in that they try to achieve some kind of clustering “spread” over the entire skyline, with k points. In this work, we take a more principled approach in defining the representative skyline objective. One of our main contributions is to formulate the problem of displaying k representative skyline points such that the probability that a random user would click on one of them is maximized. Two major research questions arise naturally from this formulation. First, how does one mathematically model the likelihood with which a user is interested in and will “click” on a certain tuple? Second, how does one negotiate the absence of the knowledge of an explicit set of target users; in particular what do we mean by “a random user”? To answer the first question, we model users based on a novel formulation of threshold preferences which we will motivate further in the paper. To answer the second question, we assume a probability distribution of users instead of a fixed set of users. While this makes the problem harder, it lends more mathematical structures that can be exploited as well, as one can now work with probabilities of thresholds and handle cumulative density functions. On the theoretical front, our objective is NP-hard. For the case of a finite set of users with known thresholds, we present a simple greedy algorithm that attains an approximation ratio of (1 − 1/e) of the optimal. For the case of user distributions, we show that a careful yet similar greedy algorithm achieves the same approximation ratio. Unfortunately, it turns out that this algorithm is rather involved and computationally expensive. So we present a threshold sampling based algorithm that is more computationally affordable and, for any fixed ε > 0, has an approximation ratio of (1 − 1/e − ε). We perform experiments on both real and synthetic data to show that our algorithm significantly outperforms previously proposed approaches.
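
The greedy idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not spell out the threshold model, so the code below rests on an assumption: each user is a vector of per-attribute thresholds (larger values are better) and "clicks" on a tuple only if the tuple meets every threshold. Under that assumption, choosing k points for a finite set of users is a maximum-coverage problem, and the standard greedy rule (repeatedly pick the point that satisfies the most still-unsatisfied users) is what is shown here. The names `satisfies`, `greedy_representatives`, `coverage`, and `sample_thresholds` are hypothetical, and the uniform sampler only stands in for whatever user distribution is actually of interest; this is not the paper's implementation.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def satisfies(point: Point, threshold: Point) -> bool:
    """Assumed click model: the point meets every per-attribute threshold."""
    return all(p >= t for p, t in zip(point, threshold))


def greedy_representatives(
    skyline: Sequence[Point], thresholds: Sequence[Point], k: int
) -> List[int]:
    """Pick up to k skyline indices by greedy maximum coverage of users."""
    uncovered = set(range(len(thresholds)))
    chosen: List[int] = []
    for _ in range(min(k, len(skyline))):
        best_idx, best_gain = None, 0
        for i, pt in enumerate(skyline):
            if i in chosen:
                continue
            # Marginal gain: users not yet satisfied that this point satisfies.
            gain = sum(1 for u in uncovered if satisfies(pt, thresholds[u]))
            if gain > best_gain:
                best_idx, best_gain = i, gain
        if best_idx is None:  # no remaining point satisfies any new user
            break
        chosen.append(best_idx)
        uncovered -= {
            u for u in uncovered if satisfies(skyline[best_idx], thresholds[u])
        }
    return chosen


def coverage(
    skyline: Sequence[Point], thresholds: Sequence[Point], chosen: Sequence[int]
) -> float:
    """Fraction of users satisfied by at least one chosen point."""
    if not thresholds:
        return 0.0
    hit = sum(
        1 for t in thresholds if any(satisfies(skyline[i], t) for i in chosen)
    )
    return hit / len(thresholds)


def sample_thresholds(n: int, dims: int, rng: random.Random) -> List[Point]:
    """Draw n threshold vectors from a placeholder distribution (uniform on [0, 1])."""
    return [tuple(rng.random() for _ in range(dims)) for _ in range(n)]
```

For the distribution setting, one would draw a sample with `sample_thresholds` (or a sampler for the real distribution) and pass it to `greedy_representatives` as if it were a finite user set; `coverage` on the sample then estimates the click probability of the chosen points. How many samples are needed for a given ε is part of the paper's analysis and is not reproduced here.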

Additional Information

Work done while at Georgia Tech, Atlanta, USA

Authors

Das Sarma, Atish
Lall, Ashwin
Nanongkai, Danupon
Lipton, Richard J.
Xu, Jim

Shortfacts

Category	Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title	International Conference on Data Engineering
Divisions	Theory and Applications of Algorithms
Event Location	Hannover, Germany
Event Type	Conference
Event Dates	11-16 April, 2011
Publisher	IEEE/ACM
Date	2011