Characterising dataset search—An analysis of search logs and data requests

Abstract

Large amounts of data are becoming increasingly available online. To benefit from them, we need tools that retrieve the datasets most relevant to one's data needs. Several vocabularies have been developed to describe datasets in order to increase their discoverability, but it is costly and cumbersome for data publishers to annotate datasets using all of them, which raises the question of which properties are most important. In this work we contribute a systematic study of the patterns and specific attributes that data consumers use to search for data, and of how dataset search compares with general web search. We performed a query log analysis based on logs from four national open data portals and conducted a qualitative analysis of user data requests issued to one of them. Search queries issued on data portals differ from those issued to web search engines in their length, topic, and structure. Based on our findings we hypothesise that portals' search functionalities are currently used in an exploratory manner, rather than to retrieve a specific resource. In our study of data requests we found that geospatial and temporal attributes, as well as information on the required granularity of the data, are the most common features. The findings of both analyses suggest that these features are of higher importance in dataset retrieval than in general web search, and that dataset publishers should therefore focus their efforts on generating dataset descriptions that include them.

Authors
  • Kacprzak, Emilia
  • Koesten, Laura
  • Ibáñez, Luis-Daniel
  • Blount, Tom
  • Tennison, Jeni
  • Simperl, Elena
Shortfacts
Category: Journal Paper
Divisions: Visualization and Data Analysis
Journal or Publication Title: Journal of Web Semantics
ISSN: 1570-8268
Page Range: pp. 37-55
Volume: 55
Date: March 2019