Examining the Intra-Location Differences Among Twitter Samples

Abstract

In this paper, we explore Twitter data samples collected from five different geographical locations. For each of these geographical locations, we compare variations occurring within samples collected simultaneously from two different machines running Twitter API clients. In addition, we split the collected data samples into “complete” and “incomplete” datasets. An incomplete dataset is a collection of Twitter messages where at least one machine received a smaller data sample due to some interruption. A complete dataset is one that includes all tweets that Twitter’s API delivers for a particular set of search parameters. Our findings indicate that 86 of the complete samples show some variations in the attribute values attached to extracted tweets. While the complete datasets show comparable attribute values and network characteristics, the incomplete data samples exhibit substantial differences. We arrive at recommendations for researchers on Online Social Networks on how to mine Twitter data while mitigating these risks.

Authors
  • Ivanova, Rositsa
  • Kusen, Ema
  • Sobernig, Stefan
Shortfacts
Category
Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title
Proceedings of the 8th International Conference on Complexity, Future Information Systems and Risk COMPLEXIS - Volume 1
Divisions
Security and Privacy
Subjects
Applied Computer Science
Event Location
Lisbon, Portugal
Event Type
Conference
Event Dates
22-23 Apr 2023
Publisher
SciTePress
Page Range
pp. 94-101
Date
2023