Snooping Wikipedia Vandals with MapReduce

Abstract :

In this paper, we present and validate an algorithm able to accurately identify anomalous behaviors on online and collaborative social networks, based on users' interactions with other users. We focus on Wikipedia, where accurate ground truth for the classification of vandals can be reliably gathered by manual inspection of the page edit history. We develop distributed crawler and classifier tasks, both implemented in MapReduce, with which we are able to explore a very large dataset, consisting of over 5 million articles collaboratively edited by 14 million authors, resulting in over 8 billion pairwise interactions. We represent Wikipedia as a signed network, where positive arcs imply constructive interaction between editors. We then isolate a set of high reputation editors (i.e., nodes having many positive incoming links) and classify the remaining ones based on their interactions with high reputation editors. We demonstrate that our approach is not only practically relevant (due to the size of our dataset), but also feasible (as it requires few MapReduce iterations) and accurate (over 95% true positive rate). At the same time, we are able to classify only about half of the dataset editors (recall of 50%), a limitation for which we outline some solutions under study.
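
The two-stage classification described in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' MapReduce implementation: the edge format `(source, target, sign)`, the threshold `min_positive_in`, the sum-of-signs decision rule, and the label names are all assumptions made for illustration. Each loop is written as a count or sum per key, which is the kind of aggregation a MapReduce job would distribute.

```python
from collections import defaultdict


def positive_in_degree(edges):
    """Count positive incoming arcs per editor (a reduce-by-key on the target)."""
    counts = defaultdict(int)
    for _source, target, sign in edges:
        if sign > 0:
            counts[target] += 1
    return counts


def classify(edges, min_positive_in=2):
    """Return (trusted, labels) for a signed interaction network.

    edges: iterable of (source, target, sign) with sign in {+1, -1}.
    min_positive_in: hypothetical threshold defining "high reputation".
    """
    edges = list(edges)

    # Stage 1: high reputation editors have many positive incoming arcs.
    in_pos = positive_in_degree(edges)
    trusted = {e for e, c in in_pos.items() if c >= min_positive_in}

    # Stage 2: score every other editor by the signs of the arcs
    # they receive from high reputation editors (assumed decision rule).
    score = defaultdict(int)
    for source, target, sign in edges:
        if source in trusted and target not in trusted:
            score[target] += sign

    editors = {e for s, t, _ in edges for e in (s, t)}
    labels = {}
    for editor in editors - trusted:
        total = score.get(editor, 0)
        if total > 0:
            labels[editor] = "benign"
        elif total < 0:
            labels[editor] = "vandal"
        else:
            # No arcs from trusted editors, or a tie: left unclassified.
            labels[editor] = "unclassified"
    return trusted, labels


# Toy network: c and d each receive two positive arcs, so they are trusted.
edges = [
    ("a", "c", +1), ("b", "c", +1),
    ("a", "d", +1), ("b", "d", +1),
    ("c", "v", -1), ("d", "v", -1),
    ("c", "g", +1),
    ("x", "y", +1),
]
trusted, labels = classify(edges)
```

In this toy network, editors such as `x` and `y` never interact with a high reputation editor and therefore stay unclassified, which mirrors why a scheme of this kind can label only part of the editor population.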

Metadata

https://hal-imt.archives-ouvertes.fr/hal-01279007
Contributor : Admin Télécom Paristech
Submitted on : Thursday, February 25, 2016 - 12:52:49 PM
Last modification on : Thursday, October 17, 2019 - 12:37:01 PM

Identifiers

  • HAL Id : hal-01279007, version 1

Citation

Michele Spina, D. Rossi, Mauro Sozio, Silviu Maniu, Bogdan Cautis. Snooping Wikipedia Vandals with MapReduce. IEEE ICC, Feb 2015, London, United Kingdom. ⟨hal-01279007⟩
