Snooping Wikipedia Vandals with MapReduce

Abstract:

In this paper, we present and validate an algorithm that accurately identifies anomalous behavior in online collaborative social networks, based on users' interactions with their peers. We focus on Wikipedia, where accurate ground truth for the classification of vandals can be reliably gathered by manual inspection of the page edit history. We develop distributed crawler and classifier tasks, both implemented in MapReduce, with which we explore a very large dataset consisting of over 5 million articles collaboratively edited by 14 million authors, resulting in over 8 billion pairwise interactions. We represent Wikipedia as a signed network, where positive arcs imply constructive interaction between editors. We then isolate a set of high-reputation editors (i.e., nodes having many positive incoming links) and classify the remaining ones based on their interactions with high-reputation editors. We demonstrate that our approach is not only practically relevant (due to the size of our dataset), but also feasible (as it requires few MapReduce iterations) and accurate (over 95% true positive rate). However, we are able to classify only about half of the editors in the dataset (a recall of 50%), a limitation for which we outline some solutions under study.
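
The two-step classification described in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' MapReduce implementation: the function name `classify_editors`, the threshold `min_positive_in`, the arc direction (arcs pointing from a high-reputation editor to the editor being judged), and the sign-sum decision rule are all assumptions made for illustration only.

```python
from collections import defaultdict


def classify_editors(arcs, min_positive_in=2):
    """Toy sketch of reputation-based classification on a signed network.

    arcs: list of (source, target, sign) tuples with sign in {+1, -1}.
    min_positive_in: hypothetical threshold for "many positive incoming links".
    """
    # Pass 1: count positive incoming arcs per editor
    # (in MapReduce terms: emit (target, 1) per positive arc, then sum).
    positive_in = defaultdict(int)
    editors = set()
    for src, dst, sign in arcs:
        editors.update((src, dst))
        if sign > 0:
            positive_in[dst] += 1

    # High-reputation editors: enough positive incoming arcs.
    trusted = {e for e in editors if positive_in.get(e, 0) >= min_positive_in}

    # Pass 2: score every other editor using only arcs that originate
    # from high-reputation editors (assumed decision rule: sum of signs).
    score = defaultdict(int)
    for src, dst, sign in arcs:
        if src in trusted and dst not in trusted:
            score[dst] += sign

    labels = {}
    for editor in editors - trusted:
        s = score.get(editor, 0)
        if s > 0:
            labels[editor] = "benign"
        elif s < 0:
            labels[editor] = "vandal"
        else:
            # No (or balanced) interaction with trusted editors.
            labels[editor] = "unclassified"
    return trusted, labels


if __name__ == "__main__":
    example_arcs = [
        ("a", "h", +1), ("b", "h", +1),  # "h" gains reputation
        ("h", "v", -1),                  # "h" interacts negatively with "v"
        ("h", "g", +1),                  # "h" interacts positively with "g"
        ("a", "x", +1),                  # "x" never interacts with "h"
    ]
    trusted, labels = classify_editors(example_arcs)
    print(sorted(trusted), labels)
```

Editors with no interaction with a high-reputation editor remain unclassified in this sketch, which gives some intuition for why an approach of this kind may label only part of the dataset.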

Document type:
Conference paper
IEEE ICC, Feb 2015, London, United Kingdom. IEEE ICC, 2015
Contributor: Admin Télécom Paristech
Submitted on: Thursday, February 25, 2016 - 12:52:49
Last modified on: Wednesday, November 28, 2018 - 01:26:31


  • HAL Id : hal-01279007, version 1


Michele Spina, D. Rossi, Mauro Sozio, Silviu Maniu, Bogdan Cautis. Snooping Wikipedia Vandals with MapReduce. IEEE ICC, Feb 2015, London, United Kingdom. IEEE ICC, 2015. 〈hal-01279007〉


