Our new paper, titled “Even good bots fight: The case of Wikipedia”, has finally appeared in PLOS ONE.
There are two things about this work that I find particularly worth highlighting. First, this is the first time that anyone has looked at an ecosystem of Internet bots at scale using hard data and tried to come up with a typology of Internet bots (see the figure). Second, the composition of our team is a good example of multidisciplinary research in action: Milena Tsvetkova, the lead author, is a sociologist by training; Ruth Garcia is a computer engineer; Luciano Floridi is a professor of Philosophy; and I have a PhD in physics.
If you find the paper too long, have a look at the University of Oxford press release, or the one by the Alan Turing Institute, where both Luciano and I are Faculty Fellows.
Among the many pieces of media coverage of our work, I think the one in The Guardian comes closest to ideal.
I am very happy to announce our new series of seminars at the Oxford Internet Institute (OII), called “The OII Colloquia (TOC)”.
The OII Colloquia bring senior speakers from other departments at the University of Oxford to the Oxford Internet Institute to spark conversation around the Internet and society.
The word colloquia (singular: colloquium) comes from the Latin colloquium, meaning “conversation”. Today, the term is often used to describe departmental seminars with a general topic and audience.
The OII Colloquia, however, come closer to the original sense of the word: through this series of events we aim to initiate conversations and strengthen our ties with scholars at other departments of the University of Oxford, around topics of shared interest. They should be considered as a trigger for long-lasting collaborations between the OII and the speakers’ own departments.
TOC is held twice a term (weeks 2 and 7), on Thursdays from 17:15 to 18:45, in an interactive and stimulating environment at the Oxford Internet Institute, 1 St Giles, OX1 3JS. The events are open to the public (upon registration).
I am very happy to see that our eBook titled “At the Crossroads: Lessons and Challenges in Computational Social Science” is finally out, and ready for free download.
This book is based on a Research Topic collection that Javier Borge-Holthoefer, Yamir Moreno, and I edited for Frontiers in Physics over the past year. The collection contains 10 articles + 1 editorial, written by 48 authors, covering different aspects of the emerging field of Computational Social Science.
The Research Topic has received more than 36,000 views so far, already placing it among the 10 most visited Research Topics in Frontiers in Physics.
Please feel free to download the eBook and enjoy reading!
Originally posted on HUMANE blog by Milena Tsvetkova.
Our study on disagreement in Wikipedia was just published in Scientific Reports (impact factor 5.2). In this study, we find that disagreement and conflict in Wikipedia follow specific patterns. We use complex-network methods to identify three typical kinds of negative interaction: an editor confronts another editor repeatedly, an editor strikes back at an equally experienced attacker, and less experienced editors confront someone else’s attacker.
Disagreement and conflict are a fact of social life but we do not like to disclose publicly whom we dislike. This poses a challenge for scientists, as we rarely have records of negative social interactions.
To circumvent this problem, we investigate when and with whom Wikipedia users edit articles. We analyze more than 4.6 million edits in 13 different language editions of Wikipedia in the period 2001-2011. We identify when an editor undoes the contribution of another editor and create a network of these “reverts”.
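To make the construction concrete, here is a minimal sketch in Python (with a made-up edit log; the field names and data are illustrative, not the paper’s actual data schema). An edit is flagged as a revert when it restores an article to an earlier version, for example by matching content checksums, and a directed edge is recorded from the reverter to each editor whose work was undone:

```python
from collections import defaultdict

# Illustrative edit log: (article, editor, checksum_of_resulting_text).
# A revert is detected when an edit's checksum matches an earlier
# version of the same article.
edits = [
    ("Article_A", "alice", "v1"),
    ("Article_A", "bob",   "v2"),
    ("Article_A", "alice", "v1"),  # alice restores v1 -> reverts bob
    ("Article_A", "bob",   "v2"),  # bob restores v2 -> reverts alice
]

def build_revert_network(edits):
    """Return directed revert counts: (reverter, reverted) -> n."""
    network = defaultdict(int)
    history = defaultdict(list)  # article -> list of (editor, checksum)
    for article, editor, checksum in edits:
        past = history[article]
        if checksum in {c for _, c in past}:
            # Editors whose edits came after the restored version
            # are counted as reverted by the current editor.
            idx = max(i for i, (_, c) in enumerate(past) if c == checksum)
            for reverted_editor, _ in past[idx + 1:]:
                if reverted_editor != editor:
                    network[(editor, reverted_editor)] += 1
        past.append((editor, checksum))
    return dict(network)

print(build_revert_network(edits))
# -> {('alice', 'bob'): 1, ('bob', 'alice'): 1}
```

On the toy log above, the mutual reverting between the two editors shows up as a pair of opposite directed edges.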
A revert may be intended to improve the content in the article but may also indicate a negative social interaction among the editors involved. To see if the latter is the case, we analyze how often and how fast pairs of reverts occur compared to a null model. The null model removes any individual patterns of activity but preserves important characteristics of the community. It preserves the community structure centered around articles and topics and the natural irregularity of activity due to editors being in the same time zone or due to the occurrence of news-worthy events.
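As an illustration of the idea (a simplified sketch, not the paper’s exact algorithm), one way to build such a null model is to shuffle which editor performed each edit within an article while keeping the sequence of timestamps fixed: article-level activity bursts, driven by news events or shared time zones, survive, while individual who-reverts-whom patterns are destroyed:

```python
import random

# Hypothetical per-article edit sequence: (timestamp, editor).
edits = [(0, "alice"), (5, "bob"), (6, "alice"), (40, "carol"), (41, "alice")]

def null_model_sample(edits, rng=random):
    """Permute editor identities over the fixed timestamps of one article.

    Timing (and hence the community's natural burstiness) is preserved;
    individual patterns of who edits after whom are randomized.
    """
    timestamps = [t for t, _ in edits]
    editors = [e for _, e in edits]
    rng.shuffle(editors)
    return list(zip(timestamps, editors))

sample = null_model_sample(edits, random.Random(42))
print(sample)
```

Repeating the revert analysis on many such shuffled samples gives the baseline frequency and timing against which the observed interactions are compared.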
Using this method, we discover that certain interactions occur more often and during shorter time intervals than one would expect from the null model. We find that Wikipedia editors systematically revert the same person, revert back their reverter, and come to defend a reverted editor beyond what would be needed just to improve and maintain the encyclopedia objectively. In addition, we analyze the editors’ status and seniority as measured by the number of article edits they have completed. This reveals that editors with equal status are more likely to respond to reverts and lower-status editors are more likely to revert someone else’s reverter, presumably to make friends and gain some social capital.
We conclude that the discovered interactions demonstrate that social processes interfere with how knowledge is negotiated. Large-scale collaboration by volunteers online provides much of the information we obtain and the software products we use today. The repeated interactions of these volunteers give rise to communities with shared identity and practice. But the social interactions in these communities can in turn affect knowledge production. Such interference may introduce biases and subjectivity into the information we rely on.
It has become common knowledge that certain lives matter more when it comes to media coverage and public attention to natural or man-made disasters. Among the many papers and articles that report on such biases, my favourite is one by William C. Adams, titled “Whose Lives Count?”, dating back to 1986. It reports, for example, that an Italian life matters as much to American TV networks as some 200 Indonesian lives.
The MH17 crash site in eastern Ukraine. Analysis of Wikipedia found that its article about the crash was the most read across all the aircraft incidents reported in Wikipedia. Photo by Robert Ghement/EPA.
We also studied such biases in online attention, in relation to aircraft crashes. Our paper, recently published in Royal Society Open Science, reports, for example, that a North American life matters almost 50 times more than an African life to the pool of Wikipedia readers.
The paper has received great media attention, and made it to Science and the Guardian.
The abstract of the paper reads:
The Internet not only has changed the dynamics of our collective attention but also through the transactional log of online activities, provides us with the opportunity to study attention dynamics at scale. In this paper, we particularly study attention to aircraft incidents and accidents using Wikipedia transactional data in two different language editions, English and Spanish. We study both the editorial activities on and the viewership of the articles about airline crashes. We analyse how the level of attention is influenced by different parameters such as number of deaths, airline region, and event locale and date. We find evidence that the attention given by Wikipedia editors to pre-Wikipedia aircraft incidents and accidents depends on the region of the airline for both English and Spanish editions. North American airline companies receive more prompt coverage in English Wikipedia. We also observe that the attention given by Wikipedia visitors is influenced by the airline region but only for events with a high number of deaths. Finally we show that the rate and time span of the decay of attention is independent of the number of deaths and a fast decay within about a week seems to be universal. We discuss the implications of these findings in the context of attention bias.
and the full paper is available here.
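One finding from the abstract, the fast and apparently universal decay of attention within about a week, can be illustrated with a schematic fit (synthetic data; this is not the paper’s actual fitting procedure). Fitting an exponential decay to the daily page views after an incident yields a characteristic decay time:

```python
import math

# Synthetic daily page views after an incident, decaying with a
# characteristic time of ~2 days (attention fades within a week).
views = [10000 * math.exp(-day / 2.0) for day in range(8)]

def decay_time(views):
    """Least-squares slope of log(views) vs day gives the rate -1/tau."""
    days = range(len(views))
    logs = [math.log(v) for v in views]
    n = len(views)
    mean_d = sum(days) / n
    mean_l = sum(logs) / n
    slope = sum((d - mean_d) * (l - mean_l) for d, l in zip(days, logs)) / \
            sum((d - mean_d) ** 2 for d in days)
    return -1.0 / slope  # tau, in days

print(round(decay_time(views), 2))  # -> 2.0 for this synthetic series
```

A decay time of a couple of days means that, regardless of death toll, almost all the viewership attention is spent within the first week after an event.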
Jonathan and I recently published a paper titled “Wikipedia traffic data and electoral prediction: towards theoretically informed models” in EPJ Data Science.
In this article we examine the possibility of predicting election results by analysing Wikipedia traffic going to different articles related to the parties involved in the election.
Unlike similar work, in which socially generated online data is fed into an automated learning system to predict electoral results without much understanding of the mechanisms, here we try to provide a theoretical understanding of voters’ information-seeking behaviour around election time and use that understanding to make predictions.
The left panel shows the normalized daily views of the article on the European Parliament Election, 2009 in different language editions of Wikipedia. The right panel shows the relative change between the 2009 and 2014 election turnout in each country vs the relative change in the page-view counts of the election article in the corresponding Wikipedia language edition. Germany and the Czech Republic are marked as outliers from the general trend.
We test our model on a variety of countries in the 2009 and 2014 European Parliament elections. We show that Wikipedia offers good information about changes in overall turnout at elections and also about changes in vote share for parties. It gives a particularly strong signal for new parties which are emerging to prominence.
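The comparison behind the figure’s right panel can be sketched in a few lines (the numbers below are made up for illustration, not the paper’s data): for each country, compute the relative change in the election article’s page views between the two elections and set it against the relative change in turnout:

```python
# Hypothetical per-country page views of the election article and
# turnout percentages for the 2009 and 2014 EP elections.
data = {
    # country: (views_2009, views_2014, turnout_2009, turnout_2014)
    "FR": (120000, 90000, 40.6, 42.4),
    "PL": (30000, 45000, 24.5, 23.8),
    "NL": (80000, 70000, 36.8, 37.3),
}

def relative_change(old, new):
    """Fractional change between two observations."""
    return (new - old) / old

for country, (v09, v14, t09, t14) in data.items():
    dv = relative_change(v09, v14)
    dt = relative_change(t09, t14)
    print(f"{country}: views {dv:+.2%}, turnout {dt:+.2%}")
```

A positive association between the two columns across countries is the kind of signal the model exploits.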
We use these results to enhance existing theories about the drivers of aggregate patterns in online information seeking, by suggesting that:
voters are cognitive misers who seek information only when considering changing their vote.
This shows the importance of informal online information in forming the opinions of swing voters, and emphasizes the need for serious consideration of the potential of systems like Wikipedia by parties, campaign organizers, and institutions that regulate elections.
Read more here.
Since I launched this blog, I have always wanted to write something about the dangers of big data: things that can easily go wrong when you study large-scale transactional data. Obviously, I haven’t done so yet!
But recently, Bertie (my PhD student) and I finished a paper titled “P-values: misunderstood and misused”.
Of course, statistical misunderstanding is one of the dangers of big data. Calculating p-values has become the most widely used method for demonstrating the “significance” of an analysis. However, as we say in the abstract:
P-values are widely used in both the social and natural sciences to quantify the statistical significance of observed results. The recent surge of big data research has made p-value an even more popular tool to test the significance of a study. However, substantial literature has been produced critiquing how p-values are used and understood. In this paper we review this recent critical literature, much of which is routed in the life sciences, and consider its implications for social scientific research. We provide a coherent picture of what the main criticisms are, and draw together and disambiguate common themes. In particular, we explain how the False Discovery Rate is calculated, and how this differs from a p-value. We also make explicit the Bayesian nature of many recent criticisms, a dimension that is often underplayed or ignored. We also identify practical steps to help remediate some of the concerns identified, and argue that p-values need to be contextualised within (i) the specific study, and (ii) the broader field of inquiry.
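To make the distinction between a p-value and the False Discovery Rate concrete, here is a numeric sketch (the numbers are illustrative, not from the paper). With a significance threshold of α = 0.05, statistical power 0.8, and a prior probability of 0.1 that a tested hypothesis is true, the FDR, i.e. the share of “significant” findings that are actually false positives, is far larger than 0.05:

```python
alpha = 0.05   # significance threshold (false-positive rate per true null)
power = 0.8    # probability of detecting a real effect
prior = 0.1    # fraction of tested hypotheses that are actually true

# Among a large pool of tested hypotheses:
false_positives = alpha * (1 - prior)   # true nulls wrongly flagged
true_positives = power * prior          # real effects correctly flagged

fdr = false_positives / (false_positives + true_positives)
print(f"FDR = {fdr:.3f}")  # -> FDR = 0.360
```

So even with every individual test run at the 5% level, over a third of the “discoveries” in this scenario are false, which is exactly why a p-value must be contextualised within the broader field of inquiry.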