The interplay between extremism and communication in a collaborative project

Collaboration is among the most fundamental social behaviours.  The Internet and particularly the Web have been originally developed to foster large scale collaboration among scientists and technicians. The more recent emergence of Web 2.0 and ubiquity of user-generated content on social web, has provided us with even more potentials and capacities for large scale collaborative projects. Projects such as Wikipedia, Zooniverse, Foldit, etc are only few examples of such collective actions for public good.

Despite the central role of collaboration in development of our societies, data-driven studies and computational approaches to understand mechanisms and to test policies are rare.

In a recent paper titled “Understanding and coping with extremism in an online collaborative environment: A data-driven modeling” that is published in PLoS ONE, we use an agent-based modelling  framework to study opinion dynamics and collaboration in Wikipedia.

Our model is very simple and minimalistic and therefore the results can be generalized to other examples of large scale collaboration rather easily.

We particularly focus on the role of extreme opinions, direct communication between agents, and punishing policies that can be implemented in order to facilitate a faster consensus.

The results are rather surprising! In the abstract of the paper we say:

… Using a model of common value production, we show that the consensus can only be reached if groups with extreme views can actively take part in the discussion and if their views are also represented in the common outcome, at least temporarily. We show that banning problematic editors mostly hinders the consensus as it delays discussion and thus the whole consensus building process. We also consider the role of direct communication between editors both in the model and in Wikipedia data (by analyzing the Wikipedia talk pages). While the model suggests that in certain conditions there is an optimal rate of “talking” vs “editing”, it correctly predicts that in the current settings of Wikipedia, more activity in talk pages is associated with more controversy.

Read the whole paper here!


This diagram shows the time to reach consensus (colour-coded) as a function of relative size of the extreme opinion groups (RoE) and the rate of direct communication between agents (r) in four different scenarios. 


Even good bots fight and a typology of Internet bots

Our new paper titled “Even good bots fight: The case of Wikipedia” has finally appeared on PLOS One.

There are two things that I particularly find worth-highlighting about this work. First, this is the first time that someone looks at an ecosystem of the Internet bots at scale using hard data and tries to come up with a typology of the Internet bots (see the figure). And second, the arrangement of our team that is a good example of multidisciplinary research in action: Milena Tsvetkova, the lead author is a sociologist by training. Ruth Garcia is a computer engineer, Luciano Floridi is a professor of Philosophy, and I have a PhD in physics.

If you find the paper too long, have a look at the University of Oxford press release, or the one by the Alan Turing Institute, where both Luciano and I are Faculty Fellows.

Among many media coverages of our work, I think the one in The Guardian is the closest to ideal.


A first typology of the Internet bots. See the source.


New Paper: Personal Clashes and Status in Wikipedia Edit Wars


Originally posted on HUMANE blog by Milena Tsvetkova.

Our study on disagreement in Wikipedia was just published in Scientific Reports (impact factor 5.2). In this study, we find that disagreement and conflict in Wikipedia follow specific patterns. We use complex network methods to identify three kinds of typical negative interactions: an editor confronts another editor repeatedly, an editor confronts back an equally experienced attacker, and less experienced editors confront someone else’s attacker.

Disagreement and conflict are a fact of social life but we do not like to disclose publicly whom we dislike. This poses a challenge for scientists, as we rarely have records of negative social interactions.

To circumvent this problem, we investigate when and with whom Wikipedia users edit articles. We analyze more than 4.6 million edits in 13 different language editions of Wikipedia in the period 2001-2011. We identify when an editor undoes the contribution by another editor and created a network of these “reverts”.

A revert may be intended to improve the content in the article but may also indicate a negative social interaction among the editors involved. To see if the latter is the case, we analyze how often and how fast pairs of reverts occur compared to a null model. The null model removes any individual patterns of activity but preserves important characteristics of the community. It preserves the community structure centered around articles and topics and the natural irregularity of activity due to editors being in the same time zone or due to the occurrence of news-worthy events.

Using this method, we discover that certain interactions occur more often and during shorter time intervals than one would expect from the null model. We find that Wikipedia editors systematically revert the same person, revert back their reverter, and come to defend a reverted editor beyond what would be needed just to improve and maintain the encyclopedia objectively. In addition, we analyze the editors’ status and seniority as measured by the number of article edits they have completed. This reveals that editors with equal status are more likely to respond to reverts and lower-status editors are more likely to revert someone else’s reverter, presumably to make friends and gain some social capital.

We conclude that the discovered interactions demonstrate that social processes interfere with how knowledge is negotiated. Large-scale collaboration by volunteers online provides much of the information we obtain and the software products we use today. The repeated interactions of these volunteers give rise to communities with shared identity and practice. But the social interactions in these communities can in turn affect knowledge production. Such interferences may induce biases and subjectivities into the information we rely on.

Biases in Online Attention; Whose life matters more

This has become a common knowledge that certain lives matter more, when it comes to media coverage and public attention to natural or manmade disasters. Among many papers and articles that report on such biases, my favourite is this one by William C. Adams, titled “Whose Lives Count?”, and dated back to 1986. In this paper, it’s been reported, that for example, an Italian life matters to the American TV’s as much as some 200 Indonesians lives.


The Mh17 crash site in eastern Ukraine. Analysis of Wikipedia found that its article about the crash was the most read across all the aircraft incidents reported in Wikipedia. Photo by Robert Ghement/EPA.

We also studied such biases in online attention and in relation to aircraft crashes. Our paper, recently published in the Royal Society Open Science, reports that for example, a North American life matters almost 50 times more than an African life to the pool of Wikipedia readers.

The paper has received great media attention, and made it to Science and the Guardian.

The abstract of the paper reads

The Internet not only has changed the dynamics of our collective attention but also through the transactional log of online activities, provides us with the opportunity to study attention dynamics at scale. In this paper, we particularly study attention to aircraft incidents and accidents using Wikipedia transactional data in two different language editions, English and Spanish. We study both the editorial activities on and the viewership of the articles about airline crashes. We analyse how the level of attention is influenced by different parameters such as number of deaths, airline region, and event locale and date. We find evidence that the attention given by Wikipedia editors to pre-Wikipedia aircraft incidents and accidents depends on the region of the airline for both English and Spanish editions. North American airline companies receive more prompt coverage in English Wikipedia. We also observe that the attention given by Wikipedia visitors is influenced by the airline region but only for events with a high number of deaths. Finally we show that the rate and time span of the decay of attention is independent of the number of deaths and a fast decay within about a week seems to be universal. We discuss the implications of these findings in the context of attention bias.

and the full paper is available here.

Wikipedia readership around the UK general election

I already have written about the Wikipedia-Shapps story. So, that is not the main topic of this post! But when that topic was still hot, some people asked me whether I think anyone ever actually reads the Wikipedia articles about politicians? Why should it be important at all what is written in those articles? This post tackles that question. How much do people refer to Wikipedia to read about politics, specially around the election time?

Let’s again consider the Shapps’ case. Below, you can see number of daily page views of  of the Wikipedia article about him.

Screenshot from 2015-05-04 22:57:35

As you see, there are two HUGE peaks of around 7,000 and 14,500 views per day on top of a rather steady daily page view of sub-1000. The first peak appeared when “he admitted that he had [a] second job as ‘millionaire web marketer’ while [he was] MP“, and  the second one when the Wikipedia incident happened. Interesting to me is that while the first peak is related a much more important event, the second peak related to what I tend to call a minor event, is more than twice as large as the first one. Ok, so this might be just the case of Shapps and mostly due to media effects surrounding the controversy. How about the other politicians, say the party leaders? See the diagrams below.

Screenshot from 2015-05-04 22:57:45

A very large peak is evident in all the curves for all the party leaders with a peak of 22,000 views per day for Natalie Bennett, the leader of the Green party. Yes, that’s due to the iTV leaders’ debate on the 2nd of April. If you saw our previous post on search behaviour, you shouldn’t be surprised; surprising is the absence of a second peak around the BBC leaders’ debate on 16th of April, especially when you see the diagrams from our other post on Google search volumes.

How about the parties? How many people read about them on Wikipedia? Check it out below.

Screenshot from 2015-05-04 22:57:52

Here, there seems to be a second increase in the page views after the BBC debate on 16th April. Moreover, there is an ever widening separation between the curves of Tory-Labour-UKIP and LibDem-Green-SNP curves. This is very interesting, as Tories and Labours are the most established English parties, whereas the UKIP is among the newest ones. That’s very much related to our project on understanding the patterns of online information seeking around election times.

Elections and Social Media Presence of the Candidates

Some have called the forthcoming UK general election a Social Media Election. It might be a bit of exaggeration, but there is no doubt that both candidates and voters are very active on social media these days and take them seriously. The Wikipedia-Shapps story of last week is a good example showing how important online presence is for candidates, journalists, and of course voters. We don’t know how important this presence is in terms of shaping the votes, but at least we can look into the data and gauge the presence of the candidates and the activity of the supporters. In this post and some others we present statistics of online activity of parties, candidates, and of course voters. For an example, see the previous post on the searching behaviour of citizens around the debate times.

Who is on Twitter?

Candidates and parties are very much debated by supporters on social media, particularly Facebook and Twitter. But how active are candidates themselves on these platforms? In this post we show simply how many candidates from each party and in which constituencies have a Twitter account. Some of them might be more active than others and some might tweet very rarely, and we will analyse this activity in the next posts. Here we count only who has any kind of publicly known account.


Geographical distribution of candidates who have Twitter account.

The figure above shows the geographical distributions of candidates for each party and whether they have a Twitter account. There are some interesting results in there. For example, Labour has the largest number of Twitter-active candidates, whereas ALL the SNP candidates tweet. While LibDem and Green parties have the same number of accounts, normalised by the overall number of constituencies that they are standing in, Green seems to be more Twitter-enthusiastic. UKIP loses the Twitter game both in absolute number and proportion.

Who is on Wikipedia?

Having a Twitter account is something of a personal decision.  A candidate decides to have one and it’s totally up to them what to tweet. The difference in the case of Wikipedia, is that ideally candidates would not create or edit one about themselves. Also the type of information that you can learn about a candidate on their Wikipedia page is very different to what you can gain by reading their tweets.

Geographical distribution of the candidates, whom Wikipedia has an article about.

Geographical distribution of the candidates, whom Wikipedia has an article about.

The figure above shows the constituencies that the candidates standing in are featured in the largest online encyclopaedia, Wikipedia. Here, Tories are the absolute winners, in terms of the number of articles. Greens are the least “famous” candidates and LibDem are well behind the big two. In the next post we will explore often voters turn to Wikipedia to learn about the parties and candidates, and I’m sure by reading that you’ll be convinced that being featured on Wikipedia is important!


All right, so far, Labour won Twitter presence and Tories took Wikipedia (remember all the SNP’s also have a Twitter account). But how about the gender of the candidates? Is there any gender-related feature in social presence pattern of the candidates?

First let’s have a look at the gender distribution of the candidates.

Geographical distribution of the candidates colour-coded by gender.

Geographical distribution of the candidates colour-coded by gender.

As you see in the figure above, there are fewer female candidates than male ones across all the parties. Only 12% of the UKIP candidates are female while the Greens have the highest proportion at 38%. Tories sit right next to UKIP on the list of the most male oriented parties. There is also a clear pattern that most of the constituencies in the centre have male candidates.

How about social media?

Among all the candidates, 20% of male candidates are featured in Wikipedia, whereas this is about 17% for female candidates. Almost half of the Tories male candidates are in Wikipedia, whereas this goes down to 28% for their female counterparts. Only Labour female candidates have more coverage in Wikipedia compared to the males of the party, but the difference is marginal. ّIn all the other parties, males have a higher coverage rate. The tendency of Wikipedia to pay more attention to male figures is a very well known fact. 

Twitter is different. Slightly more female candidates (76%) have a Twitter account than male candidates (69%). Almost all (96%) of Labour females tweet, and Tory female candidates are more active than their male candidates. This pattern however is lost for the UKIP candidates, as 52% of their males are on Twitter compared to only 44% of their female candidates (who have the lowest rate among all the party-gender groups).


The data that we used to produced the maps and figures come mainly from a very interesting crowd-sourced project called yournextmp. However, we further validated the data using the Wikipedia and Twitter API’s. If you want to have a copy, just get in touch!

Wikipedia sockpuppetry: linking accounts to real people is pure speculation

Will the real Grant Shapps please stand up? ViciousCritic/Totally Socks, (CC BY-NC-SA)

You must have heard about the recent accusation of Grant Shapps by the Guardian. Basically, the Guardian claims that Shapps has been editing his own Wikipedia page and “Wikipedia has blocked a user account on suspicions that it is being used by the Conservative party chairman, Grant Shapps, or someone acting on his behalf”.

In a short piece that I wrote for The Conversation I try to explain how these things work in Wikipedia, what they mean,  and basically how unreliable these accusations are.

There are two issues here:

First, conflict of interest, for which Wikipedia guidelines suggest that “You should not create or edit articles about yourself, your family, or friends.” But basically it’s more a moral advice, because it’s technically impossible to know the real identity of editors. Unless the editors disclose their personal information deliberately.

The second point is that the account under discussion is banned by a Wikipedia admin not because of conflict of interest (which is anyway not a reason to ban a user), but Sockpuppetry: “The use of multiple Wikipedia user accounts for an improper purpose is called sock puppetry”. BUT, Sockpuppetry is not generally a good cause for banning a user either. It’s prohibited, only when used to mislead the editorial community or violate any other regulation.

Sock puppets are detected by certain type of editors who have very limited access to confidential data of users such as their IP-addresses, their computed and operating systems settings and their browser. This type of editor is called a CheckUser, and I used to serve as a CheckUser on Wikipedia for several years.

In this case the accounts that are “detected” as sock puppets have not been active simultaneously — there is a gap of about 3 years between their active periods. And this not only makes it very hard to claim that any rule or regulation is violated, but also, for this very long time gap, it is technically impossible for the CheckUser to observe any relation between the accounts under discussion.

Actually, the admin who has done the banning admits that his action has been mostly because of behavioural similarity (similarity between the edits performed by the two users and their shared political interests).

Altogether, I believe the banning has no reliable grounds and it’s based on pure speculation, and also the Guardian accusations are way beyond what you can logically infer from the facts and evidence.