Collective Memory in the Digital Age

We finished our project on Collective Memory in the Digital Age: Understanding “Forgetting” on the Internet last summer, but our last paper just came out on Science Advances last week.

The paper, titled “The memory remains: Understanding collective memory in the digital age” presents the results of our study on collective memory patterns based on Wikipedia viewership data of articles related to aviation accidents and incidents.

Combined with our previous paper on Dynamics and biases of online attention, published last year, we mainly claim two things:

Our short-term collective memory is really short; shorter than a week, and it’s biased, and our long-term memory is pretty long, about 45 years, also biased, nevertheless modellable!  And the Internet plays important roles in both observations and also helps us to quantify and study these patterns.

Of course, we have reported few other facts and observations related to our collective memory, but the main message was that.

We report that the most important factor in memory triggering patterns is the original impact of the past event measured by its average daily page views before the recent event occurred. That means that some past events are intrinsically more memorable and our memory of them are more easily triggered. Examples of such events are the crashes related to the 9/11 terrorist attacks.

Time separation between the two events also plays an important role. The closer in time the two events are, the stronger coupling between them; and when the time separation exceeds 45 years, it becomes very unlikely that the recent event triggers any memory of the past event.

The similarity between the two events has turned out to be another important factor; This happens in the case of the Iran Air flight 655 shot down by a US navy guided missile in 1988, which was not generally well remembered but far more attention was paid to it when the Malaysia Airlines 17 flight was hit by a missile over Ukraine in 2014.

3 - press_fig-1

Page-view statistics of three recent flights (2015) and their effects on the page-views of past events from 2014, and events from 1995 to 2000. The recent events cause an increase in the viewership of some of the past events. 

Read the article here, the abstract says:

Recently developed information communication technologies, particularly the Internet, have affected how we, both as individuals and as a society, create, store, and recall information. The Internet also provides us with a great opportunity to study memory using transactional large-scale data in a quantitative framework similar to the practice in natural sciences. We make use of online data by analyzing viewership statistics of Wikipedia articles on aircraft crashes. We study the relation between recent events and past events and particularly focus on understanding memory-triggering patterns. We devise a quantitative model that explains the flow of viewership from a current event to past events based on similarity in time, geography, topic, and the hyperlink structure of Wikipedia articles. We show that, on average, the secondary flow of attention to past events generated by these remembering processes is larger than the primary attention flow to the current event. We report these previously unknown cascading effects.


The interplay between extremism and communication in a collaborative project

Collaboration is among the most fundamental social behaviours.  The Internet and particularly the Web have been originally developed to foster large scale collaboration among scientists and technicians. The more recent emergence of Web 2.0 and ubiquity of user-generated content on social web, has provided us with even more potentials and capacities for large scale collaborative projects. Projects such as Wikipedia, Zooniverse, Foldit, etc are only few examples of such collective actions for public good.

Despite the central role of collaboration in development of our societies, data-driven studies and computational approaches to understand mechanisms and to test policies are rare.

In a recent paper titled “Understanding and coping with extremism in an online collaborative environment: A data-driven modeling” that is published in PLoS ONE, we use an agent-based modelling  framework to study opinion dynamics and collaboration in Wikipedia.

Our model is very simple and minimalistic and therefore the results can be generalized to other examples of large scale collaboration rather easily.

We particularly focus on the role of extreme opinions, direct communication between agents, and punishing policies that can be implemented in order to facilitate a faster consensus.

The results are rather surprising! In the abstract of the paper we say:

… Using a model of common value production, we show that the consensus can only be reached if groups with extreme views can actively take part in the discussion and if their views are also represented in the common outcome, at least temporarily. We show that banning problematic editors mostly hinders the consensus as it delays discussion and thus the whole consensus building process. We also consider the role of direct communication between editors both in the model and in Wikipedia data (by analyzing the Wikipedia talk pages). While the model suggests that in certain conditions there is an optimal rate of “talking” vs “editing”, it correctly predicts that in the current settings of Wikipedia, more activity in talk pages is associated with more controversy.

Read the whole paper here!


This diagram shows the time to reach consensus (colour-coded) as a function of relative size of the extreme opinion groups (RoE) and the rate of direct communication between agents (r) in four different scenarios. 


Using Twitter data to study politics? Fine, but be careful!

The role of social media in shaping the new politics is undeniable. Therefore the volume of research on this topic, relying on the data that are produced by the same technologies, is ever increasing. And let’s be honest, when we say “social media” data, almost always we mean Twitter data!

Twitter is arguably the most studied and used source of data in the new field of Computational Political Science, even though in many countries Twitter is not the main player. But we all know why we use Twitter data in our studies and not for instance data mined from Facebook: Twitter data are (almost) publicly available whereas it’s (almost) impossible to collect any useful data from Facebook.

That is understandable. However, there are numerous issues with studies that are entirely relying on Twitter data.

In a mini-review paper titled “A Biased Review of Biases in Twitter Studies on Political Collective Action“, we discussed some of these issues. Only some of them and not all, and that’s why we called our paper a “biased review”.

The reason that I’m reminding you of the paper now is mostly the new surge of research on “politics and Twitter” in relation to the recent events in the UK, US, and the forthcoming elections in European countries this summer.

Here is the abstract:

In recent years researchers have gravitated to Twitter and other social media platforms as fertile ground for empirical analysis of social phenomena. Social media provides researchers access to trace data of interactions and discourse that once went unrecorded in the offline world. Researchers have sought to use these data to explain social phenomena both particular to social media and applicable to the broader social world. This paper offers a minireview of Twitter-based research on political crowd behavior. This literature offers insight into particular social phenomena on Twitter, but often fails to use standardized methods that permit interpretation beyond individual studies. Read more….


Social Media: an illustration of overestimating the relevance of social media to social events from XKCD. Available online at

Even good bots fight and a typology of Internet bots

Our new paper titled “Even good bots fight: The case of Wikipedia” has finally appeared on PLOS One.

There are two things that I particularly find worth-highlighting about this work. First, this is the first time that someone looks at an ecosystem of the Internet bots at scale using hard data and tries to come up with a typology of the Internet bots (see the figure). And second, the arrangement of our team that is a good example of multidisciplinary research in action: Milena Tsvetkova, the lead author is a sociologist by training. Ruth Garcia is a computer engineer, Luciano Floridi is a professor of Philosophy, and I have a PhD in physics.

If you find the paper too long, have a look at the University of Oxford press release, or the one by the Alan Turing Institute, where both Luciano and I are Faculty Fellows.

Among many media coverages of our work, I think the one in The Guardian is the closest to ideal.


A first typology of the Internet bots. See the source.


New Paper: Personal Clashes and Status in Wikipedia Edit Wars


Originally posted on HUMANE blog by Milena Tsvetkova.

Our study on disagreement in Wikipedia was just published in Scientific Reports (impact factor 5.2). In this study, we find that disagreement and conflict in Wikipedia follow specific patterns. We use complex network methods to identify three kinds of typical negative interactions: an editor confronts another editor repeatedly, an editor confronts back an equally experienced attacker, and less experienced editors confront someone else’s attacker.

Disagreement and conflict are a fact of social life but we do not like to disclose publicly whom we dislike. This poses a challenge for scientists, as we rarely have records of negative social interactions.

To circumvent this problem, we investigate when and with whom Wikipedia users edit articles. We analyze more than 4.6 million edits in 13 different language editions of Wikipedia in the period 2001-2011. We identify when an editor undoes the contribution by another editor and created a network of these “reverts”.

A revert may be intended to improve the content in the article but may also indicate a negative social interaction among the editors involved. To see if the latter is the case, we analyze how often and how fast pairs of reverts occur compared to a null model. The null model removes any individual patterns of activity but preserves important characteristics of the community. It preserves the community structure centered around articles and topics and the natural irregularity of activity due to editors being in the same time zone or due to the occurrence of news-worthy events.

Using this method, we discover that certain interactions occur more often and during shorter time intervals than one would expect from the null model. We find that Wikipedia editors systematically revert the same person, revert back their reverter, and come to defend a reverted editor beyond what would be needed just to improve and maintain the encyclopedia objectively. In addition, we analyze the editors’ status and seniority as measured by the number of article edits they have completed. This reveals that editors with equal status are more likely to respond to reverts and lower-status editors are more likely to revert someone else’s reverter, presumably to make friends and gain some social capital.

We conclude that the discovered interactions demonstrate that social processes interfere with how knowledge is negotiated. Large-scale collaboration by volunteers online provides much of the information we obtain and the software products we use today. The repeated interactions of these volunteers give rise to communities with shared identity and practice. But the social interactions in these communities can in turn affect knowledge production. Such interferences may induce biases and subjectivities into the information we rely on.

Biases in Online Attention; Whose life matters more

This has become a common knowledge that certain lives matter more, when it comes to media coverage and public attention to natural or manmade disasters. Among many papers and articles that report on such biases, my favourite is this one by William C. Adams, titled “Whose Lives Count?”, and dated back to 1986. In this paper, it’s been reported, that for example, an Italian life matters to the American TV’s as much as some 200 Indonesians lives.


The Mh17 crash site in eastern Ukraine. Analysis of Wikipedia found that its article about the crash was the most read across all the aircraft incidents reported in Wikipedia. Photo by Robert Ghement/EPA.

We also studied such biases in online attention and in relation to aircraft crashes. Our paper, recently published in the Royal Society Open Science, reports that for example, a North American life matters almost 50 times more than an African life to the pool of Wikipedia readers.

The paper has received great media attention, and made it to Science and the Guardian.

The abstract of the paper reads

The Internet not only has changed the dynamics of our collective attention but also through the transactional log of online activities, provides us with the opportunity to study attention dynamics at scale. In this paper, we particularly study attention to aircraft incidents and accidents using Wikipedia transactional data in two different language editions, English and Spanish. We study both the editorial activities on and the viewership of the articles about airline crashes. We analyse how the level of attention is influenced by different parameters such as number of deaths, airline region, and event locale and date. We find evidence that the attention given by Wikipedia editors to pre-Wikipedia aircraft incidents and accidents depends on the region of the airline for both English and Spanish editions. North American airline companies receive more prompt coverage in English Wikipedia. We also observe that the attention given by Wikipedia visitors is influenced by the airline region but only for events with a high number of deaths. Finally we show that the rate and time span of the decay of attention is independent of the number of deaths and a fast decay within about a week seems to be universal. We discuss the implications of these findings in the context of attention bias.

and the full paper is available here.

Understanding voters’ information seeking behaviour

Jonathan and I recently published a paper titledWikipedia traffic data and electoral prediction: towards theoretically informed models in EPJ Data Science.

In this article we examine the possibility of predicting election results by analysing Wikipedia traffic going to different articles related to the parties involved in the election.

Unlike similar work in which socially generated online data is used in an automated learning system to predict the electoral results, without much understanding of mechanisms, here we try to provide a theoretical understanding of voters’ information seeking behaviour around election time and use that understanding to make predictions.


Left panel shows the normalized daily views of the article on the European Parliament Election, 2009 in different langue editions of Wikipedia. The right panel shows the relative change between 2009 and 2014 election turnout in each country vs the relative change in the page view counts of the election article in the corresponding Wikipedia language edition. Germany and Czech Republic are marked as outliers from the general trend.

We test our model on a variety of countries in the 2009 and 2014 European Parliament elections. We show that Wikipedia offers good information about changes in overall turnout at elections and also about changes in vote share for parties. It gives a particularly strong signal for new parties which are emerging to prominence.

We use these results to enhance existing theories about the drivers of aggregate patterns in online information seeking, by suggesting that:

voters are cognitive misers who seek information only when considering changing their vote.

This shows the importance of informal online information in forming the opinions of swing voters, and emphasizes the need for serious consideration of the potentials of systems like Wikipedia by parties, campaign organizers, and institutions which regulate elections.

Read more here.