Using Twitter data to study politics? Fine, but be careful!

The role of social media in shaping the new politics is undeniable. As a result, the volume of research on this topic, relying on data produced by these same technologies, is ever increasing. And let’s be honest, when we say “social media” data, we almost always mean Twitter data!

Twitter is arguably the most studied and used source of data in the new field of Computational Political Science, even though in many countries Twitter is not the main player. But we all know why we use Twitter data in our studies and not for instance data mined from Facebook: Twitter data are (almost) publicly available whereas it’s (almost) impossible to collect any useful data from Facebook.

That is understandable. However, there are numerous issues with studies that rely entirely on Twitter data.

In a mini-review paper titled “A Biased Review of Biases in Twitter Studies on Political Collective Action”, we discussed some of these issues. Only some of them, not all, and that’s why we called our paper a “biased review”.

The reason that I’m reminding you of the paper now is mostly the new surge of research on “politics and Twitter” in relation to the recent events in the UK, US, and the forthcoming elections in European countries this summer.

Here is the abstract:

In recent years researchers have gravitated to Twitter and other social media platforms as fertile ground for empirical analysis of social phenomena. Social media provides researchers access to trace data of interactions and discourse that once went unrecorded in the offline world. Researchers have sought to use these data to explain social phenomena both particular to social media and applicable to the broader social world. This paper offers a minireview of Twitter-based research on political crowd behavior. This literature offers insight into particular social phenomena on Twitter, but often fails to use standardized methods that permit interpretation beyond individual studies. Read more….


Social Media: an illustration of overestimating the relevance of social media to social events from XKCD. Available online at http://xkcd.com/1239/

Even good bots fight and a typology of Internet bots

Our new paper titled “Even good bots fight: The case of Wikipedia” has finally appeared in PLOS ONE.

There are two things about this work that I find particularly worth highlighting. First, this is the first time that anyone has looked at an ecosystem of Internet bots at scale using hard data and tried to come up with a typology of Internet bots (see the figure). Second, the composition of our team is a good example of multidisciplinary research in action: Milena Tsvetkova, the lead author, is a sociologist by training; Ruth Garcia is a computer engineer; Luciano Floridi is a professor of philosophy; and I have a PhD in physics.

If you find the paper too long, have a look at the University of Oxford press release, or the one by the Alan Turing Institute, where both Luciano and I are Faculty Fellows.

Of all the media coverage of our work, I think the piece in The Guardian is the closest to ideal.


A first typology of Internet bots. See the source.

 

The OII Colloquia

I am very happy to announce our new series of seminars at the Oxford Internet Institute (OII), called “The OII Colloquia (TOC)”.

The OII Colloquia bring senior speakers from other departments at the University of Oxford to the Oxford Internet Institute to spark conversation around the Internet and society.

The word Colloquia (sing.: Colloquium) comes from Latin, where “colloquium” means “conversation”. Today, we often use the term to describe departmental seminars with a general topic and audience.

The OII Colloquia, however, come closer to the original sense of the word: through this series of events we aim to initiate conversations and strengthen our ties with scholars at other departments of the University of Oxford, around topics of shared interest. They should be seen as a trigger for long-lasting collaborations between the OII and the speakers’ own departments.

TOC are held twice a term (weeks 2 and 7) on Thursdays from 17:15 to 18:45 in an interactive and stimulating environment at the Oxford Internet Institute, 1 St Giles, OX1 3JS, and are open to the public (upon registration).

New Paper: Personal Clashes and Status in Wikipedia Edit Wars


Originally posted on HUMANE blog by Milena Tsvetkova.

Our study on disagreement in Wikipedia was just published in Scientific Reports (impact factor 5.2). In this study, we find that disagreement and conflict in Wikipedia follow specific patterns. We use complex network methods to identify three kinds of typical negative interactions: an editor confronts another editor repeatedly, an editor retaliates against an equally experienced attacker, and less experienced editors confront someone else’s attacker.

Disagreement and conflict are a fact of social life but we do not like to disclose publicly whom we dislike. This poses a challenge for scientists, as we rarely have records of negative social interactions.

To circumvent this problem, we investigate when and with whom Wikipedia users edit articles. We analyze more than 4.6 million edits in 13 different language editions of Wikipedia in the period 2001-2011. We identify when an editor undoes the contribution of another editor and create a network of these “reverts”.
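To make the idea of a revert network concrete, here is a minimal sketch in Python. It is not the pipeline used in the study. It assumes that a revision counts as a revert when its text is identical to some earlier revision of the same article, and that the reverted editor is the author of the immediately preceding revision. The function names and the toy edit history are hypothetical.

```python
import hashlib
from collections import Counter


def find_reverts(revisions):
    """Return (reverter, reverted, timestamp) triples for one article.

    `revisions` is a chronological list of (timestamp, editor, text).
    Assumption: a revision is a revert if its text matches an earlier
    revision; the reverted editor is the author of the previous revision.
    """
    seen = set()
    reverts = []
    previous_editor = None
    for timestamp, editor, text in revisions:
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        # Ignore the first revision and editors restoring their own text.
        if digest in seen and previous_editor not in (None, editor):
            reverts.append((editor, previous_editor, timestamp))
        seen.add(digest)
        previous_editor = editor
    return reverts


def revert_network(reverts):
    """Directed, weighted edges: (reverter, reverted) -> number of reverts."""
    return Counter((reverter, reverted) for reverter, reverted, _ in reverts)


# Toy edit history (made up for illustration).
history = [
    (1, "alice", "v1"),
    (2, "bob", "v2"),
    (3, "alice", "v1"),  # alice restores v1, undoing bob
    (4, "bob", "v2"),    # bob restores v2, undoing alice
]
edges = revert_network(find_reverts(history))
```

Each key in `edges` is a directed link from the reverter to the reverted editor, and its value is how many times that revert happened.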

A revert may be intended to improve the content in the article but may also indicate a negative social interaction among the editors involved. To see if the latter is the case, we analyze how often and how fast pairs of reverts occur compared to a null model. The null model removes any individual patterns of activity but preserves important characteristics of the community. It preserves the community structure centered around articles and topics and the natural irregularity of activity due to editors being in the same time zone or due to the occurrence of news-worthy events.
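The comparison with a null model can be sketched roughly as follows. This is a deliberately simplified stand-in, not the null model from the paper: it shuffles the timestamps among the reverts of the same article, which keeps who reverts whom and each article's activity times, but breaks the link between a given pair of editors and the moment of their revert. The function names, the time window, and the toy data are all hypothetical.

```python
import random


def count_revert_backs(reverts, window):
    """Count cases where A reverts B and B reverts A within `window`."""
    ordered = sorted(reverts, key=lambda r: r[2])
    count = 0
    for i, (a, b, t1) in enumerate(ordered):
        for c, d, t2 in ordered[i + 1:]:
            if t2 - t1 > window:
                break  # later reverts are even further away in time
            if (c, d) == (b, a):
                count += 1
                break  # count only the first reply to this revert
    return count


def total_revert_backs(reverts_by_article, window):
    return sum(count_revert_backs(r, window) for r in reverts_by_article.values())


def shuffled_copy(reverts_by_article, rng):
    """Permute timestamps within each article (simplified null model)."""
    shuffled = {}
    for article, reverts in reverts_by_article.items():
        times = [t for _, _, t in reverts]
        rng.shuffle(times)
        shuffled[article] = [(a, b, t) for (a, b, _), t in zip(reverts, times)]
    return shuffled


def null_distribution(reverts_by_article, window, runs=200, seed=0):
    rng = random.Random(seed)
    return [
        total_revert_backs(shuffled_copy(reverts_by_article, rng), window)
        for _ in range(runs)
    ]


# Toy data (made up): (reverter, reverted, timestamp) per article.
data = {
    "Article X": [
        ("alice", "bob", 0), ("bob", "alice", 1),
        ("carol", "dave", 50), ("dave", "carol", 51),
        ("erin", "frank", 100),
    ],
}
observed = total_revert_backs(data, window=5)
null = null_distribution(data, window=5)
```

If the observed count sits well above most of the values in `null`, quick revert-backs occur more often than the shuffled timing alone would produce.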

Using this method, we discover that certain interactions occur more often and during shorter time intervals than one would expect from the null model. We find that Wikipedia editors systematically revert the same person, revert back their reverter, and come to defend a reverted editor beyond what would be needed just to improve and maintain the encyclopedia objectively. In addition, we analyze the editors’ status and seniority as measured by the number of article edits they have completed. This reveals that editors with equal status are more likely to respond to reverts and lower-status editors are more likely to revert someone else’s reverter, presumably to make friends and gain some social capital.

We conclude that the discovered interactions demonstrate that social processes interfere with how knowledge is negotiated. Large-scale collaboration by volunteers online provides much of the information we obtain and the software products we use today. The repeated interactions of these volunteers give rise to communities with shared identity and practice. But the social interactions in these communities can in turn affect knowledge production. Such interferences may induce biases and subjectivities into the information we rely on.

Biases in Online Attention; Whose life matters more

It has become common knowledge that certain lives matter more when it comes to media coverage of, and public attention to, natural or man-made disasters. Among the many papers and articles that report on such biases, my favourite is one by William C. Adams, titled “Whose Lives Count?”, dating back to 1986. That paper reports, for example, that an Italian life matters to American TV as much as the lives of some 200 Indonesians.


The MH17 crash site in eastern Ukraine. Analysis of Wikipedia found that its article about the crash was the most read of all the aircraft incidents reported in Wikipedia. Photo by Robert Ghement/EPA.

We also studied such biases in online attention, in relation to aircraft crashes. Our paper, recently published in Royal Society Open Science, reports, for example, that a North American life matters almost 50 times more than an African life to the pool of Wikipedia readers.

The paper has received great media attention, and made it to Science and the Guardian.

The abstract of the paper reads:

The Internet not only has changed the dynamics of our collective attention but also through the transactional log of online activities, provides us with the opportunity to study attention dynamics at scale. In this paper, we particularly study attention to aircraft incidents and accidents using Wikipedia transactional data in two different language editions, English and Spanish. We study both the editorial activities on and the viewership of the articles about airline crashes. We analyse how the level of attention is influenced by different parameters such as number of deaths, airline region, and event locale and date. We find evidence that the attention given by Wikipedia editors to pre-Wikipedia aircraft incidents and accidents depends on the region of the airline for both English and Spanish editions. North American airline companies receive more prompt coverage in English Wikipedia. We also observe that the attention given by Wikipedia visitors is influenced by the airline region but only for events with a high number of deaths. Finally we show that the rate and time span of the decay of attention is independent of the number of deaths and a fast decay within about a week seems to be universal. We discuss the implications of these findings in the context of attention bias.

and the full paper is available here.

Understanding voters’ information seeking behaviour

Jonathan and I recently published a paper titled “Wikipedia traffic data and electoral prediction: towards theoretically informed models” in EPJ Data Science.

In this article we examine the possibility of predicting election results by analysing Wikipedia traffic going to different articles related to the parties involved in the election.

Unlike similar work in which socially generated online data is used in an automated learning system to predict the electoral results, without much understanding of mechanisms, here we try to provide a theoretical understanding of voters’ information seeking behaviour around election time and use that understanding to make predictions.


The left panel shows the normalized daily views of the article on the European Parliament Election, 2009 in different language editions of Wikipedia. The right panel shows the relative change between 2009 and 2014 election turnout in each country vs the relative change in the page view counts of the election article in the corresponding Wikipedia language edition. Germany and the Czech Republic are marked as outliers from the general trend.

We test our model on a variety of countries in the 2009 and 2014 European Parliament elections. We show that Wikipedia offers good information about changes in overall turnout at elections and also about changes in vote share for parties. It gives a particularly strong signal for new parties which are rising to prominence.
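The comparison of relative changes between the two elections can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming that "relative change" means (new - old) / old. It is not the model from the paper, and the country labels and numbers below are placeholders, not data from the study.

```python
from math import sqrt


def relative_change(old, new):
    """Relative change from `old` to `new`, e.g. 0.5 for a 50% increase."""
    return (new - old) / old


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Placeholder values for illustration only: (2009, 2014) per country.
page_views = {"Country A": (1000, 1500), "Country B": (2000, 1800), "Country C": (500, 500)}
turnout = {"Country A": (40.0, 44.0), "Country B": (50.0, 48.0), "Country C": (30.0, 30.0)}

countries = sorted(page_views)
view_changes = [relative_change(*page_views[c]) for c in countries]
turnout_changes = [relative_change(*turnout[c]) for c in countries]
r = pearson(view_changes, turnout_changes)
```

A value of `r` close to 1 would mean that countries where the election article gained readers also tended to see turnout rise, which is the kind of relationship the right panel of the figure plots.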

We use these results to enhance existing theories about the drivers of aggregate patterns in online information seeking, by suggesting that:

voters are cognitive misers who seek information only when considering changing their vote.

This shows the importance of informal online information in forming the opinions of swing voters, and emphasizes the need for serious consideration of the potentials of systems like Wikipedia by parties, campaign organizers, and institutions which regulate elections.

Read more here.