Semantic Network Analysis of Chinese Social Connection (“Guanxi”) on Twitter

About two months ago, a paper of ours with the above title appeared in Frontiers in Digital Humanities (Big Data).

This paper grew out of the MSc work of my former student at the Oxford Internet Institute, Pu Yan, who is now working on her PhD in our department.

In this paper we combined a network analysis tool with computational linguistic methods to understand how guanxi is conceptualized differently in two Chinese cultures (Mainland China vs. Taiwan, Hong Kong, and Macau).

What I like most about this paper is the discussion of the results. Pu, with her great domain knowledge, interprets the results in a very insightful way.

The paper is available here and the abstract says:

Guanxi, roughly translated as “social connection,” is a term commonly used in the Chinese language. In this study, we employed a linguistic approach to explore popular discourses on guanxi. Although sharing the same Confucian roots, Chinese communities inside and outside Mainland China have undergone different historical trajectories. Hence, we took a comparative approach to examine guanxi in Mainland China and in Taiwan, Hong Kong, and Macau (TW-HK-M). Comparing guanxi discourses in two Chinese societies aims at revealing the divergence of guanxi culture. The data for this research were collected on Twitter over a three-week period by searching tweets containing guanxi written in simplified Chinese characters (关系) and in traditional Chinese characters (關係). After building, visualizing, and conducting community detection on both semantic networks, two guanxi discourses were then compared in terms of their major concept sub-communities. This study aims at addressing two questions: Has the meaning of guanxi transformed in contemporary Chinese societies? And how do different socio-economic configurations affect the practice of guanxi? Results suggest that guanxi in interpersonal relationships has adapted to a new family structure in both Chinese societies. In addition, the practice of guanxi in business varies in Mainland China and in TW-HK-M. Furthermore, an extended domain was identified where guanxi is used in a macro-level discussion of state relations. Network representations of the guanxi discourses enabled reification of the concept and shed light on the understanding of social connections and social orders in contemporary China.
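For readers curious about what “building” a semantic network and “conducting community detection” on it can look like in practice, here is a minimal sketch in Python. It is not the pipeline used in the paper (this post does not say which tools were used). It assumes that tweets have already been segmented into words and that the search term 关系/關係 itself has been removed (it would otherwise link every word to every other word), connects two words whenever they appear in the same tweet, and swaps in a simple label propagation algorithm, with deterministic rather than random tie-breaking, as the community detection step. The toy tweets are hypothetical English stand-ins, not data from the study.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_network(documents):
    """Weighted undirected word graph: the weight of edge (a, b) counts how
    many documents contain both a and b. Documents with fewer than two
    distinct words add no edges, so isolated words never enter the graph."""
    graph = defaultdict(Counter)
    for tokens in documents:
        for a, b in combinations(sorted(set(tokens)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def label_propagation(graph, max_iter=100):
    """Asynchronous label propagation: each node repeatedly adopts the label
    with the largest total edge weight among its neighbours (ties broken by
    the smallest label) until no label changes."""
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):  # fixed visiting order keeps results reproducible
            scores = Counter()
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            best = max(scores.values())
            new_label = min(label for label, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = defaultdict(list)
    for node, label in labels.items():
        communities[label].append(node)
    # Each community as a sorted word list; communities ordered for stable output.
    return sorted(sorted(words) for words in communities.values())


# Hypothetical, pre-segmented toy "tweets" (search term already removed).
TOY_TWEETS = [
    ["family", "parents", "marriage"],
    ["family", "parents", "children"],
    ["parents", "children", "marriage"],
    ["business", "contract", "bribery"],
    ["business", "contract", "network"],
    ["contract", "bribery", "network"],
    ["family", "business"],
]

if __name__ == "__main__":
    network = build_cooccurrence_network(TOY_TWEETS)
    for community in label_propagation(network):
        print(community)
```

With this toy data the family-related words and the business-related words end up in two separate communities, even though one tweet links “family” and “business”. A real analysis would also have to decide how to filter rare words and weak edges, which strongly affects the communities found.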




What’s the state of the art in understanding Human-Machine Networks?

About a month ago, we finished our two-year EC Horizon 2020 project on Human-Machine Networks (HUMANE). The first task of this project was to perform a systematic literature review to establish the state of the art in understanding such systems.

The short answer is that we do not know much! And what we do know is not very cohesive. In other words, human-machine systems have mostly been designed, developed, and explored through trial and error, with little theory or systematic thinking involved.

We wrote a review paper to report on our systematic exploration of the literature. It took us nearly 18 months to get the paper published, but it was worth every second of the wait, as we managed to publish it in ACM Computing Surveys, which has the highest impact factor of all journals in computer science.

Here you can read the paper.

And the abstract says:

In the current hyperconnected era, modern Information and Communication Technology (ICT) systems form sophisticated networks where not only do people interact with other people, but also machines take an increasingly visible and participatory role. Such Human-Machine Networks (HMNs) are embedded in the daily lives of people, both for personal and professional use. They can have a significant impact by producing synergy and innovations. The challenge in designing successful HMNs is that they cannot be developed and implemented in the same manner as networks of machine nodes alone, or following a wholly human-centric view of the network. The problem requires an interdisciplinary approach. Here, we review current research of relevance to HMNs across many disciplines. Extending the previous theoretical concepts of socio-technical systems, actor-network theory, cyber-physical-social systems, and social machines, we concentrate on the interactions among humans and between humans and machines. We identify eight types of HMNs: public-resource computing, crowdsourcing, web search engines, crowdsensing, online markets, social media, multiplayer online games and virtual worlds, and mass collaboration. We systematically select literature on each of these types and review it with a focus on implications for designing HMNs. Moreover, we discuss risks associated with HMNs and identify emerging design and development trends.



Collective Memory in the Digital Age

We finished our project on Collective Memory in the Digital Age: Understanding “Forgetting” on the Internet last summer, but our final paper from it came out in Science Advances just last week.

The paper, titled “The memory remains: Understanding collective memory in the digital age”, presents the results of our study of collective memory patterns based on Wikipedia viewership data for articles on aviation accidents and incidents.

Together with our previous paper, “Dynamics and biases of online attention”, published last year, our results support two main claims:

Our short-term collective memory is really short (less than a week) and biased, while our long-term collective memory is fairly long (about 45 years), also biased, but nevertheless modellable! The Internet plays an important role in both, and it also helps us quantify and study these patterns.

Of course, we also report a few other facts and observations about our collective memory, but that was the main message.

We report that the most important factor in memory-triggering patterns is the original impact of the past event, measured by its average daily page views before the recent event occurred. This means that some past events are intrinsically more memorable, and our memory of them is more easily triggered. Examples of such events are the crashes related to the 9/11 terrorist attacks.

The time separation between the two events also plays an important role. The closer in time the two events are, the stronger the coupling between them; when the time separation exceeds 45 years, it becomes very unlikely that the recent event triggers any memory of the past event.

The similarity between the two events turned out to be another important factor. A case in point is Iran Air Flight 655, shot down by a US Navy guided missile in 1988: it was not generally well remembered, but it received far more attention when Malaysia Airlines Flight 17 was hit by a missile over Ukraine in 2014.


Page-view statistics of three recent flights (2015) and their effect on the page views of past events from 2014 and from 1995 to 2000. The recent events cause an increase in the viewership of some of the past events.

Read the article here; the abstract says:

Recently developed information communication technologies, particularly the Internet, have affected how we, both as individuals and as a society, create, store, and recall information. The Internet also provides us with a great opportunity to study memory using transactional large-scale data in a quantitative framework similar to the practice in natural sciences. We make use of online data by analyzing viewership statistics of Wikipedia articles on aircraft crashes. We study the relation between recent events and past events and particularly focus on understanding memory-triggering patterns. We devise a quantitative model that explains the flow of viewership from a current event to past events based on similarity in time, geography, topic, and the hyperlink structure of Wikipedia articles. We show that, on average, the secondary flow of attention to past events generated by these remembering processes is larger than the primary attention flow to the current event. We report these previously unknown cascading effects.


The interplay between extremism and communication in a collaborative project

Collaboration is among the most fundamental social behaviours. The Internet, and particularly the Web, were originally developed to foster large-scale collaboration among scientists and technicians. The more recent emergence of Web 2.0 and the ubiquity of user-generated content on the social web have given us even more potential and capacity for large-scale collaborative projects. Wikipedia, Zooniverse, and Foldit are only a few examples of such collective action for the public good.

Despite the central role of collaboration in the development of our societies, data-driven studies and computational approaches that aim to understand its mechanisms and test policies are rare.

In a recent paper titled “Understanding and coping with extremism in an online collaborative environment: A data-driven modeling”, published in PLoS ONE, we use an agent-based modelling framework to study opinion dynamics and collaboration in Wikipedia.

Our model is simple and minimalistic, so the results can be generalized rather easily to other examples of large-scale collaboration.

We focus in particular on the role of extreme opinions, direct communication between agents, and punitive policies that could be implemented to reach consensus faster.

The results are rather surprising! In the abstract of the paper we say:

… Using a model of common value production, we show that the consensus can only be reached if groups with extreme views can actively take part in the discussion and if their views are also represented in the common outcome, at least temporarily. We show that banning problematic editors mostly hinders the consensus as it delays discussion and thus the whole consensus building process. We also consider the role of direct communication between editors both in the model and in Wikipedia data (by analyzing the Wikipedia talk pages). While the model suggests that in certain conditions there is an optimal rate of “talking” vs “editing”, it correctly predicts that in the current settings of Wikipedia, more activity in talk pages is associated with more controversy.

Read the whole paper here!


This diagram shows the time to reach consensus (colour-coded) as a function of relative size of the extreme opinion groups (RoE) and the rate of direct communication between agents (r) in four different scenarios. 


Using Twitter data to study politics? Fine, but be careful!

The role of social media in shaping the new politics is undeniable. It is therefore no surprise that the volume of research on this topic, relying on data produced by these same technologies, keeps growing. And let’s be honest: when we say “social media” data, we almost always mean Twitter data!

Twitter is arguably the most studied and most used data source in the emerging field of computational political science, even though in many countries Twitter is not the main platform. But we all know why we use Twitter data in our studies rather than, for instance, data mined from Facebook: Twitter data are (almost) publicly available, whereas it’s (almost) impossible to collect any useful data from Facebook.

That is understandable. However, studies that rely entirely on Twitter data have numerous issues.

In a mini-review paper titled “A Biased Review of Biases in Twitter Studies on Political Collective Action”, we discussed some of these issues. Only some of them, not all, which is why we called our paper a “biased review”.

I’m reminding you of the paper now mainly because of the new surge of research on “politics and Twitter” related to the recent events in the UK and the US and to the forthcoming elections in European countries this summer.

Here is the abstract:

In recent years researchers have gravitated to Twitter and other social media platforms as fertile ground for empirical analysis of social phenomena. Social media provides researchers access to trace data of interactions and discourse that once went unrecorded in the offline world. Researchers have sought to use these data to explain social phenomena both particular to social media and applicable to the broader social world. This paper offers a minireview of Twitter-based research on political crowd behavior. This literature offers insight into particular social phenomena on Twitter, but often fails to use standardized methods that permit interpretation beyond individual studies. Read more….


Social Media: an illustration of overestimating the relevance of social media to social events from XKCD. Available online at

Even good bots fight and a typology of Internet bots

Our new paper titled “Even good bots fight: The case of Wikipedia” has finally appeared in PLOS ONE.

Two things about this work are particularly worth highlighting. First, this is the first time that someone has looked at an ecosystem of Internet bots at scale using hard data and tried to come up with a typology of Internet bots (see the figure). Second, our team is a good example of multidisciplinary research in action: Milena Tsvetkova, the lead author, is a sociologist by training; Ruth Garcia is a computer engineer; Luciano Floridi is a professor of philosophy; and I have a PhD in physics.

If you find the paper too long, have a look at the University of Oxford press release, or the one by the Alan Turing Institute, where both Luciano and I are Faculty Fellows.

Of the many media reports on our work, I think the one in The Guardian comes closest to ideal.


A first typology of Internet bots. See the source.


The OII Colloquia

I am very happy to announce our new series of seminars at the Oxford Internet Institute (OII), called “The OII Colloquia” (TOC).

The OII Colloquia bring senior speakers from other departments at the University of Oxford to the Oxford Internet Institute to spark conversation around the Internet and society.

The word colloquia (singular: colloquium) comes from the Latin colloquium, meaning “conversation”. Today, we often use the term to describe departmental seminars with a general topic and audience.

The OII Colloquia, however, come closer to the original sense of the word: through this series of events we aim to initiate conversations and strengthen our ties with scholars in other departments of the University of Oxford around topics of shared interest. They are intended as a trigger for long-lasting collaborations between the OII and the speakers’ own departments.

TOC sessions are held twice a term (weeks 2 and 7) on Thursdays from 17:15 to 18:45 at the Oxford Internet Institute, 1 St Giles, Oxford OX1 3JS, in an interactive and stimulating environment, and are open to the public (registration required).