We frequently get questions about how we do our research. Our most important research and methods decisions are detailed in our data memos, working papers, and peer reviewed published research. But here are some straightforward answers to the most common questions we receive.
What is the difference between a data memo, a working paper and a formally published piece of research?
Data memos are designed to present quick snapshots of analysis on current events in a short format. This approach allows us to conduct real-time analysis during pivotal moments of public and political life, and to shed light on incidents of attempted computational propaganda. The project often publishes data memos in a series, adding new details about an ongoing political event or reporting new findings. We build on our initial memos and rigorous analysis to inform our peer-reviewed pieces.
Is your research reviewed by your peers?
Yes. The vast majority of our work has been funded by the world’s largest, most prestigious public science agencies, including the National Science Foundation and European Research Council. These agencies coordinate extensive double-blind peer review committees that review our research design and methodological approaches. In addition, our academic writing is regularly critiqued and reviewed. We have published our research in book chapters, edited books, conference proceedings, and peer reviewed articles, and we frequently present our work at academic conferences. Each type of publication involves a different kind of review: editorial review, blind review, double-blind review, or unblinded review.
Why do you publish data memos in “real time”?
These days, many of us believe that we have a responsibility to contribute to public life, and especially so during critical moments such as elections and referenda. For our real-time data memos and working papers we take advantage of scholarly networks to have our research reviewed informally before we publish it on our webpage. For our election observatory data memos, we typically receive feedback from researchers with academic appointments in communication, information science, and computer science departments. We also seek feedback from country-specific experts.
How has your research design changed over time?
The science of computational propaganda and misinformation has changed over time. From the inception of the project in 2014 to 2016, our research mainly focused on originating novel methodological approaches to studying social media manipulation, theorizing new phenomena related to computational propaganda, and analyzing patterns of malicious behavior on social media in real time during critical moments of public life. In the aftermath of digital interference in the 2016 US Presidential Election, from 2016 to 2018 our research matured to study a multitude of phenomena related to junk news, manipulative cyber operations, and computational propaganda around the globe, including in Asia, Europe, and the Americas. More recently, our research has examined the malicious use of social media on new or difficult-to-study platforms, including Instagram, Facebook, and WhatsApp, as well as AI-generated fakes and regulatory responses to computational propaganda. As a research team we want to inform evidence-based public discourse, and we have developed guidelines and resources for civil society, parties, policymakers, and platforms.
How do you select your hashtags when you study political communication on Twitter?
In order to get a large and relevant sample of social media data, tweets are collected by following particular hashtags identified by our research team as being actively used during a political event. Our researchers rely on platform tools and their own in-country expertise to identify hashtags. Each team of experts is assembled for its knowledge of a country’s political culture and language, and for its familiarity with the issues being studied. In every data collection, we test our initial set of hashtags by collecting test data sets and analyzing the co-occurrence of hashtags. Through this iterative process of pre-tests and sub-samples, we can identify important hashtags missing from our initial list and expand it. We include a full list of our hashtags in every publication or data supplement.
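The co-occurrence step above can be sketched in a few lines: count which hashtags appear alongside the seed list in a test sample, then review the frequent companions as candidates for the next collection pass. This is an illustrative sketch only; the function name, input shape, and the `min_count` cutoff are assumptions, not the project's actual pipeline.

```python
from collections import Counter

def cooccurring_hashtags(tweets, seed_hashtags, min_count=25):
    """Count hashtags that co-occur with a seed list in a test sample.

    `tweets` is an iterable of hashtag lists, one list per tweet.
    Names and the threshold are illustrative, not the project's own.
    """
    seeds = {h.lower() for h in seed_hashtags}
    counts = Counter()
    for tags in tweets:
        tags = {t.lower() for t in tags}
        if tags & seeds:                 # tweet matched at least one seed
            counts.update(tags - seeds)  # count the non-seed companions
    # candidates frequent enough to review for inclusion in the next pass
    return [(tag, n) for tag, n in counts.most_common() if n >= min_count]
```

Candidates returned here would still be vetted by the in-country team before being added to the tracked list.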
Which APIs do you use for collecting your Twitter data?
Most of our analyses are based on data from Twitter’s free and public Streaming API. This allows us to archive traffic around a set of hashtags associated with a political event in near real time. We use other Twitter APIs as well: for example, we use the Search API to collect more information about suspicious accounts, look up the timelines of users, and collect post and user metadata. We have also bought data from GNIP, Twitter’s in-house data broker. Each of these methods has its own limitations, in the form of data caps, poor sampling documentation from Twitter, or lost fidelity in conversations where content has been deleted or accounts have been suspended.
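Streaming API archives of this kind are typically stored as newline-delimited JSON, one tweet object per line. A minimal sketch of parsing such an archive, assuming the classic v1.1 payload layout where hashtags live under `entities.hashtags`; the function name and input handling are illustrative assumptions, not the project's code:

```python
import json

def hashtags_from_archive(lines):
    """Yield lowercase hashtags from archived streaming-API tweets.

    Expects newline-delimited JSON, one v1.1 tweet object per line.
    Sketch for illustration, assuming the classic payload layout.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip keep-alive blank lines in stream archives
        tweet = json.loads(line)
        for tag in tweet.get("entities", {}).get("hashtags", []):
            yield "#" + tag["text"].lower()
```

Parsing from an archive rather than re-querying the API is also what makes later-deleted content recoverable, as discussed in the next answer.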
Do you capture and analyze content that gets removed later by Twitter?
Yes. We capture content via the Streaming API that is later removed, either by Twitter or by the accounts themselves. Only Twitter knows how much of this content has been removed, and under what circumstances.
The sample periods for your studies differ, and sometimes include data captured after voting day. Why?
We usually sample a few extra days of social media traffic so that we can understand the full arc of how junk news and social media algorithms impact public life. Sometimes the accounts we track continue to produce content even after the close of the polls. For example, our work has documented instances where accounts disseminated messages designed to sow distrust about the integrity of an election.
How do you decide what is junk news?
Our typology of junk news is grounded in close examination of the content being shared in each sample. A detailed methods section on our grounded typology is available in our most recent peer reviewed publication here and here. Junk news content includes various forms of propaganda and ideologically extreme, hyper-partisan, or conspiratorial political news and information. To be classified as junk news content, the source must fulfil at least three of these five criteria:
- Professionalism: These outlets do not employ standards and best practices of professional journalism. They refrain from providing clear information about real authors, editors, publishers, and owners. They lack transparency and accountability and do not publish corrections of debunked information.
- Style: These sources use emotionally driven language that includes emotive expressions, hyperbole, ad hominem attacks, misleading headlines, excessive capitalization, unsafe generalizations and logical fallacies, moving images, and lots of pictures and mobilizing memes.
- Credibility: These outlets rely on false information and conspiracy theories, which they often employ strategically. They report without consulting multiple sources and do not fact-check. Sources are often untrustworthy and standards of production lack reliability.
- Bias: Reporting by these outlets is highly biased, ideologically skewed, or hyper-partisan, and news reporting frequently includes strongly opinionated commentary.
- Counterfeit: These sources mimic established news reporting. They counterfeit fonts, branding, and stylistic content strategies. This category also includes commentary disguised as news, with references to news agencies and credible sources, and with headlines written in a news tone complete with date, time, and location stamps.
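The three-of-five decision rule above can be sketched as a simple scoring function. This illustrates only the published rule itself; the criterion keys and function shape are assumptions for illustration, not the coders' actual workflow.

```python
# The five criteria from the junk news typology above.
CRITERIA = ("professionalism", "style", "credibility", "bias", "counterfeit")

def is_junk_news(source_flags, threshold=3):
    """Apply the three-of-five decision rule to one coded source.

    `source_flags` maps each criterion to True if coders judged the
    source to fail it. Illustrative sketch of the decision rule only.
    """
    met = sum(bool(source_flags.get(c, False)) for c in CRITERIA)
    return met >= threshold
```

In practice the judgment behind each flag is made by trained coders, as the next answer describes.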
How do you train coders?
All of our coders are experts in the political context the analysis is focusing on. They are native speakers of, or proficient in, the language spoken in the country context analyzed, are highly knowledgeable about the media landscape, and are deeply familiar with the political debates, figures, and contexts shaping the social media conversation. Training workshops are conducted by researchers from the project’s core team and take place over several weeks. Coders are required to achieve an intercoder reliability score of Krippendorff’s alpha ≥ 0.8, signaling good concept formation and a strong command of our method.
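For nominal codes with no missing data, Krippendorff's alpha compares observed disagreement within items to the disagreement expected by chance. A minimal sketch of that computation, assuming the simple complete-data case; real reliability studies should use a full estimator that also handles missing values and other data levels:

```python
from collections import Counter

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data with no missing values.

    `units` is a list of per-item code lists (each coder's label for
    that item). Minimal sketch of the complete-data case only.
    """
    # Observed disagreement: differing code pairs within each unit.
    pairable = sum(len(u) * (len(u) - 1) for u in units)
    agree_pairs = sum(
        sum(n * (n - 1) for n in Counter(u).values()) for u in units
    )
    d_obs = (pairable - agree_pairs) / pairable
    # Expected disagreement: differing code pairs across all values pooled.
    totals = Counter(v for u in units for v in u)
    n = sum(totals.values())
    d_exp = (n * (n - 1) - sum(c * (c - 1) for c in totals.values())) / (n * (n - 1))
    return 1 - d_obs / d_exp
```

Perfect agreement yields 1.0; a team would keep training until the statistic reaches the 0.8 bar.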
How do you identify amplifier accounts?
We describe amplifier accounts as accounts that deliberately seek to increase the volume of traffic or the attention being paid to particular messages. These accounts include automated, semi-automated, and highly active human-curated accounts on social media. We define amplifier accounts as those that post 50 times a day or more on one of the selected hashtags. This detection methodology falls short of capturing amplifier accounts that tweet at lower frequencies. Despite the simplicity of our metric, more complex methods using machine learning yield comparable numbers of false positives and remain contested in the field of computational social science. Moreover, we have identified very few human users who tweet an average of more than 49.5 times per day. More detailed empirical analysis of amplifier accounts is available in our latest peer reviewed publications here and here.
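The threshold rule above reduces to a one-line filter over per-account posting rates. A minimal sketch, assuming posting rates have already been computed from the collected sample; the names are illustrative:

```python
def flag_amplifiers(daily_counts, threshold=50):
    """Flag accounts posting at or above the high-frequency threshold.

    `daily_counts` maps account -> average posts per day on the tracked
    hashtags; 50/day mirrors the definition above. Names illustrative.
    """
    return {acct for acct, rate in daily_counts.items() if rate >= threshold}
```

The virtue of so simple a metric is that it is fully transparent and replicable, which is why the project reports it alongside its known blind spot for lower-frequency amplifiers.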
Transparency & Replicability
What do you do to be transparent about your research process?
Since we seek more transparency from social media platforms and the political actors who abuse us all, we take extra care to be exceptionally transparent about our work.
- Our research gets reviewed by other academics at the planning stage, as we implement the research plan, and as we write up our findings.
- We put full descriptions of our grant awards on our project website.
- We disclose all funders on our website and in individual publications.
- We only conduct research that has been reviewed and approved by our university ethics board.
- All our publications explain our typologies and data collection methods.
- We abide by the conventions of scholarly citation, acknowledgement, and peer review.
- We adapt our methods when scholarly feedback or research from other teams shows a better way forward.
- We actively maintain and protect an encrypted archive of data sets, recorded interviews, and archival materials so that we can revisit findings, reconstruct events, and preserve material for future research.
- We get data sets online as soon as we can so that others may explore the samples.
- We share findings with technology companies at the same time that we share findings with journalists.
- We respond to methodology queries, alert technology companies to suspicious behavior on their platforms, and advise them on better detecting bots and fake users.
- We respond to reasonable inquiries from journalists, policy makers and the interested public in a timely manner.
Do you publish replication data?
We publish open access replication data from all of our studies.
Our ultimate goal is research excellence, and along the way we can also advance public discussion about the causes and consequences of computational propaganda. We are happy to respond to questions and look forward to your feedback.
Working with Journalists
Why do you work with journalists, and use news sources in your analysis?
It takes many kinds of investigators, using many tools, to expose and battle disinformation. And the social science of fake news, disinformation, and misinformation has evolved a lot since we started working in this domain and producing our annual inventories of information operations. Because it is so hard to understand secret, hidden activities, and to do so at a global scale, it is best to explore them with a wide range of methods.
Evidence about disinformation campaigns is hard to capture. And of course, journalists are investigators too, with some of the best investigations of computational propaganda coming from professional journalists working on focused news stories in times of crisis. So, if we want to understand the global trends it makes sense to collaborate with journalists, both to improve the quality of their reporting and to aggregate their knowledge and experience in systematic ways.
News reporting about disinformation is on the rise because disinformation is on the rise and because reporters are learning about what to investigate. Moreover, how the rest of the world experiences disinformation is changing rapidly, and we need to look at cases around the world. To tackle disinformation as a problem, we need to think globally. And one way to get that global perspective is to look at the evidence from the world’s hardworking investigative journalists.
Patterns in news reporting are very interesting. For example, a report such as the 2020 cybertroops inventory offers one way to understand the trends by comparing the evidence across many countries. We have straightforward commitments to rigour and explain the limitations of the research in the report itself. The working paper “in no way is intended to provide a complete picture of how state actors are operating in this space,” but we can “begin to build a bigger picture by piecing together public information.” We think it is critical to be methodologically diverse because the causes and consequences of disinformation are complex and multifaceted. With a focussed analysis such as cybertroops, it is best to be conservative with our estimates and stay close to the evidence we do have. The tools for making sense of written materials such as news reports, including meta-analysis and content analysis, help us parse information from many sources. Comparative case analysis helps researchers understand and compare evidence from different countries.
What methodology do you use to analyse media reporting?
There are well established, powerful methods for understanding big picture trends. To produce high quality, generalizable observations on the dark industry behind disinformation, the research must be systematic to be credible. Investigative reporting on disinformation has provided a trove of evidence, and social science methods let us get a handle on the global scope of the problem. This takes nothing away from the value and importance of also doing case studies and historical work, but these, like our work, cannot provide a complete picture on their own.
As a group of academics, we take the scientific method seriously, and we involve colleagues in reviewing our work—and sometimes we turn to professional journalists as experts. Each year there are dozens of new reviewers, fresh incidents to investigate, and new languages covered. At this point the archives behind a study such as our annual cybertroops report includes thousands of incident reports and citations, complete with scores for the quality of sources.
We often disseminate our scholarship as articles in major peer-reviewed journals and books by top university presses, and we cherish the awards we get for good work. When we get a solid critique, we restart, revise, resubmit, or abandon a project. If a reviewer cannot provide evidence for their claims, we stick to the evidence that does exist. And when we disseminate our research, readers can easily spot the caveats, limitations, and boundaries of our findings because we identify them ourselves in each study. If we proceed with dissemination, it is because we have found that any methodological limitations are not fatal to the main arguments.
All honest researchers face methodological challenges. The data on disinformation is messy, incomplete, and challenging to collect. But it would be foolish to turn our heads from high quality data or collaborations with other kinds of independent investigators such as journalists.
The social science of disinformation takes time and effort, and the data sets are messy and incomplete. It takes many of us, using many techniques, to bring home the punchline: disinformation is a large, global problem that can still be fixed. Public policy makers, civil society leaders, and the tech industry itself need to know and understand the big-picture trends as we all discuss interventions. We are among the many research teams trying to bring data and analysis to those discussions. Intervening without evidence is a dangerous strategy.