Twitter reports that fewer than 5% of accounts are fake or spammers, commonly referred to as “bots”. Since his offer to buy Twitter was accepted, Elon Musk has repeatedly questioned these estimates, even dismissing a public response from CEO Parag Agrawal.
Later, Musk put the deal on hold and demanded more proof.
So why are people arguing over the percentage of bot accounts on Twitter?
As the creators of Botometer, a widely used bot detection tool, our group at the Indiana University Observatory on Social Media has been studying inauthentic accounts and manipulation on social media for over a decade. We brought the concept of the “social bot” to the fore and first estimated the prevalence of bots on Twitter in 2017.
Based on our knowledge and experience, we believe that estimating the percentage of bots on Twitter has become a very difficult task, and debating the accuracy of the estimate may be missing the point. Here’s why.
What, exactly, is a bot?
To measure the prevalence of problematic accounts on Twitter, a clear definition of the targets is needed. Common terms such as “fake accounts”, “spam accounts” and “bots” are used interchangeably, but they have different meanings. Fake or false accounts impersonate people. Accounts that mass-produce unsolicited promotional content are defined as spammers. Bots, on the other hand, are accounts controlled in part by software; they can post content or perform simple interactions, such as retweeting, automatically.
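To make “controlled in part by software” concrete, here is a minimal sketch of the kind of automation that defines a bot, using the tweepy Python library. The credentials are placeholders and the posting loop is purely illustrative, not any real account’s behavior.

```python
# A minimal sketch of bot-style automation with tweepy.
# Credentials are placeholders from the Twitter developer portal.
import time

import tweepy

client = tweepy.Client(
    consumer_key="YOUR_KEY",
    consumer_secret="YOUR_SECRET",
    access_token="YOUR_TOKEN",
    access_token_secret="YOUR_TOKEN_SECRET",
)

while True:
    # Post content automatically, with no human in the loop.
    client.create_tweet(text="Automated status update")
    time.sleep(3600)  # wait an hour, then post again
```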
These types of accounts often overlap. For example, one can create a bot that impersonates a human to post spam automatically. Such an account is simultaneously a bot, a spammer, and fake. But not every fake account is a bot or a spammer, and vice versa. Arriving at an estimate without a clear definition only produces misleading results.
Defining and distinguishing account types can also inform appropriate interventions. Fake and spam accounts degrade the online environment and violate platform policies. Malicious bots are used to spread misinformation, inflate popularity, exacerbate conflict through negative and inflammatory content, manipulate opinions, influence elections, conduct financial fraud, and disrupt communication. However, some bots can be harmless or even useful, for example by helping to spread news, providing disaster alerts, and supporting research.
Simply banning all bots is not in the interests of social media users.
To simplify matters, researchers use the term “inauthentic accounts” to refer to the collection of fake accounts, spammers, and malicious bots. This also appears to be the definition Twitter uses. However, it’s unclear what Musk has in mind.
Hard to count
Even when consensus is reached on a definition, there are still technical challenges in estimating prevalence.
External researchers do not have access to the same data as Twitter, such as IP addresses and phone numbers. This hampers the public’s ability to identify inauthentic accounts. But even Twitter recognizes that the actual number of inauthentic accounts may be higher than estimated, because detection is a challenge.
Inauthentic accounts evolve and develop new tactics to avoid detection. For example, some fake accounts use AI-generated faces as their profile pictures. These faces can be indistinguishable from real ones, even to humans. Identifying such accounts is difficult and requires new technologies.
Coordinated accounts present another difficulty: they appear individually normal but act so similarly to one another that they are almost certainly controlled by a single entity. Finding them is like searching for needles in a haystack of hundreds of millions of daily tweets.
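One common way to surface such coordination is to compare accounts’ activity traces and flag pairs that are improbably similar. Below is a minimal sketch in Python; the account histories, the choice of posted links as the trace, and the similarity cutoff are all illustrative assumptions, not our actual detection method.

```python
# Sketch: flag pairs of accounts whose posted-link histories are nearly
# identical, one possible signal of coordinated behavior. The data and
# the 0.9 cutoff are made up for illustration.
from collections import Counter
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm = (sum(v * v for v in a.values()) * sum(v * v for v in b.values())) ** 0.5
    return dot / norm if norm else 0.0

def suspicious_pairs(history, threshold=0.9):
    """Return account pairs whose link histories are suspiciously similar."""
    vectors = {user: Counter(urls) for user, urls in history.items()}
    return [
        (u, v)
        for u, v in combinations(vectors, 2)
        if cosine(vectors[u], vectors[v]) >= threshold
    ]

history = {
    "acct_a": ["example.com/1", "example.com/2", "example.com/3"],
    "acct_b": ["example.com/1", "example.com/2", "example.com/3"],
    "acct_c": ["news.example/x"],
}
print(suspicious_pairs(history))  # [('acct_a', 'acct_b')]
```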
Finally, inauthentic accounts can evade detection through techniques such as swapping identifiers or automatically posting and deleting large volumes of content.
The distinction between inauthentic and genuine accounts is increasingly blurred. Accounts can be hacked, bought or rented, and some users “donate” their credentials to organizations that post on their behalf. As a result, so-called “cyborg” accounts are controlled by both algorithms and humans. Likewise, spammers sometimes post legitimate content to hide their activity.
We have observed a wide spectrum of behaviors mixing the characteristics of bots and people. Yet estimating the prevalence of inauthentic accounts forces a simplistic binary classification: an account is either authentic or inauthentic. No matter where the line is drawn, mistakes are inevitable.
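To see why, consider thresholding a continuous bot score of the kind Botometer reports (its public scores run from 0 to 5): the prevalence estimate moves with the cutoff. A toy calculation with invented scores:

```python
# Toy example: the share of accounts labeled "bots" depends entirely on
# where the binary cutoff is placed. Scores are invented for illustration
# (Botometer-style scores on a 0-to-5 scale).
scores = [0.3, 0.7, 1.2, 2.1, 2.6, 3.4, 4.0, 4.8]

for threshold in (2.0, 2.5, 3.0):
    flagged = sum(score >= threshold for score in scores)
    print(f"cutoff {threshold}: {flagged / len(scores):.0%} labeled bots")
```

Moving the cutoff from 2.0 to 3.0 changes the estimated prevalence substantially, even though the underlying accounts have not changed at all.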
Missing the big picture
The recent debate over estimating the number of Twitter bots oversimplifies the issue and misses the point, which is quantifying the harm done by online abuse and manipulation through inauthentic accounts.
Recent evidence suggests that inauthentic accounts may not be the only culprits responsible for the spread of disinformation, hate speech, polarization and radicalization. These issues often involve many human users. For example, our analysis shows that misinformation about COVID-19 was spread openly on both Twitter and Facebook by high-profile, verified accounts. Through BotAmp, a new tool in the Botometer family that anyone with a Twitter account can use, we found that the presence of automated activity is not evenly distributed. For example, discussion of cryptocurrency tends to show more bot activity than discussion of cats. So whether the overall prevalence is 5% or 20% makes little difference to individual users; their experiences with these accounts depend on whom they follow and the topics they care about.
Even if it were possible to accurately estimate the prevalence of inauthentic accounts, it would do little to solve these problems. A significant first step would be to recognize the complex nature of these issues. This will help social media platforms and policymakers to develop meaningful responses.
Article by Kai-Cheng Yang, PhD Student in Informatics, Indiana University and Filippo Menczer, Professor of Informatics and Computer Science, Indiana University
This article is republished from The Conversation under a Creative Commons license. Read the original article.