Researchers Find Thousands of Twitter Amplification Bots in Just One Day

7,000 Twitter Amplification Bots Found in One Day’s Search

December 11, 2018

Researchers have examined Twitter looking for what are known as amplification bots. These accounts serve no purpose other than to amplify confidence in the content of a tweet and/or confidence in the tweeter. At one level of sophistication they can be used to influence public opinion on specific topics. At another level they can be used to increase followers for individual accounts. And in between they can be used by spammers, scammers and phishers.

The first step in discovering the extent of amplification bots is to develop an automated method for recognizing such a bot. Duo Security researchers Jordan Wright and Olabode Anise started with a dataset of 576 million tweets and needed to be able to distinguish between normal Twitter behavior and abnormal amplification behavior.

They began with a simple observation: the majority of tweets receive more likes than retweets. “It is reasonable to expect that the number of likes for a particular tweet would be higher than the number of retweets, since liking a tweet is a lower-impact action,” write the researchers.

A ‘like’ tells the author that the content of the tweet is appreciated, but does not post the tweet to the liker’s own followers. Likes are consequently well-used by people, but of little value for amplification by bot. The first task was to test this assumption.

The researchers then filtered out those tweets with fewer than 50 retweets. The purpose was to limit distortion from the large number of tweets with few retweets: a tweet with one like and one retweet from the same follower, for example, would skew the overall ratio of likes to retweets.

They found “that half of the tweets in our dataset have nearly a 2:1 ratio of likes vs. retweets, while 80 percent of the tweets have at least more likes than retweets (greater than 1:1 ratio).” The rest have a higher ratio of retweets to likes.
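The filtering and ratio step described above can be sketched roughly as follows. This is an illustration only, not the researchers' code: the `Tweet` class, the function names and the sample figures are hypothetical, and only the 50-retweet threshold and the 969/164 example come from the article.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical minimal record: only the two counts the ratio needs.
    retweets: int
    likes: int


MIN_RETWEETS = 50  # threshold reported in the article


def like_to_retweet_ratios(tweets, min_retweets=MIN_RETWEETS):
    """Return likes/retweets for every tweet with at least min_retweets retweets."""
    return [t.likes / t.retweets for t in tweets if t.retweets >= min_retweets]


def share_with_more_likes(ratios):
    """Fraction of tweets whose like-to-retweet ratio is above 1:1."""
    if not ratios:
        return 0.0
    return sum(r > 1.0 for r in ratios) / len(ratios)


# Made-up sample; the third entry mirrors the 969-retweet, 164-like tweet
# cited in the article, and the fourth is dropped by the threshold.
sample = [Tweet(100, 210), Tweet(60, 90), Tweet(969, 164), Tweet(3, 1), Tweet(200, 380)]
ratios = like_to_retweet_ratios(sample)
print(share_with_more_likes(ratios))
```

A tweet whose ratio falls well below 1.0 after filtering is the kind of outlier the researchers went on to examine.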

In one example, they found a tweet that had 969 retweets and just 164 likes — a massive reversal of the normal ratio of likes to retweets. “To put some numbers to how rare this is,” comment the researchers, “only 0.2 percent of tweets in our dataset had more than at least 900 retweets and a similar retweet-to-like-ratio.”

This is almost certainly a bot account. Examining the timeline of the bot’s account provides further clues — it didn’t seem to have authored a single original tweet, but contained many other retweets with a similarly high retweet to like ratio.

The researchers also examined the time-distribution of retweets for suspected bots. The assumption is that a genuine account’s timeline will show tweets in basic chronological order, while an amplification bot’s timeline would be more scattered. To test and measure this, the researchers computed the inversion count (a mathematical measure of deviation from the natural order) for a genuine account and for the amplification bot already identified.

A genuine account should show a lower inversion number than an amplification account. “The inversion count for [the genuine account] timeline is 63, while the inversion count is 2028 for the amplification bot,” they found.
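For readers unfamiliar with the measure, an inversion is any pair of items that appears in the wrong order relative to each other, and the inversion count is the number of such pairs. The sketch below counts inversions with a merge sort. It assumes the input is a list of timestamps in timeline order; exactly which timestamps the researchers compared is not stated in the article, so treat that as an assumption.

```python
def inversion_count(values):
    """Count pairs (i, j) with i < j and values[i] > values[j]."""

    def sort_and_count(seq):
        # Returns (sorted copy of seq, number of inversions in seq).
        if len(seq) <= 1:
            return seq, 0
        mid = len(seq) // 2
        left, left_inv = sort_and_count(seq[:mid])
        right, right_inv = sort_and_count(seq[mid:])
        merged = []
        inversions = left_inv + right_inv
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] is smaller than every remaining item in left,
                # so it forms one inversion with each of them.
                merged.append(right[j])
                j += 1
                inversions += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inversions

    return sort_and_count(list(values))[1]


print(inversion_count([1, 2, 3, 4]))  # fully ordered timeline
print(inversion_count([4, 3, 2, 1]))  # fully reversed timeline
```

A perfectly chronological list scores zero, and the score grows as entries drift further from their natural position.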

Armed with these three clues to the likelihood of an amplification bot, the researchers developed a script to search through their Twitter dataset. They had three criteria: at least 90% of an account’s tweets should be retweets; at least one-third of its retweets should have been amplified; and the inversion count on its timeline should be greater than 100.
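The three criteria can be combined into a single check along the following lines. This is a sketch under stated assumptions, not the researchers' script: the `TimelineEntry` fields and function names are hypothetical, and the article does not define "amplified", so the test used here (at least 50 retweets and more retweets than likes) is an assumption. Only the three thresholds come from the article.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class TimelineEntry:
    # Hypothetical record for one item on an account's timeline.
    is_retweet: bool
    retweets: int    # retweet count of the tweet shown
    likes: int       # like count of the tweet shown
    created_at: int  # assumed: Unix time the tweet shown was created


def is_amplified(entry, min_retweets=50):
    # Assumption: "amplified" means enough retweets and more retweets than likes.
    return entry.retweets >= min_retweets and entry.retweets > entry.likes


def inversion_count(values):
    # Simple quadratic count of out-of-order pairs; fine for one timeline.
    return sum(1 for a, b in combinations(values, 2) if a > b)


def looks_like_amplification_bot(timeline, min_retweet_share=0.9,
                                 min_amplified_share=1 / 3, min_inversions=100):
    """Apply the three criteria from the article to one timeline (oldest entry first)."""
    if not timeline:
        return False
    retweeted = [e for e in timeline if e.is_retweet]
    if len(retweeted) / len(timeline) < min_retweet_share:
        return False
    amplified = sum(is_amplified(e) for e in retweeted)
    if amplified / len(retweeted) < min_amplified_share:
        return False
    return inversion_count([e.created_at for e in timeline]) > min_inversions


# Made-up timelines: 20 amplified retweets in reverse order, then the same in order.
scattered = [TimelineEntry(True, 900, 100, 1000 - i) for i in range(20)]
ordered = [TimelineEntry(True, 900, 100, i) for i in range(20)]
print(looks_like_amplification_bot(scattered), looks_like_amplification_bot(ordered))
```

Requiring all three conditions at once mirrors the deliberately strict approach described in the article, which trades missed bots for fewer false positives.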

Running this search script over the dataset for just one day found 7,000 amplification bots. The determining criteria were consciously set high to avoid false positives, so the true figure is likely to be higher. Had the search been conducted over a longer period, it would inevitably have found more. And finally, although the dataset comprised 576 million tweets, that is only about one day’s worth of all tweets.

This research does not intend to estimate the total number of amplification bots on Twitter at any time — but it clearly shows that it is a vast number that cannot be controlled by Twitter’s own processes. It tells us a number of things. Twitter itself cannot keep up with the formation and use of amplification bots, so it is a fair assumption that everyone will sooner or later receive a tweet that looks to be popular but is not necessarily so. If the amplification is for malicious purposes, it may include a link that appears to be safe by virtue of the number of retweets it has received.

Since URL reputation lists cannot keep up with the generation and use of malicious URLs, there is no guarantee that either Twitter’s own filters or a company’s purchased filters will know that it is malicious. The temptation exists for the user to click a shortened URL in an interesting tweet that has been apparently verified by a large (falsely amplified) number of retweets.

Such is the speed and volume of Twitter that, short of banning its use, there is little that companies can do to prevent this problem. The best solution, as so often happens, is user awareness. This research demonstrates how users can quickly recognize a suspected amplification tweet. The most immediate indicators are a high ratio of retweets to likes on the tweet itself, and retweeters’ timelines that consist mostly of retweets rather than original tweets and that are visibly not in chronological order.

Written By Kevin Townsend

Kevin Townsend is a Senior Contributor at SecurityWeek. He has been writing about high tech issues since before the birth of Microsoft. For the last 15 years he has specialized in information security; and has had many thousands of articles published in dozens of different magazines – from The Times and the Financial Times to current and long-gone computer magazines.
