7,000 Twitter Amplification Bots Found in One Day’s Search
Researchers have examined Twitter looking for what are known as amplification bots. These accounts serve no purpose other than to amplify confidence in the content of a tweet and/or confidence in the tweeter. At one level of sophistication they can be used to influence public opinion on specific topics. At another level they can be used to increase followers for individual accounts. And in between they can be used by spammers, scammers and phishers.
The first step in discovering the extent of amplification bots is to develop an automated method for recognizing such a bot. Duo Security researchers Jordan Wright and Olabode Anise started with a dataset of 576 million tweets and needed to be able to distinguish between normal Twitter behavior and abnormal amplification behavior.
They started from a simple observation: the majority of tweets receive more likes than retweets. “It is reasonable to expect that the number of likes for a particular tweet would be higher than the number of retweets, since liking a tweet is a lower-impact action,” write the researchers.
A ‘like’ tells the author that the content of the tweet is appreciated, but does not post the tweet to the liker’s own followers. Likes are consequently well-used by people, but of little value for amplification by bot. The first task was to test this assumption.
The researchers then filtered out those tweets with fewer than 50 retweets. The purpose was to limit distortion from the large number of tweets with few retweets: a tweet with just one like and one retweet from the same follower has a 1:1 ratio that says nothing about amplification, and many such tweets would skew the overall distribution of retweet-to-like ratios.
They found “that half of the tweets in our dataset have nearly a 2:1 ratio of likes vs. retweets, while 80 percent of the tweets have at least more likes than retweets (greater than 1:1 ratio).” The remaining 20 percent have at least as many retweets as likes.
In one example, they found a tweet that had 969 retweets and just 164 likes — a massive reversal of the normal ratio of likes to retweets. “To put some numbers to how rare this is,” comment the researchers, “only 0.2 percent of tweets in our dataset had more than at least 900 retweets and a similar retweet-to-like-ratio.”
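The ratio test described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' script: the `Tweet` class, the function names and, in particular, the `threshold` default of 5.0 are assumptions made for the example, since the article does not state the cut-off at which a tweet counts as amplified. Only the 50-retweet floor comes from the text.

```python
from dataclasses import dataclass

MIN_RETWEETS = 50  # floor described in the article


@dataclass
class Tweet:
    """Hypothetical minimal record: just the two counts the test needs."""
    retweets: int
    likes: int


def retweet_like_ratio(tweet):
    """Return retweets per like, or None if the tweet is below the floor."""
    if tweet.retweets < MIN_RETWEETS:
        return None
    # max(..., 1) avoids dividing by zero for tweets with no likes.
    return tweet.retweets / max(tweet.likes, 1)


def looks_amplified(tweet, threshold=5.0):
    """Flag a tweet whose ratio meets the threshold.

    The default threshold is an assumed placeholder, not a figure
    taken from the article.
    """
    ratio = retweet_like_ratio(tweet)
    return ratio is not None and ratio >= threshold


# The example from the article: 969 retweets against 164 likes,
# a ratio of roughly 5.9 retweets per like.
example = Tweet(retweets=969, likes=164)
print(retweet_like_ratio(example), looks_amplified(example))
```

A tweet with the normal pattern, say 100 retweets and 200 likes, has a ratio of 0.5 and is not flagged under any sensible threshold.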
This points almost certainly to bot activity. Examining the timeline of the suspected bot’s account provides further clues: it did not seem to have authored a single original tweet, but contained many other retweets with a similarly high retweet-to-like ratio.
The researchers also examined the time-distribution of retweets for suspected bots. The assumption is that a genuine account’s timeline will show tweets in basic chronological order, while an amplification bot’s timeline will be more scattered. To test and measure this, the researchers computed the inversion count (a mathematical measure of deviation from the natural order) for a genuine account and for the amplification bot already identified.
A genuine account should show a lower inversion number than an amplification account. “The inversion count for [the genuine account] timeline is 63, while the inversion count is 2028 for the amplification bot,” they found.
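For readers unfamiliar with the measure, an inversion is any pair of items that appears in the wrong order relative to each other, so a perfectly ordered sequence scores zero and a fully reversed one scores the maximum. The sketch below counts inversions with a merge sort. It assumes the input is a list of timestamps in the order they appear on a timeline; exactly which timestamps the researchers compared is not spelled out in the article, and the function name is chosen for this example.

```python
def count_inversions(values):
    """Count pairs (i, j) with i < j and values[i] > values[j]."""

    def sort_count(seq):
        # Returns (sorted copy of seq, number of inversions in seq).
        if len(seq) <= 1:
            return seq, 0
        mid = len(seq) // 2
        left, inv_left = sort_count(seq[:mid])
        right, inv_right = sort_count(seq[mid:])
        merged = []
        inversions = inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] is smaller than every remaining item in left,
                # so it forms one inversion with each of them.
                merged.append(right[j])
                j += 1
                inversions += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inversions

    return sort_count(list(values))[1]


print(count_inversions([1, 2, 3, 4]))  # 0: already in order
print(count_inversions([4, 3, 2, 1]))  # 6: every pair is out of order
```

The merge-sort approach runs in O(n log n) time, which matters when the same count has to be computed for a very large number of timelines.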
Armed with these three clues to the likelihood of an amplification bot, the researchers developed a script to search through their Twitter dataset. An account had to meet three criteria: at least 90% of its tweets should be retweets; at least one-third of its retweets should be of amplified tweets; and the inversion count on its timeline should be greater than 100.
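Combined, the three criteria amount to a simple rule. The following is a minimal sketch of that rule, not the researchers' actual script: the `AccountSummary` class and its field names are hypothetical, and it assumes the per-account counts and the inversion count have already been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class AccountSummary:
    """Hypothetical per-account totals, computed before this check."""
    total_tweets: int
    retweets: int
    amplified_retweets: int  # retweets of tweets judged to be amplified
    inversion_count: int


def is_amplification_bot(acct):
    """Apply the three thresholds quoted in the article."""
    if acct.total_tweets == 0 or acct.retweets == 0:
        return False
    # Integer arithmetic avoids floating-point edge cases at the boundary.
    mostly_retweets = 10 * acct.retweets >= 9 * acct.total_tweets  # >= 90%
    often_amplified = 3 * acct.amplified_retweets >= acct.retweets  # >= 1/3
    out_of_order = acct.inversion_count > 100
    return mostly_retweets and often_amplified and out_of_order


# Illustrative accounts using the two inversion counts from the article;
# the tweet counts are invented for the example.
suspect = AccountSummary(total_tweets=100, retweets=95,
                         amplified_retweets=40, inversion_count=2028)
genuine = AccountSummary(total_tweets=100, retweets=95,
                         amplified_retweets=40, inversion_count=63)
print(is_amplification_bot(suspect), is_amplification_bot(genuine))
```

Requiring all three conditions at once is what keeps the rule conservative: an account that only retweets, but does so in chronological order, is not flagged.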
Running this search script over the dataset for just one day found 7,000 amplification bots. The determining criteria were consciously set high to avoid false positives, so the true figure is likely to be higher. Had the search been conducted over a longer period, it would inevitably have found more. And finally, although the dataset comprised 576 million tweets, that is only about one day’s worth of all tweets.
This research does not set out to estimate the total number of amplification bots on Twitter at any one time, but it does suggest that the number is large and that Twitter’s own processes are not keeping it under control. If Twitter itself cannot keep up with the formation and use of amplification bots, it is a fair assumption that everyone will sooner or later receive a tweet that looks to be popular but is not necessarily so. If the amplification is for malicious purposes, the tweet may include a link that appears to be safe by virtue of the number of retweets it has received.
Since URL reputation lists cannot keep up with the generation and use of malicious URLs, there is no guarantee that either Twitter’s own filters or a company’s purchased filters will know that such a link is malicious. The temptation exists for the user to click a shortened URL in an interesting tweet that has apparently been verified by a large (falsely amplified) number of retweets.
Such is the speed and volume of Twitter that, short of banning its use, there is little that companies can do to prevent this problem. The best solution, as so often happens, is user awareness. This research demonstrates how users can quickly recognize a suspected amplification tweet. The most immediate indicators are a high ratio of retweets to likes on the tweet itself, and retweeters’ timelines that consist almost entirely of retweets rather than original tweets and are visibly not in chronological order.