Real Googlebots help Google discover new and updated webpages so they can be added to the search engine’s index; fake Googlebots have no such good intentions.
As it turns out, the evil twins of Googlebots are now often used as the starting point for distributed denial-of-service (DDoS) attacks. According to new research from Incapsula, based on its inspection of more than 50 million fake Googlebot visits, 34.3 percent of all identified imposters were explicitly malicious, and 23.5 percent of all imposters were used for Layer 7 DDoS attacks.
“Using Googlebot presents target website operators with a harsh dilemma – to block all Googlebots and be dropped from Google or to keep allowing Googlebots in and risk prolonged downtime,” Igal Zeifman, product evangelist at Incapsula, told SecurityWeek. “With Layer 7 attacks that can go for weeks and months at a time, both solutions are equally devastating.”
On average, a website will be visited by Googlebots 187 times per day, Incapsula estimates. For every 24 of those legitimate visits, however, the site will also get a visit from a phony Googlebot. Most of the fake Googlebots examined by Incapsula were used for market intelligence (65.7 percent), while the rest were used for DDoS (23.5 percent), scraping (5.3 percent), spamming (3.8 percent) or hacking (1.7 percent).
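For readers who want to check how the figures fit together, the short Python sketch below adds up the reported shares and estimates the daily number of fake visits. It assumes that the 187 daily visits are all legitimate and that the 23.5 percent DDoS share is measured against all fake Googlebots; the variable names are illustrative only.

```python
# Shares of fake Googlebot visits as quoted in the article (percent).
breakdown = {
    "market intelligence": 65.7,
    "DDoS": 23.5,
    "scraping": 5.3,
    "spamming": 3.8,
    "hacking": 1.7,
}

# Everything other than market intelligence counts as explicitly malicious.
malicious = sum(v for k, v in breakdown.items() if k != "market intelligence")
total = sum(breakdown.values())

# Assumption: all 187 average daily Googlebot visits are legitimate,
# and there is one fake visit for every 24 legitimate ones.
googlebot_visits_per_day = 187
fake_visits_per_day = googlebot_visits_per_day / 24

print(f"malicious share: {malicious:.1f}%")
print(f"all categories:  {total:.1f}%")
print(f"fake visits/day: {fake_visits_per_day:.1f}")
```

Under those assumptions, the malicious categories add up to the 34.3 percent cited by Incapsula, and an average site would see roughly eight fake Googlebot visits per day.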
In a blog post, Zeifman noted that fake Googlebots originate from botnets, most of which are inside the U.S., China, Turkey and India. The U.S. is the biggest country of origin for fake Googlebot visits, coming in at roughly 25 percent.
“One would assume that having a full list of IPs from Google could help, as it can be used by regular website owners to filter fake Googlebot traffic,” Zeifman told SecurityWeek. “However, realistically speaking, it’s hard to imagine most users keeping up with that list and updating their security rules as it expands. Realistically speaking, users just need to be aware of this threat and counter it with a security service that can accurately and transparently filter malicious bot traffic – Googlebot or otherwise.”
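The list-based filtering Zeifman describes can be sketched in a few lines of Python. This is only an illustration: the address ranges below are placeholder documentation ranges, not Google's real ones, and the function names are hypothetical. A real deployment would load the ranges from Google's published list and refresh them regularly, which is exactly the upkeep Zeifman says most site owners will not manage.

```python
import ipaddress

# Placeholder ranges taken from the RFC 5737 documentation blocks.
# They are NOT Google's ranges; a real filter would load the published list.
GOOGLEBOT_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def claims_googlebot(user_agent: str) -> bool:
    """Return True if the User-Agent header presents itself as Googlebot."""
    return "googlebot" in user_agent.lower()


def is_fake_googlebot(client_ip: str, user_agent: str) -> bool:
    """Flag visitors that claim to be Googlebot from an address outside the list."""
    if not claims_googlebot(user_agent):
        return False  # not claiming to be Googlebot, so not an imposter
    addr = ipaddress.ip_address(client_ip)
    return not any(addr in net for net in GOOGLEBOT_RANGES)


ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(is_fake_googlebot("192.0.2.15", ua))   # address inside the placeholder list
print(is_fake_googlebot("203.0.113.9", ua))  # address outside the placeholder list
```

The check is only as good as the list behind it: a stale list either blocks real Googlebots or lets new imposters through.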
The best option, he said, is granular visit-by-visit processing, to verify the identity of every single visitor.
“This requires a lot of processing power, because you need to address each incoming request,” he said. “Very few possess that technology and fewer still have the CPU power to process hundreds of thousands of requests per second.”
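Incapsula does not describe its own verification method here, but one publicly documented way to check a single visitor is the reverse-then-forward DNS lookup that Google recommends for verifying its crawler. The Python sketch below shows that technique under stated assumptions: the accepted hostname suffixes and the function names are illustrative, and the resolver functions are injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List

# Assumed hostname suffixes for genuine Googlebot reverse DNS records.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """Resolve an IP address to its primary hostname (PTR record)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> List[str]:
    """Resolve a hostname to its IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Reverse-resolve the IP, check the domain, then confirm with a forward lookup."""
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # The hostname must resolve back to the same address,
        # otherwise the PTR record could have been forged.
        return ip in forward(host)
    except OSError:
        # Covers socket.herror and socket.gaierror: treat lookup failures as unverified.
        return False
```

Two DNS lookups per visitor add latency and load, which hints at why Zeifman stresses the processing power needed to inspect hundreds of thousands of requests per second; caching verified addresses is one common way to reduce that cost.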