Application Security

Automated System Defeats reCAPTCHA With High Accuracy

A newly devised system that targets the audio version of Google’s reCAPTCHA challenges can break them with very high accuracy.

Dubbed unCaptcha, the automated system designed by computer science experts from the University of Maryland (UM) is said to be able to defeat the audio reCAPTCHA system with 85% accuracy.

Ionut Arghire

Published

November 2, 2017

The system uses browser automation software to interact with the target site and engage with the captcha. The tool, which has been published on GitHub, can correctly identify spoken numbers, pass the reCAPTCHA programmatically, and trick the site into thinking the bot is a human, the authors claim.

“Specifically, unCaptcha targets the popular site Reddit by going through the motions of creating a new user, although unCaptcha stops before creating the user to mitigate the impact on Reddit,” the experts say.

To bypass the captcha, in which numbers are read aloud at varied speeds, pitches, and accents over background noise, the attack locates the audio message on the page, downloads it, and then automatically splits it into segments at the points where speech occurs.
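The splitting step can be illustrated with a simple energy-based segmenter. This is a minimal sketch, not the unCaptcha code: the function name `split_by_speech`, the frame size, and the threshold values are assumptions chosen for illustration, and the input is assumed to be a list of samples already decoded to floats between -1.0 and 1.0.

```python
def split_by_speech(samples, frame_len=4, threshold=0.1, min_gap_frames=2):
    """Return (start, end) sample-index pairs for runs of loud audio.

    Hypothetical helper: a frame counts as speech when its mean absolute
    amplitude exceeds `threshold`; a run of speech ends once at least
    `min_gap_frames` consecutive quiet frames follow it.
    """
    # Classify each fixed-size frame as loud (speech) or quiet.
    loud = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        loud.append(sum(abs(s) for s in frame) / len(frame) > threshold)

    segments = []
    start = None      # frame index where the current speech run began
    quiet_run = 0     # consecutive quiet frames seen inside a run
    for idx, is_loud in enumerate(loud):
        if is_loud:
            if start is None:
                start = idx
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_gap_frames:
                # The run ended just after the last loud frame.
                end = idx - quiet_run + 1
                segments.append((start * frame_len, end * frame_len))
                start = None
                quiet_run = 0

    # Close a run that is still open when the audio ends.
    if start is not None:
        end = len(loud) - quiet_run
        segments.append((start * frame_len, min(end * frame_len, len(samples))))
    return segments


# Toy signal: silence, a spoken digit, a pause, a second digit, silence.
demo = [0.0] * 8 + [0.5] * 8 + [0.0] * 12 + [0.6] * 4 + [0.0] * 4
segments = split_by_speech(demo)
```

Real captcha audio carries deliberate background noise, so a fixed threshold like this one would likely need tuning or a proper voice activity detector.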

Next, each audio segment containing a spoken number is uploaded to six different free-to-use audio transcription services, namely IBM, Google Cloud, Google Speech Recognition, Sphinx, Wit-AI, and Bing Speech Recognition, and the results are collected.

“We ensemble the results from each of these to probabilistically enumerate the most likely string of numbers with a predetermined heuristic. These numbers are then organically typed into the captcha, and the captcha is completed. From testing, we have seen 92%+ accuracy in individual number identification, and 85%+ accuracy in defeating the audio captcha in its entirety,” the system’s authors reveal.
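The authors do not spell out their heuristic here, so the following is only a sketch of one plausible way to combine several transcriptions of the same segment: a weighted plurality vote. The names `normalize` and `vote`, the word-to-digit table, and the service labels are all hypothetical, and no real transcription API is called.

```python
from collections import Counter

# Assumed mapping from spoken forms to digits; real transcripts are messier.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def normalize(transcript):
    """Map one service's raw transcript to a single digit, or None."""
    token = transcript.strip().lower()
    if token in DIGIT_WORDS:
        return DIGIT_WORDS[token]
    if len(token) == 1 and token in "0123456789":
        return token
    return None  # unusable guess, e.g. a misheard word


def vote(transcripts, weights=None):
    """Pick the digit with the highest total weight across services.

    `transcripts` maps a service label to its raw text; `weights` optionally
    maps a service label to a trust score (default 1.0 for each service).
    """
    weights = weights or {}
    tally = Counter()
    for service, text in transcripts.items():
        digit = normalize(text)
        if digit is not None:
            tally[digit] += weights.get(service, 1.0)
    if not tally:
        return None
    # Sorting first makes ties resolve deterministically to the lowest digit.
    return max(sorted(tally), key=lambda d: tally[d])


# Five hypothetical services transcribing the same spoken "six".
guesses = {"ibm": "six", "google": "6", "sphinx": "sex",
           "wit": "six", "bing": "sick"}
best = vote(guesses)
```

Running `vote` once per segment and joining the results would produce the final string of digits; segments for which every service fails return `None` and would need a fallback.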

Another recently revealed tool for defeating CAPTCHA systems targets text-based challenges and was designed to mimic the human eye. Called the Recursive Cortical Network (RCN), it incorporates neuroscience insights into a structured probabilistic generative model framework.

In a paper (PDF), the team of researchers behind RCN explains that the tool can solve Google reCAPTCHA with 66.6% accuracy, and that it is also effective against other systems: 64.4% for BotDetect, 57.4% for Yahoo, and 57.1% for PayPal image challenges. The findings were published in the journal Science.

“By drawing inspiration from systems neuroscience, we introduce a probabilistic generative model for vision in which message-passing based inference handles recognition, segmentation and reasoning in a unified way. The model demonstrates excellent generalization and occlusion-reasoning capabilities, and outperforms deep neural networks on a challenging scene text recognition benchmark while being 300-fold more data efficient,” the researchers say.
