Skype lets customers make calls using Voice over Internet Protocol (VoIP). Because its traffic is encrypted, Skype, which was recently purchased by Microsoft for $8.5 billion, is used by many businesses today for their international phone calls. Researchers, however, have found a novel way to reconstruct parts of those conversations without ever knowing the encryption key.
Researchers Andrew M. White, Austin R. Matthews, Kevin Z. Snow, and Fabian Monrose, from the Department of Computer Science and the Department of Linguistics at the University of North Carolina at Chapel Hill, describe the attack, dubbed “Phonotactic Reconstruction,” in a research paper amusingly subtitled “Hookt on Fon-iks.” The attack predicts clear-text words from encrypted packet sequences. The researchers segmented sequences of VoIP packets into sub-sequences, mapped those sub-sequences to candidate words, and then used rules of grammar to assemble the candidates into hypothesized whole sentences. In other words, they were able to reconstruct the conversation by guessing and predicting the sounds used within the original Skype conversation.
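To make the segmentation step concrete, here is a minimal Python sketch of splitting an unbroken stream of sound symbols into candidate words with a lexicon. It is not the researchers' code: the `LEXICON` table, its phoneme-like spellings, and the `segment` function are all hypothetical. The sketch also assumes the sounds have already been guessed from the encrypted packets, which is the hard part of the real attack.

```python
from functools import lru_cache

# Hypothetical toy lexicon: each word is spelled as a tuple of
# phoneme-like symbols. A real lexicon would hold thousands of entries.
LEXICON = {
    ("h", "uh", "k", "t"): "hooked",
    ("aa", "n"): "on",
    ("f", "aa", "n", "ih", "k", "s"): "phonics",
    ("f", "aa", "n"): "fawn",
}


def segment(symbols):
    """Return every way to split `symbols` into words found in LEXICON."""
    symbols = tuple(symbols)

    @lru_cache(maxsize=None)
    def solve(start):
        # Reaching the end means the whole stream was consumed by words.
        if start == len(symbols):
            return [[]]
        results = []
        for end in range(start + 1, len(symbols) + 1):
            word = LEXICON.get(symbols[start:end])
            if word is not None:
                # Keep this word only if the rest of the stream also splits.
                for rest in solve(end):
                    results.append([word] + rest)
        return results

    return solve(0)


stream = ["h", "uh", "k", "t", "aa", "n", "f", "aa", "n", "ih", "k", "s"]
print(segment(stream))  # [['hooked', 'on', 'phonics']]
```

Note that “fawn” matches part of the stream but is discarded, because the sounds left over after it do not form any word. That is the same kind of constraint the researchers lean on, only with far richer linguistic rules.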
VoIP systems tend to use what’s called Linear Predictive Coding (LPC) to transmit voice conversations over IP because of its faster compression. On the sender’s end, LPC turns spoken English into data sets by breaking apart the resonances, hisses, and pops within and between words. The conversation is then synthesized on the receiving end by reversing the process. Working only from what encryption leaves visible, such as the length of each packet, the researchers simulated that reverse process in a lab. They wrote: “While the generalized performance is not as strong as we would have liked, we believe the results still raise cause for concern: in particular, one would hope that such recovery would not be at all possible since VoIP audio is encrypted precisely to prevent such breaches of privacy.”
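For readers who want to see the core idea behind LPC, the following Python sketch estimates a small set of predictor coefficients from a signal, computes what is left over (the residual), and then rebuilds the signal on the “receiving end” by running the process in reverse. It is a textbook illustration under stated assumptions, not the codec Skype uses: the function names, the order-2 model, and the synthetic test signal are hypothetical, and real codecs add quantization and much more.

```python
import random


def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc_coefficients(x, order):
    """Levinson-Durbin recursion: x[n] is predicted as sum(a[k] * x[n-k])."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def predict(history, n, coeffs):
    """Prediction of sample n from the samples before it."""
    return sum(c * history[n - k]
               for k, c in enumerate(coeffs, start=1) if n - k >= 0)


def residual(x, coeffs):
    """Sender side: what the predictor could not explain."""
    return [x[n] - predict(x, n, coeffs) for n in range(len(x))]


def synthesize(excitation, coeffs):
    """Receiver side: reverse the process to rebuild the signal."""
    y = []
    for n, e in enumerate(excitation):
        y.append(e + predict(y, n, coeffs))
    return y


def demo_signal(n=2000, seed=0):
    """Hypothetical test signal with known structure plus noise."""
    rng = random.Random(seed)
    x = []
    for i in range(n):
        prev1 = x[i - 1] if i >= 1 else 0.0
        prev2 = x[i - 2] if i >= 2 else 0.0
        x.append(1.3 * prev1 - 0.6 * prev2 + rng.gauss(0.0, 1.0))
    return x


signal = demo_signal()
coeffs, err = lpc_coefficients(signal, order=2)
rebuilt = synthesize(residual(signal, coeffs), coeffs)
print("estimated coefficients:", coeffs)  # should land near 1.3 and -0.6
print("largest rebuild error:", max(abs(a - b) for a, b in zip(signal, rebuilt)))
```

The design point is that a handful of coefficients captures most of the structure of the sound, which is what makes the compression attractive for VoIP in the first place.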
This particular attack has its roots in linguistics. The researchers liken it to how infants break up speech into words without hearing actual pauses and word divisions within a sentence. Adults have a lexicon of sounds that make up individual words, but infants do not. Infants must rely on gestures, intonations and other clues to infer the breaks between words within the stream of sounds they hear. That is what this attack does; it attempts to build a lexicon of sounds using LPC to decode the conversation.
Unfortunately, the researchers don’t offer any solutions. They only hope that “this work stimulates discussion within the broader community on ways to design more secure, yet efficient, techniques for preserving the confidentiality of VoIP conversations.”
The finding holds relevance for device manufacturers as well. For example, integrated circuits draw varying amounts of power as they operate. One can record that varying power consumption and begin to map the sequences of peaks and valleys to digits: ones and zeros. This is called a power analysis attack, and it can be used to decode credit card sequences from a POS terminal or patient information from a medical device. The point is, there is data leaking out in ways we might not have thought possible.
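The mapping from peaks and valleys to bits can be sketched in a few lines of Python. This is a deliberately simplified toy, assuming a clean trace in which each bit occupies a fixed number of samples and a high average draw means a one. The `decode_trace` function, the sample values, and the threshold are all hypothetical; real attacks have to cope with noise and misalignment, and usually rely on statistics gathered over many traces.

```python
def decode_trace(trace, samples_per_bit, threshold):
    """Turn a power trace into bits by averaging each fixed-size window."""
    bits = []
    last_start = len(trace) - samples_per_bit
    for start in range(0, last_start + 1, samples_per_bit):
        window = trace[start:start + samples_per_bit]
        mean = sum(window) / samples_per_bit
        # Assumption: a higher average draw corresponds to a one.
        bits.append(1 if mean > threshold else 0)
    return bits


# Hypothetical measurements, four samples per bit.
trace = [
    0.9, 1.1, 1.0, 1.0,   # peak
    0.2, 0.1, 0.3, 0.2,   # valley
    1.0, 0.8, 1.2, 1.0,   # peak
    1.1, 0.9, 1.0, 1.0,   # peak
]
print(decode_trace(trace, samples_per_bit=4, threshold=0.6))  # [1, 0, 1, 1]
```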
It is refreshing to see research from a linguistics department have a bearing on encryption: it’s the “out of the box” thinking that cybercriminals employ. Often we’re left, after the fact, nodding our heads at the clever means by which someone used something completely unrelated to our field to defeat our security. Perhaps some of the standard computer security conferences should invite or challenge researchers in other disciplines to present. Imagine what we might learn.