OpenDNS Uses Natural Language Processing to Detect APTs

OpenDNS has unveiled NLPRank, a new model that can be used to detect both opportunistic phishing campaigns and advanced persistent threats (APTs) by identifying certain patterns in DNS traffic.

OpenDNS security researcher Jeremiah O’Connor has found a way to combine natural language processing (NLP) techniques with the company’s global network data to detect malicious activity. The algorithms used by NLPRank are more commonly associated with fields such as data mining and bioinformatics, but the researcher has demonstrated that they can also be useful for IT security.

Cybercrime groups such as Carbanak/Anunak, which reportedly stole up to one billion dollars from 100 banks over a two-year period, often use spear phishing to install malware on targeted systems. Such operations typically rely on malicious domains whose names resemble those of legitimate high-profile domains. The Carbanak group, for example, used domains such as update-java(dot)net and adobe-update(dot)net.

OpenDNS’s NLPRank model analyzes domain names, along with other data, to determine whether a domain is malicious.

“NLPRank is designed to detect these fraudulent branded domains that often serve as C2 domains for targeted attacks. Our system utilizes heuristics such as NLP, ASN mappings and weightings, WHOIS data patterns, and HTML tag analysis to classify these type of attack domains,” O’Connor explained in a blog post.

NLPRank relies on the edit distance algorithm, a technique commonly used in spell-checking, speech recognition, machine translation, and information retrieval.

“NLPRank uses a minimum edit-distance on substrings to check for the word distance between legitimate and typo-squatting domains (ex. malware.com vs. rnalware.com, linkedin.com vs. 1inkedin.net),” O’Connor said. “Minimum edit-distance is a shortest-path, dynamic-programming algorithm that checks for similarity between 2 strings. The minimum edit-distance between 2 strings is defined as the minimum number of edits it takes (ex. insertion, deletion, substitution) to turn string A into string B. Basically anytime you have to make an edit you incur a penalty.”
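The dynamic-programming definition O’Connor describes can be sketched in a few lines of Python. This is a textbook Levenshtein implementation with a cost of 1 per insertion, deletion, or substitution, not OpenDNS’s code; the function name `min_edit_distance` is illustrative.

```python
def min_edit_distance(a: str, b: str) -> int:
    """Minimum edit (Levenshtein) distance with unit costs for
    insertion, deletion, and substitution."""
    # prev[j] = distance between the first i-1 chars of a and the first j chars of b.
    # Row 0: turning "" into b[:j] takes j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Column 0: turning a[:i] into "" takes i deletions.
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when chars match)
            )
        prev = curr
    return prev[-1]
```

With this function, `min_edit_distance("g00gle.com", "google.com")` returns 2, and `min_edit_distance("malware.com", "rnalware.com")` also returns 2 (one substitution plus one insertion to turn “m” into “rn”). Keeping only two rows of the table is a common memory-saving choice, since each cell depends only on the current and previous rows.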

For example, turning “g00gle.com” into “google.com” requires two substitutions, so the penalty is 2. Turning “inception” into “execution” (aligned letter by letter as “i n c e _ p t i o n” over “_ e x e c u t i o n,” where “_” marks a gap) requires three substitutions, one deletion, and one insertion, for a penalty of 5, the researcher explained.
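Counting the individual operations, as in these examples, means walking back through the full dynamic-programming table along one cheapest path. The sketch below does that under the same unit-cost assumption; the helper name `edit_operations` is hypothetical. Several alignments can tie for the minimum, so for “inception” and “execution” it may report a different mix of operations than the three substitutions, one deletion, and one insertion described here, but the total is still 5.

```python
from collections import Counter


def edit_operations(a: str, b: str) -> Counter:
    """Count the edit operations on one minimum-cost path from a to b
    (unit costs). Matches are free and are not counted."""
    n, m = len(a), len(b)
    # dist[i][j] = edit distance between a[:i] and b[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j] + 1,                           # deletion
                dist[i][j - 1] + 1,                           # insertion
                dist[i - 1][j - 1] + (a[i - 1] != b[j - 1]),  # substitution/match
            )

    # Walk back from the bottom-right cell, recording each move taken.
    ops = Counter()
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            if a[i - 1] != b[j - 1]:
                ops["substitution"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops["deletion"] += 1
            i -= 1
        else:
            ops["insertion"] += 1
            j -= 1
    return ops
```

For “g00gle” and “google” the only cheapest path is two substitutions, so `edit_operations("g00gle", "google")` returns `Counter({'substitution': 2})`.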

By using this algorithm, OpenDNS believes it can distinguish the “language” used by malicious domains from that of benign domains in DNS traffic.
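O’Connor says the edit distance is computed “on substrings,” which matters for names like adobe-update(dot)net, where the brand is only part of the domain. The article does not say how OpenDNS does this. One standard approach, assumed in the sketch below, is approximate substring matching (semi-global alignment), in which a match may start and end anywhere in the domain at no cost. The `BRANDS` watch list, the `lookalike_brands` helper, and the default threshold are all hypothetical.

```python
from typing import List

# Hypothetical watch list of brand names to protect.
BRANDS = ["adobe", "java", "google", "linkedin"]


def substring_edit_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and any substring of `text`."""
    # Row 0 is all zeros: a match may start anywhere in text at no cost.
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, start=1):
        curr = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            curr[j] = min(
                prev[j] + 1,               # pattern char left unmatched
                curr[j - 1] + 1,           # extra text char inside the match
                prev[j - 1] + (pc != tc),  # substitution or match
            )
        prev = curr
    # A match may also end anywhere in text, so take the best cell in the last row.
    return min(prev)


def lookalike_brands(domain: str, max_distance: int = 1) -> List[str]:
    """Brands that appear, exactly or nearly, inside the domain name."""
    label = domain.lower().rsplit(".", 1)[0]  # drop the top-level domain
    return [b for b in BRANDS if substring_edit_distance(b, label) <= max_distance]
```

Under these assumptions, `lookalike_brands("adobe-update.net")` returns `["adobe"]` and `lookalike_brands("update-java.net")` returns `["java"]`, while “g00gle.com” is flagged only when the threshold is raised to 2. A brand match on its own would also flag legitimate names, which is why signals such as the ASN, WHOIS, and HTML checks mentioned above would still be needed.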

NLPRank can also detect malicious domains by analyzing Autonomous System Number (ASN) data. OpenDNS has mapped legitimate domains to their ASNs, which uniquely identify each network (autonomous system) on the Internet.

For instance, 14365 and 44786 are ASNs associated with Adobe. However, the ASN of the domain used by the Carbanak group (adobe-update(dot)net) was associated with PIN-AS Petersburg Internet Network LLC in Russia, a network that has often been used for cybercriminal activities. This clearly indicates that the domain is not legitimate.
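The ASN heuristic boils down to a lookup: a domain that borrows a brand name but resolves into a network the brand is not known to use is suspicious. In the minimal sketch below, the observed ASN is passed in directly (in practice it would come from mapping the domain’s IP address to its origin AS). The `KNOWN_ASNS` table and `asn_mismatch` function are hypothetical apart from the two Adobe ASNs given in the article. Because the article does not give PIN-AS’s ASN, the example uses 64512, a number from the private-use range, as a stand-in.

```python
# Hypothetical brand-to-ASN table. Only the Adobe ASNs come from the article;
# a real mapping would cover many brands and be kept up to date.
KNOWN_ASNS = {
    "adobe": {14365, 44786},
}


def asn_mismatch(brand: str, observed_asn: int) -> bool:
    """True if a domain using `brand` resolves into an ASN the brand is not known to use."""
    expected = KNOWN_ASNS.get(brand)
    if expected is None:
        return False  # no mapping for this brand, so the check says nothing
    return observed_asn not in expected
```

Here `asn_mismatch("adobe", 14365)` is `False`, while `asn_mismatch("adobe", 64512)` is `True`. Returning `False` for unmapped brands keeps the check from firing when it has no data, leaving the decision to the other heuristics.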

The NLPRank model is currently used by OpenDNS for the automated detection of threats, but it has not yet been implemented for automated blocking.

Eduard Kovacs (@EduardKovacs) is a contributing editor at SecurityWeek. He worked as a high school IT teacher for two years before starting a career in journalism as Softpedia’s security news reporter. Eduard holds a bachelor’s degree in industrial informatics and a master’s degree in computer techniques applied in electrical engineering.