Researchers Pave Way For Automated Analysis of Malware Images

LAS VEGAS – BLACK HAT USA 2015 – Researchers at Invincea have tested the effectiveness of an automated analysis system that focuses on the images embedded in malware.

Desktop icons and other images are often used by malware creators to lure users. A typical example is a PDF icon attached to a malicious executable, which tricks users into thinking they are opening a harmless document.

As part of a Defense Advanced Research Projects Agency (DARPA) project focusing on new types of malware analysis, Invincea researchers have demonstrated that an automated system designed to analyze images embedded in malware could improve threat detection rates and help researchers understand how new malware tricks users and which adversary is behind a given threat.

Malware image analysis

Alex Long, research engineer at Invincea Labs, presented the results of their work on Wednesday at the Black Hat conference in Las Vegas.

“Using the images in malware to analyze the sample puts malware authors in a ‘catch-22’ dilemma, because images are a huge part of how they manipulate users. We’re basically saying to malware authors, ‘You can keep using images to increase your chances of tricking a user, but we’re also going to be using images to make it easier for us to detect and understand your malware,’” Long told SecurityWeek.

“Given the vast amount of research going into malware detection approaches, and the promising preliminary results our work has shown, we believe that this relatively simple idea has the potential to complement other approaches very effectively and should be continued further as a new signal in malware analysis,” he added.

According to Long, more than half of the two million malware samples provided by DARPA had at least one image embedded.

The automated analysis of malware images has two main stages: identifying malware samples that share visually similar images, and classifying the images into categories (e.g. fake antiviruses, installers, game-related threats).

For the first component, Invincea relied on a technique known as “Average Hash.” The image is reduced to grayscale, stretched or shrunk to a small fixed size, and its contrast is increased. The average pixel value is then computed, and each pixel is compared to this average to produce a binary vector that serves as the image’s hash.

This allows the analysis system to efficiently compare a malware image with images from a given set regardless of their contrast, scale, or color scheme.
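The Average Hash steps described above can be sketched in plain Python. This is a minimal illustration, not Invincea's code, and it rests on several assumptions: the image has already been extracted from the sample and decoded into rows of (R, G, B) tuples (a real pipeline would use an imaging library for that), grayscale uses the ITU-R BT.601 luma weights, the resize is a simple box filter down to 8x8, and hashes are compared by Hamming distance, the usual choice for this kind of hash. The contrast step is omitted because a linear contrast stretch keeps every pixel on the same side of the mean, so it would not change the hash bits.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def to_grayscale(image: Sequence[Sequence[RGB]]) -> List[List[float]]:
    """Convert rows of (R, G, B) pixels to luma using ITU-R BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in image]


def shrink(gray: List[List[float]], size: int) -> List[List[float]]:
    """Resize to size x size by averaging each source block (a simple box filter)."""
    h, w = len(gray), len(gray[0])
    out = []
    for i in range(size):
        y0 = i * h // size
        y1 = max((i + 1) * h // size, y0 + 1)  # at least one row, even for tiny images
        row = []
        for j in range(size):
            x0 = j * w // size
            x1 = max((j + 1) * w // size, x0 + 1)
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(image: Sequence[Sequence[RGB]], size: int = 8) -> int:
    """Return a size*size-bit integer: a bit is 1 where the pixel is brighter than the mean."""
    small = shrink(to_grayscale(image), size)
    pixels = [v for row in small for v in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for v in pixels:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; a small distance means visually similar images."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # 16x16 test "icon": dark left half, bright right half
    icon = [[(0, 0, 0)] * 8 + [(255, 255, 255)] * 8 for _ in range(16)]
    # the same layout, but darker and lower-contrast
    dim = [[(40, 40, 40)] * 8 + [(180, 180, 180)] * 8 for _ in range(16)]
    print(hamming(average_hash(icon), average_hash(dim)))  # 0
```

Because every pixel is compared against the image's own average, a uniformly darker or lower-contrast copy of the same icon produces identical bits, which is the kind of robustness the article describes.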

The classification of malware images into categories relies on the Google Image Search API and user-defined queries.

“For the image classification work, we used Google Image Search results to get images representing the various semantic classes of interest. So for example, if you want training data for Internet Explorer icons, you do a search for ‘internet explorer’ with some advanced search settings to narrow down the results to just icons,” Long explained.
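The article does not say which classifier was trained on the search results. Since Long refers to kNN when discussing image matching, one plausible sketch is a k-nearest-neighbor vote over image hashes: each image downloaded for a query (for example, icons returned for "internet explorer") is hashed and labeled with that query's class, and an unknown malware image takes the majority label among its k closest training hashes. The function names, `k`, the `max_distance` cutoff, and the labels below are hypothetical, not Invincea's method.

```python
from collections import Counter
from typing import List, Optional, Tuple

LabeledHash = Tuple[int, str]  # (64-bit image hash, class label taken from the search query)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def classify(query: int, training: List[LabeledHash], k: int = 3,
             max_distance: int = 10) -> Optional[str]:
    """Return the majority label among the k nearest training hashes, or None.

    The max_distance cutoff is a hypothetical guard: neighbors farther than this
    do not vote, so an image unlike every class is left unlabeled rather than
    being forced into the nearest category.
    """
    nearest = sorted(training, key=lambda item: hamming(query, item[0]))[:k]
    votes = Counter(label for h, label in nearest if hamming(query, h) <= max_distance)
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Adding a new category then amounts to hashing the results of one more search query and appending them to `training`, which mirrors how the article describes extending the system with new queries.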

For some of the tested categories, such as fake word processors, researchers obtained very good results, but in other categories the malware images were misclassified at an unacceptable rate. 

“The goal in our research was to make this process entirely automated. Malware authors are using automated processes to produce an essentially endless stream of polymorphic variants from a single malware sample, so malware analysts must begin to rely more on automated approaches as well,” Long said.

“Using our approach, the extraction, comparison, and visualization of matching images is done completely automatically so an analyst can go from receiving 200,000 fresh new malware samples that he knows nothing about, to seeing a ‘social network’ of their shared images with literally the push of a button,” the expert noted. “[The image classification] process was also entirely automated, so choosing how you want to classify malware images is as simple as adding a few words (like ‘anti-virus’) for the new search query, to the list of queries in our system.”
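Long does not describe how the "social network" view is assembled. A minimal sketch, assuming each sample has already been reduced to the set of 64-bit hashes of its embedded images: two samples are linked whenever they contain images whose hashes are within a small Hamming distance, and linked samples are grouped into clusters with union-find. The function names, sample names, and the 4-bit threshold are hypothetical. The all-pairs loop keeps the example short; across hundreds of thousands of samples it is exactly the cost an approximate nearest-neighbor library is meant to avoid.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two image hashes."""
    return bin(a ^ b).count("1")


def image_edges(samples: Dict[str, Set[int]], threshold: int = 4) -> Set[Tuple[str, str]]:
    """Link two samples if any of their image hashes are within `threshold` bits."""
    edges = set()
    for (s1, h1), (s2, h2) in combinations(sorted(samples.items()), 2):
        if any(hamming(a, b) <= threshold for a in h1 for b in h2):
            edges.add((s1, s2))
    return edges


def clusters(samples: Dict[str, Set[int]], threshold: int = 4) -> List[List[str]]:
    """Group samples into connected components of the shared-image graph."""
    parent = {s: s for s in samples}

    def find(s: str) -> str:
        # union-find root lookup with path halving
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in image_edges(samples, threshold):
        parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for s in samples:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())
```

Each edge is one link in the network view, and each cluster is a family of samples reusing the same or near-identical artwork, which may hint that they share an author.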

Image processing can be computationally expensive, so Invincea set out to build an analysis system that runs efficiently.

“Staying in the theme of scalability, we focused on approaches that would be computationally cheap. Average hash is a good example of that, as the entire algorithm takes 20-30 lines of code in Python and is nearly instantaneous to run,” Long told SecurityWeek. “In order to maintain effective performance when performing image matching across potentially millions of images, we used the open source library, FLANN, which is short for ‘[Fast Library for Approximate Nearest Neighbors].’ This uses a technique that is much more complex than kNN [k-nearest neighbors] to approximate the results of kNN without having to perform the costly pair-wise comparisons between every possible pair of images. This allows us to do nearest neighbor calculations across hundreds of thousands of images in a few seconds, making any performance impact virtually negligible.”
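FLANN's own API is not reproduced here. As a self-contained stand-in, the sketch below uses a different technique, multi-index hashing, to show how near-duplicate hashes can be found without comparing a query against every stored hash. Each 64-bit hash is split into radius + 1 chunks; by the pigeonhole principle, any hash within `radius` bits of the query must match it exactly on at least one chunk, so only hashes sharing a chunk value need a full Hamming check. Unlike FLANN's approximate search, this returns exact results for the chosen radius. The class name, sample names, and parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class MultiIndexHashIndex:
    """Hypothetical index for finding hashes within `radius` bits of a query.

    Each `bits`-wide hash is split into radius + 1 disjoint chunks. Two hashes
    that differ in at most `radius` bits must agree exactly on at least one
    chunk, so only hashes sharing some chunk value are verified.
    """

    def __init__(self, radius: int = 3, bits: int = 64) -> None:
        self.radius = radius
        n_chunks = radius + 1
        # (low_bit, high_bit) boundaries of each chunk, nearly equal in width
        self.bounds = [(i * bits // n_chunks, (i + 1) * bits // n_chunks)
                       for i in range(n_chunks)]
        # one lookup table per chunk: chunk value -> names of stored hashes
        self.tables: List[Dict[int, List[str]]] = [defaultdict(list) for _ in self.bounds]
        self.hashes: Dict[str, int] = {}

    @staticmethod
    def _chunk(h: int, lo: int, hi: int) -> int:
        return (h >> lo) & ((1 << (hi - lo)) - 1)

    def add(self, name: str, h: int) -> None:
        self.hashes[name] = h
        for table, (lo, hi) in zip(self.tables, self.bounds):
            table[self._chunk(h, lo, hi)].append(name)

    def query(self, h: int) -> List[Tuple[str, int]]:
        """Return (name, distance) pairs within `radius`, closest first."""
        candidates = set()
        for table, (lo, hi) in zip(self.tables, self.bounds):
            candidates.update(table.get(self._chunk(h, lo, hi), ()))
        matches = []
        for name in candidates:
            d = bin(self.hashes[name] ^ h).count("1")
            if d <= self.radius:  # sharing a chunk is necessary, not sufficient
                matches.append((name, d))
        return sorted(matches, key=lambda m: (m[1], m[0]))
```

In use, `add` would be called once per extracted image and `query` with the hash of each new image; only the handful of stored hashes that collide on some chunk are ever compared in full.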

Before image analysis can be integrated into a commercial solution, the overall accuracy of the system needs to be improved, the researcher said.

“This work was performed near the tail end of a four-year DARPA-backed program, so our main goal was just publishing the idea into the community. We wanted to demonstrate that the concept had potential, which I believe we did,” Long said.

Written By

Eduard Kovacs (@EduardKovacs) is a contributing editor at SecurityWeek. He worked as a high school IT teacher for two years before starting a career in journalism as Softpedia’s security news reporter. Eduard holds a bachelor’s degree in industrial informatics and a master’s degree in computer techniques applied in electrical engineering.
