Mobile & Wireless

Google Scours the Internet for Dirty Android Apps

Google is analyzing all the apps that it can find across the Internet in an effort to keep Android users protected from Potentially Harmful Applications (PHAs).


One week after launching the Android Ecosystem Security Transparency Report, Google decided to explain how it leverages machine learning techniques for detecting PHAs. 

Google Play Protect (GPP), the security services that help keep devices with Google Play clean, analyzes more than half a million apps each day, and looks everywhere it can for those apps, the Internet search giant said. 

Thanks to the help of machine learning, Google says it is able to detect PHAs faster and scale better. The scanning system uses multiple data sources and machine learning models to analyze apps and evaluate the user experience. 

Google Play Protect looks into the APK of all applications it can find, to extract PHA signals such as SMS fraud, phishing, privilege escalation, and the like. Both the resources inside the APK file and the app behavior are tested to produce information about the app’s characteristics. 

Additionally, Google attempts to understand the manner in which the users perceive apps by collecting feedback (such as the number of installs, ratings, and comments) from Google Play, as well as information about the developer (such as the certificates they use and their history of published apps). 

“In general, our data sources yield raw signals, which then need to be transformed into machine learning features for use by our algorithms. Some signals, such as the permissions that an app requests, have a clear semantic meaning and can be directly used. In other cases, we need to engineer our data to make new, more powerful features,” Google notes.
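
As a rough illustration of how a signal with a clear meaning, such as requested permissions, could be used directly as a feature, the Python sketch below turns an app's permission list into a fixed-length vector of 0s and 1s. The four-entry vocabulary and the function name `permissions_to_features` are assumptions made for this example; Google has not described its actual encoding.

```python
# Hypothetical, deliberately tiny permission vocabulary (an assumption for this
# sketch). The strings are real Android permission names, but a production
# system would track far more of them.
PERMISSION_VOCAB = [
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.INTERNET",
    "android.permission.CAMERA",
]


def permissions_to_features(requested):
    """Return a multi-hot vector: 1 if the app requests the permission, else 0."""
    requested_set = set(requested)
    return [1 if perm in requested_set else 0 for perm in PERMISSION_VOCAB]


# An app that requests Internet access and the ability to send SMS messages.
example_app = ["android.permission.INTERNET", "android.permission.SEND_SMS"]
features = permissions_to_features(example_app)
print(features)  # [1, 0, 1, 0]
```

Each position in the vector always refers to the same permission, so a model can learn a separate weight for each one.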

The company calculates a rating per developer based on the ratings of that developer’s apps, and uses that rating to validate future apps. The tech giant also uses embedding to create compact representations for sparse data, and feature selection to streamline data and make it more useful to models. 
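
Google does not say how the per-developer rating is computed. As a minimal sketch, assuming it is simply the average rating across a developer's apps, an engineered feature of this kind could be derived as follows; the function name `developer_ratings` and the sample data are hypothetical.

```python
from collections import defaultdict


def developer_ratings(apps):
    """Average the app ratings for each developer.

    `apps` is an iterable of (developer_id, app_rating) pairs. Using a plain
    mean is an assumption of this sketch, not Google's documented method.
    """
    totals = defaultdict(lambda: [0.0, 0])  # developer -> [rating sum, app count]
    for developer, rating in apps:
        totals[developer][0] += rating
        totals[developer][1] += 1
    return {dev: total / count for dev, (total, count) in totals.items()}


# Hypothetical sample data: two apps from one developer, one from another.
sample_apps = [("dev_a", 4.0), ("dev_a", 5.0), ("dev_b", 2.0)]
ratings = developer_ratings(sample_apps)
print(ratings)  # {'dev_a': 4.5, 'dev_b': 2.0}
```

The resulting value could then be attached as an extra feature to every new app that the same developer publishes.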

“By combining our different datasets and investing in feature engineering and feature selection, we improve the quality of the data that can be fed to various types of machine learning models,” the company notes. 

Google uses models to identify PHAs in specific categories, such as SMS fraud or phishing. While these are broad categories, narrower models also exist, targeting groups of apps that belong to the same PHA campaign and share source code and behavior. 

Each of these model categories comes with its own perks and caveats. Using a single model to tackle a broad category provides simplicity but lacks precision due to generalization, while the use of multiple PHA models requires additional engineering efforts and reduces scope, despite improving precision. 

Google says it uses a variety of modeling techniques, both supervised and unsupervised. These include logistic regression, which has a simple structure and can be trained quickly, and deep learning with deep neural networks, which can capture complicated interactions between features and extract hidden patterns. 
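
To show why logistic regression is described as simple and quick to train, here is a small, self-contained Python sketch that fits one with batch gradient descent. The two features, the four toy samples, the learning rate, and the epoch count are all made up for illustration and have no connection to Google's actual models or data.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability that feature vector x belongs to the positive (PHA) class."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n_samples for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b


# Toy data (hypothetical): features are [requests SEND_SMS, requests INTERNET];
# label 1 marks an app flagged as SMS fraud in this made-up set.
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = [1, 1, 0, 0]

w, b = train_logistic_regression(X, y)
print(predict_proba(w, b, [1, 0]) > 0.5)  # True: flagged
print(predict_proba(w, b, [0, 1]) > 0.5)  # False: not flagged
```

The whole model is one weight per feature plus a bias, which is what makes it fast to train and easy to inspect. The trade-off is that it cannot, on its own, capture interactions between features the way a deep network can.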

“PHAs are constantly evolving, so our models need constant updating and monitoring. In production, models are fed with data from recent apps, which help them stay relevant. However, new abuse techniques and behaviors need to be continuously detected and fed into our machine learning models to be able to catch new PHAs and stay on top of recent trends,” Google notes. 

The employed machine learning models were able to successfully detect 60.3% of the PHAs identified by Google Play Protect, covering over 2 billion Android devices, Google says, adding that it will continue investing in the technology. 

Related: Google Introduces Security Transparency Report for Android

Written By

Ionut Arghire is an international correspondent for SecurityWeek.
