Email Security

Google’s RETVec Open Source Text Vectorizer Bolsters Malicious Email Detection

Google shows how RETVec, a new and open source text vectorizer, can improve the detection of phishing attacks, spam and other harmful content.

Eduard Kovacs

Published November 30, 2023

Google revealed on Wednesday that a new text vectorizer developed by its researchers significantly boosts efficiency in detecting malicious emails in Gmail inboxes.

The new text vectorizer, called RETVec (Resilient & Efficient Text Vectorizer), has been described by Google as “an efficient, resilient, and multilingual text vectorizer designed for neural-based text processing”.

The internet giant has been leveraging text classification models to identify phishing attacks, scams, inappropriate comments and other harmful content on services such as YouTube and Gmail.

However, threat actors have been coming up with ways to evade these classifiers, using invisible characters, homoglyphs, and keyword stuffing.
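To illustrate why these tricks work, here is a minimal Python sketch of a verbatim keyword filter being bypassed by an invisible character and by homoglyphs. The `BLOCKLIST`, the `naive_filter` function and the sample messages are hypothetical, made up for illustration; they are not taken from Google's classifiers.

```python
# Hypothetical keyword list, for illustration only.
BLOCKLIST = {"password"}


def naive_filter(text: str) -> bool:
    """Return True if any blocklisted keyword appears verbatim in the text."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)


plain = "Confirm your password now"
# A zero-width space (U+200B) is inserted inside the keyword; it is not displayed.
invisible = "Confirm your pass\u200bword now"
# Cyrillic 'а' (U+0430) and 'о' (U+043E) stand in for Latin 'a' and 'o'.
homoglyph = "Confirm your p\u0430ssw\u043erd now"

results = {
    "plain": naive_filter(plain),
    "invisible": naive_filter(invisible),
    "homoglyph": naive_filter(homoglyph),
}
print(results)
```

All three strings look the same to a reader, but only the first contains the exact byte sequence the filter is searching for, so the other two slip through.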

RETVec aims to boost the efficiency of text classifiers while significantly reducing computation costs, and the tests conducted by Google over the past year seem to show that it has achieved its goal.

In its tests, Google replaced the text vectorizer previously used to detect spam in Gmail with RETVec. The company reported a 38% improvement in the spam detection rate over the baseline and a 19.4% reduction in the false positive rate. It also said the model's TPU usage dropped by 83%.

“RETVec achieves these improvements by combining a novel, highly-compact character encoder, an augmentation-driven training regime, and the use of metric learning,” Google explained.

It added, “Due to its novel architecture, RETVec works out-of-the-box on every language and all UTF-8 characters without the need for text preprocessing, making it the ideal candidate for on-device, web, and large-scale text classification deployments. Models trained with RETVec exhibit faster inference speed due to its compact representation. Having smaller models reduces computational costs and decreases latency, which is critical for large-scale applications and on-device models.”
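The idea of handling any UTF-8 character without preprocessing can be sketched in Python by turning each character's Unicode code point into a fixed-width bit vector. This is only a rough illustration of a compact character encoding and not RETVec's actual implementation; the function names and the `bits=24` and `max_chars=16` values are assumptions chosen for the example.

```python
from typing import List


def encode_char(ch: str, bits: int = 24) -> List[int]:
    """Encode one character as the bits of its code point, least significant bit first."""
    code_point = ord(ch)
    return [(code_point >> i) & 1 for i in range(bits)]


def encode_word(word: str, max_chars: int = 16, bits: int = 24) -> List[List[int]]:
    """Encode a word as a fixed-size grid: truncate long words, zero-pad short ones."""
    rows = [encode_char(ch, bits) for ch in word[:max_chars]]
    padding = [[0] * bits for _ in range(max_chars - len(rows))]
    return rows + padding


# The same code path handles Latin, Cyrillic and CJK text; no vocabulary is needed.
for sample in ["spam", "спам", "迷惑メール"]:
    grid = encode_word(sample)
    print(sample, len(grid), len(grid[0]))
```

Because every Unicode code point fits in 21 bits, a 24-bit vector can represent any character, so no vocabulary lookup or language-specific tokenization step is needed in this sketch.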

Google has detailed RETVec in a paper and released it as open source. A tutorial is also available for organizations interested in using the new text vectorizer.
