SecurityWeek

Email Security

Google’s RETVec Open Source Text Vectorizer Bolsters Malicious Email Detection

Google shows how RETVec, a new and open source text vectorizer, can improve the detection of phishing attacks, spam and other harmful content.

Google revealed on Wednesday that a new text vectorizer developed by its researchers significantly boosts efficiency in detecting malicious emails in Gmail inboxes.

Google describes the new text vectorizer, called RETVec (Resilient & Efficient Text Vectorizer), as “an efficient, resilient, and multilingual text vectorizer designed for neural-based text processing”.

The internet giant has been leveraging text classification models to identify phishing attacks, scams, inappropriate comments and other harmful content on services such as YouTube and Gmail.

However, threat actors have been coming up with ways to evade these classifiers, using invisible characters, homoglyphs, and keyword stuffing.
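To illustrate why such tricks work, here is a minimal sketch of a naive keyword filter being bypassed with an invisible character and a homoglyph. The blocklist, the sample messages, and the `naive_flag` helper are hypothetical and are not taken from Google's classifiers; real systems are far more sophisticated.

```python
# Hypothetical blocklist for illustration only.
BLOCKLIST = {"password", "invoice"}

def naive_flag(text: str) -> bool:
    """Flag text if any blocklisted keyword appears verbatim (case-insensitive)."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)

ZERO_WIDTH_SPACE = "\u200b"  # invisible when rendered
CYRILLIC_A = "\u0430"        # looks like Latin "a" in most fonts

plain = "Please confirm your password"
invisible = "Please confirm your pass" + ZERO_WIDTH_SPACE + "word"
homoglyph = "Please confirm your p" + CYRILLIC_A + "ssword"

print(naive_flag(plain))      # True: exact keyword match
print(naive_flag(invisible))  # False: the hidden character splits the keyword
print(naive_flag(homoglyph))  # False: the look-alike letter breaks the match
```

All three messages look the same to a human reader, yet only the first is caught by exact matching.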

RETVec aims to boost the efficiency of text classifiers while significantly reducing computation costs, and tests conducted by Google over the past year suggest that it has achieved that goal.

In its tests, Google replaced the text vectorizer previously used to detect spam in Gmail with RETVec. The company reported a 38% improvement in spam detection, along with a significant reduction in false positives and false negatives. It also saw a solid improvement in performance.

“RETVec achieves these improvements by combining a novel, highly-compact character encoder, an augmentation-driven training regime, and the use of metric learning,” Google explained.

It added, “Due to its novel architecture, RETVec works out-of-the-box on every language and all UTF-8 characters without the need for text preprocessing, making it the ideal candidate for on-device, web, and large-scale text classification deployments. Models trained with RETVec exhibit faster inference speed due to its compact representation. Having smaller models reduces computational costs and decreases latency, which is critical for large-scale applications and on-device models.”
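The idea of handling every UTF-8 character without preprocessing can be sketched with a toy character-level encoder that maps each Unicode code point to a fixed-width bit vector, so no vocabulary or language-specific tokenizer is needed. This is only an illustration of the general approach, not RETVec's actual implementation: the 24-bit width, the 16-character word limit, and the function names `encode_char` and `encode_word` are assumptions made for this example.

```python
from typing import List

BITS = 24  # assumed width; any Unicode code point (max 0x10FFFF) fits in 21 bits

def encode_char(ch: str) -> List[int]:
    """Return the code point of ch as a list of bits, least significant first."""
    code_point = ord(ch)
    return [(code_point >> i) & 1 for i in range(BITS)]

def encode_word(word: str, max_chars: int = 16) -> List[List[int]]:
    """Encode a word as a fixed-size grid of bits, truncating or zero-padding."""
    rows = [encode_char(ch) for ch in word[:max_chars]]
    padding = [[0] * BITS for _ in range(max_chars - len(rows))]
    return rows + padding

# Latin and Cyrillic words go through the same code path, with no vocabulary lookup.
latin = encode_word("spam")
cyrillic = encode_word("спам")
print(len(latin), len(latin[0]))
print(len(cyrillic), len(cyrillic[0]))
```

Because the output shape is the same for any script, a downstream model can consume it directly; in this sketch a compact, fixed-size input is what keeps the representation small.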

Google has detailed RETVec in a paper and released it as open source. A tutorial is also available for those interested in using the new text vectorizer.

Written By

Eduard Kovacs (@EduardKovacs) is senior managing editor at SecurityWeek. He worked as a high school IT teacher before starting a career in journalism in 2011. Eduard holds a bachelor’s degree in industrial informatics and a master’s degree in computer techniques applied in electrical engineering.
