
Researchers Link “de-identified” Browsing History to Social Media Accounts

Researchers Demonstrate How “de-identified” Web Browsing Histories Can be Linked to Social Media Accounts


While the use of cookies and other mechanisms to track computers is widespread and well understood, it is often believed that the data collected is effectively de-identified; that is, the cookies track the computer browser, not the person using the computer.

This is the message often promulgated by the advertising industry: tracking cookies allow targeted advertising without compromising personal privacy. Now new research from academics at Stanford and Princeton universities demonstrates that this need not be so.

In the new study, ‘De-anonymizing Web Browsing Data with Social Networks’ (due to be presented at the 2017 World Wide Web Conference in Perth, Australia, in April), the researchers show that de-identified web browsing histories can be linked to social media profiles using only publicly available data. Once the social media profile associated with a browsing pattern is known, the person is known.

The basic premise is that social media users are more likely to click on links posted by people they follow. This creates a distinctive pattern that persists in the browsing history. “An adversary can thus de-anonymize a given browsing history,” states the report, “by finding the social media profile whose ‘feed’ shares the history’s idiosyncratic characteristics.”

The theory was tested against Twitter, chosen because it is largely public, has an accessible API, and wraps its links in the t.co shortener. Assuming an ‘adversary’ has access to browsing histories, he can then easily deduce (through timing or referrer information) which links came from Twitter. The pattern of those referrals from Twitter can then be used to identify the user concerned by matching it against the feeds of candidate Twitter profiles. The same approach could also be used against users with Facebook or Reddit accounts.
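The matching idea described above can be illustrated with a small sketch. This is not the researchers' model, which the article does not detail. It is a simplified stand-in that scores each candidate profile by the links its feed shares with the history, giving more weight to links that appear in few feeds. The function name `rank_profiles`, the toy profiles, and the single-letter URLs are all hypothetical.

```python
import math
from collections import Counter


def rank_profiles(history, feeds):
    """Rank candidate profiles by how well their feed overlaps a history.

    history: set of URLs in the browsing history attributed to Twitter
    feeds:   dict mapping a profile name to the set of URLs in its feed
    Illustrative assumption: rarity-weighted overlap, not the paper's method.
    """
    # Count how many feeds contain each link; rare links are more identifying.
    popularity = Counter(url for links in feeds.values() for url in links)
    n_profiles = len(feeds)

    scores = {}
    for profile, links in feeds.items():
        shared = history & links
        # A link seen in every feed contributes log(1) = 0 to the score.
        scores[profile] = sum(
            math.log(n_profiles / popularity[url]) for url in shared
        )
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): "a" is in every feed, so it identifies no one.
feeds = {
    "alice": {"a", "b", "c"},
    "bob": {"a", "d"},
    "carol": {"a", "e", "f"},
}
history = {"a", "b", "c"}
ranked = rank_profiles(history, feeds)
best_profile, best_score = ranked[0]
```

In this toy case the widely shared link carries no weight, and only the two links unique to one feed separate that profile from the others. A real attack would have to cope with hundreds of millions of feeds and with histories that only partly reflect what the user saw.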

“Users may assume they are anonymous when they are browsing a news or a health website,” says Arvind Narayanan, an assistant professor of computer science at Princeton and one of the authors of the research, “but our work adds to the list of ways in which tracking companies may be able to learn their identities.”


The approach is not foolproof. Nevertheless, say the researchers, “given a history with 30 links originating from Twitter, we can deduce the corresponding Twitter profile more than 50 percent of the time.” In fact, in a test involving 374 volunteers who submitted web browsing histories, the method was able to identify more than 70 percent of those users by comparing their web browsing data to hundreds of millions of public social media feeds. 

“All the evidence we have seen piling up over the years showing the strong limits of data anonymization, including this study,” comments Yves-Alexandre de Montjoye, an assistant professor at Imperial College London (not associated with the research), “really emphasizes the need to rethink our approach to privacy and data protection in the age of big data.”

The problem goes beyond simple user privacy, since it could be used to target persons of interest. “The idea would be to look at something such as my Twitter account (as in who I’m following) and to determine what links I’m seeing,” explains F-Secure security advisor Sean Sullivan. “And then, to find the ‘User X’ with the highest correlation between site visits and links seen. At which point, if I’m User X, I could be targeted by somebody who controls one of the sites visited.”

At a purely ‘commercial’ level, this could be used to target individuals with high-value goods. But it could also be used to find and target specific individuals prior to a network attack.

The researchers accept that their current methodology is not 100% accurate, but add that an “adversary may fruitfully make use of other fingerprinting information available through URLs, such as UTM codes. Thus, the main lesson of our paper is qualitative: we present multiple lines of evidence that browsing histories may be linked to social media profiles, even at a scale of hundreds of millions of potential users.”

Furthermore, the paper claims, “our attack has no universal mitigation outside of disabling public access to social media sites, an act that would undermine the value of these sites.” It calls for “more research into privacy-preserving data mining of browsing histories.”

Written By

Kevin Townsend is a Senior Contributor at SecurityWeek. He has been writing about high tech issues since before the birth of Microsoft. For the last 15 years he has specialized in information security; and has had many thousands of articles published in dozens of different magazines – from The Times and the Financial Times to current and long-gone computer magazines.
