Researchers Demonstrate How “de-identified” Web Browsing Histories Can be Linked to Social Media Accounts
While the use of cookies and other mechanisms to track computers is widespread and well understood, it is often believed that the data collected is effectively de-identified; that is, the cookies track the browser, not the person using the computer.
This is the message often promulgated by the advertising industry: tracking cookies allow targeted advertising without compromising personal privacy. New research from academics at Stanford and Princeton universities now demonstrates that this is not necessarily the case.
In the new study ‘De-anonymizing Web Browsing Data with Social Networks’ (due to be presented at the 2017 World Wide Web Conference in Perth, Australia, in April), the researchers show that de-identified web browsing histories can be linked to social media profiles using only publicly available data. Once the social media profile associated with a browsing pattern is known, the person is known.
The basic premise is that social media users are more likely to click on links posted by people they follow. This creates a distinctive pattern that persists in the browsing history. “An adversary can thus de-anonymize a given browsing history,” states the report, “by finding the social media profile whose ‘feed’ shares the history’s idiosyncratic characteristics.”
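The matching idea can be illustrated with a small ranking function. This is a minimal sketch, not the researchers' actual model: it assumes the adversary already holds the set of links from a browsing history and, for each candidate profile, the set of links that appeared in that profile's feed (the names `rank_candidates`, `history` and `feeds` are hypothetical). To capture the "idiosyncratic" part, each shared link is weighted by how rare it is across all candidate feeds, an inverse-frequency weighting chosen here purely for illustration, so a niche link seen by few accounts counts for far more than a viral one everyone saw.

```python
import math
from collections import Counter


def rank_candidates(history: set[str], feeds: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank candidate profiles by how well their feed overlaps a browsing history.

    Simplified illustrative score (an assumption, not the paper's model): every
    history link that also appears in a feed adds log(N / df), where N is the
    number of candidate feeds and df is how many of them contain that link.
    """
    n = len(feeds)
    # Document frequency: how many candidate feeds contain each link.
    df = Counter(link for feed in feeds.values() for link in feed)
    scores = {
        user: sum(math.log(n / df[link]) for link in history & feed)
        for user, feed in feeds.items()
    }
    # Best match first; ties broken alphabetically so the order is stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data: links each hypothetical account was shown in its feed.
feeds = {
    "alice": {"a.com/1", "b.com/2", "news.com/x"},
    "bob": {"news.com/x", "c.com/3"},
    "carol": {"news.com/x", "b.com/2", "d.com/4"},
}
# Links found in an anonymous browsing history.
history = {"a.com/1", "b.com/2", "news.com/x"}

ranked = rank_candidates(history, feeds)
print(ranked[0][0])  # alice
```

In this toy data `alice` ranks first because she is the only candidate whose feed contains `a.com/1`, while `news.com/x` adds nothing to any score because every candidate saw it. A real attack has to cope with hundreds of millions of candidate feeds and noisy histories.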
The theory was tested against Twitter, chosen because it is largely public, has an accessible API, and wraps its links in the t.co shortener. An ‘adversary’ with access to browsing histories can easily deduce (through timing or referrer information) which links came from Twitter. The pattern of those Twitter referrals can then be used to identify the user concerned by matching it against users’ Twitter profile characteristics. The same approach could also be used against users with Facebook or Reddit accounts.
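The filtering step can be sketched as follows, assuming the adversary's log stores each page visit as a hypothetical `(url, referrer)` pair. Only the referrer route is shown; the timing-based inference the researchers also mention is not modelled. The set of Twitter hostnames, including the `t.co` shortener, is an assumption for illustration.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed set of hostnames whose referrer marks a click as coming from Twitter.
TWITTER_HOSTS = {"t.co", "twitter.com", "www.twitter.com", "mobile.twitter.com"}


def twitter_referred(visits: list[tuple[str, Optional[str]]]) -> list[str]:
    """Return the visited URLs whose referrer host is a Twitter host.

    `visits` is a hypothetical log of (url, referrer) pairs; referrer may be None.
    """
    return [
        url
        for url, referrer in visits
        if referrer and urlparse(referrer).hostname in TWITTER_HOSTS
    ]


visits = [
    ("https://example.com/story", "https://t.co/abc123"),
    ("https://example.org/page", "https://www.google.com/"),
    ("https://example.net/post", None),
]
print(twitter_referred(visits))  # ['https://example.com/story']
```

The resulting list is the kind of Twitter-derived link set that the matching step would then compare against candidate feeds.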
“Users may assume they are anonymous when they are browsing a news or a health website,” says Arvind Narayanan, an assistant professor of computer science at Princeton and one of the authors of the research, “but our work adds to the list of ways in which tracking companies may be able to learn their identities.”
The approach is not foolproof. Nevertheless, say the researchers, “given a history with 30 links originating from Twitter, we can deduce the corresponding Twitter profile more than 50 percent of the time.” In fact, in a test involving 374 volunteers who submitted web browsing histories, the method was able to identify more than 70 percent of those users by comparing their web browsing data to hundreds of millions of public social media feeds.
“All the evidence we have seen piling up over the years showing the strong limits of data anonymization, including this study,” comments Yves-Alexandre de Montjoye, an assistant professor at Imperial College London (not associated with the research), “really emphasizes the need to rethink our approach to privacy and data protection in the age of big data.”
The problem goes beyond simple user privacy, since it could be used to target persons of interest. “The idea would be to look at something such as my Twitter account (as in who I’m following) and to determine what links I’m seeing,” explains F-Secure security advisor Sean Sullivan. “And then, to find the ‘User X’ with the highest correlation between site visits and links seen. At which point, if I’m User X, I could be targeted by somebody who controls one of the sites visited.”
At a purely ‘commercial’ level, this could be used to target individuals with high-value goods. But it could also be used to find and target specific individuals prior to a network attack.
The researchers accept that their current methodology is not 100% accurate, but add that an “adversary may fruitfully make use of other fingerprinting information available through URLs, such as UTM codes. Thus, the main lesson of our paper is qualitative: we present multiple lines of evidence that browsing histories may be linked to social media profiles, even at a scale of hundreds of millions of potential users.”
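UTM codes are the `utm_`-prefixed query parameters (such as `utm_source`, `utm_medium` and `utm_campaign`) that publishers and marketers append to links so analytics tools can attribute a visit to its source. A link carrying `utm_source=twitter` can therefore hint at where a click came from even when no referrer is recorded. The article does not say exactly how an adversary would exploit them; the hypothetical `utm_params` helper below is a minimal sketch showing only how easily such tags can be read from a URL in a browsing history.

```python
from urllib.parse import parse_qs, urlparse


def utm_params(url: str) -> dict[str, str]:
    """Return the utm_* query parameters of a URL (first value of each)."""
    query = parse_qs(urlparse(url).query)
    return {key: values[0] for key, values in query.items() if key.startswith("utm_")}


url = "https://example.com/a?utm_source=twitter&utm_medium=social&id=7"
print(utm_params(url))  # {'utm_source': 'twitter', 'utm_medium': 'social'}
```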
Furthermore, the paper claims, “our attack has no universal mitigation outside of disabling public access to social media sites, an act that would undermine the value of these sites.” It calls for “more research into privacy-preserving data mining of browsing histories.”

Kevin Townsend is a Senior Contributor at SecurityWeek. He has been writing about high-tech issues since before the birth of Microsoft. For the last 15 years he has specialized in information security, and has had many thousands of articles published in dozens of different magazines, from The Times and the Financial Times to current and long-gone computer magazines.
