Researchers Demonstrate How “de-identified” Web Browsing Histories Can be Linked to Social Media Accounts
While the use of cookies and other mechanisms to track computers is widespread and well understood, it is often believed that the data collected is effectively de-identified; that is, the cookies track the computer browser, not the person using the computer.
This is the message often promulgated by the advertising industry: tracking cookies allow targeted advertising without compromising personal privacy. Now new research from academics at Stanford and Princeton universities demonstrates that this need not be so.
In the new study, ‘De-anonymizing Web Browsing Data with Social Networks’ (due to be presented at the 2017 World Wide Web Conference in Perth, Australia, in April), the researchers show that de-identified web browsing histories can be linked to social media profiles using only publicly available data. Once the social media profile associated with a browsing pattern is known, the person is known.
The basic premise is that social media users are more likely to click on links posted by people they follow. This creates a distinctive pattern that persists in the browsing history. “An adversary can thus de-anonymize a given browsing history,” states the report, “by finding the social media profile whose ‘feed’ shares the history’s idiosyncratic characteristics.”
The theory was tested against Twitter — chosen because it is largely public, has an accessible API, and wraps its links in the t.co shortener. Assuming an ‘adversary’ has access to browsing histories, he can then easily deduce (through timing or referrer information) which links came from Twitter. The pattern of those referrals from Twitter can then be used to identify the user concerned by matching it with users’ Twitter profile characteristics. The same approach could also be used against users with Facebook or Reddit accounts.
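The matching idea described above can be illustrated with a short Python sketch. It is not the researchers' actual model: it is a simplified stand-in that assumes the adversary already holds each candidate's feed as a set of URLs, and it scores candidates by a rarity-weighted overlap with the history. The function names, the data layout, and the scoring formula are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def twitter_referred(visits):
    """Keep only visited URLs whose referrer is the t.co shortener.

    visits: iterable of (url, referrer) pairs; referrer may be None.
    (Hypothetical layout; the article only says referrer or timing
    information can reveal which links came from Twitter.)
    """
    return {
        url
        for url, referrer in visits
        if urlparse(referrer or "").hostname == "t.co"
    }


def rank_candidates(history, feeds):
    """Rank candidate profiles by how well their feed matches a history.

    history: set of URLs from the anonymous browsing history.
    feeds:   dict mapping a candidate profile name to the set of URLs
             that appeared in that profile's feed.
    Returns a list of (profile, score) pairs, best match first.
    """
    # Links that appear in few feeds are more identifying than popular ones.
    link_counts = Counter(url for feed in feeds.values() for url in feed)
    total = len(feeds)

    scores = {}
    for profile, feed in feeds.items():
        shared = history & feed
        overlap = sum(math.log(1 + total / link_counts[u]) for u in shared)
        # Mild penalty (an assumption of this sketch) so that very large
        # feeds do not win simply by containing everything.
        scores[profile] = overlap / math.sqrt(len(feed) or 1)

    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In use, the adversary would first pass the raw visit log through `twitter_referred` and then hand the resulting set to `rank_candidates`; the top-ranked profile is the best guess, and a real attack would also need a way to decide when the top score is too weak to trust.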
“Users may assume they are anonymous when they are browsing a news or a health website,” says Arvind Narayanan, an assistant professor of computer science at Princeton and one of the authors of the research, “but our work adds to the list of ways in which tracking companies may be able to learn their identities.”
The approach is not foolproof. Nevertheless, say the researchers, “given a history with 30 links originating from Twitter, we can deduce the corresponding Twitter profile more than 50 percent of the time.” In fact, in a test involving 374 volunteers who submitted web browsing histories, the method was able to identify more than 70 percent of those users by comparing their web browsing data to hundreds of millions of public social media feeds.
“All the evidence we have seen piling up over the years showing the strong limits of data anonymization, including this study,” comments Yves-Alexandre de Montjoye, an assistant professor at Imperial College London (not associated with the research), “really emphasizes the need to rethink our approach to privacy and data protection in the age of big data.”
The problem goes beyond simple user privacy, since it could be used to target persons of interest. “The idea would be to look at something such as my Twitter account (as in who I’m following) and to determine what links I’m seeing,” explains F-Secure security advisor Sean Sullivan. “And then, to find the ‘User X’ with the highest correlation between site visits and links seen. At which point, if I’m User X, I could be targeted by somebody who controls one of the sites visited.”
At a purely ‘commercial’ level, this could be used to target individuals with high-value goods. But it could also be used to find and target specific individuals prior to a network attack.
The researchers accept that their current methodology is not 100% accurate, but add that an “adversary may fruitfully make use of other fingerprinting information available through URLs, such as UTM codes. Thus, the main lesson of our paper is qualitative: we present multiple lines of evidence that browsing histories may be linked to social media profiles, even at a scale of hundreds of millions of potential users.”
Furthermore, the paper claims, “our attack has no universal mitigation outside of disabling public access to social media sites, an act that would undermine the value of these sites.” It calls for “more research into privacy-preserving data mining of browsing histories.”
