Researchers Link “de-identified” Browsing History to Social Media Accounts

Researchers Demonstrate How “de-identified” Web Browsing Histories Can be Linked to Social Media Accounts

While cookies and other mechanisms for tracking computers are widespread and well understood, it is often believed that the data collected is effectively de-identified; that is, the cookies track the computer's browser, not the person using the computer.

This is the message often promulgated by the advertising industry: tracking cookies allow targeted advertising without compromising personal privacy. Now new research from academics at Stanford and Princeton universities demonstrates that this need not be so.

In the new study ‘De-anonymizing Web Browsing Data with Social Networks’ (due to be presented at the 2017 World Wide Web Conference in Perth, Australia, in April), the researchers show that de-identified web browsing histories can be linked to social media profiles using only publicly available data. Once the social media profile associated with a browsing pattern is known, the person is known.

The basic premise is that social media users are more likely to click on links posted by people they follow. This creates a distinctive pattern that persists in the browsing history. “An adversary can thus de-anonymize a given browsing history,” states the report, “by finding the social media profile whose ‘feed’ shares the history’s idiosyncratic characteristics.”

The theory was tested against Twitter — chosen because it is largely public, has an accessible API, and wraps its links in the t.co shortener. Assuming an ‘adversary’ has access to browsing histories, he can then easily deduce (through timing or referrer information) which links came from Twitter. The pattern of those referrals from Twitter can then be used to identify the user concerned by matching it with users’ Twitter profile characteristics. The same approach could also be used against users with Facebook or Reddit accounts.
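
The matching idea described above can be illustrated with a minimal Python sketch. It is not the researchers' model: it swaps in a simple rarity-weighted overlap score, and the function name `rank_profiles`, the profile names, and the shortened links are all hypothetical. It assumes the adversary already holds the set of links that appeared in each candidate profile's feed.

```python
import math


def rank_profiles(history, feeds):
    """Rank candidate profiles by overlap between a history and each feed.

    history: iterable of links visited (assumed to have come from Twitter).
    feeds:   dict mapping a profile name to the set of links in its feed.
    Returns a list of (profile, score) pairs, best match first.
    """
    n = len(feeds)

    # Count how many candidate feeds contain each link.
    feed_count = {}
    for links in feeds.values():
        for link in links:
            feed_count[link] = feed_count.get(link, 0) + 1

    visited = set(history)
    scores = {}
    for profile, links in feeds.items():
        # A link that shows up in few feeds is more distinctive,
        # so it contributes a larger weight to the score.
        scores[profile] = sum(
            math.log(1 + n / feed_count[link])
            for link in visited
            if link in links
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical example data.
feeds = {
    "alice": {"t.co/a", "t.co/b", "t.co/c"},
    "bob": {"t.co/a", "t.co/d"},
    "carol": {"t.co/e"},
}
history = ["t.co/b", "t.co/c", "t.co/a"]
best_profile, best_score = rank_profiles(history, feeds)[0]
print(best_profile)
```

In this toy data the history shares three links with one feed and at most one with the others, so that feed ranks first. A real attack would have to score hundreds of millions of feeds and account for links the user saw but never clicked, which this sketch ignores.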

“Users may assume they are anonymous when they are browsing a news or a health website,” says Arvind Narayanan, an assistant professor of computer science at Princeton and one of the authors of the research, “but our work adds to the list of ways in which tracking companies may be able to learn their identities.”

The approach is not foolproof. Nevertheless, say the researchers, “given a history with 30 links originating from Twitter, we can deduce the corresponding Twitter profile more than 50 percent of the time.” In fact, in a test involving 374 volunteers who submitted web browsing histories, the method was able to identify more than 70 percent of those users by comparing their web browsing data to hundreds of millions of public social media feeds. 

“All the evidence we have seen piling up over the years showing the strong limits of data anonymization, including this study,” comments Yves-Alexandre de Montjoye, an assistant professor at Imperial College London (not associated with the research), “really emphasizes the need to rethink our approach to privacy and data protection in the age of big data.”

The problem goes beyond simple user privacy, since it could be used to target persons of interest. “The idea would be to look at something such as my Twitter account (as in who I’m following) and to determine what links I’m seeing,” explains F-Secure security advisor Sean Sullivan. “And then, to find the ‘User X’ with the highest correlation between site visits and links seen. At which point, if I’m User X, I could be targeted by somebody who controls one of the sites visited.”

At a purely ‘commercial’ level, this could be used to target individuals with high-value goods. But it could also be used to find and target specific individuals prior to a network attack.

The researchers accept that their current methodology is not 100% accurate, but add that an “adversary may fruitfully make use of other fingerprinting information available through URLs, such as UTM codes. Thus, the main lesson of our paper is qualitative: we present multiple lines of evidence that browsing histories may be linked to social media profiles, even at a scale of hundreds of millions of potential users.”
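
UTM codes are campaign-tagging query parameters such as `utm_source` and `utm_medium` that are appended to shared links. The article does not say how an adversary would use them, so the following Python sketch only shows that they can be read straight out of a recorded URL. The helper name `utm_parameters` and the example URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def utm_parameters(url):
    """Return the utm_* query parameters found in a URL as a dict."""
    query = urlsplit(url).query
    return {key: value for key, value in parse_qsl(query) if key.startswith("utm_")}


# Hypothetical URL as it might appear in a browsing history.
tags = utm_parameters(
    "https://example.com/story?id=7&utm_source=twitter&utm_medium=social"
)
print(tags)
```

Here the tags alone reveal that the visit was referred from a social media campaign, without any timing or referrer information.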

Furthermore, the paper claims, “our attack has no universal mitigation outside of disabling public access to social media sites, an act that would undermine the value of these sites.” It calls for “more research into privacy-preserving data mining of browsing histories.”
