On the 5th of June, a hacker uploaded a file containing approximately 6.5 million LinkedIn passwords (encoded as SHA-1 hashes). The following day, LinkedIn publicly acknowledged that the published passwords did indeed belong to LinkedIn users. As no further information has been released, the full impact of the breach is not yet clear. Specifically, if the hacker had no access to usernames, the importance of the password disclosure is negligible. However, had the hacker gained access to the full credentials (i.e., the passwords’ respective usernames), that could have had a significant business impact on the whole LinkedIn network.
For this column, I will focus on the business impact of such a hypothetical credentials breach on the privacy of the social network’s users and on the social network’s product. Even if no actual harm was done in this specific incident, it should serve as a red flag for all social networks.
On a social network, your privacy’s weakest link is your worst friend
Data security is traditionally defined by three core principles known as the CIA triad: Confidentiality, Integrity and Availability of data. On most online services (e.g., webmail), the triad is usually maintained as follows: the application provider makes sure that access to a user’s data is confined to that user only. Therefore, an attacker can obtain or change the user’s data only by directly attacking the user (e.g., stealing the password with malware) or the application. In the world of social networks, the model is different, since the whole purpose is to enable users to share data with their peers. While the integrity and availability of the data are still very much under the control of the user and the application, the confidentiality of the data involves another party: the user’s connections (i.e., friends or followers, depending on the specific social network’s terminology).
Let’s consider the following scenario: you are sharing some pictures with your friends over the social network. Even if you set the privacy settings to restrict access to friends only, nothing stops one of your friends from saving a picture and reposting it to the general public, bypassing whatever privacy setup you may have put in place. Therefore, the social network adds another attack surface against users’ privacy: the users’ friends. On any social network, your privacy is only as good as the privacy of your most careless (or temporary) friend.
Hopefully, it’s now easy to see that the damage to users’ privacy in the LinkedIn breach may not be confined to those whose passwords were revealed, but may also extend to all of their friends! How? Even if a hacker doesn’t have your password but can access a friend’s account, the hacker is now your “worst friend.”
Quantifying the privacy damage
In order to estimate the damage, we first need to estimate how many accounts would be directly jeopardized by such a hypothetical credentials breach. The disclosed password file contained some 6.5 million unique passwords. Since passwords are chosen by humans, many of them are bound to be shared by more than one account.
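To see why a file of unique hashes understates the number of affected accounts, here is a minimal sketch. It assumes the hashes are plain, unsalted SHA-1 (the column only says "SHA-1 hashes"), and the account names and passwords are made up for illustration.

```python
import hashlib


def sha1_hex(password: str) -> str:
    """Return the unsalted SHA-1 digest of a password as a hex string."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


# Hypothetical accounts: three of the four users picked the same password.
accounts = {
    "alice": "123456",
    "bob": "123456",
    "carol": "linkedin2012",
    "dave": "123456",
}

# Without a per-user salt, identical passwords produce identical hashes,
# so a de-duplicated dump lists each password only once.
unique_hashes = {sha1_hex(p) for p in accounts.values()}

print(len(accounts), "accounts ->", len(unique_hashes), "unique hashes")
# prints: 4 accounts -> 2 unique hashes
```

Under that assumption, every entry in the published file can stand for several accounts.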
In the RockYou password breach, which now serves as the gold standard for password studies, it was found that fewer than 50% of the passwords were unique, i.e., each password was used more than twice on average. Therefore, it’s safe to assume that the number of accounts directly hit by such a hypothetical breach would well exceed the 10 million mark. For this reason, we will use 10M as our approximation for the number of breachable accounts.
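The arithmetic behind that estimate can be sketched as follows. The function name is hypothetical, "uniqueness" is taken to mean distinct passwords divided by total accounts, and the sketch assumes the RockYou ratio carries over to LinkedIn, as the column does.

```python
def estimate_accounts(unique_passwords: int, uniqueness_ratio: float) -> float:
    """Estimate total accounts from the number of distinct passwords.

    uniqueness_ratio is (distinct passwords) / (total accounts).
    """
    if not 0 < uniqueness_ratio <= 1:
        raise ValueError("uniqueness_ratio must be in (0, 1]")
    return unique_passwords / uniqueness_ratio


UNIQUE_PASSWORDS = 6_500_000   # entries in the published file
UNIQUENESS_UPPER_BOUND = 0.5   # RockYou: less than 50% unique

# A ratio below 0.5 makes the real figure larger, so this is a lower bound.
lower_bound = estimate_accounts(UNIQUE_PASSWORDS, UNIQUENESS_UPPER_BOUND)
print(f"{lower_bound:,.0f}")  # prints: 13,000,000
```

Rounding the 13M lower bound down to 10M therefore keeps the estimate conservative.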
Now let’s move forward with an estimate of the collateral damage. How many friends did the directly hit accounts have? A naïve approach would be to multiply 10M by the average number of unique friends each member has. It’s easy to see that if that number exceeds 16, then 10M breached accounts would span the whole 160M members of the social network.
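As a rough sketch, the naïve estimate looks like this. It assumes no two breached accounts share a friend, which is exactly what makes it naïve; the function name is illustrative.

```python
NETWORK_SIZE = 160_000_000      # LinkedIn members, per the column
BREACHED_ACCOUNTS = 10_000_000  # approximation used in the column


def naive_reach(breached: int, avg_unique_friends: float, network_size: int) -> float:
    """Friends reachable if friend sets never overlap, capped at the network size."""
    return min(breached * avg_unique_friends, network_size)


# Average friend count at which the naive estimate covers every member.
threshold = NETWORK_SIZE / BREACHED_ACCOUNTS
print(threshold)  # prints: 16.0

print(naive_reach(BREACHED_ACCOUNTS, 5, NETWORK_SIZE))   # prints: 50000000
print(naive_reach(BREACHED_ACCOUNTS, 16, NETWORK_SIZE))  # prints: 160000000
```

Real friend lists overlap heavily, so the true reach is lower than this upper bound.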
To conduct a more thorough analysis, let’s use a 2008 study conducted by Stanford researchers. The study showed that randomly selecting 6% of the accounts in a social network is enough to span 50% of the network’s topology. For a social network that consists of 160M users, 6% is just about 10M compromised accounts.
It should be noted that once the attacker has removed the password barrier, harvesting the data is technically very easy, as illustrated some years ago when a security researcher harvested 100M publicly available Facebook profiles.
Summing up, the collateral damage of such a potential credentials breach to the privacy of LinkedIn users could easily have included half of LinkedIn’s users, or roughly 80M of its 160M members.
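The effect can be explored with a toy simulation. This is not the Stanford study's method or data: it assumes a uniformly random graph, reads "span" as "compromised or directly connected to a compromised account", and uses a made-up network size and average degree.

```python
import random


def build_random_graph(n: int, avg_degree: int, rng: random.Random) -> list:
    """Undirected random graph as a list of adjacency sets.

    Self-loops are skipped and duplicate edges collapse, so the
    resulting average degree is only approximately avg_degree.
    """
    adj = [set() for _ in range(n)]
    for _ in range(n * avg_degree // 2):
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def exposed_fraction(adj: list, compromised_fraction: float, rng: random.Random) -> float:
    """Fraction of nodes that are compromised or adjacent to a compromised node."""
    n = len(adj)
    compromised = rng.sample(range(n), round(n * compromised_fraction))
    exposed = set(compromised)
    for node in compromised:
        exposed |= adj[node]
    return len(exposed) / n


rng = random.Random(42)                      # fixed seed for repeatability
graph = build_random_graph(5_000, 11, rng)   # toy network, not LinkedIn's
print(exposed_fraction(graph, 0.06, rng))
```

Real social graphs are clustered and have a few very well-connected members, so a real network can behave quite differently from this uniform model.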
Quantifying the monetary damage
How valuable is LinkedIn’s topology data? The social graph is pretty much LinkedIn’s main asset. This data is the basis of LinkedIn’s revenue, as described in its SEC filing: “42 percent of 2010 revenue came from hiring solutions ($101.8 million), 33 percent came from marketing ($79.3 million); and 25 percent came from premium subscriptions ($61.9 million).”
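As a quick sanity check, the percentages in the quoted filing can be recomputed from the dollar figures. The sketch uses only the numbers quoted above.

```python
# 2010 revenue by segment, in millions of USD, as quoted from the SEC filing.
revenue_musd = {
    "hiring solutions": 101.8,
    "marketing": 79.3,
    "premium subscriptions": 61.9,
}

total = sum(revenue_musd.values())  # about 243.0

# Each segment's share of total revenue, rounded to a whole percent.
shares = {name: round(100 * value / total) for name, value in revenue_musd.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}%")
# prints:
# hiring solutions: 42%
# marketing: 33%
# premium subscriptions: 25%
```

The three segments add up to roughly $243 million, and the rounded shares match the filing.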
The market currently values LinkedIn (NYSE: LNKD) at more than $10B. When it comes to social networks, “friendliness” determines value. In the case of Facebook, for instance, one observer wrote, “We are not the customers of Facebook, we are the product. Facebook is selling us to advertisers.” The same is true for LinkedIn. Someone could have stolen (or did steal) most of the LinkedIn product. Taking that into account really emphasizes LinkedIn’s negligence in protecting its main asset. Could it be that we need to count LinkedIn’s investors as the main victims of the breach?