48 million records containing detailed personal information of tens of millions of people were exposed to the Internet after data-gathering company LocalBlox left a cloud storage repository publicly available.
The personal and business data search service gathered and scraped the exposed data from multiple sources, UpGuard security researchers discovered. The exposed information includes individuals’ names, physical addresses, and dates of birth, along with data scraped from LinkedIn, Facebook, Twitter, and more.
LocalBlox co-founder Ashfaq Rahman has already confirmed that the exposed information indeed belongs to the company.
Because the exposed information combines personal data with details on the people’s Internet usage, it builds “a three-dimensional picture of every individual affected,” UpGuard says.
Armed with this data, one would not only know who the affected individuals are, but also what they talk about, what they like, even what they do for a living. This information can be used to target users with ads or political campaigning, but can also expose them to identity theft, fraud, and social engineering scams.
The exposed data was stored in an Amazon Web Services S3 bucket that was configured for Internet access and was publicly downloadable. On February 18, when UpGuard discovered it, the bucket contained a 1.2 TB ndjson (newline-delineated json) file that was compressed to a 151.3 GB file.
After downloading and analyzing the file, UpGuard discovered that it belonged to LocalBlox. The company was informed on the issue on February 28 and the bucket was secured later that day.
The file was found to contain 48 million records, each in json format and separated by new lines. The security researchers also discovered that the real estate site Zillow was used in the data gathering process, “with information being somehow blended from the service’s listings into the larger data pool.”
Exposed source fields revealed where the scraps of data were collected from.
“Some are fairly unambiguous, pointing to aggregated content, purchased marketing databases, or even information caches sold by payday loan operators to businesses seeking marketing data. Other fields are more mysterious, such as a source field labeled ‘ex’,” the security researchers note.
Some of the data came from Facebook and included data points such as pictures, skills, lastUpdated, companies, currentJob, familyAdditionalDetails, Favorites, and mergedIdentities, along with a field labeled allSentences, which suggested that the information was scraped from the Facebook html and not through an API.
The main issue that this incident reveals is the ease at which data can be scraped from Facebook.
“In the wake of the Facebook/Cambridge Analytica debacle, the importance of massive sets of psychographic data is becoming more and more apparent,” UpGuard notes.
Another issue this incident brings to the spotlight is that third-parties often target data from popular websites and monetize the information in new ways, perhaps without the knowledge of the impacted individuals (and likely without the website’s – in this case Facebook – knowledge either).
LocalBlox says it is “the First Global Customer Intelligence Platform to search, combine and validate deep business and people profiles.” Thus, the exposed data represents the actual product the company offers: psychographic data that can be used to influence users.
There’s a clear business interest in this type of data harvesting, processing, and resale, meaning that massive and intrusive data sets clearly exist, for both companies and political parties to leverage when looking to influence people.
“What should be a wonder is that these datasets aren’t better secured and administered. This exposure was not the result of a clever hack, or well-planned scheme, but of a simple misconfiguration of an enterprise asset— an S3 storage bucket— which left the data open to the entire internet. The profitability gained by data must come with the responsibility of protecting its integrity and privacy,” UpGuard also points out.