Researchers have found an unprotected database storing 1.8 billion posts collected from social media services, news websites and forums by a contractor for the U.S. Department of Defense.
The data was discovered on September 6 by Chris Vickery, director of risk research at cyber resilience firm UpGuard, inside an AWS S3 storage bucket that was accessible to any user with an AWS account.
Based on the names of the subdomains storing it, the information appears to have been collected for the U.S. Central Command (CENTCOM) and the U.S. Pacific Command (PACOM), unified combatant commands of the Department of Defense.
The exposed records include comments posted on news websites, forum messages, and posts from social media services such as Facebook, and they cover a wide range of topics, including sports, video games, celebrities and politics. The data was collected between 2009 and the present day.
While some of the posts appear to be written by American citizens, many of them are in Arabic, Farsi and various dialects spoken in Pakistan and Afghanistan.
“Arabic posts criticizing or mocking ISIS, posted to Facebook pages for Iraqi anti-jihadi groups, or Pashto language comments made on the official Facebook page of Pakistani politician Imran Khan, who has drawn scrutiny from both the Taliban and the US government, give some indication of content that might be of interest to CENTCOM in its prosecution of regional wars and against Islamic extremists,” UpGuard said in a blog post.
The large data set has been configured for searching with Apache Lucene, a high-performance, full-featured text search engine library.
An analysis of the data showed that it was likely collected for the Pentagon by VendorX, a now-defunct private sector contractor. While it was in operation, the company claimed it was working on Outpost, a “multi-lingual platform designed to positively influence change in high-risk youth in unstable regions of the world.” The project was run exclusively for CENTCOM.
While the exposed data was collected from public sources, UpGuard believes the incident raises questions about the privacy and civil liberties impact of the U.S. government’s intelligence operations. The leak also once again highlights the risks associated with third-party vendors.
The Department of Defense has secured the leaky database. The organization told CNN that the information is not collected or processed for any intelligence purposes. A representative of CENTCOM said the data is “used for measurement and engagement activities of our online programs on public sites,” but declined to elaborate.
This is not the first time UpGuard has found an unprotected AWS S3 bucket storing data belonging to a high-profile organization. In recent months, the company discovered similar leaks tied to Accenture, the U.S. Republican Party, TigerSwan, Verizon, and the U.S. military.
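The report does not say how the bucket's permissions were configured. One common way for a bucket to become readable by any AWS account holder is an access control list (ACL) grant to the predefined AuthenticatedUsers group. The minimal Python sketch below, offered as an illustration rather than a description of what UpGuard or the contractor did, scans an ACL for such grants. It assumes the ACL is a dictionary shaped like the response of the S3 GetBucketAcl call; the function name `find_broad_grants` and the sample ACL are hypothetical.

```python
# Predefined S3 group URIs that grant access beyond the bucket owner.
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"

BROAD_GROUPS = {
    ALL_USERS: "anyone on the internet",
    AUTHENTICATED_USERS: "any AWS account holder",
}


def find_broad_grants(acl):
    """Return (audience, permission) pairs for grants made to broad groups.

    `acl` is assumed to be shaped like a GetBucketAcl response:
    {"Grants": [{"Grantee": {"Type": ..., "URI": ...}, "Permission": ...}]}
    """
    findings = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        uri = grantee.get("URI")
        # Only group grantees carry a URI; owner grants use a canonical ID.
        if grantee.get("Type") == "Group" and uri in BROAD_GROUPS:
            findings.append((BROAD_GROUPS[uri], grant.get("Permission")))
    return findings


# Hypothetical ACL for illustration only; not taken from the exposed bucket.
example_acl = {
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "owner-id-placeholder"},
            "Permission": "FULL_CONTROL",
        },
        {
            "Grantee": {"Type": "Group", "URI": AUTHENTICATED_USERS},
            "Permission": "READ",
        },
    ]
}

for audience, permission in find_broad_grants(example_acl):
    print(f"{permission} granted to {audience}")
```

In practice the dictionary could come from an AWS SDK call that fetches the bucket ACL, and a complete audit would also need to consider bucket policies and account-level public access settings, which this sketch does not cover.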