What do Sex Surveys and Cybercrime Surveys Have in Common?
Cybercrime is either getting worse or getting better. According to a recent report from Microsoft's research team, we simply do not have enough verified data to support either claim.
In a new paper, "Sex, Lies and Cybercrime Surveys," Microsoft researchers Dinei Florencio and Cormac Herley list the ways in which those conducting surveys routinely dismiss the inherent difficulties of accurate survey collection. They note that a survey mean "can be affected to an arbitrary extent by a single lie, transcription error or exaggeration" and that "the cyber-crime estimates that we have appear to be largely the answers of a handful of people extrapolated to the whole population." Cybercrime surveys simply incorporate unverified outliers into the final result "in order to arrive at otherwise indefensible conclusions." That is how we get wildly exaggerated claims about how bad phishing, spam, or cybercrime really is.
The researchers write: "it is ironic then that our cyber-crime survey estimates rely almost exclusively on unverified user input. A practice that is regarded as unacceptable in writing code is ubiquitous in forming the estimates that drive policy."
The problem is not limited to cybercrime. To illustrate the larger issue of small sample sizes and accuracy, Florencio and Herley also discuss sex surveys. They report that, in general, men greatly exaggerate the number of sex partners they have had during their lifetimes, while women tend to underreport. A very small number of respondents in each group exaggerate by a lot, and those few skew the overall results.
The researchers see the same dynamic in cybercrime surveys, such as the Federal Trade Commission's annual Identity Theft Report, where some respondents greatly exaggerate their losses and are nonetheless factored into the final results. Since identity theft losses are confined to a small segment of the population, the outliers are greatly magnified. In one example they cite, a single individual claiming $50,000 in losses in an N = 1000 survey is enough for the surveyor to extrapolate a $10 billion loss across the entire population. In another survey, one unverified claim of $7,500 in phishing losses becomes $1.5 billion in losses across the entire population. The researchers write: "The vagueness and lack of clarity about what has been measured allows for a large range of interpretation."
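The arithmetic behind that extrapolation is just scaling the sample mean up to the population. A minimal sketch, assuming a population of roughly 200 million adults (a figure chosen here because it reproduces the article's numbers, not one taken from the paper):

```python
def extrapolate_loss(reported_losses, population):
    """Scale the survey's mean reported loss to the whole population --
    the naive method the researchers criticize."""
    sample_mean = sum(reported_losses) / len(reported_losses)
    return sample_mean * population

POPULATION = 200_000_000  # assumed adult population (illustrative)

# 999 respondents report no loss; one unverified answer claims $50,000.
identity_theft = [0] * 999 + [50_000]
print(extrapolate_loss(identity_theft, POPULATION))  # 10_000_000_000.0

# One unverified $7,500 phishing claim in the same-sized sample.
phishing = [0] * 999 + [7_500]
print(extrapolate_loss(phishing, POPULATION))  # 1_500_000_000.0
```

A single respondent's answer supplies 100% of each estimate, which is exactly the fragility the paper describes.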
And that's bad. The paper includes examples where headlines shout that cybercrime has doubled. Or has declined. They include Congressional testimony in which annual losses due to cybercrime were estimated at about $1 trillion. But, according to the Microsoft researchers, we don't really know.
The paper doesn't offer many answers beyond being suspicious of outliers. Removing the exaggerations would largely flatten the results. There would be few headlines. And cybercrime, perhaps, would not get as much attention if the annual increases looked manageable.
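To see how flattening works, compare the mean with outlier-resistant summaries on a hypothetical sample echoing the article's $50,000 example (the data here is illustrative, not from the paper):

```python
import statistics

# 999 respondents report no loss; one claims $50,000.
responses = [0] * 999 + [50_000]

mean = statistics.mean(responses)      # $50 per person, driven entirely by one answer
median = statistics.median(responses)  # $0 -- unaffected by the outlier

# Dropping even the single largest response collapses the mean to zero.
trimmed_mean = statistics.mean(sorted(responses)[:-1])

print(mean, median, trimmed_mean)
```

The median and the trimmed mean both report what is true of nearly every respondent, which is why removing the exaggerations would leave so little headline material.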
The authors conclude: "Are we really producing cyber-crime estimates where 75% of the estimate comes from the unverified self-reported answers of one or two people? Unfortunately, it appears so. Can any faith whatever be placed in the surveys we have? No, it appears not."
And yet these estimates are all we have.