Seeing as this is ostensibly a crypto blog, I'd like to comment on a paper from earlier this year: It's no secret-- Measuring the security and reliability of authentication via 'secret' questions, by Stuart Schechter, A. J. Bernheim Brush, and Serge Egelman. This paper looks at the security of 'security questions': those questions about yourself you have to answer to get back into an account when you've forgotten your password. And (shock! surprise!) they find that this sort of mechanism generally sucks from a security point of view-- perhaps even more than passwords do.

The paper describes what is essentially a social-science experiment to test the security of these 'security questions.' Before I go on, let me point out that these questions are actually quite important. If you forget your password to most websites, they'll send a temporary password to your email account. But what happens if you forget your Gmail password? Or your Hotmail password? More generally, what happens if you forget your password to your webmail account? These websites can't just send a temporary password to your email account, since that's exactly what you can't access anymore. They have to have some other mechanism to authenticate users, and that's where these 'security questions' come in.
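
To make the mechanism concrete, here is a minimal sketch of the two recovery paths. Everything in it (the Account record, the matching rule) is a hypothetical stand-in, not anything the paper or any particular webmail provider specifies:

```python
# Minimal sketch of the two password-recovery paths described above.
# The Account record and the matching rule are hypothetical stand-ins.
import secrets
from dataclasses import dataclass
from typing import Optional

@dataclass
class Account:
    recovery_email: Optional[str]
    security_question: str
    security_answer: str

def normalize(s: str) -> str:
    # Case- and whitespace-insensitive matching, as many sites do.
    return " ".join(s.lower().split())

def recover(account: Account, answer_from_user: str) -> bool:
    if account.recovery_email is not None:
        # Ordinary website: mail a temporary password to the user's inbox.
        temp = secrets.token_urlsafe(12)
        print(f"(mailing temporary password {temp} to {account.recovery_email})")
        return True
    # Webmail provider: no out-of-band inbox exists, so the 'security
    # question' is the whole back door -- it gates a full password reset.
    return normalize(answer_from_user) == normalize(account.security_answer)

webmail = Account(None, "Name of your first pet?", "Rover")
print(recover(webmail, "rover"))  # True: whoever answers correctly gets in
```

Note that in the second path, everything rests on that one string comparison.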

So, these 'security questions' exist to secure a 'back door' into your webmail account, and should therefore be as secure as your password. But are they? To test this, the authors of this paper ran the following experiment:

  • They brought a number of volunteers into the lab; each volunteer brought along a 'partner' (friend, spouse, significant other, etc.).
  • The volunteer then answered typical security questions, exactly as if they were opening a new webmail account. (They also asked each volunteer if they would trust their partner with their webmail password.)
  • In another room, the partners were asked to try to guess their volunteer's answers.
  • Also, the volunteers came back some months later to try to remember (or guess) the answers they had given.

Furthermore, the authors collated the answers given by the volunteers to see how 'unique' the typical answer is.

Have a bad feeling yet? You should. The answers would gratify the cynical among you:

  • The partners were often able to guess their volunteer's answers, even when the volunteer had said they wouldn't trust the partner with their password.
  • Also, most answers were not very 'unique' at all. On the contrary, the most popular answers tended to be very popular, indeed. (The toy calculation below shows why that's fatal.)
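
To see why popularity matters, consider an attacker who simply guesses the most popular answers in order. The numbers below are invented for illustration (the paper reports the real distributions); the point is how little the long tail of genuinely unique answers helps:

```python
# Toy illustration with made-up counts (NOT the paper's data): how much of
# the population falls to an attacker guessing only the top-k answers.
import math
from collections import Counter

answers = Counter({
    "smith": 120, "rover": 90, "fluffy": 60, "main street": 55,
    "mrs. pemberton-hughes": 1,  # the long tail of truly unique answers
})
total = sum(answers.values())

for k in (1, 3):
    covered = sum(count for _, count in answers.most_common(k))
    print(f"top {k} guesses cover {covered / total:.0%} of accounts")

# Min-entropy summarizes this: security against a single optimal guess.
p_max = answers.most_common(1)[0][1] / total
print(f"min-entropy: {-math.log2(p_max):.1f} bits")
```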

And to add insult to injury, the security questions would often fail even at their intended purpose: volunteers were surprisingly good at forgetting the answers they gave when they 'opened' their accounts.

I like this paper for a number of reasons. It makes me feel smart for being cynical about computer security (though you don't have to study the field too long before growing cynical yourself). It's also full of amusing little gems, like how often a volunteer and a spousal partner would give different answers to the question 'Where did you meet your spouse?' (Nine out of 18 times.) But I actually have two more substantial points to make about this work:

  1. First, it implicitly (but clearly) considers two different kinds of threats:
    • The psycho ex, who knows you really well and is really interested in your email in particular, and
    • the script kiddie who just wants to break into as many webmail accounts as possible.

(Note that if you are famous or a political figure, you get the best of both worlds: lots of strangers are interested in you, specifically. See Palin, Governor Sarah.) The experiment shows that the security questions are vulnerable to both kinds of attackers: the psycho ex can guess your answers based on their knowledge of you, while the script kiddie can break into lots of accounts just by guessing the most common answers to each question. Even so, it's nice to see the researchers acknowledge that these are qualitatively different kinds of attackers with different kinds of abilities.
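
To make the distinction concrete, you can think of the two attackers as different guess generators. A quick sketch, with entirely invented names and data:

```python
# The two attacker models as guess generators (all names and data invented).
from collections import Counter

def psycho_ex_guesses(knowledge: dict) -> list:
    # Targeted attacker: derives guesses from personal knowledge of you.
    return [knowledge["first_pet"], knowledge["childhood_street"]]

def script_kiddie_guesses(population: Counter, g: int = 3) -> list:
    # Untargeted attacker: the g most popular answers across all users.
    return [answer for answer, _ in population.most_common(g)]

population = Counter({"rover": 90, "fluffy": 60, "max": 40, "pemberton": 1})
print(script_kiddie_guesses(population))  # ['rover', 'fluffy', 'max']
print(psycho_ex_guesses({"first_pet": "mr. whiskers",
                         "childhood_street": "elm st"}))
```

Same interface, completely different knowledge-- which is exactly why a defense that stops one attacker says nothing about the other.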

  2. Second, my colleagues are by now very tired of hearing me rant about the piss-poor levels of scientific rigor I see in most computer-security 'experiments' (and I use that word loosely). From that perspective, this paper is a breath of fresh air. Just look at these riches:
    • It's clear that the researchers have been trained in human-subjects methods, and consulted with an IRB. I say that because they engaged in (and made clear that they had) some simple practices that IRBs would require. For example, every participant was compensated for their time in some way-- but not to the degree that the compensation could be considered coercive. Little things like that.
    • The researchers had also clearly dealt with human subjects before, and took human nature into account when designing the experiment. To encourage partners to actually try to guess volunteers' answers, for example, they gave out a prize for the most correct guesses. That kind of thing.
    • They validated their data. Why were they getting such weird results to question four? Because the actual question ('Name of your first pet?') was mis-typed ('First pet?'), prompting people to give the wrong kind of answer ('Dog' instead of 'Rover', for example). Oops. Guess they'll have to take that question out of the analysis.
    • They described the experiment in enough detail that you could actually replicate it, and presented the actual data gathered. (This is a surprisingly common bad habit in the field: most computer-security 'experiment' papers will either insufficiently describe the experiment, withhold the actual data collected, or both.)
    • And oh, look! Statistical tests! Actual p-values! I'm in love.
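
If you have never seen what such a test buys you, here is the flavor of the thing, with invented numbers (the paper reports its own): an exact binomial test asking whether partners guess better than an untargeted popular-answer base rate would predict.

```python
# The flavor of an exact binomial test, with invented numbers (NOT the
# paper's data). H0: partners succeed only at the base rate p0 that an
# untargeted guesser achieves; the p-value is P(X >= successes) under H0.
from math import comb

def binomial_pvalue(successes: int, trials: int, p0: float) -> float:
    return sum(comb(trials, k) * p0**k * (1 - p0)**(trials - k)
               for k in range(successes, trials + 1))

# Say 17 of 32 partners guessed an answer, against a 10% base rate.
print(f"p = {binomial_pvalue(17, 32, 0.10):.2g}")  # far below 0.05
```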

It's possible that I'm being too hard on my field, here. Yeah, most 'experimental' computer-security papers fail even the most basic aspects of scientific rigor listed above, but this paper had the luxury of studying a 'natural' phenomenon: one that does not adapt in response to your measurement tools, and that obeys some static, objective probability distribution. Most computer-security phenomena, on the other hand, adapt to evade detection by your instruments, and occur adversarially (and not randomly). It's very hard to collect data, know ground truth, or even apply statistical tests in that situation. But still-- these authors manage to do it. We, as a field, should put aside the excuses and follow their lead.