Oblivious transfer with access control (or, how to query a database anonymously)

Between my trip to CCS last month and my employer's internal security-focused reading-group, I've been exposed to quite a bit of recent research in the area of computer security recently. Much of it has been highly technical, of course, but after a while it has sparked some very general observations about the field. Over the next few posts, I'd like to review some of the more accessible papers and use them to motivate some of those observations. In this post: why I am so frustrated by the field of academic cryptography.

To explain this, let me describe the CCS paper Oblivious Transfer with Access Control, by Jan Camenisch, Maria Dubovitskaya and Gregory Neven. (I should also mention that this paper builds on very similar previous work by Scott Coull, Matthew Green, and my friend Susan Hohenberger.) The set-up is this: suppose you're a pharmaceutical company making some wonder-drug, and you need access to the genomes of people with some genetic condition. Or, you're making some new technology, and you need to do a patent-search. Now, there are databases for both genomes and patents,and you can pretty easily subscribe to them. The problem is that you don't want anyone to know what genomes or patents are of interest to you, as this may clue in your competition as to what you're doing. This includes the database itself-- you don't even want it to know what genomes you're getting from it. You might not even want it to know who you are. However, you do somehow need to prove to the database that you are a paying subscriber. Can this even be done?

The answer, according to this paper, is "yes." I would add, however, "in theory." See, the paper presents a protocol for accomplishing this kind of database-query and proves that the protocol achieves the above security goals. That's good. I'm all behind that. But the protocol is structured like this:

Step 1: the database server encrypts every record using a new fresh key each time, and sends all these ciphertexts to the client making the query.
Step 2: the client and the database engage in something called 'oblivious transfer', by which the client gets one of the keys but the database doesn't know which one. (This is where the term 'oblivious transfer' from the title comes from.)

There's a lot more to it, of course, but this is enough detail to motivate my rant, which begins now. Let's look at step 1: the database server encrypts the whole database and sends it over. Really? Are you sure? Let's see how that works out in practice.

Google tells me that the patent office has made available over seven million patents. By an extremely scientific process (averaging the size of the five random patents served on the Google patent search homepage,) I have concluded that the average patent requires a PDF about 164KB in size. The database, then, is about a terabyte in size. It would take about 2.5 hours to send the first message of this protocol (over gigabit ethernet). It would be much cheaper just to buy a copy of the whole database, stick in on a networked drive somewhere, and make all of your queries internally.

But let's consider human genomes, and imagine a world in which genome sequencing is cheap and easy. Now, Wikipedia tells me that the FBI maintains the Combined DNA Index System, which holds 5 million DNA profiles. A profile is not the same as a full genome, but I'm sure that we'll start storing entire genomes soon enough. Wikipedia also tells me that a single human genome is at least 756 megabytes. (That's for an XX genome. Interestingly, an XY genome is more: 770 megabytes.) But 99.9% of all human DNA is shared. So, really, only 770KB of that needs to be stored/communicated. The entire database, then is 3.6 terabytes, and would take 8 hours to send (again, over gigabit ethernet).

So, again I ask: really? Is anyone really going to engage in a protocol where the first message takes 8 hours to send? (And that's not even considering the communication cost of the overhead, or the computational cost of the encryption.) I had the chance to pose this question to the presenting author at their CCS talk, and her answer was essentially: "We were only really thinking of databases with a thousand or so entries."

...Which leads me back to my general frustration with the field of academic cryptography. It is not that this paper is especially impractical for the field. By the standards of the field, in fact, this protocol is entirely reasonable. Specifically, the protocol executes within something called 'probabilistic polynomial time' (PPT) meaning that its cost doesn't grow too fast as a function of key-length. But that rather technical definition really has nothing to do with actual practicality. The best way to think about this paper, then, is that it holds a mathematical result: some specific type of computation can in fact be done in PPT. And this is true of the field as a whole: it has become a field of mathematics, devoted to exploring one particularly interesting complexity class.

So, why am I so frustrated with this field? Because its interests and mine are so wildly out of sync. Were I interested in cryptography as a purely mathematical endeavor, I'd be happy as a clam. But I'm not. My interest is in using cryptography to somehow make the world a better place, and that's not happening right now. The gap between theoretical and practical cryptography is enormous, and gets wider with every paper like this one. The mathematical field of academic cryptography keeps going merrily along, producing result after result like this one (some interesting form of secure computation is computable in PPT) while the practice of cryptography remains where it was in 1990.

No, really-- what has actually changed in the past 20 years? We're replaced one block cipher with another (DES with AES), identify-based crypto has been made into a product no one buys (which is another rant all on its own) and more people have a certificate now than back then. In other words, nothing's really changed. We're still using the same basic kinds of crypto in the same ways-- or rather, trying to get people to use the same basic kinds of crypto the the same ways and failing.

Hence my frustration. There is almost no tech transfer from the world of academic cryptography to actual practice. Once the academic community discovers a PPT protocol for some task, there is no incentive to develop it into a practical one. So these mathematical discoveries keep piling up on the ever-growing mound of things that don't actually help me make the world better. Can the actual practice of cryptography be improved? Oh my yes, but I can't do it without help. I need better-designed crypto, I need faster crypto, I need more human-friendly crypto... I need a lot of things. Do I need a protocol for querying a database anonymously? Well, sure, I might-- so long as it can be run in the real world. But yet another PPT protocol is, in and of itself, no help at all.