## IBM Claims Breakthrough In Analysis of Encrypted Data 199

Posted
by
timothy

from the scrambled-in-the-shell dept.

from the scrambled-in-the-shell dept.

An anonymous reader writes

*"An IBM researcher has solved a thorny mathematical problem that has confounded scientists since the invention of public-key encryption several decades ago. The breakthrough, called 'privacy homomorphism,' or 'fully homomorphic encryption,' makes possible the deep and unlimited analysis of encrypted information — data that has been intentionally scrambled — without sacrificing confidentiality."*Reader ElasticVapor writes that the solution IBM claims*"might better enable a cloud computing vendor to perform computations on clients' data at their request, such as analyzing sales patterns, without exposing the original data. Other potential applications include enabling filters to identify spam, even in encrypted email, or protecting information contained in electronic medical records."*
## Fully homomorphic encryption using ideal lattices (Score:5, Informative)

## Since it's close to being slashdotted... (Score:5, Informative)

## from the horses mouth (Score:5, Informative)

Just FYI this site is whole sale cut and paste ripping IBM press off.

http://www-03.ibm.com/press/us/en/pressrelease/27840.wss

## Re:No More Privacy (Score:3, Informative)

If they really figured it out, then sure they can analyze without your request, but they can't decrypt the results without your key. So you still have the same privacy. BTW, this is the entire point of this process.

## Re:Wait, what? (Score:5, Informative)

The point is not to read the content, but to enable a computer to analyze the content in such a way that they can deduce statistics and patterns from it. FTFA:

computer vendors storing the confidential, electronic data of others will be able to fully analyze data on their clients' behalf without expensive interaction with the client, and without seeing any of the private data

I don't need to know that you love apples to know you definitely love the same thing as 14 other people. Lets assume that we have 20 encrypted sets of data. Lets also assume the 20 sets say basically the same thing but because of the encyrption method look nothing a like from the raw data perspective. If you go ahead and find a way to analyze the encryption enough to know that the 20 emails all contain a similar message, but not enough to actually know what the message is... well then! You could go ahead and store all of ebay's customer information and do massive amounts of data crunching for them, without ever actually seeing any data.

This is a huge problem in IT, where admins need access to the databases in order to see how the data is being stored, how the tables are working, etc etc.. but can't actually have access to the database because then they might see customer information. So you either let joe-bob admin in there and let him see all the data, or you don't. Now you can let the admin in there, they can determine anything they might want to know, but they never actually see any exact data.

No, I don't know anything about the math portion.. but thats basically what they are trying to say in the article. I think. :)

## Wikipedia to the rescue (Score:5, Informative)

Cool, but I'm half-convinced that holes will be found. The first time a new encryption scheme is put to the test, it usually fails. Still, hopefully, it'll lead to a truly secure scheme.

## BAD summary (Score:5, Informative)

You can not analyze the data. You can perform calculations on it without knowing what it is. So, for instance, you could encrypt all your tax info, send it to a company that processes the encrypted data without decrypting it, and sends you back your encrypted tax return, without ever having seen any of your financial detail.

## simple explanation (Score:5, Informative)

OK, it looks like a lot of people are missing the point.

What Gentry figured out was a scheme for carrying out arbitrary computations on encrypted data, producing an encrypted result. That way, you can do your computation on encrypted data in the "cloud", but only you can view the results.

If E() is your encryption function, x is your data, and f() is the function you'd like to compute, homomorphic encryption gives you a function f'() such that f'(E(x)) = E(f(x)). But at no point does it actually decrypt your data.

This could be huge for secure computing.

## Re:No More Privacy (Score:5, Informative)

Everything remains encrypted throughout the process, including the output. Only the client can read the results. That is the point.

## Re:If they can analyze the data... (Score:3, Informative)

## Re:No More Privacy (Score:3, Informative)

TFA is skimp on this but after bit of Googling around I understand a little more, see also http://en.wikipedia.org/wiki/Homomorphic_encryption [wikipedia.org].

The point being that those who provide the encrypted data must encrypt it in a special way to allow the homomorphic properties to be taken advantage of.

## Re:logic? (Score:3, Informative)

## Re:logic? (Score:3, Informative)

The summary is wrong. A Privacy Homomorphism allows third parties to compute calculations on the data on your behalf without decrypting either the input or the output. In other words, the cloud provider could, for example, total up your sales data without ever decrypting the individual sale information or the final total. The encrypted final total would then be given to you, and you would decrypt it to learn what it was.

At no point does a third party have access to a decrypted form of your data.

## here's why this is important. (Score:3, Informative)

TFA doesn't seem clear on this point, but what the name of the technique implies is that you can perform the operation, but neither the inputs nor the outputs are ever decrypted. So if you can't see the question, and you can't see the answer, then why would you perform the operation other than at the request of someone who can (i.e. the client)?

Example, I want the total sales figures for all the left handed employees. I cobble together the appropriate SQL processing request send it to my cloud server which rummages throught the data base summing up the figures for some subset of the fields. It sends me back just the sum, encypted. It never knows which employees it was selecting nor any of their sales figures or even the sum. It just has the encrypted result that it sends to me all processed.

otherwise I'd have to pull every encrypted record of every employee across the web to my computer. In effect doing a table dump across the net for even the smallest calculation. Yuck! let alone not working on my iphone.

Another application is encryption of ballots. The achilies heel of almost every voting encryption algorithm is that somewhere somebody has the key and to do any processing you have to decrypt the ballots. THis way you can do the sum and only decrypt the results. you could publish the encrypted sum before the key is even unsealed then publish the key. The key does not have to ever be in the hands of the central tabulators.

Current voting encryption schemes require centralized control with possession of the keys. This way the control is decentralized with the counters and the key keepers not being the same person.

## Re:Wikipedia to the rescue (Score:3, Informative)

Holes are always found - no method is 100% foolproof.

http://en.wikipedia.org/wiki/One-time_pad [wikipedia.org]

## Re:If they can analyze the data... (Score:3, Informative)

This isn't a vulnerability with existing encryption systems, it's a scheme for a different way to structure and encrypt the data to explicitly allow calculations on the data without exposing the original values.

## Re:Analysis can mean Disclosure of Information (Score:4, Informative)

It does now. That's the point. I don't think the wording in the article is very good. What they're doing is more like simulation of circuits (AND and XOR gates). You can construct a general purpose computer from such gates. You can run a gate-level simulation of such a machine, but your simulation would normally use unencrypted data. This breakthrough allows your simulated machine to use encrypted data, so you feed it encrypted data and you get out encrypted data. All the guy running the simulation knows is the design of the simulated hardware.

:-) And no, I never found a method that could handle both AND and XOR, so I look forward to reading more about this.

This can be taken one step further. If you simulate a programmable computer - not just a fixed algorithm - then the guy running the simulation won't even know what *algorithm* he's running in addition to not knowing what the data is since the program is just encrypted data. I've been toying with this for a while without knowing the proper name for it

## Re:BAD summary (Score:3, Informative)

Actually, imagine being able to add two numbers together without knowing what those two numbers were and returning the total that you STILL don't know what the number is, but you have the cyphertext for it. You still need the key to decrypt the total.

Example in plaintext:

4 + 5 = 9

Example encypted (oversimplified):

D32JFS3 + 234DSF31 = 42SDF23

So the third party would receive D32JFS3 and 234DSF31 (not knowing they meant 4 and 5) and he would return 42SDF23 (not knowing it was 9)

The ablility to add two peices of cyphertext to get some (still unknonw) peice of cyphertext does not increase the "breakability" of the encryption because, just like the rosetta stone, you really need pairs of plaintext and cyphertext to do any real analysis. There may be some NEW attack methods on lattice based encryption techniques, but they are not yet widely known.

## Re:from the horses mouth (Score:2, Informative)

## Is the plaintext needed post-encryption? (Score:3, Informative)

A lot of respondents seem to have seized on a spurious notion of what this is all about. That isn't surprising since the Slashdot article and the press release and even the abstract are rather obscure. No sign of a preprint, but the same abstract shows up for a number of colloquiums in the last couple of months. The paper is from a proceedings, so it may itself not be especially profound.

The abstract says:

"We propose a fully homomorphic encryption scheme -- i.e., a scheme that allows one to evaluate circuits over encrypted data without being able to decrypt. Our solution comes in three steps. First, we provide a general result -- that, to construct an encryption scheme that permits evaluation of arbitrary circuits, it suffices to construct an encryption scheme that can evaluate (slightly augmented versions of) its own decryption circuit; we call a scheme that can evaluate its (augmented) decryption circuit bootstrappable."The encryption and compression literature tends to use the word "scheme" where others might say algorithm or transform. "Circuits" here is a term of art (maybe arising originally from actual physical circuits, as in the Enigma machine?)

"An encryption scheme that permits evaluation of arbitrary circuits" suggests only that the possessor of the private key can generate these arbitrary queries, not that anybody and their brother can scavenge the encrypted data. It isn't stated whether such a query also requires the plaintext. It would be pretty cool if one feature were to be able to discard the plaintext post-encryption.

The gimmick appears to be that the arbitrary circuit can include the decryption itself (the bootstrap part). Note that this feature is far more cool (assuming it works) than all the nonsense about cloud computing. Somehow the data are *arbitrarily* available to properly encoded queries without ever being exposed - even to the CPU performing the operations. This processor could be on the same machine, on some remote server, in the cloud or across the galaxy. How cool is that?

## Re:OK, I don't understand (Score:3, Informative)

What are the operations for which this is homomorphic?It has to be quite limited. Otherwise for example, lets suppose I have an integer (encrypted of course) and I have comparison and addition/subtraction and multiply/divide.

I can very easily find the encrypted values of both 0 (a-a for any a) and 1 (a/a)The article neglected to mention that the underlying encryption system is randomized public key encryption. This means (A) you can easily discover encryptions of 0, encryptions of 1, and encryptions of anything else, because it is a public key system and you can encrypt anything you like.

It also means (B) this won't help you with decryption because every encryption of 0 looks different. So knowing some encryptions of 0 will not let you recognize whether a given encrypted value is an encryption of 0, of 1, or of anything else.

And, I don't see how you can prevent equality tests in the encrypted domain. You might have to calculate a Kernel but surely there is no way to prevent that.You certainly can do equality tests in the encrypted domain. It's just that the result of the equality test is encrypted; for example, it is an encryption of 0 or an encryption of 1. But you have no idea which it is. Only the client who supplied the encrypted data (and more importantly, the public key encryption system) can decrypt the result of your equality test, or of any other calculations you do on encrypted data.

## Re:No, misleading headline (Score:3, Informative)

allowsthe types of things described in the article, while still keeping certain desired information secure. This Wikipedia article [wikipedia.org] gives a much better description of the issue, and you don't even really have to understand abstract algebra to understand that section.## Re:But what if it took... a TRILLION times longer? (Score:4, Informative)

I read the paper and my guess is that a TRILLION is actually an understatement. It looks to me like it might be almost INFINITELY slower. In other words, completely impractical and only of theoretical value.

However, now that the first step has been taken, it's possible that someone will come up with an improvement that makes the idea practical someday.

## Re:simple explanation (Score:1, Informative)

Let x be the data. Assume x is a 2 bit value.

Let f(x) = x & 0x01

Let g(x) = (x & 0x02) >> 1

Use new magic to get f'() and g'()

compute f'(E(x)) and g'(E(x))

if they are the same, you know x is either 00 or 11

if they are different, you know x is either 01 or 10

This can be scaled up to any N bit value to arrive at one of two possible values for any x given E(x).

If you know anything at all about the type of x (i.e. it may be ascii text, or its range may be bounded), choosing the right one of the two possibilities is trivial.

## Clarification on the technology (Score:5, Informative)

A few misconceptions continue to circulate here; let me try to shed some light.

First, the encryption system is apparently not practical in its current form. Maybe improvements will occur some day to make it practical, maybe not. It is still a major theoretical breakthrough because fully homomorphic encryption had often been thought to be impossible in the past. It has been a long sought goal in cryptography and it is remarkable to see it finally achieved. So in practice nobody is going to be doing spam filtering, income tax returns, or anonymous google searches any time soon.

Second, several people have gotten tripped up over an apparent weakness: if you can calculate E(X-Y) you can get an encryption of 0; if you can calculate E(X/Y) you can get an encryption of 1; and from these you could get other encryptions and potentially break the system. This idea fails for two reasons: first, it is a public-key system, so you don't need to go through all this rigamarole to get encryptions of 0, 1, or anything. In public key cryptography, anyone can encrypt data under a given key, without knowing any secrets. So it is already possible to get encryptions of known values, even without the special homomorphic properties. Second, in order for public key systems to be secure, they need to have a randomization property. In randomized encryption, there are multiple ciphertext values that encrypt the same plaintext. Basically, the encryption algorithm takes both the plaintext and a random value, and produces the ciphertext. Each different possible random value causes the same plaintext to go to a different ciphertext. The decryption algorithm nevertheless can take any of these different ciphertext values and produce the same plaintext.

This may be confusing because the most well known public key encryption system, RSA is not randomized. At the time it was invented, this aspect was not well understood. Shortly afterwards it became clear how important randomization is. Other encryption systems like ElGamal do use randomization, and RSA was adapted to allow randomization via what is called a "random padding" layer, known by the technical name PKCS-1. This adds the randomness which allows RSA to be used securely.

One other point is that people are getting hung up about what "fully" homomorphic encryption covers. Exactly what operations can you do? I think the best way to think of it is to go down to the binary level. We know that in our computers, at the lowest level everything is 1's and 0's. These get combined with elementary logical operations like AND, OR, NOT, XOR, and so on. Using these primitive operations, all the complexity of modern programs can be built up.

In the case of the homomorphic encryption, it is probably best to think of the values being encrypted in binary form, as encryptions of 1's and 0's. Keep in mind the point above about randomized encryption: all the encryptions of 1 look different, as do all the encryptions of 0. You can't tell whether a given value encrypts a 1 or a 0. Given these encrypted values, you can compute AND, OR, XOR, NOT and so on with these values, and get new encrypted values as the answers. You don't know the value of the outputs, they are encrypted. Only the holder of the private key, who originally encrypted the data, could decrypt the output. But you can continue to work with these output values, do more calculations with them, and so on.

Let me give an example of how you could do an equality comparison. Suppose you have two encrypted values and want to determine if they are the same. Recall that we are working in binary, so you actually have two sequences of encrypted bits; some are encrypted 1's and some are encrypted 0's, but you can't tell which. So the first thing you compute is the XOR of corresponding bits in the two values: XOR the 1st bits of each value; XOR the 2nd bits of each value, and so on. Now if the values are equal, the results are all encryptions of 0's. If the values are different, some of the results will be encryptions of 1's. But aga

## Re:Can I run this homomorphism on your data? (Score:4, Informative)

f(x) = x

No. The operations you get are addition and multiplication, that's it. Given E(x) and E(y), you can compute E(x + y) or E(xy), nothing else. And you do this without ever learning x or y. RTFWA [wikipedia.org].

The reason for the terminology is that the encryption function E is a ring homomorphism [wikipedia.org] between plaintext and ciphertext. Some operation of addition is defined on both plaintext and ciphertext such that if x and y are plaintext, f(x + y) = f(x) + f(y). (The "+" on the left is addition of plaintext, the "+" on the right is addition of ciphertext: two totally different operations.) Multiplication is similar. You don't get to apply arbitrary homomorphisms to the data, it's the (predetermined) encryption function that's the homomorphism.

Actually, I don't see any mention of subtraction -- so maybe it's really a semiring homomorphism. With an actual ring homomorphism you'd also have f(x - y) = f(x) - f(y), and some 0 element with f(0) = 0. And maybe f(1) = 1, depending on definition.

## Re:Homomorphism (Score:4, Informative)

Here's a very simplified example of a homomorphism. I define a function

f(x) = 3x

This function is a homomorphism on numbers under addition. Its image "preserves" the addition operation. What I mean more precisely is

f(a) + f(b) = f(a + b)

That's pretty easy to verify for the function I've given.

But examples like you gave (semigroup homomorphisms) have existed for a long time. Basic RSA has that property. The key advance here is that you have a semi

ringhomomorphism, where it preserves two operations, one of which distributes over the other. Like multiplication and addition, or bitwise and and xor. (For those who don't follow: x*(y + z) = x*y + x*z, x & (y ^ z) = (x & y) ^ (x & z). If you don't believe the second identity, try all possibilities.)An example of a semiring homomorphism on the reals is f(x) = -x. Then f(x + y) = -(x + y) = f(x) + f(y), and f(xy) = (-x)(-y) = xy = f(x)f(y). (Unless you believe in Time Cube.)

It seems distributivity is enough to do complicated calculations. You could simulate and and xor gates, I guess. Then you could get ~x = x ^ -1, x | y = ~(~x & ~y), etc.: all possible binary operations. That's enough to build a virtual computer right there, all operating on encrypted data.

Of course, the one running the code would be able to figure out exactly what algorithm you're using. So it's not perfect. But it's pretty cool regardless.

## Not an "IBM Researcher" (Score:1, Informative)

Gentry did this work as part of his PhD at Stanford; IBM played a minor role, if any at all, in this research. The original article is copied word-for-word from this IBM press release: http://www-03.ibm.com/press/us/en/pressrelease/27840.wss

It's no surprise that IBM is trying to claim this as their work, but Craig has been giving talks on this topic for a while in academic settings; I was fortunate to be in the audience at Brown six weeks ago when he blew the audience's minds in a two-hour talk. The conference at which he officially unveiled this technique, STOC 2009, happened at the beginning of this month, so any way you slice it this IBM PR stunt is late to the game. The language in the press release is just depressing, as they transparently try to take credit for this seminal work that has very little to do with the company, and only admit at the very end that he just happened to visit IBM during one summer.