New Encryption Scheme Could Protect Your Genome
sciencehabit writes "As the cost of genetic sequencing plummets, experts believe our genomes will help doctors detect diseases and save lives. But not all of us are comfortable releasing our biological blueprints into the world. Now cryptologists are perfecting a new privacy tool that turns genetic information into a secure yet functional format. Called homomorphic encryption, the method could help keep genomes private even as genetic testing shifts to cheap online cloud services."
New? (Score:4, Informative)
Re: (Score:2)
However, I suspect that every new application requires the method to be applied differently. Also, for every new application, other attack vectors might be possible, so it is crucial to sort these out. Just thinking.
Re: (Score:2)
exactly how long would it take to die, and how? (Score:3)
Re: (Score:1)
Re: (Score:2)
It doesn't protect you from that hot blond taking a strand of your hair to the local gene-scan station before going on a second date, but it does mean that the guy that hacked the NIH genetic database won't get the DNA of 400,000,000 people in one fell swoop. Though of course it probably also means that the NIH database will require thousands of times the storage capacity since de-duplication can't be applied to the massive genetic overlap between individuals.
And the nosy blond could be mostly stymied by la
Re: (Score:2)
Though of course it probably also means that the NIH database will require thousands of times the storage capacity since de-duplication can't be applied to the massive genetic overlap between individuals.
The human genome is what? About 1.5 gigabytes? That's a lot of data, but far from unmanageable. Store two copies for redundancy and you have 3 gigabytes. Let's round down a bit and say you can get 600 people's DNA onto 2 TB worth of drives. Let's say you pay $120 per terabyte; then you're paying about 40 cents per patient for two copies. Of course, this will be enterprise-class storage for medical purposes, so let's say $4 per patient. Not exactly bank-breaking. Anyway, you haven't presented any good reason why y
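For what it's worth, that back-of-the-envelope arithmetic is a few lines of Python, using the same assumed figures as above (~1.5 GB per genome, two copies, $120 per terabyte):

# Back-of-the-envelope storage cost per patient, using the numbers above.
genome_gb = 1.5            # assumed size of one stored genome
copies = 2                 # keep a redundant copy
dollars_per_tb = 120.0     # assumed commodity drive pricing

gb_per_patient = genome_gb * copies                  # 3 GB
patients_per_2tb = int(2 * 1000 / gb_per_patient)    # ~666; the post rounds to 600
cost_per_patient = (2 * dollars_per_tb) / 600        # ~$0.40 with that rounding

print(patients_per_2tb, round(cost_per_patient, 2))  # 666 0.4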
Re: (Score:2)
Researchers realized that the complex algorithms used during genetic tests could be closely approximated by the two basic mathematical operations. Lattice cryptology enabled homomorphic encryption, allowing computers to analyze encrypted data and return encrypted results without ever being able to decode the information.
I can't see how it would be very useful for actual genetic research either, since researchers generally need the decoded information as well as personal and family medical history when interpreting the results. The roughly 10^9x computational overhead would also be a huge problem in research, since, unlike a medical test where you know a pattern and are just trying to find out whether a single sample matches it, you are instead trying to find patterns shared by a group of genomes associated with a similar med
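To make the "compute on encrypted data" idea concrete, here is a toy Python sketch. It uses textbook (unpadded) RSA, which happens to be multiplicatively homomorphic; this is emphatically not the lattice-based scheme from TFA and not secure for real use, but it shows the basic trick: the party doing the computation only ever touches ciphertexts, and only the key holder can read the result.

# Toy demo of a homomorphic property: textbook RSA multiplies "through"
# encryption. Illustrative only -- tiny keys, no padding, not secure.
p, q = 61, 53
n = p * q                              # 3233
e = 17                                 # public exponent
d = pow(e, -1, (p - 1) * (q - 1))      # private exponent (Python 3.8+)

def enc(m):                            # anyone can encrypt with (e, n)
    return pow(m, e, n)

def dec(c):                            # only the key holder has (d, n)
    return pow(c, d, n)

a, b = 42, 7
ca, cb = enc(a), enc(b)

# The "lab" multiplies ciphertexts without decrypting anything.
c_prod = (ca * cb) % n

# Decrypting the result gives the product of the plaintexts.
assert dec(c_prod) == (a * b) % n
print(dec(c_prod))                     # 294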
Re: (Score:2)
I think the idea is mainly that you don't want to have to completely resequence your genome every time you want to test for something new - after all we could be discovering new medically-relevant genetic properties for centuries, and the company doing the sequencing doesn't necessarily know or care about every potentially interesting finding, so you keep it on file somewhere. If costs continue to follow the current trend the first factor will likely only be an issue for a decade or two, before the price d
Re: (Score:2)
I think what they're really trying to sell in this article is saving everyone's data in a central repository where everyone's DNA could be mined for data without compromising their privacy. That's effectively impossible. The only way to do it would be to perform operations that examine the entire database to produce a single result. The required computing power/time would be astronomical under this model. Pretty much every other way of doing it allows you to narrow down a particular patient's DNA and extract
Re: (Score:2)
Sounds like a good sales pitch, but how would homomorphic encryption enable such an anonymous data-mining paradise? As I understand it such encryption allows you to process the data without decrypting it, but the results are themselves encrypted with the same key. And if you have the key to access the results then you don't need the ability to process data without decrypting it.
I would assume that each DNA record has its own key (otherwise it kind of defeats the point), and that you can't mix the processi
Re: (Score:2)
Sounds like a good sales pitch, but how would homomorphic encryption enable such an anonymous data-mining paradise?
Well partly by being effectively backdoored from the start. It seems unrealistic to believe there wouldn't be some sort of backdoor from the start to fix things when they break in the large, complex, impenetrable data set. After things are pretty stable, the developers will be reluctant to get rid of the back door because of the large number of times they would have had to rebuild entirely from scratch if they didn't have the back door, and it will hang around forever. Mostly, however, there's the simple fa
Re: (Score:2)
Why would an honest individual put in a back door in the encryption for "testing"? Just test with data you have the key to. Much simpler and doesn't inherently undermine the integrity of the system you're building. And how can things "break" within an immutable data file? When's the last time you saw a "broken" bitmap or text file that wasn't due to either a failed creation (probably not worth fixing), or corruption of the transmission or storage medium that can be solved with an error-correcting wrappe
Re: (Score:2)
Why would an honest individual put in a back door in the encryption for "testing"? Just test with data you have the key to.
It doesn't take a dishonest individual. It's just fairly typical in such situations. It depends on who's actually in charge and if they run into problems.
Consider that the US nuclear launch codes were 00000000 for two decades. Consider that something like 30 billion dollars a day is spent in credit/debit card transactions using a system with effectively _no_ security. Consider the failing grade nearly all large organizations receive pretty much every time they are audited for security. Even when thei
Re: (Score:2)
Okay, yeah editing is an issue as well, but not one relevant to archives of immutable data.
Fair point about data dumps from focused studies, I'm sure some of them would indeed contain common elements that could open an attack vector, though I don't know how big a vector a few known bits in a 3MB file would actually make. Certainly nothing like having 99.8% be known. It probably wouldn't be racial studies that do it though, IIRC there's not actually any well-defined racial boundaries from a genetic perspecti
Re: (Score:2)
Okay, yeah editing is an issue as well, but not one relevant to archives of immutable data.
True, but I'm not as confident as you that early versions of this will actually allow for immutable data. Avoiding all bugs that might require things to be re-encoded is a monumental task. Maybe they could pull it off. I would be truly, truly impressed.
It probably wouldn't be racial studies that do it though, IIRC there's not actually any well-defined racial boundaries from a genetic perspective - there's not even one single solitary gene shared by most black people that isn't also present in a lot of whites and asians (and vice-versa), we just travel and intermix too much. It only takes one person with a bad case of wanderlust a thousand years ago to introduce a gene into a large portion of an otherwise isolated population.
True. It depends a bit on the groups. Island populations, for example. Specific studies on people with a particular medical condition with a genetic link might be a better example than people with particular ethnicities.
Yeah, I can't argue against the incompetence card. In fact that's why I think homomorphic encryption could be a wonderful thing for genetics - it means that the sequenced DNA need never be stored in plaintext anywhere outside the sequencing machine, not even in volatile memory while being analyzed. There's still the risk that someone gets their hands on both the key and data, but a single "never, ever keep these two things in the same place" security rule would go a long way towards protecting against that, and has at least a chance of being followed.
That is true. I think something like that
Re: (Score:2)
What is there to go fixably wrong? You sequence the DNA to a 1.5GB file - if there's any problem in that stage you're hosed already. Then you do a binary diff to your reference sequence - that's a pretty thoroughly mature technology. Then you encrypt it - again, any problems = you're hosed. And if we're working on the assumption that the lab has no access to the data once it leaves the sequencer as a 3MB encrypted file then they would be hard-pressed to fix anything in the data anyway, at most they coul
Re: (Score:2)
What is there to go fixably wrong? You sequence the DNA to a 1.5GB file - if there's any problem in that stage you're hosed already. Then you do a binary diff to your reference sequence - that's a pretty thoroughly mature technology. Then you encrypt it - again, any problems = you're hosed. And if we're working on the assumption that the lab has no access to the data once it leaves the sequencer as a 3MB encrypted file then they would be hard-pressed to fix anything in the data anyway, at most they could reformat it into something more efficient to process, but that would seem a risky undertaking when you have no access to the data to verify that you didn't just hose things completely.
Well that's pretty much the point. If the model of the system is so secure that you're hosed if anything at all goes wrong, most people are going to hedge their bets by putting in a back door so they can try to fix things. When you're going to have to tell your clients to redo millions of dollars of really expensive data entry if anything goes wrong, you're going to be under a fair amount of pressure to make sure that doesn't happen. One way to do that is to secretly break your security model. It happens al
Re: (Score:2)
I wish I could argue against your faith, but I've seen too many examples myself.
Think of this though - who is the customer for the DNA lab? Individual citizens on doctor's orders. And what exactly happens today if it turns out that there was a problem/something really unexpected with the last set of tests? Seems like mostly Doc sends you to get them done again. As long as that doesn't change with sequencing neither Doc nor the lab has much incentive to have the records around indefinitely, especially if
Re: (Score:2)
Re: (Score:1)
they put out ads to hire experienced "editors", and timothy put up his hand.
Re: (Score:2)
hmm, SHA512 from 1999-2001, 1977 DES .htpasswd (Score:2)
That's an interesting comment. Consider hashes as one important part of cryptography. SHA2 is a current standard used by some up-to-date software, while a lot of systems don't support it yet. It's too new to be used everywhere, having been officially standardized thirteen years ago.
Millions of web sites use .htpasswd files which default to DES (1977) and that's just one example out of many software packages that call crypt() to get a DES hash.
I've thought of cryptography as careful, methodical, slow com
Re: (Score:2)
Re: (Score:2)
Honestly I don't see much attack on the practicality there. He highlights the *cost* of the technique (*much* slower performance), but how much that affects the practicality is entirely domain-dependent. For example, a doctor in TFA performed a genetic risk assessment for some condition in 0.2 seconds. I guarantee you that was by far the fastest part of the entire process - if it takes minutes or hours instead of seconds to perform a thorough genetic workup in such a way that *nobody* except myself or
Re: (Score:2)
The main attack you can make on the practicality of this system is that it envisions encrypting the information on one server and then sending it out to another server to perform operations on it. If a test that takes .2 seconds is a billion times slower than it needs to be, that means that any garden variety computer can perform that test very, very quickly. You can use a fancy encryption method that may already be broken to send out the DNA to some virtual "lab" as if you were sending out a blood sample, o
Re: (Score:2)
Sure, most genetic tests amount to looking at a few bits in one or more known places in the data - not exactly advanced calculus. Even with a billionfold performance penalty a desktop PC could probably perform at least a handful of tests in a timely fashion. As for the lab, I suppose I was thinking more about inevitable attempts at corporate lock-in than actual necessity.
For security though... how many doctors have you dealt with on a personal level? These aren't security professionals - their brains are
Re: (Score:2)
Even with a billionfold performance penalty a desktop PC could probably perform at least a handful of tests in a timely fashion.
If that PC is slightly modified to be a trustworthy device with a proper security model, then there's no reason for the homomorphic encryption. The device can just decrypt the data first, then do every test necessary in a very timely fashion...
As for the lab, I suppose I was thinking more about inevitable attempts at corporate lock-in than actual necessity.
There... there you have a very good point. This article screams of "force patients to store their DNA on your servers, but provide an argument that's reasonably convincing, even to security experts, that it's safe and secure and not subject to the complete sham t
Re: (Score:2)
> Analysis of just about any other medical dataset is going to be far more complicated.
Agreed. It's also going to tend to be far less sensitive for the simple fact that it contains far less information about you and your predispositions with regard to health, intelligence, personality, appearance, and everything else with a strong genetic component. It seems to me that homomorphic encryption is a technology with a very narrow window of utility - to wit, protecting extremely sensitive data that needs mi
We can't (Score:2)
What hope is there really of keeping your genome private if you are sending it across the internet?
Re: (Score:3)
Besides the 'internet security issue', it's not that hard for someone to get your DNA and test it themselves if they want it.
Re: (Score:2)
Re: (Score:3)
I was going to mention that, but I wasn't sure. Can you get a full genome sequenced from hair, or do you need a certain quantity of blood or something?
As far as I can tell you need full cells, so hair that has been cut with scissors won't do, but a hair follicle pulled out by a hairbrush is enough. Any blood, saliva, semen or tissue sample will also do. A quick check suggests as little as 5 cells are needed, so we're talking nanograms of material here.
Re: (Score:2)
Ah, the 1980s where Lex Luthor can clone Superman from a strand of his hair in Superman IV.
Re: (Score:2)
A quick check suggests as little as 5 cells are needed, so we're talking nanograms of material here.
Yup. Scientists discovered they could extract your DNA from your fingerprint ~2003. http://science.slashdot.org/st... [slashdot.org]
Re: (Score:2)
If it's important enough you can get a full DNA sequence from a single cell - DNA was designed to replicate, and it's not that hard to get it to do just that in the lab. If you've got hundreds/thousands/millions of cells then it makes it even easier since you can use "shotgun" sequencing techniques to accelerate the process dramatically. And that's still a pretty small sample - most animal cells are around 10-30um in diameter, so you're looking at 35,000-1,000,000 of the suckers in a 1mm cube sample.
Blood
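Roughly how the 35,000-1,000,000 figure above falls out, as a quick Python sanity check (treating each cell as filling a cube of its own diameter, which is only a crude packing assumption):

# Rough cell count in a 1 mm cube, with each cell modelled as a cube of its diameter.
mm_cube_um3 = 1000 ** 3                # 1 mm cube = 10^9 cubic micrometres

for diameter_um in (30, 10):
    cells = mm_cube_um3 // diameter_um ** 3
    print(f"{diameter_um} um cells: ~{cells:,}")
# 30 um cells: ~37,037 and 10 um cells: ~1,000,000 -- the same ballpark as above.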
Re: (Score:2)
If they're interested in *your* DNA specifically, no, technological measures won't stop it (though legally requiring licenses to possess gene sequencers and "informed consent" laws in regards to human DNA sequencing would go a long way towards holding back any GATTACA-esque abuses).
On the other hand, how valuable would a database of thousands or millions of people's unencrypted DNA be?
Re: (Score:2)
Re: (Score:2)
"I'm still chuckling over the use of the words "private" and "cloud" in the same sentence..."
Wow, that's a quote that should go on the wall in every corporate board room.....
Re: (Score:2)
Re: (Score:2)
Nice to know you value money over privacy.
It's not a matter of what I value, it's just being honest about the priorities of the corporations who hold your data.
Remind me never to tell you anything in confidence.
Please don't.
Re: (Score:2)
Does the old saying "Three men can keep a secret, so long as two of them are dead" have any meaning for you?
Best way to keep your secrets is to not tell ANYONE.
Genetic security through obscurity vs. cooperation (Score:3)
So true. But DNA security is more than an issue of privacy. In the near future, understanding the human genome will make possible developing bioweapons targeted at individuals (with collateral damage) as well as bioweapons that could probably kill all humans exposed to the pathogen (like Ebola). We have, up to now, been protected by the obscurity and complexity of the issue. With advanced computers, vast data collection, and improved scientific understanding, creating individual and global bioweapons will
Re: (Score:1)
Re: (Score:2)
Yeah, better stay away from homogenous beverages as well, just to be safe. If you don't have to shake before opening there's no telling *what* it's doing to your sexuality.
Window managers (Score:1)
hehehe (Score:2)
Re: (Score:2)
Re: (Score:3)
It was a lousy joke.
>Homo means gay
Somebody needs to brush up on their Greek. Homo- as a prefix means "the same", as in homomorphic = the same form.
Or alone as Latin for "man" (as in Homo sapiens = intelligent man, versus Homo erectus = upright man)
Yes, I'm perpetually annoyed by ignorant people sexualizing useful words, much less common prefixes. Why do you ask? We've got a perfectly good word for sex, it's even one of the coveted limited-edition single-syllable models reserved for only the most impor
Keep it (Score:2)
Encryption can be broken, especially the kind that exposes useful information about the plaintext as this one does. A much simpler alternative is to keep your genetic information in your own control, processing it on your own computer with open source software. You know, just what we already do with other sensitive information like passwords.
Re: (Score:2)
Re:Keep it (Score:4, Informative)
Hi. I'm a theoretical cryptographer.
Encryption can be broken,
Some implementations have been broken. Encryption itself is generally fine (as long as you go with well-studied, standardized methods). There is a point that encryption is always subject to real-world factors, but the most common libraries are pretty good. Whenever you read about a data breach in the news, it's not because encryption was broken--something else went wrong (and, frequently, exposed data that wasn't encrypted in the first place).
especially the kind that exposes useful information about the plaintext as this one does.
Homomorphic encryption does not expose useful information about the plaintext, although the article doesn't make that clear. You start with an encrypted input, perform an operation, and get an encrypted output. Only the person with the key--who is not the person performing the computation--can decrypt the result.
There is a somewhat-related but distinct concept, called "functional encryption", in which one can distribute a key associated with a function f. That key allows a user to take an encryption of x and obtain f(x)--but nothing else about x other than f(x), where "nothing else" has a mathematical formalization. So you could (conceptually) encrypt your entire medical record and give your doctor a key for the function that calculates the probability that you'll have a heart attack in the next five years. Then they'll be able to calculate that probability, but nothing else about you.
A much simpler alternative is to keep your genetic information in your own control, processing it on your own computer with open source software. You know, just what we already do with other sensitive information like passwords.
This I agree with, in an ideal world. Will we be living in such a world, 5, 10, or 20 years down the line? I don't know. Right now, the trends are largely in outsourcing everything--more and more, your data and computation live on the cloud. For medical information, your doctor doesn't do all the tests himself--he outsources them to a lab. For genetic information, 23andMe doesn't sell software that lets you analyze your own genetic markers--they take your information and perform the analysis on it themselves. So these trends will need to change before the above takes place.
It would be great to keep one's own data and get all the various analysis tools via FOSS. But someone needs to write and distribute those tools--as well as make it feasible to obtain one's own data in the first place (I don't know about you, but I don't have an MRI machine in my house). So until that world exists, homomorphic encryption is a potentially useful tool in this area.
[It also has uses beyond securely outsourcing computation, but that's somewhat off-topic.]
Re:Keep it (Score:4, Interesting)
Right, because I have the knowledge and equipment to sequence my own DNA and make sense of the results.
Sure, encryption can be broken, and I don't know how far I'd trust IBM's 1st-generation homomorphic encryption, much less this "streamlined, high performance" version adapted by medical researchers, but it's a hell of a lot better than nothing.
Also, while I'm not an encryption expert, it sounds like homomorphic encryption doesn't actually expose useful information (at least not intentionally, I'm sure it opens up some new attack vectors, everything does). Encrypt A to get B. Apply operations f(B) to get C, decrypt C to get f(A). C is still encrypted gibberish.
So, assuming it's possible to do public/private key homomorphic encryption, my doctor could send a sample for sequencing along with a public key. DNA gets sequenced and encrypted (ideally both on the same non-networked hardware so that the plaintext data is never accessible to anyone), and the encrypted sequence is sent back to my doctor, archived in a public database, whatever. Doc can then send it to a third-party DNA analysis firm in Nigeria, who perform all manner of analysis on it and send the reams of gibberish test results back. He then calls me in, the only holder of the private key, and I can then decrypt the results on my secure, open-source computer and present them for his interpretation and advice.
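A rough Python sketch of that flow, using the Paillier cryptosystem as a stand-in: it's a public/private key scheme that is additively homomorphic, but it is not the lattice scheme from TFA, and the key sizes, marker values, and risk weights below are all made up for illustration.

from math import gcd
import random

# --- Patient's toy Paillier keypair (tiny primes; illustrative only). ---
p, q = 1789, 1861
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)        # lcm(p-1, q-1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)                 # private: (lam, mu); public: (n, g)

def encrypt(m):                                     # anyone with the public key
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):                                     # patient only
    return (L(pow(c, lam, n2)) * mu) % n

# --- Sequencer: encrypts hypothetical marker values under the public key. ---
markers = [1, 0, 2, 1]                              # made-up risk-allele counts
cts = [encrypt(m) for m in markers]

# --- Analysis firm: computes an encrypted weighted risk score. ---
# Paillier: Enc(a) * Enc(b) = Enc(a + b), and Enc(a)^k = Enc(k * a).
weights = [3, 5, 2, 7]                              # made-up public risk weights
score_ct = 1
for c, w in zip(cts, weights):
    score_ct = (score_ct * pow(c, w, n2)) % n2

# --- Patient: decrypts the score; the firm never saw markers or result. ---
assert decrypt(score_ct) == sum(m * w for m, w in zip(markers, weights))
print(decrypt(score_ct))                            # 14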
I just can't. (Score:3)
I'm trying to say something intelligent involving homomorphic encryption with random seeds and salt that doesn't trigger the Beavis & Butthead reflex, but I just can't make it happen.
Would be more useful . . . (Score:5, Interesting)
Even then, a gust of wind while I am in the backyard might be all that is required one day for someone's reader to catch my DNA and run a simulation to match with facial recognition.
Re: (Score:2)
Don't give the NSA and FBI ideas.
A few years back, the Supreme Court ruled they couldn't use IR scanners without a warrant on buildings. Although it was "broadcast" out to common areas, you historically had the expectation of privacy. Yay originalism and intent of Founding Fathers.
That attitude is dead as a doornail now (not that it wasn't always DOA in government reaching -- hence that case) but now it's even more of a struggle with Congress and the President acquiescing to all kinds of metadata stuff.
I st
Re: (Score:2)
I think the point is less to protect your personal DNA sequence, and more to protect the anonymity of databases/sequencing labs/doctors offices/etc. that are otherwise carrying around massive blinking "hack me" signs.
Re: (Score:2)
Obtain a used air filter of a building, and you may have the DNA of anyone who has been in that building for the last couple of days . . . legally.
Homomorphic encryption helps? (Score:3)
Re: (Score:3)
Near as I can tell, it's simply a way to outsource number crunching. Like for example in a paternity suit, you can encrypt the DNA of the people in question, hand it over to a cloud provider who'll give you a paternity index score but can't recover the actual DNA sequences involved. Okay, not the best example. Say you have a huge number of samples like a genetic archive. You want to find "The people with genes XYZ, what other genetic differences do they have from the general population?", so you hand a cloud pr
Re: (Score:3)
To rephrase AC: using homomorphic encryption you:
Encrypt A to A*
Perform analysis on A* to get B* (the gibberish encrypted results)
Decrypt B* to get B.
So basically you, as some lab doing the analysis, have *no* idea what the incoming DNA is, nor what the results of your analysis are. All you need to know is how you would perform the analysis if they *weren't* encrypted. You can then send the encrypted results back to the doctor who sent you the encrypted DNA, and *she* (or the patient in question) can decrypt them to
Am I missing Something? (Score:1)