Datamining Medline for Gene Interactions - Pubgene
An Anonymous Coward wrote: "According to an article in the 5 May 2001 issue of New Scientist, biologists in Norway have developed a computer program to datamine Medline to predict interactions between genes. Some of the relationships hadn't been predicted before and were found to be real. The scientists' PubGene database and tools are available for experimentation." Wow.
Some related work (Score:1)
Also, a paper [nih.gov] published recently in Bioinformatics tries to extract protein interactions. The authors used a dictionary of words related to interactions and then looked for proteins mentioned in the same sentence as one of those dictionary words, with part-of-speech analysis to improve accuracy.
Something like that.
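A rough sketch of that same-sentence idea, for anyone who wants to play with it. The word lists, the function name and the sample sentences are made up for illustration and are not taken from the paper, and the part-of-speech step is left out entirely:

```python
import re
from itertools import combinations

# Hypothetical dictionaries for illustration only; not the paper's word lists.
INTERACTION_WORDS = {"binds", "interacts", "phosphorylates", "activates", "inhibits"}
PROTEIN_NAMES = {"RAD51", "BRCA2", "TP53", "MDM2"}

def candidate_interactions(text):
    """Return (protein_a, protein_b, cue_word) for every sentence that names
    two known proteins and at least one interaction word."""
    hits = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = re.findall(r"[A-Za-z0-9]+", sentence)
        proteins = sorted({t for t in tokens if t in PROTEIN_NAMES})
        cues = sorted({t.lower() for t in tokens if t.lower() in INTERACTION_WORDS})
        if cues:
            # Every pair of proteins in the sentence becomes a candidate.
            for a, b in combinations(proteins, 2):
                hits.append((a, b, cues[0]))
    return hits

sample = "BRCA2 binds RAD51 in vitro. TP53 was also measured. MDM2 inhibits TP53."
print(candidate_interactions(sample))
```

A real system would need a far bigger protein dictionary, synonym handling and the part-of-speech filtering the paper describes; this only shows the co-mention step.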
distributed effort? (Score:1)
Like the frist post gene. (Score:1)
Re:Norbert Wiener (Score:1)
we could get onto Enkephalonetics,
Which is......?
Re:I've done this... sort of (Score:1)
But it's already been done! Quick, go to megahal.sourceforge.net [sourceforge.net] and grab the latest source code. Build it, and then run some sample English papers through it. Feed it some other miscellaneous stuff for good measure (how about the script to the movie Terminator and a few of the better Slashdot trolls), and let 'er rip! Sure, it may not make *logical* sense, but since this is for an English course, you'll get graded higher for originality and "thinking outside the box". Just don't mention my name when they come to take you away to your padded cell.
-1: Sleep Deprived
Re:Norbert Wiener (Score:1)
"Enkephalos" is ancient Greek for "Head" or "cranium". Therefore, enkephalonetics = using your head. =)
Kinda weird... (Score:1)
One of the final-year comp sci projects here reminds me of this, although AFAIK it is far simpler. One idea that was brought up during the student's presentation, though, is that this might work very well in a distributed computing situation.
Perhaps this will be the next SETI@home ?
Re:Soapbox with logic problems (Score:1)
Re:And with a little imagination (Score:1)
Re:Quote from article (Score:1)
I'm wondering if it's much different than Google is doing with web page data, on a larger scale?
Re:I've done this... sort of (Score:1)
Re:GENE interactions are the future of MEDICINE? (Score:1)
Quote from article (Score:1)
Hmm, shades of HAL? Let's hope when he's eventually developed that the developers (or creators?) incorporate Asimov's 3 Laws of Robotics into this thing.
Wouldn't want these smart AI programs to "suggest" something that would be potentially harmful.
-Cyc
Re:Usefullness (Score:1)
Re:open source genetics (Score:1)
As many of you may know, the drug/biologics industry has fought bitterly to protect ANY and ALL information relating to gene therapy trials as trade secrets. In fact, they protected such trade secrets about a certain adenovirus vector so well that none of the right people knew it had killed monkeys and seriously injured several humans before it killed Jesse Gelsinger last year (with a little help from a U. Penn research egotist). The upshot is that when these greedy moron subhuman bastards seek to protect prior evidence of toxicity as "trade secrets," people die for lack of the suppressed knowledge. It's shameful. We'll have to hope FDA and HHS have the balls to get these gene cowboys in line before they kill again.
Information extraction (Score:1)
I think that the main thing it demonstrates is how poorly scientists choose to represent their data in the first place. Not only do we choose to put all this vital stuff into something which is totally unamenable to computation, but we sign the copyright over to various commercial interests.
Phil
Internet at its best (Score:1)
If I hear another lame-ass comment about how the Internet is just like the tulip bubble in the 17th century, I am going to send them this link.
And, oh yeah. Perl is not just for script-kiddies either. So there.
Re:you can do simple analyses with this new tool.. (Score:1)
you can do simple analyses with this new tool... (Score:1)
check out compare-stuff.com/pubmed [compare-stuff.com] to analyse relative co-occurrence in PubMed articles.
you can compare more than just gene names too: disease/condition names, reagents, techniques, authors' addresses, whatever...
reload the entry page to see different examples of what you can compare
or ask similar questions on the web at large with the vanilla version: compare-stuff.com [compare-stuff.com]
already done... (Score:1)
roll your own gene/disease interaction analyses (Score:1)
Biased data (Score:1)
Maybe I'm missing something here, but doesn't the fact that the other genes are being mentioned in the same article as the first gene already imply a relationship between the two? Why else would the authors mention them in the same article?
Another thing to consider is that scientists don't just go around randomly picking a gene and studying it. There are generally reasons why the gene is interesting, and those genes are studied more than others. There is a whole field of the sociology of science that deals with how the way scientists go about doing science influences the results that they find. It annoys the heck out of most scientists.
It is good that there were some relationships that the program found that had not been previously found, but essentially the program is an automated review article generator with a meta-analysis component to organize and sort the data.
The idea posted above, that drug interaction would be a good thing to do as well, I can heartily agree with. Have the program go through not only the literature but also the PDR (Physicians' Desk Reference), categorize pharmacological responses (e.g. what drugs cause blood pressure to rise by what mechanisms) and not only could we possibly avoid some nasty drug interactions, but perhaps we could find where some drugs act synergistically with each other to generate greater or new results that were not previously thought of.
Slashdot Dataminer (Score:1)
Re:I've done this... sort of (Score:1)
I'm betting, 25 years, too late for me.
I'd like to make something that compiles an essay from paragraphs and phrases in other works, that could be made in the next 2 years I think.
Re:Usefullness (Score:1)
Re:Way ahead of ya, dude (Score:1)
Re:Usefullness (Score:2)
Sure structures and motifs are good to have, but there are a lot of structures out there that we don't know much more about. And the issue here is about interactions, beyond simple statements like "this is a catalytic protein" or whatever.
Can one give Pubgene a pdb or fasta file-- and find papers on homologous genes or structurally similar proteins-- or must one use BLAST, or a fold recognition algorithm prior to searching Pubgene?
No, you are supposed to have a set of gene names that you are working with. Homology can be assessed elsewhere. What you can ask this system about is the known and inferred interactions out there.
Lars
__
Interesting work, not seminal (Score:2)
More elaborate techniques have also been suggested for learning about the interactions. By simple text analysis, you can deduce with fair (but not perfect) certainty if a gene is up- or down-regulating another gene. Other systems try to find support for hypotheses on interaction networks by doing pubgene-similar analysis. If your experiments support many tentative networks, you can let the vast amounts of knowledge in the published literature dismiss the bad suggestions.
The need for systems like this is huge. More articles than ever are being published, and there is no way a researcher can keep up with the information flow. New technology also allows large-scale genome-wide experiments that generate enormous amounts of data. Such data needs to be analysed automatically, and if we can tie in the published knowledge, the value of the data increases.
If you are interested in systems like these, look up the works of Andrade, Valencia, Bork, Ouzounis, and their collaborators!
Lars
__
Re:you can do simple analyses with this new tool.. (Score:2)
Lars
__
Re:Biased data (Score:2)
That is the basis for their technique, yes. The thing is that they are using this transitively. If gene A is mentioned together with B in one paper, and then B is shown to work together with C in another, then that is evidence for A and C having some sort of relationship. There are problems with this approach, and I think the authors are aware of them (I have not had access to the article yet). For example, if a paper is talking about a certain new technology, it could bring out examples from various systems in the cell, and thus mention genes that are quite unrelated.
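To make the transitive step concrete, here is a small sketch. It is only my reading of the A-B-C argument above, not PubGene's actual code; the function names and the toy paper list are invented for the example:

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(papers):
    """Map each gene to the set of genes it shares at least one paper with.
    `papers` is a list of gene-name lists, one list per paper."""
    graph = defaultdict(set)
    for genes in papers:
        for a, b in combinations(sorted(set(genes)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def transitive_candidates(graph):
    """Return sorted pairs (a, c) that never share a paper but are both
    co-mentioned with some intermediate gene b."""
    found = set()
    for b, neighbours in list(graph.items()):
        for a, c in combinations(sorted(neighbours), 2):
            if c not in graph[a]:
                found.add((a, c))
    return sorted(found)

# Toy input: A with B in one paper, B with C in another, C with D in a third.
papers = [["A", "B"], ["B", "C"], ["C", "D"]]
print(transitive_candidates(cooccurrence_graph(papers)))
```

Note that a methods paper listing unrelated genes would add spurious edges to this graph, which is exactly the weakness mentioned above.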
Another thing to consider is that scientists don't just go around randomly picking a gene and studying it. There are generally reasons why the gene is interesting, and those genes are studied more than others.
This may be true historically, but we are entering a whole new era in genomics. Industrial science is here. The fashionable experiments today are genome-wide, studying for example the workings of a large set (thousands) of genes at a time. Genes are no longer selected for sociological reasons, but based on predictions on various aspects of the gene. "Are there reasons to believe that this gene encodes a protein sitting in the cell membrane? Let's include it in our experiment because it is then probably doing interesting signalling."
With industrial genomics and a higher publication rate than ever, new tools are needed to sift through the data. These Norwegians provide one attempt at addressing this.
Lars
__
Soapbox with logic problems (Score:2)
But what got my goat was the claim that "more than 100,000 deaths per year are caused by adverse drug reactions" and yet "By contrast, deaths due to traditional herbal remedies are so rare they're hard to find."
This is such blindingly bad use of statistics that I have to howl. It isn't so much like comparing apples with oranges, as like comparing apples with trilobites. Consider the populations: why are people taking traditional herbal medicines? For colds, indigestion, general malaise. Not for heart disease, strokes, cancer or anything life threatening. People at risk of death are a lot more likely to risk dangerous combinations of drugs. Well, derrr.
Generalize this for any science... (Score:2)
Finding information is a hell of a skill - I know that a lot of my time as a grad student has been spent on literature reviews.
Never mind GENE interactions, what about MEDICINE? (Score:2)
By contrast, deaths due to traditional herbal remedies are so rare they're hard to find. I'm not dismissing modern medicine entirely - far from it - I'm just pointing out some disturbing facts.
So why are gene interactions so hot, yet medicine interactions so neglected in research? And why, for that matter do so few people know that they could substantially reduce their risk of heart disease and cancer by going vegetarian or vegan? Surely the governments of the world should be funding research and education on these two topics on a massive scale - it could save thousands upon thousands of lives - and even from a callous economic point of view, the savings in terms of medicaid and lost economic productivity due to ill-health would be huge! In fact, official guidelines still endorse a meat-based diet despite the well-known health risks, and there is NO serious attempt to co-ordinate drug safety information between regulatory bodies internationally. That's right, none - regulatory bodies in the UK often ignore bans in the US, and vice-versa. What's more, the support for even collating data on side effects of medicines at a government level is poor - particularly in the UK.
The reason is the same in both cases, and it's very simple. Profit. Profit for the drugs companies, to be precise. Pharmaceutical corps profit from ill-health, and they don't exactly relish the idea of their drugs getting banned or contraindicated for safety reasons, either. Campaign funds, and the revolving door between the FDA and the drugs/biotech industries, help keep the government in line. For more info see http://www.drrath.com/
Business @ the Speed of Thought (Score:2)
Re:Interesting stuff right in front of them (Score:2)
"I may not have morals, but I have standards."
GENE interactions are the future of MEDICINE? (Score:2)
And as for why gene interaction is so hot: it's the real key to a lot of problems. You thought that the human genome was it? No no no... that was only the beginning... it was the map for gene interactions. The genes are worthless if we can't figure out what they do and how they interact. I mean, we can't even tell you how an E. coli works even though we've got the genome. There will be a lot of profit out of finding protein interactions, sure, but it'll be to find cures. I work in a lab that's trying to figure out gene therapy in prostate cancer. We need to know the genetic mechanisms for therapy to be effective. Or don't you want cancer cured?
"I may not have morals, but I have standards."
Re:I've done this... sort of (Score:2)
If you're interested in slightly higher level concepts, I just found this website [ucla.edu] at my college's webserver (it's a class I had to take, intro to Molecular Bio) and it looks like it's got some good info through the flash animations. If you want the hardcore stuff, go to the NCBI site [nih.gov] where you can browse the genome, search for proteins and genes, and do all the stuff real biologists do
"I may not have morals, but I have standards."
Re:GENE interactions are the future of MEDICINE? (Score:2)
The real problem is the one you stated, that most seniors are on multiple meds at a time for about as many conditions. That is, quite simply, pumping the body with way too many chemicals to be safe, especially in people whose bodies are breaking down to begin with. These drug interactions are probably incredibly difficult to study, but I agree that it needs to be done. I also agree that we need some kind of mechanical checking via computer to eliminate a lot of the stupid errors. However, I think the real problem lies in the fact that we're pumping drugs into people in ways that they just aren't capable of dealing with. We need better forms of treatment, and I think genetic therapies, as well as that critical yet neglected factor of prevention, are going to be key in the future. Healthy diet and exercise alone can help with a large number of ailments, thereby reducing the number, or at least dosage, of meds needed later on in life. And to prevent heart disease, rather than take the pill, introduce the healthy gene (I know it's not ready, but it is the future) and that's one less drug-drug interaction to worry about.
The other problem is that, in large part, we don't know what causes diseases really. Alzheimer's is getting closer, and the ulcer bacteria was just an absurd discovery in some ways. We need to understand the disease before we can treat it, and all these things have to play together. So while I fully agree with you, and apologize for oversimplifying, that we need to really study multi-drug interactions, I don't think that simply saying "well, we can treat you for heart disease or AIDS, but not both" (totally hypothetical example) is the answer. Prevention is key. Understanding the disease better is key. And, hopefully, gene therapy will ease the number of meds as well. We need to make use of all the things Molecular Biology has achieved and will achieve in the coming decades, rather than rely solely on the older method.
p.s. Thanks for teaching me something!
"I may not have morals, but I have standards."
Re:I've done this... sort of (Score:2)
"I may not have morals, but I have standards."
Errors (Score:2)
Some leap forward: "Information in, Error out"!
Re:I've done this... sort of (Score:2)
Naw, I got my GED after my junior year of high school. Now I'm just the average working stiff. :-P
I thought about college, but after high school, and with what I've heard about how colleges treat undergrads (required to live in the dorms with crappy Internet access, kicked out if you post Bad Things, no privacy, disinterested professors and dumb students), I have no desire to pay ridiculous amounts of money for college when I'd rather be learning.
It annoys me that it seems to be impossible to do anything between having an extremely casual interest in something and making it your whole career. You can't just go take classes that interest you, because they have prerequisites, and general education requirements, and all sorts of hassle. If I wanted to actually do anything related to genetics, for example, I'd have to spend at least 4 years in school studying it, and then get a low-level job at some place, and then decide that I'm not that interested in it after all, and what then?
(As an aside, why is it that the simplest things are always overlooked by beginner's resources? Why, for example, don't they introduce all the basic terminology and notation for a topic as soon as the topic appears? I hate having to refer to a portion of the thing I'm working on as "that thingy over there", especially if I'm asking for help. I've seen this in computer science, physics, chemistry, and biology books. They don't even have a "notation" section in the back, or if they do, it's next to useless. (And this problem may be more limited to high school, but when I would ask the teachers, they would actually tell me "don't worry about that". Or they wouldn't know.))
If you know of any entry-level resources for learning various sciences, I'd be most interested. I'll be sure to check out those sites if I'm ever at a computer with Flash, and I may play around with making a Punnetizer Deluxe or something :)
--
Re:I've done this... sort of (Score:2)
And no, the Punnett square really isn't hard as such, but I was pretty sure the teacher's motive was to catch as many students in fatigue or misalignment errors as possible. Also keep in mind the fact that >60% of the students were still confused by phenotypes.
Regardless, writing code to do it for me transformed the assignment from painful drudgery into a fascinating exercise. I was especially proud of realizing — on the way to gym class, no less — that it could all be represented as bitmasks. (I think this was when I first truly grokked the power of C.)
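For the curious, here is one way the bitmask idea can work. This is a guess at the general trick written in Python, not the Punnetizer's actual C code; the function names and the two-letter genotype format are assumptions made for the example. Bit i of a mask picks which of a parent's two alleles for trait i goes into a gamete, so counting from 0 to 2^n - 1 enumerates every gamete:

```python
def gametes(genotype):
    """genotype: list of 2-character strings, one per trait, e.g. ["Aa", "Bb"].
    Bit i of the mask chooses which of the two alleles of trait i is passed on."""
    n = len(genotype)
    result = []
    for mask in range(1 << n):
        result.append("".join(genotype[i][(mask >> i) & 1] for i in range(n)))
    return result

def punnett_square(parent1, parent2):
    """Return the square as a list of rows; each cell is an offspring genotype."""
    rows = []
    for g1 in gametes(parent1):
        row = []
        for g2 in gametes(parent2):
            # Pair the alleles trait by trait; sorting puts uppercase first.
            cell = "".join("".join(sorted(a + b)) for a, b in zip(g1, g2))
            row.append(cell)
        rows.append(row)
    return rows

for row in punnett_square(["Aa", "Bb"], ["Aa", "Bb"]):
    print(" ".join(row))
```

With n traits each parent has 2^n gametes, so the square has 4^n cells, which is why output for many traits gets enormous very quickly.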
Genetics was really the only thing that captured my interest in biology; sadly, the class didn't linger long on that topic. I'm still interested in it, all from an amateur perspective, of course. If I get time I think I'd like to make some new software that does multiple generations and traits requiring more than one gene.
--
I've done this... sort of (Score:2)
Thus was The Punnetizer [quadium.net] born. Once I had the basic functionality working, I went hog-wild with output formats. So you can have your Punnett squares in ASCII text, HTML, LaTeX, and CSV. What was really fun was running it on a StarFire with 2GB of RAM with the maximum number of traits. The output HTML was something like 347MB. :P
Anyway, that was one of the few times we impressed Cowell. He actually volunteered to give us extra credit. Of course, he graded our next assignment extra tough, but oh well. :P
--
Re:No, it doesn't. (Score:2)
If you were writing an article about a gene that regulates insulin production, you probably wouldn't be mentioning a gene that produces monoamine oxidase. In fact, the program relies on the fact that there will be some relationship between the genes. Otherwise, it's all random.
I'd say you lose, but as you posted anonymously, that's a given.
Re:GENE interactions are the future of MEDICINE? (Score:2)
I'm not saying that thousands of significant avoidable mistakes are not made each day -- they are and for many of these, the description you gave is entirely accurate. It is inexcusable -- if only because, in my humble opinion, computerized prescription crosschecking should be mandatory, and we should have far better mechanisms for automatic, secure sharing of patient records, with adequate safeguards for privacy.
HOWEVER: Studies indicate that the average patient over the age of 67 is taking eight medications (be aware that, for purposes of drug interactions, many substances other than prescription meds can be highly significant). Also, studies have shown that somewhere between seven and eight meds, the chance of an unintended drug interaction reaches 50%.
Further: Look at the literature. Though thousands of drug-drug (or class-class) interactions are known, many more are not verified (or quantified to a degree where they can be adequately weighed in clinical decision making). Worse, only a handful of 3-drug interactions are known, and almost no 4-drug and higher interactions have been documented. Finally, we have barely begun to scratch the surface of stereochemical racemic mixes and drug-gene interactions.
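To put a number on how fast the combinations pile up, here is a back-of-the-envelope count for the eight-medication patient mentioned above. It only counts how many distinct 2-, 3- and 4-drug subsets exist; it says nothing about how many of them actually interact, and the function name is just for illustration:

```python
from math import comb

def interaction_subsets(n_meds, max_size=4):
    """Count the distinct drug combinations of each size from 2 up to max_size."""
    return {k: comb(n_meds, k) for k in range(2, max_size + 1)}

print(interaction_subsets(8))  # {2: 28, 3: 56, 4: 70}
```

So a single patient on eight meds carries 28 possible pairwise interactions, 56 possible triples and 70 possible quadruples, against a literature that documents almost none of the higher-order ones.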
How could we fully understand unintended drug-gene interactions (i.e. interaction between a drug and some gene or gene product aside from its intended target) when we have barely mapped the outline of the human genome, and never sequenced a single individual, much less the range of variation in the species? (Mapping is like drawing the outlines of the states and the major cities; sequencing is like having a complete roadmap - interpretation of the sequence is a couple of orders of magnitude beyond merely possessing the sequence, a fact we molecular biologists have successfully obscured, in our rush to get others as excited about our work as we are.) Even assuming a rate of growth akin to Moore's Law, we are still many decades from the kind of knowledge you seem to assume we have.
FURTHER, treatment often involves knowingly balancing the risks of a treatment plan against the risks of other candidate plans, and the risks of the initial condition. It is not always easy to see these risks and effects without deep analysis and careful double blind studies.
For example, it wasn't long ago that many ICUs did not allow the administration of ACE inhibitors to certain types of heart failure patients, because the patients would visibly decline, and often die. There were reasons to suspect ACE drugs might help - but any young physician who tried them soon learned the same cruel lesson. However, long term analyses (which were difficult to get authorized, given the obvious mortality) showed that 1-year survival was actually significantly increased -- those patients who immediately got sicker with ACE inhibitors would likely die in the next 6 months anyway, but after the initial shakeout many would actually do better after several days of ACEI, and overall, more patients were alive (and healthier) at 1, 3, or 5 years with the ACE drug than without.
As a physician and molecular biologist, the problems you cite frustrate me immensely, but it took me years of medical training to fully appreciate how incomplete our knowledge is, and how potentially NP-unsolvable the problem of diagnostics and therapeutics is.
That's not to say that I don't think it can be done much better. It can (and even doctors I admire could often stand some improvement). I've spent a good chunk of my life training for and working on these kinds of solutions in computing and molecular biology, as well as medicine.
I'M SORRY but your blanket indictment, though superficially similar to remarks I have made (backed by appropriate data and studies) in peer-reviewed journals, leads to precisely the wrong conclusion when the time comes to make medical policy, and I felt that I should make some effort to correct it. The factors you cite should be kept in mind, true, but they are not even remotely the entire picture. Worse, it is impossible to assess exactly *how much* of the picture they are, but I think we can safely say they are less than 50%.
Every citizen's opinion counts, which makes it important that they hear the "other side". Not only do citizens help mold public policy, but more important, they are my patients, and I believe that the better background they have in their daily lives, the better equipped they will be to make the medical decisions that, in the end, are theirs, not mine, to make.
Sorry this was so long. As Pascal said: "If I had had more time, I'd have written a shorter reply."
Shades of Cyc (Score:2)
--
Re:Norbert Wiener (Score:2)
>>Which is?
>using your head
Or, more generally, using anyone's head.
Computers are cute toys, but we've already seen wetware being used to mediate the control of mechanisms. If we can use it to mediate information processing, computers will be relegated to the status of diagnostic tool, low-end user interface, and arithmetic calculator.
Cybernetics is machines that think.
Encephalonetics will be brains used as machines.
The spelling with the k's (kibernetics, enkephalonetics) is just how you say it if you're actually ancient Greek.
--Blair
Norbert Wiener (Score:2)
We can tell him, "you know back when you said that machines could do the thinking? Called it 'kibernetics'? Well, it turns out we couldn't do that, so we've adapted humans to do the thinking and we feed it into machines so they can digest it seven times better than fishing around the Science Citation Index. It's only half as good as experimentation but at a micro-fraction of the cost..."
I think at that point he'd understand the human mind and we could get onto Enkephalonetics, which is where this little electromechanical distraction is really leading us.
--Blair
Amazing... and useful, too! (Score:2)
Re:woo.. time warp! (Score:2)
er... no offense... 8-)
--
Rob White,
Cv - Cv = 0 Therefore there is an absolute frame of reference.
Usefullness (Score:3)
This technique, moreover, appears only to collate published interactions-- helpful, perhaps, in guiding the conduct of basic research, and the avoidance of duplicate studies-- but less useful when the goal of a researcher is determining the function of unknown genes, or putative protein products. In those cases, protein fold databases or motif databases are much more useful.
Can one give Pubgene a pdb or fasta file-- and find papers on homologous genes or structurally similar proteins-- or must one use BLAST, or a fold recognition algorithm prior to searching Pubgene?
Bio-Informatics and AI (Score:3)
There was a paper in the office of some professor who used a Brill learning algorithm with existing genes and then had it try to guess what random genes did. It did very well in the test despite the "primitive" AI.
3rdmill and spotfire
There is a lot of computing power in the life sciences field, and a lot of data created with gene chips and assay data. People can't sort it all out anymore, so some computer analysis makes everything faster. Look at the human genome.
Re:I've done this... sort of (Score:3)
"I may not have morals, but I have standards."
open source genetics (Score:3)
Re:open source genetics (Score:3)
We're a long way from self-modification (I 4m 3l337 with the biggest cock ever kinda thing!) if we ever get there.
But I do agree with you, information should be available. And that's what this article is about. It was an ingenious method of searching vast quantities of data to link relevant papers.
One of the best things about this is that the methodology could possibly be applied to disciplines outside genetics, speeding up research in other areas.
Way ahead of ya, dude (Score:3)
Genome@home [stanford.edu]
Interesting stuff right in front of them (Score:5)
But I find it interesting that their method was so simple. It didn't involve any really complicated methods... basically a glorified text scanner. Yet, it was able to predict some new interactions that hadn't been predicted before. Still, it was only 7 times better than random guessing... I wonder if that could be improved any?