Toward a 3D Search Engine 83
Plasma Droid writes "NewScientistTech has a story about a 3D molecular search engine that is over 1,500 times faster than anything previously developed. The researchers, from Oxford University, developed a lightning-fast way to quickly match 3D shapes mathematically. This could not only speed up searches for new drugs, but lead to 3D search engines, for finding objects uploaded to platforms such as Google Earth, they say." The problem will be in jump-starting the supply of 3D data about molecules and everything else.
Enter Search Term: (Score:5, Funny)
WOO HOO! (Score:3, Funny)
Re:WOO HOO! (Score:5, Funny)
I'll bring the Hot Grids (Score:3, Funny)
Re: (Score:2)
That's great! Now if you could just do that 750,000 times in the next fifteen seconds, and tell me which shape in the set is most similar to this thing in my pocket...
(cue dick size jokes in 3...2...1)
Shape versus negative space (Score:5, Informative)
Re:Shape versus negative space (Score:4, Funny)
I bet you have to beat the chicks away with a stick.
FFT (Score:2)
Re: (Score:2)
Anchoring (Score:2)
I'm disappointed that I cannot read the actual article. While at Abbott (informally) and while at Battelle (in formal intellectual property documentation), I proposed that a vector (the term "
Crappy research (Score:1, Interesting)
so? (Score:2)
Re: (Score:2, Funny)
Re: (Score:2, Funny)
Re: (Score:2)
Impact on Pharma (esp. patents) (Score:5, Interesting)
Re: (Score:3, Insightful)
The problem isn't that it takes a while to find new stuff. The problem is the barriers to entry are so high that sufficient competition can't take place, hence there is no pressure to work quickly. Basically the medical industry is *not* a free market.
Now, I don't think the barriers need to be removed, because most of the high barrier is to ensure that treatments are effective without nasty side effects. About the only part of the barrier I can see being removed is somehow changing the liability laws, but
Re: (Score:3, Insightful)
Except the barriers to entry are mostly not regulatory in nature. As with most advanced R&D-based industries, the barriers are brainpower and equipment. There's plenty of capital out there to handle the hit-and-miss nature of drug design, and the regulatory restrictions on drug production and marketing are not barriers to entry for research.
IMO, what is truly limi
Re: (Score:3, Interesting)
This is not entirely accurate. From a business standpoint, if you sell a cure and your competitor sells a "treatment", you'll erase them from the map. So they would definitely like to "cure" things. However, most of the rich, western people do not suffer from diseases per se, but from "risk
Re: (Score:2)
But it's not a barrier to entry, since established companies must also comply with FDA regulations. Barriers to entry imply that only new entrants face the barrier.
That is exactly what I was referring to with the COX2 inhibitors... Vioxx is the specific example.
Good, but just one tiny bit of the problem (Score:5, Interesting)
Re: (Score:1)
Also, the search space for polypeptides is more restricted than that. There are only so many allowed torsion angles.
Re: (Score:2, Interesting)
Re: (Score:1)
Typically (Score:2)
really? (Score:2, Insightful)
Crystals are pretty watery, much like the cell. Unless packing contacts are altering the active site, they are unlikely to be much different.
Also, the bulk of the structure is there to keep the active site residues in a particular orientation.
Perhaps management vitriol was partially justified?
Re: (Score:1)
The hope is that a given protein remains within a particular probability space and that the shape of the active site, refined gradually over millions of years, is highly stable. When 3/4 of drugs entering phase I clinical trials fail efficacy, though, the numbers speak for them
Re: (Score:2)
They may be *better* at predicting structure, but they are still a shit long way from being any good. Remember that whole big Blue Gene deal, building the biggest baddest computer out there, that was done pretty much to be able to predict protein structure, and (last i heard) they still aren't even close. Every so many years a new technique for prediction comes out
Comment removed (Score:4, Insightful)
Re: (Score:3, Insightful)
The rest of your comments are pretty valid; however, in this case that would seem to be beside the point. Searching objects in this fashion would be as simple as metadata that is appropriate for 3D model searches. Rather than provide a base model, you could search the metadata supplied with/for/generated for shape
Re: (Score:2)
Re: (Score:2)
Even if you do, you can use a sketching tool (like google sketchup... mmm, sketchup) to whip out a basic 3d model.
Also, it could be done through a tree-selection process - where you pick from perhaps 9 images the model that looks the most like the one you want, and you continue in this vein until you find (or don't f
Re: (Score:2, Interesting)
So at any point, you have to generate images of the 'neighbours' of the current structure. It could work. Maybe.
Re: (Score:2)
No, that will be a problem. Once you have the database, what exactly am I supposed to input for searching? Will I need to learn how to create a 3D model in order to search for similar objects?
Depends. Did you have to learn how to spell in order to use a text search engine?
The people who are going to be using this sort of database are going to already have tools available to create their models. People have been creating MOL and PDB files for quite a while now, and if there isn't a file converter/importer then I'm sure there will be soon. Plus, researchers often want to just search for things that are similar to something they're already looking at. So what they'll do is take whatever model t
Speed versus Thoroughness (Score:4, Insightful)
The implication both from the summary and from the article itself is that this new search is just as thorough as other search methods but much faster. To prove thoroughness they would have had to show that anything found by other search methods will also be found by their new, much faster, search method. I doubt very much that they were able to prove this rigorously.
That's not to say that the problem of matching 3D molecular shapes is not important or that their research is not valuable. I would say, though, that it is misleading to claim that they have solved the 3D search problem with a much faster algorithm. There are many different measures of 3D similarity and, for many of them, the only way to guarantee an optimum match is by exhaustive search.
Note that, in general, every search will be exhaustive in the sense that the query must be compared to every entry in the database. The problem is that many measures of similarity have additional parameters that must be optimized by exhaustive enumeration for each comparison. The classic example is a measure of 3D similarity that pairs each atom in the query with an atom from the structure in the database. In the general case, all possible pairings must be tried through an exhaustive enumeration.
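To make the cost of that classic example concrete, here is a minimal sketch of a brute-force atom pairing. It is an illustration only, not any published method: the names `rmsd` and `best_pairing` are hypothetical, and it assumes both structures have the same number of atoms and are already superimposed, so only the pairing is optimized.

```python
import itertools
import math

def rmsd(a, b):
    """Root-mean-square deviation between two equal-length lists of 3D points."""
    total = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(total / len(a))

def best_pairing(query, target):
    """Try every one-to-one pairing of query atoms with target atoms.

    Assumes equal atom counts and pre-aligned coordinates (both are
    simplifications). The loop visits n! pairings for n atoms.
    """
    best_score, best_perm = math.inf, None
    for perm in itertools.permutations(range(len(target))):
        score = rmsd(query, [target[i] for i in perm])
        if score < best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm

# Same three atoms, listed in a different order in the database entry.
query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
target = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
score, perm = best_pairing(query, target)
```

With 3 atoms there are only 6 pairings to try; with 20 atoms there are about 2.4 × 10^18, which is why this kind of measure does not scale without shortcuts.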
Re: (Score:2)
Why should that be true? We are able to categorize textual content and build indexes based on word structure. Why couldn't we do the same thing with 3d objects, and thus be able to discard a large number of comparisons up front?
Re: (Score:2)
For some measures of 3D similarity there are shortcuts and for other measures there aren't shortcuts. For example, what happens if part of our query molecule is very similar to part of a molecule in the database we are searching? Does that count as a match or not? If the answer is that it does not count as a match, then we could sort our search database by number of atoms - only those molecul
Re: (Score:1)
The implication both from the summary and from the article itself is that this new search is just as thorough as other search methods but much faster. To prove thoroughness they would have had to show that anything found by other search methods will also be found by their new, much faster, search method. I doubt very much that they were able to prove this rigorously.
...the only way to guarantee an optimum match is by exhaustive search...
I haven't read the paper, but I don't think this (a thorough comparison) is as hard as you think it is. The bioinformatics community is pretty good about sharing datasets and software. There are benchmark datasets that researchers use for comparing shape-matching techniques. Pick, say, 100 query molecules and a database of 10,000 molecules. Search the database for each query: 1,000,000 comparisons, multiplied by the number of techniques you're comparing. Not that much work. Throw in Kabsch-style cRMS match
Re: (Score:2)
What I was referring to was guaranteeing that a particular search method can find the best match. If I understand what you're saying, it may not be that important to guarantee a best match - which is a good point.
With respect to guaranteeing that a search has found a best match, there are two problems. The first problem is that the search method may not reflect what is actually desired. If you want to fin
they got it backwards (Score:4, Interesting)
Yes, that's currently "the most common way" because at least you can tell what you're getting: when you get a match, you can actually say how close the different shapes are to one another.
The new technique uses a different approach. It analyses the position of the different atoms within a molecule to understand its shape. These relative positions can be mapped and stored in a molecular database.
That's actually not a "new technique", it's an old technique. It's what people used to do before they tried to overlay 3D shapes accurately. They used to do that because computers used to be too slow to do the accurate comparison.
As the article points out, there is only limited 3D shape information available at all. Few people need to do 3D queries right now, and there is little data to do them on, so optimizing speed is the wrong thing to do; we need to optimize accuracy and scientific relevance.
Re: (Score:2)
I didn't say it wasn't important, I said few people are doing these searches. The reason that's important is because it means that users can generally run this stuff on their desktops for hours, which is a lot more compute power available than, say, for your average web query.
There are many databases of 3D representations of molecules.
There are indeed. But the actual number of comparisons you need to do numbers in the thousands, not in the billions, as it i
Hack the gibson! (Score:1, Funny)
Not enough structures? (Score:4, Insightful)
I tend to think the authors of the article are referring to the problems of a "usable form" for the structures and easy access to many of these databases. The first problem is merely one of converting between the various structural file formats out there, something a good programmer (or grad student) can solve in a few weeks or less. The second is a bureaucratic issue, not a scientific one.
Re: (Score:1)
Lots of 3D bio data out there (Score:2)
Well the RCSB Protein Data Bank [rcsb.org] would be a start, and there are tons of molecule data bases with 3D data that are only waiting to be thoroughly mined. The pharmaceutical companies have them, and there are free ones too.
In fact, the motivation for this research undoubtedly was the abundance of data that is out there but can't/could not be searched efficiently.
Re: (Score:1)
Firstly, only some families of proteins have any x-ray structural data about them: there are whole families that are effectively uncrystallisable.
Secondly, the protein's 3D shape is only half the battle. Small molecules are generally highly flexible, so to search them in 3D you need to enumerate their potential shapes first. That's not trivial for large sets of compounds.
Quite interesting (Score:3, Interesting)
This is quite an interesting achievement. The tools that I am familiar with can only search for 2D structures like functional groups (alcohol groups, aromatic rings, etc.). At their best, they might give the ability to search for R- and S- stereoisomers, but that is it. This is quite enough for tasks like solvent design, which are frequent in the chemical process industry, but pharmaceutical R&D needs more powerful tools.
I will give a simple example of an enzyme: These nice molecules catalyze reactions of vital importance in the modern pharmaceutical industry by providing a chemical "lock" where the "keys" (i.e. the reacting molecules) will dock. This enables them to react and form a new molecule that will then undock from the enzyme, leaving the "lock" free for the next pair.
These "locks" are actually 3D structures of appropriately aligned molecules. This is where this search ability comes in: The chemist suspects what the appropriate lock would look like for catalyzing his reaction (3D alignment of functional groups), much like someone suspects what the right keywords for a Google search are. Then he feeds the data to the machine and gets the molecules that are likely to be of assistance in his work. After that, he can run experiments testing these enzymes to see if they actually work.
This should speed things up very much in biochemical research. It means less literature research and fewer failed experiments.
Ehm... it's how much faster? (Score:2, Interesting)
Great (Score:1)
related problem (Score:3, Interesting)
This poses a problem, similar to the (unstated) problem posed by the molecular printers in Neal Stephenson's Diamond Age: what happens when this sort of stuff starts to become widely available and people start engineering enzymes or instructing their printers to produce, say, heroin, or TNT? With molecular printers, presumably the first versions would only be able to produce structural stuff: printing bicycles, not martinis. But if we get to the point where we can design enzymes for a desired substrate -> product reaction, we have a real problem because it's all wet chemistry and there isn't an obvious hardware/firmware way to block people making anything their inventive, twisted little minds can come up with.
Mind you, I think that's great. I miss the days where I could order almost any chemical I wanted without having to wade through masses of paperwork, tracking, and laws intended to ban any drug analog that might have pharma activity. But it is going to have some very exciting side-effects.
Possible application? (Score:2, Interesting)
Re: (Score:1)
I suppose yes. After all, the article says that they look at the positions of specific points in the general 3D structure and check their geometrical characteristics (skewness, relative distances, etc.). This is what face-recognition software does in 2D, right?
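For what it's worth, the idea described above (summarize a shape by statistics of distances to a reference point, then compare the summaries) can be sketched in a few lines. This is only a guess at the general approach based on the description in this thread, not the researchers' actual method: the choice of the centroid as the single reference point, the three moments, the scoring formula, and every function name here are assumptions made for illustration.

```python
import math

def centroid(points):
    """Mean position of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def moments(values):
    """Mean, standard deviation, and skewness of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    third = sum((v - mean) ** 3 for v in values) / n
    skew = third / std ** 3 if std > 1e-12 else 0.0
    return (mean, std, skew)

def shape_descriptor(points):
    """Summarize a shape by the distribution of atom-to-centroid distances."""
    c = centroid(points)
    return moments([math.dist(p, c) for p in points])

def similarity(d1, d2):
    """Score in (0, 1]; 1 means identical descriptors (assumed formula)."""
    diff = sum(abs(a - b) for a, b in zip(d1, d2)) / len(d1)
    return 1.0 / (1.0 + diff)

query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
# Same shape, rotated (axes cycled) and translated.
moved = [(y + 5.0, z - 2.0, x + 1.0) for (x, y, z) in query]
# Different shape: stretched along x.
stretched = [(2.0 * x, y, z) for (x, y, z) in query]

d_query = shape_descriptor(query)
d_moved = shape_descriptor(moved)
d_stretched = shape_descriptor(stretched)
```

The point of a descriptor like this is that it does not change when the molecule is rotated or moved, so no superposition step is needed, and comparing two molecules costs a handful of subtractions. The trade-off is that very different shapes can, in principle, share similar statistics.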
I don't get it (Score:2)
Great... (Score:2, Funny)
Distorted Expectations (Score:2)
It's announcements like these that cause me to ponder just how far behind we are in terms of software development progress.
Back in 1993 I had a whole suite of MS Flight Simulator programs. (different cities were packaged separately. To the best of my recollection, I had Chicago, New York, LA, and Paris). Obviously the game detail was limited, this was before 3D accelerators, but the buildings were still 3D and key locations had fairly accurate roads. I remember reading in more than one computer magazin
existing 3D molecule search engine (Score:2, Interesting)
Re: (Score:1)
Re: (Score:1)
The Oxford group's technique is looking at a different problem: small molecule 3D shape matching. Surprisingly, this is actually harder than protein shape matching: proteins have a defined 3D shape, but small molecules are flexible and can adopt a variety of shapes. So, you either need to have a flexible fitting method, or you need to enumerate 'example' shapes for each molecule you want to search against.
Compare your search against ~17K protein structures to a search across the roughly 4 million commercially-a
Musical pattern searching (Score:1)
For robots (Score:1, Insightful)