Bioinformatics 105

tadghin pointed out this Newsweek article on bioinformatics, and also notes: "At O'Reilly, we just published our first bioinformatics book last week, Learning Bioinformatics Computer Skills, by Cynthia Gibas and Per Jambeck, and it immediately rocketed to the top of the Amazon Computer bestseller list. This definitely appears to be a new area for the computer industry that's just starting to hit people's radar big time. I've also made the point to VCs looking at distributed computation startups that what I see on sites like slashdot is a lot of movement by hackers towards new and interesting problems. And science looks a lot more interesting than some of the business computing that's been front and center the past couple of years. And the Biological Open Source Computing Conference I spoke at last year was definitely popping with ideas and excitement. Unfortunately, this year's conference is in Copenhagen, right before the O'Reilly open source convention, but I definitely urge slashdotters to check out this area. Demand for perl expertise is especially high."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward
    Is it me, or was that article part news and part O'Reilly commercial? With the "news" being the smaller part.

    Where's Slashback? It's a slow Friday afternoon and I'm bored.

  • by Anonymous Coward
    Your body is a biomechanical machine, using blood to transfer energy from place to place. Drugs get into the bloodstream and they get everywhere. Not just where they're targeted, but all over the shop.

    There isn't a single drug in existence that has no side effects. This is why. And 50% of illnesses in the western world are caused by treatment. It's an international scandal.

    Genetic targeting is not going to solve this problem. It may make it less serious, but it isn't a solution in any sense.

    What we need to spend billions of dollars on is nanotechnology. The healthcare benefits of this are far, far greater - and the software problems are far more tractable.

  • by Anonymous Coward
    They have spectacularly failed to find cures for some very common illnesses, though. Anyone with a stomach ulcer can testify to this - Zantac is the world's #1 medicine, and also the world's #1 black-market drug. That's because people with stomach ulcers [and there are a *lot* of them] have to take it daily.

    How much research do you think Glaxo-Wellcome do every year into a cure for stomach ulcers?

  • by Anonymous Coward
    And your response may suggest that you have your head up your ass.

    Biomedical research isn't easy. The people who do the research are, well, *people*. There doesn't exist some special place that pharmaceutical companies can go to recruit evil sub-human scientists to manage the company and do the research for them. You watch too much T.V.

    I work for a pharmaceutical company. The people around me, including those in management, genuinely care about finding cures to diseases. We have family members we care about who have had or have the disease we do research on. At the very least, we all know it could very well be us one of these days who comes down with cancer or whatnot.

    Take a break from the X-files and learn a little bit about how the world works, you twit.
  • by Anonymous Coward
    Here is the text.

    Craig Benham has a problem. As a professor at Mount Sinai School of Medicine in New York, he trains students in the exploding new field of bioinformatics, the fusion of high-powered computing and biology that is aimed at revolutionizing the health-care industry. But Benham can't keep a postdoctorate researcher for more than a year. They keep leaving for jobs that pay up to $100,000 at bioinformatics start-ups, giant pharmaceutical companies or technology giants like Motorola and IBM that are targeting the rapidly growing life-sciences field. "These companies need a whole new class of biologists who have training in the computational and mathematical methods," Benham says. "I've got one former student who has been hired four times in three years, increasing his salary 30 percent each time. There's huge demand for these skills." Benham knows of what he speaks: this summer he will join the University of California, Davis, heading up its new $95 million bioinformatics program.

  • by Anonymous Coward on Friday May 11, 2001 @05:59PM (#228836)
    what I see on sites like slashdot is a lot of movement by hackers towards new and interesting problems

    No, what you see on sites like Slashdot is a lot of talking by bored sys admins about new and interesting problems they wish they could work on.

  • "The software packages that come to my mind when reading this are in fact written by statisticians and/or computer scientists... And if there is a rivalling package by a biologist, you'll see that they have often picked up statistics and methods from their competitors."

    This line of debate gets silly fast. Is a "computer scientist" a computer scientist if they focus on biology, or take a biology-centric view of the world? Vice-versa? Yes, you can define people in computational biology as either computer scientists or biologists, depending on how you like to think of them. I'll agree with you that the best researchers are highly competent in both realms. But the absolute best are biologists at heart.

    The point I wanted to emphasize is that the people who focus on "computer science" centrally, and do "computational biology" peripherally, almost universally come up short. Truthfully, you have to be a biologist before you can be a computational biologist.

    "I have numerous examples of biologists contending that their heuristic is much better than all the other heuristics (well at least on their own dataset)."

    In my experience, this happens more often among CS researchers in computational biology than it does among biologists. Check out the microarray analysis/feature detection literature before you disagree with me. It's a parade of algorithms with absolutely zero relevance to practical biology. And everybody insists that their own approach is better. (The worst case in recent memory involved a paper from ISMB '00, I think, where the researchers' clustering algorithm produced results inconsistent with known biology, so they concluded that the biologists were wrong! I'm not kidding.)

    Biologists are certainly not innocent--after all, everyone has an ego--but the debates of this variety tend to be among two or three competing alternatives (i.e. the parsimony vs. likelihood debates in phylogeny) that have been accepted by a majority of the researchers in a field.

    "There are also excellent examples of biologists promoting their version of a traditional greedy heuristic as a "new algorithm" for solving an NP-complete problem."

    I guess my response to this is "so what?" If Biologists were studying NP-complete problems, and not biology, then they couldn't be excused for not knowing the existing research in that field of computer science. But they're not. They're studying biology, and they're using whatever computational tools they need to do their job. So you're going to see these types of things for a while, and frankly, it's not a big deal. Who cares if a biologist thinks he/she has developed a new approximation for solving NP-complete problems? Does it change the fact that they've probably found another direct application of computational techniques to biology? No.

    "Have you looked at protein folding?"

    Yes. I work in the field.

    "That field is a source for the most repugnant oversimplifications ever made in science."

    Well, if you're going to cast that kind of aspersion, you're going to have to be a lot more specific. Yes, there's a lot of bad literature on protein folding. A lot of it comes from polymer physicists. But the best stuff out there is still pretty simple. Does that make it bad? I would say the whole field is an "oversimplification." I would not say that this is "repugnant." I might just say it is realistic to simplify, depending on what kind of simplification I was talking about.

    "My point is that pointing fingers to either discipline is ridiculous."

    No, not really. Biologists have a handle on relevance and topicality, which are the most important things to have a handle on, frankly. What computer scientists do far more than biologists is lose the forest for the trees, and get lost in algorithmic trivia. And a lot of this owes to the sheer cultural differences between the two fields--CS people often go into CS because they want to think about algorithms and efficiency issues. Biologists just want to learn how real, squishy things work. They beg, borrow and steal from other disciplines to do this. And if they don't need to learn the whole field to do so, well, so be it. That's why the field is computational biology, and not biological computation...

    "The real path to successful bioinformatics is cooperation and humility."

    Certainly. But not equal humility. Computer scientists are entering an entirely new discipline where their own skills are of lesser importance, and they need to understand that. The CS/biology trade-off isn't equal at all, IMO.

    "Disclaimer: I am a computer scientist."

    Disclaimer: I am both, but I think like a molecular biologist. CS theory makes me yawn. :-)
  • "A lot of the work that's been done so far has been done by biologists who happen to be able to program, rather than by programmers who have learned the biology. As a result, a lot of the work uses inefficient algorithms, primitive approaches, bad statistics, and the like....Somebody who actually knows interesting new algorithms that can be applied to the problems can do even more."

    This is kind of a bad generalization to make. The software that has achieved notoriety and widespread use, while primitive in method (i.e. dynamic programming--boring, but widespread), is often based on very, very solid statistical theory. To the point where I almost find the "programmer" appeal to computational biology laughable--you'd be better served with some advanced statistics knowledge under your belt, rather than some programming knowledge, frankly.

    Also, as a student in comp bio myself, I can't tell you the number of times I've heard computational "biologists" stand up and give silly lectures on new algorithms to resolve solved problems (but in slightly faster time), or worse, completely abstract away the relevant details of a biological system in order to make new applications for their fancy new methods. While, yes, there's a danger to having a poor grasp on CS skills and doing computational biology work, this danger is significantly smaller than for those who are doing the same work without the biology skills. In my experience, it usually works like this: a biologist who can sort-of program will tend to write ugly code that gets the job done. A computer scientist who sort-of knows biology will get nowhere fast.
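For readers who have not met the "boring, but widespread" dynamic programming mentioned above, here is a minimal sketch of a global alignment score in the Needleman-Wunsch style. The function name and the scoring values (match, mismatch, gap) are illustrative assumptions, not taken from any package discussed in this thread, and real tools use substitution matrices and more careful gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment of sequences a and b.

    The scoring values are arbitrary placeholders chosen for illustration.
    """
    # prev[j] is the best score for aligning the prefix of a handled so far
    # against b[:j]; before any character of a is used, that is j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against the empty prefix of b
        for j in range(1, len(b) + 1):
            # Either pair a[i-1] with b[j-1], or put a gap in one sequence.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("ACGT", "AGT"))
```

Only one row of the table is kept at a time, which is enough for the score but not for recovering the alignment itself.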

  • by gaj ( 1933 )
    While this is a concern, I don't believe that it will be a problem in the long run. If insurance companies do pull a stunt like this, they will simply be replaced by companies that will take care of their clients, or by private groups that will band together to negotiate with drug companies and providers for reasonable rates.

    Of course, this is assuming that we (here in the US) can keep at least a nominally market driven health care system.


    --
    If your map and the terrain differ,
    trust the terrain.

  • Some words in advance:

    I worked for a company in cheminformatics, so to say: we did software to gather and evaluate spectral and structural data, and to store and retrieve it from a large database. Then I went to a company that developed software for banks. Today I work in a bioinformatics company.

    The scenario was roughly the same, a lot of data in one or various databases, plus software to browse and manipulate that data. The difference is probably in the scale, the sheer amount of data, which is huge in bioinformatics.

    Compared to the guys from the financial software, the physical chemists had to work really hard!

    The problems were advanced, and the number of customers (large chemical companies) was smaller than the number of financial institutes served by the second company.

    I believe the same will hold for bioinformatics. What I can't tell, however, are the margins. The bankers seemed to have a much better profit margin than the physical chemists. No idea what the bioinformatics customers are willing to pay. I expect pharmaceutical companies to be able to spend more on their tools and services than general chemistry companies.

    On the other hand, the present bioinformatics hype will probably lead to a lot of competition.

    So I am not sure what will happen. Could be a good market, could be a very tough market. What I am sure of however is that the job is very interesting. State of the art software development, state of the art scientific work.

    In addition the skills requirements usually include advanced degrees in biology or statistics, things few average programmers can offer.

    Yes and no. You need a diverse team of specialists. Of course you will have scientists there, some molecular biologists, and experts in genetics, perhaps some mathematicians or computer scientists. But because you need to create good software as well, you need very good software people. Good database people, good GUI programmers, good software architects etc. Even good system admins for the large machines.

    So people need to be specialists in their IT subject plus be able to work in the bioinformatics domain as well. Interesting for me to see that many physicists seem to have this profile.

  • In addition the skills requirements usually include advanced degrees in biology or statistics, things few average programmers can offer.

    There are a lot of open source bioinformatics projects. These are typically spawned by university or other public research projects. You mention Python and Perl, so try bioperl [bioperl.org] or biopython [biopython.org] for a start.

    The one thing I didn't like about the biotech industry was how their research and information distribution was tied closely to their purse strings.

    You will have a lot of open source (where the majority of development money will come from public research funds) and a lot of commercial applications.

    It is unlikely that a large bunch of hackers will revolutionize this field. This is because you need a lot of domain specific knowledge and because a lot of work that needs to be done is too tedious or uncool to attract open source people from outside the bioinformatics field.

    Something like the Gimp could be done, because nearly everyone needs such a tool - but who needs for example a multiple sequence alignment editor besides biologists?

    Have you seen any open source satellite control software or hydrodynamic simulation come from outside the respective engineering communities?

  • I wonder what the odds are of finding one of these sequences in the billions of combinations currently being sequenced?

    But what is your reference DNA?

    There are regions on the chromosomes that are common to all individuals (like sequences that encode important cell machinery), while there are regions that vary more or less among individuals (e.g. those couple of nucleotides that differ between George Bush jr. and Al Gore :)

    And of course with ongoing research some of the DNA map data gets rewritten with more accurate versions (as has happened with the geographical world map in the past).

  • While Perl is great for cranking out some web sites with high mutation rates anyway, IMHO Perl is a maintenance nightmare.

    Has anyone tried to make non-trivial changes to their old Perl programs?

    My Perl programs were the hardest to understand after I had stopped working with them for some weeks. It is usually easier to rewrite them.

    Nonetheless I valued the good performance of Perl programs and was thus sceptical to other kids on the scripting language block, like Python.

    Months later, I must say that the much saner syntax of Python, the formidable documentation and the large library have changed my scripting preference from Perl to Python. Like Perl, Python has been ported to a lot of platforms.

    Ruby is a language I have not looked into yet. Its strong Japanese supporter base has led to a lot of FreeBSD ports. So I might have a look soon.

    BTW, there are bioperl [bioperl.org], biopython [biopython.org], bioruby [bioruby.org] and biojava [biojava.org] efforts - anyone spotted a bioc or bioc++ one? And some dork registered www.biofortran.org [biofortran.org].

  • I am responsible for bioinformatics at the Institute of Cancer Research in London. Back in '97 I submitted a proposal to O'Reilly for an introductory bioinformatics book. They said that it sounded interesting, but that no one would buy it...
  • Perl (and other scripting languages) can call C/C++ routines fairly easily. I myself prefer to write the bulk of my bioinformatics code in a scripting language for easy modification, and only write the routines that really need speed in C++.
  • The computer scientists who don't know their biology are just as lost in the field as the biologists who don't know their computer science.

    True. If I have to sit through one more seminar where somebody thinks that they are doing bioinformatics by proving some unrealistic abstraction of a biological problem to be NP-hard, it will be one too many.
  • Actually there is a BioRuby [bioruby.org] but it is 1) fairly new and undeveloped and 2) mostly documented in Japanese, as most Ruby modules are. I myself like Ruby and have done several projects in it -- the problem though is that where I am now I have to work in a team, and Perl is the only scripting language that everybody knows.
  • by Jonathan ( 5011 ) on Friday May 11, 2001 @06:43PM (#228848) Homepage
    I haven't read the book myself, although I did know one of the authors (Per Jambeck) in grad school (in fact I still have his copy of Knuth's "The Metafont Book" if he's looking for it). I doubt the book is fluff, just not for CS folk. Like all new sciences, bioinformatics is done by people coming from other areas. If you are looking for a book about bioinformatics for CS folks who are non-biologists, look at Dan Gusfield's "Algorithms on Strings, Trees, and Sequences" (1997), although it is beginning to be a bit dated.
  • by Jonathan ( 5011 ) on Friday May 11, 2001 @07:43PM (#228849) Homepage
    As I mentioned in another posting, Dan Gusfield's "Algorithms on Strings, Trees and Sequences" is good, although getting a bit dated now. Another excellent book is Durbin, et al's "Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids".
  • The software that has achieved notoriety and widespread use, while primitive in method (i.e. dynamic programming--boring, but widespread), is often based on very, very solid statistical theory.

    The software packages that come to my mind when reading this are in fact written by statisticians and/or computer scientists... And if there is a rivalling package by a biologist, you'll see that they have often picked up statistics and methods from their competitors.

    Also, as a student in comp bio myself, I can't tell you the number of times I've heard computational "biologists" stand up and give silly lectures on new algorithms to resolve solved problems (but in slightly faster time), or worse, completely abstract away the relevant details of a biological system in order to make new applications for their fancy new methods.

    I have numerous examples of biologists contending that their heuristic is much better than all the other heuristics (well at least on their own dataset). There are also excellent examples of biologists promoting their version of a traditional greedy heuristic as a "new algorithm" for solving an NP-complete problem. Have you looked at protein folding? That field is a source for the most repugnant oversimplifications ever made in science.

    My point is that pointing fingers to either discipline is ridiculous. There are offenders on both sides. The real path to successful bioinformatics is cooperation and humility. Biologists need to talk to CS, and CS must talk to biology. I think this is generally well understood these days.

    Disclaimer: I am a computer scientist.

    Lars
    __

  • This line of debate gets silly fast. Is a "computer scientist" a computer scientist if they focus on biology, or take a biology-centric view of the world? Vice-versa? Yes, you can define people in computational biology as either computer scientists or biologists, depending on how you like to think of them. I'll agree with you that the best researchers are highly competent in both realms. But the absolute best are biologists at heart.

    I'd say that you should ask the scientist, and I don't think Stephen Altschul, Michael Waterman, Gene Myers, Webb Miller, Anders Krogh, David Haussler, David Sankoff, to name a few, would call themselves biologists. And you do agree that they have made significant contributions to computational biology, don't you?

    Please don't go into rating of scientists, because that is silly.

    In my experience, this happens more often among CS researchers in computational biology than it does among biologists.
    [snip]
    Biologists are certainly not innocent--after all, everyone has an ego--but the debates of this variety tend to be among two or three competing alternatives (i.e. the parsimony vs. likelihood debates in phylogeny) that have been accepted by a majority of the researchers in a field.

    Computer scientists are generally interested in methods, so yes, they are more likely to propose their own method than a biologist. New methods are a good thing, although not if they have no biological basis, sure. But most CS people I know actually try to communicate with biologists to establish what is relevant or not. It is not easy, especially when biologists are unsure themselves. You mention microarray data: the k-means or hierarchical clustering methods in use today are, in my opinion, quite without biological relevance too. Go look at some of the examples in the literature and start scrutinizing (sp?) the computed clusters. They can look quite weird.

    The parsimony vs. ML debate is about whether the methods have biological relevance and are scientifically sound. If a method has to be accepted before you can start deciding whether it should be thrown out or not, nothing will ever happen.

    If Biologists were studying NP-complete problems, and not biology, then they couldn't be excused for not knowing the existing research in that field of computer science. But they're not. They're studying biology, and they're using whatever computational tools they need to do their job.

    If they write papers that actually are more about describing algorithms than applying them, then they should not be excused. What is wrong with walking over to the CS department and discussing methods a little? Why not try to do just a little bit more than the greedy approach to see if you can get an improvement? Using a hammer on a screw is not very impressive when your neighbour might have a screwdriver.

    Well, if you're going to cast that kind of aspersion, you're going to have to be a lot more specific. Yes, there's a lot of bad literature on protein folding. A lot of it comes from polymer physicists.

    I am thinking of the "beads on a lattice" model, and I actually know molecular biologists who have worked on stuff like that, so I don't think you should blame the polymer physicists...

    "Repugnant" might be a bad word, English is not my native tongue, but my opinion is that the logical step needed to make conclusions on real proteins based on the simplistic lattice models is a giant leap that is very hard to defend. CS people have certainly worked on it, but they did not invent the field!

    Certainly. But not equal humility. Computer scientists are entering an entirely new discipline where their own skills are of lesser importance, and they need to understand that. The CS/biology trade-off isn't equal at all, IMO.

    Computer scientists have an interesting field in their own right and they don't need to excuse themselves in any way. Some make forays into computational biology that are not that brilliant, but that is no reason to put down all those computer scientists who actually make the effort of learning the biology and, even better, talk to the biologists and make contributions.

    Lars
    __

  • I apologize in advance for the second grade level of material, however for a split second when I saw the headline I truly interpreted it as "biofartmetrics". I instantly had images of large corporations requiring flatulence logins (wouldn't be a problem for some of the people I've worked with...802.11 style wireless logins from remote locations...).

    <COMPUTER VOICE>"Authorized....please eat more fibre Mr. Jones..."</COMPUTER VOICE>

  • It's interesting to read this...for some background, I'm an undergraduate graduating (in 2 weeks! yeah!) with a degree in biology, specifically marine ecology. The lines between hard and soft sciences seem to be blurring more and more. Chatting with a few friends the other day, the topic came up that biology itself as an integrated discipline seemed to be falling apart. The only real glue holding it together seems to be the geneticists. Otherwise, you have your molecular bio folk, you have your organismal bio folk, and your ecologists. There's a lot of back and forth, but the real interdisciplinary biologists (by which i mean bridging the gap between the three aforementioned biology subdisciplines) are the evolutionists. The answers evolutionary biology has been yielding up, in no small part thanks to the wealth of sequence data in genbank, are amazing. They lead to answers in each and every subdiscipline of biology.

    Still, though, between everybody else there is a great deal of culture clash. Lab v. field. Human v. Non-human. Who can get the funding. Whose work is 'important'. It's fascinating to watch.

    What is changing, as it is changing everywhere, is the degree of technical literacy of everyone in these fields. Talking to an old teacher of mine, he remarked that ecology has really become a hard science in comparison to what it used to be. Pick up any modelling book. It's a different world than the early descriptive studies. This is due in no small part to the large number of applied math folk that have sauntered into the field, as well as the continuing need for massive data crunching among the evolutionists who so often share departments with ecologists.

    But as this next generation of biologists grows up, accessing genbank, running sequences, data mining, and using powerful computational tools is becoming more of the standard, as is our own familiarity with technology. Hell, take me for example - a marine ecologist who hacks perl in his spare time as well as running OpenBSD servers. The necessity of existing in the digital age is breeding scientists with more basic knowhow of computational techniques.

    It's amazing the worlds that open up when one takes just a programming class or two. One reason so many biologists get into this field is the fuzziness of the answers. We like examining why answers are so imprecise. We like figuring out how to assign and partition variance to different causes. The natural complexity of the world is wondrous, as are the need and the ability to use the tools out there to grasp and hold on to it.

    So where am I going with this? Well, in part, a bit of a tear on folk who think computational biology should just be applied to molecular and genetic problems. The potential for the same types of skills applied to the reams and reams of data piling up from LTER (long term ecological research) sites and other long term ecological data sets which are just starting to come to fruition is vast. Conservation issues require better models of ecosystem function in order to be attacked correctly. The exactness and certainty of computational power is ready and waiting to leap into all aspects of biology. I think bioinformatics is a first step to help codify more and more of biology into a precise science. Empirical work is great, but it needs to be turned into a predictive model at some point. The real challenge, I think, is going to be the reintegration of the fractioned disciplines of biology. How can molecular models of various biological processes be worked into a larger framework of ecosystem functions in both the short term and at evolutionary timescales? When we can really take an object oriented approach to the large scale problems of biology, and allow the specialists to work out the problems of each and every one, it's going to require biological and computer savvy to unify them into a whole picture.

    Ok, enough for me now, i think i'm going to get that book and hone my perl population biology skills.
  • by VValdo ( 10446 ) on Friday May 11, 2001 @06:59PM (#228854)
    This seems to be a fun application of bioinformatics.

    Take some code, say the tiniest known CSS descrambler in C [cmu.edu]. Maybe compress it into a nice tight zip/.gz binary. Now convert it to a DNA sequence [cmu.edu] (It seems you could actually make a couple of possible sequences by switching around the letters). I wonder what the odds are of finding one of these sequences [nih.gov] in the billions of combinations currently being sequenced? W
    -------------------
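A rough sketch of the idea above, under stated assumptions: each byte is split into four 2-bit values and each value is mapped to one of A, C, G, T. The linked page may well use a different scheme, so the function names and the default letter order here are hypothetical. The "odds" estimate further assumes uniformly random, independent bases, which real genomes are not.

```python
from itertools import permutations


def bytes_to_dna(data, alphabet="ACGT"):
    """Encode bytes as bases, two bits per base, most significant bits first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(alphabet[(byte >> shift) & 0b11])
    return "".join(bases)


def dna_to_bytes(seq, alphabet="ACGT"):
    """Invert bytes_to_dna; expects a length that is a multiple of four."""
    value = {base: i for i, base in enumerate(alphabet)}
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | value[base]
        out.append(byte)
    return bytes(out)


def expected_hits(seq_len, genome_len):
    """Expected exact matches of one fixed sequence in uniformly random DNA."""
    positions = max(genome_len - seq_len + 1, 0)
    return positions * 0.25 ** seq_len


# "Switching around the letters": every ordering of ACGT is a valid mapping.
encodings = {bytes_to_dna(b"Hi", "".join(p)) for p in permutations("ACGT")}

if __name__ == "__main__":
    print(bytes_to_dna(b"Hi"))  # CAGACGGC
    print(len(encodings))       # 24 distinct letter assignments
    print(expected_hits(20, 3_000_000_000))
```

Under this model the expected number of chance matches shrinks by a factor of four with every extra base, so a program hundreds of bytes long (thousands of bases) would essentially never turn up by accident.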

  • but doesn't it fall upon the prescribing doctor to couple Zantac with an appropriate antibiotic?

    Pharmas make drugs, they don't treat patients. That's what doctors are for.
  • According to the article they're looking for people with biology and supercomputer experience. Somehow, I'd hope they'd use something a bit more sophisticated than slashdot kiddies' perl scripts to run on their Crays.

    -lx
  • I'm not saying a scripting language would be bad, but it seems strange to want to do something so large scale in perl; for the usual reasons (poor readability, maintainability, etc.) it isn't the best of choices. I'd think Ruby or Python would be better for serious software development - perl can do a lot of things, but for the most part is good for small hack jobs.

    -lx
  • Pharmaceutical companies are around to make money. That's why they create drugs that treat symptoms and not drugs that are cures. Now they're investing in ways to make more money from us. Great.
    "Free Software coders are around to get fame and attention. That's why they create software incrementally and not perfect software from the start. Now they're trying to get more attention from us. Great."

    The difference is? Nothing. Both are totally unsubstantiated and ignore well established theory, common sense, and any first hand understanding of the subject matter.
  • The US and Canada combined account for about 50% of world wide pharmaceutical sales. Africa and Asia (excluding Japan) less than 5%. Others somewhere in between. What's more, this doesn't fully convey the important fact that US (and other high paying) consumers do more than that to carry the market. If the United States (and to a lesser extent other parts of the world) had drug prices as regulated and controlled as they are throughout Europe and Canada even, many drugs would NEVER come to market because the profits aren't enough. The drug companies sell to these countries because it's just above their variable costs, meaning they make money, but not much.
  • While your points may be sound for the general competitive marketplace, you overlook the clear anecdotal evidence to the contrary. There is case upon case where there are incredibly effective treatments (meaning a cure or one that gives a dramatic improvement to the patient's quality of life) that have *already* been developed and tested...and the product was PULLED.
    "Clear anecdotal evidence" to the contrary? What is that supposed to mean precisely? There are at least two possibilities here that I can think of. A) You're pulling it out of thin air. B) You fail to understand the real issues behind them, since you don't actually work directly with the product. The odds are, in fact, that your statements strongly imply that the extent of your experience with them is academic and well removed. If there are so many examples, name a couple please! That should not be much of a problem, right? Or is this Nth hand knowledge?

    Furthermore, to shorten a potentially infinite thread, you're mostly barking up the wrong tree. I never said these companies do not exist to make profits. I won't even apologize for that. What I will say, however, is that the mere fact that these companies exist, by and large, to increase shareholder wealth does not preclude the efforts to find cures. Quite the contrary, as I've laid out in my previous post, "cures" are a dream for most of the companies' shareholders most of the time. They generally WILL pursue them.

    To briefly address some of your other comments, "breaking even" is not really breaking even in financial terms. If by "breaking even", you mean investing 500m dollars, and getting a return on that investment of 500m (hopefully) some 15 years later, then that's actually LOSING money in financial terms. Besides inflation, you must also take into account opportunity cost. That money could have been invested in other places in the stock market and returned millions more. If you figure a very reasonable number like 10% a year, that's about 1.2 billion dollars. Then you must also factor in risk. If the market can only be 500m dollars, but may be less, then you're asking even more of the company. No matter what you think of these companies, it is simply not up to them, shareholders simply will take their money elsewhere. It's equivalent to asking the shareholders to give their money away.
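
    The opportunity-cost arithmetic can be sketched as follows. This is a rough illustration that assumes simple annual compounding; the function name and the rate/horizon pairs are hypothetical choices made for the example, not figures from any company.

```python
def future_value(principal, annual_rate, years):
    """Value of `principal` after `years` of annual compounding at `annual_rate`."""
    return principal * (1 + annual_rate) ** years


principal = 500e6  # the 500m investment from the example in the text

# Hypothetical rate/horizon pairs, chosen only for illustration.
for rate, years in [(0.06, 15), (0.10, 9), (0.10, 15)]:
    fv = future_value(principal, rate, years)
    print(f"{rate:.0%} for {years} years: {fv / 1e9:.2f}b")
```

    Under these assumptions, a figure near 1.2b corresponds to about 6% a year over 15 years (or 10% over roughly 9 years), while a full 15 years at 10% compounds to roughly 2.1b. Either way, getting back a nominal 500m is a loss once the alternative use of the money is counted.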

    In addition, where these situations tend to arise, the benefits to society also tend to be relatively small. Not to mention the most important fact, that RESOURCES are scarce. It may sound horrible that the 10k people in the country with a rare genetic disorder do not receive their treatment, but remember that a decision MUST be made as to where to put it, because there simply is not enough to go around to every cause. The more lucrative markets also tend to be areas that society values more highly, areas where more lives can be saved/improved per dollar spent.

    Lastly, just because some companies are unwilling to pursue certain ventures does not mean other companies and/or the public sector are magically held back. If the other means do not work, that does not mean it is their fault. Do not penalize the only thing that really works. If you want to try to start a "break even" drug company, be my guest, the other drug companies aren't going to stop you from pursuing worthless markets. Or if you want to use the public sector as an alternative, again, be my guest. You won't get very far though, they have a lousy track record when it comes to actually making the end product. Just don't penalize the only system that works, you're only going to be harming those that you think you're trying to help.

  • by FallLine ( 12211 ) <fallline&operamail,com> on Friday May 11, 2001 @07:52PM (#228861)
    I happen to be involved in the biotechnology industry and I live in Philadelphia, so I know a thing or two about the subject. You, on the other hand, do not. I also went to business school, as in finance, economics, and all that jazz, so you're way off base there as well.

    You ignore many fundamental issues in this business:

    There is strong competition. This means that it is very rare for any one company to totally dominate a market, especially for a prolonged period of time. From an offensive point of view, this means that a company with its hands on a cure would be choosing not from owning a market outright, but from owning a sliver of it, and even then with risk involved in not coming out with better alternatives as time progresses. With a "cure", a company would:

    1) be free to charge a lot for it. HMOs and insurers would prefer to pay for a cure like this, especially when you consider that so many of the costs that they pay go not to any one drug company, but (mostly) to the thousands of other ailments ASSOCIATED with that disease. (e.g., hiring doctors, nurses, medical equipment, etc).

    2) have relatively low risk. This, in financial terms, is equivalent to money.

    3) have quick turnover. When you compare that to the average 10+ year time to market for the drug companies, that's like a dream come true. Put simply, 7b dollars today is worth a hell of a lot more to any one of these companies than 10b dollars over 5 years. This, again, translates to money. Hint: Those dollars could have been invested in less risky ventures and returned more.

    4) would allow the company to take the entire market, rather than just a sliver. Meaning more money...

    5) saves on-going R&D dollars

    6) establishes a solid reputation...

    In addition, sitting on a cure also can easily become a defensive problem, when and if competitors find it for themselves. All those minority players in a given market would have plenty of motivation to release a cure if they had it. Meanwhile, the company that sits on it risks losing all their previous sales.

    I could go on, but you just don't get it. Now this is not to say that it's so cut and dried that a company would never fail to invest in the discovery of a cure. There are certain times when the alignment of certain circumstances, say, risk, market size, peculiarities of the disease, may prevent a company from investing large sums of money in a cure, but if you think companies sit on their hands on large and lucrative markets where such an opportunity is clearly exploitable you're only kidding yourself.
  • by FallLine ( 12211 ) <fallline&operamail,com> on Friday May 11, 2001 @11:53PM (#228862)
    So it's more lucrative to charge a person once rather than weekly for the rest of their lives? I can't see how that's possible.
    Why not? Who says that a series of pills must be sold for more than a single one (not that a cure is necessarily a single pill, in fact that's very unlikely)? Who says that the profits on those sales must be more? If you think it's impossible, you have little to no understanding of business, never mind the drug business.

    Ok, put it this way: imagine you're Eli Lilly, you're in a drug market and sell 2b dollars a year with 30% of a given market. However, that 2b dollars a year product took 15 years to bring to market. (Hint: This depreciates the value of that return hugely). You've only been on the market 2 or 3 years and your patent will soon expire, meaning that your prices will get cut by 3x at least by the generics. Plus you've got other competitors banging at your door with alternatives today, chipping away at your sales. Furthermore, you should understand that the mere invention of that one drug was by no means assured, it was risky (investors demand a lot more return for taking on that kind of risk). You could very easily find yourself 3 or 4 years down the road without a single hit drug. In fact, to even have a hope of staying on top, you need to spend very substantial sums on R&D and marketing. In fact, only 3 out of 10 drugs on the market meet or exceed their R&D costs. Of those, only a small fraction will really generate your profits. Realistically, you're looking at a profit margin of about 9-15% (9 when you figure in depreciation), when all is said and done (remember only a very small fraction actually make it to market, let alone succeed), on a 2b dollar a year product. The picture I am painting is fairly close to reality.

    Now, imagine you're that same company, and you have a cure at hand (since you imply that they can do either just as easily). You can either continue down that same path (to the extent that you can control it) or you can bring the cure to market. The cure, if it's a given, is a no-brainer. That's about ~7b dollars in revenues in the first year alone if you could sell the "cure" for the cost of one year's worth of drugs, a very reasonable and low number. In fact, the HMOs and insurance companies would be willing to pay much more than this, considering how much they save from other medical bills; the complications alone far far outweigh the costs. What's more, that money comes relatively risk free. As a percentage of sales you would spend far less on R&D, meaning higher return for the shareholders, marketing would also be significantly reduced, given that it is a "cure", which would quickly become common knowledge in the medical community. So quick and dirty, ~6b in profit (minimum) for the cure versus 180m a year (figure 9% of 2b) for however many years. It really is a no-brainer.
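
    A back-of-the-envelope way to compare the two options is to discount both profit streams to today. The sketch below is only an illustration: the 10% discount rate and the 20-year horizon are assumptions, the helper name is hypothetical, and only the 6b and 180m figures come from the paragraph above.

```python
def present_value(cash_flows, discount_rate):
    """Discount a list of end-of-year cash flows back to today."""
    return sum(cf / (1 + discount_rate) ** (year + 1)
               for year, cf in enumerate(cash_flows))


discount_rate = 0.10        # assumed for illustration
annual_profit = 0.09 * 2e9  # 9% margin on 2b a year = 180m, from the text
years_on_market = 20        # assumed, and generous given patent expiry

treatment_pv = present_value([annual_profit] * years_on_market, discount_rate)
cure_pv = present_value([6e9], discount_rate)  # ~6b profit, booked at end of year one

print(f"treatment: {treatment_pv / 1e9:.2f}b, cure: {cure_pv / 1e9:.2f}b")
```

    Even with two decades of uninterrupted 180m profits, the discounted treatment stream comes to roughly 1.5b under these assumptions, well short of the cure. Patent expiry and competition, which the sketch ignores, would only widen the gap.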
  • The computer scientists who don't know their biology are just as lost in the field as the biologists who don't know their computer science.

    --
    This sort of thing has cropped up before. And it has always been due to human error.

  • by Bizzaro ( 14691 ) on Friday May 11, 2001 @06:19PM (#228864)
    Some people in the field are now releasing their software under Free/Open Source licenses. It may seem odd to non-scientists that the license is an issue. Isn't all scientific work free and open? Far from it, especially in bioinformatics, where, as you may have read, there is a lot of money involved.

    A couple organizations have taken it upon themselves to promote freedom and openness in bioinformatics. One, Bioinformatics.org [bioinformatics.org], has a modified version of SourceForge so that the community can perform project management and collaborations on a community-run website. Bioinformatics.org has other services, such as website hosting, news forums, a software registry and repository, and more to come. The organization currently hosts 27 projects and has over 600 members. (Disclaimer: I am the Director of the organization.)

    Another organization, The Open Bioinformatics Foundation [open-bio.org], supports the development of several language libraries for bioinformatics, such as the famous BioPerl. They also host the BOSC conference mentioned in the post.

    --
    This sort of thing has cropped up before. And it has always been due to human error.

  • While bioinformatics is a bit over-hyped at the moment, the continuing trends of high-throughput and automation, as well as the associated explosion of data will ensure the need for biology-related programming/programmers in the future. Bioinformatic groups will probably be routinely found as departments within pharmaceutical companies, biotechnology companies, and many universities.

    BTW, it's actually very difficult to find good bioinformatics professionals - i.e. folks with a good life sciences background and software engineering skills. Some biologists with an aptitude for computers do progress to become proficient programmers, of course, and some software engineers can pick up the necessary biology background. Meanwhile, most people in bioinformatics shops are on some part of a learning curve ...

    YS
  • One of the things I enjoy about working in bioinformatics is that there are a number of open source tools, biological databases and resources available. (For example, I'd guess that the majority of bioinformaticians - and a good number of web developers! - depend on software released by Lincoln Stein [cshl.org].)

    The split between the "free and open" software licenses and the proprietary ones in science reflects the general differences we see with how biological information is protected/released. Companies answer to investors, and try to protect their intellectual property and trade secrets, with an intent to sell what is marketable. Academics generally have a more open attitude towards information ... at least, so long as they can still publish! (Of course, generalizations are dangerous, and some academics *do* hoard their info, while some companies may adopt a more open approach in some areas ...)

    YS
  • Don't quit your day job. While bioinformatics is a very interesting and exciting area, it is also a very small field, with the potential to be maybe a $10 billion industry at most. Bioinformatics companies have a very limited number of potential clients - other pharm companies - for which they perform various services. In addition, the skills requirements usually include advanced degrees in biology or statistics, things few average programmers can offer.
  • There is something to be said for this position (that drug companies can't make money on curing diseases but rather by selling drugs that treat symptoms),

    I don't buy it. Drug companies operate in a competitive marketplace, with very cost conscious insurance companies footing the bill. If company A has a product that treats only symptoms, its product will soon be replaced by company B's that actually treats the disease. Finally B's product will be driven out of the market by a product from Company C that cures the disease. The body of medical knowledge IS cumulative, and company C's route to profit is to develop a better product than B's.

    Sure, there is a process here, and some diseases may never have a cure, but the fact is that cures do really enter the marketplace, and drive out treatments.

  • As a PhD student in bioinformatics, I must say I strongly disagree with you.

    I am not saying it's a bad career choice, or anything like that. But the article made it sound like this was going to cause a big increase in the overall demand for CS types. It just isn't so. The overall projections are not something I just made up, either - there are plenty of well thought out surveys being published as to where this is going. I have advanced degrees in Math, Chemistry and a lot of experience in industrial use of statistics, as well as strong programming skills, and have followed this field with great interest. Living in central NJ, where many of these bioinformatics companies are headquartered, this seemed like a field where I would be able to use my skills. But after some real investigation of the nature of the business I decided that this was not where I wanted to go.

    You talk about 'companies with their foot in biology', please tell me what exactly those are except the Pharmas and Agribusinesses? Nobody else is messing with genes.

    Hospitals are not going to have people writing bioinformatics software on staff - there is a little matter of FDA regulations on what they can use. Biotech companies that develop hardware are part of the core bioinformatics industry - their customers are the same Pharmas. Departments at universities are nice IF you can get a faculty position. Otherwise you will be paid $30,000 per year and be living grant to grant. No thanks.

  • by acomj ( 20611 ) on Friday May 11, 2001 @06:09PM (#228870) Homepage
    This is interesting to see bioinformatics in the spotlight.. I used to work at a place trying to do "meaning based search" in the medical field. They were working on among other things ontology based search and a search for protein-gene relationships for quicker drug discovery.

    We also had a doctor on board before the money started to run out.. It helps because the biology terms are very foreign to Computer types (assay, gene clips etc....)

    There was a paper in the office of some professor who used a Brill learning algorithm with existing genes and then had it try to guess what random genes did. It did very well in the test despite the "primitive" AI.

    3rdmill and spotfire /labbook and a host of others are working on this stuff to sell to pharma companies to do better search and allow quicker, more accurate drug creation. The thinking is that if you can make a pharma discover drugs faster than the rest you can charge a boatload of money for the software. Discovering new drugs while keeping the side effects minimal is non-trivial.

    There is a lot of computing power in the life sciences field, and a lot of data created with gene-clips and assay data. People can't sort it all out anymore; some computer analysis makes everything faster. Look at the human genome. Computers made it happen.

    "Sit back and enjoy the chaos" -Unknown
  • This year, the University of Waterloo started a new program [uwaterloo.ca] in Bioinformatics, with three ways of getting to that end:

    BSc (Honours Bioinformatics)

    BMath (Honours Computer Science - Bioinformatics option)

    BSc (Honours Biology and Bioinformatics)

    Hooray UW!

  • Ever heard of vaccines? People make vaccines, even though they're only required one time (or around once a decade). That's not treating the symptoms, or the cure, it's cutting it off before it even happens, which saves you lots and lots of money. Be thankful.
  • Your analysis is correct, insofar as you require the DeCSS bases to appear, unbroken, as a string within the genome.

    However, perhaps we don't have to require the string to be unbroken. For example, would the pattern "use 100 bases, skip 10, use 100 bases, skip 10..." be an acceptable algorithm for finding DeCSS in the genome? If so, the probability increases combinatorially, so perhaps it isn't as unlikely as you think.

    As the string length gets small enough to be feasible (log4(3*10^9) ~ 16 bases), you have to start using inclusion-exclusion instead of just multiplying by M-N, which I don't feel particularly compelled to do right now.

    My point is just that there are more feasible encodings than a bits-to-bases unbroken string, so the chances are higher when you allow those cases.
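
    The ~16-base threshold can be checked with a quick sketch. It assumes a genome of about 3*10^9 bases in which every base is independent and equally likely, which real DNA is not; the function name is hypothetical.

```python
import math

GENOME_LENGTH = 3e9  # approximate genome size in bases, as used in the text


def expected_hits(pattern_length, genome_length=GENOME_LENGTH):
    """Expected number of exact matches of one fixed pattern in random sequence."""
    positions = max(genome_length - pattern_length + 1, 0)
    return positions * 0.25 ** pattern_length


# Pattern length at which about one chance match is expected: log4(genome length).
print(round(math.log(GENOME_LENGTH, 4), 2))
for n in (12, 16, 20, 100):
    print(n, expected_hits(n))
```

    Below roughly 16 bases a chance match is expected somewhere; above it, the expected count falls by a factor of four with every extra base, which is why looser encodings matter so much for anything as long as DeCSS.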

  • by bwt ( 68845 ) on Friday May 11, 2001 @07:53PM (#228874) Homepage
    There are two factors that I think are driving the emergence of bioinformatics: culture and data explosion.

    When I was in college, the computer science majors "hung out" with the math majors, the physics majors, and the electrical engineering majors. Biologists hung out with the less analytical crowd. Obviously these are generalizations, but I believe a lot of "the problem" is that culturally biologists just don't have very good computer skills. Suddenly it is the case that biology as a science absolutely requires these skills. If you were one of the few (and some do exist) that broke the stereotype, you need to be starting a company about now. Otherwise the race is on for the biologists to learn programming and the CS-math-physics types to learn biology.

    Second is the fact that biologists are drowning in data. Projects like the human genome project are producing lots of data, but that's just the tip of the iceberg. There is already an exploding market in high throughput assays and measurement computation. The result is that the field as a whole simply isn't managing its data well. Often groups store their data in extremely crappy formats. Custom text formats, ASN.1, etc... I'm an Oracle programmer, so I expect the kind of solutions that Banks and .com's use: big iron data warehouses running heavy duty RDBMS's like Oracle, DB2. Nope. I have yet to come across a single bioinformatics project that has a clue about data modelling. It's actually much above average to use a database at all, let alone well. If I was head of the NIH, you can bet that Freshmen biologists would take a class in SQL starting immediately.
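
    To make the contrast with ad hoc text files concrete, here is a minimal sketch of what even a tiny relational model buys you. It uses Python's built-in sqlite3 rather than Oracle or DB2, and the table names, columns, and rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (
        sample_id INTEGER PRIMARY KEY,
        organism  TEXT NOT NULL
    );
    CREATE TABLE assay_result (
        result_id INTEGER PRIMARY KEY,
        sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
        gene      TEXT NOT NULL,
        signal    REAL NOT NULL
    );
""")

# Toy rows, invented for the example.
conn.executemany("INSERT INTO sample VALUES (?, ?)",
                 [(1, "E. coli"), (2, "H. pylori")])
conn.executemany(
    "INSERT INTO assay_result (sample_id, gene, signal) VALUES (?, ?, ?)",
    [(1, "recA", 0.82), (1, "lacZ", 0.10), (2, "ureA", 0.95)],
)

# One declarative query replaces a custom parser plus hand-written aggregation.
rows = conn.execute("""
    SELECT s.organism, COUNT(*) AS n_results, MAX(r.signal) AS max_signal
    FROM assay_result AS r
    JOIN sample AS s ON s.sample_id = r.sample_id
    GROUP BY s.organism
    ORDER BY s.organism
""").fetchall()
print(rows)
```

    The point is less the particular schema than that samples and measurements live in separate, keyed tables, so new questions become new queries instead of new file formats.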

    When you combine the two factors, culture and data inundation, very strange things start to happen. The data infrastructure just isn't there and worse, a lot of people just don't realize it. Biology is presenting problems that require massive data warehousing solutions to a field whose main data background is calculating p-values to show the effect of a drug is significant.
  • By "us", I mean computer geeks.

    This is meant as an introductory text to computing for biologists. Not vice versa. If you don't understand the biology, it's pretty much meaningless.
  • ... just to chime in. That's www.biolisp.org [biolisp.org]
  • Check this out:

    Bioperl [bioperl.org]
    Biopython [biopython.org]
    Biojava [biojava.org]
    BioXML [bioxml.org]
    BioCORBA [biocorba.org]

    I couldn't find anything for ruby (either linked from bioperl, as those were, or on their own app list) but you can bet it's coming. I'd personally love to see it. But there's plenty of options for bioinformatics other than perl, although perl's excellent text handling makes it a very suitable choice.

    "I may not have morals, but I have standards."
  • Yes and no. This is a decent idea for something like Drosophila, where you can mutate the gene and see what happens, but there's no way this would work on humans. If you've got a phenotype, you've got to do massive forward (reverse? I always get them confused...) genetics to find your gene. Cystic Fibrosis took a decade or so.

    If it's the other way around and you've just got a sequence then you've got some different work to do. What if multiple genes in a pathway are mutated? What if there are multiple pathways affected by this gene? What if there is no noticeable phenotype for a mutated gene?

    Sure, with a massive database, this could work (and I mean huge, like multi-century lineage-total-human-population huge) but realistically, linking DNA string object to phenotype object and expecting to elucidate a pathway is pretty insane, even for someone with your nick :-)

    A better way is the genomics approach, where you sit down with the microarray and say "Ok, what's going on here?" The biological systems are too big for just a two variable approach.

    The black box idea is a good one, but not the way you propose. If you use a black box to abstract away what's actually happening (i.e. ignore what you don't need in the microarray) without actually dumping any data (if you need it in the microarray, it's still there) then you have a feasible method. With the two variable approach, you force people to leave out so many variables that the system becomes just theoretical and pretty much impractical.

    "I may not have morals, but I have standards."
  • Vaccines ARE the cure. When you portray the problem artificially to your body simply enough, it'll generate the right antibodies. It's natural, only it's artificial. Confusing? The truth is always confusing and paradoxical. Doesn't make it more evil though. It's just science.

    - Steeltoe
  • ... this Newsweek article on bioinformatics

    Neat-o! Sounds like news for nerds, and I'm a nerd! great!

    , and also notes: "At O'Reilly, we just published our first bioinformatics book last week, Learning Bioinformatics Computer Skills, by Cynthia Gibas and Per Jambeck,

    Good to get the word out, I guess.. for me, and O'Reilly..

    and it immediately rocketed to the top of the Amazon Computer bestseller list.

    Um... do you want a medal?

    This definitely appears to be a new area for the computer industry that's just starting to hit people's radar big time.

    Because you published the book? Oh, because the book has done well for all of a week. Right.

    I've also made the point to VCs looking at distributed computation startups that what I see on sites like slashdot is a lot of movement by hackers towards new and interesting problems.

    Wait.. I read slashdot.. hackers, apparently, read slashdot.. I'm just like a hacker! That's a good thing, I think! Thanks for the free compliment! btw, what's the book cost?

    And science looks a lot more interesting than some of the business computing that's been front and center the past couple of years. And the Biological Open Source Computing Conference I spoke at last year was definitely popping with ideas and excitement.

    OPEN SOURCE!!! GOOD!!!! OPEN GOOD OPEN GOOD GOOD GOPENOD GOOOOOD

    Unfortunately, this year's conference is in Copenhagen, right before the O'Reilly open source convention, but I definitely urge slashdotters to check out this area.

    The O'Reilly open source convention, eh? Is that free? No? What's the cost for that?

    Demand for perl expertise is especially high.

    Um.. at the open source convention, the Biological Open Source Computing Conference, slashdot, at O'Reilly, or at Newsweek? I'm getting confused...

    Linus has,in fact,grown,and explosively-JonKatz

  • good ol' salon had an article [salon.com] on this awhile back; essentially, the profit margins in the vaccine trade aren't high enough to justify making them, for some. Read the article.. it's titled 'ready for some lockjaw' for a reason.

    Linus has,in fact,grown,and explosively-JonKatz
  • When you are looking for a random sequence (of length N) within a longer sequence (length M), the probability of finding it is the above probability multiplied by M-N (the same chance over and over again for every sub-sequence of length N, assuming you don't count wrapping substrings).

    First, multiplying by M-N will, for a big enough M, give you a probability greater than 1. Clearly this is wrong. What you seem to want is a series of Bernoulli trials where each trial has the probability of randomly matching the N characters.

    However, there are not going to be M-N independent trials. This is because when checking characters 1 through N of the longer sequence against the shorter sequence, there are going to be a lot of matches and mismatches on the individual characters. This is going to impose constraints on getting a match for characters 2 through N+1. So you can't just shift the sequence over one character and get an independent trial.
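
    The difference between the two estimates can be sketched as below. The function name is hypothetical, the alphabet is assumed to be four equally likely bases, and the window count is taken as M-N+1. The second number still pretends the windows are independent, so it is itself only an approximation.

```python
def match_probability_estimates(m, n, alphabet_size=4):
    """Return (union_bound, independent_trials_estimate) for finding a fixed
    length-n pattern somewhere in a random length-m sequence."""
    p = alphabet_size ** -n                 # chance of a match at one fixed position
    positions = m - n + 1                   # number of length-n windows
    union_bound = positions * p             # the "just multiply" figure; can exceed 1
    independent = 1 - (1 - p) ** positions  # Bernoulli trials, treated as independent
    return union_bound, independent


print(match_probability_estimates(100, 2))         # short pattern: first figure exceeds 1
print(match_probability_estimates(3 * 10**9, 20))  # longer pattern: both figures are small
```

    For long patterns the two numbers nearly coincide, which is why plain multiplication looks fine there; it only breaks down visibly when the pattern is short relative to the sequence.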

    You can see similar ideas in efficient string matching algorithms. If a match fails on a subsequence of characters you can use the information to control the shift to the next character check.
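
    Knuth-Morris-Pratt is one well-known example of such an algorithm; the post does not name a specific one, so take this as an illustrative sketch. The failure table records how much of a partial match can be reused after a mismatch, which is exactly the dependence between overlapping windows described above.

```python
def failure_table(pattern):
    """table[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_find_all(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    table = failure_table(pattern)
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = table[k - 1]  # reuse what the partial match already told us
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = table[k - 1]  # keep going so overlapping matches are found
    return hits


print(kmp_find_all("ACACACGT", "ACAC"))
```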

  • hehe.. I thought they only created vaccines for things that would kill you. That way, by creating the vaccine they will actually make MORE money off you because you will live longer and spend more money on cough syrup and headache pills ;)
  • I work for one of the larger pharmaceutical companies in the world. I can see from the posts here that we have relatively few slashdotters with bioinformatics experience, so hopefully I can shed some light on a few areas.

    First, I see there are many who somehow feel that large pharmaceutical companies are out to rape and pillage consumers. The truth of the matter is that most of the public hasn't the foggiest idea how freaking impossible it is to make drugs. Most drugs that are researched are so spectacularly bad, they never make it to market. The main reason is that when they are tested they either have side-effects or (more likely) they just don't work. Bioinformatics is changing all this. I see it every day.

    For instance, in the old days, I could develop an antibiotic and test it against all kinds of bacteria and it would work fine, but then I give it to a mouse (after spending a cool million in preliminary research) and it kills the mouse because the mouse has the same enzyme as the bug I'm trying to kill. Today, I just search the database with my bacterial protein and in less than a second I'm convinced that not only do humans and mice have this protein, but so does every other creature with so much as a partially completed genome sequence. So, with slashdot in one window, and a bacterial genome in the other, I move on and the company can put that money into a drug that has a better chance of being effective. What bioinformatics does is remove some of the guesswork from pharmaceutical research.

    Most bioinformatics detractors do not see the big picture. Bioinformatics is in its infancy, much as the computer industry of the '60s. Let's face it, if I give you a handful of transistors you can't do a whole lot. I give you the number of transistors in your average year 2001 CPU, and your head spins at the possibilities. The currency of Bioinformatics is data, not transistors. One complete human genome sequence is nearly worthless. 100 complete human genome sequences is damn interesting. 10,000 human genome sequences is mind boggling. With that much data, you could easily detect the differences between individuals. A nice little relational database would be able to link all those people together by features and disease information. Orwellian concerns aside (assume all of them are volunteers), this sort of comparison allows you to learn about the smallest details that make people different. This leads directly to better drugs that have fewer side effects.

    What many people don't realize is that genome sequences are going to be a dime a dozen in the future. Doctors will be able to take a swab from your sore throat, put it in a small machine, and sequence the genomes of the organisms currently inhabiting your throat. As a result, the doc gives you an antibiotic that is specifically targeted to the strain of bacteria that is causing your problem, leaving all the others alone. While it sounds far fetched now, what I have just described is an engineering problem, not a science problem. The tools to rapidly sequence DNA will get better and faster, and the sequence data will multiply exponentially. I believe that bioinformatics is at the base of an exponential curve, and that we can't even imagine what the future holds.

    By the way, Perl is HUGE in bioinformatics. Nothing parses those strings of ATCG's quite the same way...besides, we biologists need results, we can't be messing with those "strict" languages like C++!:)
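
    For a flavor of the string handling in question, here is a small sketch of typical sequence chores. It is written in Python rather than Perl, and the helper names are hypothetical; neither BioPerl nor any other library is used.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def find_motif(seq, motif_regex):
    """Start positions of (possibly overlapping) regex matches."""
    return [m.start() for m in re.finditer(f"(?={motif_regex})", seq.upper())]


print(reverse_complement("ATGC"), gc_content("ATGC"), find_motif("ATGATGCATG", "ATG"))
```

    Most of the work is regular expressions and character translation, which is the kind of job scripting languages were built for.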

  • Seriously, that's got to be the worst waste of time I've ever seen. Why not lobby your congressman or try to talk with a state rep. about DeCSS/DMCA and get the laws changed? Big deal, you found DeCSS in human DNA, now what? Will anything be changed?

  • There was a paper in the office of some professor who used a Brill learning algorithm with existing genes and then had it try to guess what random genes did. It did very well in the test despite the "primitive" AI.

    I think that this points out an important reason that bioinformatics is such an exciting field for computer people to get into. A lot of the work that's been done so far has been done by biologists who happen to be able to program, rather than by programmers who have learned the biology. As a result, a lot of the work uses inefficient algorithms, primitive approaches, bad statistics, and the like. People are constantly reinventing the wheel, and in many cases are making ones that barely turn. Somebody who comes into the field with a strong computer background can turn out to be a real hero just by cleaning up the useful but inelegant work that's out there already. Somebody who actually knows interesting new algorithms that can be applied to the problems can do even more.

  • It doesn't work as well as you might hope. A big part of the problem is that most doctors don't seem to have the time or inclination to independently research the latest medical findings. Instead they depend on pharmaceutical companies to tell them. The problems with this should be pretty obvious. This is a particularly severe problem when all of the companies have similar treatments for a problem. In that case, none of them wants to push an alternative that will cut into their cash cow. News about alternate therapies can get out, but it's slowed appreciably. And, of course, there's always some reason to doubt the new findings, which the pharmaceutical salesmen will quickly point out when the doctors ask them about it. When that doesn't work, they try pitching directly to patients so that they won't talk to their better informed doctors and find out about available alternatives.

    Peptic ulcers are a classic case of this. For a long time people thought that ulcers were caused by organic problems that caused people to produce too much stomach acid. That suggested that the only treatment was a long-term regimen of antacids or acid-blocking medicines; patients would be stuck taking them for the rest of their lives. This was obviously a lucrative field, so all of the Big Pharma companies started producing acid-blocking medicines. Then somebody discovered that the excess acid production wasn't organic after all, but was caused by a bacterium, Helicobacter pylori, so ulcers could be cured by a short regimen including antacid medication and antibiotics. Naturally, the Big Pharma companies didn't like this and they've tried very hard to keep it out of the public eye. They've tried hard to convince doctors that the new therapies are unreliable and ineffective, and now they're trying to convince people to take over-the-counter forms of their acid-blocking medication instead of talking to their doctors about the problem. It's disgusting, but it's also very profitable, so you can't expect Big Pharma to give it up any time soon.

  • My school [bham.ac.uk] runs a new degree course [bham.ac.uk] in this... I know it is the first of its kind in the UK... along with an MSc in Natural Computation [bham.ac.uk]

    Looks pretty cool stuff... I wonder if there are many places doing stuff like this.

  • Look at the TOC - chapters like "Can I learn a programming language without taking classes?".

    Obviously this book is for bio folks who are non-programmers.

  • Another good book is Krogh's "Biological Sequence Analysis". You'll need a strong background in probability, though; he goes into quite a bit of depth on Markov Models and HMMs. The down-side to this book is that all of the theory is presented very well, but you'll have to write your own code for the algorithms. So it's definitely not a book for the "cut-and-paste" programmers of which there seem to be so many nowadays.


    -------
  • Heh, I replied to your previous post and recommended that book... glad I'm not the only one to like it. I think it gives a stronger theoretical background than Gusfield's book.


    -------
  • what the odds are of finding one of these sequences in the billions of combinations currently being sequenced?

    Assume 16-32 billion base pairs have been sequenced so far. Each base pair represents 2 bits (4 possibilities: ATCG). So we have about 2^36 bits. Assuming that everything is statistically independent (it's not), that means that a random sequence 36 bits long is likely to have been found. Anything much longer, and chances become vanishingly small.

    There are no DeCSS codes that I know of that will fit in 36 bits, so this monkeys and typewriters approach is unlikely to have generated any DVD software.

    However, the human genome has produced systems capable of playing DVDs, indirectly, so all that evolution hasn't been completely wasted.

    But you could encode DeCSS in a retrovirus and put it into your genome artificially. Or you could put it into a rhinovirus and infect some DMCA executives. Then sue them for distributing the code whenever they sneeze.
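
    To make the 2-bits-per-base arithmetic concrete, here is a minimal sketch of packing bytes into bases and back. The bit-to-base mapping (A=00, C=01, G=10, T=11) and the function names are made up for illustration; nothing in this thread specifies an actual encoding.

```python
# Hypothetical mapping: each base carries 2 bits (A=00, C=01, G=10, T=11).
BASES = "ACGT"


def bytes_to_bases(data: bytes) -> str:
    """Encode each byte as 4 bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def bases_to_bytes(seq: str) -> bytes:
    """Decode a base string produced by bytes_to_bases back into bytes."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4 bases")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            # BASES.index raises ValueError on anything other than A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)
```

    Under this scheme every byte costs four bases, so a 36-bit pattern is 18 bases, and any real program runs to hundreds or thousands of them.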

  • It doesn't just have to be perl...but since so much down-in-the-trenches bioinformatics involves sorting, manipulating and processing text strings of DNA and protein sequences, perl is, for the most part, a perfect fit. There's also a really nice set of perl classes available at bioperl [perl.org] for doing a lot of the more tedious sequence processing jobs.

    also, just because you're dealing with genetic information doesn't necessarily mean you need to use a 'genetic' programming technique either...they're quite different things.
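
    For a flavour of the string-munging involved, here is a small sketch of three typical chores: reading FASTA-style records, reverse-complementing a DNA string, and computing GC content. It is written in Python purely for illustration; it is not bioperl and does not mirror its API, and the function names are hypothetical.

```python
# Translation table for complementing DNA, preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def parse_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into {record_id: sequence}."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record id is the first word after '>'.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}
```

    A call such as `parse_fasta(">seq1 demo\nATG\nCCA\n")` joins the wrapped lines back into one sequence per record, which is most of what the format-shuffling scripts end up doing.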

  • Sounds like a good place for all those talented dot com refugees out there.

    VCs should make sure to look out for those who lost them money the first time around. Especially those who were into smoke and mirrors.

    Check out the Vinny the Vampire [eplugz.com] comic strip

  • by 575 ( 195442 )
    Sexuality,
    As bioinformatics
    Was that "grep" or "grope"?

  • While bioinformatics is a very interesting and exciting area, it is also a very small field...
    [snip]
    Bioinformatics companies have a very limited number of potential clients...

    As a PhD student in bioinformatics, I must say I strongly disagree with you. I see a future where biologists will have to acquire computer skills as well as laboratory skills, in order to be able to analyze their data. Those with computer skills will be the ones who analyze the data and draw the conclusions, while those who merely do lab work will fall behind and end up producing the data for the analysts. In my lab, not one single person in a research position performs lab work only, and a not-so-bold statement would be that this is true for everybody who does research at my department (Dept. of Genetics and Pathology, University of Uppsala).

    Previously, a typical question for a biologist could be: "I have these two genes, I wonder how they interact and how they are expressed under different conditions". Nowadays, the question would most likely rather be: "I have these expression profiles from a microarray chip containing 5000 genes, I wonder if information about this and that biological pathway can be extracted".

    My guess is that we have yet to see the big breakthrough of bioinformatics. The bioinformatics companies have not only pharmaceutical companies as potential customers - every business and academic institution with a foot in biology is a potential target. This includes hospitals, departments at universities, biotech companies who develop hardware for bioanalysis etc.

    What we see now is a booming development, where small bioinformatics companies surface by the minute. A lot of these companies will of course go down the drain (the industry bears more than a few similarities with the dotcom-industry), but the big picture suggests that bioinformatics is still in its infancy and will take a dominant position in the future of life sciences.

    Just my $0.02,
    /Erik
  • The Trends Guide is really cool (although the journal web site it's stored on really sucks).

    I wonder what the "Llama" crowd is, though.

    --Mike

  • Actually, it's fairly well known now that a significant number of ulcers are caused by a bacterial infection - specifically Helicobacter pylori (hope I spelled that right). Those people taking black market Zantac would be better served by black market antibiotics. :)

    And anyway, if you could develop a drug that cures a hole in your stomach (or anywhere for that matter) without causing cancer, you'd be a very rich multinational corporation.

  • "what the odds are of finding one of these sequences in the billions of combinations "

    It's not that hard to work out, surely. The DeCSS code is about 500 bytes. Each byte could be one of 256 characters. Therefore the odds of discovering the sequence at random are 1 in 256 to the power 500. Or 2 to the power 4000. Or 4 to the power 2000, which is about 10 to the power 1204. In other words, to have a 50% chance of finding the sequence you would want on the order of 10 to the 1204 base pairs of sequence. We currently have something like 10 to the 10 bps.

    In summary. I wouldn't bet on it.

    Phil
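
    The size of that search space can be checked exactly with big integers. This is just a sanity check, assuming the roughly 500-byte figure quoted above; the function names are made up for illustration.

```python
def sequence_space(n_bytes: int) -> int:
    """Number of distinct byte strings of length n_bytes."""
    return 256 ** n_bytes


def decimal_exponent(n: int) -> int:
    """Largest k such that 10**k <= n, for a positive integer n."""
    return len(str(n)) - 1


CODE_BYTES = 500  # approximate size quoted in the comment above

space = sequence_space(CODE_BYTES)

# 8 bits per byte, and 2 bits (one base) per factor of 4.
assert space == 2 ** (8 * CODE_BYTES) == 4 ** (4 * CODE_BYTES)

print(f"search space is about 10^{decimal_exponent(space)}")
```

    Against something like 10^10 sequenced base pairs, that leaves a shortfall of well over a thousand orders of magnitude.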

  • "but since so much down-in-the-trenches bioinformatics involves sorting, manipulating and processing text strings of DNA"

    The reason that perl is so prevalent is that by and large the bioinformatics community has screwed up their data representation. The reason for this is simple. We have the biggest legacy problem in the world. The amount of data has expanded beyond all recognition. We use techniques which were knocked together to represent hundreds of sequences to represent millions.

    The end result of this is that we spend vast amounts of time chopping and changing text formats, which is an absurd way of spending time. Of course perl is great for this, which was why it got used so much. Which leads to the second problem. Many bioinformaticists are converted biologists (myself included). The end result of this is that we are often not terribly good programmers, and have a perhaps greater tendency to stick with the language that we know than we might otherwise do.

    The fact that we are using perl for relatively large projects is really an admission of failure on our part rather than a sign of the strength of perl!

    Phil

  • This book seems to be tailored for people coming from biology. Does somebody know a good book about the topic more suited for CS people?
  • by hillct ( 230132 ) on Friday May 11, 2001 @06:46PM (#228902) Homepage Journal
    There is something to be said for this position (that drug companies can't make money on curing diseases but rather by selling drugs that treat symptoms), however it is a somewhat alarmist position, at least the way it has been expressed here. I don't know why it would be surprising to see a company invest in technology that will generate future profits.

    What bothers me about this issue is the futile attempts the federal government has made to regulate biological research with respect to use of the Genome Project data to assist in such morally ambiguous areas as human cloning. The attempts to regulate this field of research are futile, as they are being handled now, since the industry's profit potential is so high that virtually unlimited funds will be expended to house research facilities in places beyond the borders of countries that choose to regulate this field of research.

    While on the subject, I'd like to applaud the genome project researchers for embracing the concept of 'Open Source' science. There were a number of firms that actively tried to gather together and copyright genome project data.

    Well done gentlemen!

    You have allowed the creation of an entirely new field of science. The openness of the research data will reduce the perceived moral ambiguity of the derivative works based on that data.

    --CTH

    --
  • Who are you kidding, man? You have to realize that in 10-15 yrs you won't go for a blood test, you'll go for a DNA test. The drug companies today will all be bioinformatics companies in the future. I work for a smallish drug company out of Canada, and trust me, we're already using elementary techniques developed in bioinformatics research for purposes of drug discovery. On the other hand, I'm very biased. I have a degree in genetics and mathematics, so I fit your skills requirement. The theory may be hard, but the programming, while not exactly trivial, is within the grasp of most people who know how to code.
  • Consider Rosetta (RSTA) with ~175 employees vs. VA (LNUX).

    $620m, not too shabby.

    http://www.thestandard.com/article/0,1902,24434,00.html
  • Being in the field myself, I might add that biologists are far more forgiving of biologists than they are of computer programmers (and the pay is lower - far lower in academia ["open source" biology]). Make sure that you have a good foundation in biology and statistics before you make the plunge... However, good luck. This is an incredibly interesting and fascinating field! The manpower is desperately needed. Not however from the average "dot-commer". fid
  • biology != logic. You constantly have to defer to reality. This is not the typical mantra of a programmer.
  • Donald Knuth, who was casually mentioned in another post [slashdot.org], knew of this at least as early as 1993.
    The following is an excerpt (it's approx. 60% down the page) from the interview [fatbrain.com] he gave to Computer Literacy Review on December 7th, 1993:

    CLB: If you were a soon-to-graduate college senior or Ph.D. and you didn't have any "baggage", what kind of research would you want to do? Or would you even choose research again?

    Knuth: I think the most exciting computer research now is partly in robotics, and partly in applications to biochemistry. Robotics, for example, that's terrific. Making devices that actually move around and communicate with each other. Stanford has a big robotics lab now, and our plan is for a new building that will have a hundred robots walking the corridors, to stimulate the students. It'll be two or three years until we move in to the building. Just seeing robots there, you'll think of neat projects. These projects also suggest a lot of good mathematical and theoretical questions. And high level graphical tools, there's a tremendous amount of great stuff in that area too. Yeah, I'd love to do that... only one life, you know, but...

    CLB: Why do you mention biochemistry?

    Knuth: There's millions and millions of unsolved problems. Biology is so digital, and incredibly complicated, but incredibly useful.
    The trouble with biology is that, if you have to work as a biologist, it's boring. Your experiments take you three years and then, one night, the electricity goes off and all the things die! You start over.
    In computers we can create our own worlds. Biologists deserve a lot of credit for being able to slug it through.
    It is hard for me to say confidently that, after fifty more years of explosive growth of computer science, there will still be a lot of fascinating unsolved problems at peoples' fingertips, that it won't be pretty much working on refinements of well-explored things. Maybe all of the simple stuff and the really great stuff has been discovered. It may not be true, but I can't predict an unending growth.
    I can't be as confident about computer science as I can about biology.
    Biology easily has 500 years of exciting problems to work on, it's at that level.

    If you have any problems with the interview link, just use Google [google.com].
  • Oh I forgot. It's the researchers who write themselves the paychecks, and who pay off the AMA and decide where Pfizer or whoever is going to spend their money. That's like saying a developer at Microsoft wants to take his time and write clean reliable code.
    A lot of diseases are viral, genetic, or other-wise uncurable...
    Oh, I guess it's you who's the medical expert. See my post here [slashdot.org].
  • Pharmaceutical companies. companies companies companies companies. Do you know what that word means? That word is reserved for people in the business of making MONEY. They have stock. They have stock-holders. If they forget that they are around to make money, they disappear.
    How about coder? That word is reserved for one who writes code. In no way does it speak to the motives behind the coding.
    Give me a better example please.
  • ...but if you think companies sit on their hands on large and lucrative markets where such an opportunity is clearly exploitable you're only kidding yourself.
    So it's more lucrative to charge a person once rather than weekly for the rest of their lives? I can't see how that's possible. In answer to the italicised comment, I don't think companies sit on their hands when an opportunity is exploitable. This brings us right back to where I started.
  • by swagr ( 244747 )
    I hope you're right.
  • by swagr ( 244747 ) on Friday May 11, 2001 @05:41PM (#228912) Homepage
    Eventually, the proponents of bioinformatics claim, the new field will change health care by allowing pharmaceutical companies to shave years off the drug-discovery process, and letting doctors tailor medicines to an individual's genetic makeup.
    Pharmaceutical companies are around to make money. That's why they create drugs that treat symptoms and not drugs that are cures. Now they're investing in ways to make more money from us. Great.
  • Is this book more focused towards those who already have a great deal of knowledge about biology and bioinformatics, but need to know the computer side of things? I am really interested in this area, with a great deal of computer experience, but probably lack a lot of the biology experience that is needed for the field.

    Are there some good resources out there for those who are gifted at computers and algorithms, but wish to pursue bioinformatics? Would a career in this require a few classes of biology study? I imagine so, and I would be willing to do so, but what courses really apply?

    For the lack of a better sig, I'm out

  • Zantac is the world's #1 medicine, and also the world's #1 black-market drug. That's because people with stomach ulcers [and there are a *lot* of them] have to take it daily

    Thank you for unwittingly proving the parent's assertion that the drugs treat the symptoms rather than cure the disease.

    The AMA claims that over 90% of ulcer cases are caused by a strain of Helicobacter pylori, which is carried by the housefly. The metal Bismuth is toxic to the bacterium, so Bismuth-containing OTC drugs such as Pepto-Bismol have been successful in actually curing ulcers (large quantities required, though--see your doc). Other antibiotics do the same thing. Naturopathic physicians have had similar success with such unlikely agents as cabbage juice. In both cases, the cure involves eliminating the colony of H. pylori from your upper GI tract.

    The production of stomach acids is a normal part of the digestive cycle and should not be painful. It should also not be reduced. IIRC, Zantac reduces the secretion of stomach acids, causing the stomach to be less painful--it treats one of the symptoms. A reduction in the quantity of stomach acid causes its own problems: reduced digestion (and the slight malnutrition that obviously accompanies it...) and an increased risk of stomach cancer.

    Like clockwork, the pharmaceutical companies once again push drugs to only treat the symptoms. If this weren't true, the commercials would say "Ask your doctor about curing your ulcer with large quantities of Pepto-Bismol" instead of "Ask your doctor about Zantac". Their Machiavellian value system ignores the ever-increasing number of side effects of all of their wares, another issue entirely.



    Ewige Blumenkraft!
  • I can second this observation in a big way.

    I'm a graduate student in psychology who is specializing in behavioral genetics/genomics and statistical methodology.

    In my first year of grad school, somewhat before the explosion of bioinformatics, I took courses in molecular biology and neurogenetics. Other grad students in wet labish biology fields were often astonished at the extensive stats courses we were [are] expected to take. They would say things like "we're told not to bother much with stats, because at most we need things like t-tests and ANOVAs and whatnot".

    They probably won't be saying that any longer.

    Observing the introduction of stats and math into biology has been fascinating: e.g., in reading articles on PCA and eigendecomposition being used for microarray analysis, you get the impression that PCA is new, fresh, exciting all over again. It's like watching a whole field being introduced to this stuff, realizing how cool stats can really be.

    I can't wait for undergrads to pick up a Bio 101 book and learn basic stat, info, and measurement theory as part of the standard curriculum.
  • www.paragen.com

    Check it out, always looking for talented people in the RTP.

    It is a very cool industry to be in. Most of the people I work with are PhDs. Almost all are Perl hackers. They use Linux. Lots of trendy VC furniture and free beer. Monster hardware to play with.

    It is presently a small industry. But the scope is not limited to Pharmaceutical and Ag companies. Can't say too much (Insider laws and whatnot) but if you think this field is anything less than pre-supernova, you're wrong. Commercial applications of our technologies are just now becoming apparent. There is no end to the potential growth of this field.

  • Much of it was, yes.
    The sequence recognition software (i.e. BLAST; someone should post a link) is based on 1970s voice recognition technology - somebody's table algorithm. Neural Nets, which are fairly popular for a diversity of applications other than sequence recognition, were also invented in the 70's, if I recall.
    Markov Chain Monte Carlo (a means of solving high dimensional integration through random sampling) was invented by some of the physicists who worked on the hydrogen bomb in the 1950s, but it was only recently (circa 1988) adopted by Statisticians, in particular in Biostat.
    Much of this is fairly old technology but it needs significant retooling to be applied to the problems of biology, which are very, er.... rich.
    In any case, most bioinformatics is, in fact, not ground breaking in computational terms. There is a huge amount of work to be done figuring out how to take advantage of existing information technology, ESPECIALLY in the area of Database design.
    Bioinformatics is one of the best fields to get involved in as an academic. There is significant private sector interest - which means that jobs are available and that there is always teaching demand to keep you employed - but there isn't as much bullshit flying as in really corporately penetrated fields, like nutrition (which I used to do.)

    Sam,
    presently UCSC biochemistry + applied math, start to get my belated PhD at Columbia in the Fall. Yippee! Praetorious is a character from Frankenstein.
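
    For readers who have not met Markov Chain Monte Carlo, the idea of integrating by random sampling can be sketched in one dimension. This is a generic random-walk Metropolis sampler, not any particular biostatistics package; the target density, step size, and function names are illustrative assumptions.

```python
import math
import random


def metropolis(log_density, start, steps, step_size, rng):
    """Random-walk Metropolis: draw correlated samples from exp(log_density).

    log_density only needs to be known up to an additive constant.
    """
    x = start
    logp = log_density(x)
    samples = []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        logp_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        samples.append(x)
    return samples


def expectation(samples, func):
    """Monte Carlo estimate of E[func(x)] from a list of samples."""
    return sum(func(s) for s in samples) / len(samples)


def standard_normal_log_density(x):
    """Unnormalised log density of N(0, 1), used here as a toy target."""
    return -0.5 * x * x
```

    Averaging `func` over the samples approximates the integral of `func` against the target density, so `expectation(samples, lambda x: x * x)` should land near 1 for the toy target. The same recipe carries over to many dimensions, which is where it earns its keep.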
  • I have several years of Perl expertise, as well as a trackable history in clpm (no, not as a troll there, but as a living, breathing contributor). If interested, you know how to contact me. I'll require full accommodations (I don't fit so well in economy class seats).

    Dancin Santa
  • I think your last line is 6 syllables, not five. How about "Give me 'more' and more" instead?
  • As a biomedical engineering student, I've found that there are a lot of projects looking for skilled programmers, but each problem may have a completely different need on the technical end. I would recommend looking into projects that you find appealing, and seeing what kind of requirements there are. I personally have been looking into imaging (like CT, MRI, fMRI, x-ray) where there is a lot of new development going on. Less biological knowledge is needed than the ability to do real-time graphics set up. Another big problem is the ability to work with doctors or other professionals; most have very little technical knowledge. They often don't care about the details of the project, but rather that it is fast, elegant and incredibly easy to use. (Easy for you is _not_ easy for them. I took a whole design class in how doctors and nurses abused medical equipment) If this is something you are interested in, I would do some research on the web. The Mayo Clinic (http://www.mayo.edu) has some very good links for biomedical/comp sci cross-over work. The biomedical field is still small enough that networking is one of the biggest tools available. Get in touch with professors and post-grads who can point you towards anywhere you want to go, and also fill in any holes in your medical background.
  • I'm just finishing Scientific Computing at Carnegie Mellon and one of my main focuses is biological sequencing. Everyone is up in arms because they don't think they can link genes or a combination of genes with a physically manifested trait (phenotype), but in actuality one can easily imagine and possibly roughly design an expensive and elaborate program or system of networks that would allow the following blackbox scenario to work: Enter a list of people. In this list, each person has two objects. Object 1: DNA string. Object 2: Phenotypes of note. Now once this data is entered into the complex blackbox, it would proceed to reverse engineer which genes are responsible for the traits. Large hard-to-copy blackboxes are great tools for keeping together your local monopolist.
  • One of the main goals of databasing human genes is to find out what traits they influence.

    In the future we'll have this information, but currently the task is deemed daunting.

    Sure my nick is CrazyJim, but I don't know if this is the craziest thing you ever heard... That's kinda odd. I really think you misunderstood me. Seriously, go ask a genetics professor about what genes link to what phenotype and he'll prolly shrug or say that he's working on it.
  • and it immediately rocketed to the top of the Amazon Computer bestseller list...

    Just how many biologists are out there who could get their hands into bioinformatics? It's a bestseller because there are so many dot-goners looking for a job to pay their mortgage.

  • For the same reason, people will drop the insurance company for the premium they do not have to pay for.
  • I used 366 instead of 1464, so the odds are worse than that. My calculators keep puking on me, so I'll leave it at "beyond astronomical".
    --
  • On average, the description of how to construct the data from substrings of a random string will be as long as the data itself. The human genome does not in any meaningful way include the data, any more than does a sufficiently long repeating string of the four bases (AGCTAGCTAGCT...).

    Give it up. (speaking of which, I'm surprised nobody has turned this into an "All your base" joke)
    --
  • First multiplying by M-N will, for a big enough M, give you a probability greater than 1. Clearly this is wrong. What you seem to want is a series of Bernoulli trials where each trial has the probability of randomly matching the N characters.

    True. That was sloppy of me, each successive trial would have its probability of success multiplied by probability of failure of all preceding trials. So the difference only lowers the probability of a match, and not significantly in this case.

    However there are not going to be M-N independent trials. This is because when checking character 1 through N of the longer sequence with the shorter sequence, there are going to be a lot of matches and mismatches on the individual characters. This is going to impose constraints on getting a match for characters 2 through N+1. So you just can't shift the sequence over one character and get an independent trial.

    This doesn't affect the probability calculation. Each substring of a random string is still a random string, and they are no more probable to be equal to one another than two successive randoms generated independently. If you calculated how the randomly distributed matches and mismatches of one trial change the probability of the next trial, you'd find that on average they don't affect it at all.

    However, it can affect the number of expected matches, if you examine the characteristics of the given short string. A string of one symbol repeated is more likely to get more than one match than a string in which there are no symbol repetitions. But the probability of just "one or more" matches is unaffected.

    (I also goofed about the number of trials, which would be M - N + 1)
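
    The difference between the two calculations is easy to see numerically. A minimal sketch, treating the M - N + 1 windows as independent Bernoulli trials (an approximation, as discussed above), with made-up function names and deliberately small toy sizes so the numbers are visible:

```python
import math


def p_at_least_one(p: float, trials: int) -> float:
    """P(at least one success) in `trials` independent trials: 1 - (1 - p)**trials.

    Uses log1p/expm1 so that very small p does not get rounded away.
    """
    if trials <= 0 or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return -math.expm1(trials * math.log1p(-p))


def naive_bound(p: float, trials: int) -> float:
    """The 'multiply by the number of trials' estimate, which can exceed 1."""
    return p * trials


# Toy sizes: a 10-base pattern in a 10-million-base random sequence.
N, M = 10, 10_000_000
p_single = 4.0 ** -N
windows = M - N + 1

print(naive_bound(p_single, windows))     # greater than 1, so not a probability
print(p_at_least_one(p_single, windows))  # stays below 1
```

    When the product p * trials is tiny, as it is for a pattern of a thousand-plus bases, the two estimates agree to many decimal places, which is why the correction doesn't change the conclusion.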
    --
  • Stripped of header and gzipped, I get 366 bytes, X4 is 1464 nucleotides.

    The probability of any two random sequences of the same length being equal is the inverse of the number of expressible sequences of that length. In this case, it is 4^length.

    When you are looking for a random sequence (of length N) within a longer sequence (length M), the probability of finding it is the above probability multiplied by M-N (the same chance over and over again for every sub-sequence of length N, assuming you don't count wrapping substrings).

    So N=1464, and M equals roughly 3 billion. So the probability is:
    (3*10^9-1464)/4^1464

    Which is in the neighborhood of one to a squared googol odds.

    Of course, that assumes random data, but I figure it's a good enough approximation.

    Don't knock yourself out looking for it. It's not there.
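
    Since 4^1464 is too large for ordinary floating point, the expression is easier to evaluate in logarithms. A rough sketch, assuming uniformly random bases and using M - N + 1 windows; the function name and the 3-billion genome length are illustrative.

```python
import math


def log10_match_odds(genome_len: int, pattern_len: int, alphabet: int = 4) -> float:
    """log10 of (number of windows) / alphabet**pattern_len.

    This is an upper bound on the chance of at least one exact match
    in a uniformly random sequence.
    """
    windows = genome_len - pattern_len + 1
    if windows <= 0:
        return float("-inf")
    return math.log10(windows) - pattern_len * math.log10(alphabet)


GENOME = 3_000_000_000  # rough human genome length in bases

for n in (366, 1464):
    print(n, round(log10_match_odds(GENOME, n), 1))
```

    With a pattern length of 366 the exponent comes out near -211, and with the full 1464 nucleotides it is closer to -872.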
    --
  • I am pleased to see O'Reilly's entrance to the field, as well as the interest on Slashdot.

    My research group studies computational genomics, and I teach two classes in the field. For this reason, I've scoured the earth for suitable books on the topic.

    I have put together a list of 36 books on computational biology [berkeley.edu]. Most of these are suitable only to niche interests, outdated, or simply bad -- and many are intended for the Llama crowd. I've reviewed several proposals for new books, so I expect the offerings to become stronger in the next year or so. Those desiring a brief introduction to the field might want to look at the Trends Guide to Bioinformatics [bmn.com] (free, registration required; disclaimer: I was a guest editor). It's intended for biologists, but should be readable by /.ers.

  • hmm. Note to self: "stay away from medical profession."

"It may be that our role on this planet is not to worship God but to create him." -Arthur C. Clarke
