Science

Researchers Revamp Human Gene Count Estimates

GuyFromAccounting writes: "Tomorrow's Economist has a new article describing research that shows that the number of human genes is more than twice the estimate made a few months ago. The article describes why it is so hard to estimate the number of genes."
This discussion has been archived. No new comments can be posted.

  • If you see a pattern in something, there are two possibilities:
    1. The pattern is a result of your measurement technique.
    2. The pattern is a result of something external. (i.e., something you actually care about.)

    That is to say, science articles (and especially Slashdot articles) tend to exaggerate either the scope of someone's research or their confidence in it. Actual articles in science journals tend to be very conservative in their claims, often mind-numbingly so. Conjectures and guesses get inflated in popular science articles to the level of "scientific truth", when in fact actual scientists in the field aren't entirely convinced.

    It is quite possible that science articles seem so contradictory because that is the way they are written. Contradiction and conflict are dramatic and interesting. I think science writers emphasize this to make their articles more appealing. This skewed presentation may be the cause of what you are observing.

    That isn't to say that the body of scientific knowledge never changes. It usually changes in a much more laid-back and boring way than portrayed in the media, however.

  • And then we can research it. If we have spare resources.

    As it appears to me, knowledge of the human genome is not complete, and a human genome map does not in fact exist. And yet it was almost patented a few months ago??
  • Taco and company aren't going to post 14 stories confirming the number of genes - it's not new, and (to most people) not exciting.

    "New" has little to do with it. We got (at least) 14 Columbine stories long after it was "news". "Exciting" is much closer to the mark. I'd substitute "how much can I relate to it".

    And when evidence does surface proving that the last theory was wrong, a new one should be created to fit the data and then that should be put under the microscope for flaws. This isn't religion.

    Keeping in mind that the discrediting evidence is subject to at least as much scrutiny as the theory it just disproved. It's more fun to burn the witches.

  • by Zombie ( 8332 ) on Thursday July 12, 2001 @12:43PM (#88556) Homepage
    This makes me think of the item on "prima donna" programmers [slashdot.org] the other day. If we for a second assume that the preposterous be true, and there is a God and he created man, then
    • he's a spaghetti coder
    • he doesn't document any of his code
    • he supports the "embrace and extend" method of evolving functionality
    • he's clearly into this whole job security through obscurity thing
    • lots of people die because we can't figure out his code, so he doesn't give a rodent's behind about us

    Actually... it looks like God works for Bill Gates!

  • That may be true, but he wrote everything in BrainFuck [muppetlabs.com]
  • Nature isn't that wasteful, and it wouldn't carry around 90% of the DNA for no good reason.

    That's right, Darwin's theory of evolution says that. The thing is, just being there might be the only reason needed. Padding is a perfectly valid possible use/function. When these people say "useless", they mean "no direct functionality, unlike these other interesting parts over here."

    If they don't know what it means, they should just say so and keep working at it.

    They do. They say "We don't know what this means, but here's our best guess...". The problem is that all too often the media takes the best guess and reports it with more certainty than it deserves.
  • Sounds like you need to use verb tenses from the part of the book [wisc.edu] that was left blank to save printing costs :)

  • There is a comment somewhere down here whose point is really that no one knows how to convert a whole bunch of ESTs hitting the genome into genes. The EST data is *very* messy. We've looked at this recently inside Ensembl and don't see a big win from confidently placed ESTs. Our opinion is that the Ohio State thang is just somewhat enthusiastic researchers getting good PR for their work.

    Check out http://www.ensembl.org/ for the more sober-headed view of this.
  • I agree - the concept of "gene" is obviously a convenient oversimplification and a tad fuzzy. It seems that "genes" are really just the attractors of DNA evolution dynamics - they are the units that have strong enough control (in however convoluted a manner) over the phenotype to govern evolutionary success/failure, and thus their own proliferation.

  • I'm no genetic expert, but from what I've read, what a "gene" is, is not precisely defined. All that there is are strings of base pairs, and while there may be a way to segment off certain areas of the string and say, "this substring serves this function... and therefore it is a gene", often that's not the case.

    There are also different types of genes. Similar to data strings and instructions in computers, there's an analogy in the genetic world to genes that do things and genes that hold information.

    I don't think that this purported claim that there are twice as many genes as previously thought is significant until the researchers more properly define the genes and their functions.
  • by Mr. Theorem ( 33952 ) on Thursday July 12, 2001 @07:52PM (#88563)
    IANAG (I am not a geneticist), but one argument I've heard about the so-called "meaningless drivel" in the DNA is that the current estimates for the distribution of gene sizes are heavily biased towards small genes. Apparently, the cost to sequence a gene scales with the size of the gene, and in the interest of actually finishing a sequencing project, geneticists have favored the smaller, and therefore cheaper, genes. The estimates for the amount of "meaningless drivel" simply take the estimate for average gene size multiplied by the estimate for the number of genes, and find that this falls short of the total number of base pairs. The problem, then, is that the average size of genes is severely underestimated.
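
The estimate described in the comment above can be sketched in a few lines. This is only an illustration of the arithmetic: the function name and every input value are made-up placeholders, not measurements, chosen to show how an underestimated average gene size inflates the apparent "drivel" fraction.

```python
def noncoding_fraction(genome_bp: int, gene_count: int, mean_gene_bp: float) -> float:
    """Fraction of the genome left over after subtracting gene_count * mean_gene_bp."""
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    gene_bp = gene_count * mean_gene_bp
    return 1.0 - gene_bp / genome_bp


# Hypothetical toy numbers (NOT real genome figures): same genome, same gene
# count, but two different guesses for the average gene size.
TOY_GENOME_BP = 1_000_000
TOY_GENE_COUNT = 100

for mean_gene_bp in (1_000, 5_000):
    fraction = noncoding_fraction(TOY_GENOME_BP, TOY_GENE_COUNT, mean_gene_bp)
    print(f"mean gene size {mean_gene_bp} bp -> leftover fraction {fraction:.2f}")
```

If the sampled genes are systematically the short ones, the smaller guess is the one that ends up in the estimate, and the leftover fraction looks larger than it really is.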

  • Simple estimate: log2($phonemes). English has somewhere between 30 and 60 phonemes (there are enough dialects to keep things confusing), so 6 bits should be sufficient. Of course, it isn't, since there's more to language than counting phonemes, and what's left likely requires a great deal more information. And your conclusion doesn't follow unless 99% of phonemes are redundant, which they aren't.
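
The log2 estimate in the comment above is easy to check. A minimal sketch, assuming only that each phoneme gets a fixed-width code; the function name is hypothetical, and the integer trick used here is equivalent to rounding log2 up.

```python
def bits_per_symbol(alphabet_size: int) -> int:
    """Smallest whole number of bits that can index `alphabet_size` distinct symbols.

    Equivalent to ceil(log2(alphabet_size)), computed with integers to avoid
    floating-point rounding; a one-symbol alphabet is given 1 bit.
    """
    if alphabet_size < 1:
        raise ValueError("alphabet_size must be positive")
    return max(1, (alphabet_size - 1).bit_length())


# The two ends of the phoneme range quoted in the comment.
for phonemes in (30, 60):
    print(phonemes, "phonemes ->", bits_per_symbol(phonemes), "bits each")
```

This only counts the bits needed to label one phoneme; as the comment says, it tells you nothing about how much information the rest of the language carries.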
  • ...you should be skeptical about science.


    With the proviso, of course, that you afford the other method/ideology/religion the same consideration if you should choose to compare it to, for example, creationist "theory". :)

    (I'm not saying science is an alternative to religion, no. That's not my argument)


    mefus
    --
    um, er... eh -- *click*
  • So I guess we really are quite a bit more complex than worms.

    You mean you just figured this out?

  • Ain't it cool? According to the article, Humans are actually coded OO style :-)

    "For example, 60% of "zinc finger" genes (whose protein-products help to regulate the expression of other genes), are located on chromosome 19. It looks as though they have evolved by repeated duplication from a single "grandmother" gene, followed by specialisation to do slightly different jobs. Protein-kinase genes, whose products are involved in intracellular signalling, are similarly concentrated on chromosome 1. The researchers tripled the number of protein-kinase genes known from this chromosome. They also found hints that genes whose protein-products work together in a cell have sometimes ended up as neighbours on a chromosome. That might simplify the co-ordination of their expression."
  • First, what was mapped was the genome, not the genes. If you use a book as a metaphor, what Celera and the HGP did by sequencing the genome was put all the individual letters in order. Identifying the actual genes (the stretches of the genome that code for proteins) is like putting in punctuation ... right now we have very little idea where the periods and commas and paragraphs are. However, based on our understanding of genetics, computer models have been developed to guess where the punctuation should go, thus leading to the estimate of the number of genes. And it *does* appear that humans have only ~30,000 genes ... the difference between us and, say, a fruit fly, is that our genes can be spliced in a number of ways, so that each individual gene can, on average, produce 3 or more different proteins. So while we don't have many more actual genes than a nematode (worm), we make 3x more proteins, which are the actual workhorses of the body.

    Genaissance's work looks at individual differences in the letters of genes ... these individual differences may or may not mean anything to the proteins produced. Their work doesn't nullify the significance of Celera and the HGP's work, which was meant to define the norm. The availability of sequenced and annotated mouse and human genomes is an incredible boon to almost every scientist, reducing what used to be years of bench work to an afternoon sitting in front of a computer screen. Genaissance's work, on the other hand, is of more limited use, and is mainly aimed at those who design human drugs.
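
The point above about one gene yielding several proteins through splicing can be illustrated with a toy model. This is a deliberately simplified sketch: it assumes only "exon skipping" (the first and last exon are always kept, and any internal exon may be dropped), and the exon labels are hypothetical. Real splicing is regulated, and most of these combinations would never occur in a cell.

```python
from itertools import combinations


def exon_skipping_isoforms(exons):
    """Enumerate transcripts that keep the first and last exon plus any
    subset of the internal exons, preserving their original order."""
    exons = list(exons)
    if len(exons) < 2:
        return [tuple(exons)]
    first, *internal, last = exons
    isoforms = []
    for keep_count in range(len(internal) + 1):
        # combinations() yields subsets in the original left-to-right order.
        for kept in combinations(internal, keep_count):
            isoforms.append((first, *kept, last))
    return isoforms


# Hypothetical four-exon gene: two internal exons give 2**2 = 4 candidates.
for isoform in exon_skipping_isoforms(["E1", "E2", "E3", "E4"]):
    print("-".join(isoform))
```

Even this crude model shows why counting genes undercounts proteins: the number of candidate transcripts grows with the number of optional exons, not with the number of genes.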
  • A few months ago we were told there is no God because we didn't have many more genes than most animals.

    Who the hell said that? How could the number of genes in a human have any relation to religion?

    So, yes, nothing is remotely firm, yet. How many textbooks has this "fact" made its way into while the truth awaited to be discovered?

    What "fact" are you talking about? The article and the post refer to the number of genes in a human. Are you disputing the number? Is this number really included in textbooks as a fact?

    Or are you disputing the whole theory of how genetic makeup relates to biology?

    -Bruce
  • The reality is that identifying genes in raw sequence is very much a work in progress.

    I've always wondered... what is a raw sequence? Is it just a list of all the base pairs in one person's DNA? And what does "the human genome has been sequenced" mean, anyway?

    Sorry for my ignorance - but all the pop-science articles that were gushing about the human genome project never seemed to explain these simple points.

  • This reminds me of the GA-designed circuits story... in which a researcher ran a GA program to design a circuit for a specific purpose, and after some evolution got a circuit which did the job correctly. It was composed of a main circuit which appeared to do all the work, and a totally separate, isolated loop disconnected from the rest of the circuit, which for all intents and purposes appeared to do nothing and be completely useless. So he removed this "useless" loop, and suddenly the circuit as a whole stopped functioning. This seemingly useless piece was actually manifesting some spooky effect on the rest of the circuit. It just goes to show that evolution cares only about results, not process. Which explains the amount of "drivel" we have in our genome, and the gross energetic inefficiency of most life. Sometimes I wonder what would happen if we engineered our own organism from scratch, and left out all of the cruft. It would probably be super-efficient, except it may not be so robust against "unenvisioned" problems (ambient radiation?). That "junk" must serve some buffering purpose (think of algorithms which employ randomness - they're not as optimal under the best circumstances, but they completely avoid pathological cases).
  • When the original announcement came out, there was a big deal made of the fact that humans have a small number of genes relative to other species.

    But do the uncertainties in counting affect the gene counts for other species as well?
  • No, the entire genome is the OS. The genes are userland programs, while the rest is kernel code, and no-ops.
    Currently we are hacking at the Userland programs, then we will become kernel hackers ;).
  • No. Sometime next Tuesday, Cowboy Neil convinces me to loan Rob mine. Well, I say next Tuesday loosely, because, well, it hasn't happened yet. I could change my mind, and not drive over there until Wed, or I could get sloshed on Bass Ale and not do it until Thurs.. Or could I? I mean, it's happening, well, that Rob came back from the future with me on Tues, I mean, that is what I told me, so..

    But.. But..

    I gotta go.. My head hurts.

  • Celera, Incyte and many other companies do not collaborate with the public effort. So what? Everyone makes such a big deal about this when after all these are commercial companies and they are not going to give away their data for free (without strings attached). I was talking about the collaboration between public bioinf centres. This does happen. The UCSC and NCBI and EBI and Sanger and Genoscope, ...

    Is "junk DNA" just that? Or some subtle part of the design that we have yet to understand?
    I am not so keen on the word "design" there. Anyhow, research is just that. Research. There are lots of things we don't know and we are trying to find out WTF this junk DNA is all about.
    Anyway this post will never be read by anyone.

  • Maybe if they open sourced their efforts and cooperated
    1- Most databases are publicly available.
    2- Many bioinformatics groups DO cooperate
    Unfortunately, everybody has a different view of what is a gene and how to find them.
  • What is a raw sequence? Is it just a list of all the base pairs in one person's DNA?

    Yes. A raw DNA sequence is just the ordering of bases on one side of the DNA. Looks kind of like this:
    ACTGGCTGCTAC

    And what does "the human genome has been sequenced" mean, anyway?

    That we know the sequence of all the parts of the human genome. I.e., we now know the order of bases for each chromosome's coding length... (there are parts of chromosomes that don't contribute to genes... they are primarily there for structural purposes.)

    Don Armstrong -".naidnE elttiL etah I"
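
A raw sequence only needs to record "one side" of the DNA, as the answer above puts it, because base pairing fixes the other strand. A minimal sketch using the example sequence from that answer; the function name is hypothetical and only the four plain bases are handled.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    sequence = sequence.upper()
    if set(sequence) - set("ACGT"):
        raise ValueError("sequence may only contain A, C, G and T")
    return sequence.translate(_COMPLEMENT)[::-1]


raw = "ACTGGCTGCTAC"  # the example from the comment above
print(raw, "<->", reverse_complement(raw))
```

Applying the function twice returns the original string, which is why storing one strand loses nothing.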
  • The fact is that DNA itself is pretty useless. It can't do anything without proteins, and it's the proteins that are actually acting on each other, on the DNA, and on the RNA.
    Whoa. DNA and RNA are quite capable of doing stuff on their own without proteins. Proteins are generally more versatile than RNA and DNA, but much more costly to generate. See RNA World [panspermia.org] for some perspective.

    That said, it's probably the proteins that allow these functions in terms of things like splicing out introns (alternate splicing is a form of branching)
    Actually, it's RNA (snRNA), with proteins acting as a scaffold to increase efficiency, that handles splicing, although generally it is small proteins (but sometimes RNA) that force the spliceosome into choosing alternate splicing pathways. You could consider this as a form of branching, but the code is ever so much more complex than that. What is almost happening at the DNA level is a vast form of preprocessing on the source code, sort of a loading of the libraries and code base that will be operated on by the cell... it only has a few operations that it can perform, but it performs them well.


    Don Armstrong -".naidnE elttiL etah I"
  • Ok, fair enough, perhaps saying that they can't do anything by themselves is an exaggeration, but they can't produce life by themselves, which is really the key.
    Neither can proteins. But there is a pretty prevalent theory that kind of makes sense (to me at least) that says that RNA was the precursor to life. RNA molecules arose and were self replicating at low efficiencies owing to their primary structures. (With the generation of appropriate RNA molecules, you can get RNA that are self-replicating at high efficiencies, but I digress). Of course, we're stuck with this question of "What is life?", and not knowing your personal definition, I'm not going to comment much on it. Suffice it to say that life itself is not black and white, and frankly, in a lot of instances, the answer to the question is not particularly important. (It's more of a label than an accurate description of what is going on.)
    You can't consider RNA and DNA on their own because they belong in a complete system. Even viruses require a protein coat, and they're in between in the life/not-life category. To understand anything living, you need the proteins too.
    For a bottom to top holistic understanding, of course. I totally agree that a synthesis of knowledge, from DNA, RNA and proteins to lipids, co-enzymes, co-factors, hormones, etc. is necessary in order to even begin to fully comprehend a complex system. (Actually, the synthesis of this information is one of the branches of the research I am doing right now... and if you'd like to discuss it further, I wouldn't mind.)

    Don Armstrong -".naidnE elttiL etah I"
  • Go ahead and send me an e-mail since I don't seem to be able to find yours... ;-)

    Don Armstrong -".naidnE elttiL etah I"
  • Not at all. As I stated in my previous post, DNA is not code. It's an analogy, but the simple fact is that proteins are not hardware, but rather they are a part of the whole system.

    If you really want to look at a biological system as a computer program, it's a better idea to think of the proteins as part of the program, rather than the thing the program runs on.

    The DNA contains instructions, but in no real linear order. That's one (of about a billion) reason why deciphering the human genome is so difficult. Granted, source code isn't necessarily linear either, but there is no entry point for the "program" of a living system. Even birth is not a starting point because there's already a bunch of stuff happening even before the cell is implanted (all life from life). While this can be likened to a compiler, it sheds light on the idea that DNA isn't so much a program itself but part of this gigantic system which includes a lot of other "programs." Just as a kernel can not really do much on its own, neither can DNA.

    I'm not saying DNA is not like code, indeed the fact that we use the term "genetic code" at all indicates that there is some degree of similarity. But to consider DNA to be the entire thing is completely absurd. You can understand a program by looking at the source, but that's like saying you can understand a gene by looking at its protein sequence and such (which isn't completely true, but you can deduce a fair amount from it). You are looking at an entire system here, not just a single "program". It's actually very similar to a huge UNIX system, with a ton of small programs (proteins) that each do a few specific tasks. You can't understand UNIX by looking at one program and ignoring the rest. You also can't understand UNIX by looking at all the source code together (especially if you don't understand most of it, as we don't) because you don't know what programs are running at what time. Just because you have the source code to apache doesn't mean it's running right now. Same thing with proteins, you don't know if they're working right now. That's a major major problem, and it's one that neither the UNIX source nor the DNA has an answer to.

    "I may not have morals, but I have standards."
  • A gene has been very specifically defined. It goes as follows: "a gene is a unit of heredity." That's all there is to it. All the rest is a function of that. DNA is a physical manifestation of this idea, and the different genes must all perform different functions, but all are units of heredity that are passed on to subsequent generations.

    "I may not have morals, but I have standards."
  • Ok, fair enough, perhaps saying that they can't do anything by themselves is an exaggeration, but they can't produce life by themselves, which is really the key. You can't consider RNA and DNA on their own because they belong in a complete system. Even viruses require a protein coat, and they're in between in the life/not-life category. To understand anything living, you need the proteins too.

    "I may not have morals, but I have standards."
  • Actually, the synthesis of this information is one of the branches of the research I am doing right now... and if you'd like to discuss it further, I wouldn't mind

    Cool! What kind of stuff? I'm always game :-)

    "I may not have morals, but I have standards."
  • blockquoth eraserbones
    And yet, the DNA spans between genes are generally referred to as 'useless' or, in this case, 'meaningless drivel.' Am I missing something, or is this exactly where the good stuff is?
    Not necessarily. You're caught in the typical /. trap of thinking of DNA as computer code. Granted, code is probably the best analogy we have, but it's still an analogy at best. You're very correct that living systems can operate by branching, looping, etc. just like programs. However, your mistake lies in looking too hard at the DNA and not hard enough at the whole system.

    The fact is that DNA itself is pretty useless. It can't do anything without proteins, and it's the proteins that are actually acting on each other, on the DNA, and on the RNA. That said, it's probably the proteins that allow these functions in terms of things like splicing out introns (alternate splicing is a form of branching) and DNA replication via DNA polymerase and other helpers (a form of recursion).

    While I personally don't believe that the intervening DNA sequences are complete garbage, I don't think they hold the processes you're looking for as much. I agree with the idea that they provide a lot of raw genetic material for evolution, and I also think they play a role in gene regulation by chromatin bundling and such.

    However, the idea of DNA as a program is only a small part of the picture, and in reality even when we have the genome and the proteome, we're still going to have to figure out how everything works together. A living system is big and complex, with tons of parts we don't understand yet. It's going to be a fun time figuring it out.

    "I may not have morals, but I have standards."
  • No, no, the introns are *source comments*:

    ACGTCATCTGAGCGTCGCGGCAGTAGTCTGCGTATGCTGAGTCGAGC

    /* Pinky finger - this should be the right size to fit comfortably into the hole in a CD */

    GTCGTGTCAGTTGCATGCGTAGTCATCTGACGTAGTCTGACTGATGCT GT AGCTAGTCAGTCGTACGTAGTCG

    /* Appendix - I don't remember what this is for, but let's leave it in anyway */

    And so on.
  • So you're saying no science should be published until it satisfies a need you see as useful? This is why there's so much bad science: people insisting that there's an industrial need for the product of all research. How do you think arseholes like Monsanto got the idea to patent genes and screw us all out of the benefits of our own genetic heritage? Through narrow-minded thinking like yours.

    Anyway, now I'm done with you, I'd like to point out that identifying a 'use' for every gene is NOT the best goal of genomics. There's a great line of thinking which says that mobile gene fragments may have transported modular portions of genes around the genome all the way from intracellularly to interspecifically. If common structures can be found throughout the genome, particularly in functional gene exons, it might prove that the path of evolution is a lot more interesting and perhaps shorter than we thought.

    Patterns of punctuated equilibrium dominate modern theories of special evolution. For instance, the mammalian radiation after the death of the dinosaurs. We went from a few small rodent-types to a gigantic variety of fauna in a few million years. Identifying links between these species on modular gene and non-encoding DNA fragments can tell us about geography, nationality, immunity, evolution - damn near everything.

    The genome is a fascinating place. Don't get trapped into thinking that the genes are the only interesting part. Maybe the uses above aren't the single most important right now, but the more we know about the genome, the more ways we have to exploit it and stop the onslaught of terrible genetic disease. Many great inventions occur by accident, so you should never limit science to what you see as strictly 'useful.'

  • by zpengo ( 99887 ) on Thursday July 12, 2001 @12:48PM (#88588) Homepage
    Why do I get the impression that scientists are trying to drum up new things to patent?

    "Uh...yeah...turns out there are plenty more of them. If you want to see the data, though, you need to pay us royalties!"

  • I concede that you could be right on both counts. Thanks for the reply :)
  • I'm glad to hear there are others who have realized this. I agree wholeheartedly. It's bugged me quite a bit that the folks doing this research feel confident that more than 90% of our genetic information is meaningless.

    What would people think if someone claimed to finally decode the mysterious "Russian" language, and announced that it was 90% meaningless?

    Nature isn't that wasteful, and it wouldn't carry around 90% of the DNA for no good reason. Didn't some doctors used to think that the heart was a useless organ? Even the tonsils are somewhat useful. The appendix seems to be one of the few examples of wastefulness. So maybe if I heard that less than 10% of DNA was meaningless I could believe it.

    Fact is, we barely know anything about what our DNA means. I'd bet there's a lot more to it than the straight protein mappings that we know about now. I am very excited to see the developments over my lifetime in figuring it out. But when I hear these reports, I wonder if the folks doing it are even qualified. If they don't know what it means, they should just say so and keep working at it.

  • One simple example is tandem repeats (sequences of just a few base pairs that could repeat tens of thousands of times, in a row), which make up a significant percentage of our DNA. It's probably impossible to ascribe a function to such segments beyond simply making the DNA molecule longer (a good thing genetically, as this allows more of a mixup to occur during the "cross-over" in the generation of sex cells). But it really is just drivel, in the "code" sense. No way around it.
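
To make "tandem repeat" concrete, here is a small sketch that finds short units repeated back to back in a sequence. It uses a regular-expression backreference rather than any real bioinformatics tool; the function name, the default thresholds, and the test sequence are all hypothetical choices for illustration.

```python
import re


def find_tandem_repeats(sequence, min_unit=2, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for each run where a short unit is
    repeated at least `min_copies` times in a row.

    The lazy {m,n}? quantifier makes the regex try the shortest unit first,
    and the backreference \\1 then demands further identical copies.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Made-up sequence: "CA" occurs four times in a row starting at index 2.
print(find_tandem_repeats("GGCACACACACTT"))
```

A regex like this is fine for a toy string but would be far too slow and too strict (no mismatches allowed) for genome-scale data, where dedicated repeat finders are used.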
  • If you were the first thing alive in the universe, forget prima donna programmer, you're kind of by definition a prima everything...

    The only "intuitive" interface is the nipple. After that, it's all learned.
  • Back at TAMS (the Texas Academy of Math and Science), clubs sprung up like wildflowers. Most of them were quite silly. But, the worst part, was that they would spam you - in everyone's box, they'd put all sorts of unwanted advertisements. So, we decided to make a counter-club. We formed the Time Traveler's Club (TTC). We put up big signs with our slogan, "I'll See You Yesterday!". We passed out notes announcing that our next meeting would be last Tuesday. We passed out notes thanking everyone for the large turnout at our last meeting on the following friday. ;) Ah, that place was fun... that was almost as fun as the "Nobody For President" campaign we ran...

    -= rei =-
  • by Rei ( 128717 ) on Thursday July 12, 2001 @02:16PM (#88594) Homepage
    A perfect example of this is cloning. When Dolly was cloned, the media went all-out in seeing cloning as "It's here!!" Then, when problems started showing up, people started on the "cloning is horrible, everyone gets defects" bandwagon. When, in reality, neither was true.

    The technique used in creating Dolly was just awful. The scientist who worked on it has become a cloning opponent, largely due to seeing his failures. On the contrary, other cloning researchers, like the ones that did the Honolulu experiment with mice, have become its biggest proponents.

    What was the Dolly experiment like? Well, first off, they chose sheep because there is a very long period of time from when the egg is fertilized to when it divides for the first time; this length of time was assumed to (and likely does) help the odds of the cell thinking things were normal. However, his techniques were awful. After denucleating the ovum, it would go into a dormant state. So, he had to get the new nucleus to be in a dormant state, too. He did this by starving the cell for hours until it almost died. Then, instead of transplanting the nucleus directly, he applied a powerful electric shock through the solution the two cells were in, which usually caused them to merge, and act like a just fertilized egg. Now, I'm sure everyone reading this is just going, "this is going to cause serious problems". And, that it does. Dolly was a lucky sheep. Most of the embryos weren't nearly as lucky - that shock does a lot of damage to the cell (and starving the nucleus until it shuts down is bad too).

    The Honolulu experiments, for contrast, used mice. Mice are an even harder subject to deal with, because they have notoriously fast divisions in fertilized egg cells. But, they used them anyways, because they were not only convinced they could clone them, but they wanted to see results several cloned generations down the line. The technique used there involved, like before, denucleating the egg cell - but, doing this *right before* having the new nucleus implanted. To get a dormant nucleus, he took cells that were always (or almost always) dormant, such as certain nerve cells. He extracted the nucleus, and implanted it immediately into the egg cell. Then, he put the new egg cell inside a solution which prevents it from forming a polar body (and throwing away half the genes), so that it would think it was now fertilized. It then began to divide. They had a level of success that looks like, once the technique is perfected, will approach normal external fertilization techniques. Signs of premature aging didn't occur until 5 generations of clones - this due to the fact that genes slowly change over time, and usually for the worse; we basically extended the mouse's lifespan beyond what its DNA was designed to handle (this could be fixed by making a "DNA backup" from the original mouse, and then reconstructing that DNA each time you want a clone - an important reason for fast DNA sequencers).

    The media locked onto the first story. When problems started to arise, they completely switched gears, and made it look like all cloning is dangerous. Bad media! No cookie! :)

    -= rei =-

    P.S.: BTW, I have yet to see a project where they actually transfer the mitochondria - that's over 1% of most animals' DNA.

  • Yes, but He didn't give us the source. Therefore, all of you in the "closed source is immoral" camp must be either atheists or blasphemers. I know Stallman is an atheist; I guess that makes a lot more sense now.

  • by icqqm ( 132707 ) on Thursday July 12, 2001 @12:55PM (#88596) Homepage Journal
    They didn't perchance say this because they found 46 chromosomes in a human instead of 23, did they?
  • We jump to conclusions too easily and judge it as fact.

    I doubt we'll ever know how old the earth is. I see new theories all the time stating it's older or younger than the numbers that are generally accepted by evolutionists.

    And Moore's law isn't a law at all. It's just a theory.
  • by joto ( 134244 ) on Thursday July 12, 2001 @01:45PM (#88598)
    Ah, no. But what you see is only the binary. It's only after it is compiled with gcc (genetic code compiler) that it seems messy. It's like viewing a binary in a text editor. As I remember, God used to be especially picky about indentation and variable naming, and listed preconditions and postconditions for all his functions, as well as invariants for all of his data structures.
  • Why does it seem like every other article contradicts current beliefs?

    I'm always hearing that this makes the universe twice as old as previously thought, or this will push back the estimated date of intelligent life by a million years, or this blows away current estimates for the end of Moore's Law.

    I guess reading /. is giving me a healthy skepticism of science and the common belief in general.

  • ...most genes are split up into segments... That's it. I'm reprogramming everything with GOTOs.
  • Eric Lander (leader of the Whitehead Institute and one of the lead authors on the public genome paper) said at a talk in New York the week before the paper was published that although they were publishing the 31,000 gene number he didn't think that was all there was.

    He also relayed a conversation he'd had with an older, much crankier genomicist (who sequenced the first Mycobacterium genome and whose name I forget at the moment) who was the source of the long-believed 100,000 genes in the human genome figure. He said that his prediction was right because it was within one order of magnitude and that was close enough for statisticians.

    --
  • Meanwhile, in the afterlife...

    Gabriel: Lord, the Humans are discussing the Human Genome again.

    God: Again? Let's see if they've learned anything from the last time...60,000? Now they're just guessing! "Meaningless drivel"? Who do they think they are?

    Gabriel: I must admit, humans don't really seem to grasp this one yet. Look at this curious group of them over here.

    God: Spaghetti coder? Security through obscurity? They're comparing me to a computer programmer? How utterly 3-dimensional of them. Haven't they heard the term, "Thinking outside the Space-Time Continuum?"

    Gabriel: Shall I go enlighten a few of them Lord?

    God: Nah, let 'em pound that brick wall for a few more decades or so. The experience will do them some good down the road.
  • by WolfWithoutAClause ( 162946 ) on Thursday July 12, 2001 @01:11PM (#88603) Homepage
    The reason is that the genes don't exactly have a marker at each end to delineate them. The genes are to some extent a matter of definition. They're different lengths, they can sometimes be found in different places, and sometimes 'two genes' do exactly the same thing even though they read quite differently.

    Some bacteria have been found with two or three sets of genes sort of on top of each other, starting at different offsets. It's a bit like code that you can jump to at 0x2000 or 0x2001 and it does different, but useful, things!

    Anyway reverse engineering this lot will take a while...
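
    To make the "jump in at 0x2000 or 0x2001" idea concrete, here is a minimal Python sketch that reads one DNA string at three different offsets. The function names are made up for illustration, it uses the standard genetic code table, and it ignores start codons, stop handling and the reverse strand, so treat it as a toy and not as how real gene finders work.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, offset=0):
    """Translate dna codon by codon, starting at the given offset."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def three_frames(dna):
    """Return the translations for offsets 0, 1 and 2 (hypothetical helper)."""
    return [translate(dna, k) for k in range(3)]


print(three_frames("ATGGCC"))
```

    The same six letters give a different amino-acid string for each offset, which is why "where does the gene start?" matters so much.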
  • ...can't be that many... Levis, Jordache... uh...
  • While you are correct in that we have a lot more to discover about gene/protein interactions, I question your motives behind the cut-and-paste.

    What aspects of discovering that we share (mostly) the same set of transcriptional regulators as jellyfish, mice, and apes do you find not useful? Noticing that a receptor related to feeling pain in humans is homologous with those found in pigs, which we have a drug for, must be a bad thing. How about the insights into anthropology gained by comparing the 40 or so publicly known partial sequences floating around? Knowing more about humans must be bad.

    Sure, we can't look at the genome and spit out a cure for cancer in a week, but efforts so far have been anything but useless (not to mention the tangential effect of developing new sequencing and computing technologies to handle all that data...)

    Oh, and if you had read the article, you would notice that a significant portion of this newest round of discoveries concerns grouped genes with similar function, which is useful for development of drugs which can target entire metabolic pathways which can lead to disease, instead of individual components of that pathway.
  • Ain't it cool? According to the article, Humans are actually coded OO style :-)

    Uhm... No.

    The snippet you cite there describes a bunch of copy+paste operations done on a block of code [DNA] (not necessarily something as sophisticated as a function though) as it is written with minor modifications [mutations] for each imperfect copy [slightly different gene]. It's almost like taking a small loop containing conditionals which depend on i and rolling it out so that i is hard-coded and each rolled-out loop contains only the conditional important to it. In both cases, the modified duplicates remain close together but for different reasons (If I have to change something in each instance of my rolled-out loop, I can scroll down to it vs The DNA sequence in/around the region of interest is conducive to this type of duplication).
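
    In case the "rolled-out loop" picture isn't obvious, here is a small Python sketch of it. Both functions and the arithmetic in them are invented purely for illustration; nothing here comes from the article.

```python
def looped(values):
    """One loop body whose behaviour depends on i."""
    out = []
    for i, v in enumerate(values[:3]):
        if i == 0:
            out.append(v + 1)
        elif i == 1:
            out.append(v * 2)
        else:
            out.append(v - 3)
    return out


def unrolled(values):
    """The same work with i hard-coded: three near-identical copies,
    each keeping only the branch that matters to it."""
    return [
        values[0] + 1,   # copy for i == 0
        values[1] * 2,   # copy for i == 1
        values[2] - 3,   # copy for i == 2
    ]


print(looped([10, 20, 30]), unrolled([10, 20, 30]))
```

    The unrolled copies sit next to each other and differ only slightly, which is the resemblance to a cluster of duplicated, mutated genes.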

    The OO metaphor can be 'kind-of' observed in DNA or protein sequences which have alternative splicing. For instance, a protease, which cuts a peptide [method], can recognise different sequences to cut at depending on what recognition factor gets spliced in at the time the protein is made. So a protease can be initialized with the "cut-at" property equal to "valine-leucine" or equal to "phenylalanine-isoleucine" or whatever combination, depending on what sequence gets spliced in. Genes and proteins with alternative splicing are, however, very rare (we have not found many... yet).
  • by ageitgey ( 216346 ) on Thursday July 12, 2001 @12:42PM (#88607) Homepage
    Month-old stories used to get posted, but now we are getting links to stories that aren't even available yet! Maybe Taco finally got his time machine that was backordered at ThinkGeek.

  • First, while I'm posting, note that the GeneSweep page I linked above has statistics at the bottom on how people are betting -- supporting my assertion that there are no hard-and-fast conclusions about how many genes to expect. Also, when I wrote "very different, presumably very different", I meant "very different, presumably very accurate".

    Second, realize that the questions you ask aren't scientific questions like "What is the molecular weight of water?" They're production questions, like "When is software ready for 1.0?"

    I've always wondered... what is a raw sequence? Is it just a list of all the base pairs in one person's DNA?

    Basically. Sequence information comes out of the machine in blocks of 300-800 base pairs. Those pieces then need to be assembled into one contiguous sequence for each chromosome. Of course, they're actually mixing reads from different people. But for purposes of assembly, all humans are so similar that the differences are meaningless.

    I'm using "raw sequence" to refer to a string of As, Gs, Cs and Ts with no additional information.
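
    To give a feel for what "assembled into one contiguous sequence" means, here is a toy greedy assembler in Python. It is only a sketch under generous assumptions: error-free reads, no repeats, one strand, and made-up function names. Real assemblers have to cope with all of those, and this is not a description of what either project actually runs.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b
    (at least min_len bases), or 0 if there is none."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps any more: a gap remains
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

    When no reads overlap, the loop stops with several separate pieces left over, which is roughly what a "gap" in the assembly is.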

    And what does "the human genome has been sequenced" mean, anyway?

    The truth is that the announcement of the "completion" was purely a political development. The public project and Celera declared a truce so they could end the "race" and work at a scientifically sound pace. The second "completion" came when both sides agreed they had enough information to draw substantive conclusions about the nature of the genome. There are still a lot of gaps left to be closed, though. And even at the point where most objective observers would agree that "the human genome has been sequenced", the bulk of the work will still remain to be done: identifying genes, figuring out what they do and identifying all the variants that account for human diversity.

    Unsettling MOTD at my ISP.

  • To add a bit to krmt's excellent points: If you insist on thinking of DNA as C code, think of the coding sequences as printf's and such and the intervening segments as the rest of the code -- flow control and whitespace. (We're talking about C, not Python..) Humans have a huge amount of whitespace, most of which probably isn't important at all. But it's very difficult to distinguish regulatory elements ("if" statements) from junk (whitespace).

    On the whole, you're right, though. The raw number of genes is more media-friendly than scientifically important. Unless your company has a business plan based on patenting genes.

    Unsettling MOTD at my ISP.

  • by update() ( 217397 ) on Thursday July 12, 2001 @01:22PM (#88610) Homepage
    The article (and the writeup here) makes it sound like one presumably very accurate estimate has been supplanted by a very different, presumably very different, estimate. The reality is that identifying genes in raw sequence is very much a work in progress. At the annual Genome Sequencing meeting at Cold Spring Harbor in May, a bunch of groups presented different methods that resulted in widely divergent numbers. Everyone's numbers were increasing over the estimates of last year, though.

    It'll sort itself out over the next couple of years as the sequence gets better assembled, more non-human sequence is available for comparison and the groups adopt one another's good ideas. In the meantime, it looks like a good PR person at Ohio State managed to make their findings seem more revolutionary than they are.

    By the way, if you want to bet on the number, see the GeneSweep page [ensembl.org]. (Note that bets must be placed in person!) I put my $5 on 44,000 and change.

    Unsettling MOTD at my ISP.

  • Moreover, most genes are split up into segments, known as exons, that are separated by long stretches of meaningless drivel. Although this drivel is copied during the first stage of the process by which genetic information is used to build the proteins that do the donkeywork of maintaining life, it is then cut out of the copies before they are transferred to the protein-making machinery.
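
    For what it's worth, the cut-and-join step that quote describes can be sketched in a few lines of Python. The coordinates and the sequence below are invented for illustration (lowercase marks the stretch that gets cut out), and real splicing works on RNA copies with far messier boundaries.

```python
def splice(transcript, exons):
    """Keep only the exon segments, given as (start, end) half-open index
    pairs, and join them in order; everything in between is dropped."""
    return "".join(transcript[start:end] for start, end in exons)


# Hypothetical copy of a gene: two exons with one intervening stretch.
copy_of_gene = "ATGGCC" + "gtaagtcag" + "TAA"
exon_coords = [(0, 6), (15, 18)]  # assumed positions, not real annotation

print(splice(copy_of_gene, exon_coords))
```

    Finding those coordinates in raw sequence is the hard part, and it is a big reason the gene counts keep moving.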

    So they keep making errors, and there are these long segments that are considered 'MEANINGLESS'. They've doubled their number! Perhaps humans AREN'T as complex as they thought we were. People find it hard to believe that we could be so closely related to other species. THEY LOOK for the things that separate us. Humans, worms, and plants, at the base level, are still made up of the same 4 chemicals. Perhaps we are being overzealous in thinking that we are somehow so different.

    I also find it hard to accept any study that claims a large portion of a DNA molecule to be 'DRIVEL'. Why is it so meaningless, just because you don't know what it does yet or what its purpose is? Maybe that's the most important part that you're overlooking!

    We'll see a lot more in the field of biotechnology. It's definitely an interesting field of research with some pretty scary possibilities when we think about it. But until they can get a better handle on what the numbers are, and learn not to consider things meaningless, I'm not siding with either side.

    [Something witty and intelligent should have appeared here.]
  • Hmmm... So the first guys went out and said that we were very similar in the number of genes to a lot of other species. People were kinda taken by shock, much like people were when Darwin said we had evolved from apes/animals.

    Now these guys have come along and said that the other guys were wrong, and that we have almost twice as many genes as we thought we had, making us VERY different.

    So who is right, can we know?

    All I know is that I'd put my money on these recent guys for getting their next round of government/private funding and the other guys will be scraping the couch for quarters. Why? Because people want to think they are better than everyone/everything else. They see themselves as VERY different. They don't want to be told otherwise, regardless of whether it is right or wrong.

    "A closed mind is a wonderful thing to loose"

    [Something witty and intelligent should have appeared here.]
  • Most of the embryos weren't near as lucky - that shock does a lot of damage to the cell (and starving the nucleus until it shuts down is bad too).

    Oh, Brave new world...

    cmclean

  • So I guess we really are quite a bit more complex than worms.

    Worms yes, Worms Armageddon however...

    cmclean

  • The plain old numbers game doesn't work with the genome - we have 3 billion base pairs but frogs have 9 billion. We've got 46 chromosomes but dogs have 78. The most telling figure is that we're 99%++ genetically identical to chimpanzees and yet we can't interbreed (not that I've tried...). What really matters on a genomic level is the interplay between genes during crucial times of development, not just the functions of individual genes (our ribosomes are nearly identical to bacterial ones) - many of which were selected by evolutionary pressures meaning that once a problem had been solved, for example how to copy DNA, that gene was 'set in stone'. After all, who wants to keep solving a problem once you've found a solution? Back to the point on crucial times of development - could you imagine what the result would be if the gene(s) controlling synapse differentiation in the fetus stayed active for an extra hour? day? week? - that, ladies and gentlemen, may be all that is separating us from chimpanzees. Just a simple interplay of genes, subtly disturbed by a simple mutation, perhaps lengthening a crucial phase in brain development. So the numbers game is just irrelevant. -Nano.
  • by 3prong ( 241218 )
    I never thought a serious article on gene research would contain the phrase "the donkeywork of maintaining life"
  • Nah, we just lack the address space in the brain housing group to grasp all the coding nuances.

  • Seems to me that the DNA file system is steganographic.

    To the snooping researchers the data looks like a bunch of drivel. The intended recipients (cells in different tissues) apply different keys and extract blocks where the key matches.

    Imagine this in the near future:

    http://stats.distributed.net/DNA/
    Keyspace Checked: 4.042%
  • and this doesn't even go into the similarities between prayer and tech support...
  • by Rasta Prefect ( 250915 ) on Thursday July 12, 2001 @12:58PM (#88620)

    It's psychology. While most scientists tend to regard the first few studies on a topic as little more than a theory until it's been confirmed by a few other people, the media and general public tend to take these as absolute answers. "Well, then," they say, "that's taken care of." If the next study indicates the first one is wrong, then something's changed. That's news. If it simply confirms that yes, we are little more complex than your average nematode, that's not news. If suddenly we have way more, then the number of genes in the human genome has changed (well, not really, but you understand what I mean); that's news, and it shows up in the popular media. Slashdot works the same way. Taco and company aren't going to post 14 stories confirming the number of genes - it's not new, and (to most people) not exciting. But if it challenges current beliefs, it's exciting and gets posted.

    As for a healthy skepticism about science, you should be skeptical about science. Skepticism is (or should be) an integral part of science. Nothing should be taken for granted, nothing should be accepted as true until a good number of people have had a chance to kick it around in every way they can think of without finding a problem with the theory/study/whatever. And when evidence does surface proving that the last theory was wrong, a new one should be created to fit the data and then that should be put under the microscope for flaws. This isn't religion. You don't take things on faith. Everything should be questioned and tested before acceptance.

  • And yet, the DNA spans between genes are generally referred to as 'useless' or, in this case, 'meaningless drivel.' Am I missing something, or is this exactly where the good stuff is?

    I agree. I find it mighty arrogant for DNA researchers to admit on one hand that they don't understand most of what they're dealing with and on the other hand declare large portions of their subject matter as "meaningless drivel". Perhaps the drivel is there to provide time for the proteins to fold or to synchronize the encoding with other activity in the cell.

  • by mcockerill ( 258961 ) on Thursday July 12, 2001 @02:29PM (#88622) Homepage
    The research article is available from the Genome Biology web site here [genomebiology.com].
  • That blows away the study funded by Levi Strauss, Inc. that determined there were only 501.


    "You know, the golf course is the only place he isn't handicapped."

  • I think that's the whole point of REsearch, isn't it?
  • Because of the way that Celera (and, AFAIK, the HGP) identify genes, I feel that we are going to be seeing this sort of announcement increasingly often. Celera has been using imprinted cells as a base for their sequencing. Imprinted cells have many genes hidden or 'turned off' by interceptor proteins. What needs to be done, but could take many years, and more biochemistry than we have now, is to 'walk' the genome. Take the base pairs 3n at a time, and sequence, a la Folding@Home. Actual protein sequences overlap, in a sort of biochemical data compression. If you offset the "start here" tag by one base pair you don't always end up with a garbage protein. This is why Celera's claim of such large quantities of 'junk' DNA was scoffed at by many in the scientific community.

    But I'm rambling. Time to go home.
  • 1) I was under the impression that Celera's technique involved finding a protein and following it back to the gene, then sequencing. It was supposed to "Only sequence 'Real Genes (tm)' and not all that filler".

    2) 3n at a time. As in, n codons at a time (DNA is meaningless otherwise). I was trying to suggest that genes have both a location and an offset, so we should sequence a given series of codons 3 times, with different offsets. Yes, it's more work, but certain features of viruses suggest that the same stretch of DNA can code for multiple proteins.

    3) Actually, more than 90% of DNA is non-coding in any given cell. All cells will sequence the citric acid cycle, for instance, but only a few will sequence serotonin. You MUST sequence (and compile the protein) for the whole genome. Then, look at another type of cell to see what you missed. Misshapen protein fragments that seem to be "junk" may shape polymorphic proteins as they fold. A sequence lacking its acton may suddenly have one when other "junk" is removed in an intermediate step. Just because a gene is non-expressed or seemingly non-expressible does not mean it has no function.
  • ...the number of human genes is more than twice the estimate made a few months ago...

    Great. Now there are twice as many things that can go wrong.

  • there is no ambiguity in the definition, and there's not always a computer analogy.
  • um, no.

    actually, the estimates of the number of genes in the human genome were much higher before all this sequencing took place. people were shocked there were so few. those guys will be scraping for quarters because their research was probably flawed.

    science is convergent, and this is probably better research based on the other group's findings and subsequent research by other groups the first wasn't privy to.
  • 1- Most databases are publicly available.
    2- Many bioinformatics groups DO cooperate

    "Many and most" is not all. Is Celera [celera.com] really cooperating [wired.com] with the HGP? I know Celera has a Consensus Human Genome site [celera.com], but is that everything they know? How does that compare with the UCSC data [ucsc.edu]? Is the patenting of gene sequences and techniques inhibiting research? I'm not asking these to troll, but simply because I'd like to know the answers. Unfortunately, everybody has a different view of what a gene is and how to find one. Probably in part because we don't know as much yet about genetics as we'd like to think. Is "junk DNA" just that? Or some subtle part of the design that we have yet to understand?
  • Whenever a scientist publishes, it IS open-sourced. But when Celera [celera.com] or some private company announces results, that's not the same as publishing a paper in a peer-reviewed journal. My comment wasn't about this case (go, Bo!), but about what I can tell of the process as a whole.
  • The result is a lot of data, but those data are scattered in numerous databases that are organised and maintained in diverse ways by various research teams.

    Maybe if they open sourced their efforts and cooperated, (a) they'd spend less time potentially looking at the same things and inventing the same processes, and (b) somebody would get a better sense of how much has been done and how much is left to go by looking at all the data. I know, I know... there wouldn't be any patents and piles of money if they did it that way, hence they have no motivation to... And of course, they'd then spend their time fighting off German lawsuits for naming their sequencing software KGene or some such...

    Gene Anderson
  • More complex than worms? So you're saying the number of genes determines a living thing's complexity? Hmm, let's count something else. I've got four base pairs, the worm has got four base pairs, so I'd say - based on the 'counting' argument - I'm as complex as the worm. Maybe you can count something else and the worm will be the most complex.
    Time to go to work and do something useless.

  • Good, you got my point. Hey, I needed fewer words to say the same. And now I'll go out for dinner, to do something useful for a change.
    KH

  • You must agree that he believes in some aspects of object methodology though, i.e., inherited attributes!
  • by andres32a ( 448314 ) on Thursday July 12, 2001 @12:53PM (#88636) Homepage
    When Venter and Collins announced some months ago their "success" at sequencing the human genome, and that according to their conclusions humans did not have many more genes than worms, this did seem kind of odd to me. If this new paper published by "Genome Biology" is right, it could ridicule all past celebration... I mean... if it turns out that Venter only mapped 30,000 genes but there are actually 60,000 genes in humans, wouldn't that mean that we're not even halfway to a complete gene map for humans??
  • You see, that's just the thing. A few months ago we were told there is no God because we didn't have many more genes than most animals. What does this finding signify?

    So, yes, nothing is remotely firm, yet. How many textbooks has this "fact" made its way into while the truth awaited to be discovered?

    In the meantime, it looks like a good PR person at Ohio State managed to make their findings seem more revolutionary than they are.

    Many discoveries supporting evolution have been that way, I've noticed.

  • Who the hell said that?

    Slashdot did [slashdot.org]. Actually they were quoting some guy on MSNBC [msnbc.com], I think. That's what I thought they meant when they said, "the estimate made a few months ago."

    How could the number of genes in a human have any relation to religion?

    You tell me. :)

    What "fact" are you talking about?

    By "fact" I meant the older, now purportedly inaccurate estimate. The post I was responding to said nothing was firm yet, so we shouldn't necessarily accept either estimate. My thought was, I wonder how many textbooks were printed during the time between the first estimate and the second.

    Or are you disputing the whole theory of how genetic makeup relates to biology?

    No. I think perhaps I wasn't clear, or you read a little too much into what I was saying, or you didn't quite have all the context available that I had in my mind (thinking of the slashdot story a few months ago).

  • by PYves ( 449297 ) on Thursday July 12, 2001 @12:50PM (#88639)
    Moreover, most genes are split up into segments, known as exons, that are separated by long stretches of meaningless drivel.

    Kind of like the comments on slashdot!

    I suppose you could make it into a Katz joke too.
    -PYves

  • We're not even half way to a complete gene map. Honestly we only have an idea of a small amount of the genes in the human body and what they code for.

    You have to be very careful to distinguish the genome from a genetic map. The genome is a nucleotide series consisting of 4 bases (A, T, G & C) that takes into account everything that codes for each protein that makes up a person. It is made up of exons (which code for proteins) and introns (which are labelled as 'junk' DNA, but are really sequences not coding for any protein). There are about 3 billion bases in the human DNA sequence.

    A genetic map, however, is a map of all of the genes coded for in each chromosome. What scientists are able to do is to figure out where the protein is coded for in the gene and add it to a 'genetic map'.

    Venter and Collins announced they had produced a draft of the genome; nobody will even come close to a draft of the genetic map for many tens of years yet.

  • We tend to forget that Science is about measures, i.e., how things work based on what our current knowledge is. Unfortunately, we forget this and end up treating this like an absolute Law. China once closed itself off from the world, when it sent out explorers and found no other civilization to match itself. Several centuries later the gunboats came.
  • Uhhm, you might be more bananas than you think. It's the arrangement that has a bit more weight than the quantities.
  • Is this an attempt to make the whole genome publishing interesting again? The fact that there aren't so many genes (even if there are twice as many) clearly shows the important thing is what and how they do the voodoo that makes us. We used to be 99% chimpanzee and at least 40% bananas. I'm disappointed to find out I'm less bananas than I thought I was. It was always a good excuse...
  • 1) You're wrong on that one. For most of the genes, we do not even know what protein they encode. It can be calculated, but it takes several hours on a Cray supercomputer, and the calculation is usually wrong, because it makes certain assumptions.

    2) It's very difficult to find the actual offsets of genes, that's why that approach won't work.

    3) That is not true. Of course you are right, not every cell uses all genes, so there's always a large part of the genes that isn't used in a cell, but that is not what is meant by the term 'junk DNA'. Junk DNA is DNA that isn't part of a gene. Scientists still aren't sure if it's really junk, or does have some function (I personally dislike the term 'junk'), but it doesn't code for a gene. Current research suggests that in humans about 96% of the DNA is junk DNA. In other life forms, this figure is very different. There are animals without any junk DNA.

  • by flez ( 463418 )
    So I guess we really are quite a bit more complex than worms.
    Let's just forgo all the Lawyer/Management/NT Admin jokes, shall we?
  • Scientists still need to figure out what each gene does and how they interact before they can do something useful with the human genome.
  • One would hope that we don't hear the same article posted in 6 months, lest a Moore's Rule of Genetics come into play.

    Kudos, Scientists, for doubling your workload.

    (I personally would say more but I didn't take Genetics due to the 8am lab.)

  • Maybe the Human race has evolved since the first count was made?
  • by Anonymous DWord ( 466154 ) on Thursday July 12, 2001 @12:50PM (#88649) Homepage
    Personally I don't know any human Genes, although my friend had a hamster named Gene. It died, unfortunately. You'd think it would be harder to count the non-human Genes, but I guess that's far in the future. Didn't they count the mouse Genes a while back though?
  • Not necessarily. If you look at genes and proteins in a slightly different light - as the functions that the genetic program can work with - then knowing how many there are and where they're located can give you some idea of what the rest of the program is working with. And since, at least for now, we can't just search for function call equivalents, counting the functions is our best alternative.
  • I agree with everything you've said, but I'd still like to put a bit more muscle behind the 'DNA as code' analogy. Ignoring the role that the ambient cellular environment plays would be foolhardy, true. But so would ignoring the role that hardware plays in the interpretation of software. The thing that makes the DNA the interesting part is that cells are the same from organism to organism, whereas DNA varies. Similarly with hardware and software.

    It feels to me like your criticism could be leveled at a desire to understand a program by looking at the source code. "the actual branching, jumping and iteration happens in the ALU, not in the code, so the code is the wrong place to look," and so on.
  • by eraserbones ( 467297 ) on Thursday July 12, 2001 @01:11PM (#88652)
    Genes, I'll grant you, are the exciting bits of a chromosome, because they (generally) correspond to proteins that can be identified and detected. But, I'm not entirely clear on why they are the primary focus of genetic research.

    We all seem pretty comfortable discussing DNA as though it were computer code, so let's follow that metaphor a little further. If I point at a big mess of C code (say, a console app) and ask 'What does this code do?' an amateur might be tempted to scan it for printfs, puts's, and other 'output signifiers.' But really, if that's all you look at, you don't have a clue what the actual function of the code is. All those boring scanf's, if/thens and operators are really important.

    My rudimentary education in genetics has me convinced that DNA in a living cell has the ability (like C code) to switch, jump, branch, and (most importantly) operate recursively on its own resultant proteins. And yet, the DNA spans between genes are generally referred to as 'useless' or, in this case, 'meaningless drivel.' Am I missing something, or is this exactly where the good stuff is?

    And, viewed from this angle, isn't counting genes as pointless as counting KLOCs?
