A Genome Mark-up Language



Posted by Hemos on Friday January 12, 2001 @01:53AM from the it's-better-then-the-web! dept.

There's an interesting story running about the need for, and development of, a genetic mark-up language. It's called GEML (Gene Expression Mark-up Language) and is basically a DTD. Obviously, when working with things like genes, GEML is useful, and a good example of why DTD is muy bein.

This discussion has been archived. No new comments can be posted.


Re:CellML (Score:1)

by Anonymous Coward writes:

They are not for text formatting. HTML was _misused_ for text formatting - the original idea was that you tell the computer what the thing _is_ and the _computer_ formats it. Like a simplified LaTeX (if you didn't know, in LaTeX, you basically say "I'm writing a book now", "This is the first chapter", "This is a footnote", etc., and the computer decides the formatting. That is to say, where to put everything. Or are you confused as to what formatting means?). Then XML came along. Its main use is telling the computer, in an extensible manner, the meaning behind a piece of data - e.g. "This is a spec for a fireplace", "This is an address", "This is how you make a fruit fly". The computer can then take appropriate action based on that (currently via XSLT and CSS to format and present the data appropriately) - but it can do other things too. It's a way of processing semantic content. It's a step towards AI.
Pathetic research by the author. (Score:2)

by Anonymous Coward writes:

Mark Pesce ought to spend more time researching what he's writing about rather than plugging VRML. From the article:

The "reporter" tag defines a sequence of codons (the four amino acids that comprise DNA) -- TACAGTGTCAGAATTAACTGTAGTC --

Elementary Grade 9 biology here, Mark. A codon is a sequence of three nucleotides (ex: GCC) that is in turn translated into one of the 20 amino acids that constitute the building blocks of all our proteins. Don't just regurgitate what was in the press release!

Anyway, GEML is useless for real exchange and analysis of genetic information. For that purpose, I agree with a previous poster about packing 2 nucleotides per byte. It's an optimization that must be accepted as a standard before we can start doing on-demand heavy processing of genetic results.
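For illustration only, here is a minimal sketch of what packing two nucleotides per byte could look like. It assumes a 4-bit code per base, which leaves spare values for ambiguity symbols such as N. The code table and the function names `pack` and `unpack` are invented for this example and do not come from GEML or any published standard.

```python
# Hypothetical 4-bit codes; any fixed table of up to 15 non-zero values works.
CODES = {"A": 0x1, "C": 0x2, "G": 0x4, "T": 0x8, "N": 0xF}
SYMBOLS = {v: k for k, v in CODES.items()}


def pack(seq):
    """Pack a nucleotide string two bases per byte (high nibble first)."""
    out = bytearray()
    for i in range(0, len(seq), 2):
        hi = CODES[seq[i]]
        # A zero low nibble marks padding when the length is odd.
        lo = CODES[seq[i + 1]] if i + 1 < len(seq) else 0
        out.append((hi << 4) | lo)
    return bytes(out)


def unpack(data):
    """Reverse pack(); zero nibbles are padding and are skipped."""
    bases = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            if nibble:
                bases.append(SYMBOLS[nibble])
    return "".join(bases)


seq = "TACAGTGTCAGAATTAACTGTAGTC"  # the 25-base sequence quoted from the article
packed = pack(seq)
```

With this table the 25 bases take 13 bytes instead of 25. A 2-bit code would fit four bases per byte, at the cost of having no room for anything other than A, C, G and T.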
Re:It's a closed standard. (Score:2)

by Tim ( 686 ) writes:

"I would agree that bioxml servers as a much better licensing model for the community than GEML, its worth mentioning that at the current time they do not compete. GEML appears to be about gene expression, and bioxml has no DTD's addressing this."

True. I do think that bioxml's goal is the same as GEML, but they're just not as far along as GEML (yet). It's just bothersome to me that a company-owned and controlled format like GEML could become very prevalent. I would still much rather see something like bioxml succeed instead. I hope they don't give up because of this...
Re:DTDs shouldn't be forked - thats the point (Score:3)

by Tim ( 686 ) writes: <timr AT alumni DOT washington DOT edu> on Friday January 12, 2001 @10:56AM (#512253) Homepage

"there's absolutely no value in forking a DTD. Unless you think there was maybe some value in all of the "modifications" Netscape and Microsoft made to the HTML DTD, for a simple example - its the same in this case."

Apples and Oranges.

HTML is controlled by the w3c--a standards body more or less independent of any particular company. Sure, M$ and Netscape had a lot of pull on HTML, but they *should* have, given that they *were* the browser market for a long time.

In this case, we have a particular bioinformatics company graciously offering up their own "public domain" DTD as a standard for the rest of the industry (how generous). And a major scientific journal latching on to it. The only problem is, that same bioinformatics company must approve any and all changes to the "standard"! It would be the same if HTML were a copyrighted property of Netscape, Inc.

It would be nice if the bioinformatics community could organize and form its own XML standards body, a la the w3c. An agreed-upon standard is almost always better than a legislated standard.

Not the first... (Score:5)

by Tim ( 686 ) writes: <timr AT alumni DOT washington DOT edu> on Thursday January 11, 2001 @10:34PM (#512254) Homepage

The bioxml [bioxml.org] project has been trying to do this very thing for quite a while now. Previous to that, there was the biomolecular sequence markup language (BSML), and I don't think it ever came close to becoming a standard. The problem that these efforts always run into is the sheer diversity of opinion on how biological data should be represented. Molecular biologists and computational biologists can't even agree on the basic things, like how to represent sequence regions, let alone more complex issues, like annotation syntax.

Why Nature chose GEML as a standard is unclear--the article doesn't present a compelling argument for it over the alternatives, and the choice seems a little arbitrary. It'll be interesting to see what impact this has on the other projects, and how open the standard will be to extension and modification.

It's a closed standard. (Score:5)

by Tim ( 686 ) writes: <timr AT alumni DOT washington DOT edu> on Thursday January 11, 2001 @11:37PM (#512255) Homepage

From the GEML terms of use [geml.org]:

The GEML Format is a free, public-domain, open standard created and licensed by Rosetta Inpharmatics, Inc. ("Rosetta") in order to define a single, distinct format for handling gene expression data and avoid proliferation of incompatible variations.
...
You may not modify, lease, loan, sell, charge for, or create derivative works of the GEML Format or documentation without written permission from Rosetta.

So nobody can fork the standard without first consulting with Rosetta Inpharmatics. Wonderful. I just love their definition of "open standard."

This looks like another corporate-buddy move by a major scientific journal, much like the Science/Celera deal a few weeks back...

Go see bioxml [bioxml.org] for a truly open alternative.

Miguel's language (Score:1)

by Pseudonymus Bosch ( 3479 ) writes:

"we can all talk to Miguel de Icaza in his own language."

You mean object-oriented C?
Flamebait (Score:1)

by Pseudonymus Bosch ( 3479 ) writes:

"the United States of America, whose official language is English"

Really? Does it appear in the US Constitution?

And aren't Linux and Perl Slashdot-official? Should we limit ourselves to discussing these?
Mejor (Score:2)

by Pseudonymus Bosch ( 3479 ) writes:

You could also point out that the proper Spanish phrase is "muy bien".
muy bein? (Score:1)

by Kenyon ( 4231 ) writes:

WTF is "muy bein"? Haha.

From the it's-bad-grammar-time! dept. (Score:1)

by Mr Z ( 6791 ) writes:

Hemos, when comparing things, use than, not then. For instance, this article should've been from the "it's-better-than-the-web!" dept. The word then is used to describe a time sequence or other ordering, as in "first this, then that." The word than is used to compare things, as in "this is better than that." Got it?

*sigh*
--Joe
[OT] Grammar (Score:1)

by Mr Z ( 6791 ) writes:

That reminds me of this grammar puzzler. Add punctuation to the following to make it grammatically correct:

JIM HAD HAD HAD WHILE JOHN HAD HAD HAD HAD HAD HAD HAD HAD A BETTER EFFECT THAN HAD HAD HAD

--Joe
Re:From the it's-bad-grammar-time! dept. (Score:1)

by Mr Z ( 6791 ) writes:

Uh, no. Bad grammar bothers me.
--Joe
lots of "exceptions" to the coding rules (Score:2)

by peter303 ( 12292 ) writes:

The genome is much like human language: a fair amount of regularity plus a lot of special cases. In fact the latter throw off decoding robots, and you see statistics like 98% decoded, etc. The scientific papers are full of nifty exceptions to what was believed before. The markup language would have to be flexible enough to encode all the exceptions, perhaps as a procedural attachment.
GEML? Bah! Quadrary Encoding! (Score:3)

by weston ( 16146 ) writes: <westonsd@@@canncentral...org> on Thursday January 11, 2001 @09:19PM (#512264) Homepage

While all of this is fairly unreadable -- even by geneticists -- it is easily read by a computer

GEML? Hard to read? Bah! What we should *REALLY* do is figure out a quadrary (you know, after binary and trinary) encoding scheme for all the other info and just prepend it to the beginning of the amino acid sequence. Maybe even insert it at some points, with some sort of delimiting sequence, of course. None of this wimpy markup language stuff.

Re:standards are important esp. for biologists (Score:2)

by Star_Gazer ( 25473 ) writes:

Unfortunately, they often tend not to do that :(
At least life scientists do not.

Instead, they use (the much dreaded) Word and wonder why all their betas, gammas, indices etc. tend to always disappear at the wrong moment...

I once wrote a web application where people could submit an abstract for a congress on developmental neurobiology. I allowed for subsets of HTML or simplified LaTeX for text formatting. It was hell - even the brightest people in their field failed to understand the concepts. I believe I spent more time searching texts for missing tags or closing braces than on anything else...
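For illustration, the hunt for missing closing braces described above can be automated in a few lines. This is only a sketch under simple assumptions: submissions use plain `{` and `}` for grouping, escaped braces such as `\{` are not handled, and the function name `unbalanced_braces` is made up for this example.

```python
def unbalanced_braces(text):
    """Return the character offsets of braces that have no partner."""
    open_stack = []  # offsets of '{' still waiting for a '}'
    stray = []       # offsets of '}' that never had a matching '{'
    for pos, ch in enumerate(text):
        if ch == "{":
            open_stack.append(pos)
        elif ch == "}":
            if open_stack:
                open_stack.pop()
            else:
                stray.append(pos)
    return sorted(stray + open_stack)
```

An empty result means every brace is paired. Otherwise the offsets point at the places a submitter would need to fix.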
Re:exchange of genetic information [sorry =:-)] (Score:1)

by drenehtsral ( 29789 ) writes:

Hehheheh =:-) I prefer the _time honored_ method of exchanging genetic material =:-) [sorry, couldn't resist...]
Re:It's a closed standard. (Score:2)

by wowbagger ( 69688 ) writes:

The GEML Format is a free, public-domain[...]

You may not modify, lease, loan, sell, charge for, or create derivative works of the GEML[....]

It seems somebody doesn't understand the legal meaning of "public domain": that anybody can modify what is in the public domain, without restriction. That is why free software and Open Source Software AREN'T "public domain"!
Re:Sí! (Score:1)

by Tsujigiri ( 77400 ) writes:

Yes. That's true. But I'm studying Spanish and like to practice whenever I can.
Re:Muy Bein... wow (Score:1)

by Ristretto ( 79399 ) writes:

The previous poster suggests that the incorrect muy bein should be spelled muy bien. This is a correct spelling, but misses the grammatical error (hey, this is Slashdot!). bien is (generally) adverbial (meaning "well"), and since we're talking about a DTD, we want to use an adjective ("good"). In other words, the sentence should read "DTD is muy bueno."
What it looks like. (Score:1)

by TummyX ( 84871 ) writes:

<genes>
ttaacattgagctaacgataggatacgattacattgagctaacgatagga
tacgattacattgagctaacgataggatacgattacattgagctaacgat
</genes>
Article ignored what is already used! (Score:5)

by upstateguy ( 90019 ) writes: on Friday January 12, 2001 @03:47AM (#512271)

As a molecular geneticist, I am familiar with the markup languages that *already* exist for annotating genome sequences. Free software from NCBI even helps you format your sequences for submission to databases.
Sorry, I'm too lazy to annotate this myself :-):
Link to NCBI [nih.gov]
FASTA looks remarkably like the example given in the article.
Quicky description of FASTA (just one of many schemes, but one of the most popular and oldest). [cornell.edu]
Perhaps rather than writing a trendy article trying to get buzzwords like genomics and bioinformatics together with geek speak, he should have done a tad more research.
Not to say there can't be huge improvements and trying to show the interplay (temporally AND physically) between genes. But don't do a half-assed job by ignoring what has already been used for decades.
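To show how little machinery FASTA needs, here is a minimal reader, offered as a sketch rather than a full implementation. It relies only on the basic layout of the format: a header line starting with `>`, followed by one or more lines of sequence. The function name `parse_fasta` and the example records are invented for illustration.

```python
def parse_fasta(text):
    """Map each record's header line (without the '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines are simply concatenated under the current header.
            records[header] += line
    return records


example = """>seq1 hypothetical example record
ttaacattgagctaacgatagga
tacgattacattgagc
>seq2
GCC
"""
records = parse_fasta(example)
```

Real FASTA files carry database identifiers and descriptions in the header line; this sketch keeps the whole header as the key and does not interpret it.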

Abbreviations... (Score:1)

by ari_j ( 90255 ) writes:

GEML just sounds better to the kind of people who would be in charge of this kind of thing. bioxml has no capital letters, is half-pronounceable and half-gotta-be-spelled-out, etc. GEML is all capital letters, can be spelled out or pronounced as a whole, etc. I think that why they chose GEML as a standard is far from unclear; whether it was rational is another matter.
Re:Abbreviations... (Score:1)

by ari_j ( 90255 ) writes:

But those capital letters! And just because most Slashdotters can pronounce every imaginable acronym smoothly doesn't mean anything for other communities...Imagine being an average Joe and driving down the street, seeing a big sign with GEML on it and another one that just says 'bioxml'. I'd think...wow, those idiots don't even have the self-esteem to capitalize their own name, whereas these other guys...wow, they must have a great product if they can handle four capital letters.
RDF hasn't woken up yet. (Score:1)

by dingbat_hp ( 98241 ) writes:

"I don't see how a dead, unused (sorry, never was used, ever) standard like RDF is going to help."
Admittedly RDF hasn't been used much YET. After all - it's only a year since bog-standard XML took off. I'm a contractor; Dec '99 I couldn't sell XML skills to anyone, Jan 2000 my phone melted. By Easter 2000 everyone else was an XML "guru".
Wrox don't shift their first RDF book until October. You can't store production-grade quantities of RDF in a database yet. How can you say it's "past", when we haven't even finished building the infrastructure tools yet ?
OTOH, the one widely distributed RDF app that is out there (RSS) is even part of Slash. Take a look at those Slashboxes - they aren't running DocBook.
"Added to which you can employ namespaces to form compound documents from many schemas,"
That's just a quicker recipe for tag soup. The ability to have five different ways to express an author's address doesn't make it any easier to move data between applications or avoid "Dear Mr. Occupier" errors.
"It's the Semantics, Stupid"
"Look at DocBook, as an example - people have been able to use it for years without concern that the next revision would destroy their document semantics."
What document semantics ? DocBook doesn't do semantics, and it has a structure that thinks everything is a computer manual. A schema that has a <GUIMenuItem> element, but doesn't have a means of expressing a target readership age ? Rights management that's a bare copyright element with an implied recommendation to attach generated text of "All Rights Reserved" when you render it ? (What if the rights _aren't_ all being reserved ?)
DocBook is a pile of bodges and hacks, and I only use it because I don't know anything else that's out there, and I'm reluctant to roll my own and add another one to the pile.
DocBook is Perl for text documents; lots of "There's More Than One Way To Do It", and not a lot of "Done. Sorted.".
My current project (the next version of ARKive [arkive.org.uk]) is a huge graph of linked nodes, most of which are either text or rich-media. The directed nature of the graph blows plain XML out of the water - there's just no way to handle the referencing problem in XML; you're either fooling around with the inadequate ID & IDREF, or you do it through either XLink, or your own href attributes and lose support for any notion of document structure based on these links, unless you code it yourself at the application level. With RDF, I just talk to an API like Jena and when I make things related, they stay related (and the underlying engine will hand them back to me on demand, as whatever relevant fragment of the document I might need).
I am using DocBook to represent the text content nodes. It's not much more advanced than HTML though - I need a huge amount of markup on each node to select the appropriate set (what it refers to, what it says about it, whether it's written for 7 or 17 year olds) and I hold this trivially in RDF, with DocBook under a content property.
There's simply no way I could express this in DocBook alone. I could express it in DocBook with embedded LOM markup, and I could do that very easily just by namespacing two schemas as you suggest. The trouble with that approach though is that the only code that could ever make sense of it would be my own. With RDF, any RDF app (like the Redland app framework) can wander through it and make a pretty good use of it, even if it hasn't seen the documents before.
XML has no mechanism for a semantic schema. Attempting to use the structural schema it does have, as one, doesn't work well and it certainly doesn't travel.
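As a rough illustration of the model being described: RDF statements are subject-predicate-object triples, so selecting nodes by what they refer to and who they are written for becomes a pattern match over a graph. The sketch below uses plain Python tuples instead of a real RDF API such as Jena or Redland, and every node name, property name and value in it is invented.

```python
# Each statement is a (subject, predicate, object) triple. All names here are
# hypothetical; they are not taken from ARKive, Jena or any RDF vocabulary.
triples = {
    ("node:otter-intro", "about", "species:otter"),
    ("node:otter-intro", "readingAge", "7"),
    ("node:otter-intro", "content", "<para>Otters live near rivers.</para>"),
    ("node:otter-detail", "about", "species:otter"),
    ("node:otter-detail", "readingAge", "17"),
}


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Which nodes are about otters and written for 7 year olds?
about_otter = {s for s, _, _ in match(p="about", o="species:otter")}
for_sevens = {s for s, _, _ in match(p="readingAge", o="7")}
selected = about_otter & for_sevens
```

The DocBook text sits under a `content` property as an opaque value, while the selection metadata lives in the graph where any triple-aware tool can query it.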
Re:RDF hasn't woken up yet. (Score:1)

by dingbat_hp ( 98241 ) writes:

"I see your point, but semantics are never enforceable anyway."
Who cares ? If you're publishing the latest fat stock prices, then it's in the user's interests to get it right. Semantic publishing needs a reliable means of making them available to those who want them, it doesn't need to follow them up and enforce getting it right.
"you haven't told me how RDF gets around this"
Take a look at RDF Schema [w3.org].
Of course, semantics aren't enough on their own. It's not too useful to know where the "creator" value is in two schemas, if you can't distinguish between one's "author" and the other's "translator". This is where an ontological understanding is needed, and there's a couple of projects out there working on that too; DAML [daml.org] & OIL [ontoknowledge.org].
Re:No tool support, yet (Score:1)

by dingbat_hp ( 98241 ) writes:

"DTDs are going to be required for defining new XML grammars"
Rubbish. I haven't written a DTD in over 18 months. Tool support for Schemas is better than for DTDs, mainly because Schemas also use XML as their expression syntax and so it's trivial to build tools (often with XSLT) for them.
"Schemas are still brand new, and tool support is weak to nonexistent."
Schema has been a Candidate Recommendation since October. Maybe it's not signed off yet, but it's pretty stable and usable out in the "real world".
I thank M$oft for this one. Dropping early versions of XSL and Schema onto developers a long time ago put a rocket under the W3C. This might have ended badly, except M$oft then did something unusual for them and fell back into line with a developing standard. Credit where credit's due...
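A small sketch of the point about tooling: because a schema is itself an XML document, an ordinary XML parser can read it. The schema fragment below is hypothetical, its element names are borrowed loosely from this discussion, and the namespace URI is the one from the final XML Schema Recommendation rather than the Candidate Recommendation draft mentioned above.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # namespace of the final Recommendation

# Hypothetical schema fragment, invented for this example.
schema_text = f"""
<xs:schema xmlns:xs="{XS}">
  <xs:element name="reporter" type="xs:string"/>
  <xs:element name="pattern">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="feature" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

root = ET.fromstring(schema_text)
# Walk the schema like any other XML tree and collect the declared element names.
declared = [el.get("name") for el in root.iter(f"{{{XS}}}element")]
```

Doing the same for a DTD would need a separate parser for the DTD syntax, which is the tooling gap being argued about here.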
XML considered harmful (Score:2)

by dingbat_hp ( 98241 ) writes:

This is another example of What's Wrong With XML (and particularly, what's wrong with proliferating schemas all over the place).
A schema isn't a means of publishing your data to a wider audience, it's a means of locking-out everyone who doesn't have a copy of it.
Look at a real user of RDF [purl.org] for how to do this in a better way. XML is great, but the coupling between structure and semantics that comes from using an XML schema to represent both is a nightmare for interworking between teams that overlap, but aren't identical enough to use exactly the same schema.
A couple of years ago, we watched a bunch of old guys slaving over COBOL legacy conversion programs, desperately trying to suck the data out and into SQL, before Cinderella's glass computer turned back into the Y2K pumpkin. I don't want my future to turn into the same thing, scratching together n^2 XSL transforms to convert fooML into foo'ML.
yeeesh (Score:1)

by jrg ( 98378 ) writes:

a quote from the article:

"The 'reporter' tag defines a sequence of codons (the four amino acids that comprise DNA)"

sheeesh! can't they even get the basics right? a codon is a unit of three nucleotides that encodes a single amino acid (three of the 64 codons do not code for an amino acid; rather, they code for the translation stop signals).

four nucleotides comprise DNA. there are 20 amino acids.
this type of error is shameful.
james
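The numbers above (64 codons, 20 amino acids, 3 stop signals) can be checked with a short sketch of the standard genetic code. The table uses the usual compact layout, one amino-acid letter per codon in T, C, A, G order; the function name `translate` is just a label for this example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' marks a stop.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


stops = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
amino_acids = set(AMINO) - {"*"}
```

Here `CODON_TABLE["GCC"]` gives `A` (alanine), `stops` holds TAA, TAG and TGA, and `amino_acids` has 20 members.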
Re:why not? (Score:1)

by jrg ( 98378 ) writes:

"Any [sic] just why not is this amazing?"
because it's XML.
james
Re:Human Markup Language (Score:2)

by Shimbo ( 100005 ) writes:

It will have the additional benefit that you could do gene therapy by applying an XSL stylesheet in the transporter.
Oh dear, this is beginning to sound like a Voyager plot.
Re:GEML? Bah! Quadrary Encoding! (Score:1)

by Tom7 ( 102298 ) writes:

I think you mean "quaternary"?

What would be REALLY cool would be code for a protein which decompressed the rest of the stream.... ;)
Re:DTDs shouldn't be forked - thats the point (Score:1)

by rodentia ( 102779 ) writes:

There are lots of ways to extend and modify the behavior of an XML dialect and an associated DTD/schema without touching the core standard. That's the Xtensible part. They are merely holding veto power over back-propagation of enhancements into the original work.

The point of XML is to standardize the manner of extension. Even SGML allowed for internal subsets of markup declaration to extend the core DTD. The goal of such a standard is not to eliminate incompatibility but to minimize the pain of dealing with it.

Forking a DTD is like forking pudding, it doesn't do anything.
I forked up. (Score:1)

by rodentia ( 102779 ) writes:

Their license would appear to prohibit that which their chosen technology is intended to facilitate.
Very silly question, couldn't help, sorry (Score:1)

by bockman ( 104837 ) writes:

If GCC is a codon, what is bash ?
[Assume mandatory smiley here]
standards are important esp. for biologists (Score:5)

by myc ( 105406 ) writes: on Thursday January 11, 2001 @09:11PM (#512285)

Since classical genetics has been around for a lot longer than computers and ASCII, many classical genetic nomenclatures use notoriously ASCII-unfriendly symbols. For instance, as many of you know, Drosophila (fruit fly) geneticists can basically name genes anything they want to, and the nomenclature to denote specific mutant alleles of genes uses all sorts of evil things like subscripts, superscripts, Greek letters, etc. In short, it's just a total mess. Similarly, although yeast geneticists do have a standardized nomenclature, it's very ASCII-unfriendly, due to things like Greek letters, superscripts, subscripts, etc. Nomenclature for mammalian systems such as mouse and human is even worse: there is basically no standard. For instance, some gene names use all caps while others only capitalize the first letter, and some use the common three-letter convention plus a number (BMP1, BMP2, BMP3, etc.), while others use a Drosophila-type naming scheme (e.g., agouti and shaker are mouse mutant names). (There is some uniformity in the gene assignments of large sequencing projects, but those are just alphanumeric sequences and not very descriptive.)
Contrast this with a relatively more recent model genetic organism, the roundworm Caenorhabditis elegans. Standards were set early whereby all gene names were standardized on the basis of their phenotype (eat-4 is a worm with a mutant feeding behavior, unc-6 describes a worm with uncoordinated movement, lin-41 describes a mutant with abnormal cell lineage development, etc.) and are ASCII-friendly. As a result, C. elegans people have enjoyed standardized and searchable computerized gene databases for much longer than geneticists in other fields.
I hope that a standard becomes set and rapidly adopted; lab chiefs (to us grad student peons anyway) can often seem like PHBs in IT when it comes to adopting new methods and paradigms.
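To make the "searchable" point concrete, here is a small sketch. The pattern is an assumption inferred only from the three examples above (lowercase letters, a hyphen, a number); it is not the official C. elegans naming rule, and the variable names are invented.

```python
import re

# Assumed pattern, inferred only from eat-4, unc-6 and lin-41.
WORM_GENE = re.compile(r"[a-z]+-\d+")

names = ["eat-4", "unc-6", "lin-41", "BMP1", "agouti"]
matches = [n for n in names if WORM_GENE.fullmatch(n)]
```

A one-line pattern picks out the worm-style names, whereas names built from superscripts or Greek letters would first need some agreed plain-text transliteration.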

Re:Hmmm.... (Score:1)

by RoninM ( 105723 ) writes:

There's a bigger syntax error. The tag doesn't have any closure:

<body eyes="#00FF00" hair="#4F1F5F" height="74in" weight="175lb" crotchproperties="endowed"/>

It's probably not useful to express hair color as full RGB values, though.

You were being serious, right? Oh.
Re:No tool support, yet (Score:1)

by RoninM ( 105723 ) writes:

XML Schemas have the benefit of being written in XML. That should make XML Schema support fairly easy to manage. Of course, the parser has never been the hard part with XML.
Re:Abbreviations... (Score:1)

by RoninM ( 105723 ) writes:

I pronounce bioxml as by-ox-mil. The more I try to pronounce GEML as a word, the more it sounds like GML.
Always something for Perl to do next (Score:1)

by skazatmebaby ( 110364 ) writes:

Can't wait for Lincoln Stein's GEML.pm module, with handy shortcuts and image creating functions :)

I bet I can make a script that then creates a life form. aww yeah.
CellML (Score:2)

by alexburke ( 119254 ) writes:

From the Feed article [feedmag.com]:

GEML ISN'T alone. It has a competitor, another DTD known as CellML, used to define the complex interactions that take place within cells. CellML takes an integrated approach to describing all of the processes within a living cell -- its genes, proteins, enzymes, and chemical reactions, the pathways and connections between each part of the whole. CellML seems well suited to the kinds of work that supercomputers do -- creating simulations of incredibly complex systems -- while GEML only defines the genetics that create the cell.

Doesn't this seem a more apt way of describing a living organism? Sure, it's undoubtedly more complex and expensive (financially and computationally), but if you were to set an E10000 or Cray (or maybe a high-end Sun farm) to work on CellML, wouldn't it do more in less time than having to work everything out manually with GEML?

Re:Pathetic research by the author. (Score:1)

by spiro_killglance ( 121572 ) writes:

Yes, but only in binary transmitted files; text files would be uneditable because of all the control characters. But 3 nucleotides would pack easily into one base64 character (4^3 = 64 possible codons), so each character would match an amino acid or a stop signal.
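A sketch of that idea, for illustration: since there are 4^3 = 64 codons and 64 characters in the base64 alphabet, each codon can be written as one printable character. This borrows only the alphabet and is not the base64 byte encoding itself. The base ordering and the function names `encode` and `decode` are arbitrary choices made for this example.

```python
import string

# The 64-character alphabet used by standard base64.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
BASES = "ACGT"  # arbitrary ordering chosen for this sketch


def encode(dna):
    """Encode DNA three bases at a time; length must be a multiple of 3."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    chars = []
    for i in range(0, len(dna), 3):
        a, b, c = (BASES.index(x) for x in dna[i:i + 3])
        chars.append(ALPHABET[16 * a + 4 * b + c])
    return "".join(chars)


def decode(text):
    """Reverse encode(): one character back to three bases."""
    out = []
    for ch in text:
        n = ALPHABET.index(ch)
        out.append(BASES[n // 16] + BASES[(n // 4) % 4] + BASES[n % 4])
    return "".join(out)
```

The result stays editable as text and is a third of the original length, though it only works in one fixed reading frame and has no room for ambiguity codes.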
XSD? (Score:1)

by Scrymarch ( 124063 ) writes:

I'm not fully up on the XML scene, but aren't DTDs being replaced in the very near future by XSDs (XML Schema Definitions)? They at least are a dialect of XML, so to use XML you only have to learn one (easy) language.
Human Markup Language (Score:1)

by roman_mir ( 125474 ) writes:

over a year ago I described a human with XML tags for fun, something like this:
<XML>
<HUMAN GENDER="m/f">
<HEAD>
<BRAIN></BRAIN>
</HEAD>
<BODY></BODY>
<LEGS></LEGS>
</HUMAN>
</XML>

etc etc etc, maybe at some point null transportation technology will describe a human completely with his genetics, memory and personality with XML, and transport the person as energy over wireless media to put it all together at the other end.

Hopefully fast XSLT engines will exist by then and hopefully the whole thing will not be based on MS implementation of XML document.
Oh, but why? (Score:1)

by Red Pointy Tail ( 127601 ) writes:

Answer: Just in case we ever need to view our genome sequence on IE

And if the human genome has about 3 gig wouldn't wrapping quaint bits of information blow it up by quite a bit? sorry but the idea seems to rank on the same idiocy level as XML :)
Re:CellML (Score:1)

by mpesce ( 146930 ) writes:

Yes, CellML is a very nice way to describe all the processes within a cell. Also very important. But my understanding is that even with immense supercomputers (today) it still takes significant time to calculate something as banal and commonplace as protein folding. So CellML has its place (say for the AIDSResearch@Home project, in some future incarnation) but it's a bit too much information for geneticists. They'll both have their place. I should hope...
Re:Not the first... (Score:1)

by mpesce ( 146930 ) writes:

That's a good question. I don't know why Nature adopted GEML - it may be a case of a "tipping point" where enough geneticists and genomics firms finally realized that there was a need for a standard and some "cheerleader" got out there and started waving the GEML flag. If anyone knows the whys of this, I'd be interested to hear...
My mistake (Score:1)

by mpesce ( 146930 ) writes:

Yes, you're right, a codon is a sequence of 3 nucleotides. I should have used the words "base pairs" there instead.
Mea culpa.
And a closed standard ain't a bad thing... (Score:2)

by mpesce ( 146930 ) writes:

That's not a bad thing. Standards should not be arbitrarily pulled apart - particularly by competing commercial organizations (reference my XML article on FEED from a few years ago for points on this matter). The VRML97 ISO spec is "owned" by the Web3D consortium, precisely to make spec changes basically "illegal". Whatever that means.
Re:muy bein? (Score:1)

by slashdoter ( 151641 ) writes:

"Very well". It's Spanish. They use it to say good or very good.
DTDs shouldn't be forked - thats the point (Score:2)

by Ars-Fartsica ( 166957 ) writes:

Open standard or not - there's absolutely no value in forking a DTD. Unless you think there was maybe some value in all of the "modifications" Netscape and Microsoft made to the HTML DTD, for a simple example - it's the same in this case.
No tool support, yet (Score:2)

by Ars-Fartsica ( 166957 ) writes:

For the time being, DTDs are going to be required for defining new XML grammars - Schemas are still brand new, and tool support is weak to nonexistent.
DTDs will probably stick around in one form or another for the next few years - it's unfortunate that Schemas couldn't have been part of XML 1.0 - unfortunately the co-existence of DTDs and Schemas will cause code bloat as tools will basically need to support both.
Wake up, RDF is dead (Score:2)

by Ars-Fartsica ( 166957 ) writes:

"A schema isn't a means of publishing your data to a wider audience, it's a means of locking-out everyone who doesn't have a copy of it."
Are you telling me that someone who doesn't have my data doesn't have it? Your astounding conclusion seems to be some sort of convoluted identity function.
"Look at a real user of RDF for how to do this in a better way. XML is great, but the coupling between structure and semantics that comes from using an XML schema to represent both is a nightmare for interworking between teams that overlap, but aren't identical enough to use exactly the same schema."
No one is doubting that poorly implemented schemas will degrade productivity, but I don't see how a dead, unused (sorry, never was used, ever) standard like RDF is going to help. Added to which you can employ namespaces to form compound documents from many schemas, so your limitation doesn't exist in any case.
"A couple of years ago, we watched a bunch of old guys slaving over COBOL legacy conversion programs, desperately trying to suck the data out and into SQL, before Cinderella's glass computer turned back into the Y2K pumpkin. I don't want my future to turn into the same thing, scratching together n^2 XSL transforms to convert fooML into foo'ML."
You're vastly overestimating the dynamic nature of these schemas - this isn't the HTML DTD we're talking about. Look at DocBook, as an example - people have been able to use it for years without concern that the next revision would destroy their document semantics. Once again proof that a properly designed format weakens your counterarguments, and in any case, RDF isn't going to ever, EVER take off, so it's probably time to quit flogging it.
Re:RDF hasn't woken up yet. (Score:2)

by Ars-Fartsica ( 166957 ) writes:

I see your point, but semantics are never enforceable anyway. At the end of the day, if people want to take your document and completely invert your semantics, they are going to do it.
Added to which, you haven't told me how RDF gets around this, or are you saying that the issue should be avoided altogether?
Error Checking the Human Genome? (Score:1)

by theflattman ( 170915 ) writes:

It's nice that the genome has been "sequenced in its entirety" and is presently undergoing "error checking" which should "continue for the next year".
Last time I checked at NCBI [nih.gov] the genome was 30.4% finished, and the rough draft assembly is in 148307 pieces according to the golden path [ucsc.edu].
And of course the finished target for the human genome is three years from now!
Hello World (Score:1)

by ibirman ( 176167 ) writes:

So, how do you write Hello World (or its equivalent) in GEML?
HTML-like tags (Score:5)

by Fervent ( 178271 ) writes: on Thursday January 11, 2001 @09:10PM (#512306)

Insurance provider: Well Mr. Johnson, I'm afraid you have the <stupid person> tag.
Mr. Johnson: No!
Insurance provider: Yup. It's right between the <bald ugly-looking guy> tag and the <most likely to drink beer after finding out his wife gets fatter with age> tag.
Mr. Johnson: Oh God.
Insurance provider: I'm sorry.
Mr. Johnson: Is this hereditary? What can be done about my kids?
Insurance provider: Well, we can comment out the little buggers if we try. Some GScript may work to prevent them from passing the traits onto their children. Hell, we may even be able to use some Gava to touch up their faces so they won't be as ugly as you.
Mr. Johnson: And as for me?
Insurance provider: Your body is 2.0, Mr. Johnson. As far as we're concerned, no one supports you anymore.

incosequential reporting? (Score:1)

by metis ( 181789 ) writes:

The article mentions that the GEML fragment on display may be incomprehensible even to geneticists, but is readable by computers. It goes on to say that the value of GEML is in allowing computers to share data. I am confused, since XML either offers a verbose definition of computer data that even humans can understand, or allows human data (David is an Employee of IBM) to be expressed in a self-describing, computer-accessible form.

Since the genetic code is already digital, transforming it into something that computers can process seems rather pointless. What is wrong with AGTCTTCGADC? Making it verbose for humans is also not very useful, because what GEML seems to offer is very raw data, essentially a wrapper around raw sequences.

Maybe the issue is really hype. I.e. a clever gimmick to drive companies to share information by offering them bandwagons they can't refuse to climb?
Re:Hmmm.... (Score:1)

by stinkydog ( 191778 ) writes:

<GEML> <body eyes="#00FF00" hair="#4F1F5F" height="74in" weight="175lb" crotchproperties=endowed> </GEML>
You have a syntax error in 'crotchproperties', 'crotchproperties' set to "0"
Coming soon, MS GenomePage 2006, so you can really start screwing things up.
We are very closed to this. (Score:2)

by mentin ( 202456 ) writes:

We are really close to being able to modify the human genome.
From CNN [cnn.com]: Genetically modified monkey - named ANDi carries in him an extra bit of DNA from a jellyfish. ANDi is the first primate to be similarly modified.
See CNN story [cnn.com] for full details.
Re:standards are important esp. for biologists (Score:1)

by Phillip2 ( 203612 ) writes:

The problem with standards is that you cannot just declare them. They have to be built with community agreement. At the moment there are "standards" in biological information. Lots and lots of them. Yesterday, for instance, I was struggling with sequence file formats. I can think of at least 15 different formats, all slightly different.
The scope of GEML seems quite limited. It's about gene expression data, which is currently very sexy. It's also been licensed in a fairly restrictive manner. Not the way to go, if you ask me.
Phil
Re:It's a closed standard. (Score:1)

by Phillip2 ( 203612 ) writes:

Incidentally, can anyone find any statement from Nature about this? I can't!
Phil
Re:Oh, but why? (Score:1)

by Phillip2 ( 203612 ) writes:

"sorry but the idea seems to rank on the same idiocy level as XML "
If you can not see the value of structuring data into a format which is easily parsable, and whose semantics are formally defined in a standard format, then I fear that your own idiocy level is fairly high.
XML is potentially about a lot more than viewing web pages.
GEML incidentally is pretty much useless for viewing genome sequences. Whilst this is no doubt mainly the fault of the bloke who wrote the article for getting it totally wrong, GEML is not designed to represent genomic information, but gene expression data. Two very different things.
Phil
Re:It's a closed standard. (Score:2)

by Phillip2 ( 203612 ) writes:

"Go see bioxml for a truly open alternative."
While I would agree that bioxml serves as a much better licensing model for the community than GEML, it's worth mentioning that at the current time they do not compete. GEML appears to be about gene expression, and bioxml has no DTDs addressing this.
As for Nature, well, I expect that their publishers are worried. Sooner or later paper journals are going to disappear. Perhaps they are diversifying, and have a stake in the company. This is not necessarily a problem. Even Nature does not have the power to make a standard.
Phil
Re:Article ignored what is already used! (Score:2)

by Phillip2 ( 203612 ) writes:

The problem with most of the markup languages used in biology is that they are simple schemes with a two-letter code at the beginning of each line. They tend to be very unexpressive as there are no relations between the tags (a line is one thing or another, and each line is independent of the last). The main problem with this unexpressivity is that it means "all the biology is in the comment field", or in other words unstructured free text. To extract this information in a machine-readable way, you get straight into natural (or, as this is biology, fairly unnatural) language parsing, and hit the same brick wall that AI has for the last 30 years.
I agree that the article linked to is half-assed, and badly researched. But the sad fact is that most of the database formats in existence also seem to be fairly half-assed. I think that XML might help us to get around some of these problems.
Phil
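To make the "all the biology is in the comment field" complaint concrete, here is a minimal sketch of parsing a flat record with two-letter line codes. The record, its codes, and the `parse_flat` helper are hypothetical, written only in the general style the comment describes; they are not copied from any real database format.

```python
from collections import defaultdict

# Hypothetical record: a two-letter code, some padding, then the value.
RECORD = """\
ID   EXAMPLE1
DE   Hypothetical protein
CC   Binds DNA; expressed in liver after heat shock.
SQ   TACAGTGTC
"""


def parse_flat(text):
    """Group line values by their two-letter code, ignoring blank lines."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if len(line) < 2:
            continue
        code, value = line[:2], line[2:].strip()
        fields[code].append(value)
    return dict(fields)


parsed = parse_flat(RECORD)
# The parser recovers the fields, but the interesting facts (what the
# protein binds, where and when it is expressed) are still one opaque string.
print(parsed["CC"])
```

Splitting on line codes is trivial; nothing in the format says how the pieces inside the `CC` string relate to each other, which is the gap nested markup is meant to close.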
Re:My mistake (Score:1)

by anichan ( 205614 ) writes:

Yes, you're right, a codon is a 3 amino acid sequence. I should have used the words "base pairs" there instead.
Actually, a codon is just what they had "TACAGTGTCAGAATTAACTGTAGTC". A tri-base codon is special and is called a "triplet codon". That is TAC, etc...
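For reference, the definition given earlier in the thread is that a codon is a sequence of three nucleotides. A minimal sketch of reading the article's string that way follows; the `codons` helper is a name made up for this example, and the reading frame is assumed to start at the first base.

```python
def codons(seq):
    """Split seq into consecutive 3-base triplets plus any leftover bases."""
    full = len(seq) - len(seq) % 3
    triplets = [seq[i:i + 3] for i in range(0, full, 3)]
    return triplets, seq[full:]


# The 25-base string quoted from the article.
triplets, leftover = codons("TACAGTGTCAGAATTAACTGTAGTC")
print(triplets, leftover)
```

Under that assumption the 25-base string yields eight complete triplets with one base left over, so it is not a whole number of codons in this frame.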
Re:num1 (Score:1)

by jobber-d ( 225767 ) writes:

pure luck my friend :P id like to give props to pieceofshit and anyone else who knows me
Re:Pathetic research by the author. (Score:1)

by John Sullivan ( 234934 ) writes:

For that purpose, I agree with a previous poster about packing 2 nucleotides per byte. It's an optimization that must be accepted as a standard before we can start doing on-demand heavy processing of genetic results.

There being four possible nucleotides (unless you're looking at something real exotic) surely you can get 4 per byte? Sticking to a base64 ascii encoding you can still get 3 nucleotides, so a single codon, per character, which is possibly a more elegant optimization.
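The arithmetic behind this: four symbols need 2 bits each, so 4 fit in an 8-bit byte, and since 4^3 = 64, a single base64 character has exactly enough values for one three-base codon. A minimal sketch of the 2-bit packing, assuming only the four plain bases A, C, G and T (the `pack` and `unpack` names and the bit assignment are choices made for this example):

```python
# Arbitrary but fixed 2-bit code for each base.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so unpack() can read it uniformly.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data, length):
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


seq = "TACAGTGTCAGAATTAACTGTAGTC"
packed = pack(seq)
print(len(seq), "bases ->", len(packed), "bytes")
```

The original length has to be stored alongside the bytes, because the padding in the last byte is indistinguishable from a run of `A`s.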
Anyway, this shouldn't be necessary and goes against the XML philosophy. Although humans on the whole aren't meant to read XML directly (computers should be doing that), it should always remain *possible* to do so, and I think this would muddy the human-eye view somewhat. It is accepted (by the people setting the standards) that this results in a larger raw stream, but that the correct way of dealing with that is to layer XML over storage-level and transport-level compression schemes to recover some of the entropy wastage. See REC-xml [w3.org], section 1.1, and points 3 and 5 of XML in 10 points [w3.org].

Heavy processing won't be done directly on markup - it'll be done on the in-memory representation after the markup is loaded, which can be assumed to be more compact than the markup if required (or less compact if there is a neat time/space tradeoff in the processing.)
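A rough illustration of layering compression under verbose markup, using Python's standard `zlib` module. The `<base value="..."/>` element and the `wrap` helper are hypothetical, chosen to be deliberately wasteful; the repeated input is artificial, so the ratio printed here says nothing about real genomic data.

```python
import zlib


def wrap(seq):
    """Wrap every base in its own (hypothetical) XML element."""
    return "".join(f'<base value="{b}"/>' for b in seq).encode("ascii")


# Artificially repetitive input, used only to keep the example small.
seq = "TACAGTGTCAGAATTAACTGTAGTC" * 40
raw = wrap(seq)
compressed = zlib.compress(raw, 9)

print(len(raw), "bytes of markup ->", len(compressed), "bytes compressed")
# The markup is recovered byte for byte, so readability is not lost.
restored = zlib.decompress(compressed)
```

The markup stays human-readable at the layer where people look at it, while the repeated tag text costs very little at the storage or transport layer.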
Re:num1 (Score:1)

by lazy_playboy ( 236084 ) writes:

that is pathetic
Hmmm.... (Score:2)

by Calle Ballz ( 238584 ) writes:

<GEML> <body eyes="#00FF00" hair="#4F1F5F" height="74in" weight="175lb" crotchproperties=endowed> </GEML>
Re:Oh, but why? (Score:1)

by DoctorPraetorious ( 263178 ) writes:

Yeah, the human genome is several gigs; but the vast majority of it isn't "coding." If you were going to present the whole human genome (as opposed to, more realistically, a short sequence of particular interest to your research) you'd be able to convey a LOT more information by presenting the 1% of it that codes for amino acids, along with markups to provide links to crystal structures of the proteins, little sub-charts showing the frequency of medically relevant site specific substitutions (recall, there isn't _a_ human genome, there are many different ones) and so on and so forth. Yeah, it might blow up past the size of the raw genome, but it would contain actually useful information.
That said I can't think of any features you could want in such a language you can't do with just old html. Shrug.

UCSC Molecular Biology
Re:Always something for Perl to do next (Score:1)

by airuck ( 300354 ) writes:

There are already great Perl tools for bioinformatics; check out www.bioperl.org
My life is complete (Score:1)

by booser108 ( 302999 ) writes:

I have my Gene Expression Mark-up Language, my HTML, and XML. I can express any form of text in the world. I can now die happy.
One word (Score:1)

by fortunetroll ( 303786 ) writes:

GeNeTeX

On Monday mornings I am dedicated to the proposition that all men are created jerks. -- H. Allen Smith, "Let the Crabgrass Grow"
Re:Goes to show... (Score:1)

by fortunetroll ( 303786 ) writes:

Yeah and CaML. Wonder what sort of genes that has...

Re:muy bein? (Score:1)

by fortunetroll ( 303786 ) writes:

The Span shell.

But then again, WTF is "muy bein"?

Re:GEML? Bah! Quadrary Encoding! (Score:1)

by fortunetroll ( 303786 ) writes:

I think we should encode HTML documents into the genes of living people so that they can go travelling and when they get to the destination they can be re-assembled with all the other people-fragments and viewed with some sort of CAT-scan-browser. HTTP-over-Humans.
And if you want better bandwidth, you can sign up for Broad-band: Only Females will be used to transport the data.

Re:EMCAScript and backwards compatibility - with D (Score:1)

by fortunetroll ( 303786 ) writes:

What kind of nonsense is this? Everyone knows that we will use the safe, reliable Microsoft standard GEML to encode our genes that safely and reliably allow us to live. We wouldn't have it any other way!

Re:CellML (Score:1)

by fortunetroll ( 303786 ) writes:

Actually it makes me wonder: Instead of literate programming, are we getting programmable literature? I mean come on, why are these Markup Languages being used so damn widely in so many damn silly ways. They're for Text Formatting, not for making complex simulations! Next we'll be seeing editorial-based programming instead of functional or object-based. People!

Re:HTML-like tags (Score:2)

by fortunetroll ( 303786 ) writes:

So long as my child doesn't turn into a Javascript popup window.

And then there's the parallel between reproduction of the species and that damn close-browser-window-makes-more-windows-popup trick that some sites pull on you. And I don't mean the fact that it's usually a porn site that does it.

Re:standards are important esp. for biologists (Score:2)

by fortunetroll ( 303786 ) writes:

This is why scientists write documents in LaTeX, not ASCII.

Goes to show... (Score:1)

by bummerdude ( 304218 ) writes:

Well...
Muy Bein... wow (Score:2)

by tlipcon ( 304220 ) writes:

Yet another slashdot spelling mistake... If you're going to try to be witty and use other languages to try to increase people's perception of your intelligence or chic-ness, at least do it right. And this is a first post- MY first post, not the story's first post...

Re:No tool support, yet (Score:1)

by methylamine ( 304683 ) writes:

MSXML3.DLL supports both old-style and new-style schemas; ditto with XSL (the "/1999" and "/TR" versions). Sucks that tools will still support DTD's; I hate writing them.
