
XML for Ancients 118
Andrew writes: "More than 5,000 years ago, the very first information revolution occurred when some unknown research team in Mesopotamia found a way to download and store language through a killer application called 'writing.' The cuneiform digital library will have 60,000 texts ready in a couple of years, using SVG and XML to represent their documents. Similar efforts are underway for hieroglyphics."
Slightly off topic..... (Score:3, Interesting)
Apart from signing my name on credit card chits, the only time I am required to write is for birthday/Christmas and other assorted cards. It's getting so bad now that I start to write a long word and just give up. My once pristine handwriting now looks like a doctor's prescription scrawl.
Anyone else get this too?
Po
Re:Slightly off topic..... (Score:1)
Besides, conversational language is not required to have correct grammar or, indeed, spelling.
I can additionally converse just as badly in another four languages. How many can you?
Regards,
Po
Re:Slightly off topic..... (Score:1)
Re:Slightly off topic..... (Score:1)
Hehehe you spelled your first word wrongly
Regards,
Po
Re:Slightly off topic..... (Score:1)
Po
Re:Slightly off topic..... (Score:1)
Blah blah blah, DTP, CAD and IT since 1993 blah blah blah
Interesting you should mention that; recently I discovered that I could no longer write in script correctly. It was as if the ability just fell out of my head. I utterly un-knew it.
Now I was taught cursive script by nuns, who were not gentle about teaching the elegance of script. So I was stunned and scared. I had to secretly (lest anyone realize how ridiculous this looked) re-learn the subtleties of upper and lower case script writing by actually practicing worksheet style... very, very embarrassing...
Re:Slightly off topic..... (Score:1)
Re:Slightly off topic..... (Score:1)
After 8 years as a developer, my handwriting is fine (well, my printing... I never really was one for cursive.) After four years in the Navy, though, while all my other handwriting skills remained more or less consistent, my signature went from something readable to an almost completely illegible scrawl. At about the same time, the exact same thing happened to my wife's signature - she was working as a social worker in a nursing home, and signing something every five minutes.
I don't think either one of us could actually produce a readable signature anymore even if we tried.
Re:Slightly off topic..... (Score:1)
Re:Slightly off topic..... (Score:1)
Hmm...that was pointless..
Cheers,
jw
Is access going to be free? (Score:2, Insightful)
Site appears to be slash-dotted already...
So.. Are these 5000 year old documents going to be freely available or will the database of texts be copyrighted/restricted?
Re:Is access going to be free? (Score:1, Offtopic)
But when I click on the link anyway, the site loads with no problem. This is the rule, not the exception. The number of times I can't get to a link from Slashdot is surprisingly low.
Re:Is access going to be free? (Score:1, Insightful)
Re:Is access going to be free? (Score:1)
Actually, that is not true. I can testify to that, since I'm one of the victims of the so-called "Transparent Proxy". The only thing transparent about it is that you don't have to configure your browser to use it. Also, you have no option about NOT using it. So we have problems trying to check if a site is up, or if the proxy server overloads. Or even if it crashes.
I, for one, am totally against these monsters.
Re:Is access going to be free? (Score:2)
Why do I always see people saying things like: "Slashdotted already! What a pity... It should have been cached."
But when I click on the link anyway, the site loads with no problem. This is the rule, not the exception. The number of times I can't get to a link from Slashdot is surprisingly low.
That's because those people are the ones who do the actual slashdotting. Usually by the time normal people like you and me click on the link, somebody at the other end has noticed that their site is down due to a DBS (Denial by Slashdot) attack and has set up a couple of mirrors that future requests can be redirected to. After all, it's not like somebody would lie about a thing like that.
Copyright is 70 years on books 8) (Score:2, Informative)
The case will happen if you ask for the translation (What, you are not Cuneiform literate? Talk about education 8)
Sonny Bono Act (Score:2)
Copyright is 70 years on books
No, 95 years on all works first published on or after January 1, 1923. See also Sonny Bono Copyright Term Extension Act [everything2.com]. And it'll get even longer before 2020 as Di$ney frantically bribes Congre$$ to pass yet another corporate-welfare copyright extension.
The case will happen if you ask for the translation
Re:Is access going to be free? (Score:1)
If you can read cuneiform, you have access. If you can't, you'd better start learning. This is not a project for non-professionals - like Linux people, epigraphers would tell you to RTFM before you complain about not understanding what is written.
Will we have to revise unicode? (Score:2, Interesting)
Actually... (Score:4, Interesting)
The current version (3.1) of the Unicode Standard, developed by the Unicode Consortium, assigns a unique identifier to each of 94,140 characters
Re:Will we have to revise unicode? (Score:2)
Re:Will we have to revise unicode? (Score:2, Informative)
The main reason seems to be that in East Asia, there are reduced character sets in daily use which contain only a couple of hundred or thousand glyphs, but to read and study classical texts, the number required quickly goes up into the tens of thousands, for each of a number of languages. Not having these glyphs in the Unicode set would be like asking English-speakers to use alphabets reduced by five or six characters (M and N are similar, X, Q, C and Z could be replaced by one character as well) and dictionaries from which three out of four words have been deleted due to redundancy or age.
The reason for this mis-design, the article argues, is political: the nationalities in question have never been asked how many characters they would need together -- for each single language, Chinese, Korean, or Japanese, a scholar would say "Sure! 50,000 characters is enough for us!"
Re:Will we have to revise unicode? (Score:1)
This is certainly a true statement, but it gets at a basic engineering tradeoff: performance versus inclusiveness.
Total inclusiveness isn't desirable for two reasons.
a) When it comes to dead languages, you have scholars who make their living arguing over fine points pertaining thereto, thus making a 'standard' a moving target. Attempts at total inclusiveness are an exercise in windmill jousting.
b) Even in a "broadband for all my friends" environment, the market (where the loot is) favors svelte technologies.
Prediction: the market partitions itself with the low end covered by Unicode, and more exotic technologies to favor the scholarly crowd.
English works just fine with only 18 letters (Score:2)
Not having these glyphs in the Unicode set would be like asking English-speakers to use alphabets reduced by five or six characters (M and N are similar, X, Q, C and Z could be replaced by one character as well)
Spelling reform. China (outside Taiwan) has had it. It's perfectly possible to write English with only 18 letters [everything2.com].
and dictionaries from which three out of four words have been deleted due to redundancy or age
So? Desk dictionaries aren't nearly as comprehensive as Oxford English Dictionary or even the unabridged Webster's Third New International Dictionary.
Re:Will we have to revise unicode? (Score:1)
Re:Will we have to revise unicode? (Score:2)
The author of that article doesn't seem to understand the fact that Unicode is a character set, not a font. He also doesn't seem to understand how Unicode's surrogate pairs work (which allow for encoding of more than 1 million characters). He doesn't seem to understand that Unicode is an evolving standard (i.e., 3.1 is hardly the final version). And he doesn't seem to understand that UTF-8, UTF-16, UTF-32, etc. are all just different formats, and they actually represent the exact same character set.
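For anyone who wants to see the surrogate-pair and "same character set, different formats" points concretely, here is a minimal Python sketch. It is an illustration only: the helper name `to_surrogate_pair` is made up for this example, and the sample code point U+12000 sits in a block that later Unicode versions assigned to cuneiform, so it is not something Unicode 3.1 itself defines.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use surrogate pairs")
    offset = code_point - 0x10000       # 20-bit offset into the supplementary range
    high = 0xD800 + (offset >> 10)      # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits select the low surrogate
    return high, low


# 1,024 high surrogates times 1,024 low surrogates: the "more than 1 million" extra slots.
SUPPLEMENTARY_CODE_POINTS = 0x400 * 0x400

# One abstract character, three encoding forms (sample code point is an assumption
# chosen for illustration, see the note above).
sample = chr(0x12000)
encodings = {name: sample.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}

# Decoding each form gives back the identical character.
round_trips = {name: data.decode(name) for name, data in encodings.items()}
```

The byte sequences in `encodings` differ in length and content, but every entry in `round_trips` is the same single character, which is the sense in which the UTF forms share one character set.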
But most importantly, he is flat-out wrong about how and why the decisions were made regarding encoding of East Asian languages. He needs to learn about the history of Han unification for CJK characters. If he did, he would know that linguists and computer scientists from East Asian countries have been involved in Unicode since the beginning. The unification of East Asian characters was done on purpose, and has the full support of linguists, scholars, and computer scientists from those countries.
If the author of that article had just spent a few minutes reading a copy of The Unicode Standard, he would not have made those mistakes. He didn't even have to read the whole thing! Just the Introduction and Appendix A would have set him straight on the issues I just mentioned. The fact that he didn't means this guy really shouldn't be doing work for a company with the word "Research" in the title.
Oh, and even though that page says the article has not been modified since June 4, you can see from the google cache [google.com] that they have since removed their promise of responding to criticism.
And one more thing: Since he derides those mean old Westerners on the Unicode committee for being insensitive towards the peoples of East Asian countries, perhaps he should ask himself if it is considered impolite or insensitive to sweepingly refer to such peoples as "Oriental", which he does in the first few paragraphs.
Re:Will we have to revise unicode? (Score:2)
UTF-8 is actually perfectly sufficient for 32-bit characters. (And you meant UCS-4; UTF-n is an encoding built from n-bit code units, possibly several per character, while UCS-2 and UCS-4 are the fixed-width 2-octet and 4-octet forms of the character set.)
"Face on Mars" like thing (Score:1)
Wonder how long it will be before someone finds something interesting here, and how long it will take to "doctor" it?
Alternately, how long will it take for someone to fake something.
XML, Writing and Jabber (Score:2, Interesting)
They're using XML? They could integrate this with some sort of retrieval language and couple it with Jabber [jabber.org] clients. That way you could send some sort of command-line search/retrieval command to the database using a regular Jabber client and have the XML data sent back, since Jabber natively supports the standard.
Re:XML, Writing and Jabber (Score:2, Informative)
... (Score:4, Funny)
It appears that... (Score:5, Funny)
Cuneiform writing (Score:5, Informative)
[smile]
Scientific American [sciam.com] has this article on Information Technology, 2500 B.C. [sciam.com] on what life was like for the information worker of that day.
As many as half a million cuneiform tablets, hand size up to book-page size, are now available around the world. Surely many more are waiting to be found. Those samples are of every quality: once prized accounts and receipts, schoolboys' lessons, litigation profound or droll, literary essays, erotica, mathematics--and entire ancient epics, centuries older than Father Abraham's. A mostly unread treasury, comprising the equivalent of tens of thousands of large printed volumes.
Looks like there could be a lot of fun and good stuff there.
First case of poor infrastructure planning... (Score:5, Funny)
-- William "Scorpion King" Gates
Re:First case of poor infrastructure planning... (Score:1)
Wow (Score:2, Funny)
Sooo... this project has been going on for about 5,000 years, they're finally going to be making a large release in a few years, and we're *JUST NOW* hearing about this?
My *god*, talk about keeping the PR lid on tight!
XML? Thank god it's not MS Word (Score:1, Funny)
Oh, what the hell.
Micro$oft sucks.
XML is a poor choice for cuneiform (Score:5, Funny)
Re:XML is a poor choice for cuneiform (Score:1, Redundant)
Run it through an XSLT transformation... Voila... HTML or PDF representing the cuneiform document. (Do texts written in cuneiform qualify as documents?!?)
Jeremy
all bound for mu-mu land (Score:4, Insightful)
The cuneiforms are justified and ancient.
and well formed.
XML is gonna rock you.
XML Hieroglyphics (Score:2, Funny)
No, really.
Should story links also have [url] notation? (Score:2, Funny)
XML Overrated? (Score:2)
Correct me if I'm wrong, but what is XML doing that some homegrown solution couldn't? Obviously clients would have to know the protocol, but with XML that is also the case.
I use XML all the time, mainly because of XSLT, but I think it's less functional and more hype. Feel free to enlighten me.
Protocol implementation (Score:2, Informative)
There's other advantages, but that's a big one.
Re:XML Overrated? (Score:2, Interesting)
Taking a quote from the hieroglyphics link [univ-paris8.fr] (can't comment on the cuneiform link as it's
Of course, as with any use of XML, you could do it with a 'homegrown' solution, but the point is that using XML gives you a well known (and well supported) framework which everyone can standardise on. (And yes I know the XML in the example is malformed
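As a rough illustration of the "well supported framework" point, here is a small Python sketch that reads a tablet transcription with nothing but the standard library's XML parser. The markup is hypothetical: the element and attribute names (`tablet`, `line`, `sign`, `n`) and the placeholder sign values are invented for this example and are not taken from either project's actual format.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, invented for illustration only.
SAMPLE = """<tablet id="example-1">
  <line n="1"><sign>A</sign><sign>B</sign></line>
  <line n="2"><sign>C</sign></line>
</tablet>"""


def signs_per_line(xml_text):
    """Map each line number to the list of sign values on that line."""
    root = ET.fromstring(xml_text)
    return {
        line.get("n"): [sign.text for sign in line.findall("sign")]
        for line in root.findall("line")
    }


def is_well_formed(xml_text):
    """Return True if a stock XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


lines = signs_per_line(SAMPLE)
```

No custom tokenizer or grammar is needed here, and the same off-the-shelf parser also rejects malformed input, which is the kind of shared tooling a homegrown format would have to rebuild.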
Re:XML Overrated? (Score:1)
Or, because XML is increasingly used in other applications, hence interoperability is not only high right now, but is also getting higher?
But perhaps it is because XML is very well suited to representing diverse forms of data.
I dunno..
Pishaw! (Score:1)
Talk about dead projects. I mean, freshmeat has nothing on these guys. 5,000 years, and how many upgrades? I'm STILL using writing 1.0, for chrissakes, not because it's better, but because there are no other versions!
Time to upgrade (Score:1, Funny)
While there is still some support for all sub-releases of version 3, I suggest you upgrade to the latest release (3.1.27 - 'joined up alphanumeric').
Of course there has been some criticism of the 'open source' nature of the writing project with claims that it leads to too many active branches (most notably with interoperability issues with the popular 'Chinese', 'Arabic' and 'Roman' branches).
Maybe a loop here ... (Score:1)
I mean, just about the same as today's computers...
Maybe we could try and feed the Enki story to a computer...
Missing marks (Score:2)
Consider, for example, the carry dots that some people use to add up numbers. Dots and things like that in the text may well uncover the way that calculations were done.
Re:Missing marks (Score:1)
Re:Missing marks (Score:2)
Interestingly, the developers of cuneiform also developed the first envelopes. The main message was kiln-fired and then wrapped with a new layer of clay, the address incised and the result merely air-dried. The recipient then gave the lot a crack against a nearby stone and brushed away the *envelope* to read his mail.
Who supports SVG? (Score:2)
Re:Who supports SVG? (Score:1)
Adobe has a plug-in [adobe.com] for IE and many nice SVG demos [adobe.com]. Unfortunately the plug-in is not integrated into IE, so you have to download it.
IE directly supports VML (try it here [microsoft.com] if you are using IE), which does more or less the same as SVG except that it's older, not standardized, and only supported by Microsoft.
Re:Who supports SVG? (Score:1)
Also, there's Batik, which is a Java-based SVG viewer plus some other tools.
VML does much less than SVG; it's pretty primitive in comparison. And it seems to have stagnated -- MS hasn't updated their support for it in IE for a long time.
SVG is XML (Score:1, Troll)
XML for Ancients? (Score:5, Funny)
"XML for Mummies"
At least in this case when you see the reviews "this book will put you to sleep" it really doesn't matter.
ICE (Score:2, Troll)
There is an initiative for almost every ancient language that is known (and decipherable). I'm sure digging thru xml.org will turn up a bounty of results =]
Data and Ceramics (Score:1)
Weren't the old drum storage systems used in the 1960s a ceramic structure coated with a magnetic surface? And that was an improvement on those birch bark 80 column cards.
But now we have advanced ceramics used in various other electronic media. And we measure our mean time between failure in hours.
So, how far have we really come in the last 5,000 years? They had fire and clay and their data remains readable after 5,000 years. We have lightning and clay and can't read data from 15 years ago and hard drives can fail in a flash.
Why aren't we planning storage and retrieval systems that can last thousands of years? Is it because our technical culture only values the last 2 to 3 years? How will we answer to our children when they can't figure out what we did 25 or 50 years from now? And I don't think we can blame it all as a planned obsolescence feature of Microsoft...well, maybe not all of it!
ohhhh baby (Score:1)
How Modern (Score:2)
Re:How Modern (Score:2)
Then I can write a washing bill in Babylonic cuneiform
But it still won't help you learn about Caractacus's uniform. You've got to keep these things in perspective.
obvious question ... (Score:1)
Obvious answer (Score:2)
"First post"
- Scott