Physicists Discover Evolutionary Laws of Language
Hugh Pickens writes "Christopher Shea writes in the WSJ that physicists studying Google's massive collection of scanned books claim to have identified universal laws governing the birth, life course and death of words, marking an advance in a new field dubbed 'Culturomics': the application of data-crunching to subjects typically considered part of the humanities. Published in Science, their paper gives the best-yet estimate of the true number of words in English: a million, far more than any dictionary has recorded (the 2002 Webster's Third New International Dictionary has 348,000), with more than half of the language considered 'dark matter' that has evaded standard dictionaries (PDF). The paper tracked word usage through time (each year, for instance, 1% of the world's English-speaking population switches from 'sneaked' to 'snuck') and found that English continues to grow at a rate of 8,500 new words a year. However, the growth rate is slowing, partly because the language is already so rich that the 'marginal utility' of new words is declining. Another discovery is that the death rate for words is rising, largely as a matter of homogenization as regional words disappear and spell-checking programs and vigilant copy editors choke off the chaotic variety of words much more quickly, in effect speeding up the natural selection of words. The authors also identified a universal 'tipping point' in the life cycle of new words: roughly 30 to 50 years after their birth, words either enter the long-term lexicon or tumble off a cliff into disuse and go '23 skidoo' as children either accept or reject their parents' coinages."
Scrabble (Score:5, Informative)
Anyone who has played Scrabble (especially against a computer) knows that there are tons of words out there that no one has ever heard of, most of which you can't even find a definition for. What the hell is a Qi? I don't know, but I can get 66 points for it.
Dictionary size (Score:4, Informative)
Re:"Universal laws"? (Score:5, Informative)
Organizing Language Vs. The General Public (Score:5, Informative)
My husband works for Merriam-Webster as an assistant editor/lexicographer. You wouldn't believe some of the stuff that goes on there. People will call and demand fame for a word. For example, some guy called in and said he'd been the one to come up with the word 'ginormous', and wanted credit for it. They don't seem to understand the process. MW's archives in the basement are a CIA-esque compilation of language; they'll use every collegiate they have for reference, going all the way back to the first one. Husband says it won't be long before internet-meme creations are included.
Irregular verbs (Score:4, Informative)
For a long time there have been mathematical studies of how long irregular verbs might survive in the English language [scienceblogs.com]. I remember seeing the first such article a while back.
Basically, the more a verb is used, the longer it will take us to be liberated from its influence. Some, like the verb "to be", are so ensconced in our language that they may take many, many generations to eliminate.
Of course, this ignores any political movement to eliminate them. As countries become closer, if English remains the language of democracy, there may be a push to make English more standard. A new English without all the rule contradictions it currently has would be double-plus good.
Re:Scrabble (Score:0, Informative)
It's a show on BBC2.
That should be modded informative, not funny, because it is true. It stands for Quite Interesting.
Hebrew not that controversial to classify (Score:4, Informative)
I agree with your main point, and agree that the modern Hebrew vocabulary is subject to diverse influences, including European languages.
That said, Hebrew [wikipedia.org] (modern or otherwise) is not that hard to classify -- it is firmly in the Semitic language grouping [wikipedia.org], itself part of the Afroasiatic language family [wikipedia.org]. Hebrew is a cousin to Arabic, and a cousin to ancient Egyptian, Touareg, Somali, and Amharic (Ethiopian).
Cheers,
Tempest in a teapot (Score:5, Informative)
Speaking as a linguist (working on my Ph.D.) this is something of a tempest in a tea-pot. The most relevant use would be for glottochronology [wikipedia.org] - a field that's largely been abandoned by anyone seriously working on historical linguistics because of the various problems involved with that approach, including what the authors of the paper find, that the rate of word loss is not constant over time. They have a better idea of the rate of word loss, which could help improve glottochronology, but the method has a lot of flaws regardless.
Also, the question they're asking - how do words change over time, in terms of coining, becoming current, and becoming obsolete - really isn't a question historical linguists are that concerned about. Historical linguists are much more interested in how the forms of words change over time (phonological change), or how their function changes over time (grammaticalization), whereas the coinage and loss of words isn't often so important, especially on the large scale statistical level. Furthermore, this type of model probably handles languages with phenomena like avoidance speech [wikipedia.org] poorly, since that would change how and why words are kept or lost.
Their language sample is at heart a convenience sample - they happened to have access to lots of data in those three languages, and it is largely written data. Spanish and English are both related languages with very similar cultural contexts, while Hebrew is a strange choice in that it has an ancient history, but only quite recent revitalised usage. Whether most spoken interaction (which is what linguists tend to be more interested in) has even a tiny subset of the total number of words they are talking about is an open question and would be better tested against corpora with a large quantity of spoken data such as the British National Corpus or the International Corpus of English.
It's an interesting study, but if it hadn't been written by physicists I'm not sure if it would have ended up in Diachronica or the Journal of Historical Linguistics, much less Science. Their "statistical rules" are interesting, but really not of any great use to wider linguistic inquiry. I think its import is really just exaggerated by the fact that science editors read Science and NOT most linguistics journals, and therefore they think it's really impressive.
Re:"Universal laws"? (Score:5, Informative)
It's not that similar, actually. In the above "paradox", you have a sum of the total distance covered after x time. If they were 10 feet apart, then after x minutes it is 5 + 2.5 + 1.25 + ... until you have x terms. As x goes to infinity, this sum will approach the full 10 feet. So the math is right: the full 10 feet is never reached after any finite number of minutes. And so the physics/engineering joke is fine; technically they will not meet following those rules, but there's always a point of "close enough". The rule itself is impossible to follow, though.
In Zeno's paradox of Achilles and the tortoise, it works like this. The tortoise is, say, moving at 1 foot per second, and is 10 feet ahead. Achilles moves at 10 feet per second (~7mph), so after 1 second he will reach the point where the tortoise is now. But after that 1 second the tortoise will be another foot ahead, so Achilles must take another 0.1 seconds to reach the new point, but in that 0.1 seconds the tortoise has moved again, and so on forever, with the next step taking 0.01 seconds but still not catching the tortoise. Even if you allow for the physics/engineering "close enough", at no point is Achilles EVER past the tortoise, only "close enough" to call him "caught up". The reason this is different is that x terms in the sum no longer take exactly x minutes, since each term is over a shorter time as well as a shorter distance. If you take the limits on the infinite sum, the distance between them goes to 0, and the total amount of time goes to a finite number, not infinity (in this case, that finite number is 1 and 1/9 seconds, exactly what you get if you just ask how long it takes a person going 9 feet per second to cross the original 10 foot distance). Mathematically there is no problem with taking a finite amount of time to go a finite distance, so there is no paradox; the equation works out exactly when Achilles catches up to the tortoise. It's not a time reachable in the sums you came up with to describe it, but it's still a finite time. Whereas in the dance paradox above, the time it takes to reach 0 distance IS infinite.
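For anyone who wants to check the arithmetic, here is a rough Python sketch of both sums. The function names are made up for illustration, and the dance rule is assumed to mean one step per minute that covers half of whatever distance remains. Exact fractions are used so rounding doesn't muddy the comparison.

```python
from fractions import Fraction


def achilles_partial_sums(gap=Fraction(10), v_achilles=Fraction(10),
                          v_tortoise=Fraction(1), steps=6):
    """Yield (elapsed_seconds, remaining_gap_feet) after each Zeno step."""
    elapsed = Fraction(0)
    for _ in range(steps):
        # Time for Achilles to reach the spot where the tortoise is right now.
        dt = gap / v_achilles
        elapsed += dt
        # Meanwhile the tortoise has opened up a new, smaller lead.
        gap = v_tortoise * dt
        yield elapsed, gap


def catch_up_time(gap=Fraction(10), v_achilles=Fraction(10),
                  v_tortoise=Fraction(1)):
    """Limit of the elapsed times: the gap divided by the closing speed."""
    return gap / (v_achilles - v_tortoise)


def halving_partial_sums(distance=Fraction(10), steps=6):
    """Yield (minute, feet_covered) when each minute covers half the rest."""
    covered = Fraction(0)
    for minute in range(1, steps + 1):
        covered += (distance - covered) / 2
        yield minute, covered
```

With the numbers from this post, the Achilles times run 1, 11/10, 111/100, ... and stay below catch_up_time(), which is 10/9 seconds. The halving sums run 5, 15/2, 35/4, ... and only approach 10 feet as the number of minutes grows without bound, which is the difference between the two cases.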
Re:Some Advice (Score:5, Informative)
As a piano player/retailer, that was my first thought, chemists be damned. :P
Re:Some Advice (Score:5, Informative)
Aliquot (proportional) wasn't a surprise to me either. It is a mostly legal term, though.
It's a term used daily in any chemistry lab, and regularly in chemistry classes, as well.