Human Speech May Have a Universal Transmission Rate: 39 Bits Per Second (sciencemag.org)
sciencehabit writes: Italians are some of the fastest speakers on the planet, chattering at up to nine syllables per second. Many Germans, on the other hand, are slow enunciators, delivering five to six syllables in the same amount of time. Yet in any given minute, Italians and Germans convey roughly the same amount of information, according to a new study. Indeed, no matter how fast or slowly languages are spoken, they tend to transmit information at about the same rate: 39 bits per second, about twice the speed of Morse code. "This is pretty solid stuff," says Bart de Boer, an evolutionary linguist who studies speech production at the Free University of Brussels, but was not involved in the work. Language lovers have long suspected that information-heavy languages -- those that pack more information about tense, gender, and speaker into smaller units, for example -- move slowly to make up for their density of information, he says, whereas information-light languages such as Italian can gallop along at a much faster pace. But until now, no one had the data to prove it.
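The trade-off the summary describes is just a product: information rate equals syllables per second times bits per syllable. The sketch below takes the summary's own numbers and works backwards to the per-syllable density each speaking speed would need to reach 39 bits/s. The densities it prints are only what those numbers imply, not values measured in the study, and the 5.5 syllables/s figure is simply the midpoint of "five to six".

```python
# Information rate = syllable rate x information per syllable.
# The 39 bits/s figure and the syllable rates come from the summary above;
# the per-syllable densities below are only what those numbers imply,
# not values measured in the study.

TARGET_RATE_BPS = 39.0  # bits per second, as reported in the summary


def information_rate(syllables_per_sec: float, bits_per_syllable: float) -> float:
    """Bits per second conveyed at a given speed and information density."""
    return syllables_per_sec * bits_per_syllable


def implied_density(syllables_per_sec: float, rate_bps: float = TARGET_RATE_BPS) -> float:
    """Bits per syllable needed to reach rate_bps at the given speaking speed."""
    return rate_bps / syllables_per_sec


# 9 syll/s is the summary's upper figure for Italian; 5.5 is the midpoint of
# the "five to six" quoted for many German speakers.
for label, speed in [("fast speaker, 9 syll/s", 9.0), ("slow speaker, 5.5 syll/s", 5.5)]:
    density = implied_density(speed)
    print(f"{label}: needs {density:.2f} bits/syllable "
          f"-> {information_rate(speed, density):.1f} bits/s")
```

With these inputs the fast speaker needs roughly 4.3 bits per syllable and the slow speaker roughly 7.1, which is the "dense and slow vs. light and fast" pattern the study reports.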
Dictionary of 2^39 (Score:2)
Does this mean that an "ai" only needs a dictionary of 2^39 to encode verbs, nouns, etc.?
Re: (Score:2)
2^39 would cover all possible 1-second speeches, which could include multiple nouns, verbs, and/or adjectives. I've read that you can get by with a working general vocabulary of 1000 words, which would be 10 bits; then we add jargon for less commonly used or domain-specific words.
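The "1000 words = 10 bits" step is the number of bits needed to give every word its own fixed-length code, i.e. ceil(log2(N)). A minimal sketch follows; the function name is made up for illustration, and it assumes every word is equally likely, which real text is not (skewed word frequencies mean the average information per word is lower than this).

```python
def bits_to_index(vocabulary_size: int) -> int:
    """Smallest whole number of bits that gives every word a distinct code.

    (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, computed exactly
    with integers instead of floating-point logarithms.
    """
    if vocabulary_size < 1:
        raise ValueError("vocabulary must contain at least one word")
    return (vocabulary_size - 1).bit_length()


print(bits_to_index(1000))   # 10, because 2**10 = 1024 >= 1000
print(bits_to_index(30000))  # 15, because 2**15 = 32768 >= 30000
```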
Re: (Score:2)
We can take that a bit further. The normal speaking rate is about 2.5 words per second (150 wpm), however, so based on TFA you'd expect an average of about 15 bits of information per word. 2^15 is 32,768, which is not far off the number of words adult native English speakers are said to know.
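The arithmetic in this comment, spelled out. The 2.5 words/s rate is the commenter's figure, and "equivalent vocabulary" assumes equally likely words, so treat it as an order-of-magnitude check only. Note that 39 / 2.5 is actually 15.6 bits, which corresponds to roughly 50,000 equally likely words rather than 32,768; same ballpark, as the comment says.

```python
RATE_BPS = 39.0        # bits per second, from TFA
WORDS_PER_SEC = 2.5    # about 150 wpm, the commenter's speaking-rate figure

bits_per_word = RATE_BPS / WORDS_PER_SEC   # 15.6 bits per word on average
# Number of equally likely words that 15.6 bits could distinguish.
equivalent_vocab = 2 ** bits_per_word

print(f"{bits_per_word:.1f} bits/word")
print(f"2**{bits_per_word:.1f} ~ {equivalent_vocab:,.0f} equally likely words")
print(f"2**15 = {2 ** 15:,}")
```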
Re: (Score:2)
If your attention span is 1 second, then yes.
Re: (Score:2)
Average Facebook fan is covered.
Re: (Score:2)
Does this mean that an "ai" only needs a dictionary of 2^39 to encode verbs, nouns, etc.?
It doesn't matter until the "fuzzy logic chip" is invented that allows AI to freely associate anything with anything and somehow, via heuristics, sort those associations into the ones that are valuable and the ones that are just noise. One of the reasons memes are so popular is that many of them randomly mash up concepts that normally don't go together, so our brains try to determine whether there is some "connection" here or whether it's just noise/absurdity. Good luck teaching AI to do it like we do. It's no
Re: (Score:2)
You mean like word2vec? Or word2vec's successors.
Re: (Score:2)
Besides the point someone makes below about 1 second containing > 1 word, a lot depends on the kind of information you want to encode. I'm looking at a 75k entries monolingual dictionary of Hindi right now. I'm told it has reasonably good coverage. In XML form, it takes up about 55 megabytes, and that zips down to a bit under 8 megabytes (I imagine a lot of that compression involves the highly repetitive XML tags). The dictionary contains orthographic, semantic (definitions), and morphological and sy
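The guess that much of the 55 MB to 8 MB shrinkage comes from repetitive XML tags is easy to see in miniature. The toy below is hypothetical: the tag names and text are invented, not taken from the Hindi dictionary, and because every definition is templated it compresses far better than real data would. It only shows that zlib exploits repeated markup; it says nothing about the real file's ratio.

```python
import zlib

# Hypothetical entry markup: tag names and text are invented, not taken from
# the actual dictionary. Only the repetitive-tag structure is the point.
ENTRY = ("<entry><headword>{w}</headword><pos>noun</pos>"
         "<definition>placeholder definition of {w}</definition></entry>\n")

xml = "".join(ENTRY.format(w=f"word{i}") for i in range(5000)).encode("utf-8")
compressed = zlib.compress(xml, 9)

print(f"{len(xml):,} bytes -> {len(compressed):,} bytes "
      f"(ratio {len(xml) / len(compressed):.1f}x)")
print(f"commenter's dictionary: 55 MB -> ~8 MB (ratio {55 / 8:.1f}x)")
```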
Book size (Score:3)
You also notice that some languages use more words to convey the same amount of information, so thickness of the same book varies by language.
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
What an asshole! All latinx people should be referred to with non-gender-assuming pronouns. It should be "mi amigx".
Re: (Score:3)
Maybe not the greatest example. :-) Even though the Spanish "encuéntramelo" is one word (vs. 4 words in "find it for me"), it has 5 syllables (vs. only 4 syllables in English).
Re: (Score:2)
Crasis in Spanish covers only a few cases (fully formed, the above is "encuentra a me lo"), so it doesn't gain as much baud as one might think. Spanish can be spoken more rapidly than English for other reasons (like fewer consonant clusters), but English makes up ground with a larger vocabulary.
Re: (Score:3)
Re: (Score:2)
That's the cool thing. Spanish and Japanese have more syllables per unit of information, but Spanish and Japanese speakers speak more syllables per second, so their rate of information transfer is the same.
(The summary says this about Italian vs Japanese, but it's also true of other languages)
Somehow the brain adjusts the speed of speech to equalize the information flow rate.
Re: (Score:2)
Sorry, I meant Italian vs German, not vs Japanese
Re: (Score:2)
Indeed. Although when really learning a foreign language, you notice that some things are missing and that it has some things that your own language is missing. Language not only enables thinking, it also restricts it. That is why being fluent in more than one is so valuable.
Re: (Score:2)
Not 'missing' exactly, but the words for many things are bounded differently in different languages. In English the visible spectrum is divided into seven named colors, while Russian has two separately named shades of blue. The Sapir-Whorf hypothesis is valid to a certain extent.
Re: (Score:2)
Yes, actually missing. Some cultures do not have some concepts and hence the languages evolved there do not have it either. Languages also represent a stance towards reality and towards things in it.
Re: (Score:2)
# of words (i.e. word tokens in a text) does not necessarily correlate with number of characters, since word length varies greatly across languages. Compare agglutinating languages like Turkish, Swahili (or other Bantu languages), Quechua (languages), Tamil (or other Dravidian languages), or Athabaskan languages with isolating languages like Vietnamese, Thai and even (to some extent) English. There's a graph here: http://www.ravi.io/language-wo... [www.ravi.io], but unfortunately it appears to compare the length of dic
So, (Score:5, Funny)
about the same as Comcast.
Re: (Score:1)
Re: (Score:2)
No because that's not what they examined. They looked at the difference between languages, not regional dialects of the same language.
Re: (Score:1)
Why not? Because it might provide evidence that did not support their conclusion?
Re: (Score:3)
What, specifically, about this study makes you believe that the researchers designed it to support a particular conclusion? Did you read the paper? Did you even read the article about the paper? The journal Science Advances has a generally good reputation. The authors have significant publication histories in the area of linguistics, and have been cited hundreds of times. The authors have made the article available for free, including the data and analysis code, for anyone to access. Did you access any of t
Re: (Score:3)
Studies have shown that Southern English speakers don't actually speak more slowly than their Northern counterparts, as measured by counting words per second in speech samples. The Northern accents sound faster because they tend to clip the ends of words, while the Southern accents draw them out slightly. The "slowness" is a perception issue.
Twice the speed of Morse code? Lolwut? (Score:3, Insightful)
Re: (Score:3)
We don't get 5 bits of real information per character in English, though. From TFA the estimate is about 7 bits per syllable, and a syllable is usually 2-3 letters.
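Dividing this comment's figures through: 7 bits spread over 2 to 3 letters is about 2.3 to 3.5 bits per letter. For comparison, the sketch also prints log2(26), about 4.7 bits, which is what each letter could carry if all 26 were independent and equally likely. The 7 bits/syllable value is the commenter's reading of TFA, not something checked here.

```python
import math

BITS_PER_SYLLABLE = 7.0  # the commenter's reading of TFA for English


def bits_per_letter(letters_per_syllable: float) -> float:
    """Average information per written letter, given letters per syllable."""
    return BITS_PER_SYLLABLE / letters_per_syllable


# Upper bound if all 26 letters were independent and equally likely.
uniform_alphabet_bits = math.log2(26)

for letters in (2.0, 3.0):
    print(f"{letters:.0f} letters/syllable -> {bits_per_letter(letters):.2f} bits/letter")
print(f"uniform 26-letter alphabet: {uniform_alphabet_bits:.2f} bits/letter")
```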
Re: Twice the speed of Morse code? Lolwut? (Score:1)
Re: Twice the speed of Morse code? Lolwut? (Score:2)
Re: (Score:2)
That's an encoded format. Certainly I can read and write a lot faster than I can listen or speak. I think it's likely that the brain can much more efficiently interpret encoded language (i.e. written) than it can spoken language. So I think there's a bit of an apples and oranges comparison going on in your example.
Another thing to keep in mind is that spoken language is not alphabetical. Spoken language is built on sound units. Alphabets approximate that to some extent, but it's a compromise between efficie
Re: (Score:2)
I think it's likely that the brain can much more efficiently interpret encoded language (i.e. written) than it can spoken language.
Morse, or more correctly, International Code is not a written language. It is transmitted by audio signals, just like spoken language.
Another thing to keep in mind is that spoken language is not alphabetical. Spoken language is built on sound units.
As is Morse Code. DiDahDit is the sound of the letter 'r'. If you have to stop and think "dit dah dit, oh that's 'r'" then you are not operating at the level of even the "average" telegrapher. By average I am referring to the previous standard of 13 wpm for Amateur General Class license holders. Also, telegraphers do not have to pronounce the code sounds, so arguments about
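The timing behind "the sound of the letter" and the 13 wpm standard can be made concrete with the common PARIS convention: a dit is 1 unit, a dah 3, and the gaps inside a character, between characters and between words are 1, 3 and 7 units, so "PARIS " takes 50 units. This is an illustrative sketch, not a full Morse implementation: the table holds only the five letters it needs, and the function names are made up for this example.

```python
# Partial International Morse table: only the letters needed for the examples.
MORSE = {"P": ".--.", "A": ".-", "R": ".-.", "I": "..", "S": "..."}

DIT, DAH = 1, 3   # element lengths in dit units
INTRA_GAP = 1     # silence between elements of one character
CHAR_GAP = 3      # silence between characters
WORD_GAP = 7      # silence between words


def char_units(code: str) -> int:
    """Dit units for one character: its elements plus the gaps between them."""
    elements = sum(DIT if e == "." else DAH for e in code)
    return elements + INTRA_GAP * (len(code) - 1)


def word_units(word: str) -> int:
    """Dit units for one word, including the trailing word gap."""
    chars = [char_units(MORSE[c]) for c in word.upper()]
    return sum(chars) + CHAR_GAP * (len(chars) - 1) + WORD_GAP


def dit_ms_for_wpm(wpm: float) -> float:
    """Dit length in ms so that the reference word PARIS is sent wpm times a minute."""
    return 60_000 / (wpm * word_units("PARIS"))


print(MORSE["R"])           # .-.  (di-dah-dit)
print(word_units("PARIS"))  # 50
print(dit_ms_for_wpm(13))   # roughly 92 ms per dit at 13 wpm
```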
Re: (Score:2)
Pinyin is a phonetic system, so it is explicitly not logographic. I think you meant written Chinese, but even that is loosely 1 word = 1 symbol only if you stretch the definition horribly and squint at it sideways.
Not to detract from your point, but didn't want others to get confused.
Re: (Score:2)
Certainly I can read and write a lot faster than I can listen or speak
That is actually demonstrably false. For most people speaking speed is 180wpm, handwriting speed is well below 130wpm, and typing speed below 80wpm.
And listening speed, are you joking? There is no listening speed, because it depends on the speaker, not the listener. If you mean the speed at which more than 0.1% of the words start to become unintelligible, that's usually 210wpm for most people on a non-noisy channel, and it degrades as noise increases (funny how it almost behaves like a digital signal). Reading on the
Re: (Score:2)
> Reading on the other hand for most people...averages 120wpm, faster than typing, but not faster than speaking.
I believe that's incorrect. The #s (for English) that I've seen for speaking are comparable to yours (160 wpm, vs. your 180 wpm, close enough). But the reading speeds I've seen (giyf) are far higher: 200-250 wpm for "average" adults, 300 wpm for college students. It obviously depends on what you're reading, but in any case it's considerably higher than the average speaking rate. Or perhaps you mis
Re: (Score:2)
Well, yes, I have seen those figures of 260wpm reading speed and up. When you are talking wpm you also need to take into account the percentage of words understood (comprehension rate).
For the majority of people there is a very sharp drop in comprehension above 120 wpm when reading: from around 99.9% (0.1% not understood) at 120 wpm to as much as 40% of words not understood (barely 60% comprehension) at 260 wpm.
Heck, the popular speed reading techniques push to 1000 wpm reading speed, but at
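Taking this comment's comprehension figures at face value, one crude way to compare the two reading speeds is to multiply speed by the fraction of words understood. This is only a rough sketch: the metric and function name are invented for illustration, and missing 40% of the words can easily lose much more than 40% of the meaning, so it is not a measure of understanding.

```python
def words_understood_per_min(wpm: float, comprehension: float) -> float:
    """Reading speed scaled by the fraction of words actually understood."""
    return wpm * comprehension


# Figures quoted in the comment above.
careful = words_understood_per_min(120, 0.999)  # about 119.9 words/min
skimming = words_understood_per_min(260, 0.60)  # about 156 words/min

print(f"120 wpm at 99.9%: {careful:.1f} words understood per minute")
print(f"260 wpm at 60%:   {skimming:.1f} words understood per minute")
```

By this crude count the faster rate still delivers more words understood per minute, which is exactly why the meaning-loss caveat above matters.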
Re: (Score:2)
The idea that morse code has a speed is the same as the idea that ascii code or unicode has a speed.
Re: (Score:3)
While I agree that the comparison is somewhat nonsensical, I think it is valid to consider the information density of Morse code and the conveyance rate (based on the speed of a typical human operator) in the same way they did for the spoken language models in the article.
This is an article about information theory, not computer science.
Re: (Score:2)
No, the bit rate from the article has nothing to do with binary 1s and 0s. It is in reference to Shannon's information theory,
https://en.m.wikipedia.org/wik... [wikipedia.org]
To compare speech with Morse code, you have to take the 40 words per minute, convert to syllables per second (let's say 2 syllables / second, assuming an average of 3 syllables / word), and then multiply by the information density of the language (7 bits / syllable for English). That works out to an information rate of 14 bits per second for your speed d
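The conversion described in this comment, as a small function. The 3 syllables/word and 7 bits/syllable inputs are the commenter's assumptions, and the result scales linearly with each of them.

```python
def morse_information_rate(wpm: float, syllables_per_word: float,
                           bits_per_syllable: float) -> float:
    """Bits per second carried by Morse text sent at `wpm` words per minute."""
    syllables_per_sec = wpm * syllables_per_word / 60.0
    return syllables_per_sec * bits_per_syllable


# The comment's assumptions: 40 wpm, 3 syllables per word, 7 bits per syllable.
rate = morse_information_rate(40, 3, 7)
print(f"Morse at 40 wpm: {rate:.1f} bits/s")  # 14.0 bits/s
print(f"speech (TFA): 39 bits/s, {39 / rate:.1f}x faster")
```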
Re: (Score:2)
Morse code over radio was valued for its compactness in comparison to "phone" (speech). It died when digitization became an even more compact way to transmit data.
Re: (Score:2)
Wrong. It was valued for noise resilience, not for compactness; especially in the era of analogue radio, it took a ton more noise to make Morse unintelligible than spoken words. But in a zero-noise scenario, voice is always the fastest form of transmission.
ATC still uses voice and not morse.
Re: (Score:2)
Because radio Morse is encoded just as interruption of the carrier, you can cram many code channels into the space of one voice channel.
And you identified the wrong kind of resilience. In desert island rescue situations, it was possible to build a simple analog transmitter, perhaps out of shipwreck parts, capable of sending just unmodulated carrier. Implement any elementary way of interrupting (keying) the carrier, and you could get your messages out.
Re:from the sarcastic dept (Score:4, Informative)
Understanding how we process language is a pretty damned useful area of knowledge.
Re: (Score:2)
As language shapes thinking and many people really only have one language to think in and never mastered non-verbal thinking, this is pretty damn important and useful.
So what's the bit rate for reading English? (Score:1)
Re: (Score:2)
that's pretty darn fast. 9600bps is like 1200 characters per second or approximately 240 words per second (using an average of 5 chars per word). 240 words per minute is more typical. And that's using ASCII, which takes a lot of bits on the wire per bit of English information content (for example, once a Q has been sent the 8 bits for the following U add almost no new information.)
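The unit conversion in this comment, step by step. The 8 bits per character and 5 characters per word are the commenter's figures; on a real asynchronous serial line with 8N1 framing each character also carries a start and a stop bit (10 bits on the wire), which would lower these numbers by a fifth.

```python
LINE_RATE_BPS = 9600
BITS_PER_CHAR = 8      # the commenter's figure; 8N1 framing would make it 10 on the wire
CHARS_PER_WORD = 5     # the commenter's average word length

chars_per_sec = LINE_RATE_BPS / BITS_PER_CHAR   # 1200 characters/s
words_per_sec = chars_per_sec / CHARS_PER_WORD  # 240 words/s
words_per_min = words_per_sec * 60              # 14,400 words/min

TYPICAL_READING_WPM = 240  # the comment's "more typical" reading speed
print(f"{words_per_sec:.0f} words/s = {words_per_min:,.0f} wpm, "
      f"{words_per_min / TYPICAL_READING_WPM:.0f}x a {TYPICAL_READING_WPM} wpm reader")
```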
Experimental science (Score:2)
Linguists have been pretty sure about this for a long time, but it's great getting experimental data to back it up with.
Can it explain dialects? (Score:2)
This proposal explains why 'fast' languages and 'slow' languages convey the same bit rate of information, based on the density of information provided.
But can it also explain why some dialects within the same language are much faster paced than others? An example is Tunisian and Moroccan Arabic compared to, say, Egyptian Arabic. The former two are far too fast for Egyptian speakers to keep pace with.
The density of information would not be that different, but that assessment needs to be validated by l
Re: (Score:2)
Hmm, interesting. They looked at 17 languages, but not dialects within languages. Since, as you note, the information density (based on syllables) would be the same across dialects, I think they chose to ignore it by using a "standard" pronunciation. It definitely gets a bit more complicated when you allow alternate pronunciations, which you can probably think about as a form of compression.
Re: (Score:2)
It's debatable whether Tunisian and Moroccan Arabic are only different dialects, or actually different languages, from Egyptian Arabic. The very fact that you say Egyptians can't understand Tunisians or Moroccans tells me that they are distinct languages (and the Ethnologue would agree with that). Another way to put this: I very much doubt that what's hindering understanding is the speed, since there's presumably no significant biological difference between the two groups that would account for this (desp
Re: (Score:2)
Re: (Score:2)
Yes, Maltese is an Arabic language. It's not frozen (no language has been frozen, except maybe Biblical Hebrew, which was not spoken as a native language between AD 70--or even earlier--and the modern era). Maltese has undergone lots of changes, especially in the loss of a lot of phonemes (sounds), e.g. the emphatics and the ayin. These are the consonants you refer to as "Semitic consonants that don't exist in Latin"--it's not just that the consonants don't exist in the Latin script that is used to write
Re: (Score:2)
I didn't mean frozen as in "it has not changed since". Rather, it is closer to Arabic that was spoken there before the dissociation from the Muslim/Arab empire. Of course, as you say, all languages change over time. Maltese is no exception, and borrowed a lot from Italian.
Re: (Score:2)
"But the Tunisians can understand them, and they didn't lose any of the consonants." You're right, so I guess my explanation was wrong. Or possibly these changes are surmountable unless your dialect has lots of other differences (lexical, I would guess). I wonder if the emphatic/ non-emphatic distinction is losing ground in Arabic.
"A civilized and intellectually stimulating conversation on Slashdot?" Yeah, maybe we should turn in our /. badges. And our posts even got voted to +2 for it, almost as amazi
Re: (Score:2)
Meh ...
I have been voted -1 Troll basically because the person doing the voting is ignorant.
Example: there was an article about Kazakhstan changing its script from Cyrillic to Latin, as a modernization effort. I posted that this is a bad idea, and that Turkey did this in the 1920s, only to lose access to its 8 centuries of written lit
loose change (Score:2)
This is like quibbling over loose change.
The bit rate of language is not really all that important. The amount of data you are able to convey is, and right now we can convey large amounts of information through speech. In many ways machines are still not able to process the massive amount of data that humans generate through speech. Look at how fast humans adapted their language patterns to turn Microsoft's Tay chatbot racist. The computer didn't even have a chance.
Re: (Score:2)
Actually, ASR is quite capable of running (on suitable processors) at faster than real time, and of digesting huge amounts of text data at much faster rates than humans can. I'm not making the claim that these AIs are really understanding anything in the same way we do, but they do have a semantic representation, and it's getting harder to distinguish their responses to questions from human responses. (Not impossible, see e.g. https://cmns.umd.edu/news-even... [umd.edu], and read the links for further details).
As f
RISC vs. CISC (Score:3)
Re:RISC vs. CISC (Score:4, Funny)
Life is too short to learn German. -- Oscar Wilde
Re: (Score:2)
Life is too short to learn German. -- Oscar Wilde
If the Germans had won the war, you'd have learned German fast, or your life would have been shorter.
Re:RISC vs. CISC (Score:5, Funny)
Whenever the literary German dives into a sentence, that is the last you are going to see of him till he emerges on the other side of his Atlantic with his verb in his mouth. - Mark Twain
Re: (Score:2)
This is true for "literary Germans". The rest use much simpler grammar and mostly only two tenses.
Examples:
We will eat pizza tomorrow -> Wir werden morgen Pizza essen (the verb "essen" = "eat" at the end: literary German)
-> Wir essen morgen Pizza (the verb just after We, normal German)
We eat pizza -> Wir essen Pizza
We ate pizza yesterday -> Wir aßen gestern Pizza (literary German)
-> Wir haben gestern Pizza gegessen (in normal German, verb at the end)
but actually, no one would use that phrase. Ins
Re: (Score:2)
Gab's Pizza mit Sardellen? Warum hast du es mir nicht gesagt?! (Was there pizza with anchovies? Why didn't you tell me?!)
Obligatory Bob and Rae sketch (Score:2)
An oldie but a goodie. [youtube.com]
Boy they talk fast! (Score:2)
Something something something Bynars from Star Trek! Something something something
what about auctioneers? (Score:2)
https://www.youtube.com/watch?... [youtube.com]
Re: (Score:2)
Best link I've seen here!
Right up there in speed are those legal disclosures you hear in advertisements for medicines and internet service providers (at least I think that's where I've heard them; I don't listen to many commercials these days). For example, @42 here: https://www.youtube.com/watch?... [youtube.com]
okay, so.. (Score:2)
Re: (Score:2)
German = Windows, Italian = Unix? Mmm.
Re: (Score:2)
Spanish = Unix, I think: somewhere I have a photo (pre-digital camera) of a shop in Costa Rica with a big sign "Unix". It was a unisex barber shop.
Re: (Score:2)
Perhaps. Definitely Latin based. I remember a lot of talk about Unix in the Byzantine Empire.
Re: (Score:2)
German syllables are more complex and longer. In the end you need fewer syllables in German, but you need more time to produce them. The syllables in Italian are also more similar to each other: it is easier to say "la la la" or "ra ra ra" than "school" or "awkward".
Morse code is compressed data (Score:2)
Re: (Score:2)
Re: (Score:2)
I am not so sure that was clear to everybody, but you are absolutely right to point out that they mean information, not words or bits.
Hmm... (Score:2)
Re: (Score:2)
all rests on the premise that we know how much information is being conveyed
They chose an arbitrary definition of information, then applied it consistently across languages. Regardless of how useful it may be as an absolute measure, it's perfectly valid as a relative measure.
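To illustrate the "fixed definition, applied consistently" point, here is plain Shannon entropy over the empirical distribution of syllables. This is a simpler stand-in chosen for illustration, not the estimator the study itself used, and the syllable streams are invented toys. Whatever the absolute numbers mean, running the same function over every language makes the comparison like-for-like.

```python
import math
from collections import Counter


def shannon_entropy(symbols) -> float:
    """Empirical Shannon entropy, in bits per symbol, of a sequence."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Invented toy syllable streams, for illustration only.
repetitive = ["la", "ra", "la", "ra", "la", "ra", "la", "ra"]
varied = ["pi", "zza", "es", "sen", "mor", "gen", "wir", "la"]

print(shannon_entropy(repetitive))  # 1.0 bit per syllable
print(shannon_entropy(varied))      # 3.0 bits per syllable
```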
So syllables to concept translation? (Score:2)
The real communications is not in the syllables and bits per second. Communications lies in concepts per second. When they can give me that number I can start figuring out if I want to try to learn that more efficient language.
{^_^}
Re: (Score:2)
Re: (Score:2)
The real communications is not in the syllables and bits per second. Communications lies in concepts per second. When they can give me that number I can start figuring out if I want to try to learn that more efficient language. {^_^}
That is exactly what the 39bps is: a measure of the information in what is spoken, not the number of letters or syllables.
Re: (Score:2)
This is the exact point the article makes. Oh yeah, this is slashdot!
Actually, English is among the more efficient languages, with relatively few syllables per word, and with relatively few words needed to express a thought, compared to languages like Spanish and Italian.
Laughable (Score:2)
"Italians are some of the fastest speakers on the planet, chattering at up to nine syllables per second."
A French waiter in Paris is much faster and he can insult you 3 times per second on top.
what about ability to remove redundant data? (Score:1)
they tend to transmit information at about the same rate: 39 bits per second, about twice the speed of Morse code.
Information contained in speech is an interesting but hard problem, and highly unlikely to be solvable by so-called 'AI'. For example, you can insert a lot of bullshit into a 'speech' and your precious 'AI' won't detect it; hell, most 'humans' won't either.
Can they analyze speeches by Trump & Co. and report the data rates there? They can start here and not just calculate naive entropy but really work out how much 'real information' it contains, if you remove all the bullshit. Can they do th
Re: (Score:2)
"For example you can insert lot of bullshit in 'speech' and your precious 'ai' won't detect that - hell, even most 'humans' won't either." All depends. AI (NLP, really) already does better than some humans at that. And it's an active area of research, so chances are excellent that AI will do even better in the future. To take your example: "We will make America strong again. We will make America proud again. We will make America safe again. And we will make America great again." It's trivial to detect
Re: (Score:1)
'bogus promises', >99% probability
And I intentionally picked a trivial example because I suspected your answer would be elementary statistical analysis. As a more advanced example, you can look at https://en.wikipedia.org/wiki/... [wikipedia.org] or https://www.facebook.com/med [facebook.com]
Vocal tics? (Score:2)
So if we transmit 39 bps, is it safe to assume we process roughly the same? And if so maybe this explains vocal tics in a way never considered before? Subconsciously slowing down transmission rates by creating a baseline sound to pause transmission?
Re: (Score:2)
Re: (Score:2)
See my post somewhere above: people can process spoken speech at ~1.5x normal speed without too much trouble, and still faster with some training. (Such speech is usually artificially sped up.) Of course a lot depends on the speaker and the listener both being conversant with whatever is being spoken about. I'd have a hard time understanding some sports commentators, because I know zip about most sports; you might have a harder time understanding me talking about computational linguistics, and I'd probab
Most Efficient Language? (Score:1)
This article raises an interesting debate.... what is the most efficient language, spoken and/or written? How could that communication efficiency benefit sectors like IT support that depend on evaluating/responding to spoken and written language as quickly as possible? Not only that, but could this communication efficiency equate into increased mathematical/statistical capability in AI?
Re: (Score:2)
If all of them transmit information at about 39bps, then they're all about the same.
Unless you're reading a disclaimer (Score:2)
If you are required by law to read a disclaimer on a 30-second radio ad and you don't want to spend half of that time reading the legalese, the transmission rate is about ten times that number.
Pimsleur (Score:1)