Algorithm Distinguishes Memes From Ordinary Information

KentuckyFC writes: "Memes are the cultural equivalent of genes: units that transfer ideas or practices from one human to another by means of imitation. In recent years, network scientists have become increasingly interested in how memes spread, work that has led to important insights into the nature of news cycles, into information avalanches on social networks and so on. But what exactly makes a meme and distinguishes it from other forms of information is not well understood. Now a team of researchers has developed a way to automatically distinguish scientific memes from other forms of information for the first time. Their technique exploits the way scientific papers reference older papers on related topics. They scoured the half a million papers published by Physical Review between 1893 and 2010 looking for common words or phrases. They define an interesting meme as one that is more likely to appear in a paper that cites another paper in which the same meme occurs. In other words, interesting memes are more likely to replicate. They end up with a list of words and phrases that have spread by replication and can also see how this spreading has changed over the last 100 years. The top five phrases are: loop quantum cosmology, unparticle, sonoluminescence, MgB2 and stochastic resonance; all of which are important topics in physics. The team say the technique is interesting because it provides a way to distinguish memes from other forms of information that do not spread in the same way through replication."
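The summary's working definition (a meme is more likely to appear in a paper that cites another paper containing it) can be turned into a score. What follows is a minimal Python sketch. It assumes each paper is reduced to a set of terms and that a map gives the papers each one cites. The ratio form and the `smoothing` pseudo-count are illustrative assumptions, not the researchers' published formula.

```python
from typing import Dict, List, Set

def propagation_ratio(
    terms: Dict[str, Set[str]],   # paper id -> set of terms it contains
    cites: Dict[str, List[str]],  # paper id -> ids of the papers it cites
    meme: str,
    smoothing: float = 1.0,       # assumed pseudo-count so no denominator is zero
) -> float:
    """Ratio of P(meme | some cited paper has it) to P(meme | no cited paper has it)."""
    stick = carriers = 0          # papers citing at least one paper that has the meme
    spark = non_carriers = 0      # papers citing no paper that has the meme
    for paper, refs in cites.items():
        if not refs:
            continue              # no references: nothing to inherit from
        inherits = any(meme in terms.get(r, set()) for r in refs)
        has_meme = meme in terms.get(paper, set())
        if inherits:
            carriers += 1
            stick += int(has_meme)
        else:
            non_carriers += 1
            spark += int(has_meme)
    p_inherit = stick / (carriers + smoothing)
    p_spontaneous = (spark + smoothing) / (non_carriers + smoothing)
    return p_inherit / p_spontaneous

# Toy corpus (invented for illustration): "unparticle" spreads along citations,
# while "the" appears everywhere regardless of what is cited.
terms = {
    "A": {"unparticle", "the"}, "B": {"unparticle", "the"}, "C": {"the"},
    "D": {"the"}, "E": {"unparticle", "the"}, "F": {"the"}, "G": {"the"},
}
cites = {"A": [], "B": ["A"], "C": ["A"], "D": ["C"], "E": ["B"], "F": ["D"], "G": ["D"]}
print(propagation_ratio(terms, cites, "unparticle"))  # 2.0
```

On this toy data "the" scores about 0.86. It is almost always present, so citing a paper that uses it barely changes the odds of seeing it.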
    Sorry, but a "meme" is a picture of a humorous animal with a joke in Impact font at the top and bottom. The word used to mean something else, but that definition got outcompeted by one that was better at replication.

    At first I was excited by them using my preferred definition of meme in the first sentence of the summary. Then I saw the list "loop quantum cosmology, unparticle, sonoluminescence, MgB2 and stochastic resonance" and realized that it may as well be "Yo dawg, I heard you like memes, so I put a meme in your meme" or "I can haz cheezburger?". So they mined the journal for words and phrases... meh, those aren't memes.

      So they mined the journal for words and phrases... meh, those aren't memes

      They are memes in the sense that the researchers are specifically finding words and phrases that are frequently inherited by descendant papers (where "descendant" is determined by citation links) and rarely appear spontaneously (i.e. without appearing in any of the papers cited by a paper). An important feature is that their method used zero linguistic information: it didn't bother pruning out stopwords, or indeed do any preprocessing other than simple tokenisation by whitespace and punctuation. Managing to come out with nou
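The preprocessing described here is simple enough to sketch. This Python version assumes "punctuation" means any character that is not a letter or digit. It keeps case, so MgB2 survives intact, and it keeps stopwords. The `max_n` cap on phrase length is a hypothetical parameter. The comment doesn't say how long the phrases were allowed to be.

```python
import re
from typing import Set

# Split on runs of whitespace, punctuation and underscores (any non-alphanumeric
# character): an assumed reading of "tokenisation by whitespace and punctuation".
_SEPARATORS = re.compile(r"[\W_]+")

def candidate_terms(text: str, max_n: int = 3) -> Set[str]:
    """Return every word n-gram of length 1..max_n; no stopword removal, no stemming."""
    tokens = [t for t in _SEPARATORS.split(text) if t]
    terms: Set[str] = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            terms.add(" ".join(tokens[i:i + n]))
    return terms

print(sorted(candidate_terms("Stochastic resonance in MgB2.", max_n=2)))
```

Words like "in" stay in the candidate set. Under this approach, any filtering of uninteresting terms has to come from the citation-based scoring, not from a stopword list.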

    It's so rare to see the word 'meme' used in its true sense any more. I'd love to see Internet memes called 'netmemes' to disambiguate the terms.
    • But the writers of TFA are still misusing the word. All learned knowledge is memetic; it's silly to pull arbitrary words from an information stream and pretend only they are memes. The word they should be looking for is "important" or "central": the software is pulling out ideas that are more central to the science. That's excellent work and well worth doing... It's just not directly related to memes.
        But the writers of TFA are still misusing the word

        Actually no, they are not. By using citations to create a directed graph of papers, they are specifically looking for words or phrases that are highly likely to be inherited by descendant documents and that appear spontaneously much less often (i.e. in documents where none of the cited documents use them). They really are interested in the heritability of words and phrases.
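The inherited-versus-spontaneous distinction drawn here can be made concrete. The following is a minimal Python sketch. It assumes the citation graph arrives as (citing, cited) edge pairs and that each paper is reduced to a set of terms. `split_occurrences` is a hypothetical helper name. Papers with no references in the data count as spontaneous here, which is one of several reasonable ways to treat the roots of the graph.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

def split_occurrences(
    edges: Iterable[Tuple[str, str]],  # (citing paper, cited paper) pairs
    terms: Dict[str, Set[str]],        # paper id -> terms it contains
    term: str,
) -> Tuple[List[str], List[str]]:
    """Split the papers that contain `term` into inherited and spontaneous uses."""
    refs: Dict[str, Set[str]] = defaultdict(set)
    for citing, cited in edges:
        refs[citing].add(cited)        # build the directed citation graph
    inherited: List[str] = []
    spontaneous: List[str] = []
    for paper in sorted(terms):
        if term not in terms[paper]:
            continue
        if any(term in terms.get(r, set()) for r in refs[paper]):
            inherited.append(paper)    # at least one cited paper already used it
        else:
            spontaneous.append(paper)  # no cited paper uses it
    return inherited, spontaneous

edges = [("B", "A"), ("C", "A"), ("E", "B")]
terms = {"A": {"unparticle"}, "B": {"unparticle"}, "C": {"phonon"},
         "E": {"unparticle"}, "F": {"unparticle"}}
print(split_occurrences(edges, terms, "unparticle"))
```

A term with many inherited uses and few spontaneous ones is what the reply means by "heritable"; a stopword would show plenty of both.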

  • For human evaluation, they compared their "meme list" to a set of phrases selected uniformly at random from papers with enough citations. This is worthless; any halfway intelligent method will outperform that. If you had physicists come up with a list of 20 important phrases de novo, it would probably not have a huge amount of overlap with their "memes."
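For reference, the baseline being criticised here is easy to sketch. The Python below assumes "uniform random" means uniform over the distinct phrases found in sufficiently cited papers. The `min_citations` cutoff and the fixed `seed` are hypothetical, since the actual threshold isn't given in the thread.

```python
import random
from typing import Dict, List, Set

def random_baseline(
    terms: Dict[str, Set[str]],       # paper id -> phrases it contains
    citation_counts: Dict[str, int],  # paper id -> number of times it was cited
    k: int,
    min_citations: int = 10,          # hypothetical "enough citations" cutoff
    seed: int = 0,                    # fixed seed so the draw is reproducible
) -> List[str]:
    """Draw up to k distinct phrases uniformly at random from well-cited papers."""
    pool = sorted({
        phrase
        for paper, phrases in terms.items()
        if citation_counts.get(paper, 0) >= min_citations
        for phrase in phrases
    })
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))
```

Sorting the pool before sampling keeps results stable across runs, because Python's set iteration order for strings can change between interpreter sessions.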

