AI Communications Science

AI May Have Finally Decoded the Mysterious 'Voynich Manuscript' (gizmodo.com) 203

An anonymous reader quotes a report from Gizmodo: Since its discovery over a hundred years ago, the 240-page Voynich manuscript, filled with seemingly coded language and inscrutable illustrations, of has confounded linguists and cryptographers. Using artificial intelligence, Canadian researchers have taken a huge step forward in unraveling the document's hidden meaning. Named after Wilfrid Voynich, the Polish book dealer who procured the manuscript in 1912, the document is written in an unknown script that encodes an unknown language -- a double-whammy of unknowns that has, until this point, been impossible to interpret. The Voynich manuscript contains hundreds of fragile pages, some missing, with hand-written text going from left to right. Most pages are adorned with illustrations of diagrams, including plants, nude figures, and astronomical symbols. But as for the meaning of the text -- nothing. No clue. For Greg Kondrak, an expert in natural language processing at the University of Alberta, this seemed a perfect task for artificial intelligence. With the help of his grad student Bradley Hauer, the computer scientists have taken a big step in cracking the code, discovering that the text is written in what appears to be the Hebrew language, and with letters arranged in a fixed pattern. To be fair, the researchers still don't know the meaning of the Voynich manuscript, but the stage is now set for other experts to join the investigation. The researchers used an AI to study "the text of the 'Universal Declaration of Human Rights' as it was written in 380 different languages, looking for patterns," reports Gizmodo. "Following this training, the AI analyzed the Voynich gibberish, concluding with a high rate of certainty that the text was written in encoded Hebrew."

The researchers then entertained a hypothesis that the script was created with alphagrams, words in which text has been replaced by an alphabetically ordered anagram. "Armed with the knowledge that text was originally coded from Hebrew, the researchers devised an algorithm that could take these anagrams and create real Hebrew words." Finally, "the researchers deciphered the opening phrase of the manuscript" and ran it through Google Translate to convert it into passable English: "She made recommendations to the priest, man of the house and me and people." The study appears in Transactions of the Association for Computational Linguistics.

  • "Finally Decoded" (Score:5, Insightful)

    by Anonymous Coward on Tuesday January 30, 2018 @08:11AM (#56032285)

    STOP using this phrase in each bi-weekly story about this book only to say at the bottom of each article it "isn't really decoded".

    It's "decoded" when the text is readable.

  • Comment removed (Score:5, Interesting)

    by account_deleted ( 4530225 ) on Tuesday January 30, 2018 @08:20AM (#56032327)
    Comment removed based on user account deletion
    • It's not completely meaningless though. I mean it's gobbledygook, but gobbledygook with Latin sentence structure and vocabulary.
      • So maybe this is "Hebrew gobbledygook". What difference does it make?
        I'm still not convinced it's anything more than a sort of forgery, a faked artifact, and the only reason people care about it now is the circular "a bunch of previous people also cared about it".
    • Re:Lorem Ipsum (Score:5, Informative)

      by Dwedit ( 232252 ) on Tuesday January 30, 2018 @09:34AM (#56032661) Homepage

      The lorem ipsum text actually means something though... (some words were removed)

      Nor again is there anyone who loves or pursues or desires to obtain pain of itself, because it is pain, but occasionally circumstances occur in which toil and pain can procure him some great pleasure. To take a trivial example, which of us ever undertakes laborious physical exercise, except to obtain some advantage from it? But who has any right to find fault with a man who chooses to enjoy a pleasure that has no annoying consequences, or one who avoids a pain that produces no resultant pleasure?

      On the other hand, we denounce with righteous indignation and dislike men who are so beguiled and demoralized by the charms of pleasure of the moment, so blinded by desire, that they cannot foresee the pain and trouble that are bound to ensue; and equal blame belongs to those who fail in their duty through weakness of will, which is the same as saying through shrinking from toil and pain.

      • The Lorem Ipsum text, though, is based on something that Cicero wrote, but is definitely not coherent Latin.

        Where Cicero wrote:

        "Dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam"

        which is Latin, the Lorem Ipsum runs:

        "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua"

        Which has some Latin words in it, but is mostly not.

      • Comment removed based on user account deletion
    • Re: (Score:2, Informative)

      by Anonymous Coward

      Um, Lorem Ipsum isn't meaningless, it's Latin text copied from Cicero. We already know what it means. There goes your entire post.

      • Lorem Ipsum is garbled text from Cicero; it was munged to produce the desired letter frequencies. It's pretty much gobbledygook.

  • Lololololol (Score:5, Insightful)

    by bluegutang ( 2814641 ) on Tuesday January 30, 2018 @08:41AM (#56032423)

    Failing to find any Hebrew scholars who could help validate their findings, the researchers eventually resorted to using Google Translate,

    (Source [sciencealert.com])

    This "research" is a joke.

    • by Anonymous Coward on Tuesday January 30, 2018 @08:50AM (#56032471)

      To be fair, pasting something into Google counts as research for millennials.

    • by Megol ( 3135005 )

      From the paper:
      "According to a native speaker of the language,
      this is not quite a coherent sentence. However,
      after making a couple of spelling corrections,
      Google Translate is able to convert it into passable
      English: “She made recommendations to the priest,
      man of the house and me and people.”"

      So it is manually "corrected" input that produces that result.

      To show that this is a valid approach to decode the document they have to be able to decode larger parts of the text to something that makes sense.

      • So it is manually "corrected" input that produces that result.

        Yes... that's the best. After all, with carefully "corrected" input you're able to craft world-class conspiracy theories: https://www.youtube.com/watch?... [youtube.com]

        To show that this is a valid approach to decode the document they have to be able to decode larger parts of the text to something that makes sense.
        That of course doesn't mean that it isn't a valid approach; there may have been deliberate misspellings by the writer before encryption and similar things.

        Doesn't Hebrew have that Tetragram thing where they leave out vowels?

        But unless they can produce longer readable texts IMO they haven't proved anything.

        • by gwolf ( 26339 )

          To show that this is a valid approach to decode the document they have to be able to decode larger parts of the text to something that makes sense.
          That of course doesn't mean that it isn't a valid approach; there may have been deliberate misspellings by the writer before encryption and similar things.

          Doesn't Hebrew have that Tetragram thing where they leave out vowels?

          The Tetragram means literally the Four Letters; that's how the name of God is written in the scriptures. And yes, as is the *usual practice* in Hebrew, vowels are left out. From those four letters, the name "Jehovah" is derived, although it could be read in several different ways.

          But, again, in Hebrew we do not write (most) vowels except when writing for children, or in several cases (such as bibles, prayer books and such) where the exact pronunciation is deemed required. Vowels can be identified (by a we

    • by Luthair ( 847766 )
      It's Gizmodo, what do you expect from a tech blog?
    • by DRJlaw ( 946416 )

      This "research" is a joke.

      I disagree. How do you recruit a classical Hebrew scholar to validate your hypothesis and assist with additional work? Not in the Yellow Pages. You publish your intermediate results and hope that it tickles a suitable person's interest such that they join in the effort.

      You may as well declare Linus' work a joke. It's not as if Linux 0.12 was useful for much. It took a boatload of domain experts to bring it up to the capabilities that made people find it useful.

      • I disagree. How do you recruit a classical Hebrew scholar to validate your hypothesis and assist with additional work?

        You hit up people you know to see if they know any, or anyone who might know any. You ask around the faculty at the university you're associated with. You reach out to other researchers in the same field to see if they know someone or someone who might know someone. You hit Google and find scholars and reach out to them via email. Etc... etc...

        All of these are professional method

        • by DRJlaw ( 946416 )

          You hit up people you know to see if they know any, or anyone who might know any. You ask around the faculty at the university you're associated with. You reach out to other researchers in the same field to see if they know someone or someone who might know someone. You hit Google and find scholars and reach out to them via email. Etc... etc...

          All of these are professional methods used routinely by serious researchers across any number of fields.

          Those are not the exclusive routes, especially when the interse

          • Those are not the exclusive routes, especially when the intersection between computer science, the University of Alberta, and classical Hebrew scholarship is approximately 0.

            If they are not exclusive routes, feel free to suggest others. Even with an intersection of approximately 0, it shouldn't be hard to find an expert. If they didn't try or couldn't find one that would participate, that in itself tells us something.

            The current publication proves that your statement is false. And they did mark

            • by DRJlaw ( 946416 )

              If they are not exclusive routes, feel free to suggest others.

              I did.

              First, I allowed for the possibility I was wrong. Second, "peer review" is a process - not a stamp of quality.

              Yet you reject that possibility at every turn. Also, peer review is a stamp of quality -- it is a process designed to establish a threshold of quality through the input of the reviewers. Journals may do so well or poorly -- TACL [aclweb.org] is fairly selective.

              My opinion is based on my experience and the information I have at hand.

              Logical fall

    • Re:Lololololol (Score:5, Insightful)

      by quantaman ( 517394 ) on Tuesday January 30, 2018 @10:07AM (#56032871)

      Failing to find any Hebrew scholars who could help validate their findings, the researchers eventually resorted to using Google Translate,

      (Source [sciencealert.com])

      This "research" is a joke.

      Why? Because the Hebrew scholars didn't want to participate?

      Google Translate botches modern languages. The fact that running their results through Google Translate gave them meaningful output suggests they have real data.

      • Google Translate botches modern languages. The fact that running their results through Google Translate gave them meaningful output suggests they have real data.

        That Google Translate produces errors when exposed to relatively comprehensible data does not mean that getting meaningful output from Google Translate implies that they have real data. You can't cite Translate's fallibility as an example of its utility.

      • Re:Lololololol (Score:4, Insightful)

        by bluegutang ( 2814641 ) on Tuesday January 30, 2018 @11:40AM (#56033513)

        But they didn't get meaningful output. They got "She made recommendations to the priest, man of the house and me and people". This makes little sense as the first line of a book on herbology. This is AFTER "making a couple of spelling corrections" (how many is a couple?) and AFTER "de-anagraming" every single word (i.e. arbitrarily picking one of the thousands of permutations of letters in the word). Not to mention that Hebrew is written without vowels, so any string of several characters is as likely as not to be a word.

        When I was in high school I used a script to find dictionary anagrams of my name and my friends' names. A few of the anagrams looked pretty cool. Did they have any deeper meaning? Of course not. This is basically the same methodology.

        • by Calydor ( 739835 )

          It makes lots of sense as the opening sentence for a herbology book. The person in question (she) has tried to (or wants to) give this information to the church, to authorities (my take on 'man of the house'), to the author and everyone else.

          Basically: This is a Public Domain license.

        • by q4Fry ( 1322209 )

          ... This is AFTER "making a couple of spelling corrections" (how many is a couple?) and AFTER "de-anagraming" every single word (i.e. arbitrarily picking one of the thousands of permutations of letters in the word). ...

          When I was in high school I used a script to find dictionary anagrams of my name and my friends' names.

          This is fun. Now I can make up codes everywhere:

          Knew I saw in high school suede prints...

          Thanks for introducing me to their methodology. And you should bring those suede prints back. They'll be big.

        • But they didn't get meaningful output. They got "She made recommendations to the priest, man of the house and me and people". This makes little sense as the first line of a book on herbology.

          In English, it makes little sense. Hebrew, especially ancient/Biblical Hebrew, uses different sentence structure, both in terms of word order and (lack of) punctuation. A better English translation could be something like "She has made many recommendations, first to the priest, then to her husband, then to me, and finally to everyone in town."

          • Remember, this is supposed to be an English translation of a putative Hebrew text (and that done with Google Translate); it is not the Hebrew text, nor even an interlinear (word for word, same order as Hebrew) gloss. So the word order and sentence structure is irrelevant (as is the lack of punctuation, which is not a matter of sentence structure anyway). What bluegutang was saying (as I read him) is that the sentence does not seem like one you'd find *in a book on herbology*.

      • Once you translate a LONG text, then YES, it means you have something. But analyzing a few words or a single sentence? Time and time again we get news that somebody found out the code of the infamous manuscript, and it NEVER pans out. Heck, if they had so much success with one sentence, WHY oh WHY is there no report on translating a whole page, which would be good evidence? Instead we get this report about one sentence. Reproducibility is key to demonstrating that the manuscript is translated. If they got a page ca
      • Google Translate can also produce seemingly-sensible results when given senseless inputs [upenn.edu]. Getting some meaningful output is only a weak suggestion that they have meaningful inputs. They should not have published without finding at least one Hebrew scholar who would take a look at their work - and the fact that they couldn't convince anyone to do so is itself suggestive.

      • You might change your mind after you read this: http://languagelog.ldc.upenn.e... [upenn.edu], and some of the links there. (Mark Liberman is, btw, a very senior computational linguist.) Google Translate is now quite capable of turning gibberish into meaningful output.

    • I like to see machine learning fail and how it fails. Based on the assumption of an all or nothing training set, neural networks will be 100% confident in their choice and also wrong.

      This .gif shows three different hand positions that all communicate the number three:

      https://imgur.com/a/KFR2M [imgur.com]

    • think of it as a smoke-test to see if the overall approach makes sense. they'll likely take that as a sign they're on to something, finish the decoding - THEN hand the entire thing to a proper hebrew scholar, to do the final translation.

      you're focusing on the wrong part of this. =/

    • Last year the theory was that it was a gynecological text based upon the pictures, though the "encryption" was speculative. Ultimately, however, the manuscript is just a manuscript. It's interesting as a puzzle but beyond that there will be no deep meanings uncovered or conspiracies unmasked.

  • In brief, the manuscript says, "Dear World, this is my esoteric theory of the nature of the universe. I wrote it because I am very very smart, and you should pay attention to me, and shower me with honors. Because it is esoteric and holds the key to all metaphysical knowledge, I have written it such that only the most intelligent and worthy may know its secrets. However, if no one decodes it, I will die happy because it proves I was the smartest person alive. Sincerely, Yaddayadda."
    • So ... it was written by the medieval equivalent of some conspiracy theorist?

  • by pezpunk ( 205653 ) on Tuesday January 30, 2018 @08:55AM (#56032477) Homepage

    you would think over time people would become less gullible, not more.

    and sure, if you train an AI long and hard enough, it will probably be able to tickle out something that looks like meaning from that nonsense. just like if you train an AI to see dogs, it can identify weird dogs in literally any image.

    https://www.washingtonpost.com... [washingtonpost.com]

    • you would think over time people would become less gullible, not more.

      One would think so, but Creationism is on the rise again.

      • And flat earthers. Very strange, they were almost extinct. Similarly, conspiracy theorists seemed also to be on the decline but they're very common these days too.

          • My guess is that there's a connection. Some of those that take this bible thing seriously think that their book could in some way be wrong if the Earth wasn't flat, so it MUST be flat because the book MUST be right.

          • Except that the book doesn't say that. Of course, when you get into people who are literalists then even an obviously poetical statement is treated as literal truth.

            Flat earthers in my experience seem to be much more politically minded than religious, believing there's a big conspiracy out there to hide the truth. They're individuals, they don't learn flat earth beliefs from their parents or community, it's something they pick up as an adult.

            • I've had my share of religiously motivated flat earthers. And yes, the bible actually talks about a firmament spanning above the earth and stuff, and for literalists this means that it cannot be a globe. Because on a globe, a "firmament above" is pretty much impossible.

    • by GuB-42 ( 2483988 )

      A prank or something written by a madman. Even if it is highly plausible, knowing what the prank is about is interesting by itself.
      It is noteworthy that it seems to follow patterns of natural languages (e.g. Zipf's law). So it is unlikely to be random.
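
A rough way to check the Zipf's law claim above is to fit a line to log(frequency) against log(rank); natural-language text tends to give a slope near -1. The sketch below is only an illustration: the function names and the toy corpus are made up here, and nothing in it comes from the study or from an actual Voynich transcription.

```python
import math
from collections import Counter

def rank_frequencies(tokens):
    """Return word counts sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)

def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is what Zipf's law predicts for natural language.
    """
    freqs = rank_frequencies(tokens)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Toy corpus (invented for illustration): the word of rank r
# appears about 1200 / r times, i.e. an idealized Zipf distribution.
toy_tokens = [f"w{r}" for r in range(1, 21) for _ in range(1200 // r)]
```

Calling `zipf_slope(toy_tokens)` on this idealized corpus gives a slope close to -1. A Zipf-like slope is weak evidence at best: some random text-generation processes are known to produce similar rank-frequency curves, so it does not settle whether the text is meaningful.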

  • by Idimmu Xul ( 204345 ) on Tuesday January 30, 2018 @08:55AM (#56032479) Homepage Journal

    https://arstechnica.com/scienc... [arstechnica.com]

    it's the puzzle that keeps on giving!

  • One Line (Score:4, Insightful)

    by NicknameUnavailable ( 4134147 ) on Tuesday January 30, 2018 @09:00AM (#56032511)
    Is proof this is a fake. They ran their algorithm, got something almost sensible for the first sentence, and the rest was total gibberish but they needed to publish.
  • by Opportunist ( 166417 ) on Tuesday January 30, 2018 @09:00AM (#56032513)

    https://xkcd.com/593/ [xkcd.com]

    It is obvious when you think about it...

    • by pr0t0 ( 216378 )

      Bah! I was just about to make this joke. I didn't know that xkcd already beat me to it!

  • Since its discovery over a hundred years ago, the 240-page Voynich manuscript, filled with seemingly coded language and inscrutable illustrations, of has confounded linguists and cryptographers.

    "of has confounded" - ?

    Ah, I get it. It's not terrible editing, it's more mysterious encryption!

  • ... the text is really just gibberish, a practical joke created by the author, and the AI is just an ~infinite number of monkeys at an infinite number of typewriters~ type of thing, eventually "finding" something that may make sense.
    • That's one of the theories. However, there have been attempts at statistical analysis that suggest total gibberish is unlikely. Moreover, that's a TON of work for a practical joke.

  • The manuscript is considered the world's most important cipher, one scrutinized by cryptographers, both professional and amateurs, for decades.

    The manuscript is intriguing, but we can't say it's important without knowing the message. It could be entirely meaningless.

    • Yeah, I wish someone would put as much effort into decoding Linear A (the language of the Minoans), and showing it to be the ancestral language of some present-day languages (or some extinct languages). It's of course possible that it _doesn't_ represent the ancestor of some modern languages, in which case it will be forever unknowable. And there's not much of it, so even if it was a language we can reconstruct by other means, we might not be able to confirm it.

      • Yeah, I wish someone would put as much effort into decoding Linear A (the language of the Minoans)

        I wonder how large the corpus of Linear A is, compared to the Voynich text. I would be surprised if the Linear A corpus is the larger, but we are more likely to find more Linear A (in secure archaeological contexts) than we are to find a second Voynich manuscript.

    • It could be entirely meaningless.

      Indeed, I've seen papers published in about the last decade showing that the Voynich text is statistically indistinguishable from what could be produced with a physical frame (to select letters) and a sheet of randomly distributed characters. Which is well in advance of cryptographic and statistical methods consonant with the historical record of the document, but otherwise well within the reach of inventors of the time. The diagrams seem rather more interesting.

  • by CodeHog ( 666724 ) <joe.slackerNO@SPAMgmail.com> on Tuesday January 30, 2018 @10:08AM (#56032875) Homepage
    "Drink your Ovaltine" - a crummy commercial.
  • "for dark is the suede that mows like a harvest"
    Wow, some pretty serious research here... /sarcasm.

  • I was saying this just the other day about the Zodiac killer's coded messages.. what if the author made a coding error? It would be so easy to do and just a couple errors could render the whole thing totally useless.
  • I look forward to seeing the fully decoded text. Until now all indications were that it was a "spooky" coffee table book full of nonsense text.

    • Everybody knows that the Voynich manuscript actually describes how to bypass the booby-traps on Oak Island to recover the Ark of the Covenant hidden there by Mayan Templars. I saw a documentary about that on the History Channel. Rick only offered $50 for it.
  • If they can get coherent results using only machine translation, not understanding the base language themselves, this gives an even stronger claim in some ways that they have really cracked the code. We will know they aren't hand-tweaking the results to get what they want, because they don't actually know what they want. They only know what comes out the other end of the process.

  • by PPH ( 736903 )

    The Protocols of the Elders of Zion

  • It will be exciting to see this process applied to the untranslated Indus Valley Language and Easter Island glyphs.

    http://content.time.com/time/w... [time.com]

  • That's the biggest news since it was translated completely a year ago! Wow!
  • As soon as you see "anagram" mentioned as part of the process to decode a cipher, you can stop reading, it's not a solution. If you allow for an arbitrary arrangement of letters or symbols as part of the solution, you can arrive at pretty much *any* text as the result, with no real connection to the cipher you started with.

    • by Mal-2 ( 675116 )

      As soon as you see "anagram" mentioned as part of the process to decode a cipher, you can stop reading, it's not a solution. If you allow for an arbitrary arrangement of letters or symbols as part of the solution, you can arrive at pretty much *any* text as the result, with no real connection to the cipher you started with.

      Unfortunately, if that's what the author of the manuscript actually did, then it's a necessary step in making heads or tails of the text. There will be words and phrases that will be ambiguous because there is more than one possible unscrambling of the letters, but just because the encoding is lossy, that doesn't mean it's completely meaningless. I would have to imagine the author was aware of the potential for confusion and chose words that would not induce collisions that could not be resolved by context,

    • If I understood correctly, Alphagram is the result of sorting alphabetically the letters of a word, so there aren't many different combinations.
      I.e. encoding a message about a CAB, it would sort to ABC, and only ABC. When discussing SHEEP, you could only encode it to EEHPS.
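
The encoding direction described above is deterministic, which a few lines of Python make concrete. This is a minimal sketch of the alphagram idea only; the function names are made up here, and it is not the researchers' code.

```python
def alphagram(word):
    """Return the letters of a word sorted alphabetically."""
    return "".join(sorted(word))

def encode(message):
    """Replace every whitespace-separated word by its alphagram."""
    return " ".join(alphagram(word) for word in message.split())
```

With these definitions, `alphagram("CAB")` gives `"ABC"` and `alphagram("SHEEP")` gives `"EEHPS"`, matching the examples in the comment. Each word has exactly one alphagram, so encoding involves no choice at all.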

      • I think what wwalker is saying is that going the opposite direction--from encoded to decoded--is impossibly ambiguous. This is particularly true of Semitic languages, where the root most often consists of three consonants. That implies that if you choose three letters, many permutations of those letters will be real roots. A few combinations, like 3 identical letters, or two identical letters at the beginning of the word (IIRC), can be ruled out, but most other combinations will be *some* root.
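
The ambiguity in the decoding direction can be sketched by grouping a word list by alphagram and seeing how many words share one code. The tiny English word list below is an invented stand-in for a real lexicon (it is not Hebrew and not from the paper), and the helper names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations

def alphagram(word):
    """Return the letters of a word sorted alphabetically."""
    return "".join(sorted(word))

def build_index(lexicon):
    """Map each alphagram to the sorted list of words that produce it."""
    index = defaultdict(list)
    for word in lexicon:
        index[alphagram(word)].append(word)
    return {key: sorted(words) for key, words in index.items()}

def candidates(code, index):
    """Return every lexicon word that could have been encoded as `code`."""
    return index.get(alphagram(code), [])

def orderings(letters):
    """Count the distinct arrangements of the given letters."""
    return len(set(permutations(letters)))

# Toy lexicon, invented for illustration only.
TOY_LEXICON = ["stop", "pots", "spot", "tops", "opts", "post", "cab", "sheep"]
INDEX = build_index(TOY_LEXICON)
```

Here `candidates("opst", INDEX)` returns six different words for a single code, while `candidates("abc", INDEX)` returns only `["cab"]`. Three distinct letters allow `orderings("abc") == 6` arrangements, so in a language where most three-consonant combinations are valid roots, a decoder has to pick among several plausible readings for nearly every short word.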
