AI Books Science

Thanks to Machine Learning, Scientists Finally Recover Text From The Charred Scrolls of Vesuvius (sciencealert.com) 45

The great libraries of the ancient classical world are "legendary... said to have contained stacks of texts," writes ScienceAlert. But from Rome to Constantinople, Athens to Alexandria, only one collection survived to the present day.

And here in 2024, "we can now start reading its contents." A worldwide competition to decipher the charred texts of the Villa of Papyri — an ancient Roman mansion destroyed by the eruption of Mount Vesuvius — has revealed a timeless infatuation with the pleasures of music, the color purple, and, of course, the zingy taste of capers. The so-called Vesuvius challenge was launched a few years ago by computer scientist Brent Seales at the University of Kentucky with support from Silicon Valley investors. The ongoing 'master plan' is to build on Seales' previous work and read all 1,800 or so charred papyri from the ancient Roman library, starting with scrolls labeled 1 to 4.

In 2023, the annual gold prize was awarded to a team of three students, who recovered four passages containing 140 characters — the longest extractions yet. The winners are Youssef Nader, Luke Farritor, and Julian Schilliger. "After 275 years, the ancient puzzle of the Herculaneum Papyri has been solved," reads the Vesuvius Challenge Scroll Prize website. "But the quest to uncover the secrets of the scrolls is just beginning...." Only now, with the advent of X-ray tomography and machine learning, can their inky words be pulled from the darkness of carbon.

A few months ago students deciphered a single word — "purple," according to the article. But "That winning code was then made available for all competitors to build upon." Within three months, passages in Latin and Greek were blooming from the blackness, almost as if by magic. The team with the most readable submission at the end of 2023 included both previous finders of the word 'purple'. Their unfurling of scroll 1 is truly impressive and includes more than 11 columns of text. Experts are now rushing to translate what has been found. So far, about 5 percent of the scroll has been unrolled and read to date. It is not a duplicate of past work, scholars of the Vesuvius Challenge say, but a "never-before-seen text from antiquity."

One line reads: "In the case of food, we do not right away believe things that are scarce to be absolutely more pleasant than those which are abundant."

Thanks to davidone (Slashdot reader #12,252) for sharing the article.
This discussion has been archived. No new comments can be posted.

  • by quonset ( 4839537 ) on Sunday February 18, 2024 @06:27PM (#64249898)

    One line reads: "In the case of food, we do not right away believe things that are scarce to be absolutely more pleasant than those which are abundant."

    Be sure to drink your Ovaltine.

  • by Growlley ( 6732614 ) on Sunday February 18, 2024 @06:32PM (#64249912)
    but that wonderful new invention AI is going to make life so much better by regulating the thermal vents in the volcano.
    • by Anonymous Coward
      What?
      • Probably played too much Horizon: Zero Dawn - the Frozen Wilds

        A significant portion of the plot involves an AI tasked with managing the thermal characteristics of a volcano.

  • AI hallucinations (Score:5, Interesting)

    by misnohmer ( 1636461 ) on Sunday February 18, 2024 @07:55PM (#64250080)
    How do we differentiate real deciphered content from AI hallucinations? Can we validate that the deciphered content is in fact what was originally written, rather than an AI interpretation akin to people seeing shapes in cloud formations?
    • by Rei ( 128717 ) on Sunday February 18, 2024 @08:24PM (#64250146) Homepage

      It's deciphering whether something is a paint fleck or not. It doesn't know language.

      Multiple teams competed with different models, and each given team iterated through many models. The text is what it is; it's not a treatise on pleasure in one model run but a recipe for silphium-brazed dormice in another.

    • by quenda ( 644621 )

      Thank you for the obvious question. So obvious, it is addressed in detail in TFA. So RTFA.

    • Re:AI hallucinations (Score:4, Interesting)

      by christoban ( 3028573 ) on Sunday February 18, 2024 @09:42PM (#64250260)

      Hallucination? It's not an LLM, it's some other kind of neural net. It looks at a charred piece of papyrus and tries to work out a character.

      • by quenda ( 644621 )

        Image recognition can get false positives. Like seeing shapes in clouds.

        • by MikeS2k ( 589190 ) <mikes2 AT ntlworld DOT com> on Monday February 19, 2024 @04:26AM (#64250746)

          It looks like the AI distinguishes individual characters. It's not an LLM, so it has no understanding of language itself - so it wouldn't hallucinate "Circenses" as "Circumference" - rather you may get "Circenscs" if the latter e is malformed. I'd imagine any human proofreader could sort out any hallucinations / misreadings of characters. Looking forward to seeing what they recover!

          • by quenda ( 644621 )

            Yes, the "hallucinate" thing happens with image recognition too.

            What they are doing is first using AI to find the ink flecks. So you get a picture which is just an image, before any character recognition is applied.
            This is no guarantee by itself! The AI may start to notice that ink flecks are more likely to be located in certain patterns, even if it was not previously trained on the Greek alphabet.

            https://scrollprize.org/grandp... [scrollprize.org]
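To make the "ink first, characters later" point concrete, here is a toy sketch in Python. It is not the Vesuvius Challenge code: the real detectors are trained neural networks working on 3D CT data, whereas the hypothetical `ink_map` function below just averages a small neighbourhood of a 2D intensity grid and thresholds it. The names, the threshold, and the window size are all assumptions made for illustration. What it does show is the shape of the output: a binary image, produced with no alphabet or language anywhere in the loop.

```python
def ink_map(surface, threshold=0.5, window=1):
    """Label each pixel as ink (1) or not ink (0) from local intensity only.

    `surface` is a list of rows of floats in [0, 1]. This is a stand-in for
    a trained ink detector; it knows nothing about letters or words.
    """
    rows, cols = len(surface), len(surface[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the neighbourhood around (r, c), clipped at the borders.
            vals = [
                surface[rr][cc]
                for rr in range(max(0, r - window), min(rows, r + window + 1))
                for cc in range(max(0, c - window), min(cols, c + window + 1))
            ]
            # Mark as ink when the local mean exceeds the threshold.
            out[r][c] = 1 if sum(vals) / len(vals) > threshold else 0
    return out


if __name__ == "__main__":
    # A vertical stroke down the middle column of a 3x3 patch.
    patch = [[0.0, 0.9, 0.0] for _ in range(3)]
    for row in ink_map(patch, window=0):
        print(row)
```

Any reading of letters happens afterwards, on the resulting image, which is where a human or a separate recognizer could still misread a malformed character.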

    • What is seen as "hallucinations" is actually a manifestation of the fact that there is simply not enough data in the model and the sample to justify the statistical conclusion that is produced by the algorithm.

      This is always the first question you should ask - is the data that your model was built with appropriate, and is the data for the conclusion your model produces actually available in the set you're feeding it for "analysis".

      When your answer is "no" or "I don't know", then you know you're only gett

      • If the AI does not work, the solution is more AI.
        Yeah.
        Classic answer.

        • Not sure how that follows from what I wrote, but be my guest, go ahead.

          • by stooo ( 2202012 )

            >> " is simply not enough data in the model"
            -> Need more data -> need more complex AI
            -> so conclusion in short : "if AI does not work -> solution is add more of it."

            • You need to work on your reading comprehension.

              • I see this conflation a lot -- people talk like the "AI" part is the same as the data fed to the AI.

                If a 10 year old kid raised in a cult thinks the earth is flat, the proper conclusion isn't "10 year olds can't understand science". The proper conclusion is "this 10 year old needs access to better textbooks and instruction"

                It sounds like the person responding to you is conflating your discussion of how improving the data/sample could improve results, with saying "this model produces poor results, so give u

                • Yes, to a large extent this.

                  While I'm simplifying it a lot, when you do "AI", you have one of two cases: a) you understand your (typically linearized) model and are trying to fit some coefficients to its variables so that you can later produce some inference from an input vector of "real" data, or b) you don't understand it at all, and are dumping the whole shebang into the statistical meat grinder, hoping it will derive both some empirical "rules" and their coefficients, for the same purpose.

                  Doing that, you're implici

    • I think you need to separate the final deciphering of the text from the virtual unrolling of the scrolls here. The deciphering itself is probably best left to human translators and will take years.

      The scientifically interesting method of unrolling the scrolls virtually from X-rays is a problem that likely has a convincing and clear solution once found. It comes down to geometry and statistics, rolling up a virtual piece of paper in such a way that it intersects blobs of ink detected at 3D locations inside

      • It is widely known that OCR makes mistakes.
        Look for "xerox scanner scandal" for a nice example.

      • Incidentally, as CT and MRI scans are rapidly advancing in resolution, years ago I suggested to Google researchers that they could scan entire books without opening them. Same idea, apparently.

  • by aRTeeNLCH ( 6256058 ) on Monday February 19, 2024 @04:37AM (#64250754)
    "The volcano has been here and we've been next to it for ages, there's nothing to worry about. Ignore the panic makers."
  • Why does this seem to read just like a text out of Skyrim?

  • Item was titled "To Serve Man." Um, turns out it was a recipe...

  • I mean if they said "AI" the thread would melt down in conspiracy...too late.
