Thanks to Machine Learning, Scientists Finally Recover Text From The Charred Scrolls of Vesuvius (sciencealert.com) 45
The great libraries of the ancient classical world are "legendary... said to have contained stacks of texts," writes ScienceAlert. But from Rome to Constantinople, Athens to Alexandria, only one collection survived to the present day.
And here in 2024, "we can now start reading its contents." A worldwide competition to decipher the charred texts of the Villa of the Papyri, an ancient Roman mansion destroyed by the eruption of Mount Vesuvius, has revealed a timeless infatuation with the pleasures of music, the color purple, and, of course, the zingy taste of capers. The so-called Vesuvius Challenge was launched a few years ago by computer scientist Brent Seales at the University of Kentucky with support from Silicon Valley investors. The ongoing 'master plan' is to build on Seales' previous work and read all 1,800 or so charred papyri from the ancient Roman library, starting with scrolls labeled 1 to 4.
In 2023, the annual gold prize was awarded to a team of three students, who recovered four passages containing 140 characters — the longest extractions yet. The winners are Youssef Nader, Luke Farritor, and Julian Schilliger. "After 275 years, the ancient puzzle of the Herculaneum Papyri has been solved," reads the Vesuvius Challenge Scroll Prize website. "But the quest to uncover the secrets of the scrolls is just beginning...." Only now, with the advent of X-ray tomography and machine learning, can their inky words be pulled from the darkness of carbon.
A few months ago students deciphered a single word, "purple," according to the article. But "That winning code was then made available for all competitors to build upon." Within three months, passages in Latin and Greek were blooming from the blackness, almost as if by magic. The team with the most readable submission at the end of 2023 included both previous finders of the word 'purple'. Their unfurling of scroll 1 is truly impressive and includes more than 11 columns of text. Experts are now rushing to translate what has been found. So far, about 5 percent of the scroll has been unrolled and read. It is not a duplicate of past work, scholars of the Vesuvius Challenge say, but a "never-before-seen text from antiquity."
One line reads: "In the case of food, we do not right away believe things that are scarce to be absolutely more pleasant than those which are abundant."
Thanks to davidone (Slashdot reader #12,252) for sharing the article.
Can't wait for the next line (Score:5, Funny)
One line reads: "In the case of food, we do not right away believe things that are scarce to be absolutely more pleasant than those which are abundant."
Be sure to drink your Ovaltine.
Re: (Score:2)
It's more of a "rich people eat McDonald's too" sort of statement.
Re:Can't wait for the next line (Score:4, Funny)
Or "don't assume swan tastes better than chicken."
Re: (Score:2)
translation (Score:3)
Re: (Score:1)
Re: (Score:2)
Probably played too much Horizon: Zero Dawn - the Frozen Wilds
A significant portion of the plot involves an AI tasked with managing the thermal characteristics of a volcano.
Re: (Score:2)
Yes, it is such a travesty that even as a species with some 6-7 billion people or however many it is we are now, our entire species can only do ONE thing at a time. Such an oversight in design.
Re: (Score:1)
Re: (Score:1)
Da fuq are you talking about? The species is currently doing countless "things". Design?
Yes, between us we can both create sarcastic responses and be oblivious to them. Truly a wonder to behold.
Re: (Score:2)
I detected his sarcasm, didn't you?
Re: (Score:2)
Re: (Score:2)
Whoooooooooosh.
AI hallucinations (Score:5, Interesting)
Re:AI hallucinations (Score:4, Insightful)
It's deciphering whether something is a paint fleck or not. It doesn't know language.
Multiple teams competed with different models, and each given team iterated through many models. The text is what it is; it's not a treatise on pleasure in one model run but a recipe for silphium-brazed dormice in another.
You think it was not? (Score:2)
>> You think the AI was trained with Latin?
Of course. You think it was not?
Re: (Score:2)
Thank you for the obvious question. So obvious, it is addressed in detail in TFA. So RTFA.
Re: (Score:2)
Uh really, because I didn't see a single word in "TFA" addressing this.
Re: (Score:2)
OK, sorry, there were a number of FAs linked. It was the last one:
https://scrollprize.org/grandp... [scrollprize.org]
Re:AI hallucinations (Score:4, Interesting)
Hallucination? It's not an LLM, it's some other kind of neural net. It looks at a charred piece of papyrus and tries to work out a character.
Re: (Score:2)
Image recognition can get false positives. Like seeing shapes in clouds.
Re:AI hallucinations (Score:4, Insightful)
It looks like the AI distinguishes individual characters. It's not an LLM, so it has no understanding of language itself, and it wouldn't hallucinate "Circenses" as "Circumference"; rather, you may get "Circenscs" if the latter e is malformed. I'd imagine any human proofreader could sort out any hallucinations / misreadings of characters. Looking forward to seeing what they recover!
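To make the "Circenscs" point concrete, here is a minimal sketch of how a proofreading pass might flag a character-level misreading against a word list. It is only an illustration: the `LEXICON` list and the `suggest` helper are hypothetical names made up for this example, and nothing here is taken from the Vesuvius Challenge code. It uses Python's standard `difflib` for fuzzy matching.

```python
import difflib
from typing import List, Optional

# Hypothetical mini word list; a real proofreader would rely on a full
# Latin or Greek lexicon and on their own judgment.
LEXICON: List[str] = ["panem", "et", "circenses", "circumference"]


def suggest(reading: str, lexicon: List[str]) -> Optional[str]:
    """Return the lexicon entry closest to a possibly misread word, or None."""
    matches = difflib.get_close_matches(reading.lower(), lexicon, n=1, cutoff=0.6)
    return matches[0] if matches else None


if __name__ == "__main__":
    # One malformed letter stays close to the intended word.
    print(suggest("Circenscs", LEXICON))
```

A single bad character leaves the reading much closer to "circenses" than to "circumference", which is why a character-level error tends to be easy for a human (or a simple lookup) to catch.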
Re: (Score:2)
Yes, the "hallucinate" thing happens with image recognition too.
What they are doing is first using AI to find the ink flecks. So you get a picture which is just an image, before any character recognition is applied.
This is no guarantee by itself! The AI may start to notice that ink flecks are more likely to be located in certain patterns, even if it was not previously trained with the Greek alphabet.
https://scrollprize.org/grandp... [scrollprize.org]
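As a rough picture of the "find the ink flecks first" step described above, the toy sketch below labels each pixel of a small intensity grid as ink or not ink, one pixel at a time, and produces a plain image with no character recognition involved. This is an assumption-laden stand-in: a fixed threshold replaces the trained model, and the names `ink_mask`, `render`, and `patch` are invented for the example rather than taken from any contestant's code.

```python
from typing import List


def ink_mask(intensity: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Label each pixel independently: 1 if it looks like ink, else 0.

    A real pipeline would use a trained classifier on 3D scan data;
    the fixed threshold here is only a placeholder for that step.
    """
    return [[1 if value > threshold else 0 for value in row] for row in intensity]


def render(mask: List[List[int]]) -> str:
    """Draw the mask as text so a human can look for letter shapes."""
    return "\n".join("".join("#" if pixel else "." for pixel in row) for row in mask)


# Hypothetical 3x3 patch with a vertical stroke down the middle.
patch = [
    [0.1, 0.9, 0.1],
    [0.2, 0.8, 0.1],
    [0.1, 0.7, 0.2],
]

if __name__ == "__main__":
    print(render(ink_mask(patch)))
```

Note that each pixel is decided on its own here, so this toy cannot learn letter-like patterns; the concern raised above applies to models that look at a neighborhood of pixels at once.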
Re: (Score:2)
What is seen as "hallucinations" is actually a manifestation of the fact that there is simply not enough data in the model and the sample to justify the statistical conclusion that is produced by the algorithm.
This is always the first question you should ask - is the data that your model was built with appropriate, and is the data for the conclusion your model produces actually available in the set you're feeding it for "analysis".
When your answer is "no" or "I don't know", then you know you're only gett
Classic answer. (Score:2)
If the AI does not work, the solution is more AI.
Yeah.
Classic answer.
Re: (Score:2)
Not sure how that follows from what I wrote, but be my guest, go ahead.
Re: (Score:2)
>> " is simply not enough data in the model"
-> Need more data -> need more complex AI
-> so conclusion in short : "if AI does not work -> solution is add more of it."
Re: (Score:2)
You need to work on your reading comprehension.
Re: (Score:2)
I see this conflation a lot -- people talk like the "AI" part is the same as the data fed to the AI.
If a 10 year old kid raised in a cult thinks the earth is flat, the proper conclusion isn't "10 year olds can't understand science". The proper conclusion is "this 10 year old needs access to better textbooks and instruction"
It sounds like the person responding to you is conflating your discussion of how improving the data/sample could improve results, with saying "this model produces poor results, so give u
Re: (Score:2)
Yes, to a large extent this.
While I'm simplifying it a lot, when you do "AI", you have either of two cases, a) you understand your (typically linearized) model and are trying to fit some coefficients to its variables to produce later some inference from an input vector of "real" data, or b) you don't understand it at all, and are dumping the whole shebang to the statistical meat grinder, hoping it will derive both some empirical "rules" and their coefficients, for the same purpose.
Doing that, you're implici
Re: (Score:3)
The scientifically interesting method of unrolling the scrolls virtually from X-rays is a problem that likely has a convincing and clear solution once found. It comes down to geometry and statistics, rolling up a virtual piece of paper in such a way that it intersects blobs of ink detected at 3D locations inside
xerox scanner skandal (Score:2)
It is widely known that OCR makes mistakes.
Look for "xerox scanner skandal" for a nice example.
Re: (Score:2)
Incidentally, as CT and MRI scans are rapidly advancing in resolution, years ago I suggested to Google researchers that they could scan entire books without opening them. Same idea, apparently.
"nothing to worry about" (Score:5, Funny)
Skyrim (Score:2)
Why does this seem to read just like a text out of Skyrim?
Elderly Scroll (Score:2)
to serve man (Score:2)
Item was titled "To Serve Man." Um, turns out it was a recipe...
Machine Learning (Score:2)