Slashdot
Search Engines for Handwritten Documents
Posted by
michael
on Fri Dec 03, 2004 05:25 PM
from the lost-art dept.
An anonymous reader writes "Researchers at the University of Massachusetts have created a tool for automatically searching handwritten historical documents, such as the 140,000 pages that make up George Washington's personal papers in the Library of Congress. The most interesting part is that the papers are scanned versions of the originals and the search tool actually recognizes the handwritten text from these images."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Who still reads those? (Score:5, Funny)
Handwriting sucks (Score:4, Interesting)
I hate reading/producing anything longer than a post-it note that's in handwriting.
Re:Handwriting sucks (Score:5, Insightful)
I'd hate to have to type in my equations; there's a feel to working things out with pen and paper. Besides, the tactile sensation of writing on paper is simply wonderful. No amount of typing can replace that.
Nothing beats a good old fountain pen and writing on good paper =)
Re:Handwriting sucks (Score:3, Insightful)
Re:Who still reads those? (Score:5, Interesting)
Re:Who still reads those? (Score:5, Funny)
Re:Who still reads those? (Score:4, Insightful)
Re:Who still reads those? (Score:2, Interesting)
Re:Who still reads those? (Score:2)
Re:Who still reads those? (Score:5, Interesting)
The best part is I don't have to worry about backing up my lab books. The only real threat is fire, and it is no more dangerous than it is to CDs or hard drives.
While the cursive handwriting of the 1700s and early 1800s may seem curious to us (notably, the tall 's' that looks like an 'f'), it is a very easy style that is neat, legible, and painless. Notice how there are very few back strokes.
For those who are wondering, cursive is what you use when you get sick of trying to write in print legibly and quickly without getting carpal tunnel. Every culture has it. It's unfortunate it isn't common knowledge anymore in the US. Handwriting is a wonderful skill. It used to be people would judge others based on their handwriting skills in addition to their oratory.
Fire, and Acid-Based Paper (Score:3, Insightful)
The only real threat is fire, and it is no more dangerous than it is to CDs or hard drives.
Go back and look at some old notebooks - if they used acid-based paper, then they'll be getting rather fragile.
Re:Who still reads those? (Score:2)
Re:Who still reads those? (Score:3, Interesting)
Anyway, I watch Full Metal Jacket and it reminds me of Catholic school. To continue my ramble, how many people who actually went to Catholic school aren't currently atheist? I'm guessing not too many.
Here...let me help you out, mods:
(-1 Offtopic)
Re:Who still reads those? (Score:2)
Umm (Score:5, Insightful)
How else would it search handwritten documents? Am I missing something here?
Re:Umm (Score:2, Funny)
Re:Umm (Score:2, Funny)
You write down exactly what you want to find in exactly the same handwriting that the document is written in, and then it block-scans the document for what you wrote... duh.
Re:Umm (Score:2)
Re:Umm (Score:3, Informative)
Doc (Score:3, Funny)
This is so cool! (Score:3, Funny)
Like, so 10 years ago.
More like twenty years ago ;-) (Score:4, Interesting)
I worked on an OCR system about 20 years ago. No pre-defined bitmaps of text, you trained the system on the font to be recognized. After a few hours you could turn it loose and it did fairly well. While goofing off we tried handwritten text. With good penmanship it worked to a degree.
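The "train it on the font" approach described above can be illustrated with a small nearest-neighbor sketch. This is not the poster's system, whose internals are not described; the `GlyphClassifier` class, the 3x3 bitmaps, and the Hamming-distance matching are all assumptions chosen to keep the example tiny.

```python
def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


class GlyphClassifier:
    """Hypothetical trainable recognizer: it stores labeled example glyphs
    and labels a new glyph with the label of the closest stored example."""

    def __init__(self):
        self.samples = []  # list of (bitmap, label) pairs seen during training

    def train(self, bitmap, label):
        self.samples.append((bitmap, label))

    def recognize(self, bitmap):
        # Nearest neighbor by Hamming distance over the pixels.
        closest = min(self.samples, key=lambda sample: hamming(sample[0], bitmap))
        return closest[1]


# Toy 3x3 "font": each bitmap is a tuple of row strings, '1' meaning ink.
clf = GlyphClassifier()
clf.train(("010", "010", "010"), "I")
clf.train(("100", "100", "111"), "L")

# A slightly damaged 'L' (one pixel missing) is still closest to the trained 'L'.
noisy_l = ("100", "100", "110")
guess = clf.recognize(noisy_l)
```

With neat, consistent lettering the stored examples stay close to new samples, which fits the observation that good penmanship worked "to a degree"; sloppier writing drifts away from every trained example.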
They are doing OCR (Score:2)
Yes, they are. They are not using an off-the-shelf OCR package. The OCR functionality is embedded in their software and is highly specialized, but it is OCR. For those who are fixated on the letter 'C', recognizing multiple characters as a single unit is nothing new.
Hard to read! (Score:3, Interesting)
Re:Hard to read! (Score:3, Funny)
That's because it's written in a dead language.
English.
KFG
Accuracy? (Score:2, Interesting)
Re:Accuracy? (Score:2)
A waste? (Score:5, Insightful)
Re:A waste? (Score:3, Interesting)
This is nothing new (Score:3, Informative)
Re:This is nothing new (Score:3, Funny)
better Jay Williams link (Score:2)
In related news... (Score:2)
In related news, the family of Tobias Lear, George Washington's personal secretary, who took his own life (arguably due to the horrible pain in his wrists), has filed suit.
Useful for more than just historians (Score:5, Interesting)
Re:Useful for more than just historians (Score:2)
This is slashdot. You would definitely be obnoxious if you argued a point with actual facts behind you...
Yes, but what they don't tell you... (Score:3, Funny)
Interesting, but limited (Score:3, Interesting)
If you can put the queries in English, with the search engine taking care of translation, it would be even better. Then, extended historical study comes within everyone's reach and the classical studies (or humaniora) might be transformed.
Good Work! (Score:5, Funny)
Re:Good Work! (Score:5, Funny)
Personally, I think it fucks.
It's not OCR (Score:4, Funny)
It's different. With OCR these rays of light scan the original, translate each scanpoint to discrete RGB values, and do pattern recognition.
With this system, they just read the discrete RGB values directly from pixels of documents scanned in with rays of light, then they do recognition of patterns. See, it's totally different.
Re:It's not OCR (Score:2, Funny)
Re:It's not OCR (Score:3, Insightful)
Let's examine your definitions:
OCR: Document->RGB(via light)->Pixels->pattern recognition
PTC: Document->Pixels(via light)->RGB->pattern recognition
Of course, you forget that there are no RGB values here, because it's black and white, so there is only a brightness value per pixel left. So what is the difference?
Sounds really AWFULLY different...
Maybe it's just your description that is lacking...
National Treasure (Score:3, Funny)
OCR (Score:2)
Holy shnikes! Optical Character Recognition! Bah.. I'm part of a research team at the Center for Cybermedia Research that is working on new algorithms for OCR with $4 million from Homeland Security. It's to be used on a gi-normous database containing scanned images of documents relating to Yucca Mountain.
On top of that, OCR has been around for years. Yes, it isn't the best, but it's functional. Doesn't the Census Bureau use OCR for its census forms?
So, yeah.. where is the news in the article?
spoofing (Score:3, Funny)
One important thing to understand about this... (Score:3, Interesting)
For instance, the software may be unable to distinguish the word "bug" from "dog" in one person's handwriting, but it can still mark the word with a probability for each possible reading.
If a person later searches for "bug" or "dog" along with other terms, the likelihood of a match can be calculated from those probabilities, and the searcher can make his/her own judgment as to the meaning of the text.
---
Conrad Barski
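The probabilistic matching described in the comment above can be sketched roughly as follows. This is only an illustration, not the UMass tool's actual method: the `recognized` data, the `term_probability` and `search` functions, and the assumption that word positions are independent are all made up for the example.

```python
# Hypothetical recognizer output: for every word image on a page, a map from
# candidate readings to the probability assigned to each reading.
recognized = {
    "page1": [
        {"the": 0.9, "she": 0.1},
        {"bug": 0.55, "dog": 0.45},
        {"barked": 0.8, "backed": 0.2},
    ],
    "page2": [
        {"a": 1.0},
        {"bug": 0.7, "dog": 0.3},
        {"report": 0.9, "resort": 0.1},
    ],
}


def term_probability(page, term):
    """Probability that `term` occurs at least once on the page,
    treating word positions as independent (a simplifying assumption)."""
    p_absent = 1.0
    for candidates in page:
        p_absent *= 1.0 - candidates.get(term, 0.0)
    return 1.0 - p_absent


def search(pages, query):
    """Rank pages by the product of per-term probabilities, best first."""
    scores = {}
    for name, page in pages.items():
        score = 1.0
        for term in query:
            score *= term_probability(page, term)
        scores[name] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dog_results = search(recognized, ["dog", "barked"])
bug_results = search(recognized, ["bug", "report"])
```

Under these made-up numbers, page1 ranks first for "dog barked" even though "bug" is the recognizer's top guess there, while page2 ranks first for "bug report". The ambiguous word is never forced into a single reading; the other query terms decide which pages surface.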
Re:The search tool? (Score:2, Interesting)
Re:The search tool? (Score:2)
The search tool is doing the OCR then. OCR is simply taking an image and analyzing it to recognize text.
Re:The search tool? (Score:2)
Re:The search tool? (Score:3, Funny)