Search Engines for Handwritten Documents

Posted by michael on Fri Dec 03, 2004 05:25 PM
from the lost-art dept.
An anonymous reader writes "Researchers at the University of Massachusetts have created a tool for automatically searching handwritten historical documents, such as the 140,000 pages that make up George Washington's personal papers in the Library of Congress. The most interesting part is that the papers are scanned versions of the originals and the search tool actually recognizes the handwritten text from these images."
  • by Anonymous Coward on Friday December 03 2004, @05:26PM (#10992151)
    In America, handwriting is only for old people.
    • Handwriting sucks (Score:4, Interesting)

      by October_30th (531777) on Friday December 03 2004, @05:36PM (#10992259) Homepage Journal
      You were modded as funny, but I fully agree with you.

      I hate reading/producing anything longer than a post-it note that's in handwriting.

      • You're apparently not into the pure sciences like math or physics.

        I'd hate to have to type in my equations; there's a feel to working things out with pen and paper. Besides, the tactile sensation of writing on paper is simply wonderful. No amount of typing can replace that.

        Nothing beats a good old fountain pen and writing on good paper =)
          • TeX or LaTeX are neat for writing papers, but not for doing your labnotes or solving a research problem. Writing also helps you think while you are at it, because of the time it takes to get your idea on paper. Not to mention the ease in switching modes - I can write, draw and do everything without bothering to or having to switch between programs. Thought to action, the easiest possible way.
    • by gcaseye6677 (694805) on Friday December 03 2004, @05:37PM (#10992266)
      Cursive writing certainly is. I can barely even read it anymore, much less write it. Does anybody else who is under 30 still write in cursive, other than when they made you do it in elementary school?
      • by Sheepdot (211478) on Friday December 03 2004, @05:47PM (#10992374) Journal
        I write out my checks in cursive. The other day I was admiring how pretty my cursive looked and how well it had developed from when I was in second grade and told to "TRY HARDER WEAKLING OR YOU WILL NEVER GET A JOB!". Then I realized just how ghey it was that I was enjoying the sight of it and hurriedly gave it to the cashier... who was a guy... who (ick) winked at me.
      • by realdpk (116490) on Friday December 03 2004, @05:48PM (#10992387) Homepage Journal
        I wish they'd never taught cursive. Cursive destroyed my handwriting. At least, that's my current theory on why my handwriting sucks. :)
      • I'm 23 and I write in perfect cursive. In fact, I prefer it to typing. Maybe I like it because I suffered a serious injury to my hand when I was 12 that necessitated my learning to use it again from scratch.. I dunno. I just like to write, it relaxes me.
      • I still write cursive occasionally, mainly in personal notes. If I'm writing something that I need somebody else to be able to read, I definitely print instead of using cursive.
      • by jgardn (539054) <jgardn@alumni.washington.edu> on Friday December 03 2004, @06:04PM (#10992537) Homepage Journal
        Yes, and I use it to record notes in my lab book I use at work. I record all sorts of things I discover there. Some entries are several pages long with charts and graphs and tables and diagrams. Try doing that in a few minutes in Word or OpenOffice.

        The best part is I don't have to worry about backing up my lab books. The only real threat is fire, and it is no more dangerous than it is to CDs or hard drives.

        While the cursive handwriting of the 1700's and early 1800's may seem curious to us (notably, the tall 's' that looks like an 'f'), it is a very easy style that is neat, legible, and painless. Notice how there are very few back strokes.

        For those who are wondering, cursive is what you use when you get sick of trying to write in print legibly and quickly without getting carpal tunnel. Every culture has it. It's unfortunate it isn't common knowledge anymore in the US. Handwriting is a wonderful skill. It used to be people would judge others based on their handwriting skills in addition to their oratory.
      • I had a friend in high school who always wrote in cursive, and this was...a year ago, so I'm pretty sure he's still under 30. I think that he was the only one in the whole school who still did, though.
      • Am I the only one who read this and actually thought "damn...I can write in cursive....I think...I should give that a try." And then shivered at the thought of the nuns who taught it to me, ruler-in-hand, ready to smack my knuckles with it if I screwed up.

        Anyway, I watch Full Metal Jacket and it reminds me of Catholic school. To continue my ramble, how many people who actually went to Catholic school aren't currently atheist? I'm guessing not too many.

        Here...let me help you out, mods:
        (-1 Offtopic)
    • ... and second-graders.
  • Umm (Score:5, Insightful)

    by swtaarrs (640506) <swtaarrsNO@SPAMcomcast.net> on Friday December 03 2004, @05:27PM (#10992155)
    The most interesting part is that the papers are scanned versions of the originals and the search tool actually recognizes the handwritten text from these images.

    How else would it search handwritten documents? Am I missing something here?
    • Yeah, it would have been much more "interesting" if the papers were, I don't know, read psychically by the computer or something.
    • How else would it search handwritten documents? Am I missing something here?

      You write down exactly what you want to find in exactly the same handwriting that the document is written in and then it block-scans it for what you wrote... duh.
    • It might search for certain kinds of penstrokes or something like that. You could input a vector map and it would find similar vectors. Or even bitmaps I guess.
    • Re:Umm (Score:3, Informative)

      Popular handwriting recognition software doesn't work like that - it gains much of its information from the "pen" strokes used to create the letters. There's less information in a "finished" printed page than you'd get by tracking the movements a pen made to write it. For an example of this different approach see this paper describing handwriting recognition using pen-mounted accelerometers [grenoble-soc.com].
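The bitmap-matching idea raised earlier in this thread can be sketched in a few lines. This is only an illustration, not how the UMass tool works: it assumes word images have already been cut out and scaled to the same size as binary bitmaps, and the names `similarity`, `best_match`, and the tiny 3x3 sample bitmaps are made up for the example.

```python
def similarity(a, b):
    """Fraction of pixels that agree between two equal-sized binary bitmaps."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same dimensions")
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def best_match(query, candidates):
    """Return (name, score) for the candidate bitmap most similar to the query."""
    scored = ((name, similarity(query, bmp)) for name, bmp in candidates.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical 3x3 bitmaps (1 = ink, 0 = paper), invented for illustration.
query = [[1, 0, 1],
         [1, 1, 1],
         [1, 0, 1]]
candidates = {
    "H-like": [[1, 0, 1],
               [1, 1, 1],
               [1, 0, 0]],
    "I-like": [[0, 1, 0],
               [0, 1, 0],
               [0, 1, 0]],
}

name, score = best_match(query, candidates)
print(name, round(score, 3))
```

Plain pixel agreement is about the crudest measure possible; real handwriting varies in slant, size, and stroke width, so a practical matcher would need something far more tolerant.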
  • Doc (Score:3, Funny)

    by savagedome (742194) on Friday December 03 2004, @05:27PM (#10992161)
    Huh? Well, let's see how well it keeps up with my doctor's handwriting...
  • by raehl (609729) <raehl311.yahoo@com> on Friday December 03 2004, @05:29PM (#10992179) Homepage
    Somebody invented a way for computers to recognize handwriting.

    Like, so 10 years ago.
    • by AHumbleOpinion (546848) on Friday December 03 2004, @06:10PM (#10992597) Homepage
      Somebody invented a way for computers to recognize handwriting. Like, so 10 years ago.

      I worked on an OCR system about 20 years ago. No pre-defined bitmaps of text, you trained the system on the font to be recognized. After a few hours you could turn it loose and it did fairly well. While goofing off we tried handwritten text. With good penmanship it worked to a degree.
        • They aren't doing OCR

          Yes, they are. They are not using an off-the-shelf OCR package. The OCR functionality is embedded into their software, it is highly specialized, but it is OCR. For those who are fixated on the letter 'C', recognizing multiple characters as a single unit is nothing new.
  • Hard to read! (Score:3, Interesting)

    by DeionXxX (261398) on Friday December 03 2004, @05:31PM (#10992202)
    Wow, looking at some of those examples, I was amazed by the fact that I couldn't READ most of the words. It looks completely foreign to me, might as well be trying to read Japanese.
  • Accuracy? (Score:2, Interesting)

    How good is the accuracy? The OCR technology of today might not be able to recognize the "flowery" text of most historical documents (look at "We the People" in the Constitution).
    • I agree, my grandmother was heavy into genealogy. She had hundreds of pages of neatly hand written, non-cursive documents. I tried to scan them with many different OCR programs, but none even came close to deciphering the text without skewing it badly. I tried ABBYY, Omnipage Pro 14, and a few others. Anyone have any successes with this kind of thing?
  • A waste? (Score:5, Insightful)

    by Anonymous Coward on Friday December 03 2004, @05:34PM (#10992230)
    These documents are old and handwritten. Why waste the processing power deciphering results for each search when you can decipher the text once with a similar algorithm and search an index built that way? It's not like the information is ever going to change. (unless we do rewrite history)
    • Um, that's almost certainly what they did. Running an OCR over 140,000 pages every time you do a search is nearly impossible. I only say nearly because, in theory, you can do it, but then searches would take a few days to complete for zero net gain.
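The "recognize once, search an index" approach described in this thread can be sketched as a plain inverted index. This is a minimal sketch under assumptions: it presumes some recognizer has already produced a text transcription per page, and the function names and the three sample "pages" are invented for illustration, not taken from the actual collection.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?\"'"


def build_index(pages):
    """Map each lowercased word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for raw in text.lower().split():
            word = raw.strip(PUNCTUATION)
            if word:
                index[word].add(page_id)
    return index


def search(index, query):
    """Return the sorted page ids that contain every word in the query."""
    words = [w.strip(PUNCTUATION) for w in query.lower().split()]
    words = [w for w in words if w]
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical transcriptions, made up for the example.
pages = {
    1: "Orders to the regiment at Fort Cumberland",
    2: "Letter to the Governor regarding the regiment",
    3: "Account of provisions received",
}
index = build_index(pages)  # the expensive recognition step happens once, before this
print(search(index, "Governor regiment"))
```

The index is built a single time; each query afterwards is a handful of set lookups, regardless of how long the recognition step took.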
  • This is nothing new (Score:3, Informative)

    by 42forty-two42 (532340) <bdonlan&gmail,com> on Friday December 03 2004, @05:34PM (#10992234) Homepage Journal
    Google already did it! [google.com] Well, it's not handwritten, but that's just a logical progression.
  • such as the 140,000 [handwritten] pages that make up George Washington's personal papers in the Library of Congress.

    In related news, the family of Tobias Lear, George Washington's personal secretary [64.233.167.104], who took his own life [64.233.167.104] (arguably due to the horrible pain in his wrists), has filed suit.
  • by Thunderstruck (210399) on Friday December 03 2004, @05:39PM (#10992289)
    I took a lot of notes in College. I took a lot more notes in graduate school. I've even taken notes on books I've read for the fun of it. If I could run all of these through my scanner & search them from an application on my desktop, I could be really obnoxious in an argument.

  • by aristus (779174) on Friday December 03 2004, @05:45PM (#10992344)
    You have to be able to handle a quill pen to use it.
  • It's an interesting approach that should be extended to other languages than English. Most of the world's history is not about the US and it has certainly not been written down in English. What I would really like to have is a similar tool that can search, say, Greek, or Latin, (or whatever) handwritten text. Imagine being able to query Ovid for an item of interest without having to consult everything he's written. I can imagine that this might encourage people to study the classics (a pet peeve of mine is that many people lack historical sense...) and it would certainly facilitate research in this area.
    If you can put the queries in English, with the search engine taking care of translation, it would be even better. Then, extended historical study comes within everyone's reach and the classical studies (or humaniora) might be transformed.
  • Good Work! (Score:5, Funny)

    by CaptainCarrot (84625) on Friday December 03 2004, @05:45PM (#10992354)
    How pleafant that they've done what waf neceffary to make this happen. How did they train the foftware to recognize the quirky 18th Century handwriting?
  • by Anonymous Cowdog (154277) on Friday December 03 2004, @05:55PM (#10992462) Journal
    It's "Pixelative Text Cognizance."

    It's different. With OCR these rays of light scan the original, translate each scanpoint to discrete RGB values, and do pattern recognition.

    With this system, they just read the discrete RGB values directly from pixels of documents scanned in with rays of light, then they do recognition of patterns. See, it's totally different.
    • I'm not sure what's more funny, your post, or that it was modded "informative". :)
    • Er.... Am I seriously missing something here, or was some mod just fooled by a troll?

      Let's examine your definitions:
      OCR: document->RGB(via light)->pixels->pattern recognition
      PTC: document->pixels(via light)->RGB->pattern recognition
      Of course you forget that there are no RGB values here, because it's black/white, so there is only a brightness value per pixel left. So what is the difference?

      Sounds really AWFULLY different...

      Maybe it's just your description that is lacking...
  • by Torgo's Pizza (547926) on Friday December 03 2004, @06:04PM (#10992541) Homepage Journal
    If only Nicholas Cage had this tool at his disposal, it would have made things much, much easier.
  • Holy shnikes! Optical Character Recognition! Bah.. I'm part of a research team at the Center for Cybermedia Research who are working on new algorithms for OCR with $4 million from Homeland Security. It's to be used on a gi-normous database containing scanned images of documents relating to Yucca Mountain.

    On top of that, OCR has been around for years. Yes, it isn't the best, but it's functional. Doesn't the Census Bureau use OCR for its census forms?

    So, yeah.. where is the news in the article?

  • spoofing (Score:3, Funny)

    by sewagemaster (466124) <sewagemaster@NOspAm.gmail.com> on Friday December 03 2004, @07:13PM (#10993145) Homepage
    great. now people are just going to spoof documents and put pr0n or enlargement spams in the pdfs when i search for anything academic related. i'm glad i dont have that problem yet finding pdf papers via google.
  • Although it is hard to OCR text and very hard to OCR cursive text written in historical documents, performing searches on those documents does not require a complete comprehension of the text and is therefore much easier to do.

    For instance, the software may be unable to distinguish the word bug from dog in one person's handwriting, but can still mark it with probabilities of the word's possible meanings.

    If a person later searches for the word bug or dog along with other terms, a mathematical calculation can be done for the likelihood of the match and the searcher can make his/her own judgement about the meaning of the text.

    ---
    Conrad Barski
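The bug/dog idea in the comment above can be sketched as an index that stores candidate words with probabilities instead of a single transcription. This is a rough sketch under stated assumptions, not the researchers' method: the recognizer output format, the function names, and the probabilities are all invented, and multiplying per-term probabilities assumes the terms are independent, which is a simplification.

```python
def index_with_candidates(recognized):
    """Build {word: {page_id: best probability of that word on the page}}.

    `recognized` maps a page id to a list of word images, each given as a
    dict of candidate words and their probabilities.
    """
    index = {}
    for page_id, word_images in recognized.items():
        for candidates in word_images:
            for word, prob in candidates.items():
                postings = index.setdefault(word, {})
                postings[page_id] = max(postings.get(page_id, 0.0), prob)
    return index


def search(index, query_terms):
    """Rank pages by the product of per-term probabilities, best first.

    Pages with no candidate for some query term are dropped.
    """
    scores = {}
    for i, term in enumerate(query_terms):
        postings = index.get(term, {})
        if i == 0:
            scores = dict(postings)
        else:
            scores = {p: s * postings[p] for p, s in scores.items() if p in postings}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical recognizer output: the first word image on each page is ambiguous.
recognized = {
    "page-1": [{"bug": 0.6, "dog": 0.4}, {"barked": 0.9}],
    "page-2": [{"bug": 0.3, "dog": 0.7}, {"report": 0.8}],
}
index = index_with_candidates(recognized)
print(search(index, ["dog"]))
print(search(index, ["dog", "barked"]))
```

Adding a second query term narrows the results to pages where both words are plausible, and the ranked list leaves the final judgement to the person reading the scanned page.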