Software Science

Search Engines for Handwritten Documents

Posted by michael
from the lost-art dept.
An anonymous reader writes "Researchers at the University of Massachusetts have created a tool for automatically searching handwritten historical documents, such as the 140,000 pages that make up George Washington's personal papers in the Library of Congress. The most interesting part is that the papers are scanned versions of the originals and the search tool actually recognizes the handwritten text from these images."

  • Re:The search tool? (Score:2, Interesting)

    by Skippy_the_Evil_Twin (453297) on Friday December 03, 2004 @05:30PM (#10992192) Homepage
    No OCR is performed on the documents. The search tool operates on the image.
  • Hard to read! (Score:3, Interesting)

    by DeionXxX (261398) on Friday December 03, 2004 @05:31PM (#10992202)
    Wow, looking at some of those examples, I was amazed by the fact that I couldn't READ most of the words. It looks completely foreign to me; I might as well be trying to read Japanese.
  • Accuracy? (Score:2, Interesting)

    by b0lt (729408) <b0lt@ls.qc.to> on Friday December 03, 2004 @05:33PM (#10992217)
    How good is the accuracy? The OCR technology of today might not be able to recognize the "flowery" text of most historical documents (look at "We the People" in the Constitution).
  • Handwriting sucks (Score:4, Interesting)

    by October_30th (531777) on Friday December 03, 2004 @05:36PM (#10992259) Homepage Journal
    You were modded as funny, but I fully agree with you.

    I hate reading/producing anything longer than a post-it note that's in handwriting.

  • by gcaseye6677 (694805) on Friday December 03, 2004 @05:37PM (#10992266)
    Cursive writing certainly is. I can barely even read it anymore, much less write it. Does anybody else who is under 30 still write in cursive, other than when they made you do it in elementary school?
  • !WOW (Score:1, Interesting)

    by Anonymous Coward on Friday December 03, 2004 @05:37PM (#10992267)
    ... eh eh !gniddik tsuJ. !skoobeton inciV ad eht no esool ti teL.
  • by Thunderstruck (210399) on Friday December 03, 2004 @05:39PM (#10992289)
    I took a lot of notes in College. I took a lot more notes in graduate school. I've even taken notes on books I've read for the fun of it. If I could run all of these through my scanner & search them from an application on my desktop, I could be really obnoxious in an argument.

  • Re:A waste? (Score:3, Interesting)

    by 42forty-two42 (532340) <bdonlan@@@gmail...com> on Friday December 03, 2004 @05:43PM (#10992327) Homepage Journal
    Um, that's almost certainly what they did. Running OCR over 140,000 pages every time you do a search is nearly impossible. I only say nearly because, in theory, you can do it, but then searches would take a few days to complete for zero net gain.
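
A minimal sketch of the "recognize once, search many times" idea from this comment, in Python. It assumes the pages have already been turned into text by some earlier recognition step; the page contents and the names `build_index` and `search` are made up for illustration, and nothing here is claimed to be how the UMass tool works.

```python
def build_index(pages):
    """Build an inverted index once: word -> set of page ids containing it."""
    index = {}
    for page_id, text in pages.items():
        for token in text.lower().split():
            word = token.strip('.,;:!?"\'')
            if word:
                index.setdefault(word, set()).add(page_id)
    return index


def search(index, query):
    """Return the sorted page ids that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical recognized text for three pages (invented for this example).
pages = {
    1: "Orders to the regiment at Winchester.",
    2: "Letter concerning the regiment and supplies.",
    3: "Account of supplies received.",
}

# The expensive pass over every page happens once, up front.
index = build_index(pages)

# Each later query is only a few dictionary lookups.
print(search(index, "regiment supplies"))
```

The design choice is the usual trade: more work and storage up front so that each query touches the index instead of the page images.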
  • by InternationalCow (681980) <mauricevansteensel@@@mac...com> on Friday December 03, 2004 @05:45PM (#10992347) Journal
    It's an interesting approach that should be extended to languages other than English. Most of the world's history is not about the US and it has certainly not been written down in English. What I would really like to have is a similar tool that can search, say, Greek or Latin (or whatever) handwritten text. Imagine being able to query Ovid for an item of interest without having to consult everything he's written. I can imagine that this might encourage people to study the classics (a pet peeve of mine is that many people lack historical sense...) and it would certainly facilitate research in this area.
    If you can put the queries in English, with the search engine taking care of translation, it would be even better. Then, extended historical study comes within everyone's reach and the classical studies (or humaniora) might be transformed.
  • by smacktits (737334) on Friday December 03, 2004 @05:51PM (#10992412)
    I'm 23 and I write in perfect cursive. In fact, I prefer it to typing. Maybe I like it because I suffered a serious injury to my hand when I was 12 that necessitated my learning to use it again from scratch.. I dunno. I just like to write, it relaxes me.
  • by jgardn (539054) <jgardn@alumni.washington.edu> on Friday December 03, 2004 @06:04PM (#10992537) Homepage Journal
    Yes, and I use it to record notes in my lab book I use at work. I record all sorts of things I discover there. Some entries are several pages long with charts and graphs and tables and diagrams. Try doing that in a few minutes in Word or OpenOffice.

    The best part is I don't have to worry about backing up my lab books. The only real threat is fire, and it is no more dangerous than it is to CDs or hard drives.

    While the cursive handwriting of the 1700's and early 1800's may seem curious to us (notably, the tall 's' that looks like an 'f'), it is a very easy style that is neat, legible, and painless. Notice how there are very few back strokes.

    For those who are wondering, cursive is what you use when you get sick of trying to write in print legibly and quickly without getting carpal tunnel. Every culture has it. It's unfortunate it isn't common knowledge anymore in the US. Handwriting is a wonderful skill. It used to be people would judge others based on their handwriting skills in addition to their oratory.
  • by AHumbleOpinion (546848) on Friday December 03, 2004 @06:10PM (#10992597) Homepage
    Somebody invented a way for computers to recognize handwriting. Like, so 10 years ago.

    I worked on an OCR system about 20 years ago. No pre-defined bitmaps of text, you trained the system on the font to be recognized. After a few hours you could turn it loose and it did fairly well. While goofing off we tried handwritten text. With good penmanship it worked to a degree.
  • by lucifuge31337 (529072) * <daryl AT introspect DOT net> on Friday December 03, 2004 @06:44PM (#10992904) Homepage
    Am I the only one who read this and actually thought "damn...I can write in cursive....I think...I should give that a try." And then shivered at the thought of the nuns who taught it to me, ruler-in-hand, ready to smack my knuckles with it if I screwed up.

    Anyway, I watch Full Metal Jacket and it reminds me of Catholic school. To continue my ramble, how many people who actually went to Catholic school aren't currently atheist? I'm guessing not too many.

    Here...let me help you out, mods:
    (-1 Offtopic)
  • Although it is hard to OCR text and very hard to OCR cursive text written in historical documents, performing searches on those documents does not require a complete comprehension of the text and is therefore much easier to do.

    For instance, the software may be unable to distinguish the word bug from dog in one person's handwriting, but can still mark it with probabilities of the word's possible meanings.

    If a person later searches for the word bug or dog along with other terms, a mathematical calculation can be done for the likelihood of the match and the searcher can make his/her own judgment as to the meaning of the text.

    ---
    Conrad Barski
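
The bug/dog idea in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the recognizer output, the probabilities, and the names `build_index` and `search` are hypothetical, and scoring a page by multiplying per-term probabilities assumes the terms are independent, which is a simplification rather than the researchers' actual method.

```python
def build_index(recognized):
    """recognized maps page id -> list of {candidate word: probability},
    one dict per word image. Returns word -> {page id: best probability}."""
    index = {}
    for page_id, word_images in recognized.items():
        for candidates in word_images:
            for word, prob in candidates.items():
                per_page = index.setdefault(word, {})
                # Keep the most confident reading of this word on this page.
                per_page[page_id] = max(per_page.get(page_id, 0.0), prob)
    return index


def search(index, query):
    """Rank pages by the product of the match probabilities of all terms."""
    terms = query.lower().split()
    if not terms:
        return []
    scores = {}
    for page_id in index.get(terms[0], {}):
        score = 1.0
        for term in terms:
            score *= index.get(term, {}).get(page_id, 0.0)
        if score > 0.0:
            scores[page_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical recognizer output: on page-1 the second word image is
# ambiguous between "bug" and "dog", so both readings are kept.
recognized = {
    "page-1": [{"the": 1.0}, {"bug": 0.5, "dog": 0.5}, {"barked": 1.0}],
    "page-2": [{"a": 1.0}, {"bug": 1.0}, {"report": 1.0}],
}
index = build_index(recognized)

print(search(index, "bug"))
print(search(index, "dog barked"))
```

The ambiguous word is never forced into a single reading; it stays findable under either spelling, and the ranking leaves the final call to the person looking at the page.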
  • by Media_Scumbag (217725) on Saturday December 04, 2004 @03:10AM (#10995234)
    Excellent point!

    In the legal field, finding context in a search is typically as (or more) important as finding a single word... Products like Summation (Summation.com) and Adobe's industrial strength Acrobat Capture (? - may have a new name... Server-based - uses "hot folders" that are monitored, batches, etc.) have OCR capabilities that are pretty flexible, reading from text, pdf, MS Word, JPEG, BMP, GIF, or TIFF... Of course, these can be expensive...

    But being able to get quickly to a target word is very useful indeed when the verbatim answer requires human eyes to confirm or contextualize, or if you just need a good starting point...

    I used Acrobat Capture and a digital camera (I was not permitted to flatbed or sheetfeed scan - the items were deemed too "fragile") to make archive materials text-searchable for a law firm's special project, to very good result.

    Granted, these materials were printed in decent fonts, not handwritten, but they had many graphic illustrations interspersed in them, which can trip up some OCR solutions. Capture could have read the documents' Japanese too, if I'd bought the correct Adobe plugin and the project had required it.

    Massaged correctly, OCR's come a long way, baby...
