Software Science

Search Engines for Handwritten Documents 172

Posted by michael
from the lost-art dept.
An anonymous reader writes "Researchers at the University of Massachusetts have created a tool for automatically searching handwritten historical documents, such as the 140,000 pages that make up George Washington's personal papers in the Library of Congress. The most interesting part is that the papers are scanned versions of the originals and the search tool actually recognizes the handwritten text from these images."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Friday December 03, 2004 @05:26PM (#10992151)
    In America, handwriting is only for old people.
    • Handwriting sucks (Score:4, Interesting)

      by October_30th (531777) on Friday December 03, 2004 @05:36PM (#10992259) Homepage Journal
      You were modded as funny, but I fully agree with you.

      I hate reading/producing anything longer than a post-it note that's in handwriting.

      • by metlin (258108) * on Friday December 03, 2004 @06:33PM (#10992808) Journal
        You're apparently not into the pure sciences like math or physics.

        I'd hate to be able to type in my equations, there's a feel to working things out on paper and pen. Besides, the tactile sensation of writing on paper is simply wonderful. No amount of typing can replace that.

        Nothing beats a good old fountain pen and writing on good paper =)
        • I'd hate to be able to type in my equations

          You can; it's called TeX. I hate trying to decipher my handwritten equations, worse yet, someone else's. Capital S versus s, u versus v, x versus y, 2 versus Z, 5 versus S, l versus 1, it's all a mess.

          the tactile sensation of writing on paper is simply wonderful. No amount of typing can replace that.

          The tactile sensation of pushing reeds into clay is simply wonderful. No amount of writing can replace cuneiform.

          Times change, and the ineffable qualities get ign
          • I hate trying to decipher my handwritten equations, worse yet, someone else's.

            Most physicists or mathematicians I know have pretty standard and decent handwriting, at least when it comes to writing their equations. It's more a question of practice.

            Capital S versus s, u versus v, x versus y, 2 versus Z, 5 versus S, l versus 1, it's all a mess.

            Maybe if people had taken their handwriting classes in 2nd and 3rd grades seriously, they would not be making mistakes of trying to confuse writing 5 and S.

            Time
            • The alternative is not half as good, and is not half as capable.

              Then argue that, not the "tactile sensation of writing on paper". No technology feels like the last, and almost every technology has people who appreciate its particular sensations. That doesn't stop them from getting replaced; the only thing that does that is real arguments.

              Maybe if people had taken their handwriting classes in 2nd and 3rd grades seriously, they would not be making mistakes of trying to confuse writing 5 and S.

              Always blam
              • I wasn't arguing for handwriting, I was putting across my opinion. Didn't realize one had to justify one's opinions. I was giving my reasons for preferring pen and paper, and the tactile sensation is one of the most important factors. I like the feel of writing, and it aids my problem solving capabilities.

                And I wasn't blaming the user, but the user has as much responsibility as the language. The alternative is to change the language, which is fairly hard. Besides, there are obvious advantages to writing th
                • The alternative is to change the language, which is fairly hard.

                  You have to change the orthography, which several Turkish languages have done three times in the last hundred years.

                  Even if you have a Tablet PC, you're still doing the same thing.

                  If the Tablet PC converts what you write to character data (as opposed to images), then there are crucial differences. You can output in an easy to read form that's easy to check for errors and easy for other people to decipher. Your input method is less importan
                  • You have to change the orthography, which several Turkish languages have done three times in the last hundred years.

                    True, but English has been adopted far more widely and has a lot more speakers across the world than Turkish. It would be next to impossible to undertake a mammoth task such as that.

                    If the Tablet PC converts what you write to character data (as opposed to images), then there are crucial differences. You can output in an easy to read form that's easy to check for errors and easy for other pe

                    • That's one of the worst analogies I've ever heard.

                      Comparing a language to an operating system is quite ridiculous. You write, read and communicate in English practically every waking minute of your life, starting since childhood. People *think* in English.

                      An operating system is hardly as ubiquitous.

                      Language skills are learnt and neural pathways formed when you are quite young, it would take a lot to change that in people.
                    • The last time I checked, people didn't think in Windows API. And the last time I checked, people didn't write their grocery lists in Visual C++. Nor did kids play around with blocks of Linux API when they were 3 years old - they were playing around with blocks of alphabets.

                      In fact, the last time I checked, people had no clue about either of those until they were well versed in a spoken and written language called English.

                      No matter what foreign languages you learn, you seldom change your basic language skills o
      • Some of us who have been typing/keyboarding since the time we were wee lads can't even remember how to write in cursive.

        I think maybe 3rd or 4th grade is the last time you have to use cursive. I do, however, highly recommend giving your kids touch-typing classes, so that they too can keyboard with fluidity (and rapidly lose their writing skills too).

        For me, it is a speed issue - I can type MUCH faster than writing, when I have a lot to do, typing on a computer is the way to go (plus, I can't live wi

    • by gcaseye6677 (694805) on Friday December 03, 2004 @05:37PM (#10992266)
      Cursive writing certainly is. I can barely even read it anymore, much less write it. Does anybody else who is under 30 still write in cursive, other than when they made you do it in elementary school?
      • by Sheepdot (211478) on Friday December 03, 2004 @05:47PM (#10992374) Journal
        I write out my checks in cursive. The other day I was admiring how pretty my cursive looked and how well it had developed from when I was in second grade and told to "TRY HARDER WEAKLING OR YOU WILL NEVER GET A JOB!". Then I realized just how ghey it was that I was enjoying the sight of it and hurriedly gave it to the cashier... who was a guy... who (ick) winked at me.
      • by realdpk (116490) on Friday December 03, 2004 @05:48PM (#10992387) Homepage Journal
        I wish they'd never taught cursive. Cursive destroyed my handwriting. At least, that's my current theory on why my handwriting sucks. :)
        • I do not know, cursive made my handwriting better.

          In fact, I've two sets of handwriting - all my equations and math stuff is written straight up, and the rest of the stuff goes cursive. Makes it a lot easier for me (and those reading it) to decipher what I've written.

          Cursive also made me write a whole lot faster - the flow that you get from cursive is something that makes one enjoy writing.
      • I'm 23 and I write in perfect cursive. In fact, I prefer it to typing. Maybe I like it because I suffered a serious injury to my hand when I was 12 that necessitated my learning to use it again from scratch.. I dunno. I just like to write, it relaxes me.
      • Does anybody else who is under 30 still write in cursive, other than when they made you do it in elementary school?

        When I was in sixth grade [k12.ia.us], my teachers all got together and decided to ban me from writing cursive (D'Nealian [geocities.com], to be exact). I've never looked back.

        (Of course, I just turned 30.)

      • I still write cursive occasionally, mainly in personal notes. If I'm writing something that I need somebody else to be able to read, I definitely print instead of using cursive.
      • by jgardn (539054) <jgardn@alumni.washington.edu> on Friday December 03, 2004 @06:04PM (#10992537) Homepage Journal
        Yes, and I use it to record notes in my lab book I use at work. I record all sorts of things I discover there. Some entries are several pages long with charts and graphs and tables and diagrams. Try doing that in a few minutes in Word or OpenOffice.

        The best part is I don't have to worry about backing up my lab books. The only real threat is fire, and it is no more dangerous than it is to CDs or hard drives.

        While the cursive handwriting of the 1700's and early 1800's may seem curious to us (notably, the tall 's' that looks like an 'f'), it is a very easy style that is neat, legible, and painless. Notice how there are very few back strokes.

        For those who are wondering, cursive is what you use when you get sick of trying to write in print legibly and quickly without getting carpal tunnel. Every culture has it. It's unfortunate it isn't common knowledge anymore in the US. Handwriting is a wonderful skill. It used to be people would judge others based on their handwriting skills in addition to their oratory.
        • The best part is I don't have to worry about backing up my lab books. The only real threat is fire, and it is no more dangerous than it is to CDs or hard drives.

          There's also water. If I spill a Coke on my keyboard, all my data's safe; if I spill one on my notebook, it's all gone.

          It used to be people would judge others based on their handwriting skills in addition to their oratory.

          I'm quite happy those days are gone, and people will grade my work on its content rather than handwriting or typing skills
          • There's also water. If I spill a Coke on my keyboard, all my data's safe; if I spill one on my notebook, it's all gone.

            Of course, you use a ballpoint pen for lab notebooks, not fountain pens or other pens based on water-soluble inks. Of course, this won't help you if you spill vodka. :-)

            Anyway, in lab situations you might not have a place nearby to put a laptop and you might be running between different laboratories so a laptop is often not very convenient. I was taught that you should write observation

        • When I was a lad I served a term
          As office boy to an attorney's firm
          I cleaned the windows and I swept the floor
          And I polished up the handle of the big front door
          I polished up that handle so carefully
          That now I am the Ruler of the Queen's Navy


          As office boy I made such a mark
          That they gave me the post of a junior clerk
          I served the writs with a smile so bland
          And I copied all the letters in a big round hand
          I copied all the letters in a hand so free
          That now I am the Ruler of the Queen's Navy

          Fi


        • The only real threat is fire, and it is no more dangerous than it is to CDs or hard drives.

          Go back and look at some old notebooks - if they used acid-based paper, then they'll be getting rather fragile.

      • I write letters and such in cursive, but it's too slow for course notes... read: becomes a tangled mass of lines and ink blots.

        I do write my lab journals in cursive, and three colours of pen... according to one of my classmates I'm not human. :)
        • I'm learning shorthand just to get notes down easily, it's well worth it if you plan on doing a lot of note taking.

          Yes, I do write in cursive (admittedly on my palmtop, so it then just transcribes it).
      • I had a friend in high school who always wrote in cursive, and this was...a year ago, so I'm pretty sure he's still under 30. I think that he was the only one in the whole school who still did, though.
      • I do. And I do it well, and I'm proud of it. Of course, I'm in the dying breed that considers the ability to write legibly by hand a part of fluency in one's language. Maybe I should just give in and go back to third grade where I belong.
      • Am I the only one who read this and actually thought "damn...I can write in cursive....I think...I should give that a try." And then shivered at the thought of the nuns who taught it to me, ruler-in-hand, ready to smack my knuckles with it if I screwed up.

        Anyway, I watch Full Metal Jacket and it reminds me of Catholic school. To continue my ramble, how many people who actually went to Catholic school aren't currently atheist? I'm guessing not too many.

        Here...let me help you out, mods:
        (-1 Offtopic)
      • Does anybody else who is under 30 still write in cursive, other than when they made you do it in elementary school?

        I'm 23 and I use both print and cursive. I use print for anything that someone else will have to read (very rare) or for things people make me write that I don't really care about (taking notes in class). Cursive is used for things I want to write. For example, all the first drafts of my London Journal [colingregorypalmer.net] are done in cursive in a notebook I always keep on me.


        -Colin
      • "Does anybody else who is under 30 still write in cursive ?"

        Most Europeans? In fact, that's the only thing I learned at school.

        This isn't meant to be yet another US-flaming post, but I was wondering in which countries people primarily write in cursive and in which in "print" ...

      • Does anybody else who is under 30 still write in cursive

        Just my name, and only in the snow.

      • Interesting you should ask, as I was recently discussing this with some friends. (Probably all over 30, though in my case only just.)

        They all still use joined-up (cursive) writing, as do most other people I know. I, on the other hand, haven't used it since I was at uni and found I had trouble reading my writing: I investigated various writing styles and types, and concluded that I could print (i.e write mostly not joined-up) pretty much as fast as I could write joined-up, and that the result was vastly

    • ... and second-graders.
    • > In America, handwriting is only for old people.

      And college students during exam season. (Can't speak for the Koreans.)

      Blue-stained hands-up, all those who remember those glorious essay exams from the mandatory humanities courses, where your grade ceases to be based on the merits of your ideas (and/or your ability to parrot your professor's ideas), but is solely a function of how well-developed the muscles in your right hand are, in order to keep scribbling for the entire three hours what would

  • Umm (Score:5, Insightful)

    by swtaarrs (640506) <swtaarrs@comcas t . n et> on Friday December 03, 2004 @05:27PM (#10992155)
    The most interesting part is that the papers are scanned versions of the originals and the search tool actually recognizes the handwritten text from these images.

    How else would it search handwritten documents? Am I missing something here?
    • Re:Umm (Score:2, Funny)

      Yeah, it would have been much more "interesting" if the papers were, I don't know, read psychically by the computer or something.
    • Re:Umm (Score:2, Funny)

      by ZagNuts (789429)
      How else would it search handwritten documents? Am I missing something here?

      You write down exactly what you want to find in exactly the same handwriting that the document is written in and then it blocks scans it for what you wrote... duh.
    • It might search for certain kinds of penstrokes or something like that. You could input a vector map and it would find similar vectors. Or even bitmaps I guess.
    • Re:Umm (Score:3, Informative)

      by mishmash (585101)
      Popular handwriting recognition software doesn't work like that - it gains much of its information from the "pen" strokes used to create the letters. There's less information in a "finished" printed page than you'd get by tracking the movements a pen made to write it. For an example of this different approach see this paper describing handwriting recognition using pen mounted accelerometers [grenoble-soc.com].
  • Doc (Score:3, Funny)

    by savagedome (742194) on Friday December 03, 2004 @05:27PM (#10992161)
    Huh? Well, let's see how well it keeps up with my doctor's handwriting...
  • by raehl (609729) <raehl311&yahoo,com> on Friday December 03, 2004 @05:29PM (#10992179) Homepage
    Somebody invented a way for computers to recognize handwriting.

    Like, so 10 years ago.
    • No....
      10 years ago someone invented a (hand) writing style that computers could recognize, à la Graffiti on the Palm.
      -nB
    • by AHumbleOpinion (546848) on Friday December 03, 2004 @06:10PM (#10992597) Homepage
      Somebody invented a way for computers to recognize handwriting. Like, so 10 years ago.

      I worked on an OCR system about 20 years ago. No pre-defined bitmaps of text, you trained the system on the font to be recognized. After a few hours you could turn it loose and it did fairly well. While goofing off we tried handwritten text. With good penmanship it worked to a degree.
  • Hard to read! (Score:3, Interesting)

    by DeionXxX (261398) on Friday December 03, 2004 @05:31PM (#10992202)
    Wow, looking at some of those examples, I was amazed by the fact that I couldn't READ most of the words. It looks completely foreing to me, might as well be trying to read Japanese.
    • by kfg (145172)
      It looks completely foreing to me. . .

      That's because it's written in a dead language.

      English.

      KFG
    • Neither could the software, apparently. At least the 'demo' on that site.

      I typed in "Cumberland" and received more false positives than correct lines.

      I agree it was hard to read on my eyes. Amazing what a person can get used to, or not get used to over time...

  • Accuracy? (Score:2, Interesting)

    by b0lt (729408)
    How good is the accuracy? The OCR technology of today might not be able to recognize the "flowery" text of most historical documents (look at "We the People" in the Constitution).
    • I think consistency matters more than individual letter formation.

      I could write entirely in scribbled hieroglyphs, but if it has a pattern, and the same squiggle means the same thing, then a computer could decipher it.

    • I agree, my grandmother was heavy into genealogy. She had hundreds of pages of neatly hand written, non-cursive documents. I tried to scan them with many different OCR programs, but none even came close to deciphering the text without skewing it badly. I tried ABBYY, Omnipage Pro 14, and a few others. Anyone have any successes with this kind of thing?
  • A waste? (Score:5, Insightful)

    by Anonymous Coward on Friday December 03, 2004 @05:34PM (#10992230)
    These documents are old and handwritten. Why waste the processing power deciphering results for each search when you can decipher the text once with a similar algorithm and search an index built that way? It's not like the information is ever going to change. (unless we do rewrite history)
    • Re:A waste? (Score:3, Interesting)

      by 42forty-two42 (532340)
      Um, that's almost certainly what they did. Running OCR over 140,000 pages every time you do a search is nearly impossible. I only say nearly because, in theory, you can do it, but then searches would take a few days to complete for zero net gain.
    • These documents are old and handwritten. Why waste the processing power deciphering results for each search when you can decipher the text once with a similar algorithm and search an index built that way? It's not like the information is ever going to change. (unless we do rewrite history)

      Context, context, context! If there's one thing I've learned in all of my schooling (and there is a lot), it is that how the information is portrayed is just as important as the information itself. Think about hearing vs
      • I don't get it.. he's advocating building an index. That would point to the image of the original document. Which is what they already did.
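Editor's sketch of the "recognize once, search an index" idea argued in this thread. It is only an illustration, not the UMass system: the page ids and page texts below are made up, and `build_index` and `search` are hypothetical helper names. It assumes a one-time recognition pass has already turned each scan into text.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?\"'()"

def build_index(pages):
    """Map each recognized word to the set of page ids that contain it.

    `pages` is a hypothetical dict of page id -> text produced by a
    one-time recognition pass over the scanned images.
    """
    index = defaultdict(set)
    for page_id, text in pages.items():
        for raw in text.lower().split():
            word = raw.strip(PUNCTUATION)
            if word:
                index[word].add(page_id)
    return index

def search(index, query):
    """Return the sorted page ids that contain every word of the query."""
    words = [w.strip(PUNCTUATION) for w in query.lower().split()]
    words = [w for w in words if w]
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

# Made-up page texts, standing in for the output of the recognition pass.
pages = {
    "p1": "Orders sent to Fort Cumberland.",
    "p2": "The Governor wrote from Williamsburg.",
    "p3": "Letter to the Governor regarding Fort Cumberland",
}
index = build_index(pages)
print(search(index, "governor"))         # ['p2', 'p3']
print(search(index, "fort cumberland"))  # ['p1', 'p3']
```

The expensive recognition step runs once per page; each query afterwards is a few set lookups, and every hit can still link back to the original page image.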
  • This is nothing new (Score:3, Informative)

    by 42forty-two42 (532340) <bdonlan@nosPAm.gmail.com> on Friday December 03, 2004 @05:34PM (#10992234) Homepage Journal
    Google already did it! [google.com] Well, it's not handwritten, but that's just a logical progression.
  • such as the 140,000 [handwritten] pages that make up George Washington's personal papers in the Library of Congress.

    In related news, the family of Tobias Lear, George Washington's personal secretary [64.233.167.104], who took his own life [64.233.167.104] (arguably due to the horrible pain in his wrists), has filed suit.
  • !WOW (Score:1, Interesting)

    by Anonymous Coward
    ... eh eh !gniddik tsuJ. !skoobeton inciV ad eht no esool ti teL.
  • by Thunderstruck (210399) on Friday December 03, 2004 @05:39PM (#10992289)
    I took a lot of notes in College. I took a lot more notes in graduate school. I've even taken notes on books I've read for the fun of it. If I could run all of these through my scanner & search them from an application on my desktop, I could be really obnoxious in an argument.

    • If I could run all of [my notes] through my scanner & search them from an application on my desktop, I could be really obnoxious in an argument.

      This is slashdot. You would definitely be obnoxious if you argued a point with actual facts behind you...

    • If I could run all of these through my scanner & search them from an application on my desktop, I could be really obnoxious in an argument.


      Hell, *I* don't need all that much processing power to be obnoxious in an argument. Oh, wait...
  • by aristus (779174) on Friday December 03, 2004 @05:45PM (#10992344)
    You have to be able to handle a quill pen to use it.
  • by InternationalCow (681980) <mauricevansteensel.mac@com> on Friday December 03, 2004 @05:45PM (#10992347) Journal
    It's an interesting approach that should be extended to other languages than English. Most of the world's history is not about the US and it has certainly not been written down in English. What I would really like to have is a similar tool that can search, say, Greek, or Latin, (or whatever) handwritten text. Imagine being able to query Ovid for an item of interest without having to consult everything he's written. I can imagine that this might encourage people to study the classics (a pet peeve of mine is that many people lack historical sense...) and it would certainly facilitate research in this area.
    If you can put the queries in English, with the search engine taking care of translation, it would be even better. Then, extended historical study comes within everyone's reach and the classical studies (or humaniora) might be transformed.
  • Good Work! (Score:5, Funny)

    by CaptainCarrot (84625) on Friday December 03, 2004 @05:45PM (#10992354)
    How pleafant that they've done what waf neceffary to make this happen. How did they train the foftware to recognize the quirky 18th Century handwriting?
  • Their handwriting recognition system doesn't work for shit. It couldn't even correctly retrieve results from words that I know are in its scanned letters. The word "governor" appears as a result from one of their suggested queries (*cough* hard coded results *cough*), but if you do a separate search for governor it returns stuff that doesn't even contain the word.
  • by Anonymous Cowdog (154277) on Friday December 03, 2004 @05:55PM (#10992462) Journal
    It's "Pixelative Text Cognizance."

    It's different. With OCR these rays of light scan the original, translate each scanpoint to discrete RGB values, and do pattern recognition.

    With this system, they just read the discrete RGB values directly from pixels of documents scanned in with rays of light, then they do recognition of patterns. See, it's totally different.
    • by GigsVT (208848)
      I'm not sure what's more funny, your post, or that it was modded "informative". :)
    • Re:It's not OCR (Score:3, Insightful)

      by imsabbel (611519)
      Er.... Am I seriously missing something here, or was some mod just fooled by a troll?

      Let's examine your definitions:
      OCR: document->RGB (via light)->pixels->pattern recognition
      PTC: document->pixels (via light)->RGB->pattern recognition
      Of course, you forget that there are no RGB values here, because it's black and white, so only a brightness value per pixel is left. So what is the difference?

      Sounds really AWFULLY different...

      Maybe its just your description that is lacking...

  • Convert the search text into an image to look as written by hand.

    Then do an image search on the documents. You will need a powerful image recognition software.

    This would be news.

    *** Find that COM error at http://www.comerrors.com [comerrors.com] **
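Editor's sketch of the "search by image" idea in the comment above. It assumes the query word has already been rendered as a small black-and-white template (that rendering step is not shown), and it uses plain exhaustive template matching, which is a stand-in technique rather than anything the researchers describe. The tiny page and template are made up, and `match_score` and `find_template` are hypothetical names.

```python
def match_score(page, template, top, left):
    """Fraction of template pixels that agree with the page at this offset."""
    height, width = len(template), len(template[0])
    agree = sum(
        page[top + r][left + c] == template[r][c]
        for r in range(height)
        for c in range(width)
    )
    return agree / (height * width)

def find_template(page, template, min_score=0.9):
    """Slide the template over the page and return (row, col, score) hits."""
    height, width = len(template), len(template[0])
    hits = []
    for top in range(len(page) - height + 1):
        for left in range(len(page[0]) - width + 1):
            score = match_score(page, template, top, left)
            if score >= min_score:
                hits.append((top, left, score))
    return hits

# Made-up binary images: 1 is ink, 0 is paper.
template = [
    [1, 0, 1],
    [1, 1, 1],
]
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(find_template(page, template, min_score=1.0))  # [(1, 1, 1.0)]
```

Real handwriting varies in size, slant, and stroke width, so exact pixel agreement like this would miss most true matches; lowering `min_score` trades misses for false positives.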
  • by Torgo's Pizza (547926) on Friday December 03, 2004 @06:04PM (#10992541) Homepage Journal
    If only Nicholas Cage had this tool at his disposal, it would have made things much, much easier.
  • Holy shnikes! Optical Character Recognition! Bah.. I'm part of a research team at the Center for Cybermedia Research who are working on new algorithms for OCR with $4 million from Homeland Security. It's to be used on a gi-normous database containing scanned images of documents relating to Yucca Mountain.

    On top of that, OCR has been around for years. Yes, it isn't the best, but it's functional. Doesn't the Census Bureau use OCR for its census forms?

    So, yeah.. where is the news in the article?

  • For sure it would cost five times as much and need a more complicated algorithm if it were used to search doctors' handwriting.
  • This is really, really, really, really stupid, it would be faster just to hand type the documents into the database, then search it, you could link to pictures of the documents if you really needed it
    • Faster to hand-type?! Do you have any idea how many handwritten documents the Library of Congress contains? I can't give you a number, but rest assured that it is more than any army of typists would care to copy. But computers don't care about work volume and they are a hell of a lot quicker than any human typist could ever hope to be.
  • I've been using this feature in OneNote for a long time now. It searches through my handwriting with amazing accuracy
  • spoofing (Score:3, Funny)

    by sewagemaster (466124) <sewagemasterNO@SPAMgmail.com> on Friday December 03, 2004 @07:13PM (#10993145) Homepage
    great. now people are just going to spoof documents and put pr0n or enlargement spams in the pdfs when i search for anything academic related. i'm glad i don't have that problem yet when finding pdf papers via google.
  • Although it is hard to OCR text and very hard to OCR cursive text written in historical documents, performing searches on those documents does not require a complete comprehension of the text and is therefore much easier to do.

    For instance, the software may be unable to distinguish the word bug from dog in one person's handwriting, but can still mark it with probabilities of the word's possible meanings.

    If a person later searches for the word bug or dog along with other terms, a mathematical calculation can be done for the likelihood of the match, and the searcher can make his/her own judgement as to the meaning of the text.

    ---
    Conrad Barski
    • Excellent point!

      In the legal field, finding context in a search is typically as (or more) important as finding a single word... Products like Summation (Summation.com) and Adobe's industrial strength Acrobat Capture (? - may have a new name... Server-based - uses "hot folders" that are monitored, batches, etc.) have OCR capabilities that are pretty flexible, reading from text, pdf, MS Word, JPEG, BMP, GIF, or TIFF... Of course, these can be expensive...

      But, being able to get quickly to a target word is ve
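Editor's sketch of the probability-based search described at the top of this thread, using its bug/dog example. Everything concrete here is an assumption for illustration: the per-word candidate probabilities are invented, word positions are treated as independent, and `score_page` and `search` are hypothetical names.

```python
def score_page(page, query_words):
    """Probability that every query word occurs somewhere on the page.

    `page` is a list of word positions; each position is a dict mapping a
    candidate reading to the recognizer's probability for it. Positions are
    treated as independent, which is a simplifying assumption.
    """
    score = 1.0
    for word in query_words:
        p_absent = 1.0
        for candidates in page:
            p_absent *= 1.0 - candidates.get(word, 0.0)
        score *= 1.0 - p_absent
    return score

def search(pages, query, threshold=0.1):
    """Rank pages by match likelihood, dropping those below the threshold."""
    words = query.lower().split()
    if not words:
        return []
    ranked = [(score_page(page, words), page_id) for page_id, page in pages.items()]
    ranked = [(s, pid) for s, pid in ranked if s >= threshold]
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [(pid, round(s, 3)) for s, pid in ranked]

# Invented recognizer output: the second word of letter_a is ambiguous.
pages = {
    "letter_a": [{"the": 0.9}, {"bug": 0.6, "dog": 0.4}, {"barked": 0.8}],
    "letter_b": [{"a": 0.9}, {"bug": 0.2, "dog": 0.1, "dig": 0.7}],
}
print(search(pages, "bug"))         # [('letter_a', 0.6), ('letter_b', 0.2)]
print(search(pages, "dog barked"))  # [('letter_a', 0.32)]
```

The recognizer never has to commit to one reading; the searcher sees ranked candidates and judges the original image for themselves.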
  • "Do you think that OCR is actually the wrong way to think about this problem? After all, we don't really care about characters, but rather about what words and ideas have been written. Do you have a strong background in pattern recognition, machine learning, image processing and computer graphics? Google currently "reads" almost every web page in the world. Come help us read all the printed material as well!"

    Requires MS/PhD in CS/EE. Position available only in Mountain View.
    http://www.google.com/jobs/eng/ [google.com]
