Search Engines for Handwritten Documents 172
An anonymous reader writes "Researchers at the University of Massachusetts have created a tool for automatically searching handwritten historical documents, such as the 140,000 pages that make up George Washington's personal papers in the Library of Congress. The most interesting part is that the papers are scanned versions of the originals and the search tool actually recognizes the handwritten text from these images."
Who still reads those? (Score:5, Funny)
Handwriting sucks (Score:4, Interesting)
I hate reading/producing anything longer than a post-it note that's in handwriting.
Re:Handwriting sucks (Score:5, Insightful)
I'd hate to have to type in my equations; there's a feel to working things out with pen and paper. Besides, the tactile sensation of writing on paper is simply wonderful. No amount of typing can replace that.
Nothing beats a good old fountain pen and writing on good paper =)
Re:Handwriting sucks (Score:2)
You can; it's called TeX. I hate trying to decipher my handwritten equations, worse yet, someone else's. Capital S versus s, u versus v, x versus y, 2 versus Z, 5 versus S, l versus 1, it's all a mess.
the tactile sensation of writing on paper is simply wonderful. No amount of typing can replace that.
The tactile sensation of pushing reeds into clay is simply wonderful. No amount of writing can replace cuneiform.
Times change, and the ineffable qualities get ign
Re:Handwriting sucks (Score:2)
Most physicists or mathematicians I know have pretty standard and decent handwriting, at least when it comes to writing their equations. It's more a question of practice.
Capital S versus s, u versus v, x versus y, 2 versus Z, 5 versus S, l versus 1, it's all a mess.
Maybe if people had taken their handwriting classes in 2nd and 3rd grade seriously, they would not be confusing 5 and S.
Time
Re:Handwriting sucks (Score:2)
Then argue that, not the "tactile sensation of writing on paper". No technology feels like the last, and almost every technology has people who appreciate its particular sensations. That doesn't stop them from getting replaced; only real arguments can do that.
Maybe if people had taken their handwriting classes in 2nd and 3rd grade seriously, they would not be confusing 5 and S.
Always blam
Re:Handwriting sucks (Score:2)
And I wasn't blaming the user, but the user has as much responsibility as the language. The alternative is to change the language, which is fairly hard. Besides, there are obvious advantages to writing th
Re:Handwriting sucks (Score:2)
You have to change the orthography, which several Turkic languages have done three times in the last hundred years.
Even if you have a Tablet PC, you're still doing the same thing.
If the Tablet PC converts what you write to character data (as opposed to images), then there are crucial differences. You can output in an easy-to-read form that's easy to check for errors and easy for other people to decipher. Your input method is less importan
Re:Handwriting sucks (Score:2)
True, but English has been adopted far more widely and has a lot more speakers across the world than Turkish. It would be next to impossible to undertake a mammoth task such as that.
If the Tablet PC converts what you write to character data (as opposed to images), then there are crucial differences. You can output in an easy-to-read form that's easy to check for errors and easy for other pe
Re:Handwriting sucks (Score:2)
That's one of the worst analogies I've ever heard.
Comparing a language to an operating system is quite ridiculous. You write, read and communicate in English practically every waking minute of your life, starting in childhood. People *think* in English.
An operating system is hardly as ubiquitous.
Language skills are learnt and neural pathways formed when you are quite young; it would take a lot to change that in people.
Re:An OS is a language (Score:2)
In fact, the last time I checked, people had no clue about either of those until they were well versed in a spoken and written language called English.
No matter what foreign languages you learn, you seldom change your basic language skills o
Re:Handwriting sucks (Score:2)
Good for you.
Re:Handwriting sucks (Score:3, Insightful)
I can't even read my own printing sometimes (Score:2)
I think 3rd or 4th grade is maybe the last time you have to use cursive. I do, however, highly recommend giving your kids touch-typing classes, so that they, too, can keyboard with fluidity (and rapidly lose their writing skills too).
For me, it is a speed issue: I can type MUCH faster than I can write, so when I have a lot to do, typing on a computer is the way to go (plus, I can't live wi
Re:Who still reads those? (Score:5, Interesting)
Re:Who still reads those? (Score:5, Funny)
Re:Who still reads those? (Score:4, Insightful)
Re:Who still reads those? (Score:2)
In fact, I've two sets of handwriting - all my equations and math stuff is written straight up, and the rest of the stuff goes cursive. Makes it a lot easier for me (and those reading it) to decipher what I've written.
Cursive also made me write a whole lot faster - the flow that you get from cursive is something that makes one enjoy writing.
Re:Who still reads those? (Score:2, Interesting)
Re:Who still reads those? (Score:1)
When I was in sixth grade [k12.ia.us], my teachers all got together and decided to ban me from writing cursive (D'Nealian [geocities.com], to be exact). I've never looked back.
(Of course, I just turned 30.)
Re:Who still reads those? (Score:2)
Re:Who still reads those? (Score:5, Interesting)
The best part is I don't have to worry about backing up my lab books. The only real threat is fire, and it is no more dangerous than it is to CDs or hard drives.
While the cursive handwriting of the 1700s and early 1800s may seem curious to us (notably, the tall 's' that looks like an 'f'), it is a very easy style that is neat, legible, and painless. Notice how there are very few back strokes.
For those who are wondering, cursive is what you use when you get sick of trying to write in print legibly and quickly without getting carpal tunnel. Every culture has it. It's unfortunate it isn't common knowledge anymore in the US. Handwriting is a wonderful skill. It used to be people would judge others based on their handwriting skills in addition to their oratory.
Re:Who still reads those? (Score:2)
There's also water. If I spill a Coke on my keyboard, all my data's safe; if I spill one on my notebook, it's all gone.
It used to be people would judge others based on their handwriting skills in addition to their oratory.
I'm quite happy those days are gone, and people will grade my work on its content rather than on handwriting or typing skills.
Writing good for lab books? (Score:2)
Of course, you use a ballpoint pen for lab notebooks, not fountain pens or other pens based on water-soluble inks. Of course, this won't help you if you spill vodka. :-)
Anyway, in lab situations you might not have a place nearby to put a laptop and you might be running between different laboratories so a laptop is often not very convenient. I was taught that you should write observation
Re:Who still reads those? (Score:2)
Fi
Re:Who still reads those? (Score:2)
Re:Who still reads those? (Score:2)
Fire, and Acid-Based Paper (Score:3, Insightful)
The only real threat is fire, and it is no more dangerous than it is to CDs or hard drives.
Go back and look at some old notebooks - if they used acid-based paper, then they'll be getting rather fragile.
Re:Who still reads those? (Score:1)
I do write my lab journals in cursive, and three colours of pen... according to one of my classmates I'm not human.
Re:Who still reads those? (Score:1)
Yes, I do write in cursive (admittedly on my palmtop, so it then just transcribes it).
Re:Who still reads those? (Score:2)
Re:Who still reads those? (Score:1)
Re:Who still reads those? (Score:3, Interesting)
Anyway, I watch Full Metal Jacket and it reminds me of Catholic school. To continue my ramble, how many people who actually went to Catholic school aren't currently atheists? I'm guessing not too many.
Here...let me help you out, mods:
(-1 Offtopic)
Re:Who still reads those? (Score:2)
I'm 23 and I use both print and cursive. I use print for anything that someone else will have to read (very rare) or for things people make me write that I don't really care about (taking notes in class). Cursive is used for things I want to write. For example, all the first drafts of my London Journal [colingregorypalmer.net] are done in cursive in a notebook I always keep on me.
-Colin
Re:Who still reads those? (Score:2)
Most Europeans? In fact, that's the only thing I learned at school.
This isn't meant to be yet another US flame, but I was wondering in which countries people primarily write in cursive and in which they primarily "print"...
Re:Who still reads those? (Score:2)
Just my name, and only in the snow.
Re: Who still reads those? (Score:2)
They all still use joined-up (cursive) writing, as do most other people I know. I, on the other hand, haven't used it since I was at uni and found I had trouble reading my writing: I investigated various writing styles and types, and concluded that I could print (i.e. write mostly not joined-up) pretty much as fast as I could write joined-up, and that the result was vastly
Re:Who still reads those? (Score:2)
Re:Who still *writes* those? Well, after college? (Score:1, Flamebait)
And college students during exam season. (Can't speak for the Koreans.)
Blue-stained hands up, all those who remember those glorious essay exams from the mandatory humanities courses, where your grade ceases to be based on the merits of your ideas (and/or your ability to parrot your professor's ideas), but is solely a function of how well-developed the muscles in your right hand are, in order to keep scribbling for the entire three hours what would
Umm (Score:5, Insightful)
How else would it search handwritten documents? Am I missing something here?
Re:Umm (Score:2, Funny)
Re:Umm (Score:2)
Re:Umm (Score:2, Funny)
You write down exactly what you want to find in exactly the same handwriting that the document is written in, and then it block-scans the document for what you wrote... duh.
Re:Umm (Score:2)
Re:Umm (Score:3, Informative)
Doc (Score:3, Funny)
Re:Doc (Score:2)
Yeah, they're going about this all wrong. They should be selling this stuff to pharmacies and hospitals -- that's where the technology will be useful!
Eric
Vioxx recall parody [ericgiguere.com]
This is so cool! (Score:3, Funny)
Like, so 10 years ago.
Re:This is so cool! (Score:1)
10 years ago someone invented a (hand)writing style that computers could recognize, à la Graffiti on the Palm.
-nB
More like twenty years ago ;-) (Score:4, Interesting)
I worked on an OCR system about 20 years ago. There were no predefined bitmaps of text; you trained the system on the font to be recognized. After a few hours you could turn it loose and it did fairly well. While goofing off we tried handwritten text. With good penmanship it worked to a degree.
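For readers curious what "training the system on the font" can mean, here is a minimal sketch of one classic approach, nearest-template matching. It is an assumption that this resembles the system described above; the comment gives no details. Glyphs are assumed to be pre-segmented, equal-sized 0/1 bitmaps, and the names `train` and `classify` are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: rows of 0 (paper) / 1 (ink), all glyphs the same size.
Bitmap = Sequence[Sequence[int]]


def train(samples: Dict[str, List[Bitmap]]) -> Dict[str, List[List[float]]]:
    """Average the training bitmaps for each character into one template."""
    templates = {}
    for char, bitmaps in samples.items():
        rows, cols = len(bitmaps[0]), len(bitmaps[0][0])
        templates[char] = [
            [sum(b[r][c] for b in bitmaps) / len(bitmaps) for c in range(cols)]
            for r in range(rows)
        ]
    return templates


def classify(bitmap: Bitmap, templates: Dict[str, List[List[float]]]) -> str:
    """Return the character whose template is closest (L1 distance) to the bitmap."""
    def distance(template: List[List[float]]) -> float:
        return sum(
            abs(pixel - t)
            for row, trow in zip(bitmap, template)
            for pixel, t in zip(row, trow)
        )
    return min(templates, key=lambda ch: distance(templates[ch]))
```

Averaging several samples per character is what lets such a system tolerate small variations, which also hints at why neat handwriting "worked to a degree" and messy handwriting did not.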
They are doing OCR (Score:2)
Yes, they are. They are not using an off-the-shelf OCR package. The OCR functionality is embedded into their software, it is highly specialized, but it is OCR. For those who are fixated on the letter 'C', recognizing multiple characters as a single unit is nothing new.
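Recognizing a whole word "as a single unit" can be illustrated with a word-spotting sketch: reduce each word image to a sequence of per-column ink counts and compare sequences with dynamic time warping (DTW), which tolerates words written wider or narrower. This is one known technique, offered only as an assumption; it is not claimed to be the UMass method, and `column_profile`, `dtw_distance`, and `spot` are hypothetical names.

```python
from typing import List, Sequence

Bitmap = Sequence[Sequence[int]]  # rows of 0 (paper) / 1 (ink)


def column_profile(bitmap: Bitmap) -> List[int]:
    """Count ink pixels in each column, turning a word image into a 1-D sequence."""
    return [sum(col) for col in zip(*bitmap)]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two sequences of possibly different length."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def spot(query: Bitmap, candidates: List[Bitmap]) -> List[int]:
    """Return candidate indices ordered from most to least similar to the query word."""
    q = column_profile(query)
    return sorted(range(len(candidates)), key=lambda k: dtw_distance(q, column_profile(candidates[k])))
```

Real word-spotting systems use richer per-column features (upper and lower ink contours, for example), but the warping idea is the same.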
Hard to read! (Score:3, Interesting)
Re:Hard to read! (Score:3, Funny)
That's because it's written in a dead language.
English.
KFG
Re:Hard to read! (Score:2)
I typed in "Cumberland" and received more false positives than correct lines.
I agree it was hard on my eyes. Amazing what a person can or can't get used to over time...
Accuracy? (Score:2, Interesting)
Re:Accuracy? (Score:1)
I could write entirely in scribbled hieroglyphs, but if it has a pattern, and the same squiggle means the same thing, then a computer could decipher it.
Re:Accuracy? (Score:2)
A waste? (Score:5, Insightful)
Re:A waste? (Score:3, Interesting)
Re:A waste? (Score:1)
Re:A waste? (Score:1)
This is nothing new (Score:3, Informative)
Re:This is nothing new (Score:3, Funny)
better Jay Williams link (Score:2)
Re:This is nothing new (Score:1)
In related news... (Score:2)
In related news, the family of Tobias Lear, George Washington's personal secretary [64.233.167.104], who took his own life [64.233.167.104] (arguably due to the horrible pain in his wrists), has filed suit.
!WOW (Score:1, Interesting)
Useful for more than just historians (Score:5, Interesting)
Re:Useful for more than just historians (Score:2)
This is slashdot. You would definitely be obnoxious if you argued a point with actual facts behind you...
Re:Useful for more than just historians (Score:1)
Hell, *I* don't need all that much processing power to be obnoxious in an argument. Oh, wait...
Yes, but what they don't tell you... (Score:3, Funny)
Interesting, but limited (Score:3, Interesting)
If you could phrase the queries in English, with the search engine taking care of translation, it would be even better. Then extended historical study would come within everyone's reach, and classical studies (or the humanities) might be transformed.
Good Work! (Score:5, Funny)
Re:Good Work! (Score:5, Funny)
Personally, I think it fucks.
Re:Good Work! (Score:2)
they've done what waf neceffary
I know you're just making a joke on the penmanship style used with a quill, but traditionally, a double-S would be more like fs, as in necefsary. Only the first S of a pair is elongated.
Doesn't work (Score:1)
It's not OCR (Score:4, Funny)
It's different. With OCR these rays of light scan the original, translate each scanpoint to discrete RGB values, and do pattern recognition.
With this system, they just read the discrete RGB values directly from pixels of documents scanned in with rays of light, then they do recognition of patterns. See, it's totally different.
Re:It's not OCR (Score:2, Funny)
Re:It's not OCR (Score:1)
Re:It's not OCR (Score:3, Insightful)
Let's examine your definitions:
OCR: document -> RGB (via light) -> pixels -> pattern recognition
PTC: document -> pixels (via light) -> RGB -> pattern recognition
Of course, you forget that there are no RGB values here: because it's black and white, there is only a brightness value per pixel left. So what is the difference?
Sounds really AWFULLY different...
Maybe it's just your description that is lacking...
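To make the "only a brightness value per pixel" point concrete, a scanned color pixel is commonly collapsed to a single luma value with the ITU-R BT.601 weights and then thresholded into ink or paper. The fixed threshold of 128 is an assumption for illustration; practical systems usually pick it adaptively (Otsu's method, for example). `brightness` and `binarize` are hypothetical names.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]  # 8-bit red, green, blue


def brightness(pixel: RGB) -> float:
    """Luma of an 8-bit RGB pixel using the ITU-R BT.601 weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def binarize(image: Sequence[Sequence[RGB]], threshold: float = 128.0) -> List[List[int]]:
    """Map each pixel to 1 (ink, darker than threshold) or 0 (paper)."""
    return [[1 if brightness(p) < threshold else 0 for p in row] for row in image]
```

Either way the recognizer ends up looking at one number (or one bit) per pixel, which is the commenter's point.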
They should do an image search instead (Score:1)
Convert the search text into an image that looks as if it were written by hand.
Then do an image search on the documents. You would need powerful image recognition software.
This would be news.
*** Find that COM error at http://www.comerrors.com [comerrors.com] **
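The "image search" half of that suggestion can be sketched as brute-force template matching: slide the query image over the page and score each offset by the sum of squared pixel differences. This assumes the query has already been rendered as a 0/1 bitmap at the same scale as the page; the hard part, making typed text look like a particular writer's hand, is not addressed. `match_template` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Bitmap = Sequence[Sequence[int]]  # rows of 0 (paper) / 1 (ink)


def match_template(page: Bitmap, query: Bitmap, max_hits: int = 3) -> List[Tuple[int, int, int]]:
    """Slide `query` over `page` and return the best (score, row, col) offsets.

    The score is the sum of squared pixel differences, so lower is better.
    """
    qh, qw = len(query), len(query[0])
    ph, pw = len(page), len(page[0])
    hits = []
    for top in range(ph - qh + 1):
        for left in range(pw - qw + 1):
            score = sum(
                (page[top + r][left + c] - query[r][c]) ** 2
                for r in range(qh)
                for c in range(qw)
            )
            hits.append((score, top, left))
    hits.sort()  # lowest score first
    return hits[:max_hits]
```

Exact pixel matching like this breaks down as soon as the writer's letters vary in size or slant, which is why the suggestion calls for "powerful" image recognition.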
National Treasure (Score:3, Funny)
Re:National Treasure (Score:1)
OCR (Score:2)
Holy shnikes! Optical Character Recognition! Bah... I'm part of a research team at the Center for Cybermedia Research that is working on new algorithms for OCR with $4 million from Homeland Security. It's to be used on a ginormous database containing scanned images of documents relating to Yucca Mountain.
On top of that, OCR has been around for years. Yes, it isn't the best, but it's functional. Doesn't the Census Bureau use OCR for its census forms?
So, yeah.. where is the news in the article?
Must be expensive search engine (Score:1)
Re:Must be expensive search engine (Score:1)
Stupid (Score:1)
Re:Stupid (Score:2)
OneNote does this already (Score:1)
spoofing (Score:3, Funny)
One important thing to understand about this... (Score:3, Interesting)
For instance, the software may be unable to distinguish the word bug from dog in one person's handwriting, but can still mark it with probabilities of the word's possible meanings.
If a person later searches for the word bug or dog along with other terms, the likelihood of a match can be calculated, and the searcher can make his or her own judgment about the meaning of the text.
---
Conrad Barski
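A toy version of the "bug or dog" idea: store, for each word slot on a line, the recognizer's probabilities for candidate words, then rank lines by the probability that they contain the query terms. Treating slots and query terms as independent is a simplifying assumption; the thread does not describe the actual model, and `p_contains`, `score`, and `rank` are hypothetical names.

```python
from math import prod
from typing import Dict, List

# Hypothetical annotation: each line is a list of word slots, and each slot
# maps candidate words to the recognizer's probability for that word.
Line = List[Dict[str, float]]


def p_contains(line: Line, term: str) -> float:
    """Probability that at least one slot is `term`, assuming slots are independent."""
    return 1.0 - prod(1.0 - slot.get(term, 0.0) for slot in line)


def score(line: Line, query: List[str]) -> float:
    """Probability that the line contains every query term (terms treated as independent)."""
    return prod(p_contains(line, term) for term in query)


def rank(lines: Dict[str, Line], query: List[str]) -> List[str]:
    """Return line ids ordered from most to least likely match."""
    return sorted(lines, key=lambda lid: score(lines[lid], query), reverse=True)
```

The searcher still sees lines where the recognizer was unsure, just lower in the list, rather than losing them to a single hard "bug" or "dog" decision.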
Re:One important thing to understand about this... (Score:2, Interesting)
In the legal field, finding context in a search is typically as important as (or more important than) finding a single word... Products like Summation (Summation.com) and Adobe's industrial-strength Acrobat Capture (? - may have a new name... server-based, uses "hot folders" that are monitored, batches, etc.) have OCR capabilities that are pretty flexible, reading from text, PDF, MS Word, JPEG, BMP, GIF, or TIFF... Of course, these can be expensive...
But, being able to get quickly to a target word is ve
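One simple way to show "context" around a hit, once the text has been recognized, is a keyword-in-context (KWIC) listing: each match is printed with a few surrounding words. This is a generic sketch assuming plain recognized text; it does not reflect how Summation or Acrobat Capture work, and `keyword_in_context` is a hypothetical name.

```python
import string
from typing import List


def keyword_in_context(text: str, term: str, width: int = 3) -> List[str]:
    """Return each occurrence of `term` with up to `width` words on either side."""
    words = text.split()
    target = term.lower()
    snippets = []
    for i, word in enumerate(words):
        # Compare case-insensitively, ignoring punctuation glued to the word.
        if word.strip(string.punctuation).lower() == target:
            start, end = max(0, i - width), i + width + 1
            snippets.append(" ".join(words[start:end]))
    return snippets
```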
OT: Google wants OCR engr. in Mt. View, Calif. (Score:2)
Requires MS/PhD in CS/EE. Position available only in Mountain View.
http://www.google.com/jobs/eng/ [google.com]
Re:The search tool? (Score:2, Interesting)
Re:The search tool? (Score:2)
The search tool is doing the OCR then. OCR is simply taking an image and analyzing it to recognize text.
Re:The search tool? (Score:1)
Re:The search tool? (Score:2)
Re:The search tool? (Score:1)
I forgot, this is Slas
Re:The search tool? (Score:3, Funny)
Re:Standards at risk (Score:1)
Really? Whatever happened to the bit???????