Statistics For Data Entry: The Brave New Step 121
A reader writes: "First there was Dasher, a novel application of statistical theory that lets free texts be written using only a pointing device. Dasher works by predicting the continuations of the text being written, based on what has been written so far; there is a probability associated with each offered continuation and the presentation is designed to make it easier to choose more probable continuations. A big advantage of statistics-based interfaces is that they automatically enforce correctness, because correct strings are more probable than incorrect ones. Now the same approach has been extended to writing maths. Apropos is a Javascript application (it supports IE6 & Firefox) to create mathematical expressions. It represents the math using MathML, the official XML spec for mathematics. It is definitely clunky when compared to Dasher, but better than MS Equation Editor etc. It is interesting to consider if this approach can be extended to other XML vocabularies (for example, a model for HTML that suggests the markup as you go along - a properly trained one will make it harder to create pages with blinking text, loads of images etc.), or to formal languages other than XML (e.g. programming languages). Stochastic modeling can also be used as a basis for speech recognition, with the recognizer using the model to choose a continuation when the speech signal is ambiguous or indistinct."
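To make "predicting the continuations of the text being written" concrete, here is a minimal sketch of a character-level model that ranks the next character by probability given the last few characters. It is not Dasher's actual language model; the function names `train_char_model` and `continuations`, the two-character context, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_char_model(text, order=2):
    """Count which character follows each `order`-character context."""
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        counts[context][text[i + order]] += 1
    return counts


def continuations(counts, written, order=2):
    """Return (char, probability) pairs for the next character, most probable first."""
    context = written[-order:]
    followers = counts.get(context)
    if not followers:
        return []  # context never seen in training, so nothing to offer
    total = sum(followers.values())
    return [(ch, n / total) for ch, n in followers.most_common()]


# Toy training text (hypothetical); a real model would be trained on far more.
corpus = "the cat sat on the mat. the hat is on the cat."
model = train_char_model(corpus)

for ch, p in continuations(model, "the "):
    print(repr(ch), round(p, 2))
```

An interface in the spirit of the summary would then give each offered character screen space in proportion to its probability, so likely continuations are easier to hit than unlikely ones.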
Like t9 (Score:4, Interesting)
Re:Like t9 (Score:3, Insightful)
Re:Like t9 (Score:2, Offtopic)
Dasher is something I would really like to have on a PDA and even a cellphone. T9 is just a simple aid for writing a couple of hundred characters at most, but nothing that would help me write longer texts.
PDA-makers, hear this: You need to put a lot more effort into text-entry interfaces. Have a serious look at Dash
Re:Like t9 (Score:1)
Re:Like t9 (Score:3, Interesting)
t9 is a great technology because the vocabulary used in writing SMS is pretty narrow. After entering the first few characters of a word, the contextual information in the dictionary is good enough (most of the time) to suggest the wanted word very fast. t9 is even able to derive this information without the user specifying the exact characters but rather just one of the 3-4 on any mobile
Re:Like t9 (Score:1)
Of course, that's still not a very long mail, but I don't see why it should be difficult to exp
Old technology (Score:4, Interesting)
Re:Old technology (Score:1)
Re:Old technology (Score:2)
Re: (Score:2)
Re:Old technology (Score:2)
Re:Old technology (Score:2)
Wrong. The Mobile phone interface is nothing like Dasher. It's not as fluid and as usable as Dasher. Dasher is really something that you should download and actually try before you comment on it.
And if you're not up to downloading it, at the very least you should look at its demos (available in either animated gifs or mpeg/avi/asf movies) [cam.ac.uk].
It sounds good .. in theory (Score:5, Funny)
OFF:The first thing that came to (Score:1)
"NAME
apropos - search the manual page names and descriptions"
Re:OFF:The first thing that came to (Score:1)
I Can Only Hope...... (Score:3, Interesting)
Then maybe I'd get it in the next version of Fedora.
I'm so sick of *Tex.
*sigh*
Re:I Can Only Hope...... (Score:1, Informative)
Re:I Can Only Hope...... (Score:2)
Re:I Can Only Hope...... (Score:1)
First, it's TeX, not Tex. Secondly, TeX goes through email, and most people who care can read it unrendered very easily, so they don't need to install any dopey software just to read two little formulas in my e-mail. Plus, TeX math notation is fast to type, and you only need to learn a page or so from the TeX manual in order to be able to use it for math. So, how is this Dasher thing better?
Re:I Can Only Hope...... (Score:3, Informative)
You're comparing apples and aardvarks here. Dasher is an input method that tries to predict what letter you'll input
Re:I Can Only Hope...... (Score:1)
I'm not hopeful (Score:2, Insightful)
Dasher works because there is a small number of words that are likely to follow on from where you are. The same does not apply to MathML or HTML. The most useful you are likely to get is tab-completion for tag names, attribute names, etc.
Re:I'm not hopeful (Score:2, Insightful)
It doesn't seem to me that there's anything like as much redundancy in mathematical formulae as there is in written language. When the professor writes "X=..." on the board, it's very hard to predict the next symbol unless you know what x is in fact equal to.
Re:I'm not hopeful (Score:3, Informative)
There are over 90,000 words in the English language (based on number of entries in the American Heritage Dictionary), but nobody uses all of them. Good predictive data entry is not just a matter of waiting until you've typed "tomor" and concluding that you're going to write "tomorrow" because no other words begin that way, it's a matter of noticing when you get to "tom" that, based on your past word usage, the most
Re:I'm not hopeful (Score:2)
The OED has "over half a million words", which supports my previous argument even better. Nobody comes close to using all of those words, so predictive text input can make intelligent guesses based on which words you do use and how often you use them.
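The point about guessing from "which words you do use and how often you use them" can be sketched as a prefix lookup ranked by personal usage counts. This is only an illustration under stated assumptions: `build_usage`, `suggest`, and the sample history are hypothetical and not taken from any real predictive-text product.

```python
from collections import Counter


def build_usage(history):
    """Count how often each word appears in the user's past text."""
    return Counter(history.lower().split())


def suggest(usage, prefix, limit=3):
    """Return up to `limit` known words starting with `prefix`, most used first."""
    matches = [(word, n) for word, n in usage.items() if word.startswith(prefix)]
    # Sort by descending count, then alphabetically so ties are stable.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in matches[:limit]]


# Hypothetical record of what this user has typed before.
history = "see you tomorrow tom said tomorrow is fine tomorrow then tomato soup"
usage = build_usage(history)

print(suggest(usage, "tom"))  # "tomorrow" ranks first: it is this user's most frequent match
```

The dictionary size matters less than the shape of the usage counts: a few words dominate, so even a three-letter prefix usually narrows the guess to one strong candidate.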
A lot better than what I recall. (Score:2)
Re:A lot better than what I recall. (Score:1)
That is a horrible idea. Burn it! Hope it never happens. Sounds like something a college dropout geek who often gets pie'd in the face would do.
Riiiiiiiiight (Score:4, Funny)
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe."
It knew he was going to say that.
More likely, it's going to predict that someone's going to say "Let's circle back and touch base tomorrow".
Re:Riiiiiiiiight (Score:2)
Re:Riiiiiiiiight (Score:2)
Well... (Score:2)
Why this isn't the same as T9. (Score:5, Informative)
mis-read (Score:1)
Quick test (Score:4, Insightful)
MathML is a good idea in theory, but until there are good tools for writing and editing MathML, there will be very few people using it (either for publishing or for archival purposes.)
Re:Quick test (Score:1, Funny)
Wow, this has got to be the first time in the history of the world that a math person has criticized something for being "good in theory" but not "in practice". It's math! There's nothing in it but theory!
Re:Quick test (Score:2)
Re:Quick test (Score:1, Insightful)
Re:Quick test (Score:1, Funny)
Re:Quick test (Score:2)
Re:Quick test (Score:2)
Failures of inattention (Score:4, Interesting)
I can't help but think of someone entering a mathematical equation and concentrating more on his idea than what is being written to the screen. Due to this inattention, the equation doesn't work, he figures he's just wrong, and spends hours/days to find the point at which the computer put in its prediction and not what he thought he entered. Worst case, he could abandon what would have been a great idea.
Or, imagine this applied to writing computer programs. Say for example, you are writing a program to calculate the correct distance the probe should hold above the atmosphere so it doesn't burn up. Your cube mate distracts you briefly, and...
um... (Score:1)
apparently they haven't taken my writing into account
Re:um... (Score:1)
Like auto-completion, only better? (Score:1)
GIGO would be proud (Score:2, Insightful)
Correctness, huh? (Score:2, Funny)
They obviously didn't include many PHBs' writings in their calculations...
I'm frequently amazed at some of the grammatical... umm... experimentations undertaken by the upper two or three levels of management in their memos -- and the speeling, good grief, the SPEELING!! Is [F7] the last great secret of our civilization?!?!
Re:Correctness, huh? (Score:1)
Never been.... (Score:2, Funny)
Though probably college educated the writer of the above sentence has probably NOT BEEN a TA in an English class. Truly correct strings are a rare find
Dasher and stats rock (Score:5, Funny)
I did a quick test run of Dasher instead of RTFA, and as far as I understand, it works by presenting the most statistically-probable letter in the middle of the input area.
So, by dragging a perfectly horizontal line with my mouse cursor, I was able to create the most statistically-probable sentence.
Here goes, for Science:
Conspiracy theorists, area51 nuts and cypherpunks are going to be thrilled!
what a productivity increase (Score:1)
Or imagine the possibilities for bookwriters. You write half and the rest is predicted based on your previous works. Seems as if some authors already use such a technique
Re:what a productivity increase (Score:2)
Re:what a productivity increase (Score:2)
Ahem (Score:4, Insightful)
A big advantage of statistics-based interfaces is that they automatically enforce correctness, because correct strings are more probable than incorrect ones.
In a rigorous, technical environment, being _usually_ correct is not enough and a statistics-based approach to ensuring correctness is not very useful.
In an informal environment, correctness is not nearly as common as you might hope, so again a statistics-based approach may well not be as good as actually enforcing definite correctness.
exactly (Score:2)
Dasher for Zaurus? (Score:2)
Bye egghat.
Re:Dasher for Zaurus? (Score:1)
Note: I am only guessing for everything beyond the CPU usage on this machine, but then isn't that the way of /.
Funny thing is... (Score:2)
And, there are no predictable new ideas. Who could've guessed that Einstein would follow the equals sign with "mc^2"?
Why? (Score:2, Insightful)
Why should it? What if I want to create such a page? Why should someone (or something) tell me what to say, or how to say it? And who will "train" such a thing? The Government??
Re:Why? (Score:2, Informative)
To make the other (more likely) options more easily available, spend a lot of time poking around for tags with smaller targets *or* type it by hand *or* change the settings to lower the effect of prediction *or* replace the training files *or* just use the damn thing since it'll learn, nobody's telling you to do anything, and The
Re:Why? (Score:1)
Re:Why? (Score:2)
Obviously, you're a karma whore who's trying to work both sides of the issue.
Re:Why? (Score:1)
Screenwriters (Score:2)
Further dumbing of humanity (Score:2)
The problem is that the best prose contains unexpected novelty such as plot twists, new facets of a character, joke punch lines, etc. In a true "page-turn
Re:Further dumbing of humanity (Score:1)
Re:Further dumbing of humanity (Score:1)
Re:Further dumbing of humanity (Score:2)
This approach favors bloated, redundant encodings (Score:4, Interesting)
To see this for yourself, pick a nice big hunk of English text and gzip it. You'll get about 50-60% compression. Now, pick a similar-sized hunk of XML and gzip it - you'll probably get 75% compression or more.
Tools like this make using bloated, redundant encodings more tolerable by automating some of the redundancy away. It's not clear to me that this is a good thing.
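The gzip comparison is easy to try. The sketch below is an assumption-laden stand-in for the experiment described above: instead of "a nice big hunk" of real English and real XML, it compresses one short sentence as plain text and again with every word wrapped in verbose tags (the `<word>` and `<text>` tag names are made up). Tiny inputs carry gzip header overhead, so the absolute numbers will not match the 50-60% and 75% figures quoted; only the direction of the difference is the point.

```python
import gzip


def saved_fraction(data: bytes) -> float:
    """Fraction of the original size that gzip removes (can be negative for tiny inputs)."""
    return 1 - len(gzip.compress(data)) / len(data)


# Hypothetical sample: the same words, once as prose and once buried in markup.
words = ("predictive text entry works because natural language is redundant "
         "and markup adds even more repetition on top of it").split()
plain = " ".join(words).encode("utf-8")
marked_up = "".join(f"<word><text>{w}</text></word>" for w in words).encode("utf-8")

print(f"plain text: {len(plain)} bytes, {saved_fraction(plain):.0%} saved")
print(f"marked up:  {len(marked_up)} bytes, {saved_fraction(marked_up):.0%} saved")
```

To repeat the experiment as described, read two similar-sized files with `open(path, "rb").read()` and pass the bytes to `saved_fraction`.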
Re:This approach favors bloated, redundant encodin (Score:2)
Re:This approach favors bloated, redundant encodin (Score:2)
When designing a language - be that a simple one which can be encapsulated in an XML schema for example, or even a complex natural language there is a trade off between being efficiently terse and introducing sufficient redundancy as to allow communicants to differentiate signal from noi
OT: Your sig (Score:2)
The linked article neglects to mention Unicode compatibility in its list, but a good read nonetheless.
Re:OT: Your sig (Score:2)
I've seen that article before. It does a fairly good job of missing the point, or seeing the point and getting it backwards. XML does get one thing right - the idea that chunks of information ought to be self-describing, down to the character set level. Even Common Lisp punts on that one - the spec basically says "we require this subset of ASCII, and here's an API to manipulate whatever your implementatio
Re:OT: Your sig (Score:2)
I usually treat things as:
metadata = attribute
related content = sub element
You are right in that there are no hard and fast rules for what should be an attribute and what should be an element, but then I really haven't found it to be a real problem once I adopted the above.
DTDs do suc
Re:OT: Your sig (Score:2)
My heuristic for that is attributes are for metadata that has little or no structure, and is very unlikely to change. In practice, this reduces to "never use attributes" for me.
Why not TeX? (Score:2)
That said, I have been feeling that TeX is a bit outdated as a system, but then I discovered TeXmacs [texmacs.org]. This is a fully wysiwyg editor for TeX, where you type in TeX code and see the formatting instead of the code. I have switched to using it, and would definitely recommend it to others
Training (Score:2)
I'm going to pop over to OpenOffice.org, and use their source to create a training document.
Stay tuned for details.
~D
Re:Training (Score:3, Informative)
I took the "English with lots of punctuation", and copied the
Yeah! (Score:2)
Dasher vs T9, and NLP=SNLP (Score:2)
The statistical properties of languages are utilized in most (successful) approaches for natural language processing [stanford.edu], from part-of-speech tagging, information extraction, syntactic parsing, machine translation to question answering; y
Really! (Score:2)
I find it a bitch to get proper punctuation, never mind capitalization, and the routine stuttery freezes are amazingly annoying. I suppose if I were incapacitated to the point that I could only type by looking around I would appreciate it a lot more though.
So I'll just call it a really cool toy that is in fact worth trying out and hope some games incorporate some of this technology at some point in the future.
Re:Really! (Score:1)
Re:Really! (Score:1)
Re:Really! (Score:1)
What are you talking about with the capitalisation? After a full stop (question mark, etc) and a space, the yellow (capitals) box is massive.
Why games?
Re:Really! (Score:2)
I've been pointing at letters for well over ten minutes now. I've figured out the capitals box now, nearly got the punctuation sorted out.
Why games? Because I find the lack of straightforwardness and its adapting to be the kind of feature I'd like to see in a game.
My box is a 1GHz Athlon, which I never figured as really slow. I'd be noticing those stuttery freezes even if they were three times shorter though, easily. Perhaps my Gentoo compile is to blame.
Regarding my spelling,
Re:Really! (Score:1)
Are you turning the speed slider up? I'm near maximum... 7 I think.
Oh yes, I've tried it on a
Re:Really! (Score:2)
Game wise I mean as somehow worked into the game mechanics. Just tacking it on to some existing game as is for what it does would be kinda silly.
Application to program code will be "interesting" (Score:2)
Correct spelling is no longer more probable than incorrect spelling.
Some misspellings are intentional. I knew a guy who frequently wanted to use MODE as a variable name in his COBOL programs. But MODE is a COBOL keyword and the compiler would hiss at him. So he now always spells it MOAD.
Likewise some misspellings are due to local culture. Paw through some DEC c
Re:Application to program code will be "interestin (Score:1)
Similarly for "list" in Lisp or Scheme. I use "lyst" since I learnt the basics off Douglas Hofstadter from "Metamagical Themas".
Not surprising... (Score:2)
I will be first to cheer anybody who invents a worse way of typing math than MS Equation Editor. Being better than that is not an achievement at all. Can't they simply learn TeX for their math?
about as novel as my foot (Score:2)
And the fact that it only generates "correct" input can be a real problem: names, foreign words, etc. just don't come out right.
Re:about as novel as my foot (Score:1)
Much better than T9 in that respect. Handwriting recognition isn't so much better there either (with symbols and numbers, too).
straitjacket? (Score:2)
TeX math notation anyone? (Score:1)
Re:TeX math notation anyone? (Score:1)
Random Words (Score:2)
Feed the entire contents of /usr/dict/words into a markov generator and you get pretty much the same thing. Random words which, whilst not having any meaning, are reasonably syntactically correct.
http://www.fourteenminutes.com/fun/words/ [fourteenminutes.com].
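For anyone who wants to try the Markov-generator idea, here is a minimal sketch that learns letter transitions from a word list and walks them to produce made-up words. It is not the code behind the linked page. The `train` and `make_word` names, the two-letter context, and the small inline word list (standing in for /usr/dict/words) are all assumptions.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # padding markers; assumed not to occur in the word list


def train(words, order=2):
    """Map each `order`-letter context to the letters seen after it."""
    table = defaultdict(list)
    for word in words:
        padded = START * order + word.lower() + END
        for i in range(len(padded) - order):
            table[padded[i:i + order]].append(padded[i + order])
    return table


def make_word(table, rng, order=2, max_len=12):
    """Walk the chain from the start context until END or `max_len` letters."""
    context, letters = START * order, []
    while len(letters) < max_len:
        nxt = rng.choice(table[context])  # frequent followers appear more often in the list
        if nxt == END:
            break
        letters.append(nxt)
        context = context[1:] + nxt
    return "".join(letters)


# Hypothetical stand-in for a real dictionary file.
sample_words = ["statistics", "station", "nation", "notation", "rotation",
                "relation", "ration", "static", "elastic", "plastic"]
table = train(sample_words)
rng = random.Random()
print([make_word(table, rng) for _ in range(5)])
```

With a real dictionary the output tends to look pronounceable, because every three-letter window in a generated word was seen somewhere in the training list.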
Apples and Oranges (Score:2)
However, while Dasher can be compared to the JavaScript application that works with MathML, Dasher and MathML cannot be directly compared. Determining correctness would be from a program reading the DTD or schema of MathML. MathML wou
this idea has been around for a long, long time (Score:2, Interesting)
Let me explain... (Score:1)
Re:correctness? (Score:2, Insightful)
Re:correctness? (Score:3, Interesting)
With version 3, as with version 1.6, every language requires a text file full of natural writing (about 300K or more); a specification of the alphabet of the language is also required.
It wouldn't be hard at all to make it work for English, as opposed to Americanese, all you have to do is train it on text written with your own preferred idiosyncrasies