
Text Mining the Multiverse (137 comments)

Posted by michael on Friday October 17, 2003 @04:41PM from the mother-lode dept.

The NYT has a decent piece about text-mining, skimming large volumes of miscellaneous text to extract some sort of refined knowledge from it.

This discussion has been archived. No new comments can be posted.


Quick... (Score:2)

by inertia187 ( 156602 ) * writes:

Quick, someone patent it before Microsoft does or else Slashdot is going to be the next casualty.

Then again, we could just skip the patent and let WWdN die too. Seems like the internet community would break even.
- Re:Quick... (Score:1)
  
  by Mod Me God ( 686647 ) writes:
  
  That is pointless... if it has not already been patented it cannot be because of prior existence... not patenting (useful or other) variations on a theme is another matter!
  - Re:Quick... (Score:1)
    
    by Carnildo ( 712617 ) writes:
    
    Since when has "prior art" stopped someone from filing for and receiving a software patent?
- Prior Art (Score:1)
  
  by AndroidCat ( 229562 ) writes:
  
  Michael Moorcock has been text-mining the multiverse for decades now.
No need to register! Here's the Text! (Score:1, Informative)

by scumbucket ( 680352 ) writes:

MICHAEL N. LIEBMAN knows his limitations. Even with a Ph.D. and a long career in medical research, he cannot keep up with all the developments in his area of interest, breast cancer. Medline, the database that already houses more than 10 million abstracts for journal articles, is adding 7,000 to 8,000 abstracts per week. Only a fraction of these are about cancer, but the volume of information is daunting nonetheless.

"There is just too much literature to be able to go through it all," said Dr. Liebman, the
- Copyright (Score:1)
  
  by skamp ( 559446 ) writes:
  
  Doesn't it bother anyone that copying the article is probably illegal?
  - Re:Copyright (Score:1)
    
    by lamp540 ( 644770 ) writes:
    
    OH the guilt...
I didn't read the article (Score:2, Insightful)

by Mattwolf7 ( 633112 ) writes:

Why does slashdot keep linking to articles that require NYT registration? Isn't there some sort of Google news out there?
(Yes I am a lazy /. reader)
- Re:I didn't read the article (Score:2)
  
  by throbbingbrain.com ( 443482 ) writes:
  
  I think they're catching on to us. I couldn't find a single email address at nytimes.com that wasn't 'already in use'.
  
  I guess they figured out why so many readers are 90-year-old CEOs of religious organizations in Beverly Hills.
  - Re:I didn't read the article (Score:2)
    
    by MisanthropicProggram ( 597526 ) writes:
    
    Is that why my:
    Heywood Jablowme@whitehouse.gov
    Dick Hertz@yahoo.com
    HarryPNisss@microsoft.com
    SudoNy mm@slashodot.com
    ImaNassHole@whitehouse.gov
    Homu rSexual@apple.com
    and others..Doesn't work?
    - Re:I didn't read the article (Score:3, Funny)
      
      by Rick the Red ( 307103 ) writes:
      
      I feel really sorry for luser@aol.com, because I've signed him up for all sorts of things...
      - Re:I didn't read the article (Score:1)
        
        by wan23 ( 636995 ) writes:
        
        Someone actually has that address you know... some German guy, judging from his profile. How would you like it if someone signed you up for all kinds of junk?
        
        Re:I didn't read the article (Score:2)
        
        by joshuac ( 53492 ) writes:
        
        You're kidding, right? Some guy actually has "luser@aol.com"?
        
        Reminds me of one of the companies I worked at, long ago. SMTP addresses went first initial, last name. However they made an exception for a Samuel Hitt.
        
        Re:I didn't read the article (Score:2)
        
        by whereiswaldo ( 459052 ) writes:
        
        Reminds me of one of the companies I worked at, long ago. SMTP addresses went first initial, last name. However they made an exception for a Samuel Hitt.
        
        What's his nospam address? noshitt@aol.com? LOL
      - Re:I didn't read the article (Score:1)
        
        by ncr53c8xx ( 262643 ) writes:
        
        AOL has a list of email addresses you can't sign up for (even when the address is not offensive or already taken). You cannot, for instance, sign up for aoluser@aol.com.
- Re:I didn't read the article (Score:2)
  
  by glenrm ( 640773 ) writes:
  
  Quite frankly, for straight tech news without commentary and without NYT this, Wired that, check out Google Tech News [google.com]. A great range of stories without all of the same news outlets being mentioned again and again.
- perhaps more importantly (Score:1)
  
  by SweetAndSourJesus ( 555410 ) writes:
  
  Why does every story linking to a New York Times article have people such as yourself complaining about it? Your comment lends nothing to the discussion and identical sentiments have been expressed countless times. Write a journal entry [slashdot.org] if you want mindless discourse regarding the New York Times registration requirement. Complain to the Times [nytimes.com] and tell them what horrible, horrible people they are for making you take ten seconds of your time to provide them with false information.
  
  The fact is that the New Yor
  - - Re:bullshit (Score:1)
      
      by Eric Ass Raymond ( 662593 ) writes:
      
      And why should you not have to register for information that someone worked for? Freeloader.
    - uh, no (Score:1)
      
      by SweetAndSourJesus ( 555410 ) writes:
      
      Slashdot is a business. If they started using Google's partner link, the NYT would hand their asses to them in court.
      
      Perhaps Slashdot should get in touch with the NYT and see if they can get a partnership set up, but stealing someone else's wouldn't be such a hot idea.
- - Re:Alright Slack... (Score:1)
    
    by Mattwolf7 ( 633112 ) writes:
    
    That works =)
    Thanks!
  - Re:Alright Slack... (Score:2)
    
    by Creepy Crawler ( 680178 ) writes:
    
    Sounds like the start of Xanadu.
    
    I'd like that.
  - Re:Alright Slack... (Score:1)
    
    by pvt_medic ( 715692 ) writes:
    
    quick someone mine this and give us the refined knowledge that the article has
- - - Oh you mean (Score:1)
      
      by bob_calder ( 673103 ) writes:
      
      that the whining cretins are really vastly intelligent creatures railing against the machine? What a relie.
Bringing Star Trek-like Computing one step closer! (Score:1)

by GuardianBob420 ( 309353 ) writes:

I've always wanted to ask the computer to find all references to some complex interplay of topics at hand the way those Star Fleet engineers were always able to in TNG...
create large volumes of junk to feed this.. (Score:2)

by joeldg ( 518249 ) writes:

yea...

text mining is fun until someone creates something to generate a bunch of junk to feed to the text miners..

take a look at my .sig
- Re:create large volumes of junk to feed this.. (Score:2)
  
  by Elwood P Dowd ( 16933 ) writes:
  
  That will work if you can get the botfeed created articles published in a major medical journal.
  
  Otherwise, totally not an issue.
- Re:create large volumes of junk to feed this.. (Score:1)
  
  by mcpkaaos ( 449561 ) writes:
  
  text mining is fun until someone creates something to generate a bunch of junk to feed to the text miners..
  
  Too late. It's called Slashcode.
  
  Ba-dum bum. =P
RTFA (Score:2, Funny)

by devphaeton ( 695736 ) writes:

skimming large volumes of miscellaneous text to extract some sort of refined knowledge from it.

Like those ppl who actually RTFA and try to get "FORST PIST!!!"?
- Ouch! (Score:2)
  
  by cookie_cutter ( 533841 ) writes:
  
  "FORST PIST!!!"
  That sounds painful!
why the "Multiverse" buzzword ? (Score:2)

by freuddot ( 162409 ) writes:

Multiverse doesn't appear anywhere in the article. Multiverse is a technical word, for interpreting Quantum Physics. It is totally misplaced in this news submission.
Did the poster even know what it means ?
- Re:why the "Multiverse" buzzword ? (Score:1)
  
  by SquadBoy ( 167263 ) writes:
  
  I've been rereading Snow Crash, but maybe he meant metaverse; that at least makes sense as an attempted joke.
  - Re:why the "Multiverse" buzzword ? (Score:1)
    
    by cei ( 107343 ) writes:
    
    Multiverse could also be the different incarnations of Michael Moorcock's "Eternal Champion" (Elric, Hawkmoon, Corum, etc...)
- Re:why the "Multiverse" buzzword ? (Score:2)
  
  by metlin ( 258108 ) writes:
  
  I'm guessing that this is for data in multiple versions of documents -- spatially and temporally disparate ones.
  
  One of the groups that I work with [gatech.edu] does some data analysis stuff with how data changes over space (location based) and time (your beliefs yesterday vs. your beliefs today) and the like -- so this could be something along those lines.
  
  Or like you said, it could just be a buzzword! :)
- Re:why the "Multiverse" buzzword ? (Score:1)
  
  by seriv ( 698799 ) writes:
  
  Why don't you google it to find out:p
  -Seriv
- obviously its (Score:1)
  
  by PurplePhase ( 240281 ) writes:
  
  ...short for "Marvel Multiverse". Text-mining all the comic books in existence to find out which timelines conflict with the others would be an excellent research project.
  
  8-PP
Support non-whoring reg-free linkage! (Score:5, Informative)

by Anonymous Coward writes: on Friday October 17, 2003 @04:46PM (#7243816)

Brought to you by your favorite anonymous non-whoring poster: the Google link [nytimes.com].
The same article is also posted at CNET, which doesn't require registration. They also have it in a nice single-page [com.com] format for those that don't like to keep hitting "next".

Oh shit (Score:1)

by Brahmastra ( 685988 ) writes:

If I apply this to slashdot, I'll have only 3-4 posts to read everyday... What will I do all day at work?
Brute forcing the problem (Score:3, Interesting)

by metlin ( 258108 ) writes: on Friday October 17, 2003 @04:49PM (#7243859) Journal

To make sense of what it is reading, the software uses algorithms to examine the context behind words.

They make it sound like Semantic and Contextual modeling is done on the fly -- the way I see this system, it does this based on a preset lexicon or database.

That's again brute forcing the problem -- a lot of researchers in the field feel that the real solution does not lie that way. We need to analyze this from the ground up, to gather meaning from data.

The above method fails the moment you have spatial and temporal data -- my lexicon may evolve over a period of time.

You're looking at all the information and then deciding what's for you -- a better way is to develop an "instinct" for the right kind of information and refine it.

If you really want to know where data mining is going, look at KDD or SIGMOD -- that's where all the real action is.

- shameluss plug of parent (Score:2)
  
  by koekepeer ( 197127 ) writes:
  
  never did this before
  
  MOD PARENT UP (and me down i don't care)
  
  finally a /. comment that makes sense
- Re:Brute forcing the problem (Score:2)
  
  by shdragon ( 1797 ) * writes:
  
  I don't believe that medical terminology & medical journals contain a lot of terminology that changes over time. Most medical words are Latin-based and fairly rigid in their usage. A tibia is a tibia is a tibia. What the doctor has created is a very specific application to skim a very large volume of information and report back things that might be of interest. You are correct that an "instinct" would be much more useful. This does not, however, make the doctor's accomplishment any less viable.
  
  Is text mi
- Re:Brute forcing the problem (Score:2)
  
  by john82 ( 68332 ) writes:
  
  There are more than enough opinions about "the right way" to model data from a semantic or contextual standpoint. Like most things there's the academic approach and there's one that a company can afford. Whether or not either is appropriate depends on your needs, point of view and the size of the corporate wallet.
  
  Sure there are those who short change some approaches because they have temporal limitations. New data comes in and you need to categorize that too and determine its context or supremacy to data y
  - Re:Brute forcing the problem (Score:2)
    
    by metlin ( 258108 ) writes:
    
    That is true, that the answer is not straight nor is it simple.
    
    However, one thing that I have learnt (the hard way) over a period of time is that Ontology (Specification of data conceptualization) is infinitely more important than Epistemology (Knowledge of the data).
    
    There is nothing wrong with a system which has tags, the trouble is when you classify it either way -- the references of the tags are once again more important than how they are acquired. You could perhaps have a purely automated system, maybe
Fun with numbers (Score:3, Interesting)

by ajs ( 35943 ) writes: <ajs AT ajs DOT com> on Friday October 17, 2003 @04:50PM (#7243869) Homepage Journal

Here's some fun you can have with numbers. Take this Perl one-liner:

perl -ne '$x{$1}++ while /(\d)/g;END{print map {"$_ occured $x{$_} times\n"} sort {$a<=>$b} keys %x}' xxxx

and run it with "xxxx" replaced by the name of some large text file that you create by saving email messages, web pages, log files, what have you.

The scary part (that took mathematicians a long time to accept and longer to figure out) is that the distribution is the same for any sufficiently representative sample of text....

- Re:Fun with numbers (Score:2)
  
  by ajs ( 35943 ) writes:
  
  Ooops. While you see a distribution there, that's not what I was trying to point out. The correct one-liner would be
  
  perl -ne '$x{$1}++ while /\b([1-9])\d*\b/g;END{$all+=$_ foreach map {$x{$_}} keys %x;print map {sprintf "%d occured %.1f%% (%d times)\n",$_,$x{$_}/$all*100,$x{$_}} sort {$a<=>$b} keys %x;print "$. lines read\n"}'
  
  Benford's law is the name of this phenomenon. It's even more interesting because it is independent of base!
  
  There are many ways that this is used, including detecting human tamp
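For anyone without Perl handy, the leading-digit count in the corrected one-liner above can be sketched in Python. This is only a rough equivalent, not ajs's code: the function names (`leading_digit_counts`, `benford_expected`, `report`) are made up for illustration, and the "expected" column assumes the usual statement of Benford's law, P(d) = log10(1 + 1/d), which the thread itself never spells out.

```python
import math
import re
from collections import Counter

# Same pattern as the Perl one-liner: the first digit (1-9) of each
# run of digits that stands alone as a word.
LEADING_DIGIT = re.compile(r"\b([1-9])\d*\b")


def leading_digit_counts(text):
    """Count how often each digit 1-9 starts a number in `text`."""
    counts = Counter(int(d) for d in LEADING_DIGIT.findall(text))
    return {d: counts.get(d, 0) for d in range(1, 10)}


def benford_expected(digit, base=10):
    """Probability of leading digit `digit` under Benford's law (assumed form)."""
    return math.log(1 + 1 / digit, base)


def report(text):
    """Return one line per digit: observed share next to Benford's prediction."""
    counts = leading_digit_counts(text)
    total = sum(counts.values())
    lines = []
    for d in range(1, 10):
        observed = 100 * counts[d] / total if total else 0.0
        expected = 100 * benford_expected(d)
        lines.append(
            f"{d} occurred {observed:.1f}% ({counts[d]} times), "
            f"Benford predicts {expected:.1f}%"
        )
    return lines
```

To try it, read a large text file into a string and print `"\n".join(report(text))`. Small samples will wander a long way from the predicted column, so a big pile of text works best.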
  - Re:Fun with numbers (Score:1)
    
    by cornjones ( 33009 ) writes:
    
    Don't have easy access to a perl machine anymore. Can somebody post an example?
    - Re:Fun with numbers (Score:1)
      
      by NilObject ( 522433 ) writes:
      
      See my reply to parent, I'm not posting it again here. Hope it's interesting. Muahaha...
      P.S. The filters suck. *sigh*
- Re:Fun with numbers (Score:1)
  
  by PD ( 9577 ) * writes:
  
  The key is sufficiently representative. And I'm not quite clear on what that means, but I know some examples:
  
  Make a list of the areas of all the lakes in your state. Doesn't matter what the units are. Look at the leading digits: the highest count will be the ones, and the lowest count will be the nines.
  
  Same for a list of all the house numbers in a city. Same for a list of just about anything you can think of, in whatever units you want.
  
  This can be used to detect fraud. For example, if you look at the finaci
- Re:Fun with numbers (Score:1)
  
  by NilObject ( 522433 ) writes:
  
  For those curious about what the output looks like, here's the output as run on the Slashdot homepage stripped of HTML etc:
  1 occured 44.9% (57 times)
  2 occured 17.3% (22 times)
  3 occured 9.4% (12 times)
  4 occured 7.1% (9 times)
  5 occured 11.0% (14 times)
  6 occured 1.6% (2 times)
  7 occured 3.9% (5 times)
  8 occured 2.4% (3 times)
  9 occured 2.4% (3 times)
  315 lines read
  
  Run the same thing on, say, Microsoft's home page and you get:
  1 occured 13.6% (3 times)
  2 occured 27.3% (6 times)
  3 occured 4.5% (1 times)
  4 occured 18.2% (
- - Re:Fun with numbers (Score:2)
    
    by ajs ( 35943 ) writes:
    
    Yep, binaries are a good example. Basically, in any data files that represent large systems with many variables, you should find that the Perl regular expression
    
    /\b(\d)\d*\b/g
    
    should match a 1 most often. In some types of text (especially code), you will find things like "0" show up a lot. That's why my example only counts leading digits 1 through 9, but if you want to include 0, that's cool.
    
    I find that a large pool of USENET posts works best.
How Long.. (Score:1)

by Disco Stew ( 703497 ) writes:

..until no student ever has to research any topic again?

Just head over to tellmewhatthisthingyisabout.com > Print
Speed reading (Score:1)

by Serious Simon ( 701084 ) writes:

I took a speed-reading course and read War and Peace in twenty minutes. It involves Russia. -- Woody Allen
- Re:Speed reading (Score:1)
  
  by stipe42 ( 305620 ) writes:
  
  I just read this exact quote in a post on plastic . . . similar net browsing habits or just plain coincidence?
Hmmm, isn't there a prerequisite??? (Score:2)

by 3seas ( 184403 ) writes:

That the text has to first contain some knowledge in it to begin with?

Maybe this is just an attempt at getting a machine to generate core knowledge but then haven't they been working on common sense, which is sorta needed first?
Fark Registration. Get in without stupid reg. (Score:1, Informative)

by Anonymous Coward writes:

Tired of going through their stupid registration? CLICK HERE [nytimes.com]
Red Necks (Score:2, Funny)

by k_stamour ( 544142 ) writes:

"to extract some sort of refined knowledge from it." hum....
If you have an infinite number of rednecks .... Infinite number of shotguns & shotgun shells.... And an infinite number of stop signs, you will eventually get Shakespeare in Braille.....
You mean... (Score:1)

by JamesP ( 688957 ) writes:

skimming large volumes of miscellaneous text to extract some sort of refined knowledge from it.

like grep?

I'm sorry, reading this text requires meta-technology.
Could do us a big favor (Score:3, Funny)

by Strange Ranger ( 454494 ) writes: on Friday October 17, 2003 @04:57PM (#7243964)

...skimming large volumes of miscellaneous text to extract some sort of refined knowledge from it.

Dear Text Miners,

Please start here: http://slashdot.org [slashdot.org]

Thanks so much.

- Re:Could do us a big favor (Score:2)
  
  by stefanlasiewski ( 63134 ) * writes:
  
  No no no, they said REFINED knowledge...
  
  sheesh. Some people!
- Re:Could do us a big favor (Score:1)
  
  by Jace of Fuse! ( 72042 ) writes:
  
  Yeah, but how high should it set its threshold?
  
  Should it filter Funny?
sorry, still sounds a lot like text searching (Score:1)

by cnb ( 146606 ) writes:

Text-mining programs go further, categorizing information, making links between otherwise unconnected documents

For any google results
"Category" is shown right on top of the results.
"Links" - try link:slashdot.org & related:slashdot.org as google queries.

If someone is doing research on computer modeling, for example, it not only knows to discard documents about fashion models but can also extract important phrases, terms, names and locations

Try the google advanced search you can search with "all of
Cool Picture (Score:1)

by Mattwolf7 ( 633112 ) writes:

Hey anyone else think this [nytimes.com] picture was really cool?
- You mean you don't already have one? (Score:1, Offtopic)
  
  by djeaux ( 620938 ) writes:
  
  Something tells me at least six /.ers are already working on the case mods :-D
Well, DUH! (Score:3, Insightful)

by djeaux ( 620938 ) writes: on Friday October 17, 2003 @05:13PM (#7244119) Homepage Journal

How well computers truly make sense of what they are reading is, of course, highly questionable, and most of those who use text-mining software say that it works best when guided by smart people with knowledge of the particular subject.

May I offer that computers make no sense of what they are reading & that "smart people with knowledge of the particular subject" aren't optional if the results of text-mining are to be of any usefulness whatsoever, at least in any kind of reasonable time frame.
Otherwise, the text-mining computer is playing the old "99 monkeys with typewriters" game...

- Re:Well, DUH! (Score:2)
  
  by koekepeer ( 197127 ) writes:
  
  i would say that is a problem of the underlying data, not of the textmining per se.
- Re:Well, DUH! (Score:1)
  
  by geekBass ( 665923 ) writes:
  
  It's not as bad as you think. Check out Vivisimo [vivisimo.com]
- Hence the CYC project (Score:1)
  
  by spage ( 73271 ) writes:
  
  The compelling dream is that you laboriously load up a computer with enough facts so that it can glean understanding of what it's reading, and one glorious day the computer has enough smarts to make sense of things on its own, and two weeks after crawling the entire Internet, it knows everything.
  Hence Doug Lenat's Cyc [cyc.com], now partly open source [opencyc.org]. Unfortunately that glorious day has been "a few years away" for over 13 years.
  The knowledge base is built upon a core of over 1,000,000 hand-entered assertions
  - Re:Hence the CYC project (Score:1)
    
    by Tablizer ( 95088 ) writes:
    
    But I haven't come across any postings from Cyc on Slashdot correcting misinformation and lies.
    
    How do you know? You cannot verify human authorship just by looking at the text. Perhaps the Goatse troll is really an AI bot.
while text minning... (Score:1)

by seriv ( 698799 ) writes:

Do you have to where a minning hat:p
-Seriv
- Re:while text minning... (Score:1)
  
  by seriv ( 698799 ) writes:
  
  I was in a hurry, I didn't think, plus can't you figure out a joke, a dumb one at that, but still a joke!!!!!!!!!
  -Seriv
Text Mining for Correlation (Score:1)

by NoSlack913 ( 627840 ) writes:

Mining for data that might be related based on proximity, either temporal or locational, starts to get interesting when you are dealing with millions of interactions, like voice data in a call center (check out www.callminer.com). Suddenly you find out that when a customer says "hurricane" in an insurance call center, your agents are 5x more likely to hand them off to a supervisor, and that is real money-saving information. This is what this technology is good for, and is being bought and used by a lot of compa
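The "5x more likely" figure in the comment above is a ratio of two rates: how often calls get escalated when a keyword shows up, versus how often they get escalated when it doesn't. A minimal sketch of that arithmetic, assuming each call is a dict with a `transcript` string and an `escalated` boolean (a made-up record layout for illustration, not anything from CallMiner or the article):

```python
def escalation_lift(calls, keyword):
    """Ratio of escalation rates: calls mentioning `keyword` vs. calls that don't.

    `calls` is a list of dicts with hypothetical keys "transcript" (str)
    and "escalated" (bool). Returns None when the ratio is undefined.
    """
    keyword = keyword.lower()
    with_kw = [c for c in calls if keyword in c["transcript"].lower()]
    without_kw = [c for c in calls if keyword not in c["transcript"].lower()]
    if not with_kw or not without_kw:
        return None  # need both groups to compare

    rate_with = sum(c["escalated"] for c in with_kw) / len(with_kw)
    rate_without = sum(c["escalated"] for c in without_kw) / len(without_kw)
    if rate_without == 0:
        return None  # no baseline escalations, ratio undefined
    return rate_with / rate_without
```

A result near 5.0 for "hurricane" would match the example in the comment. With real data you would also want enough calls in each group before trusting the number.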
but what about the data itself? (Score:3, Insightful)

by koekepeer ( 197127 ) writes: on Friday October 17, 2003 @05:35PM (#7244331)

i always wondered about this

allright, you can take huge amounts of text and apply some smart tricks to extract patterns from it.

but how can you determine whether the original data was trustworthy?

take the example of genome annotation (description of gene function), which would be helped greatly by including more functional descriptions from scientific literature. how do you determine whether the original publication was backed by solid experimental research?

by the reviewers of the articles? i don't think so, peer review is a snakepit filled with politics. by the amount of people who cited it? hmmmm... so hip subjects are more true?

me personally, because i'm experienced, can recognise bullshit articles when i see them. but how to translate this into an algorithm... anyone any ideas about this? or even working solutions?

(of course this is an example from my field of expertise - biology, but it applies to any set of text data/articles IMO)

- Re:but what about the data itself? (Score:2)
  
  by metlin ( 258108 ) writes:
  
  Hmm, I guess you cannot say that for sure, but most systems today use trust metrics.
  
  For example, an ACM/IEEE source would have a much higher trust metric than say, from some local conference in Egypt (no offence to any local conferences in Egypt, but you get the wind :)
  - Re:but what about the data itself? (Score:2)
    
    by koekepeer ( 197127 ) writes:
    
    i see the point, but is this truly representative of reliability?
    
    you rely on peer review, on citation indices, so mostly IM-not-so-HO on matters of politics.
    
    when you scan abstracts yourself, you can dig into the detail when something looks interesting enough, but the decision making process that drives me while scanning abstracts is not much influenced by the fact whether it is in a high impact journal (or any other high impact publishing body) or in something mostly not noteworthy.
    
    to put it in another
    - Re:but what about the data itself? (Score:2)
      
      by JDevers ( 83155 ) writes:
      
      Definitely...I've read some very BS papers in Science and Nature and some really good ones in MUCH less respected journals.
      
      I would not apply a trust metric to an article based on the journal alone...
Some notes... (Score:2, Interesting)

by ekephart ( 256467 ) writes:

(1)"Of course, no one, Dr. Liebman included, is arguing that these products are actually reading anything. What they are engaged in is "text mining,'' "

Dijkstra once said "The question of whether computers can think is like the question of whether submarines can swim." ... Just a thought.

(2)As noted in the article sarcasm is very hard to detect. If you think about it even many people have a hard time recognizing it. How are we supposed to develop an intelligent system when we "intelligent" humans don'
Cognitive Science? (Score:1)

by KrackHouse ( 628313 ) writes:

You know how those guys at MIT are constantly trying to figure out ways to teach their robots how to interact with people? Let the robots roam the Internet with a topic in mind. If I'm at a party with a bunch of dog groomers I'm probably not going to say much. I'm sure robots have the same issue; they have nothing in common with us. If we start by making a Cancer-Expert-Bot then let it try to have a conversation with an oncologist I think AI will have more success.
Warballs (Score:2)

by DrSkwid ( 118965 ) writes:

They just had to get it in somehow :

like the 858-page report on the congressional inquiry into intelligence failures regarding the Sept. 11, 2001, terrorist attacks.
statistical nlp (Score:1)

by MarkWatson ( 189759 ) writes:

I spent more years than I care to admit writing natural language processing software that tried to extract semantic information - conceptual dependency, parsers, etc.
I gave up a few years ago, now I mostly use statiscal approaches (markov processes, word counts, huge databases of proper names, etc.)
-Mark
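For readers wondering what the "word counts" and "markov processes" mentioned above look like at their simplest, here is a small sketch. It is not MarkWatson's code: the tokenizer and the function names are assumptions for illustration, and a first-order (bigram) chain is just the most basic Markov model one could pick.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_model(tokens):
    """Count, for each word, which words follow it and how often."""
    transitions = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        transitions[current][following] += 1
    return transitions


def next_word_probabilities(model, word):
    """Turn follower counts for `word` into probabilities (empty if unseen)."""
    followers = model.get(word)
    if not followers:
        return {}
    total = sum(followers.values())
    return {w: n / total for w, n in followers.items()}
```

Plain word counts come from `Counter(tokenize(text))`; the bigram table is what lets you ask which word is likely to come next, without any attempt at parsing or semantics.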
- Re:statistical nlp (Score:2)
  
  by MarkWatson ( 189759 ) writes:
  
  .. I meant "statistical approaches", not "statiscal approaches" ..
  (I was trying to type while holding my wife's baby parrot, and he sometimes goes nuclear if you don't pay enough attention to him :-)
  BTW, pardon the shameless plug, but I added a short chapter on statistical nlp (simple enough example program to understand easily) to my free Java/AI web book.
  -Mark
unfortunately (Score:2)

by jafac ( 1449 ) writes:

NYT won't be contributing to this large body of text, because registration is STILL required.
Skimming random information? (Score:1)

by st0rmshadow ( 643869 ) writes:

skimming large volumes of miscellaneous text to extract some sort of refined knowledge from it.

I like to call it High School.
- That's what students are for. (Score:1)
  
  by bob_calder ( 673103 ) writes:
  
  At dinner last night, my friend asked how I could justify spending a lot of time putting in massive amounts of information on a project. I told him that's what students are for! (wink wink, nudge)
just curious (Score:1)

by dandelion_wine ( 625330 ) writes:

but could a person who had sufficient knowledge of the program(s) build a large document (say, the giant 9/11 intelligence-failure doc mentioned in the article) so as to fool the text-miners? Subtle misinformation -- let's say that widespread use of text miners results in larger docs being published, then unscrupulous types bury information in such a way that a ridiculously long human endeavour will turn them up, but the programs won't, so those responsible can say: "See? It's all in there. Your program is
Text-miner in MS-Word (Score:1)

by Knights who say 'INT ( 708612 ) writes:

MS Word has a surprising "summary" feature that has given me impressive results in Portuguese. How the hell do they do that?
KDD Cup (Score:4, Informative)

by apsmith ( 17989 ) * writes: on Saturday October 18, 2003 @12:18AM (#7246610) Homepage

The knowledge discovery and datamining cup challenge this year [cornell.edu] was looking at the arxiv.org [arxiv.org] papers for this sort of analysis - some very interesting results. The Task 4 winner [umass.edu] looked at the structure of the papers as a sort of relational database and uncovered a lot of statistical patterns and metrics that could be quite useful for scientists.

FBI agents? (Score:1)

by hairyface ( 717081 ) writes:

"I was an FBI agent for 20 years," said Randall Murch, now a researcher at the Institute for Defense Analyses, which works for the Office of the Defense Secretary and other government agencies. "And I have yet to see anyone who is able to model the way an agent thinks and works through an investigation."

Apart from suggesting the jibe that, of course, only an ex-FBI dick could think that anyone would want to model his/her behaviour, this misses the point that text-mining is intended to find precisely thos
Subrogation - Firemen's Fund would do well to (Score:1)

by bob_calder ( 673103 ) writes:

think before they use a hammer. Using software to fix problems that exist within their human intelligence arena is soooo typical. The bit about subrogation is so idiotic, I can't believe it. Any idiot can check a box on the report if there is a basis for subrogation. If there is enough data in the report to determine a basis for subro. then the adjuster obviously knew that it should have been handed to the subro. dept. from the outset. There is obviously an issue here. The adjusters are reluctant to send c
- Re:What's up with Slashdot? (Score:1, Offtopic)
  
  by daeley ( 126313 ) * writes:
  
  This is OT, but read this journal entry [slashdot.org] from CmdrTaco.
  - Re:What's up with Slashdot? (Score:1)
    
    by Osty ( 16825 ) writes:
    
    That explains some of the problems, but not everything. For instance, why haven't I had mod points in nearly two years, despite having good karma and contributing to conversations (rather than trolling)? Yes, I know the rules about getting moderation points, but even with those I'd expect to get points at least two or three times a year, not every two years. As well, I recently started noticing that Slashdot has popup ads now (I saw one in the last day, and then added slashdot into my popup blocker's bla
- Re:What's up with Slashdot? (Score:1, Offtopic)
  
  by davidstrauss ( 544062 ) writes:
  
  Agreed.
- Re:What's up with Slashdot? (Score:1, Offtopic)
  
  by joeldg ( 518249 ) writes:
  
  yes, I have noticed the metamod thing too..
  
  in addition there is also this tasteless group of guys who keep making posts about greased-up yoda dolls which has also forced me to start browsing at +2..
  
  seems that mod points are being handed out with less frequency than they were before.
  
  I think they should start handing them out for people with "excellent karma" and then track if the metamods agree with the point distribution..
  
  that is just me..
- Re:What's up with Slashdot? (Score:1, Offtopic)
  
  by cK-Gunslinger ( 443452 ) writes:
  
  Answers to your questions: HERE [slashdot.org]
- Re:What's up with Slashdot? (Score:1, Offtopic)
  
  by TheFlyingGoat ( 161967 ) writes:
  
  I think part of it has to do with the number of posts lately. The last 10 articles (not including the current) have had an average of 224 posts. I estimated the average in early-mid summer and came up with 550. Twice as many posts = twice as much required moderation? I'm not sure how slash works in this regard.
  
  Now, as far as the reason for fewer posts, I know that the editors have said that late summer-fall tends to be slower for news, but I also think they've been putting up some boring articles latel
  - Re:What's up with Slashdot? (Score:2)
    
    by cK-Gunslinger ( 443452 ) writes:
    
    I agree about the quality of recent stories. I used to look forward to refreshing the main page all day and seeing an interesting story pop up every hour or so, one that generates a couple of hundred comments and several deeply-nested threads.
    
    Now I refresh and see a review of a pirate book with ~70 "+2" comments and "Third Anniversary of Bezos-Backed Patent Reform," which went completely ignored. Meh.
    
    Of course, I'm not helping by posting near-useless comments like this...
    - Re:What's up with Slashdot? (Score:2)
      
      by cK-Gunslinger ( 443452 ) writes:
      
      And naturally, the few mods that are around have basically wasted a dozen or so points by methodically modding this entire thread Off-Topic. Not that I mind, but for Pete's sake, there's not a single +5 Mod in this topic yet! There are only 4 "+3" posts! At least try to be constructive, for crying-out-loud! =P
- Re:What's up with Slashdot? (Score:1)
  
  by termos ( 634980 ) writes:
  
  I have been noticing this as well. I've not had mod points for quite some time, and most of my posts have been modded redundant for some reason, when I feel they're not. No major problems, I usually meta-moderate 10 posts, but yes, there are some weird issues.

