
Text-Mining Your E-mail (229 comments)

Misha writes "There have been a number of weeks/months in anyone's life that called for a better organization of your Inbox. Filtering and folders work, but it'd be nice to have a text-mining tool running in the background that categorized incoming messages by topic as they arrive. It's nice to see that besides NLP research, there are some great algorithmic advances being made, as seen in this paper. Perhaps even one of them Perl monkeys will quickly hack such a background tool." Note: it's a PostScript file.
This discussion has been archived. No new comments can be posted.

  • by Phred_Johnston ( 530218 ) on Wednesday April 24, 2002 @02:01PM (#3402619) Homepage
    I'm sure I'm not alone in saying that it pays to have a good history of well-filtered incoming mail, and especially just about all of my outgoing mail (Outbox), available for searching. My Outbox has been a lifesaver several times when someone claims that they didn't have that (electronic) discussion with me. It's great to quote "in a message sent... ...I asked you to...".
  • by LeeZard ( 109522 ) on Wednesday April 24, 2002 @02:08PM (#3402658)
    That's not the point. The paper is talking about modeling spikes in topic/content of data streams over time. This is the second layer analysis of the meta-data that gets stored in the database.
  • by CaptainPhong ( 83963 ) on Wednesday April 24, 2002 @02:15PM (#3402709) Homepage
    I've found the most joy from owning my own domains, and a lot of it has to do with e-mail sorting/filtering as much as the traditional benefits (a permanent www.yourdomain.com web site address and yourname@yourdomain.com e-mail address).

    Every time you sign up for some mailing list or discussion group, create a new e-mail account or alias for just those mailings. Bam, it's automatically sorted out by itself with extreme ease. If you sometimes have limited bandwidth (or are checking, say, on your Palm), just check your important addresses frequently, and reserve your mailing lists for a once-per-day check.

    If some site asks for your e-mail address to download a piece of software, or to register, make up a new alias and give that to them. If you start getting tons of crap at that address, you can just remove that alias, and they get it all bounced back in their stupid spamming faces.
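
A minimal sketch of the alias idea above, assuming mail for a single domain is routed purely by the local part of the recipient address. The alias names, folder names, and the `route` function are hypothetical, made up for illustration; a real setup would do this in the mail server or delivery agent rather than in a script.

```python
from email.utils import parseaddr

# Hypothetical alias table: alias local part -> folder name.
ALIASES = {
    "lists-perl": "Mailing lists",
    "shop-download": "Registrations",
    "work": "Work",
    "family": "Family",
}

# Hypothetical aliases that were removed after they started attracting spam.
RETIRED = {"freeware-site"}


def route(to_header, domain="yourdomain.com"):
    """Return (action, folder) based only on the recipient alias."""
    _, addr = parseaddr(to_header)
    local, _, host = addr.lower().partition("@")
    if host != domain:
        # Not one of our aliases at all: leave it in the default inbox.
        return ("deliver", "Inbox")
    if local in RETIRED:
        # Removed alias: the sender gets a bounce, no folder involved.
        return ("bounce", None)
    return ("deliver", ALIASES.get(local, "Inbox"))


if __name__ == "__main__":
    print(route("Perl List <lists-perl@yourdomain.com>"))
    print(route("freeware-site@yourdomain.com"))
```

The point of the design is that no content inspection is needed: the address a message was sent to already says which sign-up it came from.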

    Give one address to your cow-orkers just for work stuff. Give a different one to your Mom and other techno-nots that blocks all attachments. Give another one to your friends with brains that goes unfiltered. For people you don't want to talk to, give them the address of an autoresponder tied to Eliza [fury.com].

    Be a *Happy Camper* and let your addresses be *Bubbles* and you be just *You*.

  • outlook does this (Score:1, Insightful)

    by Anonymous Coward on Wednesday April 24, 2002 @02:34PM (#3402863)
    as evil as ms is, outlook does all this.

    and with simple vb you too can automate outlook. It has a really nice object model and loads of documentation.

    I wrote a silly little plugin that uses msspeech api and outlook sdk and now my computer reads to me when I want.

    The most useful is the rule that pops up and plays sounds and shit whenever my boss emails me. Replying within seconds has helped me get on her good side.

    tell me, if I made the switch completely to linux, would I have this easy of a time customizing my email application?
  • by Anonymous Coward on Wednesday April 24, 2002 @03:31PM (#3403304)
    Just use GhostView...
  • by kentborg ( 12732 ) on Wednesday April 24, 2002 @04:06PM (#3403714)
    Once I was at some internet tradeshow in Boston and every other booth seemed to be showing off their e-mail filtering features, each with one or more enormously complicated dialog box. Features! Features! Features!

    My reaction was to want an e-mail reading program that didn't require any filter configuration, though I imagined it would do well to be given a few hints, such as who my boss is, who my mother is, and who my wife is. Other than that, let the program figure it out.

    Imagine the canonical, old-fashioned secretary temp. She ('cause that's what the canonical version was) didn't have to know anything domain-specific to sort the morning mail. Magazines go together, bills go together, personal letters go together, etc.

    I imagine an automated version for my e-mail. Look at who it is "to" (am I on the list?), look at who is "cc"-ed (am I on that list?), look at who it is from (my boss, wife, or mother?), look at who else it is to (boss, wife, or mother?), look at the thread it is part of (is it responding to something I previously wrote?), look at the content (does it mention me, things I have written, my boss, wife, or mother?). Was it sent to a mailing list? Was it written by someone I have explicitly written to (once or many times?)? Was it written by someone who has previously sent me direct e-mail (once or many times?)? Those ideas are just the obvious ones, think of others. Think of more. (Does it talk about sex, credit card merchant accounts, stock tips, or Nigerian money?)
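
The header checks listed above can be sketched with Python's standard `email` package. This is only an illustration under stated assumptions: the addresses, the `signals` function, and the signal names are hypothetical, and the content-based checks (mentions of me, spam topics) are left out.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr

# Hypothetical hints the user would supply once.
ME = "kent@example.org"
VIPS = {"boss@example.org", "wife@example.org", "mom@example.org"}


def signals(raw, me=ME, vips=VIPS, my_message_ids=frozenset(),
            known_senders=frozenset()):
    """Extract yes/no signals from the headers of one raw message."""
    msg = message_from_string(raw)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    to = {a.lower() for _, a in getaddresses(msg.get_all("To", []))}
    cc = {a.lower() for _, a in getaddresses(msg.get_all("Cc", []))}
    # Message-IDs this message claims to be answering.
    refs = set((msg.get("In-Reply-To", "") + " " +
                msg.get("References", "")).split())
    return {
        "to_me": me in to,
        "cc_me": me in cc,
        "from_vip": sender in vips,
        "vip_also_addressed": bool((to | cc) & vips),
        "reply_to_my_message": bool(refs & set(my_message_ids)),
        "from_known_sender": sender in known_senders,
        "via_mailing_list": msg.get("List-Id") is not None,
    }


if __name__ == "__main__":
    raw = ("From: Boss <boss@example.org>\n"
           "To: kent@example.org\n"
           "Cc: wife@example.org\n"
           "In-Reply-To: <abc@example.org>\n"
           "Subject: hi\n\nbody\n")
    print(signals(raw, my_message_ids={"<abc@example.org>"}))
```

Here `my_message_ids` and `known_senders` stand in for what the program would learn from the saved Outbox and Inbox, so the only thing typed in by hand is the short list of important people.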

    Now take that and sort it by importance and similarity. Look for a way to present it to me in a descriptive summary, arranged in a hierarchy with a top level of, say, 3 to 9 categories, a greatest depth no greater than, say, 4, and keep the sub-branching at intermediate nodes between 3 and 5--but don't max out all those dimensions at once; try to keep the total number of leaf categories under, say, two dozen. Try to make more important items land higher in the tree and with few siblings, grouped with siblings of similar importance. (Maybe give an importance weight to each e-mail and balance the tree on that scale; that would float e-mails to me from my boss about my mother and wife really high with few siblings.)
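
One way to picture the importance-weight idea is the rough sketch below. It is not the full tree described above: it only ranks messages by a weighted score and cuts the ranking into tiers that grow in size, so the most important messages end up with the fewest siblings. The weights, signal names, and the `importance` and `tiers` functions are all hypothetical.

```python
# Hypothetical weights per yes/no signal; negative values push a message down.
WEIGHTS = {
    "from_vip": 5,
    "to_me": 3,
    "reply_to_my_message": 3,
    "cc_me": 1,
    "via_mailing_list": -2,
}


def importance(sig):
    """Sum the weights of every signal that is set for a message."""
    return sum(w for name, w in WEIGHTS.items() if sig.get(name))


def tiers(messages, first=3, growth=2, max_tiers=4):
    """Split (subject, signals) pairs into tiers, most important first.

    Each tier is `growth` times larger than the one before it, and the
    last tier takes whatever is left over.
    """
    ranked = sorted(messages, key=lambda m: importance(m[1]), reverse=True)
    out, size, i = [], first, 0
    while i < len(ranked) and len(out) < max_tiers - 1:
        out.append([subject for subject, _ in ranked[i:i + size]])
        i += size
        size *= growth
    if i < len(ranked):
        out.append([subject for subject, _ in ranked[i:]])
    return out


if __name__ == "__main__":
    inbox = [
        ("A", {"from_vip": True, "to_me": True}),
        ("B", {"to_me": True}),
        ("C", {"via_mailing_list": True}),
        ("D", {"cc_me": True}),
        ("E", {"reply_to_my_message": True, "to_me": True}),
    ]
    print(tiers(inbox, first=2))
```

Grouping by similarity and enforcing the branching limits would need more than this, but the ranking step is the part that decides what floats to the top.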

    This summary needs to be integrated with a complete index of the e-mail so I can see how a message fits into a larger thread, how it fits into previous e-mails.

    I (the user) would need to tell the program when to make me a summary of my e-mail (e-mail reading is different when a lot comes in or just a little), and I want to be able to browse through old summaries, including deciding to see composite summaries of, say, the last several days, a week (or three), month, year, or 400 days.

    So I think it ends up being a 4-part user interface:

    1. List of summaries (which can be manipulated).

    2. A given summary.

    3. Exhaustive thread/date/subject/sender list (analogous to what every e-mail reader seems to have now). Note that this view could effectively be turned into an exhaustive address book. Frequent (favored) correspondents could be highlighted by me for ease in sending a new e-mail, and also to provide importance hints to the program. This is where I might say who my boss/wife/mother is.

    4. The body of one (or more) specific e-mails being read or written, or of an old e-mail (sent or received) being reviewed.

    And I could go on, but I won't. If anyone wants to write such a thing and wants to hear more, send me an, um, e-mail.

    -kb, the Kent who has been saving all his e-mail (including spam!) for a year or so, providing plenty of raw material to test any such program.
