Search Engine Learns From User Feedback

An anonymous reader writes "Ian Clarke, founder of the Freenet project, has set up a web search engine that allows users to rate each of the search results it returns. WhittleBit will use your feedback to determine which keywords should be added or removed from your search, then you can search again to get more accurate results. This could be useful for those cases where Google just refuses to return the search results you want. Could improved interactivity be the next big search engine advancement after PageRank?"
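
The summary describes the feedback loop only in outline. As a rough sketch of how such keyword whittling could work (hypothetical Python, not WhittleBit's actual implementation):

    from collections import Counter

    def whittle(query_terms, liked_docs, disliked_docs, top_n=2):
        """Toy relevance feedback: add terms common in thumbs-up
        results, exclude terms common in thumbs-down ones.
        Each doc is a list of its words."""
        liked = Counter(w for doc in liked_docs for w in set(doc))
        disliked = Counter(w for doc in disliked_docs for w in set(doc))

        # Score candidate terms by how strongly they separate the piles.
        scores = {w: liked[w] - disliked[w]
                  for w in set(liked) | set(disliked)
                  if w not in query_terms}
        ranked = sorted(scores, key=scores.get, reverse=True)

        add = [w for w in ranked[:top_n] if scores[w] > 0]
        drop = [w for w in ranked[-top_n:] if scores[w] < 0]
        # Re-run the underlying search as: query +add -drop
        return query_terms + add, drop

For a query like 'mushrooms' where recipe results get a thumbs up and psychedelia results get a thumbs down, this would tend to add cooking vocabulary and exclude terms like 'psilocybe' automatically.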
  • by garcia ( 6573 ) * on Monday August 11, 2003 @02:31PM (#6667763)
Could improved interactivity be the next big search engine advancement after PageRank?"

    In short, no.

I have tried WhittleBit before (a user had a link to it in his .sig on Slashdot). I was unimpressed with the results the first time (there were 8 or so to work with), and whittling with the thumbs-down was of little use when there were so few results.

I can't see Google's superiority being challenged by this at all. What else would WhittleBit offer me other than this "feature"? I didn't see anything else when I used it (and in fact, I was rather annoyed by the fact that it remained at the top of the screen while I was reading the link I was sent to).

    No thanks, just my worthless .02
    • I like it. (Score:5, Interesting)

      by Doesn't_Comment_Code ( 692510 ) on Monday August 11, 2003 @02:37PM (#6667833)
I like the idea of interactive page rankings. I don't think it should be the one decisive ranking algorithm. But human interaction is just what search engines need.

I do a lot with Google, and it leaves something to be desired. The goal of Google is to take the ranking of pages partly out of the hands of webmasters, so they can't just trick the spiders. And that has worked very well for Google (it serves over 70% of internet searches). But all page ranks are very cold and calculated. Maybe that cold, calculated rank is a good place to start, and then it's time for human reviewers to fine-tune the list.

By the way, Google has attempted to achieve this concept of human ranking by watching to see how long you stay at a page you clicked on. If they rank a page #1, and you click it and immediately return to the search page, they penalize that page. So if even Google is trying the same abstract concept, it probably has a future on the web.

      • Re:I like it. (Score:5, Interesting)

        by Thoguth ( 203384 ) on Monday August 11, 2003 @02:55PM (#6668005) Homepage
By the way, Google has attempted to achieve this concept of human ranking by watching to see how long you stay at a page you clicked on. If they rank a page #1, and you click it and immediately return to the search page, they penalize that page. So if even Google is trying the same abstract concept, it probably has a future on the web.

If that's true, then the way I do searches is counter-productive. I load the Google search page, and then middle-click all the links that look the most promising and read them in tabs. No wonder Google's searches have seemed to get worse and worse for me lately: I'm training it to think my most promising results are no good!
I load the Google search page, and then middle-click all the links that look the most promising and read them in tabs.

          I do the same, but I don't think Google "knows" that you're doing that.

          As far as I know, Mozilla doesn't send Google any info when you middle-click (or even left-click) on an outside link (if you clicked on the cached link, then of course it does).

          So if you never refresh the Google tab, it doesn't report anything of your Googling habits back to Google. So you're not training it at

Probably not. I know that Google sets a cookie; most likely they store what you clicked on in that cookie or in a session ID associated with it. It would read those cookies regardless of whether or not it was in a new tab/window/whatever.

What would probably be more counter-productive would be to turn off cookies, but even then, without the information Google wouldn't get worse; I would think it would just return the same results every time.
But doesn't the cookie scare you? I find it terrifying to think that Google has a database of every IP and all the searches performed from that IP, collected even without the never-expiring cookies. Remember that Google is a private corporation, and is likely to bow to any request for information by a law enforcement officer. I'm sure it won't be too long until all of this is linked straight into TIA. Check out Google Watch [google-watch.org] for more info.
Unless you're using Google for kiddie pr0n, system hacks, and cracker scripts, I think you'll be OK with a little cookie
              ...as long as it's a sugar cookie, yummy!

              But if that is what you are using Google for...Burn in Hell you Communist Nazi Pig!
      • Re:I like it. (Score:3, Interesting)

        by JimDabell ( 42870 )

By the way, Google has attempted to achieve this concept of human ranking by watching to see how long you stay at a page you clicked on. If they rank a page #1, and you click it and immediately return to the search page, they penalize that page.

        Do you have a reference for that? According to the HTTP RFC [w3.org], user-agents aren't supposed to talk to the server when users hit their back buttons, but rather display what the user last saw (despite it possibly being stale). It seems odd that Google would try

      • Re:I like it. (Score:3, Interesting)

        by costas ( 38724 )
        Well, if you're excited about user-rankings and feedback, check out the newsbot in my .sig. It focuses on user interaction with the code/algorithm to build not just page rankings but also relationships --between pages, and between users. Try it out, I am guessing you'll like it...
      • "I like the idea of interactive page rankings. I don't think it should be the one decisive ranking alogrithm. But human interaction is just what search engines need."

Big question is: how does it avoid Google's problems? (I.e., how does it perform when millions of people are spending big money trying to distort and cripple the results?)

        If feedback were ever tried, it would certainly need to be personalised, and the sharing of page-rank limited to "people who voted similarly to you in the past thought *th
    • by nate1138 ( 325593 ) on Monday August 11, 2003 @02:53PM (#6667984)
Perhaps one reason there were so few results returned is that this seems to be more of a proof of concept than a fully functioning engine. Imagine combining a feedback mechanism with an already excellent search like Google. This can't stand alone, but it would be an excellent addition to an engine that already has a huge index.

One thing that does worry me is the potential for abuse. Something like a script that connects to WhittleBit, searches for a keyword important to your industry, and gives all of your competitors' links a thumbs-down.

I agree that WhittleBit will most likely fail at overtaking Google, but this feature, or ones like it, could still take hold. What is stopping Google from implementing or buying this technology if it turns out to be useful? I think Google's search results would improve if they allowed for certain user feedback. Perhaps they could keep track of which link each user clicks after searching for 'sex' and incorporate that information into their algorithm. People/bots could try to promote their own sites, but G
    • I think improved interactivity COULD be the next big advance. The issue is how to control the input.

      Now, if Google had a subscription system, where subscribing members were randomly given limited moderator and metamoderator privileges...

  • by Dr. Transparent ( 77005 ) * on Monday August 11, 2003 @02:32PM (#6667772) Homepage Journal
    Great idea until the second month when your local viagra spammer's SEO guy moves all his pages to the top of the search for "Futurama" or "Ninja Turtles."
And that's why it will not succeed. Anything where users are given enough privileges can be turned into `unusable crap' by a group with bad intentions.
    • by saskwach ( 589702 ) on Monday August 11, 2003 @02:40PM (#6667855) Homepage Journal

I think this is for whittling down a person's individual searches. My preferences when I'm searching for something about RJ45 plugs won't affect yours. This could be cool if used in conjunction with PageRank, so that I don't have to keep clicking on all the little "o"s... it makes it so I only have to see one page of links.

The biggest flaw I can see with this system is that if I'm looking for something rare and specific, once I find it, I won't thumbs-up it, I'll just click on the link... It might be useful to have a "thumbs-down all on page" checkbox, which might narrow the search intelligently.

Maybe the search results page can provide a link with a title something like "looks like it", in contrast to "let me see". If the summary provided looks useful you go with the former; if you don't have an opinion, you go with the latter.
      • Something like that (Score:5, Interesting)

        by siskbc ( 598067 ) on Monday August 11, 2003 @02:51PM (#6667967) Homepage
The biggest flaw I can see with this system is that if I'm looking for something rare and specific, once I find it, I won't thumbs-up it, I'll just click on the link... It might be useful to have a "thumbs-down all on page" checkbox, which might narrow the search intelligently.

        That would help, but it would have to know why they're bad to know how it would differ from other results that might be more acceptable.

Here's what I would do. First, instead of Google returning the most relevant choices, it needs to be a factor of relevance and diversity. So, with the typical "apple" search, it would return some Apple Computer results, some Fiona Apple results, and some results about the fruit. All of those would be highly relevant, but it would only give, say, a few of each. You could then click on the more relevant results (if you wanted apple the fruit, you'd click on the three fruit links), at which point it would reject the others and give you more of what you want.

The key here is that it would have to give diversity in the beginning for you to be *able* to differentiate things like what you want from things you don't. This is not how Google works now, I don't believe.

For what it's worth, this algorithm wouldn't be too complicated to do. I lack the programming ability, but I could do the algorithm in pseudocode (at which point most decent programmers could reduce it to C++). It should be quite possible.
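
        One plausible reading of that pseudocode offer, sketched here in Python rather than C++ (hypothetical code; the poster never published an algorithm), is maximal marginal relevance: pick results by trading relevance against similarity to what is already picked, so the first page spans several senses of "apple".

            def mmr_select(results, k=10, trade_off=0.7):
                """Pick k results balancing relevance against redundancy
                (maximal marginal relevance). Each result is a tuple
                (relevance_score, set_of_terms)."""
                def sim(a, b):
                    # Jaccard similarity between two term sets
                    union = a | b
                    return len(a & b) / len(union) if union else 0.0

                picked, pool = [], list(results)
                while pool and len(picked) < k:
                    best = max(pool, key=lambda r: trade_off * r[0]
                               - (1 - trade_off) * max((sim(r[1], p[1])
                                                        for p in picked), default=0.0))
                    picked.append(best)
                    pool.remove(best)
                return picked

        Once the user clicks the fruit links, the engine can drop the clusters they ignored and re-rank within the chosen one, which is the narrowing step described above.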

        • "For what it's worth, this algorithm wouldn't be too complicated to do. I lack the programming ability, but I could do the algorithm in pseudocode (at point most decent programmers could reduce it to C++). It should be quite possible"

          No offense but:
          In general, statements like that are used by people who haven't actually thought through the algorithm in detail, or who don't have good knowledge of algorithmic theory.

          In specific, your suggestion sounds excellent. Sufficiently excellent that I would be very
          • No offense but:

            In general, statements like that are used by people who haven't actually thought through the algorithm in detail, or who don't have good knowledge of algorithmic theory.

None taken. Put it this way - I could write it in Matlab, and I could write it pretty badly in C++. However, I'm not familiar with Google's code, and wouldn't be able to integrate it into that. But I could write a version of it, just not as it would need to be in final form. In other words, I'm very familiar with the algorit

        • Yes, it would likely be very difficult, but Google might have done some of the hard work already. Remember their Google Labs tool that, given a few sample items, would return a list of other things in that category?

          If you gave it:
          apple
          pear
          orange

          It would return:
          grape
          cherry
          strawberry
          kiwi
          etc.

          If you gave it:
          apple
          dell
          compaq

It would return:
          gateway
          hp
          hewlett-packard
          ibm
          etc

          So, if Google's tool could also be used to identify the different meanings that each word has, then maybe they could give you a few li
          • So, if Google's tool could also be used to identify the different meanings that each word has, then maybe they could give you a few links for each meaning.

            That's very much the idea.

            Of course, many search items might not be this easy to categorize.

That's true, but at that point the most successful solution would be what Google does now. In other words, if there isn't any clear substructure to the organization of results, just return the most relevant.

            We haven't yet proven that your algorithm is easy

            Obvio

It is possible to delay the serving of pages that require interactive action. Then an automated robot will not be very fast.
Also, the behavioural pattern of an automated robot can be detected very easily: imagine that a connection from one domain suddenly submits favorable reviews for a particular page, and no other such review is submitted. This should raise a red flag. If the effect of reviews is processed and used only after such an analysis, I think robots can be defeated.
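
      A toy version of that red-flag check (hypothetical code; thresholds and data shapes invented for illustration):

          from collections import defaultdict

          def flag_suspicious_pages(votes, min_votes=20, max_share=0.9):
              """votes: (source, page, is_positive) tuples, where source
              is an IP or domain. Flag pages whose positive votes come
              overwhelmingly from a single source."""
              positives = defaultdict(list)
              for source, page, is_positive in votes:
                  if is_positive:
                      positives[page].append(source)

              flagged = []
              for page, sources in positives.items():
                  if len(sources) >= min_votes:
                      top_share = max(map(sources.count, set(sources))) / len(sources)
                      if top_share >= max_share:  # nearly all praise from one place
                          flagged.append(page)
              return flagged

      Votes on flagged pages could then be quarantined for the kind of after-the-fact analysis the comment suggests, rather than applied immediately.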
I can see vendors writing scripts that will, at random times, access the search engine with searches related to their product, and automatically give a "thumbs down" to highly ranked results not affiliated with their own products, and a thumbs up to their own pages.

For pr0n, this would of course happen an order of magnitude more often, starting two days BEFORE the search engine launches.

      Regards,
      --
      *Art
You are totally off track. The ranking is for yourself only and has no effect on other people. So the only effect would be the spammer spamming himself.
  • As long as... (Score:2, Insightful)

    by vasqzr ( 619165 )

    Ad revenues have nothing to do with the ratings....

    All the good search engines end up corrupting themselves (by making money, which I guess is the point of anything...)

  • Kaltix (Score:5, Interesting)

    by bmongar ( 230600 ) on Monday August 11, 2003 @02:35PM (#6667809)
I think something like what Kaltix is trying [com.com] has a better chance of replacing Google. However, I don't see that happening either. I just think Google will learn from the user-based systems.
  • by The Bungi ( 221687 ) <thebungi@gmail.com> on Monday August 11, 2003 @02:35PM (#6667811) Homepage
People have been doing experiments like these since the first search engine rolled off the assembly line. They're prone to abuse and dependent on the goodwill of the user. Imagine if PageRank were based on this - that "SearchKing" dude would have a bot searching for crap and then voting "yes" every time.

    Won't work. Goodwill as we knew it in '95 is gone from the Internet.

  • hell no (Score:2, Interesting)

    by Anonymous Coward
No, I don't want to have to give feedback in a search. I just want to type keywords and find related results ...
  • by numbski ( 515011 ) * <numbski&hksilver,net> on Monday August 11, 2003 @02:37PM (#6667834) Homepage Journal
This is a great idea in concept, but the potential for abuse is incredibly high (if it's implemented on a system that actually matters, like Google).

Imagine for a moment a geek for hire, such as myself, writing a Perl script and deploying it on several servers nationwide. It uses LWP::UserAgent and spoofs a few different versions of IE on Windows. It then runs searches for hot keywords that my client wants to rank high on. Then it 'mods down' anything that isn't my client's product, and 'mods up' whatever is, or links to, my client's products.

    Set the script to run several times a day at each location. Write some spyware that does so in the background of a shareware-app-for-hire (Kazaa?).

    You see where I'm going with this? Protections would have to be in place.
This is a great idea in concept, but the potential for abuse is incredibly high (if it's implemented on a system that actually matters, like Google).

      Check out the voting buttons on the google toolbar.
    • by xyzzy ( 10685 ) on Monday August 11, 2003 @03:09PM (#6668159) Homepage
You're missing the point. The system isn't watching user actions while searching to fine-tune OTHER users' results, but to fine-tune THAT user's results.

While you can certainly claim that one user's actions MIGHT indicate relevance for another user's queries, it is certainly true that if a user gives you a clue that a document you have returned is irrelevant to them, then for that user it is irrelevant.
    • >Imagine for a moment, a geek for hire, such as myself, writing a PERL script and deploying it on several servers nationwide. It uses LWP::UserAgent and spoofs...

      You need to read the description again:

      will use your feedback to determine which keywords should be added or removed from your search, then you can search again to get more accurate results.

This does not imply that the results of your feedback would affect somebody else's search.

      since this got modded up to a 5, i'd guess that the moderato
    • I guess to get around this we kinda need to create a list of "friends" or people who don't abuse the service. You know, honest people who care about things other than money.

Another way to do this is, for each user, to check how they are voting compared to the other votes for that site. If they are consistently opposite the public reaction, then they are most likely some sort of troll. Also, limiting the number of times someone can submit feedback could cut down on abuses. Thresholds 'n stuff.
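
      That consistency check is easy to sketch (hypothetical code; the vote format and any cutoff are invented for illustration):

          def troll_score(user_votes, consensus):
              """Fraction of a user's votes that oppose the site-wide
              consensus. Both arguments map page -> +1 or -1."""
              judged = [p for p in user_votes if p in consensus]
              if not judged:
                  return 0.0
              opposed = sum(1 for p in judged if user_votes[p] != consensus[p])
              return opposed / len(judged)

          # e.g. down-weight feedback from anyone with a score near 1.0
          # across more than a handful of judged pages.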
Even though Google uses PageRank, often sites higher in the results are only there because they had the right keywords in the title. Sites like these have been tweaked with other similar tricks to score higher. Obviously, this new system would be able to get around that. Perhaps, when joined with Google, it could take over when PageRank fails to be applicable. Then we would have something great!
  • by nother_nix_hacker ( 596961 ) on Monday August 11, 2003 @02:39PM (#6667847)
    It was going well until we realised that all people wanted was pron so we just provide that now.
  • Similar concept... (Score:5, Informative)

    by X86Daddy ( 446356 ) on Monday August 11, 2003 @02:41PM (#6667866) Journal
    I think I found the link somewhere on Slashdot once:

    Gnod.net [gnod.net] is a learning system like a search engine that allows you to put in your three favorite authors/musicians/movies and it returns a series of "suggestions" that match, asking you if you like/dislike/haven't heard of each result in series.

    This sort of creature has the potential of placing the final nails in the media cartels' coffins, as it provides what's missing from current P2P and self-production techniques: a recommendation/promotion mechanism.
  • No. (Score:2, Troll)

    by Anonymous Coward
    Does this "search engine" search images? No. Google does.

    Does this "search engine" search 20 years of Usenet? No. Google does.

    Does this "search engine" provide stock quotes, maps, phone numbers, and news? No. Google does.

    Thanks for playing. Google will never lose.
    • Re:No. (Score:5, Funny)

      by mhesseltine ( 541806 ) on Monday August 11, 2003 @03:23PM (#6668298) Homepage Journal

      Is this testing a concept? YES

      Could something like this be implemented in Google? YES

      Is this supposed to replace Google? NO

      Are you a troll? YES

      Thanks for playing.

    • by Sanity ( 1431 ) *
      Does this "search engine" claim to reproduce all of Googles features? No

      Does this "search engine" allow users to give feedback on search results? Yes. Google doesn't.

      Thanks for playing. Google will never lose.
      Remember when people said that about Yahoo, or Netscape, or any of the other market dominating companies that were eventually replaced?
  • Warning: fsockopen(): unable to connect to 127.0.0.1:9182 (Connection refused) in /home/ian/whittlebit.com/wqserver.php on line 13
    Connection to WQServer failed

    I rate it thumbs down (for now)...
  • One word - "abuse" (Score:3, Insightful)

    by MrFenty ( 579353 ) on Monday August 11, 2003 @02:42PM (#6667875)
This will quickly be abused, much like other rating systems such as Amazon's book reviews. Anything worthwhile will ultimately be abused; you can be sure of that.
  • by (eternal_software) ( 233207 ) on Monday August 11, 2003 @02:42PM (#6667876)
    "This could be useful for those cases where Google just refuses to return the search results you want."

    That has really never happened to me. Google is fast and extremely accurate, especially when you do a more advanced search, + this and - that.

    I'm not sure I would want to take the time to "rate" search engine results and re-search when I can just fine-tune my search from the start.
  • by Sanity ( 1431 ) * on Monday August 11, 2003 @02:43PM (#6667883) Homepage Journal
The server is down - it was totally ill-equipped to handle a Slashdotting, unfortunately. I was hoping it would get some testing, but this is a bit much ;-)

As a poor substitute for being able to play with it (try bookmarking whittlebit.com and coming back in a day or two), I will try to answer people's questions. For the moment - here is the blurb from the front page:

    What is WhittleBit?

    Have you ever searched for something and wished you could tell the search engine that it was totally on the wrong track and it should try again? Well now you can! WhittleBit works much like most other search engines, except it can help you to refine your searches by allowing you to give positive or negative feedback on each search result.

    Simply rate the search results by clicking on the "thumbs up" or "thumbs down" buttons then click on Whittle to get a refined set of search results based on your feedback.

    Tips

    • Even if you visit another site and then return, WhittleBit will remember your search query until you explicitly click the "New Search" button.
    • You can either rate a search result on the results page itself, or visit the page and rate it using the buttons at the top of the page. You will return to the WhittleBit search results after clicking one of the buttons.
    • WhittleBit requires a browser which supports "Cookies" and "Frames" such as Mozilla [mozilla.org] or Internet Explorer [microsoft.com].
    - Ian Clarke, creator of WhittleBit
    • Ok, back up - kinda (Score:5, Informative)

      by Sanity ( 1431 ) * on Monday August 11, 2003 @02:58PM (#6668043) Homepage Journal
      Ok, it is back up after I killed the "whittling" engine - feel free to play with the UI, but it won't do anything intelligent.

This was more intended as a proof of concept than an all-out replacement for Google. I was frustrated with the way that Google works really well if you are looking for something easily defined and/or well known, but trying to find something obscure that was "masked" by more popular sites with similar keywords could be a real PITA. WhittleBit is designed to automate the manual process of trying to refine your keyword choice to get the search results you want.

Pardon my ignorance, but how is this different from entering my search term, identifying the topics that I don't want to see, then excluding them and searching again? I do this frequently with Google; for example, if I am searching for 'mushrooms' but I'm not interested in tripping, I can add '-magic -psilocybe' or '+morel' or whatever.

Does this do some kind of Bayesian-like filtering on the pages to get me a better match than I'm likely to come up with by using keywords?
Pardon my ignorance, but how is this different from entering my search term, identifying the topics that I don't want to see, then excluding them and searching again? I do this frequently with Google; for example, if I am searching for 'mushrooms' but I'm not interested in tripping, I can add '-magic -psilocybe' or '+morel' or whatever.

          It is different in that it automates the process of identifying which keywords should be excluded, rather than relying on the user to figure it out for themselves (which is often

    • Have you ever searched for something and wished you could tell the search engine that it was totally on the wrong track and it should try again?
      Isn't that what the back button is for?
The hit rate has died down enough that I think it might be able to handle it - I switched it back to full functionality about 5 minutes ago and it seems to be coping - let's hope this continues (it is downhill from here - right?).
  • Sounds Great...but (Score:5, Insightful)

    by mstieg ( 68031 ) on Monday August 11, 2003 @02:44PM (#6667897)
    who wants to wade through results and rank them? I came here to search!

That's why Google is king. It doesn't require you to do *anything*. It barely *allows* you to do anything.

    And it still returns what you need.

    That's the perfect UI.
  • by Boss, Pointy Haired ( 537010 ) on Monday August 11, 2003 @02:44PM (#6667905)
    Google's PageRank is failing miserably for commercial search. PageRank is fine for academic / informational searches.

    In a commercial environment, it is simply not possible for a free search service to exist that is fair, represents an even distribution of wealth, and is immune from abuse.

    Advertising has to be paid for. "Free Search" is fine for university sites and purely non-profit informational pages, but for a commercial search your position in search engines must be purchased based on the keywords against which you wish to bid.

    Otherwise basic economics breaks down.
    • What do you mean by "commercial search"? Search engines are for information. Information can be used for commercial or noncommercial purposes. "Advertising" and "commercial information" are two different things. If you're just talking about searching for products or commercial information, "free search" engines work fine; the information is still findable. If you're looking for specific commercial information, go to the company's website and search there. But I don't agree with what you seem to be say
      • But I don't agree with what you seem to be saying, which is that search engines should be advertising disguised as a reference tool.

        Google is a reference tool through which people are advertising, and as an advertising medium, it is not good.

        That's the problem.

        By commercial search, I mean a search for products and services, such as "web hosting". There are thousands of companies providing "web hosting", but you go to Google, and the same company is #1 every time.

        That isn't right.

        Or set up a commerci
        • Some time ago after a heated argument, we decided that Google does not provide commercial search.

The most perfect commercial search engines belong to eBay.com, followed by Amazon.com. As the previous poster said, there is no way to quickly evaluate commercial results in Google... Froogle might offer decent commercial search in the future, but Google won't, because of PageRank, which is fine for searching how to format a Linux hard disk but not fine for searching where to buy Linux hard drives.

          I kno

  • great news! (Score:2, Interesting)

    This seems like a great idea. Google might be number 1 in the search engine rankings at the moment but it would be good to see them have a bit of competition so that they do not use their dominant position for financial gain.

Here in the lab we're doing some work on using the principles of thermodynamics to improve search engines. The second law of thermodynamics states that in a closed system entropy will always increase, which is a lot like the disorder caused by sites spamming themselves to se
    • "Google might be number 1 in the search engine rankings at the moment but it would be good to see them have a bit of competition so that they do not use their dominant position for financial gain."

      Sorry to burst your bubble, but Google is a commercial company. Its aim is to make money. I see nothing wrong with Google using its dominant position to make money. After all, it does this because a lot of people think it's the best search engine out there, not because it forces others out of business with sha

I would have given a thumbs down to this message, but unfortunately there was no thumbs down at all. ;)

    Warning: fsockopen(): unable to connect to 127.0.0.1:9182 (Connection refused) in /home/ian/whittlebit.com/wqserver.php on line 13
    Connection to WQServer failed
  • how long until google buys them out?

    I give it 3 weeks after they begin getting rave reviews.

  • by Anonymous Coward on Monday August 11, 2003 @02:49PM (#6667944)
    What is really needed is to separate out commercial sites. Google works great 90% of the time but when you are searching for something that triggers a response from sites trying to sell something, the results get swamped with the commercial noise.

    This would benefit commercial sites because when you really are looking to buy something, you will be guaranteed not to be annoyed by anything non-commercial.

    -- YAAC (Yet Another Anonymous Coward)
  • How ironic? (Score:2, Interesting)

    by CompWerks ( 684874 )
Is it that a Google search for whittlebit doesn't even have a link to whittlebit.com?
  • What we need are computers that experience pleasure and pain, along with the means to deliver these sensations.

    When a search engine delivers good results, the user rewards the engine with a dose of pleasure.

    In return for bad results, the user unleashes a blast of pain.

    That should teach the circuits a thing or two about delivering the goods!

  • Kartoo (Score:2, Informative)

    I have used kartoo [kartoo.com] and like it.

    It does not "learn" per se, but allows you to select from multiple possibilities using a GUI - and it has been available for a while.

    If I have problems finding something with Google, I use Kartoo.
I've wished that Google would do this for ages. The possibilities for increasing accuracy are endless with a model for this. I wonder where else this could go. Maybe some sort of integration with another (though possibly encumbered by its relationship with LookSmart) post-Google search engine like Grub? However, this is a BIG step. Once information like this begins to be integrated into a massive database, we could see the next quantum leap in search engine accuracy, and possibly breadth. One thing tha
  • Google Problems (Score:2, Interesting)

    by Ateryx ( 682778 )
Although probably biased, as it is by MSN, there was an excellent article about the faults of Google in a past story [slashdot.org].

Unless I read the article incorrectly, this response-feedback-accuracy was the exact cause of the problem with Google as shown by MSN.

    Just an observation...

  • Pagerank cool (Score:4, Interesting)

    by MxTxL ( 307166 ) on Monday August 11, 2003 @02:56PM (#6668020)
PageRank is cool: it uses distributed data to improve search results. Definitely AWESOME in the search engine world.

BUT I would also like to see the distributed concept applied to searching itself. Something like this idea, but having the engine return results based on what were popular click-throughs for searches. From what I can tell (IANA Google Expert), Google isn't keeping click-through data on search results (they do on AdWords, but that's different). By tracking click-through data and calculating how long a user stayed at a clicked result before hitting the back button or otherwise returning to Google... good insights can be learned. Aggregate this over millions of users with billions of page views... it wouldn't take too long to figure out what everyone wants to see for a particular search result. Combine all of that with improving your searches by what others are searching for... I think you are talking about a powerful system.

    Granted this whole idea may be liable to spamming and all of that... but that's not part of the concept yet. On the surface, it seems like a good idea.

NOTE: I know other engines track click-throughs, but I don't think any of them do it for non-advertising purposes... if it's purely to improve results, then cool. If it's to show you better ads, not cool.
By tracking click-through data and calculating how long a user stayed at a clicked result before hitting the back button or otherwise returning to Google... good insights can be learned.

      True, but remember this: hitting the back button on a browser often doesn't generate a new HTTP request for the page you're returning to--normally the browser will just re-draw the cached version of the document that it already retrieved.

      Thus, to the Google session tracker, it will look like you followed the link and never c
I did oversimplify there; the actual tracker would have to be smart enough to infer, when a person clicks the next site after having backed up from the first one, that they had gone back. There will be some loss there, but with aggregated results it should get lost in the statistics.

        The whole thing is further complicated by people using tabbed browsing and multiple windows and such, but still in the base case, it's an interesting idea.
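
        That inference - treating the gap between consecutive result clicks in one session as the dwell time on the earlier result - is simple to sketch (hypothetical code and log shape, for illustration only):

            def dwell_times(click_log):
                """click_log: [(timestamp_seconds, url)] for one search
                session, in click order. The gap before the next click
                approximates dwell; a short gap suggests a quick bounce
                back to the results page. The last click's dwell is
                unknown (the user may never return)."""
                return {url: t2 - t1
                        for (t1, url), (t2, _) in zip(click_log, click_log[1:])}

            # dwell_times([(0, "a.com"), (8, "b.com"), (200, "c.com")])
            # -> {"a.com": 8, "b.com": 192}; a.com looks like a bounce.

        Aggregated over millions of sessions, as suggested above, consistently short dwells would mark results to demote, tabbed browsing notwithstanding.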
  • Totally unneeded. (Score:2, Insightful)

    by Kickasso ( 210195 )
    Web pages are already rated -- by other web pages. Ever noticed these blue underlined chunks of text? They are called links. Each link is a rating that says "Lookie here, I liked it and you might too!" And somebody already uses this rating system in a search engine. Bonus points for correctly guessing who.
  • Not prone to abuse (Score:2, Insightful)

    by blchrist ( 695764 )
    I don't understand how this system can be abused. From the post: WhittleBit will use your feedback to determine which keywords should be added or removed from your search, then you can search again to get more accurate results.

    People are not changing how the search engine ranks the results for other people, it is just slightly modifying your query to produce more precise results. How can that be abused to make trash sites show up with rank 1?

  • Google do this (Score:3, Informative)

    by Richard5mith ( 209559 ) on Monday August 11, 2003 @02:58PM (#6668050) Homepage
I'm sure I've seen Google do this. I've occasionally seen links I click on in Google search results get forwarded through another Google URL, which is no doubt tracking what I'm clicking on.

Like a lot of Google features they're testing, though, it appears very much at random, and it's been a month since I've seen it.
  • by ChiefArcher ( 1753 ) on Monday August 11, 2003 @03:06PM (#6668136) Homepage Journal
I think people will start making their own websites look better... and then making other ones look bad (as has been said in here).

What if I get a list of proxies, write a program, and click on each of the links and rate all of them?
It's as easy as that... I don't think it'll work.

    All the porn and viagra sites will be #1

    Chiefarcher
  • Server declares "Nobody loves me" before crashing and taking down the search engine which allowed users to rank its results.

    Experts believe this was due to repeated thumbs down given to its site within its own results.
  • Abuse (Score:3, Insightful)

    by Ed Avis ( 5917 ) <ed@membled.com> on Monday August 11, 2003 @03:07PM (#6668149) Homepage
    So how do you deal with trolls and spammers who will vote up or vote down sites for partisan reasons? Or ignoring that, what about straightforward differences of opinion? (The world may be polarized 50/50 between those who think 'firebird' refers to a database and those who think it is a web browser - at least among the geekier-than-average WhittleBit users.)

    Anonymous feedback won't scale well to the big bad Internet; some kind of login and network of trust is needed.
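
    One way such a network of trust could work, echoing the earlier "people who voted similarly to you" suggestion (hypothetical code; a standard collaborative-filtering sketch, not anything WhittleBit does):

        import math

        def vote_similarity(a, b):
            """Cosine-style similarity between two users' vote
            histories, each a dict of page -> +1/-1. Only pages both
            users rated contribute; norms count total votes cast."""
            shared = set(a) & set(b)
            if not shared:
                return 0.0
            dot = sum(a[p] * b[p] for p in shared)
            return dot / math.sqrt(len(a) * len(b))

        def weighted_opinion(page, me, others):
            """Aggregate other users' votes on a page, weighted by how
            similarly each has voted to 'me' in the past, so the
            'firebird = database' camp stops outvoting the
            'firebird = browser' camp for any given user."""
            return sum(vote_similarity(me, votes) * votes[page]
                       for votes in others if page in votes)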
  • Google has this too (Score:4, Informative)

    by acm ( 107375 ) on Monday August 11, 2003 @03:12PM (#6668192) Homepage
    If you install the google toolbar you can vote for or against pages on an individual basis.

    acm
Something needs to be done to separate stories and informative articles useful for research and education from the crass commercial websites that are like spam on all search engines. Some sort of separation is needed. Do something about that and I will be happier. Just type in anything about money or business on any of the search engines and you will be flooded with irrelevant links.
While the idea has plenty of problems for use on a general web search engine, it could work very well for tuning results on a site's internal search engine, where the user has no vested interest in one result coming up higher than the others; the user only wants good results.

    It might also have potential, even if the thumbs up/thumbs down are only shown to trusted users. One of the enduring problems in tuning search engines is that the people who build the search engine aren't the people who know the content b
  • Netnose! (Score:5, Informative)

    by notsoanonymouscoward ( 102492 ) on Monday August 11, 2003 @03:16PM (#6668234) Journal
Not trying to steal the show too much from WhittleBit [whittlebit.com], but there's another new search engine recently released. Netnose [netnose.com] lets the users decide which keywords a web page should be listed under. The search results also include handy identifiers about the page content, like whether it has popups or contains adult material (as decided by the raters).
  • by presroi ( 657709 ) <neubau@presroi.de> on Monday August 11, 2003 @03:17PM (#6668241) Homepage
Could improved interactivity be the next big search engine advancement after PageRank?"


Well, actually, Google does receive feedback. Once in a while, Google changes its result page in the way Alexa does every time:

You don't get the result's URL back, but rather a pointer, something like www.google.com/result?target=realurl.

I'm sorry that I can't provide a real URL, but I'm confident that someone in this /.-crowd can help me out. Thank you in advance.
  • Abuse (Score:2, Informative)

    by sageFool ( 36961 )
Seems totally open to abuse, and it seems like there are issues with people not rating results and with keeping the statistics meaningful. If they can get something up for doing ratings and figuring out whether a user thinks a result is 'good' or 'bad' that is easy for the user to use, isn't abusable, and has some kind of statistical validity, I will be impressed, but I think it is much harder to do than most people think. Yar!
While posters here seem to have high programming skills, most people are assuming that your thumbs up/down affects other people's searches. That is not the case.

    This is not the first time I have seen people make similar dumb mistakes.

    I think it has to do with not reading the actual article/sample. /.ers tend to skim a lot.

  • by gbnewby ( 74175 ) on Monday August 11, 2003 @03:39PM (#6668493) Homepage
    In the academic field of information retrieval, this is called "relevance feedback." It's a part of many information retrieval (IR) algorithms, some of which can happen automatically (i.e., unsupervised). There is also overlap with the fields of machine learning and even Bayesian processes (see today's other /. story about spam filters -- spam filtering is actually the same problem, conceptually, as search engines try to solve).
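
    For reference, the textbook instance of relevance feedback is Rocchio's query update, which nudges the query vector toward the centroid of the documents judged relevant and away from those judged non-relevant (a standard formulation, not necessarily what WhittleBit runs; alpha, beta, and gamma are hand-tuned weights, and D_r, D_nr are the relevant and non-relevant document sets):

        \vec{q}_{new} = \alpha\,\vec{q}_0 + \frac{\beta}{|D_r|} \sum_{\vec{d} \in D_r} \vec{d} - \frac{\gamma}{|D_{nr}|} \sum_{\vec{d} \in D_{nr}} \vec{d}

    WhittleBit's adding and removing of keywords can be read as a discrete version of the same update.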

In Yahoo and other search engines (but not Google, that I've seen), you often get a "click-through" that goes to their system before transparently redirecting to the actual URL you clicked. This is relevance feedback. It's true that the system can't determine whether you LIKED the site (i.e., whether it was "relevant"), but at least it's some sort of feedback the system can use to tune.

    The other most familiar type of system I can think of is Alexa [alexa.com] (now owned by Amazon.com, and the brainchild of the Internet Archive's Brewster Kahle). With Alexa, they could count not just that you visited a site, but how long you spent and where else you went. This is at least part of the basis for Amazon's recommendation system for books and other geegaws they sell.

Can this work in a search engine? Yes, certainly. Does it mean that a search engine that implements relevance feedback will instantly be better than Google? Definitely not! There are many other things (about 20, from what I've heard) that go into the ranking system that Google uses... PageRank is one of them, but there are many other factors (such as term frequency, document HTML structure, etc.). Some of these, notably PageRank, work poorly on relatively small collections (in the TREC conference [nist.gov], people have almost never found that PageRank, HITS or similar algorithms improve performance with "only" a few tens of GB of web documents -- a few million pages).

    Wanna know more about information retrieval? The TREC page above is very good for state-of-the-art research reports (see the Publications area -- it's all online and free). More general texts are mostly in libraries, but one good one online is Managing Gigabytes [mu.oz.au], which covers the IR aspects thoroughly and also has lots of ideas about how to use compression in an IR system (something that I'm curious whether Google & others do).

"Protozoa are small, and bacteria are small, but viruses are smaller than the both put together."

Working...