Search Engine Learns From User Feedback
An anonymous reader writes "Ian Clarke, founder of the Freenet project, has set up a web search engine that allows users to rate each of the search results it returns. WhittleBit will use your feedback to determine which keywords should be added to or removed from your search, then you can search again to get more accurate results. This could be useful for those cases where Google just refuses to return the search results you want. Could improved interactivity be the next big search engine advancement after Pagerank?"
no it won't replace google. (Score:5, Interesting)
In short, no.
I have tried WhittleBit before (a user had a link to it in his
I can't see Google's superiority being challenged by this at all. What else would WhittleBit offer me besides this "feature"? I didn't see anything else when I used it (and in fact, I was rather annoyed that it stayed at the top of the screen while I was reading the page the link sent me to).
No thanks, just my worthless
I like it. (Score:5, Interesting)
I do a lot with Google, and it leaves something to be desired. Google's goal is to take the ranking of pages partly out of the hands of webmasters, so they can't just trick the spiders. That has worked very well for Google (it serves over 70% of Internet searches). But all page ranks are very cold and calculated. Maybe that cold, calculated rank is a good place to start, and then human reviewers can fine-tune the list.
By the way, Google has attempted to achieve this kind of human ranking by watching how long you stay on a page you clicked on. If they rank a page #1, and you click it and immediately return to the search page, they penalize that page. So if even Google is trying the same basic concept, it probably has a future on the web.
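A minimal Python sketch of the bounce rule described above, assuming a per-session click log with timestamps. The 10-second threshold, the penalty/reward weights, and the names `Click` and `score_adjustments` are all hypothetical; nobody in this thread has shown how (or whether) Google actually does this.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-off: coming back to the results page within this many
# seconds is treated as a "short click" (the page did not satisfy the user).
SHORT_CLICK_SECONDS = 10.0


@dataclass
class Click:
    url: str
    clicked_at: float              # seconds since the session started
    returned_at: Optional[float]   # None if the engine never saw the user come back


def score_adjustments(clicks, penalty=1.0, reward=0.5):
    """Sum a score change per URL: penalize quick bounces, reward everything else."""
    deltas = {}
    for c in clicks:
        bounced = (c.returned_at is not None
                   and c.returned_at - c.clicked_at < SHORT_CLICK_SECONDS)
        deltas[c.url] = deltas.get(c.url, 0.0) + (-penalty if bounced else reward)
    return deltas
```

The `returned_at=None` case is the weak spot: with tabs, or a back button that redraws a cached page, the engine may never see the user come back, so "never returned" is ambiguous and rewarding it is a judgment call.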
Re:I like it. (Score:5, Interesting)
If that's true, then the way I do searches is counter-productive. I load the google search page, and then middle-click all the links that look the most promising and read them in tabs. No wonder Google's searches have seemed to get worse and worse for me lately, I'm training it to think my most promising results are no good!
Re:I like it. (Score:2)
I do the same, but I don't think Google "knows" that you're doing that.
As far as I know, Mozilla doesn't send Google any info when you middle-click (or even left-click) on an outside link (if you clicked on the cached link, then of course it does).
So if you never refresh the Google tab, it doesn't report anything of your Googling habits back to Google. So you're not training it at
Re:I like it. (Score:2)
Turning off cookies would probably be more counter-productive, but even then, without that information, I don't think Google would get worse; it would just return the same results every time.
Re:I like it. (Score:2)
Re:I like it. (Score:2)
...as long as it's a sugar cookie, yummy!
But if that is what you are using Google for...Burn in Hell you Communist Nazi Pig!
Re:I like it. (Score:3, Interesting)
Do you have a reference for that? According to the HTTP RFC [w3.org], user-agents aren't supposed to talk to the server when users hit their back buttons, but rather display what the user last saw (despite it possibly being stale). It seems odd that Google would try
Re:I like it. (Score:2)
Before acting so condescending, please be familiar with the subject matter and think about what you are saying. It seems you are doing neither.
My point is that clicking your back button shouldn't return you to their site in the way you are thinking of. It should show a stale copy - therefore no talking to the server, and no transmitting of cookies.
Re:I like it. (Score:3, Interesting)
Re:I like it. (Score:2)
The big question is how it avoids Google's problems, i.e. how does it perform when millions of people are spending big money trying to distort and cripple the results?
If feedback were ever tried, it would certainly need to be personalised, and the sharing of page-rank limited to "people who voted similarly to you in the past thought *th
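The "people who voted similarly to you" idea is user-based collaborative filtering. A minimal Python sketch, assuming each user's votes are stored as a dict of URL to +1/-1; the cosine-similarity weighting and the function names are illustrative choices, not anything WhittleBit is known to do.

```python
import math


def similarity(a, b):
    """Cosine similarity of two users' votes (dicts of url -> +1/-1).

    URLs a user never rated count as 0, so only shared URLs add to the dot product.
    """
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predicted_vote(me, others, url):
    """Average other users' votes on `url`, weighted by how much they agree with me."""
    num = den = 0.0
    for votes in others:
        if url not in votes:
            continue
        w = similarity(me, votes)
        if w <= 0:
            continue  # ignore users who disagree with me or share no history
        num += w * votes[url]
        den += w
    return num / den if den else 0.0
```

Users whose voting history disagrees with yours, or does not overlap it at all, get zero weight, which blunts the ballot-stuffing attack raised in this thread: a script that only ever votes on its client's keywords has no shared history with you.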
Re:no it won't replace google. (Score:5, Insightful)
One thing that does worry me is the potential for abuse: something like a script that connects to WhittleBit, searches by a keyword important to your industry, and gives all of your competitors' links a thumbs-down.
Re:no it won't replace google. (Score:2)
If the folks at WhittleBit are smart (and Ian Clarke is, very), they will only let the thumbs up/down ratings affect the current session. Or maybe if they use a login or cookie to uniquely identify users, they could save the
Re:no it won't replace google. (Score:2)
Re:no it won't replace google. (Score:2)
I think improved interactivity COULD be the next big advance. The issue is how to control the input.
Now, if Google had a subscription system, where subscribing members were randomly given limited moderator and metamoderator privileges...
Cool, but can't last (Score:5, Insightful)
Re:Cool, but can't last (Score:2, Insightful)
Re:Cool, but can't last (Score:5, Interesting)
I think this is for whittling down a person's individual searches. My preferences when I'm searching for something about RJ45 plugs won't affect yours. This could be cool if used in conjunction with Pagerank, so that I don't have to keep clicking on all the little "o"s; I'd only have to look at one page of links.
The biggest flaw I can see with this system is that if I'm looking for something rare and specific, once I find it, I won't thumbs-up it, I'll just click on the link...It might be useful to have a "thumbs-down all on page checkbox" which might narrow the search intelligently.
Re:Cool, but can't last (Score:2)
Something like that (Score:5, Interesting)
That would help, but it would have to know why they're bad to know how it would differ from other results that might be more acceptable.
Here's what I would do. First, instead of Google returning only the most relevant choices, the ranking should be a function of both relevance and diversity. So, with the typical "apple" search, it would return some Apple Computer results, some Fiona Apple results, and some results about the fruit. All of those would be highly relevant, but it would only give, say, a few of each. You could then click on the relevant results (if you wanted apple the fruit, you'd click on the three fruit links), at which point it would reject the others and give you more of what you want.
The key here is that it would have to give diversity at the beginning for you to be *able* to differentiate what you want from what you don't. I don't believe this is how Google works now.
For what it's worth, this algorithm wouldn't be too complicated to do. I lack the programming ability, but I could write the algorithm in pseudocode (at which point most decent programmers could translate it into C++). It should be quite possible.
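A minimal Python sketch of the same idea, under one big assumption: each result already carries a topic label (computer, singer, fruit) produced by some clustering step the post doesn't specify. `diversify` interleaves the best few results from each topic, and `refine` keeps only the topic the user clicked on; both names are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest


def diversify(results, per_topic=3):
    """results: list of (url, topic, relevance).

    Keep the `per_topic` most relevant results of each topic, then interleave the
    topics round-robin so every meaning of the query appears near the top.
    """
    by_topic = defaultdict(list)  # topics end up ordered by their best result
    for url, topic, _ in sorted(results, key=lambda r: r[2], reverse=True):
        by_topic[topic].append(url)
    columns = [urls[:per_topic] for urls in by_topic.values()]
    ranked = []
    for row in zip_longest(*columns):
        ranked.extend(url for url in row if url is not None)
    return ranked


def refine(results, chosen_topic):
    """After the user clicks a result in `chosen_topic`, drop every other topic."""
    kept = [r for r in results if r[1] == chosen_topic]
    return [url for url, _, _ in sorted(kept, key=lambda r: r[2], reverse=True)]
```

With `per_topic=1` the first page for "apple" shows one computer, one singer, and one fruit result; a single click on a fruit link narrows the list to fruit.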
Re:Something like that (Score:2)
No offense but:
In general, statements like that are used by people who haven't actually thought through the algorithm in detail, or who don't have good knowledge of algorithmic theory.
In specific, your suggestion sounds excellent. Sufficiently excellent that I would be very
So why *isn't* this being done? (Score:3, Interesting)
In general, statements like that are used by people who haven't actually thought through the algorithm in detail, or who don't have good knowledge of algorithmic theory.
None taken. Put it this way: I could write it in MATLAB, and I could write it, badly, in C++. However, I'm not familiar with Google's code and wouldn't be able to integrate it into that. But I could write a version of it, just not in the final form it would need to take. In other words, I'm very familiar with the algorit
Re:Something like that (Score:2)
If you gave it:
apple
pear
orange
It would return:
grape
cherry
strawberry
kiwi
etc.
If you gave it:
apple
dell
compaq
It would return:
gateway
hp
hewlett-packard
ibm
etc.
So, if Google's tool could also be used to identify the different meanings that each word has, then maybe they could give you a few li
Doesn't have to be useful for everything (Score:2)
That's very much the idea.
Of course, many search items might not be this easy to categorize.
That's true, but at that point the most successful solution would be what Google does now. In other words, if there isn't any clear substructure to the organization of results, just return the most relevant ones.
We haven't yet proven that your algorithm is easy
Obvio
Re:Cool, but can't last (Score:3, Interesting)
Also, the behavioural pattern of an automated robot can be detected very easily: imagine that a connection from some domain suddenly submits favorable reviews for a particular page, and no one else submits such a review. That should raise a red flag. If reviews only take effect after this kind of analysis, I think robots can be defeated.
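A minimal Python sketch of that red-flag check, assuming reviews are collected over some recent window and tagged with the submitting domain. The thresholds (`min_votes`, `max_share`) and the function name are hypothetical; a real system would tune them on data.

```python
from collections import Counter, defaultdict


def suspicious_pages(reviews, min_votes=5, max_share=0.8):
    """reviews: iterable of (source_domain, url, vote), vote being +1 or -1.

    Flag a page when it has received at least `min_votes` favorable reviews and
    one source domain accounts for `max_share` or more of them.
    """
    favorable = defaultdict(Counter)  # url -> Counter of submitting domains
    for domain, url, vote in reviews:
        if vote > 0:
            favorable[url][domain] += 1
    flagged = set()
    for url, by_domain in favorable.items():
        total = sum(by_domain.values())
        top_count = by_domain.most_common(1)[0][1]
        if total >= min_votes and top_count / total >= max_share:
            flagged.add(url)
    return flagged
```

Flagged pages would simply have their reviews held back until someone (or something) looks at them, rather than applied immediately.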
Open for exploitation (Score:2)
For pr0n, this would of course happen an order of magnitude more often, starting two days BEFORE the search engine launches.
Regards,
--
*Art
Re:Cool, but can't last (Score:2)
As long as... (Score:2, Insightful)
Ad revenues have nothing to do with the ratings....
All the good search engines end up corrupting themselves (by making money, which I guess is the point of anything...)
Kaltix (Score:5, Interesting)
I doubt this will fly (Score:5, Informative)
Won't work. Goodwill as we knew it in '95 is gone from the Internet.
Re:I doubt this will fly (Score:2, Interesting)
Re:I doubt this will fly (Score:2)
Re:I doubt this will fly (Score:2)
Re:I doubt this will fly (Score:2)
Obviously, they need some sort of Meta Moderation (Score:2)
hell no (Score:2, Interesting)
Ack! Do you know what you're doing? (Score:5, Interesting)
Imagine for a moment a geek for hire, such as myself, writing a Perl script and deploying it on several servers nationwide. It uses LWP::UserAgent and spoofs a few different versions of IE on Windows. It then runs searches for hot keywords that my client wants to rank high on. Then it 'mods down' anything that isn't my client's product, and 'mods up' whatever is, or links to, my client's products.
Set the script to run several times a day at each location. Write some spyware that does so in the background of a shareware-app-for-hire (Kazaa?).
You see where I'm going with this? Protections would have to be in place.
Re:Ack! Do you know what you're doing? (Score:2)
Check out the voting buttons on the google toolbar.
Re:Ack! Do you know what you're doing? (Score:5, Insightful)
While one user's actions MIGHT indicate relevance for another user's queries, it's certainly true that if a user tells you a document you have returned is irrelevant, it is irrelevant to that user.
Re:Ack! Do you know what you're doing? (Score:2)
Re:Ack! Do you know what you're doing? (Score:2)
You need to read the description again:
will use your feedback to determine which keywords should be added or removed from your search, then you can search again to get more accurate results.
This does not imply that your feedback would affect somebody else's search.
Since this got modded up to a 5, I'd guess that the moderato
Re:Ack! Do you know what you're doing? (Score:2)
Another way to do this is to check, for each user, how they vote compared with the other votes for that site. If they consistently vote opposite to the public reaction, they are most likely some sort of troll. Limiting the number of times someone can submit feedback could also cut down on abuse. Thresholds 'n stuff.
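A minimal Python sketch of both ideas, with hypothetical names and thresholds: `contrarian_users` flags users who almost always vote against the consensus on the sites they rate, and `VoteLimiter` caps how many votes a user may cast in a sliding time window.

```python
from collections import defaultdict, deque


def contrarian_users(votes, min_votes=10, max_agreement=0.2):
    """votes: list of (user, url, vote), vote being +1 or -1.

    Flag users who have voted at least `min_votes` times where a consensus exists
    and agreed with that consensus at most `max_agreement` of the time.
    """
    totals = defaultdict(int)
    for _, url, vote in votes:
        totals[url] += vote
    compared = defaultdict(int)
    agreed = defaultdict(int)
    for user, url, vote in votes:
        others = totals[url] - vote  # consensus excluding this user's own vote
        if others == 0:
            continue  # no consensus to compare against
        compared[user] += 1
        if (others > 0) == (vote > 0):
            agreed[user] += 1
    return {u for u, n in compared.items()
            if n >= min_votes and agreed[u] / n <= max_agreement}


class VoteLimiter:
    """Allow each user at most `max_votes` votes per sliding window of seconds."""

    def __init__(self, max_votes, window_seconds):
        self.max_votes = max_votes
        self.window = window_seconds
        self.recent = defaultdict(deque)  # user -> timestamps of accepted votes

    def allow(self, user, now):
        q = self.recent[user]
        while q and now - q[0] >= self.window:
            q.popleft()  # forget votes that have left the window
        if len(q) >= self.max_votes:
            return False
        q.append(now)
        return True
```

A patient attacker could vote with the crowd first to build up agreement, so this is a filter rather than a guarantee.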
tweaking for higher results (Score:2, Insightful)
Body before it gets /.ed (Score:5, Funny)
Re:Body before it gets /.ed (Score:2)
Similar concept... (Score:5, Informative)
Gnod.net [gnod.net] is a learning system like a search engine that allows you to put in your three favorite authors/musicians/movies and it returns a series of "suggestions" that match, asking you if you like/dislike/haven't heard of each result in series.
This sort of creature has the potential of placing the final nails in the media cartels' coffins, as it provides what's missing from current P2P and self-production techniques: a recommendation/promotion mechanism.
No. (Score:2, Troll)
Does this "search engine" search 20 years of Usenet? No. Google does.
Does this "search engine" provide stock quotes, maps, phone numbers, and news? No. Google does.
Thanks for playing. Google will never lose.
Re:No. (Score:5, Funny)
Is this testing a concept? YES
Could something like this be implemented in Google? YES
Is this supposed to replace Google? NO
Are you a troll? YES
Thanks for playing.
Re:No. (Score:2)
Does this "search engine" allow users to give feedback on search results? Yes. Google doesn't.
Remember when people said that about Yahoo, or Netscape, or any of the other market-dominating companies that were eventually replaced?
Thumbs down.... (Score:2)
Connection to WQServer failed
I rate it thumbs down (for now)...
One word - "abuse" (Score:3, Insightful)
Google is Highly Accurate (Score:5, Interesting)
That has really never happened to me. Google is fast and extremely accurate, especially when you do a more advanced search, + this and - that.
I'm not sure I would want to take the time to "rate" search engine results and re-search when I can just fine-tune my search from the start.
Ouch - major slashdot - mirror of page (Score:5, Informative)
As a poor substitute for being able to play with it (try bookmarking whittlebit.com and coming back in a day or two), I will try to answer people's questions. For the moment, here is the blurb from the front page:
- Ian Clarke, creator of WhittleBit
Ok, back up - kinda (Score:5, Informative)
This was intended more as a proof of concept than as an all-out replacement for Google. I was frustrated that Google works really well if you are looking for something easily defined and/or well known, but trying to find something obscure that was "masked" by more popular sites with similar keywords could be a real PITA. WhittleBit is designed to automate the manual process of refining your keyword choice to get the search results you want.
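WhittleBit's actual algorithm isn't described here. As a rough illustration of automated keyword refinement, this Python sketch assumes each rated result is reduced to a snippet of text plus a +1/-1 vote, scores every word by how much more often it appears in liked than in disliked snippets, and emits a query in the +term/-term style of an advanced search. The function names and limits are hypothetical.

```python
import re
from collections import Counter


def words(text):
    """Lower-cased alphanumeric words in a snippet, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def refine_query(query, rated, max_add=2, max_exclude=2):
    """rated: list of (snippet, vote), vote +1 for thumbs-up and -1 for thumbs-down.

    Returns the query plus +word for words typical of liked results and -word
    for words typical of disliked ones.
    """
    already = words(query)
    liked, disliked = Counter(), Counter()
    for snippet, vote in rated:
        target = liked if vote > 0 else disliked
        target.update(words(snippet) - already)
    # positive score: seen more in liked snippets; negative: more in disliked ones
    score = {w: liked[w] - disliked[w] for w in set(liked) | set(disliked)}
    add = sorted((w for w in score if score[w] > 0), key=lambda w: (-score[w], w))
    drop = sorted((w for w in score if score[w] < 0), key=lambda w: (score[w], w))
    terms = [query] + ["+" + w for w in add[:max_add]] + ["-" + w for w in drop[:max_exclude]]
    return " ".join(terms)
```

A real version would at least drop stopwords: in the test data below, "with" scores as high as "fresh" and is left out only because of the alphabetical tie-break and the `max_add` cutoff.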
Re:Ok, back up - kinda (Score:2)
Does this do some kind of bayesian-like filtering on the pages to get me a better match than I'm likely to come up with by using keywords?
Re:Ok, back up - kinda (Score:2)
It is different in that it automates the process of identifying which keywords should be excluded, rather than relying on the user to figure it out for themselves (which is often
Re:Ouch - major slashdot - mirror of page (Score:2)
Now fully operational again (Score:2)
Sounds Great...but (Score:5, Insightful)
That's why google is king. It doesn't require you to do *anything*. It barely *allows* you to do anything.
And it still returns what you need.
That's the perfect UI.
Re:Sounds Great...but (Score:5, Funny)
You aren't required to do anything.... you're barely allowed to do anything..... and this is perfect?
you must be a Mac user, right?
"Free Search" has no place in the commercial web. (Score:3, Interesting)
In a commercial environment, it is simply not possible for a free search service to exist that is fair, represents an even distribution of wealth, and is immune from abuse.
Advertising has to be paid for. "Free Search" is fine for university sites and purely non-profit informational pages, but for a commercial search your position in search engines must be purchased based on the keywords against which you wish to bid.
Otherwise basic economics breaks down.
commercial search? (Score:2)
Re:commercial search? (Score:2)
Google is a reference tool through which people are advertising, and as an advertising medium, it is not good.
That's the problem.
By commercial search, I mean a search for products and services, such as "web hosting". There are thousands of companies providing "web hosting", but you go to Google, and the same company is #1 every time.
That isn't right.
Or set up a commerci
Re:commercial search? (Score:2)
The most perfect commercial search engines belong to eBay.com, followed by Amazon.com. As the previous poster said, there is no way to quickly evaluate commercial results in Google... Froogle might offer decent commercial search in the future, but Google won't, because of Pagerank, which is fine for searching how to format a Linux hard disk but not for searching where to buy Linux hard drives.
I kno
Re:"Free Search" has no place in the commercial we (Score:2)
Of course there is no requirement on a search engine to represent an even distribution of wealth, but it is in the SE's own interest to if it does not want to become the spam-fest that is the commercial Google.
When there are thousands of companies providing $service, why should search engines direct the overwhelming majority of traffic to the one site that happens to fit their algorithmic opinion best?
Any syste
Re:"Free Search" has no place in the commercial we (Score:2)
I know exactly why: because he's joking. Look at his user name.
great news! (Score:2, Interesting)
Here in the lab we're doing some work on using the principles of thermodynamics to improve search engines. The second law of thermodynamics states that in a closed system entropy will always increase, which is a lot like the disorder caused by sites spamming themselves to se
Re:great news! (Score:2)
Sorry to burst your bubble, but Google is a commercial company. Its aim is to make money. I see nothing wrong with Google using its dominant position to make money. After all, it does this because a lot of people think it's the best search engine out there, not because it forces others out of business with sha
thumbs down (Score:2)
how long until (Score:2, Funny)
I give it 3 weeks after they begin getting rave reviews.
What is really needed is... (Score:5, Interesting)
This would benefit commercial sites because when you really are looking to buy something, you will be guaranteed not to be annoyed by anything non-commercial.
-- YAAC (Yet Another Anonymous Coward)
How ironic? (Score:2, Interesting)
Pleasure and Pain (Score:2)
When a search engine delivers good results, the user rewards the engine with a dose of pleasure.
In return for bad results, the user unleashes a blast of pain.
That should teach the circuits a thing or two about delivering the goods!
Re:Pleasure and Pain (Score:4, Funny)
Masochist and Sadist (Score:2)
A little reverse psychology will fix perverse, disobedient systems:
"The masochist says 'Hurt me'
Kartoo (Score:2, Informative)
It does not "learn" per se, but allows you to select from multiple possibilities using a GUI - and it has been available for a while.
If I have problems finding something with Google, I use Kartoo.
Post-Google Searching (Score:2, Interesting)
Re:Post-Google Searching (Score:2)
Google Problems (Score:2, Interesting)
Unless I read the article incorrectly, this response-feedback-accuracy loop was the exact cause of the problem with Google that MSN exposed.
Just an observation...
Pagerank cool (Score:4, Interesting)
BUT i would also like to see the distributed concept applied to searching itself. Something like this idea, but having the engine return results on what were popular click-thrus for searches. From what i can tell (IANA Google Expert) Google isn't keeping click through data on search results (they are on the adwords, but that's different). By tracking click thru data and calculating how long a user stayed at a clicked result before hitting the back button or otherwise returning to google... good insights can be learned. Aggregate this over millions of users with billions of page views... wouldn't take too long to figure out what everyone wants to see for a particular search result. Combine all of that with improving your searches by what others are searching for... i think you are talking a powerful system.
Granted this whole idea may be liable to spamming and all of that... but that's not part of the concept yet. On the surface, it seems like a good idea.
NOTE: I know other engines track click thrus, but i don't think any of them do it for non-advertising purposes.... if it's purely to improve results then cool. If it's to show you better ads, not cool.
Re:Pagerank cool (Score:2)
True, but remember this: hitting the back button on a browser often doesn't generate a new HTTP request for the page you're returning to--normally the browser will just re-draw the cached version of the document that it already retrieved.
Thus, to the Google session tracker, it will look like you followed the link and never c
Re:Pagerank cool (Score:2)
The whole thing is further complicated by people using tabbed browsing and multiple windows and such, but still in the base case, it's an interesting idea.
Totally unneeded. (Score:2, Insightful)
Not prone to abuse (Score:2, Insightful)
People are not changing how the search engine ranks the results for other people, it is just slightly modifying your query to produce more precise results. How can that be abused to make trash sites show up with rank 1?
Google do this (Score:3, Informative)
Like a lot of Google features they're testing though, it's very much random and it's been a month since I've seen it.
Re:Google do this (Score:2)
I don't know about this (Score:3, Insightful)
What if I get a list of proxies, write a program, click on each of the links and rate all of them?
It's as easy as that... I don't think it'll work.
All the porn and viagra sites will be #1
Chiefarcher
In other news... (Score:2, Funny)
Experts believe this was due to repeated thumbs down given to its site within its own results.
Abuse (Score:3, Insightful)
Anonymous feedback won't scale well to the big bad Internet; some kind of login and network of trust is needed.
Google has this too (Score:4, Informative)
acm
Commercial sites overflooding search engines (Score:2, Insightful)
Better for limited document sets (Score:2, Insightful)
It might also have potential, even if the thumbs up/thumbs down are only shown to trusted users. One of the enduring problems in tuning search engines is that the people who build the search engine aren't the people who know the content b
Netnose! (Score:5, Informative)
Google phoning home... (Score:5, Informative)
Well, actually, Google does receive feedback. Once in a while, Google changes its results page the way Alexa does every time:
Instead of a direct URL to the result, you get back a pointer along the lines of www.google.com/result?target=realurl.
I'm sorry that I can't provide you a real url but I'm confident that someone in this
Abuse (Score:2, Informative)
General Lack of intelligence by posters (Score:2)
This is not the first time I have seen people make similar dumb mistakes.
I think it has to do with not reading the actual article/sample. /.ers tend to skim a lot.
Re:General Lack of intelligence by posters (Score:2)
More importantly, if you haven't looked at any of the details, you should keep your mouth shut. What would you think of someone who reviewed a movie after seeing only the previews, without ever seeing the movie?
Finally, this fact SHOULD have been obvious to anyone that t
It's called "Relevance Feedback" (Score:5, Interesting)
In Yahoo and other search engines (but not Google, that I've seen), you often get a "click-through" that goes to their system before transparently redirecting to the actual URL you clicked. This is relevance feedback. It's true that the system can't determine whether you LIKED the site (aka, whether it was "relevant"), but at least it's some sort of feedback the system can use to tune.
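The click-through mechanism is simple to sketch. In this Python example, assuming a hypothetical endpoint `https://search.example/click` (not Yahoo's or anyone else's real URL format), each result link is wrapped with the query and rank, and the handler logs the click before returning the URL the server should redirect to.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical click-tracking endpoint; not any real search engine's URL format.
CLICK_ENDPOINT = "https://search.example/click"


def tracking_link(target, query, rank):
    """Wrap a result URL so that following it passes through the search engine."""
    return CLICK_ENDPOINT + "?" + urlencode({"q": query, "rank": rank, "target": target})


def handle_click(link, click_log):
    """Record (query, rank, target) and return the URL to send in a 302 Location header."""
    params = parse_qs(urlsplit(link).query)
    query, rank, target = params["q"][0], int(params["rank"][0]), params["target"][0]
    click_log.append((query, rank, target))
    return target
```

A production handler should only redirect to targets it issued itself (for example by signing the link); otherwise the endpoint becomes an open redirect.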
The other most familiar type of system I can think of is Alexa [alexa.com] (now owned by Amazon.com, and the brainchild of the Internet Archive's Brewster Kahle). With Alexa, they could count not just that you visited a site, but how long you spent and where else you went. This is at least part of the basis for Amazon's recommendation system for books and other geegaws they sell.
Can this work in a search engine? Yes, certainly. Does it mean that a search engine that implements relevance feedback will instantly be better than Google? Definitely not! There are many other things (about 20, from what I've heard) that go into the ranking system that Google uses...Pagerank is one of them, but there are many other factors (such as term frequency, document HTML structure, etc.). Some of these, notably Pagerank, work poorly on relatively small collections (in the TREC conference [nist.gov], people have almost never found that Pagerank, HITS or similar algorithms improve performance with "only" a few tens of GB of Web documents -- a few million pages).
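For reference, the textbook Pagerank computation is a short power iteration. This Python sketch uses the commonly cited damping factor of 0.85 and spreads the rank of pages with no outgoing links evenly over all pages; it illustrates the published algorithm, not Google's production ranking, which as noted combines many other factors.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict page -> list of pages it links to. Returns page -> rank (sums to 1)."""
    pages = set(links) | {p for outs in links.values() for p in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # every page gets the "random jump" share first
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            outs = links.get(page, [])
            if outs:
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new[target] += share
            else:
                # dangling page: spread its rank evenly over all pages
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank
```

On a small or sparsely linked collection the link graph carries little information, which is consistent with the TREC observation above.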
Wanna know more about information retrieval? The TREC page above is very good for state-of-the-art research reports (see the Publications area -- it's all online and free). More general texts are mostly in libraries, but one good one online is Managing Gigabytes [mu.oz.au], which covers the IR aspects thoroughly and also has lots of ideas about how to use compression in an IR system (I'm curious whether Google and others do this).
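As a taste of the compression ideas in Managing Gigabytes: an inverted list of sorted document numbers is stored as the gaps between successive entries, and small gaps get short codes. This Python sketch uses the Elias gamma code, one of the codes that book discusses; it works on a string of '0'/'1' characters for readability rather than packed bits, and assumes document numbers are strictly increasing and start at 1 (gamma cannot encode 0).

```python
def to_gaps(doc_ids):
    """Sorted document numbers -> differences between successive entries."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


def from_gaps(gaps):
    """Inverse of to_gaps: running sum of the gaps."""
    doc_ids, total = [], 0
    for g in gaps:
        total += g
        doc_ids.append(total)
    return doc_ids


def gamma_encode(numbers):
    """Elias gamma code: for x >= 1, floor(log2 x) zeros, then x in binary."""
    out = []
    for x in numbers:
        if x < 1:
            raise ValueError("gamma codes only positive integers")
        b = bin(x)[2:]  # binary digits without the '0b' prefix; always starts with '1'
        out.append("0" * (len(b) - 1) + b)
    return "".join(out)


def gamma_decode(bits):
    """Read back the numbers from a concatenation of gamma codes."""
    numbers, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        numbers.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return numbers
```

The four postings 3, 7, 8, 20 become gaps 3, 4, 1, 12 and fit in 16 bits, versus 128 bits as four 32-bit integers.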
Not my damn job (Score:2, Insightful)
Nice TROLL! (Score:2)