Slashdot
SETI's Anti-Cheating Strategy
Posted by
michael
on Thu May 24, 2001 08:06 AM
from the lots-and-lots-of-proctors dept.
mtDNA writes: "There's an article in the New York Times about the strategies SETI is using to avoid fraudulent reports. One trick they're using is multiple analyses of the same data. Another strategy is the use of 'ringer' data, where they send you fake data for which they know the results." One of the researchers has several PostScript papers on his home page: Incentives for Sharing in Peer-to-Peer Networks, Uncheatable Distributed Computations, and Distributed Computing with Payout. In related news, ProcessTree apparently sent out an email to participants indicating it is closing up shop, so although SETI seems to be chugging along, the idea of distributed computing as a business model is perhaps a bit premature.
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Active punishment? (Score:1)
Eventually these projects are going to go to identity-based security, and the feds will be only too happy to issue Internet Driver's Licenses.
Article text (Score:1)
May 24, 2001
The Search for E.T. Yields Earthly Cheats
By J. D. BIERSDORFER
THE SETI@home program, the distributed computing project that harnesses the power of
personal computers to look for signs of extraterrestrial intelligence, signed up its three
millionth user last week. SETI, which began in 1999, has quickly become the most popular
public computing project of all time.
But what may appear to be the search for E. T. phoning home has sometimes turned out to be
the signals of people cheating the project by falsifying results. Unfortunately for the dishonest,
Philippe Golle and Ilya Mironov, both doctoral students in the computer science department at
Stanford University, have come up with a set of security schemes that can help thwart those
trying to claim computing work that they did not actually complete.
"It is worth bearing in mind that it takes only one talented or lucky hacker to potentially ruin a
distributed computation," Mr. Golle wrote in an e-mail message.
In their recent paper, "Uncheatable Distributed Computations," Mr. Golle and Mr. Mironov
explain how to verify that the work has been done, by inserting special checkpoints, or
"ringers," into a unit of distributed data. If the data is returned to the sender without the
purposely planted material among the results, the organization knows the data was not
processed and the user is trying to cheat.
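The ringer idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the scheme from the paper: the function `f`, the helper names, and the integer work domain are all hypothetical, and SHA-256 stands in for whatever expensive computation a real work unit performs.

```python
import hashlib
import secrets

def f(x: int) -> str:
    """Stand-in one-way function (assumption); real work would be far costlier."""
    return hashlib.sha256(str(x).encode()).hexdigest()

def make_work_unit(domain, target, num_ringers=3):
    """Supervisor side: mix precomputed ringer images in with the real target."""
    rng = secrets.SystemRandom()
    ringers = rng.sample(list(domain), num_ringers)
    images = {f(x) for x in ringers} | {target}
    return images, set(ringers)

def honest_worker(domain, images):
    """Evaluate f on the whole domain and report every input that hits an image."""
    return {x for x in domain if f(x) in images}

def lazy_worker(domain, images, fraction=0.5):
    """Cheater that scans only the first part of the domain."""
    cutoff = int(len(domain) * fraction)
    return {x for x in list(domain)[:cutoff] if f(x) in images}

def verify(reported, ringers):
    """Accept the unit only if every planted ringer came back."""
    return ringers <= reported
```

Because the worker cannot tell ringer images from the real target, skipping part of the domain risks missing a ringer and being caught.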
The idea that someone might cheat SETI@home is almost as shocking as the actual discovery
of little green men would be. SETI@home is a typical example of a large-scale, Internet-based
distributed computing project: users donate their computers' spare processing time by installing
software to crunch data from Arecibo Radio Observatory and return the results to the sender.
The SETI@home people were well aware that some participants might cheat, whether by
tampering with the data file they were given to process or hacking the program's settings.
Although fewer than 1 percent of the work units appear to have been tampered with, Dr. David
Anderson, the project coordinator for SETI@home, estimated that there had been some months
during the project when half of its resources were devoted to smoking out cheaters.
"What we ended up doing," Dr. Anderson said, "for a variety of reasons, is to process each
piece of data several times and wait until all the results get back and compare them."
The SETI project relies on unpaid volunteers; the cheaters seem motivated purely by a desire
to get a high user ranking on a project Web page. Dr. Anderson said it was fairly easy to reject
work submitted by cheaters and to cancel their SETI@home accounts, even though the cheaters
could get other accounts.
The potential for cheating is increasingly worrisome as commercial distributed computing
ventures that offer cash or credit to participants, like Ubero (www.ubero.net), become more
commonplace.
"As soon as you offer any kind of incentive, you will invite cheating," said Armin Lenz, a
former executive at a commercial distributed computing company who is familiar with the need
for security in online projects. "Be it stats, money or giveaways -- it is just human nature to try
to get things the easy way."
In the case of SETI@home, a bigger concern is not that the data unit returned by a user was
completed or not, but that the result returned was accurate and free of incorrect results from
tampering or faulty user hardware. "The challenge of being absolutely confident that that result
is the output of that program and not something else is really, really hard," Dr. Anderson said.
"The stuff that those guys from Stanford have done -- it doesn't exactly solve that problem, but
it's a way of verifying that at least their computer did all the work it was supposed to do. It
still doesn't guarantee that the answer they give you back is correct."
Along with Stuart Stubblebine, a vice president at CertCo Inc., an online security firm, Mr.
Golle has also written a paper called "Distributed Computing With Payout" that complements
his work with Mr. Mironov and discusses methods to streamline redundant computing for those
who do not have a surplus of resources.
"The trick is that while most tasks are only ever assigned once in our scheme, some tasks are
assigned twice or more, so that it is never possible for a participant to determine when it is
safe to cheat," Mr. Golle explained. (For those wanting to read them, both papers are available
on the Web at crypto.stanford
While commercial distributed computing operations may want to incorporate the work of Mr.
Golle, Mr. Mironov and Mr. Stubblebine into their security measures, at least SETI@home can
rely on its millions of users to help cross-check results and make sure that any potential
discoveries are really from authentic aliens, not the ethically alienated.
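The "assigned twice or more" trick Mr. Golle describes can be given a rough number. The sketch below is a simplification, not the scheme from the paper: it assumes each task is independently given to a second, honest participant with probability `p`, and that any cheated task that was duplicated is detected. The function name and parameters are hypothetical.

```python
def escape_probability(p: float, k: int) -> float:
    """Chance that a participant who cheats on k tasks is never caught,
    assuming each task is independently double-checked with probability p."""
    return (1.0 - p) ** k

# Example: with 10% of tasks duplicated, cheating on 50 tasks
# goes unnoticed only about 0.5% of the time.
print(escape_probability(0.10, 50))
```

Under these assumptions the extra work is only about `p` times the original load, while the odds of a sustained cheat surviving shrink geometrically.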
Re:Active punishment? (Score:2)
Solve the cheating problem =AND= the population crisis at the same time.
To prevent cheating... (Score:4)
William Gibson's "Black Ice" should do nicely. Failing that, slice or dice the data in multiple directions and compare results.
(The "different slices" is important, to ensure that you aren't trying to validate one modified client against another.)
Let's say that you have a grid of data, N x M x B (where N, M is the data, and B is the number of bits per word for that data.)
The probability that one modified client is doing the rounds, and will be encountered again by chance, is non-zero. It's not high, but it's high enough that nobody is releasing their client code in a hurry.
On the other hand, you've three simple slices you can do (along each axis), and any number of more complicated ones. That means that you have to hit the correctly-modified client for the slice you've picked, for each slice in each axis, for the data to be marked "valid". Any failure by any one client to return a result that confirms the other 16 clients that would overlap with it, would signal a bogus client.
With that much redundancy, you could also simply have "client voting". The results that are returned identically by the most clients (in excess of some threshold), regardless of the direction of slice, could be regarded as "true", with a reasonable degree of certainty. (Sure, it's not 100%, but that's the price you pay for having a society that rewards the greedy and the ethically sick.)
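A minimal sketch of that "client voting" idea, assuming each client's result for a given piece of data can be compared for exact equality. The function name and the default threshold are illustrative choices, not anything SETI@home is known to use.

```python
from collections import Counter

def vote(results, threshold=0.5):
    """Return the result reported by more than `threshold` of the clients,
    or None if no result clears the bar."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count / len(results) > threshold else None
```

For example, `vote(["a", "a", "b"])` accepts "a", while a two-way split such as `vote(["a", "b"])` is left unconfirmed.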
Of course, if you want to go one stage further, there's nothing to stop you "dicing" the data. Instead of taking a single slice through the data, you take random, small chunks from all sections, and feed them in a random order to the client. Again, the server re-constitutes the "valid" results, by merging together the results from multiple clients, taking the generally-accepted results as "correct".
This would mean that, instead of needing 20+ clients, all with suitable code for cheating "correctly" along each slice, you now need !(N x M x B)/(Size of chunks) such clients. The values don't have to be large to make this a virtual impossibility.
If you then only credit "confirmed" units (whether "slices" or "chunks"), since cheating becomes impractical, short of a global Internet conspiracy which also included the researchers, nobody is going to bother modifying the clients in any way which produced inaccurate results.
They =MIGHT= modify them to produce faster, accurate results. But, in that case, who bloody cares? I'm not going to object to someone handing round an honest, genuine client that can plow through 10 times as many blocks in a second, and still deliver the true results back to the central system. And, if the scientists were being honest to themselves, I doubt they would, either. PROVIDED the results could be guaranteed.
And that gets back to why independent result reviews, using slicing, dicing, or some other method of producing non-duplicate data sets, are very important.
They believe so (Score:2)
Somebody who believes in extraterrestrial intelligences can believe in SETI-impressionable girls.
__
Processtree closing down. Where is your user info? (Score:4)
May 2001
Dear ProcessTree Network suppliers,
It is with sadness that I have to announce that this will be the last newsletter you receive from Distributed Science, Inc.
etc etc etc...
We will diligently negotiate the sale of the supplier database, with emphasis on the privacy policy under which you signed up. As soon as we come to a result, the new owners will be informing you about any changes they might plan, including an opt-out for those concerned about their privacy under new management.
EEP!
An algorithmic solution (Score:2)
1. For each quantum of the distributed calculation in the range you have been assigned, store one or more bits of evidence for the result.
2. Calculate a Merkle hash tree of this evidence vector.
3. Use cryptographic hashes of the tree root to "randomly" select 64 leaves of the tree.
4. Transmit the branches leading to these leaves as proof that you have performed the full calculations.
To verify, the server checks the hash chains of the branches and the randomly selected challenges, then verifies the evidence for the selected leaves by repeating the calculation for a very small subset (64) of the assigned range.
You cannot create this evidence without performing virtually all of the calculation assigned to you.
You can still cheat by finding the solution and not reporting it, but there is no incentive to do this.
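The four steps and the server-side check above might look something like this. It is a sketch under assumptions: SHA-256 as the hash, odd tree levels padded by duplicating the last node, challenge indices derived by hashing the root with a counter, and a `recompute` callback standing in for the real calculation. All function names are hypothetical.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Return every level of the tree: leaf hashes first, root level last."""
    level = [H(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                  # pad odd levels with a copy of the last node
            level = level + [level[-1]]
            levels[-1] = level
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def challenges(root, n_leaves, count):
    """Step 3: derive pseudo-random leaf indices from the tree root."""
    return [int.from_bytes(H(root + i.to_bytes(4, "big")), "big") % n_leaves
            for i in range(count)]

def branch(levels, index):
    """Sibling hashes from leaf `index` up to, but not including, the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path

def prove(evidence, count=64):
    """Steps 2 to 4: build the tree, pick the leaves, collect their branches."""
    levels = build_tree(evidence)
    root = levels[-1][0]
    picks = challenges(root, len(evidence), count)
    return root, [(i, evidence[i], branch(levels, i)) for i in picks]

def verify(root, proof, n_leaves, recompute, count=64):
    """Server side: check the challenges, the hash chains, and the sampled evidence."""
    if [i for i, _, _ in proof] != challenges(root, n_leaves, count):
        return False
    for index, leaf, path in proof:
        if leaf != recompute(index):        # server redoes only the sampled quanta
            return False
        node, pos = H(leaf), index
        for sibling in path:
            node = H(node + sibling) if pos % 2 == 0 else H(sibling + node)
            pos //= 2
        if node != root:
            return False
    return True
```

Here `evidence` is a list of byte strings, one per quantum of work; the server only needs the root, the 64 branches, and enough compute to redo 64 quanta.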
-
Re:Processtree closing down. Where is your user in (Score:2)
So I trashed it.
If someone doesn't have the courtesy to put at least a "please read the attached letter for a very important announcement" in the plaintext portion of an email, I don't read it. Assuming we all use either a Microsoft or a Netscape client for our email betrays some kind of ignorance or arrogance, or both.
And those qualities are probably also the reason they're failing.
Re:Already a business model (Score:3)
Render programs are free. (povray for example; many, many excellent CG films have come out of povray. Just check the Internet Raytracing Competition pages.)
Yes, some render programs cost exorbitant and insane prices, but places like pixar have programmers that write the software, and most good animation houses have their own programmers, so your cost per copy goes from $30,000.00 for the development of the first one to $0.00 for every copy thereafter. (Don't give me any crap that there is a cost associated with the copies afterwards; that is pure bullcocky.)
Do you think that lucasfilms goes to "CG-R_US" and buys a new effect? nooo, they create it, and then they can use it on 94,999 computers for free.
CG is cheap, and distributed processing (possible in POVRAY for a really long time now) is also cheap.
publicize "cheaters list" (Score:2)
I wouldn't recommend doing this. In practice, negativity and bad will, even when justified, often backfire, injuring the issuer.
hmmm, just like I've been saying all along (Score:4)
Their argument against open-sourcing the client has always been that this would allow cheaters and that people would use modified clients that didn't crunch the numbers right. To which I have always responded that with any distributed computational task running on untrusted clients, you would have to do this sort of redundant analysis on each data block anyway. Even a closed-source client can be hacked fairly easily if you really wanted to, so not releasing the source doesn't magically guarantee the validity of any client-side processing. It's nice to see SETI@Home finally acknowledge what some of us have known all along.
So, when will we be seeing the client source code available for download? I'm all ready to start working on an Xscreensaver [jwz.org] module for it.
Caution: contents may be quarrelsome and meticulous!
More Distributed Projects (Score:2)
A listing of notable distributed computing projects is here [hardcorelinux.com] - (http://www.hardcorelinux.com/distributed-computi
come off crisp and play up to the cynic
clean and schooled right down to the minute
Re:Why? - other cheating alternatives (Score:2)
Yeah, especially when there's the new shared IBM mainframe [slashdot.org] coming out, where anybody can install programs. That's going to be the biggest use of it - a bunch of l33t h4x0rs installing various Distributed.net clients on it, all trying to add more power to their results. Whoop-dee-doo.
Re:Double Resources (Score:2)
1. SETI can't afford to buy some massive 'big iron' to get the performance that they get (essentially for free) from SETI@home.
2. The way that SETI@home has been ripping through the data packets, they were going to run out of data to send to the clients very soon (like sometime next year). Any way that they can slow down the process (while increasing thoroughness and reliability) is welcome.
Oh - and SETI@home only uses 1 telescope (not even a satellite) to do its work: the Radio Telescope at the Arecibo Radio Observatory in Puerto Rico (the big satellite dish built into the mountain that was in the James Bond movie) - the largest single-dish radio telescope in the world.
What about public ridicule? (Score:3)
I have an idea for how to at least reduce the amount of cheating going on with SETI: ridicule. Because let's face it if you cheat at SETI you deserve ridicule. You're a worthless mess of a human being who probably hasn't been laid in, I dunno, EVER and has to inflate their self-esteem by turning a quest for Contact into a bigger dick contest. No one respects you. Kill yourself and leave your computer running. Your computer is worth more to society than you are.
Grr. I'm way too high strung today. Where's the bong? But godDAMN people are so freaking simple minded sometimes! What do you gain by cheating at SETI? Higher rankings? So fucking what! Great, now instead of being ranked 39623 you're at 32532. RaH. You're my hero. The world is a better place because you cheated. You've fed the hungry and increased our collective wisdom. L0s3r.
Dump core. And pass the bong.
- Rev.
speaking of distributed.net... (Score:2)
--
distributed.net does the same (Score:4)
I'm a processtree participant ... (Score:2)
------
Re:Already a business model (Score:2)
Pixar doesn't have to pay per license because
they wrote Renderman so they get it for free.
But POVRAY? Please, this hasn't been used
on any films that I know of (and yes I work
in the film visual effects industry).
There is a free version of Renderman called
BMRT (www.bmrt.org) but many many
visual effects companies do pay $10,000
US per copy for Renderman render licenses
or slightly less than that for Maya or
Mental Ray render licenses.
Attention Team Slashdot ! Let's Climb SETI Ranks ! (Score:2)
Re:distributed.net does the same (Score:2)
Re:The reason people are cheating. (Score:2)
What you really can't say is that 90% of the people are in it because of the stats. That wouldn't be allowed in a court of law, and won't be allowed here.
I still say get rid of them. Competition brings out the WORST in people, not the best, as evidenced by the cheaters who hacked their clients to download work units, and immediately (after NO analysis) send back a blank results file. These people were "crunching" thousands of units per day and really stinking things up.
If SETI loses any people from having no stats, I can assure you they won't be missed.
Rich...
The reason people are cheating. (Score:3)
I for one wish they would get rid of the scorekeeping entirely. I crunch SETI units because I enjoy the idea of helping them with their science.
Any users they lose by getting rid of scorekeeping would be no great loss. They were probably the losers who were compromising the datapool anyway. (Talk about having no self-esteem; I can see it now, some geek going up to a girl to impress her with his falsified SETI numbers.)
I was one of the first 10,000 people to sign up, and I'll help them with their science as long as they need me to, scorekeeping or no.
Rich...
Men In Black? (Score:2)
[sniff, sniff]Hey, what's that funny smell? Urrrggh, eyelids....heavy....soooo sleeepyyyy.....
Don't forget Juno (Score:2)
Why bother? (Score:2)
Known signals in the SETI system (Score:2)
More than once I've got a clear signal that was obviously extraterrestrial in nature. The distribution was so far away from random noise that it had to be artificial. I run the data through the SETI program, and what does it come out with? Nothing.
SETI@home beams known signals to the radio telescope as a check to make sure the whole system is still working properly and to call out clients that give false negatives. There are a few on constant frequencies; there are probably others on frequencies that change daily.
Re:Active punishment? (Score:2)
About time (Score:2)
Anyone know where I can buy a seti card?
no no no no (Score:2)
Premature? Premature?! Of course it's not premature, it's about 30 years too late. Distributed computing used to be nice and profitable, but processors are just too cheap now for it to work. For large-scale, nonprofit efforts like SETI, sure, but if someone's actually going to pay to rent computer time, it would just be cheaper to buy the processors themselves. Or, if it was truly profitable to rent computer time, specialized computers with intel/amd clusters would pop up to provide it with less overhead.
--
Re:Active punishment? (Score:4)
Re:Why bother? (Score:3)
They do. What the client programs do is something of a preliminary analysis, filtering the most interesting packets of data from the usual junk. In the further analysis it often turns out that lots of interesting signals originated on Earth, while many others are inconclusive.
--
I hit the karma cap, now do I gain enlightenment?
Re:Double Resources (Score:2)
I find it very, very disturbing that the EFF, of all people, gives its members a uniquely identifying number. Anyone realize how god-awfully ironic that is?
Re:An algorithmic solution (Score:2)
Have you implemented anything like this? Any ideas on where you might point me for a similar function in a different class of program? I have been grappling with this same issue on a smaller project, and have put together a kludgy hash/redundancy style protection scheme that an experienced math-man could break in about 15 minutes.
Seti are hiding the truth (Score:2)
I repeatedly try to get interest from the government over this, but they aren't interested. I mentioned it to the Roman Catholic Church, and they were horrified. I think it must interfere with their religious dogma or something. I sent it to Carl Sagan. He mysteriously died.
The truth is out there. They don't want you to hear it!
Re:Why bother? (Score:3)
You're missing the point with SETI. There is no such thing as "a hit" when analysing these massive amounts of data. Your computer will never give a message like: "Analysis detected a HOW ARE YOU GENTLEMEN, ALL YOUR BASE ARE BELONG TO US from outer space". What your computer does is just an analysis, and then the SETI folks will do the really exciting stuff with the resulting data from your computer's work.
The problem before SETI@Home was that the data wasn't analysed in full detail, because these analyses take a shitload of time. So they just did a rough analysis, trying to find extreme peaks but not checking for patterns over longer periods of time.
I should have known (Score:4)
Here are some warning signs that you may have a SETI hoax on your hands:
In other news: Bi Curious: The Senator Jim Jeffords Story [ridiculopathy.com]
AH! (Score:5)
--
Why? (Score:5)
"LOOK! i'm high on the hours list with 31337 years of data done on my computer for SETI. I RULE! Oh god, I wish I were dead..."
Re:Active punishment? (Score:4)
Double Resources (Score:3)
TEN
Anonymity breeds cheating. (Score:2)
Re:Anonymity breeds cheating. (Score:2)
So you think that vandalizing their data and experiment is legal?
What are they gonna charge you with?
There is a license that forbids connection to their servers with software other than the client they supply. That's a breach of contract. It's also a potential "trespass against chattels." There are simple charges like vandalism that can be made as well as charging the cheaters with violation of the Computer Fraud and Abuse Act.
There are plenty of legal avenues that could be used.
Re:Anonymity breeds cheating. (Score:2)
The opportunity to participate in the study, combined with the possibility of fame and fortune, is consideration. The compute time made available to the study is consideration in the other direction.
So, there is offer, acceptance, and consideration.
Re:hmmm, just like I've been saying all along (Score:2)
However there are tons of unofficial add-ons that *are* allowed: see here at the SETI@home site [berkeley.edu].
This and much more info in the unofficial SETI FAQ... infuriatingly, I've got a copy saved at home but can't find a link to it anywhere. (Think this was the Usenet FAQ.) Anyone?
--
"I'm not downloaded, I'm just loaded and down"
Re:hmmm, just like I've been saying all along (Score:2)
--
"I'm not downloaded, I'm just loaded and down"
Just don't keep score. (Score:2)
why kick bad users off? (Score:2)
John
Shocking! (Score:5)
This is why... (Score:2)
While SETI and NASA are jumping the gun and declaring a fake packet to be a sign of "intelligent life out there" and awarding some loser a lot of money for making the find, any real signals that are, for no apparent reason, aimed specifically at us won't be processed, because the SETI@Home project will have achieved its goal.
Now if only CounterStrike games could end so vividly.
"Cheater detected, cheater wins, GAME OVER."
Heh.