## Mathematician Predicts Yankees To Dominate

anthemaniac writes "Computerized projections in sports are nothing new, but Bruce Bukiet of the New Jersey Institute of Technology has developed a model that seems to work pretty well. He projects how many games a Major League Baseball team will win by factoring in how each hitter ought to do against each pitcher in every game. His crystal ball says the Yankees will win 110 games this year, a pretty safe bet, many might agree. But he also projects all the divisional winners. He claims to be right more than wrong in five of the past six years."
• #### 110 wins? (Score:5, Insightful)

on Thursday April 05, 2007 @08:39PM (#18629517)
It's a safe bet that the Yankees will do well: they always seem to spend almost twice as much as most other teams on talent, not to mention luring good players away from other teams to crush the competition. Having said that, they have always spent that kind of money and have not done exceptionally well of late. 110 wins is a lot, and not many teams have accomplished that. Safe bet? Hardly.
• #### A Much Safer Bet... (Score:3, Funny)

The Pirates - 2nd lowest payroll - will suck again. 14 losing seasons in a row. I give it a 99.9% certainty they make it 15. I'm not even an MIT grad!

• #### Re: (Score:2)

Of course the Mariners will spend three times as much on payroll and be right there with the Pirates.
• #### Re:110 wins? A Safe bet? (Score:1)

Not a safe bet at all, especially considering that the AL East is fairly strong this year. It seems April Fools' Day comes 4 days late for baseball fans... The prediction is a joke. While math can certainly be applied to predict things like this, it fails to take into account that the Yankees overspend on old players. A more accurate prediction would be that by the end of the season, the cumulative number of years that Yankees players are past their primes is about 110.
• #### He left out several important variables (Score:2, Interesting)

Injuries. Did he take these into account? A lot of good teams have had lousy seasons due to players being hurt for long periods of time. MAYBE if every member of every team was able to play a full schedule of 162 games...

Performances. If every player played consistently every day, but some guys go on hot streaks and get moved up in the batting order. Some guys go cold and get bumped down, or even worse, sent to the minors. MAYBE if the 25-man rosters stayed constant for the entire season.

Luck. Three
• #### Re: (Score:3, Funny)

Not safe at all, until you factor in whomever the Mob has their money riding on.
• #### Um. Yeah. (Score:2)

He claims to be right more than wrong in five of the past six years.

Whoopty fsck. So's RailGunner [slashdot.org]. Runs are fun to watch, but pitching is what wins. And the Yanks have? Anyone? Anyone at all? Yep. They got nothin' at pitcher.

• #### Re: (Score:2)

We got Wang and Mussina... Pettitte maybe. Igawa is an unknown quantity right now. Proctor blows hot and cold, Farnsworth is a flip of the coin. I have a soft spot for Myers 'cause he always seems to get Ortiz out (bwhahaha). We also have Rivera to close things out and he's the best in the business. I'm not even going to talk about Pavano. The rest, ehhh they're ok I guess. Sometimes. We have some excellent prospects in the minors so I'm HOPING pitching will improve in the coming years.

If we win 110
• #### If he's so confident... (Score:3, Interesting)

<banantarr@nospAM.hotmail.com> on Thursday April 05, 2007 @08:41PM (#18629537) Homepage
Has he put up beaucoup bucks in Vegas on his numbers? If not, why not. If so, how much did he win, and where can I get his numbers this year?

TLF
• #### I never understand these things... (Score:5, Informative)

on Thursday April 05, 2007 @08:52PM (#18629645)
Isn't there some rule or law about 'fitting a curve' to past data? Yet the sports predictions, and many of the 'stock market systems', are all about finding some seemingly obvious pattern in past data. While you might come up with a 'back tested' model that matches really well, it doesn't mean squat for the future.
• #### Re:I never understand these things... (Score:5, Informative)

on Thursday April 05, 2007 @09:09PM (#18629789)
His models have evolved over the years, but he tries to simulate actual games using both individual statistics (players' batting averages, etc.) and team trends (how well does a player do against a specific pitcher). He uses a large Markov chain to predict state transitions (Runner on first, no outs - how often does it go to two outs? That sort of thing.) Very interesting project, it was a lot of fun to work on. (I was an undergrad working with Bruce 15 years ago, when he was first starting this project. He's kept it going for years.)
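For anyone wondering what a Markov chain over game states might look like, here is a minimal sketch in Python. It is not Bukiet's model: the plate-appearance probabilities are made-up placeholders, runner advancement is deliberately crude (every runner moves exactly as many bases as the batter), and the names `EVENT_PROBS`, `advance` and `expected_runs` are hypothetical. The state is (outs, bases occupied), and the expected runs for the rest of the half-inning are found by repeatedly sweeping over all 24 states.

```python
from itertools import product

# Hypothetical per-plate-appearance probabilities (placeholders, not real data).
# Key = bases gained by the batter; 0 means the batter is out.
EVENT_PROBS = {0: 0.68, 1: 0.23, 2: 0.05, 3: 0.01, 4: 0.03}


def advance(bases, hit):
    """Move the batter and every runner forward `hit` bases.

    `bases` is a tuple of three 0/1 flags (first, second, third).
    Returns (new_bases, runs_scored). Simplified on purpose: no double
    plays, steals, errors or extra-base advances.
    """
    runners = [i + 1 for i, occupied in enumerate(bases) if occupied]
    runners.append(0)  # the batter starts at home plate
    positions = [r + hit for r in runners]
    runs = sum(1 for p in positions if p >= 4)
    new_bases = tuple(1 if b in positions else 0 for b in (1, 2, 3))
    return new_bases, runs


def expected_runs(event_probs=EVENT_PROBS, sweeps=500):
    """Expected runs until the third out, for each (outs, bases) state."""
    states = [(o, b) for o in range(3) for b in product((0, 1), repeat=3)]
    value = {s: 0.0 for s in states}
    for _ in range(sweeps):
        new_value = {}
        for outs, bases in states:
            total = 0.0
            for hit, p in event_probs.items():
                if hit == 0:
                    # Batter is out, runners hold; the third out ends the inning.
                    if outs + 1 < 3:
                        total += p * value[(outs + 1, bases)]
                else:
                    new_bases, runs = advance(bases, hit)
                    total += p * (runs + value[(outs, new_bases)])
            new_value[(outs, bases)] = total
        value = new_value
    return value


table = expected_runs()
print(table[(0, (0, 0, 0))])  # bases empty, nobody out
print(table[(0, (1, 0, 0))])  # runner on first, nobody out
```

A real model would presumably replace the single `EVENT_PROBS` table with probabilities specific to each batter and pitcher matchup, which is where the individual statistics come in.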
• #### Re:I never understand these things... (Score:5, Insightful)

on Thursday April 05, 2007 @10:23PM (#18630297)
It is still trying to predict future results based on past performance. No matter what you predict, last year's Chipper Jones will never again face last year's Roger Clemens. Even if Clemens un-retires (again), he is not the same person, and neither is Chipper Jones. You also can't predict injuries, trades, managers' decisions, umpires' calls, weather, etc., all of which have an impact on the outcome of an individual game.
• #### Re: (Score:2, Insightful)

by Anonymous Coward
You're right. We should stop trying to predict anything because we won't ever be 100% correct.
• #### The best way to test... (Score:2)

The best way to test any model is to start with the end points. How low does it score the New York Mets?
• #### Huh? (Score:5, Insightful)

on Thursday April 05, 2007 @09:00PM (#18629701) Journal

While Bukiet is the first to admit he's not a baseball expert, in five out of the past six years, he says that his model has produced more correct than incorrect predictions.
What? Does this even mean anything? If, say, he was right 51 percent of the time in five years and wrong 90% of the time in that other year, wouldn't that make his number of successes less than the expected number of successes from just guessing "win" or "lose"? I guess he's either really modest ("I don't like to brag, so I'll just say the accuracy is higher than 42%."), or a really, really bad statistician.
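To make the arithmetic in that hypothetical concrete, here is a quick check in Python. The 51% and 90% figures are the ones from the example above; the assumption that every season contributes the same number of predictions is an added simplification.

```python
# Hypothetical six seasons: right 51% of the time in five of them,
# right only 10% of the time (wrong 90%) in the sixth.
# Assumption: each season has the same number of predictions.
season_hit_rates = [0.51] * 5 + [0.10]

overall = sum(season_hit_rates) / len(season_hit_rates)
print(f"overall hit rate: {overall:.1%}")
print("better than a coin flip?", overall > 0.5)
```

So "more right than wrong in five of six years" is compatible with an overall record worse than guessing.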
• #### Re: (Score:3, Informative)

...or a really, really bad statistician.

Or a really good statistician. Remember, when you ask a statistician to crunch some numbers for you he'll reply back with "and what would you like the numbers to say?". They'll make it fit any curve you throw at them.
• #### Keeping up appearances (Score:5, Funny)

on Thursday April 05, 2007 @09:00PM (#18629703)
"Hello Mr. Bukiet"

"It's pronounced bouquet!"
• #### Re: (Score:2)

He didn't mind being called Professor Bucket. He had some of the best-worst math jokes too, which made staying awake for his 8:30 AM class that much more tolerable.
• #### amazing (Score:3, Insightful)

on Thursday April 05, 2007 @09:03PM (#18629733)
Wait, you mean you can use past data to try to predict future events under certain assumptions, and sometimes it works? Someone should generalize this into some sort of academic discipline!
• #### Re:amazing (Score:5, Funny)

on Thursday April 05, 2007 @09:07PM (#18629775)
They did. It's called "tenure".
• #### Re: (Score:1)

No way. That bridge I walked across this morning was sturdy enough. It's just that I'm never going to walk on that bridge again. Not for those historical, "can't cross the same stream twice" reasons either. I just don't trust engineers.
• #### We did this in college too... (Score:2)

It was called Strat-O-Matic Baseball, and many a night in the hills of Worcester I had to fall asleep to the constant clinkity-clink-clink-clinkle of a pair of dice in a stolen cafeteria coffee cup.

• #### Re: (Score:2)

1-5 HOMERUN

:)

PS - My all-time favorite Strat-O-Matic cards belonged to Bobby Witt. Especially his 1987 card. 143 IP, 160 K, 140 BB. Every inning an exciting one. :D

• #### Re: (Score:2)

Wow, someone else that knows what Strat-O-Matic is.

By the way, backgammon boards and cups really keep the noise down quite a bit.

Aero
• #### Re: (Score:2)

Most of the guys who played this 24-7 ended up on the 6-year plan. I'm not sure backgammon was in their repertoire.
Three dice you say? Guess I left out a "clinkity" from all those years ago.

• #### But... Yankees Suck!! (Score:3, Funny)

on Thursday April 05, 2007 @09:07PM (#18629773)
signed,

Red Sox fan
• #### Re: (Score:2)

My prediction is that the Yankees will spend more money than any other team. And still not win a World Series.

• #### Red Sox suck!! (Score:2, Funny)

Signed,

Yankees fan

PS Have fun blowing up more innocuous devices because you think they're bombs
• #### Re: (Score:2)

The typical New York wit. Reverse the insult. How original.
• #### Re:But... Yankees Suck!! Alot! (Score:2, Flamebait)

George and the whole Yankees organization ruined baseball. There are so many teams now that have so little conceivable chance of winning the World Series that the sports-watching public just isn't interested anymore. It's time for a salary cap.
• #### Re: (Score:2)

Have you not been watching baseball under the last 2 CBAs? There is more parity now than ever. Sure, there are generally about 2-3 also-rans, but by and large any team with good scouting and intelligent management can compete year in and year out. Oakland, Minnesota, Florida, Arizona, Atlanta (with new management), Washington (formerly Montreal), and Milwaukee have all done an exceptional job at being competitive even with less money.

Revenue sharing and the soft-cap have helped to wonders for the competitive
• #### Re: (Score:2)

George and the whole Yankees organization ruined baseball. There are so many teams now that have so little conceivable chance of winning the World Series that the sports-watching public just isn't interested anymore. It's time for a salary cap.

Actually, they serve as proof that you CAN'T buy the World Series. They keep losing to less well funded teams that enjoy playing the game more.

• #### Re: (Score:2)

I love that you got at least one "Insightful" mod.
• #### Exactly 110 or at least 110? (Score:2)

The article says he has made more correct than incorrect predictions in his several years of doing this.

Something tells me that when he predicts that the Yankees will win 110 games, for example, he is counting his prediction as fulfilled if the Yankees win AT LEAST 110 games.

Because it would be pretty remarkable if he has correctly predicted the EXACT number of games teams will win more often than not over the past several years.

And since no margin of error is provided, there's really no basis for saying
• #### Re: (Score:2)

My model predicts that they will win at least one game. That makes me right for all six out of the last six years, so I guess I've got him beat.
• #### That's nothing... (Score:5, Funny)

on Thursday April 05, 2007 @09:17PM (#18629843)

He claims to be right more than wrong in five of the past six years.

That's nothing: I've developed a new mathematical algorithm that correctly predicts the outcome of the past six years with 100% accuracy.

• #### 110 Games? (Score:2)

The Yankees have weak-ass pitching this year. No chance they win 110 games. More likely 90.
• #### Bah (Score:1, Redundant)

Don't Yankees fans predict they will dominate every year? That being said, I never take predictions like this seriously, especially if it is another "Yankees will pwn" claim. Odd, however, that I didn't see anyone predict what the 2001 Seattle Mariners did [wikipedia.org] (116 wins).

Oh, and yes, I am a mathematician (will obtain BA degree in math this June).

• #### Re: (Score:2, Insightful)

Generally one needs a Ph.D in math to be a "mathematician".
• #### He's been way off-the-mark for years... (Score:5, Interesting)

on Thursday April 05, 2007 @09:46PM (#18630023)
First, a link to the professor's baseball page. [njit.edu]

In 2006, he predicted 102 Yankee wins. They won 97. Not too bad.

In 2005, he predicted 113 Yankee wins. They won 95. Way off.

In 2004, he predicted 117 Yankee wins. They won 101. Way off.

In 2003, he predicted 110 Yankee wins. They won 101. Not great.

In other words, take this forecast with a big boulder of salt.
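For a rough sense of how far off those four forecasts are on average, here is a small Python tally. The win totals are the ones listed above; the variable names are only for illustration.

```python
# (predicted wins, actual wins) for the Yankees, as listed above.
forecasts = {
    2003: (110, 101),
    2004: (117, 101),
    2005: (113, 95),
    2006: (102, 97),
}

# Signed miss per season: positive means the forecast was too high.
errors = {year: predicted - actual for year, (predicted, actual) in forecasts.items()}

mean_error = sum(errors.values()) / len(errors)                      # average bias
mean_abs_error = sum(abs(e) for e in errors.values()) / len(errors)  # average miss

print(errors)
print(mean_error, mean_abs_error)
```

Every miss is on the high side, so the average bias and the average absolute miss come out the same: 12 wins.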

• #### Re: (Score:2)

So basically he tends to overestimate on the Yankees, so maybe a safer bet would be ~100 wins?
• #### Re:He's been way off-the-mark for years... (Score:4, Funny)

on Thursday April 05, 2007 @10:33PM (#18630365) Journal
I would say 1.0*10^2 wins.
• #### Re: (Score:2)

So basically he's just a myopic Yankees fan. Got it.

Although that is funny, him predicting in 2004 that the Yankees would break the season record for wins.
• #### Re: (Score:2)

Yeah, and did he predict the Yankees getting crushed by Detroit in the playoffs?
• #### Re: (Score:2)

Note that the naïve prediction (the "prediction" is that they win the same number of games this year as they did last year) is much more accurate than the professor's.
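Here is a quick Python check of that claim, using only the 2003 to 2006 figures quoted earlier in this thread. The helper name `mean_abs_error` is just illustrative, and 2003 is left out of the comparison because the thread gives no 2002 win total to feed the naïve rule.

```python
# Yankees figures quoted earlier in the thread: year -> (model prediction, actual wins).
seasons = {
    2003: (110, 101),
    2004: (117, 101),
    2005: (113, 95),
    2006: (102, 97),
}


def mean_abs_error(pairs):
    """Average absolute gap between predicted and actual wins."""
    return sum(abs(predicted - actual) for predicted, actual in pairs) / len(pairs)


years = [2004, 2005, 2006]  # seasons with a previous year available
model_pairs = [seasons[y] for y in years]
naive_pairs = [(seasons[y - 1][1], seasons[y][1]) for y in years]  # last year's wins

print("model:", mean_abs_error(model_pairs))
print("naive:", mean_abs_error(naive_pairs))
```

Three seasons for one team is a tiny sample, so this shows the comparison is easy to run, not that it settles anything.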
• #### Big Whup... (Score:2, Informative)

by Anonymous Coward
Bill James came up with simple quantifiable statistics that could very accurately predict the success rate for a baseball team back in the '70s. The Oakland A's had a lot of success using those methods to put teams on the field that would win between 95 and 100 games per year while spending as little as possible. It worked remarkably well, and a book (Moneyball, by Michael Lewis) was written about it.

In short, this is old and well covered news, unless this guy has come up with a simulation that is significan
• #### Predicting the past is... (Score:1, Interesting)

by Anonymous Coward
easier than predicting the future.

He modeled his program on the past 5-6 years' data; that's why: "He claims to be right more than wrong in five of the past six years."

How does he factor in rookies? Does he model injuries and use the data to rank teams' susceptibility to lost talent?

Unless this program is 6 years old his model is only back-tested; not proven.
• #### Re: (Score:2)

Predicting the past is easier than predicting the future.

No, it's seriously not. They are exactly the same. There's no difference between training your model on the first 3 of the last 5 years and validating on the last 2, and training on the last 3 years and validating on the next two to come. The model doesn't know the clock, and datasets are datasets.

There is a world of difference between accuracy rates on your training/calibration set and your model's performance on the validation set. One of

• #### In Other News: (Score:1)

"Accountant predicts Yankees will dominate based on salary spending."

"Sports historian predicts Yankees will dominate based on past seasons."

"Incoherent drunk predicts Yankees will dominate based on voices in his head telling him so."

"Everyone who's even remotely familiar with MLB dies of a massive simultaneous aneurysm trying to comprehend why anyone predicting the Yankees will be one of the top teams in the league for any reason at all qualifies as "news" rather than statement of the obvious."

Seriously, I
• #### What about Daisuke? (Score:2)

I want to know how he calculated Daisuke Matsuzaka's numbers, since he's never played ball in the States. Theoretically he should dominate the AL given his performance in Japan, but those numbers don't mean much when considering the power hitters in the AL, much less MLB. Here's hoping Bukiet is wrong though. I'd love to see the Yankees tank and not make the play-offs, but I'm a Red Sox fan and I always hope that happens.
• #### Climate Models? (Score:5, Insightful)

on Thursday April 05, 2007 @10:30PM (#18630341)

So let me get this straight..

Climatologists use past data, computer models, and mathematical projections to support global warming and predict future results, and everyone calls it strong science based on facts. If the models are off, it's just a part of the scientific process, but the overall claim is still valid.

But if a statistician uses past data, computer models, and mathematical projections to predict baseball results, it's dismissed as some crack job's phony science. If the models are off, it's proof that he has no idea what he's doing and how these kinds of models don't work.

Am I missing something here?

• #### Re: (Score:3, Insightful)

Yes. In the public experience, most fancy sports predictions have a history of being inaccurate. This is unlike the experience with climate models, which historically have also given us some predictions.
• #### Re: (Score:3, Insightful)

What you are missing is that not all models are created equal, and not all things are as easy to model. It's all about variance. Consider the weather, for example. We can accurately predict what it will be for a day or two, and we have a decent guess for about a week, but beyond that, there is too much complexity and variability for us to say much (not to mention that weather appears to be a dynamical system, i.e., an example of chaos theory, which means that prediction is theoretically impossible). How
• #### Re: (Score:2)

Yes, that was a good one.

But the guys that modded you Insightful instead of Funny really made my day. I am still snickering writing this post.
• #### Re: (Score:2)

Yes, the climate is ostensibly generated using some static algorithm that runs the universe with a bit of input from humans en masse, whereas major league sports rests on the shoulders of relatively few individuals, their whims, and their day-to-day fortunes.

• #### Re: (Score:2)

What you are missing is the human factor. Predicting baseball is more like predicting the economy than predicting the weather. The latter is difficult only because of the sheer number of variables involved (but we do understand the underlying principles), while the former is ultimately attempting to predict human behavior (we have no reliable scientific methods for doing that).
• #### Re: (Score:2)

How about this: the audience wants the weather to be predictable; in baseball, much of the audience (1) just wants their team to win or (2) wants there to be a reason to play the game. Perhaps the difference in people's opinions of the validity of climate and baseball modeling lies in what the people want to believe.
• #### Re: (Score:2)

Free will.

People like to think that human events can't be reduced to numbers in the way that non-human events can. Being susceptible to prediction offends their sense of self determination.
• #### Re: (Score:2)

Many people just don't understand the power and limitations of statistics. They point to each individual anecdote that goes against the trend predicted by the model as proof that it doesn't work. That's an emotional reaction that is stronger in baseball than in weather.

If people understood statistics, they would understand that the trend predicted by the model (110 games) is never intended to forecast the result of a particular game. Further, they would understand that the model _expects_ outliers to app
• #### Win Expectancy and available data (Score:1)

FTA: "Were the model to be commercialized, it could be updated on a play-by-play basis, which fans could monitor to see how every play changes the outcome of a game. "I think some fans would think that's cool," Bukiet said."

How individual plays affect the outcome (or probable outcome) has been a well-worn subject of late in the blogs and discussion lists of baseball fans. And you don't need commercial products for answers. Retrosheet.org [retrosheet.org] provides play-by-play data reaching back decades, from which I ca

• #### From one of his students (Score:5, Informative)

<kenbarney@NOsPAM.gmail.com> on Thursday April 05, 2007 @10:46PM (#18630443)

Wow, I never expected somebody that I knew to get on Slashdot. Bruce Bukiet is my Calculus II professor at NJIT.

He mentioned this before a few times, including today after that article made it to the most popular spot on Yahoo! [yahoo.com] News. This is more of a hobby for him than an official project.

From what he has said in the past about the model, it tends to overestimate the Yankees, among other reasons, because they often buy good players at the end of their prime. Thus the players won't play as well as they had in the past. He hasn't used it to make any bets. For the model, coming within a game or two of the actual results is considered a good prediction.

As some people above said, the model isn't intended to be extremely accurate, and is frequently off by a significant amount. The interviews he does are more to get people interested in math, and to see how it has real use, rather than to try and show off. He used to go into more details in the past, but doesn't now because they tend to confuse the interviewer, and don't make it into the final article.

Some pages of his own about the project are:
http://m.njit.edu/~bukiet/baseball/baseball.html [njit.edu]
http://www.egrandslam.com/ [egrandslam.com]
• #### Baseball and nerdiness go hand-in-hand... (Score:1)

Two of the more respected, statistically-based projection systems out there are Nate Silver's PECOTA [wikipedia.org] and Diamond Mind Baseball [diamond-mind.com].

Their 2007 Yankees projections:

PECOTA: 93
Diamond Mind: 96

• #### Steinbrenner and Bush (Score:1, Flamebait)

Just as President Bush ignores Congress, so does George Steinbrenner ignore the salary cap rules of Major League Baseball. The Yankees literally buy a spot in the playoffs every year.

• #### Not a real world application (Score:1)

This sounds like a good idea, but you are gonna go crazy just like Maximillian Cohen trying to predict life. You can't predict a player going on the injured list like you can calculate RBIs. It is illogical to use something like this in a chaos-filled world. For all you know the whole Yankees team can be thrown out for illegal sports betting. It is also wrong because you forgot about the Detroit Tigers.
• #### Isn't saying the Yankees will win (Score:1)

a little like saying the Cubs won't win?
• #### Math? Hardly (Score:2)

AL East: New York Yankees
AL Central: Cleveland Indians
AL West: Los Angeles Angels
AL wildcard: either the Boston Red Sox, the Toronto Blue Jays or the Minnesota Twins

OK, so he managed to choose division winners and then say that the Wild Card would come from one of THREE other teams. I don't think there's much math or stats going on here. Shouldn't he be able to pick ONE team and say they're going to win the Wild Card? This sounds more like a baseball fan's prediction than a mathematical prediction.
• #### This sort of thing is explained in detail.... (Score:2)

... in the book Moneyball by Michael Lewis. He follows Billy Beane through a season with the Oakland A's, where they won their division even though they were outspent by nearly every other team. This prompted former Fed Chair Paul Volcker to comment that Beane had found a market inefficiency. He had used such an inefficiency, but it wasn't Beane who had found it.

To do this right, however, you have to do legwork, because according to the model described in Moneyball, On Base Percentage is really what you'r
• #### I'll take that bet (Score:2)

His crystal ball says the Yankees will win 110 games this year, a pretty safe bet, many might agree.
Winning 110 (or more) games has only happened 6 times in all of MLB history (out of over 2000 chances)! There are just too many things that can go wrong to make this a "safe bet". The odds are dramatically stacked against it.
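One rough way to see why 110 is such a tall order: treat the 162 games as independent trials with a fixed win probability and look at the binomial tail. That is a strong simplifying assumption (and not how the professor's model works), and both the function name `prob_at_least` and the "100-win true talent" figure are illustrative choices, not numbers from the article.

```python
from math import comb


def prob_at_least(wins, games=162, p=0.5):
    """P(X >= wins) for X ~ Binomial(games, p)."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )


# Assumption: a team whose true level is 100 wins per 162 games.
p_true = 100 / 162
print(f"P(at least 110 wins) = {prob_at_least(110, p=p_true):.3f}")
```

Under those assumptions, even a team that good gets to 110 only a small fraction of the time, which fits with how rarely it has happened.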
• #### Lemme Guess the Red Sox World Series Win... (Score:2)

...was the year he blew it.
• #### If I flip a coin (Score:2)

If I flip a coin 300 times, it *should* land on heads 150 times and tails 150 times. Guess what. It doesn't.
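For the curious, the chance of a fair coin landing exactly 150 heads in 300 flips can be computed directly from the binomial formula. A short Python check, assuming a fair coin and independent flips:

```python
from math import comb

flips = 300
# Number of ways to get exactly half heads, divided by all possible sequences.
p_exactly_half = comb(flips, flips // 2) / 2**flips
print(f"P(exactly 150 heads in 300 flips) = {p_exactly_half:.4f}")
```

That works out to about 4.6%, so an exact 150/150 split is the single most likely outcome and still an unlikely one.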
• #### Lame Ass Prediction (Score:2)

Who the F*uck cares who wins the division! Give me the World Series Winner!!

The F*ucking AL East is a joke. Strong my ass. The division will come down to either Boston or N.Y. Oooo surprise. With the Yankees most likely winning. 110 games, they will not win.

The odds are against the Yankees winning the World Series because they don't have pitching!

Every F*ucking year some Ass-hole picks the Yankees to win and they keep failing - 7 years since the last win, people.

When they finally rebuild their pitching sta
• #### Bullshit. (Score:2)

This is total bullshit.

First off, no one has been able to predict baseball results with great accuracy, and it's not for lack of trying. There's a whole cottage industry built around baseball statistics, populated by fans and professional scouts alike, and there's been some major innovation. But there's so much chance involved, and so many factors that we just can't measure (injuries, weather, slumps, etc.), that I don't think it's even possible to generate reliable predictions. Being more right than wr

• #### I got your formula right here.... (Score:2)

lemme see... 999X (dollars to buy the best free agents) over 2x (rest of major league baseball) = profit and championships!

plus fans who hound mere mortals out of the ballpark...

yeah, I think that might lead to better than statistical dead heats.

I hereby place my secret formula into the public domain under GPL 2. any time X > Y in any of your programs, be sure to credit me.
• #### Oh yeah? (Score:2)

Well *this* mathematician predicts that the Red Sox will win the division this year. Pulling numbers out of my ass has been right more often than wrong, so my prediction meets the described standard for quality.
• #### Wonder how he's calculating statistics... (Score:2)

Is he following Pascal, or what?

