Jamie found a NYTimes op-ed by a grad student and a professor from Cornell outlining some research they did into alternate baseball universes. The goal was to find out just how unlikely Joe DiMaggio's 56-game hitting streak of the 1941 season really was. No one since has even come close to that record. The math guys ran 10,000 simulations of the entire history of baseball from 1885 on. In each simulation they sent every player to the plate for each at-bat in each game of each year, just as it happened, and rolled the dice on him based on his actual hitting stats for that season. (Their algorithm sounds far simpler than whatever the Strat-O-Matic guys use.) The result: Joltin' Joe's record is not merely likely, it's basically a sure thing. Every alternate universe produced a streak of 39 games or better; one reached 109 games. And Joe DiMaggio was not the likeliest player in the history of the game to set the record, not by a long shot.
• Re:If it's so likely, then why hasn't it happened? (Score:4, Insightful)

by Anonymous Coward on Sunday March 30, 2008 @04:12PM (#22915018)
Clearly they aren't factoring in the stress and nerves the average ballplayer would be dealing with as he got closer to the mark.
• Nerves (Score:5, Insightful)

on Sunday March 30, 2008 @04:13PM (#22915028)
This doesn't take into account that once a player achieves an impressive hitting streak he gets more media attention, people start asking him about DiMaggio's record, and every time he steps up to the plate he's a bit more nervous about it than the last time, making it slightly less likely that he'll get a hit.
• Re:If it's so likely, then why hasn't it happened? (Score:5, Insightful)

on Sunday March 30, 2008 @04:18PM (#22915080)
The most likely reason is that statistics isn't the appropriate method by which to study this problem.

This sort of study is really more about curiosity; it doesn't deal with things like changes in the way the game is played. For instance, early on, and for quite a while later, it was common for a pitcher to pitch 9 innings every game, and in many cases to pitch both games of a doubleheader. That meant more opportunity for errors, and since batters got time to rest, that style of play gave the batter an edge that doesn't exist today.

Nor does it account for the variety of pitching players see today, or the fact that a player might face three different pitchers in a single game.

Even the length of the season has an effect on how players play. None of those things are easily quantified, much less analyzed by statisticians.
• too simplistic (Score:5, Insightful)

on Sunday March 30, 2008 @04:25PM (#22915154)
From reading the article (which is light on the details), it seems they used nothing but batting average, at-bats, and games played.

The problem is that this doesn't control for variation in the quality of pitching. The chance of going that many games without running into a hot pitcher isn't accounted for.

Imagine you average a 75% chance of getting a hit in any individual game. If you face three average pitchers, your chance of hitting in all three games is (.75)^3, about .42, but if you face a good pitcher, an average pitcher, and a bad pitcher, it might be (.5)(.75)(1.0) = .375: a lower probability, despite the same average chance per game.
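The arithmetic above, as a small Python sketch. It assumes, as the comment does, that games are independent and that each game's hit chance is set entirely by that day's pitcher; the function name is made up for illustration.

```python
from math import prod

def all_games_hit_prob(per_game_probs):
    """Chance of at least one hit in every game listed, assuming the games are
    independent and each entry is that game's hit chance (set by the pitcher)."""
    return prod(per_game_probs)

three_average = [0.75, 0.75, 0.75]   # three average pitchers
mixed = [0.5, 0.75, 1.0]             # good, average, bad pitcher

print(sum(three_average) / 3, sum(mixed) / 3)   # 0.75 0.75: same mean
print(all_games_hit_prob(three_average))        # 0.421875
print(all_games_hit_prob(mixed))                # 0.375
```

Under this simple model, spreading the same average over easier and harder games can only lower the product (the geometric mean never exceeds the arithmetic mean), so ignoring pitcher variation tends to overstate the chance of a long streak.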

To be realistic, the calculation would need to account for how far the pitchers' ability deviates from average (that deviation would likely be larger 100 years ago, because of fewer players and segregation, and now, because of expansion, than in the 1950s).

What they don't report is how often there are long (but not record) streaks in their model, so there is no way of knowing how accurately it reproduces reality.
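One way to run the check described above, sketched in Python: tally every hitting streak across many simulated seasons and compare how often long-but-not-record streaks show up against the real historical counts. The 0.8 per-game hit chance, the 154-game season and the 20-game threshold are illustrative assumptions, not values from the study, and the function names are hypothetical.

```python
import random
from collections import Counter

def streak_lengths(games):
    """Lengths of every maximal run of games with a hit (True values)."""
    lengths, run = [], 0
    for hit in games:
        if hit:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:                      # a streak still alive at season's end
        lengths.append(run)
    return lengths

def tally_streaks(p_game, games_per_season, seasons, seed=0):
    """Count streak lengths over many simulated seasons of one hypothetical
    hitter whose chance of a hit in each game is p_game, independently."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(seasons):
        season = [rng.random() < p_game for _ in range(games_per_season)]
        tally.update(streak_lengths(season))
    return tally

tally = tally_streaks(p_game=0.8, games_per_season=154, seasons=1000)
print(sum(n for length, n in tally.items() if length >= 20))  # streaks of 20+ games
```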
• Bogus (Score:2, Insightful)

on Sunday March 30, 2008 @04:26PM (#22915166)
Shouldn't we say that the probability of it happening was 1.0, because it did happen?

It seems to me that if their experiments report anything else, then either their models are inaccurate or they got something else wrong.
• Re:If it's so likely, then why hasn't it happened? (Score:5, Insightful)

on Sunday March 30, 2008 @04:38PM (#22915254)
Because baseball players aren't dice?
• Re:If it's so likely, then why hasn't it happened? (Score:1, Insightful)

by Anonymous Coward on Sunday March 30, 2008 @04:39PM (#22915262)
While it might be true that statistics is not an appropriate technique to study baseball, none of the things you mention are evidence against it being useful. All of the factors you mention influence a player's batting average, and the hypothesis they are using is that once you know the batting average you can calculate a set of possible histories of hits for that player, with the right statistical weight. They are assuming that the probability of a batter getting a hit in any game is uncorrelated with his performance in previous games.
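A minimal Python sketch of the independence assumption described above: if every at-bat is an independent trial that succeeds with probability equal to the season batting average, the chance of at least one hit in a game with n at-bats is 1 - (1 - avg)^n. This is an illustration of the idea, not necessarily the researchers' exact model; the .350 average and 4 at-bats are made-up inputs and the function names are hypothetical.

```python
def p_hit_in_game(batting_avg: float, at_bats: int) -> float:
    """Probability of at least one hit in a game, treating every at-bat as an
    independent trial that succeeds with probability batting_avg."""
    return 1.0 - (1.0 - batting_avg) ** at_bats

def p_streak(batting_avg: float, at_bats: int, games: int) -> float:
    """Probability of a hit in each of `games` specific consecutive games,
    assuming games are independent of one another."""
    return p_hit_in_game(batting_avg, at_bats) ** games

print(round(p_hit_in_game(0.350, 4), 4))   # 0.8215
print(p_streak(0.350, 4, 56))              # one particular 56-game window
```

Note that p_streak covers one fixed window of games; a season or career offers many possible starting points, which is why the chance of such a streak happening somewhere is far higher.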

The answer to the question in the subject line is "It has happened." They're not claiming that in any given year there is likely to be a 56-game or longer hitting streak. What they calculated was the probability of the longest streak in the entire history of baseball having a particular value, and they found that the most likely longest streak is 51 games, about what is observed in the universe we live in.
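A stripped-down Python version of the kind of calculation being described: replay every player-season of a hypothetical league many times, keep the longest streak from each replay, and look at the spread of those record values. This is not the researchers' code; it assumes a fixed per-game hit chance for each player-season, ignores streaks that carry over between seasons, and uses made-up league numbers and function names.

```python
import random

def longest_streak(games):
    """Length of the longest run of consecutive games with a hit (True)."""
    best = run = 0
    for hit in games:
        run = run + 1 if hit else 0
        best = max(best, run)
    return best

def simulate_history(player_seasons, rng):
    """Replay every (p_game, games) player-season once and return the longest
    streak produced anywhere in this alternate history."""
    best = 0
    for p_game, games in player_seasons:
        season = [rng.random() < p_game for _ in range(games)]
        best = max(best, longest_streak(season))
    return best

def longest_streak_distribution(player_seasons, n_histories, seed=0):
    """Longest streak from each of n_histories independent replays."""
    rng = random.Random(seed)
    return [simulate_history(player_seasons, rng) for _ in range(n_histories)]

# Hypothetical league: 300 player-seasons of 150 games, with per-game hit
# chances drawn between 0.65 and 0.85 (illustrative numbers only).
league_rng = random.Random(42)
league = [(league_rng.uniform(0.65, 0.85), 150) for _ in range(300)]
results = longest_streak_distribution(league, n_histories=50)
print(min(results), max(results))
```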

• So basically... (Score:2, Insightful)

on Sunday March 30, 2008 @04:43PM (#22915300)
They took a bunch of measured statistics, ran a simulation with outcomes biased using said statistics, and then acted surprised when the simulation results ended up pretty close to what actually happened?
• What about slumps? (Score:3, Insightful)

on Sunday March 30, 2008 @05:02PM (#22915438)

By assuming the hitter's probability of getting a hit is equal to his season average, the researchers don't take into account that most, if not all, batters have a higher batting average at some points in the season than at others. As someone with experience in Monte Carlo simulations, I know that taking that into account would complicate the analysis considerably, but I suspect their results would be a bit different if they did something even as simple as using a 10-game moving average of the batter's average.
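The moving-average idea might look something like this in Python. It is a sketch under assumptions of ours, not anything from the study: the window covers only the games before the current one, games with no prior at-bats fall back to the season average, at-bats within a game are still treated as independent, and the function names and game log are hypothetical.

```python
def rolling_batting_avg(hits, at_bats, window=10):
    """Batting average over the `window` games before each game (the game
    itself is excluded). With no prior at-bats, fall back to the season average."""
    season_avg = sum(hits) / sum(at_bats)
    avgs = []
    for i in range(len(hits)):
        start = max(0, i - window)
        h, ab = sum(hits[start:i]), sum(at_bats[start:i])
        avgs.append(h / ab if ab else season_avg)
    return avgs

def game_hit_probs(hits, at_bats, window=10):
    """Per-game chance of at least one hit, using the rolling average as the
    per-at-bat success chance; at-bats within a game are still independent."""
    avgs = rolling_batting_avg(hits, at_bats, window)
    return [1.0 - (1.0 - avg) ** ab for avg, ab in zip(avgs, at_bats)]

# Hypothetical box-score line: hits and at-bats for 12 games.
hits = [2, 0, 1, 1, 3, 0, 0, 1, 2, 1, 0, 2]
at_bats = [4, 4, 3, 4, 5, 4, 3, 4, 4, 4, 4, 4]
print([round(p, 3) for p in game_hit_probs(hits, at_bats)])
```

These per-game probabilities could then replace the single season-average probability in a streak simulation.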

• Re:If it's so likely, then why hasn't it happened? (Score:3, Insightful)

<[joeXbanks] [at] [hotmail.com]> on Sunday March 30, 2008 @05:19PM (#22915558)
I wish my mod points hadn't just expired, because you just summed it up perfectly. Silly study with no basis in reality.

In other news, I've just started a fund of stocks that are held and traded based on historical data. If you invest in it, I guarantee a large return, because complex systems that rely heavily on myriad human variables are of course determined entirely by statistics.
• Re:If it's so likely, then why hasn't it happened? (Score:5, Insightful)

on Sunday March 30, 2008 @07:47PM (#22916558)

:I wish my mod points hadn't just expired, because you just summed it up perfectly.

It seems perfectly reasonable to me to take a set of data and try to model how likely a particular outcome is. That's a valid question, and one a statistical model can answer. The model may be flawed or need improvement, but that doesn't mean the question can't be answered by science.

:If you invest in it, I guarantee a large return, because complex systems that rely heavily on myriad human variables are of course determined entirely by statistics.

This is simply an invalid analogy. The article isn't claiming it can predict the future (or even the past!) from a statistical model. All it's asking is "just how likely was it for DiMaggio to get his streak, given past performance?"
• Re:If it's so likely, then why hasn't it happened? (Score:5, Insightful)

on Sunday March 30, 2008 @10:04PM (#22917376)
So what does that have to do with the study? Statistics applies to a lot more than dice. No offense, but your observation sounds like one of those cute but irrelevant observations that just add noise.
• The Monroe Factor (Score:3, Insightful)

on Sunday March 30, 2008 @10:49PM (#22917632)
Okay, okay, but what are the odds that Joe DiMaggio would have such a streak and land Marilyn Monroe? Somebody needs to get on that simulation ASAP. Here are my statistics, by the way...
• Re:If it's so likely, then why hasn't it happened? (Score:1, Insightful)

by Anonymous Coward on Sunday March 30, 2008 @10:53PM (#22917656)
Wouldn't it be easier, and more efficient, to calculate the probabilities directly from the distribution? Why the simulations?
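For the simple independent-games model, the direct calculation is indeed easy: a small dynamic program over the length of the current streak gives the exact probability of a streak of k or more games, and if player-seasons are independent, the chance that nobody anywhere reaches k is just the product of the individual complements. A Python sketch, assuming a constant per-game hit chance p within each player-season (function names are ours):

```python
from math import prod

def prob_streak_at_least(p, n, k):
    """Exact chance that n independent games, each with hit probability p,
    contain a hitting streak of k or more games.

    state[j] is the probability that the current streak has length j (< k)
    and no streak of length k has happened yet.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    state = [0.0] * k
    state[0] = 1.0
    done = 0.0                      # probability a k-game streak already happened
    for _ in range(n):
        new = [0.0] * k
        new[0] = (1.0 - p) * sum(state)   # a hitless game resets the streak
        for j in range(k - 1):
            new[j + 1] = p * state[j]     # a hit extends the streak by one
        done += p * state[k - 1]          # a hit turns a (k-1)-streak into k
        state = new
    return done

def prob_record_at_least(player_seasons, k):
    """Chance that at least one of the (p, n) player-seasons contains a
    k-game streak, treating player-seasons as independent of each other."""
    return 1.0 - prod(1.0 - prob_streak_at_least(p, n, k) for p, n in player_seasons)

print(prob_streak_at_least(0.8, 154, 56))
```

Simulation is arguably still the easier route once you want the per-game chance to vary game by game (at-bats, opposing pitcher, slumps), which this sketch does not attempt.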
• Re:If it's so likely, then why hasn't it happened? (Score:3, Insightful)

on Monday March 31, 2008 @06:22AM (#22919594)

:You might be able to model some long-term behavior that way.
Like the probability of hitting streaks over the lifespan of baseball? That's pretty much what they are doing.
• Re:If it's so likely, then why hasn't it happened? (Score:2, Insightful)

on Monday March 31, 2008 @06:42AM (#22919674)

But the point they were trying to make is that the statistics aren't valid. The guys writing the paper modeled hits as independent random events. Baseball hits may or may not be random, but the key thing is that they are not independent.

Players suffer from pressure because of streaks. A player who goes several games without a hit is under a huge amount of pressure to hit, and his form may suffer. A player who hits for a few games is likely to have high confidence and keep hitting. However, if he keeps hitting for a few games more, everyone starts talking about it and he gets asked about it in interviews. Many players crack under that sort of extra pressure. DiMaggio's streak is great because he overcame that. Sportsmen in all sorts of sports are heard to make comments such as "I'm just focussing on hitting/scoring" or "I'm taking one game at a time", because they're trying to avoid that sort of pressure.

I'm sure the guys who wrote the article know about the gambler's fallacy and would be quick to point out someone else's mistake: e.g., the gambler who bets extra money on a six because no six has come up in the last 20 rolls and 'a six must be due'. Of course, a six is no more likely than on any other roll, because each roll is random and independent of every other roll. In the case of sporting streaks, games are not independent, so their argument is just as flawed and invalid as the gambler's fallacy.
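The dice half of this argument is easy to check with a quick Python simulation: estimate the chance of a six on rolls that follow a run of non-sixes. For a fair, independent die it stays near 1/6 however long the drought. The function name is ours; running the same conditional check on real game logs is one way to test whether hits behave like dice.

```python
import random

def p_six_after_drought(n_rolls, drought, seed=0):
    """Estimate P(six | the previous `drought` rolls had no six) for a fair die."""
    rng = random.Random(seed)
    since_six = 0            # consecutive non-six rolls just before this one
    sixes = trials = 0
    for _ in range(n_rolls):
        roll = rng.randint(1, 6)
        if since_six >= drought:
            trials += 1
            sixes += (roll == 6)
        since_six = 0 if roll == 6 else since_six + 1
    return sixes / trials if trials else float("nan")

print(p_six_after_drought(300_000, drought=5))    # close to 1/6
print(p_six_after_drought(300_000, drought=20))   # also near 1/6, but noisier
```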

• Re:If it's so likely, then why hasn't it happened? (Score:5, Insightful)

on Monday March 31, 2008 @09:29AM (#22920866)

:Baseball hits may or may not be random, but the key thing is that they are not independent.

Announcers and actual players to the contrary, they actually are remarkably independent. Studies have been done. The one to start with is: Albright, S. C. (1993), "A statistical analysis of hitting streaks in baseball," Journal of the American Statistical Association, 88, 1175-1183. He shows that there is almost no evidence that hitting streaks are anything other than statistical noise.
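Albright's paper uses its own methods; as a much simpler illustration of the same question, here is a permutation test in Python: shuffle a player's game-by-game results and ask how often chance alone produces a longest streak at least as long as the real one. The function names and the game log are hypothetical.

```python
import random

def longest_run(outcomes):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for hit in outcomes:
        run = run + 1 if hit else 0
        best = max(best, run)
    return best

def streakiness_p_value(outcomes, n_shuffles=10_000, seed=0):
    """Permutation test: how often does a random reordering of the same games
    produce a longest hitting streak at least as long as the observed one?"""
    rng = random.Random(seed)
    observed = longest_run(outcomes)
    shuffled = list(outcomes)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if longest_run(shuffled) >= observed:
            extreme += 1
    return (extreme + 1) / (n_shuffles + 1)

# Hypothetical 154-game log (True = at least one hit that game).
log_rng = random.Random(1)
season = [log_rng.random() < 0.8 for _ in range(154)]
print(streakiness_p_value(season))
```

Shuffling keeps the number of games with a hit fixed, so the test looks only at ordering; a small p-value would hint at streakiness beyond chance for that one log.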

• Re:If it's so likely, then why hasn't it happened? (Score:5, Insightful)

on Monday March 31, 2008 @09:45AM (#22921030)

:::The early years tended to be batting competitions (in some ways like today's) rather than pitching competitions

::If by "early years", you mean 1920 and later, yeah.

:Otherwise, buddy, you're way off base.

The only one off base is you -- check your own link (baseball-reference.com is an amazing site and I recommend it to anyone) and pay extra attention to the 1890s. In the years immediately after the pitcher's mound was moved back to its current 60 feet 6 inches, offensive totals soared far beyond what we're used to seeing. Given that you're familiar with the lowering of the mound for 1969, I'm surprised that you're not familiar with when it was fixed at its current distance.

The article even mentions that the record was most likely to have been set in 1894, when the National League ERA was well over 5.00 and there were 11.6 hits per team per game, over 20% more than we see now.

:Look at those ERAs pre-1920. Before 1920, the ERA in the NL never significantly exceeded 3.00.

I'm looking at them. The "5.32" for 1894, which is somewhat more than three, is particularly striking.

:After 1920, it never dropped below 3.3 or so, with the exception of a 2.99 in 1968, after which MLB made changes to the rules, amongst them lowering the acceptable height of the pitcher's mound.

...

:You need to research "dead ball era", and the response by baseball to "Black Sox". (Hint: just like the response to the 1994 strike, it involves the ball...)

While he's doing this, perhaps you could research what came before the dead ball era: namely, the high-offense 1890s. Teams were taken off guard by the increase in the pitching distance and continued to play an 1880s game in a new environment. It took several seasons for adjustments, such as four-man pitching rotations and the occasional use of relief pitchers, to balance the sudden advantage that had been given to the batters. It is not surprising that 1894 would be the year in which a long hitting streak would have been most likely -- the single-season record for runs scored, 194 by Billy Hamilton, was set that year and still stands today.

:The fact that you got a +5 out of such a demonstrably incorrect post is a major indictment of the baseball knowledge of the Slashdot faithful.

No, Martin is right -- the 1890s, while not as famous as Ruth and Gehrig's 1930s, were one of the most offensive eras in baseball. His simple analysis is much more forgivable than the insults you throw his way even while being completely ignorant of an entire decade of baseball history, the data from which are right on the web page you so callously direct him to visit.

The bigger the theory the better.
