Math Entertainment Games

Alternate Baseball Universes 229

Posted by kdawson
from the say-it-ain't-so-joe dept.
Jamie found a NYTimes op-ed by a grad student and a professor from Cornell, outlining some research they did into alternate baseball universes. The goal was to find out how unlikely in fact was Joe DiMaggio's 56-game hitting streak, played out in the 1941 season. No one since has even come close to that record. The math guys ran simulations of the entire history of baseball from 1885 on — 10,000 of them. For each simulation they put each player up to the plate for each at-bat in each game in each year, just like it happened; and they rolled the dice on him, based on his actual hitting stats for that season. (Their algorithm sounds far simpler than whatever the Strat-O-Matic guys use.) The result: Joltin' Joe's record is not merely likely, it's basically a sure thing. Every alternate universe produced a streak of 39 games or better; one reached 109 games. Joe DiMaggio was not the likeliest player in the history of the game to accomplish the record, not by a long shot.
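A minimal sketch of the kind of dice-rolling simulation the summary describes, not the researchers' actual code. It assumes every at-bat is an independent trial whose hit probability equals the player's season batting average. The function names and the at-bats-per-game input are illustrative, not taken from the op-ed.

```python
import random

def longest_streak(game_results):
    """Return the longest run of consecutive games with a hit."""
    best = current = 0
    for got_hit in game_results:
        current = current + 1 if got_hit else 0
        best = max(best, current)
    return best

def simulate_season(batting_avg, at_bats_per_game, rng):
    """Roll the dice on each at-bat; a game counts if any at-bat is a hit."""
    return [
        any(rng.random() < batting_avg for _ in range(at_bats))
        for at_bats in at_bats_per_game
    ]

def longest_streak_distribution(batting_avg, at_bats_per_game, trials, seed=0):
    """Replay one player-season `trials` times and collect the longest streaks."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    return [
        longest_streak(simulate_season(batting_avg, at_bats_per_game, rng))
        for _ in range(trials)
    ]
```

Feeding in a real player's game-by-game at-bat counts and season average, then repeating across every player-season, would give one "alternate universe" per trial. The longest streak in each universe is the quantity the summary reports.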

  • by quanticle (843097) on Sunday March 30, 2008 @05:10PM (#22915004) Homepage

    I know the statisticians among you are going to bash me with a cluestick for such a naive question, but I'll ask anyway - if this event is so likely to occur, then why hasn't it happened again?

  • by Martin Blank (154261) on Sunday March 30, 2008 @05:16PM (#22915062) Journal
    It was likely to occur early in the history of baseball, and fell off dramatically after the 1930s. The early years tended to be batting competitions (in some ways like today's) rather than pitching competitions, and a pitcher's repertoire was limited to about a half-dozen pitches, plus whatever grease, oil, jelly, file, sandpaper, thumbtack, or razor blade he could conceal.
  • Re:Nerves (Score:3, Interesting)

    by p0tat03 (985078) on Sunday March 30, 2008 @05:20PM (#22915096)
    That kind of error can be accounted for by tracking their batting averages over time. If we have a model for batting average deterioration due to stress, then the simulation will still work as a good approximation.
  • by kingmundi (54911) on Sunday March 30, 2008 @05:24PM (#22915144)
    One of the key points mentioned in this article is when the hitting streak occurs. They mention that it was much more likely to occur during the early 1900s, which is known as the dead-ball era. The baseball wasn't as springy, and they tended to use the same ball during the entire game. During that time it was more efficient to try to knock the ball through the holes between the fielders and get a single or double than to try to hit it out of the park.

    I think it would be more impressive to take a subset of the data and compare from 1930 up until the present. Of course, there have been other major changes too: glove sizes, the introduction of the slider, steroid use.
  • Re:Bogus (Score:4, Interesting)

    by Miseph (979059) on Sunday March 30, 2008 @05:39PM (#22915270) Journal
    No, because the probability for ANYTHING, given enough chances, is 1.

    What they are actually saying is that reality appears to follow a probability bell curve.

    You could also say that, in 1,230,000 years of baseball games, we could be almost certain of a hitting streak longer than 56 games.
  • Re:So basically... (Score:3, Interesting)

    by kevinatilusa (620125) <kcostellNO@SPAMgmail.com> on Sunday March 30, 2008 @05:51PM (#22915364)

    They took a bunch of measured statistics, ran a simulation with outcomes biased using said statistics, and then acted surprised when the simulation results ended up pretty close to what actually happened?
    I think their point was that they took a set of numbers that were generally considered unremarkable (the overall statistical distribution of batting totals from the last 100+ years) and tried to show that a number that most people considered very unusual (the 56 game streak) was in fact also typical given this other, "unremarkable" set of data.
  • by hedwards (940851) on Sunday March 30, 2008 @05:52PM (#22915370)

    While it might be true that statistics is not an appropriate technique to study baseball, none of the things you mention are evidence against it being useful. All of the factors you mention influence a player's batting average, and the hypothesis they are using is that once you know the batting average you can calculate a set of possible histories of hits for that player, with the right statistical weight. They are assuming that the probability of a batter getting a hit in any game is uncorrelated with his performance in previous games.
    Actually, they are reasons why statistical analysis is not appropriate in this instance.

    Statistical analysis isn't inappropriate in terms of studying baseball, it is just inappropriate to use it in this manner.

    What you are suggesting is a good example of the gambler's fallacy. And it breaks down in this case for the reasons that I mentioned: the underlying conditions in which those batting averages were collected have changed in such a way that they no longer accurately reflect the present conditions.

    The GP was asking if the occurrence is that common, why hasn't it happened since, and the answer I gave was that there was a fundamental change in the way that the game is played which changed who has the advantage. It's similar to why nobody has had a .400+ season since 1941.
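The independence assumption quoted at the top of this comment can be written down directly. This is a rough sketch, not the study's method: it assumes a fixed number of at-bats per game and a constant batting average, and both function names are made up for illustration.

```python
def per_game_hit_probability(batting_avg, at_bats):
    """Chance of at least one hit in a game of independent at-bats."""
    return 1.0 - (1.0 - batting_avg) ** at_bats

def streak_probability(batting_avg, at_bats, streak_length):
    """Chance that a given run of `streak_length` games all contain a hit,
    assuming each game is uncorrelated with the games before it."""
    return per_game_hit_probability(batting_avg, at_bats) ** streak_length
```

If conditions change between eras, as argued above, the batting average passed in no longer describes the games being modeled, and the second function inherits that error in every factor of the product.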
  • Re:Nerves (Score:4, Interesting)

    by Kjella (173770) on Sunday March 30, 2008 @05:55PM (#22915390) Homepage

    In any case, what you are talking about would affect all players equally, therefore it would cancel itself out in their research.
    Not when they use it the way they use it, and say streaks of 39 to 109 are to be expected. If the difficulty increases with the length of the streak, 56 could be a far more exceptional streak than their research indicates.
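To make that objection concrete, here is a toy variant in which the chance of a hit in the next game drops as the current streak grows. The linear `penalty` is purely hypothetical and is not measured from any data. It only shows how such an effect would enter a simulation.

```python
import random

def longest_streak_with_pressure(base_game_prob, games, penalty, rng):
    """Longest hitting streak when each game already in the streak
    lowers the chance of a hit in the next game by `penalty`."""
    best = current = 0
    for _ in range(games):
        # Hypothetical pressure model: probability falls linearly, floored at 0.
        p = max(0.0, base_game_prob - penalty * current)
        if rng.random() < p:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best
```

With `penalty` set to 0 this reduces to the independent model. Any positive value makes long streaks rarer than the independent model predicts, which is the direction of the argument above.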
  • Re:too simplistic (Score:3, Interesting)

    by DannyO152 (544940) on Sunday March 30, 2008 @07:23PM (#22915982)

    On the other hand, one doesn't get the benefit of running into the belly-itchers. My feeling is that, on average, the superstars, the ones with career averages above .340, generally feasted on the mediocre to minor pitchers.

    What this study doesn't take into account is how long it takes to live through a streak. DiMaggio needed two months. Besides the strain of day-to-day playing (and if it's a pennant race, you know the hot hitter is going to be in the lineup), there's also the way the weather and the light change during the season. There used to be more day games and double-headers back in the 30s-40s-50s when batting averages were highest. Travel was by train and by bus and took longer. There seems to be a week every season when a cold or flu is making the rounds of the club. Then there are situational issues. 7th inning and behind, man on second base, the hitter is 0-3 and 30 games into the streak. I say the pitcher semi-intentionally walks the batter and amid a chorus of boos the streak goes poof. Here's another consideration: the opposing players and pitchers know the hitter has a streak when it gets past 20 games, so the pitching gets a bit more careful and the batter has to extend the streak via pitchers' mistakes, and that makes it less likely.

    If what I say is true, it should follow that the incidence of any hitting streak longer than 15 games in an MLB season should be lower than the probability suggested by the league batting averages (which are depressed in the NL by pitchers and the other bottom 4 from the lineup).

  • by Anonymous Coward on Sunday March 30, 2008 @07:34PM (#22916064)

    The early years tended to be batting competitions (in some ways like today's) rather than pitching competitions
    If by "early years", you mean 1920 and later, yeah.

    Otherwise, buddy, you're way off base.

    NL year-by-year stats. [baseball-reference.com]

    Look at those ERAs pre-1920. Before 1920, the ERA in the NL never significantly exceeded 3.00. After 1920, it never dropped below 3.3 or so, with the exception of a 2.99 in 1968, after which MLB made changes to the rules, amongst them lowering the acceptable height of the pitcher's mound.

    The time prior to 1920 was marked by pitchers such as Cy Young, Mordecai Brown, Walter Johnson, Ed Walsh, Christy Mathewson. You've probably heard of most of them.

    Here are the single-season MLB ERA leaders. [baseball-reference.com] Outside of Bob Gibson in the aforementioned 1968, you have to go all the way to Greg Maddux in 1994 at #48 all time to find a season after 1920 on the list. Barely 10 of the 100 lowest single-season ERAs in MLB history occurred after 1920. And that's only because Pedro Martinez in 2000 and Ron Guidry in 1978 tied with 9 others for #100 on the list. So only 8 of the best single-season ERAs happened after 1920.

    You need to research "dead ball era", and the response by baseball to "Black Sox". (Hint: just like the response to the 1994 strike, it involves the ball...)

    The fact that you got a +5 out of such a demonstrably incorrect post is a major indictment of the baseball knowledge of the Slashdot faithful.
  • by jocknerd (29758) on Sunday March 30, 2008 @08:48PM (#22916562)
    After the streak ended, he started a new 16 game hitting streak. That means he hit safely in 72 of 73 games.

    During the streak Joe DiMaggio had a batting average of .408, a slugging average of .717, he faced four (4) future hall of fame pitchers, and he played in the 1941 All-Star Game (he went one-for-four, scored a run, and drove in a run). Source is http://www.baseball-almanac.com/feats/feats3.shtml [baseball-almanac.com]

    During Joe DiMaggio's streak, Ted Williams actually had a higher batting average. Williams batted .412 and finished with a .406 average for the year.

    Joe DiMaggio had a 61 game hitting streak while playing for the San Francisco Seals in the Pacific Coast League in 1933.
  • by ByteSlicer (735276) on Monday March 31, 2008 @04:28AM (#22918976)
    Well, they modeled the batter using random numbers and their player stats. The problem is that real people don't behave deterministically. They might hit better on their birthday, or when it was a clear sky the night before the game, just because they believe that (baseball players are extremely superstitious). The model doesn't take into account that some player might get psyched out by a certain number (and always screw up on the 13th consecutive hit), or just by the pressure of wanting to break the record. It doesn't take into account the pitcher, weather conditions, and a lot of other things that matter to real people but not to computers.

    You might be able to model some long term behavior that way, but never the short term stuff, because the model is too simplified (man versus dice).
