
Alternate Baseball Universes 229

Posted by kdawson
from the say-it-ain't-so-joe dept.
Jamie found a NYTimes op-ed by a grad student and a professor from Cornell, outlining some research they did into alternate baseball universes. The goal was to find out how unlikely in fact was Joe DiMaggio's 56-game hitting streak, played out in the 1941 season. No one since has even come close to that record. The math guys ran simulations of the entire history of baseball from 1885 on — 10,000 of them. For each simulation they put each player up to the plate for each at-bat in each game in each year, just like it happened; and they rolled the dice on him, based on his actual hitting stats for that season. (Their algorithm sounds far simpler than whatever the Strat-O-Matic guys use.) The result: Joltin' Joe's record is not merely likely, it's basically a sure thing. Every alternate universe produced a streak of 39 games or better; one reached 109 games. Joe DiMaggio was not the likeliest player in the history of the game to accomplish the record, not by a long shot.
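The dice-rolling idea in the summary can be sketched in a few lines. This is a minimal sketch, not the researchers' code: it follows one hypothetical player through one season at a time rather than every player in baseball history, and the function names, the fixed 4 at-bats per game, the .357 average, the 139 games, and the seed are all illustrative assumptions.

```python
import random

def longest_hit_streak(p_hit, at_bats_per_game, games, rng):
    """Simulate one season; return the longest run of games with at least one hit."""
    longest = current = 0
    for _ in range(games):
        # A game extends the streak if any at-bat produces a hit.
        got_hit = any(rng.random() < p_hit for _ in range(at_bats_per_game))
        current = current + 1 if got_hit else 0
        longest = max(longest, current)
    return longest

def simulate_seasons(p_hit, at_bats_per_game, games, n_seasons, seed=0):
    """Return the longest streak from each of n_seasons independent simulated seasons."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    return [longest_hit_streak(p_hit, at_bats_per_game, games, rng)
            for _ in range(n_seasons)]

if __name__ == "__main__":
    # Hypothetical player: .357 average, 4 at-bats a game, 139 games.
    streaks = simulate_seasons(p_hit=0.357, at_bats_per_game=4,
                               games=139, n_seasons=2_000)
    print("longest streak seen:", max(streaks))
    print("share of seasons with a 30+ game streak:",
          sum(s >= 30 for s in streaks) / len(streaks))
```

Treating every at-bat as an independent roll with the same probability is exactly the simplification several commenters below object to.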

  • by quanticle (843097) on Sunday March 30, 2008 @05:10PM (#22915004) Homepage

    I know the statisticians among you are going to bash me with a cluestick for such a naive question, but I'll ask anyway - if this event is so likely to occur, then why hasn't it happened again?

    • by Anonymous Coward on Sunday March 30, 2008 @05:12PM (#22915018)
      Clearly they aren't factoring in the stress and nerves the average ballplayer would be dealing with as they got closer to the mark.
    • Re: (Score:2, Interesting)

      by Martin Blank (154261)
      It was likely to occur early in the history of baseball, and fell off dramatically after the 1930s. The early years tended to be batting competitions (in some ways like today's) rather than pitching competitions, and a pitcher's repertoire was limited to about a half-dozen pitches, plus whatever grease, oil, jelly, file, sandpaper, thumbtack, or razor blade he could conceal.
      • by Anonymous Coward on Sunday March 30, 2008 @07:34PM (#22916064)

        The early years tended to be batting competitions (in some ways like today's) rather than pitching competitions
        If by "early years", you mean 1920 and later, yeah.

        Otherwise, buddy, you're way off base.

        NL year-by-year stats. [baseball-reference.com]

        Look at those ERAs pre-1920. Before 1920, the ERA in the NL never significantly exceeded 3.00. After 1920, it never dropped below 3.3 or so, with the exception of a 2.99 in 1968, after which MLB made changes to the rules, amongst them lowering the acceptable height of the pitcher's mound.

        The time prior to 1920 was marked by pitchers such as Cy Young, Mordecai Brown, Walter Johnson, Ed Walsh, Christy Mathewson. You've probably heard of most of them.

        Here are the single-season MLB ERA leaders. [baseball-reference.com] Outside of Bob Gibson in the aforementioned 1968, you have to go all the way to Greg Maddux in 1994 at #48 all time to find a season after 1920 on the list. Barely 10 of the 100 lowest single-season ERAs in MLB history occurred after 1920. And that's only because Pedro Martinez in 2000 and Ron Guidry in 1978 tied with 9 others for #100 on the list. So only 8 of the best single-season ERAs happened after 1920.

        You need to research "dead ball era", and the response by baseball to "Black Sox". (Hint: just like the response to the 1994 strike, it involves the ball...)

        The fact that you got a +5 out of such a demonstrably incorrect post is a major indictment of the baseball knowledge of the Slashdot faithful.
        • Most geeks find baseball deathly boring. In fact some think themselves morally superior because they don't watch competitive sports, which is of course a very stupid attitude.

          What's sad is the fact that baseball is the ULTIMATE geek sport due to sheer volume of stats it produces. I adore baseball. The SPORT of baseball. I don't actually watch MLB, but thanks to the magic of the internets I can watch Nippon League etc... Plus I enjoy looking at historic stuff, plus I have a variety of baseball sims. In fact
        • by Heian-794 (834234) on Monday March 31, 2008 @10:45AM (#22921030) Homepage

          :::The early years tended to be batting competitions (in some ways like today's) rather than pitching competitions

          ::If by "early years", you mean 1920 and later, yeah.

          :Otherwise, buddy, you're way off base.

          The only one off base is yourself -- check your own link (baseball-reference.com is an amazing site and I recommend it to anyone) and pay extra attention to the 1890s. In the years immediately after the pitcher's mound was moved back to its current 60 feet 6 inches, offensive totals soared far beyond what we're used to seeing. Given that you're familiar with the lowering of the mound for 1969, I'm surprised that you're not familiar with when it was fixed at its current distance.

          The article even mentions that the record was most likely to have been set in 1894, when the National League ERA was well over 5.00, and there were 11.6 hits per team per game, more than 20% more than we see now.

          Look at those ERAs pre-1920. Before 1920, the ERA in the NL never significantly exceeded 3.00.

          I'm looking at them. The "5.32" for 1894, which is somewhat more than three, is particularly striking.

          After 1920, it never dropped below 3.3 or so, with the exception of a 2.99 in 1968, after which MLB made changes to the rules, amongst them lowering the acceptable height of the pitcher's mound.

          ...

          You need to research "dead ball era", and the response by baseball to "Black Sox". (Hint: just like the response to the 1994 strike, it involves the ball...)

          While he's doing this, perhaps you could research what came before the dead ball era: namely, the high-offense 1890s. Teams were taken off guard by the increase in the pitching distance and continued to play an 1880s game in a new environment. It took several seasons for adjustments, such as four-man pitching rotations and the occasional use of relief pitchers, to balance the sudden advantage that had been given to the batters. It is not surprising that 1894 would be the year in which a long hitting streak would have been most likely -- the single-season record for runs scored, 194 by Billy Hamilton, was set that year and still stands today.

          The fact that you got a +5 out of such a demonstrably incorrect post is a major indictment of the baseball knowledge of the Slashdot faithful.

          No, Martin is right -- the 1890s, while not as famous as Ruth and Gehrig's 1930s, were one of the most offensive eras in baseball. His simple analysis is much more forgivable than the insults you throw his way even while being completely ignorant of an entire decade of baseball history, the data from which are right on the web page you so callously direct him to visit.

    • by hedwards (940851) on Sunday March 30, 2008 @05:18PM (#22915080)
      The most likely reason is that statistics isn't the appropriate method by which to study this problem.

      This sort of study is really more about curiosity; it doesn't deal with things like changes to the way in which the game is played. For instance, early on, and for quite a while later, it was common for a pitcher to pitch 9 innings every game, and in many cases to pitch both games of a doubleheader. That meant more opportunity for errors, and since batters get time to rest up, that style of play gave the batter a bit of an edge that doesn't exist today.

      That also doesn't include the variety of pitching which players see today or the fact that a player might get to see 3 different pitchers in a single game.

      Even the length of the season has an effect on how players play. None of those things are easily quantified, much less analyzed by statisticians.
    • by ByteSlicer (735276) on Sunday March 30, 2008 @05:38PM (#22915254)
      Because baseball players aren't dice?
      • Re: (Score:3, Insightful)

        I wish my mod points hadn't just expired, because you just summed it up perfectly. Silly study with no basis in reality.

        In other news, I've just started a fund of stocks that are held and traded based on historical data. If you invest in it, I guarantee a large return, because complex systems that rely heavily on myriad human variables are of course determined entirely by statistics.
        • by Vellmont (569020) on Sunday March 30, 2008 @08:47PM (#22916558)

          I wish my mod points hadn't just expired, because you just summed it up perfectly.

          Really? For the purposes of this article, why?

          It seems perfectly reasonable to me to take a set of data and try to model how likely a particular outcome is. That's a very valid question to ask, and one that a statistical model can answer. The model may be flawed, need improvement, or whatever, but that doesn't mean the question can't be answered by science.

          If you invest in it, I guarantee a large return, because complex systems that rely heavily on myriad human variables are of course determined entirely by statistics.

          This is simply an invalid analogy. The article isn't saying it can predict the future (or even the past!) based on a statistical model. All it's saying is "just how likely was it for DiMaggio to get his streak, given past performance".
      • The parent should be modded insightful, not just funny.
      • by Kamineko (851857) on Sunday March 30, 2008 @07:35PM (#22916076)
        Like Einstein said: "God does not play baseball!"

        I think.
      • by khallow (566160) on Sunday March 30, 2008 @11:04PM (#22917376)
        So what does that have to do with the study? Statistics applies to a lot more than dice. No offense, but your observation sounds like one of those cute but irrelevant observations that just add noise.
        • by ByteSlicer (735276) on Monday March 31, 2008 @04:28AM (#22918976)
          Well, they modeled the batter using random numbers and their player stats. The problem is that real people don't behave deterministically. They might hit better on their birthday, or when it was a clear sky the night before the game, just because they believe that (baseball players are extremely superstitious). The model doesn't take into account that some player might get psyched out by a certain number (and always screw up on the 13th consecutive hit), or just by the pressure of wanting to break the record. It doesn't take into account the pitcher, weather conditions, and a lot of other things that matter to real people but not to computers.

          You might be able to model some long term behavior that way, but never the short term stuff, because the model is too simplified (man versus dice).

          • Re: (Score:3, Insightful)

            by khallow (566160)

            You might be able to model some long term behavior that way
            Like the probability of hitting streaks over the lifespan of baseball? Pretty much what they are doing.
    • by Frequency Domain (601421) on Sunday March 30, 2008 @06:00PM (#22915432)
      No bashing, it's not a bad question. The answer is because it still qualifies as a "rare event". The thing that's kind of counter-intuitive, but easy to demonstrate, is that having a particular rare event happen is rare, but having some rare event happen is common.

      A good illustration of this is the so-called "birthday paradox", which asks what's the probability of having duplicate birthdays in a group of n people (whose birthdays are independent of each other). Think of adding the people to the room one by one. The first person doesn't have any chance of having a duplicate birthday, because there's nobody else in the room. The second person has 1/365 chances of duplicating, 364/365 of missing the first one. Let's follow up on the misses, they're easier to work with. In general, if we've got k people in the room without a duplicate, that means they've used up k of the 365 days in the year, and the next person we introduce to the room has to miss all of those days to avoid a duplication. So the probability of everybody missing everybody else, by the time we get up to n people in the room, is (365/365)*(364/365)*(363/365)*...*((365-n+1)/365), which starts diving towards zero really fast. The probability of having one or more duplicates is 1 - P(no duplicates), which correspondingly climbs to one really fast. If you write a short program to do the exact calculations, you'll find that by the time you have 23 people in the room the probability is greater than 0.5 of having a duplicate, and by the time you get 57 people it's greater than 0.99!

      If you pick one particular person and ask what's the probability of duplicating that birthday it remains quite small. That's the difference between having a particular rare event rather than having some rare event. For a large enough group, some pair of people will almost surely share a birthday but the odds of it being you (or any other designated person) remain quite small.

      Just to preserve my computing geek cred, this is why you need collision resolution for hashing algorithms. You don't know which entries will share hash values, but collisions are almost certain to happen by the time you've loaded 3 * sqrt(Hash Table Capacity) values, e.g., if your hash table has capacity 10000 you will almost surely see a duplicate within the first 300 entries.
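      The "short program" for the exact calculations might look like the sketch below, which also checks the 3 * sqrt(capacity) rule of thumb. The function names are made up for illustration, and it assumes birthdays (or hash values) are uniform and independent, as in the comment above.

```python
import math

def p_some_match(n, slots=365):
    """Exact probability that at least two of n items land in the same slot."""
    p_no_match = 1.0
    for k in range(n):
        # The (k+1)-th item must miss the k slots already taken.
        p_no_match *= (slots - k) / slots
    return 1.0 - p_no_match

def p_match_designated(n_others, slots=365):
    """Probability that at least one of n_others items lands in one designated slot."""
    return 1.0 - ((slots - 1) / slots) ** n_others

if __name__ == "__main__":
    for n in (23, 57):
        print(n, "people, some pair matches:", round(p_some_match(n), 4),
              "| someone matches you:", round(p_match_designated(n - 1), 4))
    capacity = 10_000
    n = 3 * math.isqrt(capacity)  # the 3 * sqrt(capacity) rule of thumb
    print(n, "entries in", capacity, "slots:", round(p_some_match(n, capacity), 4))
```

      The first function is the "some rare event" case and the second is the "particular rare event" case; the same first function covers hash collisions by changing the number of slots.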

      • by popmaker (570147)
        Indeed, when there are 23 people in the room, the probability is around 50.7% that some two of them will share a birthday. 23 is the smallest number such that the probability is more than 50%. Then you only need 41 people for the probability to be more than 90%. It is around 89% at n = 40.
  • Nerves (Score:5, Insightful)

    by digidave (259925) on Sunday March 30, 2008 @05:13PM (#22915028)
    This doesn't take into account that once a player achieves an impressive hit streak he gets more media attention, people start asking him about DiMaggio's record, and every time he steps up to the plate he's a bit more nervous about it than the last time, making it slightly less likely that he'll get a hit.
    • Re: (Score:3, Interesting)

      by p0tat03 (985078)
      That kind of error can be accounted for by tracking their batting averages over time. If we have a model for batting average deterioration due to stress, then the simulation will still work as a good approximation.
    • by EdIII (1114411) *
      That's like saying that a famous player would be more likely to be popular with the women and get the "clap" thereby depriving him of a few games.

      In any case, what you are talking about would affect all players equally, therefore it would cancel itself out in their research.
      • Re:Nerves (Score:4, Interesting)

        by Kjella (173770) on Sunday March 30, 2008 @05:55PM (#22915390) Homepage

        In any case, what you are talking about would affect all players equally, therefore it would cancel itself out in their research.
        Not when they use it the way they use it, and say streaks of 39 to 109 are to be expected. If the difficulty increases with the length of the streak, 56 could be a far more exceptional streak than their research indicates.
    • You know, for a lot of successful athletes stress either doesn't affect them much or it actually works the opposite way: it makes them more successful. These people are who we call "clutch performers". Of course, a lot of talented but non-clutch performers would be eliminated were this compensated for, but a lot of clutch performers would do better.
      • Re: (Score:2, Informative)

        by cleatsupkeep (1132585)
        I remember reading an article saying that "clutch performers" don't really exist - and that the reason we believe they do is because of the same biases that make us cling to our beliefs - taking note of something when it fits your belief and tossing it away when it disagrees.

        Wikipedia to the rescue ( http://en.wikipedia.org/wiki/Clutch_(sports) [wikipedia.org])

        Some sports analysts have presented evidence that while individual plays and moments may resonate as "clutch" because of their importance, there is no such thing

  • by morari (1080535) on Sunday March 30, 2008 @05:15PM (#22915054) Journal
    Talk about the statistics of anyone at bat..
    • by pchan- (118053) on Sunday March 30, 2008 @05:27PM (#22915168) Journal
      You don't understand. Baseball is so boring, the fans find the statistics exciting!
      • Re: (Score:3, Funny)

        by jd (1658)
        Do you have the statistics to prove that?
      • You just have to slow down. Put away the x-box and the crackberry. It has its own pace and ebb and flow. Drink a few cold ones, fire up the grill. A nice Saturday or Sunday relaxing watching a game is never wasted. Or go out to a park and see a real live game. Get out of your parent's basement and get some sun. Even better, call in to work for a mental health day and go to a park. If you don't have access to a major league park a minor league game can be fun too.

        In an ADD culture, a nice relaxing baseball ga
        • by sjames (1099)

          It's also a question of HOW you watch the game. It's a lot more interesting when you understand the deeper strategy and try to guess what the manager and players are thinking, look at how the play should have happened vs. how it did and why that might be. The slower way the game unfolds allows time to talk about it with friends.

          Of course, some of the perceived slowness is just due to not understanding what is quietly happening. The runner slowly stretching his lead, the pitcher deciding if a throw to firs

      • by jocknerd (29758)
        While I can't agree or disagree with you on whether you find baseball boring, it is still my favorite sport. While I'd rather watch a college football game than a baseball game on television, I can read about baseball. I can't read about football. I can study baseball.

        I find soccer extremely boring, but it's probably because I don't really know the strategy in the sport. I understand the strategy in baseball. While the game on the field goes a bit slow, the strategy that is taking place behind the scene
    • by garett_spencley (193892) on Sunday March 30, 2008 @05:29PM (#22915200) Journal
      I was once at a friend's BBQ and a lot of the other guests were really into sports and talking a lot about their various sporting events etc. I made a comment about how baseball was one of those sports that is fun to play but boring as hell to watch. One of the guys responded with, simply, "I disagree". To which I replied "You're right. It's pretty boring to play too." He wasn't very amused.

      Talk about a great way to make an awkward social event even more awkward :(
      • by rob1980 (941751)
        It's definitely one of those sports that is boring to watch on TV, at the very least. Going to a college or minor league game is a good way to spend an afternoon with the kids or some friends, as long as you eat beforehand so you don't talk yourself into paying $4 for a hot dog, $6.50 for a bag of peanuts, etc etc etc. I've heard hockey's the same way, but where I live it's pretty tough to see a live hockey game let alone one on TV.
        • by Boronx (228853)
          It's the ultimate sport: everyone's going about 5 times faster than they ought while wielding deadly weapons.
      • Actually, baseball is very exciting compared to cricket.
    • Involve baseball. There, fixed.

      Also, go Jays.
    • by krelian (525362)

      I love baseball. I view it as a large scale RPG, only I am neither the DM nor a player (when I want to actually play I fire up ootp [ootpdevelopments.com], which allows me to play several adventures in a short amount of time or even include some of the heroes from old lore in my adventure).

      I don't find baseball on T.V very interesting but it's a fun sport to follow up on

  • unfortunately, not many of my comments are insightful, so with my batting average, you will have to refer to a parallel universe

    there you will find that this comment contains something worthwhile reading. sorry
  • by kingmundi (54911) on Sunday March 30, 2008 @05:24PM (#22915144)
    One of the key points mentioned in this article is when the hitting streak occurs. They mention that it was much more likely to occur during the early 1900s, which is known as the deadball era. The baseball wasn't as springy and they tended to use the same ball during the entire game. During that time it was more efficient to try to knock the ball through the holes between the fielders and get a double or single than to try to hit it out of the park.

    I think it would be more impressive to take a subset of the data, and compare from 1930 up until the present. Of course, there have been other major changes too: glove sizes, the introduction of the slider as a pitch, steroid use.
  • too simplistic (Score:5, Insightful)

    by ndenissen (1201635) on Sunday March 30, 2008 @05:25PM (#22915154)
    From reading the article (which is light on the details) it seems like they used nothing but batting average, at bats, and games played.

    The problem is this doesn't control for variances in the quality of pitching. The chances of going that many games without running into a hot pitcher isn't accounted for.

    Imagine you average a 75% chance of getting a hit in any individual game. If you face three average pitchers, your chances are (.75)^3 but if you face a good pitcher an average pitcher and a bad pitcher it might be (.5)(.75)(1.0) which gives a different probability, despite the same average number of hits.

    In order to be realistic the calculation would need to account for the deviation from average in the ability of the pitchers (which would likely be higher 100 years ago because of fewer players and segregation, and now because of expansion, as compared to the 1950s).

    What they don't report is how often there are long (but not record) streaks in their model, so there is no way of knowing how accurately it reproduces reality.
    • Re: (Score:3, Interesting)

      by DannyO152 (544940)

      On the other hand, one doesn't get the benefit of running into the belly-itchers. My feeling is that, on average, the superstars, the ones with career averages above .340, generally feasted on the mediocre to minor pitchers.

      What this study doesn't take into account is how long it takes to live through a streak. DiMaggio needed two months. Besides the strain of day to day playing (and if it's a pennant race, you know the hot hitter is going to be in the lineup) there's also the way the weather and the light

  • by kevinatilusa (620125) <kcostell@gmail.cTIGERom minus cat> on Sunday March 30, 2008 @05:25PM (#22915156)
    From the descriptions I've seen of their research, it seems that they're treating all games identically for the purpose of determining a typical season's behavior. While this may be necessary to make the computation tractable, it's not realistic, and introduces a sizable bias towards long hitting streaks.

    In reality, a league is typically very imbalanced from team to team and from pitcher to pitcher (probably even more so in the game of the early 20th century than now). It's easier to get hits off of two successive average pitchers than it is to get hits both off of a very good and a very bad pitcher. For example (to oversimplify a good deal):

    Say the league is split 50/50 between "good" pitchers (pitchers you'll get a hit off of 50% of games) and "bad" pitchers (pitchers you'll get a hit off of 80% of games). In a typical 20 game stretch, you'll encounter 10 good pitchers and 10 bad ones, and your odds of getting a hit in all 20 games would be (0.50)^10(0.80)^10, about 1/9537.

    Under their analysis as I understand it, they'd replace all the pitchers by mediocre pitchers who you'd get a hit off of 65% of the time, and your odds would be (0.65)^20, about 1/5517.

    This one assumption almost doubled your chances of getting a hit in all 20 games.

    There are other biases as well going the other way (ignoring the effect of hitting slumps, for example), but this one jumped out at me.
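    The arithmetic in the good/bad pitcher example is easy to check. A minimal sketch, assuming games are independent and using the made-up 50%/80%/65% figures from the example (the function name is illustrative):

```python
import math

def p_hit_every_game(per_game_probs):
    """Probability of a hit in every game, assuming the games are independent."""
    return math.prod(per_game_probs)

if __name__ == "__main__":
    mixed = [0.50] * 10 + [0.80] * 10   # 10 good pitchers and 10 bad ones
    averaged = [0.65] * 20              # every pitcher replaced by the average
    p_mixed = p_hit_every_game(mixed)
    p_avg = p_hit_every_game(averaged)
    print(f"mixed pitchers:    1 in {1 / p_mixed:,.0f}")
    print(f"averaged pitchers: 1 in {1 / p_avg:,.0f}")
    print(f"averaging inflates the odds by a factor of {p_avg / p_mixed:.2f}")
```

    The direction of the bias does not depend on these particular numbers: for a fixed average, a product of probabilities is largest when they are all equal (the AM-GM inequality), so replacing a mix of pitchers with the average can only raise the computed streak probability.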
    • by mortonda (5175)
      Wait. You're saying that you can make statistics lie? Say it ain't so!
      </sarcasm>
  • Bogus (Score:2, Insightful)

    by DoofusOfDeath (636671)
    Shouldn't we say that the probability of it happening was 1.0, because it did happen?

    It seems to me that if their experiments report anything else, then either their models are erroneously inaccurate, or they got something else wrong.
    • Re:Bogus (Score:4, Interesting)

      by Miseph (979059) on Sunday March 30, 2008 @05:39PM (#22915270) Journal
      No, because the probability for ANYTHING, given enough chances, is 1.

      What they are actually saying is that reality appears to follow a probability bell curve.

      You could also say that, in 1,230,000 years of baseball games, we could be almost certain of a hitting streak longer than 56 games.
      • by Reziac (43301) *
        Hell, in 1,230,000 years of baseball games, *I* could get a 56 game hitting streak. ;)

        True story: I am the world's worst volleyball server. I'm lucky to whack the ball in the correct direction, let alone get it over the net. One day in the 9th grade, I scored 14 spikes in a row. Everyone (including the gym teacher) is staring at me like "WTF?! What have you done with the real Rez??" Needless to say it was a one-time freak event. :?~

  • ... they didn't take into account my 162 game hitting streak in "The Bigs" on PS3. With settings on easy.
  • Isn't this the same thing as saying that a large number of monkeys typing for a large period of time is more likely to properly re-create the complete works of Tolkien ... by accident?
    • No, It would be like taking monkeys, giving them RPG style intelligence points, and running them via an RNG and see which one can write the book to stay on the best seller list for the longest time.
  • Our simulations did something very much like this, except instead of a coin, we used random numbers generated by a computer.
    It is not mathematically sound to do statistics with a random number generator. Computers do not actually generate random numbers, but instead, they can only make pseudo-random numbers that have a certain distribution.
    Any 'simulation' done in this way will always have a bias.
    In order to get correct statistics, you must actually compute the statistics.
    • It is not mathematically sound to do statistics with a random number generator. Computers do not actually generate random numbers, but instead, they can only make pseudo-random numbers that have a certain distribution. Any 'simulation' done in this way will always have a bias. In order to get correct statistics, you must actually compute the statistics.
      Sure, the proper way to put it mathematically would have been "we did a Monte-Carlo based simulation of the probability distribution of the longest hitting streak under our model due to the intractability of direct computation", but this is an editorial in the New York Times, not a mathematical journal! As a side note, just because a computation is performed on a set of pseudorandom numbers does not mean it is biased...usually the whole point of pseudorandomness is that the discrepancy between computations involving them and identical computations involving true random numbers will typically be quite small.
    • by Vellmont (569020) on Sunday March 30, 2008 @07:17PM (#22915958)

      Computers do not actually generate random numbers

      That'll be a surprise to the multiple true random number generators built into most operating systems. There are many sources of random data in a computer: timing between keystrokes, timing of mouse movements, network latency between packets, and of course hardware random number generators that use thermal noise as their source.

      So to put it mildly, computers can, and DO generate truly random numbers that are completely unpredictable and free from bias.

      (Oh, BTW, to do a Monte-Carlo simulation (which the referenced article is) you actually don't need true random numbers, you only need a pseudo-random source that's free from bias. Those pseudo-random sources do exist, and aren't even that difficult to code.)
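      One way to see that a pseudo-random source is good enough for this kind of simulation is to run the same Monte Carlo estimate with two generators and compare both against a value that can be computed exactly. A rough sketch using Python's standard library: random.Random is the seeded Mersenne Twister PRNG and random.SystemRandom draws from the operating system's randomness source via os.urandom. The .300 hitter, the 4 at-bats, the trial count, and the function name are illustrative assumptions.

```python
import random

def estimate_hit_in_game(rng, p_hit=0.300, at_bats=4, trials=50_000):
    """Monte Carlo estimate of P(at least one hit in a game) using the given generator."""
    games_with_hit = 0
    for _ in range(trials):
        if any(rng.random() < p_hit for _ in range(at_bats)):
            games_with_hit += 1
    return games_with_hit / trials

if __name__ == "__main__":
    exact = 1 - (1 - 0.300) ** 4                           # closed-form answer
    pseudo = estimate_hit_in_game(random.Random(2008))     # seeded PRNG
    system = estimate_hit_in_game(random.SystemRandom())   # OS randomness source
    print(f"exact  {exact:.4f}")
    print(f"pseudo {pseudo:.4f}")
    print(f"system {system:.4f}")
```

      Both estimates should land within ordinary sampling error of the exact value; the seeded one has the extra advantage of being repeatable.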
      • None of those things are truly random. Unless you are dealing with quantum effects, you are not dealing with something truly random.

        In particular, timing between keystrokes is not at all random. In fact, one can use the timings between keypresses to figure out who the typist is!
        • by Vellmont (569020) on Sunday March 30, 2008 @08:35PM (#22916492)

          Unless you are dealing with quantum effects, you are not dealing with something truly random.


          From wikipedia on "electronic (thermal) noise":

          In any electronic circuit, there exist random variations in current or voltage caused by the random movement of the electrons carrying the current as they are jolted around by thermal energy.

          Is that quantum mechanical enough for you?

          As for network latency between packets, while it may not be random on a quantum-mechanical level, it's still unpredictable unless you can get on the same LAN segment as the target computer. The keyboard timings are taken on a small enough time scale that they're quite unpredictable, and not related to the typist.
          • You're saying that computers _typically_ use thermal noise as a source of entropy for random-number generation?

            The only one I'm aware of would be the Yamaha DX7 (and their ilk); that's how the random-noise low-frequency oscillator is fed.
            • Re: (Score:3, Informative)

              by CTachyon (412849)

              Modern Intel motherboards (i810 forward) and AMD motherboards (768 forward) have a hardware RNG (Random Number Generator) that IIRC is based on diode noise. That's straight up quantum randomness, and most modern Linux distros automatically detect and use it if available.

    • This isn't how modern statistics is done. The pseudo-random number generators used in statistical research are entirely predictable when their initial seed is known, but are otherwise statistically random. They must obey certain requirements of "statistical randomness" that make the output look like pure entropy for essentially any form of real statistical examination, other than an attempt at determining the sequence directly. Monte carlo computation is always done with PRNGs so that the experiments are ac
    • MODS - Parent post is not informative, it is flat-out wrong. A pseudo-random number generator is considered unacceptable if it can't pass a Turing-like test - if I gave you two sequences where one was pseudo-random and the other was "truly" random, you would be unable to tell which was which using any statistical test you can dream up. If one of the sequences yielded biased results for some known distributional property, that would itself be grounds for rejecting it.
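      As one concrete example of such a statistical test, here is a bare-bones chi-square check of whether a generator's output is spread evenly over [0, 1). It is only a sketch of one test among many that real test suites apply; the bin count, sample size, seed, and function name are arbitrary choices for illustration.

```python
import random

def chi_square_uniform(samples, bins=10):
    """Pearson chi-square statistic for samples in [0, 1) against a uniform distribution."""
    counts = [0] * bins
    for x in samples:
        counts[min(int(x * bins), bins - 1)] += 1
    expected = len(samples) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

if __name__ == "__main__":
    rng = random.Random(42)
    stat = chi_square_uniform([rng.random() for _ in range(100_000)])
    # 16.92 is the standard 5% critical value for 9 degrees of freedom (10 bins).
    print(f"chi-square = {stat:.2f}; reject uniformity at the 5% level if above 16.92")
```

      A generator with a gross bias fails this immediately: feeding in 100 copies of 0.0 gives a statistic of 900.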
    • It is not mathematically sound to do statistics with a random number generator.

      Why do people like you get modded up? You sound like you've never even heard the term "chi-square" before. People don't get modded up for being correct but for *sounding* knowledgeable.

  • maybe. for nerds? i doubt it. "the math guys"... ok, definitely not news for nerds -- too dismissive of the experts
  • So basically... (Score:2, Insightful)

    by davidbrit2 (775091)
    They took a bunch of measured statistics, ran a simulation with outcomes biased using said statistics, and then acted surprised when the simulation results ended up pretty close to what actually happened?
    • Re: (Score:3, Interesting)

      by kevinatilusa (620125)

      They took a bunch of measured statistics, ran a simulation with outcomes biased using said statistics, and then acted surprised when the simulation results ended up pretty close to what actually happened?

      I think their point was that they took a set of numbers that were generally considered unremarkable (the overall statistical distribution of batting totals from the last 100+ years) and tried to show that a number that most people considered very unusual (the 56 game streak) was in fact also typical given this other, "unremarkable" set of data.
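The kind of simulation being discussed can be sketched for a single player. This is not the researchers' code. It assumes every at-bat is an independent coin flip at the season batting average and that the player gets a fixed number of at-bats per game. The batting average, game count, at-bats, trial count, and seed below are all made-up illustrative values.

```python
import random

def longest_streak(game_results):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for got_hit in game_results:
        current = current + 1 if got_hit else 0
        best = max(best, current)
    return best

def simulate_season(avg, games, at_bats_per_game, rng):
    """One simulated season: True for each game with at least one hit."""
    return [
        any(rng.random() < avg for _ in range(at_bats_per_game))
        for _ in range(games)
    ]

def streak_distribution(avg, games, at_bats_per_game, trials, seed):
    """Longest hitting streak in each of `trials` simulated seasons."""
    rng = random.Random(seed)
    return [
        longest_streak(simulate_season(avg, games, at_bats_per_game, rng))
        for _ in range(trials)
    ]

if __name__ == "__main__":
    # Illustrative inputs only, not any real player's line.
    streaks = streak_distribution(avg=0.350, games=150, at_bats_per_game=4,
                                  trials=1000, seed=1941)
    print("longest streak in any simulated season:", max(streaks))
    print("seasons with a streak of 30+ games:", sum(s >= 30 for s in streaks))
```

Running the same loop over every player-season, rather than one hypothetical hitter, is what turns an unremarkable set of batting averages into a distribution of "longest streak in baseball history".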

  • In every simulation, a ground ball went between Bill Buckner's legs in the 1986 World Series.
  • What about slumps? (Score:3, Insightful)

    by Squirmy McPhee (856939) on Sunday March 30, 2008 @06:02PM (#22915438)

    By assuming the hitter's probability of getting a hit is equal to his season average, the researchers don't take into account that most, if not all, batters have a higher batting average at some points in the season than they do in others. As one with experience in Monte Carlo simulations, I know that taking that into account would complicate the analysis considerably, but I suspect their results would be a bit different if they even did something as simple as using a 10-game moving average of the batter's average.

    • by jd (1658)
      Yes, you'd need the variance and not just the mean, and you'd need a suitable distribution, which will probably not be symmetrical and certainly not a single spike.
      • Yes, you'd need the variance and not just the mean, and you'd need a suitable distribution

        That would be another way to do it, sure. You could also use a Markov chain with states corresponding to slumping/not slumping (or something like that). Bootstrapping might not be a bad way to go, either. It all depends on how much computational effort you want to expend and how well suited these various methods are to the actual statistics (that's a project unto itself). Even then, the decisions a batter and a team'
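The Markov chain idea mentioned above can be sketched as a two-state model. Everything here is an assumption made for illustration: the state names, the per-state hit probabilities, the chance of staying in the current state from one game to the next, and the fixed number of at-bats per game. None of it is fitted to real data.

```python
import random

def longest_run(results):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for ok in results:
        current = current + 1 if ok else 0
        best = max(best, current)
    return best

def simulate_streaky_season(p_normal, p_slump, p_stay, games, at_bats, rng):
    """Season where the per-at-bat hit probability depends on a hidden state.

    The batter starts in the normal state. After each game he stays in his
    current state with probability p_stay and otherwise switches.
    """
    slumping = False
    results = []
    for _ in range(games):
        p = p_slump if slumping else p_normal
        results.append(any(rng.random() < p for _ in range(at_bats)))
        if rng.random() >= p_stay:
            slumping = not slumping
    return results

if __name__ == "__main__":
    rng = random.Random(56)
    season = simulate_streaky_season(p_normal=0.38, p_slump=0.26, p_stay=0.9,
                                     games=150, at_bats=4, rng=rng)
    print("longest streak this season:", longest_run(season))
```

Comparing the streak distribution from this model against the constant-average model would show how much slumps matter, provided the state parameters are estimated from actual game logs rather than guessed as they are here.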

  • This seems relevant:

    http://abcnews.go.com/Technology/WhosCounting/story?id=3694104&page=1 [go.com]

    Disclaimer: I'm not an American, so I know next to nothing about baseball - and care less!
    • by Vellmont (569020)
      The thing I find strange is the idea of The record book. As if there's some big "official" book that's published as "The Record Book Of Baseball", and MLB officials all sit around arguing about asterisks. I'm no baseball fan, but I never thought there was "The" record book. Isn't there just a series of "A" record books?

      I've never been a big sports fan, but what drives me crazy, especially about a game like this where you win or lose, is how people get all hung up about this record or that record. What does
      • by Otter (3800)
        I'm no baseball fan, but I never thought there was "The" record book. Isn't there just a series of "A" record books?

        My understanding is that MLB (and the other major North American leagues) does maintain an official record book, but that the Roger Maris asterisk is a myth.

  • A lot of people think baseball is boring - today it is, but take it from a geezer, not always so.

    I blame television. I can no longer watch a ball game on TV. Might as well be Entertainment Tonight. They used to have a camera behind the backstop so you could see the pitch, the swing (from behind) and the infield. Another camera to go to the outfield, and maybe one for the infield. The game has strategy. It has finesse. It even has - to use a term no longer apparently known in the software world - ele
    • by Reziac (43301) *
      Bah, Vulcans could never play baseball; they'd be too caught up in the stats! It'd be exactly like Gene Mauch and the Angels -- everything done right per the stats book, but never, ever would they try anything that was out of spec. It's not the stats that win games; it's the quarter of an inch you reach beyond what your stats say you can. (I love little ball, but Mauch made me crazy.)

      I haven't been where I could get sports TV reception (or even radio reception) in 11 years. Before that, I worked my business
    • by peektwice (726616)

      I disagree on the point of baseball being ruined. There is a decidedly downhill slide occurring and it has much to do with marketing (among many other things), as you correctly point out. However, it's not beyond repair. Complicity between owners and players in the steroid scandal, marketing "multi-hundred dollar sneakers to our kids", player strikes, new stadium building binges funded by taxpayers, and all the maladies that affect baseball will never be able to overcome the nobility of baseball, unless we

  • Tickets, hot dogs and beer would be a lot more affordable.
  • Yeah, and so is the Cubs winning the World Series more than once in a hundred years
  • I've switched domains to operating systems and can now say that it took 42 googolplex simulations before I found a parallel universe where Vista doesn't suck. As you would expect, that's also the only parallel universe that had Steve Jobs throwing chairs.

  • There's no feedback here.

    Don't forget that the makeup of teams, the behavior of other players, and even the rules of baseball all depend on what happens in the game. If someone was setting a 109-game hitting streak in the 1890s, then they would be facing more determined pitchers and probably better pitchers by the time they were more than 20 or 30 games into the run. It seems pretty good odds that would have changed their batting average for that year. :)

    How are real hitting streaks distributed in time? Do
  • Comparison of Sports (Score:2, Informative)

    by buildguy (965589)
    Interesting comparison made on this page, but I'm not sure if it is accurate. http://en.wikipedia.org/wiki/Don_Bradman#World_sport_context [wikipedia.org]
  • ... simulation runs have not yet identified an alternate universe in which a Slashdotter gets a date with Jessica Alba.
  • by jocknerd (29758) on Sunday March 30, 2008 @08:48PM (#22916562)
    After the streak ended, he started a new 16-game hitting streak. That means he hit safely in 72 of 73 games.

    During the streak, Joe DiMaggio had a batting average of .408 and a slugging average of .717, faced four (4) future Hall of Fame pitchers, and played in the 1941 All-Star Game (he went one-for-four, scored a run, and drove in a run). Source is http://www.baseball-almanac.com/feats/feats3.shtml [baseball-almanac.com]

    During Joe DiMaggio's streak, Ted Williams actually had a higher batting average. Williams batted .412 and finished with a .406 average for the year.

    Joe DiMaggio had a 61 game hitting streak while playing for the San Francisco Seals in the Pacific Coast League in 1933.
  • Joe D was my first cousin once removed. That is, he was my father's mother's sister's son. Yeah, I'm a geezer for sure. Unfortunately, as much as I love baseball, none of Joe's genes filtered down my way. I couldn't hit a big-league fastball on the best day I ever had.

  • "The result: Joltin' Joe's record is not merely likely, it's basically a sure thing. Every alternate universe produced a streak of 39 games or better; one reached 109 games. Joe DiMaggio was not the likeliest player in the history of the game to accomplish the record, not by a long shot."

    Is this just poorly written, or is their conclusion really this silly? The article seemed to say that they just took the player's batting average, and calculated how likely it is that he would get at least one hit in a gam
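The calculation this comment seems to be describing can be written out directly. This is a sketch under two assumptions that are not confirmed by the article: at-bats are independent, and the player gets the same number of at-bats every game. The function names and the example numbers are hypothetical.

```python
def p_hit_in_game(avg, at_bats):
    """Chance of at least one hit in a game: 1 minus the chance of going hitless."""
    return 1 - (1 - avg) ** at_bats

def p_streak(avg, at_bats, length):
    """Chance that `length` specific consecutive games all contain a hit."""
    return p_hit_in_game(avg, at_bats) ** length

if __name__ == "__main__":
    # Illustrative inputs: a .350 hitter with 4 at-bats per game.
    print("P(hit in a game):", p_hit_in_game(0.350, 4))
    print("P(56 straight games, starting on a given day):", p_streak(0.350, 4, 56))
```

Note that `p_streak` is the chance for one particular 56-game window. The chance of a streak appearing somewhere across many players and many seasons is much larger, which is the quantity the simulations were estimating.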
  • The Monroe Factor (Score:3, Insightful)

    by Slur (61510) on Sunday March 30, 2008 @11:49PM (#22917632) Homepage Journal
    Okay, okay, but what are the odds that Joe DiMaggio would have such a streak, and land Marilyn Monroe? Somebody needs to get on that simulation asap. Here are my statistics, by the way...
  • I just picked up Deep Space Nine Season One, so I gotta say... ...this is not linear.

