Forgot your password?
typodupeerror
Math News

New Pattern Found In Prime Numbers 509

Posted by Soulskill
from the benford-and-sons dept.
stephen.schaubach writes "Spanish Mathematicians have discovered a new pattern in primes that surprisingly has gone unnoticed until now. 'They found that the distribution of the leading digit in the prime number sequence can be described by a generalization of Benford's law. ... Besides providing insight into the nature of primes, the finding could also have applications in areas such as fraud detection and stock market analysis. ... Benford's law (BL), named after physicist Frank Benford in 1938, describes the distribution of the leading digits of the numbers in a wide variety of data sets and mathematical sequences. Somewhat unexpectedly, the leading digits aren't randomly or uniformly distributed, but instead their distribution is logarithmic. That is, 1 as a first digit appears about 30% of the time, and the following digits appear with lower and lower frequency, with 9 appearing the least often.'"
This discussion has been archived. No new comments can be posted.

New Pattern Found In Prime Numbers

Comments Filter:
  • Other bases? (Score:5, Insightful)

    by wiredlogic (135348) on Sunday May 10, 2009 @10:38AM (#27896535)

    When happens with the primes are represented in base-9 or base-11?

  • Duh (Score:3, Insightful)

    by Anonymous Coward on Sunday May 10, 2009 @10:41AM (#27896567)

    Benford's "law" is not a law at all... any exponential distribution will exhibit this behavior.

    • Re: (Score:2, Funny)

      by Anonymous Coward

      You're right! I'm writing to my congress asking them to repeal Benford's Law.

    • Re: (Score:3, Insightful)

      by jstott (212041)

      Benford's "law" is not a law at all... any exponential distribution will exhibit this behavior.

      A law, as the word is commonly used in math and physics, is a mathematical expression of a universal relationship. As you say, Benson's law is a property of any exponential distribution, so we agree it's universal. Why then can't we call it a law? Just because it's obvious after you understand it doesn't make it any less a law.

      -JS

  • Like batting orders (Score:4, Interesting)

    by sam_handelman (519767) <{ude.aibmuloc} {ta} {3002hks}> on Sunday May 10, 2009 @10:46AM (#27896617) Homepage Journal
    I'm not a mathematician, could someone explain why this is surprising in terms that a computer programmer or biologist could follow? First thing I thought - no matter how many innings you have, you can guarantee that the top of the order will be up at least as many times as the bottom of the order.

      Okay, if you have a random number along the interval (1,10^X), all the leading digits will be equally likely.

      If you have some other interval (1,n*10^X), 1<=n<=9, then the leading digits > n will appear roughly 1/10 as often as leading digits 1..n.

      If you have a large sample which is drawn from an admixture of some huge number of random distributions (1,n*10^X), with the "n" of each sub-distribution evenly distributed on 1..9, then the lower leading digits will be moderately more common, yeah?

      Prime numbers, meanwhile, become decreasingly common as you get larger and larger, is that not correct? So it seems to me this is the obvious way to model prime numbers, no?
  • by MSTCrow5429 (642744) on Sunday May 10, 2009 @10:48AM (#27896635)
    I am admittedly not a mathematician, but I do have a good understanding of economics and finance, and I am not seeing how a pattern found in prime numbers could have any application to stock market analysis. Where is the interaction between prime numbers and the praxeology of buying and selling securities? Even if you're only focusing on automated buying and selling, those algorithms were still programmed by humans with their own subjective approaches and underlying premises.
    • by wjh31 (1372867)
      it makes more sense to me if you read that benfords law had applications in stock markets, fraud etc, but that isnt news...
    • Re: (Score:2, Informative)

      by maxume (22995)

      Benford's law can be used to detect fraud (the article states this, I don't have any reason to doubt it). They studied primes and found a pattern that is associated with a related property that they are calling Generalized Benford's Law. Presumably, the generalized rule can be used to detect a wider range of unnatural activity than Benford's law itself.

      • by Rayban (13436) on Sunday May 10, 2009 @11:22AM (#27896893) Homepage

        I've always wondering how I could figure out when someone was trying to pass off a list of fraudulent primes. Glad to see that this problem is finally solved!

        • by arth1 (260657) on Sunday May 10, 2009 @11:37AM (#27896995) Homepage Journal

          I've always wondering how I could figure out when someone was trying to pass off a list of fraudulent primes. Glad to see that this problem is finally solved!

          You're jesting, but I imagine that many fields of encryption would benefit from this, like dual key encryption, where the security lies in the ability to trust that the product really is of two primes, and that factoring this would be extremely time consuming.

          Sets with a backdoor inserted may indeed have a different signature, and to be able to quickly see that one set differs would be invaluable. It wouldn't prove anything, but if, say, keys received from a certain company's key generator stood out like a sore thumb in a Benford distribution check, you would have reason to suspect foul play, incompetence or both.

    • by rackserverdeals (1503561) on Sunday May 10, 2009 @11:35AM (#27896981) Homepage Journal

      I am admittedly not a mathematician, but I do have a good understanding of economics and finance, and I am not seeing how a pattern found in prime numbers could have any application to stock market analysis. Where is the interaction between prime numbers and the praxeology of buying and selling securities?

      By understanding the patterns in prime numbers you can learn to spot them and avoid the sub-prime mortgage backed securities. Duh.

  • You can find the mathematicians' article at http://www.citeulike.org/group/3214/article/3664693 [citeulike.org] or http://arxiv.org/pdf/0811.3302 [arxiv.org] (pdf warning).

    I find it interesting that the article doesn't prove any theorems. At least searching for the word "theorem" in the pdf only gives references to other theorems. Searching for "proof" gives no hits.

    That leaves me thinking: what does this article tell us that we couldn't find out ourselves by ripping through some prime numbers? I thought the real power of math was to say something 100% certain about some infinitude of stuff, so we don't have to go and check every case by hand.

    Oh well, I guess every open question needs some results on the form "this holds for all n <= bignum"; say, like the Goldbach Conjecture (every even number n > 2 is the sum of two primes).

    • by quarrel (194077) on Sunday May 10, 2009 @11:10AM (#27896821)

      > That leaves me thinking: what does this article tell us that we couldn't find out ourselves by ripping through some prime numbers?

      Nothing?

      The important thing is that they ripped through some prime numbers and did notice, and they were the first to publish what they noticed.

      The world moves forward in tiny steps like this. Maybe the next mathematician gets his 'Ahuh' moment on the back of an insight like this and bang modern crypto is fucked. He might even be able to prove it for you.

      --Q

    • Their experimental result is a trivial consequence of the fact that prime number density around n is about 1/log(n). One could work out the exact theoretical distribution in one paragraph and that'd be all. I guess the authors are either ignorant or they prefer to market their result as "mysterious". Probably both.

      • OK, not the most exciting science story of all time. Perhaps Carl Sagan either implanted or discovered a potential capacity for fascination with the science of primes in his novel "Contact" where a large sequence of prime numbers is used as an attempt by extra terrestrials to communicate with humanity.
  • by PolygamousRanchKid (1290638) on Sunday May 10, 2009 @10:55AM (#27896709)

    Could this have any applications there?

    "Well, I wasn't expecting The Spanish Mathematician . . ."

  • by l00sr (266426) on Sunday May 10, 2009 @10:56AM (#27896715)

    Nobody expects the Spanish Mathematicians!

  • by Anonymous Coward

    Before I begin, I am a math phd candidate, but not in number theory. The following is probably better than a lay interpretation, but not an expert either.

    Basically, they have generalized BL (Benson's Law) to get a GBL. They then tested the primes in the range [1,10^11] against GBL, and verified they were satisfied. They DID NOT PROVE THIS HOLDS FOR ALL PRIMES!!! They then went on to conjecture the applications of this to other areas (finance, etc).

    Though the result is interesting, I really see this pape

  • Yeah, it only looks like that because started finding primes from 1 up. If we started finding them from infinity down...
  • by Ralph Spoilsport (673134) on Sunday May 10, 2009 @11:37AM (#27896991) Journal
    It has less to do with math and more to do wit physics: as in how to use a an old school phone. Phone numbers, until comparatively recently would "prefer" lower numbers because they are EASIER TO DIAL. If a company had the phone number (909)999-9009 you would HATE dialing that thing. It would take about half a minute just to dial the damn number.

    Ssssshhhhhhik!
    diggadiggadiggadiggadiggadiggadiggadiggadigga!

    Total pain in the finger.

    1 as a first number was reserved for "other stuff" like international calls, so the lowest possible area codes (first numbers) went to places like New York City (212 - very quick to dial) or LA (213) because millions of people would be dialing that number, so it made for an overall faster dialing experience for (on average) more people.

    This is compared to the relatively few people who lived in more obscure parts of the country, like Saginaw MI (989) or Bryan TX (979).

    So, you have millions of phones in 212, thousands in 979. The result: saved effort in dialing.

    Also, to this end there was a preference for exchanges to have lower numbers as well to save on dialing effort, and phone numbers with lower (but NON-ZERO) values were sought after. You'd see advertisments like "Call RotoRooter - 213 464 1111 !" or "Call us NOW for a free analysis! 201 738 1122 !" etc. and so on.

    So, lower numbers in phone numbers have been a product of primitive dialing technology. Now with touchtone - all that is out the window - but the historic trend is still there and quite powerful - people will pay good money for a 212 area code for the distinction of being in the "real" New York Area code...

    RS

    • Re: (Score:3, Insightful)

      by Sir_Lewk (967686)

      Where are my mod points when I need them, that's pretty damned interesting.

    • by jmp_nyc (895404) on Sunday May 10, 2009 @12:21PM (#27897379)
      While you're absolutely right about the reasoning behind NYC, LA, and Chicago getting 212, 213, and 312, you're a little off on the 989 and 979 area codes, which are much more recent.

      In the original system design, all area codes had a middle digit of 0 or 1. The convention was that a middle digit of 1 was used for area codes that only covered part of a state, while a middle digit of 0 was used for area codes that covered entire states. Furthermore, an area code could not begin with a 1 or a 0. and an area code with a middle digit of 1 couldn't have 1 as the third digit. (This left the shortest dial time area code for a statewide code as 201, which went to New Jersey.)

      As early as the late 1950s, the idea of single area codes for some states went out the window (with NJ splitting into 201 and 609 in 1958) because of increasing population and proliferation of phone service.

      By the late 1980s, the rules were further changed to allow for area codes with middle digits other than 1 or 0. Area codes like 989 and 979 weren't introduced until the late 1980s at the very earliest, by which point very few people were still using rotary phones. At one point, I had heard that the middle digit value of 9 was reserved for the future to allow for four digit area codes, but I can't vouch for the accuracy of that recollection. There are plenty of other rules, some of which you can see summarized here... [wikipedia.org]
      -JMP
    • by jstott (212041) on Sunday May 10, 2009 @01:26PM (#27897855)

      So, you have millions of phones in 212, thousands in 979. The result: saved effort in dialing.

      Nice idea, but you give the phone company too much credit. In the old days telephone switches still used physical relays (this is well before transistors were invented). This significantly limited the number of connections in progress each switch could handle. Since switches are expensive, you naturally wanted to pass on the call as fast as possible so you could free up the switch for the next caller. A number like '212' wasn't just easy to dial, it was fast — remember this is the era of pulse dialing as well, so a '9' took literally 9 times longer to dial than a '1'. Assigning fast numbers like '212' to New York saved money for the phone company because Ma Bell could buy fewer switches. Any benefit to the customer was purely accidental.

      -JS

  • by eldavojohn (898314) * <eldavojohnNO@SPAMgmail.com> on Sunday May 10, 2009 @11:43AM (#27897031) Journal
    Here's what I got on my own counts using the first million primes [utm.edu]:

    1: 415441
    2: 77025
    3: 75290
    4: 74114
    5: 72951
    6: 72257
    7: 71564
    8: 71038
    9: 70320

    Which puts the probabilities at:

    1: 0.415441
    2: 0.077025
    3: 0.07529
    4: 0.074114
    5: 0.072951
    6: 0.072257
    7: 0.071564
    8: 0.071038
    9: 0.07032

    My computer is currently crunching the first fifty million primes and I will post those as a reply to this post later today when it is done.

    These ratios on numbers 2-9 seem far too close in range for this to be a true log scale. Hopefully with more data it will be more logarithmic.

    • by Daimanta (1140543) on Sunday May 10, 2009 @11:55AM (#27897121) Journal

      This is one of those moments that I love /.
      Personally, I was trying to calculate the first 50M primes using the sieve of Erastothenes and then contructing a program that categorizes them but since you are doing all the work I say go ahead and I'll wait for the results.

    • I Found a Fit! (Score:5, Interesting)

      by eldavojohn (898314) * <eldavojohnNO@SPAMgmail.com> on Sunday May 10, 2009 @06:57PM (#27900261) Journal
      The results for all primes between one and one hundred million:

      Counted Occurances:
      686048, 664277, 651085, 641594, 633932, 628206, 622882, 618610, 614821
      Frequencies:
      0.119, 0.115, 0.113, 0.111, 0.110, 0.109, 0.108, 0.107, 0.107

      So there's some more data for you. I added that to this spreadsheet [google.com].

      So I hope that satisfies everyone who replied to my thread first of all. I hope 5,761,455 primes between one and one hundred million satisfies you.

      I used a very simple Non Linear Squares model to solve for a single constant on a log of these values. I think I have a fit. Using Benford's model and the NLS Package in R, I found:

      f(x) = 0.020814 * log(161.147689 * ((x+1)/x))

      To fit quite nicely, here's the summary:

      Formula: y ~ Const1 * log(Const2 * ((x + 1)/x))

      Parameters:
      Estimate Std. Error t value Pr(>|t|)
      Const1 0.020814 0.001940 10.7292 1.343e-05 ***
      Const2 161.147689 80.222081 2.0088 0.08452 .
      ---

      Residual standard error: 0.0010413 on 7 degrees of freedom

      Number of iterations to convergence: 8
      Achieved convergence tolerance: 1.8104e-07

      Here is the list of frequencies next to what my model produced:

      Benford Prime Rates
      0.11907548
      0.11529674
      0.11300704
      0.11135972
      0.11002984
      0.10903600
      0.10811193
      0.10737045
      0.10671280

      NLS Model Results
      0.1202106
      0.11422279
      0.11177125
      0.11042794
      0.10957828
      0.10899193
      0.10856276
      0.10823497
      0.10797641

      I would wager that they are correct. Neat discovery!

  • Enron (Score:5, Interesting)

    by Anna Merikin (529843) on Sunday May 10, 2009 @11:45AM (#27897059) Journal

    was busted by auditors who found the books were "cooked" by applying the law of first numbers described in the /. blurb and TFA. The independent auditors found the first figures were randomly distributed instead of following Benford's law with the number 1 the most plentiful and nine the least -- therefore, the entries were fraudulent.

    Benford's law knocked my out at the time; I thought of how many bogus figures I had entered in my expense accounts over the years....

    • Re:Enron (Score:4, Informative)

      by arth1 (260657) on Sunday May 10, 2009 @12:42PM (#27897529) Homepage Journal

      At smaller scales than Enron, Benford and other related number distribution analysis schemes are indeed used to find fraud.

      Know the time report you fill out for the company you work at? There's a chance that it passes through a filter. If you make up figures for how long you spend at certain tasks, someone higher up may see it with a footnote saying "High probability of data being fictitious". If this pattern repeat itself over months, don't be surprised if your chances of retaining your job diminishes.

      On the other hand, you can use the statistics for your own advantage too. Not the least in games and gambling, where guessing your competitor's status and actions can change the odds quite dramatically. Including games and gambling you don't think of as games and gambling, like placing bids in auctions.

    • Re:Enron (Score:4, Interesting)

      by eison (56778) <`moc.liamtoh' `ta' `nosietkp'> on Sunday May 10, 2009 @05:22PM (#27899631) Homepage

      Great example. Here's a pretty good article on it: http://abcnews.go.com/print?id=98043 [go.com]

      I also like their explanation for "why":
      "Imagine that you deposit $1,000 in a bank at 10 percent compound interest per year. Next year you'll have $1,100, the year after that $1,210, then $1,331, and so on. The first digit of your bank account remains a "1" for a long time.

      When your account grows to more than $2,000, the first digit will remain a "2" for a shorter period as your interest increases. And when your deposit finally grows to more than $9,000, the 10 percent growth will result in more than $10,000 in your account the following year and a long return to "1" as the first digit. "

  • by Mike_K (138858) on Sunday May 10, 2009 @12:15PM (#27897319)

    If you had asked me about the distribution of first digits of prime #s yesterday I probably would have guessed logarithmic, regardless of base (except for binary, of course).

    Think about it. We know that primary # are distributed logarithmically. A set of N digit #s has equal subsets of numbers starting with 1, 2, 3, etc. Those subsets are equal in size, exclusive and completely ordered with respect to each other. So it follows that the # of prime #s in consecutive subsets would be a logarithmic function. And if you add the sizes of prime subsets for each starting digit, you'll still get a logarithmic distribution.

    Nothing to see here, move along.

    m

  • by __roo (86767) on Sunday May 10, 2009 @12:59PM (#27897625) Homepage

    If this is the first time you've run across Benford's law, you might thought to yourself, "Wow, I can use that to predict large prime numbers! I'll just convert numbers to base X, where X is really big, and only check numbers that start with 1."

    It's worth actually trying this, if you get a minute. You'll find that as X gets large, the number of primes that start with 2 gets closer to the number of primes tat start with 1. As X approaches infinity, your distribution approaches a smooth logarithmic curve.

    It's neat to see it yourself. This gives you an easy way to experiment with an infinite, easily generated set of numbers that follows Benford's law. It turns out that math actually works, oddly enough.

  • Complete bullshit (Score:5, Insightful)

    by gnasher719 (869701) on Sunday May 10, 2009 @01:04PM (#27897663)
    The prime number theorem was conjectured in 1796 by Adrien-Marie Legendre and proved in 1896 independently by Jacques Hadamard and Charles Jean de la Vallée-Poussin. It says that if pi (N) denotes the number of primes p = N, then pi (N) / (N / ln N) converges towards 1; accordingly the number of primes between A and B is about (B / ln B - A / ln A). This shows that there should be slightly more d digit primes starting with 1 than with 2, 3, 4 etc. A reasonably good approximation is that the number of d digit primes starting with 1 is not 1/9 th of all d digit primes, but more precisely (11 1/9 + 5.7 / d) percent. This is all very, very simple maths. I don't think it hasn't been observed before, it was just never considered worth mentioning. However, the prime number theorem alone is not enough to prove this; it would be necessary to prove that convergence happens at a certain speed. So anything that these so-called "mathematicians" claim that depends on observations of large list of primes is pure nonsense.
    • Re: (Score:3, Interesting)

      by emurphy42 (631808)
      I was thinking this myself at first, but apparently something goes wrong with it, because (per TFA) the distribution actually moves away from Benford's Law weighting and toward uniformity as the sample size grows larger. The specific rate of this movement is somehow described by a generalization of Benford's Law; while BL at one point uses 1/x, GBL uses 1/(x^a), with BL as the special case where a = 1.
  • by eldavojohn (898314) * <eldavojohnNO@SPAMgmail.com> on Sunday May 10, 2009 @04:39PM (#27899343) Journal
    So I read the comments and see that I need to do this in ranges or 1 to 100, 1 to 1000, etc. Which is fine, I've added another R method and would post the code here if it didn't yell at me for junk characters. So here are your Benford lists:

    All Primes 1-100
    Counted Occurances:
    4, 3, 3, 3, 3, 2, 4, 2, 1
    Frequencies:
    0.160, 0.120, 0.120, 0.120, 0.120, 0.080, 0.160, 0.080, 0.040

    All Primes 1-1,000
    Counted Occurances:
    25, 19, 19, 20, 17, 18, 18, 17, 15
    Frequencies:
    0.149, 0.113, 0.113, 0.119, 0.101, 0.107, 0.107, 0.101, 0.089

    All Primes 1-10,000
    Counted Occurances:
    160, 146, 139, 139, 131, 135, 125, 127, 127
    Frequencies:
    0.130, 0.119, 0.113, 0.113, 0.107, 0.110, 0.102, 0.103, 0.103

    All Primes 1-100,000
    Counted Occurances:
    1193, 1129, 1097, 1069, 1055, 1013, 1027, 1003, 1006
    Frequencies:
    0.124, 0.118, 0.114, 0.111, 0.110, 0.106, 0.107, 0.105, 0.105

    All Primes 1-1,000,000
    Counted Occurances:
    9585, 9142, 8960, 8747, 8615, 8458, 8435, 8326, 8230
    Frequencies:
    0.122, 0.116, 0.114, 0.111, 0.110, 0.108, 0.107, 0.106, 0.105

    All Primes 1-10,000,000
    Counted Occurances:
    80020, 77025, 75290, 74114, 72951, 72257, 71564, 71038, 70320
    Frequencies:
    0.120, 0.116, 0.113, 0.112, 0.110, 0.109, 0.108, 0.107, 0.106

    This is the raw data so to turn that into something visual, I dumped it into a Google spreadsheet and made it public [google.com] (note the scale on the y axis). Enjoy!

    It seems that the curve is flattening out the more data I collect, but the logarithmic curve may be valid. I have the data for 100,000,000 and will add that to the spreadsheet once it completes.

Every successful person has had failures but repeated failure is no guarantee of eventual success.

Working...