Slashdot Log In
New Pattern Found In Prime Numbers
Posted by
Soulskill
on Sun May 10, 2009 09:34 AM
from the benford-and-sons dept.
from the benford-and-sons dept.
stephen.schaubach writes "Spanish Mathematicians have discovered a new pattern in primes that surprisingly has gone unnoticed until now. 'They found that the distribution of the leading digit in the prime number sequence can be described by a generalization of Benford's law. ... Besides providing insight into the nature of primes, the finding could also have applications in areas such as fraud detection and stock market analysis. ... Benford's law (BL), named after physicist Frank Benford in 1938, describes the distribution of the leading digits of the numbers in a wide variety of data sets and mathematical sequences. Somewhat unexpectedly, the leading digits aren't randomly or uniformly distributed, but instead their distribution is logarithmic. That is, 1 as a first digit appears about 30% of the time, and the following digits appear with lower and lower frequency, with 9 appearing the least often.'"
Related Stories
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Full
Abbreviated
Hidden
Loading... please wait.
Other bases? (Score:5, Insightful)
When happens with the primes are represented in base-9 or base-11?
Re:Other bases? (Score:5, Funny)
It would be bad.
Parent
Re:Other bases? (Score:5, Funny)
Bad as in "cross the streams" bad, or "according to an AC on Slashdot" bad ?
Parent
Re:Other bases? (Score:5, Funny)
"Bad" as in you will see the Message as hinted at by Carl Sagan's "Contact". It's from God and apparently decodes to: "We apologize for the inconvenience".
Parent
Re:Other bases? (Score:5, Funny)
Parent
Re:Other bases? (Score:5, Funny)
Parent
Re:Other bases? (Score:5, Informative)
Benson's Law is actually independent of the number base used. It wouldn't be much of a mathematical property if it wasn't. No matter how you convert a number, you will always see the same bias.
Parent
Re:Other bases? (Score:5, Insightful)
I don't know; it might be interesting to know that the leading digits of powers-of-k are distributed in some interesting way in base not-k. They obviously all have a leading 1 in base k.
Parent
Re:Other bases? (Score:5, Informative)
They are also distributed as Benson's law describes, providing that k is not a rational power of the base. IAAM.
Parent
Re:Other bases? (Score:5, Funny)
Wow, first use of "I am a moron" I've seen in the field!
Hmm, or it is Mormon?
Parent
Re:Other bases? (Score:5, Funny)
I'm pretty sure that in base-2 with no zero-padding, 100% will start with 1. :-p
Parent
Re:Other bases? (Score:5, Insightful)
...and all but one would end with 1 as well.
Parent
Re:Other bases? (Score:5, Informative)
You mean, how many are Mercene primes [wikipedia.org]?
Parent
Re:Other bases? (Score:5, Funny)
I will never understand how people do that. You have the link right there. Even if you didn't open it to make sure, the link itself mentions the name "Mersenne Prime", and yet you write Mercene.
Parent
Re:Other bases? (Score:5, Informative)
But how many would contain all 1s? Answer that, and provide a proof for your answer, and you'll make math history.
For those who didn't get it: it's not known whether there are infinitely many Mersenne primes [wikipedia.org], which have this form in binary (they're primes of the form 2^n - 1). Similarly, if you could figure out how many primes have only their first and last bits equal to 1, you would answer a longstanding question about Fermat primes [wikipedia.org] (which are primes of the form 2^n + 1).
Parent
Re:Other bases? (Score:5, Funny)
I'm pretty sure that in base-2 with no zero-padding, 100% will start with 1. :-p
100% = 100/100 = 1 = 0b1, which, by the way, looks like "Obi" and sounds like "Obi-Wan" when you say it.
Parent
Re:Other bases? (Score:5, Informative)
Numbers are objects, I wish people would understand that numbers are just distinctions. The whole of mathematics is really just a language of form and structure, a system to systematize and decribe structure and forms (relationships are a type of form).
Parent
Re:Other bases? (Score:5, Interesting)
The whole of mathematics is really just a language of form and structure, a system to systematize and decribe structure and forms (relationships are a type of form).
So... mathematics is the vaguest thing possible?
Parent
Re:Other bases? (Score:5, Funny)
Parent
Re:Other bases? (Score:5, Funny)
base-9 or base-11?
NEVER FORGET
Parent
Re:Other bases? (Score:5, Insightful)
The faux concern, or misplaced real concern, so many people show over 9/11 has made it a relevant target for such jokes since 9/12.
Parent
Re:Other bases? (Score:5, Funny)
Knock knock.
Who's there?
9/11.
9/11 who?
YOU SAID YOU'D NEVER FORGET!
Parent
Re:Other bases? (Score:5, Informative)
My math degree is getting dusty, but I'm pretty sure that the same pattern could be represented in another base by changing their generalization of Benford's law to include it, and the distribution would look like log(x)/log(9) or log(x)/log(11). Remember, changing the base of a logarithm is easy: for example, log(x)/log(e) = ln(x)
So you get the same distribution, different base.
Parent
Re:Other bases? (Score:5, Informative)
from benfords law link:
"The result holds regardless of the base in which the numbers are expressed, although the exact proportions change."
Parent
Re:Other bases? (Score:5, Funny)
Oh yeah? Well give me two minutes and check again.
Parent
Re:Other bases? (Score:5, Funny)
All your base are belong to Benford.
Parent
Re:Other bases? (Score:5, Funny)
Parent
Re:Other bases? (Score:5, Informative)
Benford's law works by the observation that, when numbers come up in certain real world contexts, the fluctuations you get in numbers should be proportional to the numbers themselves. Phrased differently, variations tend to be relative, not absolute. Because of this, if you have a very large range of random numbers from many real world measurements, then you would expect the number between t and t*(1.0001) not to vary too much for small changes in t. Let us try to use this observation very coarsely. Among the numbers with 6 digits, the number that look like 1xxxxxx (those between 100000 and 200000) should be about the same the number between 200000 and 400000. The same thing happens with the numbers with 5 digits or 7 digits or n digits (assuming that you have a wide range of random numbers, and the numbers are the kind that come from certain sorts of real world measurements). Additionally, you can get distributions for the first two digits, the first three digits, etc.
This observation doesn't depend on the base that you're working with.
Now, with the prime numbers, they have a distribution that is different from a lot of real world measurement data. The number of primes between n and n+d is approximately d/ln(n), where ln is the log with base e and d is small compared to n. So the number of primes between 500000 and 600000 is about 100000/ln(500000), and the number of primes between 500000 and 600000 is about 100000/ln(600000). By using this, and being slightly more careful, one can determine fairly easily the distribution of the leading terms of the prime numbers.
This is not a hard result. I would say that any professional mathematician who knew about the basic distribution of the primes could derive the distribution of the leading digis of the prime numbers fairly easily if anybody actually asked them to. The reason nobody mentioned this before is that nobody actually cares. While Benford's law does have applications to fraud detection, this new result does not. It's one of those things that makes people say "ooh, a pattern!" but which is just an easy and somewhat mundane corollary to a well known theorem.
Parent
Like batting orders (Score:4, Interesting)
Okay, if you have a random number along the interval (1,10^X), all the leading digits will be equally likely.
If you have some other interval (1,n*10^X), 1<=n<=9, then the leading digits > n will appear roughly 1/10 as often as leading digits 1..n.
If you have a large sample which is drawn from an admixture of some huge number of random distributions (1,n*10^X), with the "n" of each sub-distribution evenly distributed on 1..9, then the lower leading digits will be moderately more common, yeah?
Prime numbers, meanwhile, become decreasingly common as you get larger and larger, is that not correct? So it seems to me this is the obvious way to model prime numbers, no?
On the density of prime numbers (Score:5, Insightful)
Prime numbers, meanwhile, become decreasingly common as you get larger and larger, is that not correct?
Yes, that is correct. There are roughly logarithmically many of them.
Bertrand's Conjecture (proven by Chebyshev) states than for all n > 1, there's a prime p with n < p < 2n.
If you look only at powers of two, it's readily seen that there are n primes between 1 and 2^n; setting k=2^n, there are log(k) primes between 1 and k.
A logarithmic upper bound follows from the Prime Number Theorem, which doesn't have an easy proof (AFAIK). It says something much more specific than just "It's O(log n)", though. Maybe there's a simple theorem from which you can derive O(log n), but I don't know.
Parent
Re:On the density of prime numbers (Score:5, Informative)
Maybe if I had read the prime number theorem, I would have known that it's O(n / log n), which is somewhat bigger...
Parent
Stock market analysis? (Score:5, Interesting)
Re:Stock market analysis? (Score:5, Funny)
I am admittedly not a mathematician, but I do have a good understanding of economics and finance, and I am not seeing how a pattern found in prime numbers could have any application to stock market analysis. Where is the interaction between prime numbers and the praxeology of buying and selling securities?
By understanding the patterns in prime numbers you can learn to spot them and avoid the sub-prime mortgage backed securities. Duh.
Parent
Re:Stock market analysis? (Score:5, Funny)
I've always wondering how I could figure out when someone was trying to pass off a list of fraudulent primes. Glad to see that this problem is finally solved!
Parent
Re:Stock market analysis? (Score:5, Interesting)
You're jesting, but I imagine that many fields of encryption would benefit from this, like dual key encryption, where the security lies in the ability to trust that the product really is of two primes, and that factoring this would be extremely time consuming.
Sets with a backdoor inserted may indeed have a different signature, and to be able to quickly see that one set differs would be invaluable. It wouldn't prove anything, but if, say, keys received from a certain company's key generator stood out like a sore thumb in a Benford distribution check, you would have reason to suspect foul play, incompetence or both.
Parent
Cryptography? (Score:5, Funny)
Could this have any applications there?
"Well, I wasn't expecting The Spanish Mathematician . . ."
Re:Cryptography? (Score:5, Funny)
Our two main powers are insight into the nature of primes, fraud detection
and stock market analysis.
I'll come in again...
Parent
If you're dealing with phone numbers (Score:5, Interesting)
Ssssshhhhhhik!
diggadiggadiggadiggadiggadiggadiggadiggadigga!
Total pain in the finger.
1 as a first number was reserved for "other stuff" like international calls, so the lowest possible area codes (first numbers) went to places like New York City (212 - very quick to dial) or LA (213) because millions of people would be dialing that number, so it made for an overall faster dialing experience for (on average) more people.
This is compared to the relatively few people who lived in more obscure parts of the country, like Saginaw MI (989) or Bryan TX (979).
So, you have millions of phones in 212, thousands in 979. The result: saved effort in dialing.
Also, to this end there was a preference for exchanges to have lower numbers as well to save on dialing effort, and phone numbers with lower (but NON-ZERO) values were sought after. You'd see advertisments like "Call RotoRooter - 213 464 1111 !" or "Call us NOW for a free analysis! 201 738 1122 !" etc. and so on.
So, lower numbers in phone numbers have been a product of primitive dialing technology. Now with touchtone - all that is out the window - but the historic trend is still there and quite powerful - people will pay good money for a 212 area code for the distinction of being in the "real" New York Area code...
RS
Re:If you're dealing with phone numbers (Score:5, Informative)
In the original system design, all area codes had a middle digit of 0 or 1. The convention was that a middle digit of 1 was used for area codes that only covered part of a state, while a middle digit of 0 was used for area codes that covered entire states. Furthermore, an area code could not begin with a 1 or a 0. and an area code with a middle digit of 1 couldn't have 1 as the third digit. (This left the shortest dial time area code for a statewide code as 201, which went to New Jersey.)
As early as the late 1950s, the idea of single area codes for some states went out the window (with NJ splitting into 201 and 609 in 1958) because of increasing population and proliferation of phone service.
By the late 1980s, the rules were further changed to allow for area codes with middle digits other than 1 or 0. Area codes like 989 and 979 weren't introduced until the late 1980s at the very earliest, by which point very few people were still using rotary phones. At one point, I had heard that the middle digit value of 9 was reserved for the future to allow for four digit area codes, but I can't vouch for the accuracy of that recollection. There are plenty of other rules, some of which you can see summarized here... [wikipedia.org]
-JMP
Parent
Independent Verification (Score:5, Interesting)
Which puts the probabilities at:
My computer is currently crunching the first fifty million primes and I will post those as a reply to this post later today when it is done.
These ratios on numbers 2-9 seem far too close in range for this to be a true log scale. Hopefully with more data it will be more logarithmic.
Re:Independent Verification (Score:5, Funny)
This is one of those moments that I love /.
Personally, I was trying to calculate the first 50M primes using the sieve of Erastothenes and then contructing a program that categorizes them but since you are doing all the work I say go ahead and I'll wait for the results.
Parent
I Found a Fit! (Score:5, Interesting)
Counted Occurances:
686048, 664277, 651085, 641594, 633932, 628206, 622882, 618610, 614821
Frequencies:
0.119, 0.115, 0.113, 0.111, 0.110, 0.109, 0.108, 0.107, 0.107
So there's some more data for you. I added that to this spreadsheet [google.com].
So I hope that satisfies everyone who replied to my thread first of all. I hope 5,761,455 primes between one and one hundred million satisfies you.
I used a very simple Non Linear Squares model to solve for a single constant on a log of these values. I think I have a fit. Using Benford's model and the NLS Package in R, I found:
f(x) = 0.020814 * log(161.147689 * ((x+1)/x))
To fit quite nicely, here's the summary:
Here is the list of frequencies next to what my model produced:
I would wager that they are correct. Neat discovery!
Parent
Enron (Score:5, Interesting)
was busted by auditors who found the books were "cooked" by applying the law of first numbers described in the /. blurb and TFA. The independent auditors found the first figures were randomly distributed instead of following Benford's law with the number 1 the most plentiful and nine the least -- therefore, the entries were fraudulent.
Benford's law knocked my out at the time; I thought of how many bogus figures I had entered in my expense accounts over the years....
Complete bullshit (Score:5, Insightful)
Some More Information (Score:5, Interesting)
All Primes 1-100
Counted Occurances:
4, 3, 3, 3, 3, 2, 4, 2, 1
Frequencies:
0.160, 0.120, 0.120, 0.120, 0.120, 0.080, 0.160, 0.080, 0.040
All Primes 1-1,000
Counted Occurances:
25, 19, 19, 20, 17, 18, 18, 17, 15
Frequencies:
0.149, 0.113, 0.113, 0.119, 0.101, 0.107, 0.107, 0.101, 0.089
All Primes 1-10,000
Counted Occurances:
160, 146, 139, 139, 131, 135, 125, 127, 127
Frequencies:
0.130, 0.119, 0.113, 0.113, 0.107, 0.110, 0.102, 0.103, 0.103
All Primes 1-100,000
Counted Occurances:
1193, 1129, 1097, 1069, 1055, 1013, 1027, 1003, 1006
Frequencies:
0.124, 0.118, 0.114, 0.111, 0.110, 0.106, 0.107, 0.105, 0.105
All Primes 1-1,000,000
Counted Occurances:
9585, 9142, 8960, 8747, 8615, 8458, 8435, 8326, 8230
Frequencies:
0.122, 0.116, 0.114, 0.111, 0.110, 0.108, 0.107, 0.106, 0.105
All Primes 1-10,000,000
Counted Occurances:
80020, 77025, 75290, 74114, 72951, 72257, 71564, 71038, 70320
Frequencies:
0.120, 0.116, 0.113, 0.112, 0.110, 0.109, 0.108, 0.107, 0.106
This is the raw data so to turn that into something visual, I dumped it into a Google spreadsheet and made it public [google.com] (note the scale on the y axis). Enjoy!
It seems that the curve is flattening out the more data I collect, but the logarithmic curve may be valid. I have the data for 100,000,000 and will add that to the spreadsheet once it completes.
Re:9999991 (Score:5, Insightful)
Explain one man being hit seven times with lightning. http://en.wikipedia.org/wiki/Roy_Sullivan [wikipedia.org]
Improbable doesn't mean impossible.
Parent
Re:Why do people study "math" in college? (Score:5, Insightful)
Parent
Plenty of reasons people study math (Score:5, Insightful)
A few examples:
For the same reason some people take Philosophy, Ancient Literature, Paleontology, etc. Because they think the subject is cool, and aren't necessarily at school to learn a trade. (Indeed, Engineering students that are paying attention also discover they aren't directly being taught a trade either. Or at least they aren't in any Engineering college worthy of the name.)
They want to become an actuary. This is a fairly well-paid job that is also rather difficult to do, and even harder to do well.
They want to become math teachers; a valuable and much-needed profession. Math is a useful tool in teaching students how to think. We certainly don't torture legions of high school students with the details of conic sections because anybody is under the impression this is a directly practical skill for most citizens to have. Nor are hundreds of thousands of college students subjected to the horrors of calculus because of some kind of employment program for math post-docs.
They are double-majors in a field in which math is extremely important (physics, astronomy, computer science, every type of engineering, linguistics, medicine, biology, etc. Pretty much every field outside the humanities. Oh, and some of the humanities make extensive use of math too.)
SirWired
Parent
Re:Why do people study "math" in college? (Score:4, Funny)
The real question is did his feigned interest result in sexual intercourse?
Parent
Re:9 not too common? (Score:5, Funny)
that makes my /. id even more impressive :)
Parent