Follow Slashdot blog updates by subscribing to our blog RSS feed

typodupeerror

## New Pattern Found In Prime Numbers509

stephen.schaubach writes "Spanish Mathematicians have discovered a new pattern in primes that surprisingly has gone unnoticed until now. 'They found that the distribution of the leading digit in the prime number sequence can be described by a generalization of Benford's law. ... Besides providing insight into the nature of primes, the finding could also have applications in areas such as fraud detection and stock market analysis. ... Benford's law (BL), named after physicist Frank Benford in 1938, describes the distribution of the leading digits of the numbers in a wide variety of data sets and mathematical sequences. Somewhat unexpectedly, the leading digits aren't randomly or uniformly distributed, but instead their distribution is logarithmic. That is, 1 as a first digit appears about 30% of the time, and the following digits appear with lower and lower frequency, with 9 appearing the least often.'"
This discussion has been archived. No new comments can be posted.

## New Pattern Found In Prime Numbers

• #### Like batting orders (Score:4, Interesting)

<skh2003@coPARISlumbia.edu minus city> on Sunday May 10, 2009 @10:46AM (#27896617) Homepage Journal
I'm not a mathematician, could someone explain why this is surprising in terms that a computer programmer or biologist could follow? First thing I thought - no matter how many innings you have, you can guarantee that the top of the order will be up at least as many times as the bottom of the order.

Okay, if you have a random number along the interval (1,10^X), all the leading digits will be equally likely.

If you have some other interval (1,n*10^X), 1<=n<=9, then the leading digits > n will appear roughly 1/10 as often as leading digits 1..n.

If you have a large sample which is drawn from an admixture of some huge number of random distributions (1,n*10^X), with the "n" of each sub-distribution evenly distributed on 1..9, then the lower leading digits will be moderately more common, yeah?

Prime numbers, meanwhile, become decreasingly common as you get larger and larger, is that not correct? So it seems to me this is the obvious way to model prime numbers, no?
• #### Stock market analysis? (Score:5, Interesting)

on Sunday May 10, 2009 @10:48AM (#27896635)
I am admittedly not a mathematician, but I do have a good understanding of economics and finance, and I am not seeing how a pattern found in prime numbers could have any application to stock market analysis. Where is the interaction between prime numbers and the praxeology of buying and selling securities? Even if you're only focusing on automated buying and selling, those algorithms were still programmed by humans with their own subjective approaches and underlying premises.
• #### Re:9999991 (Score:3, Interesting)

on Sunday May 10, 2009 @10:58AM (#27896739)

Explain one man being hit seven times with lightning. http://en.wikipedia.org/wiki/Roy_Sullivan [wikipedia.org]

Poor bastard. After the fourth! time he began carrying a pitcher of water with him... I find it hard not to be amused.

• #### Re:Other bases? (Score:4, Interesting)

by Anonymous Coward on Sunday May 10, 2009 @10:59AM (#27896749)

Benson's Law is actually independent of the number base used. It wouldn't be much of a mathematical property if it wasn't.

Err, what? The study of representations of numbers is a valid field of mathematics itself.

• #### Re:Other bases? (Score:3, Interesting)

on Sunday May 10, 2009 @11:04AM (#27896773) Homepage

If you got more numbers in 10-19 than 90-99, you probably also have more numbers in 0x10-0x1f than 0xf0-0xff...

• #### Re:Other bases? (Score:2, Interesting)

by Anonymous Coward on Sunday May 10, 2009 @11:06AM (#27896783)

(Warning, IANAM)
It's base independent. Basically, primes are distributed on a logarithmic scale (prime number theorem). For sufficently large intervals, there are always more primes in interval starting at x than the interval of the same size starting at y if xy. Like, there are more prime numbers in 1000..1999 than in 9000..9999.

• #### Re:Like batting orders (Score:2, Interesting)

on Sunday May 10, 2009 @11:08AM (#27896807)

Good call.

Look at this: http://www.chaos.org.uk/~eddy/math/Benford.html [chaos.org.uk]

• #### Re:Other bases? (Score:4, Interesting)

on Sunday May 10, 2009 @11:31AM (#27896961)
Code this have cryptographical uses? IANAMOG, but I know primes play a role in many crypto schemes.
• #### If you're dealing with phone numbers (Score:5, Interesting)

on Sunday May 10, 2009 @11:37AM (#27896991) Journal
It has less to do with math and more to do wit physics: as in how to use a an old school phone. Phone numbers, until comparatively recently would "prefer" lower numbers because they are EASIER TO DIAL. If a company had the phone number (909)999-9009 you would HATE dialing that thing. It would take about half a minute just to dial the damn number.

Ssssshhhhhhik!

Total pain in the finger.

1 as a first number was reserved for "other stuff" like international calls, so the lowest possible area codes (first numbers) went to places like New York City (212 - very quick to dial) or LA (213) because millions of people would be dialing that number, so it made for an overall faster dialing experience for (on average) more people.

This is compared to the relatively few people who lived in more obscure parts of the country, like Saginaw MI (989) or Bryan TX (979).

So, you have millions of phones in 212, thousands in 979. The result: saved effort in dialing.

Also, to this end there was a preference for exchanges to have lower numbers as well to save on dialing effort, and phone numbers with lower (but NON-ZERO) values were sought after. You'd see advertisments like "Call RotoRooter - 213 464 1111 !" or "Call us NOW for a free analysis! 201 738 1122 !" etc. and so on.

So, lower numbers in phone numbers have been a product of primitive dialing technology. Now with touchtone - all that is out the window - but the historic trend is still there and quite powerful - people will pay good money for a 212 area code for the distinction of being in the "real" New York Area code...

RS

• #### Re:Stock market analysis? (Score:5, Interesting)

on Sunday May 10, 2009 @11:37AM (#27896995) Homepage Journal

I've always wondering how I could figure out when someone was trying to pass off a list of fraudulent primes. Glad to see that this problem is finally solved!

You're jesting, but I imagine that many fields of encryption would benefit from this, like dual key encryption, where the security lies in the ability to trust that the product really is of two primes, and that factoring this would be extremely time consuming.

Sets with a backdoor inserted may indeed have a different signature, and to be able to quickly see that one set differs would be invaluable. It wouldn't prove anything, but if, say, keys received from a certain company's key generator stood out like a sore thumb in a Benford distribution check, you would have reason to suspect foul play, incompetence or both.

• #### Independent Verification (Score:5, Interesting)

<eldavojohn&gmail,com> on Sunday May 10, 2009 @11:43AM (#27897031) Journal
Here's what I got on my own counts using the first million primes [utm.edu]:

1: 415441
2: 77025
3: 75290
4: 74114
5: 72951
6: 72257
7: 71564
8: 71038
9: 70320

Which puts the probabilities at:

1: 0.415441
2: 0.077025
3: 0.07529
4: 0.074114
5: 0.072951
6: 0.072257
7: 0.071564
8: 0.071038
9: 0.07032

My computer is currently crunching the first fifty million primes and I will post those as a reply to this post later today when it is done.

These ratios on numbers 2-9 seem far too close in range for this to be a true log scale. Hopefully with more data it will be more logarithmic.

• #### Enron (Score:5, Interesting)

on Sunday May 10, 2009 @11:45AM (#27897059) Journal

was busted by auditors who found the books were "cooked" by applying the law of first numbers described in the /. blurb and TFA. The independent auditors found the first figures were randomly distributed instead of following Benford's law with the number 1 the most plentiful and nine the least -- therefore, the entries were fraudulent.

Benford's law knocked my out at the time; I thought of how many bogus figures I had entered in my expense accounts over the years....

• #### Re:Stock market analysis? (Score:2, Interesting)

on Sunday May 10, 2009 @11:46AM (#27897065)

"I have poor reading comprehension" isn't that great of a joke.

If you really didn't figure it out yet (I suspect you actually have), the ability to detect a pattern that should occur in natural data but probably will not be present in fraudulent data (a sophisticated fraudster can generate to any test...) is what makes it interesting for detecting fraud, not the fact that the pattern was first elucidated from prime numbers.

• #### Re:Other bases? (Score:3, Interesting)

on Sunday May 10, 2009 @11:58AM (#27897149)

The Wikipedia article states that in practice more than the first digit is used.

Is this something like a histogram, and increasing the number of digits analyzed gives you a better picture?

• #### Re:Other bases? (Score:4, Interesting)

by Anonymous Coward on Sunday May 10, 2009 @12:00PM (#27897167)

But how many would contain all 1s? Answer that, and provide a proof for your answer, and you'll make math history.

• #### Re:9999991 (Score:3, Interesting)

on Sunday May 10, 2009 @12:03PM (#27897201)

Or (to put very crudely in layman-like terms): In a world where billions of things happen every single day, "1 in a million" events happen all the time.

• #### Re:Other bases? (Score:4, Interesting)

<sparr0@gmail.com> on Sunday May 10, 2009 @12:13PM (#27897297) Homepage Journal

The ending of a number in binary indicates its oddness. 2 (0b10) is the only prime that ends in zero in binary.

• #### Re:If you're dealing with phone numbers (Score:2, Interesting)

on Sunday May 10, 2009 @12:19PM (#27897355)
Also sped up switching time on old style exchanges, right? The ten position relays at the CO that had to mechanically advance for each digga and were probably no fun to replace...
• #### Choice of base is not mathematically relevant (Score:2, Interesting)

on Sunday May 10, 2009 @12:41PM (#27897513) Homepage

Mathematicians and mathematical results are generally indifferent to base. The mathematical properties of e, for instance, have nothing to do with its decimal expansion (other than the triviality that it never repeats because e is irrational). Mathematicians (and grad students like myself) almost never write something like "e =~ 2.71828...". It's true, but we don't care. There are far more interesting ways to characterize it, such as the base of the unique exponential function which is its own derivative.

Changing from 10 to 16 would not help (or hurt) mathematics in the slightest. Try opening up a serious math book and looking for numerical constants greater than 9 (i.e. ones that would look different in hexadecimal). You won't find very many.

Among the various bases, though, balanced ternary [wikipedia.org] is kind of interesting.

• #### Not predictive (just in case you were thinking it) (Score:4, Interesting)

on Sunday May 10, 2009 @12:59PM (#27897625) Homepage

If this is the first time you've run across Benford's law, you might thought to yourself, "Wow, I can use that to predict large prime numbers! I'll just convert numbers to base X, where X is really big, and only check numbers that start with 1."

It's worth actually trying this, if you get a minute. You'll find that as X gets large, the number of primes that start with 2 gets closer to the number of primes tat start with 1. As X approaches infinity, your distribution approaches a smooth logarithmic curve.

It's neat to see it yourself. This gives you an easy way to experiment with an infinite, easily generated set of numbers that follows Benford's law. It turns out that math actually works, oddly enough.

• #### What about cryptology? (Score:1, Interesting)

by Anonymous Coward on Sunday May 10, 2009 @01:09PM (#27897707)

What implications does this have on crypto? Since a lot of decent crypto is based on random number generation will this render modern cryptographic techniques useless? Will there be some emperical way to determine the remaining digits of a random number based on Benfords posit?

• #### Re:Not surprising at all (Score:2, Interesting)

on Sunday May 10, 2009 @01:12PM (#27897729)
Agreed... in fact, I observed this same pattern back in 1998 (no kidding) in a report I wrote for my High School AP Statistics class (titled "On patterns in the distribution of prime numbers"). I submitted it for extra credit, and I got a B+.

Guess I should have published that guy!
• #### Re:Other bases? (Score:3, Interesting)

on Sunday May 10, 2009 @01:47PM (#27898037) Homepage Journal

But how many would contain all 1s? Answer that, and provide a proof for your answer, and you'll make math history.

Obviously the number of digits would have to be a prime number. But not all prime number of digits would give you a prime number. The first case is when there are 11 digits, the number would be 23*89 in that case.

• #### Re:Complete bullshit (Score:3, Interesting)

on Sunday May 10, 2009 @02:06PM (#27898225) Homepage
I was thinking this myself at first, but apparently something goes wrong with it, because (per TFA) the distribution actually moves away from Benford's Law weighting and toward uniformity as the sample size grows larger. The specific rate of this movement is somehow described by a generalization of Benford's Law; while BL at one point uses 1/x, GBL uses 1/(x^a), with BL as the special case where a = 1.
• #### Re:Other bases? (Score:3, Interesting)

<imipak@@@yahoo...com> on Sunday May 10, 2009 @02:34PM (#27898453) Homepage Journal

Since RSA relies on it being a Hard Problem to factor a number that is the product of two very large primes, it potentially introduces a weakness, as it presumably means some of those products will be easier to factor than others. A number that can be shown to be the product of two primes that both start with 9 should be much easier to work on than a number where both primes start with a 1, as there are far fewer 9- primes and therefore a smaller search space.

The obvious place to start looking would be the RSA's prime challenge. If I'm even vaguely close to being right, then you should start seeing crypto mailing lists looking for vulnerable targets amongst the numbers the RSA has published.

• #### Re:Other bases? (Score:3, Interesting)

on Sunday May 10, 2009 @02:47PM (#27898543)
Since "forever" is often exponential time, the logarithm of forever would indeed make cryptanalysis easy. :)
• #### Re:Is this surprising? (Score:2, Interesting)

on Sunday May 10, 2009 @03:08PM (#27898737)

I don't know, it just seems obvious.

Um, yeah, when you summarize, you omit details (by definition). When the details you omit are the interesting details, what you're left with is indeed obvious and uninteresting.

It's long been known that primes are more common on the low end. The interesting detail is the mathematical relationship between the actual amounts that end up in the various bins if you divide the range and count them up in each bin. Knowing there will be more in the lower bins doesn't tell you this, it just says there will be more in the lower bins. Benford's law tells you about how many (proportionally) will be in each bin. And it's not in any way an artifict of using a base 10 number system, the relationship remains true regardless of how many bins you use (10, 16, 8, whatever).

• #### Some More Information (Score:5, Interesting)

<eldavojohn&gmail,com> on Sunday May 10, 2009 @04:39PM (#27899343) Journal
So I read the comments and see that I need to do this in ranges or 1 to 100, 1 to 1000, etc. Which is fine, I've added another R method and would post the code here if it didn't yell at me for junk characters. So here are your Benford lists:

All Primes 1-100
Counted Occurances:
4, 3, 3, 3, 3, 2, 4, 2, 1
Frequencies:
0.160, 0.120, 0.120, 0.120, 0.120, 0.080, 0.160, 0.080, 0.040

All Primes 1-1,000
Counted Occurances:
25, 19, 19, 20, 17, 18, 18, 17, 15
Frequencies:
0.149, 0.113, 0.113, 0.119, 0.101, 0.107, 0.107, 0.101, 0.089

All Primes 1-10,000
Counted Occurances:
160, 146, 139, 139, 131, 135, 125, 127, 127
Frequencies:
0.130, 0.119, 0.113, 0.113, 0.107, 0.110, 0.102, 0.103, 0.103

All Primes 1-100,000
Counted Occurances:
1193, 1129, 1097, 1069, 1055, 1013, 1027, 1003, 1006
Frequencies:
0.124, 0.118, 0.114, 0.111, 0.110, 0.106, 0.107, 0.105, 0.105

All Primes 1-1,000,000
Counted Occurances:
9585, 9142, 8960, 8747, 8615, 8458, 8435, 8326, 8230
Frequencies:
0.122, 0.116, 0.114, 0.111, 0.110, 0.108, 0.107, 0.106, 0.105

All Primes 1-10,000,000
Counted Occurances:
80020, 77025, 75290, 74114, 72951, 72257, 71564, 71038, 70320
Frequencies:
0.120, 0.116, 0.113, 0.112, 0.110, 0.109, 0.108, 0.107, 0.106

This is the raw data so to turn that into something visual, I dumped it into a Google spreadsheet and made it public [google.com] (note the scale on the y axis). Enjoy!

It seems that the curve is flattening out the more data I collect, but the logarithmic curve may be valid. I have the data for 100,000,000 and will add that to the spreadsheet once it completes.
• #### Re:Surprising? (Score:3, Interesting)

<robert&aldebaran,no> on Sunday May 10, 2009 @04:55PM (#27899433) Homepage

Actually, if you read the decimals of pi backwards, you'll get all the primes after each other.

• #### Re:Enron (Score:4, Interesting)

<pkteison&hotmail,com> on Sunday May 10, 2009 @05:22PM (#27899631) Homepage

Great example. Here's a pretty good article on it: http://abcnews.go.com/print?id=98043 [go.com]

I also like their explanation for "why":
"Imagine that you deposit \$1,000 in a bank at 10 percent compound interest per year. Next year you'll have \$1,100, the year after that \$1,210, then \$1,331, and so on. The first digit of your bank account remains a "1" for a long time.

When your account grows to more than \$2,000, the first digit will remain a "2" for a shorter period as your interest increases. And when your deposit finally grows to more than \$9,000, the 10 percent growth will result in more than \$10,000 in your account the following year and a long return to "1" as the first digit. "

• #### I Found a Fit! (Score:5, Interesting)

<eldavojohn&gmail,com> on Sunday May 10, 2009 @06:57PM (#27900261) Journal
The results for all primes between one and one hundred million:

Counted Occurances:
686048, 664277, 651085, 641594, 633932, 628206, 622882, 618610, 614821
Frequencies:
0.119, 0.115, 0.113, 0.111, 0.110, 0.109, 0.108, 0.107, 0.107

So there's some more data for you. I added that to this spreadsheet [google.com].

So I hope that satisfies everyone who replied to my thread first of all. I hope 5,761,455 primes between one and one hundred million satisfies you.

I used a very simple Non Linear Squares model to solve for a single constant on a log of these values. I think I have a fit. Using Benford's model and the NLS Package in R, I found:

f(x) = 0.020814 * log(161.147689 * ((x+1)/x))

To fit quite nicely, here's the summary:

Formula: y ~ Const1 * log(Const2 * ((x + 1)/x))

Parameters:
Estimate Std. Error t value Pr(>|t|)
Const1 0.020814 0.001940 10.7292 1.343e-05 ***
Const2 161.147689 80.222081 2.0088 0.08452 .
---

Residual standard error: 0.0010413 on 7 degrees of freedom

Number of iterations to convergence: 8
Achieved convergence tolerance: 1.8104e-07

Here is the list of frequencies next to what my model produced:

Benford Prime Rates
0.11907548
0.11529674
0.11300704
0.11135972
0.11002984
0.10903600
0.10811193
0.10737045
0.10671280

NLS Model Results
0.1202106
0.11422279
0.11177125
0.11042794
0.10957828
0.10899193
0.10856276
0.10823497
0.10797641

I would wager that they are correct. Neat discovery!

• #### Re:Other bases? (Score:5, Interesting)

on Sunday May 10, 2009 @10:43PM (#27901701)

The whole of mathematics is really just a language of form and structure, a system to systematize and decribe structure and forms (relationships are a type of form).

So... mathematics is the vaguest thing possible?

You have mail.

Working...