Benford’s Law and The Lottery!

Topic closed. 10 replies. Last post 10 years ago by Thoth.

Thoth's avatar - binary
Findlay, Ohio
United States
Member #4855
May 28, 2004
400 Posts
Offline
Posted: January 2, 2007, 2:23 am - IP Logged

“Benford's law, also called the first-digit law, states that in lists of numbers from many real-life sources of data, the leading digit is 1 almost one-third of the time, and larger numbers occur as the leading digit with less and less frequency as they grow in magnitude, to the point that 9 is the first digit less than one time in twenty. This counter-intuitive result applies to a wide variety of figures, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants, and processes described by power laws (which are very common in nature). It is named after physicist Frank Benford, who stated it in 1938, although it had been previously stated by Simon Newcomb in 1881. The first rigorous formulation and proof appears to be due to Theodore P. Hill in 1988.”…Wikipedia.org (an online encyclopedia)

 

According to Benford's Law, the leading digits should be distributed (in base 10) according to the following expression: 

LOG((D+1)/D)

The “D” in the expression simply means a specific Digit (1-9).  To see how the digit “1” should be distributed, the formula would look like this: log((1+1)/1).  This equals .3010, which is 30.10%.  For the digit “2”, the formula would be log((2+1)/2)…this equals .1761 or 17.61%.

When we use the formula to calculate the distributions for all of the digits 1 through 9, we get the following table: 

Digit   Frequency   Percent

1       0.3010      30.10%
2       0.1761      17.61%
3       0.1249      12.49%
4       0.0969       9.69%
5       0.0792       7.92%
6       0.0669       6.69%
7       0.0580       5.80%
8       0.0512       5.12%
9       0.0458       4.58%
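
The table can be reproduced with a few lines of Python. This is only a minimal sketch of the LOG((D+1)/D) expression in base 10; the function name `benford_expected` is made up for illustration and is not from any library.

```python
import math


def benford_expected(digit: int) -> float:
    """Expected share of a leading digit (1-9) under Benford's Law, base 10."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10((digit + 1) / digit)


if __name__ == "__main__":
    # Print one row per digit: digit, frequency, percent.
    for d in range(1, 10):
        share = benford_expected(d)
        print(f"{d}   {share:.4f}   {share:.2%}")
```

The nine shares add up to 1, since the product of (D+1)/D for D = 1 through 9 is exactly 10.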

As the table implies, the leading digits found in a large sample of statistical data will be very unevenly distributed…with far more numbers starting with a digit one than with a digit nine!  As to exactly why this occurs, it can be chalked up to the probability of probability.

Benford’s Law is intriguing in its many possible applications.  One of its most interesting uses is that of fraud detection!  This is simply done by measuring the values of a large data series against the expected results as calculated by Benford’s Law.  Read the following three links to get a better understanding of what Benford’s Law is and how it’s used.  Pay particular attention to how these sites imply that Benford’s Law DOES NOT apply to the lottery.  Then come back and read the rest of this post.

http://plus.maths.org/issue9/features/benford/index.html

http://ddrive.cs.dal.ca:9999/page/lvl3/13

http://www.accountancyireland.ie/dsp_articles.cfm/goto/1101/page/Fraud_Detection_with_Benfords_Law.htm

The articles basically insinuate that you shouldn’t be tempted to play combinations that start with the digit 1 just because Benford’s Law says that the digit 1 should occur much more often than the other digits…“The outcome of the lottery is truly random, meaning that every possible lottery number has an equal chance of occurring. The leading-digit frequencies should therefore, in the long run, be in exact proportion to the number of lottery numbers starting with that digit.”… PLUS.MATHS.ORG

If you look at the frequencies of digits based on the actual digit printed on the balls, then yes, they will appear almost evenly across the entire spectrum at a proportion that is equal to the number of combinations that start with that digit.  This is a no-brainer to most lottery players.  Of course each ball and/or combination will appear almost equally over a long period of time! Duh!

However, when one realizes that the numbers on the balls are nothing more than simple pictures and that the time between the occurrences of the pictures is more important than the pictures themselves, then one can understand that the probability of probability is more controlling to the game than randomness itself is.  With that in mind, Benford’s Law DOES apply to lottery games and here is how we can observe it:

First, you must disregard the actual numbers on the balls, at least to the extent that the frequency of the actual numbers is not what you are tracking.  In Pick 3, we don’t expect the digit 1 to be drawn more often than the digit 8.  We also don’t expect a combo like 137 to be drawn more than the combo 589 just because the combo 137 starts with a leading digit of 1.  What we really want to track is the time between the hits of the numbers on the balls or the combinations that they comprise!  In other words, we want to track the skips, not the ball numbers themselves…this is where Benford’s Law of First-Digits exists within the lotto!  It’s really all about hits and skips.

When applying Benford’s Law to skips, we first have to outline the possibilities for the skips.  This is pretty simple since what we are tracking will be the Leading Digit of a skip’s end.  According to the “Law”, we should see around 30.10% of all skips end with a leading digit value of 1 (if what I’m saying about “The Law” applying to the skips is true).  So first we need to figure out how many possible skips there are and where they all end.

Let’s suppose that the last straight Pick 3 number drawn was 764.  When is the next time it could be drawn?  It could be drawn the very next game, which is one game later, or a back-to-back repeat.  It could be drawn two games later.  Actually, it may very well wait thousands of games or even longer!  The point here is that there are certain ranges for skips to end that all contain the same leading digit.  I should point out that zero is an impossible leading digit for a skip to end in this environment.  If a straight number hits then repeats and hits the very next game, it is not counted as a skip of zero, but rather it is said to have hit exactly one game after its last hit, which is recorded as a 1.

So, for all intents and purposes, the skips will end with the leading digits of 1 through 9.  Here is how the ranges for all skips break down in the Pick 3: 

Skips Ending in Leading Digit of 1

1 game later
10 through 19
100 through 199
1,000 through 1,999

Total Possibilities = 1,111 

Skips Ending in Leading Digit of 2

2 games later
20 through 29
200 through 299
2,000 through 2,999

Total Possibilities = 1,111 

Skips Ending in Leading Digit of 3

3 games later
30 through 39
300 through 399
3,000 through 3,999

Total Possibilities = 1,111 

Skips Ending in Leading Digit of 4

4 games later
40 through 49
400 through 499
4,000 through 4,999

Total Possibilities = 1,111

Skips Ending in Leading Digit of 5

5 games later
50 through 59
500 through 599
5,000 through 5,999

Total Possibilities = 1,111 

Skips Ending in Leading Digit of 6

6 games later
60 through 69
600 through 699
6,000 through 6,999

Total Possibilities = 1,111  

Skips Ending in Leading Digit of 7

7 games later
70 through 79
700 through 799
7,000 through 7,999

Total Possibilities = 1,111 

Skips Ending in Leading Digit of 8

8 games later
80 through 89
800 through 899
8,000 through 8,999

Total Possibilities = 1,111 

Skips Ending in Leading Digit of 9

9 games later
90 through 99
900 through 999
9,000 through 9,999

Total Possibilities = 1,111

In this breakdown, we have a total of 9 leading digits, which gives us 9 distinct groups of 1,111 possibilities each.  These 9 groups combine to give us a total of 9,999 possibilities for where any skip of a straight Pick 3 number can end.  Realistically, we could extend the ranges for each of the leading digits even further by adding the range 10,000 through 19,999 to the leading digit one ranges and then 20,000 through 29,999 to the leading digit 2 ranges and so on.  This really isn’t necessary though because, as far as I know, a Pick 3 straight has made it out past 10,000 games only one time in Pick 3 history. 
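
The group sizes above can be double-checked by brute force. The following is a small sketch that walks every skip length from 1 to 9,999 and tallies it by its first digit; the helper names are invented here for illustration.

```python
from collections import Counter


def leading_digit(n: int) -> int:
    """First decimal digit of a positive integer, e.g. 1999 -> 1."""
    return int(str(n)[0])


def count_by_leading_digit(max_skip: int) -> Counter:
    """Tally the skip lengths 1..max_skip by their leading digit."""
    return Counter(leading_digit(skip) for skip in range(1, max_skip + 1))


if __name__ == "__main__":
    groups = count_by_leading_digit(9999)
    for digit in range(1, 10):
        print(digit, groups[digit])
```

Each of the nine groups comes out to 1 + 10 + 100 + 1,000 = 1,111 possibilities, so raw counting alone gives every leading digit the same number of slots.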

Building the Sample
Testing the Pick 3 for adherence to Benford’s Law is actually a pretty straightforward process.  The first state I tested was Ohio.  I started from the very first game ever held, which was on 12/03/79, and included all Pick 3 drawings through the date of testing (10/14/06).  This gave me a total of 10,626 consecutive games (midday and evening combined).  Each game on the list was given a drawing number, starting with the very first game as drawing #1 and the last or current drawing as #10,626.  The results were listed in the order they took place.  Next, the list was sorted by the numeric value of the straight combos in ascending order (…012, then 013, then 014…etc.) and then by their corresponding drawing number in descending order.

This allowed the skips of each straight to be calculated in another column by using the simple IF command in Excel.  Supposing that the combos appear in column F and the drawing numbers in column G, the formula =IF(F2=F3,G2-G3,G2) looks to see if the combo in F2 is the same as the combo in F3.  If it is the same combo, then it subtracts the drawing number in G3 from the one in G2, effectively giving you the skip of the combo in F2 since the time of its last hit.  If the combo in F3 does not equal the combo in F2, then the drawing number for the combo in F2 (found in cell G2) is displayed because that means it was the first time the combo hit.  Note: because the list is sorted by combo, F2 and F3 differ only on the last row of each combo's block, which holds that combo's earliest drawing.  The Fill command is next used to apply the formula to every game on the sorted list.  Below is a very small portion of the list and how it appears after it has been sorted and the formula has been applied:

Once the formula was filled to the bottom of the list and all the skips were calculated, I copied the entire “Skip” column (column H) and pasted only the values back over themselves in order to remove the formulas from each of the cells while retaining the values they created.  I did this so I could re-sort the list without losing the skip values.  Re-sorting the list isn’t really necessary for the test but I have a slight compulsion to keep things in the order that they occurred in.  I re-sorted the list by selecting columns B through H and then selecting Data, Sort, and then choosing to sort by “Game” Ascending.  This keeps the list sorted from drawing #1 to drawing #10,626 in order of occurrence and shows the number of games ago (skip) that each combo last hit.  Below is a sample of the re-sorted list:

Reading off the list above, look at the evening drawing for 10/12/06 (evening draws are in the lighter blue font).  The combo drawn was 563, which was drawing #10,623.  In the skip column you can see that it was last drawn 11 games prior.  You can verify this number simply by counting up 11 cells where you will see that combo 563 was just drawn during drawing #10,612. Now that the list of skips has been created, all we have to do is count the number of skips that fall into our specific skip ranges.  I performed this task by creating a different worksheet that applied DCOUNT functions to the list.  Once the counts were completed for each of the nine skip ranges, the totals of each were divided by the total games in the sample (10,626).  This gives the percentage of total games (of the entire history) that each of the leading digits of the skips account for. 
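
For anyone who would rather not do this in Excel, the same bookkeeping can be sketched in Python. This is not the spreadsheet itself, just an assumed equivalent: `compute_skips` mirrors the =IF(F2=F3,G2-G3,G2) logic (a first-time hit gets its own drawing number as its skip), and `leading_digit_shares` plays the role of the DCOUNT worksheet. Both function names and the tiny sample list are invented for illustration.

```python
from collections import Counter


def compute_skips(draws):
    """Skip for each draw, in drawing order.

    draws: list of combos (e.g. "563"), oldest first; drawing numbers start at 1.
    A combo's first hit is recorded as its drawing number, like the G2 branch
    of the spreadsheet formula.
    """
    last_seen = {}  # combo -> drawing number of its most recent hit
    skips = []
    for game, combo in enumerate(draws, start=1):
        skips.append(game - last_seen.get(combo, 0))
        last_seen[combo] = game
    return skips


def leading_digit_shares(skips):
    """Share of all games whose skip starts with each digit 1-9."""
    counts = Counter(int(str(skip)[0]) for skip in skips)
    total = len(skips)
    return {digit: counts.get(digit, 0) / total for digit in range(1, 10)}


if __name__ == "__main__":
    sample = ["563", "012", "563", "764", "012", "563"]  # made-up draws
    skips = compute_skips(sample)
    print(skips)
    print(leading_digit_shares(skips))
```

Doing it in one pass with a dictionary avoids the sort, paste-values, and re-sort steps, since the skips come out already in order of occurrence.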

The final step in all of this is to simply graph out the data to see just how closely the game follows Benford’s Law.  So how closely does Ohio’s Pick 3 follow it?  Look at the graph below and see the results for yourself.

 

As you can see, it’s pretty close!  I must say that after WIN D made the first post regarding Benford’s Law, I knew it would apply to lottery games, especially Pick 3.  I was already somewhat familiar with the bias; I just didn’t know exactly how to apply it. 

Ohio is not the only state whose Pick 3 follows Benford’s Law.  Virtually every state that I’ve tested follows it just as closely.  Here are a few more:

 

There was one state in particular that I was just itching to test.  After reading how Benford’s Law can be used to detect fraud, I thought it would be interesting to see how closely Indiana’s lottery measured up.  There seems to be a lot of suspicion here on LP that the Indiana Pick 3 and Pick 4 are rigged.  While I haven’t yet tested their Pick 4 game, their Pick 3 holds up just as well as any of the other states I tested.  This surprised me.  I was actually hoping to discover (and prove) that it was rigged by using Benford’s Law, which by the way, is something that’s not supposed to even exist in lottery games. LOL!

 

Aside from the four states shown above, I have also tested the Pick 3’s of Michigan, Pennsylvania and New Hampshire (Tri-State).  The graphs for each of these three states look just like those above.  I’m sure that the law is universal in every state’s game and I can just about guarantee that every lotto game, no matter its type or its odds, will also follow Benford’s Law when observing the skips.

 

There is much more that can be said about Benford’s Law and how it applies to Lottery.  I will soon be adding more to this post that will go into more detail explaining exactly why it exists and how it can be used as a possible strategy.  I will also be testing how it applies to the individual digits, the pairs and also boxed combinations as well.

 

 

~Probability=Odds in Motion~

    CARBOB's avatar - FL LOTTERY_LOGO.png
    ORLANDO, FLORIDA
    United States
    Member #4924
    June 3, 2004
    5893 Posts
    Offline
    Posted: January 2, 2007, 6:02 pm - IP Logged

    By any chance, did you test Fla? Thanks for an interesting post. Please continue on.

      Avatar
      NY
      United States
      Member #23835
      October 16, 2005
      3474 Posts
      Online
      Posted: January 3, 2007, 1:02 am - IP Logged

       "This counter-intuitive result applies to a wide variety of figures"

      The result shouldn't be counter-intuitive to anyone who gives it much thought. Anything that is counted, whether it's how many dollars you owe for your electricity, how many miles make up a river, or how many houses are on a street, is going to have more low numbers than high numbers because we start with the low numbers when we count. Anything we count has at least one 1, but won't necessarily have any higher numbers. That's so obvious that you normally wouldn't pay any attention to it. As one example, a short street might have 26 houses, so 11 of the house numbers will start with 1, 8 will start with 2, but only 1 will start with each of the numbers 3 through 9.
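
The street example is easy to check by counting. Here is a rough sketch, assuming houses are simply numbered 1, 2, 3 and so on with no gaps; the function names are made up.

```python
from collections import Counter


def leading_digit_counts(last_house: int) -> Counter:
    """Count the house numbers 1..last_house by their first digit."""
    return Counter(int(str(n)[0]) for n in range(1, last_house + 1))


def share_of_ones(last_house: int) -> float:
    """Fraction of house numbers 1..last_house that start with a 1."""
    return leading_digit_counts(last_house)[1] / last_house


if __name__ == "__main__":
    print(dict(sorted(leading_digit_counts(26).items())))
    # The share of leading 1s depends heavily on where the counting stops.
    for street in (9, 19, 26, 99, 199):
        print(street, f"{share_of_ones(street):.1%}")
```

The share of leading 1s swings up and down depending on where the street ends, which is the counting effect described above.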

      As far as applying it to skips I'm not going to give it much thought right now, but I expect that everything you see is likely to be just another variation on counting. If a number repeats in the first 200 drawings the skip can't possibly be more than 200, so we get the preponderance  of 1's for all of the skips from 100 to 199. If it doesn't repeat by the 200th drawing but repeats by the 300th we add a bunch starting with 2, but we still only have 12 possibilities starting with 3. Similarly, if we go beyond 999 draws before a repeat we'd have to go another 1000 without the repeat in order to add any results beginning with 2. Since there's a 1 in 1000 chance that the 3 digit number we're tracking will come up on any one of the next 1000 drawings we should expect that most of the numbers that don't repeat before the 999th drawing will repeat between 1000 and 1999.
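
To put a number on "most": assuming each drawing is independent and the tracked straight has a 1 in 1000 chance every time, the chance that it shows up at least once in the next 1,000 drawings can be sketched like this (the function name is just for illustration).

```python
def prob_hit_within(n_draws: int, p: float = 1 / 1000) -> float:
    """Chance of at least one hit in the next n_draws independent drawings."""
    return 1 - (1 - p) ** n_draws


if __name__ == "__main__":
    print(f"{prob_hit_within(1000):.1%}")
```

That works out to a bit over 63%, so a number that has already waited 999 drawings is more likely than not to end its skip somewhere in the 1000 to 1999 range.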

      As always, the hard part isn't in predicting overall trends, it's predicting which number will be drawn on a specific day.

        JADELottery's avatar - MeAtWork 03.PNG
        The Quantum Master
        West Concord, MN
        United States
        Member #21
        December 7, 2001
        3675 Posts
        Offline
        Posted: January 3, 2007, 1:57 am - IP Logged

        Very interesting point of mathematical view.

        Thanks for sharing.

        Presented 'AS IS' and for Entertainment Purposes Only.
        Any gain or loss is your responsibility.
        Use at your own risk.

        Order is a Subset of Chaos
        Knowledge is Beyond Belief
        Wisdom is Not Censored
        Douglas Paul Smallish
        Jehocifer

          Thoth's avatar - binary
          Findlay, Ohio
          United States
          Member #4855
          May 28, 2004
          400 Posts
          Offline
          Posted: January 3, 2007, 12:26 pm - IP Logged

          KY FLOYD: "As far as applying it to skips I'm not going to give it much thought right now, but I expect that everything you see is likely to be just another variation on counting."

          I understand what you are saying about the effects of counting and how starting with the low numbers (the ones) and working up to higher leading numbers (the nines) can create this effect. 

          However, the results of these tests can also be directly attributed to the laws of probability.  While it can be said that a variation of counting is causing this "phenomenon,"  probability can actually be illustrated to be responsible for it.  The sum of all the probabilities for all the possible skip endings that have a leading digit of 1 total up to a little over 32%, which is reflected by the performance of each state's game on the graphs.

          Similarly, the cumulative probabilities for the other skip ranges are reflected accordingly.
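
One way to add these probabilities up is sketched below. It assumes independent drawings with a fixed 1 in 1000 chance per game for a straight, so the chance that a skip ends exactly k games after the last hit is (999/1000)^(k-1) x 1/1000. The function names and the 9,999-game cutoff (taken from the skip ranges in the first post) are choices made here for illustration.

```python
def skip_probability(k: int, p: float) -> float:
    """Chance the next hit lands exactly k games after the last one."""
    return (1 - p) ** (k - 1) * p


def leading_digit_probabilities(p: float, max_skip: int) -> dict:
    """Total skip probability for each leading digit, over skips 1..max_skip."""
    totals = {digit: 0.0 for digit in range(1, 10)}
    for k in range(1, max_skip + 1):
        totals[int(str(k)[0])] += skip_probability(k, p)
    return totals


if __name__ == "__main__":
    totals = leading_digit_probabilities(p=1 / 1000, max_skip=9999)
    for digit, prob in totals.items():
        print(digit, f"{prob:.2%}")
```

Under these assumptions the leading digit 1 total comes out at roughly 33%, and the totals shrink steadily from digit 1 down to digit 9.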

           

          CARBOB...I'll post FL later tonight

          ~Probability=Odds in Motion~

            Thoth's avatar - binary
            Findlay, Ohio
            United States
            Member #4855
            May 28, 2004
            400 Posts
            Offline
            Posted: January 3, 2007, 9:59 pm - IP Logged

            CARBOB, 

            Here is Florida's graph.  It includes game 1 to game 6,800 (12/19/06).

             

             All states should follow the expected results very closely.

            .

            ~Probability=Odds in Motion~

              LAVERNE MALONEY's avatar - smallgirl

              United States
              Member #1987
              August 5, 2003
              8968 Posts
              Offline
              Posted: January 3, 2007, 10:32 pm - IP Logged

              WOW, this is really fascinating.

              You know what I did? I went to my favorite place to search, & that is the Search Past Results. I input a 1 in the pick 3 games search. As of this post I was able to notice that Washington State has not had a 1 come up yet in its draws.

              I remember when lottaloot had a post about pairs that had not shown as of yet. I found that post of hers to be very beneficial.

              So I would say use that search engine not only to find out what is appearing in the #s but it is also beneficial to find out what is not appearing in the #s.

                Thoth's avatar - binary
                Findlay, Ohio
                United States
                Member #4855
                May 28, 2004
                400 Posts
                Offline
                Posted: January 4, 2007, 4:01 am - IP Logged

                Benford's Law and Pick 3 Digit Skips

                 

                Benford's Law also applies to the skips of the 10 digits (0 through 9) in all three of the Pick 3 positions.  Here are the primary skip ranges that digits can end on:

                1 game later (back-to-back repeat)
                10 through 19

                2 games later
                20 through 29

                3 games later
                30 through 39

                4 games later
                40 through 49

                5 games later
                50 through 59

                6 games later
                60 through 69

                7 games later
                70 through 79

                8 games later
                80 through 89

                9 games later
                90 through 99

                Each of the nine groups has exactly 11 possibilities.  We could up this by adding 100 through 199 to the leading digit 1 range and 200 through 299 to the leading digit 2 range and so on.  It's pointless to do this, however, because it's extremely rare for a digit to make it out a hundred games or more without being drawn.  Here is a graph of Ohio's Pick 3 that compares digit skip performance of all three positions to the expected Benford value.

                 

                Again, there is more than just "counting" causing this effect.  Why should the number of digits that end their skips at 1 game later (a repeat) or 10 through 19 games later be so much larger than the number of digits that end their skips at either 2 games later or at 20 through 29 games later?  Keep in mind that both those ranges contain 11 possibilities and all skips essentially occur at random.

                The answer is directly tied to the individual probabilities for each skip.  It can be shown that a skip's probability decreases the farther away it gets from its last occurrence.  As a very basic example, consider the following three skip possibilities.

                A    B    C
                5    5    5

                5    X    X
                     5    X
                          5

                Suppose that the last digit to hit in position-one was the digit 5.  When will be the very next time it hits?  Skip A represents it hitting in the next game (the back-to-back repeat).  Skip B is the 5 hitting two games later and skip C is the 5 hitting three games later.  The probabilities of the three skips are as follows:

                LAST HIT 
                A = .10 or 10%
                B = .09 or 9%
                C = .081 or 8.1%

                These are just the first three skip probabilities but you can clearly see how they decrease the further away they get from the time of the last hit. To make a long story short (since I'm getting tired), when you add the probabilities of the 11 possible skips in the "Benford Range" together, they total 35.23%.  The performance of the digits follows that percentage as shown on the graph.  I'll post more data on skip probabilities soon.     
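
The figures for A, B and C, and the 35.23% total, can be reproduced with a short sketch. It assumes a digit has a fixed 1 in 10 chance of landing in a given position on every draw, independent of earlier draws; the function names are hypothetical.

```python
def digit_skip_probability(k: int, p: float = 0.10) -> float:
    """Chance a digit's next hit comes exactly k games after its last hit."""
    return (1 - p) ** (k - 1) * p


def benford_range_probability(leading: int, p: float = 0.10) -> float:
    """Total chance over the 11 skips with this leading digit (d and 10d..10d+9)."""
    skips = [leading] + list(range(leading * 10, leading * 10 + 10))
    return sum(digit_skip_probability(k, p) for k in skips)


if __name__ == "__main__":
    for label, k in zip("ABC", (1, 2, 3)):
        print(label, f"{digit_skip_probability(k):.3f}")
    print(f"{benford_range_probability(1):.2%}")
```

For A, B and C this gives .10, .09 and .081, and the 11 skips of 1 and 10 through 19 add up to about 35.23%.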

                ~Probability=Odds in Motion~

                  CARBOB's avatar - FL LOTTERY_LOGO.png
                  ORLANDO, FLORIDA
                  United States
                  Member #4924
                  June 3, 2004
                  5893 Posts
                  Offline
                  Posted: January 4, 2007, 12:57 pm - IP Logged

                  Thanks for posting the Fla graphs. You and JadeLottery have my mind working overtime, trying to absorb this info. Great stuff!!

                    Avatar
                    Greenwich, CT
                    United States
                    Member #4793
                    May 24, 2004
                    1822 Posts
                    Offline
                    Posted: January 31, 2007, 1:45 pm - IP Logged

                    Thoth,

                    Thanks for the great analysis on Benford's Law (BL)!  I believe that skips are the key to solving lottery games, putting yourself in the right spot at the right time with the right numbers.

                    Great work on the Ohio, Illinois, Maryland, Indiana and Florida graphs.  That's amazing how closely they parallel BL.

                    But, I think what's more interesting is how closely they parallel each other.  Skips that start with a 1 hover around 33-34% while BL states they should be closer to 30%.  All the other values are slightly below Benford's Law.

                    Why are they so similar to each other, yet slightly off from BL?  Is it possible that BL is off?  Or that there is something slightly twisted in the mechanics of the ball machine or RNG?

                      Thoth's avatar - binary
                      Findlay, Ohio
                      United States
                      Member #4855
                      May 28, 2004
                      400 Posts
                      Offline
                      Posted: February 1, 2007, 12:31 am - IP Logged

                      Yes, the performance of the skips that start with ones is somewhat higher than what BL would suggest it should be.  I think that the figures for the expected Benford values are slightly off.  The formula LOG((D+1)/D) is pretty generic in a way. It's a close approximation but probably not 100% accurate.  I have read several articles on BL and examined a lot of graphs that illustrate how other types of data follow the law.  The majority of the time, the ones are higher than the 30.1% that is projected by the formula.  Benford's Law is said to exist due to the probability of probability, which is exactly what causes the skips to fall the way they do and in the frequencies they occur in.

                      ~Probability=Odds in Motion~