hypersoniq's Blog

Jackpot number selection using range reduction

Regardless of the system used to pick numbers (except for quick picks), you can craft a pick using range reduction, working from both ends... (using a 5 of whatever format)

Step 1, pick the low number. In a non replacement draw, picking the column 1 number first gets you a lower bound. No numbers can be the same or less than this number.

Step 2, pick the high number... this gives an upper bound. No number can be the same or higher than this number.

Now that you have your range, time to reduce it...

Step 3, pick your next lowest number. This must be at least 1 higher than the lowest, and at least 3 lower than the highest (to leave room for the two numbers still to come). This sets the next low bound.

Step 4, pick your next to last highest number. This must be at least 1 lower than the high bound, and at least 2 higher than your new low bound from step 3.

Step 5, your new bounds set you up to pick from what remains in the center section.

Sounds like unhelpful obvious info, but when you are selecting numbers based on some metric, it can help. The simple setup is in a spreadsheet, using cell formatting to black out the numbers no longer in play at each step. It can also be easily reset when looking at other possibilities.
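For anyone who would rather script it than black out spreadsheet cells, here is a minimal Python sketch of the same range reduction. The function name `open_ranges` and the example picks (7, 52, 15) are made up for illustration; it only assumes a sorted, non replacement pick of 5.

```python
def open_ranges(pool_size, picked, k=5):
    """Return {position: range of still-valid numbers} for each unpicked position.

    Positions are 1..k in sorted order; `picked` maps position -> chosen number.
    """
    ranges = {}
    for pos in range(1, k + 1):
        if pos in picked:
            continue
        lo = pos                    # pos-1 smaller numbers must fit below it
        hi = pool_size - (k - pos)  # k-pos larger numbers must fit above it
        for p, value in picked.items():
            gap = abs(pos - p)      # how many sorted slots apart the two are
            if p < pos:
                lo = max(lo, value + gap)
            else:
                hi = min(hi, value - gap)
        ranges[pos] = range(lo, hi + 1)
    return ranges


# Steps 1 and 2 in a 5/60 game: low number 7, high number 52 (example values)
after_bounds = open_ranges(60, {1: 7, 5: 52})
# Step 3: next lowest number 15 (example value)
after_step3 = open_ranges(60, {1: 7, 5: 52, 2: 15})

print(after_bounds[2])  # range(8, 50)  -> 8 through 49
print(after_step3[4])   # range(17, 52) -> 17 through 51
```

Each call only reports what is still in play, so resetting to try another possibility is just a matter of passing a different `picked` dictionary.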

Entry #426

Pennsylvania Cash 4 Life, the first non replacement draw target!

After doing some quick research, this nightly 5/60 + 1/4 game has used the same matrix since its introduction on April 7th, 2015... 10 years = NOT data starved!

Why this game? 

Because the bonus ball can be brute forced... unlike the other jackpot games with a bonus ball, covering the bonus balls requires only 4 tickets for a draw. At $2 per ticket, that's only 8 bucks... compared to the prohibitive cost of doing the same with something like the Mega Millions, where the elimination of the bonus ball by brute force would cost $120 per draw.

Since the initial window would be possibly 26 drawings, $208 for the Cash 4 Life is definitely cheaper than $3,120 to do the same with the Mega Millions...

Of course a single $250 straight on the pick 3 covers the $208 attempt... and would be the only scenario by which it goes live.

Thinking about it, even at my age of 56, a win would also capture 3 2nd tier prizes...

1 grand prize, taken as a lump sum is $7 million before taxes

3 2nd tier prizes would be taken as annuities, and depending on the taxes, each would be worth somewhere between $25,000 and $36,000 per year AFTER taxes!

Pretty sure with a minimum $75,000 guaranteed annual post tax income and at least $3,500,000 clear in the bank... early retirement is an option!

The best part is, the pick 3 win (if any) would pay for the chance!

Also provides a bit of motivation for development... I need to make a csv with all 10 years of data, then make a working copy where the bonus balls are stripped, totally focusing on the 5/60 part.
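A rough Python sketch of the "working copy" step, using only the standard csv module. The column names (`date`, `n1`..`n5`, `cash_ball`) are assumptions about how the file might be laid out, and the sample row is a made-up placeholder, not a real draw.

```python
import csv
import io


def strip_bonus(src, dst, bonus_column="cash_ball"):
    """Copy a draw history from src to dst, leaving out the bonus ball column."""
    reader = csv.DictReader(src)
    fields = [name for name in reader.fieldnames if name != bonus_column]
    writer = csv.DictWriter(dst, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row.pop(bonus_column, None)  # drop the bonus ball, keep the 5/60 part
        writer.writerow(row)


# Placeholder data standing in for the real history file
sample = io.StringIO(
    "date,n1,n2,n3,n4,n5,cash_ball\n"
    "2025-01-01,3,17,28,41,59,2\n"
)
working = io.StringIO()
strip_bonus(sample, working)
print(working.getvalue())
```

With real files, the two `StringIO` objects would be replaced by `open(...)` handles (opened with `newline=""`), so the original history stays untouched.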

But reality sets back in and I must still clash with 20 years of failed attempts at a simple 1:1000 game...

Bottom line, you don't know if you don't try...

Entry #425

New summer project, recreate all data files

After running the code for the pick 5, I realized that the mid day and evening data files are 17 draws apart, even though the two games have been drawn an equal number of times! Most of the data (70%) was entered by hand, and is therefore prone to transcription or other human error...

So, since I want to start a data file for the PA Cash 4 Life, I may as well keep going and redo the P3, the P5, the Match 6 and also spot check the PB and MM data for accuracy.

That is boring busy work, but accuracy cannot be allowed to suffer... if I notice it, I MUST fix it!

Same rules apply, the window of double draw promotions will be purged, and the evening 666 incident will also be wiped.

I hope that it doesn't set things back too much, but that is at least a day of importing data 1 year at a time per game, as PA does not have an API to use like some states.

Still working on reading their RSS news feed data, but that would require firing up the Raspberry Pi and leaving it on so the update script can be run every day at the same time (with cron scheduling). Then the scripts can just be run from the Pi whenever needed... constant updates of accurate data being the goal.

While figuring out this new system, the roll out of live play may be delayed even further, but I am hoping since the sample size is only 150 draws with a 7 day window added, priority can remain here while the long term data is recollected and the updater is built.

The money saved by not playing anything for months at a time is nice!

BUT, with zero chances in the game, there is a zero chance of winning...

Entry #424

The fun part... operation names for each phase.

I like to use cheesy military style operation names for each phase, knowing full well that they will not be successful...

Phase one... Operation "Kickstart"

This is for the PA pick 3, the goal is to hit both the pick 3 eve and pick 3 mid in a predetermined time frame.

Failure is not hitting anything over 1 month of play.

Success is hitting day and eve pick 3s within the same month (never had that happen before)

Dream outcome, hitting mid day and eve pick 3 in the same week.

Phase 2... Operation "Bailout"

This is for the PA pick 5, which I have never hit straight...

Failure is hitting nothing.

Good is hitting either day or night in any time frame

Success is hitting both the day and evening in a 3 month period

Dream scenario is hitting the day and evening in the same month.

Phase 3... Operation "Break the Chains"

This is for the smaller jackpot games, including PA Cash 5, PA Match 6 and MUSL's "Cash 4 Life"

Failure being the status quo, no top tier prize.

Success is a top tier prize, no time frame. Only needs to work once... BUT the prize must clear 1M after taxes

And finally

Phase 4... Operation "Golden Parachute"

PB or MM jackpot, no time frame. Only needs to work once. This is the operation where success means there is not only plenty of money, but the real prize is getting what time remains back!

Of course, since the goal is to fund all other operations from the pick 3, a non start on that trashes the chances of the others... back to the spreadsheets for me.

Entry #423

Most recent observations in the pick 3 data

The most common percentage is 10%... the actual middle! The majority of frequency percentages lie between 9% and 11%, again, the middle... the actual 10% value happened in 800 of the 4,500 NNN draws.

Though it would be tempting to start at 10% in each column, the number of NNN draws with exactly 10% in all columns is less common. They are usually a mix. Common numbers are 9.33% 10.67% etc.
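For reference, values like 9.33% and 10.67% are what 14 and 16 hits look like in a 150 draw training set. A small Python sketch of the per column percentage count follows; the function name and the synthetic draw list are made up purely to show the arithmetic.

```python
from collections import Counter


def column_percentages(draws, column):
    """Percent of draws in which each digit 0-9 showed up in one column."""
    counts = Counter(draw[column] for draw in draws)
    total = len(draws)
    return {d: round(100 * counts[d] / total, 2) for d in "0123456789"}


# Synthetic 150 draw set (NOT real results): column 0 holds
# fourteen 4s, sixteen 5s and one hundred twenty 1s.
draws = ["400"] * 14 + ["500"] * 16 + ["100"] * 120

pct = column_percentages(draws, column=0)
print(pct["4"], pct["5"])  # 9.33 10.67
```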

Going to have to start working on the indicators of standard deviation and quartiles, and maybe add a few more... bottom line is this is not quite ready for prime time... but progress is being made!

Entry #422

Quick takeaways from the first look at the data

The pick 3 has the same roughly 1/4 of entire draw history classified as NNN. Slightly higher percentage for the evening ball drawn game.

Pick 5 has around 2/5 of all draws classified as NNNNN, and there are slightly more on the evening ball drawn game. Here is the kicker, in over 5,000 draws for each, a HHHHH draw happened only once and there are ZERO CCCCC draws in BOTH mid and eve. There are even a few 7 draw windows in both games where there are 2 NNNNN draws!

The next step is to get an idea of which percentages showed up most frequently, and this will be done on both the entire history, and the filtered copies that are made up of only NNNs or NNNNNs.

I may need to modify the back test script to include the standard deviation and quartiles for each row. Since they are the starting indicator candidates, they should be present in the data set.

I may also consider generating heat maps for the percentages of all neutrals.

It is looking difficult to gauge any set rules at this time, the next set of tests will hopefully shed some light. What I am hoping to see is a way to further reduce the valid play set. This seems important with the pick 5, since the percentages on the NNNNNs have more swings than on the pick 3.

If it does manage to yield anything, it will need to be documented, as will subsequent future live tests. This time when failure occurs, I am going to learn something from it!

On the plus side, high/low, even/odd, sums and replacement techniques like mirrors and vtracks have all been ignored, and the entire system is based on pure frequency... but not dependent on the highs and lows.

Trying to reverse solve the NNN matches will hopefully yield a step or set of steps that can be applied universally for each column to give the best possible guess for each game. The classifiers are chosen with a hard statistical threshold, and the model looks at the frequencies that happen rather than the actual digits... I am simply trying to choose the most likely frequency within that 150 draw training set and play the corresponding number in each column.

Entry #421

Sifting through the mountain of data to find the next step

Today I have a bit of time to begin looking at the data collected from the full history rolling back tests.

One quick observation is how indexing only 7 draws, which keeps 143 of the same draws in the training set, can drastically change things like the standard deviation and the quartiles.

The first step will be to look at what is there in the PA Pick 3 Evening game data, looking for an idea of any connection at all between sets. Perhaps going to the previous 2 or 3 sets.  This step is concerned with choosing indicators that may get a pick based on the most recent data

The next step will involve isolating only the NNN draws to look for the frequency of percentages within those draws. This step is for gathering the profile of all NNN draws in a game history.

Hopefully the third step will be to put together a plan to make an actual guess.

Then the above will be tested on the remaining data sets (pick 3 mid and both pick 5 (quinto) games).

Even if it ends up that I run offsets at 21, 14, 7 and 0 it will still leave 129 draws in common across the range... but 7 draws can really reshape the statistics!

From the standpoint of the games, all PA mid day draws are done with a computer, while all evening PA draws are done with ball machines. The data is so far consistent between methods... it does not matter how they pick them, they tend to fill from the middle. By interpolation we can guess that the appearance of NNNN draws in the pick 4 would be in the mid 60% of all draws, and the pick 2 would be in the high 90% range... but the pick 2 and pick 4 are being excluded because the pick 2 has a weak payout and a minimum $1 bet and the pick 4 is more of a nuisance when it comes to claiming and taxation.

Also, even though the data is all combos, the analysis will still be done per column, so anything found in the pick 3 data can be immediately applied to the pick 5.

Also, with new systems, I tend to make the first play a bit heavier than normal. The plan is $7 a week for 50 cent straight bets, but the first run will be for $1 straight. And the plan remains the same for the pick 5... ONLY to be played on winnings from the pick 3... so that may never be live tested... Gotta keep it real.

Time to fire up the laptop, break out the reading glasses and see what can be discovered.

Entry #420

The bet strategy for the system under development

Since the core of the initial observation is a 7 draw window, it would be the easiest yet.

With the pick 3 it would be super cheap. 0.50 bets on both mid day and evening draws for a total expense of $7

Since the pick 3 and pick 5 have a similar window, but the pick 5 has no 0.50 option, that cost would be $14 per week.

To play both would be $21 for a week.

However, I have a general idea of indicators highlighting when NOT to play (narrow neutral spread)... this could make it even cheaper.

Of course, the goal was to have the pick 3 pay for the rest, so between 0, $3.50 or $7 per week until a hit (straight play only).

The plan is also to increase the pick 3 to $1 while on their money. Also to take a profit from any hit rather than recycling all back into the games. On a $250 straight pick 3 hit... pocketing $150 would leave enough to play the 3s and 5s at $1 for 3 weeks AND be able to get another 2 weeks back at $7
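The budget arithmetic in this entry can be checked with a few lines of Python. This is only a sketch of the numbers already given above (two draws a day, 7 days, $250 win, $150 pocketed); the function and variable names are made up.

```python
def weekly_cost(stake, draws_per_day=2, days=7):
    """Cost of one straight bet on every mid day and evening draw for a week."""
    return stake * draws_per_day * days


pick3_half = weekly_cost(0.50)   # 7.0  -> pick 3 at 50 cents
pick5_full = weekly_cost(1.00)   # 14.0 -> pick 5 at $1
both_cheap = pick3_half + pick5_full  # 21.0

# After a $250 straight hit, pocket $150 and play on the rest
bankroll = 250 - 150
three_weeks_at_1 = 3 * (weekly_cost(1.00) + weekly_cost(1.00))  # 3s and 5s at $1
two_weeks_at_7 = 2 * pick3_half
leftover = bankroll - three_weeks_at_1 - two_weeks_at_7

print(both_cheap, three_weeks_at_1, two_weeks_at_7, leftover)  # 21.0 84.0 14.0 2.0
```

So the $100 left on the table covers the 3 weeks at $1 plus 2 more weeks at $7, with $2 to spare.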

So of course the overall design is one of cheapness.

The jackpot games are a distant target, but that strategy will involve one line over whatever the window works out to be... so still super cheap, and also restricted to play on "house" money.

Will probably fail like all others, but it is the cheapest one yet!

This time I will track hits and count the number of times I had to pony up the $7 for each week over a yet to be determined test window. Perhaps I look at expense over 10 weeks and then if out of pocket was $70, then yet another system can be abandoned to the scrap heap.

Entry #419

Ideas on the jackpot versions of the scripts

As was expected, the pick 3 observation of the frequency of draws from the neutral set of numbers was present, but less pronounced when moving to the pick 5... as would be expected when moving from 1:1,000 to 1:100,000. I expect to see the same reduction when moving to jackpot games.

When dealing with the history as sorted order columns, there can be a reduction in the set because they are non replacement. Meaning that the expectancy of possible numbers per position can be altered by the limitations imposed by sorted order. For example, in a 6/49 like the PA match 6, the expectancy for the appearance of each ball is 1/49 *100 = 2.04 %.

However, in sorted order, each column is restricted to 44 numbers... the first column will not contain 45, 46, 47, 48 or 49... the last column will not contain 1,2,3,4,5. 1/44 * 100 = 2.27%.

The range for each column will be different, so that will have to be accounted for in the code.
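A sketch of how the per column range could be worked out in Python for any sorted, non replacement game. The function names are made up, and the flat 1/width expectancy is the working assumption from this entry, not a claim about how sorted positions are really distributed.

```python
def position_range(pool_size, picks, position):
    """Lowest and highest ball that can land in a sorted column (1-based)."""
    lo = position                      # position-1 smaller balls must sit below
    hi = pool_size - picks + position  # picks-position larger balls must sit above
    return lo, hi


def position_expectancy(pool_size, picks):
    """Flat expectancy (in percent) across the balls a column can hold."""
    width = pool_size - picks + 1      # every column has the same width
    return 100 / width


print(position_range(49, 6, 1))             # (1, 44)
print(position_range(49, 6, 6))             # (6, 49)
print(round(position_expectancy(49, 6), 2)) # 2.27
print(position_range(60, 5, 3))             # (3, 58) for the middle of a 5/60
```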

In this jackpot scenario, it will be expected that a good portion of the numbers will be cold because they were not drawn in their positions. This will also need to be addressed. I am thinking that the distribution frequency should be coded in such a way that numbers in range with 0 frequency will be eliminated from the classification phase. I feel this will be some tricky coding. In a pick 3, all possibilities are covered in 10 digits... not so with a jackpot game.

With a lower expectancy, more data will be needed for training. The math was easier with just 0-9 in each column. To give the pick 3 numbers a chance to be drawn 15 times, the number of draws was 150. To do the same with an expectancy of 2.27% (1 in 44), the required training draws would be 15 x 44 = 660. Also the window of 7 draws may need to be expanded.

Since the initial window was determined by the max number of advanced plays, the same would probably suffice here... jackpot games can be played for 26 draws.

Obviously the requirement then becomes at least 686 draws (660 + 26) need to be available in the history file. No problem with the PA match 6 having thousands... but it may become data starved with any matrix changes.

For each type of jackpot game, a new script will need to be created, it will not be as flexible as the pick N script.

And again it may not be the magic bullet, but it might be a decent tool for loading an abbreviated wheel. It would be very interesting to see how this all holds under the non replacement structure of the jackpot games. Would be cool to see a decent amount of NNNNNN draws in the Match 6 back test run!

Entry #418

Planning for the next step

Now that I have some numbers on the frequency of the observation of interest using entire draw histories rather than samples, it is time to start figuring out how to use the collected data.

First thing is that in the production script, I will run the function twice, once with an offset of 7, and again with an offset of 0. This will give classification results for the last 7 draws, but the training set will contain 143 of the same draws. Tracking the change from the playable selection data from the last classification might be one avenue of selection.
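A minimal Python sketch of the offset idea, assuming the history is a list with the oldest draw first. The function name `training_window` is made up and the stand-in history is just a list of integers, not real draws.

```python
def training_window(history, size=150, offset=0):
    """Split off a training set that ends `offset` draws before the latest one.

    Returns (training, held_out): `training` is `size` draws long and
    `held_out` is the `offset` most recent draws that came after it.
    """
    end = len(history) - offset
    training = history[end - size:end]
    held_out = history[end:]
    return training, held_out


history = list(range(1000))  # stand-in for draw numbers, oldest first

train_7, last_7 = training_window(history, offset=7)
train_0, _ = training_window(history, offset=0)

shared = len(set(train_7) & set(train_0))
print(len(last_7), shared)  # 7 143
```

Running it at offsets 7 and 0 gives the two classification passes described above, with the 7 held out draws available to check the first pass against.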

Next, looking at the classification data held in the csv files will hopefully reveal something.

One strategy will be to isolate all of the NNN or NNNNN draws on their own sheet to look at the most frequent percentage rankings. Both long term and perhaps over the last 150 occurrences. Looking at both frequency AND range, to maybe build a profile of what area of neutral the numbers come from most.

I still want to see if the quartiles offer any clues, so I may include them in the next back test data output, with the idea that the median (Q2) and its variance from the expectancy may point to the best pick from the neutral set.

The goal is to develop some sort of guidelines for selection NOW, while there are only 10 digits in 3 or 5 columns BEFORE trying to solve the multitude of issues that will undoubtedly arise when moving to jackpot data.

What would really be helpful would be a local LLM trained specifically in statistics to help see things I will miss.

Entry #417

The quick version of the testing results

In both the day and evening PA pick 3 draw histories, a little over 25% of the draws are classified as NNN, but an NNN appears in over 80% of all 7 draw groups.

NNNNN draws in the day and evening PA pick 5 make up slightly less than 20% of all draws, but still appear in over 50% of all 7 draw groups.

There is something here, just have to determine exactly how to take the next step...
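A sketch of how the 7 draw group percentage could be counted in Python, assuming each draw has already been flagged as all neutral or not and that the groups do not overlap (both are assumptions, the actual back test may step differently). The second function is my own addition: a baseline for what fully independent draws would give at the same overall rate.

```python
def window_hit_rate(all_neutral_flags, window=7):
    """Share of non overlapping groups holding at least one all neutral draw."""
    last_start = len(all_neutral_flags) - window
    groups = [all_neutral_flags[i:i + window]
              for i in range(0, last_start + 1, window)]
    hits = sum(1 for group in groups if any(group))
    return hits / len(groups)


def independent_baseline(rate, window=7):
    """Chance of at least one hit in a window if every draw were independent."""
    return 1 - (1 - rate) ** window


# Tiny made-up example: 3 groups of 7, two of them contain a neutral draw
flags = [True] + [False] * 6 + [False] * 7 + [False] * 3 + [True] + [False] * 3
print(window_hit_rate(flags))                  # 2 of 3 groups
print(round(independent_baseline(0.25), 3))    # 0.867
```

At a 25% overall rate the independent baseline works out to roughly 87%, which is worth keeping next to the measured figure when deciding how much the window result really says.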

Entry #416

While looking at a mountain of data, an indicator emerges!

So the mission is to classify the distribution of numbers such that they are separated into Hot, Cold and Neutral.

The observation is that when looking at a 7 draw window, the chances are high that a draw consisting of all neutral will be present at least once. The observation also shows that some (but not many) 7 draw windows have NO NNN draws. These draw windows have a common thread! The neutral count for at least 1 column was 5! This is something I had been looking for for a long time... an indicator of when NOT to play! And it is firmly based in statistics because of the hard limit of the standard deviation in each column's distribution.

When running the program on the last 150 draws with no window offset, that is the data I would be making a selection from... one thing I wrote was a quick evaluation of the counts per column. When a neutral count of 5 might appear for any of the columns, it looks to be a better bet to skip a play for that game for that week... because the system trains and evaluates individual columns at a time, this seems to also hold on pick 5 data as well.
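Here is a hedged Python sketch of the classification and the "skip this week" check. The cutoffs are an assumption on my part: a digit is Hot above the mean plus one standard deviation, Cold below the mean minus one, and Neutral otherwise. The actual script may use different limits. The function names and the sample frequency lists are made up.

```python
from collections import Counter
import statistics


def classify_column(digits):
    """Label each digit 0-9 as H, C or N from its hit count in one column."""
    counts = Counter(digits)
    freq = [counts[d] for d in range(10)]
    mean = statistics.mean(freq)
    sd = statistics.pstdev(freq)
    labels = {}
    for d in range(10):
        if freq[d] > mean + sd:    # assumed Hot cutoff: mean + 1 sd
            labels[d] = "H"
        elif freq[d] < mean - sd:  # assumed Cold cutoff: mean - 1 sd
            labels[d] = "C"
        else:
            labels[d] = "N"
    return labels


def should_skip(columns, floor=5):
    """True when any column has `floor` or fewer Neutral digits."""
    for column in columns:
        neutral = sum(1 for v in classify_column(column).values() if v == "N")
        if neutral <= floor:
            return True
    return False


def expand(freq):
    """Turn a list of 10 hit counts back into a flat list of digits."""
    return [d for d, n in enumerate(freq) for _ in range(n)]


narrow = expand([20, 20, 20, 10, 10, 10, 15, 15, 15, 15])  # 4 neutrals
wide = expand([22, 8, 15, 15, 15, 15, 15, 15, 15, 15])     # 8 neutrals

print(should_skip([wide, wide, narrow]))  # True
print(should_skip([wide, wide, wide]))    # False
```

Because each column is handled on its own, the same two functions work unchanged on a 5 column game.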

Naturally, going through the 2,400 iterations took some time... and the most extreme event happened exactly once where all 7 draws were NNN. But... with NNN accounting for 1/4 of all draws, at least one NNN showed up in more than 90% of the 2,400 iterations...

That means for most cases, the full combo set can be reduced from 1,000 to about 300 and still not throw out at least one winning combo in that week.

So it looks like the hypothesis can be accepted, that when considering the raw frequency of a short number of draws (150, in pick 3 AND pick 5), the group of numbers possible can be safely reduced to have at least one winner contained in that reduced set within a week.

Today I am updating all of the draw histories and running the back test on the PA pick 3 Evening again, the PA pick 3 Mid day, the PA pick 5 Evening and the PA pick 5 mid day. This will take hours, but I will have a clearer picture and hard numbers at the end.

Then the hard work begins to try and match other indicators to percent ranges to narrow down to one pick per game with the best chance of a match. Who knows how long that will take...

The cool part is I have a clear indicator of when NOT to play, so the actual attempt will be even cheaper.

When the counts of neutrals per column are all greater than 5, the most expensive week will be a $1 straight per day per game, so $28 to take a shot at the day and night variants of both the pick 3 and the pick 5. The start, however will be 0.50 straight wagers on just the pick 3 games, with a max cost of $7 per week and a shot at $250 per win... will roll in the pick 5 when on "house" money.

Getting ahead of myself again, the process of selecting the correct N number from each column is still going to be a daunting task.

But, I did prove my hypothesis and took the first step to proving random numbers over a short time tend to fill from the middle of the frequency range rather than the extremes.

From the massive output files, I plan on isolating just the NNN draws and begin searching for common threads in the frequency percentages. I still have indicators such as the actual standard deviation and the quartile distribution to use. When we move from number range that provides a safe reduction to a single pick, we move back into the "best guess" territory... and that has always been the case regardless of the system... best guess at a straight hit playing only one combo.

The tweaks to make it fit a jackpot style game will be substantial, mostly the big change is to the expectancy, which will be different for each game type. (6/49, 5/60, 5/69 and 5/70 all have different expectancies, bonus balls will need a different stand alone version of the script as their expectancies differ as well)

The pick 3 will be the first target, but I have always maintained that with a win of $50,000 in a single game (such as pick 5), I would go from standard membership here to platinum. So if you see that change, then you will know this worked!

Happy Coding!

Entry #415

Got the counting right for the back test!

Difficult project, having to count how many nnn lines are in the classification group while still maintaining the flexibility to run analysis on variable columns... now counts accurately! Both pick 3 and pick 5 tested and verified.

Ran short tests over 70 draws so I could verify the counts by looking at the generated csv files.

Almost made a mistake in good programming by using a global variable, but instead went with a return value and an accumulator in the main loop.

Probably sounds boring, and after a few hours per session for a few weeks it was... until it worked! Still love those a-ha! moments when coding.

Next up will be to find half a day to update and run the full sets for both pick 3 and pick 5 games.

Entry #414

Published odds and the permutation factor

On a simple game like the pick 3, odds are super easy to calculate... it is 10, representing each possible digit, multiplied by each other...

Such that 10x10x10 =1,000 possible combinations from 000 to 999

This holds for pick 4

10x10x10x10 = 10,000 possible combinations from 0000 to 9999

And pick 5 as well

10x10x10x10x10 = 100,000 possible combinations from 00000 to 99999

When we apply the same to the top prize on the powerball, things change a bit. In a non replacement draw, any number can only be selected once in the first 5... so the raw odds calculation for the 5/69 + 1/26 looks like

69x68x67x66x65x26 = 35,064,160,560

That is over 35 Billion possible combos, but their odds are published as 1 in 292,201,338

Why?

Permutations!

Taking the raw odds calculated first, and dividing by the published odds gives you a whole number of 120.

Therefore we must consider that each pick can be displayed in 120 different orders (5 x 4 x 3 x 2 x 1 = 120), and sorted order is only one of them.
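The same numbers can be checked with Python's math module, where `math.perm` counts ordered picks and `math.comb` counts unordered ones. A quick sketch:

```python
import math

ordered = math.perm(69, 5) * 26    # order matters: 69x68x67x66x65x26
unordered = math.comb(69, 5) * 26  # order ignored: the published odds
factor = ordered // unordered

print(ordered)    # 35064160560
print(unordered)  # 292201338
print(factor, math.factorial(5))  # 120 120
```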

Let's look at another example... the cash 4 life (MUSL multi state game, called cash 4 life in PA)

We will look at the second prize ($1,000 a week for life), paid out for matching 5/60.

Raw odds

60x59x58x57x56 = 655,381,440

Published odds = 1 in 7,282,016

Permutation factor (raw/published) is 90.

The odd part is that they do not always result in a whole number for a permutation factor.

Looking at the powerball second prize (pick 5 white balls from 69)

Raw odds 

69x68x67x66x65 = 1,348,621,560

Published odds = 11,688,054

Which when dividing raw vs published gives

115.384610646

Which may mean there are more factors than permutation when calculating published odds...

I wonder why?
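One likely explanation: a second prize ticket has to match the 5 main numbers AND miss the bonus ball, so the published odds get scaled up by bonus balls / (bonus balls - 1). That would make the factor 120 x 3/4 = 90 for the cash 4 life and 120 x 25/26 for the powerball. A Python sketch to check it (the function name is made up):

```python
import math
from fractions import Fraction


def second_prize_odds(pool, picks, bonus_pool):
    """1-in-X odds of matching every main number while missing the bonus ball."""
    return Fraction(math.comb(pool, picks) * bonus_pool, bonus_pool - 1)


cash4life = second_prize_odds(60, 5, 4)
powerball = second_prize_odds(69, 5, 26)

print(cash4life)         # 7282016
print(float(powerball))  # 11688053.52, published rounded to 11,688,054

# Raw (ordered) count divided by the second prize odds
print(Fraction(math.perm(60, 5)) / cash4life)  # 90
print(Fraction(math.perm(69, 5)) / powerball)  # 1500/13, about 115.3846
```

The small difference from 115.384610646 comes from the published figure being rounded to a whole number.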

Entry #413

In the end, EVERY system outputs a best guess.

It does not matter HOW the numbers are selected... math, dreams, RNG, the environment (PA license plates are good pick 4 sources...)... when you make a selection, you are rejecting the rest of the combos in the hope that your guess is best.

This system I am working on seems to hold around 90% for being able to reduce the combos (NOT 100%), which kind of allows me to accept the original starting hypothesis that numbers tend to fill in from the middle of their frequency distribution. The problem is in figuring out which combo to pick. Once one is chosen, all others are rejected.

For that goal, I am studying the distribution percents that fill in most frequently... but this ends up being the same guessing game because I already have seen how the first step is NOT based on the most frequent digits, but rather how random systems tend to fill from the middle. Still hoping to find some correlation to the median and which side of the expectancy to pick from, but nothing holds with enough consistency.

Anything is possible though.

Would be fun if it occasionally gets a hit, particularly on the pick 5!

Entry #412