hypersoniq's Blog

hypersoniq's Blog has 673 entries and has been viewed 438,901 times.
Lottery Post members have made 559 comments in hypersoniq's Blog.
hypersoniq is a Standard member.

May 24, 2025
11:20 am

Ideas on the jackpot versions of the scripts

As expected, the pick 3 observation about how often draws come entirely from the neutral set of numbers also showed up in the pick 5, but less strongly... which makes sense when moving from 1:1,000 to 1:100,000 odds. I expect to see the same reduction when moving to jackpot games.

When treating the history as sorted-order columns, the set of possible values per column shrinks because the draws are made without replacement. That means the expectancy for each number in a given position is altered by the limits that sorted order imposes. For example, in a 6/49 game like the PA Match 6, the expectancy for the appearance of each ball is 1/49 * 100 = 2.04%.

However, in sorted order, each column is restricted to 44 numbers... the first column will not contain 45, 46, 47, 48 or 49... the last column will not contain 1,2,3,4,5. 1/44 * 100 = 2.27%.

The range for each column will be different, so that will have to be accounted for in the code.
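A minimal sketch of those per-column ranges, assuming a plain k-of-N matrix with every draw stored in sorted order. `column_ranges` is a hypothetical helper, not part of the existing script: column i (counting from 0) can never hold a value below i + 1 or above N - k + i + 1, so every column ends up with N - k + 1 possible values.

```python
def column_ranges(pool_size: int, picks: int) -> list[range]:
    """Possible values for each sorted column in a picks-of-pool_size game.

    Column i (0-based) needs i smaller balls below it and picks - i - 1
    larger balls above it, so it spans i + 1 .. pool_size - picks + i + 1.
    """
    return [range(i + 1, pool_size - picks + i + 2) for i in range(picks)]


if __name__ == "__main__":
    # PA Match 6 style 6/49 matrix
    for col, values in enumerate(column_ranges(49, 6), start=1):
        expectancy = 100 / len(values)
        print(f"column {col}: {values.start}-{values[-1]}, "
              f"{len(values)} values, expectancy {expectancy:.2f}%")
```

For 6/49 this gives column 1 as 1-44 and column 6 as 6-49, each with 44 values and a 2.27% expectancy, which matches the numbers above.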

In the jackpot scenario, a good portion of the numbers can be expected to be cold simply because they were never drawn in their position. That will also need to be addressed. I am thinking the frequency distribution should be coded so that in-range numbers with 0 frequency are eliminated from the classification phase. I feel this will be some tricky coding. In a pick 3, all possibilities are covered by 10 digits... not so with a jackpot game.
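One possible way to drop the zero-frequency numbers before classifying, sketched under the assumption that the history is a list of already-sorted draws. `column_frequencies` is a hypothetical name, not the author's function.

```python
from collections import Counter


def column_frequencies(sorted_draws: list[list[int]], column: int,
                       values: range) -> dict[int, int]:
    """Count how often each in-range value appeared in one sorted column.

    Values that never appeared (count 0) are dropped, so only numbers that
    actually showed up in this position move on to classification.
    """
    counts = Counter(draw[column] for draw in sorted_draws)
    return {v: counts[v] for v in values if counts[v] > 0}


if __name__ == "__main__":
    history = [
        [3, 11, 19, 27, 35, 44],
        [1, 8, 19, 30, 41, 49],
        [3, 9, 22, 28, 36, 40],
    ]
    print(column_frequencies(history, 0, range(1, 45)))  # {1: 1, 3: 2}
```

Passing the column's own range (1-44 for the first column of a 6/49) keeps the out-of-range numbers out automatically.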

With a lower expectancy, more data will be needed for training. The math was easier with just 0-9 in each column. To give each pick 3 digit an expected 15 appearances, the training set was 150 draws (150 x 1/10 = 15). To do the same with an expectancy of 2.27% (1/44), the training set would need 15 x 44 = 660 draws. The window of 7 draws may also need to be expanded.

Since the initial window was determined by the maximum number of advance plays, the same would probably suffice here... jackpot games can be played for up to 26 consecutive draws.

That means at least 686 draws (660 + 26) need to be available in the history file. No problem for the PA Match 6, which has thousands... but it may become data-starved after any matrix change.
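The training-size arithmetic as a tiny sketch, assuming the target is an *expected* count of 15 appearances per value and that every value in a column is equally likely. `draws_needed` and `ADVANCE_PLAY_WINDOW` are illustrative names only. On that assumption 15 x 44 works out to 660 training draws, and adding the 26-draw window gives 686.

```python
def draws_needed(target_hits: int, values_per_column: int) -> int:
    """Training draws needed for each value's *expected* count to reach
    target_hits, assuming every value in the column is equally likely."""
    return target_hits * values_per_column


ADVANCE_PLAY_WINDOW = 26  # max consecutive draws a jackpot ticket can cover

pick3 = draws_needed(15, 10)   # 10 digits per column
match6 = draws_needed(15, 44)  # 44 possible values per sorted 6/49 column
print(pick3, match6, match6 + ADVANCE_PLAY_WINDOW)  # 150 660 686
```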

For each type of jackpot game, a new script will need to be created; it will not be as flexible as the pick N script.

And again it may not be the magic bullet, but it might be a decent tool for loading an abbreviated wheel. It would be very interesting to see how this all holds under the non replacement structure of the jackpot games. Would be cool to see a decent amount of NNNNNN draws in the Match 6 back test run!

Comments

Entry #418

May 23, 2025
10:50 am

Planning for the next step

Now that I have some numbers on the frequency of the observation of interest using entire draw histories rather than samples, it is time to start figuring out how to use the collected data.

First, in the production script I will run the function twice, once with an offset of 7 and again with an offset of 0. The offset-7 run gives classification results for the last 7 draws, while the offset-0 run trains on the latest 150 draws, 143 of which are shared with the first run's training set. Tracking how the playable selection changes from the last classification might be one avenue of selection.
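A minimal sketch of how the two offsets carve up the history, assuming a 150-draw training window and a list ordered oldest to newest. `split_window` is a hypothetical helper, not the production script.

```python
def split_window(history: list, train_size: int = 150, offset: int = 7):
    """Return (training, to_classify) for a given offset from the end.

    offset=7: train on the 150 draws before the last 7, classify the last 7.
    offset=0: train on the most recent 150 draws, nothing left to classify.
    Assumes len(history) >= train_size + offset.
    """
    end = len(history) - offset
    training = history[end - train_size:end]
    to_classify = history[end:]
    return training, to_classify


if __name__ == "__main__":
    history = list(range(1000))          # stand-in for 1,000 draws
    t7, c7 = split_window(history, 150, 7)
    t0, c0 = split_window(history, 150, 0)
    print(len(set(t7) & set(t0)))        # 143 shared training draws
```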

Next, looking at the classification data held in the csv files will hopefully reveal something.

One strategy will be to isolate all of the NNN or NNNNN draws on their own sheet and look at the most frequent percentage rankings, both long term and perhaps over the last 150 occurrences. Looking at both frequency AND range might help build a profile of which part of the neutral band the numbers come from most.
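The "own sheet" step could also be done in code. This is a sketch only: the row layout (a `label` string and a per-column `pcts` list of frequency percentages) is a hypothetical stand-in for whatever the csv files actually hold.

```python
from collections import Counter
from typing import Optional


def nnn_percentage_profile(rows: list[dict],
                           last_n: Optional[int] = None) -> list[Counter]:
    """Tally, per column, the frequency percentages of all-neutral draws.

    rows are hypothetical records like {"label": "NNN", "pcts": [9.3, 10.7, 10.0]},
    where pcts holds each drawn value's frequency percentage in its column.
    last_n optionally keeps only the most recent all-neutral draws.
    """
    neutral = [row for row in rows if set(row["label"]) == {"N"}]
    if last_n is not None:
        neutral = neutral[max(len(neutral) - last_n, 0):]
    n_columns = len(neutral[0]["pcts"]) if neutral else 0
    return [Counter(row["pcts"][c] for row in neutral) for c in range(n_columns)]
```

`Counter.most_common()` on each column then gives the long-term ranking, and `last_n=150` gives the recent one.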

I still want to see if the quartiles offer any clues, so I may include them in the next back test data output, with the idea that the median (Q2) and how far it deviates from the expectancy may point to the best pick from the neutral set.
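A sketch of what that extra output column might compute, assuming the per-column counts include every possible value (zeros too). `quartile_profile` is a hypothetical name, and the expectancy is taken as 100 divided by the number of values.

```python
import statistics


def quartile_profile(counts: dict[int, int], draws: int) -> dict[str, float]:
    """Quartiles of one column's frequency percentages, plus how far the
    median (Q2) sits from the expectancy (100 / number of values).

    counts must include every possible value in the column, zeros included.
    """
    pcts = [100 * c / draws for c in counts.values()]
    q1, q2, q3 = statistics.quantiles(pcts, n=4)
    expectancy = 100 / len(counts)
    return {"Q1": q1, "Q2": q2, "Q3": q3, "Q2_minus_expectancy": q2 - expectancy}
```

`statistics.quantiles` uses its default "exclusive" method here; a spreadsheet quartile function may interpolate differently, so Q1 and Q3 can differ slightly between the two.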

The goal is to develop some sort of guidelines for selection NOW, while there are only 10 digits in 3 or 5 columns, BEFORE trying to solve the multitude of issues that will undoubtedly arise when moving to jackpot data.

What would really be helpful would be a local LLM trained specifically in statistics to help see things I will miss.

Comments

Entry #417

May 21, 2025
11:23 am

The quick version of the testing results

In both the day and evening PA pick 3 draw histories, a little over 25% of the draws are classified as NNN, yet at least one NNN appears in over 80% of all 7-draw groups.

Day and evening PA pick 5 draws made up entirely of neutrals (NNNNN) account for slightly less than 20% of all draws overall, but still appear in over 50% of all 7-draw groups.
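One yardstick for these two percentages, assuming each draw's classification were independent of the others (an assumption, not something the back test checks): if a fraction p of draws are all-neutral, the chance of at least one in a 7-draw group is 1 - (1 - p)^7.

```python
def chance_at_least_one(p: float, window: int = 7) -> float:
    """Probability of at least one success in `window` independent trials,
    each with success probability p (complement of 'no successes')."""
    return 1 - (1 - p) ** window


if __name__ == "__main__":
    print(round(chance_at_least_one(0.25), 3))  # pick 3, ~25% NNN: 0.867
    print(round(chance_at_least_one(0.20), 3))  # pick 5, ~20% NNNNN: 0.79
```

Comparing the observed group percentages against this baseline is one way to see whether the 7-draw grouping adds anything beyond the raw NNN rate.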

There is something here, just have to determine exactly how to take the next step...

Comments

Entry #416

May 20, 2025
8:21 am

While looking at a mountain of data, an indicator emerges!

So the mission is to classify the distribution of numbers such that they are separated into Hot, Cold and Neutral.

The observation is that within a 7-draw window, chances are high that a draw made up entirely of neutrals will be present at least once. The observation also shows that some (but not many) 7-draw windows have NO NNN draws. These windows have a common thread! The neutral count for at least 1 column was 5! This is something I had been looking for for a long time... an indicator of when NOT to play! And it is firmly based in statistics because of the hard limit of the standard deviation in each column's distribution.

Running the program on the last 150 draws with no window offset produces the data I would be making a selection from... so I wrote a quick evaluation of the neutral counts per column. When a neutral count of 5 appears in any of the columns, it looks like a better bet to skip that game for that week... and because the system trains and evaluates one column at a time, this seems to hold on pick 5 data as well.
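A minimal sketch of that quick evaluation, assuming the rule is "skip the week if any column has 5 or fewer neutrals" (a count of 5 is the observed trigger; treating lower counts the same way is an assumption). `should_skip` and `field_size` are hypothetical names. The field size is just the product of the neutral counts, since each column is evaluated on its own.

```python
from math import prod


def should_skip(neutral_counts: list[int], min_neutrals: int = 6) -> bool:
    """Skip the week if any column has fewer than min_neutrals neutral
    values, i.e. a neutral count of 5 or less."""
    return any(count < min_neutrals for count in neutral_counts)


def field_size(neutral_counts: list[int]) -> int:
    """Combos left when only the neutral values are kept in each column."""
    return prod(neutral_counts)


if __name__ == "__main__":
    print(should_skip([7, 5, 8]))         # True: one column has only 5
    print(field_size([7, 7, 6]))          # 294 of 1,000 pick 3 combos
    print(field_size([7, 7, 7, 7, 8]))    # 19208 of 100,000 pick 5 combos
```

The 7,7,7,7,8 case reproduces the 19,208-combo field from the May 5 entry.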

Naturally, going through the 2,400 iterations took some time... and the most extreme event happened exactly once, where all 7 draws were NNN. But... with NNN accounting for 1/4 of all draws, at least one NNN showed up in more than 90% of the 2,400 groups...

That means that, in most cases, the full combo set can be reduced from 1,000 to about 300 and still contain at least one winning combo that week.

So it looks like the hypothesis can be accepted: when considering the raw frequency over a short window of draws (150, in pick 3 AND pick 5), the group of possible numbers can be safely reduced so that at least one winner is contained in the reduced set within a week.

Today I am updating all of the draw histories and running the back test on the PA pick 3 Evening again, the PA pick 3 Mid day, the PA pick 5 Evening and the PA pick 5 mid day. This will take hours, but I will have a clearer picture and hard numbers at the end.

Then the hard work begins to try and match other indicators to percent ranges to narrow down to one pick per game with the best chance of a match. Who knows how long that will take...

The cool part is I have a clear indicator of when NOT to play, so the actual attempt will be even cheaper.

When the neutral counts per column are all greater than 5, the most expensive week will be a $1 straight per day per game, so $28 to take a shot at the day and night variants of both the pick 3 and the pick 5. The start, however, will be $0.50 straight wagers on just the pick 3 games, with a max cost of $7 per week and a shot at $250 per win... the pick 5 will be rolled in when playing on "house" money.

Getting ahead of myself again, the process of selecting the correct N number from each column is still going to be a daunting task.

But I did prove my hypothesis and took the first step toward proving that random numbers over a short time tend to fill in from the middle of the frequency range rather than the extremes.

From the massive output files, I plan on isolating just the NNN draws and searching for common threads in the frequency percentages. I still have indicators such as the actual standard deviation and the quartile distribution to use. When we move from a number range that provides a safe reduction to a single pick, we move back into "best guess" territory... and that has always been the case regardless of the system... a best guess at a straight hit playing only one combo.

The tweaks to make it fit a jackpot-style game will be substantial; the biggest change is to the expectancy, which will be different for each game type. (6/49, 5/60, 5/69 and 5/70 all have different expectancies; bonus balls will need a separate stand-alone version of the script, as their expectancies differ as well.)

The pick 3 will be the first target, but I have always maintained that if I win $50,000 in a single game (such as the pick 5), I would go from standard membership here to platinum. So if you see that change, then you will know this worked!

Happy Coding!

2 Comments

Entry #415

May 15, 2025
11:22 pm

Got the counting right for the back test!

It was a difficult project, having to count how many NNN lines are in each classification group while still keeping the flexibility to run the analysis on a variable number of columns... now it counts accurately! Both pick 3 and pick 5 tested and verified.

Ran short tests over 70 draws so I could verify the counts by looking at the generated csv files.

I almost broke good programming practice by using a global variable, but went with a return value and an accumulator in the main loop instead.
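A small sketch of that pattern, assuming each group arrives as a list of label strings such as "NNN" or "HNC". `count_nnn` and `total_nnn_lines` are hypothetical names; the point is that the count comes back through a return value and the running total lives in the loop, not in a global.

```python
def count_nnn(group_labels: list[str]) -> int:
    """Return how many draws in one classified group are all-neutral.

    Works for any number of columns ("NNN", "NNNNN", ...).
    """
    return sum(1 for label in group_labels if set(label) == {"N"})


def total_nnn_lines(groups: list[list[str]]) -> int:
    total = 0                       # accumulator owned by the main loop, no global
    for labels in groups:
        total += count_nnn(labels)  # each call hands its count back via return
    return total
```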

Probably sounds boring, and after a few hours per session for a few weeks it was... until it worked! Still love those a-ha! moments when coding.

Next up will be to find half a day to update and run the full sets for both pick 3 and pick 5 games.

Comments

Entry #414

May 14, 2025
11:06 am

Published odds and the permutation factor

On a simple game like the pick 3, the odds are super easy to calculate... there are 10 possible digits in each position, multiplied together...

Such that 10x10x10 = 1,000 possible combinations from 000 to 999

This holds for pick 4

10x10x10x10 = 10,000 possible combinations from 0000 to 9999

And pick 5 as well

10x10x10x10x10 = 100,000 possible combinations from 00000 to 99999

When we apply the same to the top prize on the Powerball, things change a bit. In a draw without replacement, any number can only be selected once in the first 5... so the raw odds calculation for the 5/69 + 1/26 looks like

69x68x67x66x65x26 = 35,064,160,560

That is over 35 Billion possible combos, but their odds are published as 1 in 292,201,338

Why?

Permutations!

Taking the raw odds calculated first and dividing by the published odds gives a whole number: 120.

Therefore we must consider that the same 5 white balls can come out in 120 different orders (5! = 5x4x3x2x1), and all of them count as the same pick as the sorted order.
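The same division written with the combination formula, since order does not matter for the five white balls:

```latex
\frac{69 \times 68 \times 67 \times 66 \times 65 \times 26}{5!}
  = \binom{69}{5} \times 26
  = 11{,}238{,}513 \times 26
  = 292{,}201{,}338,
\qquad 5! = 5 \times 4 \times 3 \times 2 \times 1 = 120.
```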

Let's look at another example... the cash 4 life (MUSL multi state game, called cash 4 life in PA)

We will look at the second prize ($1,000 a week for life), paid out for matching 5/60.

Raw odds

60x59x58x57x56 = 655,381,440

Published odds = 1 in 7,282,016

Permutation factor (raw/published) is 90.

The odd part is that they do not always result in a whole number for a permutation factor.

Looking at the powerball second prize (pick 5 white balls from 69)

Raw odds

69x68x67x66x65 = 1,348,621,560

Published odds = 1 in 11,688,054

Dividing raw by published gives

115.384610646

Which may mean there are more factors than permutation when calculating published odds...

I wonder why?
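One likely explanation, assuming these second prizes mean matching all five white balls while *missing* the bonus ball (1 of 26 Powerballs, 1 of 4 Cash Balls): the published odds also fold in the chance of missing that ball. That turns the factor into 120 x 25/26 for Powerball (about 115.3846) and 120 x 3/4 = 90 for Cash 4 Life, which is why the second one came out whole. The published 11,688,054 is 11,688,053.52 rounded, which explains the trailing digits. A quick check with exact fractions:

```python
from fractions import Fraction
from math import comb, perm

# Powerball jackpot: 5 of 69 white balls (order ignored) times 1 of 26 Powerballs.
jackpot_odds = comb(69, 5) * 26                    # 292,201,338
# Second prize: all 5 white balls AND a missed Powerball (25 of 26 outcomes).
pb_second_odds = Fraction(comb(69, 5) * 26, 25)    # 11,688,053.52
# Cash 4 Life second prize: 5 of 60 AND a missed Cash Ball (3 of 4 outcomes).
c4l_second_odds = Fraction(comb(60, 5) * 4, 3)     # 7,282,016

for name, raw, odds in [
    ("PB jackpot", perm(69, 5) * 26, Fraction(jackpot_odds)),
    ("PB second", perm(69, 5), pb_second_odds),
    ("C4L second", perm(60, 5), c4l_second_odds),
]:
    factor = raw / odds  # int / Fraction stays an exact Fraction
    print(f"{name}: 1 in {float(odds):,.2f}, factor {float(factor):.6f}")
```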

Comments

Entry #413

May 11, 2025
9:40 am

In the end, EVERY system outputs a best guess.

It does not matter HOW the numbers are selected... math, dreams, RNG, the environment (PA license plates are good pick 4 sources)... when you make a selection, you are rejecting the rest of the combos in the hope that your guess is best.

This system I am working on seems to hold around 90% for being able to reduce the combos (NOT 100%), which kind of allows me to accept the original starting hypothesis that numbers tend to fill in from the middle of their frequency distribution. The problem is in figuring out which combo to pick. Once one is chosen, all others are rejected.

For that goal, I am studying the distribution percents that fill in most frequently... but this ends up being the same guessing game, because I have already seen that the first step is NOT based on the most frequent digits, but rather on how random systems tend to fill from the middle. Still hoping to find some correlation to the median and which side of the expectancy to pick from, but nothing holds with enough consistency.

Anything is possible though.

Would be fun if it occasionally gets a hit, particularly on the pick 5!

Comments

Entry #412

May 6, 2025
2:19 pm

Preliminary classification data from PA Pick3 Evening

So far, over 4,200 draws were classified as Neutral Neutral Neutral... out of 16,800 draws placed into 2,400 groups of 7, each classified based on the previous 150 draws... roughly 1/4 of all draws. All hot (HHH) represented fewer than 50 draws in 16,800, and all cold (CCC) fewer than 60. The runners-up had 1,000 to 1,200 draws each and were mixes of HNN and CNN.

Ran into one 7 draw group where ALL 7 were NNN draws... no repeat combos!

Still doing a deep dive and will have a percentage by the weekend... looking interesting so far. The longest stretch with no NNNs was 3 groups... 21 draws.

What does it mean? Still working on that...

1 Comment

Entry #411

May 5, 2025
2:20 pm

Run time was less than 2 hours!

I ran the time test on the first 70 draws, therefore it needed to load a ton of draws to get to that point, but each pass reduced the offset by 7 draws, so as the pool got smaller, the code ran faster!

I have reviewed the overall file and it does contain 2,400 groups of 7 profiled draws... it only works when there are 7 or more remaining draws... that profiled all but 152 draws in the history!

The entire 19,000-row result file will need to be gone through manually, but there are already some interesting observations... the level of repeatability is NOT going to be 100%, as a few groups had 0 NNN draws. These were few and far between, but they existed. I figured they would... I thought I would see more bad ones, actually.

I went in looking to see about 70% repeatability of the core observation. Only went through the first few dozen and it already looks like closer to the 80%-90% range. And, there are still segments of 7 where multiple NNN draws have happened... looking to see if multiples are the indicator of a bust segment with 0...

The code ran flawlessly, and in much less time than guessed. Pick 3 midday, with almost 50% fewer draws, should run in an hour or less... same with the pick 5 games. But first this file needs to be studied for a few days to get a feel for the data and its peculiarities.

This was a few weeks of planning and preparing to run this file, so that is a win!

Now, do I need to repeat this to see if there is a better day of the week to start? Probably not right now anyway... the idea is to run the last 150 draws, which will NOT have anything to classify, and just go off of that data for a week.

Comments

Entry #410

May 5, 2025
11:22 am

Coding can be fun... honest!

Backtest script is underway for the PA pick 3 eve... the entire history! Will take a bit over 4 hours to run.

I spent the better part of this morning making sure the csv file was writing correctly, and it did. Then I needed a blank row between each block of classifier output, which was easy as well.

As a test, I wrote the code to process the last 70 draws, 7 at a time. This uses a rolling 150 draws before the classification. All 10 had at least 1 NNN draw! This matches observation.

Here is the fun part. Just to test the multi column functionality, I ran the same exact test on the evening pick 5... I was expecting to see worse results, and that kind of panned out, however... 6 of 10 groups had AT LEAST one NNNNN draw! That changes the approach to the pick 5! That means there can be the hope of catching an NNNNN draw within a 2 week window (rather than one for the pick 3).

When the csv file completes, I will be able to go through it in one pass, counting the total groups plus the groups containing an NNN draw. Groups with an NNN / Total groups = how repeatable the observation is, as a percent.
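That one-pass count as a sketch, assuming the groups have already been read back from the csv as lists of labels. `repeatability` is a hypothetical name.

```python
def repeatability(groups: list[list[str]]) -> float:
    """Percent of groups that contain at least one all-neutral draw."""
    if not groups:
        return 0.0
    hits = sum(1 for labels in groups
               if any(set(label) == {"N"} for label in labels))
    return 100 * hits / len(groups)


if __name__ == "__main__":
    sample = [["NNN", "HNC"], ["CNN", "HNN"], ["NNN", "NNN"], ["HHN"]]
    print(repeatability(sample))  # 50.0
```

A group with several NNN draws still counts once, which matches the "at least one per 7 draws" question rather than the raw NNN rate.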

Then... I can use the same exact file to look for any commonality in the percentages.

And I will be able to do this for the pick 3 day, as well as the pick 5 games...

So, part code, part spreadsheet.

The first NNNNN draw I saw was using the following neutral count per column... 7,7,7,7,8.

That means the winning combo was in a possible field of 19,208 combos... 80,792 FEWER combos than the raw 100,000... and the groups without an NNNNN combo were NOT consecutive...

This is a big deal for me, as I spent the last 20+ years using back tests to see that systems did NOT work over any lengthy run...

It is only the first step, but it will prove one thing... there is somewhat of an ORDER to CHAOS...

Even if I have a hard time figuring out the next step, this one was huge! A blanket reduction that has a guaranteed 7 draw window... SAFE eliminations!

Just imagine what this could mean if it holds somewhat similar on jackpot games... power loading an abbreviated wheel... safely eliminating millions of combos... don't want to get too far ahead of development though...

Short term targets are the pick 3 followed by the pick 5.

Longer term, PA Match 6, Cash 4 Life and the big games PB and MM (since they did not change the white ball matrix)

Best part is it will be cheap to play... pick 3 $0.50 straight and pick 5 $1 straight... $21 a week for all 4 games. However, the pick 5 will not be played without a pick 3 win to fund it... gotta stay cheap!

Comments

Entry #409

April 29, 2025
9:29 am

The plan for the big back test.

After some thought, I think I found a way to have the backtest start at the farthest point in history and move forward, so each group of 7 classified draws will match up exactly with the draw history.

It ends up as simple as starting with a high offset and subtracting 7 rows from the offset for each rerun of the function. This will yield one large csv file with the majority of the draw history classified based on the previous 100 draws, which will allow calculating a confidence interval on the appearance of NNN draws at least once every 7 draws.
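A minimal sketch of that walk-forward loop, assuming draws are stored as strings oldest first and that a `classify(training, group)` callable returns one label per draw. That callable is hypothetical and stands in for the real classifier. `train_size` is a parameter because this entry mentions 100 training draws while the later runs used 150.

```python
import csv
from typing import Callable, Sequence


def backtest(history: Sequence[str],
             classify: Callable[[Sequence[str], Sequence[str]], list[str]],
             train_size: int = 150, step: int = 7,
             out_path: str = "backtest.csv") -> int:
    """Walk forward through the history, oldest usable group first.

    The offset starts at the largest multiple of `step` that still leaves
    `train_size` earlier draws, and shrinks by `step` on every pass, so the
    classified groups line up with the draw history in order.
    Returns the number of groups written.
    """
    offset = (len(history) - train_size) // step * step
    groups = 0
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        while offset >= step:
            end = len(history) - offset
            training = history[end - train_size:end]
            group = history[end:end + step]
            for draw, label in zip(group, classify(training, group)):
                writer.writerow([draw, label])
            writer.writerow([])  # blank row between classified groups
            groups += 1
            offset -= step
    return groups
```

With 1,000 draws and a 150-draw window this writes 121 groups and leaves 153 draws unprofiled (the 150 training draws plus 3 that do not fill a group), the same kind of remainder as the "all but 152" noted in the May 5 entry.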

What I have observed with random sampling is a confidence level of maybe 95% for the mid day pick 3 and 100% for the evening pick 3. The evening is still a mechanical ball draw with the mid day being all PRNG.

As stated earlier, the novelty of the approach so far is being able to almost guarantee a reduction in numbers to pick from to contain at least one draw that comes from the neutral set, after eliminating the hots and colds. So instead of trying to pick 1 combo in 1,000 it would allow me to pick 1 combo from around 300.

This is the base setup for the system... safe reduction... it does not hold for one draw, or two... but it holds for 7. That 7 was found by experimentation. This number may increase when moving to the pick 5, but that game, if the cycle remains 7 games, could narrow the field of selection down from 100,000 to 30,000 or less... that will not be known until this phase is complete.

Phase 2 will also be aided by the back test as percentages from the NNN draws can be analyzed. Here patterns will also be sought. If I can see that the majority of NNNs are based on a specific percentage, that is the key for picking 1 combo from the hundreds of possibilities.

The goal, as with any system I have ever devised, is one best guess... no wheels or trapping, just one pick. Because I take the column at a time approach, there are no interdependencies in positions, therefore the code scales automatically based on the input file. Going from pick 3 to pick 5 simply involves changing the input file name.

Ultimately jackpot style games will require a major rewrite because expectancies and number pools change. Wouldn't it be interesting though if a duration cycle can be found where these jackpots produce NNNNNN or NNNNN + N hits and ~70% of those combos could be eliminated?

1 Comment

Entry #408

April 22, 2025
8:21 pm

Interesting exchange with Chad G. Petey provides some direction...

After having a discussion about what I had noticed, I have been given a next step... verify the results...

I interpret that to mean how many times a draw history can be back tested to see if those neutral numbers make up one in EVERY seven draws, or if there are gaps... I am working on this now. It was quick to offer assistance, but I made it this far coding solo, so I passed on that.

After the code is finished, set it up to run on ALL AVAILABLE DATA SETS... pick 2 through pick 5 is what I am hearing.

I will start from the top of the list (first 107 draws, starting in 1977) for the pick 3 eve, and follow up with all of the others.

The reason it gave to run such thorough tests is to see if I can develop a confidence level. Such that when using the classification boundaries I set, is the scenario true 100%? 95%? 90%?

7 draws is the observed range for the pick 3 evening, but this can vary, such as the pick 3 day being 8 draws... and this can totally change when adding or subtracting columns.

For whatever reason, GPT was "excited" by the discovery, appreciating the set up and reasoning behind the tests I ran so far...up to saying if the observations hold, it might be worth it to consider publishing the results... I think Chad may have still been dizzy from 4/20...

In the end, it mostly brought up ideas I had already considered, but the full back test was something I can work out and accomplish, so not a total waste of time.

1 Comment

Entry #407

April 18, 2025
10:21 pm

Now may be the time to turn to AI to discuss interpreting output...

I had to come up with an idea, I did. (Too many to list over the past 20+ years)

I had to write the code to implement that idea, because that is the only way to truly understand what is being done, also did that.

I am not having much luck in interpreting the code output past spotting the 7 day cycle of NNN classified draws...

Here might be a good time to see what insight Chat GPT could offer...

I can generate data, I can understand the significance, but I am missing the crucial next step of taking that output and turning it into actionable intel.

For added complexity, the word "lottery" will be avoided at all costs... iterations instead of draws, values instead of ball numbers, etc...

This should be a fun exercise, because it is a discussion that will not have anything to do with code, and will instead focus on statistics and truly understanding what the data is trying to say.

Worst case scenario, it is another dead end, best case... new leads to follow!

To the future...

Comments

Entry #406

April 16, 2025
3:18 pm

What I am looking for, simplified...

What I have... a script that can look at short-term draws and classify them as hot, cold and neutral. Why? Because eliminating the hots and colds shows that you can reduce the number of possible combos by around 70% and still see a winner from that group within 7 draws.

I created the same exact script, only using follower data (what numbers follow the last draw) instead of just counting how many times each number was drawn.
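A sketch of the follower count for one column, assuming "follows" means the value drawn in the same column on the very next draw. `follower_counts` is a hypothetical name, and the real script may define followers differently.

```python
from collections import Counter


def follower_counts(column: list[int]) -> Counter:
    """For one column (oldest draw first), count which values came next
    every time the most recent value appeared earlier in the history."""
    last = column[-1]
    return Counter(nxt for cur, nxt in zip(column, column[1:]) if cur == last)


if __name__ == "__main__":
    print(follower_counts([3, 7, 3, 1, 5, 3]))  # Counter({7: 1, 1: 1})
```

These counts can feed the same hot/cold/neutral classifier in place of the raw frequency counts.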

As a first step in reduction, I have tested it at many places in history and can see that reducing the set of combos from 1,000 to about 300 is a good first step, but where to go next?

THAT is what I am looking for! How to go from 300 possibilities down to 1.

I thought that I could find some correlation in the hots, colds and neutrals between both functions, but they don't hold over multiple tests like the first step does. Either system reduces the picks and has winners in those picks in the short term 7 draw window, but the raw frequency does better and sometimes has up to 4 draws within that group made up of the numbers between hot and cold.

If I cut the high and low neutrals, then I eliminate hits in the middle of the neutral range. If I cut the middle range, then I miss matches in the median area... it is still just as random in the middle as it is using all possible numbers. And still too expensive to play.

Outside of random selection, I am out of ideas to further move towards shrinking that set to a playable size.

Comments

Entry #405

April 15, 2025
12:16 pm

Not finding correlations yet...

I am not surprised that there were no apparent correlations between the raw frequency and the follower frequency. Though classification is still about 70% neutral, when looking at the same draw between the two functions, they are rarely lining up as NNN on both.

Now it is time to grab some paper and start rearranging the data and looking for clues. Could a spreadsheet do the same? Sure, but there is something about actually writing stuff down that helps connect to the data. If something looks useful during the manual process, it is most likely simple to automate, but that part is for later.

With the script, I am passing in variables, so it is worth tweaking those, and much easier since I only have to change a variable to call the functions with a new setup. The exciting find of a "scaling factor" to control how much of the standard deviation is used will be helpful in "tuning" the classifiers between the 2 functions.

This factor I am calling neutral_bandwidth, as it scales the classification thresholds by multiplying the standard deviation by a scaling factor. If st_dev = 2, then setting this variable to 0.7 effectively reduces the st_dev to 1.4. By the same token, using 1.3 as a bandwidth factor increases the same st_dev to 2.6
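A sketch of how `neutral_bandwidth` could widen or narrow the neutral band. It assumes the band is the mean count plus or minus `neutral_bandwidth` times the population standard deviation of the counts; the exact thresholds in the real script may differ. `classify_column` is a hypothetical name.

```python
import statistics


def classify_column(counts: dict[int, int],
                    neutral_bandwidth: float = 1.0) -> dict[int, str]:
    """Label each value H, C or N from its count in the training window.

    Assumption: band = mean +/- neutral_bandwidth * population st_dev of the
    counts; above the band is hot, below it is cold, inside it is neutral.
    """
    values = list(counts.values())
    mean = statistics.mean(values)
    band = neutral_bandwidth * statistics.pstdev(values)
    return {
        value: "H" if c > mean + band else "C" if c < mean - band else "N"
        for value, c in counts.items()
    }


if __name__ == "__main__":
    counts = dict(enumerate([12, 18, 15, 15, 9, 21, 15, 14, 16, 15]))
    for bw in (1.0, 0.7):
        labels = classify_column(counts, bw)
        print(bw, "".join(labels.values()))
```

With these made-up counts (150 draws, mean 15, st_dev about 3.03), a bandwidth of 1.0 leaves 8 neutral digits, while 0.7 shrinks the band to about 2.12 and leaves 6.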

I found this necessary, because even though the numbers have the same chance of being drawn with both functions, they don't seem to distribute in exactly the same way.

This part may take some time...

1 Comment

Entry #404