hypersoniq's Blog

hypersoniq's Blog has 673 entries and has been viewed 438,900 times.
Lottery Post members have made 559 comments in hypersoniq's Blog.
hypersoniq is a Standard member.

March 12, 2025
3:18 pm

Another coding session...

Today I altered the follower script to allow for a draw offset (so it can match the offset of the hot/cold script) and to limit the draws to the number entered, so it can deal with shorter-term trends as well. It was a bit more difficult than anticipated because the follower program is more complex. Getting the function to run with a limited set of draws, and making it ignore the offset when set to 0, was one of the easier parts. The code ran every time, but the offsets were wrong to start (Pandas iloc hell), and there were remnants of an older experiment in there causing the mismatch. Since I control the input file, which is cleaned and free of blanks, there was no need to keep the logic that dealt with NaNs due to uneven column lengths. It works as expected and is now reduced to 70 lines of code!
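The draw limit and offset can be sketched in a few lines of pandas. This is a minimal sketch, not the actual script: it assumes one column of a pick N history as a `pd.Series` ordered oldest draw first, and it assumes a "follower" means the value drawn in the next draw after the trigger value (the most recent value in the window). The name `follower_counts` is hypothetical.

```python
from collections import Counter

import pandas as pd


def follower_counts(column: pd.Series, draws: int, offset: int = 0) -> Counter:
    """Count followers of the most recent value inside a limited window.

    column: one column of the history, oldest draw first (assumed layout)
    draws:  how many recent draws to look at
    offset: how many of the newest draws to hide; 0 means use everything
    """
    # Hide the newest `offset` draws so results can be checked against them.
    # iloc[:-0] would return an empty Series, hence the explicit check.
    history = column.iloc[:-offset] if offset > 0 else column
    # Keep only the last `draws` rows of what is left.
    window = history.iloc[-draws:].tolist()
    trigger = window[-1]
    followers = Counter()
    # Walk consecutive pairs: whenever the trigger appears, count what came next.
    for current, following in zip(window, window[1:]):
        if current == trigger:
            followers[following] += 1
    return followers


history = pd.Series([1, 2, 1, 3, 1, 2, 5, 1])
print(follower_counts(history, draws=8))            # all 8 draws, trigger is 1
print(follower_counts(history, draws=8, offset=2))  # newest 2 hidden, trigger is 2
```

Setting `offset` to the same value as Y in the hot/cold script keeps both looking at the same slice of history.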

Now for the fun part: splicing the function together with the follower script and looking at the results to see if any new information can be gained.

So now it gets complex. After many runs I know that the most frequent follower does not necessarily come up in the next draw. Using the short term follower setup, we can restrict the follower data to more recent trends. The hot and cold script will use pure statistics to classify hot and cold numbers. Putting the information together will hopefully lead to a better pick.

The process: look for recurring HNC (hot/neutral/cold) patterns, then cross-reference them with the numbers on the follower list. If NNN is a recurring pattern, pick the Ns that did best on the follower list for each column... then play that combo for a week.
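The cross-reference step could look something like this minimal sketch. It assumes each column has a dict of H/N/C labels from the hot/cold script and a follower list already ordered from most to least frequent; both function names are hypothetical, and "did best on the follower list" is read here as "ranked highest".

```python
def pick_from_class(labels: dict, follower_rank: list, wanted: str):
    """Return the highest-ranked follower whose class matches `wanted`, else None."""
    for digit in follower_rank:
        if labels.get(digit) == wanted:
            return digit
    return None


def pick_combo(pattern: str, labels_per_col: list, followers_per_col: list) -> list:
    """Apply one pattern such as "NNN" column by column."""
    return [
        pick_from_class(labels, followers, wanted)
        for wanted, labels, followers in zip(pattern, labels_per_col, followers_per_col)
    ]


# Made-up labels for one column: 1 and 4 are hot, 3 is cold, the rest neutral.
labels_a = {d: "N" for d in range(10)}
labels_a.update({1: "H", 4: "H", 3: "C"})
print(pick_from_class(labels_a, [3, 1, 2, 0], "N"))  # 2 is the best-ranked N
```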

Because of the offset, I can see the short-term data available AND its effect on the draws that follow, up to the offset. Then, by setting the offset to 0, I can work with all the current info available to make a pick.

It is a difficult process that again may amount to nothing, but making the changes was enjoyable. I have tried as much as possible over the last year or so to adhere to the software engineering best practices of modular, reusable code and atomic functions, so this is the chance to put them into practice.


Entry #388

March 11, 2025
10:12 am

Why do I just not "get it" that the lotteries are unpredictable?

It has been over 20 years chasing the impossible, generating one pick for a chance at a straight hit... pick 2 to Powerball, it does not matter...

Systems, statistics, programs, spreadsheets... and in reality, the few that had a hit were coincidental at best.

So far this journey went from having ideas that I did not know how to implement to being able to write scripts and build spreadsheets to do just about everything but win on a regular basis.

I was never into wheels or trapping because I am too cheap a player. In most cases I would play one number on one game, because with any more cash output it would not be fun anymore.

Granted, I learned more about coding and problem solving with this hobby than I learned taking classes, so it was certainly not all wasted time, but...

What is it that motivates continued work towards what I am sure is an impossible problem? I have walked away for years at a time only to wind up updating severely outdated history files for another try.

Maybe it is an ego thing? I can't be too dumb to figure this out, or can I? I have gained more experience through this foolish quest in spreadsheets and now Python, so it was not ALL wasted time, I just don't get why I don't just finally admit what I want to do can't be done and move that time to a different hobby... I would probably be a better guitar player for sure.

As I write this, I am about to fire up the computer and go back to the Hot/Cold script to make sure it does not encounter a "division by zero" error; that was a checklist item. The worst part is that I can generate tons of data that I am unsure how to interpret...

I am sure any of us who try to solve the lottery problem have low moments like this... what keeps you going?


Entry #387

March 9, 2025
9:49 pm

The latest output of the Hot Cold Classifier

The standard deviation of column A is 0.52.
The standard deviation of column B is 0.57.
The standard deviation of column C is 0.56.

Distribution counts of 3458 draws for each column:
Value A B C
0 350 N (10.12%) 347 N (10.03%) 347 N (10.03%)
1 338 N (9.77%) 338 N (9.77%) 324 C (9.37%)
2 332 N (9.6%) 337 N (9.75%) 368 H (10.64%)
3 325 C (9.4%) 330 N (9.54%) 321 C (9.28%)
4 389 H (11.25%) 343 N (9.92%) 352 N (10.18%)
5 337 N (9.75%) 395 H (11.42%) 369 H (10.67%)
6 351 N (10.15%) 354 N (10.24%) 362 N (10.47%)
7 365 H (10.56%) 315 C (9.11%) 336 N (9.72%)
8 339 N (9.8%) 353 N (10.21%) 316 C (9.14%)
9 332 N (9.6%) 346 N (10.01%) 363 N (10.5%)

Final classifier count summary:
A: 2 H - 7 N - 1 C
B: 1 H - 8 N - 1 C
C: 2 H - 5 N - 3 C

Classifications for the last 7 draws:
6 N 6 N 2 H
2 N 1 N 6 N
9 N 4 N 0 N
5 N 2 N 3 C
3 C 3 N 6 N
2 N 8 N 5 H
1 N 5 H 7 N

The output now both calculates the standard deviation and displays it for each column. I no longer set the Hot and Cold thresholds by passing arguments, but rather by direct calculation. The number of draws is interesting: for a discrete uniform distribution, 3,458 draws is the sample size that gives a 95% confidence level with a 1% margin of error. I am not exactly sure why I chose to use the Z score to come up with that number, but this script WAS written for experimentation.
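The 3,458 figure is exactly what the usual sample-size formula for estimating a proportion gives, n = z^2 * p * (1 - p) / E^2, with z = 1.96 (95% confidence), p = 0.10 (one digit's expectancy) and E = 0.01 (a 1% margin of error). A quick check, with a hypothetical helper name:

```python
import math


def sample_size(p: float, margin: float, z: float = 1.96) -> int:
    """Draws needed to estimate a proportion p within +/- margin at the
    confidence level implied by z (1.96 is roughly 95%), rounded up."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(sample_size(0.10, 0.01))  # pick N digit, 1% margin, 95% confidence
print(sample_size(0.10, 0.02))  # same, with a 2% margin
```

Halving the margin of error quadruples the number of draws needed, which is why the tighter settings call for so much history.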

Look how low the standard deviation got as the number of draws increased... at under 100 draws, it was giving a much higher standard deviation. Here is the run for 90 draws...

The standard deviation of column A is 2.48.
The standard deviation of column B is 3.44.
The standard deviation of column C is 2.77.

Distribution counts of 90 draws for each column:
Value A B C
0 9 N (10.0%) 9 N (10.0%) 6 C (6.67%)
1 12 H (13.33%) 10 N (11.11%) 10 N (11.11%)
2 9 N (10.0%) 9 N (10.0%) 9 N (10.0%)
3 13 H (14.44%) 5 C (5.56%) 11 N (12.22%)
4 11 N (12.22%) 12 N (13.33%) 7 N (7.78%)
5 6 C (6.67%) 14 H (15.56%) 12 H (13.33%)
6 9 N (10.0%) 6 N (6.67%) 10 N (11.11%)
7 7 N (7.78%) 7 N (7.78%) 4 C (4.44%)
8 7 N (7.78%) 5 C (5.56%) 9 N (10.0%)
9 7 N (7.78%) 13 H (14.44%) 12 H (13.33%)

Final classifier count summary:
A: 2 H - 7 N - 1 C
B: 2 H - 6 N - 2 C
C: 2 H - 6 N - 2 C

Classifications for the last 7 draws:
6 N 6 N 2 N
2 N 1 N 6 N
9 N 4 N 0 C
5 C 2 N 3 N
3 H 3 C 6 N
2 N 8 C 5 H
1 H 5 H 7 C

Still trying to find that sweet spot to get just the right amount of variance...


Entry #386

March 9, 2025
10:31 am

So many decisions for one script! Thought hot/cold would be easier...

With the standard deviation calculation almost complete, the need for a back test becomes apparent, but at what intervals?

There are 2 that move to the front of the pack right away...

1. Process the data in chunks so the function is fed X+Y draws at a time, making the chunks larger but separated by X+Y draws, or...

2. Process the data by stepping back only Y draws at a time. This would allow the process to be back tested and provide a full output of observed HNC patterns for the entire draw history (back to X+Y draws from the origin draw).

The second option would produce a more thorough exploration data set for finding common HNC classification patterns, because it only classifies draws that you would NOT have known beforehand (no a priori knowledge), which is ideal because moving forward you would not have that information.

This would reduce getting a pick to simply counting the most frequent HNC patterns. All that would need to be recorded are the Y classifications, which would line up with the draw history.
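Option 2 can be sketched as a loop that walks the history from oldest to newest: X draws build the labels, the next Y draws are classified, then everything moves forward Y draws, so each classified draw only sees data from before it. This is a sketch under assumptions, not the real script: the history is a DataFrame with one integer column per position (oldest row first), the thresholds are expectancy plus or minus one population standard deviation as described in the March 7 entry, and the function names are hypothetical. The `while` condition doubles as the "are there still X+Y draws left?" check.

```python
import statistics
from collections import Counter

import pandas as pd


def classify_window(window: pd.Series, values=range(10)) -> dict:
    """Label every possible value H/N/C from one window of X draws."""
    counts = window.value_counts()
    pcts = {v: 100 * counts.get(v, 0) / len(window) for v in values}
    expect = 100 / len(values)
    sd = statistics.pstdev(list(pcts.values()))  # population SD of the percentages
    labels = {}
    for v, p in pcts.items():
        if sd > 0 and p >= expect + sd:
            labels[v] = "H"
        elif sd > 0 and p <= expect - sd:
            labels[v] = "C"
        else:  # also covers sd == 0, where every value sits on its expectancy
            labels[v] = "N"
    return labels


def backtest_patterns(df: pd.DataFrame, x: int, y: int) -> list:
    """Return one HNC pattern string per classified draw, oldest first."""
    patterns = []
    start = 0
    while start + x + y <= len(df):  # only run when a full X+Y chunk remains
        train = df.iloc[start:start + x]            # the X draws used for labels
        scored = df.iloc[start + x:start + x + y]   # the Y draws being classified
        labels = {col: classify_window(train[col]) for col in df.columns}
        for row in scored.itertuples(index=False):
            patterns.append("".join(labels[col][int(v)] for col, v in zip(df.columns, row)))
        start += y  # step forward Y draws so every later draw is classified once
    return patterns


# Usage (history_df is a hypothetical DataFrame loaded from the history CSV):
# top = Counter(backtest_patterns(history_df, x=60, y=7)).most_common(5)
```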

A few challenges to overcome...

1. Counting the remaining chunk sizes to ensure there are X+Y draws remaining to process. A simple count of the remaining data frame rows should handle this.

2. Writing the Y classifications to a CSV file, because this requires buffering the output of each column and writing complete rows after all columns have run. This is mostly solved in the output, but I have to ensure it writes the data in the same date-ascending order as the original data.

3. Figuring out the HNC to play, as the classifications are mostly a one-to-many relationship... there could, for example, be 3 Hots in a column, so which Hot to play?

4. Refactor the input arguments. I will no longer need to specify the hot & cold threshold percents, so perhaps pass in the expectancy instead, so the calculated standard deviation can be added to and subtracted from it to set the thresholds.
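For challenge 2, one way to buffer each column's labels and only write complete rows at the end is to keep a list per column and zip them together. A minimal sketch, assuming the draw dates are kept alongside as ISO `YYYY-MM-DD` text (so sorting the text also sorts by date); the function and column names are hypothetical.

```python
import pandas as pd


def rows_from_buffers(buffers: dict) -> list:
    """Join per-column label lists (all the same length) into one pattern per draw."""
    columns = list(buffers)  # keeps the column order A, B, C...
    return ["".join(labels) for labels in zip(*(buffers[c] for c in columns))]


def write_classifications(dates: list, buffers: dict, path=None):
    """Write date,pattern rows in date-ascending order.

    With path=None, pandas returns the CSV text instead of writing a file.
    """
    out = pd.DataFrame({"date": dates, "pattern": rows_from_buffers(buffers)})
    out = out.sort_values("date", kind="mergesort")  # stable sort, oldest first
    return out.to_csv(path, index=False)


text = write_classifications(
    ["2025-03-02", "2025-03-01"], {"A": ["N", "H"], "B": ["C", "N"]}
)
print(text)
```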

And that is just for the pick N games... followers are a future add...

Busy Hobby!


Entry #385

March 7, 2025
8:12 am

Lottery results from the statistics point of view

Lottery results are what statisticians call a "Discrete Uniform Distribution". The premise that each value in a discrete (limited membership, like 0 through 9) set has an equal chance of appearing makes the graph of results flat rather than, say, a Bell Curve.

In this project, a result one standard deviation away from the expectancy is treated as significant. The standard deviation is simply the square root of the variance, which is the average squared distance of each data point from the central tendency (the mean).

For the purpose of classifying Hot, Cold and Neutral, the expectancy is that each digit should appear 10% of the time in a pick N game where the set is (0,1,2,3,4,5,6,7,8,9).

In the development of the Python script, I went with a gut feeling of >= 12% to be Hot and 8% or less to be Cold, with anything in between to be classified as neutral.

After a few runs of the script, I took the distribution counts and placed them into RStudio to run some simple tests on the standard deviation, and found that it was usually between 2.5 and 3 percentage points. I was not far off! So it is relatively simple to calculate the standard deviation at run time using the statistics library or Pandas. The next update will incorporate this functionality into the script. By doing this, I would no longer be unsure of the statistical significance of the Hots and Colds, as the thresholds would be correct for each column... one standard deviation or more above the expectancy for HOT and one standard deviation or more below it for COLD.
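The run-time thresholds could look like this minimal sketch using Python's `statistics` module. It assumes the counts for one column include every digit (zero counts included) and uses the population standard deviation (`statistics.pstdev`) of the percentages. With the column A counts from the 3,458-draw run in the March 9 entry it gives about 0.52, matching that output, whereas pandas' `Series.std()` defaults to the sample version (`ddof=1`) and would give about 0.55. The function name is hypothetical.

```python
import statistics


def classify_by_sd(counts: dict, expectancy: float = 10.0):
    """Return (sd, labels): HOT at >= expectancy + sd, COLD at <= expectancy - sd."""
    total = sum(counts.values())
    pcts = {d: 100 * c / total for d, c in counts.items()}
    sd = statistics.pstdev(list(pcts.values()))  # population SD of the percentages
    labels = {}
    for d, p in pcts.items():
        if sd > 0 and p >= expectancy + sd:
            labels[d] = "H"
        elif sd > 0 and p <= expectancy - sd:
            labels[d] = "C"
        else:
            labels[d] = "N"
    return sd, labels


# Column A counts from the 3,458-draw run (digit: count).
column_a = {0: 350, 1: 338, 2: 332, 3: 325, 4: 389,
            5: 337, 6: 351, 7: 365, 8: 339, 9: 332}
sd, labels = classify_by_sd(column_a)
print(round(sd, 2), labels)
```

This reproduces the "A: 2 H - 7 N - 1 C" summary from that run (4 and 7 hot, 3 cold).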

Here is the interesting part... when taking a sample of draws, the larger X becomes (X being the number of past draws), the closer the results land to their expectancy. This results in fewer overall Hots or Colds and more Neutrals. The bottom line is that too few draws produce volatile variance and too many produce too steady a variance.

What does this mean? I need to find the "sweet spot" of the number of draws where Hots and Colds are produced. That yet unknown range where there is just enough volatility in the variance to have a shot at gaining actionable output.
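Under a fair game, the spread of one digit's frequency over X draws is roughly 100 * sqrt(p * (1 - p) / X) percentage points (the binomial standard error of a proportion), so it shrinks with the square root of X. This sketch (hypothetical function name) tabulates that baseline; it comes out near 3.16 at X = 90 and 0.51 at X = 3,458, in the same neighborhood as the column figures reported in the March 9 runs.

```python
import math


def expected_sd_percent(x: int, p: float = 0.10) -> float:
    """Approximate spread (percentage points) of one value's frequency over
    X draws when every value has probability p: 100 * sqrt(p * (1 - p) / X)."""
    return 100 * math.sqrt(p * (1 - p) / x)


for x in (30, 90, 250, 1000, 3458):
    print(f"X = {x:5d}: about {expected_sd_percent(x):.2f} points")
```

Quadrupling X only halves the spread, which is one way to see why long windows flatten the Hots and Colds into Neutrals under fixed thresholds.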

So, no finish line yet, but progress is being made!


Entry #384

February 27, 2025
3:46 pm

The expectancy for numbers in a 6/49

The pick N games are simple: there are 10 possible digits that can be selected in each position, each having an equal chance, so each digit has a 10% chance, and therefore a 10% expectancy over any number of samples. Having looked at this through several runs of the hot/cold script, this mostly holds true. While the largest group of digits between 0 and 9 holds around 10%, the hots exceed 12% and the colds are under 8%.

So what if I were to use the program for a run at the PA Match 6? That is a classic 6/49 game.

The expectancy changes!

In a 6/49 game in sorted order, each column has 44 possible numbers, and 5 that cannot appear in that column when sorted. With 44 possible numbers, the expectancy (expressed as a percent) would be (1÷44)×100 = 2.27%
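One caveat is worth checking with a few lines of code: the 44-value range per sorted column is right, but the flat 1/44 expectancy is only an average. When the six numbers are sorted, the chance that column i holds value v follows the order-statistic formula C(v-1, i-1) * C(n-v, k-i) / C(n, k), so the first column strongly favors low numbers. A minimal sketch with `math.comb` (function name hypothetical):

```python
from math import comb


def column_probability(col: int, value: int, n: int = 49, k: int = 6) -> float:
    """P(the col-th smallest of k distinct numbers drawn from 1..n equals value)."""
    # The value needs room for col-1 smaller numbers and k-col larger ones.
    if value < col or value > n - k + col:
        return 0.0
    return comb(value - 1, col - 1) * comb(n - value, k - col) / comb(n, k)


reachable = [v for v in range(1, 50) if column_probability(1, v) > 0]
print(len(reachable), round(100 / len(reachable), 2))  # size of range, flat average
print(round(100 * column_probability(1, 1), 2))        # chance the lowest ball is 1
print(round(100 * column_probability(1, 30), 4))       # chance the lowest ball is 30
```

For the lowest ball, a 1 shows up about 12.24% of the time versus the flat 2.27%, so per-column expectancies for sorted jackpot games would likely need this kind of adjustment for the Hot/Cold thresholds to mean the same thing as in the pick N games.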

If I run the program as written, it would classify every number as cold! Also, if a number were picked 0 times in the selected window, the program would crash with a "division by zero" error.

All I need to do to make the same program safe to run is add a simple if statement that only processes when the count of the digit is greater than 0, otherwise outputting "0%". The other step would be to auto-range the expectancy and calculate the hot & cold thresholds at run time, depending on the range of data in each column of the history file. This will mean more work, but ultimately more flexibility, as the expectancies for the 5 white balls and the bonus ball in both MM and PB are different.
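A sketch of the two safety steps, under stated assumptions: the expectancy is auto-ranged from the lowest and highest values seen in the column (a short window may miss a rarely drawn extreme, so passing the game's known range is the safer choice), every value in that range is counted even when it never came out, and the percent helper mirrors the "only process when the count is greater than 0, otherwise 0%" rule. All names are hypothetical.

```python
import pandas as pd


def value_range(column: pd.Series) -> range:
    """Every value between the column's observed minimum and maximum."""
    return range(int(column.min()), int(column.max()) + 1)


def counts_with_zeros(column: pd.Series, values: range) -> pd.Series:
    """Frequency of each possible value, with 0 for values never drawn."""
    return column.value_counts().reindex(list(values), fill_value=0)


def percent_or_zero(count: int, total: int) -> float:
    """Percent of draws, or 0.0 when there is nothing to divide."""
    return 100 * count / total if count > 0 and total > 0 else 0.0


bonus = pd.Series([1, 26, 5, 5, 12])      # made-up bonus-ball column
values = value_range(bonus)
expectancy = 100 / len(values)            # 1 to 26 gives about 3.85%
counts = counts_with_zeros(bonus, values)
pcts = {v: percent_or_zero(int(c), len(bonus)) for v, c in counts.items()}
```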

One program to do it all...


Entry #383

February 26, 2025
10:35 am

Always more questions than answers, but short term trends can be back tested

As a limited set of follower distributions is integrated into the hot/cold script, it becomes important to select the proper sample sizes from the whole history files...

It is one thing to calculate by counting, another entirely to learn how to use the information output... and it becomes obvious that no single system is a magic bullet.

Sticking to the core concept of frequency: the short-term hot and cold data is based on frequency, and so are the short-term followers.

Once the code is written for initial operation (I am down to implementing the limited draw number and Y offset for the follower function) the difficult part does begin... creating a back test to see how this program plays out across the entire draw history...

I have the requisite building blocks in place, but the most important variable will be the back test offset. This will determine the number of draws to repeat the existing script at different points in time. I feel that the largest data set size, the number of short-term followers to collect, will be the deciding factor. Then there is collecting this data in a CSV file for further analysis. Coding will proceed anyway, but here are the burning questions left to answer...

How many draws are required to get an accurate measure of followers? 100, 1,000?

How many draws will it take to get a clear picture of repeatable hot and cold trends?

Is it possible to calculate the statistical significance of the distribution and assign the thresholds of hot and cold based on standard deviation? << actively working on this one.

I think I will be off of playing for quite some time while figuring this one out...

In other news, while calculating the "most frequent" combo for the PA midday, it was determined to be 198. That number came out straight yesterday, and of course I was NOT playing it... such is life...


Entry #382

February 23, 2025
9:20 am

Next dev task, adding followers to the hot/cold script.

This next project will simply add the entire follower function to the existing hot/cold script, with the new functionality of specifying how many draws to check for followers AND an offset that will match Y, so results can be checked. The same 0 functionality for getting current playable data will apply as well.

I feel capturing follower data in a shorter term would be a useful addition, as the hot and cold data could help pinpoint the next follower.

The other nice feature is that it will work with any integer data I pass it, so that includes jackpot game histories as well... only possible because of the "per column" design pattern I implemented.

That may be the long-range goal: mixing in all of the old scripts and combining the information they provide.

This update will be comparatively difficult because the follower script is far more complex than the hot/cold script. The money saved by not playing while in development is worth the effort, even with how cheap I play!


Entry #381

February 21, 2025
11:35 am

How the hot/cold data is verified correct...

It is actually quite simple with a spreadsheet.

1. Were the last Y draws correct and in the right order? They should match exactly the last Y draws on the spreadsheet history.

2. Were the distribution counts accurate? Select the X draws above the last Y draws and manually check the counts; they were equal.

3. Were the hot/cold thresholds applied correctly? The equation is

(D/X)*100

Where D is the distribution count of the digit in question and X is the value of X, which is the total number of draws involved in the count.

For example, in the previous post, the first column classified a number that was drawn 8 times as HOT (the criterion being that it appears in >= 12% of the draws), so that looks like

(8/60)*100 = 13.33333 %

Which is greater than 12% so a valid HOT by the criteria.
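The same check is easy to script as a cross-check on the spreadsheet. A minimal sketch of the fixed-threshold rule from that run (12% and 8%; the function names are hypothetical), applied to column A counts over X = 60:

```python
def percent(d: int, x: int) -> float:
    """(D / X) * 100: share of the X draws in which the digit appeared."""
    return (d / x) * 100


def fixed_label(pct: float, hot: float = 12.0, cold: float = 8.0) -> str:
    """HOT at or above `hot` percent, COLD at or below `cold`, else NEUTRAL."""
    if pct >= hot:
        return "H"
    if pct <= cold:
        return "C"
    return "N"


for d in (8, 7, 5, 4):
    pct = percent(d, 60)
    print(d, round(pct, 2), fixed_label(pct))
```

These reproduce the H, N, N, C labels given to counts of 8, 7, 5 and 4 in the column A listing of the February 20 output.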

That is one thing that makes coding difficult: validating that what the program is actually doing matches what you expected it to do. If the program ran, that means it was free of syntax errors, but only through testing and validation can you be sure the program is free of semantic errors... those are a prime source of "bugs" that produce unexpected and erroneous output.

I am definitely a fan of the Pandas library for Python, it is a huge time saver to use a data frame to hold data. It is definitely suited to help in the work those of us do trying to solve impossible problems by manipulating data trying to win a game that has a massive house advantage... maybe one day...


Entry #380

February 20, 2025
2:31 pm

The output of the Hot/Cold analysis script.

When running the program on the current Pennsylvania pick 3 evening data, here is the output of the current script (with the launch settings ProfileHotCold("rawP3E.csv", 60, 10, 12, 8))...

Distribution for column A over 60 draws:
1 8 H
4 8 H
0 7 N
2 7 N
6 7 N
8 7 N
3 5 N
5 4 C
9 4 C
7 3 C

Distribution for column B over 60 draws:
9 9 H
0 8 H
4 8 H
2 6 N
5 6 N
7 6 N
1 5 N
8 5 N
6 4 C
3 3 C

Distribution for column C over 60 draws:
1 8 H
3 8 H
2 7 N
5 7 N
8 7 N
9 7 N
6 5 N
0 4 C
4 4 C
7 3 C

Final classifier count summary:
A: 2 H - 5 N - 3 C
B: 3 H - 5 N - 2 C
C: 2 H - 5 N - 3 C

Classifications for the last 10 rows (side by side):
5 C 5 N 8 N
6 N 5 N 6 N
3 N 7 N 6 N
6 N 6 C 2 N
2 N 1 N 6 N
9 C 4 H 0 C
5 C 2 N 3 H
3 N 3 C 6 N
2 N 8 N 5 N
1 H 5 N 7 C

A few things of note... this was not a script for general use, hence the lack of headers in the distribution counts, but here is how it works...

example: 1 8 H

1 is the digit, 8 is the number of times it came out (frequency, the distribution is sorted by frequency, descending), H is the classifier because it was >= 12% of the 60 draws in X (13.33% actually)

The classifier count is per column, with A being the first column of results. It is a quick visual summary of the 3 distribution columns above it.

Also notice the frequency of patterns in the last 10 draws... N being the most common overall, with 4 draws with an N N N profile and 2 with the pattern N C N... it's as if the hot ones are less of a factor.
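Counting the patterns is a natural job for `collections.Counter`. This sketch simply re-parses the ten rows printed above (the parsing assumes the exact "digit label digit label ..." layout of that output; `pattern` is a hypothetical helper name):

```python
from collections import Counter

rows = [
    "5 C 5 N 8 N",
    "6 N 5 N 6 N",
    "3 N 7 N 6 N",
    "6 N 6 C 2 N",
    "2 N 1 N 6 N",
    "9 C 4 H 0 C",
    "5 C 2 N 3 H",
    "3 N 3 C 6 N",
    "2 N 8 N 5 N",
    "1 H 5 N 7 C",
]


def pattern(row: str) -> str:
    """Keep only the H/N/C labels, e.g. "5 C 5 N 8 N" -> "CNN"."""
    return "".join(row.split()[1::2])


patterns = Counter(pattern(r) for r in rows)
print(patterns.most_common(2))
```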

I wrote it so that if I enter a 0 for Y, it skips classifying any draws, because the goal is to have current data classified to make a pick.

Not sure what good it will do, but it was another idea turned into code, so on that front, it was already a win.


Entry #379

February 18, 2025
11:13 pm

First impressions of the new Hot/Cold Python script...

As I cleaned up the output to give the needed info in a readable format, I noticed a few things...

1. Using a percentage of total draws means the digit suffix (H1, H2, etc.) is not needed, as the number of digits that meet either the hot or cold threshold can change. For instance, in the evening pick 3, when running with 35 draws for X, there was a column with 0 hots and only one cold. That means the numbers tended to fall within a tight tolerance of their 10% expectancy. 12% or greater for Hot and 8% or less for Cold may be too much... I made it so that I could pass those in when calling the function.

2. Running on the pick 5 mid day with 50 for X showed an all Hot draw halfway through the Y draws (10)

3. The most common classifier is Neutral, that is, between 8% and 12%. Perhaps a run should be made with 11% and 9%?

4. I wanted a raw count of H N and C for each row, so I wrote a few more lines of code to display that. This is how you can quickly see that the Ns were dominant.

5. Trends change, the H gives way to N and C the larger Y is set... this could be the way to determine ideal X range, however. I am going to run a set with X=50 and Y=50 to see what patterns emerge (if any)

6. This is only an aggregator, like odd/even or high/low... still need a way to narrow down to a single pick.

Although I was thrilled to go from a blank screen to working code in a few hours, I did invest much time over the last week or so thinking about what I wanted the program to do and planning it out.

The addition of follower counting in a shorter term might be the next add in to this... if a recurring pattern is found, it would be nice to cross reference the follower distribution... if a pattern of HHH emerges, the counting script gives the HNC numbers that can be selected from the follower script distribution lists... such that if 2 is a hot number, and 2 is not very high on the follower list, it might be the indicator needed to pick that number.

That follower script is already functional, and probably not worth integrating into this code since I can simply run both and compare the output screens side by side. I just need to use a passable offset parameter to the follower script so it does not look at ALL of the draws... using pandas iloc[] was made for such a task... probably less than 5 lines of code.

Maybe Thursday will be a good day to make an attempt at that theory... a paper play to test the concept...

Happy Coding!


Entry #378

February 18, 2025
8:53 am

Planning for edge cases when coding.

As I am getting ready to start coding the hot/cold script, I realize that after seeing these distributions before, there is a real possibility of numbers at a boundary (such as the 3rd hot number and the first neutral number) having the same frequency. For example, in X draws a 4 is drawn in a column 6 times, but a 7 is also drawn 6 times... if the 4 is H3 and the 7 is N1 then the hot/neutral designators don't really apply...

In this case, I wonder if it would be best to "grow" the neutral zone, which would result in H1, H2, N1 ... N5, C1, C2, C3. Likewise for the boundary between N and C...

The other solution would be to calculate the percentage, such that a number needs to be greater than its expectancy of 10% of the draws to be classified as HOT, and below its expectancy of 10% to be counted as COLD.

Maybe H would be >= 12%, C would be <=8% and all others default to neutral... This solution could also completely cut a category if none of the hots or colds reach their respective thresholds.
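The "grow the neutral zone" idea for ties can be written as a rule: a digit is only Hot if it was drawn more often than the would-be N1, and only Cold if drawn less often than the would-be last N, so any tie at a boundary falls into Neutral. A minimal sketch (function name hypothetical), using made-up counts with the 4 and 7 tie from the example:

```python
def rank_classify(counts: dict, n_hot: int = 3, n_cold: int = 3) -> dict:
    """Rank-based H/N/C where ties at a boundary are pushed into Neutral."""
    ordered = sorted(counts.values(), reverse=True)
    first_neutral = ordered[n_hot]          # count of the would-be N1
    last_neutral = ordered[-n_cold - 1]     # count of the would-be last N
    labels = {}
    for digit, c in counts.items():
        if c > first_neutral:
            labels[digit] = "H"
        elif c < last_neutral:
            labels[digit] = "C"
        else:
            labels[digit] = "N"
    return labels


# Made-up counts: the 4 and the 7 are both drawn 6 times at the H3/N1 boundary.
counts = {1: 9, 2: 8, 4: 6, 7: 6, 0: 5, 3: 5, 5: 5, 6: 4, 8: 3, 9: 3}
labels = rank_classify(counts)
print(sorted(labels.items()))
```

This yields 2 Hots, 5 Neutrals and 3 Colds, the H1, H2, N1 ... N5, C1, C2, C3 layout. The percentage approach described above sidesteps ties altogether, since equal counts always give equal percentages and therefore the same label.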

This is definitely a programming life cycle thing, spend weeks planning so the relatively small amount of time spent coding has the best chance of success... programs can complete because they are free of syntax errors, but the results may not be useful if there were any semantic errors that you fail to plan for...


Entry #377

February 17, 2025
6:51 am

The end goal of the hot/cold script

The glaring omission in the original follower script was that there is no taking into account any trends.

I would look at the distribution list for each column and have zero clue which one in the list would be the next out.

The goal here, outside of learning about short-term trends, is to eventually integrate this into the follower concept, adding the Hot/Neutral/Cold to the distribution output, and perhaps use a shorter term for the follower count to also capture the more recent followers rather than looking at the entire game history.

The ultimate goal being to put together many of the ideas into one program to give the best guess possible. And sticking to column-at-a-time allows the flexibility to analyze any of the pick N games by simply pointing to a different csv file.

But one thing at a time... finally get a day off tomorrow to move this hot cold idea into executable code.


Entry #376

February 14, 2025
11:08 pm

Variable term hot/cold analysis script, concept notes

The script itself is nothing I have not used in other scripts, so I don't expect any protracted length of time in coding. It follows a similar format of reading from a CSV file. The planned workflow:

Pass in X and Y to determine how many draws to use.

Read csv input into a pandas data frame because it is incredibly powerful to calculate offsets.

Take the last Y draws and put them into a list, in order.

Jump X+Y draws back (a variable I will label "depth") and count the frequency distribution of all digits 0 to 9 for X draws.

Assign rank by requesting the distribution from the default pandas statistics functions, adding the labels to the output (H1 through C3)

Using that distribution construct, assign the Hx/Nx/Cx patterns to the last Y draws.

Perform this as a function that can loop through any pick N game, like I made for followers... one script can run on pick 2 to pick 5...

Print the results to the display...
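The slicing steps above map neatly onto `iloc`. This is a minimal sketch under assumptions: the data frame holds one integer column per position with the oldest draw first, `depth` is X+Y as in the notes, and Y = 0 returns the most recent X draws with nothing to classify. The function names are hypothetical.

```python
import pandas as pd


def split_windows(df: pd.DataFrame, x: int, y: int):
    """Return (x_window, last_y): the X draws just before the last Y draws."""
    depth = x + y
    if y == 0:
        # iloc[-x:-0] would be empty, so Y = 0 is handled on its own.
        return df.iloc[-x:], df.iloc[0:0]
    return df.iloc[-depth:-y], df.iloc[-y:]


def distribution(window: pd.Series, values=range(10)) -> pd.Series:
    """Frequency of every digit (0 when never drawn), most frequent first."""
    counts = window.value_counts().reindex(list(values), fill_value=0)
    return counts.sort_values(ascending=False, kind="mergesort")  # stable on ties


history = pd.DataFrame({"A": list(range(20))})  # made-up stand-in for a history file
x_window, last_y = split_windows(history, x=5, y=3)
```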

The trick here will be to find out how many draws to collect the X data from, as the most likely starting candidate for Y will be 7, as it is the max advanced play on daily games.

If 30 draws is the right amount, then a simple run with X=30 and Y=7 should hopefully explain how hot and cold numbers tend to distribute in the short term... but the program will only need a weekly run, unlike the daily requirement of the follower script.

Comments

Entry #375

February 12, 2025
2:07 pm

What about tweaking parameters?

I am considering a python script for the pick 3 type games that can take variable parameter input to adjust the settings.

Something like a scan of X number of draws to determine hot, neutral and cold, and then look out Y future draws to determine the composition from the hot/neutral/cold group.

Such that running the formula

displayComposition(30,10)

Would take the last X+Y draws (in this case 40), grab the frequency distribution of the X (30) draws, split them into hot/neutral/cold, then display the next Y draws (in this case 10) with a composition such as HNH or CCN, to help determine the composition of hot, neutral and cold numbers that were drawn.

The point of the variables is simple, I don't know the ideal number of draws to do a recent history on, so this allows for some experimentation.

Because it will be modular, it will be able to be called several times in one run with different parameters, such as

displayComposition(30,10)

displayComposition(250,20)

displayComposition(1000,7)

Since the composition would change with each change in X, we would be searching for some general guideline in the Y output, such as a higher amount of HNN draws when using X history...

I can do a great number of tasks with Python, but I am still sure I am not asking the right questions... after over 20 years of ideas, mostly in Excel, I am losing motivation. Therefore I need some different avenues to explore, and one which I have neglected is the analysis of shorter-term trends. Everything up to now has been done with entire game histories.

So a grouping of the top 3 hot, the middle 4 neutral and the last 3 cold seems like a fair split.

Output looking like

H1 = 7

H2 = 4

H3 = 2

N1 = 6

...

C3 = 1

Would be the result of the analysis, and the output of the Y draws would look like

761 - H1, N1, C3

...

442 - H2, H2, H3

The generalization, which group it comes from, such as H, can be further refined with the digit that represents WHICH H it was, such as H1 being the hottest of the hots and C3 being the coldest of the colds.
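The H1 to C3 labeling and the "761 - H1, N1, C3" style output can be sketched as below. Assumptions: the split is 3 hot, 4 neutral and 3 cold as proposed above, ties in frequency are broken by digit order (a simplification; ties are the edge case discussed in the February 18 planning entry), and the same labels are applied to every column only to keep the example short. The function names are hypothetical.

```python
from collections import Counter


def rank_labels(history: list, n_hot: int = 3, n_cold: int = 3, values=range(10)) -> dict:
    """Map each digit to H1..H3, N1..N4, C1..C3 by frequency over the X draws."""
    counts = Counter(history)
    # Most frequent first; sorted() is stable, so ties keep digit order.
    ordered = sorted(values, key=lambda v: -counts[v])
    n_neutral = len(ordered) - n_hot - n_cold
    names = ([f"H{i}" for i in range(1, n_hot + 1)]
             + [f"N{i}" for i in range(1, n_neutral + 1)]
             + [f"C{i}" for i in range(1, n_cold + 1)])
    return dict(zip(ordered, names))


def describe(draw: str, labels_per_col: list) -> str:
    """Format one of the Y draws, e.g. '761 - H1, N1, C3'."""
    tags = [labels[int(d)] for d, labels in zip(draw, labels_per_col)]
    return f"{draw} - " + ", ".join(tags)


# Made-up X draws for one column, chosen so that H1 = 7, H2 = 4, H3 = 2, N1 = 6, C3 = 1.
x_draws = [7] * 6 + [4] * 5 + [2] * 4 + [6] * 3 + [0, 0, 3, 3, 5, 5, 8, 9]
labels = rank_labels(x_draws)
print(describe("761", [labels] * 3))
print(describe("442", [labels] * 3))
```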

Being able to change the number of draws out with Y can help to determine just how long the trends can extend, and also open the door to a sliding back test by partitioning the history into chunks of size X+Y.

There are still plenty of unknowns such as optimal values for X and Y, but it seems like a fitting start to begin short term trend analysis.


Entry #374