hypersoniq's Blog

Now may be the time to turn to AI to discuss interpreting output...

I had to come up with an idea, and I did. (Too many to list over the past 20+ years.)

I had to write the code to implement that idea, because that is the only way to truly understand what is being done. Did that too.

I am not having much luck in interpreting the code output past spotting the 7 day cycle of NNN classified draws...

Here might be a good time to see what insight Chat GPT could offer...

I can generate data, I can understand the significance, but I am missing the crucial next step of taking that output and turning it into actionable intel.

For added complexity, the word "lottery" will be avoided at all costs... iterations instead of draws, values instead of ball numbers, etc...

This should be a fun exercise, because it is a discussion that will not have anything to do with code, and will instead focus on statistics and truly understanding what the data is trying to say. 

Worst case scenario, it is another dead end, best case... new leads to follow!

To the future...

Entry #406

What I am looking for, simplified...

What I have... a script that can look at short term draws and classify them as hot, cold, and neutral. Why? Because eliminating the hots and colds shows that you can reduce the numbers to put into a combo by around 70% and still see a winner from that group within 7 draws.

I created the same exact script, only using follower data (what numbers follow the last draw) instead of just counting how many times each number was drawn.
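The follower idea can be sketched in a few lines. This is a minimal, hypothetical version (the blog's actual script is not shown), assuming each column's history is a plain list of digits, oldest first:

```python
from collections import defaultdict

def follower_counts(history):
    """Count which digit follows each digit in a single column.

    history: list of drawn digits (0-9) for one position, oldest first.
    Returns a dict mapping each digit to a {follower_digit: count} dict.
    (Illustrative sketch only; names are hypothetical.)
    """
    counts = defaultdict(lambda: defaultdict(int))
    # Pair each draw with the draw that came after it:
    for prev, nxt in zip(history, history[1:]):
        counts[prev][nxt] += 1
    return counts

# Toy history for one column:
column_a = [3, 7, 3, 1, 3, 9, 3, 0]
followers = follower_counts(column_a)
# 3 was followed by 7, 1, 9 and 0 -- one count each
```

The raw frequency version simply counts each digit's appearances; this one conditions the count on what was drawn last.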

As a first step in reduction, I have tested it at many places in history and can see that reducing the set of combos from 1,000 to about 300 is a good first step, but where to go next?

THAT is what I am looking for! How to go from 300 possibilities down to 1.

I thought that I could find some correlation in the hots, colds and neutrals between both functions, but they don't hold over multiple tests like the first step does. Either system reduces the picks and has winners in those picks in the short term 7 draw window, but the raw frequency does better and sometimes has up to 4 draws within that group made up of the numbers between hot and cold.

If I cut the high and low neutrals, then I eliminate hits in the middle of the neutral range. If I cut the middle range, then I miss matches in the median area... it is still just as random in the middle as it is using all possible numbers. And still too expensive to play.

Outside of random selection, I am out of ideas to further move towards shrinking that set to a playable size.

Entry #405

Not finding correlations yet...

I am not surprised that there were no apparent correlations between the raw frequency and the follower frequency. Though classification is still about 70% neutral, when looking at the same draw across the two functions, they rarely line up as NNN on both.

Now it is time to grab some paper and start rearranging the data and looking for clues. Could a spreadsheet do the same? Sure, but there is something about actually writing stuff down that helps connect to the data. If something looks useful during the manual process, it is most likely simple to automate, but that part is for later.

With the script, I am passing in variables, so it is worth tweaking those, and much easier since I only have to change a variable to call the functions with a new setup. The exciting find of a "scaling factor" to control how much of the standard deviation is used will be helpful in "tuning" the classifiers between the 2 functions.

This factor I am calling neutral_bandwidth, as it scales the classification thresholds by multiplying the standard deviation by a scaling factor. If st_dev = 2, then setting this variable to 0.7 effectively reduces the st_dev to 1.4. By the same token, using 1.3 as a bandwidth factor increases the same st_dev to 2.6.
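A minimal sketch of how neutral_bandwidth might plug into the classifier. The variable name comes from the post; everything else (the toy frequencies, the function shape) is an assumption, not the blog's actual code:

```python
import statistics

def classify(freqs, neutral_bandwidth=1.0):
    """Classify each digit's frequency as Hot/Cold/Neutral.

    Thresholds sit at mean +/- (neutral_bandwidth * stdev), so 0.7
    narrows the neutral band and 1.3 widens it. (Sketch only.)
    """
    mean = statistics.mean(freqs)
    band = neutral_bandwidth * statistics.stdev(freqs)
    labels = []
    for f in freqs:
        if f > mean + band:
            labels.append('H')
        elif f < mean - band:
            labels.append('C')
        else:
            labels.append('N')
    return labels

# Toy frequencies of digits 0-9 over 150 draws (expectancy = 15):
freqs = [15, 14, 22, 15, 16, 9, 15, 14, 16, 14]
labels = classify(freqs, neutral_bandwidth=0.7)
# Digit 2 (count 22) lands Hot, digit 5 (count 9) lands Cold, the rest Neutral
```

Because the band is a single multiplier, tuning the two functions against each other is just a matter of calling each with its own bandwidth value.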

I found this necessary, because even though the numbers have the same chance of being drawn with both functions, they don't seem to distribute in exactly the same way.

This part may take some time...

Entry #404

Adding a simple statistic, interesting findings between methods...

It looks like the neutrals in the raw frequency do not necessarily line up with the neutrals in follower frequency.

I am specifically looking at the case where the raw frequency of a draw is classified as NNN. Sometimes the follower frequency is classified as NNN, but also there can be colds and hots mixed in for the same classified draw.

I already count the number of followers for each column, so the simple addition will be the percentage of followers relative to the number of follower samples. If I have 1,500 samples and the follower count for the last draw's first position was 150, then that would be the expected 10%. Maybe the variance in this number might help to decide which classification of follower might end up matching the neutral classification in the raw frequency...
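The statistic described above is a one-liner. A hypothetical helper (not the blog's code) using the numbers from the post:

```python
def follower_share(follower_count, total_samples):
    """Percentage of all follower samples contributed by the followers
    of the last drawn digit. Near 10% is right on expectancy; the
    variance around 10% is the number being watched.
    (Hypothetical helper for illustration.)"""
    return 100.0 * follower_count / total_samples

share = follower_share(150, 1500)   # the example from the post -> 10.0
```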

Trying to build guidelines so that there is less guesswork in making a pick.

I am wondering if there is some other way to apply a frequency count so that a third function can be written to sort of "triangulate" the whole thing. Free styling at this point.

Also, I have to cook up a back test routine and capture the data to get a read on long term trends with the follower frequencies matching up to raw frequency data. I have the rough outline... 

1. Set the offset to 7, capture that classifier data to a .csv file. In this way, I have something that exactly matches the draw history

2. Increment the offset by 7, repeat, but only capture the first 7 in the classifier data.

3. Repeat this until the 1500 draws in the follower frequency run out of room at the start of draw history.

4. Isolate the classifiers in each and put into a spreadsheet where the classifier combo for each draw can be concatenated... such as NNN and NCH would be written as NNNNCH.

5. Find the most frequent combined overall classifier in the resultant list.
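The five steps above might be aggregated roughly like this. The two stub classifiers below stand in for the real raw-frequency and follower functions, which are not shown in the post, and the .csv capture is left out for brevity; this is a sketch of the aggregation logic only:

```python
from collections import Counter

def most_common_combined(classify_raw, classify_follower, n_steps, window=7):
    """Walk back through history `window` draws at a time, concatenate
    the raw and follower classifiers for each draw (NNN + NCH -> 'NNNNCH'),
    and return the most frequent combined classifier."""
    combined = []
    for step in range(n_steps):
        offset = window * (step + 1)           # offsets 7, 14, 21, ...
        raw = classify_raw(offset)[:window]    # only keep the first 7
        fol = classify_follower(offset)[:window]
        combined.extend(r + f for r, f in zip(raw, fol))
    return Counter(combined).most_common(1)[0]

# Stub classifiers for illustration only:
raw_stub = lambda offset: ['NNN'] * 7
fol_stub = lambda offset: ['NCH', 'NNN', 'NNN', 'CHN', 'NNN', 'NNN', 'NNN']

top = most_common_combined(raw_stub, fol_stub, n_steps=3)
# -> ('NNNNNN', 15): NNN+NNN shows up 5 times per 7-draw window, over 3 windows
```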

The hope is to have a fair chance at making one good guess by picking from the neutral list of raw frequency and matching it to the right set from the followers. Hopefully this time I am asking the right questions...

Getting bored of not playing anything so I am probably going back to a QP on the PA match 6 until I am ready for live tests. Cost is the same at $14/week.

I will be making a new version for writing output, because V3 works and I don't intend on losing it... so V4 will be the back test version. Important that I kept V2, because it allowed me to verify the process in V3.

Before the Chat GPT era existed, we used to troubleshoot with Stack Overflow; that is where I found my issue with the pandas Series and how to resolve it. Old school!

Entry #403

A breakthrough! Script works.

Now I have the data I was looking for from the start. Learned a bit about getting expected output from the followers... even though there were 1,500 draws, the follower data was limited to the expectancy, so the formula was not to calculate by the sample size, but rather the sample size divided by 10.
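The expectancy arithmetic, written out with just the numbers from the post:

```python
sample_size = 1500

# Raw frequency: each digit is expected in 10% of the draws sampled.
raw_expectancy = sample_size * 0.10              # 150 per digit at 1,500 draws

# Followers: only the draws preceded by one specific digit count,
# which is roughly a tenth of the sample -- hence dividing the sample
# size by 10 before applying the same 10% expectancy.
follower_sample = sample_size / 10               # ~150 relevant draws
follower_expectancy = follower_sample * 0.10     # 15 per follower digit
```

That gives the same expectancy of 15 per digit as 150 raw draws, which is what makes the two data sets comparable.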

I also learned that while direct standard deviation works for raw frequency, it needs to be constricted a bit to produce similar results... so that was as easy as st_dev x 0.8

Where I went wrong was in the type... rather than using Pandas series for collecting the data to output, I just needed a list! That problem plagued me for days.

To validate the data, I compared the output of the previous version to make sure the counts agreed, and they did.

I may not be any closer to the winner's circle, but I do enjoy being able to take an idea and completely realize it in code from a clean sheet. Even the bug squishing was enjoyable ultimately because it is an exercise in real world problem solving, and there are no guides or documents for lottery analysis software, you are on your own in this domain.

Now for the really hard part, finding correlations in the output. This will take far longer than writing the script, and there may not be a solution, but at least I have some busy work to keep engaged with the hobby.

Entry #402

Stuck at classifying followers.

Strange error that I am trying to solve. When I put in a diagnostic line to determine the data type of the output, both functions report the type as pandas Series.

This was as simple as 

print(type(distribution_list))

Here is the problem... in one function it works, in the other function it tells me the list is unhashable and I should probably change type to a tuple... but that is wrong. Both lists contain similar data from similar calculations...

It SHOULD work, but does not.
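For what it's worth, "unhashable type: 'list'" usually means a list is being used somewhere Python needs a hashable value: a dict key, a set element, a groupby key. Pandas operations can trigger it too when a Series' cells hold lists. A minimal reproduction, unrelated to the actual script:

```python
counts = {}
key = [1, 2, 3]               # a list...

try:
    counts[key] = 1           # ...cannot be a dictionary key
except TypeError as exc:
    message = str(exc)        # "unhashable type: 'list'"

# The standard fixes: convert to a tuple, or restructure so the
# list is a value rather than a key or set member.
counts[tuple(key)] = 1
```

Which is why the interpreter's suggestion to "change type to a tuple" shows up: it is complaining about where the value is used, not what the function returned.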

One thing I had noticed while looking at the output of the follower numbers that coincided with the neutrals of the other function is that there are more followers considered Cold matching with raw frequency numbers classified as neutral... could that be the correlation? In the raw frequency neutral set AND in the follower frequency cold set?

I will not know until I solve the problem. Been frustrating at times, but this is really the "fun" part of coding, solving problems!

Worst case scenario, post the function to Chat GPT and have it find where I went wrong. You would be surprised at how much better LLMs are at coding if you give them sample code. However... that is how coders get lazy. I try DIY first. I even try to keep libraries to a minimum if there is a way to code with straight Python. I started using pandas because it solves many problems out of the box with less code and high utility, versus the huge amount of code needed to do the same thing by hand. For me, I could not imagine NOT putting lottery data into a pandas DataFrame.

I thought the conversion would be easy because I am doing the same thing with the same data type... nothing is ever easy...

Entry #401

The thought behind choosing draw samples

There is one fact of the discrete uniform distribution formed by lottery results in a pick 3 (or 2/4/5): each number has a 10% chance of being selected.

150 draws is the sample size I use for classification of raw frequency data. Why? Because given the fact above, each number has a chance to be picked 15 times in each position. The expectancy is that each digit will be drawn about 15 times. The variance can be easily seen by how much above or below 15 any digit has been drawn.

1,500 is the sample size for follower frequency data. Why? Because followers are only based on the last draw, so only about a tenth of the sample applies to any given digit; at 1,500, each follower digit still has an expectancy of 15. The variance is still easy to spot.

Since each set, though counting a different metric, is expected to show 15 draws per number, the comparison holds at 15 on the same 10% expectancy.

If I wanted to observe more or less, the variables would have to be adjusted. For 10 expected appearances they would be set to 100 and 1,000 and for 30 they would be set to 300 and 3,000.

The number of classification draws is set to see recurring patterns in as short a period as possible. In the pick 3 evening draw in PA, that works out to 7 draws, which also coincides with how many draws they allow for advanced play.

In order to back test, I simply increase this variable, but only consider the first 7 classification results. This is the key to automation should I want to profile the entire draw history.

The mid day game has an observed period of 8 draws; while most NNN draws are found within 7, it does not always hold the way the evening draw does.

Modifying the follower function to output similar to the raw frequency function is nearly done.

Since neutrals are usually 70% of the observed frequency, the reduction in possible pick 3 combos is 7x7x7 = 343, thereby safely eliminating 657 straight combos... if extended to the pick 5, using the average of 70%, 7x7x7x7x7 = 16,807, eliminating 83,193 of the possible 100,000 combos... but is it done safely?

It depends on the NNNNN period in the classification... if it is longer than 7 to 14 days, then not so much. I am not in full development of the pick 5 application of this script, only focusing on the pick 3 until something works.

Up next is finishing the modification of the follower function to give similar output as the raw function. Already have the standard deviation and quartile output.

Then the real fun begins, finding ANY correlations between the 2 data sets... with the full knowledge that there may not be any at all.

Entry #400

Coding update.

In the raw frequency function, quartiles were literally 2 lines of code! Thanks to the statistics library, you call quantiles() on your list with n=4, and this gives you 25% (Q1), 50% (Q2, the median) and 75% (Q3) all on the same line. The second line was added so I could round the output to 2 decimal places for readability.
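Assuming this is Python's built-in statistics module, the two lines probably look something like this (the frequency values are toy data, not the blog's):

```python
import statistics

# Toy frequency counts for digits 0-9 over 150 draws:
freqs = [15, 14, 22, 15, 16, 9, 15, 14, 16, 14]

cut_points = statistics.quantiles(freqs, n=4)      # [Q1, Q2 (median), Q3]
cut_points = [round(q, 2) for q in cut_points]     # rounded for readability
# -> [14.0, 15.0, 16.0] for this toy data
```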

That was the easy part... changing the follower function to allow classification of the output to match the functionality of the raw frequency data is proving much harder, because counting followers is a 2 step process: first you get a list showing every number that followed the last draw, THEN you have to count that list. If I try to make it a one shot function, it fails when validating the data. So basically the current project is adding the classification part after the second step completes but before it fetches the next list... getting there slowly, but surely.

Looking at the two sets of data together has no immediate correlation presenting itself, though adding quartiles is showing that the neutral numbers drawn seem to sit close to one of the 3 quartiles, more so than the standard deviation. SD is the classifier, but the quartiles are where I feel some consistency will be found.

One funny error was that I accidentally hooked the quartile calculation to the list of digits rather than the frequency... the output there is ALWAYS the same... 1.5, 4.5 and 7.5 when your list is 0 to 9. Fortunately I recognized exactly where I messed up after I saw the output.

Entry #399

Simple version control

As I begin coding, I start with the V1 working script and make a copy of it (save as whatever_V2)

I also add to the top comments what I am implementing, sort of a checklist. Rather than erase the comment, I simply change "implementing" to "implemented".

This helps tell future forgetful me what present forgetful me was intending.

I have some scripts in double digit versions, but I always know where I was in the development process.

I do something similar with all of the spreadsheets that hold the game history as well. There is the current spreadsheet, and there are also older versions that were analyzing one thing or another. The master data is actually stored in CSV files so they can be easily transferred into any new sheets AND read by Python scripts.

Also, I back up everything. Once at the end of every coding session, including draw history updates, to a second internal hard drive, and separately on a monthly basis to a thumb drive.

Before I started up again, I did go through the painful process of losing everything when the old laptop died. Mostly Powerball stuff and a ton of Vtrac sheets.

The version control is super important because you always have a working version of a script as you try to implement new features.

Another way to do this is with free version control software like git, but for the simplicity of what I do, that seems like overkill.

Entry #398

Moving forward with coding changes.

Regardless of whether the mid day PA pick 3 catches an NNN on its last draw tomorrow, I am moving forward with the proposed changes. The only difference with regard to tomorrow's result is whether or not I ditch the mid day game and move forward only on evening draws. NNN in 7 draws was observed on the evening data set every time. The day game was spot checked and HAD shown similar results. Remember, the day game is RNG and the night game is a ball draw... there may be a difference here worth considering. The code base is super flexible with input files, so processing 2 games or 1 is done in seconds. Cutting days cuts the play cost in half, so tomorrow's mid day result will tell. To be fair, MOST observed weeks in back testing mid day had the NNN fall later in the list than the evening game, with last week (before the test) arriving on day 7.

Still feel "onto" something here... blocking out 4 hours of coding and testing on my next day off (Tuesday).

Entry #397

A search for correlations to raw frequency and follower frequency...

Spawning an idea... process followers the same exact way... via classification! Same rules, +/- one standard deviation around the 10% expectancy... classify via Hot Cold and Neutral the same way... 

Then...

Record the intersections of the resultant neutral sets!

Not as safe of a reduction, but it might help identify a correlation between the raw frequency and follower frequency. Would need to increase the follower set to 1,500 because holding the followers to their theoretical 10% expectancy will make 150 raw draws perfectly mesh with 1,500 follower draws... apples to apples.

If there is no direct correlation in frequency, the data should show that also. Then I will know... one way or the other.

This will take some time to implement, so there may be some "radio silence" after the test is done. The follower set may not be as uniform... but it should be, in theory anyway.

Entry #396

Midway through the 7 day test, 50% success.

As 5-0-1 was drawn in last night's PA Pick3 Evening game, that number was indeed a NNN draw. (All from the neutral set, which is what remains when eliminating the Hot and Cold numbers).

Still looking for that NNN combo to be drawn in the day game, but there are still 4 draws remaining.

There were 7 numbers in each column in the neutral set for evening, while the day draw has 6 in each.

IF the test matches literally every back test observation made, then this is the first and only "safe" elimination I have found in over 2 decades. It will prove that over a span of draws, the NNN combination WILL appear at least once. This is the very first step, as any subsequent number eliminations introduce the chance of throwing out the winners.

Since I am not playing until I can narrow down to one pick, I am going to first freeze the data by not updating the draw sheets all week, and seeing what further eliminations would be a good idea. I am also still searching for ANY correlations between the hot/cold data and the follower data.

For the next test I will be looking to eliminate the high and low neutral from each set... not the high and low numbers, but the high and low frequencies. This will also be back tested at several points in draw history to see how many of the NNNs become eliminated by such a reduction.

Every single system before this one was based on the full history set. The systems would either produce a hit in the first month and then not again for over a year, OR I would get bored of how far off they were and give up after only a few days.

This time is different, because the pattern NNN cycles every 7 draws (or less!), which is useful as the base of any further system development. If you could start off number selection with a group of numbers that produce a straight hit every 7 days, while safely eliminating between 60% and 70% of the 1,000 possible straight combos, then why not move forward with that?

And the best part is, it is pure statistics that got me to see this pattern that occurs so frequently. It was counting the frequency of drawn digits over a relatively short draw span (150). One standard deviation (calculated on the fly by the Python script) in either direction of the 10% expectancy of each number's frequency yields a set that will produce a straight hit within 7 draws. There is no gray area, and this was made possible by classification. The H N and C are classifiers of frequency.

It is going to be difficult to pick just one from the rather large list of resultant combos (may be over 300 in some cases), but it is better than starting out with all 1,000 combos. This is where I will need to go next, to find what other systems can be fed this list of numbers in the range and help narrow down further.

Maybe there will be a correlation of the frequencies and their distance from the expectancy, or maybe from the median, which SHOULD be 10% in a truly random discrete uniform distribution. Let's not forget that both the pick 3 day and pick 3 evening full data sets fail the chi-squared test for true randomness (but not by much).

The betting strategies are also such that even playing the number at a .50 straight/.50 box could allow play of up to 5 picks and still have the potential to pay for itself with the possibility of a decent profit on a straight hit.

As always, this will be worked out in the pick 3 before any attempt is made in the pick 5. The NNNNN does not appear as frequently in that game as the similar NNN in pick 3, though it is by far the most represented classification.

Eliminating numbers is part of the game. When you make just one pick, you are rejecting the other 999 combinations anyway, I am just trying to do that with a bit more purpose.

Entry #395

A 7 day test of the hot and cold classifier.

As I start out trying to turn my most recent code into playable picks (far from that yet), I have made many observations at many different points in the draw history. I have noticed that within 7 draws, you will see at least 1 drawing with numbers classified as neutral in each position.

As a hypothesis, I am stating that based on observation, a Neutral Neutral Neutral drawing will happen at least once every 7 draws. I have to prove that... this is what I will hope to do over the next 7 days. I am only attempting to prove the very first part of a multi step process that is still being worked out... so you are where I am. The following numbers were generated and classified using a draw history size of 150, and an offset of 0, which means it is the data that would be available when making selections.

The Tests

Mid Day Pennsylvania Pick 3 Neutrals

Column A = 0, 3, 4, 5, 7, 8

Column B = 2, 3, 4, 5, 8, 9

Column C = 2, 3, 5, 6, 8, 9

 

Evening Pennsylvania Pick 3 Neutrals

Column A = 0, 2, 4, 5, 6, 7, 8

Column B = 0, 4, 5, 6, 7, 8, 9

Column C = 1, 2, 5, 6, 7, 8, 9

The assertion (based on observation) is that we will see a straight hit from the numbers above at least once over the next 7 days, on BOTH the day and the night numbers.

Why this is important... yeah, it seems like one of those things that produces too many picks to play, BUT this is the start of the selection process. IF it holds true, here is what the data tells us... hots and colds that exist above or below one standard deviation from the 10% expectancy can be SAFELY ELIMINATED!!!!!

For the mid day, there are 6 neutrals in each column (6x6x6= 216) and we have effectively eliminated 784 of the 1,000 combos!

Since the evening uses different draw data (7x7x7 = 343) we end up effectively eliminating 657 of the 1,000 possible combos (10x10x10 = 1,000)

Of course, any further steps WILL potentially eliminate the winning combos, but we have to start somewhere, and this is the only potential SAFE elimination.

So, over the next 7 draws of each, we will be looking for at least one straight hit from the mid day data AND at least one straight hit from the night game. That is the only success scenario. If either set fails to produce a winner, then it is back to the drawing board. After 20+ years of this I am getting tired of that trip!
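The success check for the evening set can be written directly from the numbers posted above. A sketch (the neutral sets are from this entry; the helper function is hypothetical):

```python
from itertools import product

# Evening Pennsylvania Pick 3 neutral sets from this entry:
neutral = [
    {0, 2, 4, 5, 6, 7, 8},   # Column A
    {0, 4, 5, 6, 7, 8, 9},   # Column B
    {1, 2, 5, 6, 7, 8, 9},   # Column C
]

def is_nnn(draw, neutral_sets):
    """True if every digit of a straight draw is in its column's neutral set."""
    return all(d in s for d, s in zip(draw, neutral_sets))

hit = is_nnn((5, 0, 1), neutral)   # an all-neutral straight -> True

# The full reduced set of straight combos that count as a hit:
combos = list(product(*(sorted(s) for s in neutral)))
# len(combos) == 7 * 7 * 7 == 343 of the 1,000 possible
```

Run against each of the next 7 draws, a single True anywhere is the success scenario for that game.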

Let's see where it goes from here...

NOTE: I am not playing anything here, this test is just to prove that I got the first part right. 200 to 300 combos is still far too many to get a return on.

Entry #394

How quartiles will be used.

In statistics, dividing a set of numbers into equal parts gives quantiles (dividing into 100 quantiles, for example, gives percentiles). We are using the special case of the quartile, as our observed draw frequency list will be divided into 4 parts, giving 3 results.

The first quartile exists at the 25% mark.

The second quartile, also known as the median, exists at the 50% mark.

The third quartile sits at the 75% mark.

This data is not looking at drawn numbers, only the frequencies of those numbers (how many times were they drawn in the selected date range). These should be easy to calculate, as they are built into the available statistics functions in the Pandas library.
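With the frequencies in a pandas Series, the three quartiles are a single call (toy data below). Note that pandas' default linear interpolation can differ slightly from other quantile methods, which is one reason cross-checking in R is a reasonable step:

```python
import pandas as pd

# Toy frequency counts for digits 0-9 (how often each was drawn):
freqs = pd.Series([15, 14, 22, 15, 16, 9, 15, 14, 16, 14])

quartiles = freqs.quantile([0.25, 0.50, 0.75])
# Indexed by 0.25, 0.50, 0.75 -> Q1, median, Q3
```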

Of course due diligence will require me to enter the frequency data into R Studio to compare, ensuring accurate data. Once we can trust the results, the hope is to incorporate all of it together.

Remember, any time you choose to eliminate data, you are probably throwing out the winning combo, that is just the nature of these games.

The odds of winning are still 1:1000, but you have to start somewhere.

When eliminating hot and cold, for example, the output per column of the neutral numbers might look like 7N, 7N, 6N. This means that you can calculate the combos by multiplying these numbers. So 7x7x6 = 294.

The assertion (based on observation) so far is that within 7 draws of the calculated data, from one to four combos will come from this reduced set. NNN is the only regular pattern to be seen.

Further reduction may be had by eliminating the high and low frequency neutrals... that could look like 5x5x4 = 100 combos, but that may eliminate the winning combo. Part of the observation has been that the NNN combos observed tend to sit between the first and third quartiles... it is "filling in" from the middle!

If we could use this info to eliminate 70% to 90% of the combos, a paper play test might be in order. And we may as well post that here so anyone watching can get an idea of how systems are tested BEFORE making any actual bets. I will start posting data sets so we can follow along as soon as I have the quartiles set up in the code, so most likely tomorrow.

Entry #393

Making sense of the few clues in the data.

As would be expected, the data generated from the combined follower and hot/cold scripts takes some time to study. Leaving the ability to specify the number of draws being studied and the classifier offset so the follower data can also be analyzed from the same end draw has been a big help in such topics as observing follower delay, or how many draws until the followers hit.

One thing that pops out is the need to code in some more statistics. I moved the output to R Studio to gain more info, but that is a process. Turns out there is a Python library that allows access to R calculations! If you have ever used R, you know the commands are powerful.  I can enter the draw frequencies as a set and simply run 'summary(set)' to gain information such as the quartiles and the mean(average).

Here are the exciting discovery highlights so far...

1. There are 27 total draw profiles that can be observed using H, C and N... by FAR the one that occurs the most is NNN. 

2. No matter where in history I took a sample, there are from 1 to 4 NNN draws in the first 7 that follow! Why does this matter? Imagine the choices being pared down by ELIMINATING hot and cold numbers... if you need to pick from all, there are 10x10x10 = 1,000 possible combos... a sample that happens quite frequently is one or two hots and colds per column, leaving a neutral count such as 7,7,6... 7x7x6 = 294 !!! This part eliminates almost 70% of the combos! ONE to FOUR combos in that set will appear in the next 7 draws.

3. Further eliminations may be possible by removing from the choices those neutrals that exist outside of the first and third quartiles! All of the NNN combos observed so far fell within that range. Imagine your choices now get lowered to 4x4x4 = 64 or something similar... that is a reduction of over 90% from the original 1,000.
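Point 3, keeping only the neutrals whose frequencies fall between Q1 and Q3, could be sketched like this (toy frequencies; not the blog's code):

```python
import statistics

def within_iqr(freqs):
    """Digits whose frequency falls between Q1 and Q3 (inclusive).
    freqs: list of 10 counts, index = digit. Sketch of the proposed cut."""
    q1, _median, q3 = statistics.quantiles(freqs, n=4)
    return [digit for digit, f in enumerate(freqs) if q1 <= f <= q3]

freqs = [15, 14, 22, 15, 16, 9, 15, 14, 16, 14]
kept = within_iqr(freqs)     # drops the hot (22) and cold (9) extremes
```

Applied per column, three lists of 4 or 5 digits each would give the 64-to-100-combo range mentioned above.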

This is the next phase... using the follower data to further focus on the one best guess. A higher position on the follower list may be the piece of the puzzle that helps zero in on the one best guess: the combo that matches the NNN criteria with the best performing follower numbers that also land between Q1 and Q3.

A guarantee? There is no such thing in these games. A better chance to catch a pick 3 straight more than once a year? Maybe. Still a cheap system, total cost if done at .50 st/ .50 box is only $14 for a week... there would be a separate evening and day combo.

I am using the first 7 draws because that is the max number of days you can play in advance at the PA lottery kiosk. This would definitely change for a game like the PA Match 6 where they allow play for 26 days. But I am not getting too far ahead right now... focusing on the pick 3. The next challenge will be much easier as it is a per column basis... the pick 5, which has been the target for over 2 years now. 

There is always much to learn, and eventually I will grow weary of this system if it fails to do better than previous ones.

I still have coding to do so I can print out the quartiles for each column... should not be too difficult as I already have it spitting out the standard deviation for each. Then I need to find a direct correlation between the neutral numbers and the follower list... there may not be any, but it does have its place here at the beginning to help narrow down to one combo.

Sometimes things take a while to "click" so I don't rush things. For each hour coding, there are weeks to even months of thought and research. Writing code becomes much easier when you have an idea of what to look for.

Entry #392