hypersoniq's Blog

Follower system loses momentum in the spreadsheet

Back tests have value, even if only to identify the shortcomings of an overly complicated system.

Building the spreadsheet was also a fun dive into formulas and applying error correction masks. The knowledge gained, from clean-slate Python scripts back to a spreadsheet with more focus than previous attempts, was surely worth the time invested, and the money saved by NOT playing the system!

I must go back to the drawing board for a new estimation function, but while I await inspiration, I will continue to hone the spreadsheets on simple guesses, like mirroring the last draw, and testing that out through history files.

The follower second-pass data had a most common error of 0 across all 3 digits... the problem is that the zeros don't line up for wins.

This next spreadsheet-only attempt will be based on replacing the follower guesses with the mirror numbers of the last draw, counting the hits, then applying the error correction step and counting those hits. Other simple one-line workouts could be applied and tested in the same manner.

By designing the sheet with VLOOKUP for the mirror replacement scheme, changes and tweaks (and hit counts) can be explored. Live play tests can be shorter and cheaper, and the goal of a pick 5 straight hit can still be attempted... cheaply.
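The mirror-replacement idea can be sketched in Python terms, assuming the usual lottery "mirror" rule where each digit maps to itself plus 5, mod 10 (the function names here are just for illustration):

```python
# Sketch of the mirror-replacement scheme. Assumption: the standard
# "mirror" rule, mirror(d) = (d + 5) % 10. `draws` is a list of draws,
# oldest first, each draw a tuple of digits.

def mirror(digits):
    """Mirror each digit of the previous draw to form the guess."""
    return tuple((d + 5) % 10 for d in digits)

def count_straight_hits(draws):
    """Guess the mirror of each draw; count straight hits on the next draw."""
    hits = 0
    for prev, nxt in zip(draws, draws[1:]):
        if mirror(prev) == nxt:
            hits += 1
    return hits
```

The same tally the VLOOKUP sheet produces, just run through history files in one pass.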

 

Back to the drawing board. 

 

Happy Coding!

Entry #256

Spreadsheet ideas for error identification.

After evaluating several formula ideas for capturing the difference between the drawn number and my pick, I ultimately decided to go with a vertical lookup table (VLOOKUP) that has 100 rows covering every possible drawn digit and every possible prediction. This keeps the results signed (+/-), so I can then count each observed error with COUNTIF and use MAX to find the most frequent signed error for the data in each position.
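The COUNTIF-and-MAX step has a compact Python equivalent, shown here as a sketch (hypothetical function names; `drawn` and `guessed` are one position's digit lists):

```python
from collections import Counter

def error_distribution(drawn, guessed):
    """COUNTIF equivalent: tally each signed error (drawn - guess)
    for one digit position across the whole back test."""
    return Counter(d - g for d, g in zip(drawn, guessed))

def most_common_error(drawn, guessed):
    """MAX step: the signed error observed most often."""
    return error_distribution(drawn, guessed).most_common(1)[0][0]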

Got a busy work week, so I might not get to finish the sheet, or even update the draws, until the weekend, which is the perfect time to test the indexing function of my Python back-testing script by processing only recent results via the offset variable.

So, since the systems are not profitable in raw form, it looks like I will be waiting a while longer to start playing. But by taking the time to think through each step, I am ending up with home-brew tools and a workflow that can be used with any single-pick system.

I had a feeling that the follower system was not profitable. I have no gut feeling about the result of error correction applied to it, but the likely trend is that it increases the number of hits, just not by enough to have been historically profitable.

When my old laptop bit the dust, I lost over a decade of excel files. Never again! Everything I am now working on is backed up to an internal storage drive AND an external drive.

The biggest problem I may be facing is that outside of followers, I am out of ideas...

Entry #255

Prepping the spreadsheet while the programs run

I have officially begun the next phase, so while the estimated program run times for pick 3 mid, pick 3 eve, pick 5 mid, and pick 5 eve total a cumulative 5 hours... prep can be done.

Step 1 is to update the draw history sheets... done

Step 2 is to make a fresh copy, so the originals can remain for running the short forecast code later... done

Step 3 is to wait until the programs are done running... in process...

The questions will be answered one at a time, the first being: how many hits? The test for that will be a one-cell formula using the following format...

=IF(AND(condition1, condition2, condition3), 1, 0)

The benefit of this setup is that counting hits is a simple SUM. Filtering for 1 will show the dates of the hits, enabling gap calculations.
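A pandas equivalent of that one-cell hit test might look like this (the column names `d1`..`g3` are made up for the sketch):

```python
import pandas as pd

# Hypothetical columns: d1-d3 are the drawn digits, g1-g3 the guesses.
df = pd.DataFrame({
    "d1": [1, 4, 7], "d2": [2, 5, 8], "d3": [3, 6, 9],
    "g1": [1, 0, 7], "g2": [2, 5, 8], "g3": [3, 6, 1],
})

# Excel: =IF(AND(d1=g1, d2=g2, d3=g3), 1, 0), one row per draw
df["hit"] = ((df.d1 == df.g1) & (df.d2 == df.g2) & (df.d3 == df.g3)).astype(int)

# Counting hits is then a simple sum
total_hits = int(df["hit"].sum())
```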

The difference (or error) will be drawn − guess.

As for how to interpret that info, that will have to wait until I get that far... but work is in process!

Entry #254

Python development complete! Time to move the data to Excel.

Though it is still running 37 minutes after I started it, in less than 20 additional lines of code I was able to get back-test data. It runs the follower code on the first 1,000 draws, adds the next draw, and repeats the process. I am using the midday PA pick 3 data for the test run, so the eve file will take twice as long... but once run, it will provide a list of picks to compare to draw results.
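The expanding-window loop might be sketched like this (a minimal outline, not the actual script; `make_pick` stands in for the follower estimation function):

```python
import pandas as pd

def backtest(history: pd.DataFrame, make_pick, start: int = 1000):
    """Expanding-window back test: forecast from the first `start` draws,
    score against the next draw, grow the window by one, and repeat.
    `make_pick` stands in for the follower estimation function."""
    records = []
    for end in range(start, len(history)):
        window = history.iloc[:end]        # only data known at that point
        pick = make_pick(window)
        actual = tuple(history.iloc[end])  # the draw being forecast
        records.append({"pick": pick, "actual": actual, "hit": pick == actual})
    return pd.DataFrame(records)
```

The resulting frame is exactly the pick-vs-draw list the spreadsheet phase consumes.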

Now I will be able to see results.

The spreadsheet design part of this is based on the questions I want to answer.

1. Would this system ever have won? If it did, were there enough hits to be profitable? What were the gaps between wins?

2. How far off were the system picks from the draw results? That is the difference between the drawn number and the guess.

3. Is there one difference that happens more than others? If applied as a mask, can it produce more hits than the raw guess?

4. Is it better to use some form of absolute subtraction to obtain the difference, or would it be better to use an XLOOKUP table?

Some daunting work lies ahead, but for most of my time playing the lottery, Excel was my wheelhouse. This part will also take time to get right.

The code changes were tested at each step to get them right, the same care will be taken with the spreadsheet.

This was a HUGE step, and it only took 2 days. I am hooked on pandas DataFrames in Python; they were the secret to cracking the execution of the back-test scenario with such a complex estimation function (the follower data).

The end goal is to obtain an error correction "mask" that can be applied to the original follower system picks for a better guess. The notes I left myself in this blog helped maintain focus. Even without much interaction, this is a very useful Lottery Post feature!

Entry #253

Free time is around the corner!

I have to take a final exam in Mobile Application Development later this morning. After that I am planning on starting to build an error correction back test version of my follower program.

There are 5 areas that need to be implemented, and I have solved 3, so I will begin making and testing those changes. Obtaining the error correction masks is the primary task at hand, as I have suspended play until that is complete.

I have "off" of school until Feb. 1st, so I plan on wrapping this phase up before then. Once I can generate the error data (how far was the pick off from the forecast) then begins the trial and error phase of learning how to work that data into a mask that can be applied to future draws.

3 more classes to go to wrap up a part-time 8-year journey to a Bachelor's Degree in Computer Science. In that time I need to create a strong coding portfolio (not using lottery code) and hammer out a respectable resume, as the ultimate goal is a fully remote software developer job. 2024 is going to be busy from start to finish!

Entry #252

Last play of the year

Going with a last play for this year with pick3 and pick5 day and night. Also a match 6 since it managed to hit over 3M! Gonna pick up a PB as well so I do not have to do anything tomorrow but celebrate the new year with the family. 

16 days left of this class, then I implement error correction into the follower system, then the play strategy is most likely 4 days a week, and the goal is pick 5, so that goes into full rotation.

Have a wonderful new year!

Entry #251

2024 goals

I have found that it is more positive to make goals rather than resolutions. For 2024 I have the following goals...

1. For the first 33 weeks of the new year, my MAIN goal is to finish my last 3 classes strong and get that Bachelor's Degree in Computer science that I have been working on part time for 7 and a half years.

2. For the remainder of the year, I am going to be looking for a solid coding boot camp so I can make a portfolio to go along with the new degree. Something 100% online and free, or as cheap as possible.

3. After April 12th, I will have 2 years in supply chain systems (a position at work I was able to qualify for with my associates degree) so I will make it a goal to apply for something directly in the IT department, as most of those jobs require 2 years of supply chain systems experience. My long term goal is to get into a software engineering role within the company I have been at for 14 years.

4. For Lottery, the stated goal is a straight hit on the PA pick 5, using the follower system I am still perfecting.

5. After graduation I also want to get back into playing music. Mainly the guitar, but I also have an old Ensoniq ASRX pro tabletop sampler I plan on restoring.

As if that were not enough... I also plan on redoing the basement (which is finished) and making a workbench out of an old picnic table to house my 3D printer and electronics hobby supplies.

What is your plan for 2024?

Entry #250

Next evolutionary step of the follower program.

As it is written, the follower refinery script constitutes an estimation function, which is another way to say it is a definitive "best guess".

Since this is ultimately an attempt to manually implement a machine learning strategy, the next step will be an error function. I finally figured out how to set up the back test! This process will involve:

1. Loading the first 1,000 games of any of the pick n games into a pandas DataFrame that will be run through the program, this time with the pick isolated from the rest of the distribution list and written to a CSV file. The remainder of the draws will be held in another DataFrame; the oldest entry is popped off and pushed onto the bottom of the original list to be run again, with the result appended to the new CSV file.

2. This continues until the entire history has been pushed into the DataFrame that contained the first 1,000 draws.

3. The "manual" error function: this new file with all of the picks will be merged with a copy of the draw history file, and the difference between the result and the pick recorded.

The first test will be to see how many times the picks hit, but the most important step will be to find the most common difference by position and use it as a "mask" applied to future picks. This new mask can then be tested in place to see if it produces more hits than the original guess.
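A rough sketch of building and applying such a mask (assuming the correction wraps around 0-9, which is my guess at how a digit mask would behave):

```python
from collections import Counter

def build_mask(drawn_rows, guess_rows):
    """Most common signed error (drawn - guess) in each digit position."""
    mask = []
    for pos in range(len(drawn_rows[0])):
        errors = [d[pos] - g[pos] for d, g in zip(drawn_rows, guess_rows)]
        mask.append(Counter(errors).most_common(1)[0][0])
    return mask

def apply_mask(guess, mask):
    """Shift each guessed digit by its mask value, wrapping 0-9 (assumed)."""
    return [(g + m) % 10 for g, m in zip(guess, mask)]
```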

I believe this to be a fair test because it only uses data for each guess that you would have had access to at that time in history. Plus it will give a definitive answer to the win gaps.

Crazy hectic schedule for the next 3 weeks with class, but I will start coding this during the class break.

The ultimate goal would be the ability to automate the error test and generate a mask, but small steps make progress!

So to recap, the pick is the follower with the highest Markov probability (which is the most frequent) and the mask will be the error correction factor with the highest Markov probability (also the most frequent). Ultimately leading to one pick that hopefully results in a few wins.

Designing this phase as modularly as the first means that I should soon have the ability to back test ANY system that generates one pick.

Happy Coding!

Entry #249

Ran a small back test with the follower script.

After processing and analyzing a 2-week back test of the follower script (still working on the back-test automation; this one was painfully manual), I have reached the conclusion that the follower with the highest Markov probability is still the best pick, but the consistency is all over the place. I saw a few 2-of-3 in position on the pick 3, and twice a 3-of-5 in position with the pick 5, but no straight hits yet (even in paper play).

So that leaves me with 2 good (but not great) sources for picks...

1. The mode of the followers.

2. The mode of the followers of the followers.

Though in 14 draws of each game the 2nd option was most frequently correct, option 1 still had a fair share of positional hits... seems like I should launch the betting strategy with a little competition between the 2 options.

It is good news because the mode is definitive; no further analysis required. But, as I suspected starting out, it is coincidental at best. Still, it DOES represent my BEST guess.

I did convert the script into one big function that can be called on any of my history files, but I am still stuck on how to iterate over the draw history to do a proper back test. I am considering splitting the history file into an 80%/20% training/testing configuration; 80/20 is used in machine learning, so it might be a viable option.
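A chronological 80/20 split is only a few lines with pandas (a sketch; shuffling is deliberately avoided since follower analysis depends on draw order):

```python
import pandas as pd

def train_test_split_history(df: pd.DataFrame, train_frac: float = 0.8):
    """Chronological 80/20 split: train on the oldest draws, test on the
    newest. No shuffling, since follower analysis depends on draw order."""
    cut = int(len(df) * train_frac)
    return df.iloc[:cut], df.iloc[cut:]
```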

At least I can get started playing! It has been a while at the drawing board.

So what exactly did I make?

It is a program that generates a list of the followers of the last draw in each position and, using iteration, can work without modification on any game, from the pick 2 through the jackpot games. With a carefully crafted recursive algorithm, I can also repeat the process on the follower list to get the most frequent follower of the last follower. Further recursion eliminates too many winning numbers. After statistical analysis, it was determined that the most frequent number in the generated list is also the number with the highest Markov probability, though this does not guarantee that all positions will be drawn to match that probability, as all 10 digits have non-zero probabilities. So, in summary, it is an elaborate pick generator tilted more toward forecasting than prediction, but it is still just a best guess.
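The core follower idea as described can be sketched minimally (the real follower_refinery.py uses pandas and recursion; this toy version works on plain lists, and the function name is just for illustration):

```python
from collections import Counter

def follower_pick(history):
    """For each position, collect the digit that followed every past
    occurrence of the last draw's digit, then take the most frequent.
    `history` is a list of draws (oldest first), each a list of digits."""
    last = history[-1]
    pick = []
    for pos, target in enumerate(last):
        followers = [
            history[i + 1][pos]
            for i in range(len(history) - 1)
            if history[i][pos] == target
        ]
        # Fall back to the digit itself if it never appeared before
        pick.append(Counter(followers).most_common(1)[0][0] if followers else target)
    return pick
```

Feeding the generated follower list back through the same routine gives the follower-of-followers level.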

What did I learn along the way?

How to convert CSV history files into pandas DataFrames, and how to handle those DataFrames when the columns are of unequal lengths (the last pass).

How to work with NaN values and make sure they don't mess up calculations. The highlight was using the last_valid_index() function to make the last pass work correctly.
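In pandas the method is spelled last_valid_index(); a tiny example of using it to keep NaN padding out of a ragged column:

```python
import pandas as pd

# Columns of unequal length end up padded with NaN; last_valid_index()
# finds where the real data stops so the padding never skews the result.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, float("nan")]})

last_b = df["b"].last_valid_index()  # index label of the last non-NaN entry
tail_value = df["b"].loc[last_b]     # the true final value in column b
```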

How to calculate and display statistics, both from the Python statistics library and those built into Pandas.

How to nest compact, atomic functions inside larger functions that allow for reuse of code with data files of varying row and column counts.

 

How not to give up when the coding seemed like it was never going to work, and how to validate that the results were what was intended.

Even if it never wins, I got a great deal of knowledge out of the process. Given the huge number of resources, from free coding tutorials to ChatGPT, this may be one of the best times to be alive if you want to learn how to write programs!

I do not always play, so my initial "week" may exceed 7 days, but it will constitute 7 draws. With the competition set up between the 2 picks, that first "week" will cost $56 ($8 per day). If the pick 5 hits even once, I will upgrade to Platinum, but outside of that indicator, this will be a silent start.

Happy Coding!

Entry #248

Time for a live test

2 emergent statistics are the front runners for the right guess, though not consistently. Sometimes it is the most frequent follower of followers; sometimes it is the midpoint of the distribution list. There are 8 other numbers in the list, but these 2 sources are most frequently correct... so maybe a live-fire test is in order...

Run a pick 3 mid and eve test and see which pick is closest. Still digging for clues in the output, but paper play isn't very motivating.

$4 for the "cause"...

Entry #247

Adding complexity to gain insight

The next round of script modifications will do the following...

1. Create a new function for the last run (the followers-of-followers level) that does the same thing as the main analysis function, but also writes the statistics and distribution tables to a CSV file. Once that works...

 

2. Nest the entire script inside of a larger function that can be called sequentially on different input csv files. This step is mandatory if I want to do limited back testing while on the hunt for patterns and connections in the data. Which leads us to...

 

3. Figure out how to feed in draws from the history list in succession so I can see the results for, say, a week at once. The generated CSV can be opened in Excel to make use of features like Solver and Power Query, plus the host of graphing tools available. Once this laundry list is complete, I need to record all of the findings to develop a model from which to get a pick.

 

I hope I get to play sometime in 2024...

Entry #246

The real target is "most of the time"

The problem with predictive statistics is that your analysis holds "most" of the time, but not always. If it always holds, it is usually too general; if it rarely holds, it is most likely overfit from too narrow a scope.

The current state of the follower script: I have noticed that most of the time, the next number is located within one standard deviation of the center of the distribution list (up or down, for a full range of 2 standard deviations). That does not eliminate the fringe cases where the most frequent or least frequent number is selected. Maybe "most" is the best we can hope for...
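That "within one standard deviation" check is easy to script (a sketch assuming the center is the mean and using the sample standard deviation from the statistics library):

```python
import statistics

def within_one_sd(values, x):
    """True when x falls within one standard deviation (up or down) of
    the mean of the distribution list, a 2-standard-deviation band."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean - sd <= x <= mean + sd
```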

Still looking for other clues in the statistics, but it did not take long to find a starting point.

I will eventually have to find a way to automate the analysis, as the slow manual process now involves poring over 16 separate data sets, but that is where the discoveries happen.

As the current reduction results in about 3 numbers per column, a 3x3 matrix of picks would cost $27 to play. If it guaranteed a hit, it would be a no-brainer to stop here, but that goes against the original design goals, so the studying continues...

Entry #245

The script was easy, now comes the hard part...

The script I have dreamed about for several years is finally complete, 100% functioning as intended.

Now comes the hard part... interpreting all of the statistics and making a pick. I spent several hours in a discussion with ChatGPT that involved no coding; it was purely about the meaning behind the descriptive statistics. As it happens, the tail end of each column of data produces a frequency distribution list that is the base calculation required for Markov probabilities! So there is no need to process further, as the most frequent number already has the highest Markov probability.

The downside... the next drawn number rarely comes from the most frequently drawn. It does give a clear indication of a pick to try, so I accomplished the goal of a one-shot forecast; it just does not win... yet.

I have noticed some anomalies by comparing the overall distribution of each step in processing...

1. The draw history exhibits a mostly uniform distribution.

2. The first pass that records the followers becomes a bit less uniform.

3. The third pass (the most frequent follower of the last follower) is hardly uniform at all. Not exactly a bell curve, but there is a heavier concentration of certain numbers in the top 1/3 of the distribution.

The next time I have a spare hour or three, I intend to remove a week of draw history and run the script as I add each result back in, recording ALL of the output and comparing it to see if there are any other indicators, not immediately obvious, that might help pinpoint the next drawing. I must be diligent in looking for patterns that MOSTLY hold true across draws to avoid manual overfitting. Until I find some of these yet-to-be-discovered universal truths, I will continue to paper play the pick 3 mid and eve games.

Once I am ready, I will use the following bet strategy...

Step 1. $1 straight on mid and eve pick 3 until a win happens... then I will be playing on the state's money... total cost $14 per week.

Step 2. IF a win happens, then for one week play $5 straight on the pick 3 mid/eve AND the pick 5 mid/eve... that will cost $140, and if there is no win that week, back to step 1, only adding the pick 5 mid/eve for 2 weeks... total cost $56.

With one $1 straight pick 3 win in PA paying $500, that covers the total cost of step 2 ($196) plus the normal weekly $14 until the first hit happens (it would have to hit within 21 weeks to remain profitable).
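The break-even arithmetic above can be checked in a couple of lines:

```python
# Break-even check for the staged bet strategy: a $1 straight pick 3
# win in PA pays $500; step 2 costs $196; baseline play is $14/week.
payout = 500
step2_cost = 196
weekly_cost = 14

# Whole weeks of baseline play the remaining winnings would cover
breakeven_weeks = (payout - step2_cost) // weekly_cost
```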

I don't want to start yet, because I do not believe that the highest Markov probability has the chance of being the pick for the entire game. I treat all columns in the history file separately. I am not playing to get 1 in 1,000 or 1 in 100,000, but rather trying to pick 1 in 10 three or five times in a row, respectively.

More importantly, I learned a great deal about how to use and manipulate pandas DataFrames in Python. I followed software engineering best practices and ended up with a script that produces the desired output in only 87 lines of code. I gave my script a fitting name: follower_refinery.py.

My goal for 2024 is a straight win on the pick 5, and I will leave no stone unturned until that goal is reached. When I reach that goal I will upgrade my LP membership to Platinum.

So when I am still a standard member in 2025, you all will know I failed. But maybe this time will be different... I can't ALWAYS lose, right?

Entry #244

Outlining the proposed changes to the new follower script

For my own reference, I will be making the following additions...

1. Adding descriptive statistics in addition to the Mode that is already present

2. Adding a function that prints these statistics for the entire draw history before processing the followers so as to see the changes made at each step.

3. Removing the user input section to limit the script to running twice, as subsequent runs tend to produce too small a result set and usually eliminate the next number drawn.

These changes will be implemented one at a time and the program tested at each step to ensure that the code functions properly.

Also on the back burner is making a history file for PA Cash 5, because after doing some research I found the matrix change from 5/39 to 5/43 happened in February of 2008.

I do not anticipate much difficulty in implementing these changes, except for the time required, as free time will be in short supply for the next 8 weeks; my mobile development class is pretty intense, and work is always busy around the holidays.

Entry #243

Next steps with the follower script

The next phase of development will involve generating some descriptive statistics in addition to just the mode (most frequent), because that alone does not work.

Owing to the simplified framework and knowing how the data is handled internally, these additional statistics will add probably 5 to 6 lines of code, bringing the total script length to 72 lines. (A massive improvement over 2,200 lines, which required a ton of editing just to add 1 thing.)
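With pandas, mode plus the standard descriptive statistics really is only a few lines (the position columns p1/p2 here are hypothetical):

```python
import pandas as pd

# Hypothetical draw-history frame: one column per digit position.
df = pd.DataFrame({"p1": [1, 1, 2, 9], "p2": [0, 5, 5, 5]})

modes = df.mode().iloc[0]  # most frequent value per position
stats = df.describe()      # count, mean, std, min, quartiles, max
```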

Data can always be removed from the history file to run tests when you know the next draw result but don't want the code to know...

It is good to have a direction, even if it does not lead to the desired destination, there is much to be learned along the way.

Entry #242