After processing and analyzing a 2-week back test of the follower script (I'm still working on automating the back test; this one was painfully manual), I have concluded that the follower with the highest Markov probability is still the best pick, but its consistency is all over the place. I saw a few 2-of-3 positional hits on the pick 3 and two 3-of-5 positional hits on the pick 5, but no straight hits yet (even in paper play).
So that leaves me with 2 good (but not great) sources for picks...
1. The mode of the followers.
2. The mode of the followers of the followers.
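Here is a minimal sketch of the two pick sources for a single position. It rests on two assumptions. First, a "follower" is the value drawn immediately after each earlier occurrence of the most recent value. Second, "followers of the followers" means running the same step again on the follower list itself. The function names and the sample history are hypothetical; this is not the actual script.

```python
from collections import Counter


def followers(seq):
    """Return every value that immediately followed an earlier occurrence of seq[-1]."""
    if len(seq) < 2:
        return []
    last = seq[-1]
    # Stop before the final element: it has no follower yet.
    return [seq[i + 1] for i in range(len(seq) - 1) if seq[i] == last]


def mode(values):
    """Most frequent value; on a tie, Counter.most_common returns the one seen first."""
    return Counter(values).most_common(1)[0][0] if values else None


def two_picks(history):
    """Option 1: mode of the followers. Option 2: mode of the followers of the followers."""
    level1 = followers(history)
    level2 = followers(level1)  # the same step, applied to the follower list
    return mode(level1), mode(level2)


# Hypothetical digit history for one position, oldest draw first.
position_history = [3, 5, 0, 3, 2, 3, 5, 8, 3, 2, 6, 3, 5, 3, 2, 1, 3, 5, 3]
print(two_picks(position_history))  # → (5, 2)
```

In this made-up history, 3 was followed by 5, 2, 5, 2, 5, 2, 5. That makes 5 the option 1 pick. Inside that follower list, the last follower (5) was always followed by 2, so 2 is the option 2 pick.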
Although option 2 produced hits most often over 14 draws of each game, option 1 still had a fair share of positional hits, so it seems like I should launch the betting strategy with a little competition between the two options.
That is good news, because the mode is definitive and requires no further analysis. But, as I suspected when I started out, any hits are coincidental at best; the mode simply DOES represent my BEST guess.
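Scoring the competition comes down to counting positional matches. The sketch below is minimal and assumes that every pick and every draw is a tuple of digits in position order. `positional_hits` and `tally` are hypothetical helpers, and the sample draws are made up.

```python
def positional_hits(pick, draw):
    """Count positions where the picked digit equals the drawn digit."""
    return sum(p == d for p, d in zip(pick, draw))


def tally(picks_by_option, draws):
    """Total positional hits per option across a series of draws.

    picks_by_option maps an option name to a list of picks, one per draw.
    """
    return {
        name: sum(positional_hits(pick, draw) for pick, draw in zip(picks, draws))
        for name, picks in picks_by_option.items()
    }


# Hypothetical pick 3 results for three draws.
draws = [(4, 1, 7), (0, 9, 2), (5, 5, 3)]
picks = {
    "option 1": [(4, 2, 7), (1, 9, 8), (6, 0, 0)],
    "option 2": [(4, 1, 0), (0, 9, 1), (5, 0, 3)],
}
print(tally(picks, draws))  # → {'option 1': 3, 'option 2': 6}
```

A straight hit is a pick whose `positional_hits` equals the number of positions. The same helper therefore covers both "2 of 3 in position" and a full match.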
I did convert the script into one big function that can be called on any of my history files, but I am still stuck on how to iterate over the draw history to do a proper back test. I am considering splitting the history file 80%/20% into training and testing sets; 80/20 is a common split in machine learning, so it might be a viable option.
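With draw history, the split should follow time order rather than being random, so the test draws always come after the training draws. The sketch below is minimal and assumes the history is a pandas DataFrame with one row per draw (oldest first) and one column per position. `pick_fn` is a hypothetical stand-in for the big pick function. Each test draw is scored using only the draws before it, a walk-forward back test.

```python
import pandas as pd


def chronological_split(history, train_frac=0.8):
    """Split draws in time order: the first train_frac for training, the rest for testing."""
    cut = int(len(history) * train_frac)
    return history.iloc[:cut], history.iloc[cut:]


def walk_forward(history, pick_fn, train_frac=0.8):
    """Score each test draw with a pick built only from the draws before it."""
    train, test = chronological_split(history, train_frac)
    hits = []
    for offset in range(len(test)):
        # Everything strictly before the draw being tested.
        past = history.iloc[: len(train) + offset]
        pick = pick_fn(past)
        actual = test.iloc[offset].tolist()
        hits.append(sum(p == a for p, a in zip(pick, actual)))
    return hits


def repeat_last(past):
    """Hypothetical placeholder pick function: just repeat the most recent draw."""
    return past.iloc[-1].tolist()


# Hypothetical pick 3 history, one column per position.
history = pd.DataFrame({"p1": [1, 4, 4, 0, 7, 4, 2, 4, 9, 4],
                        "p2": [3, 3, 8, 3, 3, 1, 3, 6, 3, 3],
                        "p3": [5, 2, 5, 5, 0, 5, 5, 5, 8, 1]})
print(walk_forward(history, repeat_last))  # → [0, 1]
```

A single fixed 80/20 split would build every test pick from the same training data. The walk-forward version instead rebuilds the pick before each test draw, which matches how the picks would be generated in real play.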
At least I can get started playing! It's been a while at the drawing board.
So what exactly did I make?
It is a program that generates a list of the followers of the last draw in each position (the numbers that came up in the next draw each time that number previously appeared in that position). Because it iterates over the positions, it works without modification on any game, from the pick 2 through the jackpot games. Using a carefully crafted recursive algorithm, I can also repeat the process on the follower list to get the most frequent follower of the last follower; recursing any further eliminates too many winning numbers. After statistical analysis, I determined that the most frequent number in the generated list is also the number with the highest Markov probability. That makes sense, because counting followers is the same as counting transitions out of the last drawn number. It does not guarantee that any position will be drawn to match that probability, though, since all 10 digits have non-zero probabilities. In summary, it is an elaborate pick generator that leans more toward forecasting than prediction, but it is still just a best guess.
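You can check the claim that the follower mode is the highest-probability Markov transition directly. Count every draw-to-next-draw transition in a position to get the empirical transition matrix. The row for the last drawn digit is then exactly the follower count, normalized. The sketch below is minimal and assumes a first-order Markov chain per position, with `states` defaulting to 10 for the digit games. The function names and the sample pick 3 history are hypothetical. Looping over the columns is what lets the same code run on any game.

```python
from collections import Counter

import numpy as np
import pandas as pd


def transition_matrix(column, states=10):
    """Empirical P(next value | current value) for one position, from consecutive draws."""
    counts = np.zeros((states, states))
    for current, nxt in zip(column[:-1], column[1:]):
        counts[current, nxt] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows for values with no recorded transitions stay all zeros instead of dividing by 0.
    return np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)


def best_follower_per_position(history):
    """For each column, return (Markov argmax, follower mode) for the last drawn value."""
    results = {}
    for name in history.columns:
        column = history[name].dropna().astype(int).tolist()
        last = column[-1]
        follower_counts = Counter(nxt for cur, nxt in zip(column[:-1], column[1:]) if cur == last)
        if not follower_counts:
            results[name] = (None, None)  # the last value never appeared earlier
            continue
        markov_pick = int(np.argmax(transition_matrix(column)[last]))
        # Break ties toward the smallest value, the same way np.argmax does.
        top = max(follower_counts.values())
        follower_mode = min(v for v, c in follower_counts.items() if c == top)
        results[name] = (markov_pick, follower_mode)
    return results


# Hypothetical pick 3 history, oldest draw first.
history = pd.DataFrame({"p1": [2, 5, 2, 5, 2, 7, 2],
                        "p2": [8, 3, 8, 3, 8, 1, 8],
                        "p3": [4, 4, 9, 4, 9, 6, 4]})
print(best_follower_per_position(history))  # → {'p1': (5, 5), 'p2': (3, 3), 'p3': (9, 9)}
```

Both numbers in each pair agree because they are computed from the same counts. The only place they could differ is a tie, which is why both sides here break ties the same way.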
What did I learn along the way?
How to convert CSV history files into pandas DataFrames, and how to handle DataFrames whose columns are of unequal lengths (the last pass).
How to work with NaN values and make sure they don't skew calculations. The highlight was using the pandas last_valid_index() method to make the last pass work correctly.
How to calculate and display statistics, using both the Python statistics library and the methods built into pandas.
How to nest compact, atomic functions inside larger functions so the code can be reused on data files with varying row and column counts.
How not to give up when the coding seemed like it was never going to work, and how to validate that the results were what was intended.
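A few lines are enough to show some of those lessons. If follower lists of different lengths are wrapped in Series and combined into one DataFrame, pandas pads the shorter columns with NaN. `Series.last_valid_index()` then finds the last real entry in each column. The sketch below is minimal and uses made-up follower lists; the column names are hypothetical.

```python
import statistics

import pandas as pd

# Hypothetical follower lists for a pick 3; each position has a different length.
followers = {"p1": [5, 2, 5, 5], "p2": [3, 8], "p3": [9, 4, 9]}

# Wrapping each list in a Series lets pandas align on the index and pad with NaN.
frame = pd.DataFrame({name: pd.Series(values) for name, values in followers.items()})

summary = {}
for name in frame.columns:
    column = frame[name]
    last_index = column.last_valid_index()  # label of the last non-NaN row
    last_value = int(column.loc[last_index])
    cleaned = column.dropna().astype(int).tolist()  # drop the NaN padding before counting
    # statistics.mode returns the first mode on a tie (Python 3.8+);
    # Series.mode returns every tied value, sorted.
    summary[name] = (last_value, statistics.mode(cleaned), column.mode().tolist())

print(summary)
```

Column `p2` shows why the two mode functions are worth comparing. Its followers 3 and 8 tie, so `statistics.mode` reports 3 while `Series.mode` reports both values.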
Even if it never wins, I got a great deal of knowledge out of the process. Thanks to the huge range of resources, from free coding tutorials to ChatGPT, this may be one of the best times to be alive if you want to learn how to write programs!
I do not always play, so my initial "week" may span more than 7 days, but it will consist of 7 draws. With the competition set up between the 2 picks, that first "week" will cost $56 ($8 per draw). If the pick 5 hits even once, I will upgrade to platinum; short of that signal, this will be a silent start.
Happy Coding!