hypersoniq's Blog

Time for a live test

Two emergent statistics are the front runners for the right guess, though not consistently. Sometimes it is the most frequent follower of followers, sometimes it is the midpoint of the distribution list. There are 8 other numbers in the list, but these 2 sources are most frequently correct... so maybe a live-fire test is in order...

Run a pick 3 mid and eve test and see which pick is closest. Still digging for clues in the output, but paper play isn't very motivating.

$4 for the "cause"...

Entry #247

Adding complexity to gain insight

The next round of script modifications will do the following...

1. Create a new function for the last run (the followers of followers level) that does the same thing as the main analysis function, but also writes the statistics and distribution tables to a CSV file. Once that works...


2. Nest the entire script inside a larger function that can be called sequentially on different input CSV files. This step is mandatory if I want to do limited backtesting while on the hunt for patterns and connections in the data. Which leads us to...


3. Figure out how to feed in draws from the history list in succession so I can see the results for a week at a time (sketched below). Here the generated CSV can be opened in Excel to make use of features like Solver and Power Query, plus the host of graphing tools available. Once this laundry list is complete, I need to record all of the findings to develop a model from which to get a pick.
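Here is a rough sketch of what items 2 and 3 could look like once the whole analysis lives in one function... the function body, file names, and the one-week window below are placeholders, not the actual script:

```python
import pandas as pd

# Stand-in for the real follower analysis (item 2 wraps the whole script
# in a callable like this). Here it just reports each column's mode.
def analyze(history: pd.DataFrame) -> dict:
    return {f"pick_{col}": history[col].mode().iloc[0] for col in history.columns}

history = pd.read_csv("pick3_history.csv")  # hypothetical file name

rows = []
for i in range(len(history) - 7, len(history)):  # item 3: replay the last week
    result = analyze(history.iloc[:i])           # only draws before draw i
    result["actual"] = history.iloc[i].tolist()  # what actually hit that day
    rows.append(result)

# Item 1's CSV output, ready for Excel (Solver, Power Query, charts).
pd.DataFrame(rows).to_csv("backtest_output.csv", index=False)
```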


I hope I get to play sometime in 2024...

Entry #246

The real target is "most of the time"

The problem with predictive statistics is that your analysis holds "most" of the time, but not always. If it holds always, it is usually too general, and if it holds rarely, it is most likely overfit from too narrow a scope.

The current state of the follower script is that I have noticed that most of the time, the next number is located within one standard deviation of the center of the distribution list (up or down, for a full range of 2 standard deviations). That does not eliminate the fringe cases where the most frequent or least frequent number is selected. Maybe "most" is the best we can hope for...
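That observation can be turned into a quick check. A minimal sketch, assuming the distribution list is the follower count per digit and "center" means the mean count (both are my readings of the idea, and the counts are made up):

```python
import pandas as pd

# Hypothetical follower counts for digits 0-9 (index = digit).
counts = pd.Series([12, 30, 25, 8, 19, 22, 15, 28, 10, 21])

center, spread = counts.mean(), counts.std()
band = counts[(counts >= center - spread) & (counts <= center + spread)]
print(sorted(band.index))  # digits whose counts fall within +/- 1 SD of center

next_digit = 4  # the actual next draw, for scoring the "most of the time" claim
print(next_digit in band.index)
```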

Still looking for other clues in the statistics, but it did not take long to find a starting point.

I will eventually have to find a way to automate the analysis, as the slow manual process now involves poring over 16 separate data sets, but that is where the discoveries happen.

As the current reduction results in about 3 numbers per column, a 3x3 matrix of picks (3 candidates in each of the 3 positions, or 3^3 = 27 straight combinations) would cost $27 to play. If it guaranteed a hit, it would be a no-brainer to stop here, but that goes against the original design goals, so the studying continues...
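The $27 is just the count of straight combinations the matrix covers... a quick sketch with made-up candidate digits:

```python
from itertools import product

# Three hypothetical candidate digits per position after the reduction.
pos1, pos2, pos3 = [1, 4, 7], [0, 2, 9], [3, 5, 8]

combos = list(product(pos1, pos2, pos3))
print(len(combos))   # 3 * 3 * 3 = 27 straight plays, $27 at $1 each
print(combos[:3])    # (1, 0, 3), (1, 0, 5), (1, 0, 8)
```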

Entry #245

The script was easy, now comes the hard part...

The script I have dreamed about for several years is finally complete, and 100% functioning as intended.

Now comes the hard part... interpreting all of the statistics and making a pick. I spent several hours in a discussion with ChatGPT that involved no coding; it was purely about the meaning behind the descriptive statistics. As it happens, the tail end of each column of data produces a frequency distribution list that is the base calculation required for Markov probabilities! So there is no need to process further, as the most frequent number would have the highest Markov probability.
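A minimal sketch of that equivalence on one digit column (file and column names are hypothetical): normalizing each digit's follower counts yields its Markov transition row, so ranking by count and ranking by probability pick the same digit.

```python
import pandas as pd

d1 = pd.read_csv("pick3_history.csv")["d1"]  # one position's draw history

# Tally followers: rows = digit just drawn, columns = digit that came next.
prev = d1.shift(1).dropna().astype(int)
counts = pd.crosstab(prev, d1[1:])

# Each row of counts, normalized, is that digit's Markov transition row.
transition = counts.div(counts.sum(axis=1), axis=0)

last = d1.iloc[-1]
print(counts.loc[last].idxmax())      # most frequent follower...
print(transition.loc[last].idxmax())  # ...is also the highest probability
```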

The downside... the next drawn number rarely comes from the most frequently drawn. It does give a clear indication of a pick to try, so I accomplished the goal of a one-shot forecast; it just does not win... yet.

I have noticed some anomalies by comparing the overall distribution of each step in processing...

1. The draw history exhibits a mostly uniform distribution.

2. The first pass that records the followers becomes a bit less uniform.

3. The third pass (the most frequent follower of the last follower) is hardly uniform at all. Not exactly a bell curve, but there is a heavier concentration of certain numbers in the top 1/3 of the distribution.

The next time I have a spare hour or three, I intend to remove a week of draw history and run the script as I add each result back in, recording ALL of the output and comparing it to see if there are any other indicators, not immediately obvious, that might help pinpoint the next drawing. I must be diligent and look for patterns that MOSTLY hold true across draws to avoid manual overfitting. Until I find some of these yet-to-be-discovered universal truths, I will continue to paper play the pick 3 mid and eve games.

Once I am ready, I will use the following bet strategy...

Step 1. $1 straight on mid and eve pick 3 until a win happens... then I will be on the state's money... total cost $14 per week

Step 2. IF a win happens, then for one week play $5 straight on the pick 3 mid/eve AND the pick 5 mid/eve... that will cost $140, and if there is no win that week, back to step 1, only adding the pick 5 mid/eve for 2 weeks... total cost $56.

A $1 straight pick 3 win in PA pays $500. That win has to cover the total cost of step 2 ($196) plus the normal $14 per week spent until the first hit happens, so the first hit would have to come within 21 weeks to be profitable.
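The break-even math is simple enough to put in a script (amounts from the plan above):

```python
# Break-even check for the bet strategy above.
win = 500          # PA pick 3 payout for a $1 straight hit
step2_cost = 196   # $140 for the big week + $56 for the two follow-up weeks
weekly_cost = 14   # $1 straight, mid and eve, 7 days a week

weeks = (win - step2_cost) // weekly_cost
print(weeks)  # 21 -> the first hit must come within 21 weeks to stay profitable
```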

I don't want to start yet because I do not believe that the highest Markov probability alone has much chance of producing the pick for the entire game. I treat all columns in the history file separately. I am not playing to beat 1 in 1,000 or 1 in 100,000 odds; rather, I am trying to pick 1 in 10 correctly 3 or 5 times in a row, respectively.

More importantly, I learned a great deal about how to use and manipulate pandas DataFrames in Python. I followed software engineering best practices and ended up with a script that produces the desired output in only 87 lines of code. I gave my script a fitting name: follower_refinery.py

My goal for 2024 is a straight win on the pick 5, and I will leave no stone unturned until that goal is reached. When I reach that goal I will upgrade my LP membership to Platinum.

So if I am still a standard member in 2025, you will all know I failed. But maybe this time will be different... I can't ALWAYS lose, right?

Entry #244

Outlining the proposed changes to the new follower script

For my own reference, I will be making the following additions...

1. Adding descriptive statistics in addition to the Mode that is already present

2. Adding a function that prints these statistics for the entire draw history before processing the followers so as to see the changes made at each step.

3. Removing the user input section to limit the output to running twice, as subsequent runs tend to produce too small a result set and usually eliminate the next number drawn.

These changes will be implemented one at a time, and the program tested at each step to ensure that the code functions properly.
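For item 1, a minimal sketch of the statistics I have in mind, using pandas on a made-up follower list (the mode is what the script already reports):

```python
import pandas as pd

followers = pd.Series([3, 7, 7, 1, 9, 3, 7, 0, 5, 7])  # hypothetical data

stats = {
    "mode": followers.mode().tolist(),  # already present in the script
    "mean": followers.mean(),           # the new descriptive statistics...
    "median": followers.median(),
    "std": followers.std(),
    "min": followers.min(),
    "max": followers.max(),
}
print(pd.Series(stats))
```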

Also on the back burner is making a history file for PA Cash 5, because after doing some research I found the matrix change from 5/39 to 5/43 happened in February of 2008.

I do not anticipate much difficulty in implementing these changes except for the time required. Free time will be in short supply for the next 8 weeks, as my mobile development class is pretty intense and work is always busy around the holidays.

Entry #243

Next steps with the follower script

The next phase of development will involve generating some descriptive statistics in addition to just the mode (most frequent), because that alone does not work.

Owing to the simplified framework and knowing how the data is handled internally, these additional statistics will probably add 5 to 6 lines of code, bringing the total script length to 72 lines. (A massive improvement over 2,200 lines, which required a ton of editing just to add 1 thing.)

Data can always be removed from the history file to run tests when you know the next draw result but don't want the code to know...
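A minimal sketch of that holdout trick, assuming the history CSV is ordered oldest to newest (file names and holdout size are placeholders):

```python
import pandas as pd

history = pd.read_csv("pick3_history.csv")

holdout = history.tail(7)    # the draws the code is not allowed to "know"
blind = history.iloc[:-7]    # what the script actually gets to see

blind.to_csv("pick3_blind.csv", index=False)
# ...run the follower script on pick3_blind.csv, then score it against holdout.
```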

It is good to have a direction; even if it does not lead to the desired destination, there is much to be learned along the way.

Entry #242

On using ChatGPT to help generate starter code and squash bugs

I am finally near completion of my modified follower script. It is far less complex and does the exact same thing for any CSV history file, from pick 3 to Powerball... and it went from 2,200 lines of code to under 100!

Bottom line... do not mention the lottery... and prepare to Do Your Own Research (DYOR), because generative AI is not taking any software development jobs anytime soon!


It was a long and difficult process to get to the code I have now. The benefit is obvious, but here are the pitfalls...


1. Open a bunch of Notepad windows, because many times when it implements a requested change, it messes with the core functionality. Develop a quick and dirty versioning system, like typing V1, V2, etc. at the top of each text file of copied code... then read it! Make sure there are no obvious logic errors like endless loops inside of functions.

2. Type your comments and code samples into Notepad first, then paste them into the GPT comment box. After a few days of accumulated chat, it gets SLOW!


3. Be prepared to restate your goals over and over again. While it is amazing software, it is like working with Ten Second Tom from the movie 50 First Dates. (NOT an exaggeration!!!)

I have one more change to add on my own, then the refactoring of the follower code will be complete. It took the better part of 2 weeks.

I still have to figure out how to interpret the results and get a pick, but that is like every other system.

I used ChatGPT to learn about certain aspects of programming that I was not well versed in, like pandas DataFrames and the proper implementation of recursion in function design, but I had a clear vision of what I wanted the code to do and just needed a few tweaks to get there. This time I provided the starter code, and the tweaks and problem solutions were more evenly divided than just asking it for a program like it was some sort of genie...

It is a great asset, because literally no one else I know in real life is into programming, but it is not a clean-sheet code generator either; you need to have a clear vision of what you want.

If you wanted to learn how to write code from scratch, there is probably no better resource.

Entry #241

When programming Python, indents matter

The Python programming language is sensitive to indent levels. When you view the source code of an HTML document (like clicking "View Source" in your browser), the code is indented to aid readability, but the indentation is not required for functionality. Python, on the other hand, is sensitive to indentation; it can quite literally change the scope of a code block!

I am in the process of refactoring the follower script I created to remove unwanted code and streamline the functions (down to under 200 lines of code from the original behemoth that was over 2,200 lines, and it outputs the same data!). In the process I had an idea and quickly coded it, not paying attention to the indent level, so it landed in the wrong scope and sent the program into an endless loop. The fix was indenting by one level (4 spaces to be Pythonic, or one tab if you used the setup options in IDLE, Python's built-in integrated development environment [IDE], to make 1 tab = 4 spaces). This was a frustrating bug to squash, but my thick-headed persistence paid off again.
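A minimal illustration of that kind of bug (not my actual script, just the shape of the mistake):

```python
# Buggy version: the increment sits one indent level too shallow, outside
# the while block, so the condition never changes and the loop never ends.
def count_to(n):
    i = 0
    while i < n:
        print(i)
    i += 1  # wrong scope!

# Fixed version: indenting by one level puts the increment back inside
# the while block, and the loop terminates as intended.
def count_to_fixed(n):
    i = 0
    while i < n:
        print(i)
        i += 1
```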

Just sharing this for any other would-be Python coders out there who may find themselves staring at a screen full of code wondering why it is not functioning as intended.

This code reduction was possible by removing the extra skip-follower lines; they added no benefit and mostly just generated noise. I also removed the most-recent-list weighting, as lottery data is not seasonal (verified by time series analysis using SARIMAX).

Happy Coding!

Entry #240

Not done with followers just yet

Viewing from different "vantage points" yielded no particular benefit.

Creating a recent list for weighting was not statistically significant because the data does NOT exhibit seasonality.

Enter recursion... I never tried to reprocess the resultant list for a follower among followers...

Recursively reducing this follower list by running it through the same function might just be what I was missing... a better indicator of what is next than just the most frequent number in the follower list.

It is sometimes difficult to get a proper base case (when recursion should stop) without running experiments, so that is the focus for now, with parallel development on the time series classification angle.

Pick 3 and pick 5 are still the targets. And this time I am writing the functions to auto-detect the number of columns so I do not need a different function for each game type... a one-size-fits-all approach!
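A minimal sketch of both ideas together... the function names, depth limit, and stopping rule are my own stand-ins, not the actual code:

```python
import pandas as pd

def follower_list(column: pd.Series, digit: int) -> pd.Series:
    """All values that historically came right after `digit`."""
    return column[column.shift(1) == digit]

def refine(column: pd.Series, digit: int, depth: int = 2) -> pd.Series:
    followers = follower_list(column, digit)
    if depth == 0 or len(followers) < 3:  # base case: depth hit or list too thin
        return followers
    # Recurse on the most frequent follower... the next "level" down.
    return refine(column, followers.mode().iloc[0], depth - 1)

history = pd.read_csv("pick3_history.csv")  # assumed: digit columns only
for col in history.columns:                 # auto-detected column count means
    last = history[col].iloc[-1]            # the same loop runs pick 3, 4, or 5
    print(col, refine(history[col], last).mode().tolist())
```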

I walked away from the laptop for a few days to try and re-imagine the next steps, and this was my "a-ha" moment...

Ready to get back in the ring!

Entry #239

If machine learning were easy, everyone would be using it...

This project is a great deal more difficult than I imagined it would be... which is a good thing I suppose.

The first run of the program took 5 minutes, but I misconfigured something and it only printed out a percentage, not an actual set of picks. That is what I am working on now.

To contrast, my follower code ran in seconds.

Everything is highly commented as I learn what each line does, since the starter code was "Frankensteined" from several tutorials. Here is where ChatGPT will come in handy, helping to understand the parameters and hyperparameters that make up the code.

On the bright side, it only took 2 hours to go from a blank script to a script that ran without errors to the end.

I did develop what I think is the best strategy once the semantic error (output of the wrong data) is resolved, and that is to hold back 7 days of results for the final test. In this way, the parameter tuning will be judged against actual unseen results rather than just the training-phase results.

I ended up with a 3-layer LSTM design, as the 2-layer version was having problems with the randomness of the data.
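For reference, a minimal sketch of what a 3-layer stacked LSTM classifier can look like in Keras (the layer sizes, window length, and feature count are placeholders, not my actual configuration):

```python
import tensorflow as tf

window, n_features = 10, 3  # 10 past draws in, 3 values per draw (placeholders)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(window, n_features)),
    tf.keras.layers.LSTM(64, return_sequences=True),  # stacked layers must
    tf.keras.layers.LSTM(64, return_sequences=True),  # pass full sequences on
    tf.keras.layers.LSTM(64),                         # last layer emits one vector
    tf.keras.layers.Dense(10, activation="softmax"),  # one class per digit 0-9
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```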

The next run has some statistics about the picks during the testing phase, excited to see them because I left them out of the initial script.

Based on the imports and configuration, this actually scratches the surface of deep learning, and if that can be done in 5 minutes with a quad-core i7 with 16GB of RAM and a 960M graphics card with 700 CUDA cores, how fast would it run on the laptop I am buying next year? That one has a 24-core i9 with 64GB RAM and an Nvidia 4090 with 7,000 CUDA cores!

The goal is the PA pick3 and pick5, but the program should be flexible enough to run the pick 4 as well in the future.

I am probably learning more now than I will in the machine learning class I will be taking, so it is worth it already... I should ace that class!

Also reading a new book called "Python for Data Science" that seems to explain much of what tutorials assume you already know.

I'll stop learning when they put me in the ground.

Entry #238

Progress has been made

Preparing the data is arguably the most important part of machine learning, and that part is done. I have freshly minted one-hot encoded draw data sheets with appropriate column headers for the PA pick 3 and pick 5 (mid and eve).

The spreadsheet was the right tool for preparing the data and performing the feature encoding, as it was as easy as dragging formulas down the sheet, then copying only the function results to the appropriate columns.

Since the Python coding will take a few days (optimistically), I am keeping the data in spreadsheets for updates and only exporting to CSV when ready to make the initial runs. I also have to split the data into 80% training data and 20% test data, but that can be done in Python, so I only need the one sheet per game.
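The split itself is a one-liner in Python... a minimal sketch, kept in draw order since this is sequence data (file name is hypothetical):

```python
import pandas as pd

data = pd.read_csv("pick3_encoded.csv")

split = int(len(data) * 0.8)                        # no shuffling: order matters
train, test = data.iloc[:split], data.iloc[split:]  # oldest 80% / newest 20%
print(len(train), len(test))
```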

Excited to see what the final plan will be... going to start with the components in scikit-learn, only moving to PyTorch or TensorFlow if absolutely necessary.

Entry #237

Prep for coding

First thing is to come up with a naming convention for the csv column headers for the draw results and the one hot encoded features.

Next is to update all of the draw files, adding the features (that will take a few hours)

Then, constructing the code so that it is one atomic function that generates the week long pick list for each column of draw data.

Then the hard part: graphing and statistics incorporation. Going to learn from this attempt, recording picks vs. results, graphing and counting hits in the 20% validation data before the forecast executes.

The next week or two will be busy.

Entry #236

Grasping the theory, next step is coding.

The use of machine learning is a deep rabbit hole, to be sure. Estimation functions, activation functions, error functions, gradient descent, weights... then there are "hyperparameters" that can affect the performance: layering, hidden layers, noise, bias... stateful or stateless LSTMs. The learning will continue throughout the coding of the first iteration.

The desired output will be a time series: one pick in each position for 7 days, since we know the dates. Why not just for the next day? Because we are working with sequence data, trying to find sequence patterns. This will run a great deal longer than any program I have written before, hours rather than minutes (thanks to the 2nd layer).

Still one pick per draw. It can be modified to output a sequence of variable length; if the week length does not have a positive result, it could be extended to a month. It can be run in several configurations at the same time.

Going to build the script to be modular, so it can handle any daily game, from pick 2 to pick 5. IF there are positive results, then it can be adapted to the red ball in PB or MM, but that is not an initial goal.

Generating a graph of the test phase IS a goal. 80% of the draw data will be used for training, the remaining 20% for testing and the final output being a series of 7 time increments into the future.

A graph of where the system picks right and where it misses will definitely help to set realistic expectations and possible error correction factors. Also going to keep a spreadsheet of the parameters to record changes made.

For all of the years I worked on lottery systems I do not believe I was asking the right questions... this could change all of that. Or not. The work will not be wasted, as machine learning has many applications, but wouldn't it be cool if there was some success!

Entry #235

Time for the next approach... Machine Learning!

The current deep dive has led to some hypotheses I would like to experiment with.

The lottery is definitely laid out like a time series problem: a list of dates with data for each. The problem with time series analysis is that it uses moving averages and never gives a clean prediction... enter classifiers... look at the numbers as categories without math properties!

I am still weeks away from usable code, but here is the plan: use time series analysis for classifiers with an LSTM (long short-term memory) in 2 layers, splitting the data 80% training, 20% validation to construct the model, then forecasting 7 days out (since the run time will be in hours, not seconds).

This time even more modularity. Last system (followers) had 18 modules, one for each position of the pick 3/4/5 mid and eve... this one will have but one!! Every change to the old system required doctoring 18 instances of the function code.

This begins the introduction of "features" to the data: namely odd/even, splitting the numbers into 0,2,4,6,8 and 1,3,5,7,9, and added to that high/low, splitting the numbers into 0,1,2,3,4 and 5,6,7,8,9.

An example, using the digit 1:

Value = 1, E = 0, L = 1

E is a binary value: 1 = even, 0 = odd

L is a binary value: 1 = low group, 0 = high group

These will be included in the evaluation by a method called "one-hot encoding", which uses binary feature information to help find patterns.
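A minimal sketch of building those feature columns with pandas (the column names are my own, not the actual sheet headers):

```python
import pandas as pd

draws = pd.DataFrame({"d1": [1, 8, 4, 3, 0]})  # hypothetical draw digits

draws["d1_E"] = (draws["d1"] % 2 == 0).astype(int)  # 1 = even, 0 = odd
draws["d1_L"] = (draws["d1"] <= 4).astype(int)      # 1 = low (0-4), 0 = high (5-9)

# Full one-hot encoding of the digit itself: one 0/1 column per value seen.
onehot = pd.get_dummies(draws["d1"], prefix="d1").astype(int)
print(draws.join(onehot))
```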

A 2-layer LSTM has a better shot at finding difficult patterns, while a 3-layer LSTM is overkill and may induce bias and overfitting. For reference, a 1-layer LSTM does not do as well with complex patterns in a sequence.

This is my actual first step in machine learning, months before I get to take a college class in it, so the time will not be wasted regardless of the results. Found a win-win!

Along with this will come learning seaborn and matplotlib to graph the data.

All of this came from a starter chat with ChatGPT about pattern matching and the difference between prediction and forecasting.

Next big thing? Doubtful but always optimistic until the first live test results. A positive step in coding? Absolutely!

Entry #234

This system just does not work... so what next?

Gave it 150%, learned a great deal about Python along the way, but the wins are just not there.

Still my go-to for a Powerball ticket until something more functional comes along, but I have tested enough to throw in the towel on this one.

I do have a viable framework for future attempts, and I will be doing a deep dive on time series analysis for strings... yes, that means encoding the numbers as letters.

In the meantime (deep dives can take months), I am going to develop a fun system for the sole purpose of backtesting. It is something I could do easily with Excel, but the goal is learning.

More details later. It will be a workout-type system where you won't need the draw history to play, but it's about processing this workout in Python and checking for hits, both straight and boxed. It might take a bit to work out the code, but it should fit into the existing framework, only not applicable to the jackpot-type non-replacement games... strictly pick 3, 4, and 5.
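The hit checking part is straightforward... a minimal sketch of straight vs. boxed scoring for a pick 3 play:

```python
from collections import Counter

def check_hit(pick: tuple, draw: tuple) -> str:
    if pick == draw:                    # straight: exact order match
        return "straight"
    if Counter(pick) == Counter(draw):  # boxed: same digits, any order
        return "boxed"
    return "miss"

print(check_hit((1, 2, 3), (1, 2, 3)))  # straight
print(check_hit((1, 2, 3), (3, 1, 2)))  # boxed
print(check_hit((1, 2, 3), (4, 5, 6)))  # miss
```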

Rather than pack it up again, this is a prime opportunity to hone development skills in Python and Excel, as the output will be written to an Excel sheet (not just a CSV).

Entry #233