hypersoniq's Blog

hypersoniq's Blog has 673 entries and has been viewed 442,557 times.
Lottery Post members have made 559 comments in hypersoniq's Blog.
hypersoniq is a Standard member.

November 9, 2023
8:37 am

If machine learning were easy, everyone would be using it...

This project is a great deal more difficult than I imagined it would be... which is a good thing I suppose.

The first run of the program took 5 minutes, but I misconfigured something and it only printed out a percentage, not an actual set of picks. That is what I am working on now.

To contrast, my follower code ran in seconds.

Everything is heavily commented as I learn what each line does, since the starter code was "Frankensteined" from several tutorials. This is where ChatGPT will come in handy, helping me understand the parameters and hyperparameters that make up the code.

On the bright side, it only took 2 hours to go from a blank script to a script that ran without errors to the end.

Once the semantic error (outputting the wrong data) is resolved, I think I have found the best strategy: hold back the last 7 days of results for the final test. That way, parameter tuning is judged against results the model has never seen, rather than only against the training-phase results.
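A minimal sketch of that hold-back in Python, assuming the draw rows are already loaded oldest-first (one row per draw day). `holdout_split` is a hypothetical helper, not part of any library; the point is that the last 7 rows are cut off before any training or tuning touches the data.

```python
def holdout_split(rows, holdout_days=7):
    """Split chronologically ordered rows into (model_data, final_test).

    The last `holdout_days` rows are never seen during training or tuning.
    """
    if holdout_days <= 0 or holdout_days >= len(rows):
        raise ValueError("holdout_days must be between 1 and len(rows) - 1")
    return rows[:-holdout_days], rows[-holdout_days:]

# Example with 30 fake daily rows, oldest first
rows = list(range(30))
model_data, final_test = holdout_split(rows)
print(len(model_data), final_test)  # 23 [23, 24, 25, 26, 27, 28, 29]
```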

I ended up with a 3-layer LSTM design, as the 2-layer version was having problems with the randomness of the data.

The next run includes some statistics about the picks during the testing phase. I'm excited to see them, because I left them out of the initial script.

Based on the imports and configuration, this actually scratches the surface of deep learning. If that can be done in 5 minutes on a quad-core i7 with 16GB of RAM and a 960M graphics card with 700 CUDA cores, how fast would it run on the laptop I am buying next year? That one has a 24-core i9 with 64GB of RAM and an Nvidia 4090 with 7,000 CUDA cores!

The goal is the PA pick3 and pick5, but the program should be flexible enough to run the pick 4 as well in the future.

I am probably learning more now than I will in my upcoming machine learning class, so this is already worth it. I should ace that class!

Also reading a new book called "Python for Data Science" that seems to be explaining much of what tutorials assume you already know.

I'll stop learning when they put me in the ground.


Entry #238

November 7, 2023
12:07 am

Progress has been made

Preparing the data is arguably the most important part of machine learning, and that part is done. I have freshly minted one-hot encoded draw data sheets with appropriate column headers for the PA pick 3 and pick 5 (mid and eve).

The spreadsheet was the right tool for preparing the data and performing the feature encoding, as it was as easy as dragging formulas down the sheet, then copying only the function results to the appropriate columns.

Since the Python coding will take a few days (optimistically), I am keeping the data in spreadsheets for updates and only exporting to CSV when ready to make the initial runs. I also have to split the data into 80% training data and 20% test data, but that can be done in Python so I only need the one sheet per game.
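The split itself is only a few lines if the rows stay in date order. This is a sketch with a hypothetical `chrono_split` helper; the key assumption is that rows are sorted oldest-first, because shuffling would leak future draws into training.

```python
def chrono_split(rows, train_frac=0.8):
    """Split time-ordered rows WITHOUT shuffling: oldest 80% train, newest 20% test."""
    if not 0 < train_frac < 1:
        raise ValueError("train_frac must be between 0 and 1")
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

train, test = chrono_split(list(range(100)))
print(len(train), len(test), test[0])  # 80 20 80
```

scikit-learn's `train_test_split` can produce the same kind of ordered split when called with `shuffle=False`.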

Excited to see what the final plan will be... going to start with the components in scikit-learn, only moving to PyTorch or TensorFlow if absolutely necessary.


Entry #237

November 3, 2023
3:54 pm

Prep for coding

First thing is to come up with a naming convention for the CSV column headers for the draw results and the one-hot encoded features.
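One possible naming convention, sketched in Python. The `p1`, `p1_d0` through `p1_d9`, `p1_E`, `p1_L` names and the leading `date` column are hypothetical choices, not the scheme actually used in the draw files.

```python
def make_headers(positions, digits=range(10)):
    """Build CSV column headers for one game (positions=3 for pick 3).

    Hypothetical convention: 'p1' holds the raw digit, 'p1_d0'..'p1_d9' the
    one-hot columns, and 'p1_E' / 'p1_L' the even and low flags.
    """
    headers = ["date"]
    for p in range(1, positions + 1):
        headers.append(f"p{p}")
        headers.extend(f"p{p}_d{d}" for d in digits)
        headers.extend([f"p{p}_E", f"p{p}_L"])
    return headers

h = make_headers(3)
print(len(h))  # 40 = 1 date column + 3 positions * 13 columns
print(h[:4])   # ['date', 'p1', 'p1_d0', 'p1_d1']
```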

Next is to update all of the draw files, adding the features (that will take a few hours).

Then, structuring the code as one atomic function that generates the week-long pick list for each column of draw data.
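A sketch of what that one atomic function could look like, under a few assumptions: one draw per calendar day, a column stored as a list of digits (oldest first), and a pluggable `pick_fn` standing in for the real model. Feeding each pick back in as history for the next day (a recursive multi-step forecast) is one way to get 7 picks out of a one-step model; a model that outputs all 7 at once would skip that loop. `week_of_picks` and `mode_pick` are hypothetical names.

```python
from collections import Counter
from datetime import date, timedelta

def week_of_picks(column, last_date, pick_fn, days=7):
    """Return [(draw_date, pick), ...] for the `days` draws after last_date.

    `column` is one position's history, oldest first. `pick_fn` maps a
    history list to one digit, so the same wrapper serves every column.
    Each pick is appended to a working copy of the history before the
    next pick is made (a recursive multi-step forecast).
    """
    history = list(column)
    picks = []
    for step in range(1, days + 1):
        pick = pick_fn(history)
        picks.append((last_date + timedelta(days=step), pick))
        history.append(pick)
    return picks

def mode_pick(history):
    # Placeholder model: the most common digit so far.
    return Counter(history).most_common(1)[0][0]

week = week_of_picks([1, 2, 2, 3], date(2023, 11, 9), mode_pick)
print(week[0], week[-1])
```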

Then the hard part: incorporating graphs and statistics. I am going to learn from this attempt by recording picks vs. results, graphing and counting hits in the 20% validation data before the forecast executes.
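Counting hits is simple once picks and results line up draw by draw. A minimal sketch with a hypothetical `count_hits` helper, assuming picks and results are lists of digits for one position:

```python
def count_hits(picks, results):
    """Compare picks with actual results, draw by draw.

    Returns (hits, draws, hit_rate, marks); `marks` is a True/False list
    that can be graphed later to show where the system hits and misses.
    """
    if len(picks) != len(results):
        raise ValueError("picks and results must be the same length")
    marks = [p == r for p, r in zip(picks, results)]
    hits = sum(marks)
    rate = hits / len(marks) if marks else 0.0
    return hits, len(marks), rate, marks

hits, draws, rate, marks = count_hits([3, 7, 1, 0], [3, 2, 1, 9])
print(hits, draws, rate)  # 2 4 0.5
```

For one position of a fair 0-9 draw, a blind guess hits about 10% of the time, which makes a handy baseline for the hit rate.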

The next week or two will be busy.


Entry #236

November 1, 2023
8:25 am

Grasping the theory, next step is coding.

The use of machine learning is a deep rabbit hole, to be sure. Estimation functions, activation functions, error functions, gradient descent, weights... then there are "hyperparameters" that can affect performance: layering, hidden layers, noise, bias... stateful or stateless LSTMs. The learning will continue throughout the coding of the first iteration.

The desired output will be a time series: one pick in each position for 7 days, since we know the dates. Why not just the next day? Because we are working with sequence data, trying to find sequence patterns. This will run a great deal longer than any program I have written before, hours rather than minutes (the 2nd layer).
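Sequence models are usually trained on sliding windows: a run of past draws as the input, and the draw(s) that came next as the target. A sketch in plain Python; the `make_windows` name and the window size are illustrative assumptions.

```python
def make_windows(sequence, window, horizon=1):
    """Turn one position's draw history into (input, target) pairs.

    Each input is `window` consecutive draws; the target is the next
    `horizon` draws. This is the usual shape of training data for a
    sequence model such as an LSTM.
    """
    X, y = [], []
    for start in range(len(sequence) - window - horizon + 1):
        X.append(sequence[start:start + window])
        y.append(sequence[start + window:start + window + horizon])
    return X, y

X, y = make_windows([4, 1, 7, 7, 0, 2], window=3)
print(X)  # [[4, 1, 7], [1, 7, 7], [7, 7, 0]]
print(y)  # [[7], [0], [2]]
```

Setting `horizon=7` gives 7-draw targets, matching the week-long output.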

Still one pick per draw. It can be modified to output a sequence of variable length; if a week-long horizon does not produce a positive result, it could be extended to a month. It can be run in several configurations at the same time.

Going to build the script to be modular, so it can handle any daily game, from pick 2 to pick 5. IF there are positive results, then it can be adapted to the red ball in PB or MM, but that is not an initial goal.

Generating a graph of the test phase IS a goal. 80% of the draw data will be used for training and the remaining 20% for testing, with the final output being a series of 7 time increments into the future.

A graph of where the system picks right and where it misses will definitely help to set realistic expectations and possible error correction factors. Also going to keep a spreadsheet of the parameters to record changes made.

For all of the years I worked on lottery systems I do not believe I was asking the right questions... this could change all of that. Or not. The work will not be wasted, as machine learning has many applications, but wouldn't it be cool if there was some success!


Entry #235

October 24, 2023
12:36 pm

Time for the next approach... Machine Learning!

The current deep dive has led to some hypotheses I would like to experiment with.

The lottery is definitely laid out like a time series problem: a list of dates with data for each. The problem with time series methods is that they rely on moving averages and never give a clean prediction... enter classifiers, which look at the numbers as categories without math properties!

I am still weeks away from usable code, but here is the plan: use time series analysis for classifiers with a 2-layer LSTM (long short-term memory), splitting the data 80% training, 20% validation to construct the model, then forecast for 7 days (since the run time will be in hours, not seconds).

This time even more modularity. Last system (followers) had 18 modules, one for each position of the pick 3/4/5 mid and eve... this one will have but one!! Every change to the old system required doctoring 18 instances of the function code.
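A sketch of the "one module instead of 18" idea: a single loop over every game, draw time and position, calling one pick function. `load_column`, `fake_loader` and `mode_pick` are hypothetical stand-ins for reading a CSV column and for the real model.

```python
from collections import Counter

GAMES = {"pick3": 3, "pick4": 4, "pick5": 5}  # game -> number of positions
DRAWS = ("mid", "eve")

def all_columns():
    """Yield (game, draw, position) for every column the system covers."""
    for game, positions in GAMES.items():
        for draw in DRAWS:
            for position in range(1, positions + 1):
                yield game, draw, position

def run_all(load_column, pick_fn):
    """Apply ONE pick function to every column instead of one copy per column."""
    return {col: pick_fn(load_column(*col)) for col in all_columns()}

def fake_loader(game, draw, position):
    # Stand-in for reading a CSV column; returns a tiny fake history.
    return [position, position, 9]

def mode_pick(history):
    # Placeholder model: the most common digit in the history.
    return Counter(history).most_common(1)[0][0]

picks = run_all(fake_loader, mode_pick)
print(picks[("pick5", "eve", 5)])  # 5
```

A change to the model then means editing `pick_fn` once rather than every copy.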

This begins the introduction of "features" to the data, namely odd/even, splitting the numbers into 0,2,4,6,8 and 1,3,5,7,9, and on top of that high/low, splitting the numbers into 0,1,2,3,4 and 5,6,7,8,9.

For example, for the number one:

Value = 1, E=0, L=1

E is a binary value, 1 = even, 0 = odd

L is a binary value, 1 = low group, 0 = high group

These will be included in the evaluation by a method called "one-hot encoding", which represents categorical values as binary columns to help find patterns.
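The E and L flags from the example above, plus the one-hot columns for the digit itself, can be produced by one small function. A sketch, assuming low means 0-4 as described; `encode_digit` is a hypothetical name.

```python
def encode_digit(value):
    """Return 10 one-hot digit columns followed by the E and L flags."""
    if not 0 <= value <= 9:
        raise ValueError("digit must be 0-9")
    one_hot = [1 if d == value else 0 for d in range(10)]
    e_flag = 1 if value % 2 == 0 else 0  # 1 = even, 0 = odd
    l_flag = 1 if value <= 4 else 0      # 1 = low (0-4), 0 = high (5-9)
    return one_hot + [e_flag, l_flag]

print(encode_digit(1))
# [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]  (ends with E=0, L=1)
```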

A 2-layer LSTM has a better shot at finding difficult patterns, while a 3-layer LSTM is overkill and may induce bias and overfitting. For reference, a 1-layer LSTM does not do as well with complex patterns in a sequence.

This is my actual first step in machine learning, months before I get to take a college class in it, so the time will not be wasted regardless of the results. Found a win-win!

Along with this, I will be learning seaborn and matplotlib to graph the data.

All of this came from a starter chat with ChatGPT about pattern matching and the difference between prediction and forecasting.

Next big thing? Doubtful but always optimistic until the first live test results. A positive step in coding? Absolutely!


Entry #234

October 4, 2023
6:24 am

This system just does not work... so what next?

Gave it 150%, learned a great deal about Python along the way, but the wins are just not there.

Still my go-to for a Powerball ticket until something more functional comes along, but I have tested enough to throw in the towel on this one.

I do have a viable framework for future attempts, and I will be doing a deep dive on time series analysis for strings, yes, that means encoding the numbers as letters.

In the meantime (deep dives can take months), I am going to develop a fun system for the sole purpose of back testing. It is something I could do easily in Excel, but the goal is learning.

More details later. It will be a workout-type system where you won't need the draw history to play; the point is processing the workout in Python and checking for hits, both straight and boxed. It might take a bit to work out the code, but it should fit into the existing framework. It only applies to pick 3, 4 and 5, not to the jackpot-style games drawn without replacement.
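The straight/boxed check is the core of that back test. A minimal sketch, assuming picks and draws are digit strings of equal length; `check_hit` is a hypothetical helper. A box match is just the same multiset of digits, which `collections.Counter` compares directly.

```python
from collections import Counter

def check_hit(pick, drawn):
    """Return 'straight', 'box', or None for pick 3/4/5 style games.

    Straight = same digits in the same order; box = same digits in any order.
    """
    if len(pick) != len(drawn):
        raise ValueError("pick and draw must have the same length")
    if pick == drawn:
        return "straight"
    if Counter(pick) == Counter(drawn):
        return "box"
    return None

print(check_hit("123", "123"))  # straight
print(check_hit("123", "312"))  # box
print(check_hit("112", "121"))  # box
print(check_hit("123", "124"))  # None
```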

Rather than pack it up again, this is a prime opportunity to hone development skills in Python and Excel, as the output will be written to an Excel workbook (not just a CSV).


Entry #233

September 29, 2023
11:35 am

Tightening the follower skips

Perhaps the skips 1, 2, 3, 7, 14, 21, 28 should be changed to 1, 2, 3, 4, 5, 6, 7.

Much like Excel, it is easy in Python to make a copy to experiment with changes.

The quick alteration is as simple as changing 144 lines of code by running find/replace 8 times.

Will it make a difference? Who knows...

As for the future, apparently there exist libraries that can apply time series analysis to string data. That is interesting!

I had mentioned that the numbers drawn have no inherent mathematical properties, and a system that can use letters in place of the numbers might have a better shot. Here, encodings such as even/odd and high/low become features of each ball. You can't just plug raw draw data in and expect machine learning algorithms to spit out winners, or even "learn" anything.

If the numbers 0-9 are replaced with a-j and Odd/Even is encoded as w,x and high/low encoded as y,z... then you have data with features that can be analyzed. With the date as the row key, perhaps some of the classification algorithms or even deep learning can be applied.
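A sketch of that letter encoding. The mapping order is an assumption taken from the order listed (w = odd, x = even, y = high, z = low), and `encode_as_letters` is a hypothetical name.

```python
DIGIT_LETTERS = "abcdefghij"  # 0 -> a, 1 -> b, ..., 9 -> j

def encode_as_letters(value):
    """Encode one drawn digit as a 3-letter token: digit, odd/even, high/low.

    Assumed mapping: w = odd, x = even, y = high (5-9), z = low (0-4).
    """
    digit = DIGIT_LETTERS[value]
    parity = "x" if value % 2 == 0 else "w"
    size = "z" if value <= 4 else "y"
    return digit + parity + size

print([encode_as_letters(d) for d in (0, 7)])  # ['axz', 'hwy']
```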

I am far from this point yet, but it helps to visualize certain aspects along the way.

The quick grouping of skips will be run as a test, and IF it does better, all of the variable names can be changed appropriately. I will generate a pick from both versions of the script and run them head to head for the evening draw of pick3/pick4/pick5 and Match 6. I already made picks and bought tickets for MM and PB; a few more wouldn't hurt!


Entry #232

September 26, 2023
9:28 am

Might not be "THE" system, but it establishes a framework...

The only thing that won yesterday was $2 on the match 6, and that was with the help of the 2 QP lines.

I am 6 classes away (one class every 2 months, about 1 year and 7 weeks) from an online Bachelor's Degree in Computer Science. My last 2 classes will be Data Mining/Machine Learning, then Artificial Intelligence.

Right now this project is producing mostly dismal results where I thought it might do better, but the work is not wasted. Instead this project is more of a framework for what may come.

1. It is modular, the current follower modules can be swapped for eventual machine learning algorithms using the same overall script.

2. Picks are atomic, meaning the first position of pick 3 is processed independently of the other positions.

3. The framework can read and process the .csv history files passed as function call parameters, which will serve any future machine learning initiatives well; no need to reinvent the wheel.

All of the current initiatives will serve future development as well. This is the list of what I am slowly assembling...

1. A back test system to make recursive picks and feed forward the draw history.

2. The RSS feed parser for updating draw history files with one click.

3. Visualization with the matplotlib and seaborn libraries.

4. Rolling ALL of it into a desktop application, and eventually an Android app so I don't have to fire up a PC or Laptop to update and get picks.
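Item 1, the back test that feeds history forward, can be sketched as a walk-forward loop where each pick is made using only the draws before it. `walk_forward` and the placeholder `mode_pick` are hypothetical; any pick function with the same signature could be dropped in.

```python
from collections import Counter

def walk_forward(history, pick_fn, start):
    """Back test by replaying history one draw at a time.

    For each draw from index `start` on, the pick is made from the draws
    before it only, then compared with what was actually drawn.
    Returns (hits, draws_tested).
    """
    if not 1 <= start < len(history):
        raise ValueError("start must leave at least one past draw and one test draw")
    hits = 0
    for i in range(start, len(history)):
        pick = pick_fn(history[:i])  # no peeking at history[i] or later
        if pick == history[i]:
            hits += 1
    return hits, len(history) - start

def mode_pick(past):
    # Placeholder pick function: most common digit so far.
    return Counter(past).most_common(1)[0][0]

history = [3, 3, 5, 3, 1, 3, 9, 3]
print(walk_forward(history, mode_pick, start=2))  # (3, 6)
```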

If I can accomplish these goals by next October 27th, I will be ready to leverage AI into a system on my own terms... not really a fan of any AI lottery offerings.

There are no guarantees on lotteries but I can guarantee that I will get better at python coding along the path AND not get left behind with the obvious impact AI is having on computing in general.

It's good to have goals, right?


Entry #231

September 25, 2023
11:20 am

Moving along with playing the system.

After a disappointing weekend of paper testing the pick 3, time for some actual play. Shooting for pick3, pick5, Match6 and Powerball today.

Tweaked the ratio of recent draws to full history: 28% for pick 3/4/5 and 21% for jackpot games (up from 21% and 14%, respectively). The old percentages did not provide enough data, and the weighted picks just ended up mirroring the full pick. Too much and they will mirror the recents. I think I have found balance at these settings; it even threw out a few unique numbers!

After the full run today, I will use the system the way it was intended, to produce single picks, and rotate between pick 3/4/5. For the jackpot games, I will stick with Match 6, and play the big games only when the lump sum value is >= $100 million. I intentionally left PA Cash 5 off the list in protest of the doubled ticket price, and Treasure Hunt does not seem worth the time to generate a history file. Not interested in Cash 4 Life either.

I did the work (all the coding, all of the research, all of the implementation details, maintaining history files daily), though I still need to do a pick 3 back test. Now it's time to roll it out live and hope for the best.

Also testing a theory and skipping the next few MADDOG jackpot game challenges to see if the picks do better.

We do not code systems assuming that the lottery draws are truly 100% random, we look for their flaws in the history...

My belief is that you do not need to be better at math than the lottery organizations, just better at statistics!


Entry #230

September 23, 2023
11:53 am

Reflecting on old Excel systems

For many years the pattern was...

1. Dream up a new system

2. Create the spreadsheet

3. Play until bored

4. Give up for a few years

Some of these systems got 1 hit, like the chain idea, which works like followers but with one difference. If a 6 was drawn, what followed it (easy)? BUT if a 5 was drawn before the 6, what followed 5 6? That chain system hit once on the pick 4 straight.
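The chain idea generalizes plain followers: find every place a short run of digits (the chain) appeared in one position, and count what came next. A sketch with a hypothetical `chain_followers` helper, assuming one position's history is a list of digits, oldest first.

```python
from collections import Counter

def chain_followers(sequence, chain):
    """Count what followed each occurrence of `chain` (e.g. [5, 6]) in the history."""
    n = len(chain)
    followers = Counter()
    for i in range(len(sequence) - n):
        if sequence[i:i + n] == chain:
            followers[sequence[i + n]] += 1
    return followers

seq = [5, 6, 2, 1, 5, 6, 2, 6, 4]
print(chain_followers(seq, [6]))     # Counter({2: 2, 4: 1})  plain followers of 6
print(chain_followers(seq, [5, 6]))  # Counter({2: 2})  followers of the chain 5 6
```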

The next was an exercise in error correction, where a number was picked and combined with the last draw via absolute subtraction to give a pick. For example, drawn 123 and "mask" 625 give abs(123 - 625) = 502. That system peaked with a box hit on the pick 5 with 2 pairs ($1,700).
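The mask step, sketched in Python. This assumes whole-number subtraction as in the abs(123 - 625) example, zero-padded back to the game's width. A digit-by-digit version is another possible reading; it gives the same 502 here but differs when borrowing is involved (600 and 598 give 002 as whole numbers, 198 digit by digit). `apply_mask` is a hypothetical name.

```python
def apply_mask(draw, mask, width=3):
    """Combine the last draw with a fixed mask by absolute subtraction.

    Both are treated as whole numbers; the result is zero-padded back to
    `width` digits so it can be played as a pick.
    """
    return str(abs(int(draw) - int(mask))).zfill(width)

print(apply_mask("123", "625"))  # 502
print(apply_mask("600", "598"))  # 002
```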

I had sheets for v-tracks, mirrors, +111, even/odd etc. Most were lost when the old laptop crashed 8 years ago. They never produced a notable hit and cost too much to play. It was at that point the goal always became a single "best guess".

I think this follower system is at the point where play will start, but boredom may set in. I enjoy the development part way more than the day-to-day play (excepting a win, of course), so I hope something promising comes along soon.


Entry #229

September 20, 2023
10:52 pm

Just figured out how to add weight to the follower data!

So, we have a list that captures ALL follower data for the last drawn number in each position from 7 different vantage points. Then I added the most recent followers that auto range based on their last number drawn. So here is how the weighting will work...

1. Keep the full combined list of follower data

2. ADD to it the combined recent data lists, thus adding weight to the recent followers.

Step 2 acts like a theta weight in linear regression: it makes recent followers a "feature", and the theta is what gets changed to fine-tune the overall data.

Maybe it works, maybe it does not. It is adding 36 lines of code to a 2,200+ line script and editing another 36 lines of code, so not that bad of a project. The bottom line is getting back to just one pick per game for the games it supports.

I have no idea how many recent picks are the right amount to add weight, perhaps a percentage of the follower count or the draw history length?
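One way to try the percentage idea, as a sketch: count the full follower list once and its newest slice a second time, then take the mode. The 28% default mirrors the pick 3/4/5 recent-draw ratio mentioned in a later entry, applied here to the follower list rather than the draw history; that mapping and the `weighted_pick` name are assumptions.

```python
from collections import Counter

def weighted_pick(followers, recent_ratio=0.28):
    """Mode of the full follower list with the most recent slice counted twice.

    `followers` is the full follower list for the last drawn digit, oldest
    first. The newest `recent_ratio` share is appended again, so recent
    followers carry extra weight in the final count.
    """
    if not followers:
        return None
    n_recent = max(1, round(len(followers) * recent_ratio))
    pool = followers + followers[-n_recent:]
    return Counter(pool).most_common(1)[0][0]

followers = [1, 1, 1, 1, 2, 3, 4, 4, 4]
print(Counter(followers).most_common(1)[0][0])  # 1 (full list only)
print(weighted_pick(followers))                 # 4 (recent 4s tip the balance)
```

Raising `recent_ratio` pushes the pick toward the recents; lowering it lets the full history dominate, which is the balance the ratio is meant to tune.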

It must be narrowed down to just one before back testing can be coded, so hopefully there is a glimmer of hope in this new modification.


Entry #228

September 17, 2023
10:07 pm

Coding is complete, now the big decision... which version?

The code that takes a pick from the entire 7 sets of follower history is complete.

Which leaves me with a choice...

1. Keep the short term lists with the most recent data or

2. Keep the long term lists with all of the data.

The biggest positive to using the full data set is that there is zero seasonality to lottery histories.

The biggest drawback is that the MOST frequent of something is still rarely what actually happens.

The biggest positive to the short term recent system is that it runs faster.

The biggest negative... it keeps failing tests.

Since this was originally coded for the dailies like pick 3, that may be the best testing ground... play both for a week on pick 3 and see which gets closest.

Test starts tomorrow...


Entry #227

September 13, 2023
10:01 am

Why followers? And an idea for the next script change

When I look at the many ways in the forums to pick lottery numbers, I try to keep in mind what we have available to us. With the draw history, we have the follower data at hand (except for the next draw, of course). The basic premise (using pick 3 as an example): if they drew a 6 in position 1 last night, I can look at what followed a 6 in position 1 all through the history. The result is a distribution that is about as uniform as raw draw data, but there is one number that has followed the MOST.

So what we are gathering is the MOST FREQUENT follower of the last number drawn.
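The core of that premise in a few lines of Python. A sketch, assuming one position's history is a list of digits, oldest first; `most_frequent_follower` is a hypothetical name, and ties go to whichever follower appeared first (that is how `Counter.most_common` orders equal counts).

```python
from collections import Counter

def most_frequent_follower(sequence):
    """Return the digit that most often followed the last drawn digit.

    `sequence` is one position's history, oldest first.
    """
    last = sequence[-1]
    followers = Counter(
        sequence[i + 1] for i in range(len(sequence) - 1) if sequence[i] == last
    )
    if not followers:
        return None  # the last digit has never been followed yet
    return followers.most_common(1)[0][0]

position1 = [6, 2, 6, 2, 6, 8, 3, 6]
print(most_frequent_follower(position1))  # 2
```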

Doing this manually takes too much time. Doing this with Excel is a clunky process as well, usually involving filtering and conditional formatting. Using a python script to read from a comma separated value file (csv) can perform the same task in about 1 second.

My other hypothesis is that using different follower scenarios together might add weight to the overall pick. So I use direct followers, aka skip1. Also skip2, skip3, skip7, skip14, skip21 and skip28. Why? Imagine a camera at a stop sign where there was an incident. One camera angle can tell a part of the story, but 7 cameras all at different angles paint a more accurate picture of what happened.

My initial code looks at only the most recent followers in each category, but that is where I probably went wrong... I am reading the last chapter of a long book...

The next iteration of the script will gather ALL follower data from each skip scenario, roll it into one huge list, and make its pick using the statistical MODE (as it does now with partial lists) on ALL of the data, ALL at once!
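A sketch of the pooled version, assuming "skip k" means the digit drawn k draws after an earlier occurrence of the last drawn digit (my reading of the skip scenarios, not confirmed). Keeping the skips in one tuple also turns the 1-7 skip experiment into a one-argument change rather than a find/replace. `pooled_follower_pick` is a hypothetical name.

```python
from collections import Counter

SKIPS = (1, 2, 3, 7, 14, 21, 28)  # change in one place to test other skips

def pooled_follower_pick(sequence, skips=SKIPS):
    """Mode of ALL followers of the last digit, pooled across every skip.

    For skip k, a follower is the digit drawn k draws after an earlier
    occurrence of the digit that was drawn last.
    """
    last = sequence[-1]
    pool = Counter()
    for k in skips:
        for i in range(len(sequence) - k):
            if sequence[i] == last:
                pool[sequence[i + k]] += 1
    return pool.most_common(1)[0][0] if pool else None

seq = [1, 2, 1, 3, 1]
print(pooled_follower_pick(seq, skips=(1, 2)))        # 1
print(pooled_follower_pick(seq, skips=range(1, 8)))   # 1
```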

I realize that all numbers have the possibility of following, and that the MOST of something only appears SOME of the time, but I am at a loss for other ideas at the moment.

Although I enjoy writing scripts in Python, I know I am still a novice coder, so a change of that magnitude will take some time to get right. Time I am willing to invest. This behemoth of a script is already 2,200 lines long, and that is with following software engineering best practices like modularity.

It may not be a breakthrough in cranking out straight hits, but it will be the cheapest "system" to play, as it only draws one pick. Based on my limited knowledge of both lottery systems and coding, and my biases about lottery data (the pick 3 is 3 sequential games of 1 in 10; position A has no influence on positions B or C), this represents my best guess.

After this update, I will start making improvements on the visuals, maybe a GUI, and I am already working on parsing the PA lottery RSS feed (since they don't provide an API) to make updating the draw history files easier and faster. I can update ALL draw files and generate picks in about 10 minutes. I might consider adding PACash5, Treasure Hunt, Pick2 and PACash4Life, but only if the RSS feed parser script works out.

Anyone else experimenting with Python or another programming language for the purpose of lottery picks?


Entry #226

September 10, 2023
2:13 pm

Test # 2, The Jackpot Games (PB & MM)

The next round of testing will begin Monday (9/11) with the Powerball and end on Saturday (9/16), also with the Powerball.

There are actually 3 games involved: the Mega Millions, the Powerball, and PA's Powerball Double Play (a separate game with a separate draw history and a different prize payout... no multiplier, and the top prize is always $10M).

Since you MUST enter a draw into the main PB game, there will be 2 picks for each Powerball draw. After adding the multiplier and the Double Play entry fee, both picks are eligible for prizes in both games. For the purpose of this test, results will be separated: the first line only counts for the main game, and the second only counts for the Double Play. Any coincidental payouts won't count.

Two Powerball tickets loaded with options cost $8.

The Mega Millions with the Megaplier costs $3.

Total System Test Cost = $30 across 3 PB and 2 MM draws. Test duration = 6 days. Pick Method = My Super Seven Python follower script described in detail in earlier blog posts.
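The cost arithmetic, spelled out: 3 Powerball draws at $8 for the two optioned tickets, plus 2 Mega Millions draws at $3.

```python
pb_draws, pb_cost_per_draw = 3, 8  # Mon 9/11, Wed 9/13, Sat 9/16
mm_draws, mm_cost_per_draw = 2, 3  # Tue 9/12, Fri 9/15

total = pb_draws * pb_cost_per_draw + mm_draws * mm_cost_per_draw
print(total)  # 30
```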

The Goal = A hit or combination of hits that exceed the test cost of $30.

The expectation... it is all coincidental anyway, the realistic expectation is that I donate $30 to "Older Pennsylvanians"

Unlike the last test, this only has to work once!


Entry #225

September 9, 2023
11:07 pm

Follower category weights

The script I wrote generates 2 distinct categories of data.

1. Full count of follower data over the entire game history, presented as a ranked count from most occurrences to least.

2. The most recent lists, whose length varies based on the last draw (if a 5 came up in that position last, the list length will be 5).

I am wondering if maybe there is a way to weight the recent list numbers with the full list count results.

The data for the short lists is combined to get the pick. I could also print the individual scenario picks (skip 1, skip 2, skip 3, skip 7, skip 14, skip 21 and skip 28), but to what end?

I have also thought about adding 2 new perspectives... skip 182 and skip 364. Not sure if they will help.

It all comes down to the pick, I am sure I could do more, but what?


Entry #224