hypersoniq's Blog

The 4 pick N scripts I have are updated. Halfway there...

All 4 pick N scripts now include the latest changes, which are

1. Calculating HOT and COLD by the 1st and 3rd quartile rather than expectancy +/- 1 standard deviation.

2. Adding the number of rows since the last appearance of each digit in the distribution

3. Adding a function that runs the chi-square goodness of fit test and displays both the chi-square statistic and the p-value for the 150 draw sample.
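A rough sketch of what item 1 looks like, assuming the quartiles are taken over the ten digit frequencies of one column and that a digit is HOT above Q3 and COLD below Q1. The function name and the handling of values sitting exactly on a threshold are illustrative assumptions, not lifted from the actual scripts.

```python
from collections import Counter
from statistics import quantiles


def classify_digits(column):
    """Label each digit 0-9 as 'H', 'N' or 'C' for one column of draws."""
    counts = Counter(column)
    freqs = [counts.get(d, 0) for d in range(10)]
    # Q1 and Q3 of the ten frequencies (assumed definition of the thresholds).
    q1, _q2, q3 = quantiles(freqs, n=4, method="inclusive")
    labels = {}
    for digit, freq in enumerate(freqs):
        if freq > q3:
            labels[digit] = "H"
        elif freq < q1:
            labels[digit] = "C"
        else:
            labels[digit] = "N"
    return labels


# Made-up 150-row column: 0 is over-drawn, 1 is under-drawn.
sample = [0] * 30 + [1] * 5 + [2, 3, 4, 5, 6] * 14 + [7, 8, 9] * 15
print(classify_digits(sample))
```

Digits that land exactly on Q1 or Q3 are treated as neutral here; flipping those comparisons to >= and <= would widen the hot and cold groups.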

The 4 versions of the script are

1. The production version, which runs the last four 7-draw windows

2. The test version that steps through each of the last 7 draws

3. The development version, which is the sandbox for developing new features

And finally

4. The sliding window full history back test. This one outputs to a csv file and allows counting of some overall stats, such as how many all-neutral draws happened and in what percentage of all 7-draw windows at least 1 all-neutral draw happened.
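The back test in item 4 can be sketched roughly as follows. This assumes each window is scored against the 150 draws immediately before it and that "neutral" means a frequency between Q1 and Q3 inclusive. The function names and the two csv column names are hypothetical.

```python
import csv
import io
from collections import Counter
from statistics import quantiles


def neutral_digits(column):
    """Digits whose frequency sits between Q1 and Q3 (inclusive)."""
    counts = Counter(column)
    freqs = [counts.get(d, 0) for d in range(10)]
    q1, _q2, q3 = quantiles(freqs, n=4, method="inclusive")
    return {d for d in range(10) if q1 <= freqs[d] <= q3}


def back_test(draws, model_size=150, window_size=7):
    """Slide over the history; count all-neutral draws in each window."""
    n_cols = len(draws[0])
    rows = []
    for start in range(model_size, len(draws) - window_size + 1):
        model = draws[start - model_size:start]
        neutral = [neutral_digits([d[c] for d in model]) for c in range(n_cols)]
        window = draws[start:start + window_size]
        hits = sum(
            all(draw[c] in neutral[c] for c in range(n_cols)) for draw in window
        )
        rows.append({"window_start": start, "all_neutral_draws": hits})
    return rows


def write_csv(rows, handle):
    """Write one line per window to any open text handle."""
    writer = csv.DictWriter(handle, fieldnames=["window_start", "all_neutral_draws"])
    writer.writeheader()
    writer.writerows(rows)


def percent_windows_with_hit(rows):
    """Percent of windows that contained at least one all-neutral draw."""
    if not rows:
        return 0.0
    return 100.0 * sum(r["all_neutral_draws"] > 0 for r in rows) / len(rows)


# Tiny made-up history so the sketch runs quickly; real runs use 150 and 7.
draws = [(i % 10, (i * 3) % 10, (i * 7) % 10) for i in range(40)]
rows = back_test(draws, model_size=20, window_size=7)
buffer = io.StringIO()  # swap for open("backtest.csv", "w", newline="")
write_csv(rows, buffer)
print(percent_windows_with_hit(rows))
```

Keeping the per-window rows in the csv means other overall stats can be counted later without re-running the whole history.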

There is no window step through version for the Cash 4 Life, just production, back test and development. So in that light I am better than half way done.

And I have a legit use for the scipy library!

Good coding day!

Entry #486

Budget always in mind when creating "systems"

One of the main reasons I have always tried to develop a straight shot system is of course the cost. One ticket is cheaper than a group of tickets. However, one ticket is a pain to calculate every day. Hence my move to a 7 day "forecast" window. Same cost as a daily system, but 1/7 the work involved.

Two pitfalls I avoid...

1. Playing the night pick for the day game or playing the day pick for the night. That would double the cost, and the rarity is not worth the expense.

2. Pennsylvania's "Wild Ball". Though I would have caught 2 box hits in the last 2 weeks, I cannot justify voluntarily doubling the cost for the weak payouts.

That is also why I avoid systems that result in a matrix, or that work with pairs and mirrors.

At the pick 3 level, I find it more entertaining to be correct than to cash in winners that barely cover the expense of play.

Of course, on their money, expense is recalculated, such as the Cash 4 Life cash ball brute force gambit... I would never spend that kind of money on the lottery, but I WOULD spend their money...

This is my last week of house money for the pick 3, it has been a fun run for nearly 2 months.

The strategy for using the new "draws since last appearance" statistic for each digit will change this week. Last week I used the SMALLEST number as a tie breaker, but on reviewing the results, it appears I should have divided the longest out by 2 and used the digit closest to that midpoint as the tie breaker. Q2 variance from expectancy still seems a good starting point; however, I look forward to creating that Markov Decision Process script to help figure out the best interpretation of these statistics. This is probably going to take a long time. It is basically an attempt to answer the questions "Given the following statistics, which neutral number has the best chance of being drawn in the next 7 draws?" AND "Which of these statistics are important, and how important is each?"
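As a small sketch of the revised tie breaker, assuming "the longest out" means the largest draws-since-last-appearance value among all ten digits in the column (it could just as well mean the largest among the tied digits only). The function name and the sample numbers are made up.

```python
def tie_break_by_half_longest(tied_digits, since_last):
    """Pick the tied digit whose absence is nearest to half the longest absence."""
    target = max(since_last.values()) / 2
    # Lowest digit wins if two candidates are equally close (arbitrary choice).
    return min(tied_digits, key=lambda d: (abs(since_last[d] - target), d))


# Made-up "draws since last appearance" values for one column.
since_last = {0: 3, 1: 0, 2: 11, 3: 7, 4: 1, 5: 22, 6: 5, 7: 2, 8: 9, 9: 4}
tied = [2, 4, 8]
print(tie_break_by_half_longest(tied, since_last))
```

With these numbers the old "smallest" rule would have picked 4, while the half-of-longest rule lands on 2.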

There are techniques for peeking inside of the AI "black box" to see exactly HOW it is "learning" and what it "learned"... I will need to work on those as well.

Given that the biggest time soak is updating the draw histories, I am still working on scraping the PA RSS feed, since they are too cheap to have a proper RESTful API... this is one of those projects that starts with the best intentions but always ends up on the back burner when just about anything else comes up... I know what I WANT the scraper to do: read the RSS feed and store the draw info for selected games (right now Pick 3 day/evening, Pick 5 day/evening and Cash 4 Life), then read the last row of the history file and insert draws past the last recorded date... I know how to store and write the data, I just have to actually sit down and DO it...
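The "insert draws past the last recorded date" step could look something like this. It is only a sketch: it assumes one history file per game with a header row and rows shaped like YYYY-MM-DD,d1,d2,d3, and it assumes the feed has already been parsed into (date, digits) tuples. Parsing the feed itself is left out because that depends on the real feed layout. All names and sample rows are made up.

```python
import csv
import io
from datetime import date


def rows_to_append(history_csv_text, fetched_draws):
    """Return fetched draws dated after the last row of the history file."""
    rows = [r for r in csv.reader(io.StringIO(history_csv_text)) if r]
    # Assumes the file has a header plus at least one dated row.
    last_recorded = date.fromisoformat(rows[-1][0])
    fresh = [(d, digits) for d, digits in fetched_draws if d > last_recorded]
    return sorted(fresh)  # oldest first, ready to append


# Made-up sample data, not real draw results.
history = "date,d1,d2,d3\n2025-08-01,1,2,3\n2025-08-02,4,5,6\n"
fetched = [
    (date(2025, 8, 3), (7, 4, 9)),
    (date(2025, 8, 2), (4, 5, 6)),  # already in the history, gets skipped
    (date(2025, 8, 4), (1, 1, 6)),
]
for draw_date, digits in rows_to_append(history, fetched):
    print(draw_date.isoformat(), *digits)
```

Filtering on the last recorded date means the scraper can be run as often as it likes without duplicating rows.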

When it comes to crastination, I'm a Pro!

Entry #485

082, not in PA, but hit straight in OK...

So at least one of the numbers was a straight hit somewhere. The other number was a box hit in IN earlier in the week.

Last draw for the mid and eve today, then using the last of the house money to play next week tomorrow.

Interesting run so far.

The Python QP generator for the power ball, one ticket per draw for the last 5 draws with all the add ons... cost $20 (including tonight), won so far $35, net profit $15.

Progress has been made on the chi-square goodness of fit addition to the column statistics. Simple to send the function the last 150 draws using Pandas by

df[col].tail(150)

Where df is the data frame and col is the current column. Big coding day tomorrow to integrate these changes into all of the scripts.
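A minimal sketch of that function, assuming the column holds integer digits 0 through 9 and that the expected distribution is uniform (which is what scipy.stats.chisquare assumes when no expected frequencies are passed). The function name and the sample data frame are illustrative.

```python
import pandas as pd
from scipy.stats import chisquare


def chi_square_uniform(df, col, sample_size=150):
    """Chi-square goodness of fit of one column's last draws vs. uniform."""
    sample = df[col].tail(sample_size)
    # Count each digit; digits that never appeared get a count of 0.
    observed = sample.value_counts().reindex(range(10), fill_value=0)
    statistic, p_value = chisquare(observed)
    return float(statistic), float(p_value)


# Made-up column where every digit appears exactly 15 times in 150 rows.
df = pd.DataFrame({"d1": [i % 10 for i in range(150)]})
stat, p_value = chi_square_uniform(df, "d1")
print(stat, p_value)
```

A perfectly even sample gives a statistic of 0 and a p-value of 1; the further the counts drift from 15 each, the larger the statistic and the smaller the p-value.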

Then, I get to start the design for the Markov Decision Process. That will hopefully see some coding progress after the flowchart is done.

Entry #484

Knowing what I have to work with

So, the script produces 2 separate categories of stats.

1. Statistics of the draw sample (150 draws)

These include the variance, standard deviation, the quartiles Q1 (to set the lower classification threshold), Q2 (the median) and Q3 (to set the higher classification threshold), and after the next few days, the chi-square statistic and its p-value.

2. Statistics of the digits (0 through 9)

These are made up of raw frequency, percentage of appearances within the 150 draws, the classifier (Hot, Neutral or Cold), and the number of draws since it last appeared.

Now I have to figure out how to properly interpret what is shown. There are also variants that have to be updated (like the last 4 windows or the back test script, and the Cash 4 Life variants as well).

The back test MUST be run to see the difference between all neutral draws per window using quartiles as a classifier vs the already run version that used standard deviation from expectancy as the classifier.

My V4 on the first script (v2 to switch to quartiles, and v3 to add the number of draws since last appearance) will add the chi-square goodness of fit stats. Once that is done, it is a simple paste of that function into the other pick N variants, and because I saved all versions, I only have to modify one of the Cash 4 Life scripts and paste the functions for the rest, that being the last 4 and the back test.

I will have to spend a few weeks figuring out if any stat has bearing on which digit to choose and why... getting down to the wire, there is $14 left on the house money voucher, so Thursday is most likely the last week before I begin picking up the tab again. A single straight win between now and next Wednesday means the state picks up the tab for the rest of the year... no pressure...

The PA lottery does not seem to like whatever numbers I play as of late...

The odds are high, the budget is low, still entertaining to make these attempts once a week rather than every draw, so I believe I will stick with this one longer than the past attempts (I would have already been done with the older systems).

To Quote Sonny & Cher, The beat goes on...

Entry #483

The Null Hypothesis of the Lottery

Very simply put, it is impossible to predict random number sequences.

In 20+ years of trying, I cannot reject the null hypothesis.

We all have our own reasons for continuing to try.

Entry #482

PA bet slips and kiosks = speed and accuracy

The kiosks in PA read the bet slips, and they usually have a supply on hand... usually...

I find it best to hang onto a small inventory.

The ones I have are based on the current strategy... 10 slips for the pick 3 (because they would usually be the ones missing at the retailer), 2 for the pick 5, 2 for the Cash 4 Life and 6 for the Mega.

Because of the layout, it takes 2 pick 3 slips to play mid and evening because the day/night/both option is for the whole slip, not individual games. The same is true for the pick 5.

I keep 2 Cash 4 Life because my window is 26 draws and they only go up to 10 advanced draws. So one set to 10 got played twice and one set to 6 got played once. The first attempt was not successful, but it was on the house.

The Mega Millions tickets will never get used at this rate, but if they DO get used, it will also be "on the house", though that slip allows for the full 26 advanced draws. Should that brute force attempt ever come to pass, it will be some undertaking. 

But back to the slips... I like them because it removes human entry error, also blazingly fast.

Since the only sure play I will make is the pick 3 once a week, I keep more of these on hand, and the rest is bare minimum just in case the opportunity presents itself.

Since this system began, I have used the Cash 4 Life set once, the pick 5 twice (also no luck) and have not touched the Mega Millions slips.

The plan for the Mega slips: there is a Post-it on the back of each, with the letters A through F. They are for identifying the Mega Ball range.

A = 1, 2, 3 and 4

B = 5, 6, 7 and 8

C = 9, 10, 11 and 12

D = 13, 14, 15 and 16

E = 17, 18, 19 and 20

F = 21, 22, 23 and 24

Why only use 4 per slip? Because at 26 draws, each game is $130 @ $5 per ticket, this keeps the total to $520 per slip, so 2 $250 straight tickets and one box ticket cashed in for each stays under the $600 claim form radar. Each one being played at a different kiosk (8 retailers within a mile, so one trip). And the ONLY way this gets played is on house money. Keep in mind at 50 cents straight and boxed on the pick 3, funding this will require 12 wins, so probably NEVER going to happen, but if it does, the plan is in place!
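The slip arithmetic above, written out as a quick check. This assumes each slip carries four games, one per Mega Ball number in its range; the constant names are made up for the illustration.

```python
import string

DRAWS = 26           # advance draws per game on the slip
TICKET_PRICE = 5     # dollars per play
GAMES_PER_SLIP = 4   # assumed: one game per Mega Ball number in the range

# A = 1-4, B = 5-8, ... F = 21-24
ranges = {
    letter: list(range(4 * i + 1, 4 * i + 5))
    for i, letter in enumerate(string.ascii_uppercase[:6])
}

cost_per_game = DRAWS * TICKET_PRICE
cost_per_slip = cost_per_game * GAMES_PER_SLIP
total_cost = cost_per_slip * len(ranges)

for letter, balls in ranges.items():
    print(letter, balls)
print(cost_per_game, cost_per_slip, total_cost)
```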

Entry #481

Posted the numbers for this week to all-states just for fun

Since Pennsylvania does not seem to like to pick my numbers, maybe the coincidence engine will kick in for another state.

Would only let me post starting tomorrow so they are up for 5 draws.

Mid Day 903

Evening 082

As I wonder how to link phase 1 to phase 2, I am thinking of having phase 1 output to a csv file which the phase 2 script will read. I like this workflow concept because phase 1 is starting to get too complex to build on and still understand the operations. In this way, phase 2 can be a clean sheet design. I may go through some tutorials on building software for the Markov Decision Process and also on the implementation of reinforcement learning so I can synthesize a solution that "learns".

Entry #480

What the Phase 1 output displays

So now running the phase 1 displays statistics for each column...

1. The first, second (median) and third quartiles

2. The standard deviation

3. The variance

This section will eventually also contain the P value for the chi-square goodness of fit test.

Then, the output that was calculated for each digit in each row is displayed side by side.

The digit, the frequency, the classification (H, N or C), the percentage of the digit's frequency to the whole model (150 draws), and now the number of draws since its last appearance. This is repeated for each column. So 0, for example, shows the data for 0 in the first column, then the second column and the third column. This stretches out 2 more columns when looking at the pick 5.

The new metric of last appearance was easy to test by comparing the number given to the digit's position in the history file, for each column.
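A minimal sketch of the last appearance metric for one column. It assumes the column is ordered oldest to newest and counts 0 for a digit that showed up in the most recent draw; the real script may well count from 1. Digits that never appear in the sample come back as None.

```python
def draws_since_last(column):
    """Rows since each digit 0-9 last appeared (0 = the most recent draw)."""
    since = {d: None for d in range(10)}
    # Walk backwards from the newest draw; the first sighting is the answer.
    for age, digit in enumerate(reversed(column)):
        if since[digit] is None:
            since[digit] = age
    return since


column = [3, 1, 4, 1, 5]  # made-up column, oldest first
print(draws_since_last(column))
```

The same check described above works here too: the value for a digit should match how many rows up from the bottom of the history file its latest appearance sits.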

Now, I need to figure out how to interpret the data. This first week the following scenario was used...

1. Look at the percentages closest to the median.

2. If there is a tie, use the lowest last appearance as the tie breaker.
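The two steps above can be sketched as a single selection function. It works on raw frequencies rather than percentages, since both rank the digits identically and whole numbers avoid rounding trouble when checking for ties. The function name and the sample numbers are made up.

```python
from statistics import median


def pick_digit(freqs, since_last):
    """Digit closest to the median frequency; lowest last appearance breaks ties."""
    mid = median(freqs.values())
    # Final tie breaker (lowest digit) is arbitrary, just to stay deterministic.
    return min(freqs, key=lambda d: (abs(freqs[d] - mid), since_last[d], d))


# Made-up frequencies for one column over 150 draws.
freqs = {0: 12, 1: 14, 2: 15, 3: 15, 4: 15, 5: 16, 6: 13, 7: 17, 8: 18, 9: 15}
since_last = {0: 4, 1: 9, 2: 6, 3: 1, 4: 12, 5: 0, 6: 2, 7: 3, 8: 5, 9: 8}
print(pick_digit(freqs, since_last))
```

Here 2, 3, 4 and 9 all sit exactly on the median, and 3 wins because it appeared most recently among them.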

So going forward, this will be evaluated each week, to see if things need to be changed. Maybe furthest out is better? Maybe instead of the median as a guide, look at numbers about to transition classifications? There will be plenty of time to experiment until I have something resembling a phase 2 implementation... but the all neutral QP generator will be retired for the time being.

Entry #479

So far, the story remains the same

There was hope starting a new pick 3 system.

There was a straight hit to fund some experiments on other games (pick 5 and Cash 4 Life)

There were more losses than wins, but as of this moment the system has cost 0 out of pocket because of the 1 win.

There are 2 more "free" weeks to play.

Constant re evaluation of the system leads to some major changes...

1. Implementing the interquartile range instead of 1 standard deviation on either side of the expectancy to classify draws... that was a major change with minimal coding because I was already looking at the quartiles.

2. Implementing the recency as a tie breaker. I should have that done and tested in the next 2 hours.

3. The planned implementation of the chi-square goodness of fit test for the draws that make up the 150 row model... that will take a bit longer.

4. Preparing to turn the results into inputs for a Markov Decision Process by using back testing and reinforcement learning to automate the selection process.

I will eventually implement these changes and will most likely find myself disappointed with the results, then it will be another phase of sitting on the bench until I come up with another "big" idea.

I do believe that this is the last system for the year. Planned and pondered for months. 

The pick 3 is just as difficult as any game when gunning for one combo... True odds... 1:1,000 every time. I have had my "coincidental" hit that most systems produce in the first few weeks, so now it's time to see if there is more to it. I am comfortable with the budget for the remainder of 2025... there are 20 more weeks and 2 are on the house, so 18 x $14 = $252. Because I am doing 50 cents straight and 50 cents boxed, only one win will keep the project on the state... which means I will need a second win to have another shot at the pick 5/ Cash 4 Life combo.

Entry #478

No hit on the pa pick 3 mid, last draw for evening.

Not looking good for the all neutral QP generator.

The new addition of the recency metric should help put together another pick without having to resort to the RNG script.

Gave it a few tries and it was not meant to be.

Of course, they could still draw 749 tonight...

I am still liking the play for a week process.

Entry #477

Planning the phase 1 changes.

When adding functionality to an existing script, the best practice is to add, debug and test each new feature on its own.

Since my script is modular and responds to the number of columns imported from the csv history file, the function will be modified to add each feature, starting with the most recent appearance of each digit in the distribution. Using pure frequency, ties are frequent and needed a way to be broken.

The second pass will print the P value of the chi-square goodness of fit test for the 150 draws in the current run.

So using version control, the main code will be branched to implement each change and only rolled back into the main code when working. This way I always maintain a WORKING copy to roll back to in the event of errors.

Phase 2 will be a clean sheet implementation, therefore will take much longer to plan and code.

No shortage of work in this hobby!

Entry #476

Talking shop with Google's Gemini...

I have been going over some statistical topics with Google's Gemini the past few days. I like certain aspects better than ChatGPT, such as that it references its sources for further exploration.

What resulted is I am planning to add some new data to the output of the classifier script...

1. A row count of how many draws since each number last appeared. This will be used to break ties.

2. Including a chi-square goodness of fit test for the 150 draw window. The P value can be used by phase 2.

Speaking of Phase 2, I will be implementing that as a Markov Decision Process. Similar to a reinforcement learning AI, it will calculate the best pick and be "rewarded" if it shows in the 7 draw window.

A long discussion on data features and statistical methods has shown that I had a good setup, just lacking in features. 

This may take a long time to figure out, because I have never coded a Markov Decision Process before. I think that it will require running the full sliding test again so the MDP can "learn" what to look for.

I did start the conversation off by asking it to refrain from coding examples and stick to statistics and theory, and it did that much better than ChatGPT. It did not push Microsoft solutions like Copilot, and it did not hallucinate like Claude or GPT.

Using these AI agents is helpful both to confirm your ideas and to get a healthy dose of reality check when your ideas are not good. It is like having someone to discuss ideas with who is also fluent in theory and has access to the entire internet of information. (Even behind research paper pay walls!)

When I went into this hobby 20+ years ago the goal was to find slight bias in the systems. Now I have a better idea of what to look for... I may be getting closer to finally asking the right questions that have eluded me for decades.

As a result, I must get to work on how I will implement these things in my existing code base. I will continue to use the all neutral QP generator for the last 2 weeks of the "house money" on pick 3, but then the test will be suspended until this new phase 2 is ready for live testing. Not that I wouldn't continue with the QP generator IF more house money becomes available, but that is not looking too promising at this point.

Entry #475

The Markov chain as 2nd layer...

So, phase one deals with outlier identification. Phase 2 would not have to look at the history, as the model in phase one did this. It would basically take the last draw in the 150 draw history chunk and calculate all of the probabilities for each neutral and return the digit with the highest probability.

Of course there would need to be a process in place to break ties; perhaps this could be the digit's last appearance from phase 1.

Still a great deal of design work to do to understand the process, but this is definitely a start. This is all about the last state of the data, row 150, and the probabilities of the entire neutral distribution.
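One possible reading of this idea, sketched as a first-order Markov chain on a single column: count how often each digit followed each other digit, then look at the row for the last drawn digit and take the neutral with the highest transition probability. This only models the very next draw, not the whole 7 draw window, and breaking ties toward the most recently seen digit is my assumption. All names and data are made up.

```python
from collections import Counter, defaultdict


def transition_counts(column):
    """counts[a][b] = how many times digit b directly followed digit a."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(column, column[1:]):
        counts[prev][nxt] += 1
    return counts


def best_neutral(column, neutrals, since_last):
    """Neutral digit most likely to follow the last draw in the column."""
    row = transition_counts(column)[column[-1]]
    total = sum(row.values())

    def prob(digit):
        return row[digit] / total if total else 0.0

    # Highest probability first; ties go to the most recently seen digit.
    return max(sorted(neutrals), key=lambda d: (prob(d), -since_last.get(d, 0)))


column = [1, 2, 1, 2, 1, 3, 1]  # made-up column, oldest first
since_last = {2: 3, 3: 1, 4: 3, 5: 1}
print(best_neutral(column, {2, 3}, since_last))
```

With only 150 rows per column, most of the 100 possible transitions will have very few observations, so the probabilities will be noisy.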

Entry #474

A new potential phase 2

While it has been interesting, though not quite so rewarding using the all neutral QP generator in the last few weeks, I have an idea to further process the data in phase one.

This would involve building Markov chains using the output and using the resultant Markov probability to narrow down to the neutral that would be most likely to appear in the 7 draw window for each column.

I will have to consult the writing on the math version of stack overflow and maybe some R documentation.

With the rpy2 library, I can execute R commands directly within Python, while I can use R's reticulate library to do the reverse.

I have cursory knowledge of Markov's theories at best, so this will involve some time to get a picture of what needs to be done. The QP generator will get some more use for the time being.

This would make the selection process a 2 layer approach, with the first layer classifying the numbers by frequency, then the next layer taking this data and determining the best digit to play in each column.

The goal is always to get ONE combo to play, regardless of the game. A classification phase one with a probabilistic phase 2 would result in just that... ONE best guess.

I am also replacing the definition of hot and cold from using the standard deviation to using the observed interquartile range, as this would allow for a more robust division of classifications that is not as sensitive to the outliers in the group... standard deviation varies wildly between runs.
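A quick way to see the difference, assuming the old thresholds were expectancy plus or minus one standard deviation of the ten digit frequencies (the exact standard deviation used before is an assumption here). Both frequency lists are made up and sum to 150.

```python
from statistics import pstdev, quantiles


def std_band(freqs):
    """Cold/hot thresholds as expectancy -/+ one standard deviation."""
    expectancy = sum(freqs) / len(freqs)
    spread = pstdev(freqs)
    return expectancy - spread, expectancy + spread


def quartile_band(freqs):
    """Cold/hot thresholds as Q1 and Q3 of the frequencies."""
    q1, _q2, q3 = quantiles(freqs, n=4, method="inclusive")
    return q1, q3


calm = [12, 14, 14, 14, 15, 15, 16, 16, 17, 17]
spiky = [5, 14, 14, 14, 14, 14, 15, 15, 15, 30]  # two outlier digits
for name, freqs in (("calm", calm), ("spiky", spiky)):
    print(name, std_band(freqs), quartile_band(freqs))
```

Two outlier digits stretch the standard deviation band to several times its calm width, while the quartile thresholds barely move.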

Entry #473

Python's random library... interesting info

Python's standard random library uses the Mersenne Twister algorithm to generate random numbers. This can also be found in the add-on NumPy library as the MT19937 bit generator. It generates floats with 53 bits of precision and has a very long period of 2^19937-1. It is, however, NOT cryptographically secure.

If you are interested in the seed that generated your numbers, you can always supply your own with random.seed(46) for example, but if the seed does not change, the sequence stays the same.

If you do not supply a seed, it relies on its own seed generation, and you can view the resulting internal state by querying it...

state = random.getstate()

print(state)

Will return a tuple of information. Saving this will allow you to start up again with the same state by using

random.setstate(state)

Note that setstate() returns None, so there is nothing useful to assign from it.

Therefore I believe my QP generator to be "random enough".
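A short demonstration of both points, using only the standard library: the same seed replays the same sequence, and a saved state lets the generator pick up exactly where the snapshot was taken. The variable names are just for illustration.

```python
import random

random.seed(46)  # fixed seed: the sequence is repeatable
first_run = [random.randint(0, 9) for _ in range(3)]
random.seed(46)
second_run = [random.randint(0, 9) for _ in range(3)]

state = random.getstate()  # snapshot taken after second_run
next_pick = [random.randint(0, 9) for _ in range(3)]
random.setstate(state)  # returns None; rewinds the generator to the snapshot
replayed_pick = [random.randint(0, 9) for _ in range(3)]

print(first_run == second_run)     # same seed, same digits
print(next_pick == replayed_pick)  # same state, same digits
```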

Not sure what this info is good for other than Python trivia, but I also fall in the rabbit holes on occasion...

Entry #472