hypersoniq's Blog

Switching to Match 6 worked out... $27 on a $2 ticket!

Deciding to get out of week 3 of the pick 5 to chase the now $4,650,000 Match 6 jackpot was worth it. On the ticket, 2 of my picked numbers were drawn, then 3 on the first quick pick line, and 2 on the second quick pick line. The 3 on one line returned the $2 spent, and the combined 7 of 18 across all three lines is worth $25.

That was running the top of the Markov Chain Follower report... 2 of 6. That could have done better, but one takes what one is given. I would have had 3 of 6 if I viewed the classifier report and ran the NA line, which is playing the last "neutral above the median" before the HOT numbers.

3 more plays for this week, maybe alternating between the 2 picks is in order.

Have a feeling that the jackpot will be hit, by at least 1 winner, BEFORE it crosses the $5,000,000 mark. Then I can go back to that last week of pick 5.

After the fed and pa taxes are taken, a solo winning ticket would be worth $2,786,745 (or more, I always take the worst case scenario... 37% federal and 3.07% flat PA tax)

Shortcut calculation for winning in PA is <prize> x 0.5993 (that is, 1 - 0.37 - 0.0307)

That is the quick estimate of what a prize is really worth.
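A minimal sketch of that estimate in Python, using only the two worst-case rates quoted above. The function name net_prize is made up for illustration, and real withholding rules may differ.

```python
FEDERAL_RATE = 0.37   # worst-case federal rate used in the post
PA_RATE = 0.0307      # flat PA rate used in the post


def net_prize(gross, federal=FEDERAL_RATE, state=PA_RATE):
    """Estimate take-home by removing both rates from the gross prize."""
    return gross * (1 - federal - state)


# The $4,650,000 jackpot works out to roughly $2,786,745.
print(f"${net_prize(4_650_000):,.0f}")
```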

Still waiting for several decades to use that formula for real...

Entry #636

Time to upskill, SQL for proper data interrogation

Going to start on the path of getting the answers that spreadsheets and Python are not giving me.

Installing PostgreSQL 18.3 and building a knowledge base of queries.

Should be an interesting learning journey!

Entry #635

Divergence from the usual plan...

Since results are just not there in the pick 5, I am skipping the pick 5 for the next 2 days... going to try a one off experiment with the Python random number generator...

Since it is documented that the PA night games use 9 pre draws, 3 rehearsal draws, a live draw and 3 post draws every night, I reworked the QP generator to output 16 draws and will play the 13th pick for both the Cash 5 and Match 6 for tonight.
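A rough sketch of what such a 16 draw quick pick run could look like with Python's random module. The function names are hypothetical, and the pool sizes (5 of 43 for Cash 5, 6 of 49 for Match 6) are assumptions about the game matrices, not taken from the actual script.

```python
import random

PRE, REHEARSAL, LIVE, POST = 9, 3, 1, 3   # draw counts described above
LIVE_INDEX = PRE + REHEARSAL              # zero-based index of the 13th draw


def quick_pick(pool_size, picks, rng):
    """One quick pick line: unique numbers from 1..pool_size, sorted."""
    return sorted(rng.sample(range(1, pool_size + 1), picks))


def nightly_sequence(pool_size, picks, seed=None):
    """Generate all 16 draws of one emulated night."""
    rng = random.Random(seed)
    total = PRE + REHEARSAL + LIVE + POST
    return [quick_pick(pool_size, picks, rng) for _ in range(total)]


cash5 = nightly_sequence(43, 5)    # assumed Cash 5 matrix
match6 = nightly_sequence(49, 6)   # assumed Match 6 matrix
print("Cash 5 play:", cash5[LIVE_INDEX])
print("Match 6 play:", match6[LIVE_INDEX])
```

Passing a seed makes a run repeatable, which is handy for checking the generator but pointless for an actual play.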

Skipping 2 pick 5 draws keeps to the budget for the $4 expense.

I expect nothing, but it will be interesting to emulate the process...

Entry #633

Vertical sum spreadsheet version 2 plan

Since the previous 2 systems are focused on a per column approach (because the positional digits are independent), I will stick with a per column approach here as well. That means eliminating the horizontal sum portion, which will simplify the spreadsheet.

The end result will be a cascading sheet, where the groups of columns get shorter as the calculations move across.

The first columns will hold the draw history, the next columns will start ten rows down and sum the 10 rows that make up the first k sample. The next set will make up the N sample size, so they will start 30 rows below the k samples and provide both an average and a mode.

There should be a way to run a basic back test right in the sheet once the data is populated for the average and mode of 30 k samples. That would be to take the average and mode (separately) and subtract the last 9 in the k sample and see if it is equal to the next digit. In this way, if the initial k and N parameters are inconclusive, they can be changed.
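The back test described above might be sketched like this in plain Python. It is one interpretation only: rolling sums of k draws, the average and mode taken over the most recent 30 sums, and a hit counted when the guess equals the next digit. The names guesses and back_test are made up for illustration.

```python
from statistics import mean, mode


def guesses(history, k=10, n_sums=30):
    """Return (average guess, mode guess) for the draw after `history`."""
    sums = [sum(history[i:i + k]) for i in range(len(history) - k + 1)]
    recent = sums[-n_sums:]
    tail = sum(history[-(k - 1):])        # the last k-1 draws
    return round(mean(recent)) - tail, mode(recent) - tail


def back_test(column, k=10, n_sums=30):
    """Walk one column forward and count hits for each method."""
    start = k + n_sums - 1                # first draw with n_sums full sums behind it
    hits = {"average": 0, "mode": 0, "tested": 0}
    for t in range(start, len(column)):
        avg_guess, mode_guess = guesses(column[:t], k, n_sums)
        hits["average"] += avg_guess == column[t]
        hits["mode"] += mode_guess == column[t]
        hits["tested"] += 1
    return hits
```

Because k and n_sums are parameters, an inconclusive run can be repeated with different values without rebuilding anything. Note that a guess can land outside 0 to 9, in which case it simply never counts as a hit.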

It is a great deal of tedium creating such a sheet, but that will serve as the base of coding a function in Python, as the commands to get such data in a pandas data frame are much faster.

So we will see if there is any merit to such a system by counting hits on the back test. That will decide whether it is going to include a mode or an average (or both) when it moves to Python.

This is still a single approach on 3 independent histories, so there will remain that synchronization issue. Hopefully the specific profile information for each gathered from the other 2 functions might be used together at some point to determine the "state" of each position.

I could make the single script prototype open to settings where k and N are different for each position...

At some point I may need to record the output of all of these functions and feed it into a machine learning algo or 2 to find stuff I am currently not seeing.

The sheet is built, with one tab for mid day pick 5 and one tab for evening pick 5... so far the sums (k) are done, as are both the rounded average and mode for the sample size (N).

From here I need to calculate some summary statistics (the range of the sums, the range of the samples and their distribution), and then come up with a formula to test the effectiveness. This part may take some time...

At a quick glance, there is an expected difference in the positions, some run higher (consistently) than others.

Would have been easier with the pick 2, but in this test I wanted more columns to compare.

Entry #632

Decision day on the millionaire 4 life.

Not about playing it, have no plans on that at the moment, but on including it into the app. I have not yet found the ID that corresponds to a data page like I have for the other games. If they did not create one, then there is no pathway for creating one using that method. So I will explore the options...

1. Writing a quick script to run through a group of IDs, using the scraper to extract the H3 page element containing the game name, and excluding the H3 message for an ID with no data.

2. Look at another participating state that either has an api or a similar page of year at a time data.

3. Explore the MUSL site for the same.

Or

4. Drop support for the game in the app.

The current setup has data separated for bonus ball games, but it does not need to be separated if it is the only workaround. It will just not have correct settings for the bonus ball itself and the stats will be off.

Either way, I am removing the Cash 4 Life data and metadata.

Have an idea for visualizing the classifications from the back test data... running an animation of the classifier changes in each column side by side using the matplotlib and Seaborn libraries. This will be a stand alone side script not intended to make it into the app.

Also taking the next steps with vertical sums, which is still in the spreadsheet stage. The changes will include summing 10 draws instead of 3 (pick 3 example) and taking both the AVERAGE() and MODE() of 30 samples instead of the whole column. This incorporates the size calculation I did, where k = the number of balls (or digits) in the game and the sample size is k x 30. I am designing this vertical sum to be a drop in function for the app.

At this point, the plan is to have 3 separate functions to analyze the data, and find a way to infer from the 3 results to make a best guess. This week, the expense is only $8 as the days of play were reduced from 7 to 4, but the mostly winless annual tally is -$104.

I did plan on playing all year. The goal of completing the app was met almost 10 months early, so now I can use the time on refinement and learning how to better interpret the data.

The new routine is to first check where the winning numbers sat on both functions before updating the draw history, so I will wrap that up before the python/spreadsheet work for today gets into full swing.

So far it has been determined that the endpoint I use has not been updated for new games since the introduction of the Cash 4 Life 10 years ago... yet the data is updated daily... so they must point to a table or database that is the single source of truth for all game data. Time to investigate the front end "view past winning numbers" page to determine where the new data for Millionaire for Life is stored.

Entry #631

The follower stats screen

The stats screen for the follower function provides different data... note the total followers... adding up the frequency of each follower will match this number. The last 10 are the last 10 followers; there is no indication of how many draws elapsed in this range. And the last number drawn is a quick check to make sure the updater is working. The draw in this example is NOT the latest, and that is an added benefit of making the updater user initiated: you can see what you had to work with for the previous draw, see where the winners came from... then update to get current data.
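For anyone wondering how those numbers relate, here is a small sketch, assuming a "follower" is the digit drawn immediately after each earlier appearance of the last drawn digit in that column. The function follower_stats is hypothetical and is not the app's code.

```python
from collections import Counter


def follower_stats(column, recent=10):
    """Summarize what has followed the most recently drawn digit."""
    last = column[-1]
    # Every digit that came right after an earlier appearance of `last`.
    followers = [nxt for prev, nxt in zip(column, column[1:]) if prev == last]
    return {
        "last_drawn": last,
        "total_followers": len(followers),
        "frequency": Counter(followers).most_common(),
        "last_10": followers[-recent:],
    }


stats = follower_stats([1, 2, 1, 3, 1, 2, 1])
# The frequencies always add up to the total follower count.
assert sum(count for _, count in stats["frequency"]) == stats["total_followers"]
print(stats)
```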

Entry #630

Data tells a story

This image from the app confirms that not only are the draws independent events, but the individual positions are also independent of each other. Wide swings in variance and the distribution of classifiers. One thing I noticed in the few draws I have seen since adding the above and below median categories is that the winning numbers almost always come from a transition area.

Classifier stats

Entry #629

Lessons learned so far

So, I have learned that every single game history follows a discrete uniform distribution. Testing with a Chi Square goodness of fit test shows that everything from whole histories down to single columns of a smaller sample passes this test. Bottom line, whatever the draw method (ball machines or RNG), the data is random enough.
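A minimal sketch of that goodness of fit check for one pick N column, written with the standard library only. The function name is made up; the constant 16.919 is the chi-square critical value for 9 degrees of freedom at the 0.05 level, and "passing" here means the statistic stays below it, so uniformity is not rejected.

```python
import random
from collections import Counter

CRITICAL_DF9_05 = 16.919   # chi-square critical value, df = 9, alpha = 0.05


def chi_square_uniform(digits, categories=range(10)):
    """Chi-square statistic of observed counts against a uniform expectation."""
    categories = list(categories)
    expected = len(digits) / len(categories)
    counts = Counter(digits)
    return sum((counts[c] - expected) ** 2 / expected for c in categories)


# Simulated column of 300 digits, standing in for a real draw history.
rng = random.Random(2026)
column = [rng.randrange(10) for _ in range(300)]
stat = chi_square_uniform(column)
verdict = "not rejected" if stat < CRITICAL_DF9_05 else "rejected"
print(f"chi-square = {stat:.2f}, uniformity {verdict}")
```

For a game with a different number of balls the critical value changes with the degrees of freedom (categories minus 1), so the constant above only applies to 10 digit columns.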

Not only are past draws independent, but in the case of pick N games, the positions are also independent. Think about that... the pick 3 history is completely different for each position. I can see this more easily with the new classification divisions that highlight where each neutral exists in relation to the median. Looking at pick 3 data shows 3 different histories, with different statistics for each. If there are cycles in the data, they most certainly do not happen at the same time. When follower frequency is measured, the 10 most recent followers actually prove to be better at zeroing in on a specific next digit than the whole Markov Chain display... but not always and not predictably. The shapes described by the classification are different with each position... some have true median plateaus, while others have zero numbers at the expected value, and can tilt in favor of an above median or below median neutral band. This is the entire reason that older systems (and current ones as well) seem to have that "synchronization issue", where a system might pick 2 of 3 correctly, but be way off on the third.

Based on the above, it would be easy to pack it all in, I have certainly tried. I have taken many lessons in spreadsheet use and programming enough to say it was not a wasted journey... but it could just be that the journey is the thing... ever chasing that elusive solution to the impossible problem that keeps driving development and refinement.

The goal has never changed... make one pick based on the past draws. The methods have evolved, the data collected has certainly changed but the goal is still the same.

I think that learning to better interpret the data is that elusive next target. I get that feeling that statistics holds the answer, but I am still falling short on asking the right questions. I remember thinking the same thing over 20 years ago when I created my first draw history spreadsheet...

Entry #628

Using a spreadsheet to design a Python program

As I focus more attention to the Vertical Sums, there is a specific work flow that comes to mind... using a spreadsheet to design and experiment with the parameters on one page, and designing the output on another. This was the same process used early on to create both the follower and classifier scripts.

The plan for the vertical sum sheet will be to experiment with the sample size N and the number of rows to sum k.

Since we are working without guidelines, we will have to find them.

The concept would be to pick a sample size that is not data starved, but not so big that it loses volatility. Here solving for k first, and then making the sample size N a multiple of k makes the most sense.

Here we will be applying an interpretation of the "rule of 30". Since we know the distribution to be both discrete AND uniform, a value of k set to 10 seems a logical starting point since the pick N games have 10 digits each. And for N, we apply k x 30 for an N (sample size) of 300.

This would scale with games like Match 6 where k would be 49 and the sample size would be k x 30 = 1,470.

That is not a magic formula, but you have to start somewhere.

The basic idea is to roll the sums forward one draw at a time and record each sum. When you have enough sums, take the average of the sums (or alternatively, the most frequently recurring sum), then sum the last k-1 (9) draws and subtract that from the average (or mode) to get a guess. At the spreadsheet lab level, it is possible to record both the average sum and the most frequent sum and compare their accuracy.
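As a rough illustration of that idea for a single column, here is a standard library sketch. It assumes k = 10 and that the average and mode are taken over the 30 most recent rolling sums, which is one reading of the plan; rolling_sums and vsum_guess are hypothetical names.

```python
from statistics import mean, mode


def rolling_sums(digits, k=10):
    """Sum every run of k consecutive draws, stepping one draw at a time."""
    return [sum(digits[i:i + k]) for i in range(len(digits) - k + 1)]


def vsum_guess(digits, k=10, n_sums=30):
    """Guess the next digit from the average and mode of recent sums."""
    recent = rolling_sums(digits, k)[-n_sums:]
    tail = sum(digits[-(k - 1):])            # sum of the last k-1 draws
    return {
        "average_sum": round(mean(recent)),
        "mode_sum": mode(recent),
        "last_k_minus_1": tail,
        "avg_guess": round(mean(recent)) - tail,
        "mode_guess": mode(recent) - tail,
    }


print(vsum_guess([3, 7, 1, 9, 0, 4, 4, 8, 2, 6] * 5))
```

The subtraction is not clamped, so a guess can come out negative or above 9. How to treat those cases is a design decision the spreadsheet stage would have to settle.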

While not as robust or complex as an AutoRegressive Integrated Moving Average (ARIMA), it still functions to analyze the time series lottery data from its own past values.

This first sheet is where the design decisions can be made.

The second sheet will be used to experiment with what data should be output. This is where we would record the average and mode result of all of the k sums in the N sample size. Any other relevant data can be placed here.

When done, sheet 1 will help describe the per column script in Python, while sheet 2 will be a model of the output.

In order to fit a Vsum function into my app, I need to determine the output based on the framework... the table view would contain the 30 sums in the 300 sample size, and the stats card view would contain the average and mode of those sums and also calculate the last k-1 leaving you with a pick for each column.

When I designed the app, it was with pre existing csv files that can be updated. I also included a cache folder with the idea that maybe in the future I would need to generate a temporary file. This might be a use for one! A csv file that could be generated for display, then deleted when exiting the app... this way switching between functions for analysis could be done to compare data with the followers or classifiers and not need to be completely recreated.

Here is the best part... developing first as a universal stand alone script means I never have to touch the stable code base of the app unless it proves to add value. If the spreadsheet stage fails, then a script never even gets written.

The hard part is done. I have created a portable framework. I can get back to exploring ideas again, only now with a proven work flow and targeted outputs. And if something does not fit the table/stats output, it can stay a single script. I cannot even describe the joy found in this hobby by automating updates. The whole GUI learning experience was worth the effort... even if that win is still elusive.

Entry #627

Vertical sums as a function.

The concept is easy to imagine, but what would it look like as a function?

A sample size would need to be determined, then a step run over that sample would give a table of N vertical sums... but how many? One per ball of the game? A fixed number from 3 to 10?

You would need, for example, 5 number sums in a table, but also the sum of the last 4 games... in the framework, that sum would be placed on a stats card on the screen after the table, along with average sums for the sample. The thought process being average sum of each 5 in the sample minus the last 4 is your pick.

Time will be needed to get an idea of the sum size and sample size. But that is the basic premise for a start.

Entry #626

May just be removing the Cash 4 Life without replacement.

PA has yet to put up a results print page for Millionaire for life. They removed the link to the print pages (one year at a time) from the results screen, but still update the draw data daily.

Going to write a scraper to grab the game titles from ID 1 through 100 to try and find it. If there is none, then Millionaire for life will be skipped.
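A sketch of how that ID scan could be structured, assuming the game title sits in the first H3 element of each page. The page fetch is deliberately left as a callable (fetch_page, a hypothetical name) because the real URL pattern is not shown here, and the "no data" message text is also a placeholder.

```python
from html.parser import HTMLParser


class H3Extractor(HTMLParser):
    """Collect the text of every <h3> element on a page."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False

    def handle_data(self, data):
        if self.in_h3:
            self.titles[-1] += data


def first_h3(html):
    """Return the first H3 text on the page, or None if there is none."""
    parser = H3Extractor()
    parser.feed(html)
    return parser.titles[0].strip() if parser.titles else None


def scan_ids(fetch_page, ids=range(1, 101), no_data_text="No data"):
    """Map each ID to its game title, skipping IDs that report no data."""
    found = {}
    for game_id in ids:
        title = first_h3(fetch_page(game_id))
        if title and title != no_data_text:
            found[game_id] = title
    return found
```

Keeping the fetch separate means the parsing can be tried against saved pages first, and a polite delay between requests can be added inside fetch_page without touching the scan itself.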

The process of splitting the N range in the app was easy, and is already working on the windows version, pushing to android on Friday, after the removal of the Cash 4 Life.

The N range needed a split because the neutrals make up 70% of the data, and a split at the median was a natural choice. This represents the size and shape of the column distribution without the express need to graph it.

Also noted in the search for M4L data, they do not have a print page for their cash pop game either. Not that missing cash pop was a concern.

Here is hoping that the print pages there keep going, because it is already lame enough that they do not have an api...

Entry #625

The stats...

After completing the manual tracking spreadsheet for 21 days of the pick 2, one thing is fairly obvious, the "lag test" is most certainly NOT a factor. I will include it in the big back test, but I think it is time to shift gears from weekly play to a return to daily play, which will reduce the expense from $14/week to $8/week. So given I am 8 weeks in with a -$96 result so far, this will reduce the expense by $264 and drop the projected cost (past draws included) to a total of $448 for 2026, or the balance of future play down to $352.
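The budget figures can be checked with a few lines of arithmetic. This assumes a 52 week year and treats the -$96 result as the cost to date; the variable names are just for illustration.

```python
WEEKS_IN_YEAR = 52       # assumption: a plain 52 week year
weeks_played = 8
cost_so_far = 96         # the -$96 result after 8 weeks
old_weekly, new_weekly = 14, 8

weeks_left = WEEKS_IN_YEAR - weeks_played              # 44
savings = weeks_left * (old_weekly - new_weekly)       # 44 x $6
future_cost = weeks_left * new_weekly                  # 44 x $8
projected_total = cost_so_far + future_cost

print(f"savings ${savings}, future play ${future_cost}, 2026 total ${projected_total}")
```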

I will also split up the Neutral classification into 

NA = neutral above the median

N = neutral equals the median

NB = neutral below the median

That gives 5 categories:

H>NA>N>NB>C
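The neutral split itself is simple enough to sketch. This assumes each number already carries a hot/neutral/cold label (H, N or C) from the existing classifier, and that the median is the median of the frequencies in that column; split_neutral is a hypothetical name and the sample labels below are invented.

```python
from statistics import median

ORDER = ["H", "NA", "N", "NB", "C"]   # the five categories, high to low


def split_neutral(label, frequency, med):
    """Refine a neutral label by where its frequency sits against the median."""
    if label != "N":
        return label                  # hot and cold are left alone
    if frequency > med:
        return "NA"
    if frequency < med:
        return "NB"
    return "N"                        # exactly at the median stays N


# Invented example data: digit -> (existing label, frequency)
column = {0: ("H", 40), 1: ("N", 33), 2: ("N", 30), 3: ("N", 27), 4: ("C", 20)}
med = median(freq for _, freq in column.values())
refined = {d: split_neutral(lab, freq, med) for d, (lab, freq) in column.items()}
print(refined)
```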

Consideration of the draws out is also helpful to determine where these numbers come from and how to use each of the screens in the app to get a pick.

So the Cycles will remain mostly the same...

4 weeks of Match 6, covering 16 draws instead of 28.

3 weeks of pick 3, covering 12 draws instead of 21.

And 3 weeks of pick 5, covering 12 draws instead of 21.

Follower data, particularly the single appearance of a digit in the last 10 + classifier information, including draws out and drift from the median = the new direction for play... saving money AND having a new combo for every drawing.

The script for the back test will be the largest single script that I have ever written... so that will take some time, but at least I know what information to collect.

Saving money and diving deeper into the stats is the next best step, as refinements can happen between individual iterations rather than on a weekly basis.

None of this is possible without the framework I have built... had I still needed to update draw histories and run 2 separate scripts, it may have been dropped already as too labor intensive. THIS is why the app was built.

Tonight ends the week long first cycle of the pick 5. First solo shot at the pick 5 will be Thursday.

Entry #624

Restarting the manual test for the pick 2.

A few data points that I wanted to capture were left out... the most recent appearance of the winning numbers, the median, AND, a new metric that will be worked into the app... a division of neutrals into NA and NB, the last letter indicating if the neutral frequency is Above or Below the median. IF the neutral frequency is exactly the median, it will stay N.

21 draws of each (pick 2 mid and eve) should be enough to hammer out what data to write to the csv and what data to output at the end of the test.

Entry #623

Sometimes ideas need manual workouts before coding

As a prelude to doing a major back test on the data files, I think I need to manually work out the process of exactly what to write as output to capture the relevant information in a useful way.

The pick 2 should be the easiest way to start. Using the windows version of the app (where I have direct access to the csv files) I will record, then remove the last 21 draws, running the follower for each draw and recording the results in a blank spreadsheet. Both day and evening variants. What I am looking for is what exactly to record and check. This way I have a controlled environment to see which data is important to record and which can be left out. Taking notes on the manual process at each step will define the algorithm that will then be used to write the script.

I will also be looking at the classifier output for each step, manually checking for any correlation between functions. This raises the complexity, but it is one reason I put both functions into the app.

This may take a few days to figure out, but then the proper back test algorithm can be assembled.

What are the target metrics?

1. Did the direct follower output match the next draw? If not, which frequency level did the next draw come from?

2. Did the numbers in the next draw appear in the list of the 10 most frequent followers?

3. What difference in position did the next draw hold on the classifier output, and how many draws out were the winning numbers?

4. IF a draw was correct on the follower list, what was the interquartile range on the classifier for each digit? Wider on a match? Narrower on a match?
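The first two metrics could be recorded per draw with something like the sketch below. It assumes a follower is the digit that came right after each earlier appearance of the last drawn digit, and it leaves out metrics 3 and 4 because they depend on the classifier output. The name score_draw is made up for illustration.

```python
from collections import Counter


def score_draw(column, actual_next, top_n=10):
    """Score one back test step for a single column.

    `column` is the history up to (not including) the draw being tested.
    """
    last = column[-1]
    followers = [nxt for prev, nxt in zip(column, column[1:]) if prev == last]
    ranked = [digit for digit, _ in Counter(followers).most_common()]
    return {
        # Metric 1: did the most frequent follower match, and if not,
        # which frequency level (1 = most frequent) held the actual digit?
        "direct_hit": bool(ranked) and ranked[0] == actual_next,
        "frequency_rank": ranked.index(actual_next) + 1 if actual_next in ranked else None,
        # Metric 2: was the actual digit among the top_n most frequent followers?
        "in_top_n": actual_next in ranked[:top_n],
    }


print(score_draw([1, 2, 1, 3, 1, 2, 1], actual_next=3))
```

Writing one such row per draw to a csv would give the hit counts and the rank distribution in a single pass.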

The coding is not always as easy as it sounds. The classifier took a year to get working, and the first follower script is over 2 years old now. But it is not a race... better to be correct than quick. I do feel that this hobby keeps me more involved in learning various things about creating software to solve problems than college ever did.

Entry #622