hypersoniq's Blog

How I used an incorrect back test on followers...

My followers script itself is well over a year old. When I originally ran the first back test, it was 100% within Excel.

The idea was simple: loop the script with each digit as the last draw so I would know the top follower of each digit in each position, put these into a lookup table, and count the number of straight hits.

Of all the "replacement" strategies I tried, this one performed best. At the time, it returned nearly enough straight hits for an imaginary $1 straight bet per day, over the entire history, to break even.

Here is the fatal flaw... at any point in the past, you would NOT have known the most frequent followers across all 16,000 draws (the size of the history at the time)! I look at all of the work I put into both the script AND the spreadsheet and realize I broke the timeline by using data from the future to test old draws.

What should I have done? The correct way to test the hypothesis should have been to create a rolling back test... one where older draws are calculated only using data that was available BEFORE that draw... one that indexes and recalculates one draw at a time for every row of history.
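Here is a minimal sketch of that rolling idea in Python. It assumes the history is a list of digit tuples, oldest first. `rolling_backtest` is a hypothetical name and is not the actual follower script. Instead of re-running a follower function for every row, it keeps running counts. A draw is added to those counts only after it has been scored, which gives the same no-peeking result in a single pass. The `prime` parameter holds back scoring until enough draws have been counted.

```python
from collections import Counter, defaultdict


def rolling_backtest(history, prime=1000):
    """Walk-forward test: draw i is predicted only from transitions seen before it.

    history: list of draws, oldest first, each a tuple of digits (one per column).
    Returns the indexes of draws where the top follower in every column hit straight.
    """
    ncols = len(history[0])
    # counts[col][prev_digit] -> Counter of digits that followed prev_digit in that column
    counts = [defaultdict(Counter) for _ in range(ncols)]
    hits = []
    for i in range(1, len(history)):
        prev, cur = history[i - 1], history[i]
        if i >= prime:
            # Predict BEFORE learning from draw i, so no future data leaks in
            pick = []
            for c in range(ncols):
                top = counts[c][prev[c]].most_common(1)
                pick.append(top[0][0] if top else None)
            if tuple(pick) == tuple(cur):
                hits.append(i)
        # Only now does the transition prev -> cur become "known history"
        for c in range(ncols):
            counts[c][prev[c]][cur[c]] += 1
    return hits
```

Printing `history[i]` for each returned index would also show which numbers hit.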

I know I make a ton of miscalculations at this stuff; what is important now is to recognize when a mistake was made and take corrective action. Excel was the wrong tool here, as it is better suited to static analysis.

So with this understanding of why the data I had was incorrect, I can plan a proper back test using the current follower script.

I know that followers need data to start, so I cannot go back to draw #2 like I did in the Excel sheet. If I use the first 1,000 draws to prime the data pump, each digit gets a fair chance of appearing about 10 times as a follower of each possible digit (1,000 draws spread over 100 digit pairs). As a generic formula, that is 10 × (number of balls)², which for a 10-ball game works out to number of balls × 100.

I know that the follower distribution returns a list sorted descending by frequency, so the highest follower per column would be something like distA[0].

I know the check for the pick 3 would be nextDigitA == distA[0] and nextDigitB == distB[0] and nextDigitC == distC[0]. When all three are true, a straight hit is identified.

So, if the comparison is held in its own function, it can be altered to check for other scenarios as well.
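A small sketch of keeping the comparison in its own function, assuming each column's follower list is already sorted by descending frequency. The function names are hypothetical. `box_hit` is only an example of an "other scenario" that can be swapped in without touching the counting loop.

```python
def top_follower_pick(dists):
    """Element [0] of each column's follower list (sorted by descending frequency)."""
    return tuple(d[0] for d in dists)


def straight_hit(pick, actual):
    """Every column matches, in order."""
    return tuple(pick) == tuple(actual)


def box_hit(pick, actual):
    """Hypothetical alternative scenario: same digits in any order."""
    return sorted(pick) == sorted(actual)


def count_hits(cases, check=straight_hit):
    """cases: (dists, actual_draw) pairs. Swap `check` to test another scenario."""
    return sum(1 for dists, actual in cases if check(top_follower_pick(dists), actual))
```

For example, with distA = [7, 1], distB = [0, 3] and distC = [4, 2], the draw 7-0-4 counts as a straight. The draw 4-0-7 only counts when `box_hit` is passed in.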

I also know that the data inside the distribution does represent a true first order Markov chain. Dividing each count by the total for that column gives an estimated transition probability, so the higher the frequency, the higher the estimated probability of that transition. Not perfect, not a guaranteed winner, BUT the "memoryless" property holds... only the current state (most recent draw) is used to generate the transition probabilities to the next state.

So this now has to be wrapped into a universal script that can be used on a per column basis for any game. Now that I have had some practice with structured data returns, I can write everything to a dictionary structure and make the output more readable than the old version of the script.

Other metrics could come from the misses: when the next draw does NOT equal the current top transition option, how far off was it? Is there anything consistent about those errors? Writing them to a csv file for import into Excel would be the right time to use Excel in the process. If an error matrix is produced, maybe it could yield a "correction factor"?
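One way to collect those misses is sketched below, assuming one column at a time and (predicted, actual) digit pairs gathered during the back test. "How far off" is measured here as a wrap-around step count (9 to 1 is 2 steps). That is an assumption, and other distance measures would work just as well. The csv gets a labeled header row so it opens cleanly in Excel.

```python
import csv


def error_matrix(pairs, balls=10):
    """matrix[p][a] counts draws where top follower p was predicted and a was drawn."""
    matrix = [[0] * balls for _ in range(balls)]
    for predicted, actual in pairs:
        matrix[predicted][actual] += 1
    return matrix


def circular_offset(predicted, actual, balls=10):
    """One possible 'how far off' metric: steps from prediction to actual, wrapping 9 -> 0."""
    return (actual - predicted) % balls


def write_matrix_csv(matrix, path):
    """Write the matrix with labeled rows and columns for import into Excel."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["predicted"] + [f"actual_{d}" for d in range(len(matrix[0]))])
        for predicted, row in enumerate(matrix):
            writer.writerow([predicted] + row)
```

The diagonal of the matrix holds the hits. Any heavy off-diagonal cells are where a "correction factor" would have to come from, if one exists.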

This is one of those things that will take some time to think about before writing a single line of code, but the follower function is already written; I just have to build the back test around it.

This is one of those posts where the idea starts to come together for a coding project. The broad idea is a proper back test of the follower script, mixed in with a few off the cuff ideas on what a solution might look like. The back test for classifiers was written once I had gained enough skill and insight to create a sliding test. After working on this idea, I will revisit that one and see if improvements can be made.

I will also print out which numbers hit when a straight match comes up. Curious...

Entry #591

Fixed one annoying application bug yesterday...

The functions work, the data is properly displayed in columns, and the numbers match the standalone script version for both classification and Markov chain follower distribution. The problem was that the first table was always blank! You could navigate back and return and the data would be there... a strange issue, with zero error messages to identify the problem.

So I had to do some research and it turns out I was making a correct call, but to the class and not the instance! The moment I saved the change with the proper reference, the bug disappeared! I never had to consider presentation layers and UI timing (and scope) before inside of a single script. I do not know that I would have explicitly learned of that issue without having the opportunity to troubleshoot my own project.
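The class vs. instance mix-up can be reproduced in plain Python, with no Kivy involved. The `TableScreen` class below is hypothetical and only stands in for a real screen. In a Kivy app, the live screen instance is usually reached through the ScreenManager (for example `self.manager.get_screen("name")`) rather than through the class name.

```python
class TableScreen:
    """Hypothetical stand-in for a screen that shows a table of results."""

    def __init__(self):
        self.rows = []  # each screen instance owns its own table data

    def load(self, data):
        self.rows = list(data)


screen = TableScreen()  # the instance that is actually displayed

# Bug: calling through the class makes the class object itself `self`,
# so the data lands on TableScreen.rows and the displayed table stays empty.
TableScreen.load(TableScreen, [1, 2, 3])
print(screen.rows)  # [] (blank table, and no error message)

# Fix: call the method on the live instance.
screen.load([1, 2, 3])
print(screen.rows)  # [1, 2, 3]
```

Python raises no error here because the buggy call is perfectly legal. It just writes to the wrong object, which matches the "blank table, no error" symptom.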

The downside was I did not start on graphics as planned, but the one annoying issue that was persistent since the beginning is now solved!

I also need to make the progress bar on the draw history update screen shorter... it takes up 1/4 of the screen height... but it works. I am now at the phase where I will not accept issues that are within my power to track down and fix. Everything is still Android safe as well. Sticking to the vision!

Entry #590

Moving forward with app development

The current state of the application is 100% functionality, partial UI implementation, zero "branding".

So there needs to be full use testing so that there are no surprises. In addition, it is time to think about generating graphics for the app. I already have a place in the directory structure to hold graphic assets.

I also need to do with the classification output what I did with the Markov follower output: run the game classifications and compare them line by line to the original script output.
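The line-by-line comparison can be automated with the standard library's `difflib`. This sketch assumes both outputs have been saved as lists of text lines, and `compare_outputs` is a hypothetical helper name. An empty result means the app and the script agree.

```python
import difflib


def compare_outputs(script_lines, app_lines):
    """List only the lines that differ between the original script and the app output.

    Lines starting with '-' exist only in the script output, '+' only in the app output.
    """
    diff = difflib.unified_diff(
        script_lines, app_lines, fromfile="script", tofile="app", lineterm=""
    )
    return [
        line for line in diff
        if line.startswith(("-", "+")) and not line.startswith(("---", "+++"))
    ]
```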

Because of the high level separation of concerns present in the framework, each area can be addressed without risk of "breaking" the other areas. Back end is separate from UI which is also separate from the logic section where the back end and UI components are "glued" together with function call builders and game meta data.

So far, the entire project has been assembled with IDLE, the IDE that ships with standard Python (no PyCharm or VS Code overkill), and the .kv files are created and edited with Notepad++, a HIGHLY recommended free tool for ANY type of coding, from C to web dev (CSS and Rust come to mind). It is also great for Markdown (.md) files in a git repo.

It is launched from the command line (I use Windows PowerShell) until it is time to wrap the whole thing as an EXE and create a desktop icon for it.

For the use tests, I need to take notes on different areas if changes or improvements need to be made.

1. Back end - generic functionality and output of the core functions of history update, classification and Markov Chain Followers.

2. UI/UX - does every screen look and act like a unified project? Here is why I chose the KivyMD extension: using Google Material Design starts off with a pleasant interface out of the gate. Also, the classifier script runs longer than the follower, so perhaps a spinning graphic to indicate "processing" will be better than a user staring at what looks like a frozen screen...

3. Logic - functions are built correctly, game info is consistent.

4. Graphics - no need for overkill, but if a screen looks like it would benefit from graphic elements (like the choose game screen using the small graphics for each game from the PA lottery Website or the project logo), they can be imagined, created, implemented and evaluated.

Compromises have already been made against the original vision... "user settings", "All Neutral Quick Pick" and "ball view" have been scrapped because they were not worth the extra time and effort for what they returned.

So still plenty to do and learn, but it is honestly much more rewarding to tweak a functional app rather than the skeleton of an idea!

Entry #589

Prizes possible in the 2026 big test.

From the outset, the expense is known... $728 ($14 /week)

That works out to 5 cycles of all 3 games plus 2 weeks of Match 6 until the end of 2026 (5 × 10 + 2 = 52 weeks).

Each 10-week full cycle is 4 weeks of Match 6, followed by 3 weeks of pick 3 (day & eve) and 3 weeks of pick 5 (day & eve).

So, knowing the expense, the scenarios that could make the entire year profitable...

5/6 on the Match 6 = $1,000 (pre tax)

Any 2 straight pick 3 hits = $1,000

One single hit on the pick 5 = $50,000 (pre tax)
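The budget arithmetic can be checked with a few lines of Python. The cycle layout and prize values come from this post. The variable names are only for illustration.

```python
WEEKLY_COST = 14   # dollars per week
WEEKS = 52

# One 10-week cycle: 4 weeks Match 6, then 3 weeks pick 3, then 3 weeks pick 5
CYCLE = ["Match 6"] * 4 + ["Pick 3"] * 3 + ["Pick 5"] * 3
schedule = (CYCLE * 6)[:WEEKS]     # 5 full cycles plus the first 2 weeks of a 6th
expense = WEEKLY_COST * WEEKS      # 728

# Pre-tax prize scenarios from the list above
scenarios = {
    "5/6 on the Match 6": 1_000,
    "2 straight pick 3 hits": 2 * 500,
    "1 pick 5 hit": 50_000,
}
for name, prize in scenarios.items():
    print(f"{name}: ${prize:,} - ${expense} = ${prize - expense:,} net")
```

The schedule works out to 22 weeks of Match 6 and 15 weeks each of pick 3 and pick 5.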

Of course, the Match 6 brings to the table a potential jackpot that starts at $500,000 (pre tax).

So the goal is pretty simple... learn to interpret the data I am generating with each cycle to get more accurate.

Also, any extra play such as MM or PB will wait until that $728 expense is gone. May not be the best plan, but it is easily affordable and offers chances at prize levels other than $500 on a straight hit for pick 3, which was the majority of last year's play.

Entry #588

A test for lag in the Markov Chain

Since I have over a week left on the Match 6, I plan to run a back test on the lab (script) version of the Markov Chain follower program. The data is identical between the app and the script, and the lab has its own copy of the data files.

Test 1. Check for lag by deleting the last 21 draws, running the script, and checking whether the highest-frequency output appears as a straight hit anywhere over the next 21 draws.

Test 2. Using the 10 most recent followers that are output for each column, take the most frequent of them and see if that adjusted pick appears straight anywhere over the same 21 draws.
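A sketch of both tests, assuming the history is a list of draws, oldest first. All three helpers are hypothetical. Slicing the list stands in for deleting the draws, so the files themselves are never touched. Ten 21-draw windows cover the 10 × 21 = 210 draws planned below.

```python
from collections import Counter


def rolling_windows(history, window=21, repeats=10):
    """Yield (training, held_out): drop the last 21 draws, then step back 21 more, 10 times."""
    for k in range(1, repeats + 1):
        cut = len(history) - k * window
        yield history[:cut], history[cut:cut + window]


def lag_hits(held_out, pick):
    """Test 1: indexes within the held-out window where `pick` came out straight."""
    return [i for i, draw in enumerate(held_out) if tuple(draw) == tuple(pick)]


def recent_follower_pick(recent_followers):
    """Test 2 (hypothetical helper): per column, the most frequent of the last 10 followers."""
    return tuple(Counter(col[-10:]).most_common(1)[0][0] for col in recent_followers)
```

For each window, the follower script runs on `training`. Its top pick (Test 1) and the recent-follower pick (Test 2) are then each checked against `held_out` with `lag_hits`.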

By deleting draw history, the test has zero cost!

Also, as a new year thing, I am going to begin back testing jackpot games as well. I usually stuck to back testing the pick 3 because I would not mind if I missed out on a $500 win, but without testing on the other games, how do I make improvements?

I will go back 10 times, so the test will cover 210 draws, and I will repeat it on the pick 5.

Then, when the Match 6 cycle is complete, I will run a similar test on that game.

The classification portion will be used as a reference, so that the follower data does not end up picking all cold numbers. Guess we could say that will be the third test...

3 picks...

1. Raw frequency of the followers

2. Followers guided by the most frequent of the last 10 followers

3. Ranging for the pick based on placement in the Hot, Neutral and Cold table.

Whichever method works best in the back test will be carried forward for the first cycle of the pick 3 and the pick 5.

Hoping this will be the year I can do better than a handful of pick 3 straight hits...

Entry #587

So much goes into creating an app... holy graphics!

I am old (therefore old school), so I thought the best way forward with graphic elements (like the logo, icons and button graphics) would be nice, small .png files. Enter SVG (vector vs. raster)... since I do not yet know the display sizes, vector graphics are the better fit: they scale cleanly, without pixelation or blur, across a wide range of sizes.

Of course, most of my graphics programs are raster based (Krita 5, Procreate), so that is where I shall begin the next part of my learning phase.

Also, I am going file to file and making sure each Python file has a docstring (a string literal opened and closed with three quotes, """) that describes what the file does. This will help later, as there are tools that can create skeleton documentation by parsing docstrings. I remember Java having a similar mechanism (Javadoc).
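This example shows what a module docstring looks like and how a tool can read one back. The standard library's `ast.get_docstring` does the parsing. Stock tools such as `pydoc` and `help()` read the same docstrings. The docstring text and the `module_summary` helper are only illustrations.

```python
"""Hypothetical module docstring: update the draw histories for every supported game.

The first statement in a .py file, when it is a string literal like this one,
becomes the module's docstring.
"""

import ast


def module_summary(source):
    """Return the first line of a module's docstring, or None if the file has none."""
    doc = ast.get_docstring(ast.parse(source))
    return doc.splitlines()[0] if doc else None
```

Running `module_summary` over every file in the project is a quick way to spot files still missing a docstring.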

But this division of work, first ensuring functionality, then worrying about how it looks, was the right move for sure. There is not any code that is confusing in there, and appearance is confined to the UI screens and their matched kv files, the logic is sound and not involved in the "look" aspect.

At this point, I should probably plan a way to back test the Markov chain distribution. That will not involve the gui, and instead remain in my script "lab" sandbox. I spent so much time getting things to work, but not enough time learning proper interpretation of the generated data.

That back test should probably answer the following questions...

1. At face value, how many times was the highest-frequency follower a straight match for the game? (I will start with the pick 3.) As a simple count, the first element of each distribution is the highest frequency, so did column1 match distOne[0] AND did column2 match distTwo[0] AND did column3 match distThree[0]?

2. As a total system, where did the matches come from most? I already know it is not the highest element most draws... so where in the dist list did they come from? Such as distOne[4] AND distTwo[3] and distThree[8]... building up a matrix of hit distribution.

3. What about lag potential? Did the highest-frequency combination from the chain come out straight somewhere over the next N draws? Maybe not, but since we are looking, we may as well look everywhere.
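For question 2, a sketch of the "matrix of hit distribution". It assumes each column's follower list is sorted by descending frequency. The helper names are hypothetical. For every draw, it records where the drawn digit sat in each column's list, such as (4, 3, 8), and tallies those combinations.

```python
from collections import Counter


def hit_positions(dists, actual):
    """Index of each drawn digit in its column's sorted follower list (None if it never followed)."""
    return tuple(d.index(a) if a in d else None for d, a in zip(dists, actual))


def rank_matrix(cases):
    """Tally position combinations such as (4, 3, 8) across (dists, actual) pairs."""
    return Counter(hit_positions(dists, actual) for dists, actual in cases)


def column_rank_counts(cases, col):
    """How often the drawn digit sat at each position of one column's list."""
    return Counter(hit_positions(dists, actual)[col] for dists, actual in cases)
```

A count for (0, 0, 0) is the question 1 answer. Everything else in the tally shows where the real draws came from.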

Why this back test now? Because it is better to move forward than to stare blankly at half-finished ideas. Also, based on the data so far, I feel this Markov chain strategy will be helpful with the vertical sum concept I am cooking up.

As for now, the priority is the app... after graphics creation and implementation, the app will be just about done. But I learned that if I put a next project in the back of my mind early, the solutions tend to start leaking through rather than starting from scratch.

Between the processing steps available in the app and the draw history update feature, this project is already worth the effort, even if it never wins a single draw, because it gives me the opportunity to upskill from writing scripts to pro-grade software frameworks. I saw how fast this project came together because I followed industry best practices at the script level. Continuing the same practices at the app level really solidified the concepts, and I will not be starting from just ideas when I branch out into something NOT related to the lottery.

On that point, lottery data, even though nothing ever truly works out, is a long-time hobby, and that motivation let me power through the challenges... not so much with most of the "To Do List" tutorials out there.

Entry #586

Three weeks into the year, M6 so far.

With a 4 week cycle, the Match 6 is currently at a net of -$26 after 3 weeks ($42 spent, $16 in hits). Playing each week separately regenerates 2 new QP lines.

A combination of follower data and classification data was used for the first pick.

One more week on that M6 combo and we get a first look at the pick 3 cycle (3 weeks) and finally 3 weeks of the pick 5. That will complete the first full 10 week cycle.

Now that the app can be used for displaying the data (and updating the draws), no more running different scripts!

Once it looks a bit better I will put some screen shots up.

Entry #585

What a milestone! The windows app works!

BOTH the classifier function AND the follower/Markov chain function work! They display the data in tables, and then can drill down into the column stats view with the additional stats collected!

When I started building this, I designed from the top level screen down each button path...

The main screen has 2 buttons, "update draws" and "choose game"

I went down the update path first and got that 100% functional.

The choose game brings up a collection of buttons representing 14 current PA lottery games. When you choose a game, it takes you to a screen where you can choose to calculate classifiers or show followers.

The next screen for each displays the column headers and the aligned column data. From there you have 2 choices... go back or display the column statistics... the stats screen shows the stats unique to the chosen processing method... and it does it correctly!

Every screen has a back nav button.

I went through multiple games to ensure specialized features worked (like the bonus balls in PB, MM and Cash4Life)

It is not perfect yet, but the thing just works!

The learning curve was a bit steep, but not as bad as I thought. Having done some Ruby on Rails tutorials a few years ago helped with the MVC pattern concepts.

Because I kept with the mantra of make it work first, then make it look good, I know how all of the parts fit together, so any changes to ball matrices or new games can be easily added, and if a game gets discontinued, I know how to remove it. As I learned more about kivy, the screens near the end look better, so I will have to make sure the new concepts get propagated to the earlier, more utilitarian look of the starting screens.

I wrote the follower script over a year ago. It went through many tweaks to get it modular enough to use in an app... even before I knew what shape the app would take.

I am just super excited that this went from project folder last month to functional prototype application now!

The only functional change I need to make yet is to put my frequency sort back into the classifier output... then the rest will be cosmetic changes, creating and adding graphics and ensuring there are no more edge cases left to test for.

I will say that one decision I made was to NOT have a settings page where the numbers could be tweaked... that is best left to the original script sandbox. That saved on generating another screen that was context sensitive AND I did not have to sanitize user input.

Every screen can make use of Google's Material Design elements, so getting a unifying theme together should be straightforward.

Up until today, I was faced with the follower output working but the app crashing when trying to view the column stats for classifiers... that was fixed on the python side by reading the data shape for the script output and matching it to a specific display format (using kivy "card" widgets instead of generic display areas).

The end to end tests were a success! No idea how long it will take to make it look nice and to incorporate graphics, to properly document and to get ready for the Android port, but I have a gut feeling that I will hit my 2026 goal of having this app on my phone before year's end!

Just had to celebrate the functional win! 😎

Entry #584

Keeping the old scripts may pay off again...

By looking at how I sorted things when printing to the console, I can see I have a disconnect because the data is not just printing out, but being packaged as a return value per column. I should just need to sort the return data structure. Also, zero frequency numbers in jackpot games have returned... I had those ignored in the old code, as well as the multi level lambda sort. Have to rewire those lines in the current context of the return values.
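A sketch of re-applying the old multi-level lambda sort to the packaged return value. It assumes each column comes back as a `{number: count}` dictionary, a hypothetical shape. Zero-frequency numbers are filtered out before sorting, as the old code did.

```python
def sort_column(freqs, drop_zero=True):
    """Sort one column's {number: count} by count (high to low), then number (low to high).

    Zero-count numbers (common in jackpot games) are dropped unless drop_zero=False.
    """
    items = [(num, cnt) for num, cnt in freqs.items() if cnt > 0 or not drop_zero]
    return sorted(items, key=lambda nc: (-nc[1], nc[0]))


def sort_return_data(columns, drop_zero=True):
    """Apply the same two-level sort to every column of the packaged return value."""
    return {name: sort_column(freqs, drop_zero) for name, freqs in columns.items()}
```

Negating the count in the key gives descending frequency while ties stay in ascending number order, all in a single `sorted` call.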

Not done yet, but at least I have an idea of what changed, so the solution can be targeted to specific lines of code.

Also, I thought I had a novel way of handling bonus games where it runs the scripts twice, as I have the white balls and bonus balls in separate csv files... but the second run does not allocate an extra column and therefore does not appear. Not sure of the exact problem yet, but it sounds like a simple adjustment in the meta data for those bonus games.

So, problems, but not major problems.

While not setting hard deadlines as this is the goal for all of 2026, I hope to be able to craft a pick from the windows app by the time the pick 5 cycle comes around. (A little more than 5 weeks). Looking forward to when this is on Android! The discipline part is making sure the windows version is 100%, as this greatly increases the odds of a smooth android port.

Onward!

 

Update: almost there with the follower script output... it is back to being sorted by, and including the frequency. Column stats for the follower still need a bit of tweaking, mostly just layout/UI stuff. Brain cooked for today, back to it on the next coding opportunity.

As a bonus, the data exactly matched the output of the original script, so the back end remains solid!

Entry #583

Almost...

The GUI app is running! All functions are operational without crashing, and the data displays cleanly!

Somehow I lost my sorting, and have to fix followers to show counts, but it ran on multiple games with a clean exit!

I am way ahead of where I thought I would be at this point... but not at the finish line just yet.

Serious progress has been made! The updater works flawlessly, and each of the 2 main scripts executes without syntax errors... now the tweaking of output is the next level... have to get the output to match that of the original scripts.

So the road map is as follows...

1. Fix the data output to restore sorting and put the data where it belongs on the output screens

2. Make it look pretty. This includes making images and icons for "branding".

3. Write the technical documentation for the windows version.

4. Get it 100% polished

5. Start the android implementation.

I know it sounds ultra simplistic, but this is a huge step for my coding journey... I am starting to truly understand how separating layers into a framework is done, and why. Writing the game meta data file that connects the user interface to the back end scripts was an involved process... that it mostly works is crazy!

End to end run with a clean shutdown! The data output should not be too hard to fix... I hope.

One immediate benefit is that I am getting much better at tracing Python tracebacks back to their source AND figuring out what went wrong... it is so much easier when you wrap the processing in try blocks and catch errors with descriptive messages.
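A sketch of the "catch errors with descriptive messages" idea, applied to loading a draw-history csv. The function names and messages are hypothetical. `raise ... from err` keeps the original traceback attached, so the low-level cause is still visible underneath the friendlier message.

```python
import csv


def check_rows(rows, source):
    """Validate parsed csv rows and fail with a message that points at the exact line."""
    if not rows:
        raise ValueError(f"{source} is empty; expected a header row plus draws")
    header, draws = rows[0], rows[1:]
    for line_no, row in enumerate(draws, start=2):   # line 1 is the header
        if len(row) != len(header):
            raise ValueError(
                f"{source} line {line_no}: expected {len(header)} columns, got {len(row)}"
            )
    return header, draws


def load_history(path):
    """Read a draw-history csv, translating low-level errors into descriptive ones."""
    try:
        with open(path, newline="") as f:
            rows = list(csv.reader(f))
    except FileNotFoundError as err:
        raise RuntimeError(f"Draw history not found: {path} (has the updater run?)") from err
    return check_rows(rows, path)
```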

This is so much more satisfying than following a tutorial, feels like real learning!

What a coding day!

Entry #582

I was going to need to make 20 files to display the games...

But thanks to some research and documentation browsing, I will only need to make 4... dynamic column allocation! Part of a Kivy screen is Python code, and part is written in the kv language, a layout language specific to Kivy. That is a close analog of Android's Java/Kotlin files with their XML layout files. It means I can read the header row of each csv file and dynamically allocate columns on the screen!
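The "read the header row" step needs only the standard `csv` module. In this sketch, `column_titles` and the file name are hypothetical, and the Kivy widget code that would build one column per title is left out.

```python
import csv


def column_titles(lines):
    """Read only the header row of a draw-history csv to size a game's table.

    `lines` can be an open file or any iterable of text lines.
    """
    header = next(csv.reader(lines))
    return [title.strip() for title in header]


# Usage (file name is hypothetical):
#   with open("pick2.csv", newline="") as f:
#       titles = column_titles(f)
#   The screen then builds one column per title instead of a hard-coded count.
```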

The difficult part to figure out was the need for a "logic layer" that glues everything together... what Python was made for! Here I have meta data that includes arguments for function calls based on each game's configuration. In this way, the pick 2 knows it needs 2 columns, while bonus ball games and the Match 6 know they need 6 columns. This replaces having to enter the specific configuration of each game in the function call whenever I run the scripts. This output configuration can be used for both the classification and the follower scripts. The second set of files will be needed to display the column statistics for the classifiers.
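Here is one possible shape for that meta data: a plain dictionary of game profiles. The keys, file names and helper functions are all hypothetical, not the project's actual profile format. It shows how a 5-white-ball game with a separate bonus csv ends up with the same 6 display columns as the Match 6.

```python
# Hypothetical game meta data: one profile per game holding the arguments
# that used to be typed into each script's function call by hand.
GAMES = {
    "pick2": {"csv": "pick2.csv", "columns": 2, "balls": 10, "bonus_csv": None},
    "match6": {"csv": "match6.csv", "columns": 6, "balls": 49, "bonus_csv": None},
    "powerball": {"csv": "pb_white.csv", "columns": 5, "balls": 69, "bonus_csv": "pb_bonus.csv"},
}


def script_args(game):
    """Keyword arguments for a processing script, built from the game's profile."""
    meta = GAMES[game]
    return {"csv_path": meta["csv"], "num_columns": meta["columns"], "num_balls": meta["balls"]}


def display_columns(game):
    """Columns the screen must allocate: main columns plus one for a bonus ball, if any."""
    meta = GAMES[game]
    return meta["columns"] + (1 if meta["bonus_csv"] else 0)
```

Under this layout, adding a game means adding one dictionary entry and dropping its csv into the data folder.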

4 files instead of 20! The added benefit is the ability to add any game configuration in the future (such as if they bring back a variant of the Super 7) with only the need to add a csv file and its associated meta data profile.

Got the idea while searching Stack Overflow and found guidelines on implementation in the Kivy docs.

Did a test render for the pick 2, and the screen was clean with perfectly aligned sample data, so the next step is to fully wire in the follower script and compare output to the original follower script to ensure it matches. 

Getting there!

Entry #581

How the pick has changed for this year.

So last year, the process was entirely limited to the classifier output. Pick a line that had all neutral values, the highest when sorted.

This first pick (the match 6 line) was stitched together using the classification AND the follower output.

1. Run both scripts and split screen the output for each.

2. Instead of just picking the highest neutral line or the most frequent follower, find the number in each column that sits highest on BOTH lists.

3. Record result and play for the cycle.
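Step 2 can be sketched as "lowest combined rank". This assumes both outputs can be reduced, per column, to a list of numbers ordered best-first (for example, the sorted neutral ranking and the follower frequency order). Summing the two positions is just one reasonable reading of "highest on BOTH lists", and the tie-break rule is arbitrary.

```python
def combined_pick(classifier_order, follower_order):
    """Per column, pick the number sitting highest on BOTH lists (lowest summed rank).

    Each argument is a list of columns; each column lists numbers best-first.
    Ties go to the better follower rank. Assumes the two lists in a column
    share at least one number (true when both cover every ball).
    """
    pick = []
    for cls_col, fol_col in zip(classifier_order, follower_order):
        shared = set(cls_col) & set(fol_col)
        pick.append(min(
            shared,
            key=lambda n: (cls_col.index(n) + fol_col.index(n), fol_col.index(n)),
        ))
    return pick
```

This is also how a number that tops neither list can still win a column, which is what happened with some of the Match 6 picks.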

Sometimes that number was not the highest neutral, nor was it the most frequent follower. In fact, 2 of the 6 numbers selected were classified as HOT. This part of the selection process will be refined throughout the year.

I may not even hold the same combo for the full cycle, but this start with the Match 6 gives me 4 weeks of GUI development time.

That is where the true purpose of the GUI project sits... it does not generate a pick, it gathers, processes and presents data. It is still on me to figure out how to interpret the output.

At the onset, the idea was to be able to generate this data quicker and easier than the process I had in place. This gave me the idea of making an automated draw history updater, which works! I went from the manual process of copy/paste updates on the games I played at the time to updating the histories of 14 different games in seconds with a button click. This part of the GUI is already wired and working! The next part was being able to view the output of either script, which is where I am working now... the screens are getting there with each coding session. For now, though, it still requires the laptop.

With the android part of the puzzle, I will be able to generate and look at the data from anywhere, anytime with no laptop needed!

The end goal is also to keep the framework flexible, so that any new script that can generate a different data view can be added without difficulty, to both the windows app AND the android app. This is where time taken now will yield benefits in the future.

I have a true MVC (Model, View, Controller) framework, where the data and the back end scripts are separated from the UI layer and intersect at the logic layer. When the app is fully functional, I can go back and tweak the UI components to make it look good without affecting the functionality. That is the difference between the chaos of scripts and the elegance of a well defined framework. Thanks to Kivy, I can leverage my skill with Python for all of it, with no dabbling in other languages or some other framework with its own learning curve and particular quirks... it is a clean sheet build that is being engineered, not just slapped together.

The process is surprisingly similar to android native development, where a screen is registered and enabled in Java (or Kotlin), but has a separate file to control the layout (kivy's own .kv files, similar to android XML layout files).

Because I chose the KivyMD path, this allows incorporating Google's Material Design UI elements, so I can go for more of a native android look and feel vs the more utility based standard Kivy elements. I have not even made images yet, that will be a mini project unto itself one day... a logo, an icon, button images... all things to learn and experience.

How challenging this small project is makes one really appreciate Todd's wizardry with creating, updating and managing the LP!

Entry #579

2026 update... small win on Match 6

So, there were 3 numbers on one of the 3 lines, and one each on the other 2. 3/6 pays $2 and 5/18 pays $5, so a $7 win! This moves the year's net from -$14 to -$7. The 3 on one line were on my picked line!

I started exactly on the 1st, but you can pick a different start day of the week, so playing Monday at the end of December let me pick 7 draws, starting on Thursday (Jan. 1st). Ticket is good through Wednesday, then I will pick up another, but before Thursday since I can pick the day to start. This "day of the week" feature allows not missing a day due to the kiosk being out of paper or the kiosk being offline (things that happen enough to warrant concern!)

The Millionaire Raffle ticket was a big nothing burger, but that was on last year's budget.

Even though it will be quite some time before the GUI app is ready to roll, I kept the old scripts, so numbers can still be crunched and I do not have any stress taking the needed time to ensure a 100% functional windows app, because that will ensure a 100% functional Android app on the first try!

Sticking with both goals, playing and developing.

This might be the only time I am able to cut the play expense in half with a single draw... with the Match 6, anyway.

Entry #578

How the followers are a first order Markov chain...

So the premise is simple... we will stick with the pick 3 to keep it simplified.

A follower distribution is simply a count of how many times each possible number has followed the last number drawn in each position. In a discrete (only digits 0 through 9) uniform (random, every number has an equal chance to appear next) distribution, if a 6 was drawn, the probability of each digit from 0 to 9 following it is 10%.

When we count each of the digits that has followed a 6 in the draw history, we are looking for deviation from that 10% across all 10 possible values. This is how we train that first order Markov chain model to give the result, it is simply a count of the number of times each of the ten digits followed a 6 (or whatever the last number drawn was). This is done against the entire draw history (so for pick 3 evening in PA, that is over 17,000 draws).
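The counting step for a single column can be sketched as follows. It turns the counts into estimated transition probabilities and measures each one against the uniform 10%. The column is a list of digits, oldest draw first. The function names are hypothetical.

```python
from collections import Counter


def follower_counts(column, last):
    """How often each digit followed `last` in one column (draws oldest first)."""
    return Counter(nxt for prev, nxt in zip(column, column[1:]) if prev == last)


def transition_probs(counts, balls=10):
    """Estimated P(next digit | last digit): each count divided by the total."""
    total = sum(counts.values())
    return {d: (counts[d] / total if total else 0.0) for d in range(balls)}


def deviation_from_uniform(probs):
    """Distance of each estimate from the uniform expectation (10% for digits 0-9)."""
    expected = 1 / len(probs)
    return {d: p - expected for d, p in probs.items()}
```

Only the pair (previous digit, next digit) is looked at, which is exactly the first order, memoryless structure. Over a short history the deviations can be large purely by chance.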

They say (this is the law of large numbers) that the observed frequencies will converge to that expected 10% per value once there are enough draws.

In addition to the distribution values, I also include the last 10 followers of that number to see if maybe a recent trend is in place that would indicate a better pick than the one with the highest frequency count per column.

So it fits the Markov chain "memoryless" property because it is only presented with the most recent draw. If I were to use all 100 possible combinations of the last 2 draws as the state, that would become a second order Markov chain... this requires 10x the data, and we know that draws are independent of each other, so there is really no need to go into that extra level of complexity.

Is it a valid predictor? Of course not. One thing I have learned from these years of study is that, contrary to what Gail Howard says, the most frequent outcome does not always happen... the most frequent follower in each column is not always the next draw, and the hot numbers rarely come out together. What it ends up being is a way to see micro bias between distribution reality and distribution expectancy.

Has it worked? In the brief time I actually used it to play, it did bring in one straight pick 3 midday hit within the first few months.

One thing I have noticed in all of this development time is that I still need to learn how to interpret the volumes of data I can generate.

Why am I including this in the GUI app that is for classification? Because I feel this is an important part of the puzzle. While I have not seen more than 1 straight win on either system, it might be a critical read of both system outputs that reveals a better guess. Plus, the GUI was always intended to be a framework, good for including any number of systems.

Hooking up the update scripts to the KivyMD framework was a walk in the park compared to integrating the classifier and follower scripts... but that is the path I have chosen.

As a dev note, using git for proper version control is so much better than my ad hoc file naming convention of adding copies appended with _Vx.

Entry #577