hypersoniq's Blog

A few more history scraping details to work out...

1. When updating, check that games with nothing new to fetch get skipped, such as the jackpot games that only draw 2 or 3 times per week.

2. Consider an update per game check box along with the update all function.

3. Make sure PB and MM read all the info, but divert bonus balls to their own file, which includes just the date and column A. This applies to both the create and update versions.
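A minimal sketch of the skip check from item 1, assuming each game has a known set of draw weekdays. The schedule table and the name `needs_update` are hypothetical, and the weekdays listed are assumptions to be checked against the real games.

```python
from datetime import date, timedelta

# Hypothetical draw schedules (Monday=0 ... Sunday=6); verify against the real games.
DRAW_DAYS = {
    "powerball": {0, 2, 5},       # assumed Mon, Wed, Sat
    "mega_millions": {1, 4},      # assumed Tue, Fri
    "pick3_eve": set(range(7)),   # assumed daily
}

def needs_update(last_stored: date, today: date, draw_days: set) -> bool:
    """True if a scheduled draw day falls after last_stored, up to and including today."""
    day = last_stored + timedelta(days=1)
    while day <= today:
        if day.weekday() in draw_days:
            return True
        day += timedelta(days=1)
    return False
```

A game whose file is already current returns False and can be skipped without fetching anything. Since "today" counts even if that night's draw has not happened yet, the worst case is one fetch that finds nothing new.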

Regardless of how long it takes to incorporate this into a full fledged application, this one project alone will save at least an hour per week! It will also be the key to the "calculate anywhere" vision I have for the mobile app... update ALL of history with a button click, then process the game of interest with another button click... 2 clicks to a pick!

Can't believe I did not think of this sooner... wasted all of those coding sessions trying to wrangle their sloppy RSS feed.

I can also see incorporating other scripts into the mix, so alongside classification, I could check daily things like follower data or whole history distribution statistics.

Easier to continue on the journey when you have an idea of the end product.

Right now, if I were doing Agile, this would be solving the user stories: "User wants to update all games quickly with a single button", "User wants the flexibility of choosing from multiple active games" and "User does not need to know the behind the scenes functioning of the program, it just needs to present up to date information on demand".

Entry #546

Coding issue... bonus balls.

I have my build script down to a single function that works flawlessly for scraping the PA lottery website for the full histories of the following games...

Pick 2,3,4 and 5 mid and eve

Match 6, Cash 5, Treasure Hunt.

The issue... only able to grab the white balls from Cash 4 Life, Power Ball and Mega Millions.

Why? I used a regex to drop the non numeric data so I would skip storing the pick N wild ball data, and it was like using a chain saw where a scalpel was called for...

I need to come up with a way to not only read the bonus ball data, but to divert it to its own csv file...

The diversion is the easier part.
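One possible scalpel, sketched under assumptions: instead of deleting everything non numeric, pull out every number in order and split by position. The cell text format, `split_balls` and `route_row` are all hypothetical, and the sketch assumes the white balls come first and that the label text itself contains no digits.

```python
import re

def split_balls(cell_text: str, white_count: int):
    """Return (white_balls, bonus_balls) from one results cell.

    Assumes white balls come first and any bonus or wild ball follows them.
    A label that contains a digit would break this and need its own handling.
    """
    numbers = [int(n) for n in re.findall(r"\d+", cell_text)]
    return numbers[:white_count], numbers[white_count:]

def route_row(draw_date: str, cell_text: str, white_count: int):
    """Build the row for the main file and, if present, the row for the bonus file."""
    white, bonus = split_balls(cell_text, white_count)
    main_row = [draw_date] + white
    bonus_row = [draw_date] + bonus if bonus else None
    return main_row, bonus_row
```

With this shape the pick N wild ball simply lands in the second list and can be ignored, while the jackpot games write that second list to their own file.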

Gotta spend some time on this... I am almost there with the build script, and if it works, the update script will be easy.

Entry #545

Great coding day so far!

Starting with the first year of data for the PA Treasure Hunt (2007) is going well.

So far, I can use the requests library to fetch the url, target only the table where the results are stored, split the space separated numbers into a list, sort the entire list by date ascending, AND write it to a csv file... not a bad few hours work.
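The split, sort and write steps could look roughly like this. It is only a sketch: the fetch itself is left out, and the function names, the date format and the column names are assumptions rather than what the real script uses.

```python
import csv
from datetime import datetime

def parse_row(date_text: str, numbers_text: str, date_format: str = "%m/%d/%Y"):
    """Turn one scraped table row into [date, n1, n2, ...].

    The date format is an assumption; match it to what the page really shows.
    """
    draw_date = datetime.strptime(date_text.strip(), date_format).date()
    numbers = [int(n) for n in numbers_text.split()]
    return [draw_date] + numbers

def write_history(rows, out_file, header):
    """Sort rows by date ascending and write them as CSV, header first."""
    writer = csv.writer(out_file)
    writer.writerow(header)
    for row in sorted(rows, key=lambda r: r[0]):
        writer.writerow([row[0].isoformat()] + row[1:])
```

When writing to a real file, opening it with `newline=""` keeps the csv module from adding blank lines on Windows.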

Up next is creating a loop to go from 2007 to 2025, sort ALL data by date ascending, write the column headers (overlooked that on the first run) and create the entire history file literally in seconds!

This is the build script, where it goes from the first year of the game to current. The update version will only need to use the current year, but will also have to check the csv file for the last date and only grab what is new... still thinking about that logic...
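One way the update logic might go, as a sketch: read the newest date already in the file, then append only scraped rows dated after it. It assumes the first column holds ISO dates, the file has a header plus at least one draw, and there is one draw per date per file. Both function names are hypothetical.

```python
import csv
from datetime import date

def last_date_in_csv(path: str) -> date:
    """Return the newest draw date stored in the file (first column, ISO format)."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return max(date.fromisoformat(row[0]) for row in reader if row)

def append_new_draws(path: str, scraped_rows) -> int:
    """Append only rows dated after the last stored draw; return how many were added."""
    cutoff = last_date_in_csv(path)
    new_rows = sorted((r for r in scraped_rows if r[0] > cutoff), key=lambda r: r[0])
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for row in new_rows:
            writer.writerow([row[0].isoformat()] + list(row[1:]))
    return len(new_rows)
```

Running it twice in a row adds nothing the second time, so a repeated click on an update button would not create duplicate draws.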

Even if nothing else develops, I will be able to

1. Guarantee there were no human errors on my side recording accurate game history

2. Update ALL history files in seconds rather than killing an hour

3. Spot check each file at random spots to validate the data.

But there is so much more, already planning to take the master HNC classification function and be able to set it up for any game... ONE function, ALL games!

Getting the update script to work and the solitary function for all games are must have additions to begin moving forward with GUI development, and now that code is being written and tested, progress is being made! Ideas are becoming functional programs!

So there, PA Lottery, who needs a REST API when a little thought and coding get the same results... and I did not need to decipher that horrific RSS feed!

This is typical me, more excited about working code than actually winning anything...

Entry #544

The 2026 cycles

So, given the 10 week multi game cycle strategy, here is the breakdown...

Match 6 (4 week cycle) will get played 5 and 1/2 times

Pick 3 (3 week cycle mid and eve) will get played 5 times

Pick 5 (3 week cycle mid and eve) will be played 5 times

The pick N games are at $1 straight and the Match 6 is a single $2 ticket, replayed each week to generate new QP lines.

Cost is $14 per week, total cost is $728...

Scripts will only need to be run 16 times.

I have 8 weeks left on the pick 3 @$1 straight to catch a hit.

Total lottery losses for this year (counting the next 8 weeks @$14 each and the $20 millionaire raffle ticket) are under $200, not bad for as many chances as I have taken this year. I have already dismissed this year's losses. Anything won between now and Dec. 31st will go to fund next year, so 1 hit will cover 35 weeks and 2 hits will see 2026 fully funded AND this year end with a meager profit.

Let's GO!!!!!

Entry #543

History parsing script planning is making progress

So, in Python there is a library called Requests which can get a web page. There is also a library called Beautiful Soup which can parse the structure of web page data. This is how I get around the PA website not having an actual REST API to return JSON formatted data.

Once the structure of the results is determined, the data of interest will be stored into a data type like a Python dictionary, sorted ascending by date, and written to a csv file.
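To illustrate the idea without any third party install, here is a sketch that uses the standard library's `html.parser` as a stand-in for Beautiful Soup. The table layout (date in the first cell, numbers in the next) and the names `TableRows` and `rows_by_date` are assumptions, not the real page structure.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None outside a row
        self._cell = None   # text chunks of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Collapse runs of whitespace so "  1   9 14 " becomes "1 9 14".
            self._row.append(" ".join(" ".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made of <th> cells end up empty and are skipped
                self.rows.append(self._row)
            self._row = None

def rows_by_date(html: str) -> dict:
    """Map the first cell of each row (the date text) to the remaining cells."""
    parser = TableRows()
    parser.feed(html)
    return {row[0]: row[1:] for row in parser.rows}
```

The resulting dictionary can then be sorted by date and written out. Beautiful Soup would replace the parser class with a few lookups, but the flow stays the same.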

In planning, to capture ALL of the PA games, this will require 16 csv files, as PB and MM bonus balls will be isolated. Or 18 if I decide to keep PB double draw data as well.

For the sake of accuracy, I will construct each file as a separate run of the build script.

When it comes to the update script, this will read from the current year and only append draws greater than the last date in the csv file... this will update every game, all at once.

The actual script to classify frequencies is literally the same at its core, with different parameters for each game, so when designing the script, these will be passed as parameters, reducing the classification part to just one script. Keeping it modular!
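A sketch of what "one script, parameters per game" could look like. `GameSpec`, its fields and the parameter values are hypothetical, and counting by position is an assumption about how the frequencies are gathered.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class GameSpec:
    name: str
    positions: int   # numbers per draw
    window: int      # how many of the most recent draws to sample

# Hypothetical parameter sets; the real values belong to each game's script.
PICK3 = GameSpec("Pick 3", positions=3, window=150)
PICK5 = GameSpec("Pick 5", positions=5, window=150)

def position_frequencies(draws, spec: GameSpec):
    """Count how often each number appears in each position over the last `window` draws."""
    sample = draws[-spec.window:]
    counts = [Counter() for _ in range(spec.positions)]
    for draw in sample:
        if len(draw) != spec.positions:
            raise ValueError(f"{spec.name}: expected {spec.positions} numbers, got {len(draw)}")
        for pos, number in enumerate(draw):
            counts[pos][number] += 1
    return counts
```

Adding a game then means adding one `GameSpec` line rather than copying the script.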

Because of features like grid view dynamic layouts and its ability to be ported directly to Android, I am skipping tkinter and PyQt for the GUI and going directly with Kivy.

Still have to manually update and get a pick for the next P3 cycle, but a plan is in place!

Once the core functionality is in place, features like graphs will be added later. This will be a perfect sandbox to experiment with feature requests!

And that is what will be going on behind the scene while I am playing the cycles next year. Still time to have the PA lottery pay for it all...

Happy Coding!

Entry #542

The update script, side by side development.

There will be two versions of the script.

1. Build.

Using loops and passing in info such as the first year of the game (at its current matrix, where relevant), the game code and the csv file to create will allow a massive script to be generated that will create fresh and accurate full history files. This will also form the basis of the next version...

2. Update.

Here is the version that will make it into the app... it will pass in the name of the file to update, check the last entry for the date of the last update, and build a list of draws to append to each csv file... for all games!

Will test with a single year on build, then test with update for reading dates from files and appending only draws ahead of the last date point.
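The build loop could be sketched like this, with the scraping step passed in as a function so the loop can be tested on its own. `build_history`, `fetch_year` and `fake_fetch` are hypothetical names, and the fake data only stands in for real results.

```python
from datetime import date

def build_history(game_code: str, first_year: int, last_year: int, fetch_year):
    """Collect every draw from first_year through last_year, sorted by date ascending.

    fetch_year(game_code, year) is a hypothetical callable returning a list of
    [draw_date, n1, n2, ...] rows for one year; the real one would do the scraping.
    """
    rows = []
    for year in range(first_year, last_year + 1):
        rows.extend(fetch_year(game_code, year))
    rows.sort(key=lambda r: r[0])
    return rows

def fake_fetch(game_code, year):
    # Stand-in data so the loop can be tried without touching the website.
    return [[date(year, 1, 2), 3, 4], [date(year, 1, 1), 1, 2]]
```

Testing with a single year is then just a matter of passing the same year as both the first and the last.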

The goal here is to reduce the current process to a few button clicks and get results in minutes or seconds rather than the current time consuming manual processes.

It comes down to a vision... I have a vision of being able to run this app on a mobile device at home (or anywhere) and have results when I get to the kiosk. Then it does not matter the work schedule or other time commitments, a pick will always be readily available. It might be fun to cycle through pick 2 through pick 5, or alternate between match 6 and cash 5. Also to be able to flip between power ball and mega millions...

The app will also have the ability to check for wins in a cycle, and make a notice when a current cycle is about to expire. If I take my time with it, it might also track expenses with a nifty periodic profit and loss statement.

If I find some new or altered idea, it will be easy to update since I am creating the entire app.

Either way it will be good experience to create and maintain an entire code base for multiple platforms (Desktop and Android). Code, comments, documentation, maintenance... It is literally making my own tutorial!

Entry #541

Changes needed before the development of the desktop app

1. Create a scraper with the direct URL to the PA lottery results by year. This will be needed to make life easier and updates automated for the mobile version as well. When this is done, I will rebuild ALL history files from draw 1 to avoid any data entry errors over the years. And this time I will keep the double draws on pick N!

2. Include ALL of the PA games. Pick 2 through the big jackpot games.

3. Create a snapshot tool that captures the output to be reviewed when the cycle ends... where did the draws come from vs. where I thought they might have come from.

4. Keep the stand alone scripts for every version to test that the app generates the same exact data.

5. Plan to generate documentation... as the project grows in complexity, future me will need reminding!

That should keep me busy for a while...

Back to a new pick 3 cycle Thursday, using the highest all neutral line in the sorted output instead of the line in the dead center of the neutral lines.

The coin flip used to pick 666 for this week is not going so well.

I am excited by the possibility of clicking "update all" instead of the painstaking manual process!

Entry #540

Got a name for my project!

After overthinking it since the idea formed, I have finally picked a project name... CHANCE

Cold, Hot And Neutral Classification Engine. 

Describes exactly what it does! Does not say winning... which tracks with results... It presents the results of classification, which then must be interpreted.

Entry #539

Reviewing the neutral lines.

Had I played the highest all neutral line in the last cycle, I would have caught the rare evening pick 3 hit...

Hindsight is always 20/20...

As much as it would be fun to take another crack at the Cash 4 Life on a pick 3 win between now and December 31st, my goal is to have the state pay for 2026. A $500 straight hit will cover 68% of the year, or 35 weeks.

2 hits still cover the whole 12 months of play with $272 left over. That would be enough to add a shot at the Cash 4 Life and cut that 2025 loss to under $150.

The reason I am so focused on a fully funded attempt next year is that any win, be it $2 on Match 6 or $50,000 on Pick 5 will be 100% profit!

But this is why I took the time to properly review the last cycle. Each iteration provides new information.

Phase 1 brings the sets down to the neutrals and about 300 or less possible combos.

Phase 2 presents 10 combos from HHH to CCC

What I will call Phase 3 is selecting just 1 of those 10 combos.
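One reading of "10 combos from HHH to CCC", offered as an assumption since the list itself is not spelled out here: with three classes (Hot, Neutral, Cold) and three positions, ignoring order gives exactly 10 class lines. The function name is hypothetical.

```python
from itertools import combinations_with_replacement

def class_lines(positions: int = 3, classes: str = "HNC"):
    """List every unordered mix of classes for the given number of positions."""
    return ["".join(c) for c in combinations_with_replacement(classes, positions)]

print(class_lines())
```

The list starts at HHH, ends at CCC, and includes the all neutral line NNN.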

The selection process is similar for any game.

Other than winning, the goal of coding going forward is

1. Make all of the scripts into a desktop application

2. Convert that desktop app into a mobile app so I can update draws from anywhere and look at output without needing to be anchored to a traditional computer.

This project is no small undertaking, and will be in the works for many months. Keeping the draw histories as csv files will not even put a dent in the memory on this phone... it has as much memory on it as my initial hard drive on my laptop... 256 GB.

So from now through the end of 2026 the goals are set.

1. Win within the budget.

2. Develop the app so this party can go mobile.

3. Get a $50,000 pre tax hit so I can reward myself with a full year of LP Platinum membership.

Seems like a worthwhile pursuit. Win or lose, gaining the ability to engineer and develop both a desktop application AND a mobile application will be worth the time investment alone... gotta keep pushing past that comfort zone!

Entry #538

Skipping a week before next cycle

I need to properly analyze the last cycle, so taking a full week to do so. Still playing, and the coin flip determined 666 for this week.

Need to follow a few different avenues with the past data this week to better make a selection in the next cycle. Win or lose, playing every day in 2026.

Entry #537

408 mid day repeat in PA

Same combo for twice the fun? Hope someone had it, I did not. My 390 ends a 3 week stretch winless.

Entry #536

The occasional "off budget" purchase.

Yesterday the PA Millionaire raffle tickets started their sales, so I bought one. It was later in the afternoon, but was still among the first 5,000 sold.

That was $20

The last 4 numbers were 2 pair, so I played those 2 pair for 50 cents boxed day and evening for 7 days

That was $7

I also have to start a new cycle on the pick 3 tomorrow,

That will be $14

So this week I will spend $41 instead of $14.

Adding $27 to my projected year end loss still keeps it below $200

I have had worse loss years... but it isn't over yet... one win on p3 or p4 erases the loss.

Entry #535

A better explanation of what I am trying to do with my system.

Starting with pick 3, since that is the "simplest".

Numbers are random; they form a discrete uniform distribution when looking at past draws.

Frequencies, however, run from high to low (or low to high, depending on perspective)... frequencies are almost linear (almost but not quite)

By rigid inter quartile classification, I am sorting the numbers in the 150 draw sample into bins.

Every time I add new draws and run the script, the numbers change, but the frequencies stay in close proximity to previous runs.

So I am NOT looking at specific numbers, just targeting a specific frequency range to pick from, playing whatever numbers happen to populate that frequency range on that particular run. This is why I do not look at individual numbers... skips won't matter, even or odd won't matter, high or low won't matter, pairs won't matter... it is still trying to find a needle in a haystack, but based on the classifications, the haystack got smaller... 1:~300 vs. 1:1,000.
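A rough sketch of quartile based binning for one position's digits. The cutoffs (above the third quartile is hot, below the first is cold, everything else neutral) are an assumption about what "rigid inter quartile classification" means, and `classify` is a hypothetical name.

```python
from collections import Counter
from statistics import quantiles

def classify(sample, universe=range(10)):
    """Label each value in `universe` as 'H', 'N' or 'C' based on where its
    count in `sample` sits relative to the quartiles of all counts."""
    counts = Counter({value: 0 for value in universe})  # include values never drawn
    counts.update(sample)
    q1, _, q3 = quantiles(counts.values(), n=4)
    labels = {}
    for value, count in counts.items():
        if count > q3:
            labels[value] = "H"
        elif count < q1:
            labels[value] = "C"
        else:
            labels[value] = "N"
    return labels
```

Feeding it the digits from one position of the last 150 draws gives the bins for that position. The labels move from run to run as the sample changes, which is the point of targeting a frequency range rather than specific numbers.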

It is also why I make smaller history samples, because full history does not contain that volatility that changes the numbers with each run.

I know I might be using a wrong sample size, or a wrong window size, but it is not like there is a resource for determining such things.

To simplify, I have created a filter based on frequency. Classification is phase 1

I am still experimenting with how to interpret the data and make a better pick... this is phase 2.

I have applied this idea to multiple games so far. Pick 3, Pick 5, Match 6 and Cash 4 Life.

The other key element in this is play cycle... it is never going to give the next draw with any accuracy, so a range was determined for each game...

Pick N plays the same combo for 21 draws

Match 6 plays the same combo for 28 draws

Cash 4 Life plays the same combo for 30 draws.

What drives development is back test results... if picking from the 300ish possible neutral combos, there is over a 90% chance that at least 1 will appear straight in the next 21 draws... that is over the entire draw history of the PA Pick 3 evening, with 17,000+ draws to look at.

My level of difficulty is still high, as I am only playing one of those combos. Watching how the numbers churn through the frequency filter feels like almost seeing the mechanics of randomness itself. When pulling 17,000 randomly generated 3 digit combos from random.org, the same 90%+ is exhibited... this means that regardless of the source, ball draws, computer draws, or actual random numbers... the observation holds that frequency better explains the churn than the numbers themselves.
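For comparison, the baseline under pure chance can be worked out directly. This sketch assumes independent, uniform draws and a fixed pool of combos, which is a simplification of the back test, where the pool changes with every run. The function name is hypothetical.

```python
def chance_of_hit(covered: int, total: int, draws: int) -> float:
    """Probability that at least one of `covered` combos appears straight
    within `draws` independent, uniform draws over `total` combos."""
    return 1 - (1 - covered / total) ** draws

pool = chance_of_hit(300, 1000, 21)    # a pool of about 300 neutral combos
single = chance_of_hit(1, 1000, 21)    # one combo played for a 21 draw cycle
print(f"pool: {pool:.4f}, single combo: {single:.4f}")
```

Under those assumptions the pool figure comes out above 99% and the single combo figure at roughly 2%, which fits with the random.org data showing the same 90%+ and with the difficulty staying high when only one combo is played.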

Finding that needle is still vexing, but it is some progress with each cycle. I may never win anything big, but IF I do, this will be the way.

Entry #534

Preparing for the big games

As I was looking at the script for Cash 4 Life, after a successful update to multi level sort, I realized that I can't just add in the bonus balls alongside the white balls, the variables change... Mega and PB will require a separate bonus ball script for each.

The brute force approach to Cash 4 Life's cash ball, which is to play all 4 with the same white balls, becomes prohibitively expensive... even on their money!

One draw of Cash 4 Life, a $2 ticket, is $8... trying to cover all of the Mega Balls at $5 a ticket would be $120 PER DRAW!

The "cycle" for Cash 4 Life is 30 draws, which is why I need a pick 3 straight win to try it. Total cost is $240

The jackpot game cycle will be 26 draws (selected based on advanced plays) and therefore the costs would be staggering...

Mega Millions for 26 draws covering all 24 mega balls = $3,120

Power Ball for 26 draws covering all 26 bonus balls = $2,704

So it becomes more imperative to create a bonus ball classifier...

Power Ball (@ $4 per ticket) for 26 draws = $104

Mega Millions (@ $5 per ticket) for 26 draws = $130

The combination would be $234...  cheaper than an attempt at Cash 4 Life to play BOTH games...
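The cost figures above all follow the same formula, so a small helper (hypothetical name) can double check them using the ticket prices and cycle lengths already given.

```python
def cycle_cost(ticket_price: int, draws: int, bonus_balls_covered: int = 1) -> int:
    """Total cost of playing one set of white balls for a cycle,
    buying one ticket per covered bonus ball on every draw."""
    return ticket_price * draws * bonus_balls_covered

cash4life_all = cycle_cost(2, 30, 4)             # all 4 cash balls
mega_all = cycle_cost(5, 26, 24)                 # all 24 mega balls
power_all = cycle_cost(4, 26, 26)                # all 26 power balls
both_single = cycle_cost(4, 26) + cycle_cost(5, 26)  # one bonus ball each
print(cash4life_all, mega_all, power_all, both_single)
```

The last value is the $234 for playing both jackpot games with one classified bonus ball each, against $240 for the brute force Cash 4 Life cycle.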

Since Cash 4 Life disappears near the end of February 2026, an alternative is needed when house money becomes available.

The cycle would stagger because there are 2 MM draws a week to PB's 3.

As I am stitching the GUI together, the functions for white balls and bonus balls can be run sequentially in the same run.

It seems like I am getting ahead of myself again... still need a Pick 3 win to do anything... but even these scripts that I have written for several games take time to dial in and verify.

Tomorrow is the last draw of this pick 3 cycle... moving to the highest all neutral line for the next one. Going to stick with the 21 day cycle. 2 hits @ $1 straight needed to fund next year... 3 to turn a profit for this year.

So, this tweak to the strategy will be in play... IF there is a hit on one game, say the mid day, in week 1 or 2, the remaining weeks will be skipped and I will double down on the other game for the rest of the cycle... still the same $14/week... if they both happen to hit early, then subsequent weeks in the cycle will be skipped... that is certainly a budget friendly option...

Entry #533

Some of the odder spreadsheets I have made

The one that jumps out first is the one that imagines the draw to draw differences by position onto a grid, measuring both the angle (positive or negative) and the line length to connect those points.

I honestly had no idea what was going to be done next... but it was a valuable lesson in using the INDEX/MATCH formula. The angles and line lengths were pre calculated for every pair from 0-0 to 9-9. A lookup table was created, but instead of using XLOOKUP I went with INDEX/MATCH. Turns out each approach has its use.
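In Python the same kind of lookup table is a dictionary keyed by the digit pair. The geometry here is an assumption, since the spreadsheet's exact grid is not described: consecutive draws are placed one unit apart on the x axis with the digit as the y value. `build_lookup` is a hypothetical name.

```python
import math

def build_lookup():
    """Map every (previous, current) digit pair to (angle in degrees, line length).

    Assumes consecutive draws sit one unit apart on the x axis, with the digit as y.
    """
    table = {}
    for prev in range(10):
        for cur in range(10):
            rise = cur - prev
            angle = math.degrees(math.atan2(rise, 1))  # positive when the digit goes up
            length = math.hypot(1, rise)
            table[(prev, cur)] = (angle, length)
    return table

lookup = build_lookup()
angle, length = lookup[(3, 7)]  # plays the role of the INDEX/MATCH lookup
```

All 100 pairs are computed once, and every draw to draw difference after that is a single dictionary lookup.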

Another one was a literal manual version of machine learning, where I had generated a guess for each draw using follower data, then added weights based on measuring the error from my pick vs. the next draw, feeding the error measurement back into the pick system on a per column basis. That hit 2 weeks in but not again.

These are all for pick 3.

My old power ball sheet (the one lost in the old laptop) had a grid and lines connected for a simple visualization by position. I remember the first attempt tracked all on 1 sheet and was visually a nightmare... so I separated it into 6 sheets, one per position.

The sheets that I made macros in VBA for were replaced with Python scripts. I was never a fan of VBA. Back testing was not a pleasant experience when the picks were complex.

From the old Python scripts, there was the "buddy system", where you would generate the pick based on followers and then this script would analyze each follower and display the numbers that most frequently appeared with each position's follower. This one was for the PA match 6... it also did not win.

Then there were the latent follower sheets, and subsequent Python scripts... these looked at delayed followers in 2 ways...

1. LRS (Latent Repeating Sequences)

2. LOOPRS (Latent Out Of Position Sequences)

LOOPRS was too complex to manage in just Excel, the Python script proved via back testing that it did not operate at a profit.

LRS was the one I used to get a straight hit on the pick 4 ONCE. Was $1 st/$1 box for $5,400. Back testing showed it only worked 10 times in ALL of the game history, I was just lucky when it worked while I was on it... third pick! Gave up playing it after 1 month of losses.

My pick N sheets had a permutation checker, so I could enter a combo and see how many times it hit both straight and boxed.
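A small sketch of that kind of checker, assuming the history is a list of combos stored as strings. `count_hits` is a hypothetical name, and "boxed" here counts any draw with the same digits in any order, exact matches included.

```python
def count_hits(combo: str, history):
    """Return (straight, boxed) hit counts for `combo` in a list of drawn combos.

    Boxed counts every draw with the same digits in any order, including exact matches.
    """
    target = sorted(combo)
    straight = sum(1 for draw in history if draw == combo)
    boxed = sum(1 for draw in history if sorted(draw) == target)
    return straight, boxed

history = ["408", "840", "408", "123", "804"]
straight, boxed = count_hits("408", history)
```

Keeping the combos as strings preserves leading zeros, which matters for pick N draws like 048.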

What were the lessons learned? That numbers really have no value other than it is the symbol painted on the ball. Trying to base everything on the numbers alone was not helping. Hence my shift toward patterns of frequency.

It is sometimes interesting to see where you have been to help steer where you are going.

Entry #532