hypersoniq's Blog

Running out of coding optimizations, moving to the tests on the Pi.

It has been an exhaustive search, and I am at the point where I cannot refine or compact the code any further and still get the desired output.

The next phase is to update the draw histories and move the whole project to the raspberry pi and run the timed test.

The expectation is that the pure Python solution will produce nearly the same run-time estimate as before, which is not feasible. BUT... enter Cython!

Cython translates the pure Python code into C, compiled specifically for the device it runs on. That removes the overhead of Python, an interpreted high-level language, and reduces it to compiled C, which executes as machine code at the processor level.
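
The build step itself is small. A minimal sketch, assuming the hot loop lives in a module I'll call blotbot.pyx (the file name is my placeholder):

```python
# setup.py -- minimal Cython build script (module name is hypothetical)
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize(
        "blotbot.pyx",
        compiler_directives={"language_level": "3"},
    )
)
```

Running python setup.py build_ext --inplace then produces a compiled extension that imports like any other Python module.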

The first experiment... run the "cythonized" time-reporting version. From that, we can accurately predict the total run time.

The second experiment... IF that run time is reasonable, start the production code. On a rough estimate, IF Cython operates as documented, total run time becomes a function of clock speed, and the new ARM processor can handle roughly 2,400,000,000 instructions per second. Given the complexity of the code, this could run in approximately 23 days... over 1/4 TRILLION calculations without overclocking! I am willing to accept any run time under 60 days.

I believe I have done the software engineering properly, chosen the correct data structures, and implemented the algorithm as efficiently as possible. Memory management was also considered. By only storing the top 3 lists for each pass of 10 billion, storage requirements are low, and the I/O-heavy parts like writing to and reading from files are kept to the bare minimum (8 reads, 28 writes, 28 screen prints total).

There are no shortcuts: in the replacement genre, where you transform each digit of the last draw into another digit, there are exactly 10 billion possibilities. The game histories vary between 5k and 17k draws.
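
To see where the 10 billion comes from: a replacement table picks one digit (0-9) for each of the 10 index positions, so there are 10^10 tables. A quick illustration (not the production loop):

```python
from itertools import product

# One digit choice (0-9) for each of the 10 index positions:
# 10 ** 10 = 10,000,000,000 possible replacement tables.
all_tables = product(range(10), repeat=10)

# The classic "mirror" table (0->5, 1->6, ..., 9->4) is just one of them:
mirror = (5, 6, 7, 8, 9, 0, 1, 2, 3, 4)
```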

The order of operations

Pick 5 mid

Pick 5 eve

Pick 3 mid

Pick 3 eve

Pick 2 mid

Pick 2 eve

Pick 4 mid

Pick 4 eve

Hopefully it can launch this week...

Entry #282

Script progress

The script works!

Feels like an accomplishment already; even though the limited test only covered the first 100 iterations, everything worked as expected!

Every day brings new lessons. I removed the innermost function because I learned that in Python, inline code has much less overhead under the hood than function calls.
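
A quick way to see that overhead for yourself (a minimal sketch using timeit; absolute numbers vary by machine, but the gap is consistent):

```python
import timeit

def check(a, b):
    return a == b

# The same comparison, 10 million times, as a function call vs. inline.
fn_time = timeit.timeit("check(3, 7)", globals=globals(), number=10_000_000)
inline_time = timeit.timeit("3 == 7", number=10_000_000)

print(f"function call: {fn_time:.2f}s  inline: {inline_time:.2f}s")
```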

Last thing to do is run the test script with some timing code to accurately calculate the loop times.

There are 8 csv files containing draw histories and there are 8 output csv files that will hold the top 3 lists for each column for each game.

I decided to scrap the email notifications as well, as I do not currently have the time to learn another library.

Streamlined and simple. Once I have the times, I can continue to seek out more optimizations.

I do not have time info yet, but that is relatively easy to set up.

From the first draft's prediction of 9,000 years, I brought the pre-optimization code down to 9 years. A 1,000x speedup sounds impressive, but I need to do better.

I also need to run the test script on the Pi itself to get a true prediction. So I have made huge progress, but not quite ready to launch the program just yet.

There are several factors that have contributed to the optimization so far...

1. Removing the inner function in exchange for inline code. The potential savings are the overhead associated with 280,000,000,000 function calls.

2. Reducing the top 10 list to a top 3. This is a 70% reduction in the size of the heap. Fewer elements mean fewer comparisons.

3. Reducing the program screen prints to just 28 lines. No need to overwrite, as this is super clean... it just prints out the game name and column name before it starts looping. It is a tradeoff for run time: I will only know which column it is working on, not where it is, progress-wise, within its 10,000,000,000 iterations.

That is where I am at so far... progress has definitely been made. The Raspberry Pi has been set up and is waiting for the files to be sent so it can get going. This last stretch will probably be the most difficult as I attempt to gain a realistic run time. The program works... but can it work faster?

Then there is the ever-present reality that none of this makes a difference with playing and winning... but at least I will be able to say I have tried everything!

Happy Coding!

Entry #281

Planning to keep on with the follower system while the program runs across April.

Updated the draw histories and spreadsheets today. Took about 2 hours because I needed to change the lookup tables to run from 0 to 9 rather than 1 to 0, as the former is the output of the new program.

For whatever it may be worth, I noticed something I have never seen before on any single-shooter system I have worked on... a back-tested straight hit on the pick 5! 100,000 to 1 odds...

It may end up being I already have the highest optimized lists one could hope for, but the project continues... 

More reasons to believe the code will run in a reasonable time frame... I was looking at Python code on Stack Overflow (the best coding troubleshooting forum in existence) where someone had run a for loop ten billion times in just under an hour!

The complexity of the code I wrote will stretch that out significantly, but that is in line with early per-operation time estimates, where execution would take 32.45 days with my particular code... no Cython or overclocking required. The new Pi runs at a respectable 2.4 GHz out of the box.

Entry #280

Raspberry Pi 5 arrived! Checklist for launch...

The TO DO list...

Hardware (RasPi 5)

1. Install the operating system

2. Update the software

3. Install the required Python libraries

4. Enable SSH (secure shell) for headless operation.

5. Use secure FTP to transfer history files and the scripts.

The script...

1. Put the pieces together.

2. Pare down unnecessary operations.

This includes limiting the screen prints and csv writes to a bare minimum. The printing of the current list has been scrapped; output will only be the current game name and the column being processed. 28 prints and 56 writes... total for all 8 games.

3. Finish experiments with the email on completion function.

The process looks to be possible, with the early speed predictions coming out at about a 32-day run time for all 28 columns across the 8 games.

Will be using a separate output csv for each game so they can be put into play as they finish rather than wait the entire 32 days to get started.

There is no point in moving to another language, as all iterations must be done for completeness here; a C++ compiler optimizing away what it deems to be "unnecessary iterations" would defeat the purpose.

The 2 main functions are: checking for hits, by using the previous draw's digit as an index into a 10-item list and comparing the replacement to the next draw, incrementing a hit counter on a match; and then appending the counter to the current list and checking whether it makes the top 10 heap structure... as light as I can possibly code it while keeping the needed functionality.
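
As a sketch, the hit-checking half looks something like this (names are hypothetical; draws is one column of the history as a list of digits, table is the current 10-digit replacement list):

```python
def count_hits(draws, table):
    # For each consecutive pair of draws, look up the previous digit's
    # replacement and count a hit when it matches the next draw.
    hits = 0
    for prev, nxt in zip(draws, draws[1:]):
        if table[prev] == nxt:
            hits += 1
    return hits
```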

I have a rough guess at the run time, but no timeframe for the completion of the script... still a work in progress, but it is cool to be setting up the device that will run my magnum opus...

Entry #279

Won't 280 billion of any operation take too long to realistically execute?

If the core algorithm took even 1 second per iteration, the process would take almost 8,879 years... fortunately the algorithm, both checking for a match and checking the top 10, can be performed thousands of times per second.
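
The back-of-the-envelope math, as a quick sanity check (the 100,000 iterations per second rate is an assumption for illustration; it lands near the 32-day estimate):

```python
TOTAL_OPS = 280_000_000_000  # 28 columns x 10 billion tables each

# At 1 iteration per second:
print(TOTAL_OPS / (60 * 60 * 24 * 365))      # ~8,879 years

# At an assumed 100,000 iterations per second:
print(TOTAL_OPS / 100_000 / (60 * 60 * 24))  # ~32.4 days
```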

An accurate estimate still needs to be made with a timer on a limited set of maybe 10,000 iterations.

The timing of the algorithm is of utmost importance, so the time-consuming initial operations are only done once per run... loading a csv file into a pandas DataFrame... the actual match and sort are the heartbeat of the operation and are being optimized to run thousands of times per second.
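
A minimal sketch of that one-time setup (file and column names are hypothetical):

```python
import pandas as pd

# Done once per run: load the draw history csv into memory.
history = pd.read_csv("pick3_mid.csv")

# Pull each column out as a plain list of ints so the hot loop
# never touches pandas again.
draws = history["col1"].astype(int).tolist()
```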

Even with the optimizations in place, the estimate is still measured in months. How many months depends on my ability to make the code as streamlined as possible.

Entry #278

Productive coding session this morning!

Testing the puzzle pieces before assembling the final script.

Verified that the counter list mechanism works. Did that by printing out the first 1,000 lists.

Verified that the draw history file reading does what is expected, by using a column of test data with a known number of hits. Also verified that it appends the correct hit count at the end. This was crucial, as it is the back test part. Without this there could be no forward progress!

Currently working with a test script to create a heap data structure that returns the top 10 lists by hit count. This part has to go smoothly, as it will refine 10,000,000,000 tests down to 10 lists. It is how such a large project can stay within the memory limits of a Raspberry Pi 5.
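
A minimal sketch of that structure (names are hypothetical; heapq keeps the smallest hit count at index 0, so the weakest of the current top 10 is always the one that gets replaced):

```python
import heapq

TOP_N = 10
top = []  # min-heap of (hit_count, table) tuples

def consider(hit_count, table):
    # Fill the heap to N entries, then only replace the weakest
    # entry when a better candidate comes along.
    if len(top) < TOP_N:
        heapq.heappush(top, (hit_count, table))
    elif hit_count > top[0][0]:
        heapq.heapreplace(top, (hit_count, table))

# When the run finishes, sorted(top, reverse=True) yields the
# top 10 in descending hit-count order.
```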

After that, it will be a simple matter of putting the pieces together.

Entry #277

Raspberry Pi will arrive Thursday!

Getting geared up for the mother of all frequency analysis programs, the hardware is on the way!

Going to add a timer to the head and tail of the program, calibrated for days. A few weeks of tweaking and testing, then the giant program can be launched. Going to run the entire series of games at once, with each top 10 list of lists recorded to its own .csv file.

The timer will record the time stamp of the program start and program end.

I always end my scripts the same way, printing out the elapsed time and the phrase "SoniQ BOOM!"
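
The head-and-tail timer is only a few lines (a sketch using time.perf_counter, calibrated to days as described):

```python
import time

start = time.perf_counter()

# ... the full 280-billion-iteration run goes here ...

elapsed_days = (time.perf_counter() - start) / 86400
print(f"Elapsed time: {elapsed_days:.2f} days")
print("SoniQ BOOM!")
```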

The 8 files of column lists will be small, smaller even than my current set of history files. The home network will enable upload and download via secure FTP, and monitoring of the headless Pi via PuTTY.

Also included functionality to send me a text message when the run completes, as this should run for days if not weeks.

All of the code is optimized for the Pi's device limitations, with dual recursion being what keeps the project within memory specs. Not that memory management in Python is as big an issue as in C or C++. I have even solved the screen buffer issue by outputting only the game being analyzed and the current list, using the print function's ability to wipe and reuse the current line.
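
That single-line overwrite trick is just the print function's end parameter (a minimal illustration; the status values are hypothetical):

```python
game = "pick 3 mid"  # hypothetical status values
current_list = [5, 6, 7, 8, 9, 0, 1, 2, 3, 4]

# "\r" returns the cursor to the start of the line, so each status
# print overwrites the previous one instead of scrolling the screen.
print(f"game: {game}  list: {current_list}", end="\r", flush=True)
```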

Excited for the big show, probably less so with the expected results. It is definitely about the journey more so than the destination on this run... 280 BILLION iterations of an algorithm of my design. The ultimate back test of the entire genre of replacement tables all at once.

It will be mildly disappointing when the first weeks of testing continue to output losing numbers, but the journey...

Entry #276

Building blocks in Python

Toward the new experiment, pieces of the puzzle have been unlocked!

A loop to increment a list as if it were a counter has been tested. It works!
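
That counter loop is essentially a base-10 odometer (a minimal sketch; the helper name is mine):

```python
def increment(digits):
    # Treat the list like an odometer: bump the last digit, and
    # carry left whenever a position rolls over from 9 to 0.
    i = len(digits) - 1
    while i >= 0:
        digits[i] += 1
        if digits[i] < 10:
            return True   # no carry needed, keep iterating
        digits[i] = 0
        i -= 1
    return False          # rolled over past [9, 9, ..., 9]

counter = [0] * 10
increment(counter)        # counter is now [0,0,0,0,0,0,0,0,0,1]
```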

Passing a list from one function to another, along with a DataFrame column, has also worked.

Appending the hit count to the end of the list has also worked.

The current challenge is to create the running top 10 list of lists and have it populate and pop off the low list (by hit count). Going to work with the Python heapq (heap queue) library to enable this functionality.

Using a loop to increment the list has the advantage of being able to run the top 10 comparison within the loop, then print the top 10 list to both the screen and to a .csv file when done iterating.

The next challenge will be to figure out how to get the right comparison such that it looks at the row before, finds the number in the list at that index, and compares it to the next row to see if it was a match.

Then the final challenge is to put all of these parts together and solve the puzzle.

Then the hardware...

The Raspberry Pi 5 has an 8GB version that ships with a case that has a cooling fan, a 128GB micro SD card, a power supply, and a giant heat sink. These single-board computers can run constantly for decades... it is what they were designed to do! The board itself is $80, but the kit with all the parts runs $170. I already have a Raspberry Pi 3B, which only has 1GB of RAM. Both have a tiny form factor, about the size of a double deck of cards.

When the experiment is finished, I can use this new Pi as a better server for learning web technologies, so the investment has use beyond the lottery number-crunching hobby. I can also put the GPIO board on the older Pi and learn some IoT electronics.

The use of a loop to iterate the lists has one more advantage... I can set the start and end points, so a limited test can be run to check the validity of the software BEFORE getting into the 10 billion loops.
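
That start/end control makes the validation run trivial (a sketch; here islice caps the full enumeration at the first 100 tables):

```python
from itertools import islice, product

# Validation pass: only the first 100 of the 10 billion tables.
for table in islice(product(range(10), repeat=10), 100):
    pass  # run the hit check against the test column here
```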

Did I mention that the Pi can run headless? That means no monitor, keyboard or mouse required. Once set up I can log in remotely with a program called PuTTY to check progress and download results.

At the end of this, the largest coding project I have ever undertaken, even if it fails to produce hits it will still have been worth the effort, since the puzzle pieces learned here can be used to solve other problems. I sort of look at this as an exhaustive back test: a brute-force effort to run through 280 billion possibilities (all possible iterations of the replacement scenario, of which mirrors are but one), after which I can truly say I have tried everything in the attempt at a single-pick straight shooter system.

Entry #275

Considering a break in play while the new program is built and tested.

After some thought, I think it is best to hit the pause button on the dailies until the next phase is complete.

The spreadsheet has already proven it can pick a winner, and knowing there is more work to do that will require funding outside of playing tickets, it seems like a good time to put daily play on the shelf while development progresses.

The Raspberry Pi setup will cost $170. At $8 a day, it would take 22 plays to cover it. Play will continue when the coding and program runs are done.

The draw histories will need to be maintained, the code needs writing and testing, and I will still continue with the match 6.

The spreadsheets are already done and the dual recursion framework will be applied.

Not sure how long it will take, but not cutting any corners... full on software engineer mode!

Entry #274

Best List Of Ten Billion Observations Tested: aka BLOT BOT

Gotta give a new system a name, and with that the acronym BLOT BOT jumped out. That is 10 billion observations per column, across 28 columns and the entire draw histories, and the core development has already begun.

The biggest decision now is whether to pick up a Raspberry PI 5 or to run it on the cloud.

I have what I call a $5 server on a cloud plan that costs around $5 a month, mostly used for testing different Python scripts. I wrote a script that sampled the network fees on the Stellar cryptocurrency blockchain every minute for an entire week. The usage cost was about $3. The results gave an interesting dataset about fee spikes; if you were doing transactions, you could see the cheapest times of day.

I may go either way, but I kind of want to get a Raspberry Pi 5 anyway, to practice back-end web development as well. I have a Pi 3 that I used as a server for some Ruby on Rails tutorials.

Actually, a divide-and-conquer approach could be employed... run the day games on the Pi and the night games on the server! That would cut the run time in half!

Currently working on the list iteration part; the pandas DataFrames carried over from the follower system will handle the data storage for the history files.

One internal recursive function will handle the iterations and top 5 lists, and another will process each column for the zero counts.

Running from the command line will allow the print function to overwrite and stay on one line. The status display will show the column and list currently being processed, so when I monitor the output I can figure out the progress.

Long road ahead, but the wheels are currently turning!

As my favorite sidekick ChatGPT says at the end of every session... Happy Coding!

Entry #273

Planning the ultimate search for the top replacement values.

I had started with raw follower data, moved to a sequentially indexed "mirror" system, then moved to followers within that indexed system.

The basic premise: given the digit in the last draw, which replacement digit has the strongest history of making a match?

With follower data plugged in, the zero count (where there is a match) is the highest so far, but there has yet to be a hit.

The next thing to do would be to check ALL possibilities of replacement values for each digit to find the best one for each column of each game.

For the basic mirror system, the replacement scheme is

0=5

1=6

2=7

3=8

4=9

5=0

6=1

7=2

8=3

9=4

On one side is the index for the last drawn number, on the other, the number to replace it with.
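
In code, the whole scheme collapses to a single 10-element lookup (a small illustration):

```python
# Index = last drawn digit, value = its replacement.
MIRROR = [5, 6, 7, 8, 9, 0, 1, 2, 3, 4]

last_digit = 3
print(MIRROR[last_digit])  # -> 8
```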

The total number of possible combinations of replacement values comes out to ten billion (10^10)!

So, why not test them all?

For each column, that means recording the number of "hits". Running through all 28 columns of the mid and eve PA pick 2 through pick 5 data comes to 280 billion iterations of a recursive algorithm in total...

How do we do that without running out of memory? By only recording the top 5 sets for each column.

How to do it? Lists!

Starting with [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] and ending with [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]

Each iteration will append a zero count, so one of the top 5 output lists might look like [5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 750]. Each of these lists will be stored in a "list of lists" sorted by the appended zero count. Each new list is compared against it: if it has a higher zero count, it goes into the list and the lowest count gets popped off.

The output will be the sorted list of lists, containing the 5 highest zero counts. 140 total lists. The zero counts will allow sorting into the top 5 without needing to store them all.

The main design challenges...

Writing a list generation loop to generate the lists.

Looping through the history files and counting the "hits", and appending that number to the current list.

Sorting and ranking logic to maintain a running top 5 list of lists.

Perhaps writing output to a csv file in case the program crashes.

Setting up a system to run continuously. (Raspberry Pi version 5 should do the trick)

Fitting the whole system into my current recursive framework.

Will take some time and small tests to get the logic right. But that is the ultimate rabbit hole dive for any replacement system.

Happy Coding!

Entry #272

How this new system actually operates

The system has 2 components, a Python script and spreadsheets. There is a spreadsheet for each of the 8 games (P2, P3, P4, and P5, mid and eve).

There are also 8 .csv files containing a copy of all of the draw histories. This is what feeds the Python script with data. This script reads in the data and processes each column to find out what the follower distribution is for each digit.
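
A minimal sketch of that per-column follower tally (hypothetical names, using collections.Counter):

```python
from collections import Counter

def follower_counts(column):
    # For each digit 0-9, tally which digit follows it in the history.
    followers = {d: Counter() for d in range(10)}
    for prev, nxt in zip(column, column[1:]):
        followers[prev][nxt] += 1
    return followers

# e.g. followers[7].most_common(1) gives the top follower of a 7.
```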

There are 28 columns to process across the 8 games. The output is a distribution frequency for each column for each of the 10 digits. The inner functions run a total of 280 times over a total run time of 45 seconds. The output is used in the spreadsheet lookup tables.

Moving to the spreadsheets, the draw dates and draw history run down the far left. Next is the estimation function, which is the "guess" made by taking the last draw and replacing its digits with the corresponding values in the lookup table.

Moving over, there is a hit counter that checks whether the guess matched all digits in the next draw. This is followed by the error function, which tells me how far off each guess was, and in which direction. A -1 means the guess was one too high, and a zero means a match.
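
The spreadsheet's error logic, expressed in Python for illustration (the digits here are made up):

```python
guess = [4, 7, 1]      # estimation function's output
next_draw = [3, 7, 1]  # what was actually drawn

# Per-digit error: negative means the guess was too high,
# zero means a match on that digit.
errors = [n - g for g, n in zip(guess, next_draw)]
print(errors)  # [-1, 0, 0]
```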

Finally come the lookup tables for each column, populated with the 10 digits and their 10 replacements. Entering the replacement values allows me to see how the zero count changes.

In order to play the system, I have a picture of each of the lookup tables so picks are as easy as opening up the lottery website and processing the last draw of each game.

The values in the lookup column are composed of the most frequent follower for each digit based on the output of the python script.

Since follower data changes slowly, it is not necessary to run an update (a roughly hour-long process) before every draw.

I was going to post an image of the pick 2 mid table, but the blog won't take it (probably too big).

This system should suffice until I graduate in July and get some free time back. I probably learned more about actual coding on lottery projects than I did in classes, with the exception of the intro to programming class where I discovered Python was my favorite language.

The areas I am looking to improve are 

1. Updating the draw histories by parsing the PA lottery RSS feed or scraping their results pages. It is a shame they don't have an API like some other states; JSON data is much easier to handle in Python.

2. Making the Python script auto-populate the lookup tables on the spreadsheets.

3. Coding an Android app where I can set up all of the tables, pull in the last draws and generate picks for all games with one click.

In a nutshell, my system takes follower data and reduces it to a simple substitution of the last draw's digits, applied as easily as one would apply a mirror system.

Happy Coding!

Entry #271

Plugging follower data for all numbers into a "mirror" type replacement strategy

It only took a short time to modify the existing Python script to loop through all digits for all pick-n games and gather the top followers.

It took a bit longer to run through the process of counting the zeros (hits) in each column, but because the follower data created so many more zeros than the modified mirror system, it looks like a valid upgrade to the existing software.

The pick 2 was the obvious starting point; by the time the first game (mid day) was done, I had established an updated workflow that reduced the previous update process by 80%. Script run to finished spreadsheets for all games in less than 1 hour. The longest part was the draw history updates and making the pick 2 .csv files for the Python part.

The year recap so far...

January saw the follower script in its original form, which had to be run daily to get picks. This system was winless for the majority of the month.

End of January through yesterday saw the new "mirror" system in play. This greatly reduced the need for updates and also picked a winner straight on the pick 3.

Today... the merging of the 2 systems. Keeping with the machine learning concept of seeking a global minimum of error for each column (the highest number of zeros) and the ease of use of the mirror system, while integrating the observed frequency of follower data. This will be tested throughout the rest of this month and March.

That pick 3 hit basically returned all of my money spent on both previous systems and will still fund the first 10 attempts of this new system.

Still looking for that elusive pick 5 straight hit. I believe that this progression will get me closer to that goal.

The ultimate refinement of this software structure would be to go through all 10 billion iterations of possible lookup table values, but I realize my coding ability is not quite there yet. One day...

Happy Coding!

Entry #270

First "hot week" passes with no hits...

No wins, but a reevaluation of the parameters leads to a shorter duration (4 days rather than 7) and only going 5x on the pick 5 and 2x on the others.

Hoping to get the sheets updated this weekend and re-run the zero counts.

The follower table Python script should be ready to mix into the testing.

Ever forward...

Entry #269

System first hit! And it was not the pick 2...

The mid day draw in PA was 0 2 2; my pick for that draw was also 0 2 2. This is the first hit of the system! $1 pays $500.

This will trigger (and fund) a "hot week" where the plays are 5x for the next 7 plays (not days, as I don't play 7 days a week).

I can also take back what I spent so far and finish up this month on house money.

The reality check is that this may be the only win this system ever produces, but it also could have produced zero, so the work put in was justified beyond just the learning experience.

$42 a week for 4 weeks (counting the Match 6) means $168 spent so far. A "hot week" has a price tag of $280, leaving $52 of house money. I shouldn't really count the Match 6 in this system, because that is played on a ticket good for 26 days that was already paid for and runs through the first week of March, but I am counting out-of-pocket expenses to include all tickets. That $52 will go further, as it will just buy daily games.

The easiest way to play is to keep the win on a voucher and use it to play until I get down to the amount spent, then cash it in.

Here is hoping the other games heat up during the hot week!

Happy Coding!

Entry #268