hypersoniq's Blog

Now what?

So, since my big idea fizzled out, I think I will return to Python and finally figure out how to read PA's pathetic RSS feed and turn it into a way to automatically update all my draw files. I think I will also include the current Cash 5 matrix, just to have the data in case a jackpot game idea comes along.

So, I guess play-wise I will start back with the followers, because out of the 10,000,000,000 possible replacement strategies... they were #1.

Using the pick 2 system limits out-of-pocket expenses to $2 per day played; the other games have to wait for a hit to get action... how can hitting a 1:100 game prove to be so difficult?

I firmly believe they could have a pick 1 and I would be wrong 90% of the time...

Parsing arcane data structures from the PA lottery and making them usable in Python, writing output to 12 different CSV files... good times 🙄
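
If I go the standard-library route, the skeleton is simple enough. Here is a rough sketch (the feed URL and the way each draw is packed into the item fields are placeholders, since the real feed still has to be reverse engineered):

```python
# Rough sketch: pull an RSS feed and append draw results to a CSV file.
# The URL and the <title>/<pubDate> handling are hypothetical stand-ins.
import csv
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://example.com/pa-lottery.rss"   # placeholder, not the real feed

with urllib.request.urlopen(FEED_URL) as resp:
    root = ET.fromstring(resp.read())

with open("pick2_eve.csv", "a", newline="") as f:
    writer = csv.writer(f)
    for item in root.iter("item"):
        title = item.findtext("title", default="")     # e.g. game name + digits
        pubdate = item.findtext("pubDate", default="")
        writer.writerow([pubdate, title])   # real digit parsing would go here
```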

Happy Coding!

Entry #297

What was learned...

After analyzing the top 3 lists in the one pick 2 game that ran, not only was the follower list the top list, but lists 2 and 3 were already in the follower output as the 2nd- and 3rd-place distributions anyway.

I learned that the math behind the followers was already the answer to the question of how to generate the most column matches in a game.

The logic behind followers (which digit most often follows any given drawn digit) was the only choice, as it yields the most frequent follower for each digit. Since this is all based on frequency, the top-performing list of the ten billion was the one with the most matches.
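
The core of that follower logic fits in a few lines. A minimal sketch, assuming one column of draw history (the sample digits below are made up):

```python
# Count, for each digit 0-9, which digit most often comes next in one column.
from collections import Counter

column = [4, 7, 4, 1, 4, 9, 7, 4, 1, 0]   # hypothetical draw column, oldest first

followers = {d: Counter() for d in range(10)}
for prev, nxt in zip(column, column[1:]):
    followers[prev][nxt] += 1

# Most frequent follower per digit (digits never drawn get no entry).
top = {d: c.most_common(1)[0][0] for d, c in followers.items() if c}
print(top)   # {1: 4, 4: 1, 7: 4, 9: 7}
```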

So, I basically saved enough money by not playing to cover the cost of the Raspberry Pi 5. The downside is that I had suspected this outcome was the likely one. Follower frequency analysis was no magic wand... it only takes you from the expected 10% chance on a digit to maybe 14% in the best case.

Today I will start with a clean slate on the pick 2, per the revised betting strategy. Not expecting much, as so far I have never won the pick 2.

What did I learn, outside of the answer I had a feeling I already had?

1. Code optimization. I learned plenty about the kind of overhead Python can have, particularly with variable typing.

2. Code conversion. I had a successful conversion from Python to C.

3. Connecting remotely to another device, transferring files and controlling from a remote SSH connection.

4. How to use, debug, and tweak the GNU C compiler. I only had one segmentation fault, caused by a missed memory cleanup at the end of the program.

5. How to recreate the functionality of a pandas DataFrame in C.

It was an interesting few months, for sure. Who would have thought that the original program I wrote for followers already held the answers? I do not have to run the full script on all of the games, because all 3 of the top lists were already visible in the output of a program I wrote back in January, over just a few days, that runs in Python in about 90 seconds.

Because I play around 4 days a week, this will actually be cheaper than the Match 6 game, which I play every day (by buying a ticket good for 26 draws at a time).

It was fun, but now I need ideas for the jackpot games... I can write fast Python programs and can now convert them to C if they require heavy calculation... I am just out of ideas!!!

Gladly entertaining all wild speculations and ideas!

Entry #296

Rebooting the follower replacement system with a newer, cheaper approach.

So, as mentioned earlier, the whole out-of-pocket expense comes from the pick 2... that's it. Any other plays will be funded by a pick 2 hit.

After working out some scenarios, it seems like the better bet is to completely eliminate the pick 4, even on their money.

Also, I want to mix in the pick 5 sooner, so the 4 played days after a pick 2 win would be:

Eve... 1 pick 2, 4 pick 3's, 1 pick 5 ($6)

Mid... 1 pick 2, 4 pick 3's, 1 pick 5 ($6)

Four days at that rate ($12 per mid/eve cycle) is $48, so the last $2 from the $50 pick 2 win would go to the next "only pick 2" play.

That puts the whole system at its cheapest yet, well under the estimated $730 spent on Match 6 tickets over the year.

That's all I have for now... still no clue where to start analyzing the BIG games... deflated yet again...

Entry #295

C speeds things up by almost 4,000x... the verdict is in

After the C conversion, the speed boost was over 4,000x compared to Python, and guess what the top list was... it exactly matched the list I generated from follower data...

So for the first hypothesis, we can say that you might be able to bump your luck up from 10% to 14%, which means we can probably accept the null hypothesis that number systems do not increase your odds of winning in any meaningful way.

We can also reject the null hypothesis that the best results would NOT come from follower data. Out of 10,000,000,000 candidate lists scored against the winning numbers, the best list matched the follower data exactly... I mostly went through this exercise for nothing but the education.

It took 28 hours without overclocking, and that was only analyzing the pick 2 evening data. The same result in Python would have taken 13 years...
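
As a rough sanity check on that figure, 13 years over 28 hours works out to a factor of about 4,000:

```python
# Quick check of the claimed speed-up, using the figures quoted above.
python_hours = 13 * 365 * 24   # ~13 years expressed in hours
c_hours = 28
print(f"speed-up: ~{python_hours / c_hours:,.0f}x")   # speed-up: ~4,067x
```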

On to the next idea... awaiting inspiration.

Happy coding!

Entry #294

Getting a first look at the C conversion of the Python program today.

Having gotten ahead on homework for the class, I am able to carve out a few hours before work to start setting up for the conversion to C.

I will need to start by finding the include path for the pandas C headers, which I located in a local folder. I already have an idea of the CSV read mechanism; I still have to figure out CSV writing in C, and how to size the malloc() calls to allocate just the right amount of memory.

I have to explicitly declare the variable types; nothing requires more than a short int. I would go with char, but the appended match count can be as high as 1,800 (or more).

Whatever memory cleanup I need to implement, and whatever pointers I need to iterate through each column, will be the focal points today.

Compiling and clearing errors will probably be the workflow from there until a successful run of the test program. Only then can I take the part of the program that counts to 100, modify it to count to 10 billion, and remove the timers. The outputs will be active as well; I disabled the CSV writes for the last timer test but re-enabled them to make sure the tests are complete.

This is one of those times when I am glad I drew up a flow chart first, so the logic can be converted into another language more easily.

It is not going to be as quick a journey as developing in Python, but I am ready for that.

Entry #293

Pandas can be used in C!!

There are C extensions for the pandas library! This means there is no need to reinvent the wheel (or the DataFrame that holds the draw histories).

That was found in the pandas documentation, so I just have to download the pandas C extension headers and put the directory on the include path when compiling.

This is huge, as the initial algorithm can stay intact.

There are also CSV file write routines that are functionally equivalent to their Python counterparts.

I am researching the operations to have a better shot at first-time success with the translation from Python to C. Not as scary as first imagined.

Entry #292

The Python to C roadmap...

It looks daunting...

Memory management is the first part that has me concerned. The fix looks to be divide and conquer...

Garbage collection... that should be fun.

Pointers... this will most likely be the mechanism for column traversal.

The solution for the test program is to run individual tests with data of the same length. So we will start with the pick 2, as it has the fewest operations, and split it into eve and mid as well.

The production software will be 8 programs, with memory allocations matched to the data size of each game.

Runs will most likely be sequential rather than parallel, since that eliminates the possibility of threads vying for the same memory blocks and corrupting the data.

These can be set to execute in sequence using the built-in Linux cron job scheduler (see the sketch after the list below).

The order will consider the play strategy which has been modified to represent the least possible expense.

Pick 2 Mid

Pick 2 Eve

Pick 3 Mid

Pick 3 Eve

Pick 5 Mid (the goal)

Pick 5 Eve (the other goal)

Pick 4 Mid

Pick 4 Eve
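
Putting the sequencing and that order together, a minimal sketch of a wrapper that one cron entry could launch nightly (paths and binary names are placeholders):

```python
# run_all.py -- run the compiled per-game programs one after another.
# A single crontab line (e.g. "0 2 * * * python3 /home/pi/run_all.py")
# could launch this wrapper each night; the names below are hypothetical.
import subprocess

PROGRAMS = ["pick2_mid", "pick2_eve", "pick3_mid", "pick3_eve",
            "pick5_mid", "pick5_eve", "pick4_mid", "pick4_eve"]

for prog in PROGRAMS:
    subprocess.run([f"./{prog}"], check=True)   # stop the chain if a run fails
```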

The new play strategy...

Pick 2 only until a hit.

On a pick 2 hit, the next 4 plays are:

Pick 2 x 1

Pick 3 x 5

That is $12 per mid/eve cycle, leaving $2 to get back to just the pick 2.

On a pick 3 win, we deal in the pick 4 and pick 5:

Pick 2 x 1

Pick 3 x 5

Pick 4 x 1

Pick 5 x 20

That is $54 per mid/eve cycle for the next 4 played cycles... $216 on house money. Then it drops down to 4 cycles of the pick 2 win strategy ($48), plus $2 to get back to pick-2-only play, for a total cost of $266. Taken from the pick 3 hit profit of $2,500, that leaves over $2K.
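
Rolling those figures up:

```python
# Cost roll-up for the window after a pick 3 win, using the figures above.
full_cycles = 4 * 54        # 4 mid/eve cycles with pick 4 and pick 5 dealt in
pick2_win_cycles = 4 * 12   # 4 cycles of the pick 2 win strategy
total = full_cycles + pick2_win_cycles + 2   # +$2 to return to pick-2-only play
print(total)                # 266
```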

The ONLY out-of-pocket expense will be the $2 for the pick 2 cycle! That is a 75% reduction in regular play expenses.

It greatly reduces the exposure of the plays on the pick 5, which is the target, but part of the exercise was to develop the system into an entire play strategy that minimizes out-of-pocket expenses while still having the potential for a decent profit. If it can't beat 1:100 odds, then there is not much point in going after 1:100,000.

Entry #291

Double checking the math...

I do not have a way of knowing how many operations to expect, but I can compute clock cycles based on the CPU speed and the run time of the test.

As written in Python, the elapsed time of the test run was 190 seconds. In one second, a 2.4 GHz processor goes through 2,400,000,000 cycles.

So, 190 x 2.4 Billion = 456 billion cycles.

To extrapolate that into the full run, since the test only did the first 100 iterations of 10 billion, we need to multiply that answer by 100,000,000.

The answer is a staggering 45.6 quintillion cycles... again, as written in pure Python.

That is 45,600,000,000,000,000,000!

That is why the run time estimate is 554 years.
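
A couple of lines of Python make the scaling explicit (the exact year count depends on which timed run you plug in; the 190-second test lands in the same ballpark as the 554-year estimate):

```python
# Extrapolating the 100-iteration timing test to the full 10 billion iterations.
test_seconds = 190                  # timed Python run of the first 100 iterations
clock_hz = 2.4e9                    # 2.4 GHz processor
cycles = test_seconds * clock_hz * 100_000_000        # ~4.56e19 cycles
years = test_seconds * 100_000_000 / (3600 * 24 * 365)
print(f"{cycles:.2e} cycles, ~{years:.0f} years")     # 4.56e+19 cycles, ~602 years
```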

That is why Python, with its seemingly minuscule overhead when running short scripts, is the WRONG tool for this job.

Why?

1. It is interpreted. To have a chance at a full run, this needs a language compiled to native processor code.

2. Dynamic typing... Python infers the data type at runtime. This makes it very flexible, but that flexibility comes at a compute-cycle cost. What this project needs is a static type system where the data types can be set explicitly. That will save a ton of overhead versus constant re-evaluation of the same variables.

3. The algorithm is tested and optimized as far as I can take it and still get the desired results.

 

The two leading candidates for a new language for this project are C and Rust. However, C seems like the most likely choice for the job.

What are the drawbacks of using C?

1. I am not very familiar with C outside of a few programming exercises in school and using an Arduino, whose sketch programs are C-like.

2. Memory management, including allocation and release, will now be on me rather than an interpreter.

3. I also have to deal with pointers and handle my own "garbage collection" by hand, since C has none built in.

4. I have to manually create a data frame structure, since I will not have access to the pandas library.

5. I still won't know the run time until the test program runs in C; this could all be for nothing.

There are many challenges ahead, but many have already been met: a working algorithm exists, and the flow chart will be of great use in converting to C. The rest of the system, the spreadsheets for validation and implementation, already exists.

All I can do now is move toward the next solution. I have no idea how long it will take, but giving up when faced with that ludicrous cycle count is not an option... this is fascinating stuff!

The kicker is that even with a successful run, it will still probably not help pick winning numbers. One plus is that the memory footprint was already reduced to the bare minimum when deciding to use the Raspberry Pi 5: the generated CSV files are small, and the main loop does not hold data from each pass, only a single incrementing integer that counts matches. When the next column is scanned, the variables reset.

The decision to move completely from Python to C was not taken lightly, and it would not have been made if the algorithm did not work.

Might be on that borderline where hobby meets obsession...

Happy Coding!

Entry #290

A formal statement of the problem the project aims to address.

The BLOTBOT project is a thorough attempt to analyze per-digit replacement systems for selecting lottery numbers. One popular variant is the mirror system.

The null hypothesis: there is no statistical advantage to be gained by studying past draws to predict future results using per digit replacement in a pick N lottery game.

The alternative hypothesis is that there IS a statistical advantage to be gained by studying past draws to predict future results using per digit replacement in a pick N lottery game.

Where we plan to deviate from the scientific method is by exhaustively testing all possible variations rather than sampling.

There is a second hypothesis also being tested.

For this part...

The null hypothesis is that direct follower data will not be the highest performing list in the 10 billion lists possible for each column of each game.

The alternative hypothesis is that direct follower data WILL be the highest performing list in the 10 billion lists possible for each column of each game.

So, with one massive test, we can make an honest attempt at answering both hypotheses.

If the first null hypothesis is accepted, then there really is no point in continuing on the current path. That would be the indicator to maybe back away from the daily games for good. Not sure yet. It may be the indicator that I am just not smart enough to beat the lottery at their game, and that I should employ other techniques, like unsupervised machine learning, to help find patterns that I fail to see.

On the other hand, working to get to this point has allowed me the opportunity to put some of the theory I learned in classes like algorithm design and software engineering into direct use. I am not the type to refuse to admit I was wrong; I have been studying the lottery for decades and have spent more time thinking about the problem than actually playing.

It is that burning desire to solve problems that kept the chase alive so far. I want to know, even if it means confirming that the chase is a waste of time and I should move on to something else as a hobby. As always, time will tell.

Entry #289

Other promising ideas to save the project.

All of the run times so far are based on pure Python code running on one core. The Raspberry Pi 5 has 4 cores, so if I incorporate parallelism in some way, that could be an immediate 4x reduction in run time.

Since pandas is a single-threaded library, it might make sense to let one core handle the day and night draws for each game, with 4 separate instances of the same script. At a pure-Python level, that should bring the 554 years down to 138.5 years. The low end of the PyPy scale shows a 7x improvement over the same code run in straight Python; that would bring it down to somewhere near 20 years... not bad after starting at 3,000!
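
A minimal sketch of that one-process-per-core idea, using the standard multiprocessing module (the game list and the analyze() body are placeholders for the real per-game script):

```python
# One worker process per Pi 5 core; each worker handles one game (mid + eve).
from multiprocessing import Pool

GAMES = ["pick2", "pick3", "pick4", "pick5"]

def analyze(game):
    # Placeholder for the follower/replacement analysis of one game.
    return f"{game} done"

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        for result in pool.map(analyze, GAMES):
            print(result)
```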

Obviously some research will be required, but that makes some form of C subroutines even more attractive.

Not giving up easily, not giving up at all!

Happy Coding!

Entry #288

The number of operations is staggering...

Just for the pick 3 evening game, which has the longest draw history, the code will step through the draw data in each column (over 17,000 draws) 10 billion times. That is per column... 3 columns, plus evaluating the current outcome against the top 3 list, means that this one game will take an estimated 500 TRILLION operations.
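
The arithmetic behind that estimate, in two lines:

```python
# Rough operation count for the pick 3 evening game alone.
draws, lists, columns = 17_000, 10_000_000_000, 3
print(f"{draws * lists * columns:.1e}")   # 5.1e+14, i.e. roughly 500 trillion
```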

This is why the run time estimate is still in the hundreds of years.

Cython is going to take much more time and work than originally thought. I may end up having to set it up on Windows just to do the development.

There is an interim technology I plan to try called PyPy, a just-in-time Python interpreter that runs in place of CPython and executes Python code directly, with no C conversion necessary. It generally offers around 7x speed-ups, but in the case of what I am doing, it could be hundreds or thousands of times faster. Or not. On its face, the Python program cannot be fully run, and although I have taken the estimated run time down from 3,000 years to 554 years, it is obviously too ambitious a project... for now.

As of right now, it was a nice thought exercise, but it is not practical to run. I am not giving up just yet, but running over a quadrillion operations is not in the cards with Python.

Other languages to consider are obviously C, but I am taking a good look at Rust as well.

The Cython output was way too fragmented to learn from, but now that I know the algorithm, it is just a matter of doing the same thing in another language. As a result, this will drag out the process much longer than I had hoped, and now that the next class has started, I might not be able to even look at it for the next 9 weeks.

Oh well, it has been an educational journey so far for sure.

Entry #287

Reframing the big test, and resuming play while waiting.

After thinking about how the follower system is actually calculated, based on the most frequent follower of each digit in each column, it is almost certain that I am already using the data the test program would produce as its highest count.

As a result, the test may be reframed to just run the pick 2 and see if that is correct. I still need to run the C conversion via Cython, but that was happening anyway.

So the new plan will be to use the numbers I have now and build an "additive" strategy...

Going to resume playing, but only the pick 2.

IF there is a hit, the pick 3 will be added UNTIL the free plays from the pick 2 win run out, then it is back to just the pick 2.

IF there is a pick 3 hit in the pick 2 win window, then the pick 4 and pick 5 will be added.

This keeps out-of-pocket expenses (which they will be, since I used the last of the pick 3 winnings to buy the Raspberry Pi 5 kit) to a $2 maximum per played day. As I usually play only 4 days a week, that makes the system cost $8 per week rather than $8 per day... a 75% reduction!

I already feel better about the new strategy, because the play budget is now as low as I can make it while still actually playing.

Cython mastery will come in handy for far more than the lottery, so I will be super motivated to keep learning all about it. It is common practice for data scientists who use Python to write C subroutines for intense calculations on large data sets. It will also give me practice writing C code outside the scope of Arduino sketches.

So the new goal is to run the ten billion tests on only the pick 2, to test my hypothesis that the follower data, which I can generate in 90 seconds, will indeed be the highest-matching list of the top 3. If it is not a match, then I am honor bound to run the rest of the tests.

I still need to hope for Cython to make even the pick 2 test runnable in a lifetime... therefore luck is still a factor.

Happy Coding!

Entry #286

Still learning all about Cython

Progress is slower than anticipated. Cython may be the last-ditch effort to get the program to run in a reasonable amount of time, but it is not as simple as the examples made it look.

Generating the C file was straightforward enough, but when using gcc to compile it, there was a missing Python.h header that I had to include in the compile command.

Figured that out; now I am stuck with literally hundreds of "undefined reference" errors thrown at the link stage.

From the research so far, it looks like the Pandas library headers must also be included.

I also have to rerun the setup file and include the --embed flag when generating the C file.
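
For reference, the setup file itself is only a few lines. A minimal sketch (the .pyx module name is a placeholder), with the standalone-executable route noted in the comments:

```python
# setup.py -- minimal Cython build script (module name is hypothetical).
# For a standalone executable, the C file can instead be generated with
#   cython --embed blotbot.pyx
# and then compiled with gcc, adding the Python headers to the include path.
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("blotbot.pyx"))
```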

With these hurdles to clear, I am left with several options, the two most likely being:

1. Continue to resolve reference errors until it works.

2. Write the program directly in C.

I may end up just writing directly in C, but Python is the language I solved the problem in... it is my go-to, though it is not very efficient at fast execution because it does not let you explicitly declare data types. The constant type inference on the 19 variables in the code is definitely a bottleneck.

Again, if it were easy everyone would be doing it.

On the Pi itself: awesome little computer! I set up SSH so it can run without a monitor, keyboard, or mouse.

Transferring the files is accomplished with FileZilla over SFTP, and PuTTY was not necessary, as SSH is built right into Windows PowerShell!

There is a program that allows connecting to the graphical side (X windows), but we are going for low overhead, so the headless setup is all that is needed.

2 classes left; the next one starts Thursday and looks daunting... "Data Mining and Machine Learning"... so my experiment time will be limited.

I could have given up when the first timing test indicated 3,000 years. I could throw in the towel at the current estimated Python run time of 554 years... almost a 6x boost... but that is not the goal.

Ever forward...

Entry #285

Why multiple tickets for the same draw do not divide the odds

It is common practice to look at a ratio and reduce it. In pure numbers that is just fine, but in the lottery these numbers represent something, and therefore they should not be reduced.

Let's use the simplest example possible... a pick 2 game.

Posted odds are 1:100

The one represents your ticket, also known as the favorable outcome.

The 100 represents the draw, or the possible outcomes.

Buy one ticket, such as 45, and your odds are 1 favorable outcome against the 100 possible outcomes.

Buy 2 tickets with different numbers... say 45 and 71, and your odds ratio is 2:100

Some would say 1:50, but in every draw the number of possible outcomes stays at 100.

To extend the example, say you spent $50 to buy half of the possible outcomes... your odds are now 50:100. This is why they only pay out at half the odds... IF you win, it pays $50, and you have earned nothing. And if the draw lands in the group of 50 that you did not cover, they took your $50.
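
Put in expected-value terms (a quick sketch using the $1 ticket and $50 straight prize from the example above):

```python
# Expected value of covering half the outcomes in a $1 pick 2 game.
tickets = 50                   # 50 different numbers at $1 each
p_win = tickets / 100          # one of your numbers hits half the time
prize = 50                     # a straight pick 2 hit pays $50
ev = p_win * prize - tickets   # 0.5 * 50 - 50 = -25
print(ev)                      # on average, you lose $25 per draw
```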

This is why I take the more difficult route of trying to pick just one combo to play in a draw.

You win far less, but it feels like you have accomplished something when you get a hit. Plus you spend far less money in the process.

The same holds true for jackpot games.

Throwing extra money at a single draw does not significantly increase your odds of winning; it is better to spread the same money out over multiple draws.

Best of luck!

Entry #284