hypersoniq's Blog

Finished the 6 intro to Python challenges!

Rosalind.info is somewhat similar to freeCodeCamp in that it gives you challenges with clearly defined input and output, and it is up to you to figure out how to make it work. Your result is measured as pass/fail. Probably NOT the place to start if you are new to coding, as you would need another resource to figure out the components of the challenge solutions. They give you 5 minutes after downloading challenge data to upload your solution.

There were 6 challenges, each of increasing complexity. The interesting part was the number of times each challenge was solved.

Challenge 1 solved 62,282 times

Challenge 2 solved 50,502 times

Challenge 3 solved 41,878 times

Challenge 4 solved 36,005 times

Challenge 5 solved 27,686 times, and

Challenge 6 solved 23,541 times
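To put the drop-off in perspective, here is a minimal sketch that turns the six solve counts listed above into percentages. The function names are made up for illustration; the counts are the ones from the list.

```python
# Solve counts for the 6 intro challenges, in order (taken from the list above).
solved = [62282, 50502, 41878, 36005, 27686, 23541]


def retention(counts):
    """Return each count as a percentage of the first count."""
    first = counts[0]
    return [round(100 * c / first, 1) for c in counts]


def step_dropoff(counts):
    """Return the percentage lost between each challenge and the next one."""
    return [round(100 * (1 - b / a), 1) for a, b in zip(counts, counts[1:])]


print("Percent of challenge 1 solvers:", retention(solved))
print("Percent lost at each step:", step_dropoff(solved))
```

By this count, a bit more than a third of the people who solved the first challenge also solved the sixth.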

The pattern is similar with the other challenge areas. The next section I am attempting is the "Bioinformatics Stronghold", which has over 100 challenges. The easiest has been solved 75,714 times and the hardest only 117 times.

Over the site there are a total of 284 problems and 130,323 participants... maybe I am not as late to this party as I originally thought!

Coming from the computer science direction may give a slight advantage on the coding, but I will need to build domain knowledge along the way.

Entry #651

Finding a place to start learning Bioinformatics

After much searching, I think I have found the best place to start... rosalind.info

It is a site that presents coding challenges that teach the basics of the computational skills required in the bioinformatics field. Optionally there is a track of challenges that go with a textbook...

Bioinformatics Algorithms: An Active-Learning Approach by Phillip Compeau & Pavel Pevzner.

Ordered the book, starting the tracks.

If new to coding, they have a brief intro section to Python on there. I am starting here just to get used to their challenge submission process... most are timed (5 minutes), meaning you need to write the code and test it with their sample data before pulling down the dataset for the challenge.

The book was optional and costs some $$, but the website is free.

This should be an interesting shift from lottery hobby coding!

Entry #650

The reality of my systems attempts...

On the surface, no system works! IF there were any bias whatsoever, it would have been found in the Markov Chain follower analysis.

For a week straight after finishing the app, I would update the app in the morning, write down the follower picks for 8 games (PA pick2, 3, 4 and quinto... day and eve), then check the list in the evening... there was 1 win in 7 days on the mid day pick 2... checking every day... now I did not play these... just wrote them down. I ran a similar paper play test for the PA treasure hunt and cash 5... no winners (some small prize matches, but it was all paper play).
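For readers who have not seen a follower analysis, here is a minimal sketch of the general idea, not the code from the app. It assumes a single column of results stored oldest first, and it assumes "follower" means the value that came up in the very next draw. The names `follower_counts` and `top_follower` and the sample history are made up.

```python
from collections import Counter, defaultdict


def follower_counts(draws):
    """For each value, count which values appeared in the very next draw."""
    table = defaultdict(Counter)
    for current, nxt in zip(draws, draws[1:]):
        table[current][nxt] += 1
    return table


def top_follower(table, value):
    """Most frequent follower of `value`, or None if it was never followed."""
    if value not in table or not table[value]:
        return None
    return table[value].most_common(1)[0][0]


# Made-up single-column history, oldest draw first.
history = [6, 0, 5, 6, 0, 6, 0, 2]
table = follower_counts(history)
print("Most common follower of 6:", top_follower(table, 6))
```

If draws are independent, these counts flatten out as the history grows, which is consistent with the paper play results described here.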

They were not the first paper play tests I ever ran, but they were unique as the entire process was carried out in Android.

The statistics from the classifier function clearly show the distributions and "geometry" of the jackpot type games (TH and C5), but there are zero correlations between the stats and future draws... none!

I am glad I took the time to plan and build the app, but the bottom line remains that there is no way to analyze past draws to get a consistent winning pick. I have spent the better part of 20 years (on and off) only to confirm the gut feeling that the task is impossible. I never gave up, but the end result is the same as if I had.

So what good (outside of keeping sharp with coding and spreadsheet skills) came of the hobby?

Learning to incorporate a budget. The goal of picking just 1 combo was indeed the biggest takeaway... 1 ticket gets you in the game, any more for a single game is a waste.

Learning which statistics told the story I did not want to hear... while standard deviation is commonly used, the information provided by variance directly is more useful in determining exactly how chaotic these games get, from samples to full histories. Distribution quartiles show just how tight of a spread separates hot and cold.
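As a rough illustration of the statistics mentioned above, here is a sketch using only the Python standard library. It assumes "variance" means the population variance of the digits in one column over a window of draws, which may not be exactly how the app's classifier computes it. The sample column is made up.

```python
import statistics


def column_stats(values):
    """Variance, standard deviation and quartiles for one column of draws."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
        "q1": q1,
        "median": q2,
        "q3": q3,
    }


# Made-up window of draws for a single pick 3 column.
sample_column = [6, 0, 5, 3, 9, 1, 7, 2, 8, 4]
for name, value in column_stats(sample_column).items():
    print(f"{name}: {value:.2f}")
```

Standard deviation is just the square root of variance, so the two carry the same information on different scales. The quartiles give the spread between the low and high ends of the window.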

Identifying the "churn", even if there is no way to predict the changes... I see how numbers change but frequencies tend to hold steadier... but guessing which numbers will be next in the churn is just that, a guess.

If there was a solution, someone would have found it by now... foolish ego thinking I might be smart enough to solve the puzzle!

Still plan on using what I built as a better solution to QPs, still plan on buying a ticket for games, but realized it is now on that fine line of a waste of time to go any farther down the rabbit hole.

Draws are independent.

Positions are independent.

Machine Learning cannot solve it.

AI cannot solve it.

Certainly some random 57 year old with a 10 year old laptop and a CS degree isn't going to solve it.

One ticket is all you need to have that license to daydream.

Moving forward I will be looking at simple coincidental systems that meet the budget criteria (ONE ticket) and result in exactly 1 pick per draw. The "slide rule" concept will be first. Once I define the variables and constraints, I will develop that here in the blog for all to see. I may even incorporate some of these "coincidental" systems into the app... but just for the entertainment value.

The majority of the "hobby time" will now shift to exploring the bioinformatics landscape and applying 20 years of working with noisy data toward a domain that might one day do some good for others.

Entry #649

Setting up the slide rule concept

So, the main idea is to slide the columns forward so that the number in the last draw row matches the actual last draw. The question becomes how far forward to slide...

Since the most wild per column statistic is variance, that seems like a logical target.

For a pick 3 example, the evening variance on the draw from 3/24 (6-0-5) was...

Column 1 ... 4.8, go back four sixes (or 5 sixes if rounding)

Column 2 ... 6.04, go back 6 zeroes

Column 3 ... 2.22, go back 2 fives

In this way there is a definitive starting point. The decision to use the most significant variance digit or rounding can be back tested. Also, initial exploration does not require any coding, easily tested in a spreadsheet.

The result would be a list of combos to play... and the date to play each. If I were to run the experiment today, it would yield

Last draw from yesterday

One combo for tonight, one for Friday, one for Saturday all the way to Wednesday.

Simply play the number for that indicated date. You win or you lose (or you paper play). That easy!

Maybe the variance is not the best solution, but any hits on this (or any other "system") are purely coincidental at best. Back testing is equally simple, just pick an earlier date and "slide" the data down to see if there was any luck from doing that in the next 7 draws.

I chose variance, which is data that I gather from the last 150 draws in the classifier function in my app, but you can use any arbitrary source, such as 5 back on every column. (Not 5 rows, but the 5th appearance of the last winning number.)
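The slide described in this entry could be sketched roughly as below. This is one possible reading, with several assumptions: each column is a list stored oldest first, the last draw itself is not counted when going back, and the picks are the rows that follow the matched appearance in each column. The function names and the sample columns are made up. The variance figures are the ones quoted for the 6-0-5 draw.

```python
def kth_previous_index(column, k):
    """Index of the k-th earlier appearance of the column's last value.

    `column` is oldest first. The last draw itself is not counted.
    Returns None when the value has not appeared k times before.
    """
    target = column[-1]
    seen = 0
    for i in range(len(column) - 2, -1, -1):
        if column[i] == target:
            seen += 1
            if seen == k:
                return i
    return None


def slide_picks(columns, steps_back, n_picks=7):
    """Build up to n_picks combos from the rows following each column's match."""
    anchors = [kth_previous_index(col, k) for col, k in zip(columns, steps_back)]
    if any(a is None for a in anchors):
        return []
    picks = []
    for offset in range(1, n_picks + 1):
        # Stop as soon as any column would run past its most recent draw.
        if any(a + offset >= len(col) for a, col in zip(anchors, columns)):
            break
        picks.append(tuple(col[a + offset] for a, col in zip(anchors, columns)))
    return picks


# The two ways of turning a variance into a step count mentioned in the post.
variances = [4.8, 6.04, 2.22]
truncated = [int(v) for v in variances]
rounded = [round(v) for v in variances]

# Made-up short history, one list per column, ending in 6-0-5.
columns = [
    [6, 1, 6, 2, 3, 6],
    [0, 4, 5, 0, 7, 0],
    [5, 8, 9, 5, 1, 5],
]
print(slide_picks(columns, steps_back=[2, 1, 1]))
```

With a real history of a few hundred draws, every column would have enough rows after its match to produce the full list of 7 picks.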

If it proves as useless as every other way during the spreadsheet testing phase, then it can die there without ever writing a single line of code. And paper play is always within the budget!

Entry #648

Since it is not possible in scientific terms, what about having some fun with new system design?

Like the title says, have some fun... since I am done with the pick N games at the moment, pick 3 makes a perfect testbed...

The "Slide Rule"

Make an app where the columns can be slid up or down and locked in place.

The concept, slide down and match the digit in each column to the last winning number.

Write down the 7 draws below it that will form from the sliding action. 

Watch them for a week, did any match?

I can experiment with how far back to go and watch multiple such lists simultaneously since I won't be playing.

Literally zero data analysis value, but does it have entertainment value?

Maybe...

Entry #647

Strings of the same digit over history

One area I had given thought to was to isolate the distance between appearances of each number... not average (that won't help) but full lists of such information for each digit in the pick 3. (Where paper play is always free)

If they are displayed in isolation, that could be a helpful indicator of how each individual number appears in relation to the noisy full result stream. When placed side by side, perhaps there may be some unseen order to these distributions.

Or maybe not. This seems like a good task for SQL querying... or maybe the raw power of pandas?

Then there is the layout, a number to indicate distances between digits, but also a way to indicate and highlight repeated digits. And deciding the output.. oldest first? Newest first? And the visualization... how to see the digit road map side by side with all other digits...
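Before deciding between SQL and pandas, the raw gap lists can be sketched in plain Python. This is a minimal sketch assuming one column of pick 3 digits stored oldest first, where a gap of 1 means the digit repeated in back to back draws. The function name and the sample column are made up.

```python
def gaps_by_digit(column):
    """Map each digit 0-9 to the list of distances between its appearances."""
    last_seen = {}
    gaps = {digit: [] for digit in range(10)}
    for i, digit in enumerate(column):
        if digit in last_seen:
            gaps[digit].append(i - last_seen[digit])
        last_seen[digit] = i
    return gaps


# Made-up column history, oldest first.
column = [6, 6, 0, 5, 6, 0]
for digit, distances in gaps_by_digit(column).items():
    if distances:
        print(digit, distances)
```

The lists come out oldest first. Reversing them gives the newest first layout, and printing one row per digit is one simple way to view all ten side by side.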

I have a feeling that diving into bioinformatics will yield some new techniques.

Entry #646

First MM draw, no $

Not even 1 number... interesting how 3 of the white balls were from the hot category, along with one NA and one NB. The mega ball was from the NA category.

This illustrates how regardless of the stack of information I gather, there is just no clear cut prediction. White ball 51 was not even in the follower transition options for the third position... the mega ball was 04, the second NB from the transition, I played the first NB from the transition point.

So since it is all coincidence anyway, at least this "strategy" equates to 1 chance per game... however it is yet to be determined if alternation in both games is the best move or just going with PB...

Entry #645

First PB draw, no $

From the classification, the draw after sorting was 

Neutral Above Median, Hot, Neutral Below Median, Cold

And the power ball was Neutral Above Median.

I matched 1 white ball, but not in position, so technically I matched none.

The double draw matched zero.

Will get the first look at MM Friday.

All of my selections came from the transition areas in the neutral band. Trying this time to use the classifier distribution counts to see if there was an imbalance that might work itself back to balanced... this is why the same combo was played for all 3 PB draws... the balancing may not happen exactly when you want it, but it seems to happen eventually. There was no useful data from the draws out metric.

Entry #644

The new cycle, MM and PB only...

Given that just about every single thing I looked at with regard to accurately predicting numbers over the last 20 plus years has come up dry, it is time to change the strategy.

The new plan, as covered before, is to take on the jackpot games in alternating weeks. The old budget was $14 per week every week, so that was $28 in a 2 week span, this new budget, which will include adding on the multiplier and double draw for power ball will cost $12 one week and $10 the next... $22 every 2 weeks, already a $6 budget cut!

The Match 6 was hit last night, that was to be the switch point.

I know I cut back to 4 plays a week and cut the budget to $8/week but, that was not really working out well.

So for this first week, taking the $34 won on the Match 6, going to play both games (PB and MM) for an initial startup cost of $22, the $12 remaining will go to the first alternating week of the PB.

Since the initial system of classifiers was the basis of play (pick 1 combo and play it for the week), that will continue... one combo for 3 PB draws or 1 combo for 2 MM draws... that means I only need to hit the kiosk once in a week. The Markov Chain Follower data will be used to break any ties, but this iteration will focus on classifiers at transitions. 

Because the app has been fixed for the truncating bonus ball table issue, it is entirely possible to make a pick on the go, but because I have windows and android both running, picks can be made with both data screens open at the same time.

Looking at the alternation, the annual cost would be $572. That is much lower than the old budget that had an annual cost of $728. The cost of playing both jackpot games every week would have been $1,144... hence the alternating weeks.
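The budget figures above can be checked with a few lines of arithmetic. The variable names are made up; the dollar amounts are the ones from this entry.

```python
WEEKS_PER_YEAR = 52

old_weekly = 14   # old budget, every week
pb_week = 12      # power ball week: 3 draws with multiplier and double draw
mm_week = 10      # mega millions week: 2 draws

old_annual = old_weekly * WEEKS_PER_YEAR
# Alternating means 26 two-week cycles of one PB week plus one MM week.
alternating_annual = (pb_week + mm_week) * (WEEKS_PER_YEAR // 2)
# Playing both games every single week.
both_annual = (pb_week + mm_week) * WEEKS_PER_YEAR

print("Old budget per year:  ", old_annual)
print("Alternating per year: ", alternating_annual)
print("Both games every week:", both_annual)
print("Saved per 2 weeks:    ", 2 * old_weekly - (pb_week + mm_week))
```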

The plan is to first check the results to see where in the tables the winning numbers came from on the classifier table and THEN update the draws after the cycle completes, that way it can be seen which areas the winning numbers were drawn. Follower data becomes irrelevant after the first draw.

The big back test is going to be way more complex than I originally anticipated, the csv file generated will be massive, as one single row will have data from all of the classifiers AND all of the followers... that would be 40 cells just for the pick 2, add another 10 columns for the stats... PB would have a minimum of 200! This means writing the csv and ingesting to a database table, because that will be one busy spreadsheet! Wide for the jackpot games (many columns) and tall for the pick N games (many rows). But it is the only way to get an accurate back test, because the entire output of both functions needs to be captured on each row.

Now if I separate the functions, that would be a bit more useful as the individual functions can be back tested, then the tables can be merged later. Might go that route.

It all may prove to be a waste of time, but I got this far. Every software engineering goal was met with the app, except the winning part... that may very well be truly impossible... but it can never be said that I did not try...

Entry #643

The issue with vtracks...

I was here when they were taught, and one thing always struck me as odd: the confusion that comes from encoding numbers within other numbers. The heart of the vtrack system is essentially encoding each pick N digit AND its mirror into a v number.

The original encoding...

v1 = 0 and 5

v2 = 1 and 6

v3 = 2 and 7

v4 = 3 and 8

v5 = 4 and 9

When looking at something like v111, it leaves 8 combinations... 000, 005, 050, 055, 500, 505, 550 and 555.

I always found that to be a bit confusing.

My modest suggestion for a new encoding...

vA replaces v1

vB replaces v2

vC replaces v3

vD replaces v4

vE replaces v5

Now instead of v555, it would read vEEE, still representing the same set of numbers: 444, 449, 494, 499, 944, 949, 994 and 999.
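The letter encoding above works well as a lookup table. Here is a minimal sketch, with made-up names (`V_DIGITS`, `encode`, `expand`), that encodes a pick 3 combo and expands a code back into the 8 combos it stands for.

```python
from itertools import product

# Each letter stands for a digit and its mirror (digit + 5).
V_DIGITS = {"A": (0, 5), "B": (1, 6), "C": (2, 7), "D": (3, 8), "E": (4, 9)}

# Reverse lookup: digit -> letter.
DIGIT_TO_V = {digit: letter for letter, pair in V_DIGITS.items() for digit in pair}


def encode(combo):
    """Encode a string of digits such as '605' into a code such as 'vBAA'."""
    return "v" + "".join(DIGIT_TO_V[int(ch)] for ch in combo)


def expand(code):
    """List every combo covered by a code such as 'vEEE'."""
    pairs = [V_DIGITS[letter] for letter in code[1:]]
    return ["".join(str(d) for d in digits) for digits in product(*pairs)]


print(encode("605"))
print(expand("vEEE"))
```

Since every letter covers exactly 2 digits, a three letter code always expands to 2 x 2 x 2 = 8 combos.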

To be fair, the whole concept lost me on "mirror states" and numbers that "travel", but if I were to revisit the initial encoding idea, it would be after changing the numbers to letters as outlined above. The other reason vtracks never resonated for me is that it involves playing multiple bets for the same draw, but it was as wrong at prediction as any other system... the entire concept of lottery as entertainment loses its appeal when the budget would get high enough that one would NEED to win to cover the losses.

I lost ALL of my vtrack spreadsheets when the first laptop bit the dust... never replaced them.

Where I left off was replacing all of the complex encoding formulas with a much faster lookup table, that allowed applying the encoding rapidly across entire draw histories. There were patterns to be seen, but they were not accurate predictors... but what if they could help identify cyclic regimes and map out their changes? Hmmm...

Entry #642

Match 6 in PA is still rolling that jackpot...

As of last night, the rolling 6/49 is at $5,860,000. This is a slow roll, as the jackpot does not go up much with each draw. The first ticket played this week matched 3/6 ($2) and 5/18 ($5).

Over the next few weeks I will stay on the M6 cycle until it gets hit, then it is alternating between PB and MM for the rest of the year. If dumb luck is a component, may as well go big. There is slightly less frustration involved when a one in 200+ million odds ticket misses than not even catching a single 1 in 1,000 after multiple attempts.

Fun facts:

On the power ball red ball, the longest out is 8, which was last seen 96 draws ago. Classification is NB, or the Neutral Below the median. It is the lowest of the 7 NB numbers. In the followers, it is 18th on the list.

On the mega millions gold ball, the longest out is 15, which was last seen 105 draws ago. Classification is the coldest of the cold numbers. In the followers it is 16th on the list.

Today's coding will involve removing the white ball list truncation discovered earlier. Already fixed on windows, just need to push the update to the Android version. Will also take another look at finding the single source of truth for the PA draw histories so Millionaire for Life can be added.

On an unrelated note, on 3/9 the final payment was made on the 30 year mortgage! Our 31st anniversary is coming up next week as well.

Entry #641

The variance "bowl" formed by jackpot white balls

While the app was meant to present the data, not necessarily a pick, certain interesting things emerge from the statistics.

Here I noticed that, in sorted order, the samples for the white balls in the bonus games PB/MM AND the regular non replacement games Cash 5/Match 6/Treasure Hunt all have something in common...

Variance, when plotted, looks like the vertical cross section of a bowl (a U shape). Higher at the first and last columns, which is likely due to the constraints of sorted order. While the variance is not exactly the same in each game, they all exhibit a similar bowl shape, like an inverted bell curve.

Not sure how that would help with a pick, but the shape persists across all of the non replacement games. The bowl is not symmetric; both power ball and mega millions have a slightly higher variance in the last column vs. the first.

Continuing to study the stats for further such information...

Entry #640

So what exactly is it I am trying to achieve?

Been thinking about the current plays... did not get any hits outside of the Match 6 so far, and they were small wins.

After calculations and analysis over a few decades, my conclusion is that, though it MAY be possible to use computing power to help find winners, I simply lack the ability to do it! I don't see patterns because there are none! The Markov chain would have found them when applied to followers... in any other data, the steps taken would have been rewarded with results. Identifying trends in other domains, yet ending up with nothing useful in random data after multiple attempts using every statistical technique I ever learned, might just mean it is true: this really is an impossible quest.

That being said, I did learn a great deal of software engineering thanks to this hobby. I did not spend months hand crafting an application that works on both Windows and Android not to use it...

It's time to change focus...

Time to scrap the current cyclic play scenario. If my app would ever be a coincidental match to a number, that is all it would be... coincidental. So, if the whole premise is disproven, impossible to pick a single combo for the next draw in any game, be it RNG or ball drawn, then why continue to play for peanuts?

So a new strategy emerges... alternating weeks on PB and MM... if it is dumb luck anyway, even the base jackpots with another winner would be enough to retire comfortably... why not just skip over the small games and just go for the big ones?

New plan, alternate weeks of PB (3 draws, $12 total with the add ons) and MM (2 draws, flat $10). That comes in under the price of the old system which budgeted $14 per week... in 2 weeks it is $6 cheaper!

If a jackpot starts climbing then pause the alternating games and stick with the higher jackpot.

So, one pick per draw, taken from the data in the app. I already know the follower data is not helpful in the selection process, so it will be based on the transition areas from the classifier, using follower data as a tie breaker... once I patch up the app to display all of the rows.

So that is the new plan for the rest of the year. NOT playing both games at the same time (even though a weekly budget of $22 is not unreasonable), but alternating until jackpot levels begin to diverge. However, I will take $22 out of that $27 Match 6 win and start next week with BOTH games.

The vertical sums went nowhere, because on the surface it ends up being a pick from 10 draws prior, and it did not back test enough wins to justify playing. When applying the averages or modes, there were draws where the value exceeded the 0-9 range, and even with a "lottery math" tweak, back tests were not profitable at all. That will not make it into the app, but it is interesting trying new ideas once in a while.

There is still the path to be explored with SQL, but I don't know which direction that may take.

I am not able to make one thousand to one odds work, so let's go for a few hundred million to one... maybe I AM an idiot...

Entry #639

Finding bugs in apps, an ongoing process.

When you create an app and you are the only user, bugs might go unnoticed for longer. The app has 1 major flaw at the moment, it cuts off the data rows to match the shortest column... not noticed in 11 of 14 games because they display properly.

The obvious cause... in either the classifier function or the follower function, a second run of the function is made for the bonus ball games. This is so the right statistics appear for each range of balls, such that in the power ball the 5/69 will have a different expectancy than the 1/26. Well, the issue is that ALL of the columns are cut off after 26 rows... the stats screen still displays the correct information, but the table is cut off in ALL columns at the length of the shortest column, the bonus ball.

To trace this, the flow of information needs to isolate the exact moment where the data is combined and written to the table.

It is not in the functions, they run twice.

It is not in the logic part of the screen display.

It is in the "game factory", where for bonus ball games it sets up both runs. I would not know this if I let AI write the code... this is why the project took a few months instead of a few hours.

The fix will be relatively simple, in the second run, pad the rows so it matches the larger row count of the white balls.
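The app's actual code is not shown here, so the following is only a sketch of how this kind of truncation commonly happens in Python and what padding looks like. It assumes the table rows are built by pairing up columns of different lengths. Plain `zip` stops at the shortest column, while `itertools.zip_longest` pads the short one. The variable names are made up.

```python
from itertools import zip_longest

# Stand-in columns: 69 white ball rows and 26 bonus ball rows.
white_rows = list(range(1, 70))
bonus_rows = list(range(1, 27))

# zip() silently stops at the shortest input, so rows 27-69 disappear.
truncated = list(zip(white_rows, bonus_rows))

# zip_longest() pads the shorter column so every white ball row survives.
padded = list(zip_longest(white_rows, bonus_rows, fillvalue=None))

print("Rows with zip:        ", len(truncated))
print("Rows with zip_longest:", len(padded))
print("Last padded row:      ", padded[-1])
```

The fill value plays the same role as the (none, none) and zero count padding already used on the match 6 screens.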

That of course means fixing it first on the windows side, then pushing the fix to Android.

Having been focused on the pick 3, pick 5 and match 6, I would not have noticed right away. All of the single runs work perfectly... the match 6 pads shorter columns with (none, none) on the followers and zero counts on the classifier.

I found that while looking at the Cash 4 Life screen, trying to figure out how to find the data for Millionaire 4 Life since the lazy PA lottery did not make a page for "year at a time" data. It only had 4 rows in both functions. After checking power ball and mega millions, it was obvious that the data cut off at the end of the bonus ball column.

At least the updater still works for the 13 remaining games, they did not screw that up...

PostgreSQL 18.3 does now run on my 10 year old laptop! Had to first uninstall PostgreSQL 11, the version I used in school. The latest RStudio is now in place, and the Python libraries have been updated to include Biopython and the one for interacting with SQL.

On that front I have some project ideas already...

1. A dashboard that tracks research and clinical trials for tackling type 1 diabetes (T1D).

2. An app that will compare samples of white blood cell components and beta cell components from datasets of people with and without T1D.

Nothing on the edge, just a place to get started... a "Hello Bioinformatics World" project.

Entry #638

Next set of tooling for the quest...

Got PostgreSQL 18.3 installed, as well as the library that allows python to interact with PostgreSQL. LextEdit is already installed. Next day off will be connecting everything via ODBC (open database connectivity) drivers and creating tables out of the draw history CSV files.
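As a rough preview of the CSV to table step, here is a sketch that uses Python's built in `sqlite3` module as a stand-in so it runs without a database server. The plan in this entry is PostgreSQL, where the connection setup and parameter placeholders differ. The table name, column names and sample rows are all made up.

```python
import csv
import io
import sqlite3

# Made-up sample standing in for a draw history CSV file.
SAMPLE_CSV = "draw_no,d1,d2,d3\n1,1,7,1\n2,4,4,9\n3,6,0,5\n"


def load_draws(conn, csv_text):
    """Create a pick3 table if needed and insert every CSV row into it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pick3 ("
        "draw_no INTEGER PRIMARY KEY, d1 INTEGER, d2 INTEGER, d3 INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (int(r["draw_no"]), int(r["d1"]), int(r["d2"]), int(r["d3"]))
        for r in reader
    ]
    conn.executemany("INSERT INTO pick3 VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
inserted = load_draws(conn, SAMPLE_CSV)
print("Rows inserted:", inserted)

# Example query: how often each digit showed up in the first column.
query = "SELECT d1, COUNT(*) FROM pick3 GROUP BY d1 ORDER BY d1"
print(conn.execute(query).fetchall())
```

For a real file, `io.StringIO(csv_text)` would be replaced with an open file handle.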

Also got the most recent version of RStudio set up. Will have to explore the latest packages (like Python libraries) to get it set up properly.

Still amazed at how much modern software will still run on this 10 year old laptop!

Also had a thought... since I have so much time working with noisy chaotic data with the lottery hobby, an interesting path for non lottery related development would be bioinformatics... to that end I have installed the Biopython library.

Entry #637