hypersoniq's Blog

Gearing up for the big PA Match 6 live test

Having only ever studied the data in sorted order columns, there were some unexpected results when gathering a count of the numbers regardless of position.

The first standout is that none of the numbers are the same! All 6 are different when counting from the entire grid.

In sorted order, by column, a 1 is the most frequent in column A and a 49 is most frequent in column F. They also appear at double the frequency of the other 4 positions.

When counting the numbers regardless of position, 4 ends up being the most frequent number overall. This makes sense because the numbers are not restricted to a single column. The application of the most frequent numbers by position, via the "Vertical Horizon" script, has no missing entries. Some of them have fewer distribution entries though, such as a top 5 rather than a top 20, because the script only records non zero entries.

In the by column counts, 1 appears around 455 times and 49 appears almost 460, but the top overall picks have counts above 500.

All of the counting was done in the spreadsheet. There was no need for a script for this task.

For each pick generated by counting, there will be 6 corresponding lines generated by the script. Grand total of 14 tickets (because I am also playing the 2 seed lines) at $2 per.

Counting the free quick pick lines, it will be 42 lines for tonight's game.

For the purpose of this experiment, I am only counting matches on the lines I picked. Whichever one does better will be the one moved forward to the big games for a run.

One winner took home $1,680,000 on the September 25th Match 6 draw, so the jackpot reset to $500,000. It is up to $620,000 for tonight.

So, the counting is super easy in a spreadsheet!

To count the most frequent in a column, simply use the "mode" function giving the column data range as the argument. You can use any empty cell, I usually pick one below the history rows. For my sheet this was

=mode(B2:B3882)

Grab that cell and drag it to the right until the other 5 spaces are filled and you have your most frequent by position! Hint: the last cell should contain 

=mode(G2:G3882)

Note: I start with row 2 because I use a header row. Now, counting frequency by the whole history is slightly different. Somewhere on the sheet where you have the room, create a column (for argument's sake we will arbitrarily choose column J) and fill it from 1 in J1 to 49 in J49. This is our target list.

Now we make our counting function in K1 to read 

=countif($B$2:$G$3882;J1)

Note the dollar signs, they turn the default relative references into absolute references. When we fill down, the only thing that will change is the target cell. So do that, drag K1 and auto fill down to K49. Now you have the counts! Let's sort them to find out the top 6...

Open a new sheet in the current workbook. Select the range J1 to K49 and copy.

Go to the new worksheet, select cell A1, and use "paste special" to paste only the values (we don't want the formulas here, just their results).

Now select cell B1 (this will be the K1 from the first sheet). Here is a shortcut for selecting the entire range that contains data: CTRL+SHIFT+Down Arrow. This should select B1 through B49. Click SORT Z-->A (descending). It will ask if you want to extend the selection; click OK. Now you have the list of all 49 numbers, sorted by frequency!
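For anyone who would rather script it, the same two counts can be sketched in Python. This is only an illustration: the `draws` rows are made-up sample data (not real Match 6 history), and the function names are placeholders rather than anything from the actual scripts.

```python
from collections import Counter

# Made-up sample rows (NOT real Match 6 history), each in sorted order
draws = [
    (1, 4, 17, 23, 38, 49),
    (1, 9, 17, 30, 41, 49),
    (2, 4, 11, 23, 40, 47),
    (1, 4, 12, 25, 38, 49),
]

def most_frequent_by_position(rows):
    """Like the mode formula on each column: the top number per position."""
    columns = zip(*rows)  # transpose rows into columns
    return [Counter(col).most_common(1)[0][0] for col in columns]

def top_overall(rows, n=6):
    """Like countif over the whole grid, then sorted descending."""
    counts = Counter(num for row in rows for num in row)
    return counts.most_common(n)

by_position = most_frequent_by_position(draws)
overall = top_overall(draws)
print(by_position)
print(overall)
```

With real history loaded into `draws`, the first list should agree with the six mode cells and the second with the top of the sorted countif list, which makes it a handy cross-check on the spreadsheet.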

The process will alter slightly for different game matrices, but you get the concept.

Entry #336

Two different paths to take for a seed pick...

There are other ways to generate a seed pick that have a statistical grounding but do NOT rely on the frequency of followers...

This is specifically for jackpot type games.

The first is to identify the most frequent number in each sorted order column... this provides the actual most drawn by position overall. (Pure spreadsheet analysis)

The second is to expand the countIf formula to include ALL columns and identify the top N numbers to have been drawn in the entire game in ANY position. (Also Pure spreadsheet analysis)

It is probably worth a test to see which method works better to focus on a path forward...

Because of the python script that can provide the numbers most likely to appear with a certain number when it is drawn, it is the selection of the seed number that will be the target of short term experiments.

The cost of testing both methods will be $28 on the PA Match 6. The $14 test on the follower seed was a $9 loss. But the results have been recorded, and will be compared with the results of these 2 tests.

After this, some down time as the big jackpot games get a history file update (not since the early part of last year!), and the winning method will get a test on the Mega Millions ($3 x 7 = $21) and then the Powerball ($4 x 7 = $28).

So $77 slated for the next 3 tests total.
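The budget math, spelled out with only the figures quoted above (the variable names are just labels):

```python
# Planned spend for the next three tests, using the figures quoted above
match6_cost = 28            # testing both seed methods on PA Match 6
mega_millions_cost = 3 * 7  # $3 x 7 lines
powerball_cost = 4 * 7      # $4 x 7 lines

total = match6_cost + mega_millions_cost + powerball_cost
print(f"${total}")  # $77
```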

Unlike the dailies, this method only needs to work one time to be a success...

Entry #335

Maybe an LLM can help find the next direction...

As long as you are not specifically asking for a lottery solution, LLMs like ChatGPT and Claude will discuss prediction methodologies.

Like anything else in life, you tend to get out of it what you put into it. The art of getting useful information already has a name, "prompt engineering".

Perhaps it is time to start asking the right questions... experimentation with these LLMs is still free, so may as well sink some time into seeing what the tech can do to solve this problem.

Entry #334

Time to do some trigonometry!

After studying the last result with the program output, there was no way to utilize the given data to come up with the next result that holds across columns or draws.

Next up is to break out the visualizer to try and test some trigonometric operations on the angles (and/or line lengths). Looking for something consistent that might provide the next angle when given the last few... sine, cosine, tangent, secant, cosecant, hyperbolic tangent.... just some of the operations that will be tested to help give a consistent result.

The alternative would be to undertake a massive coding effort in order to measure error in the picks, like I did for the pure followers, only expanding to include the angles and lines... I would rather not have to start that one, as that could take months...

Misapplication of math on trying to solve the lottery is something we have all done (who try to predict), whether it be a workout or a simple +1, -1, +1... just trying to take that to the next level.

Anyone have ideas about how we could use calculus?

Entry #333

What a different story the data tells now...

Using the distribution of angles, there is formed a bell curve that centers on 0, or the repeat. This is because 0 shares no "negative split" with a corresponding angle.

Adding the lines... the lines tell a wildly different story. The most popular following move is a difference of 1, or a line length of 1.41, which corresponds to both 45 degrees AND -45 degrees.

The +/- 1 is the most popular line follower and it is not even close... in every position! The repeats, or length = 1, rank 4th at best.
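To make those numbers concrete, here is a minimal sketch of how one move between consecutive digits could be turned into an angle and a line length. It assumes each draw sits one unit to the right of the previous one on the grid, which is what makes a repeat come out at length 1 and a +/- 1 move at 45 degrees and 1.41. The function name is only for illustration and is not taken from the actual spreadsheet or script.

```python
import math

def vector_between(prev_digit, next_digit):
    """Return (angle in degrees, line length) for one move on the grid.

    Assumes a horizontal step of 1 between consecutive draws.
    """
    diff = next_digit - prev_digit
    angle = math.degrees(math.atan2(diff, 1))
    length = math.hypot(1, diff)
    return round(angle, 2), round(length, 2)

print(vector_between(4, 5))  # up one
print(vector_between(5, 4))  # down one
print(vector_between(7, 7))  # repeat
print(vector_between(9, 0))  # the longest possible drop
```

Note how up one and down one give different angles but the same length, which is why the line data lumps them together.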

So now, how can this be used?

One idea is to look at the ranked angular data and pick the smallest non zero move that appears highest in the list... such as if a -45 is ranked higher than the 45, then simply subtract 1 from the last draw... if a 9 or a zero appear as the last drawn number then -45 on a 0 and 45 on a 9 violate the grid constraints, so the choice is easier.
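A rough sketch of that rule, narrowed to the +/- 1 case. The `ranked_angles` list is a hypothetical stand-in for the angle ranking of one column (most frequent first), and the order shown is made up.

```python
def next_digit_by_smallest_move(last_digit, ranked_angles):
    """Add or subtract 1 from the last digit, depending on whether
    45 or -45 ranks higher, while staying inside the 0-9 grid."""
    if last_digit == 0:
        return 1  # -45 would leave the grid, so only +1 is possible
    if last_digit == 9:
        return 8  # 45 would leave the grid, so only -1 is possible
    for angle in ranked_angles:  # most frequent first
        if angle == 45:
            return last_digit + 1
        if angle == -45:
            return last_digit - 1
    return last_digit  # neither angle listed: fall back to a repeat

# Hypothetical ranking for one column (made-up order)
ranking = [0, -45, 45, 63.43, -63.43]
print(next_digit_by_smallest_move(6, ranking))  # 5
print(next_digit_by_smallest_move(0, ranking))  # 1
print(next_digit_by_smallest_move(9, ranking))  # 8
```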

While no system will work every time, if this could help steer closer to a correct pick, then by all means it will be utilized.

I took a snapshot of the follower script output, decided NOT to play today anyway, so a "paper pick" analysis of the concept can be tested after the draws.

All sheets were modified, but still have to update the pick 5 sheets to current, as they sat stale for quite some time awaiting a pick3 hit to get started.

It may be just more meaningless information creating noise, but it does tell a way different story than just studying the angular data.  For instance, the length of ~9 can only happen when a 0 follows a 9 or a 9 follows a 0. It cannot happen any other way.

Now to determine exactly how important this new information is and figuring out how to apply it consistently to picks.

To confirm...

Raw follower data distribution = a near uniform distribution

Angular follower data distribution = a bell curve centered on 0 degrees

Line length follower data distribution = a more logarithmic or even near exponential distribution, centered on +/- 1

So if we know that the +1 or -1 fits the line data, we might use the placement of the 45/-45 in the angle data to help decide if that 1 is added or subtracted...

Entry #332

Today's challenge... integrate the remaining vector data...

Angles are in place. They sit beside the normal draw history. Today adding the line lengths that are a part of the vectors. (Vector = angle and line length).

The distribution will differ from the angles as there are 19 possible angles (counting negatives) but only 10 possible line lengths, as these are positive only. And though they are equal in number to the actual digits, they are predicted to have a more logarithmic distribution than the raw draw data, as they maintain a many to many relationship like the angles do...

Length of 1 could be any repeat, for example.
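The counts of possible angles and lengths can be checked by brute force over every digit pair in one column. This sketch assumes a horizontal step of 1 between draws and treats all 100 pairs as equally likely, so it is only a baseline to compare real history against, not a result from the history files.

```python
import math
from collections import Counter

angle_counts = Counter()
length_counts = Counter()

# Every ordered pair (previous digit, next digit) for a single column
for prev in range(10):
    for nxt in range(10):
        diff = nxt - prev
        angle_counts[round(math.degrees(math.atan2(diff, 1)), 2)] += 1
        length_counts[round(math.hypot(1, diff), 2)] += 1

print(len(angle_counts))    # 19 distinct angles
print(len(length_counts))   # 10 distinct line lengths
print(length_counts[1.0])   # 10 pairs are repeats
print(length_counts[1.41])  # 18 pairs are a +/- 1 move
print(length_counts[9.06])  # 2 pairs: 0 -> 9 and 9 -> 0
```

Every length except 1 is shared by an up move and a down move, which is where the many to many relationship comes from.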

Hopefully this makes it possible to narrow down a pick to the cumulative score on the distribution lists that do not exceed the constraints.

The process is time consuming, but not very difficult. For the pick 3...

1. Add 3 columns that will hold the new data

2. Add a column to the lookup table beside the angles

3. Enter a formula in the new columns that will look up the line lengths

4. Prepare the data on a new worksheet by "paste special, values only"

5. Export this hybrid worksheet to a .csv file.
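The steps above could also be sketched in Python, with a dictionary standing in for the lookup table. Everything here is an assumption for illustration: the sample rows are made up, the column headers are placeholders, and the output goes to an in-memory buffer rather than a real file.

```python
import csv
import io
import math

# Dictionary standing in for the spreadsheet lookup table:
# difference between consecutive digits -> line length
LENGTHS = {d: round(math.hypot(1, d), 2) for d in range(-9, 10)}

# Made-up pick 3 history, oldest first (NOT real draws)
history = [(4, 0, 6), (5, 0, 9), (5, 9, 0)]

out = io.StringIO()  # swap for an open(..., "w", newline="") file handle
writer = csv.writer(out)
writer.writerow(["A", "B", "C", "lenA", "lenB", "lenC"])
for prev, cur in zip(history, history[1:]):
    lengths = [LENGTHS[c - p] for p, c in zip(prev, cur)]
    writer.writerow(list(cur) + lengths)

print(out.getvalue())
```

The oldest draw gets no row here because it has no previous draw to measure from.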

Since I have decided to skip the pick 2 and pick 4, this only involves modifying and updating 4 sheets.  P3 Mid and Eve and P5 Mid and Eve.

The follower script does not even need a single modification, as it is flexible in how it reads the data. 

It is more a prep for future development, but also falls into that category of "at least I tried everything I could think of."

Entry #331

The positive takeaway from the current attempt..

This would undoubtedly be that I can store data in the same file and only target the columns needed.

This means I can store draw data alongside its corresponding angle data and just choose to isolate one or the other. I can also expand on what is there.

For each angle (signed, forming a bell curve type distribution) there is a corresponding line length that comes from the vector, this will also be added. The expected distribution would be more logarithmic, with shorter distances expected to make up the majority of the draws.

Why is this exciting? Because I would not have to go through abstraction processes like "one hot encoding" to turn features into binary data, I can just use the actual data instead.

This process would create data points (draw history) with features (lead in angle and line length) which would allow the use of machine learning algorithms to help figure out what in the data may be of importance.

The engineering part of machine learning goes through two steps when you are unsure what is significant...

1. An unsupervised algorithm where it can spot and report patterns and help estimate weights

2. A supervised algorithm where it will actually use the info from the previous step to help build and train a model to obtain predictions.

This was always an idea, but now it gets a step closer to realization.

Now the history file can be packed with information. Basic aggregations like odd/even and high/low, alongside the vector components. Using step 1 to determine what is important and what is irrelevant, then using this information to craft step 2.

Yet I do not need to remake versions of the history files because individual columns can be targeted in the scripts. Multiple scripts, one history mega file for each game...
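A minimal sketch of targeting columns by name, assuming the mega file has a header row. The column names and values below are made up, and `read_columns` is a hypothetical helper, not part of the existing scripts.

```python
import csv
import io

# Made-up "mega file" contents; column names are placeholders
mega_file = io.StringIO(
    "A,B,C,angA,angB,angC\n"
    "5,0,9,45.0,0.0,71.57\n"
    "5,9,0,0.0,83.66,-83.66\n"
)

def read_columns(handle, wanted):
    """Return only the named columns from each row, as tuples of strings."""
    reader = csv.DictReader(handle)
    return [tuple(row[name] for name in wanted) for row in reader]

draws_only = read_columns(mega_file, ["A", "B", "C"])
mega_file.seek(0)  # rewind so the same data can be read again
angles_only = read_columns(mega_file, ["angA", "angB", "angC"])
print(draws_only)
print(angles_only)
```

Adding odd/even or high/low columns later would not break this, since each script only asks for the names it cares about.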

That will not be an easy slam dunk like these scripts that spit out follower distributions, but it represents a goal, and when you have a goal, you have motivation...

Much to do...

Entry #330

Something is still missing...

For this year, I moved into gathering follower distributions, basically raw Markov probabilities. The number most likely to follow another in a column is not always right, but sometimes they are. So generating one pick is possible, but it does not win with enough regularity to be interesting. It always seems that the columns are out of phase with each other. 
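For readers who have not seen a follower distribution, here is a bare-bones sketch of the idea for a single column. The column data is made up and the function name is a placeholder; the real script does more than this.

```python
from collections import Counter, defaultdict

def follower_distribution(column):
    """Map each digit to a Counter of the digits that followed it."""
    followers = defaultdict(Counter)
    for prev, nxt in zip(column, column[1:]):
        followers[prev][nxt] += 1
    return followers

# Made-up single-column history, oldest first (NOT real draws)
column_a = [5, 4, 5, 4, 0, 5, 4, 9, 5, 0]
dist = follower_distribution(column_a)

last_drawn = column_a[-1]
print(dist[5].most_common())            # what has followed a 5, ranked
print(dist[last_drawn].most_common(1))  # top follower of the last draw
```

Dividing each count by the total for that digit would turn these into the raw Markov probabilities.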

The new addition aimed to help fix that by analyzing the raw Markov probabilities of the digits that appear with a locked column in the other columns. This introduces 3 new picks into the mix, taking the cost from $1 to $4, and if none of the digits appear in the seed, you have no chance of a win (which is why the decision was made to change from $1 straight to 0.50/0.50 straight/box).

Another fundamental change was brought in when I decided to move from the frequency of raw numbers into the realm of vectors. The angles formed between draws shift the data from a near uniform distribution to more of a bell curve distribution because the angles are both positive and negative.

All the ideas, all of the imagined and then realized code solutions... still no winners...

It is time consuming to constantly update every draw history file to get accurate data to work with, each csv has a spreadsheet as well. Angular data is calculated via lookup tables and needs to be copied every single draw.

It seems like a great deal of work for something that does not produce results.

Having used the lookup table because it was the quickest solution, and having recent success with the Python dictionary structure in the latest script... I am thinking of quite literally automating the boring stuff...

I will continue to take a shot here and there with the new scripts to give it a fair chance, but I feel the need to focus on automating the update process.

The script will pull data from the PA lottery RSS feed and attempt to read it back to fill in the missing dates in the .csv files. If I use a dictionary in place of the lookup table, it can generate angular data on the fly... hopefully.

I have once again depleted the idea vault, the angular and cross column ideas were based on the concepts of other lp members.

I just missed a box hit today on the PA mid... had 2 of 3 on the last number generated. So I do have something to experiment with, but it may be time to finally set to the task of automating the update process. It is a motivation killer to go through 8 spreadsheets and 16 csv files on a daily basis, then run 2 scripts just to get a pick.

But something is definitely missing from the solution of generating a winning combo, and I am not sure what that is...

Entry #329

Cutting the cost of play, choosing the angular followers as a seed.

Within the vertical horizon output, I noticed that the picks generated by the angular followers had more matches (not a win yet, but more matches) than the direct follower pick.

By choosing one, the cost of playing the live test for a while is cut from $16 to $8, and covers 1 day and 1 eve draw for that $8.

Also continuing with the $0.50/0.50 straight box combo over the usual $1 straight only, at least until a few draws play out.

A dual system test on the pick 5 would cost $24, and that can be funded by a $40 or $80 box hit just as easily as a straight hit, and have slightly better odds of happening a little more frequently.

Really breaking out of the "comfort zone" of 1 pick or none... intermingling column data, and I found a use for the Python dictionary to count in the new program. That probably represents an 18x reduction in code in that section vs using and updating single variables. So even if a win is never produced, it was valuable experience in coding again, the one true constant gain in this hobby.

Entry #328

First picks...

For the PA pick 3 evening are 2 sets, one using the regular follower seed and one using the angular follower seed...

A. 5 5 5,   4 0 6    (seeds [numeric / angular])
B. 5 4 0,   4 1 2    (locking the first seed)
C. 0 5 0,   8 0 9    (locking the second seed)
D. 0 8 5,   0 9 6    (locking the third seed)

Also for tomorrow's PA pick 3 mid day drawing...

A. 9 4 8,   3 7 9
B. 9 2 8,   3 7 2
C. 1 4 0,   2 7 5
D. 9 1 8,   9 4 9

Because of the sheer volume of picks, they were played 0.50 straight / 0.50 box. (Excepting of course the 5-5-5, which is $1 straight)

Total cost = $16

Did my part, now it is up to the state... again...

Entry #327

Successful run of new system... now to update the draws and test.

Finally came up with code that tracks the most frequent numbers in the other columns when given one column as a seed.

Passing in a combo locks the columns by those numbers and prints the frequency distribution of 0-9 in the other columns.

Tomorrow (well, later today) will be all about updates so the data can be tested. First by verifying (does the code count of the seed numbers match a countif in the spreadsheet? Also, when filtered, do the top counts match in the other columns?) Then by updating all but the last draw to see if it works better with followers as a seed or angular followers...

Surprisingly rapid development on this one once I dialed in the main data structure, which was a list of tuples. Because tuples are immutable, there is slightly less memory required when running the code.

Weighing in at less than 80 lines of code...

Also will put the finishing touch on the version that does pick 5 games... that one is looking like it will come in at around 100 lines of code.

Of course it remains to be seen if it picks any winners at all, but don't ALL systems break apart at that stage anyway?

Syntactically and semantically correct is my part, the rest is up to the state...

Entry #325

New projects need corny project names, so...

I have begun planning of "Vertical Horizon"...

This is the combo frequency analysis program. It will be using the vertical follower analysis from "Follower Foundry" as the seed combo.

Some early file read systems to put it in the correct data structure were successful.

Initializing the list as history = [] creates the main list, and when reading in the rows,

history.append(row_x)

creates an addressable combo entry.

What we have is the ability to isolate internal combos by position, so the first position of the 1,000th draw would be 

history[999][0]

This is respecting the 0 indexing of Python.

To gather information on a specific first digit, would need to iterate

history[n][0] == 5

And count history[n][1] and history[n][2] for the pick 3.

In that way, the accumulator for the main loop could be 

B_zero = 0
for row in history:
    if row[0] == 5:        # first digit locked to 5
        if row[1] == 0:    # second digit was a 0
            B_zero += 1

Etc. Then the B zero through 9s will be combined with their counts into an accumulator list of lists, sorted by the count... this will then be used in printing the results side by side by highest count. Same for the C numbers.

Whole process repeated for the history[n][1] and history[n][2] positional seeds.
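One way the accumulator could be generalized so there is no separate variable per digit is a dictionary-style counter per column. This is a sketch under assumptions, not the finished program: `locked_counts` is a hypothetical name and the rows are made up.

```python
from collections import Counter

def locked_counts(history, position, seed_digit):
    """Count digits in the other columns for draws where
    `position` holds `seed_digit`."""
    counts = {p: Counter() for p in range(len(history[0])) if p != position}
    for row in history:
        if row[position] == seed_digit:
            for p in counts:
                counts[p][row[p]] += 1
    return counts

# Made-up pick 3 rows as a list of tuples (NOT real draws)
history = [(5, 0, 3), (5, 0, 7), (2, 0, 3), (5, 1, 3), (5, 0, 3)]

result = locked_counts(history, 0, 5)
print(result[1].most_common())  # column B when column A is 5
print(result[2].most_common())  # column C when column A is 5
```

Calling it again with position 1 or 2 covers the other positional seeds, and because the column count comes from the data, the same function would handle pick 5 rows.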

The play strategy has been revised to include only one pass at the pick 5 on a pick 3 win, eliminating the pick 2 and pick 4. Will also probably play the seed number as it is a follower pick.

Cost...

Pick 3 day $4

Pick 3 eve $4

Pick 5 day $6

Pick 5 eve $6

Though the pick 5 numbers won't see the light of day until there is a pick 3 win. (If one happens)

The hardest part of this setup will be verification... never worked across columns before in any meaningful way. Unless I count the top result from the history in excel and see if that number combo matches the frequency... that might work.

Entry #324

Starting a new frequency project.

For my entire lottery number crunching history I have only ever studied each column of games like the pick 3 in isolation, treating it as sequential games of 1 in 10 rather than the true 1:1000 the game actually is.

The next project will attempt to count frequencies across combos. Definitely a first for me.

It will analyze, by position, a starting combo and count the numbers that appeared with each number in each position. It will end up with 3 likely combos to play rather than 1, but what is the point of reinventing a wheel that uses the exact same mechanics again and again?

Credit where credit is due.... this idea was brought to light by Dr. San in the Lottery Discussion forum.

I do not think that we will break any ground other than getting me out of my "columnar isolationism", but challenges are a good way to stay sharp coding.

Since this will generate more lines to play, I will be skipping the pick 2 and the pick 4, instead focusing on the pick 3 (3 combos per draw) and adding the functionality for the pick 5 (5 combos per draw).

After the massive coding part will be a short test on the pick 3, and the seed combo will be the pick generated from the frequency script.

In other news, PA RSS feed sucks to work with, they should just put up an API like most other state lottery sites... would not be surprised to hear that they run the entire operation on a Commodore VIC 20 in COBOL...🙄

Still trying to make that auto updater for the csv files, but hitting way too many obstacles in reading the results cleanly and consistently...

JSON pa lottery coders... JSON! Try it sometime!

Entry #323

Early observations with most recent data

1. The PA mid day seems to be more in tune with the direct follower data. (RNG draw)

2. The PA evening seems to be more in tune with the angular data. (Mechanical draw)

 

This makes sense as I had no hits on the follower system at night on any game but did catch a straight on the mid day pick 3. Because all of the follower and angular follower data can be generated in one script, they can be studied more carefully.

Now it is time to try and notice what indicators may show where in the distribution list the next pick will come from, as it is not always the most frequent.

I must resist the desire to take one observation and make it a rule without stepping through multiple draws to see if indeed the observations hold.

Will be trying to import the data into RStudio to get some actual statistics and graphs, but I still need to find the missing puzzle pieces... why must this be so difficult?

Oh well, back to the data until I leave for work.

Happy data analytics?

Entry #322