hypersoniq's Blog

hypersoniq's Blog has 673 entries and has been viewed 440,692 times.
Lottery Post members have made 559 comments in hypersoniq's Blog.
hypersoniq is a Standard member.

October 18, 2024
8:50 am

Permutation counter design considerations

As this project moves through the planning stages, it is important to have an idea of the following:

1. Input. This will be the data structure that will be read into the script and stored in a certain configuration within the program that can be used as intended. Here it might be advantageous to use the "list of tuples" structure I have created for the vertical horizon project. A specific plus is that it already deals with leading zero numbers, because 001 is not the same as 1.

2. Processing. What do we want to do with the data?

3. Output. This will be the list of ALL combos (1,000 for the pick 3 and 100,000 for the pick 5) followed by their straight hit count, their permutation hit count and their combined total past winnings, a dollar value based on the combined straight and box hits. It will print to the screen so I can be sure it is working, but will also write to a .csv file for further sorting and filtering using spreadsheet tools.

Logic dictates that the highest dollar value is the best paying combo, and the top combos will change over time, so a betting strategy will need to be in place. I think this lends itself better to playing the combo for a week rather than a day, so the strategy is:

Pick 3 .50 straight / .50 box, total cost for a week is $14. Rerun the program once per week to get a pick for the next week. Continue until a hit, then add in the pick 5, though here we will be looking specifically at the combos that have 5 unique digits so a box hit stays under the taxable claim form level. That cost at $1 straight and $1 box would be $28. Since 1 box hit on the pick 3 would be $40 or $80 (depending on whether there was a pair in the top combo), the pick 5 play will wait until the pick 3 has a box hit.

On a $40 box hit, the 5 will replace the 3, keeping the cost at $28 and holding the $12 to offset the cost of the next week's pick 3.

A box hit on the pick 5 with 5 unique numbers pays $75 less than a $500 pick 3 straight, no claim form required as it is less than $600. A win there opens up the possibility to play both the 3 and the 5 for a month and still have some profit left.

So now that the input data structure is decided and I have an idea of what the output should be, I can focus on processing to get the project underway. This will entail counting directly and counting permutations. The process is far from being up and running, so if it turns out that the permutation functions in pandas work better than the functions in Python, the data structure may also change a bit, but reading in rows across columns will be the input method, as this means my history .csv files do not require any additional modifications.
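To make the processing step a bit more concrete, here is a minimal sketch of reading rows across columns and counting straight and permutation hits. It assumes one digit per column with a header row; the sample data, the column names and the helper names `load_draws` and `count_hits` are all made up for illustration, and the real history files may be laid out differently.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a history .csv (one digit per column).
SAMPLE_CSV = """d1,d2,d3
0,0,1
1,0,0
0,1,0
1,2,3
3,2,1
1,2,3
"""

def load_draws(handle):
    """Read rows across columns into a list of tuples of digit strings."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    # Digits stay as strings so that 001 never collapses into 1.
    return [tuple(cell.strip() for cell in row) for row in reader if row]

def count_hits(draws):
    """Return two counters: exact-order hits and any-order (boxed) hits."""
    straight = Counter("".join(d) for d in draws)
    # Sorting the digits gives every permutation of a combo the same key.
    boxed = Counter("".join(sorted(d)) for d in draws)
    return straight, boxed

draws = load_draws(io.StringIO(SAMPLE_CSV))
straight, boxed = count_hits(draws)

combo = "123"
straight_hits = straight[combo]
# Permutation hits: draws with the same digits in a different order.
permutation_hits = boxed["".join(sorted(combo))] - straight_hits
print(combo, straight_hits, permutation_hits)
```

Keying the boxed counter on the sorted digits avoids generating every permutation, which should matter more once this scales up to the pick 5.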

Happy Coding!

1 Comment

Entry #343

October 16, 2024
10:40 am

Next up, pick 3 permutation counter.

Going to create a counter for permutations of the pick 3 combos.

It stands to reason that the combo that won the most times straight is not necessarily the one with the most appearances boxed as well.

This one will count all of the combos for straight hits, but then also for hits on any permutations. Of course the triples will be kept out, so that is going to count the rest of the possible combos, all 990 of them (1,000 minus the 10 triples), from 001 to 998.

This time I am aiming to generate a report rather than just save to a csv file for further processing. (Though that will also be done).

Output format will be sorted by historic winnings...

NNN. # straight hits. # permutation hits (boxed) $calculated winnings based on history
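A rough sketch of how that report line could be produced and sorted by historic winnings. The payout figures are assumptions taken from numbers mentioned elsewhere in this blog ($500 per $1 straight, so $250 on a .50 wager, and $40 or $80 for a .50 box hit); it also assumes a straight hit on a .50/.50 ticket collects both halves. The function name `build_report` is made up.

```python
from collections import Counter
from itertools import product

STRAIGHT_PAYOUT = 250        # assumed: half of a $500 straight prize
BOX_PAYOUT = {3: 40, 2: 80}  # assumed: keyed by number of distinct digits

def build_report(draws):
    """Return (combo, straight hits, permutation hits, winnings), best first."""
    straight = Counter(draws)
    boxed = Counter("".join(sorted(d)) for d in draws)
    rows = []
    for digits in product("0123456789", repeat=3):
        combo = "".join(digits)
        unique = len(set(combo))
        if unique == 1:
            continue  # triples are kept out
        s = straight[combo]
        p = boxed["".join(sorted(combo))] - s
        box_pay = BOX_PAYOUT[unique]
        # A straight hit is assumed to pay the straight half plus the box half.
        winnings = s * (STRAIGHT_PAYOUT + box_pay) + p * box_pay
        rows.append((combo, s, p, winnings))
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows

# Made-up draw history, just enough to exercise the report.
draws = ["123", "321", "123", "001", "100", "777"]
report = build_report(draws)
for combo, s, p, winnings in report[:3]:
    print(f"{combo}. {s} straight hits. {p} permutation hits (boxed) ${winnings}")
```

The same rows could be handed to `csv.writer` for the spreadsheet side of things.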

There are libraries in Python that make dealing with permutations orders of magnitude easier than the long and complex spreadsheet formulas.

Why all the extra work? Because this process gets a great deal more difficult when scaling up to the pick 5!

Pick 3, 3 unique digits has 1 combo with 5 additional permutations... pick 5, 5 unique digits has 1 combo with 119 additional permutations!

Plus, leaving out the 5 of a kind combos, there remain 99,990 combos to sift through.
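Those counts are easy to double check with the standard library. This is only a quick sanity check, and the helper names are made up for the example.

```python
from itertools import permutations, product

def distinct_orderings(combo):
    """Number of different straight numbers that share the same digits."""
    return len(set(permutations(combo)))

def combos_excluding_all_same(n_digits):
    """Count combos of n digits, leaving out 000, 111, ... style numbers."""
    return sum(
        1 for c in product("0123456789", repeat=n_digits) if len(set(c)) > 1
    )

print(distinct_orderings("123") - 1)    # 5 additional permutations (pick 3)
print(distinct_orderings("12345") - 1)  # 119 additional permutations (pick 5)
print(distinct_orderings("112"))        # 3 orderings when there is a pair
print(combos_excluding_all_same(3))     # 990
print(combos_excluding_all_same(5))     # 99990
```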

Same work flow as prior attempts, perfect the design on the pick 3 then just scale up...

Should be relatively easy, right?

4 Comments

Entry #342

October 11, 2024
11:32 am

The Pick 3 combo plan.

So, today I will start looking at the pick 3 with the whole combo rather than by individual columns.

Pick 3 is easy to start with.

1. Count the most frequent number to have appeared straight.

2. Count all of the permutations of the combo fetched from step one, which would be box hits of that straight number.

3. Try it out on 0.50/0.50 tickets for maybe a month.

Different number combo expected from the day and night histories.

I can probably hammer that out in under an hour. There are at most 5 additional permutations to cover when looking at a pick 3 straight number comprised of all unique digits.

This balloons to 119 additional permutations if looking at a pick 5 number, so that will happen later.

The difference between the most frequent combo and the next most frequent combo will determine how frequently the process must be run again.

Next step, write a python program to search the database for the most frequent combo when all permutations are summed. This will obviously take longer, and will auto scale between pick 3 files and pick 5 files.
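One possible shape for that search, as a sketch only. It groups draws by their sorted digits so that every permutation lands in the same bucket, and it does not care whether the strings are 3 or 5 digits long, which is one way the auto scaling could work. The sample draws and the name `top_box_groups` are invented.

```python
from collections import Counter

def top_box_groups(draws, how_many=3):
    """Rank digit groups by total hits across all of their permutations."""
    groups = Counter("".join(sorted(d)) for d in draws)
    return groups.most_common(how_many)

# Made-up histories; real ones would come from the pick 3 / pick 5 files.
pick3 = ["123", "321", "213", "001", "100", "555"]
pick5 = ["12345", "54321", "00112", "21100", "31245"]

print(top_box_groups(pick3, 2))
print(top_box_groups(pick5, 1))
```

The gap between the first and second entries returned is the "difference" that would decide how often the process needs to be run again.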

The only thing new will be testing the viability of box play. I can even create new CSV files to check combo followers distribution... nothing in the code base so far has been truly abandoned.

2 Comments

Entry #341

October 10, 2024
10:25 am

The big show, setting up the seed test for MM and PB.

After watching the results in the ongoing seed test on the PA Match 6 favoring the 6 highest numbers over all vs the most frequent per column, there are some differences that will need to be taken into account for the next round of seed tests...

1. There is no "bonus" ball in the match 6. All 6 columns were used together to generate the top 6. In the big games, it will be reduced to a field of the first 5 columns, with the bonus ball handled using straight up columnar frequency.

2. There are 2 free QP lines given on each Match 6 ticket that will not be present in the big games. These QP lines factor into the prizes won so far, which is why the seed test must be rerun for the Mega and PB.

3. There is not as much history under the current big game matrices... this could lead to a data starved situation when using the seeds later to generate lines. Should this condition occur, the decision will be to use a QP number for any missing numbers in a given line. On the kiosks in PA, if you enter too few numbers on a ticket, it will offer to fill in the rest of the ticket with QP numbers.

The tickets are also a greater expense, so the seed test will be limited to the same 10 draws as the Match 6. The one that has the most numbers appearing will move on to become the input seed for a vertical horizon test for that game.

The VH test will be a one off for each game because of the expense of buying 7 tickets vs the usual 1 per draw.

Still struggling to make auto updates happen, but still working on it. It is not just the parsing of the RSS feed, but also the selection of the first draw NOT in history and encoding the data correctly. I realize I have no choice but to manually update the PB and MM databases from where I gave up on them last year, but hopefully I get a breakthrough on the update automation process. I have been reading a free online book titled "Automate the Boring Stuff with Python" that provides some solid starting points for such a task.

With the dailies I am looking into gathering some stats on the whole combos rather than individual numbers in columns. This should be able to open up pathways to not just count winning straight combos, but also to count their permutations to get a top list of number combos that perform the highest, that might help the synchronization disconnect when just looking at individual digits.

Rather than give up again and wait for inspiration, I am going to just push through the idea drought and expand on more coding techniques to extract more info from the code base I have already created.

Happy Coding!

Comments

Entry #340

October 7, 2024
10:25 pm

What if we have been taking the wrong approach this whole time?

Attempt after attempt has been made trying to make order from chaos. If anyone has figured it out, odds are that they take that secret to their grave.

What if, instead, a better random number generator were built? Not a pseudo random number generator (PRNG), but a complete source of randomness used to generate the "picks". API calls to random.org, or even taking those and applying an irreversible hash, similar to how the Bitcoin block chain uses double applications of the SHA-256 one way hash algorithm...

Set up the parameters for each run, such as 3 digits ranging from 0 to 9 in a pick n game to 6 digits from 1 to 49 in a 6/49 game...
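A minimal sketch of what such a generator might look like. It does NOT call random.org; as a stand-in it uses Python's `secrets` module, which draws from the operating system's entropy source. The double SHA-256 step is included only to show the idea, and all function names here are made up.

```python
import hashlib
import secrets

def random_digits(count):
    """Pick-n style line: `count` digits, each 0-9, repeats allowed."""
    return [secrets.randbelow(10) for _ in range(count)]

def random_line(count, low, high):
    """Jackpot style line: `count` distinct numbers from low..high, sorted."""
    rng = secrets.SystemRandom()
    return sorted(rng.sample(range(low, high + 1), count))

def double_sha256(data):
    """Apply SHA-256 twice, the way Bitcoin block hashing does."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def digits_from_entropy(entropy, count):
    """Turn raw entropy bytes (from any source) into `count` digits."""
    digits = []
    block = entropy
    while len(digits) < count:
        block = double_sha256(block)
        # Bytes 250-255 are skipped so each digit 0-9 stays equally likely.
        digits.extend(b % 10 for b in block if b < 250)
    return digits[:count]

print(random_digits(3))          # e.g. a pick 3 line
print(random_line(6, 1, 49))     # e.g. a 6/49 line
print(digits_from_entropy(secrets.token_bytes(32), 5))
```

Hashing does not add randomness that was not already in the input, so the quality of the picks still comes down to the entropy source.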

Instead of the frustration of dead ends, it is like playing a quick pick, but without the lottery computers taking part in the process...

This is intriguing...

This is achievable...

This might be the next move instead of giving up again!

Comments

Entry #339

October 6, 2024
6:47 am

Seed vs seed in Match 6, day 2

The seed only test (also $28) over 7 days is already showing a bias in favor of picking the top 6 numbers regardless of position vs. the top number in each column.

The first day had 0 hits, the second day saw a $5 hit on the newer method (top 6 over all). 5 days to go.

I will not have a day off until Wednesday, so that will probably be my first opportunity to update the history files of the PB and MM.

After the Match 6 test expires, might be worth a short run head to head seed test on the big games, maybe 5 draws of each.

I wonder if something similar could be applied at smaller scale to the daily games? Now that I have finally expanded from being a columnar isolationist...

Also cooking up an idea about using hidden Markov chains, but that is still far away from any type of test implementation.

1 Comment

Entry #338

October 3, 2024
12:23 pm

Tonight will be the second test on the PA Match 6.

Since the last test ended without a clear cut winner, going to try again tonight. The winnings from the last attempt will make this one cost $10.

Same setup...

Generate seed draws using the spreadsheet

One using the most frequent number by column and one using the 6 most frequent numbers regardless of position.

Run each seed through the vertical horizon script, which will generate a line of the most frequent numbers to appear with each seed digit in its position... this adds 6 lines to each.

Result is 14 separate tickets.
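For anyone curious how a line per seed position might be generated, here is a simplified guess at the logic, not the actual vertical horizon script. It locks one seed number in its column, keeps only the history rows that match, and takes the most frequent number in each of the other columns. The toy history uses 3 columns instead of 6, and `companion_lines` is a made-up name.

```python
from collections import Counter

def companion_lines(history, seed):
    """Build one line per seed position from rows sharing that seed number."""
    lines = []
    for pos, locked in enumerate(seed):
        matching = [row for row in history if row[pos] == locked]
        line = []
        for col in range(len(seed)):
            if col == pos:
                line.append(locked)
            elif matching:
                counts = Counter(row[col] for row in matching)
                line.append(counts.most_common(1)[0][0])
            else:
                line.append(None)  # no history for this number in this column
        lines.append(tuple(line))
    return lines

# Toy sorted-order history, 3 columns wide to keep the example short.
history = [
    (1, 5, 9), (1, 5, 9), (1, 6, 8),
    (2, 7, 9), (2, 7, 9),
    (3, 5, 8), (3, 5, 8),
]
for line in companion_lines(history, (1, 7, 8)):
    print(line)
```

A `None` marks a spot with no matching history, which would have to be filled some other way.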

Winner determined by which system of picked lines has the most matching numbers. The QPs (2 per ticket) are not factored into measuring the result.

If this ends up being another test without a clear cut winner, then we will just move into playing just the seed numbers for a week on the jackpot games and see which has the most matches in 2 (MM) or 3 (PB) plays.

Still have some calculating to do regarding the bonus ball concept, but it hopefully provides a direction in which to move forward.

1 Comment

Entry #337

September 30, 2024
8:21 am

Gearing up for the big PA Match 6 live test

Having only ever studied the data in sorted order columns, there were some unexpected results when gathering a count of the numbers regardless of position.

The first stand out is that none of the numbers are the same! All 6 are different when counting from the entire grid.

In sorted order, by column, a 1 is the most frequent in column A and a 49 is most frequent in column F. They also appear at double the frequency of the other 4 positions.

When counting the numbers regardless of position, 4 ends up being the most frequent number overall. This makes sense because the numbers are not restricted to a single column. The application of the most frequent numbers to appear by position, the "Vertical Horizon" script, has no missing entries. Some of them have fewer distribution entries though, such as a top 5 rather than a top 20, because the script only records non zero entries.

In the by column counts, 1 appears around 455 times and 49 appears almost 460, but the top overall picks have counts above 500.

All of the counting was done in the spreadsheet. There was no need for a script for this task.

For each pick generated by counting, there will be 6 corresponding lines generated by the script. Grand total of 14 tickets (because I am also playing the 2 seed lines) at $2 per.

Counting the free quick pick lines, it will be 42 lines for tonight's game.

For the purpose of this experiment, I am only counting matches on the lines I picked. Which ever one does better will be the one moved forward to the big games for a run.

One winner took home $1,680,000 on the September 25th Match 6 draw, so the jackpot reset to $500,000. It is up to $620,000 for tonight.

So, the counting is super easy in a spreadsheet!

To count the most frequent in a column, simply use the "mode" function giving the column data range as the argument. You can use any empty cell, I usually pick one below the history rows. For my sheet this was

mode(B2:B3882)

Grab that cell and drag it to the right until the other 5 spaces are filled and you have your most frequent by position! Hint: the last cell should contain

mode(G2:G3882)

Note: I start with row 2 because I use a header row. Now, counting frequency by the whole history is slightly different. Somewhere on the sheet where you have the room, create a column (for argument's sake we will arbitrarily choose column J) and fill it from 1 in J1 to 49 in J49. This is our target list.

Now we make our counting function in K1 to read

countif($B$2:$G$3882;J1)

Note the dollar signs, they turn the default relative references into absolute references. When we fill down, the only thing that will change is the target cell. So do that, drag K1 and auto fill down to K49. Now you have the counts! Let's sort them to find out the top 6...

Open a new sheet in the current workbook. Select the range J1 to K49 and copy.

Go to the new worksheet and use "paste special" to copy only the values into cell A1 (we don't want the formulas here, just their results).

Now select cell B1 (this will be the K1 from the first sheet). Here is a shortcut for selecting the entire range that contains data: CTRL+SHIFT+Down Arrow. This should then select B1 through B49. Click SORT Z-->A (descending). It will ask if you want to extend the selection; click OK. Now you have the list of all 49 numbers, sorted by frequency!

The process will alter slightly for different game matrices, but you get the concept.
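The spreadsheet did the job here, but for comparison, the same two counts can be sketched in Python. This is an alternative, not what was actually used; the toy history is 3 columns wide and entirely made up.

```python
from collections import Counter

# Toy sorted-order history; a real Match 6 sheet would have 6 columns.
history = [
    (1, 4, 9),
    (1, 4, 8),
    (2, 4, 9),
    (1, 3, 4),
    (4, 5, 9),
]

# Equivalent of the mode function on each column: most frequent by position.
by_column = [
    Counter(column).most_common(1)[0][0] for column in zip(*history)
]

# Equivalent of countif over the whole grid: frequency regardless of position.
overall = Counter(number for row in history for number in row)
top_overall = [number for number, _ in overall.most_common(3)]

print(by_column)
print(overall.most_common(3))
```

`zip(*history)` turns the rows into columns, which is the same "by position" view that the mode formula works on in the sheet.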

1 Comment

Entry #336

September 28, 2024
9:04 pm

Two different paths to take for a seed pick...

There are other ways to generate a seed pick that have a statistical grounding but do NOT rely on the frequency of followers...

This is specifically for jackpot type games.

The first is to identify the number in each sorted order column... this provides the actual most drawn by position over all. (Pure spreadsheet analysis)

The second is to expand the countIf formula to include ALL columns and identify the top N numbers to have been drawn in the entire game in ANY position. (Also Pure spreadsheet analysis)

It is probably worth a test to see which method works better to focus on a path forward...

Because of the python script that can provide the numbers most likely to appear with a certain number when it is drawn, it is the selection of the seed number that will be the target of short term experiments.

The cost of testing both methods will be $28 on the PA Match 6. The $14 test on the follower seed was a $9 loss. But the results have been recorded, and will be compared with the results of these 2 tests.

After this, some down time as the big jackpot games get a history file update (not since the early part of last year!) and the winning method will get a test on the Mega Millions ($3 x 7 = $21) and then the Powerball ($4 x 7 = $28).

So $77 slated for the next 3 tests total.

Unlike the dailies, this method only needs to work one time to be a success...

1 Comment

Entry #335

September 22, 2024
2:36 pm

Maybe an LLM can help find the next direction...

As long as you are not specifically asking for a lottery solution, LLMs like ChatGPT and Claude will discuss prediction methodologies.

Like anything else in life, you tend to get out of it what you put into it. The art of getting useful information already has a name, "prompt engineering".

Perhaps it is time to start asking the right questions... experimentation with these LLMs is still free, so may as well sink some time into seeing what the tech can do to solve this problem.

1 Comment

Entry #334

September 19, 2024
11:13 am

Time to do some trigonometry!

After studying the last result with the program output, there was no way to utilize the given data to come up with the next result that holds across columns or draws.

Next up is to break out the visualizer to try and test some trigonometric operations on the angles (and/or line lengths). Looking for something consistent that might provide the next angle when given the last few... sine, cosine, tangent, secant, cosecant, hyperbolic tangent.... just some of the operations that will be tested to help give a consistent result.

The alternative would be to undertake a massive coding effort in order to measure error in the picks, like I did for the pure followers, only expanding to include the angles and lines... I would rather not have to start that one, as that could take months...

Misapplication of math on trying to solve the lottery is something we have all done (who try to predict), whether it be a workout or a simple +1, -1, +1... just trying to take that to the next level.

Anyone have ideas about how we could use calculus?

Comments

Entry #333

September 17, 2024
1:20 pm

What a different story the data tells now...

Using the distribution of angles, there is formed a bell curve that centers on 0, or the repeat. This is because 0 shares no "negative split" with a corresponding angle.

Adding the lines... the lines tell a wildly different story. The most popular following move is a difference of 1, or a line length of 1.41, which corresponds to both 45 degrees AND -45 degrees.

The +/- 1 is the most popular line follower and it is not even close... in every position! The repeats, or length = 1, rank 4th at best.

So now, how can this be used?

One idea is to look at the ranked angular data and pick the smallest non zero move that appears highest in the list... such as if a -45 is ranked higher than the 45, then simply subtract 1 from the last draw... if a 9 or a zero appear as the last drawn number then -45 on a 0 and 45 on a 9 violate the grid constraints, so the choice is easier.
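That rule is simple enough to sketch. The ranked list below is invented, and `next_digit` is a hypothetical helper; it walks the ranking from the top and takes the first 45 or -45 move that stays on the 0 to 9 grid.

```python
def next_digit(last, ranked_angles):
    """Apply the highest ranked +/-1 move that does not leave the 0-9 grid.

    ranked_angles: angles in degrees, most frequent first.
    """
    for angle in ranked_angles:
        if angle == 45 and last < 9:
            return last + 1
        if angle == -45 and last > 0:
            return last - 1
    return last  # neither move is ranked, so fall back to the repeat

ranking = [0, -45, 45, 63, -63]  # made-up ranking with -45 above 45

print(next_digit(4, ranking))  # -45 ranks higher, so 4 becomes 3
print(next_digit(0, ranking))  # -45 is off the grid for 0, so 0 becomes 1
print(next_digit(9, [45, -45]))  # 45 is off the grid for 9, so 9 becomes 8
```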

While no system will work every time, if this could help steer closer to a correct pick, then by all means it will be utilized.

I took a snapshot of the follower script output, decided NOT to play today anyway, so a "paper pick" analysis of the concept can be tested after the draws.

All sheets were modified, but still have to update the pick 5 sheets to current, as they sat stale for quite some time awaiting a pick3 hit to get started.

It may be just more meaningless information creating noise, but it does tell a way different story than just studying the angular data. For instance, the length of ~9 can only happen when a 0 follows a 9 or a 9 follows a 0. It cannot happen any other way.

Now to determine exactly how important this new information is and figuring out how to apply it consistently to picks.

To confirm...

Raw follower data distribution = a near uniform distribution

Angular follower data distribution = a bell curve centered on 0 degrees

Line length follower data distribution = a more logarithmic or even near exponential distribution, centered on +/- 1

So if we know that the +1 or -1 fits the line data, we might use the placement of the 45/-45 in the angle data to help decide if that 1 is added or subtracted...

Comments

Entry #332

September 17, 2024
8:59 am

Today's challenge... integrate the remaining vector data...

Angles are in place. They sit beside the normal draw history. Today adding the line lengths that are a part of the vectors. (Vector = angle and line length).

The distribution will differ from the angles as there are 19 possible angles (counting negatives) but only 10 possible line lengths, as these are positive only. And though they are equal in number to the actual digits, they are predicted to have a more logarithmic distribution than the raw draw data, as they maintain a many to many relationship like the angles do...

Length of 1 could be any repeat, for example.
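To check those counts, here is a small sketch of the vector math. It assumes each draw sits one unit to the right of the previous one, so the angle is the arctangent of the digit difference and the length is the hypotenuse. That assumption is consistent with the values mentioned in these posts (45 degrees and 1.41 for a difference of 1, a length of 1 for a repeat, roughly 9 for a 0 to 9 move), but the actual lookup table may have been built differently.

```python
import math

def vector(prev, curr):
    """Angle in degrees and line length between two consecutive digits."""
    diff = curr - prev
    angle = math.degrees(math.atan2(diff, 1))  # one unit step between draws
    length = math.hypot(1, diff)
    return angle, length

pairs = [(a, b) for a in range(10) for b in range(10)]
angles = sorted({round(vector(a, b)[0]) for a, b in pairs})
lengths = sorted({round(vector(a, b)[1], 2) for a, b in pairs})

print(len(angles), angles)    # 19 angles, negatives included
print(len(lengths), lengths)  # 10 lengths, positive only
```

Every length except the repeat is shared by a positive and a negative angle, which is why the two distributions tell different stories.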

Hopefully this makes it possible to narrow down a pick to the cumulative score on the distribution lists that do not exceed the constraints.

The process is time consuming, but not very difficult. For the pick 3...

1. Add 3 columns that will hold the new data

2. Add a column to the lookup table beside the angles

3. Enter a formula in the new columns that will look up the line lengths

4. Prepare the data on a new worksheet using "paste special" (values only)

5. Export this hybrid worksheet to a .csv file.

Since I have decided to skip the pick 2 and pick 4, this only involves modifying and updating 4 sheets. P3 Mid and Eve and P5 Mid and Eve.

The follower script does not even need a single modification, as it is flexible in how it reads the data.

It is more a prep for future development, but also falls into that category of "at least I tried everything I could think of."

Comments

Entry #331

September 16, 2024
3:18 pm

The positive takeaway from the current attempt..

This would undoubtedly be that I can store data in the same file and only target the columns needed.

This means I can store draw data alongside its corresponding angle data and just choose to isolate one or the other. I can also expand on what is there.

For each angle (signed, forming a bell curve type distribution) there is a corresponding line length that comes from the vector, this will also be added. The expected distribution would be more logarithmic, with shorter distances expected to make up the majority of the draws.

Why is this exciting? Because I would not have to go through abstraction processes like "one hot encoding" to turn features into binary data, I can just use the actual data instead.

This process would create data points (draw history) with features (lead in angle and line length) which would allow the use of machine learning algorithms to help figure out what in the data may be of importance.

The engineering part of machine learning goes through two steps when you are unsure what is significant...

1. An unsupervised algorithm where it can spot and report patterns and help estimate weights

2. A supervised algorithm where it will actually use the info from the previous step to help build and train a model to obtain predictions.

This was always an idea, but now it gets a step closer to realization.

Now the history file can be packed with information. Basic aggregations like odd/even and high/low, alongside the vector components. Using step 1 to determine what is important and what is irrelevant, then using this information to craft step 2.

Yet I do not need to remake versions of the history files because individual columns can be targeted in the scripts. Multiple scripts, one history mega file for each game...
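As an illustration of targeting columns, here is a sketch using the standard `csv` module. The column names (`d1`, `a1`, and so on) and the sample rows are invented; the real mega file will have its own headers.

```python
import csv
import io

# Hypothetical mega file: draw digits next to their lead-in angles.
MEGA_FILE = """date,d1,d2,d3,a1,a2,a3
2024-09-14,1,2,3,45,0,-45
2024-09-15,0,2,4,-45,0,45
"""

def read_columns(handle, wanted):
    """Return only the named columns from each row, as tuples of strings."""
    reader = csv.DictReader(handle)
    return [tuple(row[name] for name in wanted) for row in reader]

digits = read_columns(io.StringIO(MEGA_FILE), ["d1", "d2", "d3"])
angles = read_columns(io.StringIO(MEGA_FILE), ["a1", "a2", "a3"])

print(digits)
print(angles)
```

Each script just asks for the columns it cares about, so adding new features to the file should not break the older scripts.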

That will not be an easy slam dunk like these scripts that spit out follower distributions, but it represents a goal, and when you have a goal, you have motivation...

Much to do...

Comments

Entry #330

September 14, 2024
3:43 pm

Something is still missing...

For this year, I moved into gathering follower distributions, basically raw Markov probabilities. The number most likely to follow another in a column is not always right, but sometimes it is. So generating one pick is possible, but it does not win with enough regularity to be interesting. It always seems that the columns are out of phase with each other.
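For readers who have not seen a follower distribution before, here is a bare-bones sketch of the idea for a single column. It assumes the column is in chronological order, oldest draw first; the sample column and function names are made up and this is not the actual follower script.

```python
from collections import Counter, defaultdict

def follower_distribution(column):
    """Count which digit followed each digit in one column."""
    followers = defaultdict(Counter)
    for prev, curr in zip(column, column[1:]):
        followers[prev][curr] += 1
    return followers

def follower_probabilities(followers, digit):
    """Turn the raw follower counts for one digit into probabilities."""
    counts = followers[digit]
    total = sum(counts.values())
    return {d: n / total for d, n in counts.items()} if total else {}

column = [3, 7, 3, 7, 3, 1, 3, 7]  # made-up draws for one position
followers = follower_distribution(column)

print(followers[3].most_common())
print(follower_probabilities(followers, 3))
```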

The new addition aimed to help fix that by analyzing the raw Markov probabilities of the digits that appear with a locked column in the other columns. This introduces 3 new picks into the mix, taking the cost from $1 to $4, and if none of the digits appear in the seed, you have no chance of a win (which is why the decision was made to change from $1 straight to 0.50/0.50 straight/box).

Another fundamental change was brought in when I decided to move from the frequency of raw numbers into the realm of vectors. The angles formed between draws shift the data from a near uniform distribution to more of a bell curve distribution because the angles are both positive and negative.

All the ideas, all of the imagined and then realized code solutions... still no winners...

It is time consuming to constantly update every draw history file to get accurate data to work with, each csv has a spreadsheet as well. Angular data is calculated via lookup tables and needs to be copied every single draw.

It seems like a great deal of work for something that does not produce results.

Having used the lookup table because it was the quickest solution, and having recent success with the Python dictionary structure in the latest script... I am thinking of quite literally automating the boring stuff...

I will continue to take a shot here and there with the new scripts to give it a fair chance, but I feel the need to focus on automating the update process.

The script will pull data from the PA lottery RSS feed and attempt to read it back to fill in the missing dates in the .csv files. If I use a dictionary in place of the lookup table, it can generate angular data on the fly... hopefully.
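The dictionary part of that could look something like the sketch below. The RSS side is left out entirely because the feed format is not covered here. The angle values assume a one unit step between draws and rounding to whole degrees, which may not match the existing lookup table exactly.

```python
import math

# Dictionary standing in for the spreadsheet lookup table:
# digit difference (-9..9) -> angle in whole degrees.
ANGLE_BY_DIFF = {
    diff: round(math.degrees(math.atan2(diff, 1))) for diff in range(-9, 10)
}

def angles_for_column(column):
    """Generate the lead-in angle for every draw after the first."""
    return [ANGLE_BY_DIFF[curr - prev] for prev, curr in zip(column, column[1:])]

column = [0, 9, 9, 8, 9]  # made-up draws for one position, oldest first
print(angles_for_column(column))
```

New draws appended to the column would get their angles on the fly, with nothing to copy by hand.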

I have once again depleted the idea vault; the angular and cross column ideas were based on the concepts of other LP members.

I just missed a box hit today on the PA mid... had 2 of 3 on the last number generated. So I do have something to experiment with, but it may be time to finally set to the task of automating the update process. It is a motivation killer to go through 8 spreadsheets and 16 csv files on a daily basis, then run 2 scripts just to get a pick.

But something is definitely missing from the solution of generating a winning combo, and I am not sure what that is...

2 Comments

Entry #329