While visiting Canada last week, I stepped into a convenience store, took a passing interest in their lotto games, and picked up a few play slips. I like to calculate odds as a mental challenge, so I turned my attention to the Lotto Max game, which appears to offer the largest jackpot of any lotto in Canada. It is a nationally pooled lottery, with tickets sold through regional or provincial administrations, a structure roughly similar to Powerball and Mega Millions in the U.S.
Briefly, the game is played as follows. A C$5 play gives the player 3 lines, each consisting of 7 numbers selected from 1 to 50 inclusive. The player can either choose the numbers on the first line manually or have the lottery RNG/computer generate them; in either case, the lottery computer randomly generates the other two lines. In the actual lotto draw, 7 main numbers are drawn, followed by a "bonus ball" drawn from the remaining 43 numbers. Any of the player's numbers that missed in the main draw can then match the bonus ball. Prizes are paid at different tiers depending on how many main-draw numbers are matched and whether the bonus ball is matched (the bonus ball is irrelevant if all 7 main numbers are matched, i.e. the jackpot is hit).
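To pin down the rules just described, here is a minimal Python sketch of one draw and of how a single line would be scored against it. It assumes every draw is uniformly random; `draw` and `classify` are hypothetical helpers for illustration, not anything published by the lottery.

```python
import random

# Assumed parameters, taken from the rules above: numbers 1..50,
# 7 main numbers per draw, then one bonus ball from the remaining 43.
POOL = range(1, 51)
MAIN = 7

def draw(rng):
    """Return (main, bonus) for one simulated draw."""
    balls = rng.sample(POOL, MAIN + 1)  # 8 distinct numbers, in draw order
    return set(balls[:MAIN]), balls[MAIN]

def classify(line, main, bonus):
    """Score one 7-number line: (main-draw matches, bonus matched?).

    Only numbers that missed the main draw can match the bonus ball.
    """
    line = set(line)
    return len(line & main), bonus in (line - main)

# Usage: one random draw scored against one random line.
rng = random.Random(2024)
main, bonus = draw(rng)
print(classify(rng.sample(POOL, 7), main, bonus))
```

Because `random.sample` returns its picks in selection order, the 8th ball is uniformly distributed over the 43 numbers not already drawn, which mirrors the bonus-ball rule.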
So far this seems straightforward, except that none of the regional/provincial lottery commission websites clearly explains whether each line is sampled independently ("with replacement") or whether the lines are generated dependently ("without replacement"), so that no number can appear on more than one line. I calculated the probabilities both ways, and when I compared them against the winning-odds tables posted on the websites, the posted figures appeared to assume the second case: dependently generated lines with no duplicate numbers.
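To make the two interpretations concrete, here is a hypothetical sketch of how a quick-pick generator might produce three lines under each one. The function names are illustrative only; nothing on the commission websites says how their generator actually works.

```python
import random
from math import comb

POOL = range(1, 51)

def lines_independent(rng, n_lines=3, k=7):
    """Each line is sampled on its own, so numbers may repeat across lines."""
    return [rng.sample(POOL, k) for _ in range(n_lines)]

def lines_disjoint(rng, n_lines=3, k=7):
    """Draw 21 distinct numbers once and split them into 3 lines,
    so no number can appear on more than one line."""
    picks = rng.sample(POOL, n_lines * k)
    return [picks[i * k:(i + 1) * k] for i in range(n_lines)]

# Exact chance that three independently sampled lines happen to share no
# numbers at all: line 2 avoids line 1's 7 numbers, line 3 avoids all 14.
p_all_disjoint = (comb(43, 7) / comb(50, 7)) * (comb(36, 7) / comb(50, 7))
print(f"{p_all_disjoint:.3f}")  # about 0.027
```

Under the independent model, then, roughly 97% of 3-line plays would contain at least one number repeated across lines, so the two schemes are easy to tell apart from real tickets.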
Why does this distinction matter? Let's take the case of "3/7 + no bonus ball". The probability that a single line lands in this tier is:
p = [(7C3)*(43C4)/(50C7)]*[39/43] = 0.039221
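The same formula can be checked exactly with Python's standard `math.comb` and `fractions.Fraction`. The first factor is the chance that exactly 3 of the line's 7 numbers are among the 7 main numbers; the second is the chance that the bonus ball, drawn from the 43 undrawn numbers, misses the line's 4 losing numbers (39 of those 43 are safe).

```python
from fractions import Fraction
from math import comb

N, K = 50, 7  # pool size, and numbers per line and per main draw

# P(exactly 3 of the line's 7 numbers are among the 7 main numbers)
p_three = Fraction(comb(K, 3) * comb(N - K, K - 3), comb(N, K))

# The bonus ball comes from the 43 undrawn numbers; the line's 4 losing
# numbers are among them, so the bonus misses all 4 with probability 39/43.
p_no_bonus = Fraction((N - K) - (K - 3), N - K)

p = p_three * p_no_bonus
print(f"{float(p):.6f}")  # 0.039221
```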
Now suppose the lines are generated so that no number appears on more than one line, and that the three lines' outcomes are therefore treated as mutually exclusive events. Then the overall probability of one of the three lines landing in the "3/7 + no bonus ball" tier is 3 * 0.039221 = 0.117663, or odds of 1 in 8.5. This matches the posted odds on the regional lotto commission websites (e.g. Ontario, Quebec, Atlantic, etc.). HOWEVER, when I searched for images of actual tickets, it became apparent that duplicate numbers are NOT avoided; in other words, each of the 3 lines appears to be sampled independently ("with replacement").
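The additive calculation and its conversion to "1 in X" odds can be sketched as follows, reusing the per-line formula from above (recomputed here so the snippet runs on its own).

```python
from fractions import Fraction
from math import comb

# Per-line probability of "3/7 + no bonus ball" (same formula as above).
p = Fraction(comb(7, 3) * comb(43, 4), comb(50, 7)) * Fraction(39, 43)

# Additive model: the three lines' outcomes treated as mutually exclusive,
# so their probabilities simply add.
p_additive = 3 * p
odds_additive = 1 / p_additive  # the "1 in X" figure

print(f"{float(p_additive):.6f}, 1 in {float(odds_additive):.2f}")  # 0.117663, 1 in 8.50
```

One point worth keeping in mind: by linearity of expectation, 3 * p is also exactly the expected number of lines per play that land in this tier, whichever way the lines are generated.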
How would independent sampling ("with replacement") of each line affect the overall odds for the 3 lines? In our "3/7 + no bonus ball" example, the overall probability is:
P = 1 - (1 - p)^3
This is the probability that at least 1 of the 3 lines lands in the "3/7 + no bonus ball" tier. P works out to 0.113108, or odds of 1 in 8.84. So if I am correct about how the random numbers are generated in this lotto game (and I think I am), then the probability published on the lotto commission websites overstates the real value by about 4% (0.117663 / 0.113108 ≈ 1.040), i.e. it is more optimistic.
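The independent-lines figure and the roughly 4% gap can be reproduced exactly, and a Monte Carlo run gives an independent sanity check. The simulation is a swapped-in check rather than anything the lottery publishes; it assumes uniformly random draws and quick picks, and `simulate` is a hypothetical helper.

```python
import random
from fractions import Fraction
from math import comb

# Per-line probability of "3/7 + no bonus ball" (formula from earlier).
p = Fraction(comb(7, 3) * comb(43, 4), comb(50, 7)) * Fraction(39, 43)

P_indep = 1 - (1 - p) ** 3  # at least one of 3 independent lines hits
P_additive = 3 * p          # the posted-odds (additive) figure

print(f"P = {float(P_indep):.6f}, 1 in {float(1 / P_indep):.2f}")  # P = 0.113108, 1 in 8.84
print(f"additive / independent = {float(P_additive / P_indep):.4f}")  # 1.0403

def simulate(trials, seed=0):
    """Monte Carlo estimate of P for three independently sampled lines."""
    rng = random.Random(seed)
    pool = range(1, 51)
    hits = 0
    for _ in range(trials):
        balls = rng.sample(pool, 8)           # 7 main numbers + bonus ball
        main, bonus = set(balls[:7]), balls[7]
        for _ in range(3):                    # each line sampled independently
            line = set(rng.sample(pool, 7))
            if len(line & main) == 3 and bonus not in line:
                hits += 1
                break                         # "at least one line" is enough
    return hits / trials

print(f"simulated P = {simulate(100_000):.4f}")
```

With 100,000 trials the simulated value should typically land within about 0.002 of the exact 0.113108, and well below the additive 0.117663.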