Edinburgh United Kingdom Member #97833 September 24, 2010 41 Posts

Posted: July 3, 2012, 2:11 am

Hi,

I'm looking for help on the following problem (it's similar to the lottery): I'm following a phenomenon and I'm taking readings 15 times a day. Each reading can be classed A, B or C. So, every day I'm getting a string of 15 readings. The readings are evenly spread during the day and they are independent. They don't influence each other in any way. So far, I have n days of readings (15n readings in all). I need to predict the string (the whole row) on day n+1.

Here's an example of what I'm getting:

Y1  Y2  Y3  Y4  Y5  Y6  Y7  Y8  Y9  Y10 Y11 Y12 Y13 Y14 Y15
===========================================================
B   C   A   B   A   B   B   C   A   C   B   C   C   B   C
C   C   A   B   B   C   B   C   C   A   A   C   B   A   A
C   A   B   B   A   C   B   C   C   B   C   A   C   A   C
C   B   B   C   B   A   B   A   C   A   C   C   B   A   B
B   A   C   C   B   C   C   B   A   C   A   A   C   B   A
A   A   B   A   B   B   A   C   B   A   B   C   A   A   A
B   A   A   C   B   A   B   B   A   A   B   A   A   B   A
C   B   A   B   C   C   C   C   B   B   C   B   A   A   A
A   A   C   B   A   B   B   C   C   C   A   C   B   A   B
A   B   A   B   C   B   C   B   C   A   A   B   C   C   B
.   .   .   .   .   .   .   .   .   .   .   .   .   .   .
.   .   .   .   .   .   .   .   .   .   .   .   .   .   .
.   .   .   .   .   .   .   .   .   .   .   .   .   .   .
B   C   C   C   A   A   B   C   C   B   C   A   C   B   B
C   B   C   C   B   A   C   B   A   C   B   A   C   A   A
B   C   C   C   A   C   C   A   A   B   A   B   B   C   A
A   B   C   A   B   A   C   A   A   C   A   B   C   A   A
C   B   B   B   B   C   C   B   C   B   A   A   B   B   A
C   B   B   A   B   C   B   A   C   B   C   C   A   B   B
A   B   B   C   C   B   B   C   C   A   B   C   C   C   B

Shall I treat this as a matrix? Shall I treat each column independently, bearing in mind that the readings are independent? How many lines should I have in order to be able to make a prediction? Any ideas would be appreciated. Or even a nudge in the right direction would do me, so I don't waste any time. I can do a little bit of programming in QBasic (a bit obsolete, no graphics, but it works fine). I'm ready to help in exchange for good suggestions.
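On the "matrix" question: in practice it simply means storing the readings as a grid with one row per day and one column per reading slot, so that a whole day (a row) or one slot across all days (a column) can be pulled out directly. A minimal Python sketch of that layout, using the first three days of the example table; the names `days`, `matrix`, `column` and `columns` are illustrative only:

```python
# Minimal sketch: hold the readings as a matrix (list of rows),
# one row per day and one column per reading slot Y1..Y15.
# The three rows below are the first three days of the example table.
days = [
    "BCABABBCACBCCBC",
    "CCABBCBCCAACBAA",
    "CABBACBCCBCACAC",
]

matrix = [list(row) for row in days]   # matrix[day][slot]


def column(matrix, slot):
    """Return all readings for one slot (0-based: 0 = Y1) across the days."""
    return [row[slot] for row in matrix]


# Transpose: one list per slot, which is what a per-column analysis works on.
columns = [column(matrix, s) for s in range(len(matrix[0]))]

print(column(matrix, 0))  # ['B', 'C', 'C']
```

Treating each column independently then just means running the same analysis over each entry of `columns`.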

Economy class Belgium Member #123700 February 27, 2012 4035 Posts

Posted: July 3, 2012, 12:17 pm

Quote: Originally posted by martor854 on July 3, 2012

Shall I treat this as a matrix? Shall I treat each column independently, bearing in mind that the readings are independent? How many lines should I have in order to be able to make a prediction? Any ideas would be appreciated. Or even a nudge in the right direction would do me, so I don't waste any time. I can do a little bit of programming in QBasic (a bit obsolete, no graphics, but it works fine). I'm ready to help in exchange for good suggestions.

Thanks.

martor854

What does Y stand for? What do A, B and C stand for? What does the word day stand for?

Edinburgh United Kingdom Member #97833 September 24, 2010 41 Posts

Posted: July 3, 2012, 1:11 pm

Hi SergeM,

Thanks for taking the time to have a look at my problem. Here are your answers:

Y1, Y2,... = Column headers, instead of ordinal numerals (I, II,... or #1, #2,...).

A, B, and C are readings. Imagine a meter that shows A at the moment of the first reading, C at the second, B at the third, and so on. This meter can only show A, B or C. The readings are independent of each other. It's like a random generator problem.

"Day" is a day (24 hours or a working day). I don't think it's relevant to solving the problem. The point is that the readings always start at the same moment in the day (your choice) and the 15 readings are evenly spaced (every 0.5 hour, every hour and so on; again, your choice). In other words, 1 day = 1 row in our table/matrix.

Since the readings are independent of each other, I would treat each column separately, like a time series. I wonder whether Fourier transforms can be applied to a linear time series like this. Or perhaps some software? Or any other method? Ideas welcome!
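One way to try the Fourier idea on a single column is to turn it into a 0/1 series (1 on days when the slot showed, say, A) and look at its periodogram: a genuine daily or weekly cycle would show up as clearly larger power at the matching period. The sketch below is a naive discrete Fourier transform in plain Python, assuming equally spaced days with no gaps. With only ten days, as here, any peak is expected to be mostly noise, so treat it as an illustration of the mechanics rather than evidence of a cycle.

```python
import cmath


def periodogram(series):
    """Naive DFT power at frequencies k/N cycles per day, k = 1..N//2.

    `series` is a 0/1 list: 1 on days when the column showed the chosen value.
    The mean is subtracted first so the constant (k = 0) term is left out.
    """
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        powers.append((k, abs(coeff) ** 2 / n))
    return powers


# Column Y1 over the first ten days of the example table: B C C C B A B C A A.
y1 = "BCCCBABCAA"
is_a = [1 if v == "A" else 0 for v in y1]

for k, power in periodogram(is_a):
    print(f"period {len(is_a) / k:.1f} days: power {power:.3f}")
```

A strictly alternating series such as `[1, 0] * 5` would put all its power at the two-day period, which is a quick way to see what a real cycle looks like.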

Economy class Belgium Member #123700 February 27, 2012 4035 Posts

Posted: July 3, 2012, 1:25 pm

Quote: Originally posted by martor854 on July 3, 2012

Since the readings are independent of each other, I would treat each column separately, like a time series. I wonder whether Fourier transforms can be applied to a linear time series like this. Or perhaps some software? Or any other method? Ideas welcome!

mid-Ohio United States Member #9 March 24, 2001 19903 Posts

Posted: July 3, 2012, 3:59 pm

Sounds like something I did years ago, when I worked as an industrial engineering technician taking work samples. You need to chart your data to see whether any value occurs most often or can be associated with a particular time. You may come up with even more ideas once you start charting the data.
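A simple numerical companion to charting is to tally each column and compare the counts with the even split you would expect if A, B and C were equally likely. The sketch below computes Pearson's chi-square statistic per column against that assumption (2 degrees of freedom, 5% critical value 5.991) for the first ten days of the example. Two caveats: with ten days the expected count per cell is only about 3.3, below the usual rule of thumb of 5, so the approximation is rough; and with 15 columns tested, roughly one in twenty can cross the 5% line by chance alone.

```python
from collections import Counter

SYMBOLS = "ABC"
# Chi-square critical value for 2 degrees of freedom (3 symbols - 1) at the 5% level.
CRITICAL_5PCT_DF2 = 5.991


def chi_square_uniform(readings, symbols=SYMBOLS):
    """Pearson chi-square statistic: observed counts vs. equal shares for each symbol."""
    counts = Counter(readings)
    expected = len(readings) / len(symbols)
    return sum((counts[s] - expected) ** 2 / expected for s in symbols)


# First ten days of the example table, one string per day (Y1..Y15).
days = [
    "BCABABBCACBCCBC", "CCABBCBCCAACBAA", "CABBACBCCBCACAC", "CBBCBABACACCBAB",
    "BACCBCCBACAACBA", "AABABBACBABCAAA", "BAACBABBAABAABA", "CBABCCCCBBCBAAA",
    "AACBABBCCCACBAB", "ABABCBCBCAABCCB",
]

for slot in range(len(days[0])):
    col = [row[slot] for row in days]
    stat = chi_square_uniform(col)
    counts = {s: col.count(s) for s in SYMBOLS}
    note = "worth a closer look" if stat > CRITICAL_5PCT_DF2 else ""
    print(f"Y{slot + 1:<2} {counts} chi2={stat:5.2f} {note}")
```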

* you don't need to buy more tickets, just buy a winning ticket *

United Kingdom Member #70630 February 7, 2009 734 Posts

Posted: July 5, 2012, 12:30 pm

Quote: Originally posted by martor854 on July 3, 2012

Shall I treat this as a matrix? Shall I treat each column independently, bearing in mind that the readings are independent? How many lines should I have in order to be able to make a prediction? Any ideas would be appreciated. Or even a nudge in the right direction would do me, so I don't waste any time. I can do a little bit of programming in QBasic (a bit obsolete, no graphics, but it works fine). I'm ready to help in exchange for good suggestions.

Thanks.

martor854

Hi martor854,

I have sent you a sample worksheet; if that is not what you are looking for, please advise.

Regards,

billybouy...

Sometimes we can't see the woods for the trees, so we have to clear a path.

mid-Ohio United States Member #9 March 24, 2001 19903 Posts

Posted: July 13, 2012, 10:20 pm

Quote: Originally posted by martor854 on July 7, 2012

Hi RJOh,

It all started from a real phenomenon but I'm after a generally applicable algorithm.

Regards,

martor854

Then your best bet is to reverse-engineer each result and see whether one algorithm comes up more often than any other. This won't be 100% accurate, but if it concerns lotteries, I think even 10% would be excellent.
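One reading of "reverse-engineer each result" is a walk-forward backtest: for each past day, apply a candidate rule using only the days before it, and record how many slots it would have got right. The two rules below (`predict_most_frequent`, `predict_repeat_yesterday`) are hypothetical examples, not anything proposed in this thread. If the readings are truly independent and equally likely, every rule should hover around one slot in three.

```python
from collections import Counter


def predict_most_frequent(history, slot):
    """Hypothetical rule 1: the value seen most often so far in this slot (ties -> alphabetical)."""
    counts = Counter(row[slot] for row in history)
    return max(sorted(counts), key=counts.__getitem__)


def predict_repeat_yesterday(history, slot):
    """Hypothetical rule 2: whatever this slot showed on the previous day."""
    return history[-1][slot]


def backtest(days, predictor, warmup=3):
    """Walk forward through the history, predicting each day only from earlier days.

    `warmup` (at least 1) is how many days are used before the first prediction.
    Returns the fraction of individual slots predicted correctly.
    """
    hits = total = 0
    for d in range(warmup, len(days)):
        history = days[:d]
        for slot in range(len(days[d])):
            hits += predictor(history, slot) == days[d][slot]
            total += 1
    return hits / total


# First ten days of the example table.
days = [
    "BCABABBCACBCCBC", "CCABBCBCCAACBAA", "CABBACBCCBCACAC", "CBBCBABACACCBAB",
    "BACCBCCBACAACBA", "AABABBACBABCAAA", "BAACBABBAABAABA", "CBABCCCCBBCBAAA",
    "AACBABBCCCACBAB", "ABABCBCBCAABCCB",
]

for name, rule in [("most frequent", predict_most_frequent),
                   ("repeat yesterday", predict_repeat_yesterday)]:
    print(f"{name}: {backtest(days, rule):.1%} of slots correct (chance is about 33.3%)")
```

Any new rule can be dropped into the same loop, which keeps the comparison honest because no rule ever sees the day it is predicting.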

* you don't need to buy more tickets, just buy a winning ticket *

Edinburgh United Kingdom Member #97833 September 24, 2010 41 Posts

Posted: July 14, 2012, 2:24 am

Hi RJOh,

That is the best answer yet. It should be applicable to almost anything of the sort, including the lottery. Could you give me an idea of how I should start? Some indication of what to read, maybe? Also, we could work on it together. I can do a bit of programming in QBasic. It may not be much, but it's simple and fast.

Pittsburgh, PA United States Member #130598 July 20, 2012 37 Posts

Posted: July 30, 2012, 12:09 am

<quote>I'm looking for help on the following problem (it's similar to the lottery): I'm following a phenomenon and I'm taking readings 15 times a day. Each reading can be classed A, B or C. So, every day I'm getting a string of 15 readings. The readings are evenly spread during the day and they are independent. They don't influence each other in any way. So far, I have n days and n readings. I need to predict the string (the whole row) on day n+1</quote>

Hey martor854, I like this problem and started brainstorming some ideas. I have to admit that about half an hour in I had more questions than answers, lol. I'm no genius and I still have a lot to learn, but I wanted to ask you about some of what you said, quoted above and below. First off, speaking strictly mathematically, I was under the impression that independent events have no influence on each other, so using past data to predict future data isn't really possible. (Tongue in cheek here, since we are on Lottery Post, where that is exactly what most people are trying to do, including myself.) That being said, my question is: what is the preferred method, in the general sense, for predicting future outcomes of independent events from past independent events? I'm not being a smart a**; I literally mean which method seems to be preferred. I also don't know what a matrix is, other than basically a list of numbers in a box or tabular format (as you have listed your results). Does "treating the numbers as a matrix" mean something specific? I'm unfamiliar with that idea.
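The independence point can be put into numbers. Assuming each slot is A, B or C with equal probability 1/3 and the 15 slots are independent, the chance of guessing a whole row exactly is (1/3)^15, any guessing rule averages 15 × 1/3 = 5 correct slots per day, and the number of correct slots follows a binomial distribution. A short check with exact fractions (`p_at_least` is an illustrative helper name):

```python
from fractions import Fraction
from math import comb

SLOTS = 15
P_HIT = Fraction(1, 3)  # assumption: A, B and C equally likely, slots independent

p_whole_row = P_HIT ** SLOTS      # all 15 slots guessed correctly
expected_hits = SLOTS * P_HIT     # average number of correct slots per day


def p_at_least(k):
    """Binomial probability of getting at least k of the 15 slots right."""
    return sum(comb(SLOTS, j) * P_HIT ** j * (1 - P_HIT) ** (SLOTS - j)
               for j in range(k, SLOTS + 1))


print(p_whole_row)      # 1/14348907
print(expected_hits)    # 5
print(float(p_at_least(10)))
```

Under these assumptions no amount of past data changes these numbers; past data only helps if the readings turn out not to be independent or not equally likely.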

Now for a couple of thoughts. I think columns and rows are each independent in your example and can (and should) be treated as such. In other words, if I wanted to track the frequency of a particular reading, I would track it both for a day and for a "time" of day (or position). But I would also try to track every possible frequency pattern. I'm not sure that is possible or practical, but it is if we can assume certain things: for example, if each value has to occur at least once every day, then there can never be more than 13 of any one value on a given day, which is a finite number to work with. So I would track the frequency of A followed by B, A followed by C, and A followed by A (and likewise for B and C), and I would also track occurrences of triples or longer runs (any time an A follows AA or more). I would track these occurrences both within a day and within a position across days (that is, along rows and along columns). As for how many lines you should have in order to make a prediction: technically, I think with a perfect algorithm you would need only 3 days, depending on how much of the previous data is integral to your algorithm. Perhaps with a perfect algorithm 1 would suffice.
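Counting "A followed by B" and runs such as AAA is straightforward once each row or column is a string. A minimal sketch: `pair_counts` tallies every adjacent pair and `longest_run` finds the longest repeat. They are applied here to the eighth day of the example table read across and to column Y1 read down over the first ten days; both function names are made up for illustration.

```python
from collections import Counter


def pair_counts(seq):
    """How often each value is immediately followed by each value, e.g. ('A', 'B')."""
    return Counter(zip(seq, seq[1:]))


def longest_run(seq):
    """Length of the longest stretch of identical consecutive values (0 if empty)."""
    if not seq:
        return 0
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


day8 = "CBABCCCCBBCBAAA"   # eighth day of the example table, read across (Y1..Y15)
y1 = "BCCCBABCAA"          # column Y1 over the first ten days, read down

for label, seq in [("row, day 8", day8), ("column Y1", y1)]:
    pairs = pair_counts(seq)
    print(label, dict(sorted(pairs.items())), "longest run:", longest_run(seq))
```

Summing `pair_counts` over all rows (or all columns) gives the overall "A followed by B" table for the whole history.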

My initial thought was to track occurrences of each value (in rows and columns, including occurrences of "following", not just "showing") across time (daily and weekly) using an XmR chart. Perhaps a compound XmR chart, with each value overlaid on the next in a different color, would make any distinction easier to see. Since the results are obtained from some mechanism, and mechanisms are not always 100% consistent, there is a good chance that each value has a higher rate of occurrence at a specific time or interval. A compound XmR chart will clearly show this if it is the case. With that data in hand, I would assign a score from 0.0 to 0.99 to each combination of time, day and value. A score at or near 0.99 would indicate a high likelihood of that value occurring in that spot. Deciding how to increment the scores of A, B and C for each day and time slot will be a little tricky, but I would base the increments mainly on the frequencies found in the XmR charts. (It should be noted that this method is only going to be effective if there are characteristics of the events that determine the readings. If these events are strictly random, then this approach is probably not going to be very effective. If they aren't random, only unknown because there is too much data to crunch, then this would be my preferred method.)
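As one possible interpretation, the 0.0 to 0.99 score per time slot and value could be approximated by the share of past days on which each slot showed each value. The sketch adds one to every count (Laplace smoothing) so that a value never seen in a slot still gets a small score. It covers slots only; a day dimension (for example day of the week) would be an extra key, which is left out here as an assumption. `slot_scores` is a hypothetical name.

```python
from collections import Counter


def slot_scores(days, symbols="ABC", alpha=1):
    """Score in 0..1 for every (slot, symbol): a smoothed share of days on which
    that slot showed that symbol. alpha=1 is Laplace smoothing, so unseen
    values get a small non-zero score instead of 0."""
    scores = []
    for slot in range(len(days[0])):
        counts = Counter(row[slot] for row in days)
        total = len(days) + alpha * len(symbols)
        scores.append({s: (counts[s] + alpha) / total for s in symbols})
    return scores


days = ["BCABABBCACBCCBC", "CCABBCBCCAACBAA", "CABBACBCCBCACAC",
        "CBBCBABACACCBAB", "BACCBCCBACAACBA"]   # first five days of the example

for slot, s in enumerate(slot_scores(days), start=1):
    best = max(sorted(s), key=s.__getitem__)
    print(f"Y{slot}: " + ", ".join(f"{k}={v:.2f}" for k, v in s.items()) + f" -> {best}")
```

The scores in each slot add up to 1, so they can be read as rough probabilities; with truly random readings they should all settle near 0.33 as more days are added.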

That's how I would START the problem, not necessarily SOLVE it. Essentially, the charts will show either nothing of significance, a little of significance or a great deal of significance. From there I would try to extrapolate an algorithm for the occurrence of each value in and of itself. With all that data, maybe a solution would begin to show itself. That's my two cents. :)

<quote>Shall I treat this as a matrix? Shall I treat each column independently, bearing in mind that the readings are independent? How many lines should I have in order to be able to make a prediction? Any ideas would be appreciated. Or even a nudge in the right direction would do me, so I don't waste any time. I can do a little bit of programming in QBasic (a bit obsolete, no graphics, but it works fine). I'm ready to help in exchange for good suggestions.</quote>

Edinburgh United Kingdom Member #97833 September 24, 2010 41 Posts

Posted: July 30, 2012, 2:59 am

Hi AlgorithmGuru,

Thank you for your contribution. I find it very fair and sensible. I have my doubts, too, that past, independent occurrences can help with predicting new ones. So, all that remains is probability. But, maybe I’m wrong.

Your view is interesting but I don’t know what an XmR chart is. I’d appreciate it if you could give an example or indicate where I can get one.

Intuitively, I’m inclined to go for Markov chains/model (like somebody else suggested) as I think our problem is similar to predicting the weather. I’m only scratching the surface. My lack of training in maths will make it difficult, though. I must find a “hands-on” example. If I make any progress, I'll keep everybody posted.
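A first-order Markov chain in this setting estimates, for each slot, how often today's reading is followed by each reading the next day, which has the same shape as "rainy today, rainy tomorrow" in the weather example. The sketch below fits that transition table for column Y1 over the first ten days and picks the most likely next value. Treating each column separately across days is an assumption (the chain could also be run along a row), and if the readings really are independent, the estimated probabilities should drift toward 1/3 in every row as more days are added. `fit_transitions` and `predict_next` are illustrative names.

```python
from collections import Counter, defaultdict


def fit_transitions(series):
    """First-order Markov estimate of P(tomorrow | today) from consecutive pairs."""
    counts = defaultdict(Counter)
    for today, tomorrow in zip(series, series[1:]):
        counts[today][tomorrow] += 1
    return {state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
            for state, row in counts.items()}


def predict_next(series, probs):
    """Most likely next value given the last one (ties -> alphabetical);
    None if the last value never had a successor in the training data."""
    options = probs.get(series[-1])
    if not options:
        return None
    return max(sorted(options), key=options.__getitem__)


# Column Y1 over the first ten days of the example table.
y1 = "BCCCBABCAA"
probs = fit_transitions(y1)
for state in sorted(probs):
    print(state, "->", {k: round(v, 2) for k, v in sorted(probs[state].items())})
print("guess for day 11 in Y1:", predict_next(y1, probs))
```

With only nine transitions per column the table is very rough; here the last reading is A, and A was followed once by A and once by B, so the "prediction" is decided by the tie-break.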

Pittsburgh, PA United States Member #130598 July 20, 2012 37 Posts

Posted: July 31, 2012, 12:43 am

Hey martor. There is a wealth of information online about XmR charts, or "individuals and moving range" charts. It basically means you are charting data that has one value per period from a single source (for instance daily sales, or in your case the daily frequency of an event), and NOT an averaged recording such as today's average temperature. It's a great method because it tracks not only the individual readings but also the movement between readings, which is what can help to show patterns. If all the points plot within the limits of the XmR chart, the "process" is considered stable. There are specific formulas for calculating the limits on an XmR chart; I encourage you to look them up online. I'm not able to post links to websites yet, but if you google "XmR on Excel" there is a great explanation of how to set up an XmR chart in Excel (it's the first hit), along with information on how to understand the data and further ways of examining the chart once you have it constructed. As a side note, I was first introduced to XmR charts by a book called "Understanding Variation: The Key to Managing Chaos" by Donald J. Wheeler. The book is essentially a practical introduction to XmR charts and why other charts fail in the same setting. It gives a lot of insight into understanding data. It was published in 2000 by SPC Press. At any rate, good luck. :)
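For reference, the usual XmR calculations are short: the moving ranges are the absolute differences between consecutive values, the individuals limits are the mean ± 2.66 × the average moving range, and the upper limit for the moving ranges is 3.268 × the average moving range. A minimal Python sketch, applied to the daily count of A over the first ten days of the example table (3, 5, 4, 4, 5, 8, 8, 4, 5, 4). It computes the limits only; plotting and the Excel setup are left out, and `xmr_limits` is an illustrative name.

```python
def xmr_limits(values):
    """Limits for an individuals (X) chart and a moving-range (mR) chart.

    Uses the standard XmR scaling constants: 2.66 for the X limits and
    3.268 for the upper range limit, both applied to the average moving range.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return {
        "x_bar": x_bar,
        "lower_x": x_bar - 2.66 * mr_bar,
        "upper_x": x_bar + 2.66 * mr_bar,
        "mr_bar": mr_bar,
        "upper_mr": 3.268 * mr_bar,
        "moving_ranges": moving_ranges,
    }


# Number of "A" readings on each of the first ten days of the example table.
a_counts = [3, 5, 4, 4, 5, 8, 8, 4, 5, 4]
limits = xmr_limits(a_counts)

outside_x = [day for day, x in enumerate(a_counts, start=1)
             if not limits["lower_x"] <= x <= limits["upper_x"]]
outside_mr = [day for day, mr in enumerate(limits["moving_ranges"], start=2)
              if mr > limits["upper_mr"]]
print(f"X limits: {limits['lower_x']:.2f} .. {limits['upper_x']:.2f}, outside: {outside_x}")
print(f"mR upper limit: {limits['upper_mr']:.2f}, outside: {outside_mr}")
```

For these ten days every point falls inside the limits, which on an XmR chart would be read as a stable process with no special-cause signal; running the same function on the counts of B and C gives the three series for a compound chart.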