Carbob: I have a couple of questions. If I input my draws, 1-500, new to old, and I predict from 501-510, are the numbers I get for draws 501 through 510, or are they for the next draw? My input is descending.
The data in the input file should be arranged from the oldest to the newest results: record 1 for the oldest draw, record 500 for the newest draw. It's also important that the last record really reflects the last known draw. That's because to make a prediction, VRA uses the current "state" as a point of reference and looks for similar states in the past. The resulting prediction, depending on the model, is a regression through all the similar states in the past. If you are using a dataset where the last record reflects data from 100 draws ago, and try to predict for tomorrow, all you'll get is the prediction for the draw that occurred 99 draws ago.

Think about something very deterministic, such as the motion of a second hand on a clock. Even if you didn't know how the clock works, it would not take you long to realize that if the second hand is at 22, the next point will be 23, and that would indeed be a good prediction. But to make a meaningful prediction about the second hand in the future, you must know where it is now. If someone showed you a broken watch whose second hand is motionless at 22, you would still make a prediction of "23" for the next moment, and it would still be a perfect prediction... for an event that has already occurred.

It's the same with VRA when it comes to predicting: unlike some other programs, which probably use some combinatorics to figure out the "good" and the "bad" numbers, VRA constructs the current state (phase) using the last known results. Subsequently, it searches the past history to find similar states (close neighbors, in statistical terminology). What happened immediately after those states forms the basis for the next prediction, using various averaging and smoothing techniques which you control with your model parameters.
In other words, the entire premise (for all the models in VRA) is that if event X was followed by event Y in the past, and we are currently observing event Z which is similar to event X, then something similar to Y will happen tomorrow. If you understand that concept, you understand VRA.
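That premise can be sketched in code. The following is only my illustration of the general nearest-neighbor (analog) forecasting idea, not VRA's actual implementation; the function name, parameters, and the second-hand example are my own:

```python
import numpy as np

def analog_forecast(series, dim, n_neighbors):
    """Predict the next value of `series` by finding past states
    similar to the current one (the "close neighbors" described
    above) and averaging what followed them.

    `dim` is the embedding dimension: each state is the vector of
    the last `dim` observations."""
    series = np.asarray(series, dtype=float)
    # Build every past state (phase vector) and its successor.
    states = np.array([series[i:i + dim]
                       for i in range(len(series) - dim)])
    successors = series[dim:]
    current = series[-dim:]                    # the state "now"
    # Distance from the current state to every past state.
    dists = np.linalg.norm(states - current, axis=1)
    nearest = np.argsort(dists)[:n_neighbors]
    # Event X was followed by Y; we observe Z similar to X,
    # so predict the average of the Y's.
    return successors[nearest].mean()

# Deterministic example: the second hand on a clock.
seconds = [i % 60 for i in range(300)]         # 0, 1, ..., 59, 0, ...
print(analog_forecast(seconds, dim=3, n_neighbors=1))   # prints 0.0
```

With the last state being (57, 58, 59), the closest past states are exact matches, and each was followed by 0, so the prediction is 0, just as the clock analogy suggests.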
valelli: Nemesys, can you help? I've got several Excel worksheets that I've been using with VRA, and for some unknown reason it has stopped working. The VRA programme starts up OK, but when I go to open a signal (i.e. one of my sheets), it closes down the VRA programme. It's only just started doing this. Is there something wrong somewhere? And before anyone shouts "virus", I've excluded that already. Any thoughts?
I recently discovered a bug that prevents VRA from loading some Excel files. To get around it, use the "Save As" option in Excel and specify the CSV (comma-separated values) format. That will save the data as a simple text file, which VRA can read more easily. I'll fix that problem in one of the next VRA releases. Additionally, the entire data set must be arranged as a single column of numbers. VRA will probably load multiple columns as well, but it will treat them as a single series (column) anyway.
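For reference, a single-column file in that layout can also be produced without Excel. This is just an illustrative Python sketch of the format described above (the filename and draw values are made up):

```python
import csv

# Illustrative: save draw results as a single-column CSV,
# oldest draw first, newest draw last.
draws = [7, 23, 41, 15, 33]                  # oldest ... newest

with open("draws.csv", "w", newline="") as f:
    csv.writer(f).writerows([v] for v in draws)   # one number per row

# Reading it back yields one flat series, which is how the data
# is treated no matter how many columns the original sheet had.
with open("draws.csv") as f:
    series = [int(row[0]) for row in csv.reader(f)]
print(series)                                # [7, 23, 41, 15, 33]
```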
FrankieH: Do you have a suggestion for the settings of "Neighbors" and "Fixed Size"? My settings are set for Neighbors (1) and Fixed Size (25).
The "Number of Neighbors" parameter controls the smoothness of the fit. When the value is large, VRA will include neighbors that are relatively far from the current state; in the extreme case, it will include all neighbors in the past. When the value is small, VRA will only include the very close neighbors. To put it more precisely in statistical terms, the "Number of Neighbors" parameter controls the balance (or trade-off) between variance and bias. With large values of this parameter, the fit will be smooth (because it averages over many points), but it may deteriorate to predicting the same number all the time. With small values, the fit will look like a zig-zag, fitting more closely, but you may be way off on many points. There is no standard scientific way to find the best value. The standard recipe is: "Let the data choose it".
The "Fixed Size" option lets you choose the maximum separation between the neighbors, instead of specifying the number of neighbors directly. The end result is similar: effectively, you are controlling how many neighbors you "admit" to the neighborhood.
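The two selection rules can be put side by side in a short sketch. This is my own illustration, not VRA's code; the function names and the distances are invented for the example:

```python
import numpy as np

def neighbors_by_count(dists, k):
    """'Number of Neighbors' style: admit the k closest past states."""
    return np.argsort(dists)[:k]

def neighbors_by_radius(dists, r):
    """'Fixed Size' style: admit every past state within distance r."""
    return np.where(dists <= r)[0]

# Hypothetical distances from the current state to 8 past states.
dists = np.array([0.2, 1.5, 0.4, 3.0, 0.9, 2.2, 0.1, 1.1])

print(sorted(neighbors_by_count(dists, 3)))     # [0, 2, 6]
print(sorted(neighbors_by_radius(dists, 1.0)))  # [0, 2, 4, 6]
```

Both rules admit the nearest states first; the count rule fixes how many get in, the radius rule fixes how far away they may be, and either way you are tuning the same bias-variance trade-off.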
LANTERN: It seems to me that the software needs to be optimized for lottery predictions and then maybe it would be the new "Holy Grail" of prediction software.
Thanks for the feedback and participation to everyone. I'd like to restate for the record that I hold very little hope that VRA can be of any value when it comes to predicting lottery results, or any other results of an unpredictable nature. In fact, I'll make it my purpose to work against you guys and demonstrate with VRA that the predictions it generates are no better than a random guess. By its nature, VRA is just as good at showing that something is random as it is at showing that something is not random. Let's see who wins.
Vick: Didn't do well tonight, but yesterday's was a firecracker:
Result (nearest neighbour): 5 / 20, 17-4, % of mean size max = 500
3 7 10 13 17 20 24 27 30 34 37 41 44 48 50 54 57 61 64 67
results here:
http://lottery.sympatico.msn.ca/cgi-bin/english?job=show_results&lottery=on_daily_keno
I know very little about the lottery, so can you translate the term "firecracker" into the money terms? That is, how much would you win if you used VRA to play that draw?
paurths: The overall result of the test was that each time I checked against a real draw, after creating about 300 predictions (ouch, my fingers and eyes!!!), I had all six of the numbers at least a dozen times. Trouble is that it never once repeated with the same dimension and delay.
Finally I gave up on testing because it was way too time-consuming.
I feel your pain, and in large part it's there because VRA is really not meant for lottery prediction, although I am thankful to Nemesys and others for popularizing it here. I'll consider how to make it easier for you, the lotto guys.
Nemesys: The best values for dimension seem to be 21 or 22
That seems way too high to me. There is a thing called "the curse of dimensionality", which says that the larger the dimension, the larger the data set you need to fill that hyperspace with enough points.

Here is what it means in more intuitive terms: take a dozen pennies and arrange them randomly along a straight line 20 inches long. Now point your finger anywhere on that line, and you are likely to touch a penny. Next, arrange them randomly anywhere in a square of 20x20 inches and point your finger anywhere -- you are much less likely to hit a penny. Now do the same in a cube of 20x20x20, and you are even less likely to find a penny. And so it goes on, making it exponentially less likely to find neighbors in a hyperspace of ever-increasing dimension. For a hyperspace of dimension 20, you would need billions of data points in your data set to find a meaningful number of close neighbors. Thus the curse: if you are studying a system which lives in a high-dimensional space (that is, a system driven by a large number of independent variables), you'll probably never find out how it works, because you would need a data set so large that it would take you the age of the Universe just to gather the data.
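The penny experiment is easy to simulate. The sketch below is my own code, not VRA's: it scatters 1,000 "pennies" uniformly in a unit hypercube, "points a finger" at random spots, and measures how often a penny lies within a fixed distance. The hit rate collapses as the dimension grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def hit_rate(dim, n_points=1000, n_trials=500, radius=0.1):
    """Fraction of random probes that land within `radius` of at
    least one of `n_points` random points in a dim-D unit cube."""
    pennies = rng.random((n_points, dim))
    probes = rng.random((n_trials, dim))
    hits = 0
    for p in probes:
        # Distance from this probe to the nearest penny.
        if np.min(np.linalg.norm(pennies - p, axis=1)) <= radius:
            hits += 1
    return hits / n_trials

for dim in (1, 2, 3, 5, 10):
    print(dim, hit_rate(dim))
```

On the line (dim=1) nearly every probe touches a penny; by dimension 10 the hit rate is essentially zero, which is exactly why an embedding dimension of 21 or 22 demands an astronomically large data set to find close neighbors.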