Hi all,
In the past I have tested many secondary data sets to measure their predictability using my own tools,
but a few weeks ago I ran across a battery of tools that may work better than what I am using. Here
is a link to the webpage of NIST, the National Institute of Standards and Technology:
http://csrc.nist.gov/groups/ST/toolkit/rng/stats_tests.html
First, I would like to know if anyone has a link to a free C compiler that will compile the tools available for
download. I have Turbo C++, which is capable of compiling C code, but I get a number of errors when I try
to run the makefile. I would like to find something that will run the code without making any changes except
the two lines in the makefile that need to point to the compiler being used. The setup instructions are
located in section 5 of the PDF, which is very well documented. The PDF can be downloaded by clicking the
top link on the download page.
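I can't verify the exact variable names without your copy of the download, so treat the names below (CC, GCCFLAGS) as assumptions, but if the makefile follows the usual convention the two lines to edit would look something like this, pointed at the free GNU compiler (gcc, available on Windows via MinGW or Cygwin):

```
CC = gcc          # path to your C compiler (assumption: the variable is named CC)
GCCFLAGS = -c -Wall   # compile flags for that compiler (assumption: named GCCFLAGS)
```

After editing those two lines, running make from the source directory should build the suite without further changes.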
The software can accept a string of ASCII such as 111010101001010011110000101010 and evaluate it
for its randomness. The tool can run all 15 tests or just a few, depending on one's needs. The 15 tests are
shown on the main page with a brief explanation.
I always try to test the data I analyze to see if it's somewhat predictable, and these tests should be very useful,
as I am working on a new predictor but want to know up front what I am up against. Below is a sample of the
data for which I am attempting to predict/forecast the next value. The two columns of data below will be formatted into
strings, top to bottom, for 14 total strings. Each string will then be analyzed using the 15 tests to help decide
which prediction algorithm would work best. The data here is far less random than the actual numbers it
represents. What I am trying to do is calculate which type of analysis would work best. I am thinking that some
of the tests will indicate the data is random while others will find it's not. Let's say that runs of (1) or runs of (0)
show the data to be random, but another test shows some weakness. The predictor code should be built around
the test showing the greatest weakness, which should improve the effectiveness of the predictor used.
I don't know if others have used this type of analysis before, but I would very much like to know your thoughts or
results on this method. I guess one could sum it up as making predictions based on the weakest random elements
of the data being analyzed.
RL
0101101 0001111
0011001 0011011
0110010 0101100
1100001 0100100
1000110 0001110
1111111 0111111
0001101 0110000
0000001 1001100
1100101 0011100
0010110 0100010
1010011 0000011
0101100 1010011
0100101 1000001
0111100 0100010
0111010 1001011
1001011 0011011
1011110 0111101
1110000 0001001
1011110 0101100
1011010 0010001
1010100 0001000
0001111 0101101
0010001 0011001
0111010 0111011
0110100 0000011
0000100 0011110
0000101 1001100
1111011 0001101
0110001 0100000
1000011 0001100
0101111 1000111
0011011 0101111
1100001 0000011
0010001 0110010
1100111 0101101
1011101 0000011
0110111 0111100
1000111 0101001
0011000 0111110
1001010 0000100
0011011 0010011
0111010 0100000
0011001 1011101
0011111 0000101
0111000 0110000