Information Gain in Discretized Timeseries
I've been learning about information gain and experimenting with how it relates to timeseries data. For several reasons it is sometimes desirable to discretize timeseries data. For one, it limits the number of unique values a series can take, so a finite-length series has only a finite number of possible forms (with k allowed values, a length-n series can take at most k^n distinct forms).
My question was: how does discretization of a timeseries affect information gain? To test this I generated 100,000 random timeseries, each 50 steps long, by summing ten sine waves with random phase, amplitude, and frequency. Each series was then normalized to [-1, 1]:
    import numpy as np

    # 50 time steps spanning [0, 4*pi] (i.e. 0., 0.25645654, ..., 12.56637061)
    x = np.linspace(0, 4 * np.pi, 50)
    series = np.zeros(50)
    for i in range(10):
        phase = np.random.rand() * 2 * np.pi
        freq = np.random.rand() * 10
        amp = np.random.rand() * 10
        series += amp * np.sin(freq * x + phase)
    # scale so the largest magnitude becomes 1, keeping the series in [-1, 1]
    series /= max(abs(series.min()), series.max())
Next I create a data set by assigning each timeseries a random class label (A, B, C, D) with probabilities (0.5, 0.2, 0.15, 0.15). This data set is used to calculate the information gain at each level of discretization.
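The label assignment can be sketched like this (the class names, probabilities, and 100,000 count are from the setup above; the seed is an arbitrary choice for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility
classes = np.array(["A", "B", "C", "D"])
probs = [0.5, 0.2, 0.15, 0.15]

# one label per timeseries, drawn with the stated class probabilities
labels = rng.choice(classes, size=100_000, p=probs)
```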
Then I calculate the information gain for each timeseries after discretizing it: the interval [-1, 1] is binned using these bin widths (0, 0.01, 0.05, 0.1, 0.2, 0.5). Note: a bin width of 0 means the raw, undiscretized timeseries. The resulting information gains were:
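One way to implement the binning (a sketch; the exact rounding rule isn't stated above, so here I assume each value is snapped to the nearest multiple of the bin width):

```python
import numpy as np

def discretize(series, width):
    """Snap each value to the nearest multiple of `width`.

    A width of 0 returns the raw series unchanged.
    """
    if width == 0:
        return series
    return np.round(series / width) * width
```

For example, with a width of 0.5 the values 0.3 and -0.8 map to 0.5 and -1.0.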
0.01 = 1.78390882425
0.05 = 1.79349956755
0.1 = 1.77665350621
0.2 = 1.77245058685
0.5 = 1.79737444506
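For reference, the entropy of the class distribution itself is the ceiling for any information gain, and it can be computed directly from the probabilities (0.5, 0.2, 0.15, 0.15); the reported gains all sit close to it:

```python
import numpy as np

probs = np.array([0.5, 0.2, 0.15, 0.15])
# Shannon entropy in bits: H = -sum(p * log2(p))
H = -np.sum(probs * np.log2(probs))
print(H)  # ~1.7855 bits
```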