###########################################################
Continuous Wavelet Transform (CWT), Scalogram Peak Detection,
and Linear Discriminant Analysis (LDA) with Stepwise or
Optimal Selection of Variables
###########################################################

I. Method
=========

I.1. Epochs and Rereferencing of Channels
-----------------------------------------

The EpochLength was left unchanged. All channels were rereferenced to (A1+A2)/2. In data set Ib the VEOG channel was deleted (no eye artifact correction was performed).

I.2. Log Scale
--------------

The signal was resampled on a logarithmic scale: the new sample rate was a linear function of time; the distance between two adjacent time samples increased linearly with time. The pertinent parameter was ScaleDilation per second (SDps), which was set to SDps = 3 for both data sets.

I.3. Time Domain Filtering
--------------------------

The signal in each trial and each channel was low-pass filtered by convolution with the function:

    exp( -(4*time/Scale)^2 ) * cos(2*pi*time/Scale).

Frequencies higher than 2/Scale vanished, frequencies between 1/Scale and 2/Scale were attenuated, and frequencies lower than 1/Scale remained unchanged. Because of the log sampling, the effect was as if Scale had been dilated linearly with time and the filtering frequency had been reduced SDps times every second. Scale was set:

    Scale = 80 ms for data set Ia;
    Scale = 100 ms for data set Ib.

This means that at the beginning of the epoch the filtering frequency was 10 Hz and at the end of the epoch it was 3.33 Hz. This procedure takes into account the fact that late ERP components usually have a longer duration than earlier waves. Thus a more economical representation of the signal was obtained.

I.4. CWT
--------

The signal from each channel and each single-trial epoch was transformed with the CWT for scales ranging from Scale/2 to 4*EpochLength.
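As a rough illustration of this scale range, the sketch below builds a geometric (logarithmic) grid of scales from Scale/2 to 4*EpochLength. The function name `scale_grid`, the 1000 ms epoch, and the default of 12 scales per octave (one reading of the grid density given in this section) are assumptions made for illustration, not details confirmed by the method description.

```python
import math

def scale_grid(scale_ms, epoch_ms, per_octave=12):
    """Return geometrically spaced scales (ms) from Scale/2 up to 4*EpochLength.

    per_octave is an assumed grid density; each octave doubles the scale.
    """
    s_min = scale_ms / 2.0          # smallest scale: Scale/2
    s_max = 4.0 * epoch_ms          # largest scale: 4*EpochLength
    n_octaves = math.log2(s_max / s_min)
    # Number of whole grid steps that fit below s_max (small epsilon for rounding).
    n_steps = math.floor(n_octaves * per_octave + 1e-9)
    return [s_min * 2.0 ** (k / per_octave) for k in range(n_steps + 1)]

# Illustrative values only: Scale = 100 ms, EpochLength = 1000 ms.
grid = scale_grid(100.0, 1000.0)
print(len(grid), grid[0], round(grid[-1], 1))
```

With these illustrative inputs the range covers log2(4000/50), a little over six octaves, so the grid stays small even though the largest scale is 80 times the smallest.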
The scalogram was sampled on a logarithmic grid allowing a scale-invariant resolution of 12*12 points per scale per octave. This means that a peak in the scalogram was always represented by approximately 12*12 time-scale samples regardless of its width.

I.5. ERP Component Identification -- Peak Detection
---------------------------------------------------

Student's two-sample t-value was calculated for each time-scale point of the CWT in each channel. ERP components were defined as local extrema in the resulting t-value scalograms. These were the points of maximal difference (variance taken into account) between the two experimental conditions.

I.5.a. Spatial Extrema
......................

In a simplified version of the method, extrema are defined not only in time and scale but also in the spatial dimensions (maximal amplitude over all scalp positions). In the case of many channels, the data set is reduced dramatically (by a factor close to the number of channels) and the obtained variables have a straightforward interpretation as ERP components. With many channels, this method also has a practical advantage: only a few selected electrodes are required for further spelling practice.

I.6. LDA and Stepwise Selection of Variables
--------------------------------------------

The extracted components were subjected to LDA. Redundant variables were eliminated by a stepwise procedure. First, the discriminant function was calculated with all N components and the HitScore ( = number of correctly classified trials from the training data set) was computed and stored. Then, all (N-1)-component subsets were checked in the same manner and the one that maximized the HitScore was selected. This was repeated with progressively smaller subsets until all components were exhausted. The subset with the highest HitScore was selected. Then the eliminated variables were added back one at a time, and each time the one whose addition maximized the HitScore was included in the subset.
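The elimination and re-inclusion passes described so far can be sketched as follows. This is a minimal sketch, not the original implementation: `hit_score` is a hypothetical callback that would fit the LDA on a given subset and return the number of correctly classified training trials, elimination stops at one remaining variable, and ties go to the lowest variable index; all three are assumptions.

```python
def stepwise_select(n_vars, hit_score):
    """Backward elimination followed by forward re-inclusion.

    hit_score(subset) -> number (hypothetical callback; higher is better).
    Returns the best subset seen on either pass and its score.
    """
    current = frozenset(range(n_vars))
    best, best_score = current, hit_score(current)

    # Backward pass: drop the variable whose removal maximizes the score.
    while len(current) > 1:
        current = max((current - {v} for v in sorted(current)), key=hit_score)
        score = hit_score(current)
        if score > best_score:
            best, best_score = current, score

    # Forward pass: start from the best subset and add eliminated variables back.
    current = best
    removed = set(range(n_vars)) - current
    while removed:
        v = max(sorted(removed), key=lambda u: hit_score(current | {u}))
        current = current | {v}
        removed.discard(v)
        score = hit_score(current)
        if score > best_score:
            best, best_score = current, score

    return best, best_score

# Toy score (illustrative only): variables 0 and 2 help, the others hurt.
def toy_score(subset):
    useful = {0, 2}
    return 10 * len(set(subset) & useful) - len(set(subset) - useful)

best, score = stepwise_select(4, toy_score)
print(sorted(best), score)
```

Each pass evaluates on the order of N^2 subsets in total, which is why the cost grows quickly with the number of components.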
Eventually, the subset with the largest HitScore was chosen. The discriminant function corresponding to this subset was finally applied to the test data.

I.6.a. Optimal Selection (Checking All Subsets)
...............................................

If the number N of original variables is small enough (N<25), e.g. in the case of spatial extrema (see I.5.a.), selection can be performed by checking all 2^N - 1 subsets. Again, the subset with the highest HitScore on the training data is selected.

I.6.b. No Selection
...................

Discrimination with all variables yields very good results as well. It is a reasonable alternative when there are too many variables and stepwise selection becomes unacceptably time consuming (see II).

I.7. Note
---------

The described method does not rely on any assumptions about the kind of ERP response to target stimuli. ANY significant difference between the ERP waveforms of the two experimental conditions will do. Neither does the method rely on visual inspection of the curves. Indeed, the following results were obtained without such visual inspection.

II. Computational Load
======================

All transformation algorithms, especially the CWT, which is the most time-consuming transformation, have been optimized for application to large data arrays. On a 704 MHz AMD CPU with 256 MB RAM, the CWT of the 483840 one-second signals of the P300 speller training data set ( = 42 characters * 180 trials per character * 64 channels) takes about 20 min (about 2.5 ms per signal). The stepwise procedure with 439 components (P3 speller) takes about 13 h; with 547 components it takes 30 h. With the SCP data sets the number of channels is low and so is the number of extrema; consequently, the stepwise procedure takes only a minute or two. The HitScore computation for all subsets of variables takes 70 s with 13 components (P3 speller) and 320 s with 15 components.
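The all-subsets timings above correspond to 2^13 - 1 = 8191 and 2^15 - 1 = 32767 HitScore evaluations, i.e. on the order of 10 ms per subset. A minimal sketch of such an exhaustive search is given below; `hit_score` is again a hypothetical callback standing in for the LDA fit and training-set evaluation, and keeping the first subset found on ties is an assumption.

```python
from itertools import combinations

def n_subsets(n_vars):
    """Number of non-empty subsets of n_vars variables: 2^N - 1."""
    return 2 ** n_vars - 1

def best_subset(n_vars, hit_score):
    """Check every non-empty subset and return the one with the highest score."""
    best, best_score = None, float("-inf")
    for size in range(1, n_vars + 1):
        for subset in combinations(range(n_vars), size):
            score = hit_score(subset)
            if score > best_score:   # strict '>' keeps the first subset on ties
                best, best_score = subset, score
    return best, best_score

# Toy score (illustrative only): variables 0 and 2 help, the others hurt.
def toy_score(subset):
    useful = {0, 2}
    return 10 * len(set(subset) & useful) - len(set(subset) - useful)

print(n_subsets(13), n_subsets(15))
print(best_subset(4, toy_score))
```

Because the subset count doubles with every added variable, this approach is only practical for small N, in line with the N<25 limit stated in I.6.a.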
The time needed for classification of the test data is negligible, because it involves only matrix multiplication of the data array with the coefficient vector of the discriminant function.
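Applying the discriminant function to the test data reduces to one matrix-vector product, as the following minimal sketch shows. The bias term, the zero decision threshold, the toy numbers, and the function names are illustrative assumptions; in practice the coefficient vector would come from the LDA fitted on the selected components.

```python
def discriminant_scores(data, coefs, bias=0.0):
    """Multiply the (trials x components) data array by the coefficient vector."""
    return [sum(x * w for x, w in zip(row, coefs)) + bias for row in data]

def classify(data, coefs, bias=0.0):
    """Assign class 1 where the discriminant score is positive, else class 0."""
    return [1 if s > 0 else 0 for s in discriminant_scores(data, coefs, bias)]

def hit_score(predicted, labels):
    """Number of correctly classified trials (usable where labels are known)."""
    return sum(p == y for p, y in zip(predicted, labels))

# Toy data: three trials, two components (illustrative values only).
trials = [[1.0, 2.0], [-1.0, 0.5], [3.0, -1.0]]
coefs = [0.5, -1.0]

predicted = classify(trials, coefs)
print(discriminant_scores(trials, coefs))   # [-1.5, -1.0, 2.5]
print(predicted)                            # [0, 0, 1]
print(hit_score(predicted, [0, 1, 1]))      # 2
```

The cost is one multiply-add per trial and component, which is why it is negligible next to the CWT and the subset search.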