###########################################################
Continuous Wavelet Transform (CWT),
Scalogram Peak Detection, and
Linear Discriminant Analysis (LDA) with
Stepwise or Optimal Selection of Variables
###########################################################
I. Method
=========
I.1. Epochs and Rereferencing of Channels
-----------------------------------------
The EpochLength was left unchanged. All channels were
rereferenced to (A1+A2)/2. In data set Ib the VEOG
channel was deleted (no eye artifact correction was
performed).
I.2. Log Scale
--------------
The signal was resampled on a logarithmic time scale:
the sampling interval was a linear function of time, i.e.
the distance between two adjacent time samples
increased linearly. The pertinent parameter was
ScaleDilation per second (SDps), which was set to
SDps = 3 for both data sets.
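
A minimal sketch of such a resampling is given below. It rests on
assumptions that are not stated in the text: the sampling interval
is taken to be dt0 * (1 + (SDps - 1) * t), so that it has grown
SDps-fold after one second, and new samples are obtained by linear
interpolation. The function name log_resample is hypothetical.

```python
import numpy as np

def log_resample(signal, fs, sdps=3.0):
    """Resample a uniformly sampled signal (fs in Hz) onto a grid whose
    sampling interval grows linearly with time (requires sdps > 1)."""
    a = sdps - 1.0                 # ASSUMED growth rate of the interval per second
    dt0 = 1.0 / fs                 # ASSUMED: interval at t = 0 equals the original one
    duration = (len(signal) - 1) / fs
    # Solving dt/dn = dt0 * (1 + a*t) gives t(n) = (exp(a*dt0*n) - 1) / a,
    # i.e. the sample index n is a logarithmic function of time.
    n_max = int(np.floor(np.log1p(a * duration) / (a * dt0)))
    n = np.arange(n_max + 1)
    t_new = np.expm1(a * dt0 * n) / a
    t_old = np.arange(len(signal)) / fs
    # Linear interpolation is an assumption; no anti-aliasing is done here.
    return t_new, np.interp(t_new, t_old, signal)

# Example: one second of a 5 Hz sine sampled at 256 Hz.
fs = 256.0
x = np.sin(2 * np.pi * 5 * np.arange(257) / fs)
t_new, x_new = log_resample(x, fs, sdps=3.0)
```

Under this assumption, the spacing of t_new near the end of a
one-second epoch is close to three times the spacing at its start.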
I.3. Time Domain Filtering
--------------------------
The signal in each trial and each channel was low-pass filtered
by convolution with the function:
exp( -(4*time/Scale)^2 ) * cos(2*pi*time/Scale).
Frequencies higher than 2/Scale vanished, frequencies between
1/Scale and 2/Scale were attenuated, and frequencies lower
than 1/Scale remained unchanged. Because of the log sampling,
the effect was as if Scale had been dilated linearly with time
and the filtering frequency had been reduced SDps times
every second. Scale was set:
Scale = 80 ms for data set Ia;
Scale = 100 ms for data set Ib.
This means that, for Scale = 100 ms, the filtering frequency was
10 Hz at the beginning of the epoch and 3.33 Hz at the end.
This procedure takes into account the fact that late ERP
components usually last longer than earlier waves.
Thus a more economical representation of the signal was obtained.
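
The kernel given above can be sampled and applied as sketched below.
The truncation of the kernel to +/- one Scale, the normalization to
unit gain at 0 Hz, and the function name filter_kernel are
assumptions made for illustration; the sketch works on a uniformly
sampled signal and does not reproduce the log sampling.

```python
import numpy as np

def filter_kernel(scale, fs):
    """Sample exp(-(4*t/scale)^2) * cos(2*pi*t/scale) at rate fs (Hz)."""
    half = int(np.ceil(scale * fs))          # ASSUMED support: +/- one Scale
    t = np.arange(-half, half + 1) / fs
    k = np.exp(-(4.0 * t / scale) ** 2) * np.cos(2.0 * np.pi * t / scale)
    return k / k.sum()                       # ASSUMED normalization: unit DC gain

# Example: Scale = 100 ms at a hypothetical sample rate of 1000 Hz.
fs = 1000.0
k = filter_kernel(0.1, fs)
t = np.arange(1000) / fs
x = np.sin(2 * np.pi * 2 * t) + np.sin(2 * np.pi * 50 * t)
y = np.convolve(x, k, mode="same")           # 2 Hz part kept, 50 Hz part removed
```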
I.4. CWT
--------
The signal from each channel and each single trial epoch was
CWTransformed for scales ranging from Scale/2 to 4*EpochLength.
The scalogram was sampled on a logarithmic grid giving a
scale-invariant resolution of 12*12 points (12 points per scale
in time by 12 points per octave in scale). This means that a peak
in the scalogram was always represented by approximately 12*12
time-scale samples regardless of its width.
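
As an illustration of the scale axis only, the sketch below builds a
logarithmic grid from Scale/2 to 4*EpochLength with 12 points per
octave. The function name scale_grid and the example values
(Scale = 100 ms, EpochLength = 1 s) are assumptions; the wavelet
itself is not specified in the text and is not sketched here.

```python
import numpy as np

def scale_grid(scale, epoch_length, per_octave=12):
    """Scales from scale/2 up to at most 4*epoch_length, spaced so that
    every octave (doubling of scale) contains `per_octave` points."""
    lowest = scale / 2.0
    n_octaves = np.log2(4.0 * epoch_length / lowest)
    n = int(np.floor(n_octaves * per_octave)) + 1
    return lowest * 2.0 ** (np.arange(n) / per_octave)

scales = scale_grid(0.1, 1.0)   # hypothetical values, in seconds
```

For these example values the grid covers a little over six octaves.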
I.5. ERP Component Identification -- Peak Detection
---------------------------------------------------
Student's two-sample t-value was calculated for each time-scale
point of the CWT in each channel. ERP components were defined as
local extrema of the resulting t-value scalograms. These are
the points of maximal difference (variance taken into account)
between the two experimental conditions.
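
The two steps can be sketched as follows for a single channel. The
array layout (trials, scales, times), the pooled-variance form of
the t-value, the 8-neighbour definition of a local extremum, and
the function names are assumptions made for this illustration.

```python
import numpy as np

def t_scalogram(a, b):
    """Two-sample Student t-value (pooled variance) at every point.
    a, b: arrays of shape (trials, scales, times), one per condition."""
    na, nb = a.shape[0], b.shape[0]
    pooled = ((na - 1) * a.var(axis=0, ddof=1)
              + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled * (1.0 / na + 1.0 / nb))

def local_extrema(t):
    """(scale, time) indices of interior points that are strictly larger
    or strictly smaller than all 8 neighbours; border points are skipped."""
    rows, cols = t.shape
    centre = t[1:-1, 1:-1]
    shifts = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
    nbrs = np.stack([t[1 + di:rows - 1 + di, 1 + dj:cols - 1 + dj] for di, dj in shifts])
    is_ext = (centre > nbrs.max(axis=0)) | (centre < nbrs.min(axis=0))
    return np.argwhere(is_ext) + 1
```

Each row returned by local_extrema identifies one candidate
component; the CWT values of the single trials at these points
would then serve as the variables for discrimination.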
I.5.a. Spatial Extrema
......................
In a simplified version of the method, extrema are defined
not only in time and scale but also in the spatial dimensions
(maximal amplitude over all scalp positions). With many channels,
the data set is reduced dramatically (by a factor close to the
number of channels), and the resulting variables have a
straightforward interpretation as ERP components. This version
has another, practical advantage: only a few selected electrodes
are required for further spelling practice.
I.6. LDA and Stepwise Selection of Variables
--------------------------------------------
The extracted components were subjected to LDA. Redundant
variables were eliminated by a stepwise procedure. First, the
discriminant function was calculated with all N components and
the HitScore ( = number of correctly classified trials from the
training data set) was computed and stored. Then, all (N-1)-
component subsets were checked in the same manner and the
one that maximized the HitScore was selected. This was repeated,
removing one component at a time, until all components were
exhausted. Of the subsets obtained in this way, the one with the
highest HitScore was selected. Then the eliminated variables
were added back one at a time, each time choosing the variable
whose addition maximized the HitScore. Finally, the subset with
the largest HitScore was chosen, and the discriminant function
corresponding to this subset was applied to the test data.
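
A compact sketch of this selection scheme is given below. It is an
illustration under assumptions, not the original implementation:
the two-class Fisher discriminant with a midpoint threshold, the
tie-breaking rules, and all function names are hypothetical.

```python
import numpy as np

def lda_fit(X, y):
    """Two-class Fisher discriminant: weight vector w and threshold c."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter; atleast_2d handles the one-variable case.
    Sw = np.atleast_2d(np.cov(X0, rowvar=False) * (len(X0) - 1)
                       + np.cov(X1, rowvar=False) * (len(X1) - 1))
    w = np.linalg.pinv(Sw) @ (m1 - m0)
    return w, w @ (m0 + m1) / 2.0        # ASSUMED threshold: midpoint of class means

def hit_score(X, y, cols):
    """Number of training trials classified correctly using columns `cols`."""
    Xs = X[:, cols]
    w, c = lda_fit(Xs, y)
    return int(np.sum((Xs @ w > c).astype(int) == y))

def stepwise_select(X, y):
    """Greedy backward elimination followed by greedy forward re-addition."""
    n = X.shape[1]
    current = list(range(n))
    best, best_score = list(current), hit_score(X, y, current)
    # Backward pass: at each step drop the variable whose removal
    # leaves the highest HitScore.
    while len(current) > 1:
        score, drop = max((hit_score(X, y, [c for c in current if c != d]), d)
                          for d in current)
        current.remove(drop)
        if score >= best_score:          # ASSUMED: ties favour the smaller subset
            best, best_score = list(current), score
    # Forward pass: add eliminated variables back one at a time.
    current = list(best)
    removed = [c for c in range(n) if c not in current]
    while removed:
        score, add = max((hit_score(X, y, sorted(current + [a])), a) for a in removed)
        current = sorted(current + [add])
        removed.remove(add)
        if score > best_score:
            best, best_score = list(current), score
    return best, best_score
```

Here X holds one row per training trial and one column per
component, and y holds the class labels 0 and 1.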
I.6.a. Optimal Selection (Checking All Subsets)
...............................................
If the number N of original variables is small enough (N<25),
e.g. in the case of spatial extrema (see I.5.a), selection can be
performed by checking all 2^N - 1 subsets. Again, the subset
with the highest HitScore on the training data is selected.
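
The enumeration of all 2^N - 1 non-empty subsets can be sketched as
follows. The function names are hypothetical, and `score` stands for
any function that returns the training HitScore of a subset; the toy
score in the example is only there to make the sketch runnable.

```python
from itertools import combinations

def all_subsets(n):
    """Yield every non-empty subset of range(n) as a tuple (2**n - 1 in total)."""
    for k in range(1, n + 1):
        yield from combinations(range(n), k)

def best_subset(n, score):
    """Subset with the highest score; ties go to the subset generated first,
    i.e. to the smallest one (an assumption of this sketch)."""
    return max(all_subsets(n), key=score)

# Toy example: prefer the subset whose indices sum to 4.
chosen = best_subset(4, lambda s: -abs(sum(s) - 4))
```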
I.6.b. No Selection
...................
Discrimination with all variables yields very good results as well.
It is a reasonable alternative when the number of variables is so
large that stepwise selection becomes unacceptably time-consuming
(see II).
I.7. Note
---------
The described method does not rely on any assumptions about the
nature of the ERP response to target stimuli. ANY significant difference
between the ERP waveforms in the two experimental conditions will do.
Neither does the method rely on visual inspection of the curves.
Indeed, the following results were obtained without such visual
inspection.
II. Computational Load
======================
All transformation algorithms, especially the CWT, which is the most
time-consuming transformation, have been optimized for application
on large data arrays. On a 704 MHz AMD CPU with 256 MB RAM, the CWT of
the 483840 one-second signals of the P300 speller training data set
( = 42 characters * 180 trials per character * 64 channels)
takes about 20 min (about 2.5 ms per signal).
The stepwise procedure with 439 components (P300 speller) takes about 13 h;
with 547 components it takes 30 h. With the SCP data sets the number of
channels is low and so is the number of extrema; consequently, the stepwise
procedure takes only about a minute or two. The HitScore computation for
all subsets of variables takes 70 s with 13 components (P300 speller)
and 320 s with 15 components.
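
The figures quoted above can be cross-checked with a few lines of
arithmetic. The per-subset times are derived here from the quoted
totals and are not stated in the text.

```python
# Cross-check of the timing figures quoted in this section.
n_signals = 42 * 180 * 64                     # characters * trials * channels
ms_per_signal = 20 * 60 * 1000 / n_signals    # 20 min spread over all signals

subsets_13 = 2 ** 13 - 1                      # non-empty subsets of 13 components
subsets_15 = 2 ** 15 - 1                      # non-empty subsets of 15 components
ms_per_subset_13 = 70 * 1000 / subsets_13     # 70 s in total
ms_per_subset_15 = 320 * 1000 / subsets_15    # 320 s in total

print(n_signals, round(ms_per_signal, 2))
print(subsets_13, round(ms_per_subset_13, 1))
print(subsets_15, round(ms_per_subset_15, 1))
```

The CWT figure works out to roughly 2.5 ms per signal, and the
exhaustive search to roughly 8.5 ms and 9.8 ms per subset.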
The time needed for classification of the test data is negligible,
because it involves only matrix multiplication of the data array
with the coefficient vector of the discriminant function.
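
A minimal sketch of this step is shown below. The names classify,
w and c, the threshold convention, and the example numbers are
hypothetical.

```python
import numpy as np

def classify(X, w, c=0.0):
    """Label each row of X by comparing its discriminant score X @ w with c."""
    return (X @ w > c).astype(int)

# Three test trials described by two components each (made-up values).
X_test = np.array([[2.0, 0.0], [-1.0, 1.0], [0.5, 0.5]])
w = np.array([1.0, -1.0])
labels = classify(X_test, w)
```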