Our class labels were generated by computing four different feature vectors. An individual classifier was trained on each feature vector, and the outputs of these classifiers were then combined using an averaging meta-classifier. The four feature vectors were:

- Autoregressive coefficients produced using the Burg method. An order-5 model was fit for each channel.
- Power estimates produced by applying short band-pass FIR filters to data from each channel, then squaring and averaging the filtered data. We used order-31 filters centered over the delta, theta, alpha, beta, gamma, and high-gamma frequency bands.
- Raw EEG data from each channel.
- Discrete wavelet transform of each channel, using a 3-level decomposition with order-4 Symlet wavelets.

Test data for each subject were pre-processed by removing the mean from each channel, rescaling the channel to the same standard deviation as the training data, and then adding the channel mean back in. Each classifier was trained on a set of overlapping segments of 150 data points (using the 100 Hz data), shifted by 32 data points each time.

Once each feature vector was extracted, we used PCA to reduce the dimensionality of the data to the number of trials, followed by MDA to reduce the dimensionality to one less than the number of classes. This reduced-dimensionality data was then classified by sparse multinomial logistic regression [1]. Each individual classifier outputs probabilities for each class, and these class probabilities are finally averaged together to produce the final classification label.

We used two different methods to produce output labels for the datasets:

- Method 1 trained a single three-class classifier on left vs. right vs. rest.
- Method 2 took a hierarchical approach, training two different two-class classifiers: one which discriminated between rest and active periods, and another which discriminated between left and right trials.
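The Burg AR feature above can be sketched in NumPy. This is a minimal, generic implementation of the Burg recursion (reflection-coefficient estimate plus Levinson update), not our actual code; the function name and sign convention are our own:

```python
import numpy as np

def burg_ar(x, order):
    """Burg-method AR coefficient estimate for one channel.

    Returns a[1..order] in the convention
        x[n] + a[1]*x[n-1] + ... + a[order]*x[n-order] = e[n].
    The write-up fits order-5 models per channel.
    """
    x = np.asarray(x, dtype=float)
    a = np.zeros(0)
    f = x[1:].copy()   # forward prediction errors
    b = x[:-1].copy()  # backward prediction errors
    for _ in range(order):
        # Reflection coefficient minimising forward+backward error power.
        k = -2.0 * f.dot(b) / (f.dot(f) + b.dot(b))
        # Levinson-Durbin update of the AR coefficients.
        a = np.append(a + k * a[::-1], k)
        # Update and shift the error sequences (old f used for both).
        f, b = (f + k * b)[1:], (b + k * f)[:-1]
    return a
```

For each channel the order-5 coefficient vector (5 values per channel) would be concatenated into the feature vector.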
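The band-power feature can be sketched with windowed-sinc FIR filters. The exact band edges below are our assumption (the write-up names only the bands), and at a 100 Hz sampling rate the gamma/high-gamma edges must sit below the 50 Hz Nyquist limit; "order 31" is taken to mean a 32-tap filter:

```python
import numpy as np

FS = 100.0  # sampling rate from the write-up
# Assumed band edges in Hz; capped below Nyquist (50 Hz).
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 40), "high_gamma": (40, 49)}

def fir_bandpass(lo, hi, fs=FS, numtaps=32):
    """Order-31 (32-tap) Hamming-windowed-sinc band-pass filter."""
    n = np.arange(numtaps) - (numtaps - 1) / 2.0
    def sinc_lp(fc):  # ideal low-pass prototype with cutoff fc (Hz)
        return 2 * fc / fs * np.sinc(2 * fc / fs * n)
    return (sinc_lp(hi) - sinc_lp(lo)) * np.hamming(numtaps)

def band_powers(x):
    """Filter one channel into each band, square, and average."""
    return np.array([
        np.mean(np.convolve(x, fir_bandpass(lo, hi), mode="valid") ** 2)
        for lo, hi in BANDS.values()
    ])
```

With such short filters the bands overlap considerably, but the six averaged powers per channel still give a compact spectral summary.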
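The wavelet feature maps directly onto PyWavelets, where the order-4 Symlet is named `sym4`; concatenating the coefficient arrays into one vector is our choice of packaging:

```python
import numpy as np
import pywt  # PyWavelets

def dwt_features(x):
    """3-level DWT of one channel with order-4 Symlets.

    wavedec returns [cA3, cD3, cD2, cD1]; we flatten them into a
    single feature vector.
    """
    coeffs = pywt.wavedec(np.asarray(x, dtype=float), "sym4", level=3)
    return np.concatenate(coeffs)
```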
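The pre-processing and segmentation steps are simple enough to state as code. This is a sketch under the stated parameters (150-sample windows, 32-sample shift); the function names and the `(channels, samples)` layout are our own:

```python
import numpy as np

def normalize_to_train(test, train_std):
    """Remove each channel's mean, rescale to the training std, add the mean back.

    test:      array of shape (channels, samples)
    train_std: per-channel standard deviations from the training data
    """
    mu = test.mean(axis=1, keepdims=True)
    sd = test.std(axis=1, keepdims=True)
    return (test - mu) / sd * np.asarray(train_std)[:, None] + mu

def segment(x, length=150, shift=32):
    """Overlapping segments of 150 samples (1.5 s at 100 Hz), shifted by 32."""
    starts = range(0, x.shape[-1] - length + 1, shift)
    return np.stack([x[..., s:s + length] for s in starts])
```

Each segment then feeds the feature extractors independently, multiplying the number of training examples per trial.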
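The PCA → MDA → classifier chain can be sketched as a scikit-learn pipeline. Two substitutions to note: `LinearDiscriminantAnalysis` stands in for the MDA step, and an l1-penalized `LogisticRegression` stands in for the SMLR of [1] (which uses a dedicated sparse solver); this is an illustrative pipeline, not the original implementation:

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def make_classifier(n_trials, n_classes):
    """PCA down to the number of trials, discriminant analysis down to
    n_classes - 1 dimensions, then a sparse (l1) multinomial logistic
    regression as a stand-in for SMLR [1]."""
    return make_pipeline(
        PCA(n_components=n_trials),
        LinearDiscriminantAnalysis(n_components=n_classes - 1),
        LogisticRegression(penalty="l1", solver="saga", max_iter=5000),
    )
```

One such pipeline is fit per feature type; `predict_proba` supplies the class probabilities that the meta-classifier averages.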
The final output was then determined by giving rest for all periods in which the rest/active classifier predicted rest, and left/right during the active periods. We found that these methods performed differently on the (unlabeled) test data, and we manually chose whichever method appeared to perform better on each data set. We believe this step is justified by the substantial differences seen between the training and test data for some of the subjects. Method 1 was used for test sets c and d; Method 2 was used for test sets a, b, e, f, and g.

[1] Krishnapuram, B., Carin, L., Figueiredo, M.A.T., and Hartemink, A.J. Sparse multinomial logistic regression: Fast algorithms and generalization bounds. IEEE Trans. Pattern Anal. Mach. Intell., 27(6): 957–968, 2005.
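The probability averaging and the Method 2 decision rule above reduce to a few lines of NumPy. The label coding (0 = rest, 1 = left, 2 = right) and the column order of the two-class probability arrays are our own conventions:

```python
import numpy as np

def average_probs(prob_list):
    """Averaging meta-classifier: mean of the per-classifier
    class-probability arrays (each of shape (n_segments, n_classes))."""
    return np.mean(prob_list, axis=0)

def method2_labels(p_rest_active, p_left_right):
    """Hierarchical Method 2 decision.

    p_rest_active: columns (rest, active); p_left_right: columns (left, right).
    Output rest (0) wherever rest wins, else left (1) or right (2).
    """
    active = p_rest_active[:, 1] > p_rest_active[:, 0]
    lr = np.where(p_left_right[:, 0] >= p_left_right[:, 1], 1, 2)
    return np.where(active, lr, 0)
```

Method 1 needs no such combination step: its three-class probabilities are averaged and the argmax taken directly.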