Description of the Essex Entry on Data Set V with Raw EEG Signals

Louis C.S. Tsui and John Q. Gan
Department of Computer Science, University of Essex, Colchester CO4 3SQ, UK
Email:

We tried several methods for extracting features from the raw EEG signals; the method finally used is described below. In developing the classification methods, we paid special attention to the risk of overfitting. The methods used in preparing this entry are briefly described as follows:

1. Feature Extraction: The raw EEG signals were first spatially filtered. Frequency-domain features were then extracted from each channel over the last second of data. The first feature vector was extracted from samples 1 to 512, the second from samples 33 to 544, and so on. That is, a moving window 512 samples wide advances in steps of 32 samples, and each position yields a new feature vector. This produces (NumberOfSamples-512)/32+1 feature vectors per session, where NumberOfSamples is the total number of samples in the session.

2. Feature Dimension Reduction: The dimension of the feature space was reduced in order to achieve better generalisation. The reduction includes both channel selection and frequency band selection. The techniques used include two types of PCA methods and cross-validation for choosing the optimal feature subset.

3. Classification: LDA (one against the rest) and neural networks were investigated for classification. Decision fusion was also considered, with LDA as the dominant classifier.

4. Postprocessing: The purpose of postprocessing is to obtain reliable and robust classification. The techniques used include a smoothing window over previous classification outputs, and mental task change detection and confirmation.

The classification results on the testing sessions of the 3 subjects are included in the attached file EssexEntryDataSetV_RAW.mat.
The data set description states two requirements for the classification output: the first is to provide an output for every input vector, and the second is to provide an output every 0.5 seconds. We are not sure which will be used for the final evaluation of the entries. Therefore, EssexEntryDataSetV_RAW.mat contains two vectors for each subject, named subjectiTest and subjectiTest8 respectively (i=1,2,3). The number of estimated class labels in subjectiTest is the number of input vectors in the testing session of subject i, whilst the number of estimated class labels in subjectiTest8 is (n-1)/8, where n is the number of feature vectors extracted from the testing session. In evaluating the classification accuracy of this entry, please use the corresponding true class labels.
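The counting involved in the moving window and in the two output vectors can be illustrated with a minimal Python sketch. It is not the code used to prepare this entry. The function names are hypothetical, indices are 0-based (so sample 1 in the text is index 0), and the use of integer (floor) division when a length is not an exact multiple of the step is an assumption that the description does not state.

```python
def window_starts(num_samples, width=512, step=32):
    """0-based start index of every full window in a session."""
    if num_samples < width:
        return []  # session too short for even one window
    return list(range(0, num_samples - width + 1, step))


def num_feature_vectors(num_samples, width=512, step=32):
    """(NumberOfSamples-512)/32+1, with floor division (assumption)."""
    if num_samples < width:
        return 0
    return (num_samples - width) // step + 1


def num_half_second_labels(n):
    """(n-1)/8 labels for n feature vectors, with floor division (assumption)."""
    return (n - 1) // 8


# A 576-sample session gives windows starting at samples 1, 33 and 65
# (0-based indices 0, 32 and 64), i.e. three feature vectors.
print(window_starts(576))          # [0, 32, 64]
print(num_feature_vectors(576))    # 3
print(num_half_second_labels(17))  # 2
```

The sketch only reproduces the counts. It does not say which feature vectors the labels in subjectiTest8 are aligned to, since the description above does not specify this.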