Abstract
We present a novel approach to describe a P300 by a shape-feature vector, which offers several advantages over the feature vector used by the BCI2000 system. Additionally, we present a calibration algorithm that reduces the dimensionality of the shape-feature vector, the number of trials, and the number of electrodes needed by a Brain Computer Interface to accurately detect P300s; we also define a method to find a template that best represents, for a given electrode, the subject’s P300 based on his/her own acquired signals. Our experiments with 21 subjects showed that the SWLDA’s performance using our shape-feature vector was , that is, higher than the one obtained with the BCI2000 feature vector. The shape-feature vector is 34-dimensional for every electrode; however, it is possible to significantly reduce its dimensionality while keeping a high sensitivity. The validation of the calibration algorithm showed an averaged area under the ROC (AUROC) curve of . Also, most of the subjects needed fewer than trials to have an AUROC superior to . Finally, we found that the electrode C4 also leads to better classification.
1. Introduction
The P300 is an endogenous component of the event-related potential (ERP): a positive deflection in the scalp-recorded electroencephalogram (EEG) that is typically elicited approximately 300 ms after the presentation of an infrequent stimulus (visual, auditory, or somatosensory) [1]. The specific set of circumstances for eliciting a P300 is known as the Oddball Paradigm, which consists of presenting a target stimulus amid more frequent standard background stimuli. Under this paradigm, a P300, among other ERPs, is unconsciously elicited every time a subject’s brain detects the target stimulus (the rare event). In fact, the P300 is a reasonable input signal, with desirable properties and stability, to control Brain Computer Interfaces (BCI) [2], applications that require precise real-time detection as well as memory and computation optimization [3, 4]. Reducing the dimensionality of the feature vector has been a popular way to achieve these goals within the BCI community because it decreases the complexity of classifiers [5].
The features of a P300 have been represented in the time, frequency, time-frequency, and shape domains by using, among others, the Wavelet Transform [6], Genetic Algorithms [7], and Common Spatial Patterns [8]. Additionally, the approaches most commonly used for P300 classification are Linear Discriminant Analysis, Stepwise Linear Discriminant Analysis [9], and Support Vector Machines [10].
In this work, we are interested in the shape domain because we assume that (i) every subject produces P300 signals whose waveform can be consistently represented by template curves and (ii) such template curves from a subject are more similar to curves with a P300 than to curves produced by EEG background activity [11]. Most techniques based on these ideas are classified into Cross Correlation Alignment (e.g., Woody’s [12] and Maximum Likelihood (ML) [13] methods), Dynamic Time Warping (DTW) alignment [14, 15], and linear methods such as coherent averaging [11]. Although the latter is the most controversial of all, it is the fastest and the most commonly used averaging method because of the following argument: a P300 can be considered as a well-defined component since the alignment of its peaks “is most likely linear even though the distortion is nonlinear” [16]. For this reason, it is common practice to repeat the stimulation procedure to improve the signal-to-noise ratio (SNR) by coherently averaging several segments of filtered EEG signals generated after the stimulation (i.e., trials); the number of stimulations may vary from subject to subject for reasons explained in [17]. Coherent averaging assumes that ERP components are unaffected by the averaging procedure and that any variability is due to noise [18]. However, the P300’s amplitude, latency, and waveshape vary not only between electrodes but also in time. The variation between electrodes is due to electrode position; that is, the farther the electrode is from the cortical area, the lower the amplitude is. Thus, if we average all the electrode signals without taking this into account, we will damage the P300’s properties; for this reason, the electrode signals are usually processed individually. The variation in time is due to biological determinants (e.g., increasing difficulty in perception and cognition of a task), the subject’s attention level, or experimenter-dependent variables [17].
Thus, the coherent average does distort most of the ERP’s components [15, 19]; however, for a given subject, the averaged P300 remains consistent [20]. The previous considerations can be summarized in the following statement by Knuth et al. [18]: “Of course, waveshape variability also exists, but robust single-trial amplitude and latency estimates are nonetheless obtainable with the assumption of fixed component waveshapes.”
The novelty of this paper consists in the detection of P300 trials by applying pattern recognition techniques to the P300’s shape, represented by a feature vector. Specifically, we use a contour representation based on an adapted version of the Slope Chain Code (SCC) and some of its properties (e.g., the tortuosity measure) [21], as well as some general descriptors, such as the differences of areas, to describe the differences between curves. Importantly, chain codes have been successfully used to describe and classify other biosignals such as electrocardiograms [22]. The advantages of using the SCC are as follows: (i) it is self-contained, which implies that a chain does not need decoding, and (ii) it is finite, which means that the resulting chains can be classified using grammatical techniques, syntactic analysis [23], or algebraic operations. Because the SCC is computationally expensive, we adapted it to make it less demanding. In addition to the adapted SCC, we also present an offline calibration algorithm that reduces the dimensionality of the shape-feature vector, the number of subject’s stimulations, and the number of electrodes needed by a BCI to accurately detect a subject’s P300.
We organized the paper as follows. In Section 2, we define the shape-feature vector and explain the details of the proposed algorithm. Then, in Section 2.3, we present our methodology to set the Oddball Paradigm and the experiments to define the parameters needed for the proposed algorithm. In Section 3, we present key results and a discussion of the experiments designed to evaluate the classification performance. Finally, in Section 4, we provide some conclusions.
2. Materials and Methods
In this section, we describe the features of the ERP’s waveform that we use as the vector of characteristics. Additionally, we present an offline calibration algorithm that reduces the dimensionality of the shape-feature vector, the number of trials for a subject, and the number of electrodes needed by a BCI to detect a subject’s P300.
2.1. Feature Vector Based on ERP’s Waveform
As we mentioned before, the vector of characteristics obtained from the waveform of a P300 is central to our work. A first step towards producing such a vector is the coherent averaging of a set of trials.
2.1.1. Coherent Averaging
It is a well-known fact that coherent averaging increases the SNR in signals, and we take advantage of this fact to enhance the small-amplitude signals immersed in an EEG. We and other groups [25] assume that the coherent averaging is feasible because (i) there is no correlation between the ERP signal and the rest of the EEG, (ii) the stimulation time and the response reflected in the EEG signal are known, (iii) there exists a consistently detectable component (e.g., a P300), and (iv) the EEG is a random signal with zero mean.
In a common BCI experiment, a number of electrodes are used to acquire EEG signals. We refer to this number as . The signal from an electrode is acquired times (i.e., trials). We will refer to the resulting set of all acquired signals (i.e., signals for electrodes) as and we divide it into two non-overlapping subsets and . We use the set to train the calibration algorithm (which is discussed in Section 2.2) and the set to validate its performance (see Section 3.2). Furthermore, every EEG signal recorded by an electrode is discretized by number of samples. Consequently, the dimensional vector representing an EEG signal can be represented as follows: where and are also vectors representing the ERP signal and the EEG background (associated with the rest of the brain’s activity), respectively. By coherently averaging the signals of a single electrode, we have In practice, the averaged vector is considered to be the zero vector (the vector whose element values are all equal to zero) because the EEG is a random signal with zero mean and little autocorrelation.
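The effect of coherent averaging can be illustrated with a minimal sketch, assuming the additive model stated above (each trial is the ERP plus a zero-mean background). The synthetic bump standing in for the ERP, the Gaussian background, the trial count, and the function names are illustrative assumptions, not data or code from this study.

```python
import math
import random


def coherent_average(trials):
    """Average equally long trials sample by sample."""
    n_trials = len(trials)
    return [sum(samples) / n_trials for samples in zip(*trials)]


def rms_error(signal, reference):
    """Root-mean-square difference between a signal and a reference."""
    squared = sum((a - b) ** 2 for a, b in zip(signal, reference))
    return math.sqrt(squared / len(reference))


rng = random.Random(0)

# Hypothetical ERP: a single positive bump standing in for a P300.
erp = [math.exp(-((k - 77) / 20.0) ** 2) for k in range(204)]

# Each synthetic trial = ERP + zero-mean background (assumed Gaussian here).
trials = [[s + rng.gauss(0.0, 1.0) for s in erp] for _ in range(100)]

averaged = coherent_average(trials)
single_error = rms_error(trials[0], erp)    # error of one raw trial
averaged_error = rms_error(averaged, erp)   # error after averaging 100 trials
```

Under these assumptions the residual background in the average shrinks as more trials are averaged, which is why the averaged background vector is treated as approximately zero.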
Because we intend to use the waveform of the recorded ERP signals to generate the vector of features, we represent a recorded signal as the following sequence of ordered pairs , where is a nonnegative integer corresponding to the sample number and is a real number representing the measured amplitude of the ERP at the position . As a result, the coherent average of (2) produces the vector .
2.1.2. Slope Horizontal Chain Code
Chain codes are alphanumeric sequences, with integer alphabets being the most common choice because of the ease and speed of processing the resulting chains compared with those based on alphanumeric alphabets. Several integer-alphabet chain codes have been proposed [21, 26–32], as well as methods to represent analog signals with sequences of bits (e.g., pulse code modulation [33]); however, the SCC is the most useful for the purposes of this paper because it divides the curve into straight-line segments placed onto the curve and preserves the contour shape with higher resolution. By using the ordered-pair representation for ERP signals, we can obtain a chain code representing the contour of the curve described by its sequence of ordered pairs [34].
In this work, we adapted the SCC to represent ERP signals and called it Slope Horizontal Chain Code (SHCC). The main differences between the SCC and our code are the following. The SHCC adjusts a segment’s length to avoid interpolation; this adjustment takes advantage of the sampling uniformity during the biosignal acquisition to keep the sampling points as the endpoints of segments. Contrary to the SCC, the SHCC does not compute the angle between two adjacent segments; instead, it computes the slope between a segment and the horizontal in the continuous range equivalent to (). Consequently, the segments are independent, which means that if the signal from one electrode is disturbed (e.g., due to noise or loss of information), this will not affect more than one chain element. Furthermore, the SHCC does not require either rotation invariance, since it is not designed for closed curves, or scale invariance. These differences make the SHCC algorithm computationally less expensive and very useful for real-time applications. Moreover, the SHCC can be easily implemented in hardware, thus allowing the classifier to be integrated into signal acquisition devices.
On the other hand, the SHCC and the SCC share the following very useful properties for our application: both place line segments onto the curve to preserve the contour shape with high resolution, both are translation-invariant, which is relevant since the SHCC can adequately represent the P300’s variability, and both allow feature dimensionality and data reduction. Both kinds of reduction are very desirable in BCI applications [35].
A first step to transform the curve into a chain by the SHCC is to resample the vector with a new sampling distance given by where is a nonnegative integer representing the desired number of line segments to represent the curve (in Section 2.3.4, we will explain the procedure to select the value). The new rediscretized vector is a sequence of ordered pairs , where , for , and . An alternative to this rediscretization process would be to change the sampling rate (i.e., subsampling) during the acquisition process, but this can potentially distort the ERP signal, due to aliasing, and produce regions of the signal similar to a P300, which in turn could produce false positives in the classification stage.
Before obtaining the alphabet symbols, the SHCC normalizes every element of as follows: where 1 is a vector whose element values are all equal to one.
These operations produce new coordinate vectors and , where . With these coordinates, the SHCC produces a chain whose th element represents the code associated with the slope between the horizontal axis and the th ordered pair , for . To compute the members of the alphabet, we use a precision of two decimals when computing the individual slopes, resulting in an alphabet of 200 elements. To exemplify this process, we show in Figure 1 a discretized ERP whose chain is = (0.06 −0.02 −0.06 0.06 0.05 0.02 0.01 −0.04 0.04 −0.03 −0.09 0.04 0.05 −0.01 −0.02 0.04).
Figure 1: (a) original curve; (b) discretized curve; (c) chain of the discretized curve.
Finally, to form a vector of characteristics, we consider the elements of a chain obtained with the aforementioned SHCC method as part of the features of the vector together with other characteristics as we will show below.
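The steps described in this section (resampling on existing sample points, normalization, and slope coding with two decimals) can be sketched as follows. This is only an illustration under stated assumptions: the exact sampling-distance, normalization, and slope-range formulas are not reproduced here, so the sketch assumes an integer sampling distance, min-max normalization of both coordinates, and a code equal to the segment's angle with the horizontal scaled by π/2. The names `shcc_sketch` and `min_max` are hypothetical.

```python
import math


def min_max(values):
    """Scale values to [0, 1] (assumed normalization; constant input maps to zeros)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def shcc_sketch(amplitudes, n_segments):
    """Return a chain of n_segments slope codes for a uniformly sampled curve."""
    # Assumed integer sampling distance so that segment endpoints fall on
    # existing samples and no interpolation is needed.
    step = (len(amplitudes) - 1) // n_segments
    if step < 1:
        raise ValueError("not enough samples for the requested number of segments")
    xs = [i * step for i in range(n_segments + 1)]
    ys = [amplitudes[x] for x in xs]
    xn, yn = min_max(xs), min_max(ys)
    chain = []
    for i in range(n_segments):
        # Angle between the segment and the horizontal; dx > 0, so the angle
        # lies strictly between -pi/2 and pi/2.
        angle = math.atan2(yn[i + 1] - yn[i], xn[i + 1] - xn[i])
        # Assumed code: the angle scaled by pi/2, kept with two decimals.
        chain.append(round(angle / (math.pi / 2), 2))
    return chain


flat_chain = shcc_sketch([1.0] * 33, 16)              # horizontal curve
ramp_chain = shcc_sketch([float(k) for k in range(33)], 16)  # straight ramp
```

Each chain element depends only on its own segment, which reflects the independence property discussed above; a horizontal curve yields a chain of zeros.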
2.1.3. Distance between Chains
The possibility of computing the distance between two curves is an important characteristic that we take advantage of in our proposed method. Since we use the SHCC to represent 2D curves, we obtain a unique curve descriptor represented by a chain. The hypothesis is that the chain representing a P300 template is more similar to the chain of a P300 than to the chain of a non-P300.
There are several distances to measure shape dissimilarity for 2D curves, such as the Manhattan (i.e., the L1 norm), the Euclidean (i.e., the L2 norm), the Hausdorff, or the Fréchet distances [36]. To decide between them, we ran experiments with preliminary parameters to compare our algorithm’s performance (we explain our algorithm below) and the results were not significantly different; thus, we decided to compute the distance with the L1 norm because of its lower computational cost. Consequently, for two chains and of length , we define their distance as the sum of the absolute differences between their corresponding elements. To exemplify our hypothesis, we show different subsampled curves in Figure 2. In Figure 2(a), we present in blue the template curve of a subject, whose chain is (0.05 −0.02 −0.05 0.08 0.07 −0.01 0.01 −0.03 0.04 −0.04 −0.09 0.01 0.05 −0.03 −0.04 0.04), and in red a P300 curve of the same subject, whose chain is (−0.01 −0.04 −0.05 0.04 0.05 0.05 0.02 −0.05 −0.02 −0.05 −0.02 0.05 0.02 0 −0.01 −0.01). The Manhattan distance between these two chains is 0.55. In contrast, in Figure 2(b), we present the same template curve together with a non-P300 curve, whose chain is (0.07 −0.03 0 −0.03 0.06 0.04 −0.10 0.06 0.01 −0.04 −0.04 0.06 −0.09 0.01 0.11 0.05); in this case, the distance between chains is 0.92. For this example, one can see that the subject’s template curve is more similar to the P300 curve than to the non-P300 curve. This is just an example; within the calibration process, a statistical test verifies that this hypothesis is met.
Figure 2: (a) template curve and a P300 curve; (b) template curve and a non-P300 curve.
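The two distances quoted for Figure 2 can be checked with a short sketch of the Manhattan distance between chains. The chains are copied from the text; the function and variable names are illustrative.

```python
def manhattan_distance(chain_a, chain_b):
    """Sum of absolute element-wise differences between two equally long chains."""
    if len(chain_a) != len(chain_b):
        raise ValueError("chains must have the same length")
    return sum(abs(a - b) for a, b in zip(chain_a, chain_b))


template = [0.05, -0.02, -0.05, 0.08, 0.07, -0.01, 0.01, -0.03,
            0.04, -0.04, -0.09, 0.01, 0.05, -0.03, -0.04, 0.04]
p300 = [-0.01, -0.04, -0.05, 0.04, 0.05, 0.05, 0.02, -0.05,
        -0.02, -0.05, -0.02, 0.05, 0.02, 0.0, -0.01, -0.01]
non_p300 = [0.07, -0.03, 0.0, -0.03, 0.06, 0.04, -0.10, 0.06,
            0.01, -0.04, -0.04, 0.06, -0.09, 0.01, 0.11, 0.05]

d_p300 = round(manhattan_distance(template, p300), 2)          # 0.55
d_non_p300 = round(manhattan_distance(template, non_p300), 2)  # 0.92
```

The template is closer to the P300 chain (0.55) than to the non-P300 chain (0.92), in agreement with the values reported above.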
2.1.4. Tortuosity
Another feature that we would like to capture is how straight or twisted a curve is; one way to measure such a characteristic is by summing the absolute values of the chain elements, where is the th element of the chain . The minimum value of this measure is zero, corresponding to a curve consisting of purely horizontal segments (i.e., all the components have slopes equal to zero). On the other hand, as the curvature increases, the value of will also increase [21]. The measure above is commonly known as tortuosity; for example, the tortuosity value of the curve shown in Figure 1 is 0.64.
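As a quick check, the sketch below computes the tortuosity of the Figure 1 chain, assuming the measure is the sum of the absolute values of the chain elements. That assumption is consistent with the zero minimum for a horizontal curve and reproduces the value 0.64 reported above; the function name is illustrative.

```python
def tortuosity(chain):
    """Sum of the absolute slope codes of a chain (assumed form of the measure)."""
    return sum(abs(element) for element in chain)


figure_1_chain = [0.06, -0.02, -0.06, 0.06, 0.05, 0.02, 0.01, -0.04,
                  0.04, -0.03, -0.09, 0.04, 0.05, -0.01, -0.02, 0.04]

tau = round(tortuosity(figure_1_chain), 2)  # 0.64
```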
2.1.5. Differences between the Areas of Two Curves
Both the SCC and the SHCC describe the signal’s waveform at the expense of losing voltage information, which is useful for discriminating between conditions. Moreover, two curves having different shapes could have the same tortuosity. For these reasons, we introduce an additional way to compare two curves by computing the difference between their areas. To this end, we apply the trapezoidal rule [37], because it integrates a curve over an interval by breaking the area under the curve into small trapezoids whose areas are easier to compute. In what follows, for a subject, we will compute the difference between the area of a template curve and the area of either a P300 curve or a non-P300 curve . We refer to the segment-wise differences as . For any two dimensional vectors and , we define the sum of their segment-wise differences as
Finally, for every electrode, we assemble a dimensional vector of shape features by combining all the described parameters above in the following way: the first elements of the vector correspond to the differences between the area of two curves , the following element is the sum of them by (7). The next element is the distance between chains by (5), followed by the tortuosity measure by (6), and the last elements of the vector represent the ERP under analysis , resulting in the vector of size .
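The area comparison described in Section 2.1.5 can be sketched with the trapezoidal rule as follows. The exact definition of the segment-wise difference is not reproduced here, so the sketch assumes a signed difference (template minus curve) per trapezoid and a unit sampling distance by default; the function names are hypothetical.

```python
def trapezoid_areas(ys, dx=1.0):
    """Area of each trapezoid between consecutive samples of a curve."""
    return [dx * (ys[i] + ys[i + 1]) / 2.0 for i in range(len(ys) - 1)]


def area_differences(template, curve, dx=1.0):
    """Segment-wise area differences, template minus curve (signed: an assumption)."""
    if len(template) != len(curve):
        raise ValueError("curves must have the same number of samples")
    template_areas = trapezoid_areas(template, dx)
    curve_areas = trapezoid_areas(curve, dx)
    return [t - c for t, c in zip(template_areas, curve_areas)]


# Toy curves (not data from the study): a triangular bump against a flat line.
differences = area_differences([0.0, 1.0, 0.0], [0.0, 0.0, 0.0])
total_difference = sum(differences)
```

The list of differences and its sum correspond to the first group of elements of the shape-feature vector described above.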
2.2. Calibration Algorithm
Like any other BCI system that uses a feature vector, we need to calibrate ours for every subject. For our project, the goals of the calibration are (i) to obtain a template for every electrode that best represents a subject’s P300 in that electrode, (ii) to obtain the optimum number of stimulations, (iii) to select the subset of electrodes that provides the best P300 signal, and (iv) to select the shape features that maximize the area under the receiver operating characteristic (AUROC) curve.
In what follows, we define some sets and variables necessary for our calibration algorithm. For the calibration process, we select a certain number of P300-labeled trials and a certain number of non-P300-labeled trials for a given electrode . We refer to the set of non-P300-labeled trials, with cardinality equal to , as and as to the set of P300-labeled trials, with cardinality equal to . Clearly, the total number of trials, for a given electrode , produces a set; we refer to it as (i.e., ). When all the electrodes that we use are taken into account, then the set of all the trials results in the set which can then be expressed as .
Algorithm 1 presents the pseudocode for our calibration method, which is an iterative algorithm based on wrapper methods [38].

A general view of the calibration algorithm is as follows. The iterative algorithm is made of several sections, called wrappers, each of which carries out a specific task. The principal wrapper is an iterative procedure; see lines 4–24. To control the iterations, we make use of the boolean variable loop initialized in line 1. The algorithm iterates while the value of loop is true (see line 4). The goals of this wrapper are to select (i) the electrodes and the shape features that provide the best P300 signal, (ii) the best templates for each electrode, and (iii) the optimum number of stimulations. We select the subset of electrodes and the optimum number of stimulations by finding the templates that satisfy certain criteria. The number of trials (i.e., ) is one of the parameters defined in the experimental design (see Section 2.3.3). We define the variable as the maximum number of stimulations (see line 2), which will gradually decrease to diminish the subject’s fatigue, to find the optimum number of stimulations. Additionally, we find the best templates and the best shape features by means of an inner wrapper method that iterates number of times, where is defined in line 5. In each iteration, the inner wrapper randomly selects a set of P300-labeled trials to find the best template and the best features for each electrode by analyzing subsets of trials. For this analysis to be statistically significant, we apply a cross-validation times, where is defined in line 3. Since depends on , it could be computed after is set. Then, the inner wrapper evaluates the detection of a P300 through lines 6–19. On the other hand, in order to reduce the shape-feature vector dimensionality, we use the stepwise regression method (SRM) described in [5, 39]. This method performs feature space reduction by selecting the elements of the shape-feature vector that satisfy certain entry and removal criteria.
Now, we describe the inner wrapper in detail. As we explained before, for an electrode , we need to search for the subject’s best template that consistently represents a P300 waveform (see lines 7–18). Each template chain is generated by randomly selecting a subset , where is the number of trials necessary to generate the subject’s P300 template (we will detail the method to define both and in Section 2.3.4). We perform this task by the RandomSelector operator (see line 8). Then, we compute its coherent average by (2) and we transform the resulting vector into a chain using the SHCC (see Section 2.1.2); this process is carried out in line 9 by the operator ChainCodeSHCC. At this point, the algorithm creates a set of several template chains for a given electrode .
Every template chain is compared with the chains and , where is the chain of the average subset , whose elements (i.e., the number of stimulations) are randomly selected from the set (see lines 11 and 12); and is the chain corresponding to the averaged subset , composed of elements randomly selected from (see lines 15 and 16). These comparisons are carried out (see lines 8–17) times (see line 6) per electrode.
After these comparisons, we obtain two shape-feature vectors and as explained in Section 2.1. The vector represents the features extracted from , , , and (see line 13); the vector represents the features extracted from , , , and (see line 17); this process is performed by the operator FeatureExtractor. For this analysis to be statistically significant, we apply a cross-validation test. Hence, we compute vectors and times (see lines 10 and 14, resp.) for each electrode. These vectors allow the creation of the matrix defined as
To evaluate the performance of the calibration process by using one template at a time, we decided to use the computed distances between chains as accuracy measures (based on preliminary experiments); this process is carried out in line 18 by the operator AUROCEstimator. For every th column of matrix , we take the element of the first vectors to form a vector , whose elements are the distance element of such vectors; in the same way, we take the element of vectors to form a vector . Thus, and are tuples and . Then, we compute the AUROC to evaluate the comparisons between and and create a matrix entry . Each of these entries represents the discrimination capacity for the th template of each electrode. These entries build a matrix defined as
As mentioned earlier, one of the goals of the principal wrapper is to reduce the shape-feature vector dimensionality. To that end, we use the operator Stepwise (see line 19) that computes the SRM; its entry criterion is a p-value < 0.1 and its removal criterion is a p-value > 0.15. These values were defined based on those reported in [5]. The Stepwise operator renders the dimensional vector of the elements of selected by the SRM, the binary dimensional vector , representing the indexes of the vector , and the dimensional vector of estimated coefficients for all the terms in . Every th element of whose corresponding th element of is different from zero will be an entry to the vector . Likewise, every th element of whose value is different from zero will be an entry to the vector . Finally, every element of related to the th element of will be an entry to the vector . The latter procedure finishes the inner wrapper.
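The AUROC evaluation carried out by the operator AUROCEstimator (line 18) compares the template-to-P300 distances against the template-to-non-P300 distances. A minimal sketch follows, assuming that a smaller distance to the template indicates a P300 and using the rank-based (Mann-Whitney) form of the AUROC; the function name and the toy distances are illustrative, not the study's implementation or data.

```python
def auroc_from_distances(p300_distances, non_p300_distances):
    """AUROC as the probability that a non-P300 distance exceeds a P300 distance.

    Ties count as one half. Assumes smaller distance means "more P300-like".
    """
    if not p300_distances or not non_p300_distances:
        raise ValueError("both distance lists must be non-empty")
    wins = 0.0
    for d_non in non_p300_distances:
        for d_p300 in p300_distances:
            if d_non > d_p300:
                wins += 1.0
            elif d_non == d_p300:
                wins += 0.5
    return wins / (len(p300_distances) * len(non_p300_distances))


# Toy distances: perfectly separated classes, then identical classes.
separated = auroc_from_distances([0.5, 0.6], [0.9, 1.0])
overlapping = auroc_from_distances([0.5, 0.9], [0.5, 0.9])
```

An AUROC of 1 means the template separates the two classes perfectly, while 0.5 means it does no better than chance.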
As we mentioned before, some goals of the principal wrapper are to select both the subset of electrodes that provides the best P300 signal and the optimum number of stimulations by finding the templates that satisfy certain criteria. To that end, the operator (line 20) computes the average AUROC to measure the performance of the templates of each electrode, for each stimulation by The operator (see line 21) selects the subset of electrodes that provides the best P300 signal for each subject. To achieve this goal, we select the electrodes where . If there are no such electrodes, then we choose those where and set the variable loop to false. At the end, the algorithm will define as the set of electrodes that meet these conditions; otherwise, it will be unsuitable to find the subject’s P300. On the other hand, to select the subject’s optimum number of stimulations, is gradually decreased until no more electrodes are found meeting the condition where . In such a case, will be the optimum number of stimulations , and the algorithm stops. This process is carried out by the operator (see line 22). The operator (see line 23) selects the templates that achieve the highest values of for each electrode of the set . Finally, the operator (see line 24) selects the indexes of the shape-feature vectors, the matrix of regression coefficients , and the features associated with the previously selected templates. The algorithm returns the , , , , , and values that we store in a text file.
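The electrode-selection step of the principal wrapper can be sketched as follows. The matrix layout (one row per template, one column per electrode), the two thresholds passed by the caller, the use of `>=`, and all names are assumptions made for illustration; the thresholds used in the study are not reproduced here.

```python
def mean_auroc_per_electrode(auroc_matrix):
    """Average the AUROC over templates for each electrode.

    auroc_matrix[t][e] is the AUROC of template t on electrode e (assumed layout).
    """
    n_templates = len(auroc_matrix)
    return [sum(column) / n_templates for column in zip(*auroc_matrix)]


def select_electrodes(mean_aurocs, strict_threshold, fallback_threshold):
    """Return (selected electrode indexes, keep_looping flag).

    Electrodes meeting the strict threshold are kept; if none does, those
    meeting the fallback threshold are kept and the loop flag is cleared.
    """
    selected = [e for e, v in enumerate(mean_aurocs) if v >= strict_threshold]
    keep_looping = True
    if not selected:
        selected = [e for e, v in enumerate(mean_aurocs) if v >= fallback_threshold]
        keep_looping = False
    return selected, keep_looping


# Toy values with two templates and two electrodes; thresholds are arbitrary.
means = mean_auroc_per_electrode([[0.9, 0.6], [0.8, 0.7]])
strict_result = select_electrodes(means, 0.8, 0.6)
fallback_result = select_electrodes(means, 0.95, 0.6)
```

If the fallback also returns an empty list, no electrode is suitable, which corresponds to the case in which the subject's P300 cannot be found.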
2.3. Experimental Design
2.3.1. Participants
For our experiments, we used the EEG signal database as reported in [24], acquired by the Neuroimaging Laboratory (LINI) of the Universidad Autónoma Metropolitana (UAM), Iztapalapa. We used the EEG signals from 21 healthy students (8 females and 13 males) ranging in age from 21 to 25 years without known neurological conditions. They slept an average of 7.5 hours the night before the experiment. Four students smoked one cigarette 24 hours before the experiments and one of them smoked 5.
2.3.2. Data Acquisition and Processing
The EEG-signal database was acquired using 10 electrodes denoted by Fz, C4, Cz, C3, P4, Pz, P3, PO8, Oz, and PO7 following the international 10-20 system (see the configuration in Figure 3), with the right earlobe and the right mastoid serving as reference and ground locations. For the acquisition, screwable passive Ag/AgCl EEG electrodes were used (g.EEGelectrode manufactured by GTec [40]). The impedances between the cap electrodes and the reference electrode never exceeded 5 kΩ. The EEG signals were registered and amplified using a 24-bit g.USBamp [41] amplifier. The signal was digitized at a rate of 256 Hz and processed online with a notch filter (Chebyshev of order 4), with cutoff frequencies between 58 and 62 Hz, and a bandpass filter (Chebyshev of order 8), with cutoff frequencies between 0.1 and 60 Hz, to reduce noise. All aspects of data collection and experimental design were controlled using the BCI2000 system [42].
2.3.3. Task Description and ERP Signal Extraction
Despite the fact that our proposed method can be used in any BCI application, we decided to apply it to the P300 word speller, first described by [43], in order to make a comparison with a widely documented system. In the P300 word speller, a subject is presented with an alphanumeric matrix projected onto a computer screen to allow him or her to write a word. We did not include additional procedures that may bias the performance, for example, by inferring symbols.
The participants were asked to spell a priori known words from which we acquired 2,880 EEG signals to form the set . These signals were distributed into the training set () and the test set (). The set consisted of 480 EEG signals that potentially contained P300s (i.e., ) and 2,400 EEG signals without P300s (i.e., ). The set contained 150 signals expected to have P300s (i.e., ) and 750 non-P300s (i.e., ).
For the experiments, the participants sat in front of the computer screen, which was divided into two sections. At the top left corner of the screen, the word to be spelled was displayed, while the character currently specified for selection was listed in parentheses at the end of the word. The remainder of the screen displayed a matrix speller, as shown in Figure 4.
The matrix rows and columns were randomly intensified 15 times (i.e., trials) for every letter. The subjects were asked to silently count the number of times the target character was intensified while the matrix rows and columns flashed every 125 ms in random order (i.e., the inter-stimulus time); every flash lasted 62.5 ms. Because of the nature of the signal, we expected to have a P300 wave 300 ms after every stimulus. For this reason, we decided to extract the next 800 ms of EEG data after every stimulus per channel used in the analysis; thus, we collected around 2 P300 waves in every trial due to the inter-stimulus time. Each segment of 800 ms was filtered offline using a 4th-order Butterworth bandpass filter with a passband from 0.1 to 12 Hz to extract the ERP signals embedded in the EEG, as is common in the field [44]. The DC component was removed by subtracting the mean of each electrode from the filtered signal. Finally, the linear trend was removed from each trial.
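The epoch extraction, DC removal, and detrending steps described above can be sketched as follows. The sketch assumes a 256 Hz sampling rate and an 800 ms window (truncated to 204 samples, which is an assumption), subtracts the mean of the epoch itself as a simplification of the per-electrode mean, fits the linear trend by least squares, and omits the Butterworth filtering. All names are hypothetical.

```python
def extract_epoch(signal, stimulus_index, fs=256, window_s=0.8):
    """Return the samples that follow a stimulus for the given window length."""
    n_samples = int(fs * window_s)  # 204 samples at 256 Hz (truncation assumed)
    return signal[stimulus_index:stimulus_index + n_samples]


def remove_dc(epoch):
    """Subtract the mean of the epoch (simplified DC removal)."""
    mean = sum(epoch) / len(epoch)
    return [v - mean for v in epoch]


def remove_linear_trend(epoch):
    """Subtract the least-squares straight line fitted to the epoch."""
    n = len(epoch)
    t_mean = (n - 1) / 2.0
    y_mean = sum(epoch) / n
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(epoch))
    variance = sum((t - t_mean) ** 2 for t in range(n))
    slope = covariance / variance
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(epoch)]


# Toy signal: a pure linear drift, which the two steps should remove entirely.
drift = [0.5 * k + 3.0 for k in range(1000)]
epoch = extract_epoch(drift, 100)
cleaned = remove_linear_trend(remove_dc(epoch))
```

With a 125 ms inter-stimulus time, consecutive 800 ms epochs overlap, which is why one extracted segment can contain more than one P300 wave.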
2.3.4. Algorithm Parameters
Now, we explain the methodology used to select the parameters and required by the calibration algorithm. As we showed in (3), represents the number of straight-line segments used to divide an ERP signal to obtain a minimal representation of its shape. Correspondingly, is the number of a subject’s P300 trials needed to accurately represent a template.
Our goal is to preserve the P300’s envelope by the minimum representation possible while allowing its detection with the SHCC (see Section 2.1.2). To that end, we chose the value of based on the fact that the ERP bandwidth can get up to 10 Hz [45] and that the sampling frequency must satisfy the Nyquist Theorem; thus, we decided on a resampling frequency of 20 Hz. Considering that we extracted trials of 800 ms, 16 segments were sufficient to preserve the waveshape (in other words, ) and the maximum value of (without signal interpolation) is equal to . With this value in consideration, the shape-feature vector is 34-dimensional for every electrode.
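The arithmetic behind the choice of 16 segments can be written out as a short worked equation. The symbols used here (the resampling frequency $f_r$, the ERP bandwidth $f_{\max}$, the trial duration $T$, and the number of segments $n$) are introduced only for this illustration and may differ from the notation of the original equations.

```latex
f_r = 2 f_{\max} = 2 \times 10\ \mathrm{Hz} = 20\ \mathrm{Hz},
\qquad
n = T \cdot f_r = 0.8\ \mathrm{s} \times 20\ \mathrm{Hz} = 16 .
```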
On the other hand, to determine the number of trials necessary for a template to accurately represent a subject’s P300 for each electrode, we ran experiments with the calibration algorithm varying the values of with the arbitrarily chosen values . For these experiments, we fixed the value of to 16, in view of the discussion in the previous paragraph, and the value of to 15 trials. Then, we computed the mean and the standard deviation of for all subjects and for every electrode (see Figure 5). For a better interpretation, in this figure, the ordinate represents the number of P300 trials that need to be averaged while the abscissa indicates the measure. From these results, we selected a value for where an inflection point is reached in most of the electrodes, in this case around the value of 180 (i.e., ).
3. Results and Discussion
In this section, we report and discuss the design and results of our experiments; they test the performance of our method in detecting P300 trials based on its shape-feature vector.
3.1. Calibration Algorithm
In order to test the performance of the calibration algorithm, we fixed the values for parameters and to 180 and 16, respectively, as we explained in Section 2.3.4; we also fixed the number of electrodes to 10 and the number of stimulations to 15, as defined in Section 2.3.2.
We performed the crossvalidation of the Calibration Algorithm with the training dataset . We set the value (see line 3) to . That is, we randomly selected 20 P300labeled signals and 20 nonP300labeled signals to balance the datasets. For each subject, we obtained a mean AUROC computed by (10). The average for the studied population is presented in Figure 6, where electrode 1 corresponds to the best performing electrode for each subject (not necessarily the same), while electrode 10 is the one with the lowest per subject. In this figure, we can observe that the average value for electrode 1 for all subjects was . Among all the subjects, two of them had one electrode with close to one ( and ); 12 of them had at least one electrode with ; 16 subjects had at least one electrode with ; and 20 subjects had at least one electrode with . The worst case was one subject whose best was equal to , possibly because the subject was distracted for several reasons such as fatigue, lack of motivation, or hunger [17].
It is worth noting that the normalization process performed by the SHCC could be sensitive to outliers. However, the filters and the subsampling we applied to the signal reduce outliers. Moreover, the AUROCs reflect an adequate performance, even with a non-optimal normalization.
On the other hand, it is common practice to stimulate a subject fifteen times for every letter [46]. Our experiments suggest, however, that P300s can be accurately detected with fewer than fifteen stimulations with our calibration algorithm (see Table 1), although this observation is subject-dependent: fourteen of the twenty-one subjects required fewer than fifteen stimulations to have at least one electrode with a mean AUROC greater than or equal to . Moreover, eight of them needed at most five stimulations, and in one case the subject only required two.
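The search for the smallest sufficient number of stimulations can be sketched as below. This is a hedged illustration, not the calibration algorithm itself: the function name `min_stimulations` is hypothetical, the threshold of 0.9 is a placeholder chosen for the example (the value used in the study is not reproduced in this section), and the AUROC values are invented.

```python
def min_stimulations(best_auroc_by_count, threshold):
    """Return the smallest number of stimulations whose best mean AUROC
    (over electrodes) reaches the threshold, or None if none does.

    best_auroc_by_count maps a number of stimulations to the best mean
    AUROC obtained over all electrodes with that many stimulations.
    """
    for count in sorted(best_auroc_by_count):
        if best_auroc_by_count[count] >= threshold:
            return count
    return None


# Invented values for one hypothetical subject.
subject = {1: 0.70, 2: 0.91, 5: 0.95, 10: 0.96, 15: 0.97}
needed = min_stimulations(subject, threshold=0.9)
```

For this invented subject, two stimulations already reach the placeholder threshold, which mirrors the best case reported above.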
Additionally, we analyzed the behavior of the sets of electrodes for all subjects based on calibrations ’s. For this purpose, we carried out two experiments. In the first one, we set the number of stimulations to 15. The incidence of subjects whose electrodes provided a is as follows. The PO8 electrode met this criterion with 27% (corresponding to six subjects), followed by Fz with 18% (corresponding to four subjects), Cz and PO7 with 14% each (corresponding to three subjects each), and C4, Pz, and Oz with 9% each (corresponding to two subjects each). The electrodes that did not meet this criterion were P3, C3, and P4. In the second experiment, we expected the optimum number of stimulations shown in Table 1 to perform as well as the fifteen stimulations of the previous experiment. Figure 7 illustrates the incidence of subjects whose electrodes provided a . The electrodes PO8 and Cz had the highest incidence of all, 27% (corresponding to six subjects) and 23% (corresponding to five subjects), respectively, followed by PO7 and Fz, with 18% (corresponding to four subjects) and 14% (corresponding to three subjects). Finally, the electrodes Pz, Oz, and C4 had the lowest incidence, with Pz having only 9% (corresponding to two subjects) and Oz and C4 having 5% each (corresponding to one subject each).
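The reported percentages can be checked with a short calculation. The sketch below assumes, as an inference from the figures rather than a statement in the text, that each percentage is taken over the total number of electrode entries counted (22 in both experiments, presumably because one subject had two electrodes meeting the criterion) and not over the 21 subjects. The function name `incidence` is hypothetical.

```python
def incidence(subject_counts):
    """Convert per-electrode subject counts into rounded percentages
    of the total number of counted entries (assumed denominator)."""
    total = sum(subject_counts.values())
    return {e: round(100 * c / total) for e, c in subject_counts.items()}


# Subject counts as reported for the two experiments.
fifteen_stimulations = {"PO8": 6, "Fz": 4, "Cz": 3, "PO7": 3,
                        "C4": 2, "Pz": 2, "Oz": 2}
optimum_stimulations = {"PO8": 6, "Cz": 5, "PO7": 4, "Fz": 3,
                        "Pz": 2, "Oz": 1, "C4": 1}

first = incidence(fifteen_stimulations)
second = incidence(optimum_stimulations)
```

Under this assumption, the computed values reproduce the percentages quoted for both experiments.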
We hypothesize that the discrepancy between the incidences shown above results from the criterion that we use for selecting the electrodes. Other selection criteria are certainly possible, such as statistical tools (e.g., a test to decide whether one electrode is really better than another).
After carrying out the selection of electrodes, we found that the best information is provided by electrodes C4, Cz, Fz, Pz, PO7, PO8, and Oz, unlike the P3, C3, and P4 electrodes, whose information can be discarded since they did not contribute to the desired performance. These results are consistent with the literature [5, 47], which claims that the P300 is generally detected in the central, frontal, parietal, and parieto-occipital areas of the scalp (i.e., in the electrodes Cz, Fz, Pz, PO7, PO8, and Oz), regions associated with attention, memory, and visual processes. Additionally, we observed that C4 also provides the best performance for several subjects. For this reason, we suggest using the aforementioned electrodes for P300 speller experiments as a common practice. The calibration algorithm could therefore be useful to discard those electrodes that do not provide relevant information for our methods, either because they are improperly placed, which would generate an undesirable template, or because they would lead to misclassification. Moreover, if signals are obtained from electrodes other than the ones we suggested, the calibration algorithm will be able to select the ones that lead to the best performance of the calibration process.
3.2. Validation
The aim of the validation process is to analyze the performance of the P300 detection when using our shape-feature vector , the set of templates found for the selected electrodes, and the optimum number of stimulations obtained by the Calibration Algorithm. To that end, we designed the following two experiments. In the first experiment, we compared the performance of two classifiers commonly used by the BCI community [9]: the Stepwise Linear Discriminant Analysis (SWLDA) and the Support Vector Machine (SVM), by using the vector . In the second experiment, we compared the performance of SWLDA (considered as one of the best BCI classifiers by Krusienski et al. [9]) by applying both vectors and the one used by the BCI2000 system as described in [5].
In the first experiment, we extracted a balanced subset from the training set . The set is composed of the available P300-labeled signals and the randomly selected non-P300-labeled signals. We trained twice the number of SVM by applying a leave-one-out cross-validation with the set . Then, we randomly selected one SVM. To evaluate the classifiers’ performance, we extracted a balanced subset from the validation set ; the set is composed of the available P300-labeled signals and the randomly selected non-P300-labeled signals, where . Then, we obtained the shape-feature vectors (see Section 2.1) of . Finally, we classified such vectors with the selected SVM and with the SWLDA. We applied a confusion matrix to analyze the performance of both classifiers. The resulting accuracy was computed by
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN},$$
where $TP$ is the number of true positives, $TN$ is the number of true negatives, $FP$ is the number of false positives, and $FN$ is the number of false negatives.
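The accuracy described above, that is, the fraction of correctly classified signals among all classified signals, can be sketched in a few lines. The function name `accuracy` and the example counts are illustrative assumptions and are not taken from the experiments.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy from confusion-matrix counts: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("the confusion matrix is empty")
    return (tp + tn) / total


# Invented counts for a balanced set of 20 P300 and 20 non-P300 signals.
example = accuracy(tp=18, tn=17, fp=3, fn=2)
```

Because the evaluation sets in this experiment are balanced, accuracy is not dominated by the majority class, which is why it is a reasonable summary measure here.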
The average accuracy for the studied population of both classifiers is presented in Figure 8, where “electrode 1” corresponds to the performance of the selected features, which includes information from all the electrodes, computed with the template of the best performing electrode for each subject (not necessarily the same), while “electrode 7” is the one with the lowest per subject. In this figure, we can observe that the average value of “electrode 1” with the SVM for all subjects was and with the SWLDA was . In addition, Figure 9 shows the detailed information yielding the accuracy for “electrode 1” for every subject: one of them had one electrode with with both classifiers; eight subjects had at least one electrode with ; 17 subjects had at least one electrode with with the SVM and 20 with the SWLDA; and 20 subjects had at least one electrode with with both classifiers. The worst case was one subject whose best was equal to with the SVM and with the SWLDA, with distraction due to fatigue, lack of motivation, or hunger being a possible cause for such low accuracy. As stated in Section 3.1, the average value of the best electrode for all subjects was , which is not different from the average value of “electrode 1” classified with the SVM, and with the SWLDA the average and the standard deviation have a difference of 0.01.
In order to evaluate the second experiment, we used an unseen, unbalanced set . First, we obtained its shape-feature vectors (see Section 2.1) and its feature vectors as used in the BCI2000 system. Then, we classified those vectors with the SWLDA. Since the set was labeled, we computed the percentage of correct classification. The SWLDA was unable to generate useful coefficients with the given parameters for two subjects when using the feature vectors . In contrast, it was able to generate weights for all the subjects when using the shape-feature vectors . Thus, taking into account the nineteen subjects that both vectors could solve, the P300 detection using the SWLDA with the shape-feature vectors was , whereas with feature vectors the detection was . For all the subjects, the percentage of correct classification with the shape-feature vectors was 10% higher than that with the feature vectors (see Figure 10).
On the other hand, we evaluated the dimensionality reduction. We selected the elements of the shape-feature vector by the stepwise regression method (SRM). We observed that the SRM computes the maximum size of a vector (equal to 38 per electrode), because no additional terms satisfy the entry and removal criteria. However, it is possible to reduce the dimensionality of the vector even further while keeping an accuracy of one for at least one electrode; that was the case for nine subjects using the SWLDA and eight subjects using the SVM.
Finally, we compare other methods with ours. The idea that a P300 is more similar to a template whose waveform resembles that of a P300 than to a non-P300 is not new. As mentioned in Section 1, there is a group of algorithms implementing template matching classifiers that can be used for detecting P300s (i.e., ML [13], DTW [14, 15], and Woody [12]). We consider our method a member of this group. Our method is similar to DTW since both are based on slopes; however, our method represents a signal waveform by a chain code. Additionally, unlike the methods based on an artificial template (such as those based on Woody’s method [48]), we generate a template for every subject based on their own ERP signals by means of a wrapping method.
4. Conclusions
The P300 is an ERP elicited after the presentation of an infrequent stimulus. This endogenous component possesses some useful properties that permit controlling BCI applications [49]. For these applications to run in real time, it is important to optimize their computational resources. One indirect way to diminish the computational time is by reducing the dimensionality of the input feature vector used in the classification process without weakening the detection accuracy. The dimensionality reduction of such a vector can be achieved by lowering the number of electrodes used to acquire a subject’s EEG. On the other hand, to improve the detection of a P300, several researchers have proposed representing this signal in different domains such as time, frequency, time-frequency, or shape. In this work, we have chosen the latter because we assume that a subject produces P300 signals whose waveform can be consistently represented by template curves and that these template curves are more similar to curves with a P300 than to curves produced by EEG background activity. The novelty of this work is the description of all these curves by means of their shape features. These features are represented as a vector of characteristics whose elements are provided by an adapted version (developed by us) of the Slope Chain Code. The latter is the most useful chain code for the purposes of this work because it divides the curve into straight-line segments placed onto the curve and preserves the contour shape with higher resolution. However, our chain code is computationally less expensive than the Slope Chain Code and, therefore, it is very useful for real-time applications. Similar to other chain codes, ours does not require decoding because it is self-contained and allows using grammatical and syntactic analysis techniques.
In addition to the chain, we included in our vector other shape features such as the tortuosity measure (i.e., one of the curve’s properties measured by the Slope Chain Code), the individual difference between the areas of every segment that divides the curve, and the sum of these differences.
In order to test our main hypothesis that our shape-feature vector improves the P300 detection accuracy, we designed experiments which showed that the performance of the SWLDA classifier is better when applying our feature vector than when applying the one used in the BCI2000 system [5]; our experiments also suggested that it is possible to significantly reduce the dimensionality of our feature vector while preserving a high accuracy during classification.
Because calibration is a crucial step of any BCI system, we have proposed a calibration methodology that achieves the following goals: (i) it obtains a set of templates that best represents, for a given electrode, the subject’s P300 based on his/her own acquired signals, (ii) it finds the optimal number of trials for every subject, (iii) it selects the subset of electrodes that provides the best P300 signal for every subject, and (iv) it selects the shape features that maximize the classification accuracy while reducing the dimensionality of the feature vector. Our statistical tests showed that our method achieves a high average accuracy in the detection of P300 signals with fewer than fifteen stimulations. Furthermore, in agreement with the literature [5, 47], our results show that the best information is provided by the electrodes selected in the central, frontal, parietal, and parieto-occipital areas of the scalp.
Our future work will focus on the implementation of our methodology in a BCI. Additionally, we are planning further studies to analyze the robustness of the computed templates over time. Because there is evidence that the use of grammatical techniques and syntactic analysis yields promising results, we plan to investigate these techniques for detecting the P300 using the chain code approach. Finally, our chain code can be easily implemented with integer arithmetic; this makes it suitable for an efficient hardware implementation that integrates a classifier into signal acquisition devices, something that we are currently exploring.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
Montserrat Alvarado-González was supported by CONACyT under the agreement 167254. The authors would like to thank Erik Bojorges from the LINI for providing the code for reading and filtering ERP signals. The authors thank the referees for their constructive comments that significantly improved the paper.