Abstract

Eye state identification is a common time-series classification problem and an active topic in recent research. Electroencephalography (EEG) is widely used in eye state classification to detect a person's cognitive state. Previous research has validated the feasibility of machine learning and statistical approaches for EEG eye state classification. This paper proposes a novel approach for EEG eye state identification using incremental attribute learning (IAL) based on neural networks. IAL is a machine learning strategy that gradually imports and trains features one by one. Previous studies have verified that such an approach is applicable to a number of pattern recognition problems. However, little of this research has focused on time-series problems, so it is still unknown whether IAL can cope with time-series problems such as EEG eye state classification. Experimental results in this study demonstrate that, with proper feature extraction and feature ordering, IAL can not only cope efficiently with time-series classification problems but also achieve lower classification error rates than conventional and some other approaches.

1. Introduction

Nowadays, electroencephalography (EEG) eye state classification is an active research area, and many studies of EEG signals have been carried out. The findings from these studies are important for human cognitive state classification, which is crucial not only to medical care but also to some daily life tasks. For example, EEG eye state classification has been successfully applied in the areas of infant sleep-waking state identification [1], driving drowsiness detection [2], epileptic seizure detection [3], classification of bipolar mood disorder (BMD) and attention deficit hyperactivity disorder (ADHD) patients [4], stress features identification [5], human eye blinking detection [6], and so on. These applications indicate the importance of research on EEG eye state signal analysis.

Usually, the data describing EEG eye state are continuous time-series data. A number of machine learning and statistical approaches can be employed to solve classification problems with such data, and previous research has validated that EEG eye state signals can be successfully analysed by some of these approaches.

In this paper, incremental attribute learning (IAL), a novel machine learning approach, is proposed to solve the EEG eye state classification problem. IAL is a “divide-and-conquer” machine learning strategy, which can be implemented by almost all machine learning algorithms, such as neural networks (NNs), particle swarm optimization (PSO), and genetic algorithms (GAs). In IAL, features are gradually imported into the system, one after another, to predict the class label. Because of this unique process, IAL is able to effectively reduce the interference among features. As a result, this approach can not only reduce the noise from data but also enhance the classification accuracy and obtain better results than some conventional approaches [7–9].

The rest of the paper is organized as follows. In Section 2, a brief introduction to EEG eye state classification is given. Section 3 reviews IAL and presents the preprocessing for the proposed EEG eye state classification approach. The proposed approach to the time-series EEG eye state classification problem is presented in Section 4. Section 5 describes the experimental benchmark problem and compares the results with those derived by some other approaches. Lastly, conclusions are drawn in Section 6.

2. EEG Eye State Classification

EEG signals of eye state monitoring are usually time-series data. Therefore, in order to classify different eye states, time-series pattern recognition approaches should be employed for EEG eye state classification. In previous studies, a number of different time-series approaches have been applied in EEG eye state identification. For example, Fukuda et al. employed a log-linearized Gaussian mixture neural network for EEG eye state classification [10]. Yeo et al. successfully used support vector machines (SVMs) to detect drowsiness during car driving by eye blink [2]. Furthermore, a hybrid system based on a decision tree classifier and the fast Fourier transform was applied to the detection of epileptic seizure by Polat and Güneş [3]. Sulaiman et al. also employed K-nearest neighbor (K-NN) for stress features identification [5]. In addition, Rösler and Suendermann built a 117-second EEG eye state corpus and employed 42 different machine learning and statistical approaches based on Weka [11] to predict eye states. They found that KStar is the best approach among these different methods [12]. Their eye state corpus is now a benchmark problem hosted by the Machine Learning Repository of the University of California, Irvine (UCI) [13]. All these works showed that machine learning and statistical methods are feasible in solving time-series classification for EEG eye state identification.

3. IAL and Feature Ordering

3.1. Incremental Attribute Learning

IAL is a novel “divide-and-conquer” machine learning strategy that gradually imports features into predictive systems according to some sequential ordering. It is a kind of supervised machine learning, designed to avoid the curse of dimensionality in high-dimensional problems. It is also able to cope with high-dimensional problems in which almost all of the features are significant and cannot be discarded by dimensionality reduction approaches such as feature selection. Moreover, because features are imported into the system separately, they are also computed in isolation, which effectively reduces the interference among different features. Previous research has validated that IAL can not only cope with problems with a large feature dimension space [14] but also reduce the interference during the process and exhibit better performance in the final pattern recognition results [7, 15].

So far, IAL has been widely employed for pattern recognition based on a number of different predictive machine learning algorithms. In previous studies, IAL has been shown as an applicable approach in solving machine learning problems like classification using genetic algorithm (GA) [16, 17], neural network (NN) [8, 18], support vector machine (SVM) [19], particle swarm optimization (PSO) [20], decision tree [21], and so on. These previous studies also showed that IAL can exhibit better performance than conventional methods that train all pattern features in one batch.

In this study, incremental neural network training with an increasing input dimension (ITID) [8] is employed for EEG eye state classification. ITID is one of the IAL neural network approaches with ordered feature training. It was developed based on incremental learning in terms of input attributes (ILIA) [18]. However, unlike ILIA, which often trains features in the original feature ordering of the problem dataset, ITID adjusts the feature ordering according to some criteria and trains features according to the adjusted sequential ordering [22].

Previous research has shown that ITID is applicable for classification. It divides the whole input space into several subdimensions, each of which corresponds to an input feature. Instead of learning input features altogether as an input vector in a training instance, ITID learns input features one after another through their corresponding subnetworks, and the structure of NN gradually grows with an increasing input dimension. During training, information obtained by a new subnetwork is merged together with the information obtained by the old network. With less internal interference among input features, ITID achieves higher generalization accuracy than conventional methods [8]. Figure 1 shows the basic neural network structure of ITID.

3.2. Feature Ordering

IAL gradually imports features for pattern recognition according to some ordering; thus in IAL classification, it is necessary to decide which feature should be imported into the predictive system in an earlier phase and which one can be computed later. Therefore, feature ordering is a compulsory step in IAL preprocessing. In this unique step, features with greater discrimination ability are placed earlier and those with weak discrimination ability are imported in later steps. This is similar to feature selection, a well-known dimensionality reduction preprocessing step, where features with greater discrimination ability are selected into a subset for further computing, while those with weak discrimination ability are discarded. However, IAL works on a continuously growing feature space, and previous research has validated that a proper feature ordering is key to lower classification error rates based on IAL [22, 23].

In previous studies, a number of feature ordering estimation methods have been developed for IAL. These methods can be divided into two types. One is based on each feature’s single classification error rates, such as contribution-based methods [8, 24], while the other ranks features according to some metrics on feature discrimination ability, like mRMR [25, 26], entropy [27], Fisher’s linear discriminant (FLD) [28], single discriminability (SD) [23], and accumulative discriminability (AD) [9]. Previous experimental results showed that feature orderings derived by AD often outperform those derived by other approaches, because AD is a global metric which aims to ensure that the whole growing feature space always has the largest discrimination ability during the IAL feature importing process, while others are local metrics, which only concentrate on finding the feature with the largest discrimination ability in each single step.

3.3. AD and Maximum Mean Discriminative Criterion

In IAL, it is necessary to ensure that datasets always have the greatest discrimination ability at each feature importing step. Namely, among all the different feature orderings, the optimum feature ordering should have the largest feature discrimination ability on average. When a new feature is imported into the predictive system, the feature dimension increases from $i-1$ to $i$. The metric of feature discrimination ability should be the largest at every such step, as only in this way can it be guaranteed that different classes are separated in the easiest way. Therefore, with the aim of optimal classification results, each intermediate step identifies an optimal feature with the greatest discrimination ability for that round of feature importing. After all features are imported, the resulting feature ordering will have the largest sum or mean of accumulative feature discrimination ability calculated in each step of the process. Here, as an efficient metric of feature discrimination ability, AD is employed as the criterion to obtain the optimum feature ordering. The criterion can be given with the maximum discrimination ability mean by

\[ \max \frac{1}{n} \sum_{i=1}^{n} \mathrm{AD}(F_i), \tag{1} \]

where $F_i$ is the feature subset of $F$ at the $i$th step of the feature importing process and $n$ is the total number of features. A larger mean indicates that the corresponding feature ordering has greater discrimination ability than the others. This criterion is called the maximum mean discriminative criterion (MMDC), which has the capacity to select the optimum feature ordering for IAL.

In (1), AD refers to the accumulative discrimination ability of the $i$-dimensional feature space with all imported features, which is the ratio, in the $i$-feature space, between the multidimensional standard deviation of all class centers and the sum of the multidimensional standard deviations of the patterns in each class.

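If $F = \{f_1, f_2, \ldots, f_n\}$ is the pool of input features and $F_i = \{f_1, \ldots, f_i\}$, then, when the $i$th feature is imported, AD is

\[ \mathrm{AD}(F_i) = \frac{\operatorname{std}\left[\boldsymbol{\mu}_1(F_i), \ldots, \boldsymbol{\mu}_c(F_i)\right]}{\sum_{k=1}^{c} \operatorname{std}_k(F_i)}, \tag{2} \]

where $\boldsymbol{\mu}_k(F_i)$ is the centroid of the vectors in $F_i$ with patterns belonging to class $\omega_k$, $\operatorname{std}_k(F_i)$ is the multidimensional standard deviation of those patterns, and $c$ is the number of classes.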

Therefore, the results of (2) are calculated on the fly as new features are gradually imported into training. To obtain better classification results, it is necessary to ensure that the result of (2) is the maximum at every step of feature importing. Here, std denotes the standard deviation in multidimensional space, which is derived from the standard deviation and the Euclidean norm.

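Let $\mathbf{x}$ be the vector for standard deviation calculation; the standard deviation of $\mathbf{x}$ is

\[ \operatorname{std}(\mathbf{x}) = \sqrt{\frac{1}{N-1} \sum_{j=1}^{N} \left(x_j - \bar{x}\right)^2}, \tag{3} \]

where the vector $\mathbf{x} = [x_1, x_2, \ldots, x_N]$, $x_j$ is the value of the $j$th pattern, $\bar{x}$ is the mean of these values, and $N$ is the total number of patterns. In (3), the component $(x_j - \bar{x})$ is the distance between the $j$th pattern and the mean. This distance can be written as $\|\mathbf{x}_j - \bar{\mathbf{x}}\|$, the Euclidean norm in the $i$-dimensional feature space; here $i$ is the total number of features imported so far. Therefore, to calculate the standard deviation of patterns in two dimensions, (3) can be written as

\[ \operatorname{std}(\mathbf{x}) = \sqrt{\frac{1}{N-1} \sum_{j=1}^{N} \left[ \left(x_{j1} - \bar{x}_1\right)^2 + \left(x_{j2} - \bar{x}_2\right)^2 \right]}, \]

and for a tridimensional space, the equation is

\[ \operatorname{std}(\mathbf{x}) = \sqrt{\frac{1}{N-1} \sum_{j=1}^{N} \left[ \left(x_{j1} - \bar{x}_1\right)^2 + \left(x_{j2} - \bar{x}_2\right)^2 + \left(x_{j3} - \bar{x}_3\right)^2 \right]}. \]

Accordingly, the multidimensional standard deviation used in (2) of patterns in an $i$-dimensional space is

\[ \operatorname{std}(\mathbf{x}) = \sqrt{\frac{1}{N-1} \sum_{j=1}^{N} \sum_{m=1}^{i} \left(x_{jm} - \bar{x}_m\right)^2}. \]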
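As a rough illustration of (1) to (3), the following Python sketch computes the multidimensional standard deviation, the AD of a set of imported features, and the mean AD of a candidate ordering. The function names, the NumPy array layout (patterns in rows, features in columns), the $N-1$ normalization, and the greedy search are all assumptions made for this example; the text does not specify how the ordering is searched for in practice.

```python
import numpy as np


def multidim_std(points):
    """Multidimensional standard deviation of the rows of a 2-D array.

    Square root of the summed squared Euclidean distances to the centroid,
    divided by N - 1 (this normalization is an assumption of the sketch).
    """
    points = np.asarray(points, dtype=float)
    n_patterns = points.shape[0]
    if n_patterns < 2:
        return 0.0
    squared_distances = np.sum((points - points.mean(axis=0)) ** 2, axis=1)
    return float(np.sqrt(squared_distances.sum() / (n_patterns - 1)))


def accumulative_discriminability(X, y, imported):
    """AD of the feature columns listed in `imported`: std of the class
    centroids divided by the sum of the per-class stds."""
    sub = np.asarray(X, dtype=float)[:, list(imported)]
    y = np.asarray(y)
    classes = np.unique(y)
    centroids = np.array([sub[y == c].mean(axis=0) for c in classes])
    within = sum(multidim_std(sub[y == c]) for c in classes)
    # `within` is zero only if every class collapses to a single point.
    return multidim_std(centroids) / within


def mean_ad(X, y, ordering):
    """Mean of AD over all importing steps of one ordering (the MMDC score)."""
    ordering = list(ordering)
    steps = range(1, len(ordering) + 1)
    return float(np.mean([accumulative_discriminability(X, y, ordering[:i])
                          for i in steps]))


def greedy_ad_ordering(X, y):
    """Hypothetical greedy search: at each step import the feature that
    gives the grown feature space the largest AD."""
    remaining = list(range(np.asarray(X).shape[1]))
    ordering = []
    while remaining:
        best = max(remaining,
                   key=lambda j: accumulative_discriminability(X, y, ordering + [j]))
        ordering.append(best)
        remaining.remove(best)
    return ordering
```

Calling `mean_ad` on two candidate orderings gives the quantity compared by MMDC; the ordering with the larger value would be preferred.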

For EEG eye state identification, the feature ordering is first derived based on (1) and MMDC, and then the time-series data with sorted features are imported into the predictive system according to this ordering. It is important to note that the feature ordering should be obtained only from the training data, so that the testing data have no influence on preprocessing and training and the classification results are closer to the real situation. In the validation and testing stages, however, all the data should be sorted according to the feature ordering derived from the training dataset during preprocessing. Such an operation ensures that all the features are trained, validated, and tested in the same ordering and that all the features are sorted according to their discrimination abilities. If the feature orderings in the training, validation, and testing phases were different, features with weak discrimination ability would be trained at an early stage, which would reduce the accuracy of the classification.
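The rule that the ordering comes from the training data alone and is then reused unchanged can be sketched as follows. The helper name `order_all_splits` and the caller-supplied `ordering_fn` are hypothetical; any ranking function that returns a permutation of the column indices (for example an AD-based one) could be plugged in.

```python
import numpy as np


def order_all_splits(X_train, y_train, X_val, X_test, ordering_fn):
    """Derive a feature ordering from the training split only and apply
    the same column permutation to every split.

    `ordering_fn(X, y)` is a caller-supplied (hypothetical) function that
    returns a permutation of the feature indices.
    """
    # Validation and test data are never passed to the ordering function.
    ordering = list(ordering_fn(X_train, y_train))
    if sorted(ordering) != list(range(X_train.shape[1])):
        raise ValueError("ordering must be a permutation of the feature indices")
    return (ordering,
            X_train[:, ordering],
            X_val[:, ordering],
            X_test[:, ordering])
```

Keeping the permutation in one place makes it harder to accidentally rank features on validation or test data.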

4. Time-Series Classification Approach Based on IAL

4.1. Time-Series Classification

There are two kinds of time-series classification approaches: instance-based and feature-based [29]. The instance-based approach predicts the classification results for the testing instances based on their similarity to the training instances. In this approach, nearest-neighbor classifiers with Euclidean distance (NN-Euclidean) [30] or dynamic time warping (NN-DTW) [31, 32] have been widely employed. The feature-based approach, by contrast, builds temporal features extracted from the original features and can potentially outperform instance-based classifiers. Feature-based classifiers commonly consist of two steps: (1) defining the temporal features and (2) training a classifier based on the temporal features defined. Because IAL imports features into the predictive system one by one, it has little linkage to the instance-based approaches. Therefore, when IAL is employed to solve time-series classification, the feature-based approach is more suitable than the instance-based approach and should be employed by IAL during the time-series classification process.
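For contrast with the feature-based route taken in this paper, a minimal sketch of the instance-based NN-DTW idea is given below. It assumes univariate sequences, an absolute-difference local cost, and no warping-window constraint; these choices and the function names are illustrative only and are not taken from the cited works.

```python
import numpy as np


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences,
    using |a_i - b_j| as the local cost (an assumption of this sketch)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i, j] = local + min(cost[i - 1, j],
                                     cost[i, j - 1],
                                     cost[i - 1, j - 1])
    return float(cost[n, m])


def nn_classify(query, train_series, train_labels, distance=dtw_distance):
    """1-nearest-neighbor: return the label of the closest training series."""
    distances = [distance(query, series) for series in train_series]
    return train_labels[int(np.argmin(distances))]
```

Here the whole sequence is the unit of comparison, which is why this style of classifier does not combine naturally with importing features one at a time.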

4.2. Feature Extraction for Time-Series Classification

Before the formal time-series classification, it is necessary to preprocess the experimental data in two stages: firstly, feature extraction from original data and, secondly, feature ordering for IAL. In comparison with some other classification problems, apart from feature ordering which is a special preprocessing in IAL, temporal feature extraction from the original data is another unique step in time-series classification problems.

Temporal feature extraction aims to classify instances based on the original features and the state difference over a time distance. Usually, the original features and those directly derived from original features are called first-order features, while features extracted from the state difference over different time distances are called second-order features. Equation (8) is the formula for the second-order features:

\[ f^{(2)} = g(f, d), \tag{8} \]

where $f$ is the first-order feature, $d$ is the time distance length between two instances, and $g$ is the function with $f$ and $d$. Theoretically, $g$ can be any calculation rule. In practice, statistical metrics such as the mean and standard deviation are often used for feature extraction [33].
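One way to read the second-order feature extraction is as a rule applied to a sliding window over the chronologically ordered first-order values. The sketch below assumes that each window covers the instances from time $t$ to $t + d$ and that one output is produced per window, so that $N$ instances yield $N - d$ outputs; this alignment is an assumption chosen to agree with the instance counts reported later in the paper, and the function name is hypothetical.

```python
import numpy as np


def second_order_features(f, d, g):
    """Apply the rule `g` to every window f[t], ..., f[t + d].

    f : array of first-order values in chronological order
        (1-D, or 2-D with one row per instance)
    d : time distance between the first and last instance of a window
    g : any callable that maps a window to a value or a vector
    """
    f = np.asarray(f, dtype=float)
    if not 0 < d < len(f):
        raise ValueError("d must satisfy 0 < d < number of instances")
    return np.array([g(f[t:t + d + 1]) for t in range(len(f) - d)])


# Example rules: state difference, windowed mean, windowed standard deviation.
difference = lambda window: window[-1] - window[0]
window_mean = lambda window: window.mean(axis=0)
window_std = lambda window: window.std(axis=0, ddof=1)
```

Passing `window_mean` and `window_std` with a 2-D input would give one mean and one standard deviation per original feature and per window.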

In the process of time-series classification, feature extraction and feature ordering should be carried out before training. However, different from feature ordering which is derived only by training data, temporal feature extraction is a process for new feature building; thus it is necessary for it to be implemented with all datasets. Figure 2 demonstrates the working flow of a time-series classification system based on ITID.

5. Experiments

5.1. Benchmark

In this study, the EEG eye state corpus from UCI machine learning repository is employed for the experiments [13]. This EEG eye state dataset was donated by Rösler and Suendermann from Baden-Wuerttemberg Cooperative State University (DHBW), Stuttgart, Germany [12]. All data were derived from one continuous EEG measurement with the Emotiv EEG neuroheadset, which is shown in Figure 3. There are 14980 patterns and 14 features in the dataset, where the 14 features are the data obtained by 14 sensors shown in Figure 4. The duration of the measurement was 117 seconds. The eye state was detected via a camera during the EEG measurement and added later manually to the file after analyzing the video frames.

In this eye state corpus, there are three instances with the numbers 899, 10387, and 11510 having obvious errors. Their values are out of almost 15 k, so they are outliers, which should be deleted before the experiments. Therefore, only 14977 instances are employed in Rösler’s experiments. To compare the results derived from our experiments with those obtained in the previous studies, those three error instances are also discarded in our experiments.

For the output of the corpus, “1” indicates the eye-closed and “0” the eye-open state. There are 6722 legal eye-closed instances and 8255 legal eye-open instances. All values are in chronological order with the first measured value at the top of the data.

Table 1 gives an overview of the 14977 legal values obtained by the 14 sensors. It presents the minimum and maximum values, means, and standard deviations for the eye-closed and eye-open states, respectively. According to this table, although the minimum and maximum values differ considerably between the eye-closed and eye-open states, the means of the two states for the same feature are almost the same. Nevertheless, the standard deviations of the two states differ greatly, and are markedly larger in the eye-open state. Moreover, the maximum values in the eye-open state of features F7, F3, F4, FC6, T8, P8, and O2 are much higher than the same features' maximum values in the eye-closed state; and for features AF3, AF4, FC5, F8, T7, P7, and O1, the minimum values in the eye-open state are much lower than those in the eye-closed state. Therefore, both the mean and standard deviation should be extracted as new features in the eye state classification process.
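Per-state statistics of the kind summarized in Table 1 could be computed along the following lines. This is a minimal sketch assuming the sensor values are held in a NumPy array with one column per sensor and the labels use 1 for eye-closed and 0 for eye-open; the function name and the use of the sample standard deviation (`ddof=1`) are assumptions.

```python
import numpy as np


def per_class_summary(X, y):
    """Minimum, maximum, mean, and standard deviation of each feature,
    computed separately for every class label found in `y`."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    summary = {}
    for label in np.unique(y):
        rows = X[y == label]
        summary[int(label)] = {
            "min": rows.min(axis=0),
            "max": rows.max(axis=0),
            "mean": rows.mean(axis=0),
            "std": rows.std(axis=0, ddof=1),  # sample standard deviation
        }
    return summary
```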

5.2. Experiments

During the experiments, all the patterns were partitioned for training, validation, and testing with the divisions of 50%, 25%, and 25%, respectively, and sorted according to the time-series sequence. Six different approaches for eye state classification were employed in the experiments. They are designed as shown in Table 2, while Table 3 presents the final classification results by different approaches.
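A partition of this kind can be sketched as below, under the assumption that the three parts are contiguous blocks taken in chronological order (the text states the proportions and the time ordering but not the exact partitioning scheme). The function name and the rounding of the block sizes are illustrative.

```python
import numpy as np


def chronological_split(X, y, fractions=(0.5, 0.25, 0.25)):
    """Split time-ordered data into contiguous training, validation, and
    test blocks without shuffling."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    n = len(X)
    train_end = int(n * fractions[0])
    val_end = train_end + int(n * fractions[1])
    return ((X[:train_end], y[:train_end]),
            (X[train_end:val_end], y[train_end:val_end]),
            (X[val_end:], y[val_end:]))  # the test block takes the remainder
```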

(1) IAL with Feature Extraction and Time-Series 1. This is a time-series approach based on IAL. All the features used in this approach are second-order features, namely, the averages and standard deviations derived from every 12 instances of the original features. These features are given by (9), which was adapted from (8):

\[ f^{(2)}_{\mathrm{avg}} = \operatorname{mean}(f, d), \qquad f^{(2)}_{\mathrm{std}} = \operatorname{std}(f, d). \tag{9} \]

The time distance here is 12; that is, $d = 12$. This time distance corresponds to the shortest length of a blink, which is about 100 milliseconds, because the 14980 instances in the eye state corpus were recorded in 117 seconds, or roughly 128 instances per second. As a result, there are 14965 instances and 28 features. Before the classification, the feature ordering was derived by AD and MMDC, as shown in Table 4. The standard deviations evidently played a more important role in the classification, because all the standard deviations were imported earlier than the averages, except for the average of the thirteenth feature, which was placed first. This coincides with the data shown in Table 1, where the averages are approximately the same while the standard deviations are quite different. In addition, with 28 features, this approach needs feature selection; otherwise the computational cost would be large. Therefore an IAL approach with feature selection is employed [34]. In the end, only two features were used in this time-series classification, and it obtained a good classification error rate of 27.3991%.

(2) IAL with Feature Extraction and Time-Series 2. This is also an IAL time-series classification approach. With the feature ordering based on AD and MMDC and feature extraction based on average and standard deviation, this approach also follows the model shown in Figure 2. To investigate the small-scale influence of the time-series, the time distance is set to 1; namely, $d = 1$. Equation (8) here is

\[ f^{(2)} = g(f, 1). \tag{10} \]

Four extracted features were employed in this approach: the first-order means and the first-order standard deviations of all the values of each original instance and the second-order average and second-order standard deviation of all the values of each instance calculated by (10). Therefore, the total instance number in this approach is 14976. Based on the IAL feature ordering metric using AD, these features were sorted in this order: the first-order means of instances, the first-order standard deviations of instances, the second-order standard deviation, and the second-order average. This feature ordering is shown in Table 4. The error rate obtained in the final classification result is 27.4573%.

(3) IAL with Feature Extraction. This approach is IAL classification without time-series. In comparison with the first and second approaches, this experiment is designed to show the effect of the time-series factor, namely, the time distance $d$ in (8), which is employed in the previous experiments but is not used in this one. Since no time-series factors are considered, there are no second-order features, and hence no second-order average or second-order standard deviation. This experiment therefore retains only the first-order feature extraction process: IAL with feature extraction is based merely on the first-order average and first-order standard deviation. All the original features are trained according to the original ordering, with the newly built average and standard deviation of each instance following behind, as shown in Table 4. The final classification error rate is 27.4793%, which is slightly worse than that of the first approach.

(4) Pure IAL Approach. This approach only employs IAL with feature ordering derived by AD and MMDC. There is no new feature extracted from the original data. Moreover, no time-series impact factors are used in this approach. Such a design aims to find out the influence brought by time-series and first- and second-order feature extraction. The feature ordering is shown in Table 4. The classification error rate is 27.4693%, which approximates to the results obtained in the first and second approaches.

(5) Time-Series Classification with Batch-Training. This method takes the time-series into account in the same way as the second approach, except that all features are trained using the conventional batch-training method. The time distance is the same as that in approach 2, namely, $d = 1$. Therefore, the total instance number in this approach is 14976. All the features are extracted according to (10) and trained in one batch. As such, there is no average or standard deviation extracted from the vector derived by (10). The objective of this experiment is to check the effect of IAL and second-order feature extraction in comparison with the first and second approaches. The error rate in the final time-series classification is 29.5046%, which is much higher than in the previous IAL approaches.

(6) Conventional Batch-Training Method. The last approach is the conventional method based on back propagation neural networks without considering time-series, whereby all the original features are directly trained in one batch. This approach obtains the highest error rate among all six experiments. The error rate is about 30.6328% in the final classification result.
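The time distance and instance counts quoted for approaches (1), (2), and (5) can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the paper; the variable names are illustrative, and the relation "usable instances minus time distance" is an observation that matches the reported counts rather than a formula given in the text.

```python
recorded_instances = 14980   # instances in the original corpus
duration_s = 117             # length of the recording in seconds
outliers = 3                 # erroneous instances removed before the experiments

# Approximate sampling rate and the duration covered by 12 instances.
sampling_rate_hz = recorded_instances / duration_s
window_ms = 1000 * 12 / sampling_rate_hz

# Instances left after outlier removal and after applying a time distance d.
usable_instances = recorded_instances - outliers
instances_d12 = usable_instances - 12
instances_d1 = usable_instances - 1

print(f"sampling rate: {sampling_rate_hz:.1f} Hz")
print(f"12 instances span: {window_ms:.1f} ms")
print(f"instances with d = 12: {instances_d12}, with d = 1: {instances_d1}")
```

The sampling rate comes out at about 128 Hz, so 12 instances span a little under 100 milliseconds, which is consistent with the blink length cited for the first approach.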

5.3. Result Analysis

According to the experimental results of these six approaches, shown and compared in Table 3, the first approach obtains the lowest classification error rate, and the conventional batch-training method exhibits the highest classification error rate, 30.63%. In comparison with Rösler’s experimental results using a multilayer perceptron, where the error rate is more than 30% [12], the results derived by the IAL approaches are much better: the error rates of all IAL approaches are lower than 30%. This indicates that, firstly, the IAL approach can outperform conventional batch-training methods; secondly, feature extraction with time-series properties is very useful for improving the classification results for time-series problems; thirdly, feature ordering is very important to IAL. Moreover, feature orderings derived by AD and MMDC can outperform the original feature ordering in IAL.

6. Conclusions

In this paper, a time-series classification approach based on IAL is proposed for EEG eye state identification. The approach is novel in that it first extracts features from the raw data and then sorts these features, using the IAL feature ordering approach, according to each feature’s discrimination ability. During the training process, the newly extracted features are imported into the neural predictive system in a sequential order based on the feature ordering. In comparison with the conventional batch-training methods and the feature extraction method that does not consider the relation between time-series data, the experimental results of time-series IAL showed that such a machine learning approach can not only cope with time-series classification problems but also improve the accuracy of the classification results. Moreover, the experimental results also imply that the relation among time-series data is crucial to the data analysis in such classification problems.

Some issues remain open for further research. Firstly, besides the mean and standard deviation, whether there exist other time-series related features suitable for time-series classification is still an open problem. Secondly, the approach used to extract new second-order features from raw data is also significant. Thirdly, the correlations between second-order features and first-order features often vary, and their influence on the time-series classification process is still unknown. Last but not least, whether there exists an optimal method for ordering both first- and second-order features in IAL training is also a challenging problem for time-series classification.

In general, the feasibility of IAL-based time-series classification approach proposed in this paper has been validated by EEG eye state identification experiments. The final results indicated that IAL is applicable for EEG time-series classification. However, there are still a number of work items remaining for the future studies.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research is supported by the National Natural Science Foundation of China under Grant no. 61070085 and Jiangsu Provincial Science and Technology under Grant no. BK20131182.