#### Abstract

Technically, a feature represents a distinguishing property, a recognizable measurement, or a functional component obtained from a section of a pattern. Extracted features are meant to minimize the loss of important information embedded in the signal. In addition, they reduce the amount of resources needed to describe a huge set of data accurately. This is necessary to minimize the complexity of implementation, to reduce the cost of information processing, and to remove the potential need to compress the information. More recently, a variety of methods have been widely used to extract features from EEG signals; among these methods are time-frequency distributions (TFD), fast Fourier transform (FFT), eigenvector methods (EM), wavelet transform (WT), and the autoregressive method (ARM). In general, the analysis of EEG signals has been the subject of several studies because of its ability to yield an objective mode of recording brain stimulation, which is widely used in brain-computer interface research with applications in medical diagnosis and rehabilitation engineering. The purposes of this paper, therefore, are to discuss some conventional methods of EEG feature extraction, to compare their performance for specific tasks, and, finally, to recommend the most suitable method for feature extraction based on performance.

#### 1. Introduction

In recent years, brain-computer interfaces and intelligent signal segmentation have attracted great interest in fields ranging from medicine to the military [1–6]. To facilitate the construction of a brain-computer interface, a proficient method of feature extraction from EEG signals is desired.

The electrical activity of the brain is represented by electroencephalogram (EEG) signals. Many neurological diseases (e.g., epilepsy) can be diagnosed by studying EEG signals [7–9]. The EEG signals are recorded by fixing electrodes on the subject's scalp using the standardized electrode placement scheme (Figure 1) [10–12]. However, there are many sources of artifacts. Noise that sets in while the signal is being captured adversely affects the useful features of the original signal. The major sources of artifacts are muscular activity, eye blinking during signal acquisition, and power line electrical noise [13]. Many methods have been introduced to eliminate these unwanted signals, each with its advantages and disadvantages.

Nevertheless, there is a common path for EEG signal processing (Figure 2). The first stage is preprocessing, which includes acquisition of the signal, removal of artifacts, signal averaging, thresholding of the output, enhancement of the resulting signal, and, finally, edge detection. The second stage is feature extraction, which is meant to determine a feature vector from a regular vector. A feature is a distinctive or characteristic measurement, transform, or structural component extracted from a segment of a pattern [14]. Statistical characteristics and syntactic descriptions are the two major subdivisions of the conventional feature extraction modalities. Feature extraction is meant to choose the features or information most important for classification [15–17]. The final stage is signal classification, which can be performed by linear analysis, nonlinear analysis, adaptive algorithms, clustering and fuzzy techniques, and neural networks. This stage exploits the algorithmic characteristics of the feature vector of the input data and thus gives rise to a hypothesis [10, 15].

This paper presents a short review of mathematical methods for extracting features from EEG signals. The review considers five different methods for EEG feature extraction. For each of the five techniques, the literature is reviewed and the strengths and weaknesses are summarized.

#### 2. Methods

The advantages and disadvantages of the selected methods were extracted by thoroughly reviewing chosen articles that cover the main methods for linear analysis of one-dimensional signals in the frequency or time-frequency domain. The common methods of interest were compared, and the general advantages and disadvantages of these modalities were discussed.

##### 2.1. Fast Fourier Transform (FFT) Method

This method applies mathematical tools to EEG data analysis. Characteristics of the acquired EEG signal are computed by power spectral density (PSD) estimation in order to selectively represent the EEG samples. Four frequency bands contain the major characteristic waveforms of the EEG spectrum [18].

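The PSD is calculated by Fourier transforming the estimated autocorrelation sequence, which is found by nonparametric methods. One of these methods is Welch's method, in which windowing is applied to the data sequence to produce modified periodograms [19]. The data sequence $x_i(n)$ is expressed as

$$x_i(n) = x(n + iD), \quad n = 0, 1, \ldots, M-1, \quad i = 0, 1, \ldots, L-1,$$

where $iD$ is the starting point of the $i$th sequence, so that $L$ data segments of length $M$ are formed. The resulting modified periodograms are

$$\tilde{P}_{xx}^{(i)}(f) = \frac{1}{MU}\left|\sum_{n=0}^{M-1} x_i(n)\,w(n)\,e^{-j2\pi f n}\right|^2 .$$

Here, $U$ is the normalization factor for the power in the window function and is chosen such that

$$U = \frac{1}{M}\sum_{n=0}^{M-1} w^2(n),$$

where $w(n)$ is the window function. The average of these modified periodograms gives Welch's power spectrum as follows:

$$P_{xx}^{W}(f) = \frac{1}{L}\sum_{i=0}^{L-1} \tilde{P}_{xx}^{(i)}(f).$$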
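The averaging of windowed periodograms described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the Hann window, the segment length, the 50% overlap, and the synthetic 10 Hz test signal are all assumptions chosen for the example, and the result is the PSD up to a constant scale factor.

```python
import numpy as np

def welch_psd(x, fs, seg_len=256, step=128):
    """Average windowed periodograms of overlapping segments (Welch's method)."""
    x = np.asarray(x, dtype=float)
    w = np.hanning(seg_len)          # window function (Hann is an assumption)
    u = np.mean(w ** 2)              # power normalization factor of the window
    periodograms = []
    for start in range(0, len(x) - seg_len + 1, step):   # segment start points
        seg = x[start:start + seg_len] * w
        periodograms.append(np.abs(np.fft.rfft(seg)) ** 2 / (seg_len * u))
    psd = np.mean(periodograms, axis=0)                   # average over segments
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
    return freqs, psd

# Synthetic stand-in for an EEG channel: a 10 Hz rhythm buried in white noise.
fs = 128.0
t = np.arange(0, 8, 1 / fs)
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)

freqs, psd = welch_psd(x, fs)
peak_hz = freqs[np.argmax(psd)]      # frequency of the strongest spectral peak
```

Longer segments give finer frequency resolution but leave fewer periodograms to average, so the variance of the estimate rises; the segment length is the main trade-off to tune.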

##### 2.2. Wavelet Transform (WT) Method

WT plays an important role in the recognition and diagnostic field: it compresses the time-varying biomedical signal, which comprises many data points, into a few parameters that represent the signal [14].

As the EEG signal is nonstationary [7], the most suitable way to extract features from the raw data is to use time-frequency domain methods such as the wavelet transform (WT), a spectral estimation technique in which any general function can be expressed as an infinite series of wavelets [20–22]. Since WT allows the use of variable-sized windows, it gives a more flexible time-frequency representation of a signal. Long time windows are used to obtain finer low-frequency resolution; in contrast, short time windows are used to obtain high-frequency information [13].

Furthermore, WT involves a multiscale structure rather than a single scale. This method is a continuation of the classical Fourier transform method [23]. Moreover, it is meant to resolve the issues of nonstationary signals such as EEG [14]. In the WT method, the original EEG signal is represented by simple building blocks known as wavelets. These wavelets are derived from the mother wavelet through translation (shifting) and dilation (compression and stretching) operations along the time axis [24]. There are two categories of WT: the first is continuous and the other is discrete [14].

###### 2.2.1. Continuous Wavelet Transform (CWT) Method

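The CWT can be expressed as

$$\mathrm{CWT}(a,b) = \int_{-\infty}^{\infty} x(t)\,\psi^{*}_{a,b}(t)\,dt,$$

where $x(t)$ stands for the unprocessed EEG signal, $a$ stands for the dilation factor, and $b$ represents the translation factor. $\psi^{*}_{a,b}(t)$ denotes the complex conjugate of $\psi_{a,b}(t)$, which can be calculated by

$$\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t-b}{a}\right),$$

where $\psi(t)$ is the mother wavelet. However, the major weakness of the CWT is that its scaling parameter and translation parameter change continuously. Thus, calculating the wavelet coefficients for all available scales consumes a lot of effort and yields a lot of unused information [14].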
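To make the dilation and translation of the mother wavelet concrete, the following sketch evaluates the CWT integral by direct summation on the sample grid. The real-valued Mexican-hat (Ricker) mother wavelet, the two scales, and the 8 Hz test tone are assumptions made for illustration; since this wavelet is real, the complex conjugate has no effect.

```python
import numpy as np

def ricker(t):
    """Real-valued Mexican-hat mother wavelet (an illustrative choice)."""
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def cwt(x, scales, dt):
    """Approximate the CWT integral by a sum, for translations on the sample grid."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) * dt
    out = np.empty((len(scales), len(x)))
    for i, a in enumerate(scales):
        # Row k holds the dilated wavelet translated to b = t[k].
        psi = ricker((t[None, :] - t[:, None]) / a) / np.sqrt(abs(a))
        out[i] = psi @ x * dt
    return out

fs = 128.0
t = np.arange(0, 2, 1 / fs)
x = np.sin(2 * np.pi * 8 * t)                 # 8 Hz test tone

# The Ricker wavelet responds most strongly near f = sqrt(2) / (2 * pi * a).
scale_8hz = np.sqrt(2) / (2 * np.pi * 8)
scale_2hz = np.sqrt(2) / (2 * np.pi * 2)
coeffs = cwt(x, [scale_8hz, scale_2hz], 1 / fs)
energy = (coeffs ** 2).sum(axis=1)            # energy per scale
```

Every (scale, translation) pair costs a full pass over the signal, which illustrates the computational burden and redundancy mentioned above.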

###### 2.2.2. Discrete Wavelet Transform (DWT)

In order to address the weakness of the CWT, the discrete wavelet transform (DWT) has been defined on the basis of multiscale feature representation. Every scale under consideration represents a particular coarseness of the EEG signal [23]. The multiresolution decomposition of the raw EEG data is shown in Figure 3. Each stage contains two digital filters, $g(n)$ and $h(n)$, and two downsamplers by 2. The discrete mother wavelet $g(n)$ is high-pass in nature, while its mirror version $h(n)$ is low-pass in nature.

As shown in Figure 3, each stage output provides a detail of the signal, $D$, and an approximation of the signal, $A$, where the latter becomes the input for the next stage. The number of levels to which the signal is decomposed is chosen depending on the dominant frequency components of the EEG data [14].

The relationship between WTs and the low-pass filter $h$ can be represented as follows:

$$H(z)H(z^{-1}) + H(-z)H(-z^{-1}) = 1.$$

Here, $H(z)$ represents the $z$-transform of the filter $h$. The $z$-transform of the complementary high-pass filter is expressed as

$$G(z) = zH(-z^{-1}).$$

Because the DWT precisely describes the features of a signal segment within a specified frequency band and a localized time window, its advantages outweigh the high computational and memory requirements of the conventional convolution-based implementation [14, 23].
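The filter-and-downsample cascade described above can be sketched with the Haar wavelet, the simplest low-pass/high-pass pair. The choice of Haar, the orthonormal scaling by the square root of 2, the number of levels, and the use of sub-band energies as features are assumptions for illustration; the source does not prescribe a particular wavelet or feature.

```python
import numpy as np

def haar_stage(signal):
    """One stage: low-pass and high-pass filtering, each followed by downsampling by 2."""
    s = np.asarray(signal, dtype=float)
    if len(s) % 2:
        s = s[:-1]                              # drop the last sample of odd-length input
    even, odd = s[0::2], s[1::2]
    approx = (even + odd) / np.sqrt(2.0)        # low-pass branch (approximation)
    detail = (even - odd) / np.sqrt(2.0)        # high-pass branch (detail)
    return approx, detail

def haar_dwt(signal, levels):
    """Multilevel decomposition: the approximation feeds the next stage."""
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = haar_stage(approx)
        details.append(detail)
    return approx, details

def subband_energies(signal, levels):
    """Feature vector: energy of each detail band, then of the final approximation."""
    approx, details = haar_dwt(signal, levels)
    return [float(np.sum(d ** 2)) for d in details] + [float(np.sum(approx ** 2))]

rng = np.random.default_rng(0)
x = rng.standard_normal(256)                    # stand-in for one EEG segment
features = subband_energies(x, levels=4)
```

With this orthonormal scaling and a segment length that is a multiple of 2 raised to the number of levels, the sub-band energies add up to the energy of the original segment, so the features partition the signal energy across scales.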

##### 2.3. Eigenvectors

These methods are employed to calculate the frequencies and powers of signals from artifact-dominated measurements. The essence of these methods is the ability of the eigendecomposition of the correlation matrix to handle even artifact-corrupted signals. Several eigenvector methods are available, among them Pisarenko's method, the MUSIC method, and the minimum-norm method [25, 26].

###### 2.3.1. Pisarenko’s Method

Pisarenko's method is among the available eigenvector approaches used to evaluate the power spectral density (PSD). To calculate the PSD, the following polynomial is employed [27, 28]:

$$A(f) = \sum_{k=0}^{m} a_k\,e^{-j2\pi f k}.$$

In the equation above, $a_k$ stands for the coefficients of the polynomial and $m$ defines the order of the eigenfilter $A(f)$ [25, 26]. Pisarenko's method uses this polynomial, formed from the eigenvector corresponding to the minimum eigenvalue, to estimate the signal's PSD as follows:

$$P_{\text{Pisarenko}}(f) = \frac{1}{|A(f)|^{2}}.$$

###### 2.3.2. MUSIC Method

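This method mitigates the problem of false zeros by averaging the spectra of all the eigenvectors that span the artifact (noise) subspace [28]. The resulting power spectral density is therefore obtained as

$$P_{\text{MUSIC}}(f) = \frac{1}{\dfrac{1}{K}\displaystyle\sum_{i=0}^{K-1} |A_i(f)|^{2}},$$

where $K$ is the dimension of the noise subspace and $A_i(f)$ is the polynomial formed from the $i$th noise subspace eigenvector.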
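As a rough sketch of how the noise subspace is used, the code below builds a sample correlation matrix from sliding snapshots, takes the eigenvectors with the smallest eigenvalues as the noise subspace, and averages their spectra. The snapshot-based correlation estimate, the matrix order, and the synthetic single-sinusoid test signal are assumptions; frequencies are normalized to cycles per sample.

```python
import numpy as np

def correlation_matrix(x, m):
    """Sample (m+1) x (m+1) correlation matrix from sliding snapshots of the signal."""
    x = np.asarray(x, dtype=float)
    snaps = np.array([x[i:i + m + 1] for i in range(len(x) - m)])
    return snaps.T @ snaps / len(snaps)

def music_pseudospectrum(x, m, n_signal, freqs):
    """Inverse of the averaged noise-eigenvector spectra at the given frequencies."""
    r = correlation_matrix(x, m)
    _, eigvecs = np.linalg.eigh(r)              # eigenvalues in ascending order
    noise = eigvecs[:, : m + 1 - n_signal]      # eigenvectors spanning the noise subspace
    k = np.arange(m + 1)
    steering = np.exp(-2j * np.pi * np.outer(freqs, k))
    spectra = np.abs(steering @ noise) ** 2     # one column per noise eigenvector
    return 1.0 / np.mean(spectra, axis=1)

rng = np.random.default_rng(0)
n = np.arange(500)
x = np.sin(2 * np.pi * 0.1 * n) + 0.1 * rng.standard_normal(n.size)

freqs = np.linspace(0.0, 0.5, 501)
# A real sinusoid occupies two dimensions of the signal subspace.
p_music = music_pseudospectrum(x, m=10, n_signal=2, freqs=freqs)
peak = freqs[np.argmax(p_music)]
```

Setting `n_signal` equal to `m` leaves a single noise eigenvector, the one with the minimum eigenvalue, which corresponds to Pisarenko's estimate.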

###### 2.3.3. Minimum Norm Method

To separate false zeros from real zeros, this method places the false zeros inside the unit circle and calculates a desired noise subspace vector from either the noise or signal subspace eigenvectors. However, while the Pisarenko technique uses only the noise subspace eigenvector corresponding to the minimum eigenvalue, the minimum-norm technique picks a linear combination of all the noise subspace eigenvectors [25, 26]. This technique is described by

$$P_{\text{MIN}}(f, K) = \frac{1}{|A(f)|^{2}},$$

where $K$ is the dimension of the noise subspace and $A(f)$ is the polynomial formed from the minimum-norm noise subspace vector.

All the aforementioned eigenvector methods are best suited to signals composed of several distinct sinusoids embedded in noise. However, they are prone to yielding false zeros, which results in relatively poor statistical accuracy [26].

##### 2.4. Time-Frequency Distributions

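These methods require noiseless signals to provide good performance. Therefore, a very strict preprocessing stage is necessary to get rid of all sorts of artifacts. Because these time-frequency methods rely on the stationarity assumption, a windowing process is required in the preprocessing module [29]. The definition of the TFD of a signal $x(t)$ was generalized by Cohen as [30]

$$P(t,\omega) = \frac{1}{2\pi}\iint A(\theta,\tau)\,\Phi(\theta,\tau)\,e^{-j\theta t - j\omega\tau}\,d\theta\,d\tau,$$

where

$$A(\theta,\tau) = \frac{1}{2\pi}\int x\!\left(u+\frac{\tau}{2}\right)x^{*}\!\left(u-\frac{\tau}{2}\right)e^{j\theta u}\,du$$

is popularly known as the ambiguity function, $\Phi(\theta,\tau)$ refers to the kernel of the distribution, and $\theta$ and $\tau$ are the frequency-shift and time-lag dummy variables, respectively.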

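The smoothed pseudo-Wigner-Ville (SPWV) distribution is a variant that incorporates smoothing by independent windows in time and frequency, namely, $W_t(t)$ and $W_\omega(\tau)$ [29]:

$$\mathrm{SPWV}(t,\omega) = \int W_\omega(\tau)\left[\int W_t(u-t)\,x\!\left(u+\frac{\tau}{2}\right)x^{*}\!\left(u-\frac{\tau}{2}\right)du\right]e^{-j\omega\tau}\,d\tau .$$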

Feature extraction using this method is based on the energy, the frequency, and the length of the principal track. The EEG signal is first divided into segments; then, a three-dimensional feature vector is constructed for each segment, so that segment $k$ gives the values $E_k$, $F_k$, and $L_k$. The energy of each segment can be calculated as

$$E_k = \iint \varphi_k(t,f)\,dt\,df,$$

where $\varphi_k(t,f)$ stands for the time-frequency representation of the segment. To calculate the frequency $F_k$ of each segment, we make use of the marginal frequency:

$$F_k = \int \varphi_k(t,f)\,dt.$$
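The segment-wise energy and marginal-frequency features can be illustrated as follows. Note the substitution: for brevity this sketch uses a plain short-time Fourier spectrogram as the time-frequency representation instead of the SPWV distribution, and it reduces the frequency marginal to a single number by taking its peak, which is an assumption. The length of the principal track is not computed, and the energy is given up to a constant scale factor.

```python
import numpy as np

def spectrogram(seg, fs, win_len=64, hop=16):
    """Stand-in time-frequency representation (rows: time frames, columns: frequency)."""
    w = np.hanning(win_len)
    frames = np.array([seg[s:s + win_len] * w
                       for s in range(0, len(seg) - win_len + 1, hop)])
    tfr = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    return tfr, freqs

def tfd_features(x, fs, seg_len):
    """Per-segment (energy, dominant marginal frequency) pairs."""
    x = np.asarray(x, dtype=float)
    feats = []
    for start in range(0, len(x) - seg_len + 1, seg_len):
        tfr, freqs = spectrogram(x[start:start + seg_len], fs)
        energy = tfr.sum()                      # double sum over time and frequency
        marginal = tfr.sum(axis=0)              # frequency marginal: sum over time
        feats.append((float(energy), float(freqs[np.argmax(marginal)])))
    return feats

fs = 128.0
t = np.arange(0, 2, 1 / fs)
# Two synthetic segments: a 6 Hz rhythm followed by a stronger 20 Hz rhythm.
x = np.concatenate([np.sin(2 * np.pi * 6 * t), 2.0 * np.sin(2 * np.pi * 20 * t)])
feats = tfd_features(x, fs, seg_len=256)
```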

Finally, to achieve good results, a noiseless or well-denoised EEG signal should be used for TFD [30].

##### 2.5. Autoregressive Method

Autoregressive (AR) methods estimate the power spectral density (PSD) of the EEG using a parametric approach. Therefore, unlike nonparametric approaches, AR methods do not suffer from spectral leakage and thus yield better frequency resolution. The PSD is estimated by calculating the coefficients, that is, the parameters of the linear system under consideration. Two methods used to estimate AR models are briefly described below [18, 19].

###### 2.5.1. Yule-Walker Method

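In this method, the AR parameters (coefficients) are estimated from the biased estimate of the autocorrelation function of the data. This is done by minimizing the least-squares forward prediction error, which leads to the following system [31]:

$$\begin{bmatrix} r_{xx}(0) & r_{xx}(-1) & \cdots & r_{xx}(-p+1) \\ r_{xx}(1) & r_{xx}(0) & \cdots & r_{xx}(-p+2) \\ \vdots & \vdots & \ddots & \vdots \\ r_{xx}(p-1) & r_{xx}(p-2) & \cdots & r_{xx}(0) \end{bmatrix} \begin{bmatrix} \hat{a}_p(1) \\ \hat{a}_p(2) \\ \vdots \\ \hat{a}_p(p) \end{bmatrix} = -\begin{bmatrix} r_{xx}(1) \\ r_{xx}(2) \\ \vdots \\ r_{xx}(p) \end{bmatrix},$$

where $r_{xx}$ can be defined by

$$r_{xx}(m) = \frac{1}{N}\sum_{n=0}^{N-m-1} x^{*}(n)\,x(n+m), \quad m \ge 0.$$

Solving the above set of linear equations yields the AR coefficients, from which the PSD is obtained:

$$P_{xx}^{YW}(f) = \frac{\hat{\sigma}_{wp}^{2}}{\left|1 + \sum_{k=1}^{p} \hat{a}_p(k)\,e^{-j2\pi f k}\right|^{2}},$$

while $\hat{\sigma}_{wp}^{2}$ gives the approximated lowest mean square error of the $p$th-order predictor as follows:

$$\hat{\sigma}_{wp}^{2} = E_p^{f} = r_{xx}(0)\prod_{k=1}^{p}\left[1 - |\hat{a}_k(k)|^{2}\right].$$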
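A minimal sketch of the Yule-Walker estimate for a real-valued signal is given below. It solves the linear system directly with `np.linalg.solve` rather than with a recursion, and it computes the prediction error variance as the zero-lag autocorrelation plus the coefficient-weighted lags, which is algebraically equivalent to the product form when the coefficients solve the system exactly. The synthetic second-order AR process and its coefficients are assumptions chosen so that the result can be checked.

```python
import numpy as np

def biased_autocorr(x, max_lag):
    """Biased autocorrelation estimate for lags 0..max_lag (division by N at every lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[:n - m], x[m:]) / n for m in range(max_lag + 1)])

def yule_walker(x, p):
    """AR(p) coefficients and prediction error variance from the normal equations."""
    r = biased_autocorr(x, p)
    toeplitz = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a = np.linalg.solve(toeplitz, -r[1:])
    sigma2 = r[0] + np.dot(a, r[1:])            # minimum mean square prediction error
    return a, sigma2

def ar_psd(a, sigma2, freqs):
    """AR spectrum at normalized frequencies (cycles per sample)."""
    k = np.arange(1, len(a) + 1)
    denom = np.abs(1.0 + np.exp(-2j * np.pi * np.outer(freqs, k)) @ a) ** 2
    return sigma2 / denom

# Synthetic AR(2) process: x(n) - 1.5 x(n-1) + 0.8 x(n-2) = e(n), unit-variance noise.
rng = np.random.default_rng(0)
e = rng.standard_normal(5000)
x = np.zeros(5000)
for n in range(2, 5000):
    x[n] = 1.5 * x[n - 1] - 0.8 * x[n - 2] + e[n]

a_hat, sigma2 = yule_walker(x, p=2)             # expected near [-1.5, 0.8] and 1.0
freqs = np.linspace(0.0, 0.5, 256)
psd = ar_psd(a_hat, sigma2, freqs)
```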

###### 2.5.2. Burg’s Method

Burg's method is an AR spectral estimation technique based on minimizing the forward and backward prediction errors while satisfying the Levinson-Durbin recursion [8]. It estimates the reflection coefficients directly, without the need to calculate the autocorrelation function. The method has the following strengths: it yields PSD estimates that closely match the original data, and it can resolve closely spaced sinusoids in signals that contain a low level of noise.

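The difference between the Yule-Walker method and Burg's method lies in the way the PSD is calculated. For Burg's method, the PSD is estimated as follows:

$$P_{xx}^{BU}(f) = \frac{\hat{E}_p}{\left|1 + \sum_{k=1}^{p} \hat{a}_p(k)\,e^{-j2\pi f k}\right|^{2}},$$

where $\hat{E}_p$ is the total least-squares (forward and backward) prediction error of the $p$th-order model.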
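Burg's recursion can be sketched for a real-valued signal as follows: at each order, the reflection coefficient is computed directly from the forward and backward prediction errors, and the coefficients are updated with the Levinson-Durbin step. The synthetic second-order AR test process is an assumption used only to check the estimate.

```python
import numpy as np

def burg(x, p):
    """AR(p) coefficients and error power via Burg's method (real-valued data)."""
    x = np.asarray(x, dtype=float)
    f = x.copy()                                # forward prediction errors
    b = x.copy()                                # backward prediction errors
    a = np.array([1.0])                         # prediction polynomial, a[0] = 1
    err = np.dot(x, x) / len(x)                 # zero-order error power
    for _ in range(p):
        ff, bb = f[1:], b[:-1]                  # align f(n) with b(n-1)
        k = -2.0 * np.dot(ff, bb) / (np.dot(ff, ff) + np.dot(bb, bb))
        f, b = ff + k * bb, bb + k * ff         # update both error sequences
        a_ext = np.concatenate([a, [0.0]])
        a = a_ext + k * a_ext[::-1]             # Levinson-Durbin coefficient update
        err *= 1.0 - k * k
    return a[1:], err

def burg_psd(x, p, freqs):
    """Burg AR spectrum at normalized frequencies (cycles per sample)."""
    a, err = burg(x, p)
    k = np.arange(1, p + 1)
    denom = np.abs(1.0 + np.exp(-2j * np.pi * np.outer(freqs, k)) @ a) ** 2
    return err / denom

# Synthetic AR(2) process: x(n) - 1.5 x(n-1) + 0.8 x(n-2) = e(n), unit-variance noise.
rng = np.random.default_rng(1)
e = rng.standard_normal(5000)
x = np.zeros(5000)
for n in range(2, 5000):
    x[n] = 1.5 * x[n - 1] - 0.8 * x[n - 2] + e[n]

a_hat, err = burg(x, p=2)                       # expected near [-1.5, 0.8] and 1.0
freqs = np.linspace(0.0, 0.5, 256)
psd = burg_psd(x, 2, freqs)
```

No autocorrelation sequence is formed at any point; only the two error sequences are carried from one order to the next.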

Parametric methods such as the autoregressive one reduce spectral leakage and yield better frequency resolution. However, selecting the proper model order is a serious problem. If the order is too high, false peaks appear in the spectra. If the order is too low, the spectra are overly smooth [32].

#### 3. Performance of Methods

The general aim of this review is to shed light on EEG feature extraction: how fast each extraction method is, how reliable the extracted features are, and how well these features express the states of the brain for different mental tasks, so that the tasks can be classified and translated accurately. The speed and accuracy of the feature extraction stage of EEG signal processing are therefore crucial, so that vital information is not lost and is obtained within a reasonable time. In the literature discussed so far, the wavelet method is introduced as a solution for nonstationary signals; it represents the signal by wavelets, a group of functions derived from the mother wavelet by dilation and translation. The variable-size window is the most significant property of this method, since it ensures suitable time-frequency resolution over the whole frequency range [26]. Autoregressive analysis suffers from low speed and is not always applicable in real-time analysis, while FFT appears to be the least efficient of the discussed methods because of its inability to examine nonstationary signals. The strength of the AR method can be emphasized by further comparing its performance with that of the classical FFT, as shown in Table 1.

It is highly recommended to use the AR method in conjunction with more conservative methods, such as periodograms, to help choose the correct model order and to avoid being misled by spurious spectral features [32].

The most important application of eigenvector methods is to evaluate the frequencies and powers of signals from noise-corrupted measurements; the principle of these methods is the decomposition of the correlation matrix of the noise-corrupted signal. Three eigenvector methods were discussed: Pisarenko, multiple signal classification (MUSIC), and minimum norm [27]. The advantage of the eigenvector methods is that they produce frequency spectra of high resolution even when the signal-to-noise ratio (SNR) is low. However, they may produce spurious zeros, leading to poor statistical accuracy [26].

The TFD method offers the possibility of analyzing relatively long continuous segments of EEG data even when the dynamics of the signal are rapidly changing. At the same time, good resolution in both time and frequency is necessary, which makes this method unsuitable in many cases [30].

Table 2 summarizes the advantages and disadvantages of the above-mentioned methods, together with their accuracy, speed, and suitability, to make it easier to compare their performance.

#### 4. Conclusion

Five well-known frequency domain and time-frequency domain methods were discussed. It is very hard to make a definitive claim about the ranking of these methods according to their capability. The findings indicate that each method has specific advantages and disadvantages which make it appropriate for a particular type of signal. Frequency domain methods may not provide high-quality performance for some EEG signals. Time-frequency methods, in turn, may not provide as much detailed information for EEG analysis as frequency domain methods. Whenever the performance of an analysis method is discussed, it is crucial to make clear the type of signal to be analyzed. Considering this, the optimum method may differ from one application to another.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.