Computational Intelligence and Neuroscience
Volume 2011, Article ID 406391, 7 pages
Research Article

PyEEG: An Open Source Python Module for EEG/MEG Feature Extraction

1Department of Computer Science, Department of Electrical Engineering, Texas Tech University, Lubbock TX 79409-3104, USA
2ECHO Labs, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
3Department of Physiology, McGill University, Montreal, QC, Canada H3G 1Y6

Received 31 August 2010; Revised 26 October 2010; Accepted 31 December 2010

Academic Editor: Sylvain Baillet

Copyright © 2011 Forrest Sheng Bao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Computer-aided diagnosis of neural diseases from EEG signals (or other physiological signals that can be treated as time series, e.g., MEG) is an emerging field that has gained much attention in past years. Extracting features is a key component in the analysis of EEG signals. In our previous works, we have implemented many EEG feature extraction functions in the Python programming language. As Python is gaining more ground in scientific computing, an open source Python module for extracting EEG features has the potential to save much time for computational neuroscientists. In this paper, we introduce PyEEG, an open source Python module for EEG feature extraction.

1. Introduction

Computer-aided diagnosis based on EEG has become possible in the last decade for several neurological diseases such as Alzheimer's disease [1, 2] and epilepsy [3, 4]. Implemented systems can be very useful in the early diagnosis of those diseases. For example, traditional epilepsy diagnosis may require trained physicians to visually screen lengthy EEG records, whereas computer-aided systems can shorten this time-consuming procedure by detecting and picking out EEG segments of interest to physicians [5, 6]. Computers can also extend our ability to analyze signals. Recently, researchers have developed systems [3, 4, 7, 8] that aim to use arbitrary interictal (i.e., non-seizure) EEG records for epilepsy diagnosis in cases where physicians find it difficult to reach a diagnostic decision by visual inspection. In addition to analyzing existing signals, this computer-based approach can help us model the brain and predict future signals, for example, in seizure prediction [9, 10].

All the above systems rely on characterizing the EEG signal into certain features, a step known as feature extraction. EEG features can come from different fields that study time series: power spectral density from signal processing, fractal dimensions from computational geometry, entropies from information theory, and so forth. An open source tool that can extract EEG features would benefit the computational neuroscience community since feature extraction is repeatedly invoked in the analysis of EEG signals. Because of Python's increasing popularity in scientific computing, and especially in computational neuroscience, a Python module for EEG feature extraction would be highly useful. In response, we have developed PyEEG, a Python module for EEG feature extraction, and have tested it in our previous epileptic EEG research [3, 8, 11].

Compared to other popular programming languages in scientific computing such as C++ or MATLAB, Python is an open source scripting language with simple syntax and various high-level libraries, such as SciPy, which allows users to run MATLAB code after slight modification. Several popular open source Python projects already exist in the neuroimaging community, such as NIPY. However, in the neural physiology community, Python is not yet widely used. As we are not aware of any open source tools in Python (or other programming languages) that can extract EEG features as mentioned above, we introduce and release PyEEG in this paper.

Though originally designed for EEG, PyEEG can also be used to analyze other physiological signals that can be treated as time series, especially MEG signals that represent the magnetic fields induced by currents of neural electrical activities.

The rest of the paper is organized as follows. In Section 2, we introduce the framework of PyEEG. Section 3 gives the definitions used to compute EEG features. A tutorial on applying PyEEG to a public real EEG dataset is given in Section 4. Section 5 concludes the paper.

2. Main Framework

PyEEG’s target users are programmers (anyone who writes programs) working on computational neuroscience. Figure 1 shows its framework. PyEEG is a Python module that focuses only on extracting features from EEG/MEG segments. Therefore, it does not contain functions to import data of various formats or export features to a classifier. This is due to the modularity and composition principles of building open source software which indicate that small programs that can work well together via simple interfaces are better than big monolithic programs. Since open source tools like EEG/MEG data importers (e.g., EEGLab, Biosig, etc.) and classifier front-ends are already available, there is no need for us to reinvent the wheel. Users can easily hook PyEEG up with various existing open source software to build toolchains for their EEG/MEG research.

Figure 1: PyEEG framework.

PyEEG consists of two sets of functions.
(1) Preprocessing functions, which do not return any feature values. Only two such functions have been implemented so far: embed_seq() builds an embedding sequence (from a given lag and embedding dimension), and first_order_diff() computes the first-order differential sequence. Differential sequences of higher orders can be built by repeatedly applying the first-order differential computation.
(2) Feature extraction functions, which return feature values. These are listed in Table 1.
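To make the two preprocessing steps concrete, the following is a minimal sketch in plain Python. The function names carry a `_sketch` suffix and the parameter names `tau` and `dim` are assumptions chosen for illustration; the bodies are not taken from the PyEEG source and may differ from it in details such as return types.

```python
def embed_seq_sketch(x, tau, dim):
    """Return delay vectors [x[i], x[i+tau], ..., x[i+(dim-1)*tau]] (0-based i)."""
    if tau < 1 or dim < 1:
        raise ValueError("tau and dim must be positive integers")
    n_vectors = len(x) - (dim - 1) * tau  # no vectors if the series is too short
    return [[x[i + j * tau] for j in range(dim)] for i in range(n_vectors)]


def first_order_diff_sketch(x):
    """Return the differences between consecutive samples."""
    return [b - a for a, b in zip(x, x[1:])]


series = [1, 2, 4, 7, 11, 16]
print(embed_seq_sketch(series, tau=2, dim=3))   # [[1, 4, 11], [2, 7, 16]]
print(first_order_diff_sketch(series))          # [1, 2, 3, 4, 5]
# A second-order differential sequence is obtained by applying the step twice.
print(first_order_diff_sketch(first_order_diff_sketch(series)))  # [1, 1, 1, 1]
```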

Table 1: PyEEG-supported features and extraction functions with their return types.

PyEEG uses only functions in the standard Python library and SciPy, the de facto Python module for scientific computing. PyEEG does not define any new data structure, but instead uses only standard Python and NumPy data structures. The reason is that we want to simplify the use of PyEEG, especially for users without much programming background. The inputs of all functions are a time sequence as a list of floating-point numbers and a set of optional feature extraction parameters. Parameters have default values. The output of a feature extraction function is a floating-point number if the feature is a scalar, or a list of floating-point numbers (a vector) otherwise. Details about functions are available in the PyEEG reference guide.

3. Supported Feature Extraction

In this section, we detail the definitions and computation procedures to extract EEG features (as shown in Table 1) in PyEEG. Since there are many parameters and various algorithms for one feature, the numerical value of a feature extracted by PyEEG may be different from that extracted by other toolboxes. Users may need to adjust our code or use non-default values for the parameters in order to meet their needs. Please note that the index of an array or a vector starts from 1 rather than 0 in this section.

3.1. Power Spectral Intensity and Relative Intensity Ratio

For a time series $[x_1, x_2, \ldots, x_N]$, denote its Fast Fourier Transform (FFT) result as $[X_1, X_2, \ldots, X_N]$. A continuous frequency band from $f_{\text{low}}$ to $f_{\text{up}}$ is sliced into $K$ bins, which can be of equal width or not. Boundaries of bins are specified by a vector $band = [f_1, f_2, \ldots, f_K]$, such that the lower and upper frequencies of the $i$th bin are $f_i$ and $f_{i+1}$, respectively. Commonly used unequal bins are the EEG/MEG rhythms, which are $\delta$ (0.5-4 Hz), $\theta$ (4-7 Hz), $\alpha$ (8-12 Hz), $\beta$ (12-30 Hz), and $\gamma$ (30-100 Hz). For these bins, we have $band = [0.5, 4, 7, 12, 30, 100]$.

The Power Spectral Intensity (PSI) [12] of the $k$th bin is evaluated as
$$\text{PSI}_k = \sum_{i=\lfloor N(f_k/f_s) \rfloor}^{\lfloor N(f_{k+1}/f_s) \rfloor} |X_i|, \quad k = 1, 2, \ldots, K-1, \tag{1}$$
where $f_s$ is the sampling rate and $N$ is the series length.

The Relative Intensity Ratio (RIR) [12] is defined on top of PSI:
$$\text{RIR}_j = \frac{\text{PSI}_j}{\sum_{k=1}^{K-1} \text{PSI}_k}, \quad j = 1, 2, \ldots, K-1. \tag{2}$$
PSI and RIR are both vector features.
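The computation in (1) and (2) can be sketched with NumPy as follows. This is an illustrative sketch rather than PyEEG's own code: the function name `psi_rir_sketch` is hypothetical, and the bins are taken as 0-based, half-open index ranges, which is an assumption that differs slightly from the 1-based, inclusive limits used in the text.

```python
import numpy as np


def psi_rir_sketch(x, band, fs):
    """Return (PSI, RIR) as lists for the bins delimited by `band` (in Hz)."""
    spectrum = np.abs(np.fft.fft(x))  # |X_i|
    n = len(x)
    psi = []
    for f_low, f_up in zip(band[:-1], band[1:]):
        lo = int(np.floor(f_low * n / fs))
        hi = int(np.floor(f_up * n / fs))
        # Assumption: half-open slice [lo, hi) over 0-based FFT indices.
        psi.append(float(spectrum[lo:hi].sum()))
    total = sum(psi)  # must be nonzero for the ratio to be defined
    rir = [p / total for p in psi]
    return psi, rir


# Usage: a 10 Hz sine sampled at 100 Hz falls into the third bin (7-12 Hz).
t = np.arange(100) / 100.0
psi, rir = psi_rir_sketch(np.sin(2 * np.pi * 10 * t), [0.5, 4, 7, 12, 30], 100)
print(rir)
```

Note that the upper band limit should stay below half the sampling rate; otherwise the last bin also collects the mirrored part of the spectrum.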

3.2. Petrosian Fractal Dimension (PFD)

For a time series, PFD is defined as
$$\text{PFD} = \frac{\log_{10} N}{\log_{10} N + \log_{10}\left(N/(N + 0.4 N_\delta)\right)}, \tag{3}$$
where $N$ is the series length and $N_\delta$ is the number of sign changes in the signal derivative [13]. PFD is a scalar feature.
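A minimal sketch of (3) in plain Python is given below. It approximates the signal derivative by the first-order difference and counts a sign change whenever two consecutive differences have opposite signs; both choices, and the name `pfd_sketch`, are assumptions made for illustration.

```python
import math


def pfd_sketch(x):
    """Petrosian fractal dimension of a sequence, following (3)."""
    n = len(x)
    diff = [b - a for a, b in zip(x, x[1:])]
    # N_delta: number of sign changes between consecutive differences
    n_delta = sum(1 for a, b in zip(diff, diff[1:]) if a * b < 0)
    return math.log10(n) / (math.log10(n) + math.log10(n / (n + 0.4 * n_delta)))


print(pfd_sketch([1, 2, 3, 4, 5]))      # monotonic series: no sign changes
print(pfd_sketch([0, 1, 0, 1, 0, 1]))   # zigzag series: many sign changes
```

A monotonic series has no sign changes, so the second logarithm in the denominator vanishes and the value is 1; more sign changes push the value above 1.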

3.3. Higuchi Fractal Dimension (HFD)

Higuchi's algorithm [14] constructs $k$ new series from the original series $[x_1, x_2, \ldots, x_N]$ by
$$x_m, x_{m+k}, x_{m+2k}, \ldots, x_{m+\lfloor (N-m)/k \rfloor k}, \tag{4}$$
where $m = 1, 2, \ldots, k$.

For each time series constructed from (4), the length $L(m,k)$ is computed by
$$L(m,k) = \frac{\sum_{i=2}^{\lfloor (N-m)/k \rfloor} \left| x_{m+ik} - x_{m+(i-1)k} \right| (N-1)}{\lfloor (N-m)/k \rfloor\, k}. \tag{5}$$

The average length is computed as $L(k) = \left[\sum_{i=1}^{k} L(i,k)\right]/k$.

This procedure repeats $k_{\max}$ times, once for each $k$ from 1 to $k_{\max}$, and then uses a least-squares method to determine the slope of the line that best fits the curve of $\ln(L(k))$ versus $\ln(1/k)$. The slope is the Higuchi Fractal Dimension. HFD is a scalar feature.
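The procedure can be sketched with NumPy as below. One caveat applies: this sketch follows the normalization of Higuchi's original formulation [14], in which the sum starts at i = 1 and the length is divided by an additional factor of k. That is an assumption of the sketch and differs slightly from (5) as written above. The name `hfd_sketch` is hypothetical, and the sketch assumes k_max is at least 2 and at most half the series length.

```python
import numpy as np


def hfd_sketch(x, k_max):
    """Slope of ln L(k) versus ln(1/k) for k = 1..k_max."""
    n = len(x)
    log_inv_k, log_len = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(1, k + 1):        # 1-based start index m, as in (4)
            n_steps = (n - m) // k       # floor((N - m) / k)
            dist = sum(abs(x[m + i * k - 1] - x[m + (i - 1) * k - 1])
                       for i in range(1, n_steps + 1))
            # Assumption: Higuchi's normalization with the extra division by k
            lengths.append(dist * (n - 1) / (n_steps * k) / k)
        log_inv_k.append(np.log(1.0 / k))
        log_len.append(np.log(np.mean(lengths)))
    slope, _intercept = np.polyfit(log_inv_k, log_len, 1)
    return float(slope)


print(hfd_sketch(list(range(100)), k_max=5))  # a straight line
```

With this normalization a straight line yields a slope close to 1, while white noise yields a slope close to 2.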

3.4. Hjorth Parameters

For a time series $[x_1, x_2, \ldots, x_N]$, the Hjorth mobility and complexity [15] are, respectively, defined as $\sqrt{M_2/TP}$ and $\sqrt{(M_4 \cdot TP)/(M_2 \cdot M_2)}$, where $TP = \sum x_i^2/N$, $M_2 = \sum d_i^2/N$, $M_4 = \sum (d_i - d_{i-1})^2/N$, and $d_i = x_i - x_{i-1}$. Hjorth mobility and complexity are both scalar features.
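The two Hjorth parameters can be sketched in plain Python as follows. The sketch assumes that TP, M2, and M4 are the mean squares of the signal, of its first-order difference, and of its second-order difference, all divided by the series length N, and that both parameters are square roots of the stated ratios. The name `hjorth_sketch` is hypothetical.

```python
import math


def hjorth_sketch(x):
    """Return (mobility, complexity) of a sequence."""
    n = len(x)
    d = [b - a for a, b in zip(x, x[1:])]    # d_i = x_i - x_{i-1}
    dd = [b - a for a, b in zip(d, d[1:])]   # d_i - d_{i-1}
    tp = sum(v * v for v in x) / n           # assumed: mean square of the signal
    m2 = sum(v * v for v in d) / n           # assumed: mean square of d
    m4 = sum(v * v for v in dd) / n
    mobility = math.sqrt(m2 / tp)
    complexity = math.sqrt(m4 * tp / (m2 * m2))
    return mobility, complexity


print(hjorth_sketch([1, -1, 1, -1, 1, -1]))
```

The sketch requires a signal that is neither all zeros nor constant, since TP and M2 appear in denominators.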

3.5. Spectral Entropy

The spectral entropy [16] is defined as follows:
$$H = -\frac{1}{\log(K)} \sum_{i=1}^{K} \text{RIR}_i \log \text{RIR}_i, \tag{6}$$
where $\text{RIR}_i$ and $K$ are defined in (2). Spectral entropy is a scalar feature.
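A short sketch of (6) in plain Python is given below. It takes an already computed RIR vector as input and treats K as the number of entries in that vector, which is an assumption of the sketch; bins whose ratio is zero are skipped, following the usual convention that 0 log 0 = 0. The name `spectral_entropy_sketch` is hypothetical.

```python
import math


def spectral_entropy_sketch(rir):
    """Normalized entropy of a relative intensity ratio vector."""
    k = len(rir)  # assumption: K is the number of RIR entries (needs k > 1)
    total = sum(p * math.log(p) for p in rir if p > 0)
    return -total / math.log(k)


print(spectral_entropy_sketch([0.25, 0.25, 0.25, 0.25]))  # flat spectrum
print(spectral_entropy_sketch([0.7, 0.1, 0.1, 0.1]))      # peaked spectrum
```

The normalization by log K keeps the value between 0 (all power in one bin) and 1 (power spread evenly over the bins).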

3.6. SVD Entropy

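Reference [17] defines an entropy measure using Singular Value Decomposition (SVD). Let the input signal be $[x_1, x_2, \ldots, x_N]$. We construct delay vectors as
$$\mathbf{y}(i) = \left[x_i, x_{i+\tau}, \ldots, x_{i+(d_E-1)\tau}\right], \tag{7}$$
where $\tau$ is the delay and $d_E$ is the embedding dimension. In this paper, $d_E = 20$ and $\tau = 2$. The embedding space is then constructed by
$$Y = \left[\mathbf{y}(1), \mathbf{y}(2), \ldots, \mathbf{y}\left(N - (d_E - 1)\tau\right)\right]^T. \tag{8}$$

The SVD is then performed on matrix $Y$ to produce $M$ singular values, $\sigma_1, \ldots, \sigma_M$, known as the singular spectrum.

The SVD entropy is then defined as
$$H_{\text{SVD}} = -\sum_{i=1}^{M} \bar{\sigma}_i \log_2 \bar{\sigma}_i, \tag{9}$$
where $M$ is the number of singular values and $\bar{\sigma}_1, \ldots, \bar{\sigma}_M$ are normalized singular values such that $\bar{\sigma}_i = \sigma_i / \sum_{j=1}^{M} \sigma_j$. SVD entropy is a scalar feature.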
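The steps from delay vectors to SVD entropy can be sketched with NumPy as follows. The name `svd_entropy_sketch` and the parameter names `tau` and `d_e` are hypothetical; the sketch builds one delay vector per row, normalizes the singular values to sum to one, and skips zero singular values under the convention 0 log 0 = 0.

```python
import numpy as np


def svd_entropy_sketch(x, tau, d_e):
    """Entropy (base 2) of the normalized singular spectrum of the embedding."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (d_e - 1) * tau
    # Each row is one delay vector [x_i, x_{i+tau}, ..., x_{i+(d_e-1)tau}].
    y = np.array([x[i:i + (d_e - 1) * tau + 1:tau] for i in range(n_vectors)])
    sigma = np.linalg.svd(y, compute_uv=False)
    sigma = sigma / sigma.sum()      # normalized singular values
    sigma = sigma[sigma > 0]         # 0 * log(0) is treated as 0
    return float(-np.sum(sigma * np.log2(sigma)))


signal = np.random.default_rng(0).standard_normal(200)
print(svd_entropy_sketch(signal, tau=2, d_e=20))
```

The result lies between 0 and the base-2 logarithm of the number of singular values.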

3.7. Fisher Information

The Fisher information [18] can be defined on the normalized singular spectrum used in (9):
$$I = \sum_{i=1}^{M-1} \frac{\left(\bar{\sigma}_{i+1} - \bar{\sigma}_i\right)^2}{\bar{\sigma}_i}. \tag{10}$$
Fisher information is a scalar feature.
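As a small illustration of (10), the sketch below takes a list of singular values, normalizes it so that the values sum to one, and accumulates the squared differences of neighbors divided by the preceding value. The name `fisher_info_sketch` is hypothetical, and the input is assumed to be a non-increasing list of positive singular values such as an SVD routine returns.

```python
def fisher_info_sketch(singular_values):
    """Fisher information of a singular spectrum, following (10)."""
    total = sum(singular_values)
    s = [v / total for v in singular_values]   # normalized singular values
    return sum((s[i + 1] - s[i]) ** 2 / s[i] for i in range(len(s) - 1))


print(fisher_info_sketch([2.0, 1.0, 1.0]))
print(fisher_info_sketch([1.0, 1.0, 1.0]))  # flat spectrum
```

A flat singular spectrum gives zero, and the value grows as neighboring singular values differ more.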

3.8. Approximate Entropy

Approximate entropy (ApEn) is a statistical parameter to quantify the regularity of a time series [19].

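ApEn is computed by the following steps.
(1) Let the input signal be $[x_1, x_2, \ldots, x_N]$.
(2) Build subsequences $x(i,m) = [x_i, x_{i+1}, \ldots, x_{i+m-1}]$ for $1 \le i \le N-m$, where $m$ is the length of the subsequence. In [7], $m = 1$, 2, or 3.
(3) Let $r$ represent the noise filter level, defined as $r = k \times \text{SD}$ for $k = 0, 0.1, 0.2, \ldots, 0.9$.
(4) Build a set of subsequences $\{x(j,m)\} = \{x(j,m) \mid j \in [1..N-m]\}$, where $x(j,m)$ is defined in step 2.
(5) For each $x(i,m) \in \{x(j,m)\}$, compute
$$C(i,m) = \frac{\sum_{j=1}^{N-m} k_j}{N-m}, \tag{11}$$
where
$$k_j = \begin{cases} 1 & \text{if } |x(i,m) - x(j,m)| < r, \\ 0 & \text{otherwise.} \end{cases} \tag{12}$$
(6) Compute
$$\text{ApEn}(m, r, N) = \frac{1}{N-m} \sum_{i=1}^{N-m} \ln \frac{C(i,m)}{C(i,m+1)}. \tag{13}$$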

ApEn is a scalar feature.
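The steps above can be sketched in plain Python as follows. Two points are assumptions of this sketch rather than statements from the text: the distance between two subsequences is taken to be the maximum absolute difference of their elements, and r must be strictly positive so that every subsequence matches at least itself. The name `apen_sketch` is hypothetical, and the double loop is quadratic in the series length, so it is meant for short inputs.

```python
import math


def apen_sketch(x, m, r):
    """Approximate entropy following steps (1) to (6); requires r > 0."""
    n = len(x)
    count = n - m  # number of subsequences, i = 1..N-m

    def c_values(length):
        subs = [x[i:i + length] for i in range(count)]
        result = []
        for a in subs:
            # Assumption: Chebyshev (maximum) distance between subsequences
            matches = sum(1 for b in subs
                          if max(abs(u - v) for u, v in zip(a, b)) < r)
            result.append(matches / count)
        return result

    c_m = c_values(m)
    c_m1 = c_values(m + 1)
    return sum(math.log(a / b) for a, b in zip(c_m, c_m1)) / count


print(apen_sketch([0.0, 1.0] * 10, m=2, r=0.5))  # perfectly regular series
```

A perfectly regular series gives a value of zero, while irregular series give positive values.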

3.9. Detrended Fluctuation Analysis

Detrended Fluctuation Analysis (DFA) is proposed in [20].

The procedure to compute the DFA of a time series $[x_1, x_2, \ldots, x_N]$ is as follows.
(1) First integrate $x$ into a new series $y = [y(1), \ldots, y(N)]$, where $y(k) = \sum_{i=1}^{k} (x_i - \bar{x})$ and $\bar{x}$ is the average of $x_1, x_2, \ldots, x_N$.
(2) The integrated series is then sliced into boxes of equal length $n$. In each box of length $n$, a least-squares line is fit to the data, representing the trend in that box. The $y$ coordinate of the straight line segments is denoted by $y_n(k)$.
(3) The root-mean-square fluctuation of the integrated series is calculated by
$$F(n) = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left[y(k) - y_n(k)\right]^2},$$
where the part $y(k) - y_n(k)$ is called detrending.
(4) The fluctuation can be defined as the slope of the line relating $\log F(n)$ to $\log n$.

DFA is a scalar feature.
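The four steps can be sketched with NumPy as below. The list of box lengths is passed in explicitly, and samples that do not fill a complete box are ignored; both are assumptions of this sketch, as is the name `dfa_sketch`. The signal must not be constant, since the logarithm of a zero fluctuation is undefined.

```python
import numpy as np


def dfa_sketch(x, box_sizes):
    """Slope of log F(n) versus log n over the given box lengths."""
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())                      # step 1: integrate
    fluctuations = []
    for n in box_sizes:
        n_boxes = len(y) // n                        # complete boxes only
        t = np.arange(n)
        sq_err = 0.0
        for b in range(n_boxes):                     # step 2: local linear trend
            seg = y[b * n:(b + 1) * n]
            slope, intercept = np.polyfit(t, seg, 1)
            sq_err += np.sum((seg - (slope * t + intercept)) ** 2)
        # step 3: root-mean-square of the detrended series
        fluctuations.append(np.sqrt(sq_err / (n_boxes * n)))
    # step 4: slope of the log-log relation
    alpha, _intercept = np.polyfit(np.log(box_sizes), np.log(fluctuations), 1)
    return float(alpha)


noise = np.random.default_rng(0).standard_normal(2048)
print(dfa_sketch(noise, [8, 16, 32, 64, 128]))
```

For uncorrelated noise the slope is expected to be near 0.5, and for a random walk near 1.5.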

3.10. Hurst Exponent

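The Hurst exponent (HURST) [21] is also called Rescaled Range statistics (R/S). To calculate the Hurst exponent for a time series $X = [x_1, x_2, \ldots, x_N]$, the first step is to calculate the accumulated deviation from the mean of the time series within range $T$:
$$X(t,T) = \sum_{i=1}^{t} (x_i - \bar{x}), \quad \text{where } \bar{x} = \frac{1}{T} \sum_{i=1}^{T} x_i, \quad t \in [1..N]. \tag{14}$$
Then, $\mathrm{R}(T)/\mathrm{S}(T)$ is calculated as
$$\frac{\mathrm{R}(T)}{\mathrm{S}(T)} = \frac{\max(X(t,T)) - \min(X(t,T))}{\sqrt{(1/T) \sum_{t=1}^{T} \left[x(t) - \bar{x}\right]^2}}. \tag{15}$$
The Hurst exponent is obtained by calculating the slope of the line produced by $\ln(\mathrm{R}(n)/\mathrm{S}(n))$ versus $\ln(n)$ for $n \in [2..N]$. The Hurst exponent is a scalar feature.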
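A minimal NumPy sketch of this rescaled range computation is shown below. For each prefix length from 2 to N it computes the range of the accumulated deviations and the standard deviation of the prefix, then fits a line to the logarithms. Skipping prefixes whose range or standard deviation is zero is an assumption of the sketch, as is the name `hurst_sketch`. The loop is quadratic in the series length.

```python
import numpy as np


def hurst_sketch(x):
    """Slope of ln(R/S) versus ln(n) over prefix lengths n = 2..N."""
    x = np.asarray(x, dtype=float)
    log_n, log_rs = [], []
    for t_len in range(2, len(x) + 1):
        head = x[:t_len]
        dev = np.cumsum(head - head.mean())   # accumulated deviation, t = 1..T
        r = dev.max() - dev.min()             # range R(T)
        s = head.std()                        # S(T): sqrt of mean squared deviation
        if r > 0 and s > 0:                   # assumption: skip degenerate prefixes
            log_n.append(np.log(t_len))
            log_rs.append(np.log(r / s))
    slope, _intercept = np.polyfit(log_n, log_rs, 1)
    return float(slope)


print(hurst_sketch(np.arange(1, 201)))  # a steadily increasing ramp
```

For a steadily increasing ramp the ratio R/S grows roughly in proportion to the prefix length, so the fitted slope is close to 1.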

4. Using PyEEG on Real Data

In this section, we use PyEEG on a real EEG dataset to demonstrate its use in everyday research.

The dataset, from Klinik für Epileptologie, Universität Bonn, Germany [22], has been widely used in previous epilepsy research. In total, there are five sets, each containing 100 single-channel EEG segments. Each segment has 4096 samples. Data in sets A and B are extracranial EEGs from 5 healthy volunteers with eyes open and eyes closed, respectively. Sets C and D are intracranial data recorded over interictal periods, while set E was recorded over ictal periods. Segments in D are from within the epileptogenic zone, while those in C are from the hippocampal formation of the opposite hemisphere of the brain. Sets C, D, and E are composed from EEGs of 5 patients. The data had a spectral bandwidth of 0.5–85 Hz. Please refer to [22] for more details.

Using PyEEG is like using any other Python module. Users simply need to import PyEEG and then call its functions as needed. PyEEG is provided as a single Python file. Therefore, it only needs to be downloaded and placed in a directory on the Python module search path, such as the working directory. Alternatively, the PYTHONPATH environment variable can be set to point to the location of PyEEG.

In the Python interpreter, we first import PyEEG and load the data:
>>> import pyeeg
>>> fid = open('Z001.txt', 'r')
>>> tmp = fid.readlines()
>>> data = [float(k) for k in tmp]

where Z001.txt is the first segment in set A. The data type of data is list. After loading EEG data, we can use PyEEG to extract features as follows (using all default parameters):
>>> DFA = pyeeg.dfa(data)
>>> DFA
0.81450526948129354
>>> Hurst_Exponent = pyeeg.hurst(data)
>>> Hurst_Exponent
0.68053321812240675
>>> PFD = pyeeg.pfd(data)
>>> PFD
0.58651018327048932

Due to space limitations, we are not able to print all feature values of all EEG segments. Instead, we visualize the averages of the features (except RIR and PSI) within each of the five sets in Figure 2. Error bars represent the variances of features in each set. PSIs for the five sets are plotted in Figure 3. Users can reproduce these plots and obtain the feature averages in the Python interpreter with a testing script available from our project website.
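As a rough illustration of how per-set averages and variances of a scalar feature could be gathered, the sketch below reads one-sample-per-line text files and summarizes a feature over a list of segments. The helper names `load_segment` and `summarize_feature` are hypothetical and are not part of PyEEG or of the testing script mentioned above; the feature function is passed in as an argument, and the population variance is used as an assumption.

```python
from statistics import mean, pvariance


def load_segment(path):
    """Read one floating-point sample per line, skipping blank lines."""
    with open(path, "r") as fid:
        return [float(line) for line in fid if line.strip()]


def summarize_feature(segments, feature_fn):
    """Return (mean, variance) of a scalar feature over a list of segments."""
    values = [feature_fn(seg) for seg in segments]
    return mean(values), pvariance(values)


# Usage with a stand-in feature (the sum of each toy segment):
print(summarize_feature([[1, 2], [3, 4], [5, 6]], sum))
```

With PyEEG imported as in the session above, a call such as `summarize_feature(segments, pyeeg.pfd)` would summarize the Petrosian fractal dimension over the segments of one set, where `segments` is a list built with `load_segment`.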

Figure 2: Distributions of ten features extracted by PyEEG in each set.
Figure 3: Average PSI of each set. Note that the scale in 𝑦-axis of set E is much larger than that of other sets.

From Figures 2 and 3, we can see that healthy, interictal, and ictal EEG signals have different distributions for most features. Table 2 lists parameters used in this experiment.

Table 2: Values of parameters used in our example.

5. Discussion and Future Development

So far, we have listed the features that can be extracted by PyEEG and their definitions. Our implementation adheres to these definitions precisely, even though faster algorithms may exist. There are many other EEG features, such as Lyapunov exponents, that have not yet been implemented in PyEEG. More EEG features will be added to PyEEG in the future while we finish unit testing and documentation for each function. In personal emails, some open source projects, such as ConnectomeViewer and NIPY/PBrain, have expressed interest in including PyEEG in their code. Therefore, we will keep maintaining PyEEG as long as it can benefit the entire computational neuroscience community.


The software is released under GNU GPL v.3 at Google Code. No commercial software is required to run PyEEG. Because Python is cross-platform, PyEEG can run on Linux, Mac OS, and Windows.


  1. J. Dauwels, F. Vialatte, and A. Cichocki, “A comparative study of synchrony measures for the early detection of Alzheimer's disease based on EEG,” in Proceedings of the 14th International Conference on Neural Information Processing (ICONIP '07), vol. 4984 of Lecture Notes in Computer Science, pp. 112–125, 2008.
  2. A. A. Petrosian, D. V. Prokhorov, W. Lajara-Nanson, and R. B. Schiffer, “Recurrent neural network-based approach for early recognition of Alzheimer's disease in EEG,” Clinical Neurophysiology, vol. 112, no. 8, pp. 1378–1387, 2001.
  3. F. S. Bao, D. Y. C. Lie, and Y. Zhang, “A new approach to automated epileptic diagnosis using EEG and probabilistic neural network,” in Proceedings of the 20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '08), vol. 2, pp. 482–486, November 2008.
  4. J. Dauwels, E. Eskandar, and S. Cash, “Localization of seizure onset area from intracranial non-seizure EEG by exploiting locally enhanced synchrony,” in Proceedings of the 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC '09), pp. 2180–2183, September 2009.
  5. A. B. Gardner, A. M. Krieger, G. Vachtsevanos, and B. Litt, “One-class novelty detection for seizure analysis from intracranial EEG,” Journal of Machine Learning Research, vol. 7, pp. 1025–1044, 2006.
  6. C. W. Ko and H. W. Chung, “Automatic spike detection via an artificial neural network using raw EEG data: effects of data preparation and implications in the limitations of online recognition,” Clinical Neurophysiology, vol. 111, no. 3, pp. 477–481, 2000.
  7. V. Srinivasan, C. Eswaran, and N. Sriraam, “Approximate entropy-based epileptic EEG detection using artificial neural networks,” IEEE Transactions on Information Technology in Biomedicine, vol. 11, no. 3, pp. 288–295, 2007.
  8. F. S. Bao, J. M. Gao, J. Hu, D. Y. C. Lie, Y. Zhang, and K. J. Oommen, “Automated epilepsy diagnosis using interictal scalp EEG,” in Proceedings of the 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC '09), pp. 6603–6607, September 2009.
  9. K. Lehnertz, F. Mormann, T. Kreuz et al., “Seizure prediction by nonlinear EEG analysis,” IEEE Engineering in Medicine and Biology Magazine, vol. 22, no. 1, pp. 57–63, 2003.
  10. E. O'Sullivan-Greene, I. Mareels, D. Freestone, L. Kulhmann, and A. Burkitt, “A paradigm for epileptic seizure prediction using a coupled oscillator model of the brain,” in Proceedings of the 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC '09), pp. 6428–6431, September 2009.
  11. F. S. Bao, Y.-L. Li, J.-M. Gao, and J. Hu, “Performance of dynamic features in classifying scalp epileptic interictal and normal EEG,” in Proceedings of 32nd International Conference of IEEE Engineering in Medicine and Biology Society (EMBC '10), 2010.
  12. R. Q. Quiroga, S. Blanco, O. A. Rosso, H. Garcia, and A. Rabinowicz, “Searching for hidden information with Gabor transform in generalized tonic-clonic seizures,” Electroencephalography and Clinical Neurophysiology, vol. 103, no. 4, pp. 434–439, 1997.
  13. A. Petrosian, “Kolmogorov complexity of finite sequences and recognition of different preictal EEG patterns,” in Proceedings of the 8th IEEE Symposium on Computer-Based Medical Systems, pp. 212–217, June 1995.
  14. T. Higuchi, “Approach to an irregular time series on the basis of the fractal theory,” Physica D, vol. 31, no. 2, pp. 277–283, 1988.
  15. B. Hjorth, “EEG analysis based on time domain properties,” Electroencephalography and Clinical Neurophysiology, vol. 29, no. 3, pp. 306–310, 1970.
  16. T. Inouye, K. Shinosaki, H. Sakamoto et al., “Quantification of EEG irregularity by use of the entropy of the power spectrum,” Electroencephalography and Clinical Neurophysiology, vol. 79, no. 3, pp. 204–210, 1991.
  17. S. J. Roberts, W. Penny, and I. Rezek, “Temporal and spatial complexity measures for electroencephalogram based brain-computer interfacing,” Medical and Biological Engineering and Computing, vol. 37, no. 1, pp. 93–98, 1999.
  18. C. J. James and D. Lowe, “Extracting multisource brain activity from a single electromagnetic channel,” Artificial Intelligence in Medicine, vol. 28, no. 1, pp. 89–104, 2003.
  19. S. M. Pincus, I. M. Gladstone, and R. A. Ehrenkranz, “A regularity statistic for medical data analysis,” Journal of Clinical Monitoring and Computing, vol. 7, no. 4, pp. 335–345, 1991.
  20. C.-K. Peng, S. Havlin, H. E. Stanley, and A. L. Goldberger, “Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series,” Chaos, vol. 5, no. 1, pp. 82–87, 1995.
  21. T. Balli and R. Palaniappan, “A combined linear & nonlinear approach for classification of epileptic EEG signals,” in Proceedings of the 4th International IEEE/EMBS Conference on Neural Engineering (NER '09), pp. 714–717, May 2009.
  22. R. G. Andrzejak, K. Lehnertz, F. Mormann, C. Rieke, P. David, and C. E. Elger, “Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: dependence on recording region and brain state,” Physical Review E, vol. 64, no. 6, Article ID 061907, 8 pages, 2001.