Behavioural Neurology

Volume 2018, Article ID 4638903, 15 pages

https://doi.org/10.1155/2018/4638903

## Mining EEG with SVM for Understanding Cognitive Underpinnings of Math Problem Solving Strategies

Correspondence should be addressed to Sebastián Maldonado; smaldonado@uandes.cl

Received 4 April 2017; Accepted 24 September 2017; Published 11 January 2018

Academic Editor: Guido Rubboli

Copyright © 2018 Paul Bosch et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

We have developed a new methodology for examining and extracting patterns from brain electric activity by using data mining and machine learning techniques. Data was collected from experiments focused on the study of cognitive processes that might evoke different specific strategies in the resolution of math problems. A binary classification problem was constructed using correlations and phase synchronization between different electroencephalographic channels as characteristics and, as labels or classes, the math performances of individuals participating in specially designed experiments. The proposed methodology is based on using well-established procedures of feature selection, which were used to determine a suitable brain functional network size related to math problem solving strategies and also to discover the most relevant links in this network without including noisy connections or excluding significant connections.

#### 1. Introduction

Recently, there has been a surge in the number of investigations applying data mining tools to neuroscience [1, 2]. Data mining in this domain is usually related, on one hand, to processing and analyzing three-dimensional images from different medical imaging modalities that capture structural (e.g., MRI, CT, and histology) and functional/physiological (e.g., PET, fMRI, and SPECT) information about the human brain [3, 4]. On the other hand, some tools and approaches have been specifically tailored to grasp the complexity of brain electric activity through the analysis of electroencephalographic (EEG) signals [5]. However, the vast majority of these studies seek to discover patterns in electrophysiological signals and images correlated with the diagnosis, prognosis, and evolution of a particular pathology or brain disorder and with the image analysis of normal/disease resting state fMRI [6, 7]. By comparison, very few works in this area use machine learning techniques to study the high-level cognitive functions of the normal brain, probably because in these cases it is harder to interpret how single brain regions, or connections between them, affect the separation of *pattern classes*: a discriminative brain pattern describes the cumulative contributions of many features that together underpin high-level brain functions.

In this work, we use machine learning techniques to discover patterns of synchrony in functional brain networks, constructed from the EEG recordings of a group of healthy individuals while they were solving specially designed math problems. The problems were devised specifically to detect and measure analytic processing. An intuitive resolution could lead to a quick and simple but incorrect response that should be overridden analytically. This study aims to correlate types of responses (correct or incorrect) with specific patterns of neural synchronization. The primary finding is that classification of these patterns using data mining tools on datasets from complex cognitive processes related to math performance is achievable. For each pair of EEG channels, correlations and phase synchrony were calculated over the time windows associated with correct and incorrect answers given by the participants. With these measures as entries, we construct connectivity (synchronization) networks as proxies of functional brain networks. A novel feature selection methodology that identifies the most relevant connections in these networks is proposed by using a nonlinear SVM-based classifier. This methodology allows us to determine not only a suitable network size but also the most relevant connections in the network, reducing the complexity and, therefore, facilitating the interpretation of mined patterns.

##### 1.1. Synchronization/Correlation Networks of Normal Brain Cognitive High-Level Functions Used in Resolution of Math Problems

Lately, investigations have shifted from the study of local activation of large groups of neurons to the analysis of integration patterns among these groups. It is thought that the physiological bases of information processing and mental representation are provided by functional networks [8]. In fact, there is a great deal of current interest in the recent development of different techniques to extract large-scale functional and anatomical brain connectivity networks based on methods for creating correlation networks [9–11].

Researchers have developed a widely used method for creating correlation networks by using neural synchronization. Neural synchronization is a fundamental process in cortical computation which is believed to play an important role in information processing in the brain at both cellular and macroscopic levels [12, 13]. Brain oscillations, which are ubiquitous in all brain areas, become synchronized and thereby enable the implementation of the whole range of brain functions [12]. In our work, we use neural synchronization to measure the integrated activity of the functional brain network responsible for different math performances. Specifically, we use linear correlation and phase synchronization as measures of neural synchronization.

The correlation coefficient estimates linear coupling among signals of EEG channels, and its values are distributed over the unit interval. But the assumption that only linear interdependencies are relevant is actually not correct. Strictly speaking, linear correlation analysis based on Pearson’s correlation coefficient and its derivatives can potentially miss important features of any dynamic system, particularly when we study brain functional network integration dynamics. Thus, in addition to linear correlation, we use phase synchronization between distant brain oscillating foci [14–18].

Phase synchrony in EEG channels assesses the stability of differences between phases of EEG signals at equivalent frequencies taken simultaneously by different electrodes. More simply stated, it is a measure of how the relative phase is distributed over the unit circle. If the two signals are phase synchronized, the relative phase will occupy a small portion of the circle, and the mean phase coherence is high. Phase synchronization has previously been considered to be a very good indicator of the functional coupling of neural activity in distant brain areas [19, 20]. To our knowledge, phase synchrony in EEG channels has not yet been used to study brain networks involved in mathematical activities. Using it therefore makes a relevant contribution to understanding the collaborative and integrative nature of neural functioning in mathematics.

The large-scale functional integration of different brain zones is a relevant aspect of understanding the neural mechanisms responsible for the use of diverse problem-solving strategies in mathematics. The cognitive underpinnings of several mathematical activities have previously been related to a widely distributed brain network that includes parietal, temporal, and frontal structures as their main nodes [21–23]. Our research uses electroencephalographic (EEG) analysis [18] for the study of the whole-brain connectivity network and shows how mathematical cognition depends upon the integration of activities from distributed brain regions.

Some research on EEG analysis has shown that specific aspects of mathematical reasoning could be related to different features of electric activity in some frequency bands (see, e.g., [24, 25]). In [25], for example, it is shown that incorrect performance in simple mathematical tasks is preceded by higher delta activity (signal frequencies < 4 Hz) in the lateral and medial areas of the right prefrontal cortex and by higher theta activity (4–8 Hz) bilaterally in the medial frontal zones. These slow wave patterns precede the subject’s erroneous performance and show inhibited activity of the error-monitoring areas during erroneous mathematical calculations (i.e., these areas were simply not recruited). Therefore, a failure in the functional integration of these zones during problem resolution would be responsible for the subject’s erroneous mathematical performance. On the other hand, correct answers were preceded by alpha activity (8–12 Hz) in the right posterior parietal area, a zone previously linked to mathematics. These early findings suggest that the size and integration of the functional network of different brain zones involved in the resolution of problems are relevant for understanding the neural mechanisms underlying math performance.

##### 1.2. Graph Theoretical (Network) Approaches and SVM Working Together

Network theory is helpful in characterizing the interdependencies of various brain zones. However, graph theoretical (network) approaches in the study of brain functional networks suffer from some important methodological difficulties [26, 27]. For example, graph measures are strongly dependent on the network size (number of nodes), network density (percentage of links present), and degree (number of connections per vertex). This makes it very difficult to compare results from different studies, which generally use distinct criteria to build functional networks. Indeed, to construct unweighted networks, one has to apply a threshold on the connectivity values of the original weighted network. This results in scaling of the network properties as a function of the threshold [26]. The threshold can be chosen in a variety of ways: arbitrarily, using statistical criteria of connectivity strength, based on the average degree, or based on the density of the network. Fixing a standard number of vertices and the average degree could solve these size effects but could also introduce spurious connections or ignore strong connections in the network [27].
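The dependence of graph measures on the threshold can be illustrated with a minimal sketch. The 4-node weight matrix, the threshold values, and the function names below are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def threshold_network(weights, threshold):
    """Binarize a symmetric weighted connectivity matrix (list of lists).

    A link i-j is kept when |weights[i][j]| >= threshold; the diagonal is ignored.
    """
    n = len(weights)
    return [[1 if i != j and abs(weights[i][j]) >= threshold else 0
             for j in range(n)] for i in range(n)]


def density(adjacency):
    """Fraction of possible undirected links that are present."""
    n = len(adjacency)
    links = sum(adjacency[i][j] for i in range(n) for j in range(i + 1, n))
    return links / (n * (n - 1) / 2)


def mean_degree(adjacency):
    """Average number of connections per vertex."""
    return sum(sum(row) for row in adjacency) / len(adjacency)


# Hypothetical weighted network with four nodes (illustration only).
weights = [
    [0.0, 0.9, 0.4, 0.1],
    [0.9, 0.0, 0.6, 0.3],
    [0.4, 0.6, 0.0, 0.8],
    [0.1, 0.3, 0.8, 0.0],
]

for thr in (0.2, 0.5, 0.85):
    adj = threshold_network(weights, thr)
    print(thr, density(adj), mean_degree(adj))
```

Raising the threshold from 0.2 to 0.85 reduces the number of surviving links from five to one, so any measure computed on the binarized graph changes with the threshold even though the underlying weighted network is the same.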

Recently, the use of a minimum spanning tree (a subnetwork of the original weighted network that connects all vertices in the network without forming loops and has the minimum total weight of all possible spanning trees), see [28], has been proposed to solve many of these methodological difficulties.
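As a rough sketch of the idea, a minimum spanning tree can be extracted from a symmetric weight matrix with Kruskal's algorithm. The choice of Kruskal's algorithm, the function name, and the small distance matrix are assumptions made for illustration. How connectivity values are mapped to edge weights (for instance, to a distance) is also an assumption that the text does not specify.

```python
def minimum_spanning_tree(weights):
    """Kruskal's algorithm on a symmetric weight matrix (list of lists).

    Returns a list of (i, j, weight) edges connecting all vertices
    without loops and with minimum total weight.
    """
    n = len(weights)
    edges = sorted((weights[i][j], i, j)
                   for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))  # union-find forest

    def find(v):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:      # adding this edge does not form a loop
            parent[root_i] = root_j
            tree.append((i, j, w))
            if len(tree) == n - 1:
                break
    return tree


# Hypothetical distance matrix between four nodes (illustration only).
distances = [
    [0.0, 0.1, 0.6, 0.9],
    [0.1, 0.0, 0.4, 0.7],
    [0.6, 0.4, 0.0, 0.2],
    [0.9, 0.7, 0.2, 0.0],
]
tree = minimum_spanning_tree(distances)
print(tree)
```

A tree on n vertices always has n - 1 edges, so the result has a fixed size regardless of any threshold, which is what makes it attractive for comparisons across networks.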

##### 1.3. Top-Down Approach and Main Motivations for This Study

Students often arrive at universities without a well-formed background in abstract reasoning and with limited experience in the application of mathematical strategies. They lack proper understanding of some mathematical topics and often use inappropriate associations of different facts while trying to solve mathematical problems. These associations are fast internal reactions to external stimuli and appear to be related to the way in which the mind processes information.

Many authors in educational research have pointed out the persistence of student errors and misconceptions with respect to specific topics and tasks. For example, in [29], the authors observed that students react in a similar way to a wide variety of conceptually nonrelated problems that share some external common features. This fact led them to suggest that many responses described in the literature as alternative conceptions (misconceptions) could be better explained as evolving from a few common intuitive rules such as *More of A—More of B*, *Same A—Same B*, *Everything can be divided*, and *Over-generalized linearity*.

The present work applies a dual-process model of cognitive processing to these kinds of problems, testing the hypothesis that relative amounts of intuitive/analytic processing by the brain promote different strategies in the resolution of mathematical problems, leading to accurate or faulty solutions.

This work aims to solve these methodological difficulties by using some advanced tools from data mining. Specifically, the main methodological contribution is twofold: First, we extend the -SOCP method [30], originally developed for linear binary classification, to nonlinear modeling thanks to the use of kernel functions. This model proposes a robust setting based on second-order cone programming, in which the traditional maximum margin approach for SVM is adapted by replacing the reduced convex hulls by ellipsoids [31], leading to a potentially superior classification performance [30, 31]. Additionally, we propose a novel feature selection methodology that identifies the most relevant connections in the network of interest while constructing the classifier using the -SOCP method [30].

The rest of this article is structured as follows: Section 2 presents the methodology for capturing the data used in the modelling process. Section 3 provides a brief description of developments for feature selection and SVM, in which our -SOCP method and the novel embedded feature selection strategy are highlighted. Section 4 describes our results using neural synchronization datasets collected for this study. Section 5 summarizes the paper, provides the main conclusions of this study, and addresses future developments.

#### 2. Materials and Methods: Cognitive Neuroscience

##### 2.1. The Dual Process Theory

As our theoretical framework, we use the dual process theory (DPT) [32, 33]. According to DPT, our cognition and behavior operate in parallel with two quite different modes, called system 1 (S1) and system 2 (S2), roughly corresponding to our commonly held notions of intuitive and analytical thinking. The S1 and S2 modes are activated by different parts of the brain and have different evolutionary origins (S2 being evolutionarily more recent and, in fact, largely reflecting cultural evolution). Like perception, S1 processes are characterized by being fast, automatic, effortless, unconscious, and inflexible (hard to change or overcome). Unlike perception, S1 processes can be language-mediated and relate to events not in the here-and-now (i.e., events in faraway locations and in the past or future). In contrast, S2 processes are slow, conscious, effortful, and relatively flexible. The two systems differ mainly along the dimension of accessibility: how fast and how easily things come to mind. Although both systems can at times run in parallel, S2 often overrides the input of S1 when analytic tendencies are activated and cognitive resources are available. For example, it is known that in geometry-related math problems, students tend to rely on attributes of the problems such as distance, size, and similarity, which are registered by S1 automatically, quickly, and spontaneously. We used this fact to design tests with some salient stimuli in such a way that each alternative for answering the problem would clearly indicate whether the participant took an intuitive/wrong strategy or an analytic/correct one.

The use of S2, that is, consciously accessed analytical processes, triggers global and large-scale patterns of integrated neural activity. This appears as a variation in the global amount of synchrony between different brain areas. A greater proportion of S2 processes will appear as a greater amount of global synchrony. On the other hand, typical math errors due to semiautomatic use of heuristics will appear neurally as a reduced coupling of central work space neurons. Central work space neurons are thought to be particularly dense in the parietal, prefrontal, and cingulate cortices [21].

##### 2.2. Test Designing Based on Cognitive Neuroscience

DPT enables understanding diverse phenomena because it predicts qualitatively different judgments depending on which reasoning system is used. DPT has been applied successfully to diverse domains and phenomena across a wide range of fields. While heuristic processing may render some mathematics problems manageable (by reducing the number of consciously driven operations), on some occasions, it can lead to errors and bias, reducing the effectiveness of a strategic plan of resolution. Available evidence and theory suggest that a converging suite of intuitive cognitive processes facilitates and supports some common rule-based flawed strategies in the resolution of math problems, which is a central aspect of deficient mathematical performance. In this way, stereotyped errors come from the semiautomatic and insufficiently evaluated application of highly repeated S1 system heuristics for solving problems. Under most circumstances, S1 procedures lead to correct answers (e.g., linearity is a common property of many, but not all, mathematical operations), but in certain cases, they can lead to mistakes. To avoid these errors, the subjects must inhibit their semiautomatic responses to allow proper, conscious evaluation of the problem [34]. Some neuroscience research has linked response inhibition to prefrontal activity, especially in its medial zones [35]; error monitoring in general (see [36] for a detailed review) and mathematical error monitoring in particular [22] have been linked to the frontal lobes, mainly to their medial structures.

However, individual differences in the tendency to override initially flawed intuitions in reasoning analytically could be associated with different mathematical performances. In fact, elaborative processing must entail a deeper level of consciously controlled stimulus analysis. This processing is assumed to involve more effortful, analytical thought and is less likely to lead to errors and biases, although sometimes it may prove to be dysfunctional due to effects such as *paralysis by analysis*—the tendency to become overwhelmed by too much information processing.

Some attributes of the problem denominated in DPT as *natural assessments* could lead to wrong strategies and answers, because students could ignore other, less accessible, attributes of the problem, or some instructions that should be considered in the resolution.

Another possible source of errors is called *attribute substitution*. According to [32], when people try to solve a complex problem, they often substitute attributes. That is to say, an individual assesses a specific real attribute of the problem heuristically by means of another attribute, which comes to mind more easily. The real attribute is less accessible, and another, related attribute which is more available replaces the first one. This substitution is so fast that S2 monitoring functions cannot be activated. The individual does not notice that he/she is really answering another question.

The math tasks in our experiments were designed to highlight different problem resolution strategies. An intuitive approach, for example, will produce a quick and easy, yet incorrect, answer that must be analytically overridden to be correct. In every case, participants choosing different resolution strategies will at the same time choose different alternatives to answer the math problem. Appendix B presents three of the math problems for illustrative purposes. The complete list of the 20 math problems can be found as supplementary material.

##### 2.3. Preprocessing the Dataset from EEG Recording

The raw data for the training and test subsets (see the next sections) were extracted from the EEGs of a group of engineering students that were recorded while each one of them was solving a set of 20 math problems. The relevant metadata for the various participants is presented in Appendix A. These EEGs (10–10 position convention) were registered in a semidark room with a low level of environmental noise while each student was sitting in a comfortable chair. The data were recorded with the 64-channel Geodesic Sensor Net (EGI, USA) at the sampling frequency of 1000 Hz.

Since the sensors in the outer ring of the net were excluded from the analysis, because of low-quality signals, only 61 sensors were used for computations. The data were previously filtered (FIR, band-pass of 1–100 Hz), rereferenced against the common average reference, and segmented into nonoverlapping 1-s epochs using NS3 software.
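Two of the preprocessing steps named above, rereferencing against the common average and cutting the recording into nonoverlapping 1-s epochs at 1000 Hz, can be sketched as follows. This is only an illustration under assumptions: the actual processing was done in the NS3 software, the band-pass filter is omitted, and the function names and the synthetic three-channel signal are hypothetical.

```python
def common_average_reference(samples):
    """Subtract the across-channel mean from every channel at each time point.

    samples: list of time points, each a list with one value per channel.
    """
    rereferenced = []
    for frame in samples:
        mean = sum(frame) / len(frame)
        rereferenced.append([value - mean for value in frame])
    return rereferenced


def segment_epochs(samples, sampling_rate_hz=1000, epoch_seconds=1.0):
    """Split a recording into nonoverlapping epochs; a trailing partial epoch is dropped."""
    step = int(sampling_rate_hz * epoch_seconds)
    n_epochs = len(samples) // step
    return [samples[k * step:(k + 1) * step] for k in range(n_epochs)]


# Synthetic 2.5-s recording with three channels (illustration only).
recording = [[float(t), float(t) + 2.0, float(t) + 4.0] for t in range(2500)]
rereferenced = common_average_reference(recording)
epochs = segment_epochs(rereferenced)
print(len(epochs), len(epochs[0]), epochs[0][0])
```

After rereferencing, the channel values at every time point sum to zero, which is the defining property of the common average reference.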

As preliminary work for cleaning the dataset, we separated the evoked oscillatory EEG activity from the induced one [37]. To do this, the evoked EEG activity for each subject and his/her specific math problem was measured and averaged. This evoked activity was then subtracted from the total EEG activity across tests, subjects, and electrodes. The resulting difference signal was analyzed with a fast Fourier transform over long, overlapping, sliding time windows of between 5 and 10 seconds, because we did not know a priori which cognitive events would be of interest.
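The subtraction step can be sketched for a single channel as follows. The sketch assumes that the evoked activity is estimated as the sample-by-sample average over time-locked segments; how the segments were grouped in the study is not detailed here, and the function names and toy data are hypothetical.

```python
def evoked_response(trials):
    """Sample-by-sample average across time-locked segments of one channel."""
    n_trials = len(trials)
    n_samples = len(trials[0])
    return [sum(trial[t] for trial in trials) / n_trials
            for t in range(n_samples)]


def remove_evoked(trials):
    """Subtract the averaged (evoked) activity from each segment.

    What remains is treated as the induced activity.
    """
    evoked = evoked_response(trials)
    return [[trial[t] - evoked[t] for t in range(len(evoked))]
            for trial in trials]


# Toy data: three segments of three samples each (illustration only).
trials = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0]]
induced = remove_evoked(trials)
print(evoked_response(trials))
print(induced)
```

By construction, the residual segments average to zero at every sample, so only activity that is not phase-locked to the stimulus onset survives the subtraction.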

The measurement for each subject-math problem was segmented into time intervals ranging from to . In , the math problem is presented, while at , the question mark appears. The value is considered to be the baseline of before the occurrence of the problem.

##### 2.4. Constructing the Correlation and Synchronization Matrices from the Raw Dataset

As will be shown in the following sections, a new method for feature extraction from EEG signals was developed by choosing elements of the correlation or synchronization matrices. The EEG time series recorded for each participant/math problem were used to construct the correlation and synchronization matrices of the functional brain networks with rows and columns representing sensors. These matrices contain information about (linear) interdependence and long-range synchronies between EEG channels. Both types of information would be used for classification purposes. Moreover, in the case of the synchronization matrix, we would also manage information about frequency bands.

The correlation coefficient is perhaps one of the most well-known measures for (linear) interdependence between two signals $x$ and $y$:

$$ r_{xy} = \frac{1}{N} \sum_{k=1}^{N} \frac{\left(x(k) - \bar{x}\right)\left(y(k) - \bar{y}\right)}{\sigma_x \sigma_y}, $$

where $N$ is the length of the signals, $\bar{x}$ and $\bar{y}$ are the (sample) means of $x$ and $y$, respectively, and $\sigma_x^2$ and $\sigma_y^2$ are the (sample) variances of $x$ and $y$, respectively.

The correlation coefficient quantifies the linear correlation between $x$ and $y$. If $x$ and $y$ are not linearly correlated, $r_{xy}$ is close to zero; on the other hand, if both signals are identical, then $r_{xy} = 1$.

Every correlation coefficient $r_{ij}$ is a bivariate measure that serves as a coupling coefficient linking the electrode nodes $i$ and $j$. With these coefficients as entries, we construct a connectivity matrix (adjacency matrix) **Corr**, representing a functional brain network. Thus, we have a connectivity matrix composed of undirected and weighted edges consistent with the correlation coefficients. The matrix is symmetric, so for the 61 sensors used it has $61 \times 60 / 2 = 1830$ independent elements. Zeros are placed in diagonal elements.
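A minimal sketch of how such a connectivity matrix could be assembled from a set of channel time series is given below, using Pearson's correlation coefficient and zeros on the diagonal as described. The function names and the three short toy signals are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length signals.

    The 1/N factors of the covariance and the standard deviations cancel,
    so they are omitted. Constant signals (zero variance) are not handled.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_matrix(channels):
    """Symmetric matrix of pairwise correlations with zeros on the diagonal."""
    n = len(channels)
    corr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            corr[i][j] = corr[j][i] = pearson(channels[i], channels[j])
    return corr


# Toy signals standing in for three EEG channels (illustration only).
channels = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with the first
    [4.0, 3.0, 2.0, 1.0],   # perfectly anticorrelated with the first
]
corr = correlation_matrix(channels)
print(corr)
```

Only the upper triangle is computed and then mirrored, which reflects the fact that a symmetric matrix over n channels carries n(n - 1)/2 independent entries.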

In order to discover to what extent two sensor locations were synchronized, we also used the phase locking value (PLV) [18, 19]. Sample PLV is one of the most widely used measures of brain synchronization. It quantifies the phase relationship between two signals with high temporal resolution without making any statistical assumptions on the data.

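Given two time series of signals $x$ and $y$ and a frequency of interest $f$, the procedure computes a measure of phase locking between the components of $x$ and $y$ for each latency $t$ at frequency $f$. This requires the extraction of the instantaneous phase of every signal at the target frequency. The phases are calculated by convolving each signal with a complex wavelet function $G(t, f)$, whose form and parameters are chosen following [19]; that is,

$$ (x * G)(t, n) = A_x(t, n)\, e^{i \phi_x(t, n)}, $$

where $A_x$ represents the signal amplitude and $\phi_x$ the instantaneous phase in trial $n$. We define $\phi_y(t, n)$ in the same way as $\phi_x(t, n)$. Next, we can calculate the phase differences $\theta(t, n) = \phi_x(t, n) - \phi_y(t, n)$. The phase locking value is then defined at time $t$ as the average value

$$ \mathrm{PLV}_t = \frac{1}{N_{\mathrm{tr}}} \left| \sum_{n=1}^{N_{\mathrm{tr}}} e^{i \theta(t, n)} \right| $$

for all time-bins $t$ and trials $n = 1, \ldots, N_{\mathrm{tr}}$.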

In our experiments, PLV measures were normalized relative to a baseline [38]. Specifically, this was done by using the baseline before the onset of the math problem. The normalized signal was obtained by subtracting the average activity of the baseline from the raw signal and then dividing by the standard deviation of the baseline in a frequency-by-frequency manner.
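The normalization described above amounts to a z-score against the baseline, computed separately for each frequency. A minimal sketch follows; the use of the population standard deviation, the function names, and the frequencies and values in the example are assumptions made for illustration.

```python
import statistics


def baseline_normalize(values, baseline):
    """Subtract the baseline mean and divide by the baseline standard deviation.

    Assumption: the population standard deviation is used. A constant
    baseline (zero standard deviation) is not handled.
    """
    mean = statistics.mean(baseline)
    std = statistics.pstdev(baseline)
    return [(v - mean) / std for v in values]


def normalize_by_frequency(values_by_freq, baseline_by_freq):
    """Apply the normalization independently at each frequency."""
    return {freq: baseline_normalize(values_by_freq[freq], baseline_by_freq[freq])
            for freq in values_by_freq}


# Hypothetical PLV values at two frequencies (Hz), illustration only.
plv_by_freq = {6.0: [0.5, 0.7], 10.0: [0.4, 0.2]}
baseline_by_freq = {6.0: [0.2, 0.4], 10.0: [0.1, 0.3]}
normalized = normalize_by_frequency(plv_by_freq, baseline_by_freq)
print(normalized)
```

After this step the values are expressed in units of baseline standard deviations, so they are no longer confined to the range of the raw measure.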

By construction, PLV will be zero if the phases are not synchronized at all and will be one when the phase difference is in perfect, constant synchronization. The key feature of PLV is that it is only sensitive to phases, irrespective of the amplitude of each signal.
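These two limiting cases can be checked with a small sketch that computes the phase locking value directly from two series of instantaneous phases. The wavelet-based phase extraction is deliberately left out, so the phases are taken as given. The function name and the synthetic phase series are hypothetical.

```python
import cmath
import math


def phase_locking_value(phases_x, phases_y):
    """Magnitude of the average unit phasor of the phase differences.

    The average runs over whatever observations are supplied
    (for example, trials at a fixed latency).
    """
    n = len(phases_x)
    total = sum(cmath.exp(1j * (px - py)) for px, py in zip(phases_x, phases_y))
    return abs(total) / n


# Eight phase samples spread evenly over the unit circle (illustration only).
phases_x = [2.0 * math.pi * k / 8 for k in range(8)]

# Constant phase lag: the relative phase occupies a single point of the circle.
locked = phase_locking_value(phases_x, [p - 0.7 for p in phases_x])

# Fixed reference phase: the relative phase covers the circle uniformly.
unlocked = phase_locking_value(phases_x, [0.0] * 8)

print(locked, unlocked)
```

Because only unit phasors enter the sum, rescaling the amplitude of either signal would leave the result unchanged, which is the amplitude insensitivity noted above.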

From the 61 EEG channels, we computed a symmetric synchronization matrix for each participant and for each math problem within a specific frequency band. Each element $(i, j)$ of the matrix corresponds to the PLV computed for the electrode pair $i$ and $j$. The matrix is also symmetric, so it has $61 \times 60 / 2 = 1830$ independent elements and, as before, zeros are placed in diagonal elements. An illustrative example of the synchronization matrix is presented in Figure 1.
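Since these matrices are symmetric with zero diagonals, one natural way to turn a matrix into a feature vector for a classifier is to keep only the upper triangle. The sketch below illustrates this under assumptions: the function names are hypothetical, the ordering of the pairs is an arbitrary choice, and the all-zero matrix is a placeholder rather than real data.

```python
def upper_triangle_features(matrix):
    """Flatten the strictly upper triangle of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def feature_pairs(n):
    """Sensor index pairs (i, j) in the same order as the feature vector."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


# Placeholder 61 x 61 matrix, matching the number of sensors used for computations.
n_sensors = 61
placeholder = [[0.0] * n_sensors for _ in range(n_sensors)]
features = upper_triangle_features(placeholder)
pairs = feature_pairs(n_sensors)
print(len(features), pairs[0], pairs[-1])
```

Keeping the pair list alongside the feature vector makes it possible to map any feature retained by a selection procedure back to the specific sensor-to-sensor connection it represents.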