Abstract

Neuromarketing is a modern marketing research technique whereby consumers’ behavior is analyzed using neuroscientific approaches. In this work, an EEG database of consumers’ responses to image advertisements was created, processed, and studied with the goal of building predictive models that can classify consumer preference from EEG data alone. Several types of analysis were performed using three classifier algorithms, namely, SVM, KNN, and NN pattern recognition. The maximum accuracy and sensitivity values are 75.7% and 95.8%, respectively, obtained for the female subjects with the KNN classifier. In addition, the frontal region electrodes yielded the best selective channel performance. Finally, based on the obtained results, the KNN classifier is deemed the most suitable for preference classification problems. The newly created dataset and the results derived from it will help research communities conduct further studies in neuromarketing.

1. Introduction

Neuromarketing, a relatively new consumer behavior and marketing research technique, is used to understand the consumer mind, both conscious and subconscious, and to find out explicitly what triggers the “buy” button in the mind. The visual and auditory aspects of a product or marketing advertisement that influence preference formation are the key focus of neuromarketing research. Researchers are interested in finding out which regions of the brain are responsible for various stimuli, how they respond, and how conscious reasoning affects the decision-making process. Neuromarketing integrates neuroscience with marketing using non-invasive modern technologies. This is done purely through the ethical use of neuroscientific approaches, where participants sign consent forms that address the materials and their usage, avoiding any legal issues. A common tool used in neuromarketing is the electroencephalogram (EEG) for raw brain signal acquisition. Digital signal processing techniques (artifact removal, filtering, data labeling, etc.) are applied to the signal to remove irrelevant or unwanted components and to split the data into meaningful sections or data points. Features of statistical significance are generated from the data points to train and test a classification machine learning model. The model is then used to predict consumer preferences so that products that sell better can be designed and produced.

With time, neuromarketing has been adopted into several marketing campaigns of tech giants (Google, Amazon, PayPal, etc.), automobile industries (Hyundai, etc.), food commercials, etc. This technique is expected to grow more accurate in understanding and classifying consumer preference (regarding visual-auditory aspects or merchandise pricing, packaging, and advertising) and, given its enormous potential, to reach new heights and applications. One example of such an application is website design, where neuromarketing is used to make a site more user friendly as well as visually captivating.

Figure 1 visualizes how the incorporation of research with marketing paved the way for neuromarketing. Research into pupil dilation [1, 2] while watching advertisements began in the 1970s to address consumer behavior. In the 1980s, a series of studies [3–5] conducted at the University of Wisconsin-Madison investigated what patterns of EEG signals emerged while watching commercials. An article in Forbes in September 2004 highlighted several studies [6–8]. This article received significant media attention, which familiarized the scientific community with such concepts and inspired further research. Also in 2004, the well-known “Coke vs. Pepsi” study [9] was conducted. Its authors found that branding played an enormous part in determining consumer preference for soft drinks and that the absence of branding decreased the appeal of the drinks.

Companies evaluate pictures, words, sound effects, and music to produce branding campaigns that are more appealing to clients [10]. In [11], the authors suggested a fast Fourier transform-based model to identify the most favored automobile brand according to four criteria. For this purpose, EEG signals of 12 participants were collected while they watched the advertisement videos. The preprocessing stage consisted of two filters (Butterworth and Laplacian), and three statistical features were extracted from the alpha rhythm: the power spectral density (PSD), spectral energy, and spectral centroid. Finally, the subjects’ intention toward a brand was computed using K-nearest neighbor (KNN) and probabilistic neural network (PNN) classifiers, where a maximum accuracy of 96.62% was recorded using the PSD feature with the PNN classifier.

Furthermore, in [12], the authors analyzed the EEG data of 12 males and 12 females using 2-class preference classifiers. They obtained high accuracies of 84.82% and 89.36% using KNN and SVM, respectively, whereas a deep neural network (DNN) reached a maximum accuracy of 79.76%. In addition, Oon et al. [13] used an eegoSports device to collect EEG data from a group of five males and five females between the ages of 19 and 23. The subjects were each presented with a video comprising four categories of merchandise, with five advertisement videos in each category, and asked for their preference. After using a KNN classifier on the 2-class data, the authors obtained an accuracy of 80% using the alpha band feature, 76.18% using the beta band, and 72.39% using both alpha and beta as features.

The main motivation behind pursuing this research on neuromarketing is that it creates an efficient model for advertisers and marketers, removing their dependence on consumers to accurately and willingly report their emotions about a product. In addition, the method can closely tie physiological reactions to specific parts of an advertisement and provide insight into the automatic responses that take place at the subconscious level. These benefits can allow the marketing industry to save a significant amount of capital as well as reduce the environmental footprint of producing advertisements.

The scarcity of research, owing to the field’s young age, and the limited number of available datasets make this field opportune for contributions. In addition, the results obtained from this newly created dataset can contribute knowledge to the field. Furthermore, a study of the accuracies and sensitivities obtained for several statistical features of EEG signals can help identify the most relevant ones for predicting user preference.

With the goal of building predictive models that can identify consumer preference based only on EEG data, the objectives of this research were set as follows:
(i) Apply different classifier algorithms to the selected dataset.
(ii) Analyze subject-wise and overall performance, as well as selective channel performance.
(iii) Analyze predictive models for both genders.
(iv) Suggest the best classifier algorithm, among those investigated, for predicting user preference.

2. Materials and Methods

The project commenced with the selection of a database. For this purpose, the dataset from a like/dislike experiment for neuromarketing was acquired, with all due permissions, from Agency Lab (currently CCDS – Center for Computational and Data Sciences) under the Department of Computer Science and Engineering, Independent University, Bangladesh. A flowchart depicting the entire process is illustrated in Figure 2. The collected (raw) EEG data for each subject were then manually labeled to obtain the individual samples, to which the usual BCI steps were applied to achieve the outcomes of this project. These steps included signal preprocessing, artifact removal, feature extraction and selection, and finally classification and performance evaluation. Figure 3 summarizes the experimental procedure described below.

2.1. Dataset
2.1.1. Data Acquisition

A total of 49 subjects (24 males and 25 females) between the ages of 19 and 28 (mean age = 21.15), with normal or corrected vision, volunteered for this experiment. They were each shown a total of 30 image advertisements collected from the Internet. A glimpse of the image advertisements used as stimuli is shown in Figure 4. These advertisements were equally divided into three sets as follows:
(i) Products suited towards the female gender (such as female shoes).
(ii) Products suited towards the male gender (such as male shoes).
(iii) Products thought to be gender independent (such as food).

The subjects were given the option to “like” the current image or skip to the next image at their convenience. The “skip” option is considered as “not liked” and thus a “dislike.” The experiment took on average a total of 6 minutes per subject, and all subjects were right-handed. Figure 2 illustrates how the EEG datasets were created.

A 14-channel EMOTIV EPOC+ wireless headset with a sampling frequency of 128 Hz was used to record the raw EEG signals following the international standard 10–20 system of electrode montage.

The experiment data for each subject were obtained in 3 forms: (a) a CSV file that contained the raw EEG signals from all 14 channels, (b) a desktop recording that shows the time for which the images were shown and the user’s preference, and (c) a webcam video of the subject interacting with the stimulus (images).

2.1.2. Dataset Labeling

Each column of the EEG file (CSV), starting in row two, represents data collected over time. Only a few columns were significant to us and are discussed as follows. The first column, “COUNTER,” represents the packet counter and was used as a time base. The second column holds the “INTERPOLATED” flag, which shows whether a packet was dropped (value = 1); a flag of 0 means the sample is good. The 14 consecutive columns labeled AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, and AF4 represent the EEG channels of the headset, showing the voltage fluctuations over time.
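As an illustration, a subject’s CSV export could be loaded in MATLAB as sketched below; the file name is hypothetical, and the column names follow the description above.

```matlab
% Minimal sketch: load one subject's raw EEG export (hypothetical file name).
T = readtable('subject01_raw.csv');        % row 1 of the CSV holds the header

counter      = T.COUNTER;                  % packet counter, used as a time base
interpolated = T.INTERPOLATED;             % 1 = dropped (interpolated) packet, 0 = good

% The 14 EEG channels of the EMOTIV EPOC+ headset.
chanNames = {'AF3','F7','F3','FC5','T7','P7','O1', ...
             'O2','P8','T8','FC6','F4','F8','AF4'};
eeg = T{:, chanNames};                     % N-by-14 matrix of voltage samples
fs  = 128;                                 % sampling frequency (Hz)
```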

The desktop recordings were used to identify the starting and ending times of a single image. These timestamps were then used to calculate the corresponding start and end row numbers in the EEG file (using the sampling frequency). Finally, the data sample was imported into the MATLAB workspace and stored alongside the user’s preference for that image. A block diagram illustrating this labeling process for a particular subject is shown in Figure 5. For simplicity, only the first two and the last images are shown, while the intermediate ones have been omitted.

This process was repeated for every subject to label all raw EEG signals, yielding a total collection of approximately 1400 feedback responses (likes/dislikes), neglecting 70 missing responses, together with their corresponding EEG data segments.

A uniform, non-overlapping 2-second data segment for each feedback response (like/dislike) was taken into account for preference classification, as sketched below.
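A minimal sketch of this labeling and segmentation step is given below, assuming hypothetical onset times (in seconds) read from the desktop recording and the 128 Hz sampling rate; all variable names are illustrative.

```matlab
% Convert the on-screen timestamp of one image into an EEG row index
% and extract a uniform, non-overlapping 2-second segment for that response.
fs       = 128;                       % sampling frequency (Hz)
tStart   = 12.5;                      % image onset in the desktop recording (s), hypothetical
rowStart = round(tStart * fs) + 1;    % corresponding row in the EEG file

segment = eeg(rowStart : rowStart + 2*fs - 1, :);   % 2 s of data, all 14 channels

sample.data  = segment;               % stored in the MATLAB workspace
sample.label = "like";                % preference taken from the desktop recording
```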

2.2. Signal Preprocessing

The acquired EEG signal is usually contaminated with noise and artifacts generated from different sources. The objective of signal preprocessing is to eliminate (or at least attenuate) these noises and artifacts. The simplest and most widely used technique to remove both noise and artifacts is filtering. These filters are designed so as not to introduce any change or distortion to the signal of interest [14]. The Butterworth design in particular offers a maximally flat magnitude response along with a slightly delayed phase response. As Butterworth filters have a more linear phase response in the passband compared to Bessel and elliptic filters, they should offer better delay characteristics along with a smaller overshoot. To obtain a sharp transition in the frequency domain and avoid the generation of excess artifacts due to passband ripples, an IIR filter should be used. The authors in [15] have shown that IIR filters surpass FIR filters, offering a flat response characteristic as well as a shorter response time.

In this project, the DC offset of the raw EEG signal was removed by subtracting the mean value of each signal from the signal itself. A 6th-order high-pass filter with a cutoff frequency of 0.5 Hz (which produced the best results after several trials) was then used to remove very low frequency components (noise or artifacts such as breathing noise). The EEG signals were downsampled to 128 Hz; hence, following the Nyquist theorem, the maximum frequency component present in the signal is 64 Hz. As a result, an additional low-pass filter was not required to remove high-frequency noise. Finally, a notch filter (a very narrow band-stop filter) with a null frequency of 50 Hz was used to eliminate, or at least attenuate, the 50 Hz power line interference. The magnitude and phase responses, obtained in MATLAB, of the high-pass filter (0.5 Hz) and the notch filter (49 Hz–51 Hz) are shown in Figures 6 and 7, respectively.
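A minimal MATLAB sketch of this preprocessing chain is shown below; the filter orders and cutoffs follow the text, the Butterworth design is assumed from the discussion above, and zero-phase filtering with filtfilt is one possible implementation choice rather than the exact one used.

```matlab
fs = 128;                                      % sampling frequency (Hz)
x  = segment(:, 1);                            % one EEG channel (column vector)

x = x - mean(x);                               % remove the DC offset

[bh, ah] = butter(6, 0.5/(fs/2), 'high');      % 6th-order high-pass, 0.5 Hz cutoff
x = filtfilt(bh, ah, x);                       % zero-phase high-pass filtering

[bs, as] = butter(2, [49 51]/(fs/2), 'stop');  % narrow band-stop (notch) around 50 Hz
x = filtfilt(bs, as, x);                       % attenuate power line interference
```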

Artifacts are unwanted electrical signals with origins other than the brain itself. These unwanted signal components are captured by the EEG device along with the brain signal. The two chief categories of artifact are physiological/biological and non-physiological (referred to as “noise”) artifacts. Physiological artifacts take various forms including cardiac pulse, respiratory, eye blink (as well as lateral eye movement), and muscle movement artifacts [16]. In this experiment, a high-pass filter was used to remove sub-hertz frequency artifacts such as the breathing noise. In addition, a visual inspection of the raw signals for each channel was performed to estimate the contamination level and decide whether to use the channel.

2.3. Feature Extraction

A total of 27 common statistical features, chosen based on the literature, were extracted. The maximum accuracy across all subjects was obtained by selecting 18 of the 27 extracted features. A selected few that produced good results are discussed below, including a general description as well as their MATLAB implementations or interpretations.

2.3.1. Skewness

Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean, i.e., its deviation from the symmetrical bell curve of the normal distribution (assuming a unimodal distribution). When the flag is set to 1 in MATLAB, the skewness is biased, and the following equation applies:
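Writing $\bar{x}$ for the sample mean and $N$ for the number of samples (notation assumed here), the biased sample skewness takes the standard form

$$ s = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^{3}}{\left(\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^{2}}\right)^{3}}. $$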

2.3.2. Kurtosis

Kurtosis is a measure of how prone a probability distribution is to outliers; in other words, kurtosis determines the heaviness of the distribution’s tails. If the flag is configured to 1 in MATLAB, then the kurtosis is biased, giving the following equation:
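With the same notation, the biased sample kurtosis takes the standard form

$$ k = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^{4}}{\left(\frac{1}{N}\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^{2}\right)^{2}}. $$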

2.3.3. Hjorth Parameters

Hjorth Activity, Hjorth Mobility, and Hjorth Complexity are all normalized slope descriptors (NSDs) used for analyzing non-stationary signals of irregular textures.

(1) Hjorth Activity. It signifies the signal power (the variance of the signal). In the frequency domain, it represents the area under the power spectrum, according to Parseval’s theorem. The Hjorth Activity is usually estimated by the following expression:
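For a signal $y(t)$, a commonly used form (assumed here) is

$$ \mathrm{Activity} = \operatorname{var}\big(y(t)\big). $$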

(2) Hjorth Mobility. It evaluates the approximate mean frequency. The Hjorth Mobility is usually estimated by the following expression:
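Denoting the first derivative of the signal by $y'(t)$ (notation assumed), the mobility can be written as

$$ \mathrm{Mobility} = \sqrt{\frac{\operatorname{var}\big(y'(t)\big)}{\operatorname{var}\big(y(t)\big)}}. $$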

(3) Hjorth Complexity. It evaluates the change in frequency of the signal, indicating how closely the signal’s shape resembles that of a pure sine wave. The Hjorth Complexity is usually estimated by the following expression:
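With the same notation, the complexity is the ratio of the mobility of the derivative to the mobility of the signal:

$$ \mathrm{Complexity} = \frac{\mathrm{Mobility}\big(y'(t)\big)}{\mathrm{Mobility}\big(y(t)\big)}. $$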

2.3.4. Wentropy (y, “norm,” 1.1)

It first converts the signal into wavelet packets indexed by position, scale (as in wavelet packet decomposition), and frequency. With the power set to 1.1, the norm entropy is calculated as shown below, where s is the signal, (sᵢ)ᵢ refers to the orthonormal coefficients of s, and 1 ≤ p. Entropy is the average level of information or uncertainty that a random variable inherently contains in its possible outcomes.
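A standard form of this norm entropy (with the power p = 1.1 as used here) is

$$ E(s) = \sum_{i} \lvert s_i \rvert^{p}, \qquad 1 \le p. $$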

2.4. Feature Selection

To select relevant features for classification, the obtained feature data were first sorted by preference (like/dislike), and then several statistical graphs, such as box and whisker plots, scatter plots, and area plots, were visually inspected for each feature. Skewness and entropy were the best differentiators among the 27 extracted statistical features. Figures 8 and 9 demonstrate the distinguishing characteristics of the skewness and entropy features for the two preferences. Similar comparison plots were analyzed for the other features, and the most distinguishable ones, such as skewness, entropy, kurtosis, the Hjorth parameters, and band powers, were selected.

2.5. Classifiers and Classification

Since this research dealt with a binary class problem (like and dislike), several classifier algorithms commonly used in binary classification were investigated, and three were chosen as follows: (i) the support vector machine (SVM) creates a line or a hyperplane (decision boundary) that separates the dataset into two distinct classes; (ii) the K-nearest neighbor (KNN) classifier uses a distance function and the value of K to assign data to one of the two classes based on distance; and (iii) the neural network (Pattern Recognition App of MATLAB) applies random initial weights to the connections between layers, and an activation function provides an output that separates the data into two classes. A simplified training sketch is given below.
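The sketch below shows how the three classifiers could be trained programmatically in MATLAB on the selected features; X (an N-by-F feature matrix) and y (like/dislike labels) are assumed variable names, and the specific options are illustrative rather than the exact app settings.

```matlab
% X: N-by-F feature matrix, y: categorical like/dislike labels (assumed names).
svmModel = fitcsvm(X, y);                      % hyperplane (decision boundary) between the two classes
knnModel = fitcknn(X, y, 'NumNeighbors', 5);   % distance-based K-nearest neighbor

% Pattern recognition network with one hidden layer of 10 neurons,
% as in MATLAB's Neural Net Pattern Recognition app.
net = patternnet(10);
net = train(net, X', dummyvar(y)');            % inputs and targets are column-oriented
```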

Once the feature data have been preprocessed and are ready for training, they are split into training, validation, and testing sets in a ratio that depends on the availability of data and the application. This is done to ensure that the model neither overfits nor underfits and performs equally well in the real world. This split was performed within the classification-based MATLAB apps used, where the training-to-testing ratio was decided after several trials to determine the best ratio.

The training data are fed into the Classification Learner App in MATLAB, where they are used to train numerous classification models. Their performances are compared using the validation set, which is then used to optimize the models’ hyperparameters, retrain and re-evaluate the models, and consequently choose the best model parameters. Finally, the chosen model is used to predict on the test dataset, yielding a final test score that reasonably describes the model’s performance on the provided dataset.
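The holdout evaluation described above could look like the following sketch, assuming a 75/25 split (as used later for SVM and KNN) and the hyperparameter optimization that the Classification Learner App performs internally; the option values are illustrative.

```matlab
cv     = cvpartition(y, 'HoldOut', 0.25);          % 75% training, 25% held-out validation
Xtrain = X(training(cv), :);  ytrain = y(training(cv));
Xtest  = X(test(cv), :);      ytest  = y(test(cv));

% Optimizable KNN, analogous to the Classification Learner App (150 iterations in the text).
knnModel = fitcknn(Xtrain, ytrain, ...
    'OptimizeHyperparameters', 'auto', ...
    'HyperparameterOptimizationOptions', struct('MaxObjectiveEvaluations', 150));

yhat = predict(knnModel, Xtest);                   % predictions on the held-out set
C    = confusionmat(ytest, yhat);                  % confusion matrix used for the metrics below
```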

3. Results

3.1. Performance Evaluation

Ideally, a precision of 1 and a sensitivity of 1 are desired in a classifier model, implying 100% accuracy, which is rarely the case for a machine learning model. Therefore, we should aim for a high precision together with a high sensitivity value. For this project, greater significance has been attached to sensitivity than to accuracy because, in marketing research, the identification of a like is more important than the identification of a dislike. In addition, since the numbers of liked and disliked sample data points were slightly imbalanced, accuracy carries less weight than sensitivity. The performance parameters are calculated using Figure 10 as follows.

The accuracy of a classifier is the percentage of total samples correctly classified by the classifier, while the classification error is obtained by subtracting the accuracy from 1.

The classifier sensitivity (true positive rate) calculates the percentage of positives that are accurately classified.

Specificity (true negative rate) calculates the percentage of correctly identified negatives.

Precision (positive predictive value) calculates the percentage of correctly identified positives.
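In terms of the confusion-matrix entries, true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), and with “like” treated as the positive class, these measures take their usual forms:

$$ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Error} = 1 - \mathrm{Accuracy}, $$

$$ \mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{Specificity} = \frac{TN}{TN + FP}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}. $$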

Each table that follows has one row italicized to highlight the best results.

3.2. Individual Subject Accuracies

Out of the 49 subjects in this experiment, only 35 subjects were finally retained in this work, due to several reasons: (i) noisy or blank channels (showing no voltage fluctuations), (ii) software or hardware malfunctions, (iii) human errors detected in the recordings (webcam and screen recording), and (iv) imbalanced datasets. The data on these subjects, including their gender, age, number of likes/dislikes, and the corresponding accuracies, are summarized in Table 1.

Only seven subjects had an accuracy exceeding 80%. However, a high accuracy can be deceiving when the “like” and “dislike” data points are imbalanced. For example, subject 25 had a classifier accuracy of 92.6%, but the subject’s dataset contained 25 “like” data points and only 4 “dislike” data points. This indicates that the classifier lacked sufficient information due to the imbalanced dataset and most likely produced a biased classification result.

3.3. Gender-Wise Classification

Comparing the gender-wise performance shown in Table 2, the performance measures for the female subjects were comparatively higher than those for the males. In addition, the SVM classifier had the highest accuracy and sensitivity for males, at 70.7% and 83.3%, respectively, whereas KNN had the highest accuracy and sensitivity for females, at 75.7% and 95.8%, respectively.

3.4. Classification of Selective Channel Configurations

Table 3 compares the selective channel performance across all subjects. For the selected channel analysis, the highest sensitivities were seen for the frontal channels, but the highest accuracy was obtained for the frontal and occipital region for both classifiers. These results agree with the previously suggested theory that frontal regional activity is the most prominent activity for the decision-making process.

3.5. Overall Classification Performance across All Subjects

Table 4 compares the overall (male and female combined, using all 14 channels) performance measures obtained across all subjects for all classifiers. All classifiers performed relatively well and at around the same level; however, the KNN classifier had the highest accuracy and sensitivity, at 74.6% and 92.57%, respectively. The optimizable SVM and KNN classifiers, in the MATLAB Classification Learner App, were configured for 150 iterations, and the optimal hyperparameters were chosen to produce the best results. In contrast, the neural network pattern recognition app was configured with a hidden layer of 10 neurons and an output layer of 2 neurons and was retrained a minimum of 30 times to produce the best possible results, as shown below.

3.6. Confusion Matrix and Minimum Classification Error Plot for Overall Classification Performance across All Subjects

For overall performance across all subjects, the confusion matrices are shown below for each classifier.

3.6.1. SVM Classifier

The confusion matrix is shown in Figure 11, where 25% holdout validation was used and the remaining 75% of the feature matrix was used to train the optimizable SVM model. In total, 181 preferences were correctly identified out of 244 samples while validating the model, and a remarkable sensitivity was found, although the classification accuracy was around 74%, as shown in Table 4.

3.6.2. KNN Classifier

Similarly, the confusion matrix for the optimizable KNN classifier is shown in Figure 12, where the same 25% holdout method was used for calculating the performance parameters of the model. KNN identified one more “like” than SVM, and its sensitivity was slightly higher, but it did not identify “dislikes” as well.

3.6.3. Neural Net Pattern Recognition App

In Figure 13, the confusion matrix is shown for the pattern recognition-based neural network, where 15% of the feature matrix was used for validation, 15% for testing, and the remaining 70% for training.

4. Discussion

The performance measures assessed for each classifier were accuracy, sensitivity, specificity, and precision. As explained before, accuracy is the percentage of total samples that are correctly predicted. Sensitivity refers to the percentage of positives that are correctly predicted, whereas specificity is the percentage of negatives that are correctly predicted. For this research, greater significance has been attached to sensitivity than to accuracy because, in marketing research, the identification of a like is more important than that of a dislike.

The accuracies obtained with the SVM, KNN, and neural network pattern recognition classifier models are around 71% to 75%. These accuracies roughly translate to seven correct predictions out of every ten. Given the large dataset of subjects with diverse backgrounds, the classifiers performed rather well and can be considered reliable. However, sensitivity and precision are two significant parameters that should be taken into consideration for performance evaluation, as explained earlier; increased precision with a higher sensitivity value is also highly desirable. The sensitivity and precision of each classifier are as follows: 89.19% and 73.74% for SVM, 92.57% and 72.87% for KNN, and 85.98% and 71.69% for the neural network. Each sensitivity value is very high and makes up for the precision, which lags behind. These performance parameters were obtained after 150 iterations. The pattern recognition neural network took a great deal of time to train and retrain several times to produce the given accuracy.

A performance comparison between a few selected published works and our results is tabulated in Table 5. Although better performance has been reported in [17, 18], our model performs reasonably well considering that it is evaluated on our own large, self-recorded database rather than on theirs. Since the datasets differ, a direct comparison between our model and deep learning based models [19, 20] is not straightforward; still, a few comparisons are shown in Table 5.

The innovation of this study is that the database is original, large, and self-recorded, and that a 14-channel EMOTIV wireless device with 128 Hz (downsampled) EEG signals is used for a neuromarketing application. Our results are quite promising for such a large dataset recorded at a low resolution, which establishes that this type of experiment can be useful for neuromarketing research and applications. Gender-based comparisons were also presented, indicating whether gender-targeted products affect neuromarketing results. Moreover, the database itself is a contribution of this study, and obtaining strong results for such a large database with low-quality EEG signals is another major contribution of this work. Finally, another contribution of this work is the analysis of brain region based channel selection and its performance comparison for data recorded with a low number of channels (i.e., 14 here) compared to 32, 64, or even more channels.

5. Conclusion

This article presents a detailed study on the use of EEG signals for preference classification in neuromarketing. The raw EEG signals extracted from the experiment were labeled manually, where the individual images were tagged with their corresponding EEG signal durations. Several faulty datasets were identified and removed due to software glitches or hardware errors. Filters were applied to remove the DC offset and artifacts, 18 out of 27 statistical features were used with the machine learning algorithms, and their corresponding performance parameters were analyzed.

Among the three classifier algorithms, SVM had slightly better selective channel performance, while KNN had slightly better overall and gender-wise performance. Very high sensitivity values were obtained in nearly all classifications performed, with a minimum of around 86% and a maximum of around 95.8%. In addition, the accuracies indicate that both SVM (74.2%) and KNN (74.6%) are good classifiers for binary classification of a highly imbalanced dataset, with the latter being the superior one. The neural net pattern recognition classifier performed roughly the same (71.0%).

The subject-wise results showed great variation: the highest individual accuracy was 92.9% and the lowest was 57%, while the remaining subjects varied between 64% and 82%.

The gender-wise performance showed that the performance measures for the female subjects were comparatively higher than those for the males. SVM had the highest accuracy and sensitivity for males, at 70.7% and 83.3%, respectively, whereas KNN had the highest accuracy and sensitivity for females, at 75.7% and 95.8%, respectively.

Finally, the results obtained from selective channel analysis agree with the previously suggested theory that frontal regional activity is the most prominent activity for the decision-making process.

The large imbalance in the number of like and dislike samples and the inconsistent length of individual samples in the dataset were the primary limitations of this work. The huge discrepancies in data length adversely affected the statistical features extracted and limited the classifier performance.

For future work, a dataset with consistent control parameters, such as data length and age group, can be used to enhance reliability and consistency. In addition, advanced signal processing techniques such as the wavelet transform and additional sets of different features can be used for classification. Deep learning models remain unexplored, as this is the first attempt on this database and only classical machine learning algorithms were intentionally explored in this research project. Therefore, in future work, deep learning algorithms such as the convolutional neural network (CNN) and other popular models, as well as further channel selection analysis, can be used to investigate their effect on classifier performance.

Data Availability

The recorded EEG data are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Injamamul Haque Sourov and Faiyaz Alvi Ahmed contributed equally to this study.

Acknowledgments

This research was sponsored by IUB Sponsored Research Grant (#2021-SETS-07) and supported by Biomedical Instrumentation and Signal Processing Lab (BISPL) of Department of EEE, IUB, and Center for Computational and Data Sciences of Department of CSE, IUB.