Abstract

Smart devices have become part of daily life and collect data about their users through built-in sensors. Sensor data are usually physical, but mobile applications also collect other data such as device usage habits and personal interests. Collected data are usually classified as personal, and they reveal valuable information about users when analyzed and interpreted. One of the main purposes of personal data analysis is to make predictions about users. Collected data can be divided into two major categories: physical and behavioral. Behavioral data are also called neurophysical data. Both physical and neurophysical parameters are collected in this study. Physical data contain measurements of the users such as heartbeat, sleep quality, energy, and movement/mobility parameters. Neurophysical data contain keystroke patterns such as typing speed and typing errors. Users’ emotional/mood states are also investigated: six emotion-attached questions are asked daily, and users’ emotional states are graded according to the answers. Our aim is to show that there is a connection between users’ physical/neurophysical parameters and their mood/emotional conditions. To test this hypothesis, we collect and measure physical and neurophysical parameters of 15 users for 1 year. The novelty of this work is the combined use of physical and neurophysical parameters. Another novelty is that the emotion classification task is performed with both conventional machine learning algorithms and deep learning models. For this purpose, Feedforward Neural Network (FFNN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM) neural network are employed as deep learning methodologies. Multinomial Naïve Bayes (MNB), Support Vector Regression (SVR), Decision Tree (DT), Random Forest (RF), and Decision Integration Strategy (DIS) are evaluated as conventional machine learning algorithms. To the best of our knowledge, this is the first attempt to analyze the neurophysical conditions of users by evaluating deep learning models for mood analysis and enriching physical characteristics with neurophysical parameters. Experimental results demonstrate that the utilization of deep learning methodologies and the combination of physical and neurophysical parameters enhance the classification success of the system in interpreting the mood of the users. A wide range of comparative experiments shows that the proposed model exhibits noteworthy results compared to state-of-the-art studies.

1. Introduction

Intelligent and integrated devices have become an essential part of both our social and business daily life. In particular, computers, phones, tablets, sensors, and cloud services have become part of our private and public domains all around the world in the last decades. These devices gather many parameters about their users, including mobility (walking, running, and climbing) information, sleep time, and the places the users have visited. Moreover, new developments in technology integrate these devices with the bodies of their users, which makes online information about human beings accessible from all over the world [1]. Being online generally increases the tension/stress level of users and lowers their resilience against regular life/business problems. There is a strong relation between stress and mental health, and having emotional intelligence helps to manage this relationship [2]. Many countries have strong regulations for people who work in critical positions. They regularly need to take Neurophysical Testing (NPT) to ensure that their mental health is good and to avoid stress-related risks. NPT is usually applied only once or twice a year, which is not sufficient. Measuring such risks should be a part of daily operation, which can easily be realized with intelligent and integrated devices. These devices, in the form of smart watches and mobile phones, are already used by millions of people. Otherwise, user-originated mistakes may lead to situations that are difficult to compensate for. According to the Quorum Disaster Recovery Report, Q1 2013, users cause 22% of disasters, while hardware failures account for 55%, software failures for 18%, and natural disasters for only 5%. Furthermore, understanding users’ neurophysical conditions using devices has a long history. Earlier studies, as stated in Section 2, used computer keyboards to gather data about the users [3–16]. Keyboard patterns have been used in many studies, including identification and authorization. One of the first studies is presented in [17], which demonstrated in 1980 that keystroke information can be used to enforce more security in computer systems. Base keystroke patterns and some parameters derived from keystroke patterns are also used as features in our study.

The Internet of Things (IoT) can be thought of as a network of sensors. Our study creates a private IoT network. Smart devices such as smart watches and mobile phones are capable of measuring various parameters of their users, such as heart data, sleep time, mobility, blood pressure, and body temperature, by means of sensors [1]. These metrics are named and used as physical parameters in our study.

In this study, we propose and implement a system to analyze the collected parameters of the users. A custom mobile application is designed to gather information from users. The mobile application collects and records physical and neurophysical parameters from each user and transfers them to a central repository. In this work, the neurophysical parameters are keystroke parameters, while the physical parameters are heartbeat, motion, energy, and sleep. The consolidation of physical and neurophysical parameters is a novelty presented by the study. One year of data is collected from 15 users to demonstrate the effectiveness of this work. The aim of the study is to find a relation between these parameters and users’ moods.

There are many types of classifiers used in different research areas such as text categorization, image classification, and pattern recognition. Most supervised classification algorithms are reviewed and compared in [17–19]. Some of the commonly used classification algorithms include K-Nearest Neighbors (K-NN), Naïve Bayes (NB), Artificial Neural Networks (ANN), Decision Trees (DT), and Support Vector Machines (SVMs). Conventional machine learning and deep learning models are used together in this study, and one of its aims is to compare their performance. Because of both the superior performance of deep learning models in the literature and the lack of studies on the implementation of deep learning models in the mood detection field, different deep learning techniques are employed in this study. These are Feedforward Neural Network (FFNN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Long Short-Term Memory neural network (LSTM). Furthermore, Multinomial Naïve Bayes (MNB), Support Vector Regression (SVR), Decision Tree (DT), Random Forest (RF), and Decision Integration Strategy (DIS) algorithms are used as traditional machine learning algorithms. Experimental results are collected for each of the algorithms as a part of our study. We have also considered similar studies and their performances. Experimental results indicate that the consolidation of physical and neurophysical parameters and the utilization of deep learning methodologies increase the classification success of the system in understanding the emotions of the users.

Our main contributions are summarized as follows:
(i) We propose the combined use of physical and neurophysical features to predict the mood of users, which is the novelty of this study
(ii) For the purpose of detecting the mood of users, both conventional machine learning algorithms and deep learning techniques are employed, and the classification performances of the models are compared
(iii) To demonstrate the contribution of the proposed model, a customized data gathering platform is constructed, and data are collected for one year

The rest of this paper is organized as follows: Section 2 gives a summary of related work about parameters for the estimation of user behaviors and emotion analysis of the users. Section 3 describes the methodology and proposed framework. Sections 4 and 5 present experimental results and conclusions, respectively.

2. Related Work

This section gives a summary of related studies. Studies related to physical and neurophysical parameters are investigated in the literature.

Keystroke patterns are one of the parameters considered in our study. There are several methods for keystroke-based user recognition, such as minimum distance [3–6], statistical [7–9], data mining [10], and neural network [11–13] methods. These methods are employed to determine the keystroke dynamics in state-of-the-art studies. One of the main research applications of keystroke dynamics is authentication. The usability of keystroke dynamics for hardening passwords and encryption is presented in [14]. As a result of these studies, it is accepted that every user has a unique pattern while using the keyboard. This unique pattern is called TypingDNA, and it is utilized to differentiate users. The study in [14] proposes to understand users’ behaviors using Key Press Delay (KPD) and Key Stroke Delay (KSD) techniques. KPD stands for the time delay between two keyboard characters. KSD stands for the total time of one keyboard character press. The experimental results show that KPD and KSD perform well in differentiating users. Measuring KPD and KSD patterns gives researchers a clue about users’ neurophysical conditions. Each user has a base KPD/KSD value under normal conditions, but KPD/KSD values change rapidly under abnormal situations [15]. Although most keystroke pattern recognition studies aim to identify users, there are also several studies that analyze the emotions of users using keystroke patterns [16]. In this study, we have derived another parameter named Error Count (EC) in addition to the existing keystroke parameters. The Error Count (EC) parameter holds the typing errors made during keyboard usage. All 3 of these keystroke patterns are used as neurophysical parameters in this study.

Image processing techniques are also used to understand users’ behaviors. In [20], MIT researchers developed an algorithm that predicts how two people approach each other from given images. The algorithm tries to estimate the next movement of two people by analyzing the images. Patterns are recognized, and final movements such as a hug, high five, or handshake are estimated with the proposed algorithm. The approach extracts features from the images and calculates the probabilities of the final movement. Electroencephalography (EEG) devices are also utilized in some studies in the literature [21–29]. In [21], it is shown that measuring EEG signals gives an indication of human emotional conditions. Discrimination between calm, exciting positive, and exciting negative emotional states is observed when EEG is used.

Heartbeat is one of the common parameters used to differentiate user emotions. It is known that the heartbeat changes depending on environmental and biological conditions. The heart beats faster in extraordinary conditions and slows down in a relaxing environment [30]. Even listening to classical music gives an overall insight into a user’s emotional condition because of its effect on the heartbeat [30]. While early studies use electromechanical film (the EMFi chair) sensors to obtain parameters [31], recent technology is capable of obtaining these parameters using ordinary smart watches. In another study [32], Harper and Southern focus solely on a heartbeat dataset to interpret the emotions of users. For this purpose, a Bayesian algorithm is employed as the classification algorithm. They conclude that the use of the heartbeat dataset with the Bayesian algorithm achieves a classification accuracy of 90%. In [33], extensive experiments are conducted on a heartbeat dataset to identify psychoneural illnesses. Heart rate variability (HRV) is used to differentiate several emotional states which may be an indicator of illness. In another study [34], Zhu et al. conclude that the heartbeat is a good indicator for identifying disorders and that the evaluation of emotions is key to identifying emotional disorders. Heartbeat is considered as one of the physical parameters in our study.

Sleep quality is one of the parameters of this study. There is a strong relation between sleep quality and emotional condition [35]. The study in [36] demonstrates that insufficient sleep duration or poor sleep quality may cause many problems, including negative emotions, perceptual anomalies, and even paranoia. In another study [37], Garett et al. concentrate on showing the relation between social media usage and sleep quality by observing the Twitter messages of college students in order to analyze their emotional states. The experimental results of [37] demonstrate that the usage of social media is associated with sleep quality among students. Sleep quality also has similar effects at older ages. In [38], it is claimed that increasing sleep quality creates meditation effect in older adults.

Motion/energy and walking posture parameters are also considered in order to identify user emotions [39]. Walking posture is easily observable and changes when people are under stress. The study in [40] emphasizes that walking style also affects physiological states during stress and that changing the way a person walks may improve their responses to stress. Human gait analysis is also one of the popular research areas in emotion identification. In [41], Venture et al. show that it is possible to identify several emotional states such as neutral, joy, anger, sadness, and fear. The experimental results indicate that gait analysis characterizes each emotion, which is supported by the accuracy results. Motion and energy parameters are considered as physical parameters in our study.

Sentiment analysis has recently become a popular research field. There are many studies to specify and classify users’ feelings by using social media platforms [42, 43]. It is usually difficult to analyze texts for interpreting emotions. Textual information broadly includes two categories: facts and opinions [44]. Facts about entities, events, and their properties are described by objective expressions. Subjective expressions usually describe opinions about people’s sentiments, appraisals or feelings toward entities, events, and their properties [45]. In this study, the main focus is not to implement sentiment analysis to understand the mood of users. Instead of this, predefined emotion-attached questions are employed. Daily-basis questions and answers are collected with these questions. Thus, not the opinions but the facts are obtained by interacting with the users. The user answers are mapped to the sensitivities and correlated with physical and neurophysical parameters.

Our work differs from the mentioned literature studies above. There are many different types of parameters collected in order to increase the classification accuracy of the system. In this study, not only physical but also behavioral/neurophysical parameters and the combination of them are evaluated while the literature studies above usually focus on one or few parameters. Moreover, the inclusion of deep learning models in addition to the traditional machine learning algorithms boosts the classification performance of the proposed system.

3. Methodology

Figure 1 shows the basic steps of the tasks performed during this study. Data collection was one of the difficult parts of the study because few people were willing to volunteer for such a long period. Data collection started with 15 volunteers, and the dataset is composed of data gathered from these 15 users, aged 25–35, over nearly 365 days. As mentioned in the former sections, the data have two parts: physical and neurophysical. Sensors of smart watches are used to obtain the physical data. By default, smart watches take measurements and send them to mobile phones, which store all the measurements in their local database. Any application like ours can access these data and process them or send them to another location for processing. Another function of the application is to interact with the users and ask them six random questions to understand their emotional conditions. These questions are asked in the mornings and at the end of the day and take just one minute to complete. Each question has a hidden emotional value. If a user gives a positive answer to a question, it is expressed as 1, and otherwise as −1. Some sample questions are given below:
Question 1: “I believe that it is going to be a good day” (Emotion: positive)
Question 2: “There will be another boring day” (Emotion: negative)
Question 3: “I feel empty and do not have any energy” (Emotion: negative)

Each question generates a value of −1 or 1. Question 1 is loaded with positive emotion, so it returns 1 if the user agrees with the statement and −1 otherwise. Questions 2 and 3, on the other hand, are loaded with negative emotion: they generate −1 if the user agrees with the statement and 1 if the user disagrees.
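The scoring rule can be sketched in a few lines of Python. This is a minimal illustration, not the application’s actual code: the names `POSITIVE`, `NEGATIVE`, `answer_value`, and `feeling_score` are hypothetical, and the sample answers are made up. The daily feeling value is taken to be the sum of the per-question values.

```python
# Hypothetical polarity constants: +1 for a positively loaded statement,
# -1 for a negatively loaded one (illustrative names, not from the paper).
POSITIVE, NEGATIVE = 1, -1


def answer_value(polarity, agrees):
    """Return 1 or -1 for one answer.

    Agreeing with a positive statement or disagreeing with a negative
    statement counts as 1; the two remaining cases count as -1.
    """
    return polarity if agrees else -polarity


def feeling_score(answers):
    """Sum the per-question values into a single score.

    answers: list of (polarity, agrees) pairs.
    """
    return sum(answer_value(polarity, agrees) for polarity, agrees in answers)


# Made-up answers: agree with Q1, disagree with Q2, agree with Q3.
day = [(POSITIVE, True), (NEGATIVE, False), (NEGATIVE, True)]
print(feeling_score(day))  # 1 + 1 - 1 = 1
```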

Figure 2 shows the feeling attribute values, which range from −4 to 4. The feeling value is calculated as the sum of the answers to the questions; a higher value indicates a better mood. In this way, all physical and behavioral parameters and the corresponding emotional states are obtained with the proposed model. The developed mobile application collects all data and sends them to a central repository, which keeps the data with a timestamp and an anonymous user ID. Data are collected hourly. Each sensor may take multiple measurements per minute. For instance, a few heartbeat values may be recorded under regular conditions, and the number of measurements can increase when the user is in an abnormal situation such as exercising or being under heavy stress. Thus, sensors create hundreds of measurements each hour. Data are preprocessed before the machine learning stage. Heartbeat data in particular, being one of the largest data sources, need a special preprocessing stage. Our study focuses on capturing abnormal parameter values. Each user has a base heartbeat value. Our data show that approximately 90 percent of the measurements taken are at the base value, while the remaining 10 percent usually differ from it. Abnormal situations usually increase the heartbeat frequency, and the frequency decreases when the user is in relaxing conditions. This 10% is a good indicator for capturing emotional transitions. Heartbeat values are collected as base and abnormal values. Their average is calculated and recorded as the heartbeat value of the corresponding hour.

Energy, sleep, and step parameters are cumulative and supplied by smart watches. They are also recorded on an hourly basis. Each hour’s value is calculated by subtraction of the previous hour’s value.
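The conversion from cumulative readings to hourly values can be sketched as follows. This is an illustrative sketch only: the function name `hourly_from_cumulative` is hypothetical, the sample readings are made up, and the counter is assumed to start from zero at the first hour.

```python
def hourly_from_cumulative(readings):
    """Convert cumulative counter readings into per-hour amounts.

    readings[i] is the cumulative value reported at the end of hour i.
    The first hour has no predecessor, so the counter is assumed to
    start from zero (an assumption made for this sketch).
    """
    hourly = []
    previous = 0.0
    for value in readings:
        hourly.append(float(value) - previous)
        previous = float(value)
    return hourly


steps = [120, 450, 450, 1300]          # made-up cumulative step counts
print(hourly_from_cumulative(steps))   # [120.0, 330.0, 0.0, 850.0]
```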

Total time, EC, and Avg values are keyboard pattern parameters. Users periodically type a given short text on the devices. Total time spent to complete typing, average time of typing (Total time/number of characters), and typing errors are recorded as neurophysical parameters. Users are asked to type given texts at least once a day.

Data are collected on an hourly basis, and the total size of the data is 365 × 24 × 15 (days × hours × users). Before the machine learning stage, data are summarized on a daily basis by taking the average values of the parameters. One row of data per user per day stores all parameters with their summarized values. A matrix of approximately 365 × 15 rows is created for 1 year of data. This matrix contains all physical and behavioral data for the 15 users, as shown in Table 1.
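A minimal sketch of the daily summarization step, using only the Python standard library. The record layout (user ID, day, parameter dictionary), the function name `daily_averages`, and the sample values are assumptions made for illustration; the structure of the actual repository is not described here.

```python
from collections import defaultdict
from statistics import mean


def daily_averages(hourly_rows):
    """Average each parameter over the hours of a (user, day) pair.

    hourly_rows: iterable of (user_id, day, {parameter: value}) tuples.
    Returns {(user_id, day): {parameter: mean value}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for user_id, day, params in hourly_rows:
        for name, value in params.items():
            grouped[(user_id, day)][name].append(value)
    return {
        key: {name: mean(values) for name, values in params.items()}
        for key, params in grouped.items()
    }


# Two made-up hourly records for one anonymous user on one day.
rows = [
    ("u01", "day-001", {"heartbeat": 70, "steps": 100}),
    ("u01", "day-001", {"heartbeat": 80, "steps": 300}),
]
summary = daily_averages(rows)
print(summary[("u01", "day-001")])
```

Each key of the result corresponds to one row of the daily matrix.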

After collecting the physical and neurophysical data, machine learning algorithms are applied to the dataset. Experiments are carried out on a CPU using 12 threads on an Intel® Xeon® E5-2643 3.30 GHz machine. Python 3 with the PyCharm environment is employed as the programming framework. Apple iWatch and Vestel smart watch devices are used for data collection. iPhone 7 and 8 smart phone models are compatible with these smart watches and are used to gather data from them. Sensor specifications are given in [46].

The Feeling field in Table 1 is used to label the data. The dataset is randomly divided into two parts: 67% of the dataset is employed for the training process, and the remaining 33% is kept as the test data. After that, the Multinomial Naïve Bayes, Support Vector Regression, Decision Tree, Random Forest, and Decision Integration Strategy algorithms and the deep learning models (FFNN, CNN, RNN, and LSTM) are applied. After the training step, each algorithm is executed, and its classification performance is recorded.
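A random 67/33 split can be sketched with the standard library alone. The function name `split_dataset` and the fixed seed are illustrative choices for repeatability, not details taken from the study.

```python
import random


def split_dataset(rows, train_fraction=0.67, seed=0):
    """Shuffle rows and split them into training and test parts.

    The seed is an illustrative choice made here for repeatability.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = split_dataset(range(100))
print(len(train), len(test))  # 67 33
```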

A detailed description of the parameters is given in Section 3.1.

3.1. Neurophysical Parameters

The data of the study include 3 keystroke-related parameters: Total Time, Average Time, and Error Count. They are classified as neurophysical parameters.

3.1.1. Keystroke Dynamics Measurement

Keystroke dynamics is concerned with the measurement of keystroke timings. Keystroke identification is one of the main study fields of emotion analysis. There are three approaches to keystroke identification, namely, Euclidean distance measure (EDM), nonweighted probability, and weighted probability. These are the most widely used methods in the literature. The distance between pattern vectors is measured with the EDM approach, as seen in Figure 3. Assume that there are two pattern vectors R and U, where R stands for the real-time vector and U signifies the reference vector of the user. Mathematical equations of these two N-dimensional vectors can be written as follows:
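As an illustration of the EDM approach, take R = (r1, …, rN) and U = (u1, …, uN); the standard Euclidean distance between them is the square root of the sum of the squared differences (ri − ui)². The sketch below computes it for made-up timing values; the function name `euclidean_distance` and the sample vectors are hypothetical.

```python
import math


def euclidean_distance(r, u):
    """Distance between a real-time pattern R and a reference pattern U."""
    if len(r) != len(u):
        raise ValueError("pattern vectors must have the same dimension N")
    return math.sqrt(sum((ri - ui) ** 2 for ri, ui in zip(r, u)))


reference = [100.0, 120.0, 90.0]   # made-up reference timings (ms)
observed = [103.0, 116.0, 90.0]    # made-up real-time timings (ms)
print(euclidean_distance(observed, reference))  # 5.0
```

A small distance means that the observed typing pattern is close to the user’s reference pattern.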

In this study, keystroke dynamics are employed to calculate the total time of parameters. Measured KPD and KSD parameters are stored as total time parameter.

In Figure 4, <x1, x2, x3, …, xd> are biometric pattern values generated either by KSD or KPD.

Assume that KPD and KSD are N dimensional vectors which are seen in Figure 5 and expressed as follows:

Total time and average time parameters are obtained using the custom application which is created for this study. Application asks users to type some random texts and measures both parameters.

3.1.2. Error Count Measurement

Error Count (EC) is the third keystroke parameter used in the study. Typing errors are recorded while users type the given text. The expected value of Error Count is 0, and abnormal situations increase it. This parameter can be a key indicator of the emotional condition of a user. Error Count (Ec) is measured in each iteration and recalculated with the moving average method by considering previous measurement values. The calculated Error Count parameter is represented by Ψ. In formula (3), Ec indicates the error count of the current iteration, Ψec is the current calculated value of the Error Count, and Ψec−1 is the previous calculated value. Figure 6 shows how the Ec value is calculated over several iterations, where ω represents independent iterations. Iteration 1 (ω1) has 4 errors, while the others have 2 and 1. Table 2 shows the values of Ec and Ψec. Error Count is measured on a daily basis, and only one value is created for each day depending on the iteration number:
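One common form of a moving average over iterations is the running (cumulative) mean, in which each new value is the previous value corrected by the current measurement. The sketch below uses that form as an assumption; it is not necessarily identical to formula (3), and the function name `running_error_count` is hypothetical. The error counts 4, 2, and 1 are the ones mentioned for the iterations above.

```python
def running_error_count(error_counts):
    """Return the running (cumulative) average after each iteration.

    Assumed update rule: psi_k = psi_(k-1) + (ec_k - psi_(k-1)) / k.
    """
    averages = []
    psi = 0.0
    for k, ec in enumerate(error_counts, start=1):
        psi += (ec - psi) / k   # incremental mean update
        averages.append(psi)
    return averages


history = running_error_count([4, 2, 1])
print(history)  # first two values are 4.0 and 3.0, the third is about 2.33
```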

Error count parameters are obtained using the custom application which is created for this study.

3.2. Physical Parameters

There are many physical parameters which can be obtained using standard intelligent devices. Sleep quality, energy, mobility/movement, and heart pulse rates are accepted as physical parameters in this work.

3.2.1. Sleep Quality Measurement

Sleep quality is a significant component of physical and mental health as well as overall well-being. There are different indicators of sleep quality, such as time spent asleep. Although time is not the only parameter, it gives a general perspective on the quality of sleep. We considered total sleep time (Sst) as the main input for sleep quality. Table 3 lists age groups and the corresponding sleep durations recommended by the National Sleep Foundation (NSF) in 2005, which is called the Sleep Health Index (SHI). Developed after a rigorous, systematic review of the world scientific literature relating sleep duration to health, performance, and safety, SHI is the first index developed by a professional organization for age-specific recommended sleep durations [47].

Today, smart devices have the capability of measuring the sleep time. Sleep analysis is commonly used by people to understand and interpret their sleeping habits like “Time to go bed,” “In bed duration,” and “Sleep duration.” In this study, sleep quality is calculated with equation (4). Ψsq identifies the sleep quality parameter. It is a constant value for each day depending on sleep time of a user.

The sleep quality parameter is obtained using a smart watch. The smart watch measures the sleep time and sends it on a daily basis.

3.2.2. Energy Measurement

Energy is evaluated in two main parts: active energy (Eae) and resting/passive energy (Epe). Passive energy is the minimum level of energy needed to keep a human being alive and serves as the base level of energy used by the body itself. Active energy, on the other hand, is burned when a human being makes an extra effort. In this study, the total energy, which is the sum of both, is employed. Ψeq represents the total energy in formula (5):
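Written out with the symbols defined above, and reconstructed here from the verbal description as the sum of the active and passive components, the total energy reads:

```latex
\Psi_{eq} = E_{ae} + E_{pe}
```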

The energy parameter is acquired using a smart watch. The smart watch measures the energy and sends it on a daily basis.

3.2.3. Mobility/Movement Measurement

Mobility/movement (Emm) is a measure of the steps taken over a period of time. It exhibits an average value that depends on the lifestyle of each individual. This value can vary because of illness or a busy day, so it is a good indicator for distinguishing an ordinary day from an unusual one. Ψmq stands for the mobility parameter, which is basically the total step count of the corresponding day:

The mobility/movement parameter is obtained using a smart watch. The smart watch measures the step counts and sends them on a daily basis.

3.2.4. Heart Pulse Measurement

Heart pulse is a measure for heart beats for an individual and has an average value for each user. In this study, average heart beat value is defined as base heartbeat value (Hbase). All base and abnormal heartbeats (H) are measured, and the average value of heartbeats is calculated for each hour. Daily value is recalculated as average value of hourly values:
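The two-level averaging described above (measurements within an hour first, then hourly values within a day) can be sketched as follows. The function name `daily_heartbeat` and the sample values are made up for illustration, and hours without any measurement are simply skipped, which is an assumption of this sketch.

```python
from statistics import mean


def daily_heartbeat(measurements_by_hour):
    """Average within each hour first, then average the hourly values.

    Hours with no measurements are skipped (an assumption made here).
    """
    hourly = [mean(values) for values in measurements_by_hour if values]
    return mean(hourly)


# Made-up beats-per-minute samples for three hours of one day.
day_samples = [[70, 72, 74], [90, 110], [68]]
print(daily_heartbeat(day_samples))
```

Averaging per hour first prevents hours with many measurements, such as hours of exercise, from dominating the daily value.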

The heart pulse parameter is obtained using a smart watch. The smart watch measures and sends heart pulse counts on an hourly basis.

3.3. Classification

In this study, we mainly concentrate on the supervised learning approach because it suits the nature of the problem, in which each record carries a mood label. Naïve Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Artificial Neural Network (ANN) are applied to analyze the neurophysical conditions of the users.

3.3.1. Naïve Bayes Algorithm (NB)

A Naïve Bayes classifier is a very popular probabilistic machine learning model used for many classification tasks. It is based on the Bayes theorem that calculates the conditional probabilities with equation (8).
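For reference, Bayes’ theorem in its standard form, with A as the hypothesis and B as the evidence, reads:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```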

Using Bayes theorem, we find the probability of A happening given that B has occurred. Here, B is the evidence and A is the hypothesis. The Naïve Bayes algorithm assumes that the predictors/features are independent from each other. That is, there is no relationship among features. Hence, it is called naive. Bernoulli Naïve Bayes, Gaussian Naïve Bayes, multinomial Naïve Bayes, and complement Naïve Bayes are the different types of Naïve Bayes approaches. The Bayes theorem has a large application field in computer science including text classification or spam analysis [48]. In this study, we use multinomial Naïve Bayes algorithm.

3.3.2. Support Vector Regression Algorithm (SVR)

Support Vector Regression (SVR) is a regression algorithm derived from the Support Vector Machine. The theory of this approach was developed and presented by Vladimir Vapnik [49]. The main idea of SVR is very similar to linear regression, but SVR adds the margin concept. With this concept, any point in the space must be reachable from the main line plus a support vector; this sum operation is the margin concept. The way of finding the support vector depends on the dimension of the space used. In linear spaces, the formula used is

In our case, the space is 8-dimensional because there are 8 parameters. Equation (9) defines the formula of the Support Vector Machine. In this study, the radial basis function (RBF) kernel is used during computation.
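The RBF kernel measures the similarity of two feature vectors as K(x, z) = exp(−γ‖x − z‖²). The sketch below evaluates it for two made-up 8-dimensional vectors; the function name `rbf_kernel`, the value of γ, and the vectors are illustrative assumptions, not settings reported for the experiments.

```python
import math


def rbf_kernel(x, z, gamma=0.125):
    """K(x, z) = exp(-gamma * ||x - z||^2).

    gamma is a free hyperparameter; 1/8 (one over the number of
    features) is only an illustrative default chosen here.
    """
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)


a = [0.2, 0.5, 0.1, 0.9, 0.3, 0.7, 0.4, 0.6]   # made-up scaled features
b = [0.1, 0.4, 0.3, 0.8, 0.5, 0.6, 0.2, 0.9]
print(rbf_kernel(a, a))   # 1.0 for identical vectors
print(rbf_kernel(a, b))   # between 0 and 1, smaller for distant vectors
```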

3.3.3. Decision Tree Algorithm (DT)

The Decision Tree algorithm builds a classification model that uses a tree-like graph of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. A Decision Tree, or classification tree, uses a tree representation to learn a classification function which predicts the value of a dependent attribute (variable), given the values of the independent (input) attributes (variables). It is a supervised classification algorithm that divides a labeled dataset into smaller datasets while building the tree. It is proposed by Quinlan [50]. The Decision Tree algorithm breaks the dataset down into smaller subsets until the last subset contains only similar objects. During this process, the separated subsets are attached to questions which have definite answers. These questions make it possible to find the subset to which new data belong.

3.3.4. Random Forest Algorithm (RF)

Random Forests, or Random Decision Forests, are supervised learning algorithms that can be used for classification and regression. The “forest” consists of multiple decision trees. The algorithm constructs an ensemble of decision trees at training time and merges their decisions by outputting either the target class that receives the most votes from the ensemble (classification) or the mean prediction of the individual trees (regression). The Random Forest algorithm provides high accuracy and does not overfit to its training set if there are enough trees in the forest [51]. It is proposed by Korting [52]. A Random Forest is constructed by creating a series of decision trees from bootstrapped training samples. Each decision tree split is made on a random subset of features/predictors rather than on the full set of predictors. In this work, the Random Forest algorithm is employed with ten estimators.

3.3.5. Decision Integration Strategy (DIS)

Decision integration is also a machine learning paradigm in which multiple classifiers of the same or different types are trained to solve a problem. In contrast to conventional machine learning approaches, which try to learn one hypothesis from the training data, decision integration strategies try to construct a set of hypotheses and combine them. Each machine learning algorithm creates its own hypothesis, and the decision integration model generates only one final decision. Majority voting is a well-known approach and gives better results than using a single classifier. The combination of majority voting and Artificial Neural Network models is robust and efficient for classification [53]. In this work, the final decision is the class chosen by the majority of the classifiers.
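Majority voting over the outputs of several classifiers can be sketched as follows. The function name `majority_vote`, the tie-breaking rule, and the sample predictions are assumptions made for this illustration; how ties are resolved in the study is not stated.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers.

    Ties are broken in favour of the label that was seen first, which
    is an arbitrary choice made for this sketch.
    """
    return Counter(predictions).most_common(1)[0][0]


# Made-up mood labels predicted by four classifiers for one sample.
votes = [2, -1, 2, 0]
print(majority_vote(votes))  # 2
```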

3.3.6. Feedforward Neural Network (FFNN)

In [54], McCulloch and Pitts offer a computational model for neural networks, based on mathematics and algorithms and called threshold logic, which is inspired by the biology of neurons. Such a network consists of many neurons that are organized in layers. Feedforward Neural Networks are also known as multilayered networks (MLNs). In these networks, information flows forward from one layer to another: through the input nodes, then through the hidden layers (one or many), and finally through the nodes of the output layer. In an MLN, there is no feedback mechanism that carries output information back to the inputs or to previous layers. These networks are represented by a combination of many simpler models called sigmoid neurons. Multilayered networks are composed of many nodes where sigmoidal functions are preferred as activation functions. MLNs are capable of learning complex and nonlinearly separable decision boundaries from data. MLNs consist of one input layer, one output layer, and several hidden layers between them. More hidden layers allow MLNs to learn more complex, nonlinearly separable relations between the input and the output.
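The forward flow through sigmoid layers can be sketched in plain Python. The function names, the layer sizes, and the weights below are made up for illustration and do not reflect the architecture used in the experiments.

```python
import math


def sigmoid(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases):
    """One fully connected layer with sigmoid activation.

    weights[j] holds the incoming weights of neuron j.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Propagate the inputs through the layers in order, with no feedback."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Made-up weights: 2 inputs -> 2 hidden neurons -> 1 output neuron.
network = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),
    ([[1.0, -1.0]], [0.0]),
]
output = forward([1.0, 1.0], network)
print(output)
```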

3.3.7. Convolutional Neural Network (CNN)

CNNs are a special type of deep learning network [55] that often provides better performance than many other machine learning algorithms. A CNN is also a feedforward neural network, but with more hidden layers. Its better learning capability comes from hidden layers that extract features and learn representations from the data. The hidden layers consist of convolutional layers interleaved with pooling layers. The most significant building block of a CNN is the convolutional layer. Input data pass through a series of convolutional layers with filters (kernels). Convolving the data with a filter generates a feature map that indicates where the pattern encoded by the filter occurs in the data. Multiple filters are applied to the input data to obtain a stack of feature maps, which becomes the final output of the convolutional layer. The values of the filters are learned during the training process. The convolution operation captures information about local dependencies or semantics in regions of the original data. An activation function such as the rectified linear unit is then applied to the feature maps to add nonlinearity to the CNN. After the convolution process, a pooling layer reduces the number of samples in each feature map while retaining the most important information. By applying a pooling function, the pooling layer decreases the training time and reduces both the dimensionality of the data and overfitting. The most common pooling function is max pooling, which takes the largest value in a specified neighborhood window. CNN architectures contain a sequence of convolutional layers interleaved with pooling layers, followed by a number of fully connected layers.
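To make the convolution and pooling operations concrete, the following sketch applies one filter, a rectified linear unit, and max pooling to a short one-dimensional signal. The signal and filter values are invented for illustration. As in most deep learning libraries, the "convolution" is computed as a sliding dot product without flipping the filter, and no padding is used.

```python
def conv1d(signal, kernel):
    """Slide the kernel over the signal (no padding, stride 1)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


signal = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0]   # hypothetical input sequence
kernel = [1.0, 0.0, -1.0]                 # hypothetical filter of size 3
feature_map = relu(conv1d(signal, kernel))
pooled = max_pool1d(feature_map, size=2)
print(feature_map, pooled)
```

In a trained CNN the filter values are learned rather than fixed by hand, and many filters are applied in parallel to produce a stack of feature maps.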

3.3.8. Recurrent Neural Network (RNN)

RNNs are powerful tools for modelling sequential data. The hidden state of an RNN is a function of all previous hidden states [22, 56]. The output of an RNN therefore depends not only on the current input but also on the past information retained in the hidden states. RNNs differ from feedforward networks such as CNNs in their feedback loops, which enable information about past decisions to be kept in the network. RNNs suffer from the vanishing and exploding gradient problems. When there are long-term dependencies in sequence data, RNNs have difficulty learning from data far back in the sequence. These problems occur during training with gradient descent, when the gradients are propagated back through many layers. Because of the repeated matrix multiplications across layers, small values shrink exponentially and vanish. This is called the vanishing gradient problem, and it makes it practically impossible for the model to learn from data further back in a sequence. Conversely, large weight values grow even larger under repeated matrix multiplication, eventually become NaN during training, and prevent the model from learning. This is called the exploding gradient problem. To deal with these problems, several methods such as gradient clipping or appropriate activation functions can be used.
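The exponential shrinking and growth of gradients, and the effect of gradient clipping, can be illustrated with a deliberately simplified scalar example. The sketch assumes a linear recurrent unit with a single recurrent weight, so the gradient is scaled by that weight once per time step. The weight values, the number of steps, and the clipping threshold are arbitrary.

```python
import math


def backprop_factor(recurrent_weight, steps):
    """Factor by which a gradient is scaled after `steps` time steps
    in a scalar linear recurrent unit (a simplifying assumption)."""
    return abs(recurrent_weight) ** steps


def clip_by_norm(gradient, max_norm):
    """Rescale the gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm:
        return list(gradient)
    scale = max_norm / norm
    return [g * scale for g in gradient]


vanishing = backprop_factor(0.5, 50)   # shrinks towards zero
exploding = backprop_factor(1.5, 50)   # grows without bound
clipped = clip_by_norm([30.0, 40.0], max_norm=5.0)
print(vanishing, exploding, clipped)
```

Clipping bounds the size of an update but preserves its direction, which is why it addresses exploding gradients but not vanishing ones.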

3.3.9. Long Short-Term Memory Network (LSTM)

Another popular and widely used solution to the vanishing/exploding gradient problems of RNNs is the Long Short-Term Memory network (LSTM), which is a variant of the RNN [57, 58]. An LSTM preserves the error as it is backpropagated through deeper layers, so that the network can continue to learn over many time steps. LSTMs are designed to capture long-distance dependencies within sequence data. They hold the contextual semantics of information and store long-range dependencies between data items. To do so, an LSTM employs special memory cells, or units, that store information over a long-range context. Each LSTM unit includes input, forget, and output gates that control which portions of the information to remember, to forget, and to pass on to the next step. The LSTM unit thus decides what to store and when to permit reading, writing, and deleting, by means of gates that pass or block information through the unit.
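The role of the three gates can be sketched for a single LSTM unit with scalar input and state. This follows the standard LSTM formulation; the parameter names (`wi`, `ui`, `bi`, and so on) and their values are hypothetical and chosen only to keep the example short.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, used by the gates to produce values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single LSTM unit with scalar input and state."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate: what to write
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate: what to keep
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate: what to expose
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate cell content
    c = f * c_prev + i * g       # updated memory cell
    h = o * math.tanh(c)         # hidden state passed to the next step
    return h, c


# Hypothetical parameters; in practice they are learned during training.
names = ("wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg")
params = {name: 0.1 for name in names}

h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:      # a short illustrative input sequence
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

Because the cell state is updated additively (`f * c_prev + i * g`) rather than by repeated multiplication alone, the error signal can be carried across many time steps.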

4. Experiment Results

In this study, comprehensive experiments are carried out to analyze the neurophysical conditions of users with conventional machine learning algorithms and deep learning models. To demonstrate the contribution of our work, accuracy, precision, recall, and F-measure are employed as evaluation metrics. We employ 67% of the dataset for the training process, and the remaining part is the test data. The widely used 5 × 2 cross-validation approach, which produces 10 folds in total, is applied to the dataset. In this approach, the dataset is divided randomly into two halves. One half is used for training and the other for testing, and then the roles are swapped. This procedure is repeated five times. Hence, 10 estimates of testing accuracy and of the other measures are obtained for each dataset and each model. The accuracy results reported in the tables are the averages of these 10 estimates for all models.
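The 5 × 2 procedure described above can be sketched as follows. The function names, the random seed, and the placeholder scoring function are assumptions for this illustration; in the experiments the scoring function would train a model on the training indices and return a measure such as accuracy on the test indices.

```python
import random


def five_by_two_splits(n_samples, repetitions=5, seed=0):
    """Return 10 (train, test) index pairs: 5 random halvings, each used both ways."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    splits = []
    for _ in range(repetitions):
        rng.shuffle(indices)
        half = n_samples // 2
        first, second = indices[:half], indices[half:]
        splits.append((list(first), list(second)))   # train on first half, test on second
        splits.append((list(second), list(first)))   # and vice versa
    return splits


def five_by_two_cv(n_samples, evaluate, seed=0):
    """Average the score returned by `evaluate(train, test)` over the 10 folds."""
    scores = [evaluate(train, test)
              for train, test in five_by_two_splits(n_samples, seed=seed)]
    return sum(scores) / len(scores), scores


# Placeholder scoring function (NOT a real model): fraction of samples in the test half.
n = 365
mean_score, scores = five_by_two_cv(n, lambda train, test: len(test) / n)
print(len(scores), mean_score)
```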

In the deep learning part, Adam optimization [25] with a learning rate of 0.0001 is used. The hyperbolic tangent is employed as the nonlinear activation function in the experiments. To alleviate the problem of overfitting, dropout [26] is applied with a rate of 0.5. The number of layers is set to 7, and the number of epochs is set to 100. Among the deep learning algorithms, the CNN, RNN, and LSTM models have 7 layers. In the CNN, the first layer is an input layer. The second layer is a convolutional layer with 64 filters whose kernel size is 3. The third layer is a max-pooling layer whose pool size is 2. The fourth and fifth layers have the same characteristics as the second (convolutional) and third (max-pooling) layers, respectively. The sixth layer is a fully connected layer with 16 neurons. The last layer is the softmax output layer with 5 neurons. In the RNN model, the first layer is an input layer. It is followed by 2 simple RNN layers, each with 32 RNN cells, and then by 2 time-distributed dense layers for 5-class classification. The dropout [26] rate on the fully connected layers is set to 0.5. The hyperbolic tangent activation function is used at the last layer. In the LSTM model, the first layer is an input layer. It is followed by 2 LSTM layers, each with 32 LSTM cells, and then by 2 time-distributed dense layers for 5-class classification. The dropout rate on the fully connected layers is set to 0.5. The hyperbolic tangent activation function is used at the last layer. For all deep learning models, the library described by Moolayil [27] is used in the experiments.

For the conventional classification algorithms, the RBF (radial basis function) kernel is used in SVR. In DT, the "criterion" parameter is set to mean-squared error. In RF, the "n_estimators" parameter is set to 4, which creates 4 DTs within the RF algorithm. In MNB, the parameters are set as alpha = 1.0, class_prior = None, fit_prior = True. The other parameters of each conventional classification algorithm are left at their default values as predefined in the sklearn library.
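To visualise the seven-layer CNN described above, the following sketch traces the output shape of each layer. The input length of 20 is a purely hypothetical value, and the sketch assumes no padding, a convolution stride of 1, and a pooling stride equal to the pool size; none of these details are stated in the text, so the printed shapes are illustrative only.

```python
def conv_output_length(length, kernel_size):
    """Output length of a convolution with no padding and stride 1 (assumed)."""
    return length - kernel_size + 1


def pool_output_length(length, pool_size):
    """Output length of non-overlapping pooling (stride equal to pool size, assumed)."""
    return length // pool_size


def trace_cnn_shapes(input_length, filters=64, kernel_size=3, pool_size=2,
                     dense_units=16, classes=5):
    """List (layer name, output shape) for the 7-layer CNN described in the text."""
    shapes = [("input", (input_length, 1))]
    length = input_length
    for block in (1, 2):
        length = conv_output_length(length, kernel_size)
        shapes.append((f"conv{block}", (length, filters)))
        length = pool_output_length(length, pool_size)
        shapes.append((f"maxpool{block}", (length, filters)))
    shapes.append(("dense", (dense_units,)))      # fully connected layer
    shapes.append(("softmax", (classes,)))        # 5-class output layer
    return shapes


# Hypothetical input of 20 values per sample; the real feature count may differ.
for name, shape in trace_cnn_shapes(20):
    print(name, shape)
```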

The abbreviations used in the tables and explanations for the traditional machine learning algorithms and deep learning models are as follows: MNB: Multinomial Naïve Bayes, RF: Random Forest, SVR: Support Vector Regression, DT: Decision Tree, DIS: Decision Integration Strategy, FFNN: Feedforward Neural Network, CNN: Convolutional Neural Network, RNN: Recurrent Neural Network, LSTM: Long Short-Term Memory neural network. The best accuracy result obtained for each user is indicated in boldface.

As a first step, the classification performances of the conventional machine learning algorithms and the deep learning methods are analyzed for each user, as given in Tables 4 and 5, respectively. Experiments are carried out with two different approaches. In the first approach, the data of each user are isolated and evaluated by the algorithms separately. One row of data is created for each user each day, which gives 365 rows per user. Each row records the physical and neurophysical parameters of the user for that day. In summary, in the first approach the training and test sets of each user are obtained from that user's own data and are processed independently of the other users.

Table 4 presents the accuracy percentages of the conventional machine learning algorithms and the Decision Integration Strategy for each user. DT is clearly the best performing machine learning algorithm, with a mean classification accuracy of 77.87%. DT also outperforms DIS, which remains competitive with 72.18% accuracy. They are followed by RF with 60.02%, SVR with 41.71%, and MNB with 28.25%. MNB thus exhibits the poorest classification performance. Fundamentally, the Naïve Bayes algorithm assumes that all features are conditionally independent. When the features depend on each other, this assumption is violated and the algorithm may give poor results. In this work, the collected parameters, especially the physical ones, depend on each other, and we consider this dependency to be the reason behind the poor performance of MNB. DT, which does not rely on an independence assumption, is the best performer on these dependent features. As listed in Table 4, the success order of the classifiers is as follows: DT > DIS > RF > SVR > MNB.

Table 5 gives the accuracy percentages of the deep learning models for each user. By implementing FFNN, CNN, RNN, and LSTM, we intend to demonstrate the classification capability of deep neural networks and to compare their performance with that of the traditional FFNN. Among the deep learning methods, CNN achieves the best classification success with 79.06% mean accuracy, while FFNN exhibits the poorest performance with 69.03%. LSTM provides an improvement of approximately 7% over FFNN, and RNN provides an improvement of nearly 6%. As seen in Table 5, the performance order of the neural network models is as follows: CNN > LSTM > RNN > FFNN. From a wider perspective, the classification performances of both deep learning models and conventional machine learning techniques are ordered as follows: CNN > DT > LSTM > RNN > DIS > FFNN > RF > SVR > MNB. The best and worst classification performances overall are obtained by CNN with 79.06% accuracy and MNB with 28.25% accuracy, respectively. Surprisingly, DT is quite competitive with the deep learning models, with 77.87% mean accuracy. It is followed by LSTM with 76.12%, RNN with 74.86%, DIS with 72.18%, FFNN with 69.03%, RF with 60.02%, SVR with 41.71%, and MNB with 28.25% mean accuracy.

In the second approach, it is assumed that all users share similar physical/neurophysical and emotional characteristics, without any difference among users. To test this assumption, the data of all users are combined into a single large dataset. The dataset is then divided into two parts as follows: 33% of the dataset is the training data and the remaining part is the test data. Table 6 presents the accuracy, F-measure, precision, and recall results obtained on the combined data with both traditional machine learning algorithms and deep learning models. When all user data are used instead of single-user data, the classification success of the deep learning models and conventional machine learning techniques is ordered as follows: CNN > DT > LSTM > RNN > DIS > FFNN > RF > SVR > MNB. CNN outperforms the other algorithms with 84.31% accuracy. CNN, DT, and RNN each exhibit approximately 5% improvement compared to the first approach. SVR shows the largest increase in classification success, 16%, while RF shows the smallest, nearly 2%. Even MNB, whose classification success is the worst, improves by nearly 3%. FFNN and LSTM improve by nearly 4%, and DIS by nearly 6%, relative to the first approach. As listed in Table 6, DT is quite competitive with the deep learning algorithms, with 82.03% accuracy. It can also be concluded that, with the exception of DT, the top-ranked classifiers are deep learning models, and that DIS is the only other conventional approach that outperforms a deep learning model (FFNN). Surprisingly, DIS and RF exhibit poor classification performance when compared with the deep learning algorithms.
As observed in Table 6, the conventional machine learning algorithms MNB and SVR are at a disadvantage because of their low classification accuracies. Moreover, CNN exhibits the best classification success for all evaluation metrics in Table 6. It is important to emphasize that its precision of 82.57%, the highest among all machine learning algorithms and deep learning models, supports the accuracy result of the CNN model. The precision of the CNN model is remarkable, especially when compared to the other deep learning models. This suggests that the CNN model is the more suitable choice for constructing a classification model for mood detection. Even compared to the second best precision (DT), the CNN algorithm is between 3% and 4% better. The other evaluation metrics show similar results. Compared to DT, the second most successful model, CNN provides a 2% improvement in both accuracy and F-measure, and it shows a 1% improvement in recall over DIS, the second most successful model for that metric. In summary, Table 6 shows that the CNN model improves the system performance for mood detection regardless of which evaluation metric is used.
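For reference, the four evaluation metrics can be computed for a multiclass problem as sketched below. The sketch uses macro-averaging, that is, precision, recall, and F-measure are computed per class and then averaged with equal weight; this averaging scheme is an assumption, since it is not specified for Table 6. The label vectors are invented toy data.

```python
def evaluate(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F-measure."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f_measures = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f_measures.append(f_measure)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_measures) / n,
    }


# Toy example with three mood classes (hypothetical labels).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
report = evaluate(y_true, y_pred)
print(report)
```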

Table 7 shows a comparison with related studies. We evaluated a selection of studies that use conventional machine learning algorithms and deep learning algorithms. Each row represents a study. The Resource field in Table 7 indicates the type of parameters used in the study. The corresponding algorithm and accuracy value are also given for each study. Since several different algorithms are used in the related studies, Table 7 also gives a general perspective on the performance of the algorithms. Linear Discriminant Analysis (LDA), Statistical Analysis, Support Vector Machine (SVM) algorithms, Naïve Bayes (NB), K-Nearest Neighbors (K-NN), and Artificial Neural Network (ANN)/Deep Learning- (DL-) based algorithms are selected for comparison.

The novelty of our study is the use of neurophysical parameters together with physical parameters. The neurophysical parameters are derived from keystroke patterns. Most of the studies in the literature use physical parameters rather than neurophysical parameters, because neurophysical parameters are volatile and difficult to measure. We believe that keystroke patterns and emotion-embedded questions offer a way to overcome this difficulty. The comparison of traditional machine learning algorithms and deep learning algorithms is another novelty of our study. Tables 4 and 5 show the accuracy of each algorithm on the same dataset, and Table 7 compares our results with those of other studies.

The last two rows in Table 7 show the performance of the conventional machine learning and deep learning algorithms in our study. A graphical presentation of the accuracy values is also given in Figure 7.

5. Conclusion

In this study, we developed a model for the emotional analysis of users by collecting many parameters from sensors. These parameters are grouped as physical and neurophysical. Both the physical and the neurophysical information are obtained from the sensors of mobile devices and the keyboard patterns of users. The data collection period was long, and the users were trained before the data collection stage. Keystroke pattern-based measurements are fed to the system as neurophysical parameters, while sleep quality, energy, mobility/movement, and heart pulse are fed as physical parameters. Conventional machine learning algorithms and deep learning methodologies are used to classify the emotion/mood of the users. Deep learning techniques have the capability to solve a given problem end to end without many preprocessing steps, whereas traditional machine learning techniques need preprocessing steps that first break the problem down into subparts and then process the results of the subparts at a later stage. Another advantage of deep learning is its better capability to learn from data compared with traditional machine learning algorithms, which need a domain expert and preparation before they can be applied. As stated in the previous section, we observed the best results with the CNN deep learning algorithm.

To the best of our knowledge, this is the first attempt to analyze the emotions of users by using their neurophysical information enriched with their physical information. To predict the emotions of the users, Feedforward Neural Network (FFNN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Long Short-Term Memory neural network (LSTM) are employed as deep learning methodologies, while Multinomial Naïve Bayes (MNB), Support Vector Regression (SVR), Decision Tree (DT), Random Forest (RF), and Decision Integration Strategy (DIS) are applied as conventional machine learning algorithms. To analyze the proposed model, a dataset is gathered from smart devices and from the keyboard timing information of users. The experimental results demonstrate that the use of deep learning methodologies and the combination of both physical and neurophysical parameters enhance the classification success of the system in predicting the sensitivity/mood or emotional state of users. In particular, CNN exhibits outstanding classification success among the evaluated models. We conclude that the physical and neurophysical parameters are strongly linked to users' emotional conditions. When emotions change, the user's body reacts to the new situation, and this change can be observed by measuring physical/neurophysical parameters. The experimental results also confirm this connection between physical/neurophysical parameters and emotional conditions.

Despite its significant contributions, the proposed framework has some limitations. First, the hard part of this study was finding subjects, because few people were willing to take on the responsibility of being a subject in a one-year study. Only 15 volunteers from academic and business circles could be found, and with difficulty. Second, training the subjects was a difficult process. It took a month for the subjects to acquire the habit of using the given application. Because of the small size of the dataset, a CPU is used in the experiments. The use of a GPU can be considered as an option when the number of users increases. As future work, we plan to enrich this study with sentiment analysis of the users, based on their texts on social media platforms. In addition, we intend to apply a Decision Integration Strategy to the deep learning methodologies in order to compare it with the Decision Integration Strategy of the conventional machine learning algorithms.

Data Availability

The data used in this study were collected from users as a part of the study. They can be shared upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.