#### Abstract

High blood pressure (BP) is associated with an increased risk of cardiovascular diseases. Therefore, high precision in the measurement of BP is important in clinical and research studies. In this work, anthropometric characteristics including age, height, weight, body mass index (BMI), and arm circumference (AC) were used as independent predictor variables for the prediction of BP reactivity to talking. Principal component analysis (PCA) was fused with artificial neural network (ANN), adaptive neurofuzzy inference system (ANFIS), and least square-support vector machine (LS-SVM) models to remove the multicollinearity effect among anthropometric predictor variables. The statistical tests in terms of coefficient of determination ($R^2$), root mean square error (RMSE), and mean absolute percentage error (MAPE) revealed that the PCA based LS-SVM (PCA-LS-SVM) model produced a more efficient prediction of BP reactivity as compared to other models. This assessment presents the importance and advantages posed by PCA fused prediction models for prediction of biological variables.

#### 1. Introduction

Accurate measurement of BP is essential in epidemiological studies, in screening programmes, in research studies, and in clinical practice to classify individuals, to ascertain hypertension related risks (coronary heart disease, stroke, and kidney failure), and to guide management. Recommendations of several international organisations including the American Heart Association (AHA) [1], British Hypertension Society (BHS) [2], and European Society of Hypertension (ESH) [3] indicate that the accuracy of BP measurements is highly associated with the conditions in which the measurements are taken. The observer should be aware of the considerable variability that may occur in BP due to various factors. It is not always feasible to control all of these factors, but their effect can be minimized by taking them into account in reaching a decision [3].

In clinical practice, talking is one of the most common measurement disturbances influencing BP measurement accuracy [4]. It can contribute to an elevated BP reading, termed BP reactivity to talking, which may result in the misdiagnosis of hypertension or in overestimation of the severity of hypertension and may lead to overly aggressive therapy. Antihypertensive treatment may be unnecessary in the absence of concurrent cardiovascular risk factors [5].

In the past few years, several studies have quantified the effect of talking on BP. Zheng et al. [6] measured BP in healthy subjects under five different conditions including resting, deeper breathing, talking, and head and arm movement and showed that SBP and DBP changed significantly in comparison to the resting condition. Le Pailleur et al. [7] observed a sharp and significant increase in SBP and DBP of hypertensive subjects while talking. Le Pailleur et al. [8] showed an instantaneous rise in SBP and DBP of treated and untreated hypertensive patients during a period of stress talking and a period of counting aloud (active periods).

Zheng et al. [9] demonstrated significantly higher manual and automated MAPs with talking in healthy subjects. Lynch et al. [4] reported that verbal activity is consistently associated with marked elevations in both normotensive and hypertensive subjects. Tardy et al. [10] demonstrated that the talking state increased BP as compared to the resting state of the subjects. Lynch et al. [11] described a sudden extreme drop in blood pressure in both experimental and clinical situations when a person is talking about or describing situations of hopelessness and helplessness. Long et al. [12] showed a statistically significant increase in BP when speaking compared to when quiet. Hellmann and Grimm [13] investigated the effect of talking on subjects with one previous diastolic blood pressure reading of 90 mm Hg or more who were not taking antihypertensive medicines. Blood pressure increased significantly under both talking conditions (reading neutral material for part of the procedure and reading neutral material continuously).

Epidemiological studies from different populations have reported a significant correlation between BP and anthropometric characteristics [14–16]. Therefore, anthropometric variables should be considered to attain an accurate measurement of BP. However, multicollinearity between anthropometric predictor variables has also been reported, which may result in “overfitting” of the prediction model [17–19]. One approach to dealing with multicollinearity is to use PCA, a statistical approach. By using PCA, the original data set can be transformed into principal components (PCs) that are orthogonal and are able to explain the maximal variance of the data without losing any information [20, 21].

Soft computing covers computational techniques that offer somewhat “inexact” solutions of very complex problems through modeling and analysis with a tolerance of imprecision, uncertainty, partial truth, and approximation. The successful applications of soft computing approaches in biomedical studies suggest that the impact of soft computing will be felt increasingly in the coming years.

The fusion of a statistical and soft computing approach usually improves the training speed, enhances the robustness of the model, and reduces the calibration error. These models may aid the clinicians in the decision-making process regarding clinical admission, early prevention, early clinical diagnosis, and application of clinical therapies. In this sense, this paper focuses on the development of PCA based soft computing approaches for prediction of BP reactivity to talking, which include the conventional statistical method of PCA for data preprocessing. We developed PCA based ANN (PCA-ANN), PCA based ANFIS (PCA-ANFIS), and PCA-LS-SVM models for prediction of BP reactivity to talking in normotensive and hypertensive subjects. The prediction accuracy of the developed models was assessed and compared using statistical indices including coefficient of determination ($R^2$), root mean square error (RMSE), and mean absolute percentage error (MAPE) to select the model that most accurately predicts the BP reactivity.

The rest of the paper is structured as follows. In Section 2, we present the details of data collection. Section 3 deals with the experimental approaches used for data analysis. Section 4 deals with the summary of results obtained. Section 5 describes the discussion and Section 6 concludes with future directions of work.

#### 2. Data Collection

A total of 40 normotensive and 30 hypertensive female subjects among the students, staff, and faculty of Sant Longowal Institute of Engineering and Technology (Deemed University), Longowal, District Sangrur, Punjab, India, volunteered for this study. Eligible participants had to be over 18 years of age. We excluded the subjects who were pregnant and who had arrhythmias. The institutional research committee approved the research protocol and all participants gave written informed consent before participation.

A standard questionnaire was administered to collect information on anthropometric characteristics of the participants. A separate room was used to conduct the study. This ensured minimal interference within the room while the tests were being carried out. The observers involved in the study were trained using the BHS’s BP measurement training materials [22].

To eliminate observer bias, BP was measured using a clinically validated (under standardized measurement conditions), newly purchased, and fully automated sphygmomanometer OMRON HEM-7203 (OMRON HEALTHCARE Co., Ltd., Kyoto, JAPAN) that uses the oscillometric method of measurement. The BP monitor is available with a small cuff (17–22 cm), medium cuff (22–32 cm), and large cuff (32–42 cm). The appropriate size of cuff was determined from the mid-arm circumference of the subject.

Subjects were advised to avoid alcohol, cigarette smoking, coffee/tea intake, and exercise for at least 30 minutes prior to their BP measurement. They were instructed to empty their bladder and to sit upright with elbows on the table, back supported, and feet flat on the ground, because these are potential confounding factors. Moreover, they were asked not to talk or move during measurement [1].

After a rest period of 5 minutes [1], four measurements were taken at intervals of one minute. The first measurement was discarded and the average of the last three measurements was taken into account. Subsequently, the same measurement protocol was repeated during a talking phase in which the observer asked each subject to “tell me about your work in detail” [4]. During the talking phase, the observer talked to the subject only to maintain the flow of conversation, making every possible effort to talk minimally. To improve the reliability of measurements, the subjects were examined for a week [3].

#### 3. Experimental Methods

Data were expressed as mean ± SD. A paired $t$-test was used to assess the difference between measurements of resting and talking conditions.
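As an illustration of the paired comparison, the following minimal sketch computes the paired $t$ statistic from the per-subject differences between talking and resting readings. The function name and the readings are hypothetical and are not data from this study; a $p$ value would additionally require the $t$ distribution with $n - 1$ degrees of freedom.

```python
import math

def paired_t_statistic(resting, talking):
    """Return (t, degrees of freedom) for the paired differences talking - resting."""
    if len(resting) != len(talking) or len(resting) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [t - r for r, t in zip(resting, talking)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if var_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_diff / math.sqrt(var_diff / n), n - 1

# Hypothetical SBP readings in mm Hg (illustration only).
resting = [118.0, 122.0, 115.0, 130.0, 125.0]
talking = [124.0, 129.0, 118.0, 138.0, 131.0]
t_stat, dof = paired_t_statistic(resting, talking)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```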

##### 3.1. PCA

First, Bartlett’s test of sphericity [23] and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy [24] were applied to check the suitability of the data for application of PCA.

Bartlett’s test of sphericity tests the null hypothesis that the correlation matrix is an identity matrix, that is, that there is no relationship between the predictor variables. Consider

$$\chi^2 = -\left[(n - 1) - \frac{2p + 5}{6}\right] \ln |R|,$$

where $\chi^2$ is chi-square, $n$ is sample size, $p$ is number of predictor variables, $\ln$ is natural log, and $|R|$ is determinant of the correlation matrix.
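The statistic can be computed directly from a correlation matrix. The sketch below assumes the standard form of Bartlett's statistic, $\chi^2 = -[(n-1) - (2p+5)/6]\ln|R|$, with $p(p-1)/2$ degrees of freedom (a standard result that is not stated in the text). The helper names and the example matrix are hypothetical.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def bartlett_sphericity(corr, n_samples):
    """Return (chi-square, degrees of freedom) for correlation matrix corr."""
    p = len(corr)
    # math.log raises ValueError if the determinant is not positive.
    chi2 = -((n_samples - 1) - (2 * p + 5) / 6.0) * math.log(determinant(corr))
    return chi2, p * (p - 1) // 2

# Hypothetical 2 x 2 correlation matrix for 40 subjects.
chi2, dof = bartlett_sphericity([[1.0, 0.8], [0.8, 1.0]], 40)
print(f"chi-square = {chi2:.2f}, df = {dof}")
```

An identity correlation matrix gives a determinant of 1 and therefore a statistic of zero, which is the null hypothesis case.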

KMO compares the magnitude of calculated correlation coefficients and partial correlation coefficients. The formula for KMO is given as follows:

$$\mathrm{KMO} = \frac{\sum\sum_{i \neq j} r_{ij}^2}{\sum\sum_{i \neq j} r_{ij}^2 + \sum\sum_{i \neq j} a_{ij}^2},$$

where $\sum\sum_{i \neq j}$ is the sum over all variables in the matrix when variable $i \neq j$, $r_{ij}$ is the Pearson correlation coefficient between variables $i$ and $j$, and $a_{ij}$ is the partial correlation coefficient between variables $i$ and $j$.

The KMO index ranges from 0 to 1 and should be greater than 0.6 for PCA to be considered appropriate.
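A minimal sketch of the KMO computation follows. It assumes the usual way of obtaining partial correlations from the inverse $P = R^{-1}$ of the correlation matrix, $a_{ij} = -P_{ij}/\sqrt{P_{ii}P_{jj}}$, which the text does not spell out. Function names and the example matrix are hypothetical.

```python
def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def kmo(corr):
    """Overall KMO index; corr must have at least one nonzero off-diagonal entry."""
    p = len(corr)
    inv = invert(corr)
    sum_r2 = 0.0
    sum_a2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                # Partial correlation between i and j given the other variables.
                partial = -inv[i][j] / (inv[i][i] * inv[j][j]) ** 0.5
                sum_r2 += corr[i][j] ** 2
                sum_a2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_a2)

# Hypothetical correlation matrix of three anthropometric variables.
corr = [[1.0, 0.7, 0.6],
        [0.7, 1.0, 0.5],
        [0.6, 0.5, 1.0]]
kmo_value = kmo(corr)
print(f"KMO = {kmo_value:.3f}")
```

With only two variables there is nothing to partial out, so the partial correlation equals the ordinary correlation and the index is exactly 0.5.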

PCA is a multivariable statistical analysis technique. The objective of PCA is to remove the multicollinearity problem and reduce the number of predictor variables and transform them into PCs which are independent linear combinations of the original data set and account for the maximum possible variance of the original data set so that adequate information from the original data set can be extracted [20, 21].

The eigenvalues of the standardized matrix are calculated from

$$|R - \lambda I| = 0,$$

where $R$ is the correlation matrix of the standardized data, $\lambda$ is the eigenvalues, and $I$ is the identity matrix. The weights of the variables in the PCs are then obtained by

$$(R - \lambda I)\,W = 0,$$

where $W$ is the matrix of weights.

To evaluate the influence of each predictor variable in the PCs, varimax rotation was used to obtain values of rotated factor loadings. These loadings represent the contribution of each predictor variable in a specific PC. The PCs used for the prediction of BP reactivity to talking were obtained through multiplication of the standardized data matrix by the weights ($W$) [25].
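The PCA steps (standardize, form the correlation matrix, solve the eigenproblem, multiply the standardized data by the weights) can be sketched as below. This is an illustration only: the eigenproblem is solved with a classical Jacobi rotation method chosen here for self-containment, varimax rotation is not included, and the data are hypothetical.

```python
import math

def standardize(data):
    """Scale each column to zero mean and unit sample standard deviation."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1))
           for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in data]

def correlation_matrix(z):
    """Correlation matrix R of already standardized data z."""
    n, p = len(z), len(z[0])
    return [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(p)]
            for i in range(p)]

def jacobi_eigen(sym, sweeps=50):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix."""
    n = len(sym)
    a = [row[:] for row in sym]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-22:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) entry.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q of a and v
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rotate rows p and q of a
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v

def principal_components(data):
    """Return eigenvalues (descending), weight matrix W, and PC scores Z W."""
    z = standardize(data)
    values, vectors = jacobi_eigen(correlation_matrix(z))
    order = sorted(range(len(values)), key=lambda k: values[k], reverse=True)
    eigenvalues = [values[k] for k in order]
    weights = [[row[k] for k in order] for row in vectors]
    scores = [[sum(zi * w[k] for zi, w in zip(row, weights)) for k in range(len(order))]
              for row in z]
    return eigenvalues, weights, scores

# Hypothetical height (cm), weight (kg), and arm circumference (cm).
data = [[150, 50, 24], [155, 58, 26], [160, 55, 25],
        [165, 70, 29], [170, 68, 28], [175, 80, 31]]
eigenvalues, weights, scores = principal_components(data)
explained = [ev / sum(eigenvalues) for ev in eigenvalues]
print([round(e, 3) for e in explained])
```

The eigenvalues of a correlation matrix sum to the number of variables, so each eigenvalue divided by that sum gives the share of total variance explained by the corresponding PC. The resulting score columns are mutually uncorrelated, which is the property used to remove multicollinearity.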

##### 3.2. ANN

The customary ANN architecture is composed of an input layer, an output layer, and one or more intervening layers, also referred to as hidden layers, which capture the nonlinearity in the data.

Figure 1 shows an ANN model consisting of $n$ nodes in the input layer, one hidden layer with $m$ hidden nodes, and an output layer with one node.

The network is trained by presenting one input-output vector pair at a time. The weighted sum of inputs calculated at the $j$th hidden node is

$$\mathrm{NET}_j = \sum_{i=1}^{n} w_{ij} x_i + b_j,$$

where $w_{ij}$ is the weight on the connection from the $i$th to the $j$th node, $x_i$ is the input data from the $i$th input node, $n$ is the total number of input nodes, and $b_j$ is the bias on the $j$th hidden node.

Each hidden node uses a tangent sigmoid transfer function $f$ to generate an output, say $Z_j = f(\mathrm{NET}_j)$, between −1 and 1. The outputs from the hidden nodes, along with the bias $b_0$ on the output node, are sent to the output node, where the weighted sum becomes

$$\mathrm{NET} = \sum_{j=1}^{m} v_j Z_j + b_0,$$

where $m$ is the total number of hidden nodes and $v_j$ is the weight from the $j$th hidden node to the output node.

The weighted sum NET becomes the input to the linear transfer function $g$ of the output node and the predicted output is

$$\hat{y} = g(\mathrm{NET}).$$

The second phase of the back propagation algorithm, adjustment of the connection weights, then begins. The parameters of the ANN can be determined by minimizing the following objective function in the training process:

$$E = \sum_{k=1}^{N} \left(y_k - \hat{y}_k\right)^2,$$

where $\hat{y}_k$ is the output of the network from the $k$th observation, $y_k$ is the corresponding observed value, and $N$ is the number of observations.

The sensitivity of the output to each of the $i$th inputs, as the partial derivative of the output with respect to the input, under the assumption that the relationship of $x_i$ and $\hat{y}$ is monotone [26], is given as

$$S_i = \frac{\partial \hat{y}}{\partial x_i} = \sum_{j=1}^{m} g'(\mathrm{NET})\, v_j\, f'(\mathrm{NET}_j)\, w_{ij},$$

or, with the assumption that $g'(\mathrm{NET})$ and $f'(\mathrm{NET}_j)$ are constants,

$$S_i \propto \sum_{j=1}^{m} w_{ij} v_j.$$

The independent variable with higher relative positive or negative sensitivity has the higher positive or negative impact on the dependent variable.
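The forward pass of such a network (tangent sigmoid hidden nodes, one linear output node) and a sum-of-squares training error can be sketched as follows. The weights are hypothetical and untrained, the exact scaling of the error function used by the authors is an assumption, and `math.tanh` is used because the tangent sigmoid is mathematically the hyperbolic tangent.

```python
import math

def ann_forward(x, w, b_hidden, v, b_out):
    """Forward pass: tanh hidden layer followed by a linear output node.

    w[i][j] is the weight from input i to hidden node j,
    v[j] is the weight from hidden node j to the output node.
    """
    hidden = []
    for j in range(len(b_hidden)):
        net_j = sum(w[i][j] * x[i] for i in range(len(x))) + b_hidden[j]
        hidden.append(math.tanh(net_j))  # tangent sigmoid, range (-1, 1)
    net = sum(v_j * z_j for v_j, z_j in zip(v, hidden)) + b_out
    return net  # linear transfer function: output equals the weighted sum

def sum_squared_error(targets, predictions):
    """Sum of squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions))

# Hypothetical weights for two inputs and two hidden nodes.
w = [[0.5, -0.5], [0.5, 0.5]]
b_hidden = [0.0, 0.0]
v = [2.0, 1.0]
b_out = 0.1
y_hat = ann_forward([1.0, -1.0], w, b_hidden, v, b_out)
print(round(y_hat, 4))
```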

##### 3.3. ANFIS

ANFIS, a multilayer feed forward network, uses neural network learning algorithms and fuzzy reasoning to map an input space to an output space, as shown in Figure 2. It has the ability to combine the verbal power of a fuzzy system with the numeric power of a neural network.

It can construct an input-output mapping based on human knowledge (if-then fuzzy rules) and stipulated input-output data pairs. The parameters of the membership functions, the if-then rules, and the output parameters are calculated from the training data set. The training algorithm is usually hybrid or back propagation. The ANFIS implements rules of the form

$$R_r\colon \text{If } x_1 \text{ is } A_1^{(r)} \ldots \text{ and } x_n \text{ is } A_n^{(r)}, \text{ then } y = \alpha_0^{(r)} + \alpha_1^{(r)} x_1 + \cdots + \alpha_n^{(r)} x_n,$$

where $x_i$ is an independent variable, $A_i^{(r)}$ is a fuzzy linguistic concept, $y$ is the dependent variable, and $\alpha_i^{(r)}$ are the output parameters of rule $R_r$.

*Input Layer ($L_1$).* Each unit of the input layer stores parameters of membership functions to define a membership function that represents a linguistic term.

*Input Membership Function Layer ($L_2$).* Each unit of this layer represents a rule. The inputs to a unit are degrees of membership, which are multiplied to determine the degree of fulfilment $\tau_r$ for the rule represented by $R_r$.

*Logical Nodes ($L_3$).* This layer consists of a unit for each rule that computes the relative degree of fulfilment as follows:

$$\bar{\tau}_r = \frac{\tau_r}{\sum_{i} \tau_i}.$$

Each unit of $L_3$ is connected to all the rule units in $L_2$.

*Output Membership Function Layer ($L_4$).* Each unit of $L_4$ computes the output of a rule as

$$o_r = \bar{\tau}_r \left(\alpha_0^{(r)} + \alpha_1^{(r)} x_1 + \cdots + \alpha_n^{(r)} x_n\right).$$

The units are connected to all units of the input layer and to exactly one unit in $L_3$.

*Output Layer ($L_5$).* It computes the final output $y$ by adding all the outputs from $L_4$ [27].
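The five layers can be traced in a short forward-pass sketch. It assumes Gaussian membership functions, a grid partition (one rule per combination of membership functions), and linear rule outputs. The parameter values are hypothetical and untrained, and Gaussian membership functions are used here only for brevity.

```python
import math
from itertools import product

def gaussian(x, center, sigma):
    """Gaussian membership degree of x."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def anfis_forward(x, mf_params, consequents):
    """Return (output, relative degrees of fulfilment) for input vector x.

    mf_params[i] lists (center, sigma) pairs for input i;
    consequents[r] = [a0, a1, ..., an] are the linear parameters of rule r.
    """
    # Membership degrees of each input in each of its linguistic terms.
    degrees = [[gaussian(xi, c, s) for c, s in params]
               for xi, params in zip(x, mf_params)]
    # Degree of fulfilment of each rule: product of one degree per input.
    firing = [math.prod(combo) for combo in product(*degrees)]
    # Relative degree of fulfilment (normalization over all rules).
    total = sum(firing)
    relative = [f / total for f in firing]
    # Output of each rule: relative degree times its linear function.
    rule_outputs = [r * (coef[0] + sum(a * xi for a, xi in zip(coef[1:], x)))
                    for r, coef in zip(relative, consequents)]
    # Final output: sum of all rule outputs.
    return sum(rule_outputs), relative

# Two inputs with two membership functions each give 2 * 2 = 4 rules.
mf_params = [[(-1.0, 1.0), (1.0, 1.0)], [(-1.0, 1.0), (1.0, 1.0)]]
consequents = [[1.0, 0.5, -0.5], [0.0, 1.0, 0.0], [2.0, 0.0, 1.0], [0.5, 0.5, 0.5]]
y, relative = anfis_forward([0.3, -0.2], mf_params, consequents)
print(round(y, 4), [round(r, 3) for r in relative])
```

Because the relative degrees of fulfilment sum to one, the output is a weighted average of the linear rule outputs.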

##### 3.4. LS-SVM

LS-SVM is an extension of standard support vector machine. It converts the inequality constraints of SVM into equality ones which leads to solving a linear system instead of a quadratic problem, whose convergence speed is faster [28]. It has been widely used in estimation and approximation of function [29]. The architecture of LS-SVM is shown in Figure 3.

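Given a training data set $\{(x_k, y_k)\}_{k=1}^{N}$ with the input vector $x_k \in \mathbb{R}^n$ and the output $y_k \in \mathbb{R}$, the regression function of the least square-support vector machine, in feature space $F$, can be stated as

$$y(x) = w^{T}\varphi(x) + b,$$

where $w$ is the weight vector and $b$ is the bias. $\varphi(\cdot)$ maps the input $x$ into a vector in $F$.

The model is inferred from the training data set by minimizing the cost function given below,

$$\min_{w, b, e} J(w, e) = \frac{1}{2} w^{T} w + \frac{\gamma}{2} \sum_{k=1}^{N} e_k^2,$$

subject to the equality constraint

$$y_k = w^{T}\varphi(x_k) + b + e_k,$$

where $e_k$ is the error, $k = 1, \ldots, N$, and $\gamma$ is the regularization parameter.

Solving this optimization problem in dual space leads to finding the coefficients $\alpha_k$ and $b$ in the following solution:

$$y(x) = \sum_{k=1}^{N} \alpha_k K(x, x_k) + b,$$

where $K(x, x_k)$ is the kernel function.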
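In the dual space, LS-SVM training reduces to one linear system in the bias and the coefficients. The sketch below assumes the standard LS-SVM system, with a first row forcing the coefficients to sum to zero and the remaining rows given by the kernel matrix plus $I/\gamma$, and an RBF kernel of the form $\exp(-\lVert x - z\rVert^2/\sigma^2)$. The kernel scaling, the function names, and the data are assumptions for illustration.

```python
import math

def rbf(x, z, sigma2):
    """RBF kernel; the division by sigma2 (not 2 * sigma2) is an assumption."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / sigma2)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def lssvm_fit(xs, ys, gamma, sigma2):
    """Return (b, alpha) from the LS-SVM linear system."""
    n = len(xs)
    system = [[0.0] + [1.0] * n]  # first row: sum of alpha equals zero
    for i in range(n):
        system.append([1.0] + [rbf(xs[i], xs[j], sigma2) + (1.0 / gamma if i == j else 0.0)
                               for j in range(n)])
    solution = solve(system, [0.0] + list(ys))
    return solution[0], solution[1:]

def lssvm_predict(x, xs, alpha, b, sigma2):
    """Evaluate the kernel expansion at a new input x."""
    return sum(a * rbf(x, x_k, sigma2) for a, x_k in zip(alpha, xs)) + b

# Hypothetical one-dimensional training data.
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [0.0, 0.8, 0.9, 0.1]
gamma, sigma2 = 10.0, 1.0
b, alpha = lssvm_fit(xs, ys, gamma, sigma2)
print(round(lssvm_predict([1.5], xs, alpha, b, sigma2), 4))
```

A larger $\gamma$ penalizes training errors more strongly, so the fitted function follows the training outputs more closely. This is why $\gamma$ and the kernel bandwidth are normally tuned together.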

##### 3.5. Performance Indices Used for Model Comparison

For the comparison of the developed models and selection of the optimal one among them, the performance of the models was evaluated using $R^2$, RMSE, and MAPE.

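$R^2$, the coefficient of determination, is computed from the $n$ available pairs of the two variables $y$ and $\hat{y}$ as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}.$$

RMSE is the square root of the mean square error, given by the following equation:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}.$$

MAPE is defined as the average of percentage errors, given by the following equation:

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100,$$

where $n$ is the number of samples, $\hat{y}_i$ is the predicted value obtained from the model, $y_i$ is the actual value, and $\bar{y}$ is the average of the actual values.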

The lower the RMSE and MAPE, the better the accuracy of the model in predicting the parameter. Likewise, the highest $R^2$ value indicates the best-performing model [30].
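The three indices can be computed as in the sketch below. It assumes that $R^2$ is calculated as one minus the ratio of the residual sum of squares to the total sum of squares about the mean of the actual values. The numbers are hypothetical and are not results from this study.

```python
import math

def r_squared(actual, predicted):
    """Coefficient of determination, assuming the 1 - SS_res / SS_tot form."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be nonzero."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(actual, predicted)) / len(actual)

# Hypothetical actual and predicted reactivity values (mm Hg).
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(r_squared(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

MAPE divides by each actual value, so it is undefined when an actual reactivity value is zero and becomes unstable when values are close to zero.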

#### 4. Results

Descriptive statistics for each anthropometric characteristic are given as mean and SD in Table 1.

The results of the paired $t$-test demonstrated statistically significantly higher SBP, DBP, and mean arterial pressure (MAP) in the talking condition. The mean rise was found to be higher in hypertensive individuals than in normotensives, as shown in Table 2. These results are consistent with the recommendations of AHA for BP measurement in humans and experimental animals [1].

Table 3 presents the Pearson’s correlation coefficients calculated for all anthropometric variables. High values of correlation coefficient (greater than 0.6) between pairs of anthropometric characteristics [31] revealed the existence of multicollinearity.

Before applying PCA, Bartlett’s test of sphericity and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy were applied to determine whether PCA was suitable for the data studied. The results are shown in Table 4. The high value of chi-square ($\chi^2$) for Bartlett’s test suggests that use of PCA is appropriate in normotensive and hypertensive subjects. The value of KMO is also greater than 0.6, which indicates that our sample size is adequate to apply PCA [32].

The first four PCs (PC1–PC4), explaining more than 5% of total variation, as shown in Table 5, were retained for further analysis.

Rotated component loadings after varimax rotation represent the extent to which the original anthropometric characteristics are influential in forming PCs, as shown in Table 6.

The loadings marked in bold show the highest correlation between an anthropometric characteristic and the corresponding component. For both normotensive and hypertensive subjects, weight and BMI were highly positively correlated with PC1, and a high negative correlation between height and PC2 was observed.

Principal score values for assigned PCs were determined by using principal score coefficients.

Moreover, the value of Pearson’s correlation (correlation coefficient < 0.6) between PCs, as shown in Table 7, indicates the elimination of multicollinearity effect presented in Table 3.

To develop the PCA based soft computing prediction models, 80% of the data were used for training while the entire data set was used for testing. Moreover, data must be normalized to achieve more accurate predictions [38]. The predicted BP reactivity values were denormalized for comparison with the actual values. MATLAB version 7.5 was used to develop the prediction models.
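The normalization and denormalization steps can be illustrated with min-max scaling to the interval [0, 1]. The text does not state which normalization was used, so the scaling method, the function names, and the values below are assumptions.

```python
def fit_min_max(values):
    """Return the (minimum, maximum) used for scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant variable")
    return lo, hi

def normalize(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]

def denormalize(scaled, lo, hi):
    """Invert normalize, returning values on the original scale."""
    return [s * (hi - lo) + lo for s in scaled]

# Hypothetical BP reactivity values (mm Hg).
reactivity = [4.0, 7.5, 6.0, 11.0, 9.0]
lo, hi = fit_min_max(reactivity)
scaled = normalize(reactivity, lo, hi)
restored = denormalize(scaled, lo, hi)
print(scaled)
```

In practice the scaling limits are usually taken from the training portion only and reused for any data the model is later applied to, so that predictions can be mapped back to mm Hg with the same limits.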

##### 4.1. PCA-ANN

To achieve the best ANN structure for BP reactivity prediction, various structures of feed-forward neural network with different number of neurons in hidden layer were investigated. Finally, with consideration of statistical indices, a structure with two hidden layers, having six nodes in each hidden layer, was developed. There were four input nodes representing the four PCs and one output node representing the BP reactivity to talking. Tangent sigmoid and linear transfer functions were used as activation functions in the hidden and output layers. Back propagation learning algorithm based on Levenberg-Marquardt technique was used [39].

Figures 4 and 5 show the scatter plot between observed and predicted values of SBP, DBP, and MAP reactivity from PCA-ANN model in normotensive and hypertensive subjects, respectively.

Figure 4 (PCA-ANN, normotensive subjects): **(a) SBP**, **(b) DBP**, **(c) MAP**.

Figure 5 (PCA-ANN, hypertensive subjects): **(a) SBP**, **(b) DBP**, **(c) MAP**.

##### 4.2. PCA-ANFIS

The PCA-ANFIS model was developed using genfis1 with grid partition on the data. Different ANFIS parameters were tested in order to achieve the best training and maximum prediction accuracy.

Input membership functions “trapmf” and “gauss2mf” were used to predict SBP and DBP reactivities, respectively, in normotensive individuals, whereas membership function “psigmf” was used to predict SBP and DBP reactivity in hypertensive individuals. Output membership function “linear” was used.

Other parameters of trained PCA-ANFIS model were number of membership functions = 16, number of nodes = 55, number of linear parameters = 80, number of nonlinear parameters = 32, total number of parameters = 112, and number of fuzzy rules = 16.
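The reported rule and parameter counts are consistent with a grid partition over the four PCs. The arithmetic below is a check under two assumptions that are inferred rather than stated: two membership functions per input, and four parameters per membership function (the MATLAB functions trapmf, gauss2mf, and psigmf each take four).

```python
n_inputs = 4        # the four retained PCs
mfs_per_input = 2   # assumption, inferred from the 16 reported rules
params_per_mf = 4   # trapmf, gauss2mf, and psigmf each take four parameters

# Grid partition: one rule per combination of membership functions.
n_rules = mfs_per_input ** n_inputs
# Linear output: one coefficient per input plus a constant, per rule.
n_linear = n_rules * (n_inputs + 1)
# Nonlinear parameters: the membership function parameters.
n_nonlinear = n_inputs * mfs_per_input * params_per_mf
n_total = n_linear + n_nonlinear

print(n_rules, n_linear, n_nonlinear, n_total)
```

Under these assumptions the counts reproduce the 16 rules, 80 linear parameters, 32 nonlinear parameters, and 112 total parameters listed above.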

The observed and predicted values of SBP, DBP, and MAP reactivity from PCA-ANFIS model for normotensive and hypertensive subjects were plotted in Figures 6 and 7.

Figure 6 (PCA-ANFIS, normotensive subjects): **(a) SBP**, **(b) DBP**, **(c) MAP**.

Figure 7 (PCA-ANFIS, hypertensive subjects): **(a) SBP**, **(b) DBP**, **(c) MAP**.

##### 4.3. PCA-LS-SVM

A PCA-LS-SVM model using an RBF kernel was developed, and a grid search optimization algorithm with 2-fold cross-validation was used to obtain the optimal parameter combination [40]. The optimal values of $\gamma$ (regularization parameter) and $\sigma^2$ (squared bandwidth) for normotensive and hypertensive subjects are shown in Table 8.

Figures 8 and 9 show the scatter plot between observed and predicted values of SBP, DBP, and MAP reactivity from PCA-LS-SVM model in normotensive and hypertensive subjects, respectively.

Figure 8 (PCA-LS-SVM, normotensive subjects): **(a) SBP**, **(b) DBP**, **(c) MAP**.

Figure 9 (PCA-LS-SVM, hypertensive subjects): **(a) SBP**, **(b) DBP**, **(c) MAP**.

Comparison of statistical indices for the models, as shown in Table 9, revealed that the PCA-LS-SVM model has the highest value of $R^2$ and the lowest value of RMSE for the prediction of BP reactivity to talking in normotensive and hypertensive subjects.

#### 5. Discussion

For proper diagnosis and treatment of hypertension, accurate and reproducible BP measurements are essential.

This study confirms and extends previous studies [4, 6–13] by documenting a significant increase in BP with talking. This finding tends to support the suggestion of Weiner et al. [41] that there may be an association between verbal activity and BP elevations. Withdrawal from such verbal activity has important clinical implications for the cardiovascular system.

Furthermore, we illustrated an application of PCA based soft computing models in predicting the BP reactivity to talking. PCA corrects for confounding caused by anthropometric characteristics including age, height, weight, BMI, and AC and, therefore, normotensive subjects were used to provide a basis for comparison.

As far as we know, this paper is the first study related to prediction of BP reactivity to talking using PCA based soft computing approaches. Therefore, the results were compared with indirectly related studies [33–37], as shown in Table 10. The promising results of soft computing techniques in all studies are due to their high degree of robustness and fault tolerance. In this work, specifically, the best performance of LS-SVM stems from its several advantages, including its ability to find a globally optimal solution, its fast convergence rate, and its good generalization with small sample sizes.

This study has a number of advantages. We used small, medium, and large size cuffs, which may have produced more accurate readings. We also took the mean of multiple readings to strengthen the accuracy of BP measurements.

However, any single comparison between the models might not reliably represent the true results. Validation of the computing models using larger database is essential to get an accurate measure of performance outside the development population.

#### 6. Conclusion

The successful development of any prediction model depends largely on the quality and nature of the data used for model development. To address the issue of multicollinearity within the anthropometric variables, PCA was incorporated. Furthermore, performance comparison of the PCA-ANN, PCA-ANFIS, and PCA-LS-SVM models revealed the potential capability of the PCA-LS-SVM model in predicting BP reactivity. This work may provide a valuable reference for researchers and engineers who apply soft computing models for modeling biological variables. The results are helpful in physicians’ diagnosis for the prevention of hypertension in clinical medicine. Our future research is targeted at studying an ensemble approach that combines the outputs of different hybrid techniques with more predictor variables and larger data sets to achieve wide clinical application of soft computing.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.