Extensive research has been performed on continuous and noninvasive cuff-less blood pressure (BP) measurement using artificial intelligence algorithms. This approach involves extracting features from physiological signals, such as ECG, PPG, ICG, and BCG, as independent variables, extracting features from arterial blood pressure (ABP) signals as dependent variables, and then using machine-learning algorithms to develop a blood pressure estimation model based on these data. The greatest challenge in this field is the insufficient accuracy of estimation models. This paper proposes a novel blood pressure estimation method with a clustering step for accuracy improvement. The proposed method involves extracting pulse transit time (PTT), PPG intensity ratio (PIR), and heart rate (HR) features from electrocardiogram (ECG) and photoplethysmogram (PPG) signals as the inputs of clustering and regression, extracting systolic blood pressure (SBP) and diastolic blood pressure (DBP) features from ABP signals as dependent variables, and finally developing regression models by applying gradient boosting regression (GBR), random forest regression (RFR), and multilayer perceptron regression (MLP) on each cluster. The method was implemented using the MIMIC-II data set, with the silhouette criterion used to determine the optimal number of clusters. The results showed that, because of the inconsistency, high dispersion, and multitrend behavior of the extracted feature vectors, the accuracy can be significantly improved by running a clustering algorithm, developing a regression model on each cluster, and finally taking a weighted average of the results based on the error of each cluster. When implemented with 5 clusters and GBR, this approach yielded an MAE of 2.56 for SBP estimates and 2.23 for DBP estimates, which were significantly better than the best results without clustering (DBP: 6.27, SBP: 6.36).

1. Introduction

Blood pressure (BP) is one of the most important health indicators and can be used to diagnose various diseases. BP measurement techniques fall into two categories: invasive and noninvasive. While the invasive approach tends to provide more accurate BP readings, it has some drawbacks and limitations. According to World Health Organization reports, excessive blood pressure (hypertension) causes 9.4 million deaths worldwide each year, and roughly 30% of all men and 25% of all women suffer from this condition [1]. After diabetes, hypertension is the second leading cause of cardiovascular disease, but because it tends to be asymptomatic, it has been called the silent killer. As one of the vital signs, blood pressure needs to be checked regularly. In many clinical settings, BP must be monitored continuously, especially if the patient is elderly or in the intensive care unit (ICU). Regular BP monitoring can also help prevent stroke, heart attack, and heart failure [24]. Unfortunately, most people with hypertension are unaware of their condition and of how it harms internal organs such as the brain, eyes, and kidneys over time.

As mentioned earlier, there are two types of blood pressure measurement methods: invasive and noninvasive. In invasive blood pressure (IBP) monitoring, measurements are made by a sensor or cannula needle inserted in a blood vessel. This method can provide continuous, accurate BP information but has drawbacks such as vessel blockage and potential infection of the insertion area [5]. Noninvasive blood pressure (NIBP) monitoring methods can be classified into two categories: (1) auditory methods and (2) methods based on vital signals. The auditory method is the common BP measurement method, which involves wrapping a cuff around the arm. Naturally, this method measures the blood pressure at one instant and cannot provide continuous BP readings. Also, using this method multiple times consecutively leads to patient dissatisfaction [6]. Given the limitations of direct BP measurement methods, several indirect methods have also been developed for this purpose. As of this writing, researchers have not found a consistent relationship between blood pressure and the electrocardiogram and photoplethysmogram signals themselves, so blood pressure cannot be reliably obtained directly from these signals. However, there are relationships between blood pressure and the features extracted from these signals [7, 8]. Therefore, these features can be used to create prediction models for BP estimates using data analysis methods and technologies. In the noninvasive and cuff-less BP estimation method, we first extract a vector of physiological features from ECG and PPG signals and then develop a regression model for BP estimation with these features used as input [9, 10]. The greatest weakness of noninvasive cuff-less methods compared to other BP measurement methods is their lower accuracy, which can be somewhat improved by using a combination of different features and different machine-learning and data mining methods.

Over the years, researchers have conducted many studies on feature extraction from physiological signals, such as ECG, PPG, ICG, and BCG, and on blood pressure (BP) estimation based on these features. As mentioned, the main challenge in this field is how to raise the accuracy of BP estimates. In this paper, we introduce a new clustering-based method to achieve significant accuracy improvement in this area. This method starts with extracting PTT, PIR, and HR features from ECG and PPG signals and extracting SBP and DBP features from the corresponding ABP signal. Previous methods in this field follow this step by developing a model directly on the extracted features; none of them pays attention to the high dispersion of the data extracted from ECG, PPG, and ABP signals, which has a negative effect on the accuracy of the model. In the proposed method, a clustering algorithm is first applied to PTT, PIR, and HR, and then a model is developed separately for each resulting cluster using the corresponding SBP and DBP data. Since the extracted features tend to have high dispersion and contain multiple trends, using the clustering algorithm in this way can greatly improve the accuracy of estimations. In many works, such as [11, 12], a large number of features are extracted from the raw ECG and PPG signals. According to research, increasing the number of effective features used to develop the machine-learning model can significantly increase its accuracy. On the other hand, increasing the number of extracted features can lead to high computational complexity in real-world applications. In our work, only 3 features are extracted from ECG and PPG signals, and the accuracy is instead improved by using the clustering algorithm.

Another noteworthy point is that in the various studies that used the MIMIC data set as their database, the researchers had no information about the patients' physiological condition. In our work, however, after extracting the features and clustering, we noticed similarities in the raw ECG, PPG, and ABP signals corresponding to the data samples in each cluster. These similarities can be used to cluster patients, which can have a positive effect on the accuracy or correctness of the features.

In other works that use ECG and PPG signals, the appearance of each person's signal can be different, which affects the accuracy of the extracted features and of the feature extraction algorithms [9, 13]. The extraction process can be made more accurate by clustering the raw signals of patients based on their similarities.

2. Materials and Methods

2.1. Data Set

The MIMIC-II (multiparameter intelligent monitoring in intensive care) data set from the PhysioNet website was used in this research. This data set contains 12,000 records of vital signals captured from people admitted to American medical centers and hospitals. The signals of this data set include ECG, PPG, and arterial blood pressure (ABP) at a sampling rate of 125 Hz [14]. A preprocessed and cleaned version of this data set is publicly available on the Kaggle website [15].

2.2. Features Extraction

The pulse transit time (PTT), which is the time it takes for the arterial pulse wave to move from the aortic valve to a peripheral artery, is typically used for continuous BP measurement. In practice, PTT is taken as the time difference between the R-peak of the ECG signal and a reference point on the PPG signal of the corresponding pulse wave [14]. The heart of this strategy is the notion of pulse wave velocity (PWV), which is obtained from the Moens–Korteweg (MK) equation [16]:

$$\mathrm{PWV} = \sqrt{\frac{E h}{\rho d}}$$

In this equation, E is the elastic modulus of the arterial wall, h is the thickness of the wall, ρ is the blood density, and d is the vessel diameter. The following formula shows how PWV is inversely related to PTT [17]:

$$\mathrm{PWV} = \frac{K}{\mathrm{PTT}}$$

The distance between the heart and the reference peripheral site (e.g., the fingertip) is denoted by K. The use of PWV leads to obtaining a more accurate PTT but requires parameters such as the person's physical characteristics [2, 15, 18, 19].

The PTT feature can be derived from the ECG together with the PPG signal or its second derivative (SDPPG). PTT denotes the time interval between the peak of the ECG signal and either a reference point on the PPG signal or the peak of a cycle in the SDPPG signal [10]. Unfortunately, PTT-based BP estimation alone is not accurate enough to be used for continuous cuff-less BP measurement in clinical settings [18]. However, this accuracy can be improved by the use of additional BP-related features. One feature that can increase the estimation accuracy of the regression model is heart rate (HR), as several studies have shown an improvement in the results after combining this feature with PTT [9]. Since the behavior of blood flow in vessels depends on various factors, the PPG is also a good signal for improving the results of BP estimation. This improvement can be made by combining PTT with several different features of the PPG, one of which is the PPG intensity ratio (PIR).

In theory, changes in arterial diameter, Δd, could be reflected by PIR throughout one cardiac cycle from systole to diastole. Moreover, there is an exponential relationship between PIR and Δd, shown by this expression [18, 20]:

$$\mathrm{PIR} = \frac{I_H}{I_L} = e^{\alpha \Delta d}$$

Essentially, PIR is defined as the ratio of the maximum to the minimum amplitude of a PPG waveform. Here, IH is the peak point of a PPG cycle (maximum amplitude), IL is the bottommost point of a PPG cycle (minimum amplitude), and α is a constant associated with the optical absorption coefficient in the light path. Physiologically, four variables largely influence BP: cardiac output, arterial compliance, blood volume, and peripheral resistance. PTT can be used to evaluate arterial compliance because it has been proposed as one of the indices of arterial stiffness [18, 21]. Moreover, there may be a relationship between cardiac output and PTT via the heart rate. As for blood volume and peripheral resistance, the change in arterial diameter is regarded as their main indicator, and, as illustrated above, it can be evaluated by PIR. Therefore, BP changes can be captured directly by PIR and PTT, which are then employed to estimate BP [18].

The features used in this study are PTT, PIR, and HR, which are independent variables. Systolic blood pressure (SBP) and diastolic blood pressure (DBP) are the dependent variables. After extracting these features, we developed several models based on regression on the data, but these models were found to be not sufficiently accurate because of the inconsistency and multitude of trends in the data for different features and the high dispersion of feature values. Thus, we clustered the data and developed a regression model for each cluster and then obtained a final estimate by averaging the outputs of these models with attention to the number of samples in each cluster.

Figure 1 shows the block diagram of the BP estimation process with the proposed method.

Figure 2 depicts the extraction process of PIR and PTT. Here, PTT represents the time between the peak of the ECG wave and the peak of the second derivative of the PPG (SDPPG) wave in the same cardiac cycle. As mentioned earlier, PIR is the ratio of the maximum amplitude (IH) to the minimum amplitude (IL) of the PPG signal in the cardiac cycle [10, 18, 20].
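The per-beat extraction of PTT and PIR can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the R-peak sample indices have already been detected, that the PPG samples are strictly positive, and that the SDPPG is approximated with finite differences. The function name `beat_features` and its parameters are hypothetical.

```python
import numpy as np

FS = 125.0  # Hz, sampling rate reported for the MIMIC-II waveforms


def beat_features(ppg, r_peaks, fs=FS):
    """Return one (PTT in seconds, PIR) row per pair of consecutive R-peaks.

    Assumptions (not from the paper): r_peaks holds precomputed R-peak sample
    indices, and ppg is strictly positive so that the ratio I_H / I_L is defined.
    """
    ppg = np.asarray(ppg, dtype=float)
    # Second derivative of the PPG (SDPPG), approximated by finite differences
    sdppg = np.gradient(np.gradient(ppg))
    rows = []
    for start, stop in zip(r_peaks[:-1], r_peaks[1:]):
        beat = ppg[start:stop]
        # PTT: delay from the R-peak to the SDPPG maximum within the same beat
        ptt = np.argmax(sdppg[start:stop]) / fs
        # PIR: peak intensity I_H divided by valley intensity I_L
        pir = beat.max() / beat.min()
        rows.append((ptt, pir))
    return np.array(rows)
```

In practice the raw signals would need filtering and a robust R-peak detector before this step; both are outside the scope of this sketch.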

The interval between two successive QRS complexes can be used to measure the heart rate when the cardiac rhythm is regular. On ECG paper, the heart rate is estimated by dividing 300 by the number of big boxes between two successive QRS complexes.
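Both ways of obtaining the heart rate reduce to simple arithmetic, sketched below. The function names are hypothetical, and the big-box rule assumes standard ECG paper speed (25 mm/s, so one big box spans 0.2 s and 300 big boxes span one minute).

```python
def heart_rate_from_rr(rr_seconds):
    """Heart rate in beats per minute from one R-R interval given in seconds."""
    return 60.0 / rr_seconds


def heart_rate_from_big_boxes(n_big_boxes):
    """Heart rate in beats per minute from the number of big boxes between
    two successive QRS complexes (assumes 25 mm/s paper, 0.2 s per big box)."""
    return 300.0 / n_big_boxes
```

For sampled signals, the R-R interval in seconds is the difference between successive R-peak sample indices divided by the sampling rate (125 Hz for this data set).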

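SBP and DBP are calculated by taking the maximum and minimum values, respectively, of the ABP signal in each cycle. The mathematical expressions are as follows [13]:

$$\mathrm{SBP} = \max_{t \in \text{cycle}} \mathrm{ABP}(t), \qquad \mathrm{DBP} = \min_{t \in \text{cycle}} \mathrm{ABP}(t)$$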
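A minimal sketch of this per-cycle labeling is given below. It assumes that the cycle boundaries are already available as sample indices (for example, the R-peak positions); the function name `sbp_dbp_per_beat` is hypothetical.

```python
import numpy as np


def sbp_dbp_per_beat(abp, beat_edges):
    """Return (SBP, DBP) arrays: the per-cycle maximum and minimum of ABP.

    beat_edges is an assumed list of sample indices delimiting the cycles.
    """
    abp = np.asarray(abp, dtype=float)
    sbp, dbp = [], []
    for start, stop in zip(beat_edges[:-1], beat_edges[1:]):
        cycle = abp[start:stop]
        sbp.append(cycle.max())  # systolic pressure: cycle maximum
        dbp.append(cycle.min())  # diastolic pressure: cycle minimum
    return np.array(sbp), np.array(dbp)
```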

2.3. Clustering and Regression Models
2.3.1. Clustering

There are several methods and algorithms for dividing a set of items into identical or highly similar clusters. The k-means algorithm is one of the simplest and most popular clustering algorithms used in data mining and unsupervised machine learning.

In multivariate clustering, it is typically needed to use multiple features of items to cluster them, which raises the question of what distance functions to use for this purpose. In any case, what is important in this clustering is the way we measure the degree of similarity or dissimilarity between data samples.

The goal of the clustering operation is to form clusters so that the distance between items in each cluster is minimal. In contrast, if the similarity of items is measured by a similarity function, the goal will be to form clusters so as to maximize the value of that function for each cluster. Given the inconsistency and multitude of trends in the data for different features and the high dispersion of feature values, we tried to first obtain clusters of data or independent variables. This was done by assessing the appropriateness of the number of clusters based on the silhouette value and ultimately using the values of the independent variable in each cluster as the input of the regression model.
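The selection of the number of clusters by the silhouette value can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the authors' code: the candidate range of k, the standardization step, and the function name `choose_k_by_silhouette` are all choices made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def choose_k_by_silhouette(features, k_values=range(2, 9), seed=0):
    """Return (best_k, scores), where scores maps each k to its mean silhouette.

    features: array of shape (n_samples, n_features), e.g. columns PTT, PIR, HR.
    """
    # Scale the features so that no single one dominates the Euclidean distance
    X = StandardScaler().fit_transform(features)
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    # The k with the highest mean silhouette value is taken as the best choice
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

The silhouette value compares, for each sample, its mean distance to its own cluster with its mean distance to the nearest other cluster, so higher values indicate tighter and better separated clusters.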

2.3.2. Random Forest Regression

Random forest is an easy-to-use machine-learning algorithm that tends to provide excellent results even without the adjustment of its meta-parameters. Thanks to its simplicity, this algorithm is one of the machine-learning algorithms that are widely used for both classification and regression.

Random forest falls in the category of supervised machine-learning algorithms. As the name implies, this algorithm builds a random forest made of a group of decision trees. This is often done by the method known as bagging; the basic idea is to use a combination of learning models to reach better results. Simply put, random forest builds several decision trees and merges them to make more accurate and consistent predictions [22].

2.3.3. Gradient Boosting Regression

Gradient boosting is a machine-learning algorithm for classification and regression that builds a prediction model as an ensemble of weak models. The goal of almost all machine-learning algorithms is to minimize a defined loss function during the learning process. The constructed model needs to be updated such that the value of the loss function approaches zero and the predicted values approach the observed values as closely as possible.

The core idea of the gradient boosting algorithm is to make stronger models by combining weaker models in an iterative process.

Here, it is necessary to first describe how boosting models are created. To build boosting models, we first perform a sampling with replacement in which samples have a fixed weight in the selection probability calculations. After building a model with these samples, the samples that have produced the highest errors are returned to the sample pool and the sample selection probabilities for the next iteration of modeling are updated according to the error of each sample, which also ensures that the models properly cover the entire solution space. In the end, an ensemble of all models made through this process is created.

In gradient boosting regression, we first construct a regression tree model for the samples and measure the error of this model, that is, the difference between the observed values and its predictions. We then build a new model for the data that the previous model has predicted incorrectly and recalculate the error. Next, we combine the new model with the previous one and update the ensemble. These steps are repeated until the sum of errors approaches a fixed value or the model becomes overfit [23].
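The iterative procedure can be sketched in its common squared-loss form, in which each new tree is fitted to the residuals of the current ensemble. This is a simplified illustration, not the implementation used in the study; the function names, the number of rounds, the learning rate, and the tree depth are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_rounds=50, learning_rate=0.1, max_depth=2):
    """Fit a small gradient-boosted ensemble of regression trees (squared loss)."""
    y = np.asarray(y, dtype=float)
    base = float(y.mean())                 # initial model: a constant prediction
    prediction = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        residual = y - prediction          # errors of the current ensemble
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        prediction += learning_rate * tree.predict(X)  # add the new weak model
        trees.append(tree)
    return base, learning_rate, trees


def predict_boosted_trees(model, X):
    """Sum the constant base prediction and the scaled output of every tree."""
    base, learning_rate, trees = model
    return base + learning_rate * sum(tree.predict(X) for tree in trees)
```

A library implementation such as scikit-learn's `GradientBoostingRegressor` adds further options (for example, subsampling and other loss functions) and would normally be preferred over this sketch.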

2.3.4. Deep Multilayer Perceptron

The multilayer perceptron (MLP) is a supervised learning algorithm that learns a function $f(\cdot): \mathbb{R}^m \rightarrow \mathbb{R}^o$ by training on a data set, where m and o represent the number of dimensions of the input and output, respectively.

Given a target y and a set of features $X = \{x_1, x_2, \ldots, x_m\}$, MLP is capable of learning a nonlinear function approximator for regression or classification. It differs from logistic regression in that one or more nonlinear layers, known as hidden layers, may exist between the input and output layers. The leftmost layer, called the input layer, contains a set of neurons representing the input features. Each neuron in a hidden layer transforms the values from the previous layer with a weighted linear summation followed by a nonlinear activation such as the hyperbolic tangent function.
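The forward computation described above can be sketched with NumPy as follows. The layer sizes, the use of tanh in every hidden layer, and the linear output layer are assumptions made for illustration; the function name `mlp_forward` is hypothetical, and training of the weights is not shown.

```python
import numpy as np


def mlp_forward(x, weights, biases):
    """Forward pass of a regression MLP.

    weights[i] has shape (units in layer i, units in layer i + 1);
    biases[i] has shape (units in layer i + 1,).
    """
    a = np.asarray(x, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        # Hidden layer: weighted linear summation followed by tanh activation
        a = np.tanh(a @ W + b)
    # Output layer: linear, as is usual for regression targets
    return a @ weights[-1] + biases[-1]
```

With m = 3 inputs (PTT, PIR, HR) and o = 2 outputs (SBP, DBP), the first weight matrix would have 3 rows and the last one 2 columns.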

2.4. Model and Results Evaluation
2.4.1. MAE and RMSE

In this study, the modeling results are assessed in terms of root mean square error (RMSE) and mean absolute error (MAE), which are described below. The RMSE quantifies how far the predicted values of a model or statistical estimator differ from the observed values. RMSE is an excellent measure for evaluating the prediction error of a model for a given data set. This metric is essentially the standard deviation of the differences between predicted and observed values. For n samples with observed values $y_i$ and predicted values $\hat{y}_i$, it is given by:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$

As many have pointed out, because of using the square root of the mean square error, RMSE is not as biased as other measures and is very suitable for medical and bioinformatics problems that are solved by regression. The other error measure used in this study is MAE. MAE measures the difference between predicted and observed values without considering the direction of this difference. Therefore, what is important for MAE is the magnitude of error in estimations not whether they have been overestimates or underestimates. In statistical discussions, this measure is sometimes referred to as L1 Loss.

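Mathematically, MAE is the average absolute difference between the predicted values $\hat{y}_i$ and the observed values $y_i$ over n samples:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$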
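Both error measures can be computed in a few lines. The sketch below uses only the Python standard library; the function names and the sample values in the usage note are illustrative.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average magnitude of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: the square root of the average squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

For example, observed values of 120, 80, and 130 with predictions of 118, 83, and 130 give errors of 2, −3, and 0, so MAE = 5/3 and RMSE = √(13/3). Because the errors are squared before averaging, RMSE penalizes large errors more heavily than MAE does.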

2.4.2. BHS and AAMI

Many studies in the field of BP estimation use the protocol developed by the British Hypertension Society (BHS) for the evaluation of BP measuring devices and methods as the benchmark of their accuracy assessments. In this protocol, accuracy evaluations are performed based on the absolute error of measurements. More specifically, this protocol grades the methods and devices based on the ratio of the number of readings with an error of less than 5 mmHg, 10 mmHg, and 15 mmHg to the total number of readings.

Another standard for evaluating BP measuring devices and methods is the AAMI standard. In this standard, a device or method is approvable only if the mean error and standard deviation of readings are less than 5 mmHg and 8 mmHg, respectively. In this study, the accuracy of SBP and DBP estimates is evaluated using BHS and AAMI standards.
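The two checks can be sketched as follows. The BHS grade thresholds used here (cumulative percentages of 60/85/95 for Grade A, 50/75/90 for Grade B, and 40/65/85 for Grade C at 5, 10, and 15 mmHg) come from the BHS protocol rather than from the text above. Treating the limits as inclusive and using the sample standard deviation are assumptions of this sketch, and the function names are hypothetical.

```python
from statistics import mean, stdev

# Minimum cumulative percentages of readings within 5, 10, and 15 mmHg
BHS_GRADES = (("A", (60, 85, 95)), ("B", (50, 75, 90)), ("C", (40, 65, 85)))


def bhs_grade(errors):
    """Return (grade, percentages within 5/10/15 mmHg) for errors in mmHg."""
    n = len(errors)
    pct = tuple(100.0 * sum(abs(e) <= limit for e in errors) / n
                for limit in (5, 10, 15))
    for grade, thresholds in BHS_GRADES:
        if all(p >= t for p, t in zip(pct, thresholds)):
            return grade, pct
    return "D", pct  # fails all three grade criteria


def passes_aami(errors):
    """AAMI criterion: mean error within 5 mmHg and standard deviation within 8 mmHg."""
    return abs(mean(errors)) <= 5.0 and stdev(errors) <= 8.0
```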

3. Results

First, the data and the extracted features were visualized, and the correlation between the features was measured. A very important part of building a regression model is the preparation of the data, which in this study involved a scaling operation. This phase is important because it affects the time needed to construct the regression model and the length of the convergence process. Next, we developed several machine-learning regression models, including random forest regression, gradient boosting regression, and multilayer perceptron regression, and evaluated the model outputs by different criteria. The next step was to implement the main approach of the study, that is, to cluster the extracted features while using the silhouette value to determine the best number of clusters and then develop a model for each cluster with the regression algorithms mentioned above. The model outputs were combined by weighted averaging, and the final results were compared in terms of different measures to identify the best regression model.

Figure 3 shows the histogram and scatter diagram of PTT, PIR, BPMIN, and BPMAX. We used the scatter diagram to create a graphical representation of the relationship between independent and dependent features, and we plotted a density diagram to gain an overview of the distribution of values for each feature.

The next step was to obtain and examine the results of the machine-learning regression models described in the previous sections. In this step, the machine-learning models were developed with the features PTT, PIR, and HR as independent variables (input) and bpmin and bpmax as dependent variables (output). First, we developed the model by regression on the entire data using random forest regression, gradient boosting regression, and multilayer perceptron regression. The results of this process are presented in Table 1. It should be noted that in all steps, the regression models were evaluated in terms of RMSE and MAE.

Given the inconsistency and variety of trends in the data for different features, we used the k-means method to cluster the data, with the silhouette criterion used to identify the optimal number of clusters. Figure 4 shows the optimal number of clusters for the clustering algorithm according to the silhouette criterion.

Figure 5 shows the cohesion and dispersion of data in one of the clusters extracted from the data PIR, PTT, SBP, and DBP.

Next, we used random forest regression, gradient boosting regression, and multilayer perceptron regression algorithms to develop a separate model for each cluster.
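The cluster-then-regress step can be sketched with scikit-learn as follows. This is a minimal sketch, not the authors' implementation: the class name `ClusteredRegressor` is hypothetical, hyperparameters are library defaults, and assigning a new sample to the cluster with the nearest k-means centroid is an assumption, since the text does not state how unseen samples are routed. One instance would be fitted per target (SBP or DBP).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor


class ClusteredRegressor:
    """k-means on the input features, then one regressor per cluster."""

    def __init__(self, n_clusters=5, seed=0):
        self.seed = seed
        self.kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        self.models = {}

    def fit(self, X, y):
        labels = self.kmeans.fit_predict(X)
        for c in np.unique(labels):
            mask = labels == c
            # A separate model is trained on the samples of each cluster
            reg = GradientBoostingRegressor(random_state=self.seed)
            self.models[c] = reg.fit(X[mask], y[mask])
        return self

    def predict(self, X):
        # Assumption: each sample is routed to its nearest-centroid cluster
        labels = self.kmeans.predict(X)
        out = np.empty(len(X))
        for c, model in self.models.items():
            mask = labels == c
            if mask.any():
                out[mask] = model.predict(X[mask])
        return out
```

Swapping `GradientBoostingRegressor` for `RandomForestRegressor` or `MLPRegressor` would give the other two per-cluster variants compared in this study.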

The model error for each cluster was then determined in terms of RMSE, MAE, and the target-estimation correlation coefficient (r) for gradient boosting regression. The total error of the model and the overall correlation coefficient across all clusters were then determined by the weighted arithmetic mean. The error rate for the whole data set and the total error rate are also provided. The results of the proposed clustering-based approach are presented in Table 2.
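The weighted arithmetic mean used to combine the per-cluster results can be written as below. Weighting by the number of samples in each cluster is an assumption of this sketch, and the numbers in the usage note are made up for illustration; they are not results from this study.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean of per-cluster values (e.g. MAE, RMSE, or r)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

For instance, two hypothetical clusters with MAEs of 2.0 and 3.0 and sizes of 100 and 300 samples give an overall MAE of (2.0 × 100 + 3.0 × 300) / 400 = 2.75.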

Figure 6 shows the Bland–Altman plot of each cluster for SBP and DBP estimations. As shown in the figure, most errors are in the range of 8 mmHg for DBP and 12 mmHg for SBP.

However, there are also some outliers in these plots, which are more frequent in the one for SBP estimation.

Figure 7 shows the correlation plots of DBP and SBP estimation for our suggested technique versus reference BP. The overall calculated DBP and SBP had a correlation value of 0.94 and 0.88, respectively, which was obtained by weighted averaging of correlation coefficient in each cluster.

After applying clustering on the sample of features extracted from ECG, PPG, and ABP signals, we investigated the raw signal corresponding to each data sample in each cluster. Evidence showed that the ECG, PPG, and ABP signals corresponding to each data sample in each cluster are very similar in appearance and signal shape, which can be used to study the physiological characteristics of patients.

Figure 8 shows the ECG, PPG, and ABP signals of two different patients in cluster 1 and cluster 2. The signals closely resemble those of other samples in the same cluster and differ markedly from those of samples in other clusters.

The results of the accuracy evaluation of the proposed method based on the BHS standard are presented in Table 3. According to this standard, the proposed method will be of Grade A in DBP estimation and SBP estimation.

Table 4 shows the results of the accuracy evaluation of the proposed method based on the AAMI standard. According to this standard, the method produces acceptably accurate estimates for DBP and SBP.

4. Discussion

It should be noted that while a large number of studies have been conducted in the field of BP estimation, many of these studies have used their own data sets, the majority of which are not publicly available due to confidentiality and privacy considerations. Therefore, we cannot compare our results with all of the previous studies. In this section, we first compare our results with the results of studies that have used MIMIC/PhysioNet data sets and then make some comparisons with studies that have used their own data sets. A noteworthy point regarding the MIMIC-II data set is that it comprises readings from ICU patients, who tend to be older and under medication [9]. Another important point regarding MIMIC-II is the lack of physiological data (e.g., age, height, and weight), which can affect the accuracy of the extracted features and the model. While such data could be used to include physiological and biological parameters in the clusters and to examine their effects on the estimation accuracy, this could not be done with MIMIC-II. Because we used MIMIC-II, we only had access to ECG, PPG, and ABP signals, and our feature extraction was therefore limited to these signals. Using a richer data set containing other signals, such as SCG and BCG, in addition to ECG, PPG, and ABP may improve the accuracy of the extracted features and the resulting model [24].

The studies that have used publicly available MIMIC data sets include [9, 19], where PAT, HR, AI, LASI, and IPA features were extracted from ECG and PPG signals and then the AdaBoost algorithm was used to develop BP estimation models based on these features. Our method outperforms the models of [9, 19] in terms of MAE and r as well as the BHS and AAMI standards. Our results are also better than those reported by Miao et al. [11], where an estimation model was developed by multisample regression based on 35 features extracted from the same ECG and PPG signals.

Ibtehaz et al. [13] developed their estimation model with the CNN algorithm using only the PPG signal. Our method also performs better than this model in terms of MAE and the BHS and AAMI standards. Our results are also more accurate than the results of Kurylyak et al. [12], who used 21 features extracted from ECG and PPG signals of MIMIC-II and an ANN algorithm to develop their model. The same can also be said for a few previous studies [25, 26], where estimation was performed using features extracted from the MIMIC data set of the PhysioNet website.

We also compared our method with some of the methods that have used their own data sets, which are listed in Table 5. Chen et al. [17] created their own data set by compiling the data of 98 subjects and developed their model using the multiple regression method based on features such as PTT. Our method showed better performance in estimating SBP and DBP than this model. Our results are also better than the results of Radha et al. [27], who used a data set consisting of data collected from 106 healthy individuals with random forest and dense network models, and also the results of Esmaili et al. [29], who used a data set compiled from the data of 32 subjects with a calibration step. The results of the present work are also more accurate than those of Dong et al., Agham and Chaskar [28, 31], and other listed works that have used their own data sets.

5. Conclusion and Future Works

This study developed a new clustering-based algorithm to improve the accuracy of blood pressure estimation. It uses the k-means algorithm to cluster the extracted features and uses the random forest regression, gradient boosting regression, and multilayer perceptron regression algorithms to estimate systolic blood pressure (SBP) and diastolic blood pressure (DBP) in each cluster. The results showed that, owing to the high dispersion and the multitude of trends in the data and extracted features, the clustering algorithm can increase the prediction accuracy of each model. Overall, it can be concluded that, since previous works have not dealt with the high dispersion and multitude of trends in the data before developing their learning models, it is possible to reach considerably better prediction results by applying a clustering algorithm to the extracted data and then building a separate model for each cluster. In future works, we hope to develop a method for real-time feature extraction and sample clustering and ultimately create a real-time procedure for receiving vital signals such as ECG and PPG from thousands of people, performing feature extraction and signal processing, clustering the data, and producing BP estimates with the least possible delay and the highest possible accuracy, a task that will require using Big Data-related platforms, tools, and algorithms.

Data Availability

The data for this study originate from the well-known MIMIC-II database on PhysioNet. We used a preprocessed data set derived from the MIMIC-II database, which is available at https://www.kaggle.com/mkachuee/BloodPressureDataset.


The authors published a preprint version of this work on the arXiv open-access repository, which is available at https://arxiv.org/abs/2110.06996 [37].

Conflicts of Interest

The authors declare that they have no conflicts of interest.