Abstract

There has been a sustained effort in the research community over the recent years to develop algorithms that automatically analyze heart sounds. One of the major challenges is identifying primary heart sounds, S1 and S2, as they represent reference events for the analysis. The study presented in this paper analyzes the possibility of improving the structure characterization based on shape context and structure assessment using a small number of descriptors. Particularly, for the primary sound characterization, an adaptive waveform filtering is applied based on blanket fractal dimension for each preprocessed sound candidate belonging to pediatric subjects. This is followed by applying the shape based methods selected for the structure assessment of primary heart sounds. Different methods, such as the fractal ones, are used for the comparison. The analysis of heart sound patterns is performed using support vector machine classifier showing promising results (above 95% accuracy). The obtained results suggest that it is possible to improve the identification process using the shape related methods which are rarely applied. This can be helpful for applications involving automatic heart sound analysis.

1. Introduction

Auscultation is widely used for evaluation of cardiac function. Since the heart sound analysis mostly depends on the individual skills of the clinician, there is a growing demand for automatic heart sound interpretation methods and systems [1]. Phonocardiography is a cost-effective noninvasive method which enables both hearing and visualizing the content of heart sound signals. It is considered to be helpful in avoiding complex and expensive imaging equipment [2].

One of the major concerns in heart sound analysis is to identify the first (S1) and the second (S2) heart sound [3]. Their proper identification is of key importance for interpretation of other signal’s components, such as extra sounds and murmurs found between the fundamental sounds, in systole (S1-S2) and diastole (S2-S1) intervals. The closure of mitral and tricuspid valves forms S1, where S2 sound is produced by the closure of aortic and pulmonary valves. With the use of electrocardiogram reference, the identification of S1 is easy to perform due to the localization of QRS complex. More often, there is a need to identify the sounds without the use of any synchronously recorded reference. This is a challenging task since these sounds are found as components of relatively high energy in the same low frequency range having similar morphology [4]. Thus, characterizing the typical heart sounds, such as S1 and S2, has acquired great popularity over the years [58].

The existing methods for characterization of the sounds often apply the calculation of envelograms to introduce the energy aspect, such as the Shannon energy based one, where higher energy is usually associated with S1 [5, 6, 9]. Similar auxiliary envelopes are applied using different joint time-frequency representations [8, 10, 11]. The identification methods for the primary heart sounds usually consider signal/envelope characteristics (e.g., maximum, variance, frequency, and positive/negative area [6, 9, 12]). It is noticed that the labels S1 and S2 are traditionally assigned without taking into account varying energy of the sounds [8, 10], where the improved characterization is needed regardless of the assumptions concerning the intervals between the candidates (e.g., recurring sequence of S1-S2 pairs, no excitement, and neither missed candidates nor misinterpreted systoles as intervals of short duration [9, 10]).

On the other hand, shape descriptors for the structure characterization are considered to be valuable for the identification, even though they are rarely applied. Skewness of the envelope waveform as a measure of high order statistics is used in [4] to characterize the structure, where the asymmetrical energy distribution of S2 is assumed due to closure of valves. Another shape related method is based on the kurtosis calculation to characterize the peakedness of the sounds [8]. The advantages of fractal framework are also recognized for the heart sounds, where different fractal dimension (FD) methods related to variation and structure of the waveforms are applied. Typical heart sounds S1 and S2, of short duration (20–200 ms) and low frequencies (20–200 Hz), can be considered fractal in nature [2, 11].

So far, fractal complexity is used for heart rate variability [13, 14], as well as for the auscultatory recording classification [15], where FD is shown as a satisfying tool in comparison to the other time-frequency features. As a measure of fractal complexity, for S1 and S2 heart sound identification, variance FD (VFD) is applied in [16]. For lung sounds belonging to pediatric patients, Katz FD (KFD) [17] is applied in identification of crackles and swallowing in [18, 19]. In [20], blanket method (BFD) is used for the tidal volume estimation using tracheal sounds. In the previous work of our group, initial examination of blanket method is shown as a possible tool for primary heart sounds as well [21]. Moreover, in our previous works [11, 12], novel methods are proposed for efficient classification of the auscultatory recordings without the primary heart sound detection using a few multifractal spectrum related features.

This study makes a contribution to research on methods for heart sound analysis using advantages of fractal theory and shape context, using relatively small number of descriptors. In Section 2, the description of fractal concepts in identification of primary heart sounds is briefly presented. For the waveform pattern analysis, new shape related methods are proposed, which are based on adaptive filtering and modified blanket approach bearing in mind the most extreme points within the heart sound signals. The new method is tested over real heart sounds belonging to pediatric subjects and compared with other methods known from literature. The results of the study are presented in Section 3. In Section 4, some concluding remarks are presented.

2. Materials and Methods

The fundamental cardiac sounds S1 (heard as a “lub” sound) and S2 (heard as a “dub” sound) are usually found in an auscultatory recording as characteristic components of relatively high energy. Each beat consists of four parts: S1 sound, systole, S2 sound, and diastole [3]. Other extra sounds found in a heart sound signal (clicks, snaps, murmurs, etc.) may convey valuable information, but an interpretation of any extra sound depends on the identification of S1 and S2. Figure 1 shows a part of a record (duration of two seconds) belonging to a pediatric subject. The labels S1 and S2 are given by physician as the fundamental heart sounds.

In order to examine the patterns of structures found in auscultatory recordings, the dataset of over a thousand sequences belonging to pediatric subjects is gathered. Particularly, heart sounds are collected in compliance with the ethical standards from the apex area. The acquisition of heart sounds is performed at the Health Center “Zvezdara” and additional echocardiography examination at the University Children’s Hospital in Belgrade, Serbia. The sounds are initially recorded with 8 kHz and downsampled to 1 kHz for the analysis since the basic structure of the waveforms is not degraded after downsampling. Littman 4100WS stethoscope is used for the acquisition.

2.1. Fractal Concept in the Pattern Characterization

Self-similarity property of the waveforms is based on the existence of similar patterns across different scales. Namely, the similar shape may arise when observing a structure across the scales, where these structures (fractals) can be described by their fractal dimension [17]. The fractal approaches were found as efficient tools for characterization of complex structures, where different covering techniques and extensions from the envelopes to measuring area can be used [12, 13, 17, 18].

In order to clarify the basic mathematical concept applied in FD calculation, a brief explanation is given for better understanding. The principle for calculating box-counting FD (BCFD), as one of the most frequently used methods, is based on covering a waveform with boxes of size by appropriate grid, defined by evenly spaced squares, where the number of boxes, , needed to cover a signal is calculated [12]. The size of squares, , changes with scale. When tends to zero, the dimension can be estimated via the power law:

The algorithms for FD estimation are mostly defined to be easy to calculate, where a waveform can be described by a single fractal measure. When applied to specific sound structures of relatively short duration, they may be valuable in their identification. There are many different methods, where some of them are directly related to the structure analyzed. It is worth mentioning that the KFD is one of the most often applied, related to the pattern characterization [17, 22], which considers the length of a curve, , calculated as a sum of Euclidean distances between the successive samples. The KFD is computed by averaging the distance between the successive samples aswhere and is the planar diameter of a waveform defined as a maximum of distances between the first and any other point of the waveform.

Envelopes can be applied in measuring area, as used in BFD method. By defining the blankets for structure covering, it is possible to define adequate covering areas and perform the fitting in a log-log domainwhere can be described as follows: (i)The area change between the adjacent as in [23] (here denoted as BFD1) or(ii)The curve length estimation as in [24] (here denoted as BFD2).

2.2. Multiscale Heart Sound Identification

Even though the higher number of components is expected in S1, the interpretation of each of the primary heart sound waveforms for the identification purpose is not an easy task [7]. The structure of the S1 and S2 heart sounds produced by cardiac contractions and valve closures can be considered to be similar across different scales. By assuming the importance of the most abrupt change of a waveform’s magnitude for the identification, we propose an adapting technique as follows:wherewhereInitially, and . The proposed technique calculates the upper (“above”) and lower (“below”) envelopes starting from the maximum and minimum points, respectively. The envelopes are formed according to the most prominent extreme points. The curves are adapting to the original waveform structure and the extreme points in each iteration . The information related to the extremes is extracted by the adaptive structure filtering (4a), (4b), (5a), and (5b), so that the local maxima affect variation of the lower envelopes, while the local minima affect the upper ones. Moreover, it is assumed that the most abrupt waveform change is relevant for the classification by clamping the curves using the maximum and the minimum of a structure. In Figure 2, the envelopes are presented for BFD1 and the new covering method based on (4a), (4b), (5a), and (5b). Ability of extremes to affect variation of the envelopes can be noticed in Figure 2(b).

In order to describe the primary heart sounds, the covering area is divided into two parts: the area when viewing the structure from “above” () and the area when viewing the structure from “below” (), similarly as in [23]. The upper and lower areas are calculated as The applied filtering decreases the difference between the curves and the original structure. This enables the strip-like estimation of the waveform until the difference between the positive upper and lower areas becomes negligible (). An extension to measuring area is made from the adaptive filtering in order to assess each structure.

After the adaptive filtering, three different methods for characterization of the primary sounds are proposed. The proposed methods are based on the structure assessment using the difference between the covering areas. As noticed in FD calculation and texture/pattern estimation [25], the first few iterations are mainly sensitive to the structure, that is, to the most noticeable notches. Thus, the adaptive technique in the first iterations seems to be useful in the identification. Waveform is bounded around the most abrupt change of a waveform’s magnitude using a threshold . The threshold is determined empirically as , where denotes the maximum of difference between upper and lower envelopes: . By testing, it is found that smaller threshold values can lead to misinterpretation, that is, the detection of local extremes which are not dominant.

In the first method, the value is calculated as For each scale, it can be interpreted as difference between slopes of the covering area values. After intensive experiments, we found the third iteration as appropriate for characterizing S1 and S2 sounds.

The second method calculates the total area in the multiscale structure estimation aswhereThe value decreases in each subsequent iteration.

The third method is based on the slope-difference. It is applied to assess the structure as whereThe slope vectors are calculated for covering area values across scales (). After the estimation, the local directions are summed, forming the result which yields the information related to the structure of primary heart sounds. Similar approaches based on summing the local directions are used in image processing for different structures in the shape context methods [26, 27]. In our approach, the sum of local directions, described by (9a) and (9b), is applied for heart sound identification.

Each of the methods (7), (8a), (8b), (9a), and (9b) can give insight into the content and shape of the structures. Computation error in the multiscale structure estimation does not produce significant consequences on the proposed methods suitability. The computation error, such as an error of the roundoff noise nature, affects the upper and lower areas, and in a similar way; thus the overall impact is negligible.

2.3. Classification and Evaluation

The classification method is based on support vector machine (SVM) classifier, which is considered as a suitable tool for discrimination tasks [2830]. Namely, SVM is applied as a classifier which distinguishes the data by finding a separating hyperplane with a maximal margin between the classes. When applied to the waveforms, it is described by the kernel function and regularization parameter, based on the trade-off having large normalized margin and less constraint violation. The kernel function is used to train the SVM, where the most common kernel types are the linear and the Gaussian radial basis function (RBF) described by its squared bandwidth [21, 30].

SVM based classification is performed using fivefold cross-validation [28], where nine hundred sound sequences are used. The separation of the candidates is made during the cross-validation to properly estimate the overall performance, where the classification is performed without any prior knowledge, meaning that the sequences used in the training phase are not a part of the dataset used for testing. The recursive feature elimination technique is used to improve the classification accuracy by eliminating the least significant descriptors [29, 30].

The evaluation results are obtained using the Receiver Operating Characteristic (ROC) curve. The ROC curve presents true positive rate versus false positive rate for different decision thresholds, where as a performance measure the Area Under the Curve (AUC) is calculated. Moreover, the classification accuracy and -measure are calculated asrespectively, where are the true positives (S1 hits), are the true negatives (S2 hits), are the false positives (missed S2), and are the false negatives (missed S1). The -measure describes the class imbalance.

3. Results and Discussion

The three proposed methods, described in Section 2.2, are firstly tested individually for the structure assessment of the heart sounds, according to their AUC performance. They are compared to additional methods from the literature. We considered standard methods based on signal or its Shannon energy based envelope [5] (such as variance, highest amplitude/envelope value, signal/envelope area, and positive/negative signal area [6, 10, 11]) and different fractal methods: BCFD, KFD, BFD1, and BFD2 (described by expressions (1)–(3)), VFD [16], Sevcik (SFD) [31], and Higuchi (HFD) [22]. Five of FD methods can be considered highly shape related (KFD, BFD1, BFD2, SFD, and HFD). To the authors’ knowledge, some of them have not been examined so far for the primary sound identification, like SFD or BFD2. We also considered statistical and shape related methods, based on kurtosis and skewness [4, 8].

3.1. Results

The AUC performance of the proposed methods is compared to other methods, where only the methods having AUC values higher than 60% are presented in Figure 3. In Figure 3, the positive area and the negative area are calculated in accordance with the sound amplitude, while the total area 1 and the total area 2 are calculated using the sound amplitude and the Shannon energy based envelope, respectively. The results presented in Figure 3 show that all three methods proposed in Section 2.2 gave better performance in comparison to the other tested methods.

Note that our proposed methods are related to shape characterization. By testing other shape related methods, we found the BFD2 as a best choice for S1 versus S2 classification (AUC > 90%). It is expected that the KFD may show promising results (here AUC > 85%) since the method is considered highly consistent for shape characterization in different applications (e.g., for electroencephalogram analysis [22]). Examples of some hits and missed candidates for a set of waveforms using the third proposed method, described by (9a) and (9b), are presented in Figure 4, where only a few candidates are misinterpreted due to their structures.

The tests with AUC performance are followed by SVM based classification and cross-validation, where the selection of methods is made using the feature elimination and the grid search technique [2830]. In order to obtain robust results in sound characterization, the accuracies are calculated after five repetitions dividing the recordings in a random manner. For SVM based classification, we analyzed all previous methods which had been tested individually. For the classification, different number () of descriptors is used in the feature elimination technique.

For the case , the best result is obtained for the two proposed shape context methods and , where a decision boundary is presented in Figure 5(a). For different values of N, the obtained accuracy and AUC values are presented in Figure 5(b) showing the noticeable changes in accuracy in comparison to AUC performance. This is mainly due to the third method which is shown as the most significant one among the tested methods for the waveform characterization. The best performance is found for , where the selected descriptors are the proposed shape context values and , BFD2, and the total area calculated as In this case the best accuracy results are obtained and presented in Table 1. In Table 1, the results represent the obtained average values showing above 95% accuracy for each of the classes and AUC higher than 98% for the classification. Approximately, 4% higher accuracy results are obtained using RBF kernel in comparison to the linear one.

The proposed SVM based classification utilizes the adaptive filtering and the measuring areas for the sound classification. In this paper, the cross-validation is performed only according to the healthy pediatric subjects. An additional validation of the model is performed on the waveforms belonging to ten patients which were not included in the cross-validation procedure, where the proposed structure assessment methodology showed excellent results with AUC of 97.6% and accuracy of 90.6% (-measure of 0.91).

3.2. Discussion and Comparison

The study in this paper is applied and tested for the primary heart sound identification process on the basis of shape related characterization for pediatric subjects. The advantage of the proposed methodology relies on the applied adaptive filtering and the selected structure assessment. The high accuracy results for the classification are obtained efficiently, without time-consuming characterization methods, by employing the shape context characterization and keeping a small number of descriptors. In comparison to the state-of-the-art methods that employ FD methods, the structure assessment characterization methods do not use averaging/fitting estimations, like the least-squares in log-log domain used in (3). The applied method also overcomes high iterations for the calculation. Particularly, the high iterations found for BFD1/2 seem not to bring new relevant information regarding the structure and may prevent FD to reflect the sound type with high accuracy (Figure 2). Thus, applying the filtering towards the structure enables adapting to the most prominent extreme points. It can be noticed that the S1 adapting to the structure encounters the higher number of the prominent local extremes than in S2. The limitations of the proposed methodology are directly related to the sound characterization. In particular, the misclassified sounds are found among the missed examples presented in Figure 4. These are the limitations related to the found positions of the most prominent maximum and minimum used for clamping the envelopes, where some side details may produce the misinterpretation. The proposed model is adjusted to the healthy individuals. The additional experiment for nonhealthy group using a set of 180 waveforms is performed under the same circumstances providing high accuracy results.

The study is based on the shape context characterization and can be considered valuable for automatic heart sound analysis. High accuracy (above 95%) is obtained for the classification and labeling of the primary sounds regardless of the intersound relationships. In comparison to the identification from [5, 11], where highest envelope value () is applied for S1-systole and S2-diastole differentiation, the structure assessment overcomes the errors found due to varying energy in a signal, as presented in Figure 6(a). Recurring sounds are not assumed for the classification model, and thus the methodology may overcome errors found due to nondetected candidates and similar misinterpretations. Finally, the methodology shows significant improvement of 6% higher accuracy in comparison to the methodology from [21], as presented in Table 2. The obtained ROC curves are presented in Figure 6(b) with 6.5% higher AUC value and 0.08 higher -measure.

4. Conclusions

The study in this paper analyzes the possibility of using the shape context and fractal theory in the S1 and S2 pattern characterization. The obtained results show that the proposed method is able to significantly improve the accuracy (higher than 95%) by avoiding averaging and fitting procedures across the scales. The fractal theory based approaches enable developing new methods keeping a small number of descriptors in the identification of the primary sounds.

The study shows the significance of the shape context and ability to differentiate the sounds regardless of the variable energy values without even considering intersound relationships. Moreover, the obtained results indicate that the shape related approaches are valuable for further improvements in the identification of the heart sounds.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This research has been partially supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia under Grant no. III44009.