Abstract

Psychoacoustic objective parameters predict nonstationary vehicle sound quality poorly. To address this deficiency, we propose a method based on the complex analytic wavelet that extracts transient acoustic time-frequency characteristic parameters as objective parameters for evaluating the quality of vehicle door closing sound. The signal is decomposed by empirical mode decomposition (EMD), and the resulting intrinsic mode function (IMF) components are analyzed by spectrum analysis. On the basis of the human auditory frequency range, some IMF components are eliminated, and the main frequency bands of the effective IMF components are extracted as the analysis frequency bands of the complex analytic wavelet. The center frequencies of the complex analytic wavelet analysis are extracted according to the critical bands, thereby determining the wavelet parameters (bandwidth, center frequency, scale factor, etc.). To highlight the influence of high-frequency components and balance the data discrepancy, we weight the time-frequency components and extract the energy ratio coefficient as the objective parameter. The correlation between the subjective evaluation results and the energy ratio coefficients is then analyzed and compared with that of the traditional psychoacoustic objective parameters. The results demonstrate that the energy ratio coefficients extracted by the complex analytic wavelet transform correlate more strongly with the subjective evaluation results than the traditional psychoacoustic objective parameters do. In addition, the frequency components of 1720 Hz∼3150 Hz have a strong negative correlation with the vehicle door closing sound quality.

1. Introduction

The rapid development of the automotive industry has greatly raised the requirements for automobile comfort. Vehicle door closing sound, as a significant factor in the purchase intention of customers, plays a central role in the automobile comfort index. Automotive enterprises therefore invest heavily in research on the sound quality of vehicle door closing. Such research is also relevant to the sound quality of slamming in home appliances and other industries.

Previous research on vehicle sound quality has mainly focused on objective prediction parameters, including traditional psychoacoustic parameters (loudness, sharpness, roughness, etc.) under steady conditions [1–4] and time-frequency parameters under unsteady conditions [5–7]. However, investigation of objective parameters for nonstationary sound quality is still lacking. Many psychoacoustic parameters, such as loudness, sharpness, roughness, fluctuation strength, and pleasantness, have been applied to evaluate vehicle sound quality [8, 9]. Liu et al. used a genetic algorithm to optimize a support vector machine and established a prediction model of diesel engine sound quality based on psychoacoustic objective parameters [10]. The conventional psychoacoustic parameters are calculated in the frequency domain with an integral weighting method. They predict sound quality well under steady working conditions but are less effective for nonstationary signals. The vehicle door closing sound is a typical nonstationary signal with short duration, wide frequency range, and strong nonlinear coupling. Its psychoacoustic objective parameters are time-varying and fluctuate drastically; thus, they cannot be calculated with a frequency-domain model [11]. In addition, time domain or frequency domain analysis alone cannot accurately reflect the characteristics of vehicle noise, so the time domain and frequency domain signal features should be studied and extracted simultaneously [12]. Wang found that the traditional frequency-based techniques were not applicable to sound quality evaluation (SQE) of nonstationary vehicle noises [13]. Related research also revealed that traditional objective parameters failed to predict the sound quality accurately [14]. Therefore, it is imperative to develop new parameters and methods for objective evaluation of nonstationary sound quality.

Time scale and energy distribution dominate the physical interpretation of data [15]. For nonstationary signal processing, several time-frequency techniques suitable for extracting transient signal features have been reported in References [16–19]. The EMD in the Hilbert–Huang transform (HHT) method was adopted to estimate the noise resulting from slamming a car door [20], which indicates that EMD can extract the main impact characteristics of door slamming noise. In 2014, Wang et al. proposed an approach to calculate the internal noise roughness of vehicles based on human auditory features [21]. This model can be used for time-varying vehicle noise quality evaluation while considering the masking effect, which lays the foundation for noise evaluation under unsteady conditions. Moreover, a new scheme based on the discrete wavelet transform (DWT) for SQE of nonstationary vehicle noises was proposed [13]. Owing to the outstanding time-frequency characteristics of wavelet analysis, this scheme can handle the sound quality evaluation of nonstationary vehicle noise, and it inspired the complex analytic wavelet tool adopted in the present work. The wavelet transform is very attractive for time-variant signal analysis because its scaling property optimizes the time-frequency resolution [22]. Since the frequency characteristics of transient signals change continuously with time, the analytic wavelet transform, a modified version of the wavelet transform, is well suited to nonstationary sound and vibration signal analysis [23]. As for research on objective parameters, Yang et al. introduced the energy of the sound signal IMF components as an objective parameter and proposed a method for predicting the quality of vehicle door closing sound based on EMD and a BP neural network [6]. In 2016, Xia et al. extracted the mean value of wavelet entropy as an objective parameter of nonstationary sound quality evaluation [7], which roughly reflects the time-frequency energy features and can be applied to the quality analysis and evaluation of vehicle door closing sound.

Building on the above research, we introduce the complex analytic wavelet and the EMD to extract the time-frequency domain features of the door closing sound. The EMD is used to extract the effective IMF components, and spectrum analysis is performed to obtain the frequency range of the analytic wavelet analysis. The center frequencies of the complex analytic wavelet analysis are extracted according to the critical bands, thereby determining the wavelet parameters. Then, the complex analytic wavelet decomposition is performed to obtain the energy ratio coefficient of each component, and these coefficients are taken as the objective parameters of the acoustic features. Finally, we extract the psychoacoustic objective parameters of each sample and analyze the correlation between the two categories of objective parameters and the subjective evaluation results. The results show that the energy ratio coefficients extracted by the complex analytic wavelet transform correlate more strongly with the subjective evaluation results and are more suitable for evaluating vehicle door closing sound quality. Furthermore, the frequency components of 1720 Hz∼3150 Hz in the closing sound have a strong negative correlation with vehicle door closing sound quality.

2. Subjective Evaluation and Objective Psychoacoustic Parameter Extraction

2.1. Subjective Evaluation

The collection of sound samples is an important part of the first stage of sound quality evaluation, and it can even determine the result of the whole experiment [24, 25]. In this paper, the digital artificial head system of HEAD acoustics and a multichannel data acquisition recorder are used to collect sound samples (as shown in Figure 1). The artificial head system not only approximates the geometrical dimensions of the human body in appearance but also simulates the structure of the human external ear, so that the recordings reproduce human auditory characteristics as closely as possible. The samples are collected in a quiet room that meets the experimental conditions. The artificial head system is placed in the center of the driver seat, with the binaural microphones 70 cm above the horizontal plane of the seat. Since this paper analyzes the sound quality and the subjective feeling produced by each sample, the slams do not need to be similar to one another, and there are no specific requirements for speed, strength, distance, etc.

During the collection process, the windows and other auxiliary systems are closed. The sampling frequency is 48000 Hz and the sampling time is 5 seconds. A total of 65 samples from 15 cars are collected, among which 17 door closing sound samples with different subjective feelings are selected as the analysis samples. The cars used for sound sample collection include the Volkswagen Lavida, BMW 5 Series, Toyota Yaris, Chevrolet Cruze, and other family sedans. The number and structure of the jury are determined as explained in Reference [26]:
(1) The age range of the evaluators is roughly limited to 20 to 50 years old
(2) The evaluators are required to be in good health and have good hearing
(3) It should be ensured that a number of the evaluators have rich experience in acoustic research
(4) The number of evaluators should not be too small

Thus, 18 postgraduates (15 males and 3 females) with normal hearing, good health, and no bad habits are selected as the subjective jury to conduct the evaluation experiments. To reproduce the recorded sound samples as faithfully as possible, an HD650 high-fidelity headset equipped with a power amplifier is used for playback. The paired comparison method is selected as the subjective evaluation method. Before the subjective evaluation, the evaluators are trained, which includes a description of the evaluation samples, the evaluation method, etc.

The test is conducted in three sessions, and 6 people are chosen for evaluation in a quiet environment at a time. The 17 samples are numbered as S1, S2, …, S17. Subjective annoyance degree of vehicle closing sound is used as the yardstick, and the subjective evaluation is conducted according to the evaluation procedure of reference [26] as follows:

First, the sound samples are replayed in pairs (Si, Sj) (i, j = 1, 2, …, 17; i < j). In this procedure, S1 is first taken as the reference sample and compared with the other 16 samples one by one. Next, S2 is taken as the reference and compared with S3, …, S17 in turn. The remaining samples are compared in the same way. Finally, S1∼S17 are replayed in sequence and scored, respectively.

Scoring rules: the two sound samples in each comparison are scored together. The more annoying one scores 0 points, whereas the less annoying one scores 2 points. If the two samples produce similar subjective feelings, each scores 1 point.
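The scoring rule above can be illustrated with a minimal sketch. The function `paired_comparison_scores`, the callback `judge`, and the annoyance values are hypothetical names and data introduced only for illustration; they are not part of the procedure in Reference [26].

```python
from itertools import combinations

def paired_comparison_scores(n_samples, judge):
    """Accumulate paired-comparison points for one evaluator.

    judge(i, j) is assumed to return +1 if sample i is less annoying,
    -1 if sample j is less annoying, and 0 if the two feel similar.
    """
    scores = [0] * n_samples
    for i, j in combinations(range(n_samples), 2):  # every pair with i < j
        outcome = judge(i, j)
        if outcome > 0:
            scores[i] += 2          # less annoying sample gets 2 points
        elif outcome < 0:
            scores[j] += 2
        else:
            scores[i] += 1          # similar feeling: 1 point each
            scores[j] += 1
    return scores

# Hypothetical annoyance ratings for four samples (lower = less annoying).
annoyance = [3.0, 1.0, 2.0, 2.0]

def judge(i, j):
    if annoyance[i] < annoyance[j]:
        return 1
    if annoyance[i] > annoyance[j]:
        return -1
    return 0

scores = paired_comparison_scores(len(annoyance), judge)
print(scores)
```

Every comparison distributes 2 points, so for 17 samples (136 pairs) each evaluator hands out 272 points in total, which gives a quick consistency check on a completed score sheet.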

The Statistical Product and Service Solutions (SPSS) software is employed to perform Spearman correlation analysis among the evaluators. By taking the arithmetic mean of the correlation coefficients between each evaluator and the other evaluators, we obtain the average correlation coefficients shown in Table 1.

A Spearman correlation coefficient below 0.7 indicates that the rank correlation between the variables is weak. As displayed in Table 1, the average Spearman correlation coefficients of the three evaluators ET9, ET13, and ET18 are below 0.7, so their subjective evaluation results are eliminated. By integrating the results of the remaining 15 evaluators and calculating the average value of each sample, we obtain the subjective evaluation values of the 17 samples shown in Table 2.
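The screening of evaluators can be sketched as follows. This is only an illustration in NumPy, not the SPSS procedure used in this study; the helper names and the score matrix are hypothetical.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def mean_agreement(score_matrix):
    """Mean Spearman coefficient of each evaluator with all the others.

    score_matrix has one row per evaluator and one column per sample.
    """
    n = len(score_matrix)
    rho = np.array([[spearman(score_matrix[a], score_matrix[b])
                     for b in range(n)] for a in range(n)])
    return (rho.sum(axis=1) - 1.0) / (n - 1)  # drop the self-correlation

# Hypothetical scores: 6 evaluators (rows) rating 5 samples (columns).
scores = np.array([
    [0, 2, 4, 6, 8],
    [0, 2, 4, 8, 6],
    [2, 0, 4, 6, 8],
    [0, 2, 4, 6, 8],
    [0, 4, 2, 6, 8],
    [2, 0, 6, 8, 4],   # evaluator who disagrees most with the rest
])
agreement = mean_agreement(scores)
keep = agreement >= 0.7            # threshold taken from the text
print(np.round(agreement, 2), keep)
```

The subjective evaluation value of each sample would then be the column mean over the rows where `keep` is true.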

2.2. Objective Psychoacoustic Parameter Extraction

The vehicle door closing sound is a typical shock signal. In the past, several indicators were used to evaluate this sound [27]: main shock time, low-frequency continuation, high-frequency component, and peak sound pressure level. The main shock time corresponds to the loudness. The low-frequency continuation corresponds to the fluctuation strength, which reflects the low-frequency variation. The high-frequency component corresponds to the sharpness and the roughness: the sharpness reflects the high-frequency content and the roughness reflects the high-frequency variation. The peak sound pressure level corresponds to the A-weighted sound level. Therefore, the traditional psychoacoustic objective parameters selected in this paper are the loudness, the sharpness, the roughness, the fluctuation strength, and the A-weighted sound level. The linear average values of the psychoacoustic objective parameters of the left and right ears are calculated in MATLAB, as shown in Table 3.

3. Extraction of Frequency Components by Mother Wavelet Analysis

The acoustic signal is decomposed by the EMD and the main spectrum components of each IMF component are analyzed. The effective IMF components are selected, and the frequency bands of the extracted IMF components are used as the frequency components of the mother wavelet analysis.

3.1. Selection of Effective IMF Components

The EMD is an adaptive time-frequency analysis method that decomposes a signal on the basis of its local time-varying characteristics. It is very suitable for the analysis of nonstationary and nonlinear signals [28, 29]. Its basic idea is to decompose a wave with irregular frequency content into several waves, each dominated by a single frequency, plus a residual wave. The decomposed IMF components represent the oscillations of the original sequence at different time scales, and the IMFs are ordered from high frequency to low frequency. Since the closing sound is relatively short, the intercepted duration in this paper is set to 0.7 s, which is sufficient to express its characteristics. Sample S1 is decomposed into IMF components by the EMD (as shown in Figure 2). The original signal of the intercepted closing sound is shown in the first row of the first column. The remaining plots are the IMF components, whose main frequency components decrease as the sequence number increases.

The spectrum of each IMF component is analyzed, and the IMF components whose main frequencies lie in the range of 20 Hz∼20 kHz are selected. The spectra of the first 6 IMF components of sample S1 are shown in Figure 3. It can be seen that the first 6 IMF components lie mainly between 20 Hz and 12 kHz, and their main frequency components are arranged from high to low. The analysis of the other samples shows that the first 6 IMF components of most samples lie mainly between 20 Hz and 12 kHz, and the main frequency component of the 7th IMF is lower than 20 Hz, so the first 6 IMF components of each sample are selected as the effective IMF components.
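The selection rule can be sketched as follows, assuming the IMFs are already available as arrays. The function names are hypothetical, and the synthetic sinusoids merely stand in for real IMF components; the EMD itself is not implemented here.

```python
import numpy as np

def dominant_frequency(component, fs):
    """Frequency (Hz) of the largest peak in the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(component))
    freqs = np.fft.rfftfreq(len(component), d=1.0 / fs)
    return float(freqs[np.argmax(spectrum)])

def select_effective_imfs(imfs, fs, f_low=20.0, f_high=20000.0):
    """Indices of IMFs whose dominant frequency lies in the audible range."""
    return [k for k, imf in enumerate(imfs)
            if f_low <= dominant_frequency(imf, fs) <= f_high]

fs = 48000                                   # sampling frequency from the text
t = np.arange(int(round(0.7 * fs))) / fs     # 0.7 s intercepted segment
# Synthetic stand-ins for IMFs, ordered from high to low frequency.
imfs = [np.sin(2 * np.pi * f * t) for f in (6000.0, 800.0, 100.0, 10.0)]
effective = select_effective_imfs(imfs, fs)
print(effective)
```

In this toy case the last component has its spectral peak at 10 Hz, below the audible range, and is therefore discarded, which mirrors the rejection of the 7th IMF in the text.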

3.2. Determination of the Frequency Band of Analytic Wavelet Analysis

The wave peaks of IMF component spectrums correspond to the center frequencies of critical bands, and the selection of the main frequency bands is performed by the critical bands. The peaks of the first six orders IMF component spectrums of each sample are extracted, and the frequency bands corresponding to the spectrum peaks of the effective IMF components are as follows:

Thus, the analytic wavelet analysis frequency bands of each sample can be obtained as shown in Table 4.

The time-frequency analysis of the left channel of sample S1 between 10 Hz and 20 kHz is shown in Figure 4. The frequency components change within 2.5 seconds and span a range of 10 Hz∼12 kHz, which is consistent with the main frequency range of the selected IMF components and further illustrates the effectiveness of the selection. As shown in Figure 4, the main frequency components are below 200 Hz, and the highest frequency component is approximately 12 kHz. The intermediate frequencies also account for a certain proportion. The selected critical frequency bands cover almost all the frequency components in Figure 4, which further confirms the rationality of the frequency band division.

4. Energy Feature Extraction

4.1. Complex Analytic Wavelets

Analytic wavelets can effectively analyze nonstationary signals, especially impact signals [30]. In this work, the complex Morlet wavelet is used as the mother wavelet. Because of its good localization in both the time and frequency domains and its band-pass amplitude-frequency characteristic, it is widely employed for engineering signal analysis [31, 32]. Each extracted band is taken as an analysis band of the analytic wavelet transform, and the center frequency of the corresponding critical frequency band is taken as the analysis center frequency. Finally, the scale factor corresponding to each center frequency is calculated. The mother wavelet selected in this paper is the complex form of the Morlet wavelet:
$$\psi(t) = \frac{1}{\sqrt{\pi f_b}}\, e^{j 2\pi f_c t}\, e^{-t^2/f_b}, \tag{1}$$
where $j = \sqrt{-1}$, $f_c$ is the center frequency of the mother wavelet, and $f_b$ is the mother wavelet bandwidth. Then the continuous wavelet transform of the closing sound sample $x(t)$ is
$$W_x(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt, \tag{2}$$
where $a$ is the scale factor, $b$ is the displacement factor, and $\psi^{*}$ represents the complex conjugate of $\psi$. Given the upper, lower, and center frequencies of an analysis band, the original signal is decomposed according to the specified frequency band and center frequency, and the frequency is normalized as
$$f_n = \frac{f_{\lim}}{f_s}, \tag{3}$$
where $f_n$ is the normalized frequency, $f_{\lim}$ is the frequency limit, and $f_s$ is the sampling frequency. The scale factor corresponding to the center frequency is calculated as
$$a = \frac{f_c f_s}{f_z}, \tag{4}$$
where $f_c$ is the center frequency of the mother wavelet, $f_s$ is the sampling frequency, and $f_z$ is the center frequency of the critical frequency band. The number of scale factors is 100, and the step length is formulated as
$$\Delta a = \frac{a_l - a_u}{100}, \tag{5}$$
where $\Delta a$ is the step length of the scale factor, $a_u$ is the scale factor corresponding to the upper limit of the critical frequency band, and $a_l$ is the scale factor corresponding to the lower limit of the band.

In this paper, the center frequency of mother wavelet ; bandwidth . The scale factor, normalized frequency, and step length are calculated according to Equations (3)–(5). Table 5 shows the scale factors corresponding to the center frequencies, the range of scale factors, and the step lengths corresponding to the frequency bands.
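As a minimal sketch of these relations, the code below evaluates a complex Morlet wavelet of the standard form $\psi(t) = e^{j2\pi f_c t} e^{-t^2/f_b}/\sqrt{\pi f_b}$ and assumes that a band center frequency $f$ maps to the scale $a = f_c f_s / f$. The function names and the values $f_b = 2$, $f_c = 1$ are illustrative assumptions, not the settings used in this study.

```python
import numpy as np

def scale_for_frequency(f_target, f_c, fs):
    """Scale factor whose wavelet center frequency maps to f_target (Hz)."""
    return f_c * fs / f_target

def complex_morlet(t, f_b, f_c):
    """Complex Morlet mother wavelet with bandwidth f_b, center frequency f_c."""
    return (np.exp(2j * np.pi * f_c * t) * np.exp(-t ** 2 / f_b)
            / np.sqrt(np.pi * f_b))

def cwt_single_scale(x, scale, f_b, f_c):
    """Continuous wavelet coefficients of x at one scale (time in samples)."""
    # Truncate the wavelet where its Gaussian envelope has decayed to exp(-16).
    half = int(np.ceil(4.0 * np.sqrt(f_b) * scale))
    k = np.arange(-half, half + 1)
    psi = complex_morlet(k / scale, f_b, f_c) / np.sqrt(scale)
    # Correlating x with psi equals convolving x with the reversed conjugate.
    return np.convolve(x, np.conj(psi)[::-1], mode="same")

fs = 48000.0
f_b, f_c = 2.0, 1.0                        # assumed wavelet parameters
t = np.arange(4800) / fs
x = np.cos(2 * np.pi * 1000.0 * t)         # 1 kHz test tone

a_match = scale_for_frequency(1000.0, f_c, fs)   # scale tuned to the tone
a_off = scale_for_frequency(3000.0, f_c, fs)     # scale tuned to 3 kHz
mid = len(x) // 2                                # away from edge effects
mag_match = abs(cwt_single_scale(x, a_match, f_b, f_c)[mid])
mag_off = abs(cwt_single_scale(x, a_off, f_b, f_c)[mid])
print(a_match, a_off, mag_match > mag_off)
```

The coefficient magnitude is large only at the scale tuned to the tone, which is the band-pass behavior that makes one scale per critical band a usable analysis channel.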

Sample S1 is decomposed according to the above scale factors and the related frequency bands. The nine continuous wavelet decomposition waveforms corresponding to the scale factors of the center frequencies are shown in Figure 5. From top to bottom, the scale factors decrease gradually and the center frequencies increase gradually, and each graph corresponds to a different critical frequency band.

4.2. Feature Extraction

The analytic wavelet energy ratio coefficient is the proportion of the energy of one continuous wavelet component to the total energy of all components. It is introduced to balance the data discrepancy, and the components are weighted to highlight the influence of the high-frequency components. The energy ratio coefficient is defined as $p_i$, with the following formula:
$$p_i = \frac{w_i^2 E_i}{E}, \tag{6}$$
where $E_i$ is the energy of the $i$-th continuous wavelet component, $E$ is the total energy of the continuous wavelet components, and $w_i$ is the weighting coefficient (because the energy in Equation (9) is computed from squared magnitudes, $w_i$ is also squared). The weighting coefficient $w_i$ is defined as follows:
$$w_i = \frac{M_{\max}}{M_i}, \tag{7}$$
where $M_i$ is the maximum modulus of the wavelet coefficients of the $i$-th continuous wavelet component and $M_{\max}$ is the largest of these maxima over all components. The energy of each continuous wavelet component is defined as follows:
$$E_i = \int_0^T \left| W_i(t) \right|^2 dt, \tag{8}$$
where $T$ is the sampled time and $W_i(t)$ is the decomposed continuous wavelet component. The above formula is discretized and calculated as follows:
$$E_i = \sum_{k=1}^{N} \left| W_i(k \Delta t) \right|^2 \Delta t, \tag{9}$$
where $N$ is the number of samples and $\Delta t$ is the sampling interval.
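The weighting step can be sketched as follows under two stated assumptions: each weight is taken as the largest peak modulus over all components divided by the peak modulus of the component itself, and the total energy is taken as the sum of the weighted component energies so that the coefficients sum to one. The function names and the two synthetic components are hypothetical.

```python
import numpy as np

def component_energy(w, dt):
    """Discretized energy: sum of squared moduli times the sampling interval."""
    return float(np.sum(np.abs(w) ** 2) * dt)

def energy_ratio_coefficients(components, fs, weighted=True):
    """Energy ratio of each wavelet component, optionally peak-weighted."""
    dt = 1.0 / fs
    energies = np.array([component_energy(w, dt) for w in components])
    if weighted:
        peaks = np.array([np.max(np.abs(w)) for w in components])
        weights = peaks.max() / peaks      # assumed form: weak components get > 1
    else:
        weights = np.ones(len(components))
    weighted_energy = weights ** 2 * energies
    # Assumption: normalize by the total of the weighted energies.
    return weighted_energy / weighted_energy.sum()

fs = 48000.0
t = np.arange(int(round(0.7 * fs))) / fs
# Synthetic stand-ins: a strong low-frequency and a weak high-frequency component.
low = 1.0 * np.exp(-t / 0.05) * np.exp(2j * np.pi * 100.0 * t)
high = 0.1 * np.exp(-t / 0.02) * np.exp(2j * np.pi * 2500.0 * t)

plain = energy_ratio_coefficients([low, high], fs, weighted=False)
balanced = energy_ratio_coefficients([low, high], fs, weighted=True)
print(np.round(plain, 4), np.round(balanced, 4))
```

With this choice the dominant low-frequency component keeps a weight of one while weaker components are amplified, so the ratio of the first component drops and the others rise after normalization.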

The energy ratio coefficients of sample S1 before and after weighting are compared in Figure 6. After weighting, the energy ratio of the first order decreases while the energy ratios of the other eight orders all increase, so the energy ratios of the frequency bands become more balanced. The results also reflect the increased weight of the high-frequency components. The energy ratio coefficients of each sample are shown in Table 6.

5. Correlation Analysis

To compare the extracted time-frequency objective parameters with the traditional psychoacoustic objective parameters, the correlation coefficient is used to judge their effectiveness in evaluating vehicle door closing sound quality. The correlation coefficient is calculated as follows:
$$r_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}, \tag{10}$$
where $r_{XY}$ is the correlation coefficient between vector $X$ and vector $Y$, $x_i$ and $y_i$ are the elements of the two vectors, $\bar{x}$ and $\bar{y}$ are the averages of the two vectors, respectively, and $n$ is the number of samples.
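A minimal sketch of this calculation, assuming the usual Pearson product-moment form of the correlation coefficient, is given below. The two vectors are made-up values for illustration only, not data from Tables 2, 3, or 6.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()                  # deviations from the mean of x
    dy = y - y.mean()                  # deviations from the mean of y
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Hypothetical objective parameter and subjective score for five samples.
objective = [0.12, 0.20, 0.31, 0.38, 0.52]
subjective = [27.0, 24.5, 19.0, 17.5, 11.0]
r = pearson_r(objective, subjective)
print(round(r, 4))
```

A value close to −1 means that the subjective score falls almost linearly as the objective parameter grows, which is the kind of relationship reported for the high-frequency energy ratio coefficients.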

The correlation between the traditional psychoacoustic objective parameters and the subjective evaluation value is calculated as shown in Table 7. In addition, the correlation between the extracted objective parameters based on complex analytic wavelet and the subjective evaluation value is also calculated as shown in Table 7. The correlation coefficients between objective parameters and subjective evaluation value are arranged from small to large as shown in Figure 7.

The scatter plots of the loudness, the sharpness, the roughness, and the energy ratio coefficients of components 6∼8 against the subjective evaluation value are shown in Figures 8–13. It can be seen from Figures 8, 10, and 12 that, among the traditional psychoacoustic objective parameters, the loudness has the largest correlation with the subjective evaluation value. It can be seen from Figures 9, 11, and 13 that, among the objective parameters extracted by the complex analytic wavelet, the energy ratio coefficient of component 8 has the largest correlation with the subjective evaluation value. Table 7 and Figure 7 show that the energy ratio coefficient of component 8 has the highest correlation with the subjective evaluation value overall, with a correlation coefficient of −0.9278, which is a strong negative correlation.

6. Conclusion

(1) In this paper, EMD is combined with the complex analytic wavelet. The frequency bands of the complex analytic wavelet are extracted by using the decomposition characteristics of EMD, and the wavelet analysis bandwidth, scale factor, and center frequency are determined accordingly. To highlight the influence of high-frequency components, the weighted energy ratio coefficients are extracted as the objective parameters for sound quality evaluation of door closing. The results show that the energy ratio coefficients based on the complex analytic wavelet transform correlate more strongly with the subjective evaluation results than the traditional psychoacoustic objective parameters do. Therefore, they are more suitable for the evaluation of vehicle door closing sound quality.
(2) It can be seen from Table 7 that there is a strong negative correlation between the energy ratio coefficients of components 6∼8 and the subjective evaluation value. The critical frequency band corresponding to these three coefficients is 1720 Hz∼3150 Hz. It can be inferred that this frequency band has a great influence on the door closing sound quality, so further research on this band can be carried out to improve the vehicle door closing sound quality in the future.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the support of National Natural Science Foundation of China (grant no. 11232004).