Abstract
The presence of many sounds from natural and artificial sources in the deep sea has made the classification and identification of marine mammals, with the aim of identifying endangered species, a topic of interest for researchers and conservation activists. In this paper, an experimental data set is first created using a designed scenario. The whale optimization algorithm (WOA) is then used to train a multilayer perceptron neural network (MLP-NN). However, owing to the large size of the data, the algorithm does not establish a clear boundary between the exploration and extraction phases. To address this shortcoming, fuzzy inference is used to develop an upgraded WOA, called FWOA. By tuning the FWOA control parameters, fuzzy inference can clearly define the boundary between the exploration and extraction phases. To measure the performance of the designed classifier, it was applied to benchmark datasets, and five benchmark algorithms (CVOA, WOA, ChOA, BWO, and PGO) were also used for MLP-NN training. The measured criteria are convergence speed, ability to avoid local optima, and classification rate. The simulation results on the obtained data set showed that the classification rates of the MLP-FWOA, MLP-CVOA, MLP-WOA, MLP-ChOA, MLP-BWO, and MLP-PGO classifiers are 94.98, 92.80, 91.34, 90.24, 89.04, and 88.10, respectively. As a result, MLP-FWOA performed better than the other algorithms.
1. Introduction
The deep oceans make up 95% of the oceans’ volume, which is the largest habitat on Earth [1]. New creatures with previously unknown ways of life are continually being discovered in the ocean depths [2, 3]. Much research has been conducted in the deep ocean, but it is not yet sufficient, and many of the ocean’s secrets remain unknown [4].
Various species of marine mammals, including whales and dolphins, live in the ocean. Underwater audio signal processing is the newest way to measure the presence, abundance, and migratory patterns of marine mammals [5–7]. Whales and dolphins can be recognized using either human operator-based methods or intelligent methods [8]. Initially, human operator-based methods were used. Their advantages include simplicity and ease of use. However, their main disadvantages are dependence on the operator’s psychological state and inefficiency in environments where the signal-to-noise ratio is low [9].
To eliminate these defects, automatic target recognition (ATR) based on soft computing is used [10, 11]. Contour-based recognition was then applied to whales and dolphins, but it suffers from high time complexity and a low identification rate [12, 13, 14]. The next class of intelligent methods is ATR based on soft computing [15], which is widely popular due to its versatility and parallel structure [16, 17].
The MLP-NN, due to its simple structure, high performance, and low computational complexity, has become a useful tool for automatic target recognition [18–20]. In the past, MLP-NNs were trained using gradient-based methods and error back-propagation, but these algorithms converge slowly and tend to get stuck in local minima [21–23].
Therefore, this paper presents a new hybrid method of MLP-NN training using FWOA to classify marine mammals. The main contributions of this work are as follows:
(i) Practical test design to obtain a real data set from the sound produced by dolphins and whales
(ii) Classifier design using MLP-NN to classify dolphins and whales
(iii) MLP-NN training using the proposed FWOA hybrid technique
(iv) MLP-NN training using new metaheuristic algorithms (CVOA, ChOA, BWO, PGO) and WOA as benchmark algorithms
The rest of the paper is structured as follows. Section 2 describes the experiment designed for data collection. Section 3 covers feature extraction. Section 4 describes WOA and how it is extended with fuzzy inference. Section 5 presents and discusses the simulation results, and Section 6 gives conclusions and recommendations.
1.1. Background and Related Work
MLP-NNs are a commonly used technology in the field of soft computing [11, 24]. These networks may be used to address nonlinear problems. Learning is a fundamental component of all neural networks and is classified as either unsupervised or supervised. In most cases, standard back-propagation techniques [25, 26] are used as the supervised learning approach for MLP-NNs. Back-propagation is a gradient-based technique with limitations such as slow convergence, which makes it unsuitable for real-world applications. The primary objective of the neural network learning mechanism is to discover the combination of edge weights and biases that produces the fewest errors on the network’s training and test samples [27, 28]. Nevertheless, the error of most MLP-NNs remains high for an extended period during the learning process before the learning algorithm reduces it. This is particularly prevalent in gradient-based mechanisms such as back-propagation. In addition, the convergence of back-propagation is highly dependent on the initial values of the learning rate and the momentum. Incorrect values for these variables may cause the algorithm to diverge. Numerous studies have sought to address this issue within the back-propagation algorithm [29], but none has achieved sufficient optimization, and each solution has unintended consequences. The use of meta-heuristic methods for neural network training has increased in recent years. Table 1 summarizes many works on neural network training using different meta-heuristic techniques.
GA and SA are less likely to get trapped in local optima but have a slower convergence rate, which is inefficient in real-time processing applications. Although PSO is quicker than evolutionary algorithms, it often cannot compensate for poor solution quality by increasing the number of iterations. PSOGSA is a fairly sophisticated algorithm, and its performance is insufficient for solving high-dimensional problems. BBO requires lengthy computations. Despite their low complexity and rapid convergence, GWO, SCA, and IWT are prone to local optima and so are not appropriate for applications requiring global optimization. The primary cause of getting stuck in local optima is an imbalance between the exploration and extraction stages. Various methods have been proposed to address local optima and slow convergence in WOA, including setting the parameter ɑ by a linear control strategy (LCS) or an arcsine-based nonlinear control strategy (NCS-arcsine) to establish the right balance between exploration and extraction [51, 52]. However, the LCS and NCS-arcsine strategies usually do not provide appropriate solutions for high-dimensional problems.
On the other hand, the no free lunch (NFL) theorem states that no meta-heuristic algorithm performs equally well on all problems [29]. Given the problems mentioned above and the NFL theorem, this article introduces a fuzzy whale algorithm, called Fuzzy-WOA (FWOA), for the MLP-NN training problem to identify whales and dolphins.
To investigate the performance of the FWOA, we design an underwater data acquisition scenario, create an experimental dataset, and compare it to a well-known benchmark dataset (Watkins et al. 1992). To address the time-varying multipath and fluctuating channel effects, a unique two-cepstrum liftering feature extraction technique is used.
2. The Experiment Design and Data Acquisition
As shown in Figure 1, to obtain a real data set of sounds produced by dolphins and whales, a research ship called the Persian Gulf Explorer, a sonobuoy, a UDAQ_Lite data acquisition board, and three hydrophones (Model B&K 8103) were used. The hydrophones were placed at equal distances to increase the dynamic range. The test was performed in Bushehr port. The array’s length is selected based on the water depth, and Figure 2 shows the hydrophones’ locations.


The raw data included 170 samples of pantropical spotted dolphin (eight sightings), 180 samples of spinner dolphin (five sightings), 180 samples of striped dolphin (eight sightings), 105 samples of humpback whale (seven sightings), 95 samples of minke whale (five sightings), and 120 samples of sperm whale (four sightings). The experiment was developed and performed in the manner shown in Figure 2.
2.1. The Ambient Noise Reduction and Reverberation Suppression
The sounds emitted by marine mammals (dolphins and whales) and recorded by the hydrophone array are denoted x(t), y(t), and z(t), and the original sound of the dolphins and whales is denoted s(t). The mathematical model of the hydrophone outputs is given in equation (1).
In equation (1), the environment response functions (ERFs) are denoted by h(t), (t), and q(t). The ERFs are not known, their “tail” is considered uncorrelated [53], and the first frame of sound produced by marine mammals naturally does not reach all hydrophones in the array at the same time. Given the sound pressure level (SPL) specification of the B&K 8103 hydrophone and the reference underwater audio standard, the recorded sounds must be preamplified by a gain factor.
The SPL is then transformed to the frequency domain using a Hamming window and the fast Fourier transform (FFT). Following that, equation (2) reduces the frequency bandwidth to 1 Hz:
$$\mathrm{SPL}_1 = \mathrm{SPL}_m - 10\log_{10}(\Delta f) \quad (2)$$
where SPLm is the SPL obtained at each center frequency in dB re 1 μPa, SPL1 is the SPL reduced to a 1 Hz bandwidth in dB re 1 μPa, and Δf is the bandwidth of each 1/3-octave band filter. To reduce the mean square error (MSE) between ambient noise and marine mammal sound, a Wiener filter was used [54]. Following that, equation (3) was used to identify sounds with a low SNR (less than 3 dB), which were eliminated from the database. In equation (3), T, V, and A represent all the available signals, sound, and ambient sound, respectively. After that, the SPLs were recalculated at a standard measuring distance of 1 m using equation (4).
Figure 3 illustrates the block diagram for ambient noise reduction and reverberation suppression.
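The bandwidth normalization and the low-SNR screening described in this subsection can be sketched as follows. This is a minimal illustration, assuming the standard bandwidth correction SPL1 = SPLm − 10·log10(Δf) and the nominal 1/3-octave bandwidth Δf = fc·(2^(1/6) − 2^(−1/6)); the function names and the sample values are hypothetical and are not taken from the experiment.

```python
import math

def third_octave_bandwidth(fc_hz):
    """Nominal bandwidth (Hz) of a 1/3-octave band centred at fc_hz."""
    return fc_hz * (2 ** (1 / 6) - 2 ** (-1 / 6))

def reduce_to_1hz(spl_m_db, fc_hz):
    """Reduce a band level SPLm (dB re 1 uPa) to a 1 Hz bandwidth (assumed form)."""
    return spl_m_db - 10.0 * math.log10(third_octave_bandwidth(fc_hz))

def keep_sample(snr_db, threshold_db=3.0):
    """Recordings whose SNR is below the 3 dB threshold are dropped."""
    return snr_db >= threshold_db

# Hypothetical band level of 120 dB measured in the 1 kHz band.
spl_1 = reduce_to_1hz(120.0, 1000.0)

# Hypothetical SNR values (dB); only those at or above 3 dB are kept.
kept = [snr for snr in (1.5, 3.0, 7.2) if keep_sample(snr)]
```

The correction grows with the center frequency because a 1/3-octave band widens in proportion to it, so higher bands are reduced by more decibels.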

In the next part, the effect of reverberation must be removed. To this end, a common phase is added to each band (the common phase refers to reducing phase variation by using the delay between the coherent parts or the initial sound) [55]. A cross-correlation pass function then adjusts each frequency band’s gain so that uncorrelated signals are eliminated and correlated signals are passed. Finally, the output signals from each frequency band are merged to form the estimated signal. The basic design for removing reverberation is shown in Figure 4. Figure 5 illustrates typical representations of dolphin and whale sounds and melodies, as well as their spectra.


3. Average Cepstral Features and Cepstral Liftering Features
After the audio signal frames are detected in the preprocessing stage, the effect of ambient noise and reverberation is reduced. In the next step, the detected signal frames enter the feature extraction stage. The sounds made by dolphins and whales travel some distance to the hydrophone and undergo changes in amplitude, phase, and density along the way. Because of the time-varying multipath phenomenon, fluctuating channels complicate the recognition of dolphins and whales. Cepstral coefficients combined with cepstral liftering can considerably reduce the impact of multipath, whilst averaged cepstral coefficients can significantly reduce the time-varying effects of shallow underwater channels [56]. As a result, this section recommends cepstral features, namely average cepstral features and cepstral liftering features, to construct a suitable data set. The channel-response cepstrum and the cepstrum of the original dolphin and whale sound occupy different cepstrum indices (higher and lower values) [57], so they are located in distinct regions of the liftering cepstrum. Therefore, low-time liftering improves the quality of the features. After noise and reverberation are removed, the frequency domain frames of the SPLs, S(k), are passed to the cepstrum feature extraction stage. Equation (5) determines the cepstrum features of the sound produced by dolphins and whales. In equation (5), S(k) denotes the frequency domain frames of sounds generated by dolphins and whales, N is the number of discrete frequencies employed in the FFT, and Hl(k) denotes the transfer function of the Mel-scaled triangular filter with l = 0, 1, ..., M. Finally, using the discrete cosine transform (DCT), the cepstral coefficients are converted to the time domain as c(n).
As previously stated, the original sound generated by dolphins and whales is obtained via low-time liftering. Consequently, equation (6) is recommended to separate the original sound from the whole recorded sound.
In equation (6), Lc indicates the liftering window’s length, which is typically 15 or 20. The final features are computed by multiplying the cepstrum c(n) by the liftering window and applying the logarithm and DFT functions, as described in equations (7) and (8).
Finally, the feature vector is given by equation (9).
The first 512 cepstrum points (out of 8192 points in one frame for a sampling rate of 8192 Hz), except for the zeroth index, correspond to 62.5 ms of liftering coefficients; they are windowed from the N indices, equivalent to one frame length, to reduce the liftering coefficients to 32 features. Prior to averaging, the duration of the subframes is five seconds. In the averaging liftering technique, the ten prior frames make up 50 s of average cepstral features, and smoothing over these 10 frames yields the final average cepstral features. As a result, the average cepstral feature vector has 32 elements. The vector is then used as the input signal for an MLP-NN in the subsequent phase.
In summary, the number of inputs to the neural network equals P. The whole feature extraction step is shown in Figure 6, and Figure 7 depicts the result of this step.
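The idea of low-time liftering followed by frame averaging can be illustrated with a short sketch. It is a simplification under stated assumptions: it uses the plain real cepstrum (inverse DFT of the log-magnitude spectrum) rather than the Mel-filtered DCT form described in this section, a naive DFT for self-containment, and hypothetical toy frames; the function names are illustrative only.

```python
import cmath
import math

def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log-magnitude spectrum (naive O(N^2) DFT)."""
    n = len(frame)
    spectrum = [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    log_mag = [math.log(abs(s) + 1e-12) for s in spectrum]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n) for k in range(n)).real / n
            for q in range(n)]

def low_time_lifter(cepstrum, lc):
    """Keep quefrency indices 1..lc; the zeroth index is skipped."""
    return cepstrum[1:lc + 1]

def average_features(frames, lc):
    """Average the liftered cepstra of several frames element-wise."""
    liftered = [low_time_lifter(real_cepstrum(f), lc) for f in frames]
    return [sum(column) / len(liftered) for column in zip(*liftered)]

# Hypothetical toy input: ten 64-sample frames of a decaying tone.
frames = [[math.exp(-0.05 * t) * math.sin(0.3 * t + 0.1 * i) for t in range(64)]
          for i in range(10)]
features = average_features(frames, lc=15)
```

Keeping only the low-quefrency indices retains the slowly varying spectral envelope, and averaging over frames smooths out frame-to-frame channel fluctuations.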


4. Design of an FWOA-MLPNN for Automatic Detection of Sound Produced by Marine Mammals
MLP-NN is the simplest and most widely used neural network [58, 59]. Important applications of MLP-NN include automatic target recognition systems. For this reason, this article uses an MLP-NN as the classifier [60]. The MLP-NN is among the most robust neural networks available and is often used to model systems with a high degree of nonlinearity. In addition, the MLP-NN is a feed-forward network capable of more precise nonlinear fits. Nevertheless, one of the persistent challenges of the MLP-NN is training, that is, adjusting the biases and edge weights [61].
The steps for using meta-heuristic algorithms to train the MLP-NN are as follows. The first step is to determine how to represent the connection weights. The second step is to evaluate the fitness function of the connection weights, which for recognition problems may be taken as the mean square error (MSE). The third step employs the evolutionary process to minimize the fitness function, that is, the MSE. Figure 8 and equation (10) illustrate the technical design of the evolutionary technique for connection weight training. In equation (10), n is the number of input nodes, the weight term is the connection weight from an input node to a hidden node, and the threshold term is the bias of the hidden neuron.
As noted before, the MSE is a frequently used criterion for assessing MLP-NNs, as equation (11) shows. In equation (11), m is the number of neurons in the MLP output layer, and the two compared terms are the desired output and the actual output of the ith unit when the kth training sample is presented at the input. To be effective, the MLP must be tuned to the whole set of training samples. As a result, MLP performance is calculated using equation (12).
In equation (12), T denotes the number of training samples, m denotes the number of outputs, and the compared terms are again the desired output and the actual output of the ith unit when the kth training sample is used. Finally, the recognition system requires a meta-heuristic method to fine-tune the parameters indicated above. The next subsection proposes a trainer based on an improved whale optimization algorithm (WOA) with fuzzy logic, called FWOA.
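The three training steps described in this section (representing the weights, evaluating the MSE fitness, and handing that fitness to an optimizer) can be sketched as follows. This is only an illustration under assumptions: a single hidden layer, sigmoid activations, a weighted sum minus a threshold at each node, and a flat weight layout chosen here for convenience; none of these names or layout choices are taken from the paper.

```python
import math

def sigmoid(s):
    """Logistic activation, clamped to avoid overflow in math.exp."""
    s = max(-60.0, min(60.0, s))
    return 1.0 / (1.0 + math.exp(-s))

def n_weights(n_in, n_hidden, n_out):
    """Length of the flat vector: every node has its input weights plus one bias."""
    return n_hidden * (n_in + 1) + n_out * (n_hidden + 1)

def mlp_forward(weights, x, n_hidden, n_out):
    """One-hidden-layer MLP; `weights` is the flat vector a search agent carries."""
    pos, layer_in = 0, list(x)
    for width in (n_hidden, n_out):
        layer_out = []
        for _ in range(width):
            w = weights[pos:pos + len(layer_in)]
            theta = weights[pos + len(layer_in)]   # bias (threshold) of this node
            pos += len(layer_in) + 1
            layer_out.append(sigmoid(sum(wi * vi for wi, vi in zip(w, layer_in)) - theta))
        layer_in = layer_out
    return layer_in

def mse_fitness(weights, samples, targets, n_hidden):
    """Mean over training samples of the summed squared output error."""
    total = 0.0
    for x, d in zip(samples, targets):
        o = mlp_forward(weights, x, n_hidden, len(d))
        total += sum((oi - di) ** 2 for oi, di in zip(o, d))
    return total / len(samples)

# Hypothetical toy problem: 2 inputs, 3 hidden nodes, 2 output classes.
samples = [[0.0, 1.0], [1.0, 0.0]]
targets = [[1.0, 0.0], [0.0, 1.0]]
dim = n_weights(2, 3, 2)
error = mse_fitness([0.0] * dim, samples, targets, n_hidden=3)
```

With this encoding, each search agent of a metaheuristic is simply a vector of length `dim`, and `mse_fitness` is the objective it minimizes.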
4.1. Fuzzy WOA
This section upgrades WOA using fuzzy inference. The first subsection reviews WOA, and the second subsection describes the fuzzy method for upgrading it.
4.1.1. Whale Optimization Algorithm
The WOA was introduced in 2016 by Mirjalili and Lewis, inspired by the hunting behavior of whales [62]. WOA starts with a set of random solutions. In each iteration, the search agents update their positions using three operators: encircling prey, bubble-net attack (extraction phase), and search for prey (exploration phase). In encircling prey, whales detect the prey and encircle it. The WOA assumes that the current best solution is the prey. Once the best search agent is identified, all other search agents update their positions toward it. The following equations express this behavior:
$$\vec{D} = \left| \vec{C} \cdot \vec{X}^{*}(t) - \vec{X}(t) \right| \quad (13)$$
$$\vec{X}(t+1) = \vec{X}^{*}(t) - \vec{A} \cdot \vec{D} \quad (14)$$
where t is the current iteration, $\vec{A}$ and $\vec{C}$ are the coefficient vectors, $\vec{X}^{*}$ is the position vector of the best solution obtained so far, and $\vec{X}$ is the position vector. Note that $\vec{X}^{*}$ should be updated in each iteration of the algorithm if a better solution is found. The vectors $\vec{A}$ and $\vec{C}$ are obtained using the following equations:
$$\vec{A} = 2\vec{\alpha} \cdot \vec{r} - \vec{\alpha} \quad (15)$$
$$\vec{C} = 2 \cdot \vec{r} \quad (16)$$
where $\vec{\alpha}$ decreases linearly from 2 to zero over the iterations and $\vec{r}$ is a random vector in the interval [0, 1]. During the bubble-net attack, the whale swims around its prey simultaneously along a shrinking circle and a spiral path. To describe this concurrent behavior, it is assumed that the whale updates its position during optimization using either the shrinking encircling mechanism or the spiral model. Equation (17) is the mathematical model for this phase:
$$\vec{X}(t+1) = \begin{cases} \vec{X}^{*}(t) - \vec{A} \cdot \vec{D}, & p < 0.5 \\ \vec{D}' \cdot e^{bl} \cdot \cos(2\pi l) + \vec{X}^{*}(t), & p \ge 0.5 \end{cases} \quad (17)$$
where $\vec{D}' = \left| \vec{X}^{*}(t) - \vec{X}(t) \right|$ denotes the distance between the ith whale and its prey (the best solution obtained so far), b is a constant that defines the shape of the logarithmic spiral, l is a random number between −1 and +1, and p is a random number between zero and one. In the shrinking encircling mechanism, vector $\vec{A}$ takes random values between −1 and 1 to bring search agents closer to the reference whale. In the search for prey, the search agent’s position is updated using a randomly selected agent instead of the best search agent. The mathematical model is given by the following equations:
$$\vec{D} = \left| \vec{C} \cdot \vec{X}_{rand} - \vec{X} \right| \quad (18)$$
$$\vec{X}(t+1) = \vec{X}_{rand} - \vec{A} \cdot \vec{D} \quad (19)$$
where $\vec{X}_{rand}$ is a randomly chosen position vector (random whale) from the current population, and vector $\vec{A}$ takes random values greater than 1 or less than −1 to drive the search agent away from the reference whale. Figure 9 shows the FWOA flowchart, and Figure 10 shows the pseudocode of the FWOA. The next section describes the proposed fuzzy system.
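The three WOA operators reviewed above can be sketched in a few lines. This is a minimal illustration of the standard WOA of Mirjalili and Lewis [62], not of the fuzzy variant: it draws one scalar A and C per whale instead of per-dimension vectors, omits boundary handling, and uses a hypothetical sphere test function; the function names, population size, and search range are assumptions made for the example.

```python
import math
import random

def woa_step(positions, best, alpha, b=1.0, rng=random):
    """Move every whale once; `alpha` is the control parameter that shrinks from 2 to 0."""
    moved = []
    for x in positions:
        A = 2 * alpha * rng.random() - alpha
        C = 2 * rng.random()
        if rng.random() < 0.5:
            # |A| < 1: encircle the best whale (extraction); otherwise follow
            # a randomly chosen whale (exploration).
            ref = best if abs(A) < 1 else rng.choice(positions)
            new_x = [rj - A * abs(C * rj - xj) for rj, xj in zip(ref, x)]
        else:
            # Spiral bubble-net move around the best whale.
            l = rng.uniform(-1, 1)
            new_x = [abs(bj - xj) * math.exp(b * l) * math.cos(2 * math.pi * l) + bj
                     for bj, xj in zip(best, x)]
        moved.append(new_x)
    return moved

def sphere(x):
    """Hypothetical test objective with its minimum at the origin."""
    return sum(v * v for v in x)

def woa_minimize(fitness, dim, n_whales=20, n_iter=50, seed=0):
    rng = random.Random(seed)
    positions = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_whales)]
    best = min(positions, key=fitness)
    history = [fitness(best)]
    for t in range(n_iter):
        alpha = 2.0 * (1 - t / n_iter)          # linear decrease from 2 toward 0
        positions = woa_step(positions, best, alpha, rng=rng)
        candidate = min(positions, key=fitness)
        if fitness(candidate) < fitness(best):   # keep the best solution so far
            best = candidate
        history.append(fitness(best))
    return best, history

best, history = woa_minimize(sphere, dim=3)
```

Because the switch between exploration and extraction depends only on |A|, and A is scaled by alpha, the schedule of alpha is what fixes the boundary between the two phases; this is the parameter the fuzzy system adjusts.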


4.1.2. Proposed Fuzzy System for Tuning Control Parameters
The proposed fuzzy model receives the normalized performance of each whale in the population (normalized fitness value, NFV) and the current values of the parameters α and C. Its outputs are the amounts of change in these parameters, denoted ∆α and ∆C. The NFV of each whale is obtained using equation (20).
The NFV lies in the range [0, 1]. The optimization problem in this paper is a minimization problem, in which the fitness of each whale is given directly by the value of the objective function. Equations (21) and (22) update the parameters α and C for each whale.
The fuzzy system is responsible for updating the parameters α and C of each member of the population (whale), and its three inputs are the current values of α and C and the NFV. Initially, these values are fuzzified by membership functions, which yield their membership values μ. These values are applied to a set of rules that give the values ∆α and ∆C. After these values are determined, defuzzification is performed to estimate the numerical values of ∆α and ∆C. Finally, these values are applied in (21) and (22) to update the parameters α and C. The fuzzy system used in this article is of the Mamdani type. Figure 11 shows the proposed fuzzy model and the membership functions used to adjust the whale algorithm’s control parameters. The adjustment range of the membership functions is taken from the original WOA paper [62]. Many experiments were performed with all types of membership functions, including trimf, trapmf, gbellmf, gaussmf, gauss2mf, sigmf, dsigmf, psigmf, pimf, smf, and zmf. Comparison of the results showed that trimf input and output membership functions are the most suitable for the data set obtained in Sections 2 and 3.

The linguistic values used in the membership functions of the input variables α, C, and NFV are low, medium, and high. The linguistic values used for the output variables ∆α and ∆C are NE (negative), ZE (zero), and PO (positive). The fuzzy rules are presented in Table 2, and the training of the MLP-NN using FWOA is shown in Figure 12.
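The fuzzify, infer, and defuzzify pipeline of this subsection can be sketched for a single input as follows. Everything specific in the sketch is an assumption made for illustration: the NFV is taken as a min-max normalization of the fitness, the membership breakpoints and output centers are invented, the three rules are hypothetical stand-ins for Table 2, and a weighted average of output centers replaces the full Mamdani centroid computation.

```python
def nfv(fitness, all_fitness):
    """Normalized fitness value in [0, 1] (assumed min-max form)."""
    lo, hi = min(all_fitness), max(all_fitness)
    return 0.0 if hi == lo else (fitness - lo) / (hi - lo)

def trimf(x, a, b, c):
    """Triangular membership with feet a and c and peak b (shoulder if a == b or b == c)."""
    if x <= a:
        return 1.0 if a == b else 0.0
    if x >= c:
        return 1.0 if b == c else 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical linguistic terms for the NFV input.
NFV_TERMS = {"low": (0.0, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.0)}
# Hypothetical centers of the output terms NE, ZE, PO for the change in alpha.
DELTA_CENTERS = {"NE": -0.1, "ZE": 0.0, "PO": 0.1}
# Hypothetical rule base: a good whale (low NFV) narrows its search, a poor one widens it.
RULES = {"low": "NE", "medium": "ZE", "high": "PO"}

def delta_alpha(nfv_value):
    """Fire every rule and defuzzify by a weighted average of the output centers."""
    strengths = {term: trimf(nfv_value, *abc) for term, abc in NFV_TERMS.items()}
    total = sum(strengths.values())
    return sum(s * DELTA_CENTERS[RULES[t]] for t, s in strengths.items()) / total

def update_alpha(alpha, nfv_value):
    """Apply the change and keep alpha inside the WOA range [0, 2]."""
    return max(0.0, min(2.0, alpha + delta_alpha(nfv_value)))

population_fitness = [1.0, 3.0, 5.0]
new_alpha = update_alpha(1.0, nfv(1.0, population_fitness))
```

In the full system the rule antecedents would also include the current values of α and C, and a second output would give ∆C in the same way.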

5. Simulation Results and Discussion
The evaluation of the IEEE CEC-2017 benchmark functions is presented in this section, followed by a discussion of the results achieved for the classification of marine mammals.
5.1. Evaluation of IEEE CEC-2017 Benchmark Functions
The CEC-2017 benchmark functions and dimension size are shown in Table 3. Table 6 shows the parameters selected in the algorithms used for the benchmark functions. In all algorithms, the maximum number of iterations is 100, and the population size is 180.
As shown in Table 4, the FWOA algorithm has achieved more encouraging results compared to CVOA, WOA, ChOA, BWO, and PGO. From a more detailed comparison of WOA with its upgraded version with a fuzzy subsystem (FWOA), it can be seen that the improvement and upgrade of WOA have been successful.
5.2. Classification of Marine Mammals
In this section, to show the power and efficiency of MLP-FWOA, the reference dataset (Watkins et al. 1992) is used in addition to the sounds obtained in Sections 2 and 3. As already mentioned, the feature vector is used as the input to the classifier. The obtained data set has a dimension of 680 × 42, that is, 680 samples with 42 features. In addition, the benchmark dataset has a dimension of 410 × 42. In MLP-FWOA, the number of input nodes is equal to the number of features. The 10-fold cross-validation method is used to evaluate the models. The data are first divided into ten parts, and each time nine parts are used for training and the remaining part for testing. Figure 13 shows the 10-fold cross-validation. The final classification rate for each classifier is the average of the ten classification rates obtained.
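The 10-fold procedure described above can be sketched as follows. The sketch only produces the index splits and the averaging; `evaluate` is a hypothetical callback that would train a classifier on the training indices and return its classification rate on the test indices, and the shuffling seed is an assumption.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def cross_validated_rate(n_samples, evaluate, k=10):
    """Average the classification rates returned by `evaluate` over the k folds."""
    rates = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(rates) / len(rates)

# With the 680 samples of the obtained data set, each fold tests on 68 samples.
splits = list(k_fold_indices(680))
```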

To have a fair comparison between the algorithms, a stopping condition of 300 iterations is used for all of them. There is no exact equation for the number of hidden layer neurons, so equation (23) is used [63]. In equation (23), N indicates the total number of inputs and H indicates the total number of hidden nodes. Furthermore, the number of output neurons corresponds to the number of marine mammal classes, namely six.
For a comprehensive assessment of FWOA performance, this algorithm is compared with WOA [62], ChOA [64], PGO [65], CVOA [66], and BWO [67] benchmark algorithms.
In all population-based algorithms, population size is a hyperparameter that plays a direct role in the algorithm’s performance in the search space. For this reason, many experiments were performed with different population sizes, some of which are shown in Table 5. The results showed that a population of 180 is the most appropriate value for the proposed model. With a larger population, model performance does not improve significantly, while complexity and processing time increase. Table 6 lists the fundamental parameters and their values for the benchmark methods. The classification rates used to tune the population size of the different algorithms were obtained on the data set from Sections 2 and 3.
The classifiers’ performance is then tested in terms of classification rate, local minimum avoidance, and convergence speed. Each method is run 40 times, and the classification rate, the mean and standard deviation of the smallest error, the A20-index [68], and the p-value are listed in Tables 7 and 8. The mean and standard deviation of the smallest error, the A20-index, and the p-value all reflect how well the method avoids local optima.
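The run statistics named above can be computed as in the following sketch. It assumes the commonly used definition of the A20-index, namely the fraction of samples whose measured-to-predicted ratio lies between 0.80 and 1.20; the function names and the numeric values are hypothetical and do not come from Tables 7 and 8.

```python
import statistics

def a20_index(measured, predicted):
    """Fraction of samples whose measured/predicted ratio lies in [0.8, 1.2] (assumed definition)."""
    within = sum(1 for m, p in zip(measured, predicted)
                 if p != 0 and 0.8 <= m / p <= 1.2)
    return within / len(measured)

def summarize_runs(smallest_errors):
    """Mean and sample standard deviation of the smallest error over repeated runs."""
    return statistics.mean(smallest_errors), statistics.stdev(smallest_errors)

# Hypothetical values: three measured/predicted pairs and three run errors.
score = a20_index([10, 20, 30], [9, 40, 30])
mean_error, std_error = summarize_runs([0.1, 0.2, 0.3])
```

A small standard deviation over the repeated runs indicates that the trainer reaches similar error levels regardless of its random initialization, which is why it is read as evidence of avoiding local optima.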

Figures 14 and 16 show a comprehensive comparison of the classifiers’ convergence speed and final error rate. Figures 15 and 17 show the receiver operating characteristic (ROC) curves for the two datasets.

The simulation was conducted in MATLAB 2020a by using a personal computer with a 2.3 GHz processor and 5 GB RAM.
As shown in Figures 14 and 16, among the algorithms used for MLP training, FWOA has the highest convergence speed, because adjusting the control parameters by fuzzy inference correctly detects the boundary between the exploration and extraction phases. PGO has the lowest convergence speed. As shown in Tables 7 and 8, MLP-FWOA has the highest classification rate, and MLP-PGO has the lowest classification rate among the classifiers. The STD values in Tables 7 and 8 indicate that the MLP-FWOA results rank first on both datasets, which confirms that FWOA performs better than the other training algorithms and demonstrates its ability to avoid getting trapped in local optima. A p-value of less than 0.05 indicates a significant difference between FWOA and the other algorithms. According to Tables 7 and 8, the A20-index is 1 for all classifiers, which confirms that all models can also provide good results for similar data.
Adding a subsystem to a metaheuristic algorithm increases its complexity. However, a comparison of the convergence curves in Figures 14 and 16 shows that FWOA reached the global optimum faster than the other algorithms. The other algorithms, when they converged, were stuck in local optima. In particular, a comparison of WOA and FWOA in Figures 14 and 16 shows that the auxiliary fuzzy subsystem is necessary to keep WOA from getting trapped in local optima. Thus, although the fuzzy system increases the complexity of WOA, the faster convergence and better performance of FWOA compared with the other algorithms point to a reduction in overall computational cost. The lower MSE of FWOA compared with the other algorithms further indicates that its performance improves despite the increased complexity.
6. Conclusions and Recommendations
In this paper, a fuzzy model for the control parameters of the whale optimization algorithm was designed to train an MLP-NN for classifying marine mammals. The CVOA, WOA, FWOA, ChOA, PGO, and BWO algorithms were used in the MLP-NN training stage. As the simulation results show, FWOA performs strongly in identifying the boundary between the exploration and extraction phases. For this reason, it can find the global optimum and avoid local optima. The results indicate that, in classifying the sounds produced by marine mammals, the classifiers rank from best to worst as MLP-FWOA, MLP-CVOA, MLP-WOA, MLP-ChOA, MLP-BWO, and MLP-PGO. The convergence curves also show that FWOA converges faster than the other five benchmark algorithms.
Due to the complex environment of the sea and various unwanted signals such as reverberation, clutter, and types of noise in the seabed, lack of access to data sets with a specific signal-to-noise ratio is one of the main limitations of the research.
For future research directions, we recommend the following list of topics:
(i) MLP-NN training using other metaheuristic algorithms for the classification of marine mammals
(ii) Using other artificial neural networks and using deep learning for the classification of marine mammals
(iii) Direct use of metaheuristic algorithms as classifiers for classification of marine mammals
Data Availability
No data were used to support this study.
Disclosure
An earlier version of the manuscript has been presented as Preprint in Research Square according to the following link https://assets.researchsquare.com/files/rs-122787/v1_covered.pdf?c=1631848722.
Conflicts of Interest
The authors declare that they have no conflicts of interest.