Abstract
Water color is an important indicator of water quality in inland lakes and ponds; however, sufficient water color image samples are often difficult to obtain owing to the constraints of fishery production. For few-sample color images, existing data enhancement methods based on deep generative models suffer from low quality of the generated data and difficult network training. Moreover, for image classification, traditional methods based on the convolutional neural network (CNN) cannot effectively extract the latent manifold structure features in an image, and the fully connected layer in CNN cannot simulate biological neurons well, resulting in high time cost and low efficiency. In this paper, a water quality classification method is proposed to solve the above problems: an improved semisupervised triple generative adversarial network (triple-GAN) algorithm is used to enhance the few water color image samples, and feature data are then extracted from the enhanced data by the manifold learning method t-distributed stochastic neighbor embedding (t-SNE). Moreover, a convolutional spiking neural network (CSNN), in which a spiking neural network (SNN) replaces the original fully connected layer of CNN, is used for the final water quality classification. The main contribution of this paper is to build a new algorithm framework, introduce triple-GAN and CSNN into the classification of few water color image samples for the first time, and explore the integration of artificial intelligence (AI) with water quality analysis. Comparison with traditional methods shows that the proposed method is less time-consuming, has lower operation cost, and achieves higher classification accuracy.
1. Introduction
1.1. Background
The output of aquatic products from inland lakes and ponds is an important part of fishery production. Owing to differences in geographical properties and in the target fish stocks being bred, the water in different lakes and ponds often has different biological, chemical, and physical characteristics [1]. These characteristics reflect the current ecological state and the related behavior of fish stocks and provide an important reference for grasping the state of the fishery ecosystem and its sustainable development. There is no doubt that the water quality of inland lakes and ponds has a significant impact on the survival of fish and other biological populations, including plankton, microorganisms, and zooplankton [2]. In order to meet the requirements of fishery production, it is therefore of great significance to evaluate the water quality of lakes and ponds.
Water color is an important representation of water quality and can be used as a basis for water quality evaluation [3, 4]. In actual fishery production, water quality is mostly judged by naked-eye observation and experience; judgment deviations caused by subjectivity are therefore prone to occur, and the comparability and interpretability of the observation results are poor [5]. Digital image processing based on computer vision, combined with artificial intelligence (AI) algorithms, provides a new idea for water quality evaluation and makes rapid discrimination of water quality possible. Up to now, a large number of scholars have studied water quality evaluation methods for different waters, and there have been many reports on employing different types of AI algorithms for water quality evaluation, which will be discussed in Section 2 of this paper.
However, when evaluating water quality based on AI algorithms, especially the water quality of inland lakes and ponds in the context of fishery production, some practical problems have not yet been properly solved. First, collecting samples of different water colors in the same lake or pond area takes time, so the color samples available for water quality analysis are often insufficient. Moreover, when searching for effective water quality features, the constraints imposed by the objective functions of many AI algorithms mean that gradient descent (GD) related methods are inefficient at finding the optimal solution and may even fall into local optima.
1.2. Objective of the Paper
1.2.1. Research Motivation
For the classification of few water color image samples mentioned above, deep generative models (such as the generative adversarial network (GAN) or various autoencoders) are usually used to realize data enhancement. However, traditional deep generative models can only generate fake sample data randomly from noise, and this randomness is too strong for the generated data to approach the real data with high quality. Even if some models introduce a semisupervision mechanism, the structures of the generator and discriminator remain fixed and lack mechanism innovation; the network training is unstable and prone to gradient vanishing and gradient explosion. There are two reasons for this instability: (1) traditional activation functions cannot simultaneously satisfy nonlinearity, unsaturation, and differentiability almost everywhere; after the activation function is nested layer by layer in a deep network, its disadvantages are amplified cumulatively, which hinders subsequent parameter optimization. (2) The optimizer of a traditional deep generative model generally uses line search methods from convex optimization theory, such as GD or stochastic GD (SGD), to optimize parameters. Such methods are often limited to local gradient information and easily fall into the trap of local extrema; the closer they get to the optimum, the more serious the sawtooth oscillation becomes, which easily degrades the quality of the generated images.
Moreover, for water quality images, the convolution layers of a convolutional neural network (CNN) are often used directly for feature extraction, but the latent manifold geometry in the image (such as its algebraic, probabilistic, and topological structure) can hardly be extracted effectively. Even when a manifold learning method is used to reduce the dimensionality of water color images, there is often no parameter optimization method that is coupled with the optical information of the image and grounded in physical optics. In addition, the fully connected layer in a traditional CNN has several drawbacks: (1) the neuron model is too simple to simulate the membrane potential changes and spike releasing process of biological neurons; (2) the output value of a neuron is only an approximate representation of neural information, and the timing information of individual spikes is not used; (3) a large number of labeled datasets are usually required to drive network fitting, so the energy consumption is relatively high and the efficiency relatively low; (4) the structure is not very extensible and is especially ill-suited to handling spatiotemporal events.
1.2.2. Main Contribution of This Paper
In this paper, a water quality classification method for inland lakes and ponds under the condition of few color image samples, based on triple-GAN and the convolutional spiking neural network (CSNN), is proposed. The main contributions of this paper are as follows:
(1) In order to achieve a better training effect with few samples, triple-GAN is used to enhance the collected original data. It abandons the zero-sum game between generator and discriminator and instead lets the discriminator add a supervised learning process based on the few real samples of a given category, so as to improve the authenticity of the data generated for that category. The Xwish activation function replaces all of the activation functions in the triple-GAN algorithm so as to avoid the gradient vanishing and gradient explosion problems mentioned above; at the same time, the fixed memory step fractional order gradient descent method (FMSGDM) and fixed memory step fractional order gradient ascent method (FMSGAM), both based on the Caputo derivative, replace traditional GD in optimizing the loss function so as to avoid falling into local optima. With the above method, the probability distribution contained in the few water color samples can be properly learned and new high-quality samples following this distribution can be generated.
(2) Manifold learning with nonlinear t-distributed stochastic neighbor embedding (t-SNE) is used to reduce the dimensionality of the original and generated data. On the one hand, the noise in the color samples can be reduced; on the other hand, t-SNE can be used to ensure the probabilistic isomorphism of the data manifold structure before and after dimensionality reduction, thereby extracting features that preserve the manifold structure. In order to match the optical information of water color samples, a new optical optimization algorithm for parameter optimization, Gaussian random parameter-based combination optics inspired optimization (GCOIO), is also proposed.
(3) A convolutional spiking neural network (CSNN), in which a spiking neural network (SNN) replaces the original fully connected layer of CNN, is used for the final water quality classification, so as to better simulate the membrane potential changes and spike releasing process of biological neurons.
1.3. Paper Organization
The remainder of this paper is organized as follows. Section 2 provides a comprehensive literature review on water quality evaluation, covering methodologies, hardware and system development, and research on water quality analysis based on image analysis and artificial intelligence algorithms. Section 3 describes in detail the triple-GAN introduced to deal with few samples, including the replacement of its activation function and the optimization of its loss function. Section 4 discusses t-SNE and the optimal solution of its objective function, used to enhance effective features and realize data dimensionality reduction. Section 5 details how CSNN processes the enhanced, dimensionality-reduced data to realize the final water quality classification. In order to verify the effectiveness and superiority of the proposed method, Section 6 presents a detailed numerical analysis of the proposed method and its key improvements over the original triple-GAN and CNN, based on the water sample data of a tilapia aquaculture pond given in literature [5]. Finally, conclusions are drawn in Section 7.
2. Literature Review
No matter what kind of water is analyzed, the main task is to classify and judge the water quality according to different analysis objectives after processing the collected data or images. As discussed in Section 1, a large number of scholars have carried out relevant research on water quality analysis. These studies can be roughly divided into two categories, methodology and system design, and both categories are reviewed in this section.
2.1. Methodologies on Water Quality Analysis
As the water environment contains a large number of chemicals, the traditional water quality analysis methods are mostly based on the chemical perspective and pay attention to the chemical composition of the water. Wang et al. have analyzed the seasonal variation of chemical composition in Taihu Lake, China, based on optically active components in spectral analysis [3]; Abdullah et al. and Ma et al. have analyzed the geochemical characteristics of river water and ground water quality in basin area [6, 7]; Niroumand-Jadidi et al. have tried to acquire the spectra-derived features of water based on multispectral instruments (MSIs) and ocean and land color instruments (OLCIs) [2]; Nayeem et al. have tried to realize rapid determination of the chemical composition in water based on spectral analysis [8].
During the past twenty years or so, scholars have also tried to apply mathematical methodologies to water quality analysis. Smith et al. tried at an early stage to identify the types of suspended solids in water based on an artificial neural network (ANN) [9], and Nor et al. employed a multilayer ANN to classify the amount of pollution in water [10]; Koponen et al. employed an expert system to handle water quality parameter statistics [11]; Karami et al. and Shi et al. both tried to realize water area classification by applying hierarchical cluster analysis (HCA) to environmental variables [12, 13], while Sun et al. used support vector regression for a similar objective [14]; Zhao et al. and Zhang et al. both used principal component analysis (PCA) to find the sources of pollutants in coastal areas and urban rivers [15, 16]. During the past few years, more complex mathematical methodologies, such as unsupervised learning [17] and hybrid algorithms [1, 18, 19], have also been applied to water quality analysis, and scholars such as Liu et al. have even proposed their own deep learning (DL) based network to predict water quality [20].
Also, a number of scholars have gone a step further and put these methodologies into system design so as to contribute to water quality analysis. Cruz et al. have proposed a process control system for surface water parameter monitoring based on the geology method [21], and McEliece et al. have proposed a model-based decision support system for drinking water evaluation [22].
2.2. Water Quality Analysis Based on Image Processing and Artificial Intelligence Algorithms
Considering the actual conditions of water quality data collection in fishery, most analyses are water color analyses performed after water sample collection. The acquisition of water color images is limited by conditions and often reflects the characteristics of water quality only locally. Therefore, many scholars have studied image structure extraction and information extraction under incomplete information. Chen and his team have proposed a variety of image information completion, reconstruction, and restoration methods that make full use of local information and image semantics and achieve high recognition and restoration success rates [23–25], while Xia et al. have proposed a method to track key target information in images with high accuracy [26].
Usually, water color images, as the analysis object, can be acquired in many different ways, and to date, various image processing and AI algorithms have been used for water quality analysis. Satellite remote sensing images have been widely used in water quality monitoring: based on satellite image data, Ribeiro et al., Dona et al., Nazeer et al., and Batur et al. employed ANN, expert experience, a periodical regression model, and PCA-based response surface regression, respectively, to estimate and predict specific chemical components or organic matter content in water [27–30]. Apart from these reports, some scholars have used neural network (NN) related basic or hybrid algorithms to solve image-based water quality analysis problems. Isikdogan et al. used a fully convolutional network to analyze and predict water availability from surface water mapping image data [31]; Yuan et al. used a hybrid of the long short-term memory (LSTM) network and a neural network (LSTM-NN) to analyze pollutants in water with fish activities [32]; and Pu et al. employed a CNN with hierarchical structure to establish the relationship between remote sensing images of inland lakes and on-site water quality levels [33].
3. Data Enhancement Based on Triple-GAN
3.1. Triple-GAN
As described in Section 1, considering that there are always very few water color image samples available in fishery applications, it is necessary to enhance the original data. Conventional data enhancement methods include the partial differential equation method based on the variational method, the GAN algorithm, and the variational autoencoder algorithm. The main framework algorithm used in this paper is triple-GAN.
GAN is a kind of unsupervised deep learning algorithm. Its main idea is based on the zero-sum game between generator and discriminator, and its theoretical foundation is the Nash equilibrium [34]. The objective function of GAN can be described as follows.
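In the standard minimax form of [34], written with the notation of the Nomenclature, this objective is

$$\min_{G}\max_{D} V(D,G)=\mathbb{E}_{x\sim p(x)}\left[\log D(x)\right]+\mathbb{E}_{z\sim p_{z}(z)}\left[\log\left(1-D(G(z))\right)\right].$$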
The logic is that the random noise z generates the fake sample data G(z), which is then given to the discriminator D to judge its authenticity. Therefore, in order to reach the Nash equilibrium, on the one hand, 1 − D(G(z)) should be minimized, that is, the closer the fake sample data are to the real data, the better; on the other hand, the larger the discriminator's output on the real sample data, the better. The objective function is finally given in the form of cross entropy. However, because the training of GAN is unstable, gradient vanishing and gradient explosion occur easily, which restricts its generation quality.
On the basis of GAN algorithm, some scholars have proposed conditional GAN (CGAN) [35], category-aware GAN (CatGAN) [36], and triple-GAN [37]. Among them, triple-GAN is a semisupervised deep learning algorithm, which abandons the zero-sum game process between generator and discriminator, but adopts the Nash equilibrium of classifier, generator, and discriminator under category conditions; its objective function is as follows:
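As given in [37] (up to the additional supervised cross-entropy loss placed on the classifier during semisupervised training), and writing p_g(x, y) for the generator's joint distribution and α ∈ (0, 1) for a constant balancing the classifier and generator terms, the three-player utility can be written as

$$\min_{C,G}\max_{D} U(C,G,D)=\mathbb{E}_{(x,y)\sim p(x,y)}\left[\log D(x,y)\right]+\alpha\,\mathbb{E}_{(x,y)\sim p_{c}(x,y)}\left[\log\left(1-D(x,y)\right)\right]+(1-\alpha)\,\mathbb{E}_{(x,y)\sim p_{g}(x,y)}\left[\log\left(1-D\left(G(y,z),y\right)\right)\right].$$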
Although triple-GAN can achieve good training results, the gradient vanishing and gradient explosion problems of traditional GAN still exist, and they originate in the activation functions of the neurons. In order to overcome this problem, a new deep network activation function, Xwish, is used to replace the original activation function in triple-GAN.
3.2. Xwish Activation Function
In the neuron framework of ANN, the purpose of an activation function is to pass the weighted sum through a nonlinear function to introduce nonlinearity. Specifically, the weighted sum can be regarded as a linear hyperplane that divides spatial data into two categories, but such a linear classifier does not have universal applicability, as most practical problems are nonlinear. Therefore, the activation function is used to make the linear classifier nonlinear.
Classical activation functions all have their disadvantages, such as the saturation effect on both sides of the sigmoid function, the dead zone on the left side of ReLU, and the nondifferentiability at zero of PReLU, ELU, and LReLU. Thus, the network structure in this paper adopts the Xwish activation function introduced in literature [38], which satisfies nonlinearity, unsaturation, and differentiability everywhere. The expression of the Xwish function can be found in literature [38].
In the expression of its first derivative, β is a hyperparameter, which is set uniformly in this paper to β = 0.1.
3.3. FMSGDM and FMSGAM
In the loss function optimization part of triple-GAN, many studies including [37] still use the GD method or stochastic GD (SGD); however, both GD and SGD easily fall into the trap of local optima, so in this paper the FMSGDM introduced in literature [39] is chosen to replace GD in optimizing the loss function of triple-GAN.
The advantage of the fractional gradient over the classical gradient is that it is more sensitive to gradient information and less prone to falling into local optima. The fractional gradient is defined through the fractional derivative, of which there are three main types: Grünwald–Letnikov, Riemann–Liouville, and Caputo. The Grünwald–Letnikov derivative is defined by a limit and is therefore rarely used in engineering computation. The Riemann–Liouville derivative integrates before differentiating, while the Caputo derivative differentiates before integrating; since the fractional derivative does not satisfy commutativity of integration and differentiation, Riemann–Liouville and Caputo are derivatives with different structures. Caputo is more suitable for engineering computation, so the FMSGDM discussed here is defined by the Caputo derivative. In the specific iterative format of the FMSGDM, K ∈ Z+, µ is the step size, α is the order, and Γ(·) is the Gamma function.
The flow of FMSGDM is as follows:
(1) x1, x2, …, xK−1 are the inputs and xN is the output.
(2) α, μ, N, and K are hyperparameters, which are given manually.
(3) Perform the iterative cycle from k = 0 to k = N − 1 until the termination conditions are met (a minimal code sketch of one common realization of this update is given below).
When the minus sign in equation (7) is changed to a plus sign, the FMSGAM is obtained.
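As a concrete illustration only, the following sketch implements a fixed-memory Caputo-type fractional gradient step using the common first-order truncation D^α f(x_k) ≈ f′(x_k)(x_k − x_{k−K})^{1−α}/Γ(2−α); the exact iterative format of [39] may include higher-order terms, so the function name and this truncation are illustrative assumptions, not the paper's definitive formula.

```python
import math

def fmsgdm(grad, x0, alpha=0.9, mu=0.01, K=5, n_iter=200, ascent=False):
    """Illustrative fixed-memory-step fractional-order gradient method.

    Uses the first-order truncation of the Caputo derivative with the lower
    terminal fixed K iterates back (the "fixed memory step"); this is a
    sketch, not necessarily the exact format of [39].
    """
    history = [x0]                      # past iterates, used as the memory
    x = x0
    sign = 1.0 if ascent else -1.0      # FMSGAM simply flips the sign
    for _ in range(n_iter):
        x_mem = history[max(0, len(history) - 1 - K)]   # iterate K steps back
        gap = abs(x - x_mem) + 1e-12                    # avoid 0**(1 - alpha)
        frac_grad = grad(x) * gap ** (1.0 - alpha) / math.gamma(2.0 - alpha)
        x = x + sign * mu * frac_grad
        history.append(x)
    return x

# Example: minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3)
print(fmsgdm(lambda x: 2.0 * (x - 3.0), x0=0.0))
```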
Furthermore, it only remains to compute the gradients of the triple-GAN loss function with respect to each set of parameters:
(1) the gradient with respect to the discriminator parameters θd;
(2) the gradient with respect to the classifier parameters θc;
(3) the gradient with respect to the class-conditional generator parameters.
The gradient values can then be fed into FMSGDM (or FMSGAM) to solve the optimization problem.
The pseudocode of the training process of triple-GAN with the Xwish activation function, FMSGDM, and FMSGAM is as follows:
FOR number of training iterations DO
(1) Sample a batch of pairs of the corresponding size from the generator distribution, a batch of pairs (xc, yc) ∼ pc(x, y) of size mc, and a batch of labelled data (xd, yd) ∼ p(x, y) of size md.
(2) Update the discriminator D (with network activation function Xwish) by FMSGAM, the fixed memory step fractional order gradient of D being calculated with respect to θd.
(3) Update the classifier C (with network activation function Xwish) by FMSGDM, the fixed memory step fractional order gradient of C being calculated with respect to θc.
(4) Update the class-conditional generator G (with network activation function Xwish) by FMSGDM, the fixed memory step fractional order gradient of G being calculated with respect to its parameters.
END FOR
4. Data Dimensionality Reduction Based on t-SNE
As water color image data are usually rather uniform, dimensionality reduction should be performed to make the features in the water color images sparser and more salient, which facilitates effective feature extraction in the subsequent deep network classification step. In this paper, the t-SNE described in literature [40] is used to reduce the dimensionality of the data.
4.1. t-SNE
t-SNE is based on stochastic neighbor embedding (SNE), an unsupervised nonlinear dimensionality reduction method in the field of machine learning [41]. The basic idea of SNE is that points that are close in the high-dimensional space, expressed in terms of probability, should remain close when projected into the low-dimensional space. Assume that there are two points ϕi and ϕj in the high-dimensional space and that the conditional probability that ϕj is a neighbor of ϕi is pj|i; the spatial Euclidean distance can then be transformed into a probability value with the help of a Gaussian distribution centered at ϕi with a point-specific variance. After ϕi and ϕj are projected, the corresponding points in the low-dimensional space are Фi and Фj, with an analogous probability value qj|i.
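In the standard SNE formulation [41], writing σi² for the variance of the Gaussian centered at ϕi (the symbol is introduced here for readability), these two probabilities are

$$p_{j|i}=\frac{\exp\left(-\lVert\phi_i-\phi_j\rVert^{2}/2\sigma_i^{2}\right)}{\sum_{k\neq i}\exp\left(-\lVert\phi_i-\phi_k\rVert^{2}/2\sigma_i^{2}\right)},\qquad q_{j|i}=\frac{\exp\left(-\lVert\Phi_i-\Phi_j\rVert^{2}\right)}{\sum_{k\neq i}\exp\left(-\lVert\Phi_i-\Phi_k\rVert^{2}\right)}.$$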
According to the probability value, the discrete probability distributions corresponding to high-dimensional space and low-dimensional space are Pi and Qi. In order to ensure the effectiveness of the algorithm, the optimal parameters in equations (11) and (12) must be found, so that the two distributions are close enough, that is, some measure should be taken to investigate the similarity of the two probability distributions. In machine learning, Kullback–Leibler (KL) divergence is generally used to measure the difference between two probability distributions, and its formula is
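In standard form, the KL divergence between the two conditional distributions is

$$KL\left(P_i\,\Vert\, Q_i\right)=\sum_{j}p_{j|i}\log\frac{p_{j|i}}{q_{j|i}}.$$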
The smaller the KL divergence, the better. Therefore, the algorithm is finally transformed into the problem of minimizing the following objective function, in which n is the number of samples; this optimization problem can be solved by GD or SGD.
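Summed over all data points, this objective takes the standard form

$$C=\sum_{i=1}^{n}KL\left(P_i\,\Vert\, Q_i\right)=\sum_{i=1}^{n}\sum_{j}p_{j|i}\log\frac{p_{j|i}}{q_{j|i}}.$$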
As mentioned before, t-SNE improves the SNE algorithm. First, pj|i and pi|j are not necessarily equal; thus, symmetry cannot be ensured. Therefore, equations (11) and (12) are changed into a form defined by joint probabilities.
However, equation (15) is prone to the problem of abnormal values, so equation (16) can be further transformed into
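In the original t-SNE formulation [40], the outlier-robust symmetrized joint probability in the high-dimensional space is defined as

$$p_{ij}=\frac{p_{j|i}+p_{i|j}}{2n}.$$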
Also, the modified SNE is called symmetric SNE. It should be pointed out that the definition based on the Gaussian distribution leads to the crowding problem, which means that the points after dimensionality reduction gather together and are difficult to distinguish. Considering that the t-distribution has a long-tail effect, which helps the points after dimensionality reduction to be sparser with more salient characteristics, the definition of the probability value in the low-dimensional space is changed accordingly.
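Using a Student t-distribution with one degree of freedom in the low-dimensional space, the standard t-SNE definition is

$$q_{ij}=\frac{\left(1+\lVert\Phi_i-\Phi_j\rVert^{2}\right)^{-1}}{\sum_{k\neq l}\left(1+\lVert\Phi_k-\Phi_l\rVert^{2}\right)^{-1}}.$$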
Thus, the original SNE has been transformed into t-SNE. However, t-SNE itself still has a defect: its essence is to find the parameters that minimize the KL divergence, and the KL divergence is not a true distance because it does not satisfy the symmetry axiom. In order to increase the scalability of the algorithm, t-SNE is further improved in this paper by replacing the KL divergence with the JS divergence.
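In standard form, with (P + Q)/2 denoting the mixture distribution, the JS divergence is

$$JS\left(P\,\Vert\, Q\right)=\frac{1}{2}KL\!\left(P\,\Big\Vert\,\frac{P+Q}{2}\right)+\frac{1}{2}KL\!\left(Q\,\Big\Vert\,\frac{P+Q}{2}\right).$$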
From equation (19), the optimized objective function of the improved t-SNE can be obtained.
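Written out by directly substituting the JS divergence for the KL terms (using the joint probabilities pij and qij defined above; the paper's exact normalization may differ), this objective reads

$$C_{JS}=\frac{1}{2}\sum_{i\neq j}p_{ij}\log\frac{2p_{ij}}{p_{ij}+q_{ij}}+\frac{1}{2}\sum_{i\neq j}q_{ij}\log\frac{2q_{ij}}{p_{ij}+q_{ij}}.$$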
4.2. GCOIO
Usually, GD or SGD can be used to solve the objective function, but both easily fall into local optima. Therefore, in this paper, Gaussian random parameter-based combination optics inspired optimization (GCOIO) is used to replace the GD-related optimization method in t-SNE.
Optics inspired optimization (OIO) is a heuristic optimization algorithm [42]. Its basic idea is to regard the optimization function as a mirror, with the convex parts of the function acting as convex mirrors and the concave parts as concave mirrors, and to regard each initial solution as an initial optical source point. After the light from each source point is reflected by the function mirror, an upright or inverted scaled image is obtained according to the uneven nature of the reflecting surface, and these image points serve as the source points for the next optimization step. Because the OIO algorithm is also prone to falling into local optima and to premature convergence, rotation OIO (ROIO) and combination OIO (COIO), the combination of OIO and ROIO, have been proposed to improve its performance.
All of the OIO-related algorithms mentioned above use the uniform distribution to construct the position information of the random optical source points; however, the uniform distribution offers weak randomness and limited degrees of freedom. In this paper, the Box–Muller transform from statistical simulation is used to transform the uniform distribution into a Gaussian distribution, which increases the randomness of the algorithm and yields GCOIO.
4.2.1. Basic OIO Model
Within the definition domain of the optimization function, the initial optical source point and another variable, the vertex coordinates, are randomly generated; the corresponding relation between them holds whenever ik ≠ j.
There are two processing methods at this point: (1) if the first condition (which involves the focal length) holds, the corresponding function segment is regarded as a convex function model; (2) otherwise, the corresponding function segment is regarded as a concave function model.
Suppose that the mirror function of the j-th optical source point is a concave function and that the function position of the optical source point is a random number within the corresponding interval (where d∞ denotes infinity). In this paper, the Box–Muller transform from statistical simulation is used, in which Z1 and Z2 obey the uniform distribution on [0, 1].
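In its standard form, the Box–Muller transform maps the two uniform variables into a pair of independent standard Gaussian variables:

$$X_{1}=\sqrt{-2\ln Z_{1}}\,\cos\left(2\pi Z_{2}\right),\qquad X_{2}=\sqrt{-2\ln Z_{1}}\,\sin\left(2\pi Z_{2}\right).$$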
When transforming the uniform distribution into a Gaussian distribution during the initial operation, n is the number of optical source points. According to the imaging method discussed in reference [42], the object distance and the radius of the reflecting surface can be obtained, and the center of the surface is also a random number within the corresponding interval.
Assuming instead that the mirror in front of the optical source points is a convex function, the function position of the j-th optical source point is a random number within the corresponding interval, so the object distance is easy to obtain, and the radius and the center of the surface are likewise random numbers within that interval.
4.2.2. Iteration of New Optical Source Points
As discussed above, Gaussian random parameter based OIO has been proposed in this paper to increase the randomness in optimizing the objective function of t-SNE.
In the optical source point iteration of Gaussian random parameter based OIO (GOIO), the image length of the j-th optical source point serves as the search step, and the position of the j-th light source point in the t-th iteration is updated accordingly to yield the new search solution.
In the optical source point iteration of Gaussian random parameter based ROIO (GROIO), the search direction is determined by the first law of light reflection, and the objective function is optimized by rotating the position of the image around the principal axis. The position of the image of the j-th light source point in the t-th iteration is determined by the object length, the image distance, and the reflection image point.
Based on the discussion above, GCOIO is obtained by combining the solution update method of GOIO with that of GROIO; the position of the j-th light source point in the t-th iteration is then formed by mixing the two updates with a Gaussian random number on the interval [0, 1]. In this paper, GCOIO is used to solve the optimization part of t-SNE.
5. Water Quality Classification Based on CSNN
The data after dimensionality reduction are then given to a deep network for classification. The deep network referred to in this paper is based on the CNN framework. Compared with biological NNs, traditional NNs have inherent limitations in information processing: on the one hand, the neuron model is too simple; on the other hand, the neural spikes are not connected with time [43]. Different from the classical CNN, the Xwish activation function is used between the convolution layer and the pooling layer, and the traditional fully connected layer is replaced by a spiking neural network. In this paper, the resulting network is called CSNN.
5.1. SNN
A large number of neuroscience experiments show that the visual and auditory nervous systems of many organisms use the timing of action potentials (i.e., neural spikes) issued by neurons to encode information [44, 45], and the third-generation ANN model, the spiking neural network (SNN), has emerged in response to these findings and to practical requirements. The main idea of SNN is to use temporal coding for information transmission and processing and to use the spike releasing times of neurons directly as the input and output of the network model, so as to process information efficiently; this is closer to the actual biological nervous system than the traditional neural network model [46, 47].
The fully connected layer of the CSNN in this paper adopts a multilayer SNN structure based on spatiotemporal interest points (STIPs) to replace the original second-generation neural network. The learning process of this method remains a supervised learning algorithm.
5.1.1. Learning Rules of STIP
The framework of the STIP learning rules is as follows: discrete spiking sequences are transformed into continuous functions through the spiking sequence inner product representation and can be interpreted as specific neurophysiological signals. Mapping the set of spiking sequences to the reproducing kernel Hilbert space corresponding to the kernel function unifies the representation of spiking sequences and the formal definition of their similarity measure [48, 49]. By defining the time-varying error function of the spiking sequence and the relationship between the neuron's input and output spiking sequences, the learning of complex spatiotemporal spiking patterns is constructed, and the synaptic weight learning rule is given in inner product form.
5.1.2. Spiking Frequency Coding Method
In spiking frequency coding, the amount of information transmitted by a specific spiking response does not have to be proportional to the number of spikes in the coding time window; spiking activity below the reference frequency can also transmit a large amount of information. The mutual information transmitted by a specific symbol depends on how well that symbol can be distinguished from all other possible symbols. Common spiking coding methods include frequency coding based on spike counting, spike density, or group activity, precise spike timing coding, first-spike trigger time coding, delay phase coding, and spike sequence coding. Since first-spike trigger time coding has been widely used for data coding in SNNs [50–52], it is adopted as the spiking coding method for the CSNN in this section; a minimal illustration is sketched below.
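As an illustration only (the window length, the linear mapping, and the function name are assumptions, since the exact encoding parameters are not specified here), a common time-to-first-spike scheme maps a larger normalized feature value to an earlier spike within an encoding window T:

```python
import numpy as np

def first_spike_encode(features, t_window=100.0):
    """Time-to-first-spike (latency) encoding: a sketch.

    Features are min-max normalized to [0, 1]; a stronger feature fires
    earlier, so the spike time is t = T * (1 - x).
    """
    x = np.asarray(features, dtype=float)
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)   # normalize to [0, 1]
    return t_window * (1.0 - x)                        # earlier spike = stronger input

# Example: encode one dimensionality-reduced feature vector
print(first_spike_encode([0.1, 0.7, 0.3, 1.0]))
```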
5.1.3. Transform of Spiking Sequence
The input and output of spiking neurons are expressed as spiking sequences encoding information. For a given spiking sequence s(t), a specific smoothing function can be selected and the spiking sequence can be uniquely transformed into a continuous function by a convolution operation, where tτ indicates the releasing time of the τ-th spike of the neuron and M is the total number of spikes. For a spiking sequence released in the time interval Γ = [0, T], assume that the spiking sequences input by the presynaptic neurons are si(t) ∈ S(Γ), where i runs over the presynaptic input neurons, and that the spiking sequence output by the postsynaptic neuron is sa(t) ∈ S(Γ). By employing equation (26), the relationship between the multiple input spiking sequences and the output spiking sequence at time t after conversion can be expressed through the synaptic weights, where each weight represents the link strength between presynaptic neuron i and the postsynaptic neuron.
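Writing H for the chosen smoothing function, the standard smoothed spike-train representation takes the first form below; the second relation, a weighted sum of the smoothed input trains with weights w_i (notation introduced here for illustration), is the usual linear form assumed for this kind of model and is shown only as a sketch:

$$f_{s}(t)=(s*H)(t)=\sum_{\tau=1}^{M}H\left(t-t_{\tau}\right),\qquad f_{s_a}(t)=\sum_{i}w_{i}\,f_{s_i}(t).$$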
5.1.4. Learning Rules of Synaptic Weights
The key to constructing a supervised learning algorithm for spiking neurons is to define the error function of the spiking sequence and the learning rules of the synaptic weights. The error of the spiking neuron at time t can be defined as the square of the difference between the continuous functions corresponding to the output spiking sequence sa and the target spiking sequence sd ∈ S(Γ).
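Written out with the smoothed spike trains defined above, this instantaneous error is

$$E(t)=\left[f_{s_a}(t)-f_{s_d}(t)\right]^{2}.$$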
Therefore, the total error of the spiking neuron over the interval Γ is the integral of E(t) over Γ. The gradient of a synaptic weight can be calculated from the spiking sequence error function. The change of the synaptic weight from presynaptic neuron i to the postsynaptic neuron is proportional to this gradient, where η represents the learning rate and the gradient is the derivative of the error function E(t) with respect to the weight, integrated over the time interval.
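In symbols, writing w_i for the synaptic weight from presynaptic neuron i (notation introduced here for readability), this weight update is

$$\Delta w_{i}=-\eta\,\frac{\partial E}{\partial w_{i}}=-\eta\int_{\Gamma}\frac{\partial E(t)}{\partial w_{i}}\,\mathrm{d}t.$$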
Using the error function and the chain rule, the derivative of the error function E(t) with respect to the synaptic weight at time t can be obtained according to (27) and (28), where si ∈ S(Γ) represents the spiking sequence released by neuron i. According to equation (30), the gradient of the error function with respect to the synaptic weight can then be calculated in terms of F(si, sj), the inner product of the two spiking sequences si and sj. In its definition, the spike times of si and sj enter through a kernel, and Ni and Nj represent the numbers of spikes in si and sj, respectively. κ(·, ·) is the kernel function, which is generally required to be symmetric, translation invariant, and positive definite; a Gaussian kernel is used in this paper.
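The standard spike-train inner product induced by the kernel κ, writing t_i^m and t_j^n for the spike times of si and sj (indices introduced here for illustration), is

$$F\left(s_i,s_j\right)=\sum_{m=1}^{N_i}\sum_{n=1}^{N_j}\kappa\!\left(t_i^{m},\,t_j^{n}\right).$$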
According to the above derivation, the synaptic learning rule can be obtained, in which the corresponding quantities denote the releasing times of the spikes in the input spiking sequence si, the actual output spiking sequence sa, and the target spiking sequence sd, respectively, and Fi, Fa, and Fb represent the numbers of spikes contained in si, sa, and sd, respectively.
5.1.5. Self-Adaptive Learning Rate
The value of the learning rate has a direct impact on network training time and accuracy. Too small a learning rate leads to slow convergence and hinders the effective updating of the weights; too large a learning rate easily causes excessive oscillation in the learning process and reduces the efficiency of the algorithm. Therefore, it is necessary to define a self-adaptive learning rate to improve the adaptability of the learning algorithm to the training process.
Here, the adaptive learning rate in the CSNN is defined as follows: c is a scaling factor, η∗ is the benchmark learning rate, namely, the value of the learning rate within the benchmark frequency range, and the releasing frequency of the neuron spiking sequence determines the adjustment. The self-adaptive learning rate η is adjusted according to the spiking releasing frequency relative to a benchmark frequency interval: when the releasing frequency lies within the benchmark interval, the scaling factor is c = 1; otherwise, c is given by a separate expression.
5.1.6. Similarity Measurement of Spiking Sequence
In order to evaluate the final learning performance, the closeness between the actual output spiking sequence after learning and the target output spiking sequence needs to be measured, which can be understood as measuring the distance between the two spiking sequences. By the Cauchy–Schwarz inequality, a similarity measurement S of the two spiking sequences can be defined in terms of their inner product and Euclidean norms; the more similar the two spiking sequences are, the closer S is to 1.
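A standard normalized form consistent with this description, using the spike-train inner product F defined above and the induced norm ‖s‖ = √F(s, s), is

$$S=\frac{F\left(s_a,s_d\right)}{\lVert s_a\rVert\,\lVert s_d\rVert}=\frac{F\left(s_a,s_d\right)}{\sqrt{F\left(s_a,s_a\right)F\left(s_d,s_d\right)}}.$$

By the Cauchy–Schwarz inequality, S ≤ 1, with equality when the two sequences coincide.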
5.2. Multilayer SNN Based on STIP
In the above discussion, the basic theory of a single neuron has been prepared. For the needs of water color image classification, the multilayer feedforward SNN based on the STIP rules is introduced. Combining the contents of Sections 5.1.4 and 5.1.5, its learning rules can be described as follows:
(1) the learning rule for the synaptic weights between the neurons in the last hidden layer and the output layer;
(2) the learning rules for the synaptic weights between the input layer (or a hidden layer) and hidden layer neurons.
In these rules, the corresponding quantities denote the releasing times of the spikes in the input spiking sequence si and in the spiking sequence sh issued by the hidden layer neuron, respectively, and the releasing times of the spikes in the actual output spiking sequence soa and the target spiking sequence sod of the o-th neuron in the output layer, respectively; Fi and Fh represent the numbers of spikes contained in si and sh, together with the corresponding spike counts of the output sequences; No is the number of neurons in the output layer.
The pseudocode of the algorithm training process is as follows:
Note: for precision of expression, the spiking sequence encoded from the feature data of each sample in the training set is recorded as si, and the spiking sequence encoded from the label data of each sample is recorded separately. In order to distinguish them for each training sample, a mark is introduced to represent the training data pair composed of the feature spiking sequence and the tag (target) spiking sequence of the ξ-th training sample (Algorithm 1).
6. Numerical Analysis
Only 5000 water color images of a tilapia aquaculture pond are available for this study [5]. In order to meet the training requirements of the subsequent CNN, the data are enhanced and increased to 20000 samples by employing triple-GAN, whose activation function is Xwish and whose optimization part uses FMSGDM. t-SNE is then used to reduce the dimensionality of the 20000 samples and to strengthen their features; its optimization part adopts GCOIO. After dimensionality reduction, the data undergo feature extraction and further compression through several convolution and pooling layers, with Xwish again selected as the nonlinear activation function between the convolution and pooling layers. The data are then spike encoded and sent to the SNN, and finally the classification result is output through spike decoding. The flowchart of this method is shown in Figure 1.

First, the original water color image data are processed. Each image in the small sample set is output as red, green, and blue (RGB) color channels. Figure 2 shows an original water color image sample, and Figure 3 shows its corresponding RGB three-channel output.


It can be seen from Figure 3 that the image edge data are invalid, and since the color data are relatively uniform, the central pixel region has been cropped via a Python program. Figure 4 shows the cropped image, and Figure 5 shows its RGB three-channel output.


Then, the processed data are input into triple-GAN for data enhancement. During the training of the data enhancement network, triple-GAN is set up with three different activation functions, Xwish, Sigmoid, and ReLU, for comparison, as shown in Figures 6 and 7.


It can be clearly seen from Figures 6 and 7 that after 10000 training iterations, the accuracy of the Sigmoid activation function is the lowest, at 83.76%, and the network is difficult to converge. The network using the ReLU activation function has higher accuracy and faster convergence, reaching 94.46%. The Xwish activation function overcomes the drawbacks of the traditional activation functions, namely the two-sided saturation that causes gradient vanishing and gradient explosion (in the case of Sigmoid) and the dead zone that prevents neurons from learning effective features (in the case of ReLU). By combining the advantages of these two common activation functions, Xwish enhances the training efficiency and accuracy of the algorithm: its training accuracy is the highest, reaching 98.12%, which is 14.36 percentage points higher than Sigmoid and 3.66 percentage points higher than ReLU, and its convergence is the fastest. These results reflect the efficiency and accuracy of the Xwish function.
In order to focus on the optimization performance of FMSGDM, its time efficiency and iteration speed are also investigated in this section. In the performance analysis shown later in this section, a hyperparameter τ and a performance evaluation function ρs(τ), which can be regarded as the standard indicator of the possibility of solution defined in literature [53], are chosen to present the performance results. For convenience, the number of iterations is recorded as NIT. It can be seen from Figure 8 that FMSGDM is superior to GD and SGD in CPU time, and Figure 9 shows that the iteration performance of FMSGDM is also superior to GD and SGD.


In order to evaluate the image enhancement quality of the triple-GAN proposed in this paper, two classical data enhancement indexes, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), are introduced. PSNR is the most common and widely used objective image evaluation index. However, it is based on the error between corresponding pixels, that is, error-sensitivity-based image quality evaluation. Although it is not completely consistent with the visual quality perceived by human eyes, it is still used as the baseline for other indicators. The unit of PSNR is dB, and the larger the PSNR value, the smaller the distortion in the image.
As PSNR does not consider the visual characteristics of human eyes (human eyes are more sensitive to contrast differences at low spatial frequency, are more sensitive to luminance contrast differences than to chromaticity differences, and perceive an area under the influence of its surrounding regions), its evaluation results are often inconsistent with subjective perception. Therefore, SSIM is also introduced to evaluate the enhanced data objectively and comprehensively together with PSNR. SSIM evaluates the similarity of two images in terms of luminance, contrast, and structure and assesses distortion by sensing structural information, which is closer to human vision. The value range of SSIM is [0, 1], and the larger the SSIM value, the smaller the distortion in the image.
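For reference, both indexes can be computed with standard library routines; the sketch below uses a recent version of scikit-image on a pair of 8-bit RGB images, with the random stand-in images and the assumed data range (255) serving only as placeholders for the real water color samples.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def enhancement_quality(original, generated):
    """Compute PSNR (dB) and SSIM for one original/generated image pair.

    Both images are expected as uint8 RGB arrays of the same shape.
    """
    psnr = peak_signal_noise_ratio(original, generated, data_range=255)
    ssim = structural_similarity(original, generated, channel_axis=-1, data_range=255)
    return psnr, ssim

# Example with random stand-in images (real use: load the water color samples)
rng = np.random.default_rng(0)
img_a = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)
img_b = np.clip(img_a.astype(int) + rng.integers(-5, 6, img_a.shape), 0, 255).astype(np.uint8)
print(enhancement_quality(img_a, img_b))
```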
GAN, CGAN, CatGAN, and triple-GAN have been used to increase the 5000 original water color image samples to 20000, and the average PSNR and SSIM of each method have been calculated; the results are shown in Table 1. As a PSNR higher than 40 dB and an SSIM value close to 1 indicate excellent image quality, Table 1 shows that the enhancement quality of the triple-GAN images is very high.
After data enhancement, the enhanced data are sent to t-SNE for dimensionality reduction and feature enhancement; the original 2-dimensional image matrices are reduced to 1-dimensional vectors. The optimization performance of GCOIO is then examined. Figure 10 shows that the CPU time performance of GCOIO is better than that of GD and SGD, and Figure 11 shows that its iteration performance is also superior to GD and SGD.


Finally, the one-dimensional data are input into the one-dimensional CSNN architecture. There are 8 convolution and pooling layers in total, in which the activation function is Xwish. After passing through the convolution and pooling layers, the data are spike encoded and input into the constructed 5-layer SNN based on STIP. The output spikes of the network are decoded to obtain the category label.
Figure 12 shows the spiking sequence learning process of the spiking neurons. In Figure 12, one marker represents the target output spiking sequence, "Δ" represents the output spiking sequence before learning, and the black dots represent the actual output spiking sequences at selected learning cycles. It can be seen from the learning process that the neuron learns the desired output spiking sequence from the initial output spiking sequence after about 46 cycles. Figure 13 shows the learning accuracy curve during the learning process; after 46 cycles, the actual output is identical to the target spiking sequence, that is, the similarity measure S = 1. Figures 14 and 15 show the 500 synaptic weights before and after learning, respectively; during learning, the neuron synaptic weights vary within [0, 0.2]. The analysis of the spiking sequence learning process shows that the SNN can learn the spatiotemporal patterns of complex spiking sequences and has good learning ability.




Then, the performance of the method proposed in this paper is compared with popular existing water quality image classification methods. The comparison algorithms include classical machine learning methods, namely logistic regression (LR) combined with the multiclassification strategy OvO, random forest (RF), naive Bayes (NB), and support vector machine (SVM) combined with the multiclassification strategy OvO, as well as deep learning methods, including the multilayer perceptron (MLP), the DenseNet variants DenseNet-121, DenseNet-169, and DenseNet-201, and the ResNet variants ResNet-34, ResNet-50, and ResNet-101.
Different from binary classification, for which the commonly used evaluation indexes are accuracy, precision, recall, and F1 score, the water color image classification problem in this paper is a five-class problem. Therefore, the standard multiclassification evaluation indexes Macro-F1 and Micro-F1, which comprehensively reflect the precision and recall of each specific class, are used in this paper. Table 2 shows the evaluation index values, where higher Macro-F1 and Micro-F1 values indicate better classification quality. It can be seen that the Macro-F1 and Micro-F1 values of the method proposed in this paper are higher than those of the other methods, which shows its high accuracy and stability.
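For reference, both indexes can be computed directly from the predicted and true class labels; the snippet below uses scikit-learn, with the label arrays shown purely as placeholders for the classes predicted by the trained CSNN and the ground truth.

```python
from sklearn.metrics import f1_score

# Placeholder labels for a five-class problem (classes 0..4)
y_true = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
y_pred = [0, 1, 2, 3, 3, 0, 1, 2, 4, 4]

macro_f1 = f1_score(y_true, y_pred, average="macro")  # unweighted mean of per-class F1
micro_f1 = f1_score(y_true, y_pred, average="micro")  # F1 from pooled TP/FP/FN counts
print(macro_f1, micro_f1)
```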
7. Conclusions
In this paper, a new algorithm framework for quality classification with few color samples of inland lakes or ponds has been proposed, in which triple-GAN and SNN are introduced into the field of water quality problems for the first time. The proposed method is an active theoretical exploration of integrating AI with water quality analysis. In order to evaluate the classification performance of the proposed method effectively, the standard multiclassification evaluation indexes Macro-F1 and Micro-F1 have been introduced, and the final results clearly show its advantages. The better performance of the proposed method benefits from two aspects. On the one hand, the spiking sequence learning process, the learning accuracy curve, and the change of the synaptic weights before and after learning show that the SNN in the fully connected layer of the proposed method offers high biological fidelity of the neurons, low energy consumption, and high efficiency. On the other hand, the CSNN proposed in this paper has fewer layers than the deep learning algorithms involved in the comparison, yet achieves better experimental results, which indicates that the t-SNE algorithm based on GCOIO proposed in this paper has significant feature extraction ability.
Nomenclature
z: The latent variable (Gaussian random noise)
p(x, y): True data joint distribution
p(x): True marginal distribution of random variable x
p(y): True marginal distribution of random variable y
p(y|x): The probability of occurrence of y on the premise of x about true data (x, y)
p(x|y): The probability of occurrence of x on the premise of y about true data (x, y)
pz(z): The prior distribution over the latent variables
: The marginal distribution of x defined by the generator
: Assume that x is transformed by the latent style variable z given the label y, namely, x = G(y, z), z∼pz(z)
: The joint distribution formed by p(y) and the generator's conditional, where a fake input-label pair can be sampled from G by first drawing y ∼ p(y) and then drawing x|y
pc(y|x): The conditional distribution defined by the classifier
pc(x, y): The joint distribution formed by p(x)pc(y|x), where x∼p(x) and a fake label y is produced by C according to the conditional distribution pc(y|x)
C: Classifier that approximately characterizes the conditional distribution pc(y|x) ≈ p(y|x)
G: Class-conditional generator that (approximately) characterizes the conditional distribution in the other direction ≈ p(x|y)
D: Discriminator that distinguishes whether a pair of data (x, y) comes from the true distribution p(x, y)
: The expectation under the condition of true data probability density p(x)
: The expectation under the condition of fake sample data probability density pz(x)
: The expectation under the joint probability density p(x, y) of true data x and label data y
: The expectation under the condition that the probability density of fake sample data and the probability density of label data are independent
: The expectation under the conditional probability density pc(y|x) with true data x as the premise and clustering classification label y as the result
: The expectation under the conditional probability density pc(y|G(z)) with generated fake sample data G(z) as the premise and clustering classification label y as the result
: The expectation under the joint probability density pc(x, y) of true data x and its label data y in the classifier module
: The expectation under the joint probability density of true data x and its label data y in the generator module
ρ(x): Xwish activation function
β: Hyperparameter in Xwish activation function
μ: Step size in the fractional order descent method with fixed memory step size
α: Order in the fractional order descent method with fixed memory step size
Γ(·): Gamma function
xk-K: Varying initial instant
: The α-th order Caputo derivative taking xk-K as the initial iteration point
N: Subscript of the final output iteration point in the fixed memory step fractional order descent method
K: Subscript of existing iteration points in the fixed memory step fractional order gradient descent method
ζ: Fixed memory step size fractional order
f(x): Network loss function
θd: Parameter of discriminator in triple generative adversarial network (triple-GAN)
θc: Parameter of classifier in triple-GAN
: Parameter of generator in triple-GAN
(xd, yd): Data feature and label involved in discriminator
(xc, yc): Data feature and label involved in classifier
(, ): Data feature and label involved in generator
md: The number of data involved in discriminator
mc: The number of data involved in classifier
: The number of data involved in generator
: Sample points in high-dimensional space
: Sample points in low-dimensional space corresponding to high-dimensional space after dimensionality reduction
pj|i: Conditional probability of neighborhood in high-dimensional space
qj|i: Conditional probability of neighborhood in low-dimensional space
Pi: Probability distribution of points in high-dimensional space
Qi: Probability distribution of points in low-dimensional space
KL(·||·): KL divergence
JS(·||·): JS divergence
: The position of the j-th optical source point in n-dimensional space at the t-th iteration
: Vertex coordinates at the t-th iteration
: Focal length
: The function position of the j-th optical source point when its mirror function is a concave function at the t-th iteration
Z1, Z2: Random numbers obeying uniform distribution in [0, 1]
n: Number of optical source points
: The object distance of the j-th optical source point at the t-th iteration
: The surface radius of the j-th optical source point at the t-th iteration
: The surface center of the j-th optical source point at the t-th iteration
: The image length of the j-th optical source point at the t-th iteration
: The image position of the j-th optical source point at the t-th iteration
: The curvature radius, whose center is on the principal axis passing through the source point
: The object length of the j-th optical source point at the t-th iteration
: The image distance of the j-th optical source point at the t-th iteration
: The reflection image point at the t-th iteration
: Gaussian random number in [0, 1]
s(t): A given spiking sequence
H: Specific smoothing function
tτ: Releasing time of the τ-th spike of the neuron
S(Γ): All spiking sequences released within the time interval Γ = [0, T]
M: Total number of spikes
: Number of presynaptic input neurons
si(t): Spiking sequence released by input neuron i
sa(t): Spiking sequence output by postsynaptic neurons
: Input spiking sequence transformed by convolution operation
: Output spiking sequence transformed by convolution operation
: The link strength between presynaptic neurons and postsynaptic neurons, that is, the synaptic weight
sd: Target spiking sequence
E(t): Error of spiking neuron at time t
: Total error of spiking neuron over the interval Γ
: Change in synaptic weight from presynaptic neuron i to the postsynaptic neuron
η: Learning rate
η∗: Benchmark learning rate
: Gradient value of the spiking sequence error function for the synaptic weight
F(·, ·): Inner product of two spiking sequences
, : Releasing times corresponding to spiking sequences si and sj
Ni, Nj: The numbers of spikes corresponding to spiking sequences si and sj
κ(·, ·): Kernel function
, , : Releasing times corresponding to the input spiking sequence si, the actual output spiking sequence sa, and the target spiking sequence sd
Fi, Fa, Fb: The numbers of spikes corresponding to the input spiking sequence si, the actual output spiking sequence sa, and the target spiking sequence sd
c: Scaling factor
: The releasing frequency of the neuron spiking sequence
S: Similarity measurement parameter
: Synaptic weights between neurons in the last hidden layer and the output layer
: Synaptic weights between neurons in the input layer and the hidden layer
, : Releasing times corresponding to the input spiking sequence si and the spiking sequence sh released by hidden layer neurons
, : Actual output spiking sequence and target spiking sequence of the o-th neuron in the output layer
, : Releasing times corresponding to the actual output spiking sequence and the target spiking sequence of the o-th neuron in the output layer
, , : The numbers of spikes corresponding to the spiking sequence sh released by hidden layer neurons, the actual output spiking sequence, and the target spiking sequence
No: Number of neurons in the output layer.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported in part by the Zhejiang Soft Science Research Project under Grant no. 2019C35092, the Science and Technology Commission of Shanghai Foundation under Grant no. 19DZ1205804, the Jiaxing Public Welfare Research Project under Grant nos. 2020AY10033, 2020AY30025, and 2021AY10079, and the General Scientific Research Fund of Zhejiang Provincial Education Department under Grant no. Y202147878.