#### Abstract

A fuzzy radial basis adaptive inference network (FRBAIN) is proposed for multichannel time-varying signal fusion analysis and feature knowledge embedding. The model which combines the prior signal feature embedding mechanism of the radial basis kernel function with the rule-based logic inference ability of fuzzy system is composed of a multichannel time-varying signal input layer, a radial basis fuzzification layer, a rule layer, a regularization layer, and a T-S fuzzy classifier layer. The dynamic fuzzy clustering algorithm was used to divide the sample set pattern class into several subclasses with similar features. The fuzzy radial basis neurons (FRBNs) were defined and used as parameterized membership functions, and typical feature samples of each pattern subclass were used as kernel centers of the FRBN to realize the embedding of the diverse prior feature knowledge and the fuzzification of the input signals. According to the signal categories of FRBN kernel centers, nodes in the rule layer were selectively connected with nodes in the FRBN layer. A fuzzy multiplication operation was used to achieve synthesis of pattern class membership information and establishment of fuzzy inference rules. The excitation intensity of each rule was used as the input of T-S fuzzy classifier to classify the input signals. The FRBAIN can adaptively establish fuzzy set membership functions, fuzzy inference, and classification rules based on the learning of sample set, realize structural and data constraints of the model, and improve the modeling properties of imbalanced datasets. In this paper, the properties of FRBAIN were analyzed and a comprehensive learning algorithm was established. Experimental validation was performed with classification diagnoses from four complex cardiovascular diseases based on 12-lead ECG signals. Results demonstrated that, in the case of small-scale imbalanced datasets, the proposed method significantly improved both classification accuracy and generalizability comparing with other methods in the experiment.

#### 1. Introduction

Signal analysis in nonlinear dynamic systems is an active area of research in the fields of artificial intelligence and data modeling [1]. However, due to noise in the measuring instrument or environment, the information obtained by sensors may be inaccurate or incomplete. Therefore, fuzzification analysis is usually needed for this kind of signal [2]. In addition, for some complex time-varying systems, it is difficult to obtain large-scale or complete signal sets due to the nonrepeatability of state process or the high cost of signal acquisition. Deep neural network models have been applied to the classification of time-varying signals, including dynamic recurrent neural networks [3–5], deep recursive networks [6], long short-term memory (LSTM) models [7], and deep convolutional neural networks [8–10]. However, these models depend on the completeness of training datasets and are not suitable for the time-varying signal classification of incomplete or ambiguous datasets. Therefore, the application of signal pattern classification and state prediction in complex nonlinear systems still faces some challenges.

Fuzzy neural networks can be effective for incomplete and inaccurate process information modeling and analysis [11]. Several algorithms, combining neural networks and fuzzy processing, have been proposed for signal analysis. Uyar and İlhan propose a genetic algorithm based trained recurrent fuzzy neural networks for the diagnosis of heart diseases [12]. Fei and Wang proposed an adaptive fuzzy neural network control scheme based on a radial basis neural network to enhance the performance of a shunt active power filter [13]. Camastra et al. presented a fuzzy decision system for environmental risk assessments of genetically modified plants, based on a Mamdani inference [14]. Liu et al. proposed a generalized prediction system called a recurrent self-evolving fuzzy neural network (RSEFNN), which utilized online gradient descent learning rules to classify driving fatigue in EEG regression problems [15]. Nazari et al. proposed a fuzzy inference, fuzzy analytic, and hierarchy process-based clinical decision support system for the diagnosis of heart disease. The corresponding fuzzy inference rules were acquired using expert knowledge [16]. Ilbahar et al. proposed a novel approach to risk assessment for occupational health, based on the Pythagorean fuzzy analytic hierarchy process and a fuzzy inference system [17]. Mohamad and Mukhtar developed a weighted Mamdani-type fuzzy inference model for a relative ideal preference system, based on fuzzy if-then rules [18].

A comprehensive analysis shows that most of existing fuzzy neural networks are the fuzzy analysis models based on fuzzy feature extraction and if-then rules. These techniques can be regarded as “static models” and are based on fuzzy logic that requires embedded expert knowledge. In the fuzzy analysis of time-varying signals, fuzzy inference processes are primarily constructed using backpropagation and circulation [19]. As a result, these methods exhibit limitations, such as weak adaptive learning capabilities, low efficiency for large-scale dataset processing, or professional experience requirements.

The radial basis neural network (RBNN) is a widely used kernel function technique [20]. It achieves nonlinear mapping by varying the parameters of nonlinear neuron transformation functions and improves learning speeds of network by linearizing the connection weight adjustments. It offers the advantages of fewer model parameters and low computational complexity and can form effective feature interfaces [21, 22]. Xu and He extended the processing domain of a radial basis neural network to the time dimension and proposed a radial basis process neural network (RBPNN) model [23]. This algorithm accepts multichannel time-varying signals as input and can embed distribution characteristics for typical signal samples, based on radial basis kernel center functions. However, it exhibits a shallow structure with low information capacity and includes strict requirements for sample set completeness. A novel fuzzy neural network can be established when rule-based reasoning for fuzzy logic systems is combined with the feature knowledge embedding mechanism and learning properties of an RBPNN. This provides a new methodology for the fuzzy classification of time-varying signals. Fuzzy neural network also has important applications in the field of robust adaptive control. Kong et al. proposed an adaptive fuzzy neural network control scheme using impedance learning for the multiple constrained robots with unknown dynamics and time-varying constraints, which improved the environment-robot interaction [24]. He et al. designed a boundary control method based on bionics to control a two-link rigid-flexible wing, which effectively improves the mobility and the flexibility of aircraft [25]. He et al. used radial basis function neural network to approximate the aerodynamic perturbation torque and proposed a hierarchical control scheme to study the trajectory tracking problem of microaerial vehicles in the longitudinal plane, and it is shown that the tracking errors are bounded [26].

In this paper, a novel fuzzy radial basis adaptive inference network (FRBAIN) is proposed for multichannel time-varying signal classification. First, the radial basis process neural network (RBPNN) was fuzzified to establish a fuzzy radial basis process neural network (FRBPNN), composed of a time-varying signal input layer, a fuzzy radial basis kernel transformation layer, and a membership degree output layer. The dynamic time warping (DTW) algorithm, which is insensitive to the contraction and expansion of time-varying signals, was used to measure the similarity between time-varying signal distribution features. The dynamic fuzzy C-means clustering (DFCM) algorithm was used to divide the sample set pattern classes into subclasses with similar features. Typical characteristic signal samples for each pattern subclass could be determined, which were used as the kernel centers of radial basis process neurons (RBPNs). The exponential sigmoid function with fuzzy membership properties was used as the activation function, to produce the output of each RBPN based on the fuzzy set membership degree and the fuzzification of RBPN. In this case, the fuzzy radial basis process neurons (FRBPNs) become the parameterized membership functions, relative to the fuzzy set of the pattern subclass. Nodes in the fuzzy rule layer were selectively connected with nodes in FRBN layer according to the signal categories of FRBN kernel centers. A fuzzy multiplication operation was then used to synthesize membership information of fuzzy sets and establish fuzzy reasoning rules. The output of rule layer was regularized and normalized as the excitation intensity. A T-S fuzzy function was used as the classifier, which accepted the excitation intensity of each rule as input to classify multichannel time-varying input signals.

The FRBAIN proposed in this paper can realize the embedding of the time-varying signal each pattern class diversity prior feature knowledge, as well as the structure and data constraints of the model. Through the learning of the sample set, it can adaptively establish the fuzzy inference rules and classification rules with fine-grained and effectively improve the problem that the features of the pattern classes with few samples in the small-scale, and imbalanced dataset are suppressed and weakened in the training, and the robustness and generalization ability of the model are improved.

Cardiovascular disease diagnosis based on ECG signal is a typical multichannel signal classification problem. ECG signals exhibit nonstationarity, irregular periods, stretch drift, and high background noise, resulting in fuzziness and multiple solutions [27]. In 12-lead ECG signals, atrial premature beat, frequent ventricular premature beat, atrial tachycardia, and atrial fibrillation with rapid ventricular rate exhibit similar distributions and complex combination characteristics. In this study, FRBAIN was used to classify and diagnose these four diseases using small-scale and imbalanced datasets, to verify the feasibility and effectiveness of the proposed model.

The remainder of this paper is organized as follows. After discussing the challenges of neural network-based time-varying signal classification, in Section 2, the theoretical framework for the proposed model is given. A comprehensive learning algorithm for the FRBAIN is proposed in Section 3. In Section 4, the classification experiment of ECG signals and result analysis are carried out. In Section 5, the work of this paper is summarized, and the advantages, limitations, and potential applications of this method are pointed out.

#### 2. A Dynamic Fuzzy Radial Basis Adaptive Inference Network

##### 2.1. A Fuzzy Radial Basis Neural Network

The information processing domain in RBPNN was extended to fuzzy sets to establish a fuzzy radial basis neural network (FRBNN). This model consists of a multichannel time-varying signal input layer, a radial basis fuzzification layer, and an output layer, as shown in Figure 1.

In the figure, is a multichannel time-varying input signal, denotes fuzzy radial basis neurons (FRBNs), and is the output of the FRBN. The term is the connected weight between the hidden layer and the network output unit, and is the output of network. The input signal vector can be linearly transferred to the FRBN layer. Information fusion and the fuzzification processing of multichannel input signals were achieved in the FRBN layer, in addition to membership degree output. The fuzzy classification of input signals was performed in the output unit.

The radial basis kernel function was assumed to be an exponential sigmoid with fuzzy membership [28]. The output of the FRBN is then given bywhere is the kernel center vector in the FRBN. The term represents the distance (or fuzzy feature similarity) between and , based on a certain norm, and is a smoothing parameter. and are morphological parameters. The FRBNN output is a fuzzy linear weighted sum of the hidden layer node outputs. It can be calculated as follows:

##### 2.2. The Fuzzy Radial Basis Adaptive Inference Network

###### 2.2.1. The FRBAIN Model

The DFRBAIN is composed of a multichannel time-varying signal input layer, a radial basis fuzzification layer, regularization layer I, a fuzzy rule layer, regularization layer II, and a T-S fuzzy classifier. This structure is shown in Figure 2, where is the multichannel time-varying input signal, and corresponds to the subclass in the pattern class. is the number of pattern subclasses in the pattern class. The FR terms are nodes in the fuzzy rule layer units and T-S is a fuzzy classifier.

The following mapping relationships between the input and output of each FRBAIN layer can be determined from Figure 2.(1)The input layer accepts a multichannel time-varying signal .(2)In the radial basis fuzzification layer, FRBNs are used as the fuzzy set membership functions and exponential sigmoid is used to represent the radial basis kernel function. The output of at the node in this layer can be represented as follows:where is the universal fuzzy set, is the membership function for , represents the kernel center signal vector, is an FRBN smoothing parameter, and . is the number of samples in the pattern subclass.

Kernel center functions for the fuzzy radial basis neurons were determined using the following approach.(1)Multichannel time-varying signal sample sets, containing pattern classes, were used as input. The DTW algorithm [29], which is insensitive to contraction and extension of time-varying signals, was used to measure the similarity between signal sample features. The DFCM clustering algorithm [30] was then used to divide the samples from each pattern class into subclasses exhibiting similar characteristics, before selecting typical samples from each. The pattern class was divided into partitions and the sample set containing a total of pattern subclasses.(2)The selected typical signal samples were arranged as follows: , . Here, the first subscript of indicates the cluster center category and the second subscript represents the cluster center of the pattern class. The terms were sequentially assigned as kernel centers for each node in the RBFPN layer, with total nodes in the FRBN. Structural and data constraints were then produced through the embedding of prior feature knowledge.(3)Regularization layer I normalized the outputs of FRBN layers, which is defined as where represents the regularized output of . The FRBNs were used as parameterized membership functions and the membership degree of the input signal, relative to the fuzzy set, was adaptively determined by learning the instance sample set.(4)The fuzzy rule layer connects the antecedent (regularization nodes) and the conclusion nodes (FR output nodes). Connection rules required that each rule node was connected only to a regular node from each input (after being fuzzed). This process is shown in Figure 2 for the connection between the third and fourth layers. In the classification problem, the fuzzy sets corresponding to the pattern subclasses were the same as the pattern classes. In the -classification problem, the number of fuzzy sets was denoted by . Since the FRBN layer outputs are according to the pattern subclass fuzzy set, the number of nodes in the rule layer is given by In practice, connection rules and connection methods may differ and the number of nodes and generation rules in the rule layer will vary. Using fuzzy multiplication, the output of the rule node can be expressed as follows: where other -normal operators that perform fuzzy “and” operations can also be used in fuzzy multiplication.(5)Regularization layer II processes outputs of the fuzzy rule layer. The output of the node in this layer can be considered the activation intensity of the rule.(6)The T-S fuzzy classifier accepts the normalized rule activation intensity , output by regularization layer II, as input. The output of the T-S fuzzy classifier is then given by where is the activation function for the classifier and and are classifier parameters.

###### 2.2.2. The Extended FRBAIN Model

As seen in (5), an increase in the number of nodes in the FRBN layer will cause an exponential increase in the number of nodes in the rule layer. To solve this problem, an extended FRBAIN (E-FRBAIN) model was constructed by adding a pattern layer between the FRBN and rule layers, representing the membership degree of pattern class fuzzy sets, as shown in Figure 3.

In Figure 3, each node in the FRBN layer converges the output membership degree to the corresponding node in the pattern layer, according to pattern subclass labels for the kernel center and containment relationships with the pattern class. The output of each node in the pattern layer can be calculated using a “sum” or “maximum” operation.

The output of the pattern layer is then given by

In (8), is the regularized output of the FRBN and is the number of signal sample pattern classes contained in the training set. The term is the serial number set for the FRBN layer node corresponding to the pattern class.

In the classification problem, the fuzzy set corresponding to the pattern class is the same as the fuzzy set corresponding to the pattern subclass. Equation (8) suggests the fuzzy membership degree for each node in the pattern layer integrates membership degree information for each pattern subclass in the fuzzy set. The number of fuzzy sets in the -classification problem is denoted by . Multiplication rules require the number of nodes in the rule layer to be , where in practice. Therefore, the E-FRBAIN model effectively reduces the number of nodes in the rule layer, while generated fuzzy rules simultaneously retain membership degree information for pattern subclasses.

###### 2.2.3. Property Analysis

Comprehensive analysis shows that the properties of the FRBAIN are as follows:(1)In this paper, using an algorithm that combines DTW and DCFM, each pattern samples of the dataset are divided into pattern subclasses with more similar features, and the diversity typical features samples of each pattern subclass are determined, which are used as the kernel centers of the RBPN. When the number of typical feature signal samples is determined, the number of nodes in the radial basis fuzzy layer (the first hidden layer) in the model is also determined. The number of nodes from the second hidden layer to the final classification unit of the network model is calculated according to the fuzzy inference rules, which realizes the structural constraints of the model.(2)The typical signal samples of each pattern subclasses are used as the radial basis kernel centers, which implicitly expresses the category features of each pattern signal, realizes the memory and storage of the typical signal distribution features of each pattern class, and strengthens the role of prior feature knowledge in classification. In the fuzzy radial basis kernel transformation layer, the input signals and the kernel centers are measured for feature similarity, and the transformations of the node units in the subsequent each layer are calculated according to the output of the fuzzy radial basis neuron layer, which realizes the data constraints of the model.(3)In this paper, the typical signal feature samples of each pattern class are used as the kernel centers of FRBNs, which can improve the phenomenon that the features of the pattern class with less samples in the imbalanced dataset are suppressed and weakened in the training and reduce the optimization search space of model parameters. Moreover, the model proposed in this paper contains only a few parameters, and the parameters can be determined adaptively through the learning of small-scale dataset, which has a strong ability of signal sample feature identification. It is suitable for the modeling and analysis of small-scale imbalanced datasets in mechanism and can improve the robustness and generalization ability of the model.

###### 2.2.4. Algorithm Complexity

For the FRBAIN model proposed in this paper, assuming that the number of samples in the training dataset is , the number of nodes in the radial basis fuzzification layer is , the number of nodes in the fuzzy rule layer is , and the number of pattern classes is , then the time complexity of DTW algorithm, the radial basis fuzzification layer, the pattern layer, the fuzzy rule layer, and the T-S fuzzy classifier are , and , respectively. Adding all the items together, the total time complexity of the proposed method is .

#### 3. The Learning Algorithm

The FRBAIN learning process can be divided into 3 stages. (1) The DTW algorithm can be used to measure the similarity between signal sample features. The DFCM algorithm can then be used to divide pattern classes for the training set into several subclasses, identifying typical feature signal samples in each subclasses. (2) The total number of pattern subclasses is then set to the number of nodes in the FRBN layer. Typical signal samples are then used as kernel centers for each FRBN while calculating the output. (3) The gradient descent algorithm is used to train the FRBAIN parameters.

##### 3.1. The DTW Algorithm

DTW is a similarity measurement technique for time-series signal distribution characteristics, based on dynamic programming, which combines distance calculations and time warping [31]. The algorithm requires an optimal time warping function , which nonlinearly maps the time axis of time-series signals to the time axis of a reference template. The resulting function satisfies

It is assumed the test template includes an -frame feature vector, the reference template includes an -frame feature vector, and is the distance measurement between the frame feature vector in the test template and the frame feature vector in the reference template. The term is a warping function representing the minimum cumulative distance for each frame of the test and reference templates under optimal time warping. Smaller values indicate higher similarity between two signal distribution features. The primary steps in the DTW algorithm are as follows [32]: *Step 1*. A signal sequence contrast matrix is constructed. *Step 2*. The distance measure and warping cost functions are defined. *Step 3*. A warping path is determined using a dynamic programming algorithm. *Step 4*. The optimal path is identified and the similarity degree between signal sequences is calculated.

##### 3.2. The Dynamic Fuzzy C-Means Clustering Algorithm

The DFCM clustering algorithm is a dataset partitioning technique that acquires membership degree information from each sample point for all cluster centers, through optimization of the objective function, prior to determining sample point classes [33]. The coupling and separation degrees between signal samples are then calculated by setting different clustering numbers, evaluating the corresponding results, and selecting the optimal clustering output.

Suppose the sample set contains signals and clusters. The coupling degree representing the in-class compactness and the separation degree , reflecting the between-class separation [34]. The following formula was used to evaluate the clustering results:where is the coupling weight factor. Smaller values represent better clustering results, and the value corresponding to the minimum of is the optimal number of clusters. Partitions of the sample set produce the best clustering results.

##### 3.3. The Training of FRBAIN

The parameters of FRBAIN include the radial basis kernel center smoothing parameter vector , the morphological parameters and , the connection weight matrix (from the rule layer to the fuzzy classifier), and the parameter vector for the T-S classifier. The specific learning steps are as follows:(i)*Step 1*. The DTW-DFCM algorithm is used to divide the subclasses in each pattern class and determine typical signal samples in each. These subclasses form the kernel centers of each FRBN and determine the number of nodes in the FRBN layer.(ii)*Step 2*. FRBAIN training control parameters are set, and all parameters are initialized.(iii)*Step 3*. The output of each FRBN is calculated for the input signal samples using equation (3).(iv)*Step 4*. The FRBN layer outputs are regularized.(v)*Step 5*. The outputs of each node in the rule layer are calculated using equation (6), and connections are established between the rule and regularization layers.(vi)*Step 6*. Fuzzy classifier outputs are calculated using equation (7).(vii)*Step 7*. The gradient descent algorithm is used to learn all FRBAIN parameters.

#### 4. Experiment and Analysis

##### 4.1. The Datasets

The data used in this experiment consisted of 12-lead ECG signal samples from the Chinese Cardiovascular Disease Database (CCDD). Each recording time was more than 10 seconds and included 9 heartbeats [35]. Each record lasted more than 10 seconds, including 9 heartbeats. The samples are marked with heartbeat segmentation and the diagnosis results by medical experts. Atrial premature beats, frequent premature beats, atrial tachycardia, and atrial fibrillation with rapid ventricular rate exhibit similar distributions and complex combination characteristics. In addition, the number of samples available for different disease types varied significantly. Experimental data consisted of 926 atrial premature beat, 985 frequent premature beat, 408 atrial fibrillation with rapid ventricular rate, and 389 atrial tachycardia samples, selected to form a small-scale and imbalanced database with 2708 signals.

##### 4.2. The FRAIN Model for ECG Signal Classification

In the experiment, the DTW-DFCM algorithm was used to cluster a sample set of 4 diseases. The cluster numbers for the atrial premature beats, frequent premature beats, atrial tachycardia, and atrial fibrillation with rapid ventricular rate were 5, 6, 4, and 5, respectively. There were 20 pattern subclasses clustered in total. The clustering centers of these subclasses were selected as typical feature signal samples, and each pattern subclass corresponded to the fuzzy set of 4 diseases.

Network structure parameters in the E-FRAIN model, shown in Figure 3, were set with 12 nodes in the input layer, 20 nodes in the FRBN layer, 4 nodes in the pattern layer, nodes in the rule layer, and 256 nodes in the regularization layer II. The T-S fuzzy classifier included 256 input nodes and 1 output node. The stochastic gradient descent algorithm with Adam optimizer is used to train the model parameters. The training set samples are divided into 50 batches, each batch has 54 samples, which are trained in batches. Every 50 training cycles, the learning rate will be adjusted to 1/10 of the previous batch. The initial learning rate was set at 0.5. The maximum number of iterations is 500, and the final learning rate is 0.005. When the training error is less than 0.005, the training ends.

##### 4.3. Experimental Results and Analysis

###### 4.3.1. Experimental Analysis

The sample set was randomly divided into 2 groups according to the proportion of illnesses, of which 1800 samples constituted the training set and the remaining 908 samples formed the test set. Property parameters and E-FRAIN connection weights were determined using the learning algorithm discussed in Section 3. Training error accuracy was set to 0.05, the maximum number of iterations was 3000, and the learning efficiency was 0.25. An overall accuracy rate of 87.56% was achieved in classifying test set samples. Corresponding evaluation indexes are shown in Table 1.

As seen in the table, the classification results achieved by the proposed technique are comparable to those of existing algorithms. This is because feature knowledge for typical signal samples, based on pattern subclasses, was embedded in the E-FRBAIN to effectively establish the model structural and data constraints. This approach also had the effect of reducing model parameters, which improved robustness for modeling small-scale and imbalanced sample sets. The membership degree for pattern subclasses was also used as an information unit to improve the model’s identification capabilities for complex signal features, thereby maintaining the diversity of pattern features.

##### 4.4. A Comparative Experiment and Analysis

In the comparative experiment, three types of deep neural network models were selected to directly classify multi-channel process signals. This included the multichannel deep convolutional neural network (MC-DCNN) [10], an algorithm combining LSTM with random forest (LSTM + RF) [36], and the deep gated recurrent unit (GRU) recurrent network (GRU-RNN) [37]. The same training and test sets were applied to each.

The architecture of the MC-DCNN model used in this experiment was , where “Size” denotes the kernel size, and denote the number of filters, and and denote subsampling factors. The terms , , and , respectively, represent the number of input layers, units in the hidden layer, and units in the MLP output layer. A comparative analysis suggested an architecture of 12-8(5)-2-4(5)-2-440-4 to be optimal. The LSTM + RF model used in the experiment was constructed using a series model of two LSTM networks, with 3 hidden layers in each LSTM. A random forest classifier with 100 trees was established in the feature vector space used for classification. The deep GRU recurrent network was superimposed with 5 GRU units and included a Softmax classifier.

A 4-fold crossover method was used in the experiment. The sample set was randomly divided into 4 groups according to the disease proportion, with 677 samples in each group. Three of these were combined to form the training set and one group was used as the test set. Four experiments were performed and the average value of each evaluation index in the experimental results was used as the comparison index. These results are shown in Table 2, where it is evident the proposed technique achieved the best results across all evaluation indicators.

This is because the model embeds diverse pattern class feature knowledge of signal samples in the mechanism. Decisions were then based on a fuzzy set of pattern subclasses. The structural and data constraints are implemented, and the number of model parameters is reduced, which improved signal feature identification ability and generalization. The other models are end-to-end deep learning algorithms for time-series signals, which include more parameters. In the case of incomplete and small-scale imbalanced datasets, the model structure and parameter selection include large degrees of freedom, which can result in overfitting and decreased generalizability. In addition, compared with other comparison methods, the time complexity and training time of proposed method in this paper have been greatly increased, mainly due to the time cost on iterative use of the DTW algorithm in training, but the average precision has been greatly improved.

The average correct recognition rate and mean standard deviations and *t*-test [38] were used as performance evaluation contrast index in the experiments, and the results are shown in Table 3.

In the experiment, the proposed method achieves good results in both the training set and the test set. The three other deep learning models have achieved good results in the training set learning, but the performance index and generalization property of the test sets are greatly reduced.

Comprehensive analysis shows that compared with general fuzzy neural network, the proposed method has advantages in feature knowledge embedding and fuzziness. It can express the features of signal samples in fine granularity, keep the diversity of features, and reduce semantic adhesion. Compared with it, the learning properties and generalization ability of deep neural networks have a strong dependence on the completeness of the dataset. For large-scale complete datasets, deep neural networks have advantages. However, in the case of small-scale imbalanced datasets, the deep neural network models have more parameters and large degrees of freedom, and the features of the pattern class with less samples are often weakened and suppressed during training, and the recognition accuracy and generalization ability are unstable. Due to the embedding of diversity prior feature knowledge, the proposed method can achieve the structural and data constraints, can improve the robustness and generalization ability of the model, and has good adaptability to modeling of small-scale incomplete datasets in the mechanism.

#### 5. Conclusion

A fuzzy radial basis adaptive inference network was proposed in this study, which embeds prior feature knowledge for pattern classes in mechanism, effectively realized structural and data constraints of the model, and improved the modeling properties of small-scale imbalanced datasets. The membership functions for fuzzy sets, fuzzy inference rules, and classification rules could be determined adaptively, based on sample set learning. Due to the DTW-DFCM algorithm used to cluster and divide the pattern subclasses of each pattern class, the number of nodes in each layer can be computable, so that the FRBAIN can be regarded as a deterministic model. Simultaneously, the inference and classification of whole network are based on membership degree information from fuzzy sets, so that the FRBAIN exhibits both fuzziness and randomness. These bring convenience to the practical application of the model and better generalization properties and robustness. Based on the construction of radial basis fuzzification layer and the feature embedding mechanism of fuzzy radial basis neuron, it is convenient to embed the new typical feature knowledge of pattern class, expand and maintain the model, and improve the recognition ability of signal features. The comparative experiments results show that in the case of small-scale imbalanced datasets, the recognition rate of this method is 5.37% higher than other methods in the experiment, and other performance evaluation indicators are also significantly improved. The proposed method has good applicability in small-scale dataset modeling, but for large-scale datasets without obvious statistical characteristics, the computational complexity will increase exponentially. In addition, it has a strong dependence on the selection of typical feature samples, and has higher requirements for the similarity measurement of time-varying signal distribution characteristics, and the workload of selecting diverse typical feature samples in each pattern class is also relatively large. The proposed method can be extended to the field of typical feature embedding in pattern recognition, attention mechanism in image detection and segmentation, model architecture construction in multimodal data integration analysis, and so on, to achieve the structural and data constraint of the model. It has great application potential and value for research in unknown or low-cognition fields.

#### Data Availability

The data used to support the findings of this study are included within the article.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This research was funded by the National Key R&D Program of China, under Grant no. 2018YFC1406203, and by the SDUST Research Fund, under Grant no. 2019TDJH102.