Security and Communication Networks

Research Article

A Bayesian Classification Intrusion Detection Method Based on the Fusion of PCA and LDA

Algorithm 3

(1)	Use a feature vector X to represent matrix V.
(2)	Generally, classify X into the class with the largest posterior probability, which amounts to finding the maximum value of $P(C_i \mid X)$, where $P(C_i \mid X) = P(X \mid C_i)P(C_i)/P(X)$.
(3)	Since $P(X)$ is the same for every class, to obtain the maximum value of $P(C_i \mid X)$ you only need to maximize $P(X \mid C_i)P(C_i)$. If the prior probabilities are not known, the classes are generally assumed to be equally probable, i.e., $P(C_1) = P(C_2) = \cdots = P(C_m)$ for $m$ classes. Otherwise, the prior probability can be estimated from the formula $P(C_i) = S_i/S$, where $S_i$ is the number of training samples in class $C_i$ and $S$ is the total number of training samples.
(4)	Define formula: , , and defined as in (1)–(3).
	That is, the linear correlation coefficients of X and Y. and are variances of X and Y. For intrusions coded as Smurf, is calculated and the attributes of other attacks are selected, and so on. denotes the number of sample objects with attribute , denotes the number of sample objects with attribute , and denotes the number of sample objects with attribute belonging to class .
(5)	.
(6)	In addition, the classification effect can be very unsatisfactory when the amount of data is small or when some outlying data occur with high probability. This is because the Naive Bayesian formula is a continuous product. To improve the classification effect, the continuous product can be changed into a continuous sum; that is, $\prod_{k=1}^{n} P(x_k \mid C_i)$ is replaced by a sum over the same attributes $k = 1, \ldots, n$.
(7)	When the number of attributes is large, to save computing time it is generally assumed that the attributes are conditionally independent given the class, that is, the individual attribute values are independent of each other: $P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i)$.
	If $A_k$ is a discrete attribute, the probability can be calculated by the formula $P(x_k \mid C_i) = S_{ik}/S_i$.
	$S_{ik}$ is the number of training samples in $C_i$ for which $A_k$ takes the value $x_k$, while $S_i$ is the total number of training samples in $C_i$.
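
The counting and scoring steps of the algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (train, log_score, classify), the two toy attributes, and the five toy records are hypothetical. The sketch covers only discrete attributes, estimates the prior as S_i/S and the conditional probability as the per-class frequency of each attribute value, and turns the continuous product into a sum by taking logarithms, which is one common way to do so and is assumed here rather than taken from the source.

```python
import math
from collections import Counter, defaultdict


def train(samples, labels):
    """Count class sizes and per-class attribute values from discrete data."""
    class_counts = Counter(labels)        # S_i: number of samples in class C_i
    value_counts = defaultdict(Counter)   # (class, k) -> counts of values of attribute k
    for x, c in zip(samples, labels):
        for k, v in enumerate(x):
            value_counts[(c, k)][v] += 1
    total = len(labels)                   # S: total number of training samples
    priors = {c: n / total for c, n in class_counts.items()}  # P(C_i) = S_i / S
    return priors, class_counts, value_counts


def log_score(x, c, priors, class_counts, value_counts):
    """Return log P(C_i) plus the sum of log P(x_k | C_i) (assumed log form)."""
    score = math.log(priors[c])
    for k, v in enumerate(x):
        s_ik = value_counts[(c, k)][v]    # samples of class c whose attribute k equals v
        if s_ik == 0:
            return float("-inf")          # value never seen in this class
        score += math.log(s_ik / class_counts[c])
    return score


def classify(x, priors, class_counts, value_counts):
    """Pick the class with the largest score, i.e. the largest posterior."""
    return max(priors, key=lambda c: log_score(x, c, priors, class_counts, value_counts))


# Hypothetical toy records: (protocol, flag) with a class label.
samples = [("icmp", "SF"), ("icmp", "SF"), ("tcp", "SF"), ("tcp", "S0"), ("icmp", "SF")]
labels = ["smurf", "smurf", "normal", "normal", "normal"]

priors, class_counts, value_counts = train(samples, labels)
print(classify(("icmp", "SF"), priors, class_counts, value_counts))  # smurf
print(classify(("tcp", "S0"), priors, class_counts, value_counts))   # normal
```

Returning negative infinity for an unseen attribute value keeps the sketch faithful to the plain frequency estimate; a smoothed estimate would avoid ruling a class out on a single unseen value, but that is a design choice outside what the algorithm above states.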