Abstract

Spectrum sensing is of crucial importance in cognitive radio (CR) networks. In this paper, a reliable spectrum sensing scheme is proposed, which uses K-nearest neighbor, a machine learning algorithm. In the training phase, each CR user produces a sensing report under varying conditions and, based on a global decision, either transmits or stays silent. In the training phase the local decisions of CR users are combined through a majority voting at the fusion center and a global decision is returned to each CR user. A CR user transmits or stays silent according to the global decision and at each CR user the global decision is compared to the actual primary user activity, which is ascertained through an acknowledgment signal. In the training phase enough information about the surrounding environment, i.e., the activity of PU and the behavior of each CR to that activity, is gathered and sensing classes formed. In the classification phase, each CR user compares its current sensing report to existing sensing classes and distance vectors are calculated. Based on quantitative variables, the posterior probability of each sensing class is calculated and the sensing report is classified into either representing presence or absence of PU. The quantitative variables used for calculating the posterior probability are calculated through K-nearest neighbor algorithm. These local decisions are then combined at the fusion center using a novel decision combination scheme, which takes into account the reliability of each CR user. The CR users then transmit or stay silent according to the global decision. Simulation results show that our proposed scheme outperforms conventional spectrum sensing schemes, both in fading and in nonfading environments, where performance is evaluated using metrics such as the probability of detection, total probability of error, and the ability to exploit data transmission opportunities.

1. Introduction

Cognitive radio (CR) has been proposed to address the issue of spectrum scarcity resulting from inefficient utilization of spectrum resources [1, 2]. A CR user has unlicensed access to the spectrum under the constraint that primary user (PU) communication is not affected. To ensure this, the spectrum is continuously monitored for PU activity. Spectrum sensing can also be used to detect spectral holes and enable CR users to transmit opportunistically. The performance gain of a CR system is further improved by cooperative spectrum sensing (CSS), where multiple CR users cooperate to detect spectral holes.

While matched filtering outperforms other spectrum sensing techniques such as cyclostationary detection and energy detection, its complexity makes it impractical for most systems. Energy detection is the simplest technique and suits the limited resources (e.g., energy and computational power) of most CR users. Common spectrum sensing problems such as multipath fading and shadowing can be overcome by exploiting spatial diversity using CSS, thereby ensuring that PU constraints are met [3]. In CSS, individual CR users share their data with a fusion center (FC) that combines local reports to make a global decision. CR users can report the actual amount of received energy, i.e., the energy is not quantized into levels, although a quantized level could be represented by fewer bits than the actual energy value requires. This is called soft-decision combination and results in optimal detection performance but theoretically requires infinite bandwidth [4]. Alternatively, CR users can make a hard decision based on the received energy and report a single bit representing either the presence or absence of the PU to the FC [5]. Hard reporting saves bandwidth but produces inferior results compared to soft reporting. Linear soft combination has nearly the same performance as likelihood ratio tests [6].

To balance performance and bandwidth efficiency, a combination of soft and hard decisions can be used in which the energy range is quantized, as in [4, 7]. In [4], the authors used a so-called softened hard combination scheme, in which the observed energy is quantized into four regions using two bits, where each region is represented by a label. This achieves an acceptable trade-off between the improved performance resulting from soft reporting and the information loss during the quantization process. The FC uses a decision combination rule to combine the decisions reported by CR users and make a global decision. The decisions of CR users are in quantized form; i.e., instead of reporting a one-bit decision or the actual amount of energy received to the FC, the CR users quantize the received energy into multiple levels and send multiple bits denoting the quantization zone. This is called quantized-hard decision combination [8].

Along with other factors such as the number of participating CR users, the sensing environment, and sensing capabilities of CR users, the FC’s global decision combination rule determines the detection performance of the CR system. For instance, an OR rule results in good protection for the PU but has the lowest spectral hole exploration capability [9], whereas an AND rule improves spectral hole detection but lowers the PU protection capacity. Likewise, poor sensing and/or malicious CR users reduce the performance of the k-out-N decision combination rule. More sophisticated combination rules such as Bayesian analysis and the sequential probability ratio test (SPRT) have better PU protection and spectral hole exploration but require prior information which may not always be available in a given CR environment [10].
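The hard-decision rules mentioned above can all be viewed as special cases of k-out-of-N voting. The following is a minimal Python sketch under that view. The function names are illustrative and do not come from the paper, and each local decision is assumed to be a single bit with 1 meaning "PU present".

```python
from typing import Sequence


def k_out_of_n(decisions: Sequence[int], k: int) -> int:
    """Return 1 (PU present) if at least k of the one-bit local decisions are 1."""
    if not 1 <= k <= len(decisions):
        raise ValueError("k must be between 1 and the number of CR users")
    return int(sum(decisions) >= k)


def or_rule(decisions: Sequence[int]) -> int:
    # A single "PU present" report is enough: strong PU protection,
    # but few spectral holes are declared.
    return k_out_of_n(decisions, 1)


def and_rule(decisions: Sequence[int]) -> int:
    # Every CR user must report "PU present": more spectral holes are
    # declared, at the cost of weaker PU protection.
    return k_out_of_n(decisions, len(decisions))


def majority_rule(decisions: Sequence[int]) -> int:
    # Strictly more than half of the CR users must report "PU present".
    return k_out_of_n(decisions, len(decisions) // 2 + 1)
```

With the reports `[0, 0, 1]`, the OR rule declares the PU present while the AND rule declares a spectral hole, which illustrates the protection versus exploration trade-off described above.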

The notion of learning from the environment is embedded in the concept of cognitive radios. CR users are meant to monitor the environment and adapt their operating characteristics (operating frequency, transmitting power, etc.) to the changing conditions. To enable CR users to learn from the environment, several authors have considered machine learning algorithms [11–16]. Machine learning in spectrum sensing becomes a task of extracting a feature vector from a pattern and classifying it into a hypothesis class corresponding either to the absence or presence of PU activity. Fading and shadowing can make estimating the channel condition difficult, and hence spectrum sensing cannot reliably determine the PU status based on the current sensing slot only [17]. However, machine learning-based spectrum sensing is capable of implicitly learning the surrounding environment. Another advantage of machine learning-based spectrum sensing is that it can reliably detect PU activity without requiring any prior knowledge of the environment.

Machine learning algorithms are classified into two types: supervised and unsupervised. K-nearest neighbor (KNN) is a supervised machine learning algorithm. In KNN, training instances (spectrum sensing feature vectors) are used to form K neighborhood classes. A test instance is then classified into one of K neighbors based on majority voting. The voting is based on statistical information gained from finding the distance between the test instance and the training instances. The distance should be calculated accurately as to truly reflect the classifying class [18]. KNN is the simplest of machine learning algorithms, suitable for the low-complexity requirements of CR users. KNN is also the most stable machine learning algorithm [19].
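As a point of reference, conventional KNN with majority voting can be sketched as follows. This is only an illustration of the textbook algorithm, not the posterior-probability variant proposed in this paper. The Hamming distance is an assumed stand-in for the distance measure, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of positions at which two equal-length feature vectors differ."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def knn_classify(
    training: List[Tuple[Sequence[str], str]],
    query: Sequence[str],
    k: int,
) -> str:
    """Label the query by majority vote among its k nearest training instances."""
    ranked = sorted(training, key=lambda item: hamming(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


training_set = [
    (("Z1", "Z1", "Z2"), "H0"),
    (("Z1", "Z2", "Z2"), "H0"),
    (("Z4", "Z3", "Z4"), "H1"),
    (("Z3", "Z4", "Z4"), "H1"),
]
label = knn_classify(training_set, ("Z1", "Z2", "Z1"), k=3)
```

In this toy example the two nearest training vectors carry the label H0, so the vote among the three nearest neighbors returns H0.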

Authors in [20–22] have considered KNN for spectrum sensing. In [20] the authors considered binary hypothesis testing and proposed to optimize the distance between the two classes. The drawback of their scheme is that they considered soft-decision combination and used one-time spectrum sensing, which cannot be checked against ground reality. In [21], KNN is used in the conventional way as a counting mechanism to fill the gaps in building a TV white space database. The use of KNN is limited to reconstruction of the missing spectrum sensing points and thus the full capacity of KNN as a classifier is not exploited. In [22], the authors found a global energy detection threshold for different conventional rules of decision combination. These rules are used in conjunction with different classification schemes to classify a test instance which takes the signal strength as a feature vector. The authors in [22] also used KNN as a counting mechanism and, moreover, the global decision combination rule does not take into consideration the weight of individual CR users and their performance history.

Authors in [23] used multiple antennas for centralized spectrum sensing while in [24] a scheme based on multiple energy detectors and adaptive multiple thresholds for cooperative spectrum sensing was presented. For regional area networks, some improved energy detectors were presented in [25, 26]. Authors in [25] proposed a two-stage energy detector where decisions of both the detectors are fused at a decision device while in [26] multiple antennas were used for spectrum sensing in regional area networks. In [27] both a fixed energy detector and adaptive double threshold were used for cooperative spectrum sensing. In [28] multiple antennas based energy detector utilizing adaptive double threshold for spectrum sensing was proposed while in [29] a comparison between cyclostationary detection technique and adaptive double threshold based energy detection scheme was carried out.

In this paper, we propose a machine learning-based reliable spectrum sensing algorithm in which the FC uses a weight-based decision combination rule. In the training phase, CR users perform spectrum sensing, and based on an acknowledgment signal (ACK) and the global decision, the sensing report is assigned to a sensing class. The sensing class corresponds to the behavior of a CR user in a changing environment which is due to the changing activity of the PU. These sensing classes reliably reflect the activity of the PU and the CR user’s behavior in response to it. After enough information is gathered about the surrounding environment, the classification phase begins. In the training phase the CR users form a local decision. The local decision is in quantized-hard form. The local decisions of the CR users are sent to the FC and the FC takes a global decision. The CR users stay silent or transmit according to the global decision. If the CR users transmit and ACK is received in the next time slot then the transmission was successful. Based on the global decision and the status of ACK signal sensing classes are formed. The training phase is over when enough training data for the sensing classes is gathered. In the classification phase, the KNN algorithm is used, where the sensing classes obtained in the training phase are treated as neighbors for the test instance, which is the current sensing report. The Smith-Waterman algorithm (SWA) is used to accurately find the distance between the current sensing report and the neighboring classes. Based on quantitative variables like the conditional probability and posterior probability, which are calculated through KNN, the current sensing report is classified into one of the sensing classes, corresponding to either the absence or presence of the PU. 
The local decision is then reported to the FC, where the local decisions of all CR users are combined to make a global decision, taking into consideration the reliability of each CR user.

The proposed scheme uses the quantized information as opposed to the soft-decision combination scheme that is proposed in [30]. The spectrum is sensed multiple times in a sensing slot, which makes the proposed scheme more reliable since temporal diversity to the spectrum sensing process is added as wireless channel changes rapidly. The scheme proposed in [30] was based on one-time spectrum sensing while we add a verification mechanism in case that the spectrum sensing decision is absence of PU activity. The classification problem in the proposed scheme is a multilabel one where the current spectrum sensing report is classified into eight different classes. These eight different classes belong to either hypothesis. But the division of the binary hypothesis into subclasses makes the proposed scheme more accurately analyze the PU activity. In addition, the scheme proposed in [30] used the KNN in the traditional way as a counting mechanism. On the other hand, we in the proposed scheme use posterior probability to find the nearest neighbor and utilize KNN to calculate the conditional and prior probabilities.

In [31], KNN was simply used for data recovery in a white space database as a mechanism for majority voting. The classification problem in [31] is also a binary one and the KNN decides a label based on the majority label of the neighboring data points. The proposed spectrum sensing scheme differs from that of [31] in that quantized energy levels are used to train the classifier, and the stored sensing reports are then used to find the class label of the current sensing report by finding the distance between them. Instead of majority voting, we use an efficient distance measuring algorithm, the Smith-Waterman algorithm (SWA), to calculate the similarity of the current sensing report and the training reports.

Mikaeil et al. proposed different classification schemes which work on thresholds calculated through different fusion rules [22]. In this paper, we utilize a different fusion rule at the fusion center which takes into consideration the weight of different CR users before taking a global decision. The focus of [22] is to find the thresholds for different schemes, and KNN is used as one of the classification schemes. In the proposed scheme, on the other hand, the fusion rule intrinsically utilizes the distance between the test report and the training reports at the CR user level, and at the FC the historical accuracy of each CR user is also taken into consideration. In this way, the global fusion rule at the FC makes use of the training reports as well as the performance history of each CR user. Therefore, the global fusion rule is both more robust and more reliable.

Spectrum sensing has been incorporated into satellite communications, 5G, and MIMO schemes. The growing need for spectrum has made spectrum sensing crucial for next-generation communication technologies. Authors in [32] employed CR for future broadband satellite-terrestrial communications under the broader framework toward 5G, while the authors in [33] employed joint spectrum sensing and channel selection optimization for satellite communication based on cognitive radios. The concept of the PU as employed by the CR network was applied to satellite cluster communications by J. Min et al. [34], where the presence of the primary satellite system was detected using spectrum sensing. The spectral efficiency of MIMO systems with hybrid architectures was investigated in [35] by examining the optimal number of users in the system, while in [36] the upper limits of downlink spectral efficiency and energy efficiency were investigated in massive MIMO systems with hybrid architecture.

The rest of the paper is organized as follows: Section 2 describes the system model; Section 3 describes the spectrum sensing scheme, which consists of the KNN algorithm, SWA, the training phase, and the classification phase, in detail; Section 4 describes the cooperative spectrum sensing and the global decision combination in detail; Section 5 discusses the results; and Section 6 concludes the paper.

2. System Model

This section discusses the energy detection method and the quantization method that are employed. It also covers the formation of the sensing report, which is used in both the training phase and the classification phase of the spectrum sensing scheme. We consider CR users that continuously sense the spectrum and report their local decisions to the FC through a dedicated control channel [4]. A CR user transmits information if a spectral hole exists, which is determined by the FC. CR users can either transmit or receive at a given time; i.e., they operate in half-duplex mode. CR users are assumed to be close to the PU and outside the range of other PUs. The system model is presented in Figure 1.

CSS introduces spatial diversity, while temporal diversity is introduced by dividing the sensing slot into minislots. We consider a slotted time-frame structure, where the first slot is used for spectrum sensing and the second slot is used for transmitting CR user data. The authors in [37] investigated the optimal sensing slot duration. In this work a suboptimal sensing slot duration is considered. The sensing result may change when fading and shadowing phenomena are present. Temporal diversity counters these effects by sensing the spectrum n times in the sensing slot. In this work, the sensing slot is further divided into n minislots. In each minislot, the spectrum is sensed independently. The sensing performance can be improved if the number of minislots, and hence the sensing duration, is increased, but that leaves a shorter duration for the transmission slot. The authors in [37] investigated the optimal number of minislots for the sensing-throughput trade-off in CRNs. According to [37], diversity reception is introduced in the sensing process by sensing the channel independently in minislots within the same sensing phase. In our proposed scheme the results of these minislots are combined to form a sensing report, which is later used in the classification phase as given in Section 3.2. Sensing reports were previously used in [8] to calculate the trust of each CR user in a CRN that is under attack by malicious users. In this work the sensing reports are used to train the classifiers and are later used for classifying the current sensing report. A half-duplex CR user system is considered in which the CR users remain silent in the sensing slot. If it is decided in the sensing slot that the PU is absent, then the CR users transmit in the transmission slot; otherwise the CR users remain silent. When the duration of one time frame, which consists of a sensing slot and a transmission slot, is over, the CR users sense the spectrum again. Energy detection is used in each minislot. 
The energy received in the w-th sensing slot by the i-th CR user at the k-th minislot, $E_{i,k}(w)$, can be expressed as

$$E_{i,k}(w) = \sum_{j=1}^{N_0} \left| x_{i,k}^{w}(j) \right|^{2}, \tag{1}$$

where $k = 1, 2, \ldots, n$, n is the total number of minislots, $x_{i,k}^{w}(j)$ is the j-th sample received at the k-th minislot of the w-th sensing slot by the i-th CR user, and N0 is the total number of samples, given by $N_0 = 2TB$. T and B are the detection time and signal bandwidth in Hertz, respectively. The number of samples received in a particular minislot is dependent upon the bandwidth of the sensed spectrum and the sensing time. The received signal in the absence of PU ($H_0$) and presence of PU ($H_1$) is given as follows:

$$x_{i,k}^{w}(j) = \begin{cases} v_{i,k}^{w}(j), & H_0, \\ s_{i,k}^{w}(j) + v_{i,k}^{w}(j), & H_1, \end{cases} \tag{2}$$

where $v_{i,k}^{w}(j)$ is zero-mean additive white Gaussian noise (AWGN) and $s_{i,k}^{w}(j)$ is the j-th sample of the PU signal received at the k-th minislot of the w-th sensing slot by the i-th CR user.
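The energy detector described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the energy of a minislot is the sum of squared sample magnitudes and that the sample count follows the common time-bandwidth relation N0 = 2TB. The function names are hypothetical.

```python
from typing import Sequence


def number_of_samples(detection_time_s: float, bandwidth_hz: float) -> int:
    """Sample count per minislot, assuming the relation N0 = 2 * T * B."""
    return int(round(2 * detection_time_s * bandwidth_hz))


def minislot_energy(samples: Sequence[complex]) -> float:
    """Energy of one minislot: sum of squared magnitudes of the received samples."""
    return sum(abs(x) ** 2 for x in samples)


# Three received samples with magnitudes 1, 2, and 1.
energy = minislot_energy([1 + 0j, 0 + 2j, -1 + 0j])
```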

It was shown in [37] that if the primary signal is absent the probability density function of the energy of the received signal at the i-th CR user ($E_i$) follows a central chi-square distribution with mean $\mu_0$ and variance $\sigma_0^2$; otherwise it follows a noncentral chi-square distribution with mean $\mu_1$ and variance $\sigma_1^2$, which can be estimated as

$$\mu_0 = N_0, \quad \sigma_0^2 = 2N_0, \quad \mu_1 = N_0(\gamma_i + 1), \quad \sigma_1^2 = 2N_0(2\gamma_i + 1), \tag{3}$$

where $\gamma_i$ is the signal to noise ratio (SNR) of the received signal at the i-th CR user.
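The chi-square behavior of the received energy can be checked numerically. The sketch below is a Monte Carlo illustration under stated assumptions: real-valued unit-variance Gaussian noise and a constant-amplitude PU signal whose power equals the SNR. Under these assumptions the textbook moments are N0 and 2N0 when the PU is absent, and N0(1 + SNR) and 2N0(1 + 2 SNR) when it is present. The function names, the seed, and the parameter values are illustrative only.

```python
import random
import statistics


def received_energy(n_samples: int, snr: float, pu_present: bool,
                    rng: random.Random) -> float:
    """Sum of squared samples; noise has unit variance (assumption)."""
    amplitude = snr ** 0.5 if pu_present else 0.0
    return sum((amplitude + rng.gauss(0.0, 1.0)) ** 2 for _ in range(n_samples))


def empirical_moments(n_samples: int, snr: float, pu_present: bool,
                      trials: int = 4000, seed: int = 1):
    """Empirical mean and variance of the received energy over many trials."""
    rng = random.Random(seed)
    energies = [received_energy(n_samples, snr, pu_present, rng)
                for _ in range(trials)]
    return statistics.mean(energies), statistics.variance(energies)


mean_h0, var_h0 = empirical_moments(50, 0.5, pu_present=False)
mean_h1, var_h1 = empirical_moments(50, 0.5, pu_present=True)
```

For N0 = 50 and an SNR of 0.5, the empirical values should lie close to 50 and 100 under H0 and close to 75 and 200 under H1.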

When the total number of samples, N0, is large, the energy signal received, $E_i$, under both hypotheses $H_0$ and $H_1$ can be approximated by a Gaussian random variable. In our scheme, the energy signal at each minislot is quantized into discrete zones. Multiple bits representing the corresponding zone are transmitted to the FC, rather than transmitting a continuous energy variable (a soft decision) or a single bit (a hard decision). An M-level quantizer of an input variable is represented by a set of quantization levels and a set of quantization thresholds. These quantization thresholds determine the accuracy to which the quantization levels represent the actual received signal.

In this paper, a slotted-frame structure is considered where a frame is one unit of accessing the spectrum. The first slot in each frame, called the sensing slot, is used to sense the spectrum to decide whether the PU is active or not. If it is decided in the sensing slot that the PU is absent, the CR users transmit in the transmission slot. Otherwise, they remain silent for the duration of the transmission slot. When the transmission slot is over, the CR users start sensing the spectrum again.

Because the wireless channel changes rapidly, the spectrum is sensed multiple times instead of only once so as to account for the changing behavior of the channel. To do this, the sensing slot is divided into minislots. In each minislot, the spectrum is sensed independently and, based on the result, a sensing report is formed. A sensing report is formed according to the quantized decision of each minislot, which is expressed by (4) and will be used in the classification phase later. For spectrum sensing, energy detection is utilized, where samples of received energy are summed and compared with a threshold, and based on the comparison result it is decided whether the PU is present or absent.

In this work the number of quantization levels is four, i.e., $M = 4$. These levels or quantization zones are represented by Z1, Z2, Z3, and Z4. Zones Z1 and Z2 represent low energy or the absence of the PU, while Z3 and Z4 represent high energy or the presence of the PU. The quantized energy zones are given as

$$Q_{i,k}(w) = \begin{cases} Z_1, & E_{i,k}(w) \le \lambda_1, \\ Z_2, & \lambda_1 < E_{i,k}(w) \le \lambda_2, \\ Z_3, & \lambda_2 < E_{i,k}(w) \le \lambda_3, \\ Z_4, & E_{i,k}(w) > \lambda_3, \end{cases} \tag{4}$$

where $Q_{i,k}(w)$ represents the quantized energy for the k-th minislot of the w-th sensing slot of the i-th CR user and $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the thresholds that differentiate different quantization zones. The set of quantization zones is $q = \{Z_1, Z_2, Z_3, Z_4\}$ and the set of thresholds is $\{\lambda_1, \lambda_2, \lambda_3\}$. Equation (4) signifies that, in case of H0, the received energy of the i-th CR user at the k-th minislot ($E_{i,k}(w)$) is quantized into either Z1 or Z2, and in case of H1 it is quantized into either Z3 or Z4. According to our quantization scheme Z1 and Z2 represent H0 and Z3 and Z4 represent H1.
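The four-level quantizer and the formation of a sensing report from the minislot energies can be sketched as follows. This is a minimal illustration: the threshold values are arbitrary, the convention that an energy exactly equal to a threshold falls into the lower zone is an assumption, and the function names are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence, Tuple

ZONES = ("Z1", "Z2", "Z3", "Z4")


def quantize(energy: float, thresholds: Sequence[float]) -> str:
    """Map an energy value to a zone label using three increasing thresholds."""
    l1, l2, l3 = thresholds
    if not l1 < l2 < l3:
        raise ValueError("thresholds must be strictly increasing")
    # bisect_left counts the thresholds strictly below the energy, so a value
    # equal to a threshold stays in the lower zone (assumed convention).
    return ZONES[bisect_left(list(thresholds), energy)]


def sensing_report(minislot_energies: Sequence[float],
                   thresholds: Sequence[float]) -> Tuple[str, ...]:
    """One zone label per minislot, in sensing order."""
    return tuple(quantize(e, thresholds) for e in minislot_energies)


report = sensing_report([5, 15, 25, 35, 12, 8], thresholds=(10, 20, 30))
```

Here `report` contains one label for each of the six minislots, and each label needs only two bits when it is sent to the FC.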

At each sensing slot, a sensing report is formed that consists of symbols belonging to q. The sensing report of the i-th CR user at the w-th sensing slot contains n elements belonging to q (the sensing report formation is further explained in Section 3.1). This report is used as a feature vector for the machine learning algorithm. During the training phase, this report is assigned to a sensing class based on ACK and the global decision, which will be discussed in detail in Section 3.1. The next section describes the spectrum sensing algorithm at the CR user level.

3. Spectrum Sensing

The proposed spectrum sensing scheme aims to improve PU detection capability under varying environments and to improve spectral hole detection. The first goal protects the PU’s data from harmful interference and is the foremost constraint specified by IEEE 802.22, which is the standard for accessing TV white spaces [38]. The second goal efficiently exploits spectrum access opportunities, enabling the CR user to transmit data. For the i-th CR user at the w-th sensing slot, channel availability is decided on the basis of the energy vector. To correctly map this energy vector to PU activity, the behavior of the PU has to be learned. Thus the energy vector in our case is analogous to a feature vector in the context of machine learning.

To construct a classifier, i.e., to classify the current sensing report into channel available (H0) or channel busy (H1) classes, a training phase is needed. Each CR user stores W energy vectors, where W is the length of the training phase. In the training phase, the slotted-frame structure explained in Section 2 is used, in which each time slot has a sensing phase and a transmission phase. There are W slots in the training phase. These vectors are the input of a classifier in the classification phase, where the current sensing report is compared with previously stored sensing reports to decide between H0 and H1.

In our proposed scheme, first the CR users learn the behavior of the PU by mapping the generated quantized energy vectors, which are called sensing reports, to the accurate status of the PU. The true status of the PU is found through ACK and a reliable combination of local decisions of CR users determined by the FC. The function of the CR user in the training phase is different from its function in the classification phase. In the training phase, sensing reports are assigned to sensing classes according to the actual activity of the PU and the corresponding behavior of the CR user. In the classification phase, sensing reports are sorted into one of the sensing classes using KNN. To accurately calculate the distance between the current sensing report and existing members of the sensing classes, SWA is used. Section 3.1 describes the training phase, while Section 3.2 describes the classification phase.

3.1. Training Phase

In this phase, the operating environment is learned by gauging the behavior of the CR user in response to the changing activity of the PU. The i-th CR user generates a sensing report, makes a local decision on the basis of the average received energy in the current sensing slot, sends the local decision to the FC, and, based on the result of the FC and the status of ACK, assigns the sensing report to a sensing class. This section explains these steps in detail.

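Let the energy received in the w-th sensing slot at the i-th CR user be represented by $E_i(w)$, which is given as

$$E_i(w) = \frac{1}{n} \sum_{k=1}^{n} E_{i,k}(w), \tag{5}$$

where $E_{i,k}(w)$ is given by (1).

The local decision for the i-th CR user at the w-th sensing slot in the training phase is represented by $l_i(w)$ and is given by [3]

$$l_i(w) = \begin{cases} Z_1, & E_i(w) \le \lambda_1, \\ Z_2, & \lambda_1 < E_i(w) \le \lambda_2, \\ Z_3, & \lambda_2 < E_i(w) \le \lambda_3, \\ Z_4, & E_i(w) > \lambda_3. \end{cases} \tag{6}$$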
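The local decision step in the training phase can be sketched as follows: the minislot energies of one sensing slot are averaged and the average is mapped to a quantization zone. This is a minimal sketch that assumes a plain arithmetic mean and the same three-threshold quantizer as in Section 2. The names and threshold values are illustrative.

```python
from bisect import bisect_left
from typing import Sequence

ZONES = ("Z1", "Z2", "Z3", "Z4")


def local_decision(minislot_energies: Sequence[float],
                   thresholds: Sequence[float]) -> str:
    """Quantized-hard local decision from the average energy of one sensing slot."""
    if not minislot_energies:
        raise ValueError("at least one minislot energy is required")
    average = sum(minislot_energies) / len(minislot_energies)
    # An average equal to a threshold falls into the lower zone (assumed convention).
    return ZONES[bisect_left(list(thresholds), average)]


decision = local_decision([5, 15, 25, 35], thresholds=(10, 20, 30))
```

The four energies average to 20, which falls in zone Z2 under the assumed boundary convention, so this CR user would report that the PU is absent.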

The local decision is sent to the FC, which combines local decisions from all CR users and renders a global decision. In the training phase, the simple majority rule is used as the rule of decision combination. The symbol (quantization zone) reported by the majority of CR users determines the global decision at the FC. As can be seen from (6), the local decision during the training phase is in the quantized-hard form, so the global decision at the FC is also in the quantized-hard form. The sensing report of a CR user is shown in Figure 2. The local sensing report was explained in the previous section. In Figure 2, the first six minislots constitute a local sensing report. As can be seen, every element of the report belongs to q. For every CR user, a sensing report is formed at every sensing slot, and the local decision is taken according to (6).
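The majority rule used at the FC during the training phase can be sketched as follows. The paper does not state how ties are resolved, so this sketch assumes that a tie goes to the highest tied zone, which errs on the side of protecting the PU. The function names are hypothetical.

```python
from collections import Counter
from typing import Sequence

ZONES = ("Z1", "Z2", "Z3", "Z4")


def majority_zone(local_decisions: Sequence[str]) -> str:
    """Global decision: the zone reported by the largest number of CR users."""
    if not local_decisions:
        raise ValueError("at least one local decision is required")
    counts = Counter(local_decisions)
    best = max(counts.values())
    tied = [zone for zone in ZONES if counts.get(zone, 0) == best]
    # Assumed tie-break: prefer the highest-energy zone among the tied ones.
    return tied[-1]


def pu_declared_absent(global_zone: str) -> bool:
    """Z1 and Z2 correspond to H0, so the CR users may transmit."""
    return global_zone in ("Z1", "Z2")


global_decision = majority_zone(["Z1", "Z1", "Z3", "Z1", "Z4"])
```

In this example three of the five CR users report Z1, so the global decision is Z1 and the CR users transmit in the transmission slot.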

Next, the global decision is returned to the CR users. The CR users either transmit or remain silent based on the global decision. If the global decision is H0, it can be verified by the ACK signal, which is sent by the CR receiver to the CR sender after the CR receiver receives the transmission. Because an overlay cognitive radio network is considered, there is no interference to the PU communications. The ACK signal is affected by the PU communication only when the spectrum sensing result is wrong and the ground reality is in fact H1. Based on the local decision and the global decision, there are eight possible cases for the CR user and the sensing classes according to our system model. These possible cases, called observations, are given below.

Observation 1. The local decision is Z1 and the global decision is also Z1. The CR user transmits its data. If ACK is received, the sensing result was correct and the actual status of the PU was H0. Through the ACK signal, the true status of the PU is known. The sensing report corresponding to this decision is stored in a class labelled R1, while in the absence of the ACK signal it is stored in R2.

Observation 2. Both the local decision and the global decision are Z1, or the global decision is Z2 and the local decision is Z1. The CR user will transmit, but ACK is not received, meaning that the sensing decision was wrong and the PU was present. The CR user will store the sensing report in a class labelled R2. If the ACK signal is received, the report will be stored in R1. If the local decision is Z1 and the global decision is Z3 or Z4, then the sensing report will also be stored in R2.

Observation 3. The local decision is Z2 and the global decision is also Z2. The CR users follow the procedure explained in Observation 1. If ACK is received, the sensing decision is correct and the PU is not present. The sensing report is stored in a class labelled R3; otherwise it is stored in R4.

Observation 4. The local decision is Z2 and the global decision is Z1 or Z2. The CR user transmits, and if ACK is not received, the sensing report is assigned to the class with label R4; otherwise it is assigned to R3. If the local decision is Z2 and the global decision is either Z3 or Z4, then again the sensing report will be stored in the class labelled R4.

Observation 5. The local decision is Z3 and the global decision is also Z3. There will be no transmission in this case. The true status of the PU thus cannot be known. The sensing report will be assigned to a class labelled R5. The sensing report will also be stored in class R5 if the global decision is Z4 and the local decision is Z3.

Observation 6. The local decision is Z3 but the global decision is either Z1 or Z2. The CR user will transmit. If ACK is received, the sensing report will be stored in a class labelled R6; otherwise it will be stored in R5.

Observation 7. Both the local and global decisions are Z4. There will be no transmission and the sensing report will be stored in the class labelled R7. The sensing report will also be stored in R7 if the local decision is Z4 and the global decision is Z3.

Observation 8. The local decision is Z4, but the global decision is either Z1 or Z2. The CR user will transmit. If ACK is received, the sensing report will be stored in class R8. If no ACK is received, it will be stored in R7.

In the observations above it can be seen that ACK signal is used when the global decision is H0. When the global decision is H1 the CR users do not transmit and thus ACK signal cannot be used to ascertain ground reality. So, in the case when H1 is the global decision at the FC the CR users store the current sensing report in the classes R5 and R7 as the current sensing decision cannot be verified in any other way than at the risk of causing interference to the PU transmission.
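The eight observations can be condensed into one rule: a report is placed in the "verified idle" class of its local zone (R1, R3, R6, or R8) only when the CR user transmitted and received ACK, and in the other class of that zone (R2, R4, R5, or R7) in every other case. The following sketch encodes this reading of the observations. The function name and the boolean ACK flag are illustrative assumptions.

```python
IDLE_ZONES = {"Z1", "Z2"}  # zones interpreted as H0 (PU absent)

# For each local decision: (class when idle status is verified by ACK, class otherwise)
CLASS_TABLE = {
    "Z1": ("R1", "R2"),
    "Z2": ("R3", "R4"),
    "Z3": ("R6", "R5"),
    "Z4": ("R8", "R7"),
}


def assign_class(local_decision: str, global_decision: str, ack_received: bool) -> str:
    """Return the sensing class label for one training-phase sensing report."""
    if local_decision not in CLASS_TABLE or global_decision not in CLASS_TABLE:
        raise ValueError("decisions must be one of Z1, Z2, Z3, Z4")
    # The CR user transmits only when the global decision indicates H0,
    # so ACK is meaningful only in that case.
    transmitted = global_decision in IDLE_ZONES
    verified_idle = transmitted and ack_received
    verified, otherwise = CLASS_TABLE[local_decision]
    return verified if verified_idle else otherwise
```

For example, a local decision of Z3 with a global decision of Z1 and a received ACK yields R6, while the same local decision with a global decision of Z4 yields R5 because no transmission takes place.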

The observations are given in decision tree form in Figure 3. As the observations do not stem from one set of decisions, there is no unified root of the decision tree. The decision trees are given in four partitions depending on the local decision. The local decision is abbreviated as LD and the global decision as GD in Figure 3. Figure 3(a) corresponds to the case that the local decision is Z1 and Observations 1 and 2 are obtained. Figure 3(b) corresponds to the case that the local decision is Z2 and the Observations 3 and 4 are obtained. Figures 3(c) and 3(d), respectively, correspond to the cases that the local decisions are Z3 and Z4 and the Observations 5, 6, 7, and 8 are obtained.

These observations help the CR user learn about the surrounding environment and its own behavior in response to that environment. They also give CR users historical data that can be used in conjunction with the current sensing behavior to more reliably predict the PU status. This process can be seen as cooperative learning where not only is the individual CR user taken into account, but also the impact of other CR users is incorporated through the global decision. This adds spatial diversity to the learning process, where a receiver with better signal to noise ratio (SNR) conditions can drive the behavior of CR users with poorer SNR conditions.

The training phase is run until the CR user is sufficiently trained in the behavior of the surrounding environment, including changing SNR conditions and changing behavior of the PU. Fading can also temporarily affect the signal, and thus the energy received, due to the continuously changing sensing environment. The training scheme takes the presence of fading into consideration and thus stores sensing reports that may have resulted from either fading or bad sensing in their corresponding classes. As the learning is based on the ACK and reliable decision combination at the FC, classes based on training more reliably reflect the sensing environment and PU activity. The results of either fading or bad sensing at the CR user level are found in the above observations where the local decision differs from the global decision or where ACK is not received.

The training data is collected locally at each CR user in the training phase. The performance of machine learning techniques depends on the size of the training phase: as the training size increases, the performance also improves. With an increase in the number of CR users, a larger area under the PU is covered. Because our training model incorporates the global decision by acting according to it, and because the ground reality is known through the ACK signal, the training phase can accurately capture the behavior of CR users in response to the PU activity. With a large number of CR users, each CR user can reflect the ground reality in its training classes through the global decision. With a large training phase, the behavior of CR users in response to the varying nature of the PU activity can also be accurately known. In conventional machine learning techniques, the training phase can gather an adequate amount of training data to know the environment. Knowing the exact nature of PU activity is not practically feasible because of the random nature of both the wireless channel and the PU activity. However, as will be shown in the simulations, given a sufficiently large training phase, the system detection performance can converge even at a very low SNR.

Figure 4(a) presents the frame structure when the FC decides that the PU is present during the training phase. The CR users remain silent during the transmission phase in this case. The operations in the sensing phase proceed as follows: first the local decision is made, and then the local sensing decision is sent to the FC through a CCC. The FC combines the local sensing decisions and decides whether the PU is present or absent. If the FC decides that the PU is absent, the CR users transmit and listen for the ACK signal over the same channel on which the transmission took place. The CCC is not used for establishing links between the CR users. Rather, communication between the CR users takes place over the spectrum that is licensed to the PU and that the CR users access if the PU is absent. Figure 4(b) presents the time frame for the case in which the PU is absent during the training phase. On the basis of the ACK signal, the sensing report of the sensing slot is assigned to one of the classes defined by the observations above. The frame structure of the training phase differs from that of the classification phase. In the training phase, the sensing classes are updated on the basis of the status of the ACK signal, which helps train the CR user to accurately reflect the ground reality.

3.2. Classification Phase

In the previous phase, information was gathered regarding the operating environment and the CR user behavior in response to the changing environment. Learning the environment is made especially difficult by the nature of CR networks. Because of the noisy sensing environment, CR users only obtain partial observations of the environment variables. In addition, CR users must also transmit data. This results in a trade-off between sensing time and throughput: the higher the sensing time, the more accurate the sensing result and thus the more efficient the learning. Therefore, partial observability and capping the sensing time complicate the learning process. A third limitation is that a PU is considered to be autonomous. A CR user may not have any prior information about PU behavior, its operating characteristics, the RF environment, interference levels, or noise power distribution.

Our learning scheme addresses these issues. Partial observability is addressed by incorporating the behavior of other CR users into the learning process through the global decision. The ACK enables CR users to better learn the operating environment and divide the sensing observations into their respective classes more accurately. Our learning scheme requires no prior information and can efficiently map sensing performance to the changing activity of the PU, thus enabling the CR user to more reliably detect the PU.

A frame structure during the classification phase is presented in Figure 5. In the local decision making phase, the spectrum is sensed and a sensing report is created. The first six minislots in the local decision making part of Figure 5 represent the local sensing report. The second part is the classification phase discussed in Section 3.2.3. The last part of the local decision making slot is the reporting phase, where the local decision is reported to the FC, the global decision is returned and the CR user takes action accordingly. The transmission phase follows the local decision making phase.

In this section, we will present in detail how the current sensing report is classified into one of the training classes. KNN, a machine learning algorithm, is used to accurately classify the current instance into one of the sensing classes and thus reliably detect PU activity. Section 3.2.1 presents the KNN algorithm.

3.2.1. K-Nearest Neighbor Algorithm

KNN is a distribution-free machine learning algorithm that classifies observations into one of several classes based on quantitative variables. KNN, being a distribution-free method, is suitable for the context of cognitive radios. KNN classifies a test instance, in our case the current sensing report as described in Section 3.1, into one of several neighboring classes by majority voting. The voting can be modified to calculate the distance between any two sensing reports. In the context of CR networks, it is highly improbable that any two sensing reports are exactly the same, so we have to measure the similarity between them.

The classification plane is divided into a number of neighbors, and the distance from the current sensing report to each of those neighbors is found. For notational simplicity, the sensing report of the current sensing slot at the i-th CR user is denoted by xi from here onwards. The neighbors are the sensing classes obtained in Section 3.1, and the distance is calculated to each of the neighbors, which represent either H0 or H1. Based on the calculated distances, the current sensing report is classified as either H0 or H1. Section 3.2.2 shows how the distance is calculated, and Section 3.2.3 shows the procedure for using KNN for classification.

3.2.2. Smith-Waterman Algorithm

The Smith-Waterman algorithm (SWA) [39] is a local alignment algorithm that calculates an accurate distance between two vectors. The sensing reports in our case can differ from each other due to spatial and temporal diversity, so the voting method conventionally used in KNN, which is based on finding a match or a mismatch, is not applicable here. Instead, we focus on measuring the similarity between sensing reports, using SWA to calculate the distance between the current sensing report and the sensing classes.

SWA consists of three stages: training, matrix fill, and trace back. The three stages are briefly described as follows.

Training: one sensing report is arranged horizontally and the other vertically. The top row and the leftmost column are initialized to 0.

Matrix fill: let the sensing report arranged vertically be $a$ and the sensing report arranged horizontally be $b$. Each element of $a$ is compared with every element of $b$, and the score is computed according to the matrix fill equation

$$H(p,q) = \max\{0,\; H(p-1,q-1) + s(a_p,b_q),\; H(p-1,q) - g(a_p,b_q),\; H(p,q-1) - g(a_p,b_q)\},$$

where $p$ and $q$ are the indices of the elements of report $a$ and report $b$, respectively, $a_p$ is the p-th element of report $a$, $b_q$ is the q-th element of report $b$, $s(a_p,b_q)$ is the similarity reward between two characters, and $g(a_p,b_q)$ is the gap penalty (dissimilarity) that determines the degree to which the mismatch between $a_p$ and $b_q$ is penalized. Different reward and penalty values are defined for different types of sequences and applications. Here, we use intuitive values based on experimental results for both the gap penalty and the similarity reward.

It is important to note that $s(a_p,b_q) = s(b_q,a_p)$ and $g(a_p,b_q) = g(b_q,a_p)$, which means that the similarity reward and the gap penalty are commutative. The similarity score between two sensing reports is obtained by taking the maximum element of the score matrix $H$. The similarity score of the i-th sensing report when compared with the j-th sensing report is given as

$$S_{i,j} = \max_{p,q} H(p,q). \tag{10}$$

Trace back: the third stage of the SWA is called trace back and is performed to align sequences based on the scores computed in the “matrix fill” stage. Since our objective is just to find the similarity score, the trace back stage is not required in our work.
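
As an illustration, the matrix fill stage and the resulting similarity score can be sketched in Python as follows. This is a minimal sketch, not the implementation used in the simulations: the functions `reward` and `gap_penalty` and the values they return are hypothetical placeholders (chosen only so that both are symmetric in their arguments), sensing reports are assumed to be lists of quantization zone indices from 1 to 4, and the recurrence is the standard Smith-Waterman form.

```python
def reward(a, b):
    """Hypothetical similarity reward: highest for equal zones, lower as zones diverge."""
    return 2 - abs(a - b)


def gap_penalty(a, b):
    """Hypothetical gap penalty: grows with the mismatch between the two zones."""
    return 1 + abs(a - b)


def similarity_score(report_a, report_b):
    """Return the maximum entry of the Smith-Waterman score matrix."""
    rows, cols = len(report_a) + 1, len(report_b) + 1
    # Top row and leftmost column are initialized to 0.
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for p in range(1, rows):
        for q in range(1, cols):
            a, b = report_a[p - 1], report_b[q - 1]
            diagonal = score[p - 1][q - 1] + reward(a, b)
            up = score[p - 1][q] - gap_penalty(a, b)
            left = score[p][q - 1] - gap_penalty(a, b)
            score[p][q] = max(0, diagonal, up, left)
            best = max(best, score[p][q])
    # No trace back is needed: only the similarity score is used.
    return best
```

With these placeholder values, two identical six-element reports obtain a score of 12, whereas two reports whose zones differ by three in every position obtain a score of 0.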

3.2.3. Classification

As explained, a sensing report has n elements belonging to q. The sensing report xi has to be classified into one of the sensing classes, which are treated as neighbors of xi. The candidate set of neighbors of xi contains all the classes found in Section 3.1, and each CR user has its own version of the sensing classes.

The current sensing report is compared with every member of each of the sensing classes in the candidate set. The results are collected in a membership counting vector, each element of which is the result of comparing xi with the j-th member of the l-th sensing class, computed by (10). For each class l, consider the event that sensing report xi belongs to class l, the event that xi does not belong to class l, and the event that a given number of elements of the membership counting vector are greater than a threshold. The posterior probability that the current sensing report xi belongs to class l is the probability of the first event given the third. Based on the posterior probability, the local decision for the i-th CR user at the r-th sensing slot, represented by qi,r, is obtained by comparing P0 and P1, where P0 is the sum of the posterior probabilities of the sensing classes representing H0 and P1 is the sum of the posterior probabilities of the sensing classes representing H1.
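
The classification step can be sketched in Python as follows. This is a simplified illustration rather than the exact posterior computation of the proposed scheme: the fraction of class members whose similarity to the current report exceeds the threshold is used as a stand-in for the posterior probability of that class, ties are resolved in favor of H1, and the names `classify_report` and `match_count`, as well as the default threshold, are assumptions made only for this example. In practice, the similarity function would be the Smith-Waterman similarity score of Section 3.2.2.

```python
def match_count(report_a, report_b):
    """Hypothetical stand-in similarity: number of positions with equal zones."""
    return sum(1 for a, b in zip(report_a, report_b) if a == b)


def classify_report(report, classes, similarity=match_count, threshold=3):
    """Classify a sensing report as H0 (returns 0) or H1 (returns 1).

    classes: list of (hypothesis, members) pairs, where hypothesis is 0 or 1
             and members is the list of stored sensing reports of that class.
    """
    totals = {0: 0.0, 1: 0.0}
    for hypothesis, members in classes:
        if not members:
            continue  # an empty class contributes nothing
        # Count the stored reports that are similar enough to the current one.
        hits = sum(1 for m in members if similarity(report, m) > threshold)
        # Fraction of similar members: a simple proxy for the class posterior.
        totals[hypothesis] += hits / len(members)
    # Assumption: ties are resolved in favor of H1 to protect the PU.
    decision = 1 if totals[1] >= totals[0] else 0
    return decision, totals
```

For example, with one H0 class and one H1 class of two stored reports each, a report that closely matches the H1 members accumulates weight only for H1 and is therefore classified as H1.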

4. Cooperative Spectrum Sensing

The FC receives the local decision Di from each CR user i. In CSS, the sensing capabilities of the CR users differ from each other, which results in different local sensing results [40]. In the proposed scheme, we use a weight-based decision combination at the FC, in which each CR user is assigned a weight based on its effectiveness.

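A partial global decision at the FC is made by excluding the result of the i-th CR user and comparing the numbers of remaining CR users reporting H0 and H1. The number of CR users reporting H0 excluding the local decision of the i-th CR user, denoted by $N_{0,-i}$, is given as

$$N_{0,-i} = \sum_{j=1,\, j \neq i}^{N} I_0(D_j),$$

where $N$ is the number of CR users and $I_0(D_j)$ is the indicator function for H0, given by

$$I_0(D_j) = \begin{cases} 1, & D_j = H_0, \\ 0, & \text{otherwise.} \end{cases}$$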

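On the other hand, the number of CR users reporting H1 excluding the local decision of the i-th CR user, denoted by $N_{1,-i}$, is given as

$$N_{1,-i} = \sum_{j=1,\, j \neq i}^{N} I_1(D_j),$$

where $N$ is the number of CR users and $I_1(D_j)$ is the indicator function for H1, given by

$$I_1(D_j) = \begin{cases} 1, & D_j = H_1, \\ 0, & \text{otherwise.} \end{cases}$$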

Partial global decisions are found for all CR users. The local decisions are then combined through a majority rule, as expressed in (20), by comparing the number of CR users reporting H0 with the number of CR users reporting H1. Based on (15) and (20), the weight for each CR user is then calculated.

The cumulative weight for each hypothesis is then calculated from the weights of the CR users reporting that hypothesis, and the final global decision is obtained by comparing the cumulative weights of the two hypotheses.

The global decision is returned to CR users and the CR users then transmit or stay silent according to the global decision.
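
The structure of this weight-based combination can be sketched in Python as follows. The sketch is illustrative only: the weight rule (a full weight when a local decision agrees with the partial global decision formed without it, and a reduced weight otherwise), the values `agree_weight` and `disagree_weight`, and the tie-breaking in favor of H1 are assumptions made for this example and are not the weight expression of the proposed scheme. Decisions are encoded as 1 for H1 and 0 for H0.

```python
def majority(decisions):
    """Majority rule over 0/1 decisions; ties go to 1 (H1) by assumption."""
    ones = sum(decisions)
    zeros = len(decisions) - ones
    return 1 if ones >= zeros else 0


def fuse(decisions, agree_weight=1.0, disagree_weight=0.5):
    """Return (global_decision, weights) for a list of 0/1 local decisions."""
    weights = []
    for i, decision in enumerate(decisions):
        # Partial global decision: majority of all users except the i-th one.
        others = decisions[:i] + decisions[i + 1:]
        partial = majority(others)
        # Hypothetical weight rule: reward agreement with the other users.
        weights.append(agree_weight if decision == partial else disagree_weight)
    # Cumulative weight of each hypothesis.
    w1 = sum(w for w, d in zip(weights, decisions) if d == 1)
    w0 = sum(w for w, d in zip(weights, decisions) if d == 0)
    return (1 if w1 >= w0 else 0), weights
```

For five CR users reporting `[1, 1, 0, 1, 0]`, the three users reporting H1 keep the full weight and the two users reporting H0 receive the reduced weight, so the cumulative weight of H1 (3.0) exceeds that of H0 (1.0) and the global decision is H1.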

Consider the channel gain between the primary user and the i-th CR user during the k-th minislot and the mean SNR as received from the PU. If it is assumed that the system’s coefficients are known, then the system probability of false alarm under nonfading channels is given in [37] in terms of $Q(\cdot)$, the complementary distribution function of the standard Gaussian, i.e., $Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp(-t^2/2)\, dt$, and the system target probability of detection. The probability of detection and probability of false alarm of the proposed scheme depend both on the probability of the sensing report falling into a particular quantization zone and on the number of minislots in the sensing slot. The target probability of detection and target probability of false alarm depend upon the number of quantization zones, the probability that, under a particular hypothesis, the sensing decision falls in a particular quantization zone, and the weight of each quantization zone. The quantization thresholds are adjusted until the optimal thresholds are found. On the basis of the quantization parameters, the target probabilities of detection and false alarm are optimized. For cooperative spectrum sensing, if the weights of the quantization zones are considered equal, i.e., each quantization zone contributes equally to the final decision combination, the target probability of detection can be expressed as in [41] in terms of the number of CR users having the local sensing decision in each quantization zone, the largest integer less than m, and the probability of having the local sensing decision in each quantization zone under H1.

The system probability of detection is given in [37] in terms of the system target probability of false alarm, which in turn is given in [41] in terms of the probability of having the local sensing decision in each quantization zone under H0.
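
The Gaussian tail function Q used in these expressions, and its inverse, can be evaluated numerically as in the following Python sketch. The function names `q_function` and `q_inverse` are chosen only for this example; the sketch relies on the identity between Q and the complementary error function, and on the inverse of the standard normal distribution function provided by the Python standard library.

```python
import math
from statistics import NormalDist


def q_function(x):
    """Tail probability of the standard Gaussian: Q(x) = P(X > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_inverse(p):
    """Inverse of Q for 0 < p < 1, using Q(x) = 1 - Phi(x)."""
    return NormalDist().inv_cdf(1.0 - p)
```

For instance, `q_function(0.0)` evaluates to 0.5, and `q_inverse(0.8)` gives the argument at which the tail probability equals a target value of 0.8.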

5. Results and Analysis

In this section, we observe the behavior of our proposed scheme and compare it with other schemes through performance metrics such as the probability of detection, the probability of error, and the probability of spectral hole exploitation. In [8], the effect of reporting multiple bits and sensing the spectrum multiple times within the same sensing phase was investigated, and the scheme utilizing multiple reporting bits and multiple minislots was shown to be robust against all kinds of attacks. The authors in [3, 37] have also shown the reliability gain brought by using multiple minislots. The number of CR users is 5, the number of iterations is 1000, the sensing slot duration is 1 ms, the sampling frequency is 300 kHz, and the number of energy samples in each sensing slot is 600. The idle probability of the PU is 0.5. The SNR ranges from -25 to -10 dB. When the number of CR users is large, clusters are formed for spectrum sensing to reduce the overhead. The authors in [42] considered clusters of five CR users each to sense the spectrum. That is, when clustering is used, the CR users send their local decisions to a cluster-head, which reduces the number of direct reports sent to the FC. Considering a higher number of CR users requires adopting clustering, which is beyond the scope of this paper. Nevertheless, according to [42], as the number of clusters and thus the number of CR users increases, the sensing performance also improves. An idle probability of 0.5 is used in the literature for the sake of fairness [8, 42]. If the idle probability of the PU is increased, the CR user obtains more transmission opportunities. Therefore, the idle probability of the PU in this paper is taken as 0.5 to maintain fairness between the CR and PU systems. 
As the idle probability of the PU is considered equal to its probability of activity, the target detection probability for a channel without fading is set to 0.8 at an SNR of -20 dB. This detection probability, set in this paper for an active probability of the PU of 0.5, which is higher than the active probability of 0.3 considered in [37], guarantees the protection of the PU data. We measure the performance of our proposed scheme in both AWGN and fading channels by observing the behavior of our scheme and of other schemes under varying SNR conditions for different system parameters. The training phase strongly impacts the system performance, as the sensing classes are developed in this phase. The larger this phase, the greater the number of training instances, which means that the current sensing report has more similar reports to match against. We plot two variants of the proposed scheme: in one, the training phase is 100 iterations, and in the other it is 330 iterations. These are compared with a scheme in which the CR users make a one-bit local decision and the local decisions are combined at the FC using a conventional OR rule.

In this paper, the probability of error (Pe) is given as

$$P_e = P(H_0)\, P_f + P(H_1)\, (1 - P_d), \tag{29}$$

where Pd is the probability of detection, Pf is the probability of false alarm, P(H0) is the prior probability of H0, and P(H1) is the prior probability of H1. The probability of detection (Pd) is defined as

$$P_d = \frac{N_{G=1,\, H=1}}{N_{H=1}},$$

and the probability of false alarm (Pf) is defined as

$$P_f = \frac{N_{G=1,\, H=0}}{N_{H=0}},$$

where $G$ is the global decision and H is the real status of the PU, which is equal to a randomly generated stream of ones and zeroes with size equal to the total number of iterations. A one represents the presence of the PU, while a zero represents the absence of the PU. The notation $N_{(\cdot)}$ means the number of times the condition in the subscript is satisfied. The probability of spectral hole exploitation is represented by Pnf.
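
The way these metrics are obtained from a simulation run can be sketched in Python as follows. The function name `sensing_metrics` and its arguments are assumptions made for this example: `truth` is the stream of real PU states, `decisions` is the stream of global decisions (1 for PU present, 0 for PU absent), and the prior probability of H0 is passed as a parameter with a default of 0.5, the idle probability used in this section.

```python
def sensing_metrics(truth, decisions, prior_h0=0.5):
    """Return (Pd, Pf, Pe) from the true PU states and the global decisions."""
    if len(truth) != len(decisions):
        raise ValueError("truth and decisions must have the same length")
    n_h1 = sum(1 for h in truth if h == 1)
    n_h0 = len(truth) - n_h1
    if n_h1 == 0 or n_h0 == 0:
        raise ValueError("both PU states must occur at least once")
    # Number of iterations in which the PU is present and declared present.
    detections = sum(1 for h, g in zip(truth, decisions) if h == 1 and g == 1)
    # Number of iterations in which the PU is absent but declared present.
    false_alarms = sum(1 for h, g in zip(truth, decisions) if h == 0 and g == 1)
    pd = detections / n_h1
    pf = false_alarms / n_h0
    # Total error: false alarms under H0 plus missed detections under H1.
    pe = prior_h0 * pf + (1.0 - prior_h0) * (1.0 - pd)
    return pd, pf, pe
```

For example, for `truth = [1, 1, 0, 0, 1, 0, 1, 0]` and `decisions = [1, 0, 0, 1, 1, 0, 1, 0]`, three of the four active slots are detected and one of the four idle slots raises a false alarm, giving a probability of detection of 0.75, a probability of false alarm of 0.25, and a probability of error of 0.25.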

Soft-decision combination gives the optimal sensing performance [4]. In [4], it is also shown that hard-decision combination gives inferior results but incurs only one bit of overhead, while soft combination incurs a large overhead. In the one-bit hard combination scheme, sensing information is lost during local decision making because only one threshold is used. By using multiple thresholds, the loss of sensing information can be reduced, which leads to better performance at the cost of more overhead. In [7], it is also shown that using two bits for reporting the local decision can significantly improve the sensing performance. The effectiveness of using two bits (four quantization levels) was shown for both perfect and imperfect reporting channels. In [43], H. Sakran et al. utilized three bits to report the local decision to the FC, and the resulting performance was shown to be better than that obtained with two reporting bits. In summary, a trade-off exists between spectrum sensing performance and overhead when designing the quantization levels. Therefore, in this paper we mainly focus on applying a machine learning algorithm to Smith-Waterman algorithm-based soft-decision spectrum sensing for the case of four quantization levels. To consider more than four quantization levels, the whole problem formulation, including the observations in Section 3.1 and the classification classes, would have to be changed and redesigned. Therefore, the simulation results are limited to the case of four quantization levels.

In the training phase, the probability of detection of the proposed scheme is equal to that of the majority rule with quantization. As with any machine learning technique, the performance of the proposed scheme depends on the classification phase. In the simulations, the probability of detection combines those of both the training and classification phases. Training data is required to train the KNN classifier, and the performance of the classifier depends on the size of the training data. The proposed scheme utilizes the majority rule to obtain training data. Since malicious users and anomalies are not considered in this paper, the majority rule works by majority voting and its performance depends on the local sensing decisions of the CR users. When the training phase is over, the classifier will have ample data on the changing behavior of the PU and will be trained.

Figure 6 shows the system detection performance in an AWGN channel. The proposed scheme with the larger training phase outperforms the other two schemes. The proposed scheme with a smaller training phase has the same detection performance as an OR rule in the low SNR regime. The reason is that the sensing reports in low SNR regimes do not have large distances from each other. The energies received under both hypotheses in the low SNR regime vary little from each other and thus, the scheme with fewer training instances fails to learn the environment more reliably. As the SNR improves, the proposed scheme with the smaller training phase results in more reliable spectrum sensing than conventional schemes. Figure 7 shows the error performance as calculated by (29). In this figure, it can also be seen that the proposed scheme with the larger training phase has a low probability of error even in the low SNR regime. The scheme with the smaller training phase converges to one with a larger training phase in better SNR conditions, which shows that even with a smaller training size the proposed scheme can result in more reliable spectrum sensing than conventional schemes.

Figure 8 shows the capability of the proposed scheme to exploit spectral holes which is defined by (28). Exploiting available opportunities for transmitting data is the highest priority from the perspective of a CR user. Even in bad SNR conditions our proposed scheme enables CR users to exploit data transmission opportunities. The proposed scheme with the smaller training phase lags behind the one with the larger training phase in bad SNR conditions but converges to the scheme with the larger training phase in good SNR regimes.

In the high SNR region, the sensing reports that are formed better reflect the PU’s activity. The sensing performance improves in the high SNR region because the PU signal makes up a larger portion of the received signal compared with the added noise. That is to say, as the SNR increases, fewer training samples and thus a smaller training window are required to train the classifier. Therefore, when the SNR improves, a smaller training size results in the same performance. On the other hand, in the low SNR region, a larger training size is needed to accurately reflect the PU’s activity. All three schemes show the same performance trend but at different SNR levels. The OR rule has the best detection performance among conventional schemes, as it uses the most relaxed criterion of all the conventional rules for declaring the PU present. However, this means that the OR rule cannot efficiently exploit data transmission opportunities. These figures show that our proposed scheme can protect PU data more effectively as well as provide more data transmission opportunities.

Figure 9 shows the detection performance of the proposed scheme in a fading environment. Fading affects the power of the received signal and thus the number of energy samples required to reliably decide the status of the PU. In a nonfading environment, the amplitude gain of the channel is deterministic, while in fading channels it varies [17]. Thus, the probability of detection depends on the instantaneous SNR. The effect of fading on the performance of spectrum sensing was investigated in detail in [17]. Instead of following (2) and (3) to set up the simulation environment, we have followed a path-loss model to incorporate fading, as presented in [44]. We assume a path-loss model in which the signal decays in proportion to $d^{-\alpha}$, where $d$ is the distance between the PU and the CR users and $\alpha$ is the path-loss exponent. The average distance between the PU and CR users is assumed to be 20 m. The proposed scheme with the larger training phase outperforms the OR rule by 5% when the SNR is -23 dB, and when the SNR improves to -16 dB, the improvement is about 20%. That is, the proposed scheme outperforms the OR rule by a larger margin as the SNR conditions improve. As can be seen from the figure, the OR rule has very poor detection performance in a fading environment, despite having the best detection performance among conventional fusion rules. Figure 10 shows the error performance of the proposed scheme in a fading environment. It can be seen that the error decreases with increasing SNR. At -25 dB, the error probability of the proposed scheme is just above 0.1. Due to fading, the error probability of the OR rule is 0.35, which is very high compared with that of our proposed scheme.

Figure 11 shows the effect of the number of CR users on the performance of cooperative spectrum sensing under fading channels, where some CR users undergo deep fading and thus have unreliable training data. Fading conditions are required to reflect the effect of an increasing number of CR users because, in nonfading channels, the performance remains the same as the number of CR users increases: the training data of even a small number of CR users is reliable and reflects the PU activity accurately. In the figure, for each number of CR users, the SNR is varied from -25 to -10 and the mean of the resulting probabilities of detection is found. For instance, when the number of CR users is 6, the probability of detection is calculated for multiple values of SNR between -25 and -15, and the mean of the computed probabilities is the mean probability of detection. As the values shown are mean values, the mean probability of detection cannot converge to 1. For each value of SNR, the system is run 1,000 times for the proposed scheme having training size of 100 and 300 times for the proposed scheme having training size of 100. As can be seen, once the number of CR users increases beyond a limit, in this case beyond 10, the improvement in the mean probability of detection is no longer abrupt. This is for the reason explained in the first paragraph of this section: to utilize the gain that can be introduced by increasing the number of CR users, clusters need to be formed. When the FC, instead of cluster-heads, combines the sensing decisions of all CR users, the sensing decisions of many CR users may fall outside the range of similarity distances calculated in Section 3.2.2, and thus their reports will be rejected. From Figure 11, it can be seen that the mean probability of detection of the proposed scheme with a larger training size surpasses that of the other schemes. 
The mean probability of detection reaches 0.8 when the number of users equals 20, which is the target detection probability considered in this paper at an SNR of -20 dB for nonfading channels with 5 CR users. The proposed scheme reaches a highest mean probability of detection of near 0.7, and the OR rule achieves a highest mean probability of detection of less than 0.6.

6. Conclusion

In this paper, a machine learning-based reliable spectrum sensing scheme is proposed. The proposed scheme learns from the environment by taking into account the true status of the PU. Sensing reports are stored in appropriate sensing classes and then the current sensing report is classified into one of the sensing classes. Based on the result of classification, the PU is declared present or absent. Local decisions are combined at the FC by a novel decision combination scheme that takes into account the reliability of the CR users. Mechanisms at both the CR level and the FC level ensure reliable spectrum sensing. Simulation results show that our proposed scheme has better detection performance and better spectral hole exploitation capability than the conventional OR rule. Fading affects detection performance, but even in a fading environment our scheme successfully detects the PU 80% of the time at an SNR of -10 dB.

Data Availability

The simulations were carried out in MATLAB.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2015R1D1A1A09057077) as well as by the Korea Government (MSIT) (2018R1A2B6001714).