Abstract

Pulsar stars, usually neutron stars, are compact, spherical objects that contain a large quantity of mass. Each pulsar star possesses a magnetic field and emits a slightly different pattern of electromagnetic radiation, which is used to identify potential candidates for a real pulsar star. Pulsar stars are considered an important cosmic phenomenon, and scientists use them to study nuclear physics, gravitational waves, and collisions between black holes. Automating the detection of pulsar stars can accelerate their study. This study contrives an accurate and efficient approach for true pulsar detection using supervised machine learning. For experiments, the High Time Resolution Universe (HTRU2) dataset is used. To resolve the data imbalance problem and overcome model overfitting, a hybrid resampling approach is presented in this study. Experiments are performed with imbalanced and balanced datasets using well-known machine learning algorithms. Results demonstrate that the proposed hybrid resampling approach is highly effective in avoiding model overfitting and increasing prediction accuracy. With the proposed hybrid resampling approach, the extra tree classifier achieves a 0.993 accuracy score for true pulsar star prediction.

1. Introduction

A pulsar star is a stellar remnant, often formed from the remains of a collapsed giant star. Usually a neutron star, a pulsar is small in size but contains a large amount of mass. Despite being uncommon, pulsar stars are very important for scientists to study nuclear physics, general relativity, gravitational waves, and the factors leading to collisions of black holes. In 1967, Jocelyn Bell and Antony Hewish accidentally discovered a pulsar while studying distant galaxies [1]. Looking at a particular point through the telescope, they noticed radiation pulses and named the source little green men 1 (LGM-1). Later these unidentified objects were termed pulsars due to their pulsed emission. Now the source is designated the pulsating source of radiation (PSR) B1919+21, where B1919+21 denotes the position of the pulsar in the sky [2]. The emission pattern of each pulsar varies over each rotation, so it is averaged over several rotations to determine whether a star is a pulsar candidate. Without enough radiation, it is very difficult to detect a true pulsar star. However, detection is possible under certain conditions, such as when the emission beam is angled toward Earth or when X-ray bursts are caused by the detonation known as a supernova.

Pulsars are rapidly rotating astronomical objects, detected as neutron stars, that emit radiation at 100,000 to 150,000 km/s with regular intervals and patterns. Through their beams, pulsars emit electromagnetic energy that gradually weakens, and pulsars become quiet within ten to a hundred million years. According to the Australia Telescope National Facility (ATNF) catalogue, around 2,801 pulsars have been identified [3, 4], and an estimated 20,000 to 100,000 pulsars are present in our galaxy, indicating that about 90% of pulsars are yet to be identified [5]. Detecting a true pulsar is not a trivial task, as it is challenging to distinguish pulsar signals from noisy time series data. Each pulsar produces slightly different signal patterns, called pulsar profiles, which distinguish it from other signals. In practice, pulsar detection is hampered by radio frequency interference, which makes the identification of legitimate signals very hard. Signals that fulfill the criteria for pulsars are termed “candidates” and may turn out to be new pulsars.

Several automated and human-based methods are used to identify legitimate pulsar candidates, a process known as “candidate selection” [6]. Until the 2000s, manual selection of candidates was used to find pulsars, which generally required 1–300 s for inspecting each observation [7]. Therefore, manual inspection of 1 million candidates would need up to 80,000 person-hours. Manual classification of pulsar candidates is thus neither appropriate nor scalable. Consequently, other techniques, such as graphical and automated methods, were developed to carry out pulsar candidate identification. However, these techniques are computationally expensive, and considerable work is required to improve the speed and sensitivity of the algorithms [8].

Over time, algorithms reduced the noise in pulsar signals, and the signal-to-noise ratio (SNR) became an important factor for pulsar detection. In pulsar astronomy, another important feature called the dispersion measure (DM) of the pulsar is also used [9]. The delay of the pulse is associated with the DM and radio frequency and has been regarded as an important feature for finding pulsars. Both supervised and unsupervised approaches can be used to perform pulsar detection. For example, unsupervised approaches can group the pulsar data into different clusters whose features can then be analyzed to select pulsar candidates. This approach is particularly useful for large amounts of unlabeled data. For the HTRU2 dataset, the labels are added by experts, so supervised machine learning models are appropriate. One major limitation of recent works on pulsar detection is the use of imbalanced data. HTRU2 contains a large number of non-pulsar samples while pulsar samples are very few, which affects the performance of classification models. This imbalance can lead to model overfitting on the majority class. For such models, even though high accuracy is reported, the F1 score is significantly lower than the accuracy. Despite the proposal of several automated approaches for finding pulsars, the gap between the provided and the desired accuracy and sensitivity demands further research in this domain. To this end, this study proposes an automated approach for true pulsar prediction using supervised machine learning algorithms and makes the following contributions:

(i) This study devises a methodology for automatic detection of pulsars using supervised machine learning algorithms. For this purpose, the performance of several well-known machine learning algorithms is analyzed, such as random forest (RF), extra tree classifier (ETC), gradient boosting classifier (GBC), and logistic regression (LR). In addition, a multilayer perceptron (MLP) is included in the study.

(ii) The HTRU2 dataset is used for conducting experiments, and the influence of dataset imbalance is extensively investigated. Three resampling approaches, the synthetic minority oversampling technique (SMOTE), adaptive synthetic sampling (ADASYN), and cluster centroids (CC), are studied for their efficacy in data balancing. Ultimately, a hybrid data resampling approach, concatenated resampling (CR), is proposed to solve the data imbalance problem of the HTRU2 dataset.

(iii) Extensive experiments are performed to analyze the effect of data balancing with SMOTE, ADASYN, CC, and CR on pulsar detection accuracy. Experimental results and performance comparison with state-of-the-art approaches show that the CR approach is superior to the other resampling approaches.

The rest of the paper is organized as follows. Research works related to the current study are discussed in Section 2. Section 3 describes the dataset, the machine learning algorithms used for experiments, the resampling approaches, and the details of the proposed hybrid resampling. Results and discussion are presented in Section 4, while Section 5 provides the conclusion.

2. Related Work

Due to the importance of the detection task for true pulsar stars, several automated approaches have been proposed. These approaches can be broadly categorized into three groups: machine learning approaches, deep learning approaches, and approaches focusing on feature importance. Owing to the success of machine learning for tasks such as classification, object detection, and text analysis, a large number of machine learning-based methods are available in general [10]. However, the pulsar detection domain is not extensively studied and lacks the desired accuracy.

The authors present a machine learning-based approach for pulsar candidate selection in [7]. It deals with 16 million pulsar candidates obtained from reprocessing the Parkes multibeam survey dataset. A radio transient discovery method named V-FASTR, fused with random forest, is proposed in [11]. V-FASTR can automatically sift through known event types with 98.6% accuracy on the training data and 99% on the test data. The authors utilize six different models to classify dispersed pulse groups using a single-pulse search framework in [12]. The dataset used in that research contains 300 pulsar examples and 9,600 non-pulsar examples. Several datasets have been generated using different imbalance treatments. Experimental results show that the multiclass ensemble tree learner achieves high performance and a low false positive rate when used with oversampled data.

The study [13] used different machine learning algorithms such as GBC, AdaBoost, and XGBoost for the classification of pulsar candidates. To deal with the data imbalance problem, SMOTE is used to oversample the minority class in the dataset. Several important features from each algorithm are determined for pulsar classification. The major issue with this technique is that the accuracy of radio frequency interference classification is very sensitive to feature selection. The authors present a hybrid machine learning model, the random tree boosting voting classifier (RTB-VC), in [14] for pulsar star prediction. RTB-VC combines tree-based classifiers for training on the HTRU2 dataset and uses various combinations of hard, soft, and weighted voting to obtain high accuracy. A 98.3% F1 score is reported using the proposed RTB-VC model.

Due to the deployment of deep learning approaches in diverse fields for classification and their high accuracy, several deep learning-based models have been adopted for pulsar detection and classification. For example, the authors of [15] used a convolutional neural network (CNN) based on the ResNet model in the PICS algorithm for pulsar detection. On the GBNCC dataset, the proposed system achieved 96% accuracy. Similarly, the research [16] uses an artificial neural network (ANN) to find true pulsar stars in the HTRU dataset. The research achieves an accuracy of 85% in detecting pulsars through blind analysis and also rejects 99% of noisy candidates. Both studies greatly improved recall and decreased the false positive rate. However, the feature selection methods used are simple, hypothesis-based, and subjective to experience, so human errors can easily be introduced, which readily affects the performance of these approaches.

The study [17] focused on pulsar classification using a hierarchical deep neural network (DNN). To reduce the training time of the DNN, pseudoinverse learning (PIL) is preferred over the gradient descent (GD) method. The proposed model provides 94.65% and 87.66% F1 scores for the HTRU Medlat and PMPS-26k datasets, respectively. Despite the lower F1 score compared to CNN + BPNN, the training time for the proposed model is five times lower than that of traditional CNN models. A swift model for the elimination of radio frequency interference (RFI) in pulsar data was proposed in [22]. For learning the RFI signatures of real pulsars, a PIL-based single hidden layer autoencoder (AE) was used. Results indicate that the AE is robust in learning RFI signatures and can be used to remove them from fast-sampled spectra; as a result, the signals from real pulsars can be recovered. The study [20] investigated pulsar classification using three datasets: the HTRU mid-latitude dataset, the MNIST dataset, and the CIFAR-10 dataset. In the first stage, strong representations of the pulsar candidates are developed in the image domain by extracting deep features with a deep convolutional generative adversarial network (DCGAN). In the second stage, an MLP-based classifier is defined using a pseudoinverse learning autoencoder (PILAE). For data imbalance, the SMOTE oversampling technique is used. The achieved accuracy on the HTRU dataset with different data splitting ratios is 100%. On the MNIST dataset, 97.50% accuracy is achieved, while CIFAR-10 shows an accuracy of 100%.

The study [6] extracted eight unbiased statistical features, including the mean, kurtosis, variance, and skewness of the DM curve and the pulse profile curve, and designed the Gaussian–Hellinger very fast decision tree for imbalanced data. Using the statistical features on two datasets, HTRU-1 and LOTAAS, 92.8% recall is achieved with a false positive rate of only 0.5%. The research discovered 20 new pulsars from the LOTAAS dataset using the same strategy. A hierarchical candidate sifting model (HCSM) was proposed in [18], where the cost of incorrect prediction of positive samples is emphasized and multiple classifiers are assembled. Handcrafted features from three datasets, HTRU, HTRU-1, and LOTAAS, are used to train three classifiers, which collectively form the ensemble classifier. Emphasizing the positive examples and assigning higher weights to them produces better results with the proposed model. HCSM achieves a recall of 97.49% for the HTRU dataset, 84.52% for the HTRU-1 dataset, and 100% for the LOTAAS dataset. A summary of the discussed research works is presented in Table 1.

3. Materials and Methods

3.1. Dataset

On account of the importance of pulsar detection, several datasets have been provided over the years. For the current study, the HTRU2 dataset from Kaggle is used, which was collected during the High Time Resolution Universe survey [6, 23, 24]. The dataset was compiled by Dr. Robert Lyon and contains 17,898 examples of pulsars and non-pulsars, with 8 features attributed to each record [6]. The survey observed the Galactic plane in the region −120° < l < 30° and |b| < 15°. The dataset contains 16,259 spurious examples, which are the outcomes of RFI/noise, and only 1,639 pulsar examples; all examples are labeled by human annotators. The dataset does not provide any information related to the pulsars or other astronomical details. The Pulsar Feature Lab tool is used to extract pulsar feature data from candidate files [25]. Table 2 shows the number of samples for the pulsar and non-pulsar classes, while Table 3 describes the features of the dataset.

3.2. Problem Statement

Keeping in view the results of the related studies discussed in the previous section, it is clear that the datasets used for experiments are not balanced. In particular, the most commonly used dataset, the HTRU2 dataset, is highly imbalanced: only 1,639 samples belong to the pulsar class out of the total 17,898 samples. The class imbalance can result in model overfitting, as machine learning models tend to give higher weight to the class with a higher number of samples. As a result, the F1 score is affected despite good accuracy results from the machine learning models. This study aims at solving this problem by proposing a hybrid resampling approach to achieve high pulsar detection accuracy.

3.3. Data Resampling for Imbalanced Dataset

Looking at the statistics of the dataset given in Table 2, only 1,639 out of 17,898 examples are pulsars while 16,259 are non-pulsars. This is roughly a 1 : 10 ratio, which makes the dataset highly imbalanced because the class distribution is skewed towards a specific class. Data imbalance degrades classification performance because machine learning classifiers tend toward the majority class during training. Several approaches can be utilized to deal with data imbalance. For the present study, three data resampling approaches, SMOTE, ADASYN, and cluster centroids, are adopted, along with the proposed hybrid approach.

3.4. Synthetic Minority Oversampling Technique

SMOTE is a widely used oversampling technique to manage imbalanced data [26]. An imbalanced data problem arises when the class distribution is skewed towards a specific class. SMOTE increases the number of data instances by generating random synthetic samples of the minority class from its nearest neighbors using Euclidean distance. The newly generated instances are very similar to the original data, since they are derived from the original features [27]. SMOTE is not the best option when dealing with high-dimensional data because it can create additional noise; this is not the case with the HTRU2 dataset used in the current study. SMOTE is adopted based on the results reported in [12, 28], where the data have a 1 : 10 ratio, just like the current study. By generating samples for the minority class using SMOTE, we get a 1 : 1 ratio of pulsar to non-pulsar samples, as shown in Table 4. For the SMOTE implementation, we used an open-source Python toolbox called imbalanced-learn, which builds on Scikit-learn, SciPy, and NumPy.
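A minimal sketch of SMOTE with imbalanced-learn is shown below; the CSV filename and the target column name are placeholders for however a local copy of the HTRU2 dataset is stored, not the exact preprocessing used in this study.

```python
# Hedged sketch: oversampling the HTRU2 minority class with SMOTE.
# "HTRU_2.csv" and "target_class" are assumed names, not verified here.
import pandas as pd
from imblearn.over_sampling import SMOTE

df = pd.read_csv("HTRU_2.csv")
X = df.drop(columns=["target_class"])   # the 8 statistical features
y = df["target_class"]                  # 1 = pulsar, 0 = non-pulsar

# SMOTE synthesizes minority samples from nearest neighbors
# (k_neighbors=5 by default) until both classes are equal in size.
X_sm, y_sm = SMOTE(random_state=42).fit_resample(X, y)
print(pd.Series(y_sm).value_counts())   # expect a 1:1 class ratio
```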

3.5. Adaptive Synthetic Resampling

ADASYN is used for upsampling the minority class samples in an imbalanced dataset [29, 30]. As an enhanced form of SMOTE, ADASYN has been regarded as superior to it. ADASYN generates synthetic alternatives for observations of the minority class, and the number of samples generated for each minority observation depends on its learning difficulty. An observation is “hard to learn” if several observations with similar features exist in the majority class; such an observation is essentially surrounded by majority class instances when plotted in the feature space, which makes it harder for models to learn. Due to its efficiency and reliability, ADASYN is widely used in applications such as cancer detection and credit card fraud detection.
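Under the same assumptions as the SMOTE sketch above (features `X` and labels `y`), ADASYN can be applied in one line with imbalanced-learn:

```python
# Hedged sketch: ADASYN oversampling; X and y as in the SMOTE example.
from imblearn.over_sampling import ADASYN

# ADASYN generates more synthetic points for "hard to learn" minority
# samples, i.e., those surrounded by majority-class neighbors.
X_ada, y_ada = ADASYN(random_state=42).fit_resample(X, y)
```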

3.6. Cluster Centroids

Besides the SMOTE and ADASYN oversampling approaches, this study utilizes the cluster centroids undersampling approach to downsize the majority class. During this process, clusters of the majority class are formed and each cluster is replaced with its centroid. For this purpose, the current study uses the K-means algorithm to find the clusters of the majority class.
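A corresponding sketch with imbalanced-learn's ClusterCentroids, again reusing `X` and `y` from the earlier examples:

```python
# Hedged sketch: cluster centroids undersampling of the majority class.
from imblearn.under_sampling import ClusterCentroids

# K-means clusters the majority class and replaces each cluster with
# its centroid, shrinking the majority class to the minority's size.
X_cc, y_cc = ClusterCentroids(random_state=42).fit_resample(X, y)
```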

3.7. Supervised Machine Learning Models

Several types of machine learning models are available for classification. The availability of the open-source library Scikit-learn helps researchers solve classification problems using machine learning and ensemble learning [31]. Well-known machine learning algorithms are selected due to their reported performance. Instead of devising new models, already established models are selected, and their performance is optimized through hyperparameter tuning. The machine learning models used in this research are RF, LR, GBC, ETC, and MLP. Several parameters of these models are fine-tuned to optimize performance, and the list of used parameters is provided in Table 5.
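To make the lineup concrete, the sketch below instantiates the Scikit-learn models; the actual tuned hyperparameter values are those listed in Table 5, so the settings here are illustrative placeholders.

```python
# Hedged sketch of the model lineup; hyperparameters are placeholders,
# not the tuned values from Table 5.
from sklearn.ensemble import (RandomForestClassifier,
                              ExtraTreesClassifier,
                              GradientBoostingClassifier)
from sklearn.linear_model import LogisticRegression

models = {
    "RF":  RandomForestClassifier(n_estimators=100, random_state=42),
    "ETC": ExtraTreesClassifier(n_estimators=100, random_state=42),
    "GBC": GradientBoostingClassifier(learning_rate=0.1, random_state=42),
    "LR":  LogisticRegression(max_iter=1000),
}
# The MLP is built separately with a neural network library
# (see Section 3.12).
```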

3.8. Random Forest

RF is a tree-based ensemble learning model, which produces accurate predictions by combining many weak learners [32]. The bagging technique is used, where a variety of decision trees are trained on different bootstrap samples [33]. A bootstrap sample is derived by subsampling the training dataset with replacement, where the size of the sample is the same as that of the training dataset. RF uses decision trees for the prediction process, and a key issue in the construction of decision trees is the identification of the attribute for the root node at each level; this process is termed attribute selection. In ensemble classification, several classifiers are trained and their results are pooled through a voting process. Previously, many researchers have proposed ensemble learning approaches [34–36]. The widely used ensemble learning methods are bagging [37] and boosting [38, 39]. In the bagging (or bootstrap aggregating) technique, classifiers are trained on bootstrap samples to minimize the variance of classification. RF has the following mathematical form:

$$\hat{y} = \operatorname{mode}\{t_1(x), t_2(x), \ldots, t_N(x)\},$$

where $\hat{y}$ is the final prediction by the majority vote of the decision trees and $N$ is the number of decision trees taking part in the prediction process.

3.9. Gradient Boosting Classifier

In GBC, several weak learners work together to create a strong learning model. The working principle of gradient boosting is time-consuming and computationally expensive because it creates several trees sequentially. Gradient boosting has previously been used in several astronomy studies [24]. For example, study [40] uses GBC for photometric classification of supernovae, while study [41] uses GBC for the detection and classification of galaxies using the Galaxy Zoo catalogue. Mean squared error (MSE) is used in GBC as

$$L = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,$$

where $L$ is the loss function, $y_i$ is the $i$th target value, and $\hat{y}_i$ is the $i$th prediction.

Based on the learning rate, GBC updates the predictions to find the values at which MSE is minimal. The minimization of MSE is represented as

$$\hat{y}_i^{(m+1)} = \hat{y}_i^{(m)} + \alpha\left(y_i - \hat{y}_i^{(m)}\right),$$

where $\alpha$ is the learning rate and $y_i - \hat{y}_i^{(m)}$ is the residual at iteration $m$; boosting continues until the sum of the residuals is near zero, i.e., until the predicted values are very close to the actual values.
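The toy example below illustrates this update under squared loss, with the tree-fitting step elided; in a real GBC, each iteration fits a new tree to the residuals rather than using them directly.

```python
# Toy illustration of the gradient-boosting update: predictions move
# toward the targets by learning_rate * residual at each step.
import numpy as np

y     = np.array([3.0, 5.0, 7.0])  # targets
y_hat = np.full(3, y.mean())       # initial prediction: the mean (5.0)
alpha = 0.1                        # learning rate

for _ in range(50):
    residual = y - y_hat           # negative gradient of the MSE loss
    y_hat = y_hat + alpha * residual
print(y_hat)                       # approaches [3, 5, 7] as MSE shrinks
```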

3.10. Extra Tree Classifier

ETC is a meta-estimator, also known as extremely randomized trees, that fits a number of randomized decision trees on various subsamples of the dataset. To improve accuracy, it uses averaging, which also controls model overfitting. ETC works similarly to RF, but differs in the construction of the trees in the forest: in ETC, each tree is made from the original training sample rather than a bootstrap sample. At each node, a random sample of candidate features is considered, and the Gini index is used to select the best feature for splitting the data in the tree. ETC has been utilized for various tasks in astronomy. For example, study [42] uses the ETC model for neutrino detection from a point-like source in collaboration with KM3NeT, the cubic-kilometre neutrino telescope.

3.11. Logistic Regression

LR is a statistical method used for classification problems. LR analyzes the data to estimate the probability of class membership. For classification problems where the target variable is categorical, LR is a natural first choice. It models the relationship between a categorical dependent variable and one or more independent variables by estimating probabilities using the logistic function. A logistic curve, or logistic function, is a common “S”-shaped (sigmoid) curve defined as

$$f(x) = \frac{L}{1 + e^{-k\left(x - x_0\right)}},$$

where $e$ is the Euler number, $x_0$ is the $x$-value of the sigmoid midpoint, $L$ is the curve's maximum value, and $k$ is the steepness of the curve. LR works well on binary classification and shows good performance for text classification as well [43, 44].
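With $L = 1$, $k = 1$, and $x_0 = 0$, this reduces to the standard sigmoid used in binary LR; a small illustration:

```python
# The logistic function with its three shape parameters.
import numpy as np

def logistic(x, L=1.0, k=1.0, x0=0.0):
    return L / (1.0 + np.exp(-k * (x - x0)))

print(logistic(0.0))   # 0.5: the midpoint of the "S" curve
```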

3.12. Multilayer Perceptron

An MLP consists of one or more layers of neurons. MLP is a feed-forward neural network that maps a set of inputs to a set of appropriate outputs, with every layer fully connected. Data are fed into the input layer and pass through one or more hidden layers. The hidden layers provide levels of abstraction, and predictions are made at the visible, or output, layer [45]. Multiple neurons can be stacked in one layer, and multiple layers give better predictive capacity.

The MLP model used here consists of three layers: one input layer, one hidden layer, and one output layer. We used 32 neurons in the input layer with the ReLU activation function, 64 neurons in the hidden layer, and one neuron with a sigmoid activation function in the output layer. The value used for the dropout layer is 0.2. For compilation, we used the Adam optimizer, the binary_crossentropy loss function, and 100 epochs.
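A Keras sketch matching the stated configuration is given below; the text does not specify the hidden layer's activation or the exact position of the dropout layer, so those choices are assumptions.

```python
# Hedged Keras sketch of the described MLP: 32-neuron ReLU input layer,
# 64-neuron hidden layer, 0.2 dropout (placement assumed), and a
# single sigmoid output neuron.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

mlp = Sequential([
    Dense(32, activation="relu", input_shape=(8,)),  # 8 HTRU2 features
    Dense(64, activation="relu"),                    # activation assumed
    Dropout(0.2),
    Dense(1, activation="sigmoid"),                  # pulsar probability
])
mlp.compile(optimizer="adam", loss="binary_crossentropy",
            metrics=["accuracy"])
# mlp.fit(X_train, y_train, epochs=100)  # 100 epochs as stated
```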

3.13. Proposed Resampling Approach

This study proposes a hybrid data resampling approach called concatenated resampling (CR). CR concatenates the outputs of three resampling techniques, SMOTE, ADASYN, and CC, to enhance the prediction results. The results of all three techniques are concatenated along the row axis (axis 0), which increases the size of the data. CR is defined as

$$D_{CR} = D_{ADASYN} \oplus D_{SMOTE} \oplus D_{CC},$$

where HTRU2 is the original dataset with an imbalanced target class ratio, $\oplus$ denotes row-wise concatenation, and $D_{ADASYN} \in \mathbb{R}^{r \times a}$ is the output data after balancing the target ratio using the ADASYN technique; similarly, $D_{SMOTE}$ and $D_{CC}$ are the data outputs after SMOTE and CC are applied to the original HTRU2 dataset. Here, $a$ represents the number of features/attributes and $r$ the number of records, while $D_{CR}$ is the concatenation result of the three resampling techniques. Figure 1 illustrates the proposed CR approach, where $D_A$, $D_S$, and $D_C$ are the resampled instances of data from the three techniques, which are combined to make the new sampled dataset.
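In code, CR amounts to stacking the three resampled datasets row-wise; a minimal sketch reusing the outputs of the earlier resampling examples:

```python
# Hedged sketch of concatenated resampling (CR): stack the ADASYN,
# SMOTE, and CC outputs along axis 0 (rows), increasing the record
# count while keeping the same 8 attributes.
import numpy as np

X_cr = np.concatenate([np.asarray(X_ada), np.asarray(X_sm),
                       np.asarray(X_cc)], axis=0)
y_cr = np.concatenate([np.asarray(y_ada), np.asarray(y_sm),
                       np.asarray(y_cc)], axis=0)
```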

3.14. Proposed Methodology for Pulsar Detection

For detecting pulsars, the current study leverages a supervised machine learning approach. The concept of ensemble and hybrid approaches is very popular in machine learning. A number of studies leverage hybrid and ensemble models for a variety of tasks in domains such as image processing, classification, and text analysis [46, 47]. For example, study [48] uses a stacked generalization technique and an ensemble learning approach for pulsar prediction. Similarly, ensemble approaches are used for predicting the numeric scores of Google apps in [49] and for text analysis [50]. The results reported for hybrid approaches provide the motivation to utilize a hybrid approach for the task at hand.

The flow of the proposed methodology is shown in Figure 2. As the first step, the HTRU2 dataset is obtained from Kaggle. The dataset contains pulsar and non-pulsar examples in an unequal ratio, with non-pulsar examples as the majority class and pulsar examples as the minority class. Owing to the influence of data imbalance on classifier performance, this problem is solved using the proposed approach. To analyze the influence of data splitting on prediction accuracy, two setups are used: data splitting before resampling and resampling before splitting, with a 70 : 30 ratio in both cases. When the data are split before resampling, resampling is applied only to the training set. For data balancing, the CC, SMOTE, ADASYN, and CR techniques are used. Table 6 shows the counts of pulsar and non-pulsar samples when resampling is performed before splitting, and Table 7 shows the counts when resampling is applied only to the training set.

After data splitting and resampling, the machine learning models (RF, ETC, GBC, LR, and MLP) are trained using 70% of the data. The remaining 30% is used to evaluate the trained models. Evaluation is performed using accuracy, precision, recall, and F1 score. A sketch of the two setups is given below.
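The sketch assumes the feature matrix `X`, labels `y`, and CR outputs `X_cr`, `y_cr` from the earlier examples:

```python
# Hedged sketch of the two evaluation setups (70:30 split).
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Setup A: resample first (here with CR), then split 70:30.
X_tr_a, X_te_a, y_tr_a, y_te_a = train_test_split(
    X_cr, y_cr, test_size=0.30, random_state=42)

# Setup B: split first, then resample only the training portion so the
# test set keeps the original, imbalanced class distribution.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)
X_tr_b, y_tr_b = SMOTE(random_state=42).fit_resample(X_tr, y_tr)
```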

3.15. Performance Evaluation Metrics

Several performance evaluation metrics are used to evaluate the machine learning models, and a blend of different evaluation tools is helpful to determine the efficacy of an approach [51]. Therefore, in this research, four well-known metrics are used: accuracy, precision, recall, and F1 score. In addition, the confusion matrix gives the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), which are used to calculate accuracy, precision, recall, and F1 score. These metrics are calculated using the following equations:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

$$\text{Precision} = \frac{TP}{TP + FP},$$

$$\text{Recall} = \frac{TP}{TP + FN},$$

$$\text{F1 score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$$
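All four metrics, plus the confusion matrix, are available as Scikit-learn helpers; a short sketch, assuming true labels `y_te` and model predictions `y_pred` from the split above:

```python
# Computing the evaluation metrics with scikit-learn.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

print(confusion_matrix(y_te, y_pred))  # rows: actual, columns: predicted
print("accuracy :", accuracy_score(y_te, y_pred))
print("precision:", precision_score(y_te, y_pred))
print("recall   :", recall_score(y_te, y_pred))
print("F1 score :", f1_score(y_te, y_pred))
```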

4. Results and Discussion

This study performs experiments using a Core i7 7th generation machine operating on Windows 10. Implementation of the machine learning algorithms is done using Python script on Jupyter Notebook.

4.1. Results without Resampling

The performance of the machine learning models without data resampling is shown in Table 8. RF performs the best with 0.980 and 0.887 scores for accuracy and F1 score, respectively. The performance of LR is marginally lower with 0.980 accuracy and a 0.885 F1 score. A noteworthy point is the difference between the prediction accuracy and the F1 score; such a difference is often caused by data imbalance. The models overfit due to the high number of samples in the majority class and make false predictions for the minority class, leading to the gap between the prediction accuracy and the F1 score.

4.2. Results Using CC Undersampling

To improve the performance of the machine learning models, data resampling is carried out using the CC technique. CC balances the data and reduces the chance of model overfitting. It is an undersampling approach that reduces the number of majority class samples, replacing clusters of majority class records with their centroids until the numbers of majority and minority class samples are equal.

Results given in Table 9 indicate that the difference between the prediction accuracy and F1 score has been reduced after applying the resampling. Using an equal number of samples for training reduces the probability of model overfitting and narrows the gap between accuracy and the other performance evaluation metrics. On the other hand, the overall performance of the machine learning models is reduced as well. The primary reason for this drop in performance is the size of the data used for training: being an undersampling approach, CC reduces the size of the data, which affects model training and leads to performance degradation. Despite the decrease in performance, RF shows the best performance with the undersampled data, achieving a 0.943 accuracy score and 0.940 F1 score. The performance of the other classifiers is similar except for MLP, which achieves an accuracy of 0.905 and an F1 score of 0.898.

4.3. Results Using SMOTE Oversampling

The performance of the machine learning models after data oversampling is shown in Table 10. Results indicate that model performance is elevated when trained on data oversampled with SMOTE. Oversampling increases the size of the data, providing a larger set of training samples, which boosts prediction accuracy. ETC outperforms all models with an accuracy of 0.982 and an F1 score of 0.982. All other models also show improved performance with the SMOTE oversampling technique; the performance of RF is slightly lower than that of ETC with an accuracy of 0.976. Overall, tree-based models perform markedly better than the linear and neural network models owing to their ensemble architecture: ETC, RF, and GBC combine several decision trees in their learning and prediction procedures and perform superiorly on the HTRU2 dataset.

4.4. Results after Applying ADASYN Sampling

For the current study, ADASYN oversampling is also used to balance the dataset. The performance of the machine learning models on ADASYN-oversampled data is shown in Table 11. Results suggest that model performance improves with ADASYN oversampling. Tree-based models again outperform the linear model and MLP and achieve good scores on the performance evaluation metrics; for example, ETC achieves the highest accuracy score of 0.981 and an F1 score of 0.982. The performance of the linear model LR and the neural network MLP dropped with ADASYN resampling, possibly because of feature correlations in the newly generated samples.

4.5. Results with Proposed Concatenated Resampling

For the proposed approach, the resampled data from SMOTE, ADASYN, and CC are concatenated along axis 0 (row-wise), which increases the size of the data and leads to significant improvement in the performance of the machine learning models. Results shown in Table 12 indicate that the models perform better with the proposed CR approach. Both ETC and RF achieve >99% accuracy with the CR technique, with a similar F1 score, which indicates that the models do not overfit when trained on CR-resampled data. The elevated performance is due to the concatenation of resampled data from different sampling approaches: it provides the models with different variations of samples to learn from, making CR more effective than any individual resampling technique. As a result, the performance of the machine learning models is significantly improved.

Using the proposed resampling approach, ETC outperforms the other models with all resampling techniques, and most significantly with the proposed CR approach, as shown in Figure 3.

Figure 4 shows the confusion matrices of the best performer, ETC, with all resampling approaches. With CR resampling, ETC makes 20,327 correct predictions out of 20,448, with only 121 false predictions. When ETC is used with SMOTE, 166 predictions are false and 9,590 are correct out of 9,756 total predictions; of the 166 false predictions, 101 come from the resampled data, which indicates that the samples generated by SMOTE to balance the dataset lead to false predictions. With ADASYN, ETC performs slightly worse than with SMOTE, making 9,543 correct and 166 false predictions out of 9,709. With the CC undersampling technique, performance is not good enough due to the reduced number of training samples: ETC gives 921 correct and 63 false predictions out of 984. In light of these results, the performance of the machine learning models with the proposed CR approach is better than with either the oversampling or the undersampling approaches.

4.6. Results with Resampling on Training Set

Due to the highest performance of ETC with all the resampling approaches used in the current study, ETC is used for further analysis. For this purpose, the training set is balanced and ETC is trained on the balanced data while tested on the imbalanced data. Results given in Table 13 show that ETC outperforms all other models with this approach as well, achieving the highest accuracy of 0.981 with the proposed CR approach. However, the overall performance of the model is reduced with this setup, and the values for accuracy and F1 score differ sharply from those obtained with the previous approach.

4.7. Results with Deep Learning Models

This study also deploys state-of-the-art deep learning models for pulsar detection. Customized architectures of long short-term memory (LSTM), deep neural network (DNN) [10], and gated recurrent unit (GRU) models are used [52]. Architectural details and the values of the used variables are provided in Table 14.

The deep learning models are compiled with binary cross-entropy loss and the Adam optimizer, and 100 epochs are used for training. The performance of LSTM, GRU, and DNN is measured in terms of accuracy, precision, recall, and F1 score. Results given in Table 15 indicate that the three deep learning models achieve the same accuracy; however, their performance varies marginally when precision, recall, and F1 score are considered. Owing to the importance of the F1 score, LSTM and GRU show a better F1 score of 0.94 each compared to the DNN model. These results show that the optimized machine learning models outperform the deep learning models: deep learning models require thousands of samples for good model fitting, so their performance is slightly lower due to the small size of the dataset.
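The exact architectures live in Table 14; purely as an illustration, a minimal Keras LSTM classifier over the 8 HTRU2 features might look as follows, treating the feature vector as a length-8 sequence (an assumption, since the true input shaping is not stated in the text).

```python
# Illustrative LSTM sketch only; layer sizes and input shaping are
# assumptions, not the architectures reported in Table 14.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

lstm = Sequential([
    LSTM(64, input_shape=(8, 1)),       # 8 features as a short sequence
    Dense(1, activation="sigmoid"),     # binary pulsar/non-pulsar output
])
lstm.compile(optimizer="adam", loss="binary_crossentropy",
             metrics=["accuracy"])
# lstm.fit(np.asarray(X_tr).reshape(-1, 8, 1), y_tr, epochs=100)
```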

4.8. Results Using 10-Fold Cross-Validation

To corroborate the significance of the proposed resampling approach and the performance of the machine learning models, 10-fold cross-validation is used, and the results are given in Table 16. All models are employed with each data sampling approach to analyze performance. Results indicate that the highest accuracy is obtained by ETC with the proposed hybrid sampling approach, which shows the superiority of the proposed approach over the other data sampling approaches.
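A sketch of this validation step for the best performer, assuming the CR-resampled data from the earlier examples:

```python
# Hedged sketch: 10-fold cross-validation of ETC on CR-resampled data.
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

etc = ExtraTreesClassifier(n_estimators=100, random_state=42)
scores = cross_val_score(etc, X_cr, y_cr, cv=10, scoring="accuracy")
print(scores.mean(), "+/-", scores.std())  # mean fold accuracy
```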

4.9. Comparison with the State-of-the-Art Studies

To evaluate the efficacy of the proposed approach, its performance is compared with similar previous approaches that utilize the HTRU2 dataset. For example, study [6] conducted experiments on the same dataset with the proposed GH-VFDT model, and study [14] used the proposed RTB-VC model for pulsar prediction on the same dataset. Table 17 compares the proposed approach with these previous studies to illustrate the significance of this work.

4.10. Statistical Analysis of CR Technique

This study also performs a statistical T-test to show the significance of the CR technique. The T-test considers the following two hypotheses:

(i) Null hypothesis: the CR technique brings no statistically significant improvement over the other data balancing techniques.

(ii) Alternative hypothesis: the CR technique brings a statistically significant improvement over the other data balancing techniques.

The T-test shows that the results of the tree-based models RF, ETC, and GBC with the CR technique reject the null hypothesis in favor of the alternative, which means that these tree-based models perform statistically significantly better with the CR technique than with all other resampling techniques.
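The text does not specify which scores the T-test compares; one plausible setup, sketched below, pairs the fold-wise accuracies of CR against another resampling technique using SciPy, where `scores_cr` and `scores_other` are assumed arrays from the 10-fold runs above.

```python
# Hedged sketch: paired T-test over fold-wise accuracy scores.
from scipy.stats import ttest_rel

t_stat, p_value = ttest_rel(scores_cr, scores_other)
print(t_stat, p_value)
if p_value < 0.05:
    print("Reject H0: the difference between CR and the other "
          "technique is statistically significant.")
```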

5. Conclusion

Pulsar detection is a significant task and possesses great importance for studying several phenomena of nuclear physics. Automatic detection of pulsars from the collected data is a topic of considerable importance in this regard. Due to the imbalanced nature of the HTRU2 dataset, prediction accuracy suffers. This study proposes a concatenated resampling (CR) approach for data balancing and a methodology to utilize the proposed CR for pulsar prediction with high accuracy. For this purpose, the performance of several machine learning algorithms is investigated and analyzed. Experimental results indicate that the oversampling approaches SMOTE and ADASYN perform better than the undersampling cluster centroids approach. The increased number of training samples in the oversampled data tends to boost the performance of the machine learning classifiers, especially ETC, which achieves the highest accuracy with all resampling approaches. The performance evaluation metrics are much better for ETC when used with the proposed CR approach, with an accuracy of 0.993. Combining multiple resampling approaches elevates the performance of the machine learning classifiers and reduces the influence of data imbalance. Results show that tree-based classifiers perform better than linear classifiers. Regarding the deep learning models, LSTM and GRU provide better F1 scores than DNN. Performance comparison with state-of-the-art approaches indicates that the proposed approach outperforms them and achieves higher accuracy.

This study leverages the supervised approach by optimizing several well-known machine learning models. However, the use of unsupervised models is expected to provide interesting results. Important observations can be made by clustering the HTRU dataset into groups, and analysis can be performed to highlight the features of probable candidates for pulsars.

Data Availability

The HTRU2 dataset is available at https://www.kaggle.com/colearninglounge/predicting-pulsar-starintermediate.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Authors’ Contributions

Ernesto Lee and Furqan Rustam contributed equally to this study.

Acknowledgments

This research was supported by the Florida Center for Advanced Analytics and Data Science funded by Ernesto.Net (under the Algorithms for Good Grant).