Computational and Mathematical Methods in Medicine

Research Article | Open Access

Volume 2015 | Article ID 626975

Congwei Sun, Zhijun Dai, Hongyan Zhang, Lanzhi Li, Zheming Yuan, "Binary Matrix Shuffling Filter for Feature Selection in Neuronal Morphology Classification", Computational and Mathematical Methods in Medicine, vol. 2015, Article ID 626975, 9 pages, 2015.

Binary Matrix Shuffling Filter for Feature Selection in Neuronal Morphology Classification

Academic Editor: Michele Migliore
Received 24 Jan 2015
Revised 14 Mar 2015
Accepted 15 Mar 2015
Published 29 Mar 2015


A prerequisite for understanding neuronal function and characteristics is the correct classification of neurons. Existing classification techniques are usually based on structural characteristics and employ principal component analysis to reduce the feature dimension. In this work, we classify neurons based on neuronal morphology. A new feature selection method named the binary matrix shuffling filter (BMSF) was used in neuronal morphology classification. This method, coupled with a support vector machine for implementation, usually selects a small number of features, which eases interpretation. The retained features are used to build classification models with support vector classification and two other commonly used classifiers. Compared with the reference feature selection methods, the binary matrix shuffling filter showed the best performance and generalized well across five random replications of the neuron datasets. In addition, the binary matrix shuffling filter was able to distinguish each neuron type from the other types; for each neuron type, private features were also obtained.

1. Introduction

To accelerate the understanding of neuronal characteristics in the brain, neurons must first be classified correctly. It is therefore necessary to develop a uniform methodology for their classification. The existing classification techniques are usually based on structural functions and the numbers of dendrites to fit the models [1]. As neuronal morphology is closely related to neuronal characteristics and functions, neuroscientists have made great efforts to study neurons from the perspective of neuronal morphology. Renehan et al. [2] employed intracellular recording and labeling techniques to examine potential relationships between the physiology and morphology of brainstem gustatory neurons and demonstrated a positive correlation between the breadth of responsiveness and the number of dendritic branch points. In the study by Badea and Nathans [3], detailed morphologies for all major classes of retinal neurons in the adult mouse were visualized. After analysis of the multidimensional parametric space, the neurons were clustered into subgroups by using Ward's and k-means algorithms. In the study by Kong et al. [4], retinal ganglion cells were imaged in three dimensions and the morphologies of a series of 219 cells were analyzed. A total of 26 parameters were studied, of which three (level of stratification, extent of the dendritic field, and density of branching) were used to obtain an effective clustering, and the neurons could often be matched to ganglion cell types defined by previous studies. In addition, Ristanović et al. performed a quantitative analysis based on topology and seven morphometric parameters in the adult dentate nucleus [5] and classified the neurons in this region into four types. A number of neuronal morphologic indices, such as soma surface, number of stems, length, and diameter, have been designed [6], which makes it possible to classify neurons based on morphological characteristics.

In the study by Li et al. [7], a total of 60 neurons were selected randomly and five of the twenty morphologic characteristics were extracted by principal component analysis (PCA), after which the neurons were clustered into four types. Jiang et al. [8] extracted four principal components of neuronal morphology by PCA and employed a back propagation neural network (BPNN) to distinguish the same kinds of neuron in different species. However, the above studies [2–5] focused only on a particular neuronal type or a specific region of the brain, aiming to solve specific issues rather than to classify neurons systematically. In those studies, only a few samples were selected and the classification results were not independently tested, which makes the results less persuasive. Moreover, the methodologies used in previous studies [7, 8] were mainly based on PCA and cluster analysis. PCA is the optimal linear transformation for minimizing the mean square reconstruction error, but it considers only second order statistics; if the data have nonlinear dependencies, higher order statistics should be taken into account [9]. In addition, each principal component is a compression of several attributes, so the contribution of each attribute is hard to interpret. Therefore, feature selection (FS), which simplifies the model by removing redundant and irrelevant features, is necessary.

Available feature selection methods fall into three categories. (i) Filter methods rank variables using inherent properties of the data sets, and their algorithmic complexity is low. However, the selected features are often redundant, which may result in low classification accuracy. Univariate filter methods include the t-test [10], correlation coefficient [11], Chi-square statistics [12], information gain [13], relief [14], signal-to-noise ratio [15], Wilcoxon rank sum [16], and entropy [17]. Multivariate filter methods include mRMR [18], correlation-based feature selection [19], and the Markov blanket filter [20]. (ii) Wrapper methods achieve high training precision at the cost of high algorithmic complexity, which usually leads to overfitting. Representative methods include sequential forward selection [21], sequential backward selection [21], sequential floating selection [22], genetic algorithm [23], and ant colony algorithm [24]; SVM and ANN are usually used for implementation. (iii) Embedded methods use internal information of the classification model to evaluate the selected features. They include support vector machine recursive feature elimination (SVM-RFE) [25], support vector machine with RBF kernel based on recursive feature elimination (SVM-RBF-RFE) [26], support vector machine and T statistics recursive feature elimination (SVM-T-RFE) [27], and random forest [28].

In this work, a new feature selection method named the binary matrix shuffling filter (BMSF) was used. It not only overcomes the overfitting problem in a large dimensional search space but also takes potential feature interactions into account during feature selection. Seven types of neurons with different characteristics and functions (pyramidal neuron, Purkinje neuron, sensory neuron, motoneuron, bipolar interneuron, tripolar interneuron, and multipolar interneuron) were selected from the NeuroMorpho.Org database (up to version 6.0), covering all the available species and brain regions. BMSF was used to reduce the features nonlinearly, and a support vector classification (SVC) model was built to classify neurons based on the retained morphological characteristics. SVM-RFE and rough set theory were used as reference feature selection methods for comparison with BMSF, while two other classifiers that are widely used in the pattern recognition field, the back propagation neural network (BPNN) and Naïve Bayes (NB), were employed to test the robustness of BMSF. A systematic classification of neurons would facilitate the understanding of neuronal structure and function.

2. Materials and Methods

2.1. Data Sources

Data sets used in this work were downloaded from the NeuroMorpho.Org database [6, 29]. NeuroMorpho.Org is a web-based inventory dedicated to densely archiving and organizing all publicly shared digital reconstructions of neuronal morphology. It was started and is maintained by the Computational Neuroanatomy Group at the Krasnow Institute for Advanced Study, George Mason University. This project is part of a consortium for the creation of a "neuroscience information framework," endorsed by the Society for Neuroscience, funded by the National Institutes of Health, led by Cornell University (Dr. Daniel Gardner), and including numerous academic institutions such as Yale University, Stanford University, and University of California, San Diego. The data sets used in this study are documented in Table 1. A total of 5862 neurons were selected, and within each neuron type the samples were divided randomly into training and test sets in a ratio of 2 : 1. Repeating the random division five times gave five pairs of data sets.

Table 1

| Neuron type | Number of training sets | Number of test sets |
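The per-type 2 : 1 random division with five replications described above could be sketched as follows. This is only an illustration in Python, not the in-house MATLAB code used in the study; the function name `stratified_split`, the seeds, and the toy label counts are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Split sample indices so that every class keeps roughly a 2:1 train/test ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)                      # random division within the class
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)

# Toy labels with hypothetical counts (not the real data set).
toy_labels = ["pyramidal"] * 30 + ["Purkinje"] * 6 + ["bipolar"] * 9

# Five replications, each produced with its own seed.
replications = [stratified_split(toy_labels, seed=s) for s in range(5)]
```

Splitting inside each class, rather than over the pooled samples, keeps rare types represented in both the training and the test set of every replication.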


2.2. Feature Extraction and Selection

Dendritic cells in the database were cut into a series of compartments, and each compartment was characterized by an identification number, a type, and the spatial coordinates of the cylinder ending point, the radius value, and the identification number of the “parent.” Although the digital description constituted a completely accurate mapping of dendritic morphology, it bore little intuitive information [30]. In this work, 43 attributes that held more intuitive information were extracted with L-measure software [31], and related morphological indices and descriptions are shown in Table 2. For convenience, we gave an abbreviation for each neuronal morphological index, as listed in the second column of Table 2.

Table 2

| Number | Abbr. | Morphological index | Description |
|---|---|---|---|
| 1 | SS | Soma_surface | Somatic surface area |
| 2 | | N_stems | Total number of trees |
| 3 | | N_bifs | Total number of bifurcations |
| 4 | | N_branch | Number of bifurcations plus terminations |
| 5 | | N_tips | Number of terminal tips of a neuron |
| 6 | NW | Neuronal_width | 95% of second principal component |
| 7 | NH | Neuronal_height | 95% of first principal component |
| 8 | ND | Neuronal_depth | 95% of third principal component |
| 9 | Ty | Type | Compartments are assigned to four different types: 1 = soma, 2 = axon, 3 = dendrites, and 4 = apical dendrites |
| 10 | Di | Diameter | Average branch diameter |
| 11 | Dp | Diameter_pow | Diameter of each compartment of the neuron raised to the power of 1.5 |
| 12 | Le | Length | Total arborization length |
| 13 | Su | Surface | Surface area of each compartment |
| 14 | SA | Section area | Total arborization surface area |
| 15 | Vo | Volume | Total internal volume of the arborization |
| 16 | ED | Euc distance | Maximum Euclidean (straight) distance from soma to tips |
| 17 | PD | Path distance | Maximum path (along the tree) distance from soma to tips |
| 18 | BO | Branch_order | Maximum branch order number of bifurcations from soma to tips |
| 19 | Td | Terminal degree | Total number of tips each segment will terminate into |
| 20 | TS | Terminal segment | Number of compartments that comprise the terminal branch |
| 21 | Ta1 | Taper_1 | The change in diameter over path length between two critical points |
| 22 | Ta2 | Taper_2 | The ratio of the change in diameter to the initial diameter of two critical points. The initial diameter is usually larger |
| 23 | Bpl | Branch_path length | Summation of the individual compartment lengths that form a branch |
| 24 | Co | Contraction | Average contraction (the ratio between Euclidean and path length calculated on each branch) |
| 25 | Fr | Fragmentation | Total number of reconstruction points |
| 26 | DR | Daughter_ratio | Ratio between the diameter of the bigger daughter and the smaller daughter of the current bifurcation |
| 27 | PDR | Parent-daughter_ratio | Ratio between the diameter of a daughter and its father for each critical point |
| 28 | Pa | Partition_asymmetry | Average over all bifurcations of the absolute value of $(n_1 - n_2)/(n_1 + n_2 - 2)$, where $n_1$ and $n_2$ are the numbers of tips in the two subtrees |
| 29 | RP | Rall_power | Average over all bifurcations of the sum of the diameters of the two daughters, elevated to 1.5, divided by the diameter of the parent, and elevated to 1.5 |
| 30 | Pk | Pk | = 0, 5 |
| 31 | Pc | Pk_classic | Rall power is set to 1.5 |
| 32 | Pk2 | Pk_2 | Rall power is set to 2 |
| 33 | Bal | Bif_ampl_local | Average over all bifurcations of the angle between the first two daughter compartments |
| 34 | Bar | Bif_ampl_remote | Average over all bifurcations of the angle between the following bifurcations or tips |
| 35 | Btl | Bif_tilt_local | The angles between the end of the parent branch and the initial part of the daughter branches at the bifurcation |
| 36 | Btr | Bif_tilt_remote | The angles between the previous node of the current bifurcating father and the daughter nodes |
| 37 | Btol | Bif_torque_local | Angle between the current plane of bifurcation and the previous plane of bifurcation |
| 38 | Btor | Bif_torque_remote | Angle between the current plane of bifurcation and the previous plane of bifurcation |
| 39 | Lpd | Last_parent_diam | Diameter of last bifurcation before the terminal tips |
| 40 | Dt | Diam_threshold | Diameter of first compartment after the terminal bifurcation leading to a terminal tip |
| 41 | HT | Hillman threshold | Computation of the weighted average diameter between 50% of father and 25% of daughter diameters of the terminal bifurcation |
| 42 | He | Helix | Helicity of the branches of the neuronal tree. It needs to be at least 3 compartments long to compute the helicity |
| 43 | FD | Fractal_dim | Fractal dimension metric of the branches in the dendrite trees |

Some redundancy among these attributes was expected. Feature selection saves computational time and storage and simplifies models when dealing with high-dimensional data sets, and it can also improve classification accuracy by removing redundant and irrelevant features.

2.2.1. Binary Matrix Shuffling Filter

For rapid and efficient selection of high-dimensional features, we have reported a novel method named binary matrix shuffling filter (BMSF) based on support vector classification (SVC). The method was successfully applied to the classification of nine cancer datasets and obtained excellent results [32]. The outline of the algorithm is as follows.

First, denote the original training set as $D$, which includes $m$ samples and $n$ features. We randomly generate a binary matrix of dimensions $k \times n$ whose entries are either 1 or 0, representing whether the feature in that column is included in the modeling or not. Here $k$ is the given number of combinations (a fixed value in this paper), and the numbers of 1s and 0s in each column (each feature) are equal.

Second, each combination (each row of the matrix) yields a reduced training set, obtained from the original training set by keeping only the features marked 1, and its classification accuracy is obtained through tenfold cross validation. Repeating this process $k$ times gives $k$ accuracy values.

Third, a new training set is constructed by taking the $k$ accuracy values as the dependent variable and the random 0/1 matrix as the independent variable matrix. To evaluate the contribution of the $j$th feature to the model, we produce two test sets from the matrix by setting every entry of the $j$th column to 1 in one copy and to 0 in the other, keeping the other columns unchanged. A model built on the new training set is used to predict both test sets, which gives the predictive vectors $\hat{y}_1$ (feature included) and $\hat{y}_0$ (feature excluded).

Finally, the mean values of $\hat{y}_1$ and $\hat{y}_0$ are compared. If the mean of $\hat{y}_1$ is larger than that of $\hat{y}_0$, the feature in this column tends to give better classification performance and is kept; otherwise, the feature is excluded. This process is repeated, and the features are screened in multiple rounds until no more can be deleted.

Detailed procedures can be found in our previous study [32]. This method is able to find a parsimonious set of features which has high joint prediction power.
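The screening loop outlined above can be illustrated with a small Python sketch. It is a simplified stand-in, not the published procedure of [32]: the `evaluate` callable is a hypothetical placeholder for the tenfold cross-validated SVC accuracy, and the regression model that predicts the two modified test sets is replaced here by a direct comparison of the mean accuracies of the rows in which a feature is switched on versus off. The value `k=20`, the feature names, and `toy_accuracy` are assumptions made for the example.

```python
import random

def balanced_binary_matrix(k, n, rng):
    """Return a k x n 0/1 matrix in which every column has k/2 ones and k/2 zeros."""
    if k % 2:
        raise ValueError("k must be even so that ones and zeros balance")
    columns = []
    for _ in range(n):
        column = [1] * (k // 2) + [0] * (k // 2)
        rng.shuffle(column)
        columns.append(column)
    return [list(row) for row in zip(*columns)]  # transpose columns into rows

def screen_once(features, evaluate, k, rng):
    """One screening round: keep features whose inclusion raises the mean accuracy."""
    matrix = balanced_binary_matrix(k, len(features), rng)
    accuracy = [evaluate([f for f, bit in zip(features, row) if bit]) for row in matrix]
    kept = []
    for j, feature in enumerate(features):
        on = [a for a, row in zip(accuracy, matrix) if row[j] == 1]
        off = [a for a, row in zip(accuracy, matrix) if row[j] == 0]
        if sum(on) / len(on) > sum(off) / len(off):
            kept.append(feature)
    return kept

def shuffling_filter(features, evaluate, k=20, seed=0, max_rounds=20):
    """Repeat screening rounds until no feature can be deleted."""
    rng = random.Random(seed)
    current = list(features)
    for _ in range(max_rounds):
        kept = screen_once(current, evaluate, k, rng)
        if not kept or len(kept) == len(current):
            break
        current = kept
    return current

def toy_accuracy(subset):
    """Hypothetical accuracy: one informative feature, small penalty per noise feature."""
    score = 0.5 + (0.4 if "signal" in subset else 0.0)
    return score - 0.001 * sum(1 for f in subset if f != "signal")

all_features = ["signal", "noise1", "noise2", "noise3", "noise4"]
selected = shuffling_filter(all_features, toy_accuracy)
```

Because every column holds equal numbers of ones and zeros, each feature is switched on in exactly half of the evaluated combinations, so the two groups being compared always have the same size.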

2.2.2. Support Vector Machine Recursive Feature Elimination

SVM-RFE is an application of recursive feature elimination (RFE) that uses the weight magnitude as the ranking criterion. It eliminates redundant features and yields more compact feature subsets. Features are eliminated according to a criterion related to their support of the discrimination function, and the support vector machine is retrained at each step. This method was first used successfully in gene feature selection and afterwards in the fields of bioinformatics, genomics, transcriptomics, and proteomics. For the technical details of the method, refer to the original study by Guyon et al. [25].
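The elimination loop can be sketched generically in Python. In this sketch the `fit_weights` callable is a hypothetical placeholder for retraining a linear SVM on the surviving features and returning one weight per feature; the squared weight is used as the ranking score, and the toy weights are invented for the example.

```python
def rfe_ranking(features, fit_weights):
    """Rank features by recursive elimination of the smallest squared weight."""
    remaining = list(features)
    eliminated = []
    while remaining:
        weights = fit_weights(remaining)          # retrain on the surviving features
        worst = min(remaining, key=lambda f: weights[f] ** 2)
        remaining.remove(worst)
        eliminated.append(worst)
    return eliminated[::-1]                       # last eliminated = most important

# Toy stand-in for a trained model: fixed, hypothetical weights per feature.
toy_weights = {"SS": 2.0, "HT": -1.5, "NW": 0.3, "Ty": -0.1}

def toy_fit(remaining):
    return {f: toy_weights[f] for f in remaining}

ranking = rfe_ranking(list(toy_weights), toy_fit)
```

With a real classifier the weights change after every retraining, which is why the model is refitted inside the loop rather than ranked once.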

2.2.3. Rough Set Theory

Rough set theory, introduced by Pawlak [33] in the early 1980s, is a tool for representing and reasoning about imprecise and uncertain data. It constitutes a mathematical framework for inducing minimal decision rules from training examples. Each rule induced from the decision table identifies a minimal set of features discriminating one particular example from other classes. The set of rules induced from all the training examples constitutes a classificatory model capable of classifying new objects. The selected feature subset not only retains the representational power but also has minimal redundancy. A typical application of the rough set method usually includes three steps: construction of decision tables, model induction, and model evaluation [34]. The algorithm used in this work is derived from the studies by Hu et al. [35–37].

2.3. Classification Techniques
2.3.1. Support Vector Classification

Support vector classification, based on statistical learning theory, is widely used in the machine learning field [38]. SVM replaces traditional empirical risk minimization with structural risk minimization, which makes it particularly suitable for problems with small sample sizes, high dimensionality, nonlinearity, and strong collinearity, and helps it avoid overfitting, the curse of dimensionality, and local minima. It also exhibits excellent generalization ability. In this work, the nonlinear radial basis function (RBF) kernel was selected, and the search ranges for the parameters $C$ and $\gamma$ were −5 to 15 and 3 to −15 (base-2 logarithm), respectively. Cross validation and the independent test were carried out using in-house programs written in MATLAB (version R2012a).
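A parameter search over the stated base-2 ranges might look like the Python sketch below. The two RBF-SVM parameters are taken to be the penalty C and the kernel width gamma, which is the conventional reading of these ranges. The step of 2 between exponents is an assumption borrowed from the defaults of LIBSVM's grid.py, since the step size is not stated in the text, and `toy_cv_accuracy` is a hypothetical placeholder for the cross-validated accuracy.

```python
from math import log2

# Exponent ranges from the text; the step of 2 is an assumption (not stated there).
log2_C = list(range(-5, 16, 2))         # -5, -3, ..., 15
log2_gamma = list(range(3, -16, -2))    # 3, 1, ..., -15

grid = [(2.0 ** c, 2.0 ** g) for c in log2_C for g in log2_gamma]

def best_parameters(grid, cv_accuracy):
    """Return the (C, gamma) pair with the highest cross-validated accuracy."""
    return max(grid, key=lambda pair: cv_accuracy(*pair))

def toy_cv_accuracy(C, gamma):
    """Hypothetical accuracy surface peaking at C = 2^3 and gamma = 2^-5."""
    return 1.0 - 0.01 * (abs(log2(C) - 3) + abs(log2(gamma) + 5))

best_C, best_gamma = best_parameters(grid, toy_cv_accuracy)
```

Searching on a logarithmic scale covers several orders of magnitude with only 110 candidate pairs under this assumed step.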

2.3.2. Back Propagation Neural Network

BPNN is one of the most widely employed techniques among the artificial neural network (ANN) models. The general structure of the network consists of an input layer, a variable number of hidden layers containing any number of nodes, and an output layer. The back propagation learning algorithm modifies the feed-forward connections between the input and hidden units and the hidden and outputs units to adjust appropriate connection weights to minimize the error [39]. Java-based software WEKA [40] was used to fit the model.

2.3.3. Naïve Bayes

Naïve Bayes is a classification technique obtained by applying a relatively simple method to a training dataset [41]. A Naïve Bayes classifier calculates the probability that a given instance belongs to a certain class. Despite its simple structure and ease of implementation, Naïve Bayes often performs well. Naïve Bayes models were also implemented in the WEKA software, with all parameters left at their default values.
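For reference, the class probability mentioned above is usually written as follows under the conditional-independence assumption that gives the method its name. The symbols $c$ (class), $x_j$ (value of the $j$th feature), and $d$ (number of features) are notation introduced here for illustration; how each $P(x_j \mid c)$ is estimated depends on the implementation and is not specified in the text.

```latex
P(c \mid x_1, \dots, x_d) \;\propto\; P(c) \prod_{j=1}^{d} P(x_j \mid c),
\qquad
\hat{c} = \arg\max_{c} \; P(c) \prod_{j=1}^{d} P(x_j \mid c)
```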

3. Results and Discussion

3.1. Selected Feature Subsets

Feature selection methods are applied to training sets to get optimal feature subsets. For each method, five sets of features were obtained. Table 3 shows the reserved feature subsets derived from BMSF, SVM-RFE, and rough set theory, respectively. Five feature subsets are numbered with Roman numerals I to V for five replications. The number of selected features is also listed in Table 3.

Table 3

| Feature selection | Replication | Number of features | Selected features |
|---|---|---|---|
| SVM-RFE | I | 10 | SS, HT, DR, Bpl, NH, Btr, Bal, Su, SA, Lpd |
| SVM-RFE | II | 13 | HT, RP, SS, Ta1, Btr, BO, Dp, Di, Td, Fr, DR, Bar, NH |
| SVM-RFE | III | 12 | HT, FD, SS, DR, Btr, Dp, Di, Fr, BO, Td, Su, Ty |
| SVM-RFE | IV | 14 | HT, RP, SS, Ta2, Btr, Di, Dp, Fr, BO, Td, SA, Vo, Ta1, TS |
| SVM-RFE | V | 15 | HT, Lpd, SS, Bpl, Btr, Bal, NH, Ta1, Su, Di, SA, Vo, Fr, Ta2, Ty |
| Rough set | I | 13 | , Co, NW, SS, NH, Ty, RP, HT, He, FD, ND, Pa, Btr |
| Rough set | II | 13 | , Co, NW, SS, NH, Ty, RP, HT, He, ND, Pa, FD, Btl |
| Rough set | III | 11 | , RP, NW, Ty, NH, SS, Pa, He, HT, ND, FD |
| Rough set | IV | 13 | , Pa, NW, SS, NH, Ty, RP, ND, HT, He, Btl, SA, FD |
| Rough set | V | 13 | , Pa, NW, SS, Ty, NH, RP, He, HT, ND, Btr, SA, FD |
| BMSF | I | 8 | , RP, NW, PDR, HT, Bar, SS, Ta2 |
| BMSF | II | 6 | , Btol, NW, HT, Bar, Ta2 |
| BMSF | III | 8 | , Pa, NW, HT, Bar, Ta2, Bal, Lpd |
| BMSF | IV | 7 | , Lpd, NW, Pa, PDR, Ta2, HT |
| BMSF | V | 8 | , Btr, NW, HT, Bar, Ta2, PDR, SA |

As shown in Table 3, BMSF retained between six and eight features per replication (7.4 on average), whereas SVM-RFE and rough set theory each retained ten or more. BMSF thus retained fewer features, which were more informative and easier to interpret. The order of a feature in the ranking list reflects its importance. In the feature subsets of BMSF and rough set, the same feature ranked first in all five replications, which indicated that it had a strong ability to discriminate neuron types. We calculated the frequency of each selected feature over the five replications. Besides the top-ranked feature, the features NW, HT, and Ta2 were also retained in all five random replications, and their ranks were similar in the five BMSF subsets.
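The frequency count over the five BMSF replications can be reproduced with a few lines of Python. The lists below copy the named BMSF entries of Table 3; the leading entry of each subset, whose symbol is not reproduced in the table, is left out, so each list is one item shorter than the reported count. The variable names are assumptions made for the example.

```python
from collections import Counter

# Named BMSF features per replication, copied from Table 3 (leading entry omitted).
bmsf_subsets = {
    "I": ["RP", "NW", "PDR", "HT", "Bar", "SS", "Ta2"],
    "II": ["Btol", "NW", "HT", "Bar", "Ta2"],
    "III": ["Pa", "NW", "HT", "Bar", "Ta2", "Bal", "Lpd"],
    "IV": ["Lpd", "NW", "Pa", "PDR", "Ta2", "HT"],
    "V": ["Btr", "NW", "HT", "Bar", "Ta2", "PDR", "SA"],
}
# Number of selected features per replication as reported in Table 3.
reported_counts = {"I": 8, "II": 6, "III": 8, "IV": 7, "V": 8}

frequency = Counter(f for subset in bmsf_subsets.values() for f in subset)
in_all_runs = sorted(f for f, count in frequency.items() if count == len(bmsf_subsets))
mean_count = sum(reported_counts.values()) / len(reported_counts)

print(in_all_runs)   # features kept in every replication
print(mean_count)    # average number of features retained by BMSF
```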

3.2. Classification Performance
3.2.1. Comparison of Independent Test Accuracies Using Different Models

In order to evaluate the performance of BMSF and compare it with SVM-RFE and rough set, three classifiers were employed to perform independent tests. Including the classification performance without feature selection, there were twelve classification accuracies. The average accuracies on the five random datasets are presented in Table 4.

Table 4

| Feature selection methods | Naïve Bayes (%) | BPNN (%) | SVC (%) | Average (%) |
|---|---|---|---|---|
| All features | 61.35 ± 26.82 | 91.46 ± 1.22 | 97.10 ± 0.43 | 83.30 |
| SVM-RFE | 30.78 ± 12.94 | 91.38 ± 0.83 | 93.29 ± 1.20 | 71.82 |
| Rough set | 51.30 ± 3.59 | 92.75 ± 0.46 | 93.05 ± 1.45 | 79.03 |
| BMSF | 70.53 ± 6.36 | 91.46 ± 1.45 | 97.84 ± 0.57 | 86.61 |
| Average (%) | 50.87 | 91.86 | 94.73 | |

The independent classification accuracy is the ratio of the number of correctly classified samples to the total number of test samples. As shown in Table 4, of the twelve results obtained, the best classification model on the five datasets is BMSF-SVC (97.84%), followed by SVC without feature selection (97.10%). The excellent results of the SVC classifier on all features indicated that the extracted features were useful in identifying neurons and that few irrelevant features had been extracted. Further, after feature selection by BMSF, the classification accuracy of SVC increased, which suggested that BMSF deleted redundant features successfully and simplified the model. On the other hand, the feature subsets derived from SVM-RFE and rough set did not increase the accuracy of SVC; in fact, they lowered it to 93.29% and 93.05%, respectively. A similar pattern was found for Naïve Bayes: these two feature selection methods decreased its performance, whereas BMSF improved it. The BPNN classifier showed little sensitivity to the feature subsets, and its classification performance stayed at similar levels. With fewer features, BMSF matched the accuracy of BPNN on all features (91.46%), and the simplified model may be useful in further interpretation.

The above independent accuracies indicated that BMSF has excellent generalization ability and robustness across the three classifiers. We also calculated the average performance of each feature selection method over the three classifiers (last column of Table 4) and the average performance of each classifier over the three feature selection methods (last row of Table 4). The average classification accuracy based on BMSF was also the best.
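The averages in the last column and last row of Table 4 can be checked with the short Python sketch below, which copies the mean accuracies from the table (the variable names are assumptions made for the example). The reported last-row values are reproduced when the average is taken over the three feature selection methods only, that is, when the "All features" row is left out.

```python
# Mean accuracies (%) copied from Table 4.
table4 = {
    "All features": {"NB": 61.35, "BPNN": 91.46, "SVC": 97.10},
    "SVM-RFE": {"NB": 30.78, "BPNN": 91.38, "SVC": 93.29},
    "Rough set": {"NB": 51.30, "BPNN": 92.75, "SVC": 93.05},
    "BMSF": {"NB": 70.53, "BPNN": 91.46, "SVC": 97.84},
}

# Last column: average of each row over the three classifiers.
row_average = {
    method: sum(scores.values()) / len(scores) for method, scores in table4.items()
}

# Last row: average of each classifier over the three feature selection methods.
selectors = ["SVM-RFE", "Rough set", "BMSF"]
column_average = {
    clf: sum(table4[m][clf] for m in selectors) / len(selectors)
    for clf in ("NB", "BPNN", "SVC")
}

for method, value in row_average.items():
    print(f"{method}: {value:.2f}")
for clf, value in column_average.items():
    print(f"{clf}: {value:.2f}")
```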

As the datasets used in this work are unbalanced (as shown in Table 1), it is necessary to break down the independent test accuracy to obtain the classification performance of each cell type. Based on the predicted labels, the sensitivities of each cell type in the five replications are presented in Table 5.

Table 5

| Classifier | FS method | Pyramidal | Motoneuron | Sensory | Tripolar | Bipolar | Multipolar | Purkinje |
|---|---|---|---|---|---|---|---|---|
| NB | All | 30.96 ± 1.96 | 18.24 ± 3.12 | 41.62 ± 5.47 | 61.26 ± 7.90 | 94.16 ± 4.74 | 98.34 ± 3.71 | 96.00 ± 8.94 |
| NB | SVM-RFE | 29.22 ± 16.4 | 22.31 ± 3.26 | 29.80 ± 60.5 | 56.25 ± 9.66 | 92.50 ± 6.18 | 88.33 ± 21.73 | 96.0 ± 8.94 |
| NB | Rough set | 52.38 ± 4.48 | 22.32 ± 3.08 | 39.20 ± 3.95 | 97.5 ± 2.7 | 94.16 ± 6.98 | 85.0 ± 10.87 | 96.0 ± 8.94 |
| NB | BMSF | 77.26 ± 7.67 | 25.38 ± 3.73 | 38.93 ± 4.63 | 60.83 ± 4.0 | 90.83 ± 5.43 | 51.67 ± 21.57 | 92.0 ± 17.89 |
| BPNN | All | 99.10 ± 0.75 | 82.46 ± 9.78 | 57.84 ± 19.64 | 42.94 ± 11.76 | 0.00 ± 0.00 | 0.00 ± 0.00 | 52.00 ± 48.17 |
| BPNN | SVM-RFE | 99.12 ± 0.36 | 83.22 ± 18.44 | 45.24 ± 23.59 | 62.50 ± 9.64 | 15.84 ± 35.42 | 0.00 ± 0.00 | 80.0 ± 34.61 |
| BPNN | Rough set | 99.08 ± 0.36 | 78.92 ± 4.37 | 71.80 ± 3.09 | 57.06 ± 12.1 | 0.00 ± 0.00 | 0.00 ± 0.00 | 76.0 ± 8.94 |
| BPNN | BMSF | 98.42 ± 1.08 | 72.00 ± 6.01 | 66.16 ± 16.64 | 60.0 ± 18.47 | 14.16 ± 31.67 | 0.00 ± 0.00 | 76.0 ± 43.36 |
| SVC | All | 99.56 ± 0.18 | 82.46 ± 6.95 | 93.69 ± 5.23 | 87.50 ± 6.07 | 97.5 ± 2.28 | 18.33 ± 17.07 | 96.0 ± 8.94 |
| SVC | SVM-RFE | 99.55 ± 0.13 | 65.38 ± 4.58 | 69.66 ± 15.64 | 69.58 ± 7.00 | 72.5 ± 31.26 | 0.00 ± 0.00 | 88.0 ± 17.89 |
| SVC | Rough set | 99.52 ± 0.11 | 77.54 ± 5.58 | 54.23 ± 15.09 | 67.08 ± 7.71 | 89.17 ± 5.59 | 0.00 ± 0.00 | 92.0 ± 10.95 |
| SVC | BMSF | 99.63 ± 0.14 | 92.46 ± 4.9 | 95.84 ± 1.10 | 83.33 ± 7.37 | 99.17 ± 1.86 | 1.67 ± 3.73 | 92.0 ± 17.89 |

Among the seven neuron types, BMSF-SVC exhibited the best performance on pyramidal neurons, motoneurons, sensory neurons, and bipolar neurons. Although Naïve Bayes classified tripolar and multipolar neurons well, it did not do very well on several other neuron types, such as motoneurons and sensory neurons. The classification result of BMSF-SVC for multipolar neurons was poor; however, SVM-RFE and rough set also performed poorly on SVC for this type. We found that the predicted labels of multipolar neurons were almost the same as those of pyramidal neurons in all the models, which indicated that the unbalanced datasets affected the prediction of multipolar neurons.

3.2.2. Distinguishing a Certain Neuron Type from Others by BMSF-SVC

To evaluate whether a certain feature subset is useful in identifying a single cell type, the optimal model (BMSF-SVC) in this study was employed. For the seven neuron types, six hierarchy models were established, each of which is a binary classification problem. Because the datasets in this paper are imbalanced, accuracy and the Matthews correlation coefficient (MCC) were used to evaluate the established models, and recall was used to evaluate the classification performance of a single neuron type, as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}},$$

$$\text{Recall} = \frac{TP}{TP + FN},$$

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively, derived from the confusion matrix. In each model, the positive samples were one neuron type and all the remaining neuron types were the negative samples. The positive type at each level was selected according to the number of samples in each type, and the datasets in each hierarchy are presented in Table 6. For each neuron type, private feature subsets were obtained.
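The three evaluation measures named above (accuracy, MCC, and recall) can be computed from the four confusion-matrix counts as in the Python sketch below, which uses their standard definitions. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
from math import sqrt

def binary_metrics(tp, tn, fp, fn):
    """Return (accuracy, MCC, recall) for one binary confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, mcc, recall

# Hypothetical counts for one positive-versus-rest model.
accuracy, mcc, recall = binary_metrics(tp=90, tn=80, fp=20, fn=10)
```

Unlike accuracy, MCC uses all four counts symmetrically, which is why it is the more informative of the two when one class is much larger than the other.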

Table 6

| Positive versus negative cell type | Accuracy (%) | MCC (%) | Recall (%) | Private feature subsets |
|---|---|---|---|---|
| pyramidal versus {motoneuron, sensory, tripolar, bipolar, multipolar, Purkinje} | 99.10 ± 0.12 | 97.05 ± 0.40 | 99.76 ± 0.10 | , Lpd, NW, Co, Pk2, HT, Bar, Su, PDR, Ta2 |
| | | | | , Lpd, NW, Bal, Pk2, Vo, Bar, HT, PDR, Ta2 |
| | | | | , Lpd, NW, Bal, Pk2, Vo, Bar, HT, Pc, Ta2 |
| | | | | , Lpd, Bal, NW, HT, Pc, Vo, PDR, Ta2 |
| | | | | , Lpd, Co, NW, HT, PDR, Vo, Bar, Ta2 |
| motoneuron versus {sensory, tripolar, bipolar, multipolar, Purkinje} | 97.26 ± 1.44 | 94.3 ± 3.02 | 94.50 ± 5.21 | SS |
| | | | | SS, NH, , Ta1, SA, Ta2, HT, Su, NW, Vo |
| | | | | SS, NH, , HT, Vo, NW, Lpd, Dp, Su, SA |
| | | | | SS, NH, , Lpd, Ta1, Vo, HT, Le, Ta2, SA |
| | | | | SS, NH, , HT, SA, Ta1, Vo, ND, Le, Dp |
| sensory versus {tripolar, bipolar, multipolar, Purkinje} | 90.15 ± 1.24 | 80.62 ± 2.46 | 97.85 ± 1.38 | Pa |
| | | | | Pa, SS, SA, Ta1, ND, Pk2, Btr, NW, Pk, Btl |
| | | | | Pa, SS, SA, Ta1, ND, Ty, Co, Di, Btr |
| | | | | Pa, SS, SA, Ta1, ND, Ty, Di |
| | | | | Pa, SS, SA, ND, Ta1, Ty, Btr, NW, Lpd, Pk |
| tripolar versus {bipolar, multipolar, Purkinje} | 99.16 ± 0.56 | 98.32 ± 1.12 | 99.17 ± 1.41 | NW |
| | | | | NW, SS, He, Pa, ND, |
| | | | | NW, SS, He |
| | | | | NW, SS, He |
| | | | | NW, SS, He, Pa, ND |
| bipolar versus {multipolar, Purkinje} | 96.95 ± 3.07 | 93.86 ± 6.24 | 95.83 ± 2.95 | |
| | | | | , Vo, He, Ty, Su |
| | | | | , Vo, Su, Ty, He, Ta2, NW, Btor, Pk |
| | | | | , Vo, Su, NW, Ta2, Di, Pc, He, Btor |
| | | | | , Vo, Su, He, Di, Ta2, Btor, Pk |
| multipolar versus {Purkinje} | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | DR |

As shown in Table 6, the accuracies and MCC values in each hierarchy indicated the effectiveness of the models. We obtained private feature subsets for each neuron type. These features were useful in identifying the corresponding neurons, and the high recall values (94.50% to 100%) support this conclusion. The above findings suggested that BMSF was not only useful in identifying all seven cell types but also able to discriminate specific neuron types.

In this paper, we used a new feature selection method named BMSF for neuronal morphology classification. Feature interactions are taken into consideration to obtain a highly accurate classification of neurons, and the method usually selects a small number of features, which eases interpretation. As shown in Table 3, BMSF retained six to eight features in each replication, fewer than the other two feature selection methods. The BMSF method automatically conducts multiple rounds of filtering and guided random search in the large feature subset space and reports the final list of features. Though this process is wrapped with SVC, the selected features have general applicability to multiple classification algorithms, as demonstrated by the classification performance shown in Table 4.

We should point out that different runs of BMSF may produce different feature subsets. This phenomenon arises from the fact that there are many possible characteristics that may be used to distinguish neurons. For example, the feature subsets derived from rough set theory and BMSF achieve similar classification accuracy when applied to the BPNN classifier (92.75% and 91.46% in Table 4). Our goal is to find a minimal set of features whose combination can differentiate the dependent variable well.

The feature subsets retained on the same data set by different feature selection methods differed greatly. Li et al. [7] and Jiang et al. [8] selected features from the first twenty attributes of Table 2 only, so they inevitably ignored some attributes that were retained by BMSF. Therefore, feature extraction by the L-measure software was necessary. Another drawback of their feature selection methods was that they did not reduce the variables in a nonlinear manner. For example, PCA considers only second order statistics, so interactions cannot be taken into account.

Conventional classification techniques are built on the premise that the input data sets are balanced; if they are not, the classification performance decreases sharply [42]. There were 3908 neurons in the training set, but the number of neurons in each type differed greatly (Table 1). For example, there were only 24 multipolar interneurons and 11 Purkinje neurons, whereas the number of pyramidal neurons was 3172, and such unbalanced data sets have a negative effect on the classification results (Table 5). Therefore, we built a hierarchy model for each neuron type, and BMSF was shown to be useful in distinguishing specific neuron types from the others.

4. Conclusion

We introduced a new feature selection method named BMSF for neuronal morphology classification, obtained satisfactory accuracy for all of the datasets and for each hierarchy model, and were able to select private, parsimonious feature subsets for each neuron type. However, classification based on neuronal morphology alone is inadequate. Over time, dendrites may continue to grow and axons will generate additional terminals, which will lead to changes in the vital parameters [8]. Therefore, combining biophysical characteristics with functional characteristics to investigate the neuronal classification problem will be a productive direction in the future.

Conflict of Interests

All the authors declare that they have no conflict of interests regarding the publication of this paper.


Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 31000666 and 61300130) and by the China Postdoctoral Science Foundation (nos. 2012M511722 and 2014T70769).


References

  1. M. Bota and L. W. Swanson, “The neuron classification problem,” Brain Research Reviews, vol. 56, no. 1, pp. 79–88, 2007.
  2. W. E. Renehan, Z. Jin, X. Zhang, and L. Schweitzer, “Structure and function of gustatory neurons in the nucleus of the solitary tract. II. Relationships between neuronal morphology and physiology,” The Journal of Comparative Neurology, vol. 367, no. 2, pp. 205–221, 1996.
  3. T. C. Badea and J. Nathans, “Quantitative analysis of neuronal morphologies in the mouse retina visualized by using a genetically directed reporter,” Journal of Comparative Neurology, vol. 480, no. 4, pp. 331–351, 2004.
  4. J. H. Kong, D. R. Fish, R. L. Rockhill, and R. H. Masland, “Diversity of ganglion cells in the mouse retina: unsupervised morphological classification and its limits,” Journal of Comparative Neurology, vol. 489, no. 3, pp. 293–310, 2005.
  5. D. Ristanović, N. T. Milošević, B. D. Stefanović, D. L. Marić, and K. Rajković, “Morphology and classification of large neurons in the adult human dentate nucleus: a qualitative and quantitative analysis of 2D images,” Neuroscience Research, vol. 67, no. 1, pp. 1–7, 2010.
  6. G. A. Ascoli, D. E. Donohue, and M. Halavi, “NeuroMorpho.Org: a central resource for neuronal morphologies,” The Journal of Neuroscience, vol. 27, no. 35, pp. 9247–9251, 2007.
  7. C. Li, X. Xie, and X. Wu, “A universal neuronal classification and naming scheme based on the neuronal morphology,” in Proceedings of the IEEE International Conference on Computer Science and Network Technology (ICCSNT '11), vol. 3, pp. 2083–2087, December 2011.
  8. R. Jiang, Q. Liu, and S. Liu, “A proposal for the morphological classification and nomenclature of neurons,” Neural Regeneration Research, vol. 6, no. 25, pp. 1925–1930, 2011.
  9. G. Kerschen and J. C. Golinval, “Non-linear generalization of principal component analysis: from a global to a local approach,” Journal of Sound and Vibration, vol. 254, no. 5, pp. 867–876, 2002.
  10. I. Hedenfalk, D. Duggan, Y. Chen et al., “Gene-expression profiles in hereditary breast cancer,” The New England Journal of Medicine, vol. 344, no. 8, pp. 539–548, 2001.
  11. V. R. Iyer, M. B. Eisen, D. T. Ross et al., “The transcriptional program in the response of human fibroblasts to serum,” Science, vol. 283, no. 5398, pp. 83–87, 1999.
  12. X. Jin, A. Xu, R. Bie, and P. Guo, “Machine learning techniques and chi-square feature selection for cancer classification using SAGE gene expression profiles,” in Data Mining for Biomedical Applications, vol. 3916 of Lecture Notes in Computer Science, pp. 106–115, Springer, Berlin, Germany, 2006.
  13. M. Dash and H. Liu, “Feature selection for classification,” Intelligent Data Analysis, vol. 1, no. 1–4, pp. 131–156, 1997.
  14. K. Kira and L. A. Rendell, “The feature selection problem: traditional methods and a new algorithm,” in Proceedings of the 10th National Conference on Artificial Intelligence, W. Swartout, Ed., pp. 129–134, AAAI Press/The MIT Press, San Jose, Calif, USA, July 1992.
  15. T. R. Golub, D. K. Slonim, P. Tamayo et al., “Molecular classification of cancer: class discovery and class prediction by gene expression monitoring,” Science, vol. 286, no. 5439, pp. 531–537, 1999.
  16. Z. Fang, R. Du, and X. Cui, “Uniform approximation is more appropriate for Wilcoxon rank-sum test in gene set analysis,” PLoS ONE, vol. 7, no. 2, Article ID e31505, 2012.
  17. S. Zhu, D. Wang, K. Yu, T. Li, and Y. Gong, “Feature selection for gene expression using model-based entropy,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 25–36, 2010.
  18. H. Peng, F. Long, and C. Ding, “Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226–1238, 2005.
  19. Y. Wang, I. V. Tetko, M. A. Hall et al., “Gene selection from microarray data for cancer classification: a machine learning approach,” Computational Biology and Chemistry, vol. 29, no. 1, pp. 37–46, 2005.
  20. M. Han and X. Liu, “Forward feature selection based on approximate Markov blanket,” in Advances in Neural Networks: ISNN 2012, vol. 7368 of Lecture Notes in Computer Science, pp. 64–72, Springer, Berlin, Germany, 2012.
  21. J. Kittler, “Feature set search algorithms,” in Pattern Recognition and Signal Processing, C. H. Chen, Ed., pp. 41–60, Sijthoff and Noordhoff, Alphen aan den Rijn, The Netherlands, 1978.
  22. P. Pudil, J. Novovičová, and J. Kittler, “Floating search methods in feature selection,” Pattern Recognition Letters, vol. 15, no. 11, pp. 1119–1125, 1994.
  23. B. Q. Hu, R. Chen, D. X. Zhang, G. Jiang, and C. Y. Pang, “Ant colony optimization vs genetic algorithm to calculate gene order of gene expression level of Alzheimer's disease,” in Proceedings of the IEEE International Conference on Granular Computing (GrC '12), pp. 169–172, Hangzhou, China, August 2012.
  24. L. J. Cai, L. B. Jiang, and Y. Q. Yi, “Gene selection based on ACO algorithm,” Application Research of Computers, vol. 25, no. 9, pp. 2754–2757, 2008.
  25. I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, “Gene selection for cancer classification using support vector machines,” Machine Learning, vol. 46, no. 1–3, pp. 389–422, 2002.
  26. Q. Liu, A. H. Sung, Z. Chen et al., “Gene selection and classification for cancer microarray data based on machine learning and similarity measures,” BMC Genomics, vol. 12, no. 5, article S1, 2011.
  27. X. Li, S. Peng, J. Chen, B. Lü, H. Zhang, and M. Lai, “SVM-T-RFE: a novel gene selection algorithm for identifying metastasis-related genes in colorectal cancer using gene expression profiles,” Biochemical and Biophysical Research Communications, vol. 419, no. 2, pp. 148–153, 2012.
  28. K. K. Kandaswamy, K. C. Chou, T. Martinetz et al., “AFP-Pred: a random forest approach for predicting antifreeze proteins from sequence-derived properties,” Journal of Theoretical Biology, vol. 270, no. 1, pp. 56–62, 2011.
  29. G. A. Ascoli, Computational Neuroanatomy: Principles and Methods, Humana Press, Totowa, NJ, USA, 2002.
  30. G. A. Ascoli, J. L. Krichmar, S. J. Nasuto, and S. L. Senft, “Generation, description and storage of dendritic morphology data,” Philosophical Transactions of the Royal Society, Series B: Biological Sciences, vol. 356, no. 1412, pp. 1131–1145, 2001.
  31. R. Scorcioni, S. Polavaram, and G. A. Ascoli, “L-measure: a web-accessible tool for the analysis, comparison and search of digital reconstructions of neuronal morphologies,” Nature Protocols, vol. 3, no. 5, pp. 866–876, 2008.
  32. H. Zhang, H. Wang, Z. Dai, M. S. Chen, and Z. Yuan, “Improving accuracy for cancer classification with a new algorithm for genes selection,” BMC Bioinformatics, vol. 13, no. 1, article 298, 2012.
  33. Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, Boston, Mass, USA, 1991.
  34. Y. Cao, S. Liu, L. Zhang, J. Qin, J. Wang, and K. Tang, “Prediction of protein structural class with Rough Sets,” BMC Bioinformatics, vol. 7, article 20, 2006.
  35. Q. Hu, D. Yu, and Z. Xie, “Information-preserving hybrid data reduction based on fuzzy-rough techniques,” Pattern Recognition Letters, vol. 27, no. 5, pp. 414–423, 2006.
  36. Q. Hu, D. Yu, Z. Xie, and J. Liu, “Fuzzy probabilistic approximation spaces and their information measures,” IEEE Transactions on Fuzzy Systems, vol. 14, no. 2, pp. 191–201, 2006.
  37. Q. Hu and D. Yu, “Entropies of fuzzy indiscernibility relation and its operations,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 12, no. 5, pp. 575–589, 2004.
  38. V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 2000.
  39. R. Hecht-Nielsen, “Theory of the backpropagation neural network,” in Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN '89), pp. 593–605, June 1989.
  40. M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The WEKA data mining software,” ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.
  41. T. M. Mitchell, Machine Learning, McGraw-Hill, 1997.
  42. Y. Tang, Y. Q. Zhang, N. V. Chawla, and S. Krasser, “SVMs modeling for highly imbalanced classification,” IEEE Transactions on Systems, Man, and Cybernetics Part B: Cybernetics, vol. 39, no. 1, pp. 281–288, 2009.

Copyright © 2015 Congwei Sun et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
