NetDA: An R Package for Network-Based Discriminant Analysis Subject to Multilabel Classes
In this paper, we introduce the R package NetDA, which performs multiclass classification while accommodating network structures in the predictors. To capture these network structures, we use Gaussian graphical models to characterize the dependence among the predictors and estimate the precision matrix directly. The estimated precision matrix is then plugged into linear and quadratic discriminant functions. The R package NetDA is available on CRAN, and its functions are demonstrated in a vignette in the online documentation.
1. Introduction
Multiclassification, a classification problem in which the number of classes is greater than two, is a major challenge in data science. In supervised learning, discriminant analysis has been a useful classification method. In the conventional method (e.g., Hastie et al., Section 4.3), linear discriminant functions, which are formulated in terms of the mean vectors and the inverse of the covariance matrix of the predictors, are used to classify subjects. Several more advanced methods have been proposed to address complex settings. For example, Guo et al. discussed the LDA method and its application in microarray data analysis. Safo and Ahn studied generalized sparse linear discriminant analysis for multilabel responses. For high-dimensional predictors, several approaches have also been explored (e.g., Clemmensen et al.; Witten and Tibshirani [5]).
However, network structures of predictors, which reflect (pairwise) dependence among predictors, are ubiquitous in data analysis. Among recent developments, Chen et al. proposed a graph-based logistic regression model. He et al. proposed surrogate variables that are transformed from network structures and used them in the support vector machine. Within the framework of discriminant analysis, Cai et al. and Liu et al. developed graph-based linear discriminant analysis, but their approaches are restricted to binary responses. Moreover, general data analysts need software that they can apply directly to their data, yet little software is available for classification that accommodates network structures. While some R packages for discriminant analysis exist, such as MASS, sparseLDA, and penalizedLDA, they cannot handle network structures in predictors.
Motivated by these concerns, we follow the strategy proposed by Chen and develop an R package called NetDA. Under the normality assumption for the predictors, we apply the graphical lasso method to estimate the precision matrices and the corresponding network structures of the predictors. Since a precision matrix is the inverse of a covariance matrix, we can plug the estimated precision matrices directly into linear or quadratic discriminant functions. This strategy differs from conventional linear discriminant analysis, which simply inverts the empirical estimate of the covariance matrix. The other issue is prediction. Based on the fitted models and predicted values, we also provide a function that computes several commonly used criteria to assess the performance of classification and prediction.
The article is organized as follows. Section 2 introduces the data structure and outlines the methodology in our package. Section 3 describes the usage of the package NetDA. Section 4 illustrates the package by a real dataset. We finally conclude the article in Section 5.
2. Overview of Methodology
In this section, we give an overview of the data structure and the network-based discriminant analysis proposed by Chen.
2.1. Data Structure
Suppose that the data contain $n$ subjects that come from $I$ classes, where $I$ is a fixed integer greater than 2 and the classes are nominal. Let $n_k$ be the size of class $k$ with $k = 1, \ldots, I$, and hence, $n = \sum_{k=1}^{I} n_k$. Let $Y$ denote the $n$-dimensional vector of responses with the $i$th component being $Y_i = k$, which indicates that the $i$th subject is in the $k$th class for $i = 1, \ldots, n$ and $k = 1, \ldots, I$.
Let $p$ denote the dimension of predictors for each subject. Define $X = [X_{ij}]$ as the $n \times p$ matrix of predictors for $i = 1, \ldots, n$ and $j = 1, \ldots, p$, where the component $X_{ij}$ represents the $j$th predictor for the $i$th subject. Furthermore, let $X_{\cdot j}$ represent the $n$-dimensional vector of the $j$th predictor in the $j$th column of $X$, and let $X_{i \cdot}$ denote the $p$-dimensional predictor vector for the $i$th subject in the $i$th row of $X$. Let $\{(X_{i \cdot}, Y_i) : i = 1, \ldots, n\}$ denote an independent and identically distributed (i.i.d.) sample. We let lower case letters represent realized values of the corresponding random variables. For example, $x_{i \cdot}$ stands for a realized value of $X_{i \cdot}$.
2.2. Gaussian Graphical Models
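For $i = 1, \ldots, n$ and $k = 1, \ldots, I$, let $f_k(x)$ denote the conditional probability density function of the predictor $X_{i \cdot}$ taking a value $x$ given that subject $i$ comes from the $k$th class. We consider the case where the conditional distribution of $X_{i \cdot}$ given $Y_i = k$ is a multivariate normal distribution with mean vector $\mu_k$ and positive-definite covariance matrix $\Sigma_k$. Then, the conditional probability density function is given by
$$f_k(x) = (2\pi)^{-p/2} \, |\Sigma_k|^{-1/2} \exp\left\{ -\frac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) \right\}. \quad (1)$$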
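Moreover, by a suitable reparametrization (e.g., Hastie et al., p. 246), we can rewrite (1) as the Gaussian graphical model (GGM) for class $k$. The formulation is given by
$$f_k(x) = \exp\left\{ \sum_{s \in V} \beta_{k,s} x_s - \frac{1}{2} \sum_{s \in V} \theta_{k,ss} x_s^2 - \frac{1}{2} \sum_{(s,t) \in E} \theta_{k,st} x_s x_t - A(\beta_k, \Theta_k) \right\}, \quad (2)$$
where $V = \{1, \ldots, p\}$ includes all the indices and $E$ contains all pairs $(s,t)$ with unequal coordinates, which yields a graph $G = (V, E)$ (e.g., Hastie et al., Chapter 17); $\Theta_k = [\theta_{k,st}]$ is the $p \times p$ symmetric precision matrix with $\Theta_k = \Sigma_k^{-1}$; $\beta_k$ is the $p$-dimensional vector of parameters with $\beta_k = \Theta_k \mu_k$; and $A(\beta_k, \Theta_k) = \frac{1}{2} \beta_k^\top \Theta_k^{-1} \beta_k - \frac{1}{2} \log |\Theta_k| + \frac{p}{2} \log (2\pi)$ is the normalizing term. Our main interest is to estimate $\Theta_k$, since, as we will see later, the main concern in discriminant analysis is to estimate the inverse of the covariance matrix. From the perspective of graphical models, a nonzero $\theta_{k,st}$ implies that $X_{is}$ and $X_{it}$ are conditionally dependent given the other variables in class $k$, while a zero value of $\theta_{k,st}$ gives conditional independence of $X_{is}$ and $X_{it}$ given the other variables. Thus, the precision matrix reflects the network structure of the predictors.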
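The link between zero entries of a precision matrix and conditional independence can be checked numerically. The sketch below is an illustration only, not part of the NetDA package: it uses a hypothetical three-variable chain graph and the standard identity that the partial correlation between variables $s$ and $t$ equals $-\theta_{st} / \sqrt{\theta_{ss}\theta_{tt}}$. The function name `partial_cor` is an assumption introduced here.

```r
# Convert a precision matrix into partial correlations:
# rho_st = -theta_st / sqrt(theta_ss * theta_tt) for s != t.
partial_cor <- function(Theta) {
  d <- sqrt(diag(Theta))
  R <- -Theta / outer(d, d)
  diag(R) <- 1
  R
}

# Hypothetical chain graph X1 - X2 - X3 (no edge between X1 and X3).
Theta <- matrix(c( 2, -1,  0,
                  -1,  2, -1,
                   0, -1,  2), nrow = 3, byrow = TRUE)

R <- partial_cor(Theta)
Sigma <- solve(Theta)

print(round(R, 3))      # entry (1, 3) is zero: no edge, conditional independence
print(round(Sigma, 3))  # entry (1, 3) is nonzero: X1 and X3 are still marginally correlated
```

The covariance matrix is dense even though the precision matrix is sparse, which is why the network is read from the precision matrix rather than from the covariance matrix.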
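In the literature, the graphical LASSO (GLASSO) is a common method to estimate $\Theta_k$. The key idea of GLASSO is based on the likelihood function. To see this, we follow the discussion on page 247 of Hastie et al. and write the log-likelihood function of $\Theta_k$ based on (2) with $\beta_k = 0$ (up to an additive constant and a positive scaling factor):
$$\ell(\Theta_k) = \log \det (\Theta_k) - \mathrm{tr}(S_k \Theta_k), \quad (3)$$
where $S_k = n_k^{-1} \sum_{i : y_i = k} x_{i \cdot} x_{i \cdot}^\top$ is the empirical covariance matrix, $\mathrm{tr}(\cdot)$ is the sum of the diagonal entries of a square matrix, and
$$\log \det (\Theta_k) = \sum_{j=1}^{p} \log \lambda_j(\Theta_k), \quad (4)$$
with $\lambda_j(\Theta_k)$ being the $j$th eigenvalue of $\Theta_k$. Assume that the precision matrix $\Theta_k$ is sparse. To estimate $\Theta_k$ and identify network structures by retaining dependent pairs of vertices and removing independent ones, we apply the $\ell_1$-norm as a constraint. In other words, the estimator of $\Theta_k$ can be obtained by the following optimization:
$$\widehat{\Theta}_k = \underset{\Theta_k}{\arg\max} \left\{ \log \det (\Theta_k) - \mathrm{tr}(S_k \Theta_k) - \lambda \|\Theta_k\|_1 \right\}, \quad (5)$$
where $\|\Theta_k\|_1 = \sum_{s \neq t} |\theta_{k,st}|$ is the penalty function and $\lambda$ is a tuning parameter. The optimization problem (5) is called GLASSO. The detailed algorithm can be found on page 248 of Hastie et al., and the value of $\lambda$ can be selected by the Bayesian information criterion (BIC). By a discussion similar to that in Yuan and Lin, the estimated network structure determined by (5) is equal to the true graph with probability approaching one under suitable conditions. For the computation, the R package glasso can be used to derive the estimate $\widehat{\Theta}_k$.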
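To make the objective in the graphical lasso concrete, the following base R sketch evaluates a penalized Gaussian log-likelihood of the form $\log\det(\Theta) - \mathrm{tr}(S\Theta) - \lambda \sum_{s \neq t} |\theta_{st}|$ for a candidate precision matrix. It only evaluates the objective; it is not the optimization algorithm used by the glasso package. The function name `glasso_objective`, the off-diagonal form of the penalty, and the simulated data are assumptions made for illustration.

```r
# Penalized Gaussian log-likelihood for a candidate precision matrix Theta:
#   log det(Theta) - tr(S Theta) - lambda * sum_{s != t} |theta_st|
glasso_objective <- function(Theta, S, lambda) {
  eig <- eigen(Theta, symmetric = TRUE, only.values = TRUE)$values
  if (any(eig <= 0)) return(-Inf)       # Theta must be positive definite
  log_det <- sum(log(eig))              # log det as the sum of log eigenvalues
  trace_term <- sum(S * Theta)          # equals tr(S Theta) for symmetric matrices
  l1_offdiag <- sum(abs(Theta)) - sum(abs(diag(Theta)))
  log_det - trace_term - lambda * l1_offdiag
}

# Simulated predictors (hypothetical data) and their empirical covariance.
set.seed(1)
X <- matrix(rnorm(200 * 3), ncol = 3)
Xc <- scale(X, center = TRUE, scale = FALSE)
S <- crossprod(Xc) / nrow(X)

obj_identity <- glasso_objective(diag(3), S, lambda = 0)
obj_mle <- glasso_objective(solve(S), S, lambda = 0)
print(c(identity = obj_identity, mle = obj_mle))
```

With $\lambda = 0$ the objective is maximized by the inverse of the empirical covariance matrix, so `obj_mle` is never smaller than `obj_identity`; a positive $\lambda$ trades some likelihood for exact zeros in the off-diagonal entries.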
2.3. Discriminant Analysis
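The idea of discriminant analysis is to model the distribution of the predictors separately for each of the response classes $Y_i = k$, and then to use Bayes' theorem to describe the conditional probabilities $P(Y_i = k \mid X_{i \cdot} = x)$ (e.g., James et al.).
Specifically, let $\pi_k = P(Y_i = k)$ denote the probability that the $i$th subject is randomly selected from class $k$ so that $\sum_{k=1}^{I} \pi_k = 1$. Applying Bayes' theorem to the conditional density function $f_k(x)$ and $\pi_k$ gives the posterior probability as
$$P(Y_i = k \mid X_{i \cdot} = x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{I} \pi_l f_l(x)} \quad (6)$$
for $k = 1, \ldots, I$ and $i = 1, \ldots, n$.
To compare two classes $k$ and $l$ with $k \neq l$, we calculate the log-ratio of (6), given by
$$\log \frac{P(Y_i = k \mid X_{i \cdot} = x)}{P(Y_i = l \mid X_{i \cdot} = x)} = \log \frac{f_k(x)}{f_l(x)} + \log \frac{\pi_k}{\pi_l}. \quad (7)$$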
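Scenario 1. If the covariance matrices in (1) are assumed to be common, that is, $\Sigma_k = \Sigma$ for every $k = 1, \ldots, I$ with $\Sigma$ being a positive definite matrix, (7) becomes
$$\log \frac{P(Y_i = k \mid X_{i \cdot} = x)}{P(Y_i = l \mid X_{i \cdot} = x)} = x^\top \Sigma^{-1} (\mu_k - \mu_l) - \frac{1}{2} (\mu_k + \mu_l)^\top \Sigma^{-1} (\mu_k - \mu_l) + \log \frac{\pi_k}{\pi_l}. \quad (8)$$
If the right-hand side of (8) is greater than 0, then $P(Y_i = k \mid X_{i \cdot} = x) > P(Y_i = l \mid X_{i \cdot} = x)$, showing that the subject with predictors $x$ is more likely to come from class $k$ than from class $l$. Consequently, (8) defines a boundary between classes $k$ and $l$ in the sense that a linear function in $x$ separates the two classes.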
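Motivated by the form of (8), we consider a linear function in $x$:
$$\delta_k(x) = x^\top \Theta \mu_k - \frac{1}{2} \mu_k^\top \Theta \mu_k + \log \pi_k, \quad (9)$$
where $\Theta = \Sigma^{-1}$. Moreover, $\pi_k$ and $\mu_k$ can be empirically estimated, respectively, as
$$\widehat{\pi}_k = \frac{n_k}{n} \quad \text{and} \quad \widehat{\mu}_k = \frac{1}{n_k} \sum_{i : y_i = k} x_{i \cdot}. \quad (10)$$
For the estimation of $\Sigma^{-1}$, or equivalently $\Theta$, we adopt (5) by pooling all subjects in the dataset and denote by $\widehat{\Theta}$ the estimator of $\Theta$. Therefore, (9) can be estimated as
$$\widehat{\delta}_k(x) = x^\top \widehat{\Theta} \widehat{\mu}_k - \frac{1}{2} \widehat{\mu}_k^\top \widehat{\Theta} \widehat{\mu}_k + \log \widehat{\pi}_k. \quad (11)$$
We call (11) the network-based linear discriminant function (NetLDA), and it is used to determine the class label for a new observation. For the prediction of a new subject with predictor $x_{\mathrm{new}}$, we first calculate $\widehat{\delta}_k(x_{\mathrm{new}})$ using (11) for $k = 1, \ldots, I$. Next, we find $k^*$, defined as
$$k^* = \underset{k \in \{1, \ldots, I\}}{\arg\max} \; \widehat{\delta}_k(x_{\mathrm{new}}), \quad (12)$$
and the class label for this subject is then predicted as $k^*$.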
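The prediction rule for the linear case can be sketched in a few lines of base R. This is a minimal illustration of a linear discriminant score of the form $x^\top \Theta \mu_k - \frac{1}{2}\mu_k^\top \Theta \mu_k + \log \pi_k$ followed by an arg max over classes; it is not the internal code of the NetDA package. The function names `lda_scores` and `predict_lda`, and the toy means, priors, and identity precision matrix, are hypothetical. In practice the precision matrix would come from a graphical lasso fit on the pooled data.

```r
# Linear discriminant scores for one observation x.
#   Theta: common p x p precision matrix
#   mu:    I x p matrix whose k-th row is the mean vector of class k
#   prior: length-I vector of class proportions
lda_scores <- function(x, Theta, mu, prior) {
  sapply(seq_len(nrow(mu)), function(k) {
    m <- mu[k, ]
    drop(x %*% Theta %*% m) - 0.5 * drop(m %*% Theta %*% m) + log(prior[k])
  })
}

# Predicted label: the class with the largest score.
predict_lda <- function(x, Theta, mu, prior) {
  which.max(lda_scores(x, Theta, mu, prior))
}

# Toy setting (hypothetical values): two classes in two dimensions.
mu <- rbind(c(0, 0), c(3, 3))
prior <- c(0.5, 0.5)
Theta <- diag(2)

print(lda_scores(c(0, 0), Theta, mu, prior))
print(predict_lda(c(0, 0), Theta, mu, prior))  # closer to the class 1 mean
print(predict_lda(c(3, 3), Theta, mu, prior))  # closer to the class 2 mean
```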
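Scenario 2. We allow $\Sigma_k \neq \Sigma_l$, or equivalently, $\Theta_k \neq \Theta_l$, for any $k \neq l$. Then, under the distributional assumption (1), we have
$$\log \frac{f_k(x)}{f_l(x)} = \frac{1}{2} \log \frac{|\Theta_k|}{|\Theta_l|} - \frac{1}{2} (x - \mu_k)^\top \Theta_k (x - \mu_k) + \frac{1}{2} (x - \mu_l)^\top \Theta_l (x - \mu_l). \quad (13)$$
Replacing the first term on the right-hand side of (7) by (13) yields
$$\log \frac{P(Y_i = k \mid X_{i \cdot} = x)}{P(Y_i = l \mid X_{i \cdot} = x)} = \left\{ \frac{1}{2} \log |\Theta_k| - \frac{1}{2} (x - \mu_k)^\top \Theta_k (x - \mu_k) + \log \pi_k \right\} - \left\{ \frac{1}{2} \log |\Theta_l| - \frac{1}{2} (x - \mu_l)^\top \Theta_l (x - \mu_l) + \log \pi_l \right\}. \quad (14)$$
Therefore, based on (14), we define a quadratic function of $x$ for class $k$:
$$\delta_k^{Q}(x) = \frac{1}{2} \log |\Theta_k| - \frac{1}{2} (x - \mu_k)^\top \Theta_k (x - \mu_k) + \log \pi_k. \quad (15)$$
For $k = 1, \ldots, I$, the estimator of the precision matrix $\Theta_k$, denoted by $\widehat{\Theta}_k$, is obtained by (5) based on the predictor information in class $k$; $\pi_k$ and $\mu_k$ can be estimated by (10). Therefore, (15) can be estimated by
$$\widehat{\delta}_k^{Q}(x) = \frac{1}{2} \log |\widehat{\Theta}_k| - \frac{1}{2} (x - \widehat{\mu}_k)^\top \widehat{\Theta}_k (x - \widehat{\mu}_k) + \log \widehat{\pi}_k. \quad (16)$$
The function (16) is called the network-based quadratic discriminant function (NetQDA) and is used to determine the class label for a new observation. For the prediction of a new subject with predictor $x_{\mathrm{new}}$, we first calculate $\widehat{\delta}_k^{Q}(x_{\mathrm{new}})$ using (16) for $k = 1, \ldots, I$. Next, we find $k^*$, defined as
$$k^* = \underset{k \in \{1, \ldots, I\}}{\arg\max} \; \widehat{\delta}_k^{Q}(x_{\mathrm{new}}), \quad (17)$$
and the class label for this subject is then predicted as $k^*$.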
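A corresponding sketch for the quadratic case is given below. It evaluates a score of the form $\frac{1}{2}\log|\Theta_k| - \frac{1}{2}(x - \mu_k)^\top \Theta_k (x - \mu_k) + \log \pi_k$ with a separate precision matrix per class. The names `qda_scores` and `predict_qda` and the toy inputs are assumptions for illustration, and the class-specific precision matrices would in practice be graphical lasso estimates. The toy example deliberately uses equal class means, so the classes differ only in their precision matrices.

```r
# Quadratic discriminant scores for one observation x.
#   Theta_list: list of class-specific p x p precision matrices
#   mu:         I x p matrix of class means
#   prior:      length-I vector of class proportions
qda_scores <- function(x, Theta_list, mu, prior) {
  sapply(seq_along(Theta_list), function(k) {
    d <- x - mu[k, ]
    log_det <- as.numeric(determinant(Theta_list[[k]], logarithm = TRUE)$modulus)
    0.5 * log_det - 0.5 * drop(d %*% Theta_list[[k]] %*% d) + log(prior[k])
  })
}

predict_qda <- function(x, Theta_list, mu, prior) {
  which.max(qda_scores(x, Theta_list, mu, prior))
}

# Toy setting (hypothetical values): same mean, different spread.
mu <- rbind(c(0, 0), c(0, 0))
prior <- c(0.5, 0.5)
Theta_list <- list(diag(2),        # class 1: unit variances
                   diag(2) / 9)    # class 2: variances equal to 9

print(predict_qda(c(0.1, 0.1), Theta_list, mu, prior))  # near the origin: class 1
print(predict_qda(c(4, 4), Theta_list, mu, prior))      # far from the origin: class 2
```

A linear rule with a common precision matrix cannot separate these two classes because their means coincide, whereas the quadratic rule uses the difference in dependence structure.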
2.4. Assessment of Classification and Prediction
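We first introduce micro-averaged metrics (e.g., Chen et al.). Let $\mathcal{V}$ and $\mathcal{T}$ represent the sets of subject indexes for the validation and training datasets, respectively. Let $n$ and $m$ denote the sizes of the training and validation data, respectively. We use the training data to fit models and then apply the fitted models to compute predicted values $\widehat{y}_i$ for $i \in \mathcal{V}$. After that, for $k = 1, \ldots, I$, we calculate the number of true positives (TP), the number of false positives (FP), and the number of false negatives (FN) under the validation data $\mathcal{V}$, respectively:
$$\mathrm{TP}_k = \sum_{i \in \mathcal{V}} \mathbb{1}(y_i = k, \widehat{y}_i = k), \quad \mathrm{FP}_k = \sum_{i \in \mathcal{V}} \mathbb{1}(y_i \neq k, \widehat{y}_i = k), \quad \mathrm{FN}_k = \sum_{i \in \mathcal{V}} \mathbb{1}(y_i = k, \widehat{y}_i \neq k), \quad (18)$$
where $\mathbb{1}(\cdot)$ is the indicator function.
We define precision (PRE) and recall (REC) under the validation data $\mathcal{V}$, respectively, as
$$\mathrm{PRE} = \frac{\sum_{k=1}^{I} \mathrm{TP}_k}{\sum_{k=1}^{I} (\mathrm{TP}_k + \mathrm{FP}_k)} \quad \text{and} \quad \mathrm{REC} = \frac{\sum_{k=1}^{I} \mathrm{TP}_k}{\sum_{k=1}^{I} (\mathrm{TP}_k + \mathrm{FN}_k)}. \quad (19)$$
Then, the micro-F-score is defined as
$$F = \frac{2 \times \mathrm{PRE} \times \mathrm{REC}}{\mathrm{PRE} + \mathrm{REC}}. \quad (20)$$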
According to the definitions in (19), when all subjects are correctly classified, FP and FN are equal to zero, so PRE and REC are equal to one; if all subjects are falsely classified, then TP is equal to zero, and thus, PRE and REC are equal to zero. Therefore, the values of PRE and REC are between zero and one. Moreover, since PRE and REC lie in $[0, 1]$, the F-score falls in $[0, 1]$ as well, with $0/0$ treated as zero. In principle, higher values of PRE, REC, and F-score reflect better performance and more accurate classification (e.g., Chen et al.).
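The micro-averaged criteria can be computed directly from two label vectors. The base R sketch below follows the counting scheme described above (per-class TP, FP, and FN summed over classes); it is an independent illustration, not the implementation of the Metrics function in NetDA, and the function name `micro_metrics` and the toy labels are hypothetical.

```r
# Micro-averaged precision, recall, and F-score for multiclass labels.
micro_metrics <- function(yhat, y, classes = sort(unique(c(y, yhat)))) {
  tp <- sapply(classes, function(k) sum(y == k & yhat == k))
  fp <- sapply(classes, function(k) sum(y != k & yhat == k))
  fn <- sapply(classes, function(k) sum(y == k & yhat != k))
  pre <- sum(tp) / sum(tp + fp)
  rec <- sum(tp) / sum(tp + fn)
  # Treat 0/0 as zero when nothing is classified correctly.
  f <- if (pre + rec == 0) 0 else 2 * pre * rec / (pre + rec)
  c(PRE = pre, REC = rec, F = f)
}

# Toy labels (hypothetical): 6 of 8 subjects are classified correctly.
y    <- c(1, 1, 1, 2, 2, 2, 3, 3)
yhat <- c(1, 1, 2, 2, 2, 2, 3, 2)
res <- micro_metrics(yhat, y)
print(res)  # PRE = REC = F = 0.75
```

When every subject receives exactly one predicted label, each misclassification counts once as a false positive and once as a false negative, so the micro-averaged PRE, REC, and F-score all coincide with the overall accuracy. This is consistent with the identical three values reported in the Metrics output in Section 4.2.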
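In addition to the criteria above, another commonly used criterion is the adjusted Rand index (ARI). For $k, l = 1, \ldots, I$ and under the validation data $\mathcal{V}$, we define
$$n_{kl} = \sum_{i \in \mathcal{V}} \mathbb{1}(y_i = k, \widehat{y}_i = l). \quad (21)$$
Moreover, we define $n_{k \cdot} = \sum_{l=1}^{I} n_{kl}$ for $k = 1, \ldots, I$ and $n_{\cdot l} = \sum_{k=1}^{I} n_{kl}$ for $l = 1, \ldots, I$. Then, the ARI under the validation data is defined as in Hubert and Arabie:
$$\mathrm{ARI} = \frac{\sum_{k,l} \binom{n_{kl}}{2} - \left[ \sum_{k} \binom{n_{k \cdot}}{2} \sum_{l} \binom{n_{\cdot l}}{2} \right] \Big/ \binom{m}{2}}{\frac{1}{2} \left[ \sum_{k} \binom{n_{k \cdot}}{2} + \sum_{l} \binom{n_{\cdot l}}{2} \right] - \left[ \sum_{k} \binom{n_{k \cdot}}{2} \sum_{l} \binom{n_{\cdot l}}{2} \right] \Big/ \binom{m}{2}}. \quad (22)$$
As mentioned in Hubert and Arabie, the ARI is bounded above by one, and a higher value of the ARI indicates more accurate classification.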
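The adjusted Rand index of Hubert and Arabie can be computed from the contingency table of true and predicted labels. The sketch below is a base R illustration, not the code inside the Metrics function; the name `adjusted_rand_index` is hypothetical. As a check, the label vectors are rebuilt from the NetLDA confusion matrix reported in Section 4.2 (diagonal counts 15, 26, and 4, with one off-diagonal count).

```r
# Adjusted Rand index from two label vectors.
adjusted_rand_index <- function(yhat, y) {
  tab <- table(y, yhat)
  choose2 <- function(v) sum(choose(v, 2))
  sum_cells <- choose2(tab)
  sum_rows <- choose2(rowSums(tab))
  sum_cols <- choose2(colSums(tab))
  expected <- sum_rows * sum_cols / choose(sum(tab), 2)
  (sum_cells - expected) / (0.5 * (sum_rows + sum_cols) - expected)
}

# Labels consistent with the NetLDA confusion matrix in Section 4.2:
# 45 of 46 subjects agree, and one subject is assigned to a different class.
y    <- c(rep(1, 15), rep(2, 26), rep(3, 5))
yhat <- c(rep(1, 15), rep(2, 26), 2, rep(3, 4))
ari <- adjusted_rand_index(yhat, y)
print(round(ari, 7))  # 0.9410827, the ARI reported for NetLDA
```

The index is symmetric in its two arguments, so it does not matter which margin of the confusion matrix holds the true labels.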
2.5. Benchmark of NetDA
The conventional linear discriminant methods (e.g., MASS, sparseLDA, and penalizedLDA) adopt (9) with $\Theta$ estimated by the inverse of the empirical estimator of $\Sigma$. However, this approach may encounter cumbersome computation or possible singularity when calculating inverse matrices. In addition, even if $\Theta$ is sparse, the entries of this empirical estimator are typically all nonzero, which means that some unconnected pairs of predictors may be falsely included. As a result, an imprecise estimator of $\Theta$ may affect the performance of classification.
Unlike existing methods, the first contribution of NetDA is to estimate $\Theta$ directly. The graphical lasso method identifies zero entries in $\Theta$ and estimates the nonzero ones. This approach enables us to retain connected pairs of predictors and exclude unconnected ones. Moreover, our implementation avoids computing the inverse of the estimate of $\Sigma$.
The second contribution of NetDA is to handle heterogeneous network structures: it stratifies the predictor information by class when characterizing the predictor network structures, and it deals with multiclassification by adopting the network structure of each class.
3. Details of NetDA
In this section, we first overview the technique that we need in the existing package. After that, we describe functions in our package.
3.1. Library Overview
In our package, we use the R package glasso, which implements the graphical lasso method proposed by Friedman et al. The purpose of glasso is to detect network structures of random vectors that follow multivariate normal distributions. In particular, under multivariate normal distributions, the detection of network structures by glasso is equivalent to the estimation of the precision matrix.
In our package, we take the wine dataset as an example, which is available in https://archive.ics.uci.edu/ml/datasets/wine. These data were collected based on a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. In this dataset, there are three types of wines and 13 constituents, including alcohol (Alcohol), malic acid (Malic acid), ash (Ash), alkalinity of ash (Alkalinity), magnesium (Magnesium), total phenols (phenols), flavanoids (Flavanoids), nonflavanoid phenols (Nonflavanoid), proanthocyanins (Proanthocyanins), color intensity (Color), hue (Hue), OD280/OD315 of diluted wines (OD280), and proline (Proline).
In the following analysis, the response is types of wines that are labeled as 1, 2, and 3; constituents are treated as predictors that are continuous. The goal is to adopt the information of constituents to construct predictive models and then use them to classify type of wines for a given subject.
NetDA contains two methods. The first method is called NetLDA, which aims to estimate the precision matrix by pooling all individuals in the data, and the corresponding discriminant function is given by (11). The second approach is called NetQDA, whose strategy is to estimate precision matrices based on individuals in different classes, and then use class-dependent estimated precision matrices to define quadratic discriminant functions in (16). Unlike NetLDA, NetQDA takes possibly class-dependent network structures of the predicted variables into account and uses network structures in different classes to determine which classes individuals belong to. When either linear discriminant functions or quadratic discriminant functions are obtained, they can be used to determine the class for a new subject.
To implement the NetLDA and NetQDA methods, we use the following command:
NetDA(X, Y, method, X_test)
where the meaning of each argument is described as follows:
(i) X: this is an $n \times p$ matrix of the predictors from the training data
(ii) Y: this is an $n$-dimensional vector of the response from the training data, whose elements are positive integers and reflect class labels
(iii) method: this is a scalar that determines the classification method: method = 1 represents NetLDA in (11), and method = 2 represents NetQDA in (16)
(iv) X_test: this is an $m \times p$ matrix of the predictors from the validation data
The purpose of NetDA is to use the training data "X" and "Y" to determine a fitted model that is specified by the argument "method." After that, we use "X_test" and the fitted model to determine the predicted class for subjects in the validation data. The function NetDA returns a list of components:
(i) yhat: this is a vector of predicted responses obtained by NetLDA or NetQDA based on the predictors in the validation data (X_test)
(ii) Network: this contains the estimators of the precision matrices. If "method = 1" is chosen, then there is one precision matrix; if "method = 2" is given, then there are $I$ precision matrices, one for each class
The function Metrics is used to assess the performance of classification and prediction based on the commonly used criteria introduced in Section 2.4. Specifically, given the responses from the validation data and the predicted values obtained by NetDA, we first derive a confusion matrix to see the classification result. To further assess the performance of prediction, we evaluate precision and recall defined in (19), the F-score defined in (20), and the ARI defined in (22).
To obtain the desired results, we use the following command:
Metrics(yhat, Y_test)
where the meaning of each argument is described as follows:
(i) yhat: this is an m-dimensional vector of the predicted responses determined by NetDA or other methods
(ii) Y_test: this is an m-dimensional vector of the response from the validation data
The function Metrics returns a list of components:
(i) Confusion matrix: a confusion matrix based on predicted values (yhat) and responses from the validation data (Y_test)
(ii) (PRE, REC, F-score): values of precision, recall, and F-score defined in (19) and (20), respectively
(iii) ARI: value of the ARI defined in (22)
4. Demonstration of NetDA
In this section, we demonstrate standard analysis of classification and prediction based on two functions in the package. To show the advantage of NetDA, we compare with existing packages MASS, sparseLDA, and penalizedLDA.
4.1. Simulation Studies
Let $I = 3$ denote the number of classes, and let $p = 12$ denote the dimension of predictors. We specify the sample size $n = 600$, in which the size of the $k$th class is given by $n_k = n/I = 200$ for $k = 1, 2, 3$. For each class, we consider a different network structure of predictors. Specifically, for class $k$, let $A_k$ denote a $p \times p$ matrix whose diagonal entries are zero and whose off-diagonal entries are either one or zero to reflect the edges between the corresponding two nodes in Figure 1. That is, for $s \neq t$, entry $(s, t)$ in $A_k$ is 1 if an edge exists between $X_s$ and $X_t$ and 0 otherwise. In addition, we define a diagonal matrix $D_k$ whose diagonal entries take the common value $0.1 + |\lambda_{\min}(A_k)|$, where $\lambda_{\min}(A_k)$ represents the smallest eigenvalue of $A_k$. Finally, we define the precision matrix as $\Theta_k = A_k + D_k$, which is invertible. Therefore, based on Gaussian graphical models, the $p$-dimensional vector of predictors in class $k$ is generated from a multivariate normal distribution with mean zero and covariance matrix $\Sigma_k = \Theta_k^{-1}$ (rescaled to a correlation matrix in the code) for $k = 1, 2, 3$.
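The construction above can be illustrated with a small base R sketch. The adjacency matrix used here is a hypothetical chain graph chosen only for illustration; it is not one of the three structures in Figure 1. Sampling uses a Cholesky factor so that the block does not depend on any package.

```r
p <- 12

# Hypothetical adjacency matrix of a chain graph: node s is linked to node s + 1.
A <- matrix(0, p, p)
for (s in 1:(p - 1)) {
  A[s, s + 1] <- 1
  A[s + 1, s] <- 1
}

# Shift the diagonal so that the smallest eigenvalue becomes 0.1,
# which makes the precision matrix positive definite.
shift <- 0.1 + abs(min(eigen(A, symmetric = TRUE)$values))
Theta <- A + diag(shift, p)
min_eig <- min(eigen(Theta, symmetric = TRUE)$values)

# Covariance implied by the precision matrix, rescaled to unit variances.
Sigma <- cov2cor(solve(Theta))

# Draw 200 mean-zero Gaussian vectors with covariance Sigma.
set.seed(1)
Z <- matrix(rnorm(200 * p), ncol = p)
X_sim <- Z %*% chol(Sigma)

print(min_eig)
print(dim(X_sim))
```

Adding $0.1 + |\lambda_{\min}|$ to the diagonal moves every eigenvalue of the adjacency matrix up by the same amount, so the smallest eigenvalue of the resulting precision matrix is exactly 0.1 whenever the adjacency matrix has a negative eigenvalue.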
Let Theta1, Theta2, and Theta3 denote matrices that reflect the network structures in the left, middle, and right panels of Figure 1, respectively. One can specify those three matrices as follows:
> Theta1
> Theta2
> Theta3
Following the description above, we generate the $n \times p$ matrix of predictors and then assemble the simulated data:
> library(MASS)  # provides mvrnorm
> n = 600; I = 3; p = 12  # sample size, number of classes, number of predictors
> Theta1 = Theta1 + diag(0.1 + abs(min(eigen(Theta1)$values)), p)
> Sigma1 = cov2cor(solve(Theta1))
> X1 = mvrnorm(n = n/I, rep(0, p), Sigma1, tol = 1e-6, empirical = FALSE)
>
> Theta2 = Theta2 + diag(0.1 + abs(min(eigen(Theta2)$values)), p)
> Sigma2 = cov2cor(solve(Theta2))
> X2 = mvrnorm(n = n/I, rep(0, p), Sigma2, tol = 1e-6, empirical = FALSE)
>
> Theta3 = Theta3 + diag(0.1 + abs(min(eigen(Theta3)$values)), p)
> Sigma3 = cov2cor(solve(Theta3))
> X3 = mvrnorm(n = n/I, rep(0, p), Sigma3, tol = 1e-6, empirical = FALSE)
> # First column: class labels; remaining columns: predictors
> data = cbind(c(rep(1, n/I), rep(2, n/I), rep(3, n/I)), rbind(X1, X2, X3))
To perform classification, we apply the NetDA function under the two scenarios described in Section 2.3. In addition, we demonstrate three existing methods labeled as lda (MASS), sda (sparseLDA), and pda (penalizedLDA), respectively. Detailed commands are given below:
> library(MASS); library(sparseLDA); library(penalizedLDA); library(NetDA)
> Y = data[, 1]
> X = data[, 2:13]
> # Demonstration of MASS
> lda = lda(Y ~ X, prior = c(length(which(Y == 1)), length(which(Y == 2)),
+ length(which(Y == 3)))/length(Y))
> yhat_lda = predict(lda, data.frame(X))$class
>
> # Demonstration of sparseLDA
> y = matrix(0, n, I)  # indicator matrix of class membership
> y[1:200, 1] = 1
> y[201:400, 2] = 1
> y[401:600, 3] = 1
> colnames(y) <- c("1", "2", "3")
> sda = sda(data.frame(X), y, lambda = 1e-6, stop = -1, maxIte = 25,
+ trace = TRUE)
> yhat_sda = as.numeric(unlist(predict(sda, data.frame(X))$class))
> # Demonstration of penalizedLDA
> pda = PenalizedLDA(X, Y, lambda = 0.14, K = 2)
> yhat_pda = as.numeric(unlist(predict(pda, data.frame(X))))[1:n]
> # Demonstration of NetDA
> yhat_netlda = NetDA(X, Y, method = 1, X)$yhat
> yhat_netqda = NetDA(X, Y, method = 2, X)$yhat
After that, to assess the performance of classification, we use the function Metrics to compute the values of criteria (19) and (20), which are returned as the second component of the output list ([[2]]), and the value of (22), which is returned as the third component ([[3]]).
> F_lda = Metrics(yhat_lda, Y)[[2]]
> F_sda = Metrics(yhat_sda, Y)[[2]]
> F_pda = Metrics(yhat_pda, Y)[[2]]
> F_netlda = Metrics(yhat_netlda, Y)[[2]]
> F_netqda = Metrics(yhat_netqda, Y)[[2]]
> ARI_lda = Metrics(yhat_lda, Y)[[3]]
> ARI_sda = Metrics(yhat_sda, Y)[[3]]
> ARI_pda = Metrics(yhat_pda, Y)[[3]]
> ARI_netlda = Metrics(yhat_netlda, Y)[[3]]
> ARI_netqda = Metrics(yhat_netqda, Y)[[3]]
We repeat the above simulations 500 times and summarize the numerical results in Table 1. We observe that the package NetDA provides higher values of PRE, REC, F-score, and ARI, showing that the classification obtained by the NetDA method is more accurate than that obtained by the other methods. Specifically, comparing MASS with NetLDA, we can see that the latter outperforms the former, which is due to the incorporation of the network structure with irrelevant pairs of predictors removed from $\Theta$. For the comparison between NetLDA and NetQDA, we can see that NetQDA is much better than NetLDA, because the NetQDA method detects the network structure of each class, and those detected network structures are informative for classification. These numerical findings support the discussion in Section 2.5.
4.2. Real Data Analysis
In this study, we take the wine dataset introduced in Section 3 as an example to demonstrate the package NetDA. To demonstrate the functions and perform classification and prediction, we first split the full data into the training data and the validation data. In our example, we take the first 45 samples in each class to obtain the training data and use the remaining samples in each class to form the validation data.
> library(NetDA)
> data(WineData)
> Y = WineData[, 1]     # the response
> X = WineData[, 2:14]  # the predictors
> D1 = WineData[which(Y == 1), ]
> D2 = WineData[which(Y == 2), ]
> D3 = WineData[which(Y == 3), ]
> # An example of user-specific training data and validation data
> # "Train" represents the training data and "Test" represents validation data in our example.
> Train = rbind(D1[1:45, ], D2[1:45, ], D3[1:45, ])
> Test = rbind(D1[46:dim(D1)[1], ], D2[46:dim(D2)[1], ], D3[46:dim(D3)[1], ])
> # The response (Y) and predictors (X) in the training data
> X = Train[, 2:14]
> Y = Train[, 1]
> # The response (Y_test) and predictors (X_test) in the validation data
> X_test = Test[, 2:14]
> Y_test = Test[, 1]
Once the training data and the validation data are determined, we employ the function NetDA to perform classification. We pass "X," "Y," and "X_test" to the function NetDA and store the results for method = 1 and method = 2 as "NetLDA" and "NetQDA," respectively. The resulting vectors of predicted classes and estimated precision matrices are given by "$yhat" and "$Network," respectively.
> NetDA(X, Y, method = 1, X_test) -> NetLDA
> yhat_lda = NetLDA$yhat
> Net_lda = NetLDA$Network
> yhat_lda
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3
> round(Net_lda, 3)
#############
> NetDA(X, Y, method = 2, X_test) -> NetQDA
> yhat_qda = NetQDA$yhat
> Net_qda = NetQDA$Network
> yhat_qda
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3
> round(Net_qda[[1]], 3)
> round(Net_qda[[2]], 3)
> round(Net_qda[[3]], 3)
Moreover, to visualize the estimated network structures, we apply the packages network, GGally, and sna to draw the network structures based on the estimated precision matrices. The following commands draw the network structures, and the corresponding figures determined by NetDA with the arguments method = 1 and method = 2 are displayed in Figures 2 and 3, respectively.
> library(network)
> library(GGally)
> library(sna)
> material_name = c("Alcohol", "Malic acid", "Ash", "Alkalinity", "Magnesium", "phenols",
+ "Flavanoids", "Nonflavanoid", "Proanthocyanins", "Color", "Hue", "OD280", "Proline")
> ggnet2(Net_lda, mode = "circle", size = 8, label = material_name, label.size = 5)
> ggnet2(Net_qda[[1]], mode = "circle", size = 8, label = material_name, label.size = 5)
> ggnet2(Net_qda[[2]], mode = "circle", size = 8, label = material_name, label.size = 5)
> ggnet2(Net_qda[[3]], mode = "circle", size = 8, label = material_name, label.size = 5)
From Figures 2 and 3, we can observe that the precision matrices reveal complex network structures in the predictors. In particular, in Figure 3, we can see that the estimated class-dependent network structures differ from each other, and the network structure in class 2 looks more complex than the others. To assess the performance of prediction, we pass the predicted values (yhat_lda or yhat_qda) and the responses in the validation data (Y_test) to the function Metrics, and the resulting values are displayed below.
> Metrics(yhat_lda, Y_test)
$"Confusion matrix"
     [,1] [,2] [,3]
[1,]   15    0    0
[2,]    0   26    0
[3,]    0    1    4
$"(PRE, REC, F-score)"
[1] 0.9782609 0.9782609 0.9782609
$ARI
[1] 0.9410827
#############
> Metrics(yhat_qda, Y_test)
$"Confusion matrix"
     [,1] [,2] [,3]
[1,]   15    0    0
[2,]    0   27    0
[3,]    0    0    4
$"(PRE, REC, F-score)"
[1] 1 1 1
$ARI
[1] 1
Finally, we use the functions lda, sda, and PenalizedLDA in the packages MASS, sparseLDA, and penalizedLDA, respectively, to perform the conventional discriminant methods and compare them with our NetDA. Detailed implementations and numerical results are given below:
> library(MASS); library(sparseLDA); library(penalizedLDA)
> Wine = data.frame(cbind(Y, X))
> ## Demonstration of MASS
> lda = lda(Y ~ ., Wine, prior = c(length(which(Y == 1)), length(which(Y == 2)),
+ length(which(Y == 3)))/length(Y))
> predict(lda, X_test)$class -> lda_pred
>
> Metrics(lda_pred, Y_test)
$"Confusion matrix"
     [,1] [,2] [,3]
[1,]   15    1    0
[2,]    0   26    0
[3,]    0    0    4
$"(PRE, REC, F-score)"
[1] 0.9782609 0.9782609 0.9782609
$ARI
[1] 0.9196658
> # Demonstration of sparseLDA
> n = length(Y)
> I = max(Y)
> y = matrix(0, n, I)  # indicator matrix of class membership
> y[1:45, 1] = 1
> y[46:90, 2] = 1
> y[91:135, 3] = 1
> colnames(y) <- c("1", "2", "3")
> sda = sda(data.frame(X), y, lambda = 1e-6, stop = -1, maxIte = 25,
+ trace = TRUE)
ite: 1 ridge cost: 101.1419 : 0.001636884
ite: 2 ridge cost: 47.61608 : 0.002647311
ite: 3 ridge cost: 47.61608 : 0.002647311
ite: 1 ridge cost: 129.1129 : 0.01364973
ite: 2 ridge cost: 129.1129 : 0.01364973
final update, total ridge cost: 176.729 : 0.01629704
> sda_pred = as.numeric(unlist(predict(sda, data.frame(X_test))$class))
> Metrics(sda_pred, Y_test)
$"Confusion matrix"
     [,1] [,2] [,3]
[1,]   14   26    2
[2,]    0    0    0
[3,]    0    0    1
$"(PRE, REC, F-score)"
[1] 0.3488372 0.3488372 0.3488372
$ARI
[1] 0.07272024
> # Demonstration of penalizedLDA
> pda = PenalizedLDA(X, Y, lambda = 0.14, K = 2)
> pda_pred = as.numeric(unlist(predict(pda, data.frame(X_test))))[1:46]
> Metrics(pda_pred, Y_test)
$"Confusion matrix"
     [,1] [,2] [,3]
[1,]   14    4    0
[2,]    0   21    0
[3,]    0    1    3
$"(PRE, REC, F-score)"
[1] 0.8837209 0.8837209 0.8837209
$ARI
[1] 0.6229476
In general, we can see that NetLDA and NetQDA perform satisfactorily in prediction. For the NetLDA method, there is one misclassification as shown in the confusion matrix, while the predicted classes determined by NetQDA are all equal to the responses in the validation data. In comparison, the confusion matrix determined by the conventional linear discriminant analysis (lda) is comparable to that obtained by the NetLDA method, but the value of ARI determined by NetLDA (0.941) is slightly larger than that based on lda (0.920). In addition, the NetQDA method is clearly better than lda. By contrast, the two penalized methods sparseLDA and penalizedLDA do not perform satisfactorily in classification and prediction; sparseLDA in particular assigns almost all validation subjects to class 1. In summary, the numerical results in this data analysis show (a) the importance of incorporating predictor network structures in the classification procedure, and (b) the advantage of adopting class-dependent network structures.
Classification and prediction have been important topics in supervised learning, and discriminant analysis is a useful method in statistical learning. While many methods have been developed, few of them handle potential network structures in predictors when building predictive models. In addition, little software has been developed for statistical analysts who wish to incorporate network structures and obtain precise classification.
To address this concern, we develop an R package NetDA for public use. Our package provides two functions. The function NetDA incorporates the information of network structures in predictors into linear or quadratic discriminant functions. The other function, Metrics, summarizes some useful and informative criteria to assess the performance of classification and prediction. Detailed documentation and concrete examples illustrate the validity of the methods in this package. Finally, some further developments can be explored based on the current package, including alternative approaches to the detection of network structures (e.g., Hastie et al., Section 9.4), nonparametric discriminant analysis with network structures accommodated (e.g., Chen), and analysis of noisy data, such as measurement error models (e.g., Chen and Yi).
Data used to support this study are available at https://archive.ics.uci.edu/ml/datasets/wine.
Conflicts of Interest
The author declares that there are no conflicts of interest.
The author also specifically thanks Ms. Lingyu Cai for kind assistance in preparing a vignette. This research was supported by the Ministry of Science and Technology with grant ID 110-2118-M-004-006-MY2.
References
T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York, NY, USA, 2008.
W. He, G. Y. Yi, and L.-P. Chen, “Support vector machine with graphical network structures in features,” in Proceedings of the 15th International Conference on Machine Learning and Data Mining, MLDM 2019, vol. 2, pp. 557–570, 2019, https://easychair.org/publications/preprint/g6d1.
T. Hastie, R. Tibshirani, and M. Wainwright, Statistical Learning with Sparsity: The Lasso and Generalizations, CRC Press, New York, NY, USA, 2015.
G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning: With Applications in R, Springer, New York, NY, USA, 2017.
L.-P. Chen and G. Y. Yi, “De-noising analysis of noisy data with graphical models,” Electronic Journal of Statistics, vol. 16, pp. 3861–3909, 2022.