Review Article

Mental Health Prediction Using Machine Learning: Taxonomy, Applications, and Challenges

Table 5

Summary of machine learning approaches applied to mental health problems.

| Author, year | Mental health problem | Sample data set | Machine learning model | Performance | Comments |
|---|---|---|---|---|---|
| Greenstein, 2012 [20] | Schizophrenia | (i) 98 childhood-onset schizophrenia patients; (ii) 99 healthy controls | Random forest | Accuracy: 73.7% | Regional brain measures were chosen, providing lower resolution than voxel-wise measures |
| Jo et al., 2020 [21] | Schizophrenia | (i) 48 schizophrenia patients; (ii) 24 healthy controls | (i) Random forest; (ii) Multinomial naive Bayes; (iii) XGBoost; (iv) Support vector machine | Accuracy: (i) 68.9%; (ii) 66.9%; (iii) 66.3%; (iv) 58.2% | |
| Yang et al., 2010 [22] | Schizophrenia | (i) 20 schizophrenia patients; (ii) 20 healthy controls | Support vector machine | Accuracy: 0.82 with functional magnetic resonance imaging; 0.74 with single nucleotide polymorphism data | |
| Srinivasagopalan et al., 2019 [23] | Schizophrenia | (i) 69 schizophrenia patients; (ii) 75 controls | (i) Deep learning; (ii) Support vector machine; (iii) Random forest; (iv) Logistic regression | Accuracy: (i) 94.44%; (ii) 82.68%; (iii) 83.33%; (iv) 82.77% | |
| Pläschke et al., 2017 [24] | Schizophrenia | (i) 86 schizophrenia patients; (ii) 84 healthy controls | Support vector machine | Accuracy: 68% | Young-old classification depended on all networks and outperformed the clinical classification |
| Pinaya et al., 2016 [25] | Schizophrenia | (i) 143 schizophrenia patients; (ii) 83 healthy controls | (i) Deep belief network; (ii) Support vector machine | Accuracy: (i) 73.6%; (ii) 68.1% | |
| Chekroud et al., 2016 [27] | Depression | 1949 patients with level 1 depression | Gradient boosting | Accuracy: 64.6% | |
| Sau and Bhakta, 2017 [29] | Depression and anxiety | 510 elderly patients | (i) Bayesian network; (ii) Naive Bayes; (iii) Logistic regression; (iv) Multilayer perceptron; (v) Sequential minimal optimisation; (vi) K-star; (vii) Random subspace; (viii) J48; (ix) Random forest; (x) Random tree | Accuracy: (i) 79.8%; (ii) 79.6%; (iii) 72.4%; (iv) 77.8%; (v) 75.3%; (vi) 75.3%; (vii) 87.5%; (viii) 87.8%; (ix) 89.0%; (x) 85.1% | The random forest model, tested on a second data set, achieved an accuracy of 91.0% |
| Ahmed et al., 2019 [28] | Depression and anxiety | Data set of depression and anxiety cases | (i) Convolutional neural network; (ii) Support vector machine; (iii) Linear discriminant analysis; (iv) K-nearest neighbour | Highest accuracy, from the convolutional neural network: 96.0% for anxiety and 96.8% for depression | |
| Katsis et al., 2011 [30] | Anxiety | Physiological signals from anxiety patients | (i) Artificial neural networks; (ii) Random forest; (iii) Neuro-fuzzy system; (iv) Support vector machine | Accuracy: (i) 77.3%; (ii) 80.83%; (iii) 84.3%; (iv) 78.5% | The best overall classification accuracy was 84.3% |
| Sau and Bhakta, 2019 [31] | Depression and anxiety | Data set of 470 seafarers | (i) CatBoost; (ii) Logistic regression; (iii) Support vector machine; (iv) Naive Bayes; (v) Random forest | Accuracy: (i) 89.3%; (ii) 87.5%; (iii) 87.5%; (iv) 82.1%; (v) 78.6% | CatBoost achieved the highest accuracy (89.3%) and the highest precision (89.0%) |
| Hilbert et al., 2017 [32] | Anxiety | Multimodal behavioural data from subjects with anxiety disorders, healthy subjects, and subjects with major depression | Support vector machine | Accuracy: 90.10% for case classification; 67.46% for disorder classification | |
| Jerry et al., 2019 [33] | Depression | Text and audio data sets | (i) Gaussian process classification; (ii) Logistic regression; (iii) Neural networks; (iv) Random forest; (v) Support vector machine; (vi) XGBoost; (vii) K-nearest neighbours | Mean F1-score, text: (i) 0.71; (ii) 0.69; (iii) 0.68; (iv) 0.73; (v) 0.72; (vi) 0.69; (vii) 0.67. Mean F1-score, audio: (i) 0.48; (ii) 0.48; (iii) 0.42; (iv) 0.44; (v) 0.40; (vi) 0.50; (vii) 0.49 | Random forest performed best on the text data set; XGBoost performed best on the audio data set |
| Rocha-Rego et al., 2014 [34] | Bipolar disorder | (i) 40 subjects with bipolar disorder; (ii) 40 healthy controls | Gaussian process classification | Accuracy: 69–78%; sensitivity: 64–77%; specificity: 69–99% | |
| Grotegerd et al., 2013 [35] | Bipolar disorder | (i) 10 subjects with bipolar disorder; (ii) 10 subjects with unipolar depression; (iii) 10 healthy controls | (i) Gaussian process classification; (ii) Support vector machine | Accuracy: (i) 70%; (ii) 70% | |
| Valenza et al., 2016 [36] | Bipolar disorder | Electrocardiogram signals from patients | Support vector machine | Accuracy: 69% | |
| Mourão-Miranda et al., 2012 [37] | Bipolar disorder | (i) 18 subjects with bipolar disorder; (ii) 18 subjects with unipolar depression; (iii) 18 healthy controls | Gaussian process classification | Accuracy: 67%; specificity: 72%; sensitivity: 61% | |
| Roberts et al., 2016 [38] | Bipolar disorder | (i) 49 bipolar disorder patients; (ii) 71 at-risk subjects; (iii) 80 healthy controls | Multiclass support vector machine | Overall accuracy: 64.3% | |
| Akinci et al., 2012 [39] | Bipolar disorder | (i) 40 subjects with bipolar disorder; (ii) 55 healthy controls | Support vector machine | Accuracy: 96.36% | |
| Wu et al., 2016 [40] | Bipolar disorder | (i) 21 subjects with bipolar disorder; (ii) 21 healthy controls | LASSO | Accuracy: 71%; AUC: 0.714 | |
| Reece et al., 2017 [41] | PTSD | (i) 63 PTSD patients; (ii) 111 healthy controls | Random forest | AUC: 0.89 | Detection covered two categories, PTSD and depression |
| Leightley et al., 2018 [42] | PTSD | 13,690 military personnel from 2004 to 2009 | (i) Support vector machine; (ii) Random forest; (iii) Artificial neural networks; (iv) Bagging | Accuracy: (i) 91%; (ii) 97%; (iii) 89%; (iv) 95% | Alcohol use, gender, and deployment status were the variables affecting performance |
| Papini et al., 2018 [43] | PTSD | (i) 110 PTSD patients; (ii) 231 trauma-exposed controls | Gradient-boosted decision trees | Accuracy: 78%; AUC: 0.85; sensitivity: 69%; specificity: 83% | |
| Conrad et al., 2017 [44] | PTSD | (i) 441 trauma-exposed subjects for training; (ii) 211 trauma-exposed subjects for testing | (i) Random forest with conditional inference; (ii) LASSO; (iii) Linear regression | Accuracy: (i) 77.25%; (ii) 74.88%; (iii) 75.36% | |
| Marmar et al., 2019 [45] | PTSD | (i) 52 PTSD patients; (ii) 77 trauma-exposed controls | Random forest | Accuracy: 89.1%; AUC: 0.954 | |
| Vergyri et al., 2015 [46] | PTSD | (i) 15 PTSD patients; (ii) 24 trauma-exposed controls | (i) Gaussian backend; (ii) Decision tree; (iii) Neural network; (iv) Boosting | Overall accuracy: 77% | Speech features are particularly predictive of PTSD |
| Salminen et al., 2019 [47] | PTSD | 97 war veterans | Support vector machine | Accuracy: 69%; sensitivity: 58%; specificity: 81% | The most important feature was the surface area of the right posterior cingulate |
| Rangaprakash et al., 2017 [48] | PTSD | 87 male soldiers | Support vector machine | Accuracy: 83.59% | PTSD is associated with hippocampal-striatal hyperconnectivity |
| Sumathi and Poorna, 2016 [49] | Mental health problems among children | Interview data set of 60 instances | (i) Average one-dependence estimator (AODE); (ii) Multilayer perceptron; (iii) Logical analysis tree (LAT); (iv) Radial basis function network (RBFN); (v) K-star; (vi) Functional tree (FT) | Accuracy: (i) 71%; (ii) 78%; (iii) 70%; (iv) 57%; (v) 42%; (vi) 42% | The multilayer perceptron achieved the highest accuracy (78%) |
| Tate et al., 2020 [50] | Mental health problems among children | 7638 twins from the Child and Adolescent Twin Study in Sweden | (i) Random forest; (ii) Support vector machine; (iii) Neural network; (iv) Logistic regression; (v) XGBoost | AUC: (i) 0.739; (ii) 0.736; (iii) 0.705; (iv) 0.700; (v) 0.692 | The models performed comparably to one another |
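
Most of the studies summarised in Table 5 share the same basic workflow: fit several off-the-shelf classifiers to a labelled patient/control sample and compare them on accuracy, AUC, or F1-score. The following is a minimal sketch of that workflow using scikit-learn on synthetic data; the model set, sample size, and feature count are illustrative placeholders chosen to resemble the small cohorts in the table, not the pipeline of any cited study.

```python
# Sketch: compare the classifier families that recur in Table 5 on a
# synthetic patients-vs-controls task, reporting accuracy, AUC, and F1.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a small clinical sample (~100 subjects,
# 30 imaging or questionnaire features).
X, y = make_classification(n_samples=100, n_features=30,
                           n_informative=10, random_state=0)

models = {
    "random forest": RandomForestClassifier(n_estimators=500, random_state=0),
    "support vector machine": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "naive Bayes": GaussianNB(),
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# Cross-validation rather than a single split: with cohorts this small,
# one train/test split gives very unstable estimates.
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=10,
                            scoring=["accuracy", "roc_auc", "f1"])
    print(f"{name}: accuracy={scores['test_accuracy'].mean():.3f}, "
          f"AUC={scores['test_roc_auc'].mean():.3f}, "
          f"F1={scores['test_f1'].mean():.3f}")
```

On samples of this size, fold-to-fold variance is substantial, which is one reason the headline accuracies in Table 5 are difficult to compare across studies.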
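
Several rows also quote sensitivity and specificity. Both follow directly from the confusion matrix, as in this short, equally hypothetical sketch:

```python
# Hypothetical labels and predictions (1 = patient, 0 = control), used only
# to show how the sensitivity/specificity figures in Table 5 are derived.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)  # proportion of patients correctly detected
specificity = tn / (tn + fp)  # proportion of controls correctly cleared
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```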