Journal of Healthcare Engineering

Review Article

Different Data Mining Approaches Based Medical Text Data

Summary of analysis methods for medical text data.


Methods	Purpose	Algorithms	Advantages	Shortcomings

Clustering	Group similar subjects in medical texts	K-means [28, 29]	1. Simple and fast 2. Scalable and efficient	1. Time-consuming on large amounts of data 2. More restrictions on use

Classification	Read medical text data for intention recognition	ANN [30, 31]	1. Model complex mechanisms in text data 2. High degree of self-learning 3. Strong fault tolerance	1. Slow training 2. Many parameters that are difficult to tune
		Decision tree [32, 33]	1. Handle continuous variables and missing values 2. Judge the importance of features	1. Prone to overfitting 2. Unstable results
		Naive Bayes [34]	1. The learning process is easy 2. Good classification performance	Assumes that features are independent of one another

Association rules	Mine frequent items and corresponding association rules from massive medical text datasets	Apriori [35, 36]	Simple and easy to implement	Low efficiency and time-consuming
		FP-tree [37]	1. Reduce the number of database scans 2. Reduce the amount of memory space	High memory overhead
		FP-growth [38]	1. Improve data density structure 2. Avoid repeated scanning	Harder to implement
Logistic regression	Analyze how variables affect results	Logistic regression [39]	1. Easy to understand and interpret	1. Very sensitive to outliers 2. Prone to underfitting 3. Cannot handle a large number of multiclass features or variables
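
To make the Apriori row of the table concrete, the following is a minimal sketch of level-wise frequent itemset mining in plain Python. It is an illustration under stated assumptions, not the implementation used in the cited works: the function name `apriori`, the toy `records` (hypothetical symptom terms extracted from clinical notes), and the support threshold are all invented for this example. The sketch shows why the method is simple to implement yet slow, since every candidate level requires another full pass over the transactions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset.

    transactions: iterable of item sets (toy data in the demo below).
    min_support: minimum number of transactions that must contain an itemset.
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every distinct item on its own.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # One full scan of the data per level; this repeated scanning is the
        # efficiency cost that the table lists as a shortcoming of Apriori.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its subsets of the
        # previous size were themselves frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical toy data: one set of extracted terms per clinical note.
records = [
    {"fever", "cough", "fatigue"},
    {"fever", "cough"},
    {"cough", "fatigue"},
    {"fever", "cough", "headache"},
    {"fever", "fatigue"},
]

result = apriori(records, min_support=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this toy input, the only frequent pair is fever together with cough, which appears in three of the five records. FP-growth reaches the same itemsets by first compressing the transactions into a tree, which avoids the repeated scans at the cost of a more involved implementation.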