Review Article

Different Data Mining Approaches Based Medical Text Data

Table 2

Analysis methods for medical text data.

| Methods | Purpose | Algorithms | Advantages | Shortcomings |
|---|---|---|---|---|
| Clustering | Classify similar subjects in medical texts | K-means [28, 29] | 1. Simple and fast; 2. Scalability and efficiency | 1. Time-consuming on large amounts of data; 2. More restrictions on use |
| Classification | Read medical text data for intention recognition | ANN [30, 31] | 1. Models complex mechanisms in text data; 2. High degree of self-learning; 3. Strong fault tolerance | 1. Slow training; 2. Many parameters that are difficult to tune |
| | | Decision tree [32, 33] | 1. Handles continuous variables and missing values; 2. Judges the importance of features | 1. Overfitting; 2. Unstable results |
| | | Naive Bayes [34] | 1. Easy learning process; 2. Good classification performance | Strong requirement that features be independent |
| Association rules | Mine frequent items and corresponding association rules from massive medical text datasets | Apriori [35, 36] | Simple and easy to implement | Low efficiency and time-consuming |
| | | FP-tree [37] | 1. Reduces the number of database scans; 2. Reduces the amount of memory space | High memory overhead |
| | | FP-growth [38] | 1. Denser (more compact) data structure; 2. Avoids repeated scanning | Harder to implement |
| Logistic regression | Analyze how variables affect results | Logistic regression [39] | Easy to visualize and interpret | 1. Very sensitive to outliers; 2. Prone to underfitting; 3. Cannot handle a large number of multiclass features or variables |
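
To make the association-rule entries in Table 2 more concrete, the following is a minimal sketch of Apriori-style frequent itemset mining and rule generation in plain Python. It is an illustration only, not the implementation used in the cited works [35, 36]. The toy transactions (sets of terms assumed to have been extracted from clinical notes), the function names `apriori` and `association_rules`, and the thresholds `min_count` and `min_confidence` are all hypothetical choices made for this example.

```python
from itertools import combinations

# Hypothetical toy data: each transaction is the set of terms found in one note.
transactions = [
    {"fever", "cough", "fatigue"},
    {"fever", "cough"},
    {"cough", "fatigue"},
    {"fever", "cough", "headache"},
    {"fever", "headache"},
]


def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset found in at least min_count transactions."""
    # Level-1 candidates: every single term that occurs in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # One full scan of the data per level; this repeated scanning is
        # the source of the low efficiency noted in the table.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, confidence) for rules lhs -> rhs meeting min_confidence."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # confidence(lhs -> rhs) = count(lhs and rhs) / count(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


frequent = apriori(transactions, min_count=2)
strong_rules = association_rules(frequent, min_confidence=0.8)
for lhs, rhs, confidence in strong_rules:
    print(sorted(lhs), "->", sorted(rhs), round(confidence, 2))
```

The sketch rescans all transactions once per itemset size, which mirrors the efficiency shortcoming listed for Apriori. FP-tree-based methods avoid this by compressing the transactions into a tree after a small, fixed number of scans.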