Review Article
Different Data Mining Approaches Based Medical Text Data
Table 2
Overview of analysis methods for medical text data.
| Methods | Purpose | Algorithms | Advantages | Shortcomings |
| Clustering | Group similar subjects in medical texts | K-means [28, 29] | 1. Simple and fast 2. Scalable and efficient | 1. Time-consuming on large amounts of data 2. More restrictions on use |
| Classification | Read medical text data for intention recognition | ANN [30, 31] | 1. Solve complex mechanisms in text data 2. High degree of self-learning 3. Strong fault tolerance | 1. Slow training 2. Many parameters and difficulty in adjusting parameters |
| | | Decision tree [32, 33] | 1. Handle continuous variables and missing values 2. Judge the importance of features | 1. Overfitting 2. The result is unstable |
| | | Naive Bayes [34] | 1. The learning process is easy 2. Good classification performance | Higher requirements for data independence |
| Association rules | Mine frequent items and corresponding association rules from massive medical text datasets | Apriori [35, 36] | Simple and easy to implement | Low efficiency and time-consuming |
| | | FP-tree [37] | 1. Reduce the number of database scans 2. Reduce the amount of memory space | High memory overhead |
| | | FP-growth [38] | 1. Improve data density structure 2. Avoid repeated scanning | Harder to implement |
| Logistic regression | Analyze how variables affect results | Logistic regression [39] | 1. Visual understanding and interpretation 2. Very sensitive to outliers | 1. Prone to underfitting 2. Cannot handle a large number of multiclass features or variables |
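To make the Apriori row of Table 2 concrete, the following is a minimal sketch of level-wise frequent itemset mining in plain Python. It is not the implementation used in [35, 36]: the function name `apriori`, the `min_support` threshold (an absolute count), and the symptom terms in `records` are all hypothetical and chosen only for illustration. The sketch shows both entries in the table: the logic is short and easy to implement, but each level needs a full pass over the data, which is why the method is slow on large datasets.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets.

    Illustrative sketch only; min_support is an absolute count (assumption).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every distinct item as a 1-itemset.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # One full scan of the data per level; this repeated scanning
        # is the source of Apriori's low efficiency.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent itemsets into candidates one item larger.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets
        # were frequent at the previous level.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical toy data: terms extracted from four medical notes.
records = [
    {"fever", "cough", "fatigue"},
    {"fever", "cough"},
    {"cough", "fatigue"},
    {"fever", "cough", "headache"},
]

if __name__ == "__main__":
    for itemset, count in sorted(apriori(records, 2).items(), key=lambda kv: -kv[1]):
        print(sorted(itemset), count)
```

FP-tree and FP-growth, listed in the same row group, avoid the per-level scan by compressing the transactions into a prefix tree first, at the cost of a more involved implementation.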