Complexity

Research Article

Analysis and Prediction of CET4 Scores Based on Data Mining Algorithm

Table 2

K-nearest neighbor CET4 performance analysis and prediction algorithm flow.

Algorithm flow: K-nearest neighbor CET4 performance analysis and prediction

Step 1. Coincidence matrix. A table is displayed in which rows are defined by actual values and columns by predicted values, and each cell holds the number of records that follow that pattern. If more than one generated field relates to the same output field but these fields were produced by different models, the records on which these fields agree and disagree are counted and the totals are displayed.
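The counting described in Step 1 can be sketched in Python as follows. This is a minimal illustration, not the tool's implementation. The function names `coincidence_matrix` and `model_agreement` are hypothetical, and the predictions are assumed to be plain lists of category labels (for example, CET4 "pass"/"fail").

```python
from collections import Counter


def coincidence_matrix(actual, predicted):
    """Count records for each (actual, predicted) pair.

    Rows are actual values, columns are predicted values (both in sorted order).
    """
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def model_agreement(pred_a, pred_b):
    """Totals of records on which two models' predictions agree or disagree."""
    agree = sum(1 for a, b in zip(pred_a, pred_b) if a == b)
    disagree = sum(1 for a, b in zip(pred_a, pred_b) if a != b)
    return {"agree": agree, "disagree": disagree}


labels, matrix = coincidence_matrix(["pass", "fail", "pass", "pass"],
                                    ["pass", "pass", "pass", "fail"])
print(labels)   # ['fail', 'pass']
print(matrix)   # [[0, 1], [1, 2]]
```

Keeping the labels sorted fixes the row and column order, so matrices from different models over the same target can be compared cell by cell.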

Step 2. Performance evaluation. This statistic measures the average information content (in bits) of the model's predictions for records belonging to each category. Because it accounts for the differing difficulty of categories in the classification problem, accurate predictions for rare categories earn a higher performance evaluation index than accurate predictions for common categories. If the model does no better than random guessing for a category, the performance evaluation index for that category is 0.
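The source does not give the formula behind this index. The sketch below substitutes the Kononenko-Bratko information score, a published information-based measure with the two properties Step 2 describes: rare categories earn more bits, and predicting at the prior rate scores 0. It is an assumed stand-in and may not match the exact index the tool computes. The function name and inputs are hypothetical, and the data are assumed to contain at least two categories.

```python
import math
from collections import Counter, defaultdict


def information_scores(actual, prob_true):
    """Average Kononenko-Bratko information score (bits) per actual category.

    actual[i]: true category of record i.
    prob_true[i]: probability the model assigns to actual[i]
                  (use 1.0 / 0.0 for crisp correct / incorrect predictions).
    Assumes at least two categories occur in `actual`.
    """
    n = len(actual)
    prior = {c: k / n for c, k in Counter(actual).items()}
    totals, counts = defaultdict(float), Counter()
    for c, p_post in zip(actual, prob_true):
        p_prior = prior[c]
        if p_post >= p_prior:
            # Helpful prediction: bits gained over the prior (larger for rare categories).
            score = math.log2(p_post) - math.log2(p_prior)
        else:
            # Misleading prediction: a negative score.
            score = math.log2(1 - p_prior) - math.log2(1 - p_post)
        totals[c] += score
        counts[c] += 1
    return {c: totals[c] / counts[c] for c in counts}


actual = ["pass", "pass", "pass", "fail"]
print(information_scores(actual, [1.0] * 4))                 # "fail" scores 2.0 bits, "pass" about 0.415
print(information_scores(actual, [0.75, 0.75, 0.75, 0.25]))  # guessing at the prior: 0.0 for both
```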

Step 3. Confidence figures. For models that generate a confidence field, this option reports statistics on the confidence values and their relationship to the predictions. The option has two settings. The first is a threshold: it reports the confidence level above which accuracy reaches a specified percentage. The second is improved accuracy: it reports the confidence level above which accuracy is improved by a specified factor.
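The two confidence settings in Step 3 can be approximated as follows. This minimal sketch assumes the reported level is the lowest confidence cut-off at which records with confidence at or above it reach the target accuracy. It also assumes "improved by a factor" means overall accuracy multiplied by that factor. Both functions are hypothetical names.

```python
def threshold_for_accuracy(confidences, correct, target):
    """Lowest confidence t such that records with confidence >= t reach
    accuracy >= target (a fraction such as 0.9). Returns None if unreachable."""
    pairs = sorted(zip(confidences, correct), reverse=True)  # highest confidence first
    best = None
    hits = 0
    for i, (conf, ok) in enumerate(pairs, start=1):
        hits += ok
        # Evaluate only at the end of a group of tied confidences,
        # so every record with confidence >= conf is included.
        if i < len(pairs) and pairs[i][0] == conf:
            continue
        if hits / i >= target:
            best = conf  # keep lowering the cut-off while the target still holds
    return best


def threshold_for_improvement(confidences, correct, factor):
    """Confidence level above which accuracy is `factor` times the overall accuracy."""
    overall = sum(correct) / len(correct)
    return threshold_for_accuracy(confidences, correct, overall * factor)


conf = [0.95, 0.9, 0.8, 0.6, 0.5]
correct = [True, True, True, False, False]
print(threshold_for_accuracy(conf, correct, 0.9))     # 0.8
print(threshold_for_improvement(conf, correct, 1.6))  # 0.8 (overall 0.6, target 0.96)
```

Because accuracy is not necessarily monotonic in the cut-off, the loop scans every candidate level rather than stopping at the first failure.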

Step 4. Break down by partition. If a partition field is used to split the records into training, testing, and validation samples, selecting this option displays the results for each partition separately.
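A per-partition breakdown amounts to grouping records by the partition field before computing a statistic. The sketch below uses simple accuracy as that statistic. The function name and the partition labels "train", "test", and "validation" are hypothetical.

```python
from collections import defaultdict


def accuracy_by_partition(partitions, actual, predicted):
    """Accuracy computed separately for each partition label."""
    hits, totals = defaultdict(int), defaultdict(int)
    for part, y, y_hat in zip(partitions, actual, predicted):
        totals[part] += 1
        hits[part] += (y == y_hat)  # True counts as 1
    return {part: hits[part] / totals[part] for part in totals}


parts = ["train", "train", "test", "test", "validation"]
actual = ["pass", "fail", "pass", "fail", "pass"]
predicted = ["pass", "fail", "pass", "pass", "fail"]
print(accuracy_by_partition(parts, actual, predicted))
```

Reporting each partition separately makes overfitting visible: a model that does much better on the training partition than on the testing or validation partition is unlikely to generalize.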

Step 5. User-defined analysis. A CLEM expression specifies what should be computed for each record and how the record-level scores are combined into an overall score.
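The two-part structure of a user-defined analysis can be mirrored in Python. This is not CLEM syntax: it is a hedged analogue showing a per-record expression followed by a combining function. The field names `score` and `predicted_score` and the function `user_defined_analysis` are hypothetical.

```python
from statistics import mean


def user_defined_analysis(records, record_score, combine=mean):
    """Apply a per-record score expression, then combine the record-level
    scores into one overall value (mean by default)."""
    return combine([record_score(r) for r in records])


# Hypothetical records: actual and predicted CET4 scores.
records = [
    {"score": 450, "predicted_score": 440},
    {"score": 500, "predicted_score": 520},
]
abs_error = lambda r: abs(r["score"] - r["predicted_score"])
print(user_defined_analysis(records, abs_error))        # mean absolute error: 15
print(user_defined_analysis(records, abs_error, max))   # worst-case error: 20
```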