Research Article
A Semiautomated Framework for Integrating Expert Knowledge into Disease Marker Identification
Table 1
Optimal integrated CAs derived from (A) the distance-based hierarchical clustering and (B) the disease-model-related functional selection approaches.
(A) Different distance matrices. Clustering is based on data expression profiles, functional relationships, or a combination of the two; column headers give the distance matrix and the number of clusters.

| Optimal integrated CA | | 1 cluster | Expression profiles, 6 clusters | Expression profiles, 12 clusters | Functional relationships, 6 clusters | Functional relationships, 12 clusters | Combination, 6 clusters | Combination, 12 clusters |
|---|---|---|---|---|---|---|---|---|
| BALF | Full | 0.83 | 0.86 | 0.79 | 0.93 | 0.80 | 0.81 | 0.81 |
| BALF | Partial | | 0.93 | 0.96 | 0.96 | 1.00 | 1.00 | 0.99 |
| Plasma | Full | 0.66 | 0.68 | | 0.54 | | 0.62 | |
| Plasma | Partial | | 0.77 | | 0.79 | | 0.83 | |

(B) Disease model-related functional selection. Column headers give the number of proteins¹ and the number of clusters.

| Optimal integrated CA | | All proteins, 1 cluster | All proteins, 12 clusters | Top 3 proteins, 1 cluster | Top 3 proteins, 12 clusters |
|---|---|---|---|---|---|
| BALF | Full | 0.81 | 0.90 | 0.88 | 0.82 |
| BALF | Partial | | 0.93 | | 0.99 |
| Plasma | Full | 0.57 | 0.56 | 0.59 | 0.65 |
| Plasma | Partial | | 0.73 | | 0.87 |

¹ This refers to the different number of significantly changed proteins (all proteins or top 3 proteins) used in each cluster.