Evidence-Based Complementary and Alternative Medicine
Volume 2012 (2012), Article ID 516473, 5 pages
Research Article

Visual Agreement Analyses of Traditional Chinese Medicine: A Multiple-Dimensional Scaling Approach

1Department of Traditional Chinese Medicine, Changhua Christian Hospital, Changhua 50006, Taiwan
2Graduate Institute of Statistics and Information Science, National Changhua University of Education, Changhua 50058, Taiwan
3Department of Computer Science and Engineering, National Sun Yat-Sen University, Kaohsiung 80424, Taiwan

Received 4 July 2012; Revised 9 August 2012; Accepted 17 August 2012

Academic Editor: Zhaoxiang Bian

Copyright © 2012 Lun-Chien Lo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Studying TCM agreement with a rigorous statistical tool is critical to providing objective evaluations. Several previous studies have examined the consistency of TCM diagnoses, and the results indicate that agreement is low. Traditional agreement measures provide only a single value, which is not sufficient to judge whether the agreement among several raters is strong. In light of this observation, a novel visual agreement analysis for TCM via multiple-dimensional scaling (MDS) is proposed in this study. In this study, a group of 11 experienced TCM practitioners from the Chinese Medicine Department at Changhua Christian Hospital (CCH) in Taiwan, with clinical experience ranging from 3 to 15 years (mean 5.5 years), were asked to diagnose a total of fifteen tongue images according to the Eight Principles derived from TCM theory. The results of the statistical analysis show that, if clusters are latently present among the raters, MDS can prove itself an effective distinguisher.

1. Introduction

Reliability is an indispensable requirement in biomedical diagnostics. Intraclass and interclass reliabilities have been proposed by many authors [1–7]. Many works study agreement measures for Western medical diagnostics; however, only a few perform agreement analysis for TCM practitioners. In most of the literature concerning TCM agreement, even though complex combinations of TCM diagnostics are considered, a so-called proportion-of-agreement measure is adopted. The proportion of agreement, as evidence shows, overlooks the possible bias caused by chance. To remedy this bias, Cohen proposed his renowned kappa measure [2]. Soon after his contribution, weighted kappa, Fleiss' kappa, and so forth were proposed to deal with more complex data types and more raters. Reference [8] considered a reliability measure called Krippendorff's alpha to investigate the agreement of tongue diagnoses when there are many practitioners and the data is ordinal; a Krippendorff's alpha coefficient of 0.7343 was reported in that study.
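To make the chance correction behind Cohen's kappa concrete, the following sketch (a hypothetical helper, not code from this study) computes kappa for two raters assigning nominal labels:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on nominal labels."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed proportion of agreement
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement if the two raters labelled independently
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)
```

Identical ratings yield kappa of 1, while ratings that agree no more often than chance yield a kappa near 0, which is exactly the bias correction the proportion-of-agreement measure lacks.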

The core of diagnosis in Chinese Medicine is "pattern identification/syndrome differentiation and treatment," with inspection, listening and smelling examination, inquiry, and palpation as the bases. Inspection tops the four diagnoses, and tongue diagnosis is a crucial part of observation. The tongue is connected to the internal organs through meridians; thus the conditions of the organs, qi, blood, and body fluids, as well as the degree and progression of disease, are all reflected on the tongue. Organ conditions, properties, and variations of pathogens can be revealed through observation of the tongue. Tongue inspection covers the shape, color, and coating of a tongue; that is, tongue diagnosis is three-dimensional. Krippendorff's alpha is a good approach for agreement analysis when evaluating the agreement of many TCM practitioners with ordinal data. However, it is complex, and only a single index representing agreement is rendered. More importantly, Krippendorff's alpha cannot deal with the high-dimensional ordinal data obtained through TCM tongue diagnosis. These two pitfalls invalidate the application of Krippendorff's alpha to the analysis of multidimensional agreement data, and other effective means have to be sought.

In light of the previous observation, we aim to propose an effective approach that simultaneously deals with high-dimensional ordinal data and with the case when clusters are present in the rating results.

A single agreement value can only represent the "average mass" of agreement. We can hardly derive any meaningful information from a single agreement measure, especially when clusters are present. For example, in the diagnosis of tongue shape (thick, medium, or thin), suppose that three TCM practitioners judge some patients as "thick" while the other three judge them "medium." We might reach a low-agreement conclusion even though the agreement within each of the two groups is strong. Interestingly, although the overall agreement might be low, the different TCM prescriptions could work equally well. From this perspective, an alternative approach such as multiple-dimensional scaling (MDS) may prove a better way to analyze the agreement of diagnoses among many TCM practitioners with high-dimensional ordinal data. Kupper and Hafner proposed a method to assess the extent of interrater agreement when each unit to be rated is characterized by a subset of distinct nominal attributes [9]. When the attribute data is high-dimensional, the interrater agreement can be treated as the similarity used in multiple-dimensional scaling (MDS) [10]. The essence of MDS is an attempt to represent the observed similarities or dissimilarities in a geometrical model by embedding the stimuli of interest in some coordinate space so that a specified measure of distance, for example, Euclidean distance, between the points in the space represents the observed proximities. In other words, MDS searches for a low-dimensional space in which each point represents a stimulus and the distance between points corresponds to dissimilarity.
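The idea of a distance-preserving embedding can be sketched as follows. The study uses Kruskal's nonmetric MDS; the sketch below instead shows classical (Torgerson) metric scaling, a simpler variant chosen purely to illustrate how a dissimilarity matrix is turned into coordinates:

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed points in k dimensions so that Euclidean distances
    approximate the dissimilarity matrix D (classical/Torgerson
    scaling). Shown only to illustrate the idea; the paper itself
    applies Kruskal's nonmetric MDS."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]          # keep the top-k components
    scale = np.sqrt(np.maximum(vals[idx], 0))
    return vecs[:, idx] * scale               # n x k coordinate matrix
```

For dissimilarities that are exactly Euclidean, the recovered inter-point distances reproduce the input matrix; for agreement data the embedding is approximate, and clusters of raters appear as groups of nearby points.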

In this study, we recruited eleven TCM practitioners aged 29 to 47. A total of 15 tongue pictures, taken by the Automatic Tongue Diagnosis System (ATDS), which was developed to extract tongue features to assist clinical diagnosis, were randomly chosen.

For each of these fifteen tongue images, the recruited TCM practitioners have to identify the patterns according to the Eight Principles. The eight principal syndromes comprise four pairs of opposites: Yin and Yang, Cold and Hot, Empty and Full (or Deficiency and Excess), and Exterior and Interior. A symptom or disease can possess several of these properties simultaneously.

2. Method and Results

2.1. Patients and TCM Tongue Inspectors

Fifteen pictures of tongues are randomly selected from the archive of the Department of TCM, Changhua Christian Hospital (CCH). The pictures were taken by a digital image capturing and analyzing system called ATDS and were rated by eleven TCM practitioners with ages ranging from 29 to 47. The recruited TCM physicians have to classify each image, based on the Eight Principles, according to the features revealed by the tongues.

2.2. Statistical Analysis

In this study we use four dissimilarity measures to conduct a nonmetric MDS, which was first proposed by Kruskal [11, 12]. The four measures are Kupper and Hafner's IAMA (interrater agreement for multiple attributes) [9], the mean character difference (MCD), the index of association (IOA) [10], and the average Cohen's kappa. The IAMA measure is a chance-corrected concordance. Among these four measures, IAMA and Cohen's kappa are similarity measures, while the other two measure dissimilarity. All four measures are described in detail in the Appendix. Table 1 summarizes the patterns of the fifteen patients identified by the eleven TCM physicians of CCH according to the Eight Principles. The letters in the body of the table refer to specific TCM physicians. Table 2 lists the dissimilarities obtained by IAMA among the TCM physicians. For example, the interrater agreement between rater A and rater C is 0.2462; therefore, the dissimilarity can be defined by 1 − 0.2462 = 0.7538. Naturally, the diagonal entries are identically zero. The MDS graphs of the agreement measures for the four approaches are illustrated in Figure 1. The upper-left graph uses the IAMA measure to conduct MDS, the upper-right one corresponds to the MCD method, the lower-left one represents the IOA method, and the lower-right one employs the average Cohen's kappa over the attributes of the eight patterns between two distinct raters.
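The conversion from similarity to dissimilarity used above (dissimilarity = 1 − agreement, with a zero diagonal) can be sketched as follows. The rater labels and the single agreement value come from the example in the text; the helper itself is a hypothetical illustration:

```python
def dissimilarity_matrix(raters, agreement):
    """Build a symmetric dissimilarity matrix d = 1 - agreement with a
    zero diagonal. `agreement` maps an unordered rater pair (a
    frozenset of two labels) to its agreement coefficient."""
    n = len(raters)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = agreement[frozenset((raters[i], raters[j]))]
            D[i][j] = D[j][i] = 1.0 - s   # similarity -> dissimilarity
    return D
```

A matrix of this form, built from any of the two similarity measures (IAMA or average Cohen's kappa), is what the nonmetric MDS procedure takes as input; the MCD and IOA measures are dissimilarities already and need no conversion.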

Table 1: A summary of the patterns of the fifteen patients identified by the eleven TCM physicians. Letters in the table entries correspond to the different participating TCM physicians.
Table 2: Dissimilarities obtained by IAMA among the TCM physicians.
Figure 1: MDS graphs for multiple attributes of Eight Principles for 11 TCM practitioners and 15 patients.

3. Results

We summarize the diagnoses of the patterns of the fifteen patients in Table 1. Using the four measures mentioned previously, MDS analysis can be conducted on the derived similarity or dissimilarity measures. Figure 1 shows that the MDS graphs by IAMA and Cohen's kappa are similar. Rater C is an outlier in all four graphs. In addition, the graphs by IAMA and Cohen's kappa share some characteristics. First, raters I and F lie a little away from the biggest cluster, formed by raters B, D, E, G, J, and K. Second, raters A and H form a small cluster. Traditional MDS distances using MCD or IOA lead to similar results: from Figure 1, raters C, I, H, and A are isolated singletons, and there exists only one cluster, formed by raters B, D, E, F, G, J, and K. In all four graphs, raters B, D, E, G, J, and K form a cluster.

4. Conclusion

In TCM diagnostics, practitioners are routinely confronted with a multiple-dimensional qualitative problem of symptom identification. Conventionally, a diagnosis according to the Eight Principles summarizes the dynamics of a patient pursuing TCM treatment. When a TCM practitioner receives the information gathered by way of the four diagnostics, "inspection, listening (smelling), inquiry, and palpation," he has to distinguish the patterns that are coherent with the symptoms exhibited by the patient. Therefore, how to measure the agreement of diagnoses according to the vector attributes observed by TCM practitioners is an important issue.

For a single attribute, researchers typically adopt Cohen's kappa, Fleiss' kappa, or Krippendorff's alpha to obtain a single-valued agreement measure. These popular agreement measures share a drawback: there is no rule of thumb for judging the level of agreement. In this study, we introduce a novel approach to deriving interrater agreement, including the IAMA proposed by Kupper and Hafner and the average Cohen's kappa, to calculate dissimilarities between any pair of raters. Using the dissimilarity measures, an MDS analysis can be conducted and an agreement graph subsequently obtained. Figure 1 shows that rater C remains an outlier for all four methods. This might be because his diagnoses include many "mixture" patterns, for example, "Yin" mixed with "Yang" or "Cold" mixed with "Hot." Rater C is a senior TCM physician in the Department of TCM of CCH and has very long research experience. Moreover, raters A and H are not only TCM practitioners in CCH but have also participated actively in advanced TCM studies for many years. From these analyses, beyond agreement, we can distinguish the raters by clusters. As mentioned in the Introduction, a conventional single agreement value is quite restricted in terms of successfully interpreting the meaning hidden underneath: it cannot judge whether a given "moderate" agreement coefficient is sufficient to quantify the reliability of TCM diagnostics. If clusters are latently present among the raters, MDS can prove itself an effective distinguisher.


A. IAMA Responses Proposed by Kupper and Hafner

Consider a study in which two equally trained raters, say raters A and B, independently examine each of $N$ units. Let $A_i$ denote the subset of attributes chosen by rater A for the $i$th unit, and let $\mathrm{card}(A_i) = a_i$, $0 \le a_i \le k$, denote the cardinality of the set $A_i$; the set $B_i$ and its cardinality $b_i$ are defined analogously for rater B. The symbol $\bar{A}$ stands for the complement of a set $A$, and $X_i = \mathrm{card}(A_i \cap B_i)$ denotes the number of attributes chosen by both raters for the $i$th unit.

Define the random variable
$$Y_i = \mathrm{card}(A_i \cap B_i) + \mathrm{card}(\bar{A}_i \cap \bar{B}_i) = k - a_i - b_i + 2X_i \tag{A.1}$$
to be the number of attributes for the $i$th unit either chosen by both raters or chosen by neither rater. Define the agreement proportion
$$\pi_i = \frac{Y_i}{k}, \tag{A.2}$$
the overall concordance
$$\bar{\pi} = \frac{1}{N} \sum_{i=1}^{N} \pi_i, \tag{A.3}$$
and the chance-corrected concordance
$$\pi_{AB} = \frac{\bar{\pi} - \pi_0}{1 - \pi_0}, \quad \text{where } \pi_0 = \frac{1}{Nk} \sum_{i=1}^{N} \min(a_i, b_i). \tag{A.4}$$
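A direct implementation of (A.1)–(A.4) can be sketched as follows, assuming the reconstruction of $\pi_0$ above; each rater's choices per unit are represented as Python sets of attribute labels:

```python
def iama(A_sets, B_sets, k):
    """Kupper-Hafner chance-corrected concordance (IAMA) for two raters
    who each choose a subset of k attributes for every unit."""
    N = len(A_sets)
    # pi_i: attributes chosen by both raters or by neither, over k (A.1-A.2)
    pis = []
    for A, B in zip(A_sets, B_sets):
        y = len(A & B) + (k - len(A | B))
        pis.append(y / k)
    pi_bar = sum(pis) / N                      # overall concordance (A.3)
    # chance-expected concordance from the subset sizes (A.4)
    pi0 = sum(min(len(A), len(B)) for A, B in zip(A_sets, B_sets)) / (N * k)
    return (pi_bar - pi0) / (1 - pi0)
```

Note that $\mathrm{card}(\bar{A}_i \cap \bar{B}_i) = k - \mathrm{card}(A_i \cup B_i)$, which is how the code obtains $Y_i$ without enumerating the full attribute universe.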

B. MCD and IOA Differences

The mean character difference (MCD) and index of association (IOA) are popular distances used in MDS analysis. Let $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ be two vectors of attributes. The MCD distance is defined as
$$d(x, y) = \frac{1}{n} \sum_{i=1}^{n} |x_i - y_i|, \tag{B.1}$$
and the IOA distance is defined by
$$d(x, y) = \frac{1}{2} \sum_{i=1}^{n} \left| \frac{x_i}{\sum_{j=1}^{n} x_j} - \frac{y_i}{\sum_{j=1}^{n} y_j} \right|. \tag{B.2}$$
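Both distances translate directly into code; the following sketch implements (B.1) and (B.2) for plain numeric sequences:

```python
def mcd(x, y):
    """Mean character difference (B.1): the average absolute
    coordinate-wise difference between two attribute vectors."""
    n = len(x)
    return sum(abs(a - b) for a, b in zip(x, y)) / n

def ioa(x, y):
    """Index of association (B.2): half the L1 distance between the
    two vectors after each is normalised to sum to one, so that only
    the relative profile of the attributes matters."""
    sx, sy = sum(x), sum(y)
    return 0.5 * sum(abs(a / sx - b / sy) for a, b in zip(x, y))
```

Because of the normalisation, IOA is insensitive to a rater who systematically scores higher than another but preserves the same relative profile, whereas MCD penalises such a shift.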

Conflict of Interests

No competing financial interests exist.


Acknowledgment

The authors are grateful to the anonymous reviewers for their valuable suggestions.


References

  1. L. A. Goodman and W. H. Kruskal, “Measures of association for cross classifications,” Journal of the American Statistical Association, vol. 49, pp. 732–764, 1954.
  2. J. Cohen, “A coefficient of agreement for nominal scales,” Educational and Psychological Measurement, vol. 20, no. 1, pp. 37–46, 1960.
  3. J. Cohen, “Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit,” Psychological Bulletin, vol. 70, no. 4, pp. 213–220, 1968.
  4. J. L. Fleiss and J. Cohen, “The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability,” Educational and Psychological Measurement, vol. 33, pp. 613–619, 1973.
  5. J. L. Fleiss, “Measuring nominal scale agreement among many raters,” Psychological Bulletin, vol. 76, no. 5, pp. 378–382, 1971.
  6. K. Krippendorff, “Estimating the reliability, systematic error, and random error of interval data,” Educational and Psychological Measurement, vol. 30, no. 1, pp. 61–70, 1970.
  7. K. Krippendorff, “Quantitative guidelines for communicable disease control programs,” Biometrics, vol. 34, no. 1, p. 142, 1978.
  8. L. C. Lo, T. L. Cheng, Y. C. Huang, Y. L. Chen, and J. T. Wang, “Analysis of agreement on traditional Chinese medical diagnostics for many practitioners,” Evidence-Based Complementary and Alternative Medicine, vol. 2012, Article ID 17801, 5 pages, 2012.
  9. L. L. Kupper and K. B. Hafner, “On assessing interrater agreement for multiple attribute responses,” Biometrics, vol. 45, no. 3, pp. 957–967, 1989.
  10. B. S. Everitt, S. Landau, M. Leese, and D. Stahl, Cluster Analysis, Wiley Series in Probability and Statistics, John Wiley & Sons, Chichester, UK, 2011.
  11. J. B. Kruskal, “Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis,” Psychometrika, vol. 29, no. 1, pp. 1–27, 1964.
  12. J. B. Kruskal, “Nonmetric multidimensional scaling: a numerical method,” Psychometrika, vol. 29, no. 2, pp. 115–129, 1964.