Research Article | Open Access
On Agreement Tables with Constant Kappa Values
Kappa coefficients are standard tools for summarizing the information in cross-classifications of two categorical variables with identical categories, here called agreement tables. When two categories are combined the kappa value usually either increases or decreases. There is a class of agreement tables for which the value of Cohen’s kappa remains constant when two categories are combined. It is shown that for this class of tables all special cases of symmetric kappa coincide and that the value of symmetric kappa is not affected by any partitioning of the categories.
1. Introduction
In behavioral and biomedical science researchers are often interested in measuring the intensity of a behavior or a disease. Examples are psychologists who assess how anxious a speech-anxious subject appears while giving a talk, pathologists who rate the severity of lesions from scans, or competing diagnostic devices that classify the extent of a disease in patients into categories. These phenomena are typically classified using a categorical rating system, for example, with categories (A) slight, (B) moderate, and (C) extreme. Because ratings usually entail a certain degree of subjective judgment, researchers frequently want to assess the reliability of the categorical rating system that is used. One way to do this is to assign two observers to rate the same set of subjects independently. The reliability of the rating system can then be assessed by analyzing the agreement between the observers. High agreement between the ratings can be seen as a good indication of consensus in the diagnosis and of interchangeability of the ratings of the observers.
Various statistical methodologies have been developed for analyzing agreement of a categorical rating system [1, 2]. For instance, loglinear models can be used for studying the patterns of agreement and sources of disagreement [3, 4]. However, in practice researchers often want to express the agreement between the raters in a single number. In this context, the standard tools for summarizing agreement between observers are Cohen’s kappa in the case of nominal categories [5–7] and weighted kappa in the case of ordinal categories [8–11]. With ordinal categories one may expect more disagreement or confusion on adjacent categories than on categories that are further apart. Weighted kappa allows the user to specify weights to describe the closeness between categories. Both Cohen’s kappa and weighted kappa are corrected for agreement due to chance. The coefficients were originally proposed in the context of agreement studies, but nowadays they are used for summarizing all kinds of cross-classifications of two variables with the same categories [11, 12].
In many practical applications the number of categories in a rating system ranges from the minimum of two to about five. It is sometimes desirable to combine some of the categories. For example, when two categories are easily confused, combining the categories usually improves the reliability of the rating system. Collapsing categories reduces the number of categories of the rating system. If there is a lot of disagreement between two categories, we expect the kappa value to increase if we combine the categories. This is usually the case. However, Schouten [13] showed that there is a class of agreement tables for which the value of Cohen’s kappa remains constant when categories are merged. This is not what one expects from an agreement coefficient like Cohen’s kappa. The question then arises: do other (weighted) kappa coefficients exhibit the same property for these tables? If the answer is negative, it would make sense to replace Cohen’s kappa by a weighted kappa with more favorable properties with regard to these agreement tables.
In this paper we present several properties of kappa coefficients with symmetric weighting schemes with respect to this particular class of agreement tables. The paper is organized as follows. In the next section we introduce notation, define weighted kappa, and discuss some of its special cases, including Cohen’s kappa. The results are presented in Section 3. Section 4 contains a conclusion.
2. Kappa Coefficients
In this section we introduce notation and define the kappa coefficients. For notational convenience weighted kappa is here defined in terms of dissimilarity scaling. If the weights are dissimilarities, pairs of categories that are further apart are assigned higher weights.
Suppose two fixed observers independently rate the same set of subjects using the same set of $c$ categories that are defined in advance. For a population of subjects, let $\pi_{ij}$ denote the proportion classified in category $i$ by the first observer and in category $j$ by the second observer, where $i$, $j \in \{1, 2, \ldots, c\}$. The quantities
$$\pi_{i+} = \sum_{j=1}^{c} \pi_{ij}, \qquad \pi_{+i} = \sum_{j=1}^{c} \pi_{ji} \tag{1}$$
are the marginal probabilities. They reflect how often the observers used the categories. The cell probabilities of the square table $\{\pi_{ij}\}$ are not directly observed. Let $\{n_{ij}\}$ denote the contingency table of observed frequencies. Assuming a multinomial sampling model with the total number of subjects $n$ fixed, the maximum likelihood estimate of $\pi_{ij}$ is given by $\hat{\pi}_{ij} = n_{ij}/n$ [14, 15]. Since the rows and columns of $\{n_{ij}\}$ have the same labels, the contingency table is usually called an agreement table. Table 1 presents two hypothetical agreement tables with three categories A, B, and C.
Let $w_{ij}$ for $i$, $j \in \{1, 2, \ldots, c\}$ be nonnegative real numbers with $w_{ii} = 0$. The weighted kappa coefficient can be defined as [8, 12]
$$\kappa_w = 1 - \frac{\sum_{i=1}^{c}\sum_{j=1}^{c} w_{ij}\pi_{ij}}{\sum_{i=1}^{c}\sum_{j=1}^{c} w_{ij}\pi_{i+}\pi_{+j}}. \tag{2}$$
The numerator of the fraction in (2) is the weighted observed disagreement, while the denominator of the fraction is the weighted chance-expected disagreement. The value of (2) is 1 when there is perfect agreement between the two observers, zero when the weighted observed disagreement is equal to the weighted chance-expected disagreement, and negative when the weighted observed disagreement is larger than the weighted chance-expected disagreement.
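Under a multinomial sampling model with $n$ fixed, the maximum likelihood estimate of (2) is
$$\hat{\kappa}_w = 1 - \frac{\sum_{i=1}^{c}\sum_{j=1}^{c} w_{ij}\hat{\pi}_{ij}}{\sum_{i=1}^{c}\sum_{j=1}^{c} w_{ij}\hat{\pi}_{i+}\hat{\pi}_{+j}}. \tag{3}$$
Estimate (3) is obtained by substituting the estimates $\hat{\pi}_{ij} = n_{ij}/n$ for the cell probabilities $\pi_{ij}$ in (2). A large sample standard error of (3) can be found in [16].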
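As an illustration of estimate (3), the following Python sketch computes weighted kappa from a square table of observed counts and a matrix of dissimilarity weights. The function name `weighted_kappa`, the example counts, and the linear weights are assumptions made for this illustration and are not taken from the paper. The sketch presumes the usual form of weighted kappa: one minus the ratio of the weighted observed disagreement to the weighted chance-expected disagreement.

```python
def weighted_kappa(table, weights):
    """Estimate weighted kappa from a square table of counts.

    `weights` holds dissimilarities: zero on the diagonal, larger values
    for pairs of categories that are further apart.
    """
    c = len(table)
    n = sum(sum(row) for row in table)
    # Estimated cell probabilities n_ij / n and their marginals.
    p = [[table[i][j] / n for j in range(c)] for i in range(c)]
    row = [sum(p[i]) for i in range(c)]
    col = [sum(p[i][j] for i in range(c)) for j in range(c)]
    observed = sum(weights[i][j] * p[i][j]
                   for i in range(c) for j in range(c))
    expected = sum(weights[i][j] * row[i] * col[j]
                   for i in range(c) for j in range(c))
    return 1.0 - observed / expected


# Hypothetical 3x3 table of counts (not one of the paper's tables).
counts = [[20, 5, 1],
          [4, 15, 6],
          [2, 3, 24]]
# Linear dissimilarity weights |i - j|, chosen only as an example.
linear = [[abs(i - j) for j in range(3)] for i in range(3)]
kappa_linear = weighted_kappa(counts, linear)
print(round(kappa_linear, 3))
```

With a table that has all counts on the diagonal the function returns 1, and with a table whose cells equal the products of the marginals it returns 0, in line with the interpretation of (2) given above.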
In this paper we are interested in the following special case of (2). We may require that weighted kappa has a symmetric weighting scheme; that is, $w_{ij} = w_{ji}$ for $i$, $j \in \{1, 2, \ldots, c\}$. Since $w_{ii} = 0$ for $i \in \{1, 2, \ldots, c\}$, this symmetric kappa is given by
$$\kappa_s = 1 - \frac{\sum_{i=1}^{c-1}\sum_{j=i+1}^{c} w_{ij}(\pi_{ij} + \pi_{ji})}{\sum_{i=1}^{c-1}\sum_{j=i+1}^{c} w_{ij}(\pi_{i+}\pi_{+j} + \pi_{j+}\pi_{+i})}. \tag{4}$$
Special cases of coefficient (4) that are used in practice are Cohen’s kappa [5, 7, 12] for nominal categories and linear kappa [10, 17] and quadratic kappa [9, 11, 18] for ordinal categories. Cohen’s kappa and quadratic kappa each have been used in thousands of applications [6, 11, 19]. The two coefficients are briefly discussed below.
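The identity weights are defined as
$$w_{ij} = \begin{cases} 0 & \text{if } i = j, \\ 1 & \text{if } i \neq j. \end{cases} \tag{5}$$
An example of weighting scheme (5) is presented in the left panel of Table 2. If we use weighting scheme (5) in (2), we obtain Cohen’s unweighted kappa
$$\kappa = 1 - \frac{\sum_{i=1}^{c-1}\sum_{j=i+1}^{c} (\pi_{ij} + \pi_{ji})}{\sum_{i=1}^{c-1}\sum_{j=i+1}^{c} (\pi_{i+}\pi_{+j} + \pi_{j+}\pi_{+i})}. \tag{6}$$
Perhaps a more familiar definition of Cohen’s kappa is
$$\kappa = \frac{\sum_{i=1}^{c} \pi_{ii} - \sum_{i=1}^{c} \pi_{i+}\pi_{+i}}{1 - \sum_{i=1}^{c} \pi_{i+}\pi_{+i}}. \tag{7}$$
Formulas (6) and (7) are equivalent; definition (6) will be used in Section 3 below. Coefficient (6) has value 1 when the observers agree completely, value zero when agreement is equal to that expected under independence, and negative value when agreement is less than expected by chance.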
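The equivalence of the disagreement form and the agreement form of Cohen’s kappa can be checked numerically. The sketch below is an illustration under assumptions: the function names and the table of counts are hypothetical, and the two functions implement the standard textbook forms (one minus observed over expected disagreement, and observed minus expected agreement over one minus expected agreement).

```python
def marginals(p):
    """Row and column totals of a square table of proportions."""
    c = len(p)
    row = [sum(p[i]) for i in range(c)]
    col = [sum(p[i][j] for i in range(c)) for j in range(c)]
    return row, col


def cohen_kappa_disagreement(p):
    """Cohen's kappa as one minus observed over expected disagreement."""
    c = len(p)
    row, col = marginals(p)
    observed = sum(p[i][j] for i in range(c) for j in range(c) if i != j)
    expected = sum(row[i] * col[j]
                   for i in range(c) for j in range(c) if i != j)
    return 1.0 - observed / expected


def cohen_kappa_agreement(p):
    """Cohen's kappa from observed and chance-expected agreement."""
    c = len(p)
    row, col = marginals(p)
    observed = sum(p[i][i] for i in range(c))
    expected = sum(row[i] * col[i] for i in range(c))
    return (observed - expected) / (1.0 - expected)


# Hypothetical table of counts (not one of the paper's tables).
counts = [[20, 5, 1],
          [4, 15, 6],
          [2, 3, 24]]
n = sum(sum(r) for r in counts)
proportions = [[x / n for x in r] for r in counts]
k_disagree = cohen_kappa_disagreement(proportions)
k_agree = cohen_kappa_agreement(proportions)
print(round(k_disagree, 3), round(k_agree, 3))
```

For this table both forms give a value of about 0.605. The two agree because the off-diagonal proportions sum to one minus the diagonal proportions, and the same holds for the products of the marginals.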
The quadratic weights are defined as $w_{ij} = (i - j)^2$ for $i$, $j \in \{1, 2, \ldots, c\}$. An example of the weights is presented in the right panel of Table 2. If we use the quadratic weights in (2), we obtain the quadratic kappa [9, 18]
$$\kappa_q = 1 - \frac{\sum_{i=1}^{c}\sum_{j=1}^{c} (i - j)^2 \pi_{ij}}{\sum_{i=1}^{c}\sum_{j=1}^{c} (i - j)^2 \pi_{i+}\pi_{+j}}. \tag{8}$$
Coefficient (8) is the most popular version of weighted kappa in the case that the categories of the rating system are ordinal [2, 11, 19]. The quadratic kappa can be interpreted as an intraclass correlation, which is a proportion of variance [9, 18]. However, the quadratic kappa is not always sensitive to differences in exact agreement, and high values of the quadratic kappa can be found even when the level of exact agreement is low.
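The remark that quadratic kappa can be high while exact agreement is low can be illustrated with a small constructed example. The table below is hypothetical (it is not from the paper): only 40% of the ratings fall on the diagonal, and every disagreement is between adjacent categories. The helper names are likewise assumptions made for this sketch.

```python
def identity_weights(c):
    """Weights 0 on the diagonal and 1 elsewhere (Cohen's kappa)."""
    return [[0 if i == j else 1 for j in range(c)] for i in range(c)]


def quadratic_weights(c):
    """Squared distance between category indices."""
    return [[(i - j) ** 2 for j in range(c)] for i in range(c)]


def weighted_kappa(p, w):
    """Weighted kappa for a table of proportions and dissimilarity weights."""
    c = len(p)
    row = [sum(p[i]) for i in range(c)]
    col = [sum(p[i][j] for i in range(c)) for j in range(c)]
    observed = sum(w[i][j] * p[i][j] for i in range(c) for j in range(c))
    expected = sum(w[i][j] * row[i] * col[j]
                   for i in range(c) for j in range(c))
    return 1.0 - observed / expected


# Hypothetical proportions: exact agreement 0.4, all disagreement adjacent.
p = [[0.1, 0.1, 0.0, 0.0],
     [0.1, 0.1, 0.1, 0.0],
     [0.0, 0.1, 0.1, 0.1],
     [0.0, 0.0, 0.1, 0.1]]
kappa_cohen = weighted_kappa(p, identity_weights(4))
kappa_quadratic = weighted_kappa(p, quadratic_weights(4))
print(round(kappa_cohen, 2), round(kappa_quadratic, 2))
```

For this table Cohen’s kappa is about 0.19 while the quadratic kappa is about 0.71, because the quadratic weights penalize adjacent disagreements only lightly relative to the disagreement expected by chance.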
3. A Class of Agreement Tables
It is sometimes desirable to combine some of the categories. For example, when two categories are frequently confused, combining the categories may improve the reliability of the rating system. Suppose we combine two categories $i$ and $j$, and let $r$ be a nonnegative real number. In this paper we focus on the class of agreement tables that satisfy the condition
$$\frac{\pi_{ij} + \pi_{ji}}{\pi_{i+}\pi_{+j} + \pi_{j+}\pi_{+i}} = r \quad \text{for all } i < j. \tag{9}$$
Condition (9) holds, for example, if there is perfect agreement between the raters. In this case $r = 0$ and we have $\pi_{ii} = \pi_{i+} = \pi_{+i}$ and $\pi_{ij} = 0$ for $i \neq j$ and $i$, $j \in \{1, 2, \ldots, c\}$. It turns out that there are many nonperfect agreement tables that also satisfy (9). Examples are the agreement tables in Table 1. For the two tables, the value of is .397 and .644, respectively. The examples in Table 1 show that agreement tables that satisfy (9) are not necessarily symmetric. Furthermore, since the examples appear to be ordinary agreement tables that can be encountered in practice, it appears that the class of agreement tables satisfying (9) is not trivial.
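A simple way to produce a nonperfect table in this class is sketched below. It assumes that condition (9) requires the ratio of observed to chance-expected disagreement to take the same value $r$ for every pair of categories. The construction, the function names `mixture_table` and `pairwise_ratios`, and the numbers are all assumptions made for this illustration; the construction yields symmetric tables only, whereas the paper notes that tables in the class need not be symmetric.

```python
def mixture_table(p, r):
    """Table with pi_ij = r * p_i * p_j off the diagonal.

    The diagonal is set to (1 - r) * p_i + r * p_i**2, so both
    marginal distributions equal p (requires 0 <= r <= 1).
    """
    c = len(p)
    return [[r * p[i] * p[j] + ((1 - r) * p[i] if i == j else 0.0)
             for j in range(c)] for i in range(c)]


def pairwise_ratios(pi):
    """Observed over chance-expected disagreement for every pair i < j."""
    c = len(pi)
    row = [sum(pi[i]) for i in range(c)]
    col = [sum(pi[i][j] for i in range(c)) for j in range(c)]
    return {(i, j): (pi[i][j] + pi[j][i])
                    / (row[i] * col[j] + row[j] * col[i])
            for i in range(c) for j in range(i + 1, c)}


# Hypothetical marginal distribution and ratio.
table = mixture_table([0.5, 0.3, 0.2], 0.4)
ratios = pairwise_ratios(table)
for pair in sorted(ratios):
    print(pair, round(ratios[pair], 6))
```

Every pair of categories returns the same ratio, here 0.4, so the constructed table satisfies the condition without being a table of perfect agreement.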
For Cohen’s kappa in (6) Schouten [13] showed that if (9) holds, then the kappa value cannot be increased or decreased by combining categories. In this section we present various additional results for other special cases of symmetric kappa in (4). Theorem 1 shows that all special cases of symmetric kappa coincide if (9) holds.
Theorem 1. If (9) holds, then $\kappa_s = 1 - r$.
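The argument behind Theorem 1 can be sketched in a few lines. The derivation below is a reconstruction rather than the author's own proof. It assumes that condition (9) states that the ratio $(\pi_{ij}+\pi_{ji})/(\pi_{i+}\pi_{+j}+\pi_{j+}\pi_{+i})$ equals $r$ for every pair $i<j$, that symmetric kappa (4) is one minus the ratio of the weighted sums over pairs, and that at least one weight $w_{ij}$ is positive.

```latex
\begin{align*}
\pi_{ij}+\pi_{ji} &= r\,(\pi_{i+}\pi_{+j}+\pi_{j+}\pi_{+i})
  && \text{for all } i<j, \\
\sum_{i<j} w_{ij}\,(\pi_{ij}+\pi_{ji})
  &= r \sum_{i<j} w_{ij}\,(\pi_{i+}\pi_{+j}+\pi_{j+}\pi_{+i})
  && \text{(multiply by } w_{ij} \text{ and sum over pairs)}, \\
\kappa_s &= 1 - \frac{\sum_{i<j} w_{ij}\,(\pi_{ij}+\pi_{ji})}
  {\sum_{i<j} w_{ij}\,(\pi_{i+}\pi_{+j}+\pi_{j+}\pi_{+i})} = 1 - r .
\end{align*}
```

Since the weights cancel, the value $1 - r$ does not depend on the weighting scheme, which is why all special cases of symmetric kappa coincide for these tables.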
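Lemma 2. Let $a$, $b$ and $u$, $v > 0$ be real numbers. One has
$$\frac{a}{u} = \frac{b}{v} \iff \frac{a}{u} = \frac{a + b}{u + v}.$$
Proof. Since $u$ and $v$ are positive numbers, we have $a/u = b/v$ if and only if $av = bu$. Adding $au$ to both sides we obtain $a(u + v) = u(a + b)$ or, equivalently, $a/u = (a + b)/(u + v)$.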
Theorem 3. If all special cases of symmetric kappa are equal, then (9) holds.
Proof. Let with be arbitrary categories. Let denote the value of the special case of symmetric kappa with and all other off-diagonal weights equal to 1. Since all special cases of symmetric kappa are equal, we have in particular for some real number . Using (6), the identity is equivalent to Since , it follows from application of Lemma 2 to identity (14) and the use of identity (6) that
Note that in the proof of Theorem 3 certain special cases of coefficient (4) are used. Condition (9) will not necessarily hold if two arbitrary special cases of symmetric kappa are equal. We have the following consequences of Theorems 1 and 3.
Corollary 4. It holds that for and , .
Corollary 5. It holds that
Theorem 6. Let $\kappa_s$ denote the value of symmetric kappa of an agreement table with $c$ categories and $\kappa_s^*$ the value of the table that is obtained by combining categories $i$ and $j$. If condition (9) holds, then one has $\kappa_s = \kappa_s^*$.
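Proof. Since (9) holds, it follows from Theorem 1 that $\kappa_s = 1 - r$ for some $r$. Let $\ell$ denote the category that is obtained by merging $i$ and $j$. Let $k$ with $k \neq i$ and $k \neq j$, be an arbitrary category. We have the four relations
$$\pi_{k\ell} = \pi_{ki} + \pi_{kj}, \tag{17a}$$
$$\pi_{\ell k} = \pi_{ik} + \pi_{jk}, \tag{17b}$$
$$\pi_{\ell +} = \pi_{i+} + \pi_{j+}, \tag{17c}$$
$$\pi_{+\ell} = \pi_{+i} + \pi_{+j}. \tag{17d}$$
Furthermore, since (9) holds, we have the identities
$$\frac{\pi_{ik} + \pi_{ki}}{\pi_{i+}\pi_{+k} + \pi_{k+}\pi_{+i}} = r, \tag{18a}$$
$$\frac{\pi_{jk} + \pi_{kj}}{\pi_{j+}\pi_{+k} + \pi_{k+}\pi_{+j}} = r. \tag{18b}$$
Applying Lemma 2 to the identities in (18a) and (18b) we obtain
$$\frac{\pi_{ik} + \pi_{ki} + \pi_{jk} + \pi_{kj}}{\pi_{i+}\pi_{+k} + \pi_{k+}\pi_{+i} + \pi_{j+}\pi_{+k} + \pi_{k+}\pi_{+j}} = r. \tag{19}$$
Moreover, using (17a), (17b), (17c), (17d), and (19), we have
$$\frac{\pi_{k\ell} + \pi_{\ell k}}{\pi_{k+}\pi_{+\ell} + \pi_{\ell +}\pi_{+k}} = r. \tag{20}$$
It follows from identity (20) that condition (9) also holds for the collapsed table. Application of Theorem 1 then yields that $\kappa_s^* = 1 - r$, from which we may conclude that $\kappa_s = \kappa_s^*$.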
Theorem 6 shows that if the value of Cohen’s kappa in (6) remains constant when categories are combined, then the value of symmetric kappa in (4) also remains constant when categories are combined. By repeatedly applying Theorem 6 we obtain the following consequence.
Corollary 7. Let $\kappa_s$ denote the value of symmetric kappa of an agreement table with $c$ categories and $\kappa_s^*$ the value of the collapsed table corresponding to any partitioning of the categories. If (9) holds, then one has $\kappa_s = \kappa_s^*$.
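The invariance under collapsing can be checked numerically. The sketch below is an illustration under assumptions, not the paper's own computation: it builds a hypothetical four-category table in which the ratio of observed to chance-expected disagreement is 0.4 for every pair of categories, draws arbitrary symmetric weights, and collapses the table according to two partitions. All function names, the marginal distribution, and the partitions are chosen for this example only.

```python
import random


def mixture_table(p, r):
    """Table with pi_ij = r * p_i * p_j off the diagonal and marginals p."""
    c = len(p)
    return [[r * p[i] * p[j] + ((1 - r) * p[i] if i == j else 0.0)
             for j in range(c)] for i in range(c)]


def symmetric_kappa(pi, w):
    """One minus weighted observed over weighted expected disagreement."""
    c = len(pi)
    row = [sum(pi[i]) for i in range(c)]
    col = [sum(pi[i][j] for i in range(c)) for j in range(c)]
    pairs = [(i, j) for i in range(c) for j in range(i + 1, c)]
    observed = sum(w[i][j] * (pi[i][j] + pi[j][i]) for i, j in pairs)
    expected = sum(w[i][j] * (row[i] * col[j] + row[j] * col[i])
                   for i, j in pairs)
    return 1.0 - observed / expected


def random_symmetric_weights(c, rng):
    """Arbitrary positive symmetric weights with a zero diagonal."""
    w = [[0.0] * c for _ in range(c)]
    for i in range(c):
        for j in range(i + 1, c):
            w[i][j] = w[j][i] = rng.uniform(0.1, 5.0)
    return w


def collapse(pi, groups):
    """Merge categories; `groups` is a partition of the category indices."""
    return [[sum(pi[i][j] for i in g for j in h) for h in groups]
            for g in groups]


rng = random.Random(0)
full = mixture_table([0.4, 0.3, 0.2, 0.1], 0.4)
merged = collapse(full, [[0, 1], [2], [3]])   # combine two categories
coarse = collapse(full, [[0, 3], [1, 2]])     # a coarser partition
kappa_full = symmetric_kappa(full, random_symmetric_weights(4, rng))
kappa_merged = symmetric_kappa(merged, random_symmetric_weights(3, rng))
kappa_coarse = symmetric_kappa(coarse, random_symmetric_weights(2, rng))
print(round(kappa_full, 6), round(kappa_merged, 6), round(kappa_coarse, 6))
```

All three values equal 0.6, that is, one minus the common ratio, regardless of the randomly drawn weights and of the partition used. A partition into a single group is excluded because the expected disagreement is then zero.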
4. Conclusion
Kappa coefficients are standard tools for summarizing agreement between two observers on a categorical rating scale. The coefficients are nowadays used for summarizing the information in all types of cross-classifications of two variables with the same categories. In the case of nominal categories Cohen’s kappa is a standard tool. In this paper we considered a class of agreement tables for which the value of Cohen’s kappa remains constant when two categories are combined. It was shown that for this class of agreement tables all special cases of symmetric kappa, that is, all kappa coefficients with a symmetric weighting scheme, coincide (Theorem 1). Furthermore, for this class of agreement tables the value of symmetric kappa remains constant when categories are merged (Theorem 6 and Corollary 7).
Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The author thanks an anonymous reviewer for several helpful comments and valuable suggestions on a previous version of the paper. The comments have improved the presentation of the paper. This research is part of Veni project 451-11-026 funded by the Netherlands Organisation for Scientific Research.
References
1. U. Jakobsson and A. Westergren, “Statistical methods for assessing agreement for ordinal data,” Scandinavian Journal of Caring Sciences, vol. 19, no. 4, pp. 427–431, 2005.
2. M. Maclure and W. C. Willett, “Misinterpretation and misuse of the Kappa statistic,” The American Journal of Epidemiology, vol. 126, no. 2, pp. 161–169, 1987.
3. A. Agresti, “Modelling patterns of agreement and disagreement,” Statistical Methods in Medical Research, vol. 1, no. 2, pp. 201–218, 1992.
4. A. Agresti, Categorical Data Analysis, Wiley, Hoboken, NJ, USA, 2002.
5. J. Cohen, “A coefficient of agreement for nominal scales,” Educational and Psychological Measurement, vol. 20, pp. 37–46, 1960.
6. L. M. Hsu and R. Field, “Interrater agreement measures: comments on Kappa$_n$, Cohen's Kappa, Scott's $\pi$, and Aickin's α,” Understanding Statistics, vol. 2, no. 3, pp. 205–219, 2003.
7. M. J. Warrens, “Cohen's kappa can always be increased and decreased by combining categories,” Statistical Methodology, vol. 7, no. 6, pp. 673–677, 2010.
8. J. Cohen, “Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit,” Psychological Bulletin, vol. 70, no. 4, pp. 213–220, 1968.
9. J. L. Fleiss and J. Cohen, “The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability,” Educational and Psychological Measurement, vol. 33, pp. 613–619, 1973.
10. S. Vanbelle and A. Albert, “A note on the linearly weighted kappa coefficient for ordinal scales,” Statistical Methodology, vol. 6, no. 2, pp. 157–163, 2009.
11. M. J. Warrens, “Some paradoxical results for the quadratically weighted kappa,” Psychometrika, vol. 77, no. 2, pp. 315–323, 2012.
12. M. J. Warrens, “Conditional inequalities between Cohen's kappa and weighted kappas,” Statistical Methodology, vol. 10, pp. 14–22, 2013.
13. H. J. A. Schouten, “Nominal scale agreement among observers,” Psychometrika, vol. 51, no. 3, pp. 453–466, 1986.
14. A. Agresti, Categorical Data Analysis, Wiley, New York, NY, USA, 1990.
15. Y. M. M. Bishop, S. E. Fienberg, and P. W. Holland, Discrete Multivariate Analysis: Theory and Practice, The MIT Press, Cambridge, Mass, USA, 1975.
16. J. L. Fleiss, J. Cohen, and B. S. Everitt, “Large sample standard errors of kappa and weighted kappa,” Psychological Bulletin, vol. 72, no. 5, pp. 323–327, 1969.
17. D. Cicchetti and T. Allison, “A new procedure for assessing reliability of scoring EEG sleep recordings,” The American Journal of EEG Technology, vol. 11, pp. 101–109, 1971.
18. C. Schuster, “A note on the interpretation of weighted kappa and its relations to other rater agreement statistics for metric scales,” Educational and Psychological Measurement, vol. 64, no. 2, pp. 243–253, 2004.
19. P. Graham and R. Jackson, “The analysis of ordinal agreement data: beyond weighted kappa,” Journal of Clinical Epidemiology, vol. 46, no. 9, pp. 1055–1062, 1993.
Copyright © 2014 Matthijs J. Warrens. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.