Journal of Mathematics
Volume 2014, Article ID 231909, 9 pages
http://dx.doi.org/10.1155/2014/231909
Research Article

Power Weighted Versions of Bennett, Alpert, and Goldstein's S

Matthijs J. Warrens

Institute of Psychology, Unit Methodology and Statistics, Leiden University, P.O. Box 9555, 2300 RB Leiden, The Netherlands

Received 30 May 2014; Accepted 7 September 2014; Published 3 December 2014

Academic Editor: Yonghui Sun

Copyright © 2014 Matthijs J. Warrens. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References

  1. M. A. Tanner and M. A. Young, “Modeling ordinal scale disagreement,” Psychological Bulletin, vol. 98, no. 2, pp. 408–415, 1985.
  2. A. Agresti, “A model for agreement between ratings on an ordinal scale,” Biometrics, vol. 44, no. 2, pp. 539–548, 1988.
  3. A. Agresti, Analysis of Ordinal Categorical Data, Wiley, Hoboken, NJ, USA, 2nd edition, 2010.
  4. M. P. Becker, “Using association models to analyse agreement data: two examples,” Statistics in Medicine, vol. 8, no. 10, pp. 1199–1207, 1989.
  5. P. Graham and R. Jackson, “The analysis of ordinal agreement data: beyond weighted kappa,” Journal of Clinical Epidemiology, vol. 46, no. 9, pp. 1055–1062, 1993.
  6. J. Cohen, “Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit,” Psychological Bulletin, vol. 70, no. 4, pp. 213–220, 1968.
  7. M. Maclure and W. C. Willett, “Misinterpretation and misuse of the kappa statistic,” The American Journal of Epidemiology, vol. 126, no. 2, pp. 161–169, 1987.
  8. J. Cohen, “A coefficient of agreement for nominal scales,” Educational and Psychological Measurement, vol. 20, pp. 37–46, 1960.
  9. M. J. Warrens, “Cohen's kappa can always be increased and decreased by combining categories,” Statistical Methodology, vol. 7, no. 6, pp. 673–677, 2010.
  10. L. M. Hsu and R. Field, “Interrater agreement measures: comments on kappaₙ, Cohen's kappa, Scott's π and Aickin's α,” Understanding Statistics, vol. 2, pp. 205–219, 2003.
  11. J. Sim and C. C. Wright, “The kappa statistic in reliability studies: use, interpretation, and sample size requirements,” Physical Therapy, vol. 85, no. 3, pp. 257–268, 2005.
  12. R. L. Brennan and D. J. Prediger, “Coefficient kappa: some uses, misuses, and alternatives,” Educational and Psychological Measurement, vol. 41, pp. 687–699, 1981.
  13. J. S. Uebersax, “Diversity of decision-making models and the measurement of interrater agreement,” Psychological Bulletin, vol. 101, no. 1, pp. 140–146, 1987.
  14. A. R. Feinstein and D. V. Cicchetti, “High agreement but low kappa: I. The problems of two paradoxes,” Journal of Clinical Epidemiology, vol. 43, no. 6, pp. 543–549, 1990.
  15. C. A. Lantz and E. Nebenzahl, “Behavior and interpretation of the κ statistic: resolution of the two paradoxes,” Journal of Clinical Epidemiology, vol. 49, no. 4, pp. 431–434, 1996.
  16. J. de Mast and W. N. van Wieringen, “Measurement system analysis for categorical measurements: agreement and kappa-type indices,” Journal of Quality Technology, vol. 39, pp. 191–202, 2007.
  17. J. de Mast, “Agreement and kappa-type indices,” The American Statistician, vol. 61, no. 2, pp. 148–153, 2007.
  18. W. D. Thompson and S. D. Walter, “A reappraisal of the kappa coefficient,” Journal of Clinical Epidemiology, vol. 41, no. 10, pp. 949–958, 1988.
  19. W. Vach, “The dependence of Cohen's kappa on the prevalence does not matter,” Journal of Clinical Epidemiology, vol. 58, no. 7, pp. 655–661, 2005.
  20. A. von Eye and M. von Eye, “On the marginal dependency of Cohen's κ,” European Psychologist, vol. 13, no. 4, pp. 305–315, 2008.
  21. H. Brenner and U. Kliebsch, “Dependence of weighted kappa coefficients on the number of categories,” Epidemiology, vol. 7, no. 2, pp. 199–202, 1996.
  22. M. J. Warrens, “Some paradoxical results for the quadratically weighted kappa,” Psychometrika, vol. 77, no. 2, pp. 315–323, 2012.
  23. E. M. Bennett, R. Alpert, and A. C. Goldstein, “Communications through limited-response questioning,” Public Opinion Quarterly, vol. 18, no. 3, pp. 303–308, 1954.
  24. U. N. Umesh, R. A. Peterson, and M. T. Sauber, “Interjudge agreement and the maximum value of kappa,” Educational and Psychological Measurement, vol. 49, pp. 835–850, 1989.
  25. G. J. Meyer, “Assessing reliability: critical corrections for a critical examination of the Rorschach comprehensive system,” Psychological Assessment, vol. 9, no. 4, pp. 480–489, 1997.
  26. M. J. Warrens, “The effect of combining categories on Bennett, Alpert and Goldstein's S,” Statistical Methodology, vol. 9, no. 3, pp. 341–352, 2012.
  27. J. J. Randolph, “Free-marginal multirater kappa (multirater κfree): an alternative to Fleiss' fixed-marginal multirater kappa,” in Proceedings of the Joensuu Learning and Instruction Symposium, Joensuu, Finland, 2005.
  28. S. Janson and J. Vegelius, “On generalizations of the G index and the Phi coefficient to nominal scales,” Multivariate Behavioral Research, vol. 14, no. 2, pp. 255–269, 1979.
  29. C. L. Janes, “An extension of the random error coefficient of agreement to n×n tables,” The British Journal of Psychiatry, vol. 134, no. 6, pp. 617–619, 1979.
  30. J. W. Holley and J. P. Guilford, “A note on the G index of agreement,” Educational and Psychological Measurement, vol. 24, no. 4, pp. 749–753, 1964.
  31. A. E. Maxwell, “Coefficients of agreement between observers and their interpretation,” British Journal of Psychiatry, vol. 116, pp. 651–655, 1977.
  32. K. Krippendorff, “Association, agreement, and equity,” Quality and Quantity, vol. 21, no. 2, pp. 109–123, 1987.
  33. K. L. Gwet, Handbook of Inter-Rater Reliability, Advanced Analytics, Gaithersburg, Md, USA, 2012.
  34. D. Cicchetti and T. Allison, “A new procedure for assessing reliability of scoring EEG sleep recordings,” The American Journal of EEG Technology, vol. 11, pp. 101–110, 1971.
  35. S. Vanbelle and A. Albert, “A note on the linearly weighted kappa coefficient for ordinal scales,” Statistical Methodology, vol. 6, no. 2, pp. 157–163, 2009.
  36. M. J. Warrens, “Cohen's linearly weighted kappa is a weighted average,” Advances in Data Analysis and Classification, vol. 6, no. 1, pp. 67–79, 2012.
  37. J. L. Fleiss and J. Cohen, “The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability,” Educational and Psychological Measurement, vol. 33, pp. 613–619, 1973.
  38. C. Schuster, “A note on the interpretation of weighted kappa and its relations to other rater agreement statistics for metric scales,” Educational and Psychological Measurement, vol. 64, no. 2, pp. 243–253, 2004.
  39. A. Agresti, Categorical Data Analysis, John Wiley & Sons, 1990.
  40. Y. M. M. Bishop, S. E. Fienberg, and P. W. Holland, Discrete Multivariate Analysis: Theory and Practice, MIT Press, Cambridge, Mass, USA, 1975.
  41. N. D. Holmquist, C. A. McMahan, and E. O. Williams, “Variability in classification of carcinoma in situ of the uterine cervix,” Obstetrical & Gynecological Survey, vol. 23, pp. 580–585, 1967.
  42. J. R. Landis and G. G. Koch, “An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers,” Biometrics, vol. 33, pp. 363–374, 1977.
  43. A. F. Beardon, “Sums of powers of integers,” The American Mathematical Monthly, vol. 103, no. 3, pp. 201–213, 1996.
  44. J. R. Landis and G. G. Koch, “The measurement of observer agreement for categorical data,” Biometrics, vol. 33, pp. 159–174, 1977.
  45. D. V. Cicchetti and S. S. Sparrow, “Developing criteria for establishing interrater reliability of specific items: applications to assessment of adaptive behavior,” American Journal of Mental Deficiency, vol. 86, no. 2, pp. 127–137, 1981.
  46. P. E. Crewson, “Reader agreement studies,” American Journal of Roentgenology, vol. 184, no. 5, pp. 1391–1397, 2005.
  47. J. L. Fleiss, B. Levin, and M. C. Paik, Statistical Methods for Rates and Proportions, Wiley-Interscience, New York, NY, USA, 3rd edition, 2003.
  48. M. J. Warrens, “Conditional inequalities between Cohen's kappa and weighted kappas,” Statistical Methodology, vol. 10, pp. 14–22, 2013.
  49. C. S. Martin, N. K. Pollock, O. G. Bukstein, and K. G. Lynch, “Inter-rater reliability of the SCID alcohol and substance use disorders section among adolescents,” Drug and Alcohol Dependence, vol. 59, no. 2, pp. 173–176, 2000.
  50. J. S. Simonoff, Analyzing Categorical Data, Springer, New York, NY, USA, 2003.
  51. S. I. Anderson, A. M. Housley, P. A. Jones, J. Slattery, and J. D. Miller, “Glasgow outcome scale: an inter-rater reliability study,” Brain Injury, vol. 7, no. 4, pp. 309–317, 1993.
  52. D. J. Hand, F. Daly, A. D. Lunn, K. J. McConway, and E. Ostrowski, A Handbook of Small Data Sets, Chapman & Hall, London, UK, 1994.
  53. M. Némethy, L. Paroli, P. G. Williams-Russo, and T. J. J. Blanck, “Assessing sedation with regional anesthesia: inter-rater agreement on a modified Wilson sedation scale,” Anesthesia and Analgesia, vol. 94, no. 3, pp. 723–728, 2002.
  54. J. M. Seddon, C. R. Sahagian, R. J. Glynn et al., “Evaluation of an iris color classification system,” Investigative Ophthalmology and Visual Science, vol. 31, no. 8, pp. 1592–1598, 1990.
  55. R. W. Bohannon and M. B. Smith, “Interrater reliability of a modified Ashworth scale of muscle spasticity,” Physical Therapy, vol. 67, no. 2, pp. 206–207, 1987.
  56. V. A. J. Maria and R. M. M. Victorino, “Development and validation of a clinical scale for the diagnosis of drug-induced hepatitis,” Hepatology, vol. 26, no. 3, pp. 664–669, 1997.