Kappa Statistic for Inter-Rater Agreement

Calculation details are also given in Altman, 1991 (pp. 406-407). The standard error and 95% confidence interval are calculated according to Fleiss et al., 2003. Kappa is also used to compare machine learning performance, but the directed version, known as informedness or Youden's J, is considered more suitable for supervised learning. [20] So far, the discussion has assumed that the majority was correct, that the minority reviewers were wrong in their scores, and that all reviewers made a deliberate choice of rating. Jacob Cohen understood that this assumption could be wrong. Indeed, he explicitly stated that "in the typical situation, there is no criterion of 'correctness' of judgments" (5).
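As a minimal sketch of the calculation referenced above, the following computes Cohen's kappa from a square agreement table, together with the simple large-sample standard error given in Altman (1991) and a 95% confidence interval. The function name and the example table are illustrative; the more elaborate standard error of Fleiss et al. (2003) is not reproduced here.

```python
import math

def cohen_kappa(table):
    """Cohen's kappa from a k-by-k agreement table.

    Rows index rater A's categories, columns rater B's categories.
    Returns (kappa, standard error, 95% confidence interval).
    """
    k = len(table)
    n = sum(sum(row) for row in table)                    # total number of subjects
    p_o = sum(table[i][i] for i in range(k)) / n          # observed agreement
    row = [sum(table[i]) for i in range(k)]               # rater A marginal totals
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]  # rater B marginals
    p_e = sum(row[i] * col[i] for i in range(k)) / n**2   # agreement expected by chance
    kappa = (p_o - p_e) / (1 - p_e)
    # Simple large-sample standard error (as in Altman, 1991)
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    ci = (kappa - 1.96 * se, kappa + 1.96 * se)
    return kappa, se, ci

# Illustrative 2x2 table: two raters classifying 50 subjects as yes/no
kappa, se, ci = cohen_kappa([[20, 5], [10, 15]])
```

For this table the observed agreement is 0.70 and the chance agreement 0.50, giving kappa = 0.40, i.e. agreement 40% of the way from chance to perfect.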