Although it was definitively rejected as an adequate measure of IRR decades ago (Cohen, 1960; Krippendorff, 1980), many researchers continue to report the percentage of ratings on which coders agree as an index of coder agreement. For nominal (categorical) data, this can be expressed as the number of agreements divided by the total number of observations. For ordinal, interval, or ratio data, where close but not perfect agreement may be acceptable, percent agreement is sometimes expressed as the percentage of ratings that fall within a specified interval of one another. Perhaps the biggest criticism of percent agreement is that it is not corrected for agreement that would be expected by chance, and it therefore overestimates the level of agreement. For example, if coders randomly rated 50% of subjects as "depressed" and 50% as "not depressed," regardless of the subjects' actual characteristics, the expected percent agreement would be 50%, even though all of the overlapping ratings would be attributable to chance. If coders instead randomly rated 10% of subjects as depressed and 90% as not depressed, the expected percent agreement would be 82% (0.1 × 0.1 + 0.9 × 0.9), even though this apparently high level of agreement would still be due to chance alone.
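The chance-agreement arithmetic behind these two examples can be sketched in a few lines. This is a minimal illustration, not a published procedure; the rating lists are hypothetical, and `chance_agreement` assumes both coders rate at random with the same marginal proportion of "positive" ratings.

```python
def percent_agreement(ratings_a, ratings_b):
    """Observed percent agreement between two coders on nominal data."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)

def chance_agreement(p_positive):
    """Expected agreement if both coders rate at random, each marking
    the same proportion p_positive of subjects as 'positive'."""
    return p_positive ** 2 + (1 - p_positive) ** 2

# Balanced random ratings: 0.5**2 + 0.5**2 = 0.50
print(round(chance_agreement(0.5), 2))
# Skewed random ratings: 0.1**2 + 0.9**2 = 0.82
print(round(chance_agreement(0.1), 2))
```

The skewed case shows why raw percent agreement is misleading: the rarer a category is, the higher the agreement two coders can reach by chance alone.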

Another approach to agreement (useful when there are only two raters and the scale is continuous) is to calculate the difference between each pair of observations made by the two raters. The mean of these differences is termed the bias, and the reference interval (mean ± 1.96 × standard deviation) is termed the limits of agreement. The limits of agreement provide insight into how much random variation may be influencing the ratings.
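The bias and limits-of-agreement calculation described above can be sketched as follows, using only the standard library. The two raters' scores here are hypothetical, invented purely for illustration.

```python
from statistics import mean, stdev

# Hypothetical continuous scores from two raters on the same six subjects.
rater1 = [10.2, 11.5, 9.8, 12.1, 10.9, 11.0]
rater2 = [10.0, 11.9, 9.5, 12.4, 10.4, 11.3]

# Pairwise differences between the raters' observations.
diffs = [a - b for a, b in zip(rater1, rater2)]

bias = mean(diffs)                  # mean difference = bias
sd = stdev(diffs)                   # standard deviation of the differences
lower = bias - 1.96 * sd            # lower limit of agreement
upper = bias + 1.96 * sd            # upper limit of agreement

print(f"bias = {bias:.3f}, limits of agreement = ({lower:.3f}, {upper:.3f})")
```

A wide interval between the limits signals that random variation between raters is substantial even when the average bias is near zero, as in this example.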

Here, P(a) denotes the observed percent agreement and P(e) the probability of agreement expected by chance. To illustrate the derivation of P(a) and P(e), Table 2 contains hypothetical data from two coders who each assign one of two response options to every subject (e.g., the presence or absence of depression). For the data in Table 2, P(a) corresponds to the observed percent agreement, given by the sum of the diagonal values divided by the total number of subjects: (42 + 37)/100 = 0.79.
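The derivation of P(a) and P(e) from a 2×2 table, and their combination into Cohen's kappa as (P(a) − P(e))/(1 − P(e)), can be sketched as below. Table 2 itself is not reproduced here, so the off-diagonal cells (7 and 14) are hypothetical, chosen only so that the diagonal matches (42 + 37)/100 = 0.79.

```python
# Rows = coder 1's ratings, columns = coder 2's ratings.
#                  coder 2: depressed   not depressed
table = [[42, 7],   # coder 1: depressed
         [14, 37]]  # coder 1: not depressed

n = sum(sum(row) for row in table)                   # total subjects (100)
p_a = (table[0][0] + table[1][1]) / n                # observed agreement, diagonal / n

row_totals = [sum(r) for r in table]                 # coder 1 marginals
col_totals = [table[0][j] + table[1][j] for j in range(2)]  # coder 2 marginals

# Chance agreement: sum over categories of the product of marginal proportions.
p_e = sum(row_totals[i] * col_totals[i] for i in range(2)) / n ** 2

kappa = (p_a - p_e) / (1 - p_e)
print(f"P(a) = {p_a:.2f}, P(e) = {p_e:.4f}, kappa = {kappa:.3f}")
```

With these assumed marginals, P(e) is roughly 0.50, so kappa ends up well below the raw 0.79 agreement, which is exactly the chance correction the percent-agreement statistic lacks.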