
Inter-rater reliability

In statistics, inter-rater reliability (also called by various similar names, such as inter-rater agreement, inter-rater concordance, inter-observer reliability, inter-coder reliability, and so on) is the degree of agreement among independent observers who rate, code, or assess the same phenomenon. Assessment tools that rely on ratings must exhibit good inter-rater reliability, otherwise they are not valid tests.

There are a number of statistics that can be used to determine inter-rater reliability, and different statistics are appropriate for different types of measurement. Some options are the joint probability of agreement; chance-corrected measures such as Cohen's kappa, Scott's pi, and Fleiss' kappa; inter-rater correlation; the concordance correlation coefficient; intra-class correlation; and Krippendorff's alpha.

Concept

There are several operational definitions of "inter-rater reliability," reflecting different viewpoints about what constitutes reliable agreement between raters. There are three operational definitions of agreement:

1. Reliable raters agree with the "official" rating of a performance.
2. Reliable raters agree with each other about the exact ratings to be awarded.
3. Reliable raters agree about which performance is better and which is worse.

These combine with two operational definitions of behavior:

1. Reliable raters are automatons, behaving like "rating machines". This category includes the rating of essays by computer. This behavior can be evaluated by generalizability theory.
2. Reliable raters behave like independent witnesses. They demonstrate their independence by disagreeing slightly. This behavior can be evaluated by the Rasch model.
Joint probability of agreement

The joint probability of agreement is the simplest and the least robust measure. It is estimated as the percentage of the time the raters agree in a nominal or categorical rating system. It does not take into account the fact that agreement may happen solely based on chance. There is some question whether there is a need to "correct" for chance agreement at all; some suggest that, in any case, any such adjustment should be based on an explicit model of how chance and error affect raters' decisions.

When the number of categories being used is small (e.g. 2 or 3), the likelihood of two raters agreeing by pure chance increases dramatically. This is because both raters must confine themselves to the limited number of options available, which inflates the overall agreement rate without necessarily reflecting their propensity for "intrinsic" agreement (an agreement is considered "intrinsic" if it is not due to chance). Therefore, the joint probability of agreement will remain high even in the absence of any "intrinsic" agreement among raters. A useful inter-rater reliability coefficient is expected (a) to be close to 0 when there is no "intrinsic" agreement and (b) to increase as the "intrinsic" agreement rate improves. Most chance-corrected agreement coefficients achieve the first objective, but the second is not achieved by many known chance-corrected measures.
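The joint probability of agreement can be computed directly from two raters' labels. The following Python sketch is illustrative only: the function name and the example ratings are invented here, and the skewed two-category data show how a high raw agreement rate can arise largely from the base rate.

    from collections import Counter

    def joint_probability_of_agreement(ratings_a, ratings_b):
        """Fraction of items on which two raters assign the same category."""
        if len(ratings_a) != len(ratings_b):
            raise ValueError("both raters must rate the same set of items")
        matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
        return matches / len(ratings_a)

    # Two raters labelling ten cases with a two-category, low-base-rate code.
    rater_1 = ["yes", "no", "no", "yes", "no", "no", "no", "yes", "no", "no"]
    rater_2 = ["yes", "no", "no", "no", "no", "no", "no", "yes", "no", "no"]

    print(joint_probability_of_agreement(rater_1, rater_2))  # 0.9
    print(Counter(rater_1), Counter(rater_2))  # both raters say "no" most of the time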
Kappa statistics

Kappa is a way of measuring agreement or reliability that corrects for how often ratings might agree by chance. Cohen's kappa, which works for two raters, and Fleiss' kappa, an adaptation that works for any fixed number of raters, improve upon the joint probability in that they take into account the amount of agreement that could be expected to occur through chance. The original versions had the same problem as the joint probability in that they treat the data as nominal and assume the ratings have no natural ordering; if the data actually have a rank (ordinal level of measurement), that information is not fully used by these measures.
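A minimal sketch of the standard two-rater computation, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the raters' marginal frequencies. The helper name and data are illustrative; in practice an established implementation (for example, cohen_kappa_score in scikit-learn) would normally be used.

    from collections import Counter

    def cohens_kappa(ratings_a, ratings_b):
        """Chance-corrected agreement between two raters on nominal labels."""
        n = len(ratings_a)
        observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
        freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
        categories = set(freq_a) | set(freq_b)
        expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
        return (observed - expected) / (1 - expected)

    rater_1 = ["yes", "no", "no", "yes", "no", "no", "no", "yes", "no", "no"]
    rater_2 = ["yes", "no", "no", "no", "no", "no", "no", "yes", "no", "no"]

    # Raw agreement is 0.9, but kappa is only about 0.74 once the chance
    # agreement produced by the skewed "no" base rate is removed.
    print(round(cohens_kappa(rater_1, rater_2), 3))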
Later extensions of the approach included versions that could handle "partial credit" and ordinal scales. These extensions converge with the family of intra-class correlations (ICCs), so there is a conceptually related way of estimating reliability for each level of measurement: nominal (kappa), ordinal (ordinal kappa, or an ICC with stretched assumptions), interval (ICC, or an ordinal kappa treating the interval scale as ordinal), and ratio (ICC). There are also variants that can look at agreement by raters across a set of items (e.g., do two interviewers agree about the depression scores for all of the items on the same semi-structured interview for one case?) as well as across raters and cases (e.g., how well do two or more raters agree about whether 30 cases have a depression diagnosis, yes or no, a nominal variable?).

Kappa is similar to a correlation coefficient in that it cannot go above +1.0 or below −1.0. Because it is used as a measure of agreement, only positive values would be expected in most situations; negative values would indicate systematic disagreement. Kappa can only achieve very high values when both agreement is good and the rate of the target condition is near 50%, because the base rate enters the calculation of the joint probabilities. Several authorities have offered rules of thumb for interpreting the level of agreement, many of which agree in gist even though the exact wording differs.

[Table: Four sets of recommendations for interpreting level of inter-rater agreement]

Correlation coefficients

Either Pearson's r, Kendall's τ, or Spearman's ρ can be used to measure pairwise correlation among raters using a scale that is ordered. Pearson assumes the rating scale is continuous; the Kendall and Spearman statistics assume only that it is ordinal. If more than two raters are observed, an average level of agreement for the group can be calculated as the mean of the r (or τ or ρ) values from each possible pair of raters.
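A short sketch of the group-level summary described above: compute a rank correlation for every pair of raters and average it. The rater names and ordinal scores are invented, and SciPy's spearmanr is assumed to be available (kendalltau or pearsonr could be swapped in the same way).

    from itertools import combinations
    from scipy.stats import spearmanr

    # Ordinal scores (1-5) given by four raters to the same six essays.
    scores = {
        "rater_A": [1, 2, 3, 4, 4, 5],
        "rater_B": [2, 2, 3, 4, 5, 5],
        "rater_C": [1, 3, 3, 3, 4, 5],
        "rater_D": [1, 2, 2, 4, 4, 4],
    }

    # Average the coefficient over every possible pair of raters.
    pairwise = []
    for a, b in combinations(scores, 2):
        rho, _ = spearmanr(scores[a], scores[b])
        pairwise.append(rho)

    print(round(sum(pairwise) / len(pairwise), 3))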
Intra-class correlation coefficient

Another way of performing reliability testing is to use the intra-class correlation coefficient (ICC). There are several types of ICC; one is defined as "the proportion of variance of an observation due to between-subject variability in the true scores". The range of the ICC may be between 0.0 and 1.0 (an early definition allowed values between −1 and +1). The ICC will be high when there is little variation between the scores given to each item by the raters, for example if all raters give the same or similar scores to each of the items. The ICC is an improvement over Pearson's r and Spearman's ρ, as it takes into account the differences in ratings for individual segments, along with the correlation between raters.
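As a sketch of one common ICC form, the one-way random-effects ICC(1,1) of Shrout and Fleiss can be computed from a subjects-by-raters table of scores via a one-way ANOVA decomposition. The function and the data below are illustrative assumptions; dedicated packages are usually preferable because they also provide the other ICC variants and confidence intervals.

    import numpy as np

    def icc_one_way(ratings):
        """One-way random-effects ICC(1,1) for an (n subjects x k raters) array."""
        ratings = np.asarray(ratings, dtype=float)
        n, k = ratings.shape
        subject_means = ratings.mean(axis=1)
        grand_mean = ratings.mean()
        # Mean squares from a one-way ANOVA with subjects as the grouping factor.
        ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
        ms_within = np.sum((ratings - subject_means[:, None]) ** 2) / (n * (k - 1))
        return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

    # Five subjects, each scored by the same three raters (made-up data).
    data = [
        [9, 10, 8],
        [6, 5, 7],
        [8, 8, 9],
        [7, 6, 6],
        [10, 9, 10],
    ]
    print(round(icc_one_way(data), 3))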
Limits of agreement

Another approach to agreement, useful when there are only two raters and the scale is continuous, is to calculate the differences between each pair of the two raters' observations. The mean of these differences is termed bias, and the reference interval (mean ± 1.96 × standard deviation) is termed the limits of agreement. The limits of agreement provide insight into how much random variation may be influencing the ratings.

If the raters tend to agree, the differences between the raters' observations will be near zero. If one rater is usually higher or lower than the other by a consistent amount, the bias will be different from zero. If the raters tend to disagree, but without a consistent pattern of one rating higher than the other, the mean will be near zero. Confidence limits (usually 95%) can be calculated for both the bias and each of the limits of agreement.

There are several formulae that can be used to calculate limits of agreement. The simple formula, which was given in the previous paragraph and works well for sample sizes greater than 60, is

    \bar{x} \pm 1.96s

For smaller sample sizes, another common simplification is

    \bar{x} \pm 2s

However, the most accurate formula, which is applicable for all sample sizes, is

    \bar{x} \pm t_{0.05,\,n-1}\, s\, \sqrt{1 + \tfrac{1}{n}}
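The most general formula above translates directly into code. This sketch assumes SciPy for the t quantile; the function name and the paired measurements are invented for illustration, and t_{0.05, n-1} is taken as the two-sided 5% critical value.

    import math
    import numpy as np
    from scipy import stats

    def limits_of_agreement(rater_1, rater_2, alpha=0.05):
        """Bias and limits of agreement for two raters on a continuous scale."""
        diffs = np.asarray(rater_1, dtype=float) - np.asarray(rater_2, dtype=float)
        n = diffs.size
        bias = diffs.mean()
        s = diffs.std(ddof=1)
        t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)   # two-sided critical value
        half_width = t_crit * s * math.sqrt(1 + 1 / n)
        return bias, (bias - half_width, bias + half_width)

    # Hypothetical paired measurements of the same eight items by two raters.
    a = [10.1, 9.8, 12.3, 11.0, 10.5, 9.9, 11.7, 10.8]
    b = [10.4, 9.5, 12.0, 11.3, 10.9, 10.1, 11.5, 11.2]

    bias, (lower, upper) = limits_of_agreement(a, b)
    print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")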
Bland and Altman have expanded on this idea by graphing the difference of each point, the mean difference, and the limits of agreement on the vertical axis against the average of the two ratings on the horizontal axis. The resulting Bland–Altman plot demonstrates not only the overall degree of agreement, but also whether the agreement is related to the underlying value of the item. For instance, two raters might agree closely in estimating the size of small items but disagree about larger items.

When comparing two methods of measurement, it is of interest not only to estimate the bias and the limits of agreement between the two methods (inter-rater agreement), but also to assess these characteristics for each method within itself. It might very well be that the agreement between two methods is poor simply because one of the methods has wide limits of agreement while the other has narrow ones. In this case, the method with the narrow limits of agreement would be superior from a statistical point of view, while practical or other considerations might change this appreciation. What constitutes narrow or wide limits of agreement, or large or small bias, is a matter of practical assessment in each case.
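A Bland–Altman plot of the kind described above can be drawn with a few lines of matplotlib, which is assumed to be available. The data reuse the hypothetical paired ratings from the previous sketch, and the simple 1.96 s limits are used purely for illustration.

    import numpy as np
    import matplotlib.pyplot as plt

    a = np.array([10.1, 9.8, 12.3, 11.0, 10.5, 9.9, 11.7, 10.8])
    b = np.array([10.4, 9.5, 12.0, 11.3, 10.9, 10.1, 11.5, 11.2])

    means = (a + b) / 2     # horizontal axis: average of the two ratings
    diffs = a - b           # vertical axis: difference between the ratings
    bias = diffs.mean()
    spread = 1.96 * diffs.std(ddof=1)

    plt.scatter(means, diffs)
    plt.axhline(bias, label="bias")
    plt.axhline(bias + spread, linestyle="--", label="upper limit of agreement")
    plt.axhline(bias - spread, linestyle="--", label="lower limit of agreement")
    plt.xlabel("Mean of the two ratings")
    plt.ylabel("Difference between ratings")
    plt.title("Bland-Altman plot (illustrative data)")
    plt.legend()
    plt.show()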
445: 386: 324: 275: 1391:"Computing inter-rater reliability and its variance in the presence of high agreement" 1481: 1471: 1452: 1413: 1377: 1367: 1291: 1281: 1224: 1193: 1154: 1146: 1110: 1100: 1077: 1050: 999: 1425: 1344: 1205: 1011: 971: 925: 854: 1405: 1332: 1185: 1142: 1138: 1034: 983: 952: 913: 842: 788: 223: 219: 128: 120: 1533: 1521: 831:"Diversity of decision-making models and the measurement of interrater agreement" 698: 555:
However, the most accurate formula (which is applicable for all sample sizes) is
1173: 940: 846: 830: 776: 1189: 917: 792: 456:
provide insight into how much random variation may be influencing the ratings.
1336: 1295: 1542: 1485: 1409: 1381: 1150: 1114: 702: 124: 112: 1449:
Assessing performance: Developing, scoring, and validating performance tasks
232:
Four sets of recommendations for interpreting level of inter-rater agreement
159:
Reliable raters agree with each other about the exact ratings to be awarded.
1417: 1197: 1158: 709:
where unstructured happenings are recorded for subsequent analysis, and in
198: 162:
Reliable raters agree about which performance is better and which is worse.
1504: 1081: 1054: 1003: 179: 713:
where texts are annotated for various syntactic and semantic qualities.
1046: 995: 777:"Rating the ratings: Assessing the psychometric quality of rating data" 701:
where experts code open-ended interview data into analyzable terms, in
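For completeness, a sketch of computing alpha on nominal codes with NLTK's agreement module; the availability of nltk and the coder/item labels below are assumptions made for this example, and dedicated implementations of Krippendorff's alpha additionally cover the other measurement levels and missing data.

    from nltk.metrics.agreement import AnnotationTask

    # Triples of (coder, item, label) for three coders and four items (made up).
    data = [
        ("c1", "i1", "pos"), ("c2", "i1", "pos"), ("c3", "i1", "pos"),
        ("c1", "i2", "neg"), ("c2", "i2", "neg"), ("c3", "i2", "pos"),
        ("c1", "i3", "neg"), ("c2", "i3", "neg"), ("c3", "i3", "neg"),
        ("c1", "i4", "pos"), ("c2", "i4", "neg"), ("c3", "i4", "pos"),
    ]

    task = AnnotationTask(data=data)  # default distance treats labels as nominal
    print(task.alpha())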
Disagreement

For any task in which multiple raters are useful, raters are expected to disagree about the observed target. By contrast, situations involving unambiguous measurement, such as simple counting tasks (e.g. the number of potential customers entering a store), often do not require more than one person performing the measurement.

Measurement tasks involving ambiguity in the characteristics of interest in the rating target are generally improved with multiple trained raters. Such tasks often involve subjective judgment of quality. Examples include ratings of a physician's "bedside manner", evaluation of witness credibility by a jury, and the presentation skill of a speaker.

Variation across raters in the measurement procedures and variability in the interpretation of measurement results are two examples of sources of error variance in rating measurements. Clearly stated guidelines for rendering ratings are necessary for reliability in ambiguous or challenging measurement scenarios.

Without scoring guidelines, ratings are increasingly affected by experimenter's bias, that is, a tendency of rating values to drift towards what is expected by the rater. During processes involving repeated measurements, such rater drift can be corrected through periodic retraining designed to ensure that raters understand the guidelines and the measurement goals.
1468:
Measures of Interobserver Agreement and Reliability
238: 1446: 1366:(4th ed.). Gaithersburg: Advanced Analytics. 627: 544: 500: 415: 395: 357: 333: 312: 284: 1322: 192: 1540: 1067: 899:"A coefficient of agreement for nominal scales" 774: 202:how chance and error affect raters' decisions. 92:(also called by various similar names, such as 775:Saal, F.E.; Downey, R.G.; Lahey, M.A. (1980). 260:Pearson product-moment correlation coefficient 1094: 1275: 1171: 805: 1527:Online calculator for Inter-Rater Agreement 1447:Johnson, R.; Penny, J.; Gordon, B. (2009). 1242: 1240: 726:jury, and presentation skill of a speaker. 253: 1308:: CS1 maint: location missing publisher ( 1024: 970:Landis, J. Richard; Koch, Gary G. (1977). 969: 365:values from each possible pair of raters. 906:Educational and Psychological Measurement 76:Learn how and when to remove this message 1258: 1256: 1237: 828: 430: 227: 1465: 1218: 675: 264:Spearman's rank correlation coefficient 14: 1541: 938: 426: 1253: 1070:American Journal of Mental Deficiency 896: 1515:Online (Multirater) Kappa Calculator 1388: 1361: 58:adding citations to reliable sources 29: 1364:Handbook of Inter-Rater Reliability 1172:Shrout, P.E.; Fleiss, J.L. (1979). 806:Page, E.B.; Petersen, N.S. (1995). 501:{\displaystyle {\bar {x}}\pm 1.96s} 381:intra-class correlation coefficient 375:Intra-class correlation coefficient 369:Intra-class correlation coefficient 213: 133:concordance correlation coefficient 24: 1355: 1325:Communication Methods and Measures 25: 1570: 1493: 756:Rating (pharmaceutical industry) 545:{\displaystyle {\bar {x}}\pm 2s} 34: 1316: 1269: 1212: 1165: 1121: 1088: 716: 45:needs additional citations for 1143:10.1176/appi.ajp.2012.12070999 1131:American Journal of Psychiatry 1061: 1018: 963: 932: 890: 861: 822: 799: 768: 571: 527: 483: 193:Joint probability of agreement 131:; or inter-rater correlation, 13: 1: 1280:(4th ed.). Los Angeles. 761: 187: 1276:Krippendorff, Klaus (2018). 1095:Fleiss, J. L. (1981-04-21). 722:performing the measurement. 7: 1470:(2nd ed.). CRC Press. 1223:. Oxford University Press. 847:10.1037/0033-2909.101.1.140 744: 10: 1575: 1190:10.1037/0033-2909.86.2.420 918:10.1177/001316446002000104 793:10.1037/0033-2909.88.2.413 679: 372: 257: 217: 146: 102:inter-observer reliability 18:Inter-observer variability 1554:Comparison of assessments 1337:10.1080/19312450709336664 711:computational linguistics 1410:10.1348/000711006X126600 254:Correlation coefficients 1549:Inter-rater reliability 1362:Gwet, Kilem L. (2014). 829:Uebersax, J.S. (1987). 173:generalizability theory 137:intra-class correlation 106:inter-coder reliability 98:inter-rater concordance 90:inter-rater reliability 1559:Statistical data types 1466:Shoukri, M.M. (2010). 1219:Everitt, B.S. (1996). 1178:Psychological Bulletin 945:Psychological Bulletin 835:Psychological Bulletin 781:Psychological Bulletin 629: 546: 502: 436: 417: 397: 359: 335: 314: 286: 233: 939:Fleiss, J.L. (1971). 707:observational studies 630: 547: 503: 434: 418: 416:{\displaystyle \rho } 398: 360: 358:{\displaystyle \rho } 336: 315: 313:{\displaystyle \rho } 287: 239:the joint-probability 231: 94:inter-rater agreement 682:Krippendorff's alpha 676:Krippendorff's alpha 562: 518: 474: 407: 387: 349: 325: 304: 276: 141:Krippendorff's alpha 54:improve this article 1389:Gwet, K.L. (2008). 
735:experimenter's bias 666:limits of agreement 662:limits of agreement 658:limits of agreement 653:limits of agreement 454:limits of agreement 450:limits of agreement 427:Limits of agreement 321:as the mean of the 1532:2016-04-10 at the 1520:2009-02-28 at the 897:Cohen, J. (1960). 668:or large or small 625: 542: 498: 446:standard deviation 437: 413: 393: 355: 331: 310: 282: 234: 1477:978-1-4398-1080-4 1458:978-1-59385-988-6 1230:978-0-19-852365-9 873:www.agreestat.com 641:Bland–Altman plot 623: 621: 574: 530: 486: 435:Bland–Altman plot 396:{\displaystyle r} 334:{\displaystyle r} 285:{\displaystyle r} 86: 85: 78: 16:(Redirected from 1566: 1489: 1462: 1443: 1441: 1440: 1434: 1428:. Archived from 1395: 1385: 1349: 1348: 1320: 1314: 1313: 1307: 1299: 1273: 1267: 1266:(8476), 307-310. 1260: 1251: 1244: 1235: 1234: 1216: 1210: 1209: 1169: 1163: 1162: 1125: 1119: 1118: 1092: 1086: 1085: 1065: 1059: 1058: 1022: 1016: 1015: 967: 961: 960: 957:10.1037/h0031619 936: 930: 929: 903: 894: 888: 887: 885: 884: 875:. Archived from 865: 859: 858: 826: 820: 819: 812:Phi Delta Kappan 803: 797: 796: 772: 751:Cronbach's alpha 634: 632: 631: 626: 624: 622: 614: 606: 601: 600: 576: 575: 567: 551: 549: 548: 543: 532: 531: 523: 507: 505: 504: 499: 488: 487: 479: 422: 420: 419: 414: 402: 400: 399: 394: 364: 362: 361: 356: 340: 338: 337: 332: 319: 317: 316: 311: 291: 289: 288: 283: 214:Kappa statistics 81: 74: 70: 67: 61: 38: 30: 21: 1574: 1573: 1569: 1568: 1567: 1565: 1564: 1563: 1539: 1538: 1534:Wayback Machine 1522:Wayback Machine 1496: 1478: 1459: 1438: 1436: 1432: 1404:(Pt 1): 29–48. 1393: 1374: 1358: 1356:Further reading 1353: 1352: 1321: 1317: 1301: 1300: 1288: 1274: 1270: 1264:The Lancet, 327 1261: 1254: 1245: 1238: 1231: 1217: 1213: 1170: 1166: 1126: 1122: 1107: 1093: 1089: 1066: 1062: 1039:10.2307/2529786 1023: 1019: 988:10.2307/2529310 968: 964: 937: 933: 901: 895: 891: 882: 880: 867: 866: 862: 827: 823: 804: 800: 773: 769: 764: 747: 719: 699:survey research 686:Krippendorff's 684: 678: 613: 605: 584: 580: 566: 565: 563: 560: 559: 522: 521: 519: 516: 515: 478: 477: 475: 472: 471: 429: 408: 405: 404: 403:and Spearman's 388: 385: 384: 377: 371: 350: 347: 346: 326: 323: 322: 305: 302: 301: 277: 274: 273: 266: 258:Main articles: 256: 226: 218:Main articles: 216: 195: 190: 185: 149: 88:In statistics, 82: 71: 65: 62: 51: 39: 28: 23: 22: 15: 12: 11: 5: 1572: 1562: 1561: 1556: 1551: 1537: 1536: 1524: 1512: 1507: 1502: 1495: 1494:External links 1492: 1491: 1490: 1476: 1463: 1457: 1444: 1386: 1373:978-0970806284 1372: 1357: 1354: 1351: 1350: 1315: 1286: 1268: 1252: 1236: 1229: 1211: 1184:(2): 420–428. 1164: 1120: 1105: 1087: 1076:(2): 127–137. 1060: 1017: 962: 951:(5): 378–382. 931: 889: 860: 841:(1): 140–146. 
821: 798: 766: 765: 763: 760: 759: 758: 753: 746: 743: 718: 715: 680:Main article: 677: 674: 636: 635: 620: 617: 612: 609: 604: 599: 596: 593: 590: 587: 583: 579: 573: 570: 553: 552: 541: 538: 535: 529: 526: 509: 508: 497: 494: 491: 485: 482: 428: 425: 412: 392: 373:Main article: 370: 367: 354: 330: 309: 281: 255: 252: 215: 212: 194: 191: 189: 186: 184: 183: 176: 168: 164: 163: 160: 157: 148: 145: 84: 83: 42: 40: 33: 26: 9: 6: 4: 3: 2: 1571: 1560: 1557: 1555: 1552: 1550: 1547: 1546: 1544: 1535: 1531: 1528: 1525: 1523: 1519: 1516: 1513: 1511: 1508: 1506: 1503: 1501: 1498: 1497: 1487: 1483: 1479: 1473: 1469: 1464: 1460: 1454: 1450: 1445: 1435:on 2016-03-03 1431: 1427: 1423: 1419: 1415: 1411: 1407: 1403: 1399: 1392: 1387: 1383: 1379: 1375: 1369: 1365: 1360: 1359: 1346: 1342: 1338: 1334: 1330: 1326: 1319: 1311: 1305: 1297: 1293: 1289: 1287:9781506395661 1283: 1279: 1272: 1265: 1259: 1257: 1250:(2), 143-149. 1249: 1243: 1241: 1232: 1226: 1222: 1215: 1207: 1203: 1199: 1195: 1191: 1187: 1183: 1179: 1175: 1168: 1160: 1156: 1152: 1148: 1144: 1140: 1136: 1132: 1124: 1116: 1112: 1108: 1106:0-471-06428-9 1102: 1098: 1091: 1083: 1079: 1075: 1071: 1064: 1056: 1052: 1048: 1044: 1040: 1036: 1033:(2): 363–74. 1032: 1028: 1021: 1013: 1009: 1005: 1001: 997: 993: 989: 985: 982:(1): 159–74. 981: 977: 973: 966: 958: 954: 950: 946: 942: 935: 927: 923: 919: 915: 911: 907: 900: 893: 879:on 2018-04-02 878: 874: 870: 864: 856: 852: 848: 844: 840: 836: 832: 825: 817: 813: 809: 802: 794: 790: 786: 782: 778: 771: 767: 757: 754: 752: 749: 748: 742: 740: 736: 731: 727: 723: 714: 712: 708: 704: 703:psychometrics 700: 696: 692: 689: 683: 673: 671: 667: 663: 659: 654: 650: 645: 642: 618: 615: 610: 607: 602: 597: 594: 591: 588: 585: 581: 577: 568: 558: 557: 556: 539: 536: 533: 524: 514: 513: 512: 495: 492: 489: 480: 470: 469: 468: 465: 462: 457: 455: 451: 447: 443: 433: 424: 410: 390: 382: 376: 366: 352: 344: 328: 307: 299: 295: 279: 271: 265: 261: 251: 247: 243: 240: 230: 225: 224:Fleiss' kappa 221: 220:Cohen's kappa 211: 207: 203: 200: 181: 177: 174: 170: 169: 167: 161: 158: 155: 154: 153: 144: 142: 138: 134: 130: 129:Fleiss' kappa 126: 122: 121:Cohen's kappa 116: 114: 109: 107: 103: 99: 95: 91: 80: 77: 69: 66:December 2018 59: 55: 49: 48: 43:This section 41: 37: 32: 31: 19: 1467: 1451:. Guilford. 1448: 1437:. Retrieved 1430:the original 1401: 1397: 1363: 1331:(1): 77–89. 1328: 1324: 1318: 1277: 1271: 1263: 1247: 1220: 1214: 1181: 1177: 1167: 1137:(1): 59–70. 1134: 1130: 1123: 1096: 1090: 1073: 1069: 1063: 1030: 1026: 1020: 979: 975: 965: 948: 944: 934: 912:(1): 37–46. 909: 905: 892: 881:. Retrieved 877:the original 872: 863: 838: 834: 824: 815: 811: 801: 784: 780: 770: 738: 732: 728: 724: 720: 717:Disagreement 694: 693: 687: 685: 669: 665: 661: 657: 652: 648: 646: 637: 554: 510: 466: 460: 458: 453: 449: 448:) is termed 441: 438: 378: 342: 267: 248: 244: 235: 208: 204: 196: 165: 150: 117: 110: 105: 101: 97: 93: 89: 87: 72: 63: 52:Please help 47:verification 44: 739:rater drift 730:scenarios. 294:Kendall's Ď„ 180:Rasch model 113:valid tests 1543:Categories 1439:2010-06-16 1296:1019840156 1027:Biometrics 976:Biometrics 883:2018-12-26 787:(2): 413. 762:References 188:Statistics 125:Scott's pi 1486:815928115 1382:891732741 1304:cite book 1151:0002-953X 1115:926949980 1099:. Wiley. 818:(7): 561. 
595:− 578:± 572:¯ 534:± 528:¯ 490:± 484:¯ 411:ρ 353:ρ 308:ρ 1530:Archived 1518:Archived 1426:13915043 1418:18482474 1345:15408575 1206:13168820 1198:18839484 1159:23111466 1012:11077516 926:15926286 855:39240770 745:See also 298:Spearman 1082:7315877 1047:2529786 996:2529310 270:Pearson 268:Either 199:nominal 147:Concept 1484:  1474:  1455:  1424:  1416:  1380:  1370:  1343:  1294:  1284:  1227:  1204:  1196:  1157:  1149:  1113:  1103:  1080:  1055:884196 1053:  1045:  1010:  1004:843571 1002:  994:  924:  853:  452:. The 139:, and 1433:(PDF) 1422:S2CID 1394:(PDF) 1341:S2CID 1202:S2CID 1043:JSTOR 1008:S2CID 992:JSTOR 922:S2CID 902:(PDF) 851:S2CID 695:Alpha 688:alpha 345:, or 296:, or 1482:OCLC 1472:ISBN 1453:ISBN 1414:PMID 1378:OCLC 1368:ISBN 1310:link 1292:OCLC 1282:ISBN 1225:ISBN 1194:PMID 1155:PMID 1147:ISSN 1111:OCLC 1101:ISBN 1078:PMID 1051:PMID 1000:PMID 670:bias 651:and 649:bias 586:0.05 493:1.96 461:bias 442:bias 262:and 222:and 127:and 1406:doi 1333:doi 1186:doi 1139:doi 1135:170 1035:doi 984:doi 953:doi 914:doi 843:doi 839:101 789:doi 300:'s 292:, 272:'s 56:by 1545:: 1480:. 1420:. 1412:. 1402:61 1400:. 1396:. 1376:. 1339:. 1327:. 1306:}} 1302:{{ 1290:. 1255:^ 1239:^ 1200:. 1192:. 1182:86 1180:. 1176:. 1153:. 1145:. 1133:. 1109:. 1074:86 1072:. 1049:. 1041:. 1031:33 1029:. 1006:. 998:. 990:. 980:33 978:. 974:. 949:76 947:. 943:. 920:. 910:20 908:. 904:. 871:. 849:. 837:. 833:. 816:76 814:. 810:. 785:88 783:. 779:. 341:, 143:. 135:, 123:, 115:. 104:, 100:, 96:, 1488:. 1461:. 1442:. 1408:: 1384:. 1347:. 1335:: 1329:1 1312:) 1298:. 1233:. 1208:. 1188:: 1161:. 1141:: 1117:. 1084:. 1057:. 1037:: 1014:. 986:: 959:. 955:: 928:. 916:: 886:. 857:. 845:: 795:. 791:: 619:n 616:1 611:+ 608:1 603:s 598:1 592:n 589:, 582:t 569:x 540:s 537:2 525:x 496:s 481:x 391:r 343:Ď„ 329:r 280:r 182:. 175:. 79:) 73:( 68:) 64:( 50:. 20:)
