Knowledge

Robinson–Foulds metric

Source 📝

33: 750:
However, this issue is not a problem with RF distances per se, it is a more general criticism of tree distances. Regardless of the behaviour of any specific tree distance a practicing evolutionary biologist might view some tree rearrangements as "important" and other rearrangements as "trivial". Tree distances are tools; they are most useful in the context of other information about the organisms in the trees.
123:
is the number of partitions of data implied by the second tree but not the first tree (although some software implementations divide the RF metric by 2 and others scale the RF distance to have a maximum value of 1). The partitions are calculated for each tree by removing each branch. Thus, the number
673:
In 1985 Day gave an algorithm based on perfect hashing that computes this distance that has only a linear complexity in the number of nodes in the trees. A randomized algorithm that uses hash tables that are not necessarily perfect has been shown to approximate the Robinson-Foulds distance with a
721:
The RF metric remains widely used because the idea of using the number of splits that differ between a pair of trees is a relatively intuitive way to assess the differences among trees for many systematists. This is the primary strength of the RF distance and the reason for its continued use in
757:
The best-performing generalized Robinson-Foulds distances have a basis in information theory, and measure the distance between trees in terms of the quantity of information that the trees' splits hold in common (measured in bits). The Clustering Information Distance (implemented in R package
749:
Another issue to consider when using RF distances is that differences in one clade may be trivial (perhaps if the clade resolves three species within a genus differently) or may be fundamental (if the clade is deep in the tree and defines two fundamental subgroups, such as mammals and birds).
753:
These issues can be addressed by using less conservative metrics. "Generalized RF distances" recognize similarity between similar, but non-identical, splits; the original Robinson Foulds distance doesn't care how similar two groupings are, if they aren't identical they are discarded.
127:
RF distances have been criticized as biased, but they represent a relatively intuitive measure of the distances between phylogenetic trees and therefore remain widely used (the original 1981 paper describing Robinson-Foulds distances was cited more than 2700 times by 2023 based on
722:
phylogenetics. Of course, the number of splits that differ between a pair of trees depends on the number of taxa in the trees so one might argue that this unit is not meaningful. However, it is straightforward to normalize RF distances so they range between zero and one.
132:). Nevertheless, the biases inherent to the RF distances suggest that researches should consider using "Generalized" Robinson–Foulds metrics that may have better theoretical and practical performance and avoid the biases and misleading attributes of the original metric. 973:*Böcker S., Canzar S., Klau G.W. 2013. The generalized Robinson-Foulds metric. In: Darling A., Stoye J., editors. Algorithms in Bioinformatics. WABI 2013. Lecture Notes in Computer Science, vol 8126. Berlin, Heidelberg: Springer. p. 156–169. 742:
Its range of values can depend on tree shape: trees that contain many uneven partitions will command relatively lower distances, on average, than trees with many even partitions.
459: 362: 144:) for each node (which could be empty, but only nodes with degree greater than or equal to three can be labeled by an empty set) the Robinson–Foulds metric finds the number of 1047:
Schuh, R. T. & Polhemus, J. T. (1980). "Analysis of taxonomic congruence among morphological, ecological and biogeographic data sets for the Leptopodomorpha (Hemiptera)".
392: 245: 197:
The authors define two trees to be the same if they are isomorphic and the isomorphism preserves the labeling. The construction of the proof is based on a function called
192: 658:
The RF distance corresponds to an equivalent similarity metric that reflects the resolution of the strict consensus of two trees, first used to compare trees in 1980.
268: 215: 162: 194:
operations to convert one into the other. The number of operations defines their distance. Rooted trees can be examined by attaching a dummy leaf to the root node.
648: 621: 594: 567: 540: 513: 486: 419: 322: 295: 1150: 1190:
Makarenkov, V and Leclerc, B. Comparison of additive trees using circular orders, Journal of Computational Biology,7,5,731-744, 2000,"Mary Ann Liebert, Inc."
1149:
M. Bourque, Arbres de Steiner et reseaux dont certains sommets sont a localisation variable. PhD thesis, University de Montreal, Montreal, Quebec, 1978
982:
Nye T.M.W., Liò P., Gilks W.R. 2006. A novel algorithm and web-based tool for comparing two alternative phylogenetic trees. Bioinformatics. 22:117–119.
961:
Y. Lin, V. Rajan, B.M. Moret A metric for phylogenetic trees based on matching IEEE/ACM Trans. Comput. Biol. Bioinform., 9 (4) (2012), pp. 1014-1022
976:
Bogdanowicz D., Giaro K. 2012. Matching split distance for unrooted binary phylogenetic trees. IEEE/ACM Trans. Comput. Biol. Bioinforma. 9:150–160.
729:
Relative to other metrics, lacks sensitivity, and is thus imprecise; it can take two fewer distinct values than there are taxa in a tree.
979:
Bogdanowicz D., Giaro K. 2013. On a matching distance between rooted phylogenetic trees. Int. J. Appl. Math. Comput. Sci. 23:669–684.
857: 710: 735:
Its value can be counterintuitive. One example is that moving a tip and its neighbour to a particular point on a tree generates a
17: 1194:
Pattengale, Nicholas D.; Gottlieb, Eric J.; Moret, Bernard M.E. (2007). "Efficiently Computing the Robinson–Foulds Metric".
1151:
http://www.worldcat.org/title/arbres-de-steiner-et-reseaux-dont-certains-sommets-sont-a-localisation-variable/oclc/053538946
1277: 1282: 75: 839: 42: 706:(`treedist()` function). For comparing groups of trees, the fastest implementations include HashRF and MrsRF. 827: 845: 1095:"Bayesian and parsimony approaches reconstruct informative trees from simulated morphological datasets" 745:
It performs more poorly than many alternative measures in practical settings, based on simulated trees.
424: 327: 803: 1208: 809: 791: 821: 367: 220: 167: 686:, the metric is often used to compute a distance between two trees. The treedist program in the 883: 1203: 691: 488:. The number of operations in each of these procedures is equivalent to the number of edges in 46: 53: 253: 217:, which contracts an edge (combining the nodes, creating a union of their sets). Conversely, 200: 147: 1007:"Information theoretic Generalized Robinson-Foulds metrics for comparing phylogenetic trees" 914: 713:
to compute distances between trees that represent how languages are related to each other.
662: 626: 599: 572: 545: 518: 491: 464: 397: 300: 273: 119:
is the number of partitions of data implied by the first tree but not the second tree and
57: 8: 732:
It is rapidly saturated; very similar trees can be allocated the maximum distance value.
725:
However, the RF metric also suffers a number of theoretical and practical shortcomings:
1124: 101: 124:
of eligible partitions for each tree is equal to the number of branches in that tree.
1256: 1221: 1169: 1129: 1064: 1029: 1025: 944: 936: 899: 1251: 1234: 1246: 1213: 1181: 1165: 1119: 1109: 1056: 1021: 926: 895: 766: 762:) is recommended as the most suitable alternative to the Robinson-Foulds distance. 1176:
William H. E. Day, "Optimal algorithms for comparing trees with labeled leaves",
739:
difference value than if just one of the two tips were moved to the same place.
129: 759: 703: 699: 698:
Python library (under the name "symmetric difference metric"), and R packages
661:
In their 1981 paper Robinson and Foulds proved that the distance is in fact a
1271: 1068: 940: 683: 1060: 931: 247:
expands an edge (decontraction), where the set can be split in any fashion.
1260: 1225: 1156:
Robinson, D. R.; Foulds, L. R. (1981). "Comparison of phylogenetic trees".
1133: 1114: 1033: 948: 1217: 1094: 1006: 1185: 596:. The sum of the operations is equivalent to a transformation from 765:
An alternative approach to tree distance calculation is to use
687: 140:
Given two unrooted trees of nodes and a set of labels (i.e.,
1046: 141: 769:, rather than splits, as the basis for tree comparison. 695: 1235:"DendroPy: A Python library for phylogenetic computing" 1193: 668: 629: 602: 575: 548: 521: 494: 467: 427: 400: 370: 330: 303: 276: 256: 223: 203: 170: 150: 100:, is a simple way to calculate the distance between 915:"Practical Performance of Tree Comparison Metrics" 642: 615: 588: 561: 534: 507: 480: 453: 413: 386: 356: 316: 289: 262: 239: 209: 186: 156: 1269: 1232: 1155: 882:Robinson, D.F.; Foulds, L.R. (February 1981). 881: 912: 851:hardwiredClusterDistance(tree1, tree2, true) 711:used in quantitative comparative linguistics 394:is used to add the edges only discovered in 1088: 1086: 1084: 1082: 1080: 1078: 1000: 998: 996: 994: 992: 990: 913:Kuhner, Mary K.; Yamato, Jon (2015-03-01). 772: 716: 56:. Please do not remove this message until 1250: 1207: 1123: 1113: 930: 820:Faster than phangorn implementation; see 709:The Robinson–Foulds metric has also been 76:Learn how and when to remove this message 1075: 987: 690:suite offers this function, as does the 677: 52:Relevant discussion may be found on the 1233:Sukumaran, J.; Holder, Mark T. (2010). 14: 1270: 1092: 1004: 969: 967: 877: 875: 873: 26: 669:Algorithms for computing the metric 24: 1143: 884:"Comparison of phylogenetic trees" 702:(`RobinsonFoulds()` function) and 25: 1294: 964: 870: 674:bounded error in sublinear time. 454:{\displaystyle T_{1}\wedge T_{2}} 357:{\displaystyle T_{1}\wedge T_{2}} 1196:Journal of Computational Biology 270:function removes all edges from 31: 1040: 1026:10.1093/bioinformatics/btaa614 955: 906: 833:tree_1.robinson_foulds(tree_2) 135: 13: 1: 1252:10.1093/bioinformatics/btq228 864: 653: 1170:10.1016/0025-5564(81)90043-2 900:10.1016/0025-5564(81)90043-2 797:dist.dendlist(dendlist(x,y)) 542:plus the number of edges in 387:{\displaystyle \alpha ^{-1}} 240:{\displaystyle \alpha ^{-1}} 187:{\displaystyle \alpha ^{-1}} 7: 1278:Computational phylogenetics 1180:, Number 1, December 1985. 96:, often abbreviated as the 94:symmetric difference metric 58:conditions to do so are met 10: 1299: 1283:Bioinformatics algorithms 1178:Journal of Classification 1093:Smith, Martin R. (2019). 1005:Smith, Martin R. (2020). 1158:Mathematical Biosciences 888:Mathematical Biosciences 773:Software implementations 717:Strengths and weaknesses 263:{\displaystyle \alpha } 210:{\displaystyle \alpha } 157:{\displaystyle \alpha } 1115:10.1098/rsbl.2018.0632 644: 617: 590: 563: 536: 509: 482: 455: 415: 388: 358: 318: 291: 264: 241: 211: 188: 158: 18:Robinson-Foulds metric 1218:10.1089/cmb.2007.R012 1061:10.1093/sysbio/29.1.1 932:10.1093/sysbio/syu085 678:Specific applications 645: 643:{\displaystyle T_{2}} 618: 616:{\displaystyle T_{1}} 591: 589:{\displaystyle T_{1}} 564: 562:{\displaystyle T_{2}} 537: 535:{\displaystyle T_{2}} 510: 508:{\displaystyle T_{1}} 483: 481:{\displaystyle T_{2}} 456: 416: 414:{\displaystyle T_{2}} 389: 359: 319: 317:{\displaystyle T_{2}} 292: 290:{\displaystyle T_{1}} 265: 242: 212: 189: 159: 815:RobinsonFoulds(x, y) 627: 600: 573: 546: 519: 492: 465: 425: 398: 368: 328: 301: 274: 254: 221: 201: 168: 148: 45:of this article is 1186:10.1007/BF01908061 1049:Systematic Biology 919:Systematic Biology 853:from PhyloNetworks 640: 613: 586: 559: 532: 505: 478: 451: 411: 384: 354: 314: 287: 260: 237: 207: 184: 154: 107:It is defined as ( 102:phylogenetic trees 1245:(12): 1569–1571. 1020:(20): 5007–5013. 862: 861: 650:, or vice versa. 86: 85: 78: 16:(Redirected from 1290: 1264: 1254: 1229: 1211: 1173: 1164:(1–2): 131–147. 1138: 1137: 1127: 1117: 1099: 1090: 1073: 1072: 1044: 1038: 1037: 1011: 1002: 985: 971: 962: 959: 953: 952: 934: 910: 904: 903: 894:(1–2): 131–147. 879: 852: 834: 816: 798: 780:Language/Program 777: 776: 767:Quartet distance 649: 647: 646: 641: 639: 638: 622: 620: 619: 614: 612: 611: 595: 593: 592: 587: 585: 584: 569:that are not in 568: 566: 565: 560: 558: 557: 541: 539: 538: 533: 531: 530: 515:that are not in 514: 512: 511: 506: 504: 503: 487: 485: 484: 479: 477: 476: 460: 458: 457: 452: 450: 449: 437: 436: 420: 418: 417: 412: 410: 409: 393: 391: 390: 385: 383: 382: 363: 361: 360: 355: 353: 352: 340: 339: 323: 321: 320: 315: 313: 312: 297:that are not in 296: 294: 293: 288: 286: 285: 269: 267: 266: 261: 246: 244: 243: 238: 236: 235: 216: 214: 213: 208: 193: 191: 190: 185: 183: 182: 163: 161: 160: 155: 81: 74: 70: 67: 61: 35: 34: 27: 21: 1298: 1297: 1293: 1292: 1291: 1289: 1288: 1287: 1268: 1267: 1146: 1144:Further reading 1141: 1108:(2). 20180632. 1102:Biology Letters 1097: 1091: 1076: 1045: 1041: 1009: 1003: 988: 972: 965: 960: 956: 911: 907: 880: 871: 867: 850: 832: 814: 799:from dendextend 796: 775: 719: 680: 671: 656: 634: 630: 628: 625: 624: 607: 603: 601: 598: 597: 580: 576: 574: 571: 570: 553: 549: 547: 544: 543: 526: 522: 520: 517: 516: 499: 495: 493: 490: 489: 472: 468: 466: 463: 462: 445: 441: 432: 428: 426: 423: 422: 405: 401: 399: 396: 395: 375: 371: 369: 366: 365: 348: 344: 335: 331: 329: 326: 325: 308: 304: 302: 299: 298: 281: 277: 275: 272: 271: 255: 252: 251: 228: 224: 222: 219: 218: 202: 199: 198: 175: 171: 169: 166: 165: 149: 146: 145: 138: 122: 118: 114: 110: 90:Robinson–Foulds 82: 71: 65: 62: 51: 36: 32: 23: 22: 15: 12: 11: 5: 1296: 1286: 1285: 1280: 1266: 1265: 1239:Bioinformatics 1230: 1209:10.1.1.75.3338 1202:(6): 724–735. 1191: 1188: 1174: 1153: 1145: 1142: 1140: 1139: 1074: 1039: 1014:Bioinformatics 986: 984: 983: 980: 977: 963: 954: 925:(2): 205–214. 905: 868: 866: 863: 860: 859: 854: 848: 842: 841: 836: 830: 824: 823: 818: 812: 806: 805: 800: 794: 788: 787: 784: 781: 774: 771: 747: 746: 743: 740: 733: 730: 718: 715: 692:RAxML_standard 679: 676: 670: 667: 655: 652: 637: 633: 610: 606: 583: 579: 556: 552: 529: 525: 502: 498: 475: 471: 448: 444: 440: 435: 431: 408: 404: 381: 378: 374: 351: 347: 343: 338: 334: 311: 307: 284: 280: 259: 234: 231: 227: 206: 181: 178: 174: 153: 137: 134: 130:Google Scholar 120: 116: 112: 108: 84: 83: 66:September 2020 39: 37: 30: 9: 6: 4: 3: 2: 1295: 1284: 1281: 1279: 1276: 1275: 1273: 1262: 1258: 1253: 1248: 1244: 1240: 1236: 1231: 1227: 1223: 1219: 1215: 1210: 1205: 1201: 1197: 1192: 1189: 1187: 1183: 1179: 1175: 1171: 1167: 1163: 1159: 1154: 1152: 1148: 1147: 1135: 1131: 1126: 1121: 1116: 1111: 1107: 1103: 1096: 1089: 1087: 1085: 1083: 1081: 1079: 1070: 1066: 1062: 1058: 1054: 1050: 1043: 1035: 1031: 1027: 1023: 1019: 1015: 1008: 1001: 999: 997: 995: 993: 991: 981: 978: 975: 974: 970: 968: 958: 950: 946: 942: 938: 933: 928: 924: 920: 916: 909: 901: 897: 893: 889: 885: 878: 876: 874: 869: 858: 855: 849: 847: 844: 843: 840: 837: 831: 829: 826: 825: 822: 819: 817:from TreeDist 813: 811: 808: 807: 804: 801: 795: 793: 790: 789: 785: 782: 779: 778: 770: 768: 763: 761: 755: 751: 744: 741: 738: 734: 731: 728: 727: 726: 723: 714: 712: 707: 705: 701: 697: 694:package, the 693: 689: 685: 684:phylogenetics 675: 666: 664: 659: 651: 635: 631: 608: 604: 581: 577: 554: 550: 527: 523: 500: 496: 473: 469: 446: 442: 438: 433: 429: 406: 402: 379: 376: 372: 349: 345: 341: 336: 332: 309: 305: 282: 278: 257: 248: 232: 229: 225: 204: 195: 179: 176: 172: 151: 143: 133: 131: 125: 105: 103: 99: 95: 91: 80: 77: 69: 59: 55: 49: 48: 44: 38: 29: 28: 19: 1242: 1238: 1199: 1195: 1177: 1161: 1157: 1105: 1101: 1052: 1048: 1042: 1017: 1013: 957: 922: 918: 908: 891: 887: 764: 756: 752: 748: 736: 724: 720: 708: 681: 672: 660: 657: 421:to the tree 249: 196: 139: 126: 106: 97: 93: 89: 87: 72: 63: 41: 1055:(1): 1–26. 364:, and then 324:, creating 136:Explanation 98:RF distance 1272:Categories 865:References 654:Properties 461:to build 43:neutrality 1204:CiteSeerX 1069:1063-5157 941:1076-836X 835:from ete3 439:∧ 377:− 373:α 342:∧ 258:α 230:− 226:α 205:α 177:− 173:α 152:α 54:talk page 1261:20421198 1226:17691890 1134:30958126 1034:32619004 949:25378436 783:Function 760:TreeDist 704:phangorn 700:TreeDist 696:DendroPy 115:) where 47:disputed 1125:6405459 1259:  1224:  1206:  1132:  1122:  1067:  1032:  947:  939:  828:Python 786:Notes 688:PHYLIP 663:metric 1098:(PDF) 1010:(PDF) 846:Julia 737:lower 1257:PMID 1222:PMID 1130:PMID 1065:ISSN 1030:PMID 945:PMID 937:ISSN 856:See 838:See 802:See 250:The 164:and 142:taxa 88:The 40:The 1247:doi 1214:doi 1182:doi 1166:doi 1120:PMC 1110:doi 1057:doi 1022:doi 927:doi 896:doi 682:In 623:to 92:or 1274:: 1255:. 1243:26 1241:. 1237:. 1220:. 1212:. 1200:14 1198:. 1162:53 1160:. 1128:. 1118:. 1106:15 1104:. 1100:. 1077:^ 1063:. 1053:29 1051:. 1028:. 1018:36 1016:. 1012:. 989:^ 966:^ 943:. 935:. 923:64 921:. 917:. 892:53 890:. 886:. 872:^ 665:. 111:+ 104:. 1263:. 1249:: 1228:. 1216:: 1184:: 1172:. 1168:: 1136:. 1112:: 1071:. 1059:: 1036:. 1024:: 951:. 929:: 902:. 898:: 810:R 792:R 636:2 632:T 609:1 605:T 582:1 578:T 555:2 551:T 528:2 524:T 501:1 497:T 474:2 470:T 447:2 443:T 434:1 430:T 407:2 403:T 380:1 350:2 346:T 337:1 333:T 310:2 306:T 283:1 279:T 233:1 180:1 121:B 117:A 113:B 109:A 79:) 73:( 68:) 64:( 60:. 50:. 20:)

Index

Robinson-Foulds metric
neutrality
disputed
talk page
conditions to do so are met
Learn how and when to remove this message
phylogenetic trees
Google Scholar
taxa
metric
phylogenetics
PHYLIP
RAxML_standard
DendroPy
TreeDist
phangorn
used in quantitative comparative linguistics
TreeDist
Quartet distance
R

R

Python

Julia



Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.