Knowledge

Wagner–Fischer algorithm

Source 📝

1341:
of a string in a text. This modification gives the end-position of matching substrings of the text. To determine the start-position of the matching substrings, the number of insertions and deletions can be stored separately and used to compute the start-position from the end-position.
1286:
We can give different penalty costs to insertion, deletion and substitution. We can also give penalty costs that depend on which characters are inserted, deleted or substituted.
1478: 1264: 1345:
The resulting algorithm is by no means efficient, but was at the time of its publication (1980) one of the first algorithms that performed approximate search.
1746: 1218:
We can store the number of insertions, deletions, and substitutions separately, or even the positions at which they occur, which is always
1641: 1680: 1606: 1337:
By initializing the first row of the matrix with zeros, we obtain a variant of the Wagner–Fischer algorithm that can be used for
1611: 430:
Two examples of the resulting matrix (hovering over an underlined number reveals the operation performed to get that number):
1427: 1367: 1616: 1328:
is the Levenshtein distance), which is much faster than the regular dynamic programming algorithm if the distance is small.
1827: 1601: 47:. Navarro lists the following inventors of it, with date of publication, and acknowledges that the list is incomplete: 1695: 1636: 1508: 1543: 1858: 1723: 57: 1728: 1444: 1807: 1533: 1812: 1690: 1657: 1593: 1822: 1718: 1662: 1563: 1517: 93: 1387: 96:
of the first string and all prefixes of the second, then we can compute the values in the matrix by
1817: 1621: 1553: 88:
The Wagner–Fischer algorithm computes edit distance based on the observation that if we reserve a
1863: 1766: 990: 966: 1185:
is smaller than the minimum of the three, and use this to show one of the three is not minimal.
100:
the matrix, and thus find the distance between the two full strings as the last value computed.
1474: 1382: 1338: 1771: 1713: 1573: 1501: 1063:
Otherwise, the distance is the minimum of the three possible ways to do the transformation:
1578: 1178: 128: 89: 1240: 1215:), since it only requires that the previous row and current row be stored at any one time. 8: 1776: 28: 1705: 1672: 1420:
Algorithms on strings, trees, and sequences: computer science and computational biology
1400: 1290: 73: 44: 1837: 1460: 1423: 1404: 1477:. Student Section of the European Summer School in Logic, Language and Information ( 1301:
values can be computed in parallel, and the algorithm can be adapted to perform the
131:
between them, looks as follows. The input strings are one-indexed, while the matrix
1832: 1802: 1756: 1558: 1494: 1456: 1392: 981:
operations. At the end, the bottom-right element of the array contains the answer.
20: 1685: 1626: 1538: 1309: 1294: 969:
maintained throughout the algorithm is that we can transform the initial segment
1738: 1631: 1198: 51: 1852: 1548: 1525: 63: 32: 1231:
If we are only interested in the distance if it is smaller than a threshold
1751: 1568: 1396: 1761: 104: 97: 1177:
is in fact minimal; this is more difficult to show, and involves an
842: 619: 495: 1797: 1475:
An approximate gazetteer for GATE based on levenshtein distance
176:// the first i characters of s and the first j characters of t 1781: 1486: 248:// target prefixes can be reached from empty source prefix 212:// source prefixes can be transformed into empty string by 1235:, then it suffices to compute a diagonal stripe of width 1268:
in the matrix. In this way, the algorithm can be run in
1173:
This proof fails to validate that the number placed in
953: 916: 879: 805: 758: 650: 588: 557: 526: 1332: 1308:
By examining diagonals instead of rows, and by using
1243: 1157:
is of course the number required to transform all of
1109:
and then do the same transformation, for a total of
173:// for all i and j, d will hold the distance between 1258: 1193:Possible modifications to this algorithm include: 1009:It is initially true on row and column 0 because 1850: 1056:and just leave the last character alone, giving 1368:"A guided tour to approximate string matching" 1225:We can normalize the distance to the interval 1197:We can adapt the algorithm to use less space, 43:The Wagner–Fischer algorithm has a history of 1502: 1422:. Cambridge, UK: Cambridge University Press. 1417: 1305:function in phases to eliminate dependencies. 993:is that we can transform the initial segment 1509: 1495: 1312:, we can find the Levenshtein distance in 768: 763: 1442: 1386: 1361: 1359: 1357: 1188: 1013:can be transformed into the empty string 678: 1681:Comparison of regular-expression engines 1021:characters. Similarly, we can transform 1005:operations. This invariant holds since: 1445:"Lazy Dynamic-Programming can be Eager" 1365: 1128:operations, then we can do the same to 1052:operations, then we can do the same to 984: 92:to hold the edit distances between all 83: 1851: 1354: 1642:Zhu–Takaoka string matching algorithm 1490: 1283:is the length of the shortest string. 1149:The operations required to transform 179:// note that d has (m+1)*(n+1) values 103:A straightforward implementation, as 1607:Boyer–Moore string-search algorithm 1078:operations, then we can simply add 35:between two strings of characters. 13: 1333:Seller's variant for string search 14: 1875: 1696:Nondeterministic finite automaton 1637:Two-way string-matching algorithm 1293:poorly, due to a large number of 1105:operations, then we can remove 251:// by inserting every character 1612:Boyer–Moore–Horspool algorithm 1602:Apostolico–Giancarlo algorithm 1467: 1436: 1411: 1: 1348: 1617:Knuth–Morris–Pratt algorithm 1544:Damerau–Levenshtein distance 1461:10.1016/0020-0190(92)90202-7 1443:Allison L (September 1992). 1132:, and exchange the original 31:algorithm that computes the 7: 1808:Compressed pattern matching 1534:Approximate string matching 1516: 1140:afterwards, for a total of 10: 1880: 1813:Longest common subsequence 1724:Needleman–Wunsch algorithm 1594:String-searching algorithm 1473:Bruno Woltzenlogel Paleo. 1144:operations (substitution). 989:As mentioned earlier, the 434: 215:// dropping all characters 38: 1823:Sequential pattern mining 1790: 1737: 1704: 1671: 1663:Commentz-Walter algorithm 1651:Multiple string searching 1650: 1592: 1584:Wagner–Fischer algorithm 1524: 1366:Navarro, Gonzalo (2001). 1179:argument by contradiction 79:Lowrance and Wagner, 1975 1833:String rewriting systems 1818:Longest common substring 1729:Smith–Waterman algorithm 1554:Gestalt pattern matching 140: 111:that takes two strings, 25:Wagner–Fischer algorithm 1767:Generalized suffix tree 1691:Thompson's construction 1090:operations (insertion). 1040:, and we can transform 1017:by simply dropping all 1719:Hirschberg's algorithm 1418:Gusfield, Dan (1997). 1260: 1189:Possible modifications 1113:operations (deletion). 1859:Algorithms on strings 1574:Levenshtein automaton 1564:Jaro–Winkler distance 1397:10.1145/375360.375365 1375:ACM Computing Surveys 1261: 1029:by simply adding all 135:is zero-indexed, and 16:Programming algorithm 1622:Rabin–Karp algorithm 1579:Levenshtein distance 1259:{\displaystyle 2k+1} 1241: 1116:If we can transform 1093:If we can transform 1066:If we can transform 985:Proof of correctness 129:Levenshtein distance 84:Calculating distance 58:Needleman and Wunsch 1777:Ternary search tree 1339:fuzzy string search 1297:. However, all the 1181:in which we assume 1001:using a minimum of 977:using a minimum of 138:is a closed range. 29:dynamic programming 1706:Sequence alignment 1673:Regular expression 1449:Inf. Proc. Letters 1256: 1082:afterwards to get 127:, and returns the 45:multiple invention 1846: 1845: 1838:String operations 1429:978-0-521-58519-4 1295:data dependencies 1169:holds our result. 963: 962: 959: 958: 684: 683: 1871: 1803:Pattern matching 1757:Suffix automaton 1559:Hamming distance 1511: 1504: 1497: 1488: 1487: 1482: 1471: 1465: 1464: 1440: 1434: 1433: 1415: 1409: 1408: 1390: 1372: 1363: 1304: 1300: 1282: 1278: 1267: 1265: 1263: 1262: 1257: 1234: 1227: 1221: 1184: 1176: 1168: 1164: 1160: 1156: 1152: 1143: 1139: 1135: 1131: 1127: 1123: 1119: 1112: 1108: 1104: 1100: 1096: 1089: 1085: 1081: 1077: 1073: 1069: 1059: 1055: 1051: 1047: 1043: 1039: 1032: 1028: 1024: 1020: 1016: 1012: 1004: 1000: 996: 980: 976: 972: 955: 918: 881: 844: 807: 770: 765: 760: 688: 687: 680: 652: 621: 590: 559: 528: 497: 437: 436: 433: 432: 426: 423: 420: 417: 414: 413:substitutionCost 411: 408: 405: 402: 399: 396: 393: 390: 387: 384: 381: 378: 375: 372: 369: 366: 363: 360: 357: 356:substitutionCost 354: 351: 348: 345: 342: 341:substitutionCost 339: 336: 333: 330: 327: 324: 321: 318: 315: 312: 309: 306: 303: 300: 297: 294: 291: 288: 285: 282: 279: 276: 273: 270: 267: 264: 261: 258: 255: 252: 249: 246: 243: 240: 237: 234: 231: 228: 225: 222: 219: 216: 213: 210: 207: 204: 201: 198: 195: 192: 189: 186: 183: 180: 177: 174: 171: 168: 165: 162: 159: 156: 153: 150: 147: 144: 137: 21:computer science 1879: 1878: 1874: 1873: 1872: 1870: 1869: 1868: 1849: 1848: 1847: 1842: 1786: 1733: 1700: 1686:Regular grammar 1667: 1646: 1627:Raita algorithm 1588: 1539:Bitap algorithm 1520: 1515: 1485: 1472: 1468: 1441: 1437: 1430: 1416: 1412: 1388:10.1.1.452.6317 1370: 1364: 1355: 1351: 1335: 1324:)) time (where 1310:lazy evaluation 1302: 1298: 1289:This algorithm 1280: 1269: 1242: 1239: 1238: 1236: 1232: 1226: 1219: 1191: 1182: 1174: 1166: 1162: 1158: 1154: 1150: 1141: 1137: 1133: 1129: 1125: 1121: 1117: 1110: 1106: 1102: 1098: 1094: 1087: 1083: 1079: 1075: 1071: 1067: 1057: 1053: 1049: 1045: 1041: 1037: 1030: 1026: 1022: 1018: 1014: 1010: 1002: 998: 994: 987: 978: 974: 970: 428: 427: 424: 421: 419:// substitution 418: 415: 412: 409: 406: 403: 400: 397: 394: 391: 388: 385: 382: 379: 376: 373: 370: 367: 364: 361: 358: 355: 352: 349: 346: 343: 340: 337: 334: 331: 328: 325: 322: 319: 316: 313: 310: 307: 304: 301: 298: 295: 292: 289: 286: 283: 280: 277: 274: 271: 268: 265: 262: 259: 256: 253: 250: 247: 244: 241: 238: 235: 232: 229: 226: 223: 220: 217: 214: 211: 208: 205: 202: 199: 196: 193: 190: 187: 184: 181: 178: 175: 172: 169: 166: 163: 160: 157: 154: 151: 148: 145: 142: 136: 107:for a function 86: 41: 17: 12: 11: 5: 1877: 1867: 1866: 1864:String metrics 1861: 1844: 1843: 1841: 1840: 1835: 1830: 1825: 1820: 1815: 1810: 1805: 1800: 1794: 1792: 1788: 1787: 1785: 1784: 1779: 1774: 1769: 1764: 1759: 1754: 1749: 1743: 1741: 1739:Data structure 1735: 1734: 1732: 1731: 1726: 1721: 1716: 1710: 1708: 1702: 1701: 1699: 1698: 1693: 1688: 1683: 1677: 1675: 1669: 1668: 1666: 1665: 1660: 1654: 1652: 1648: 1647: 1645: 1644: 1639: 1634: 1632:Trigram search 1629: 1624: 1619: 1614: 1609: 1604: 1598: 1596: 1590: 1589: 1587: 1586: 1581: 1576: 1571: 1566: 1561: 1556: 1551: 1546: 1541: 1536: 1530: 1528: 1522: 1521: 1514: 1513: 1506: 1499: 1491: 1484: 1483: 1466: 1435: 1428: 1410: 1352: 1350: 1347: 1334: 1331: 1330: 1329: 1306: 1287: 1284: 1255: 1252: 1249: 1246: 1229: 1223: 1216: 1190: 1187: 1171: 1170: 1147: 1146: 1145: 1114: 1091: 1061: 1034: 986: 983: 961: 960: 957: 956: 951: 948: 945: 942: 939: 936: 933: 930: 927: 923: 922: 919: 914: 911: 908: 905: 902: 899: 896: 893: 889: 888: 885: 882: 877: 874: 871: 868: 865: 862: 859: 855: 854: 851: 848: 845: 840: 837: 834: 831: 828: 825: 821: 820: 817: 814: 811: 808: 803: 800: 797: 794: 791: 787: 786: 783: 780: 777: 774: 771: 766: 761: 756: 753: 749: 748: 745: 742: 739: 736: 733: 730: 727: 724: 721: 718: 717: 714: 711: 708: 705: 702: 699: 696: 693: 691: 685: 682: 681: 676: 673: 670: 667: 664: 661: 658: 654: 653: 648: 645: 642: 639: 636: 633: 630: 626: 625: 622: 617: 614: 611: 608: 605: 602: 598: 597: 594: 591: 586: 583: 580: 577: 574: 570: 569: 566: 563: 560: 555: 552: 549: 546: 542: 541: 538: 535: 532: 529: 524: 521: 518: 514: 513: 510: 507: 504: 501: 498: 493: 490: 486: 485: 482: 479: 476: 473: 470: 467: 464: 461: 460: 457: 454: 451: 448: 445: 442: 440: 141: 85: 82: 81: 80: 77: 70: 67: 61: 55: 40: 37: 15: 9: 6: 4: 3: 2: 1876: 1865: 1862: 1860: 1857: 1856: 1854: 1839: 1836: 1834: 1831: 1829: 1826: 1824: 1821: 1819: 1816: 1814: 1811: 1809: 1806: 1804: 1801: 1799: 1796: 1795: 1793: 1789: 1783: 1780: 1778: 1775: 1773: 1770: 1768: 1765: 1763: 1760: 1758: 1755: 1753: 1750: 1748: 1745: 1744: 1742: 1740: 1736: 1730: 1727: 1725: 1722: 1720: 1717: 1715: 1712: 1711: 1709: 1707: 1703: 1697: 1694: 1692: 1689: 1687: 1684: 1682: 1679: 1678: 1676: 1674: 1670: 1664: 1661: 1659: 1656: 1655: 1653: 1649: 1643: 1640: 1638: 1635: 1633: 1630: 1628: 1625: 1623: 1620: 1618: 1615: 1613: 1610: 1608: 1605: 1603: 1600: 1599: 1597: 1595: 1591: 1585: 1582: 1580: 1577: 1575: 1572: 1570: 1567: 1565: 1562: 1560: 1557: 1555: 1552: 1550: 1549:Edit distance 1547: 1545: 1542: 1540: 1537: 1535: 1532: 1531: 1529: 1527: 1526:String metric 1523: 1519: 1512: 1507: 1505: 1500: 1498: 1493: 1492: 1489: 1480: 1476: 1470: 1462: 1458: 1455:(4): 207–12. 1454: 1450: 1446: 1439: 1431: 1425: 1421: 1414: 1406: 1402: 1398: 1394: 1389: 1384: 1380: 1376: 1369: 1362: 1360: 1358: 1353: 1346: 1343: 1340: 1327: 1323: 1319: 1315: 1311: 1307: 1296: 1292: 1288: 1285: 1276: 1272: 1253: 1250: 1247: 1244: 1230: 1224: 1217: 1214: 1210: 1207:) instead of 1206: 1202: 1201: 1196: 1195: 1194: 1186: 1180: 1148: 1115: 1092: 1065: 1064: 1062: 1035: 1008: 1007: 1006: 992: 982: 968: 952: 949: 946: 943: 940: 937: 934: 931: 928: 925: 924: 920: 915: 912: 909: 906: 903: 900: 897: 894: 891: 890: 886: 883: 878: 875: 872: 869: 866: 863: 860: 857: 856: 852: 849: 846: 841: 838: 835: 832: 829: 826: 823: 822: 818: 815: 812: 809: 804: 801: 798: 795: 792: 789: 788: 784: 781: 778: 775: 772: 767: 762: 757: 754: 751: 750: 746: 743: 740: 737: 734: 731: 728: 725: 722: 720: 719: 715: 712: 709: 706: 703: 700: 697: 694: 692: 690: 689: 686: 677: 674: 671: 668: 665: 662: 659: 656: 655: 649: 646: 643: 640: 637: 634: 631: 628: 627: 623: 618: 615: 612: 609: 606: 603: 600: 599: 595: 592: 587: 584: 581: 578: 575: 572: 571: 567: 564: 561: 556: 553: 550: 547: 544: 543: 539: 536: 533: 530: 525: 522: 519: 516: 515: 511: 508: 505: 502: 499: 494: 491: 488: 487: 483: 480: 477: 474: 471: 468: 465: 463: 462: 458: 455: 452: 449: 446: 443: 441: 439: 438: 435: 431: 139: 134: 130: 126: 122: 118: 114: 110: 106: 101: 99: 98:flood filling 95: 91: 78: 75: 71: 69:Sellers, 1974 68: 65: 62: 59: 56: 53: 50: 49: 48: 46: 36: 34: 33:edit distance 30: 26: 22: 1752:Suffix array 1658:Aho–Corasick 1583: 1569:Lee distance 1469: 1452: 1448: 1438: 1419: 1413: 1381:(1): 31–88. 1378: 1374: 1344: 1336: 1325: 1321: 1317: 1313: 1291:parallelizes 1279:time, where 1274: 1270: 1212: 1208: 1204: 1199: 1192: 1172: 1161:into all of 988: 964: 429: 404:// insertion 132: 124: 120: 116: 112: 108: 102: 87: 42: 24: 18: 1762:Suffix tree 1060:operations. 1033:characters. 389:// deletion 72:Wagner and 1853:Categories 1349:References 123:of length 115:of length 105:pseudocode 1405:207551224 1383:CiteSeerX 1165:, and so 991:invariant 967:invariant 1481:), 2007. 146:Distance 143:function 109:Distance 94:prefixes 52:Vintsyuk 1828:Sorting 1798:Parsing 1518:Strings 1303:minimum 1266:⁠ 1237:⁠ 371:minimum 197:element 182:declare 74:Fischer 64:Sankoff 39:History 1479:ESSLLI 1426:  1403:  1385:  422:return 119:, and 90:matrix 76:, 1974 66:, 1972 60:, 1970 54:, 1968 23:, the 1791:Other 1747:DAFSA 1714:BLAST 1401:S2CID 1371:(PDF) 1320:(1 + 1153:into 1038:s = t 997:into 973:into 27:is a 1782:Trie 1772:Rope 1424:ISBN 1299:cost 1136:for 965:The 350:else 311:from 290:from 260:from 224:from 209:zero 194:each 161:char 152:char 1457:doi 1393:doi 1142:k+1 1124:in 1120:to 1111:k+1 1101:in 1097:to 1088:k+1 1086:in 1074:in 1070:to 1048:in 1044:to 1036:If 1025:to 305:for 284:for 254:for 218:for 191:set 185:int 19:In 1855:: 1453:43 1451:. 1447:. 1399:. 1391:. 1379:33 1377:. 1373:. 1356:^ 1275:kl 1213:mn 926:y 921:4 892:a 887:5 858:d 853:6 824:n 819:6 790:u 785:7 752:S 747:8 716:y 713:a 710:d 707:r 704:u 701:t 698:a 695:S 657:g 629:n 624:3 601:i 596:3 573:t 568:4 545:t 540:5 517:i 512:6 489:s 484:6 459:n 456:e 453:t 450:t 447:i 444:k 368::= 359::= 344::= 326:if 317:to 296:to 278::= 266:to 242::= 230:to 206:to 200:in 1510:e 1503:t 1496:v 1463:. 1459:: 1432:. 1407:. 1395:: 1326:d 1322:d 1318:m 1316:( 1314:O 1281:l 1277:) 1273:( 1271:O 1254:1 1251:+ 1248:k 1245:2 1233:k 1228:. 1222:. 1220:j 1211:( 1209:O 1205:m 1203:( 1200:O 1183:d 1175:d 1167:d 1163:t 1159:s 1155:t 1151:s 1138:t 1134:s 1130:s 1126:k 1122:t 1118:s 1107:s 1103:k 1099:t 1095:s 1084:t 1080:t 1076:k 1072:t 1068:s 1058:k 1054:s 1050:k 1046:t 1042:s 1031:j 1027:t 1023:s 1019:i 1015:t 1011:s 1003:d 999:t 995:s 979:d 975:t 971:s 954:3 950:4 947:5 944:5 941:5 938:4 935:4 932:5 929:6 917:3 913:4 910:4 907:4 904:4 901:3 898:4 895:5 884:4 880:3 876:4 873:3 870:3 867:3 864:3 861:4 850:5 847:4 843:3 839:3 836:2 833:2 830:2 827:3 816:5 813:4 810:3 806:2 802:2 799:1 796:1 793:2 782:6 779:5 776:4 773:3 769:2 764:1 759:0 755:1 744:7 741:6 738:5 735:4 732:3 729:2 726:1 723:0 679:3 675:4 672:4 669:5 666:6 663:7 660:7 651:2 647:3 644:3 641:4 638:5 635:6 632:6 620:2 616:2 613:3 610:4 607:5 604:5 593:2 589:1 585:2 582:3 579:4 576:4 565:3 562:2 558:1 554:2 551:3 548:3 537:4 534:3 531:2 527:1 523:2 520:2 509:5 506:4 503:3 500:2 496:1 492:1 481:5 478:4 475:3 472:2 469:1 466:0 425:d 416:) 410:+ 407:d 401:, 398:1 395:+ 392:d 386:, 383:1 380:+ 377:d 374:( 365:d 362:1 353:: 347:0 338:: 335:t 332:= 329:s 323:: 320:m 314:1 308:i 302:: 299:n 293:1 287:j 281:j 275:d 272:: 269:n 263:1 257:j 245:i 239:d 236:: 233:m 227:1 221:i 203:d 188:d 170:: 167:) 164:t 158:, 155:s 149:( 133:d 125:n 121:t 117:m 113:s

Index

computer science
dynamic programming
edit distance
multiple invention
Vintsyuk
Needleman and Wunsch
Sankoff
Fischer
matrix
prefixes
flood filling
pseudocode
Levenshtein distance
invariant
invariant
argument by contradiction
O
parallelizes
data dependencies
lazy evaluation
fuzzy string search



"A guided tour to approximate string matching"
CiteSeerX
10.1.1.452.6317
doi
10.1145/375360.375365
S2CID

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.