Knowledge

Damerau–Levenshtein distance

Source 📝

773: 259: 1907:.) Thus, we need to consider only two symmetric ways of modifying a substring more than once: (1) transpose letters and insert an arbitrary number of characters between them, or (2) delete a sequence of characters and transpose letters that become adjacent after deletion. The straightforward implementation of this idea gives an algorithm of cubic complexity: 768:{\displaystyle d_{a,b}(i,j)=\min {\begin{cases}0&{\text{if }}i=j=0,\\d_{a,b}(i-1,j)+1&{\text{if }}i>0,\\d_{a,b}(i,j-1)+1&{\text{if }}j>0,\\d_{a,b}(i-1,j-1)+1_{(a_{i}\neq b_{j})}&{\text{if }}i,j>0,\\d_{a,b}(i-2,j-2)+1_{(a_{i}\neq b_{j})}&{\text{if }}i,j>1{\text{ and }}a_{i}=b_{j-1}{\text{ and }}a_{i-1}=b_{j},\\\end{cases}}} 62:
In his seminal paper, Damerau stated that in an investigation of spelling errors for an information-retrieval system, more than 80% were a result of a single error of one of the four types. Damerau's paper considered only misspellings that could be corrected with at most one edit operation. While the
2074:
The algorithm can be used with any set of words, like vendor names. Since entry is manual by nature, there is a risk of entering a false vendor. A fraudster employee may enter one real vendor such as "Rich Heir Estate Services" versus a false vendor "Rich Hier State Services". The fraudster would
2057:
frequently undergoes insertions, deletions, substitutions, and transpositions, and each of these operations occurs on approximately the same timescale, the Damerau–Levenshtein distance is an appropriate metric of the variation between two strands of DNA. More common in DNA, protein, and other
2033:. In natural languages, strings are short and the number of errors (misspellings) rarely exceeds 2. In such circumstances, restricted and real edit distance differ very rarely. Oommen and Loke even mitigated the limitation of the restricted edit distance by introducing 1823:
To devise a proper algorithm to calculate unrestricted Damerau–Levenshtein distance, note that there always exists an optimal sequence of edit operations, where once-transposed letters are never modified afterwards. (This holds as long as the cost of a transposition,
2075:
then create a false bank account and have the company route checks to the real vendor and false vendor. The Damerau–Levenshtein algorithm will detect the transposed and dropped letter and bring attention of the items to a fraud examiner.
1406:, while the second one computes the Damerau–Levenshtein distance with adjacent transpositions. Adding transpositions adds significant complexity. The difference between the two algorithms consists in that the 47:
between two sequences. Informally, the Damerau–Levenshtein distance between two words is the minimum number of operations (consisting of insertions, deletions or substitutions of a single character, or
1969: 1216: 1115: 1308: 1905: 2290:
Majorek, Karolina A.; Dunin-Horkawicz, Stanisław; et al. (2013), "The RNase H-like superfamily: new members, comparative structural analysis and evolutionary classification",
826: 2194:. Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics. pp. 1025–1032. 1680:
The following algorithm computes the true Damerau–Levenshtein distance with adjacent transpositions; this algorithm requires as an additional parameter the size of the alphabet
1006: 936: 161: 2012: 870: 59:
by including transpositions among its allowable operations in addition to the three classical single-character edit operations (insertions, deletions and substitutions).
1849: 1384: 1344: 247: 225: 204: 182: 113: 93: 2164:, Conferences in Research and Practice in Information Technology, vol. 68, Darlinghurst, Australia: Australian Computer Society, Inc., pp. 117–124, 1469:
because that would require the substring to be edited more than once, which is not allowed in OSA, and therefore the shortest sequence of operations is
2377:
Oommen, B. J.; Loke, R. K. S. (1997). "Pattern recognition of strings with substitutions, insertions, deletions and generalized transpositions".
2158:
Bard, Gregory V. (2007), "Spelling-error tolerant, order-independent pass-phrases via the Damerau–Levenshtein string-edit distance metric",
2775: 2463: 2670: 2709: 2635: 2161:
Proceedings of the Fifth Australasian Symposium on ACSW Frontiers : 2007, Ballarat, Australia, January 30 - February 2, 2007
2640: 1910: 2169: 1122: 1021: 2645: 2186: 2119: 2021:
can be modified to process transposition. See the information retrieval section of for an example of such an adaptation.
2856: 2630: 2724: 2665: 2537: 2222:
Levenshtein, Vladimir I. (February 1966), "Binary codes capable of correcting deletions, insertions, and reversals",
1237: 67:, Damerau–Levenshtein distance has also seen uses in biology to measure the variation between protein sequences. 2752: 2059: 1854: 2897: 2757: 2612: 2063: 1522: 778: 2427:
Lowrance, Roy; Wagner, Robert A. (April 1975), "An Extension of the String-to-String Correction Problem",
2892: 2836: 2562: 2127:. Proceedings of the 38th Annual Meeting on Association for Computational Linguistics. pp. 286–293. 2030: 1219: 49: 2841: 2719: 2686: 2622: 1979:
are string lengths. Using the ideas of Lowrance and Wagner, this naive algorithm can be improved to be
63:
original motivation was to measure distance between human misspellings to improve applications such as
951: 881: 2851: 2747: 2691: 2592: 2546: 2083:
The U.S. Government uses the Damerau–Levenshtein distance with its Consolidated Screening List API.
305: 2846: 2650: 2582: 2399: 118: 2887: 2795: 2037:. Nevertheless, one must remember that the restricted edit distance usually does not satisfy the 1982: 1410:
computes the number of edit operations needed to make the strings equal under the condition that
835: 2471: 2394: 1659:
The difference from the algorithm for Levenshtein distance is the addition of one recurrence:
2800: 2742: 2602: 2530: 1117:
corresponds to a match or mismatch, depending on whether the respective symbols are the same,
2607: 2386: 2248: 2231: 1827: 1529: 1521:
Optimal string alignment distance can be computed using a straightforward extension of the
1353: 1313: 56: 36: 32: 875:
Each recursive call matches one of the cases covered by the Damerau–Levenshtein distance:
8: 2805: 2038: 1525: 1486: 1398:
Presented here are two algorithms: the first, simpler one, computes what is known as the
2390: 2235: 2101: – Form of cybersquatting which relies on mistakes when inputting a website address 2058:
bioinformatics related alignment tasks is the use of closely related algorithms such as
2734: 2701: 2505: 2446: 2356: 2312: 2270: 829: 232: 210: 189: 167: 98: 78: 20: 2408: 2251:(March 1964), "A technique for computer detection and correction of spelling errors", 2866: 2509: 2339:
Boytsov, Leonid (May 2011). "Indexing methods for approximate dictionary searching".
2317: 2165: 2450: 2360: 2159: 2861: 2831: 2785: 2587: 2523: 2497: 2436: 2404: 2348: 2307: 2299: 2274: 2260: 2195: 2128: 24: 2714: 2655: 2567: 2470:. International Trade Administration, U.S. Department of Commerce. Archived from 2018: 2767: 2660: 2488:
Navarro, Gonzalo (March 2001), "A guided tour to approximate string matching",
2205: 2188:
Exploring distributional similarity based models for query spelling correction
2138: 2881: 2577: 2554: 2098: 1726:// note that d has indices starting at −1, while a, b and da are one-indexed. 64: 44: 40: 2352: 2200: 2133: 2095:– Suggests corrections that are based on a Damerau–Levenshtein distance of 1 2780: 2597: 2321: 2289: 2501: 2441: 2303: 2265: 1851:, is at least the average of the cost of an insertion and deletion, i.e., 2790: 2042: 52:
of two adjacent characters) required to change one word into the other.
1533: 1724:
d be a 2-d array of integers, dimensions length(a)+2, length(b)+2
1557:
d be a 2-d array of integers, dimensions length(a)+1, length(b)+1
2826: 1705:: distance, integer da := new array of |Σ| integers 1728:
maxdist := length(a) + length(b) d := maxdist
1649:
d := minimum(d, d + 1)
75:
To express the Damerau–Levenshtein distance between two strings
2092: 1559:// note that d is zero-indexed, while a and b are one-indexed. 2121:
An Improved Error Model for Noisy Channel Spelling Correction
2014:
in the worst case, which is what the above pseudocode does.
55:
The Damerau–Levenshtein distance differs from the classical
2810: 1485:. Note that for the optimal string alignment distance, the 761: 2515: 2054: 2029:
Damerau–Levenshtein distance plays an important role in
1964:{\displaystyle O{\big (}M\cdot N\cdot \max(M,N){\big )}} 1800:
cost := 1 d := minimum(d + cost,
1796:cost := 0 db := j 1675: 1630:
cost := 1 d := minimum(d + 1,
1414:, whereas the second one presents no such restriction. 1234:
is then given by the function value for full strings:
1211:{\displaystyle d_{a,b}(i-2,j-2)+1_{(a_{i}\neq b_{j})}} 1110:{\displaystyle d_{a,b}(i-1,j-1)+1_{(a_{i}\neq b_{j})}} 1985: 1913: 1857: 1830: 1356: 1316: 1240: 1125: 1024: 954: 884: 838: 781: 262: 235: 213: 192: 170: 121: 101: 81: 1669:d := minimum(d, d + 1) 1516: 1788:k := da] ℓ := db 2006: 1963: 1899: 1843: 1378: 1338: 1302: 1210: 1109: 1000: 930: 864: 820: 767: 241: 219: 198: 176: 155: 107: 87: 163:is defined, whose value is a distance between an 2879: 1936: 1445:, but the optimal string alignment distance OSA( 297: 2426: 1303:{\displaystyle d_{a,b}{\big (}|a|,|b|{\big )}} 2531: 1956: 1919: 1295: 1259: 2221: 1684:, so that all entries of the arrays are in 1417:Take for example the edit distance between 2538: 2524: 2376: 2117: 1758:d := maxdist d := j 1743:d := maxdist d := i 1665:i > 1 and j > 1 and a = b and a = b 1645:i > 1 and j > 1 and a = b and a = b 2440: 2398: 2334: 2332: 2330: 2311: 2285: 2283: 2264: 2199: 2132: 1226:The Damerau–Levenshtein distance between 2710:Comparison of regular-expression engines 2372: 2370: 2487: 2338: 2247: 2880: 2422: 2420: 2418: 2327: 2280: 2153: 2151: 2118:Brill, Eric; Moore, Robert C. (2000). 1900:{\displaystyle 2W_{T}\geq W_{I}+W_{D}} 1425:. The Damerau–Levenshtein distance LD( 2671:Zhu–Takaoka string matching algorithm 2519: 2367: 1676:Distance with adjacent transpositions 1412:no substring is edited more than once 821:{\displaystyle 1_{(a_{i}\neq b_{j})}} 186:prefix (initial substring) of string 2341:Journal of Experimental Algorithmics 2157: 256:function is defined recursively as: 2636:Boyer–Moore string-search algorithm 2415: 2148: 1513:), and so it is not a true metric. 1461:is used, it is not possible to use 13: 2481: 2184: 2069: 1408:optimal string alignment algorithm 1008:corresponds to an insertion (from 14: 2909: 2725:Nondeterministic finite automaton 2666:Two-way string-matching algorithm 2464:"Consolidated Screening List API" 2078: 1517:Optimal string alignment distance 1400:optimal string alignment distance 1001:{\displaystyle d_{a,b}(i,j-1)+1} 938:corresponds to a deletion (from 931:{\displaystyle d_{a,b}(i-1,j)+1} 2456: 2041:, and thus cannot be used with 2024: 1453:) = 3 because if the operation 1222:between two successive symbols. 2641:Boyer–Moore–Horspool algorithm 2631:Apostolico–Giancarlo algorithm 2241: 2215: 2178: 2111: 2001: 1989: 1951: 1939: 1372: 1364: 1332: 1324: 1289: 1281: 1273: 1265: 1203: 1177: 1166: 1142: 1102: 1076: 1065: 1041: 989: 971: 919: 901: 813: 787: 660: 634: 623: 599: 549: 523: 512: 488: 440: 422: 374: 356: 291: 279: 150: 138: 1: 2409:10.1016/S0031-3203(96)00101-X 2105: 1346:denotes the length of string 70: 2646:Knuth–Morris–Pratt algorithm 2573:Damerau–Levenshtein distance 1393: 156:{\displaystyle d_{a,b}(i,j)} 29:Damerau–Levenshtein distance 7: 2837:Compressed pattern matching 2563:Approximate string matching 2545: 2086: 2031:natural language processing 2017:It is interesting that the 2007:{\displaystyle O(M\cdot N)} 1812:d + (i−k−1) + 1 + (j-ℓ−1)) 1626:cost := 0 865:{\displaystyle a_{i}=b_{j}} 10: 2914: 2842:Longest common subsequence 2753:Needleman–Wunsch algorithm 2623:String-searching algorithm 2468:Trade.gov Developer Portal 2060:Needleman–Wunsch algorithm 2035:generalized transpositions 872:and equal to 1 otherwise. 16:Metric in computer science 2852:Sequential pattern mining 2819: 2766: 2733: 2700: 2692:Commentz-Walter algorithm 2680:Multiple string searching 2679: 2621: 2613:Wagner–Fischer algorithm 2553: 2253:Communications of the ACM 2862:String rewriting systems 2847:Longest common substring 2758:Smith–Waterman algorithm 2583:Gestalt pattern matching 2185:Li; et al. (2006). 2064:Smith–Waterman algorithm 1553:: distance, integer 1528:algorithm that computes 1404:restricted edit distance 2796:Generalized suffix tree 2720:Thompson's construction 2353:10.1145/1963190.1963191 2201:10.3115/1220175.1220304 2134:10.3115/1075218.1075255 37:Vladimir I. Levenshtein 2748:Hirschberg's algorithm 2292:Nucleic Acids Research 2224:Soviet Physics Doklady 2048: 2008: 1965: 1901: 1845: 1380: 1340: 1304: 1212: 1111: 1002: 932: 866: 822: 769: 243: 221: 200: 178: 157: 109: 89: 2603:Levenshtein automaton 2593:Jaro–Winkler distance 2502:10.1145/375360.375365 2490:ACM Computing Surveys 2442:10.1145/321879.321880 2266:10.1145/363958.363994 2009: 1966: 1902: 1846: 1844:{\displaystyle W_{T}} 1773:db := 0 1381: 1379:{\displaystyle j=|b|} 1341: 1339:{\displaystyle i=|a|} 1305: 1213: 1112: 1003: 933: 867: 823: 770: 244: 222: 201: 179: 158: 110: 90: 2651:Rabin–Karp algorithm 2608:Levenshtein distance 1983: 1911: 1855: 1828: 1530:Levenshtein distance 1354: 1314: 1238: 1123: 1022: 952: 882: 836: 779: 260: 233: 211: 190: 168: 119: 99: 79: 57:Levenshtein distance 33:Frederick J. Damerau 2898:Dynamic programming 2806:Ternary search tree 2391:1997PatRe..30..789O 2379:Pattern Recognition 2304:10.1093/nar/gkt1414 2236:1966SPhD...10..707L 2039:triangle inequality 1701:: strings a, b 1549:: strings a, b 1526:dynamic programming 1489:does not hold: OSA( 1487:triangle inequality 254:restricted distance 2893:Information theory 2735:Sequence alignment 2702:Regular expression 2004: 1961: 1897: 1841: 1816:da] := i 1376: 1336: 1300: 1208: 1107: 998: 928: 862: 830:indicator function 818: 765: 760: 239: 217: 196: 174: 153: 105: 85: 43:for measuring the 21:information theory 2875: 2874: 2867:String operations 2171:978-1-920682-49-1 1720:da := 0 1386:is the length of 1218:corresponds to a 724: 690: 670: 559: 454: 388: 316: 242:{\displaystyle b} 220:{\displaystyle j} 199:{\displaystyle a} 177:{\displaystyle i} 108:{\displaystyle b} 88:{\displaystyle a} 2905: 2832:Pattern matching 2786:Suffix automaton 2588:Hamming distance 2540: 2533: 2526: 2517: 2516: 2512: 2476: 2475: 2460: 2454: 2453: 2444: 2424: 2413: 2412: 2402: 2374: 2365: 2364: 2336: 2325: 2324: 2315: 2298:(7): 4160–4179, 2287: 2278: 2277: 2268: 2249:Damerau, Fred J. 2245: 2239: 2238: 2219: 2213: 2212: 2210: 2204:. Archived from 2203: 2193: 2182: 2176: 2174: 2155: 2146: 2145: 2143: 2137:. Archived from 2136: 2126: 2115: 2013: 2011: 2010: 2005: 1970: 1968: 1967: 1962: 1960: 1959: 1923: 1922: 1906: 1904: 1903: 1898: 1896: 1895: 1883: 1882: 1870: 1869: 1850: 1848: 1847: 1842: 1840: 1839: 1687: 1683: 1671:// transposition 1651:// transposition 1590:d := j 1575:d := i 1389: 1385: 1383: 1382: 1377: 1375: 1367: 1349: 1345: 1343: 1342: 1337: 1335: 1327: 1309: 1307: 1306: 1301: 1299: 1298: 1292: 1284: 1276: 1268: 1263: 1262: 1256: 1255: 1233: 1229: 1217: 1215: 1214: 1209: 1207: 1206: 1202: 1201: 1189: 1188: 1141: 1140: 1116: 1114: 1113: 1108: 1106: 1105: 1101: 1100: 1088: 1087: 1040: 1039: 1007: 1005: 1004: 999: 970: 969: 937: 935: 934: 929: 900: 899: 871: 869: 868: 863: 861: 860: 848: 847: 832:equal to 0 when 827: 825: 824: 819: 817: 816: 812: 811: 799: 798: 774: 772: 771: 766: 764: 763: 754: 753: 741: 740: 725: 722: 720: 719: 701: 700: 691: 688: 671: 668: 664: 663: 659: 658: 646: 645: 598: 597: 560: 557: 553: 552: 548: 547: 535: 534: 487: 486: 455: 452: 421: 420: 389: 386: 355: 354: 317: 314: 278: 277: 248: 246: 245: 240: 228: 226: 224: 223: 218: 205: 203: 202: 197: 185: 183: 181: 180: 175: 162: 160: 159: 154: 137: 136: 114: 112: 111: 106: 94: 92: 91: 86: 25:computer science 2913: 2912: 2908: 2907: 2906: 2904: 2903: 2902: 2878: 2877: 2876: 2871: 2815: 2762: 2729: 2715:Regular grammar 2696: 2675: 2656:Raita algorithm 2617: 2568:Bitap algorithm 2549: 2544: 2484: 2482:Further reading 2479: 2462: 2461: 2457: 2425: 2416: 2375: 2368: 2337: 2328: 2288: 2281: 2246: 2242: 2220: 2216: 2208: 2191: 2183: 2179: 2172: 2156: 2149: 2141: 2124: 2116: 2112: 2108: 2089: 2081: 2072: 2070:Fraud detection 2051: 2027: 2019:bitap algorithm 1984: 1981: 1980: 1955: 1954: 1918: 1917: 1912: 1909: 1908: 1891: 1887: 1878: 1874: 1865: 1861: 1856: 1853: 1852: 1835: 1831: 1829: 1826: 1825: 1821: 1814://transposition 1685: 1681: 1678: 1673: 1657: 1640:// substitution 1519: 1396: 1387: 1371: 1363: 1355: 1352: 1351: 1347: 1331: 1323: 1315: 1312: 1311: 1294: 1293: 1288: 1280: 1272: 1264: 1258: 1257: 1245: 1241: 1239: 1236: 1235: 1231: 1227: 1197: 1193: 1184: 1180: 1176: 1172: 1130: 1126: 1124: 1121: 1120: 1096: 1092: 1083: 1079: 1075: 1071: 1029: 1025: 1023: 1020: 1019: 959: 955: 953: 950: 949: 889: 885: 883: 880: 879: 856: 852: 843: 839: 837: 834: 833: 807: 803: 794: 790: 786: 782: 780: 777: 776: 759: 758: 749: 745: 730: 726: 723: and  721: 709: 705: 696: 692: 689: and  687: 667: 665: 654: 650: 641: 637: 633: 629: 587: 583: 580: 579: 556: 554: 543: 539: 530: 526: 522: 518: 476: 472: 469: 468: 451: 449: 410: 406: 403: 402: 385: 383: 344: 340: 337: 336: 313: 311: 301: 300: 267: 263: 261: 258: 257: 234: 231: 230: 212: 209: 208: 207: 191: 188: 187: 169: 166: 165: 164: 126: 122: 120: 117: 116: 100: 97: 96: 80: 77: 76: 73: 17: 12: 11: 5: 2911: 2901: 2900: 2895: 2890: 2888:String metrics 2873: 2872: 2870: 2869: 2864: 2859: 2854: 2849: 2844: 2839: 2834: 2829: 2823: 2821: 2817: 2816: 2814: 2813: 2808: 2803: 2798: 2793: 2788: 2783: 2778: 2772: 2770: 2768:Data structure 2764: 2763: 2761: 2760: 2755: 2750: 2745: 2739: 2737: 2731: 2730: 2728: 2727: 2722: 2717: 2712: 2706: 2704: 2698: 2697: 2695: 2694: 2689: 2683: 2681: 2677: 2676: 2674: 2673: 2668: 2663: 2661:Trigram search 2658: 2653: 2648: 2643: 2638: 2633: 2627: 2625: 2619: 2618: 2616: 2615: 2610: 2605: 2600: 2595: 2590: 2585: 2580: 2575: 2570: 2565: 2559: 2557: 2551: 2550: 2543: 2542: 2535: 2528: 2520: 2514: 2513: 2483: 2480: 2478: 2477: 2474:on 2019-10-30. 2455: 2435:(2): 177–183, 2414: 2400:10.1.1.50.1459 2385:(5): 789–800. 2366: 2326: 2279: 2259:(3): 171–176, 2240: 2230:(8): 707–710, 2214: 2211:on 2010-04-01. 2177: 2170: 2147: 2144:on 2012-12-21. 2109: 2107: 2104: 2103: 2102: 2096: 2088: 2085: 2080: 2079:Export control 2077: 2071: 2068: 2050: 2047: 2026: 2023: 2003: 2000: 1997: 1994: 1991: 1988: 1958: 1953: 1950: 1947: 1944: 1941: 1938: 1935: 1932: 1929: 1926: 1921: 1916: 1894: 1890: 1886: 1881: 1877: 1873: 1868: 1864: 1860: 1838: 1834: 1802://substitution 1690: 1677: 1674: 1661: 1538: 1523:Wagner–Fischer 1518: 1515: 1433:) = 2 because 1395: 1392: 1374: 1370: 1366: 1362: 1359: 1334: 1330: 1326: 1322: 1319: 1297: 1291: 1287: 1283: 1279: 1275: 1271: 1267: 1261: 1254: 1251: 1248: 1244: 1224: 1223: 1205: 1200: 1196: 1192: 1187: 1183: 1179: 1175: 1171: 1168: 1165: 1162: 1159: 1156: 1153: 1150: 1147: 1144: 1139: 1136: 1133: 1129: 1118: 1104: 1099: 1095: 1091: 1086: 1082: 1078: 1074: 1070: 1067: 1064: 1061: 1058: 1055: 1052: 1049: 1046: 1043: 1038: 1035: 1032: 1028: 1017: 997: 994: 991: 988: 985: 982: 979: 976: 973: 968: 965: 962: 958: 947: 927: 924: 921: 918: 915: 912: 909: 906: 903: 898: 895: 892: 888: 859: 855: 851: 846: 842: 815: 810: 806: 802: 797: 793: 789: 785: 762: 757: 752: 748: 744: 739: 736: 733: 729: 718: 715: 712: 708: 704: 699: 695: 686: 683: 680: 677: 674: 666: 662: 657: 653: 649: 644: 640: 636: 632: 628: 625: 622: 619: 616: 613: 610: 607: 604: 601: 596: 593: 590: 586: 582: 581: 578: 575: 572: 569: 566: 563: 555: 551: 546: 542: 538: 533: 529: 525: 521: 517: 514: 511: 508: 505: 502: 499: 496: 493: 490: 485: 482: 479: 475: 471: 470: 467: 464: 461: 458: 450: 448: 445: 442: 439: 436: 433: 430: 427: 424: 419: 416: 413: 409: 405: 404: 401: 398: 395: 392: 384: 382: 379: 376: 373: 370: 367: 364: 361: 358: 353: 350: 347: 343: 339: 338: 335: 332: 329: 326: 323: 320: 312: 310: 307: 306: 304: 299: 296: 293: 290: 287: 284: 281: 276: 273: 270: 266: 238: 216: 195: 173: 152: 149: 146: 143: 140: 135: 132: 129: 125: 104: 84: 72: 69: 65:spell checkers 15: 9: 6: 4: 3: 2: 2910: 2899: 2896: 2894: 2891: 2889: 2886: 2885: 2883: 2868: 2865: 2863: 2860: 2858: 2855: 2853: 2850: 2848: 2845: 2843: 2840: 2838: 2835: 2833: 2830: 2828: 2825: 2824: 2822: 2818: 2812: 2809: 2807: 2804: 2802: 2799: 2797: 2794: 2792: 2789: 2787: 2784: 2782: 2779: 2777: 2774: 2773: 2771: 2769: 2765: 2759: 2756: 2754: 2751: 2749: 2746: 2744: 2741: 2740: 2738: 2736: 2732: 2726: 2723: 2721: 2718: 2716: 2713: 2711: 2708: 2707: 2705: 2703: 2699: 2693: 2690: 2688: 2685: 2684: 2682: 2678: 2672: 2669: 2667: 2664: 2662: 2659: 2657: 2654: 2652: 2649: 2647: 2644: 2642: 2639: 2637: 2634: 2632: 2629: 2628: 2626: 2624: 2620: 2614: 2611: 2609: 2606: 2604: 2601: 2599: 2596: 2594: 2591: 2589: 2586: 2584: 2581: 2579: 2578:Edit distance 2576: 2574: 2571: 2569: 2566: 2564: 2561: 2560: 2558: 2556: 2555:String metric 2552: 2548: 2541: 2536: 2534: 2529: 2527: 2522: 2521: 2518: 2511: 2507: 2503: 2499: 2495: 2491: 2486: 2485: 2473: 2469: 2465: 2459: 2452: 2448: 2443: 2438: 2434: 2430: 2423: 2421: 2419: 2410: 2406: 2401: 2396: 2392: 2388: 2384: 2380: 2373: 2371: 2362: 2358: 2354: 2350: 2346: 2342: 2335: 2333: 2331: 2323: 2319: 2314: 2309: 2305: 2301: 2297: 2293: 2286: 2284: 2276: 2272: 2267: 2262: 2258: 2254: 2250: 2244: 2237: 2233: 2229: 2225: 2218: 2207: 2202: 2197: 2190: 2189: 2181: 2173: 2167: 2163: 2162: 2154: 2152: 2140: 2135: 2130: 2123: 2122: 2114: 2110: 2100: 2099:Typosquatting 2097: 2094: 2091: 2090: 2084: 2076: 2067: 2065: 2061: 2056: 2046: 2044: 2040: 2036: 2032: 2022: 2020: 2015: 1998: 1995: 1992: 1986: 1978: 1974: 1948: 1945: 1942: 1933: 1930: 1927: 1924: 1914: 1892: 1888: 1884: 1879: 1875: 1871: 1866: 1862: 1858: 1836: 1832: 1819: 1815: 1811: 1807: 1803: 1799: 1795: 1791: 1787: 1784: 1780: 1776: 1772: 1769: 1765: 1761: 1757: 1754: 1750: 1746: 1742: 1739: 1735: 1731: 1727: 1723: 1719: 1716: 1712: 1708: 1704: 1700: 1697: 1693: 1689: 1672: 1668: 1664: 1660: 1655: 1652: 1648: 1644: 1641: 1637: 1633: 1629: 1625: 1621: 1618: 1615: 1611: 1607: 1604: 1601: 1597: 1593: 1589: 1586: 1582: 1578: 1574: 1571: 1567: 1563: 1560: 1556: 1552: 1548: 1545: 1542:OSA-distance 1541: 1537: 1535: 1531: 1527: 1524: 1514: 1512: 1508: 1504: 1500: 1496: 1492: 1488: 1484: 1480: 1476: 1472: 1468: 1464: 1460: 1456: 1452: 1448: 1444: 1440: 1436: 1432: 1428: 1424: 1420: 1415: 1413: 1409: 1405: 1401: 1391: 1368: 1360: 1357: 1328: 1320: 1317: 1285: 1277: 1269: 1252: 1249: 1246: 1242: 1221: 1220:transposition 1198: 1194: 1190: 1185: 1181: 1173: 1169: 1163: 1160: 1157: 1154: 1151: 1148: 1145: 1137: 1134: 1131: 1127: 1119: 1097: 1093: 1089: 1084: 1080: 1072: 1068: 1062: 1059: 1056: 1053: 1050: 1047: 1044: 1036: 1033: 1030: 1026: 1018: 1015: 1011: 995: 992: 986: 983: 980: 977: 974: 966: 963: 960: 956: 948: 945: 941: 925: 922: 916: 913: 910: 907: 904: 896: 893: 890: 886: 878: 877: 876: 873: 857: 853: 849: 844: 840: 831: 808: 804: 800: 795: 791: 783: 755: 750: 746: 742: 737: 734: 731: 727: 716: 713: 710: 706: 702: 697: 693: 684: 681: 678: 675: 672: 655: 651: 647: 642: 638: 630: 626: 620: 617: 614: 611: 608: 605: 602: 594: 591: 588: 584: 576: 573: 570: 567: 564: 561: 544: 540: 536: 531: 527: 519: 515: 509: 506: 503: 500: 497: 494: 491: 483: 480: 477: 473: 465: 462: 459: 456: 446: 443: 437: 434: 431: 428: 425: 417: 414: 411: 407: 399: 396: 393: 390: 380: 377: 371: 368: 365: 362: 359: 351: 348: 345: 341: 333: 330: 327: 324: 321: 318: 308: 302: 294: 288: 285: 282: 274: 271: 268: 264: 255: 250: 236: 214: 193: 171: 147: 144: 141: 133: 130: 127: 123: 115:, a function 102: 82: 68: 66: 60: 58: 53: 51: 50:transposition 46: 45:edit distance 42: 41:string metric 38: 34: 31:(named after 30: 26: 22: 2781:Suffix array 2687:Aho–Corasick 2598:Lee distance 2572: 2496:(1): 31–88, 2493: 2489: 2472:the original 2467: 2458: 2432: 2428: 2382: 2378: 2344: 2340: 2295: 2291: 2256: 2252: 2243: 2227: 2223: 2217: 2206:the original 2187: 2180: 2160: 2139:the original 2120: 2113: 2082: 2073: 2052: 2043:metric trees 2034: 2028: 2025:Applications 2016: 1976: 1972: 1822: 1817: 1813: 1809: 1805: 1801: 1797: 1793: 1789: 1785: 1782: 1778: 1777:j := 1 1774: 1770: 1767: 1763: 1762:i := 1 1759: 1755: 1752: 1748: 1747:j := 0 1744: 1740: 1737: 1733: 1732:i := 0 1729: 1725: 1721: 1717: 1714: 1710: 1709:i := 1 1706: 1702: 1698: 1695: 1694:DL-distance 1691: 1686:[0, |Σ|) 1679: 1670: 1666: 1662: 1658: 1653: 1650: 1646: 1642: 1639: 1636:// insertion 1635: 1631: 1627: 1623: 1619: 1616: 1613: 1609: 1608:j := 1 1605: 1602: 1599: 1595: 1594:i := 1 1591: 1587: 1584: 1580: 1579:j := 0 1576: 1572: 1569: 1565: 1564:i := 0 1561: 1558: 1554: 1550: 1546: 1543: 1539: 1520: 1510: 1506: 1502: 1498: 1494: 1490: 1482: 1478: 1474: 1470: 1466: 1462: 1458: 1454: 1450: 1446: 1442: 1438: 1434: 1430: 1426: 1422: 1418: 1416: 1411: 1407: 1403: 1399: 1397: 1225: 1013: 1009: 943: 939: 874: 253: 251: 74: 61: 54: 28: 18: 2791:Suffix tree 1808:d + 1, 1806://insertion 1804:d + 1, 1638:d + cost) 1634:d + 1, 1632:// deletion 1505:) < OSA( 2882:Categories 2106:References 1810://deletion 1781:length(b) 1766:length(a) 1751:length(b) 1736:length(a) 1612:length(b) 1598:length(a) 1583:length(b) 1568:length(a) 1534:pseudocode 229:prefix of 71:Definition 2510:207551224 2395:CiteSeerX 1996:⋅ 1934:⋅ 1928:⋅ 1872:≥ 1783:inclusive 1768:inclusive 1753:inclusive 1738:inclusive 1715:inclusive 1692:algorithm 1614:inclusive 1600:inclusive 1585:inclusive 1570:inclusive 1540:algorithm 1394:Algorithm 1191:≠ 1161:− 1149:− 1090:≠ 1060:− 1048:− 984:− 908:− 801:≠ 735:− 714:− 648:≠ 618:− 606:− 537:≠ 507:− 495:− 435:− 363:− 2451:18892193 2361:15635688 2322:24464998 2087:See also 1971:, where 1497:) + OSA( 1310:, where 669:if  558:if  453:if  387:if  315:if  2857:Sorting 2827:Parsing 2547:Strings 2387:Bibcode 2313:3985635 2275:7713345 2232:Bibcode 1509:,  1501:,  1493:,  1449:,  1429:,  828:is the 227:-symbol 184:-symbol 39:) is a 2508:  2449:  2397:  2359:  2320:  2310:  2273:  2168:  2093:Ispell 2053:Since 1818:return 1792:a = b 1703:output 1654:return 1622:a = b 1551:output 1350:, and 775:where 206:and a 27:, the 2820:Other 2776:DAFSA 2743:BLAST 2506:S2CID 2447:S2CID 2429:J ACM 2357:S2CID 2347:: 1. 2271:S2CID 2209:(PDF) 2192:(PDF) 2142:(PDF) 2125:(PDF) 1699:input 1547:input 1532:. In 2811:Trie 2801:Rope 2318:PMID 2166:ISBN 1975:and 1798:else 1794:then 1713:|Σ| 1667:then 1647:then 1628:else 1624:then 1421:and 1230:and 682:> 571:> 460:> 394:> 252:The 95:and 35:and 23:and 2498:doi 2437:doi 2405:doi 2349:doi 2308:PMC 2300:doi 2261:doi 2196:doi 2129:doi 2062:or 2055:DNA 2049:DNA 1937:max 1775:for 1760:for 1745:for 1730:for 1722:let 1707:for 1606:for 1592:for 1577:for 1562:for 1555:let 1511:ABC 1503:ABC 1483:ABC 1467:ABC 1451:ABC 1443:ABC 1431:ABC 1423:ABC 1402:or 1012:to 942:to 298:min 19:In 2884:: 2504:, 2494:33 2492:, 2466:. 2445:, 2433:22 2431:, 2417:^ 2403:. 2393:. 2383:30 2381:. 2369:^ 2355:. 2345:16 2343:. 2329:^ 2316:, 2306:, 2296:42 2294:, 2282:^ 2269:, 2255:, 2228:10 2226:, 2150:^ 2066:. 2045:. 1820:d 1790:if 1786:do 1779:to 1771:do 1764:to 1756:do 1749:to 1741:do 1734:to 1718:do 1711:to 1696:is 1688:: 1663:if 1656:d 1643:if 1620:if 1617:do 1610:to 1603:do 1596:to 1588:do 1581:to 1573:do 1566:to 1544:is 1536:: 1507:CA 1499:AC 1495:AC 1491:CA 1481:→ 1479:AB 1477:→ 1473:→ 1471:CA 1465:→ 1463:AC 1459:AC 1457:→ 1455:CA 1447:CA 1441:→ 1439:AC 1437:→ 1435:CA 1427:CA 1419:CA 1390:. 1016:), 946:), 249:. 2539:e 2532:t 2525:v 2500:: 2439:: 2411:. 2407:: 2389:: 2363:. 2351:: 2302:: 2263:: 2257:7 2234:: 2198:: 2175:. 2131:: 2002:) 1999:N 1993:M 1990:( 1987:O 1977:N 1973:M 1957:) 1952:) 1949:N 1946:, 1943:M 1940:( 1931:N 1925:M 1920:( 1915:O 1893:D 1889:W 1885:+ 1880:I 1876:W 1867:T 1863:W 1859:2 1837:T 1833:W 1682:Σ 1475:A 1388:b 1373:| 1369:b 1365:| 1361:= 1358:j 1348:a 1333:| 1329:a 1325:| 1321:= 1318:i 1296:) 1290:| 1286:b 1282:| 1278:, 1274:| 1270:a 1266:| 1260:( 1253:b 1250:, 1247:a 1243:d 1232:b 1228:a 1204:) 1199:j 1195:b 1186:i 1182:a 1178:( 1174:1 1170:+ 1167:) 1164:2 1158:j 1155:, 1152:2 1146:i 1143:( 1138:b 1135:, 1132:a 1128:d 1103:) 1098:j 1094:b 1085:i 1081:a 1077:( 1073:1 1069:+ 1066:) 1063:1 1057:j 1054:, 1051:1 1045:i 1042:( 1037:b 1034:, 1031:a 1027:d 1014:b 1010:a 996:1 993:+ 990:) 987:1 981:j 978:, 975:i 972:( 967:b 964:, 961:a 957:d 944:b 940:a 926:1 923:+ 920:) 917:j 914:, 911:1 905:i 902:( 897:b 894:, 891:a 887:d 858:j 854:b 850:= 845:i 841:a 814:) 809:j 805:b 796:i 792:a 788:( 784:1 756:, 751:j 747:b 743:= 738:1 732:i 728:a 717:1 711:j 707:b 703:= 698:i 694:a 685:1 679:j 676:, 673:i 661:) 656:j 652:b 643:i 639:a 635:( 631:1 627:+ 624:) 621:2 615:j 612:, 609:2 603:i 600:( 595:b 592:, 589:a 585:d 577:, 574:0 568:j 565:, 562:i 550:) 545:j 541:b 532:i 528:a 524:( 520:1 516:+ 513:) 510:1 504:j 501:, 498:1 492:i 489:( 484:b 481:, 478:a 474:d 466:, 463:0 457:j 447:1 444:+ 441:) 438:1 432:j 429:, 426:i 423:( 418:b 415:, 412:a 408:d 400:, 397:0 391:i 381:1 378:+ 375:) 372:j 369:, 366:1 360:i 357:( 352:b 349:, 346:a 342:d 334:, 331:0 328:= 325:j 322:= 319:i 309:0 303:{ 295:= 292:) 289:j 286:, 283:i 280:( 275:b 272:, 269:a 265:d 237:b 215:j 194:a 172:i 151:) 148:j 145:, 142:i 139:( 134:b 131:, 128:a 124:d 103:b 83:a

Index

information theory
computer science
Frederick J. Damerau
Vladimir I. Levenshtein
string metric
edit distance
transposition
Levenshtein distance
spell checkers
indicator function
transposition
triangle inequality
Wagner–Fischer
dynamic programming
Levenshtein distance
pseudocode
bitap algorithm
natural language processing
triangle inequality
metric trees
DNA
Needleman–Wunsch algorithm
Smith–Waterman algorithm
Ispell
Typosquatting
An Improved Error Model for Noisy Channel Spelling Correction
doi
10.3115/1075218.1075255
the original

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.