Knowledge

String-searching algorithm

Source 📝

256: 218:
A simple and inefficient way to see where one string occurs inside another is to check at each index, one by one. First, we see if there is a copy of the needle starting at the first character of the haystack; if not, we look to see if there's a copy of the needle starting at the second character of
191:
searching, where the user constructs a pattern of characters or other symbols, and any match to the pattern should fulfill the search. For example, to catch both the American English word "color" and the British equivalent "colour", instead of searching for two different literal strings, one might
89:. This may significantly slow some search algorithms. One of many possible solutions is to search for the sequence of code units instead, but doing so may produce false matches unless the encoding is specifically designed to avoid it. 183:
Finally, for strings that represent natural language, aspects of the language itself become involved. For example, one might wish to find all occurrences of a "word" despite it having alternate spellings, prefixes or suffixes, etc.
1164: 204:
A similar problem introduced in the field of bioinformatics and genomics is the maximal exact matching (MEM). Given two strings, MEMs are common substrings that cannot be extended left or right without causing a mismatch.
125:
Another common example involves "normalization". For many purposes, a search for a phrase such as "to be" should succeed even in places where there is something else intervening between the "to" and the "be":
219:
the haystack, and so forth. In the normal case, we only have to look at one or two characters for each wrong position to see that it is a wrong position, so in the average case, this takes
122:
having other letters immediately adjacent on either side. In that case a search for "hew" or "low" should fail for the example sentence above, even though those literal strings do occur.
407:
time under the assumption that the alphabet has a constant size and all inner nodes in the suffix tree know what leaves are underneath them. The latter can be accomplished by running a
176:
segments which may be ignored for some purposes, or polymorphisms that lead to no change in the encoded proteins, which may not count as a true difference for some other purposes.
118:
Very commonly, however, various constraints are added. For example, one might want to match the "needle" only where it consists of one (or more) complete words—perhaps defined as
1281: 356: 115:
One might request the first occurrence of "to", which is the fourth word; or all occurrences, of which there are 3; or the last, which is the fifth word from the end.
1272: 405: 376: 297:
starts searching from the end of the needle, so it can usually jump ahead a whole needle-length at each step. Baeza–Yates keeps track of whether the previous
1379: 1189: 423:, are intended to find a "closeness" score between the search string and the text rather than a "match/non-match". These are sometimes called 1780: 1214: 1047: 151:
Latin-based alphabets distinguish lower-case from upper-case, but for many purposes string search is expected to ignore the distinction.
1675: 1327: 1714: 1500: 1323:
Melichar, Borivoj, Jan Holub, and J. Polcar. Text Searching Algorithms. Volume I: Forward String Matching. Vol. 1. 2 vols., 2005.
907:
Kurtz, Stefan; Phillippy, Adam; Delcher, Arthur L; Smoot, Michael; Shumway, Martin; Antonescu, Corina; Salzberg, Steven L (2004).
143:
or even arbitrarily large but "parenthetical" things such as footnotes, list-numbers or other markers, embedded images, and so on.
1640: 739: 662: 534: 294: 1102: 239:
is the length of the needle; but in the worst case, searching for a string like "aaaab" in a string like "aaaaaaaaab", it takes
1645: 1482:– Implementations of many String-Matching-Algorithms in Java (BNDM, Boyer-Moore-Horspool, Boyer-Moore-Horspool-Raita, Shift-Or) 179:
Some languages have rules where a different character or form of character must be used at the start, middle, or end of words.
1514: 1402: 1150: 1096: 274:
shown to the right recognizes the word "MOMMY". This approach is frequently generalized in practice to search for arbitrary
1650: 718: 518: 286: 1307: 1892: 77:
In practice, the method of feasible string-search algorithm may be affected by the string encoding. In particular, if a
1861: 1635: 1343: 1729: 1670: 1542: 1450: 1301: 1285: 552: 105:. The goal is to find one or more occurrences of the needle within the haystack. For example, one might search for 1577: 266:(DFA) that recognizes a stored search string. These are expensive to construct—they are usually created using the 1127: 166: 1757: 968:"A practical algorithm for finding maximal exact matches in large sequence datasets using sparse suffix arrays" 290: 271: 263: 1762: 1617: 1510:
Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features
793:
Other classification approaches are possible. One of the most common uses preprocessing as main criteria.
1841: 1567: 1395:
Flexible Pattern Matching Strings: Practical On-Line Search Algorithms for Texts and Biological Sequences
878: 649: 424: 47: 1846: 1724: 1691: 1441: 777:
Naturally, the patterns can not be enumerated finitely in this case. They are represented usually by a
713: 1856: 1752: 1696: 1597: 1551: 734: 36: 1372: 97:
The most basic case of string searching involves one (often very long) string, sometimes called the
1851: 1655: 1587: 1185: 500: 329:, the occurrences of a pattern can be found quickly. As an example, a suffix tree can be built in 1800: 1083:. Lecture Notes in Computer Science. Vol. 1448. Springer Berlin Heidelberg. pp. 14–33. 78: 332: 1488:– Implementations of many String-Matching-Algorithms (for single and multiple patterns) in Java 624: 302: 1210: 1016: 1805: 1747: 1607: 1535: 267: 112:
Some books are to be tasted, others to be swallowed, and some few to be chewed and digested.
1612: 1428: 1416: 1345: 637: 155: 43: 381: 8: 1810: 1324: 147:
Many symbol systems include characters that are synonymous (at least for some purposes):
1509: 1497: 1474: 1469: 1739: 1706: 1491: 1253: 1156: 1039: 992: 967: 883: 863: 782: 408: 361: 275: 188: 1135:
2009 Fourth International Conference on Internet Computing for Science and Engineering
943: 908: 1871: 1446: 1398: 1357: 1297: 1146: 1092: 1073: 997: 948: 930: 162: 32: 1043: 983: 201:
This article mainly discusses algorithms for the simpler kinds of string searching.
1866: 1836: 1790: 1592: 1528: 1432: 1424: 1289: 1257: 1245: 1160: 1138: 1084: 1031: 987: 979: 938: 920: 888: 873: 653: 569: 20: 1494:— Animation in Java, Detailed description and C implementation of many algorithms. 133:
Other "whitespace" characters such as tabs, non-breaking spaces, line-breaks, etc.
42:
A basic example of string searching is when the pattern and the searched text are
1719: 1660: 1572: 1504: 1479: 1331: 778: 318: 306: 140: 1772: 1665: 1436: 868: 608: 420: 240: 220: 71: 1463: 158:, where one composite character is equivalent to two or more other characters. 1886: 1582: 1559: 1293: 934: 301:
characters were a prefix of the search string, and is therefore adaptable to
925: 666:
has been the standard benchmark for the practical string-search literature.
1785: 1602: 1249: 1142: 1074:"A bit-parallel approach to suffix automata: Fast extended string matching" 1001: 952: 326: 198:
where the "?" conventionally makes the preceding character ("u") optional.
1350:
Fast nGramBased String Search Over Data Encoded Using Algebraic Signatures
1035: 1795: 966:
Khan, Zia; Bloom, Joshua S.; Kruglyak, Leonid; Singh, Mona (2009-07-01).
322: 169:, which may vary in their usage, or be of varying importance in matching. 1485: 1088: 173: 51: 1517:– Implementations of Vector and Scalar String-Matching-Algorithms in C 788: 441: 844:
Match the prefix first (Knuth–Morris–Pratt, Shift-And, Aho–Corasick)
1480:
StringSearch – high-performance pattern matching algorithms in Java
255: 54:) ÎŁ. ÎŁ may be a human language alphabet, for example, the letters 847:
Match the suffix first (Boyer–Moore and variants, Commentz-Walter)
840:
Another one classifies the algorithms by their matching strategy:
772: 293:
that recognizes inputs with the string to search for as a suffix,
1831: 39:(also called patterns) are found within a larger string or text. 1392: 1352:, 33rd International Conference on Very Large Data Bases (VLDB) 1282:
International Colloquium on Automata, Languages and Programming
317:
Faster search algorithms preprocess the text. After building a
1498:(PDF) Improved Single and Multiple Approximate String Matching 630: 669: 262:
In this approach, backtracking is avoided by constructing a
1815: 1515:
NyoTengu – high-performance pattern matching algorithm in C
1288:. Vol. 71. Graz, Austria: Springer. pp. 118–132. 652:
and (potentially-infinite) sets of patterns represented as
634: 1520: 906: 101:, and one (often very short) string, sometimes called the 1128:"Fast Variants of the Backward-Oracle-Marching Algorithm" 909:"Versatile and open software for comparing large genomes" 435: 1346:
http://www.cse.scu.edu/~tschwarz/Papers/vldb07_final.pdf
835: 444:
can be classified by the number of patterns each uses.
1325:
http://stringology.org/athens/TextSearchingAlgorithms/
965: 430: 250: 1470:
Large (maintained) list of string-matching algorithms
1015:
Crochemore, Maxime; Perrin, Dominique (1 July 1991).
384: 364: 335: 85:
th character, perhaps requiring time proportional to
789:
Classification by the use of preprocessing programs
1445:, Third Edition. MIT Press and McGraw-Hill, 2009. 399: 370: 350: 1453:. Chapter 32: String Matching, pp. 985–1013. 1884: 1014: 850:Match the best factor first (BNDM, BOM, Set-BOM) 208: 1274:A String Matching Algorithm Fast on the Average 1270: 1071: 773:Algorithms using an infinite number of patterns 1264: 1236:Hume; Sunday (1991). "Fast String Searching". 853:Other strategy (NaĂŻve, Rabin–Karp, Vectorized) 35:that try to find a place where one or several 1536: 603: 270:—but are very quick to use. For example, the 81:is in use, then it may be slower to find the 1072:Navarro, Gonzalo; Raffinot, Mathieu (1998). 447: 309:is an application of Baeza–Yates' approach. 1543: 1529: 1393:Gonzalo Navarro; Mathieu Raffinot (2008), 1235: 1126:Fan, H.; Yao, N.; Ma, H. (December 2009). 991: 942: 924: 670:Algorithms using a finite set of patterns 644: 378:occurrences of a pattern can be found in 1715:Comparison of regular-expression engines 1378:CS1 maint: numeric names: authors list ( 1125: 1475:NIST list of string-matching algorithms 797:Classes of string searching algorithms 460:the length of the searchable text, and 187:Another more complex type of search is 1885: 678:is the length of the longest pattern, 436:Classification by a number of patterns 213: 136:Less commonly, a hyphen or soft hyphen 1676:Zhu–Takaoka string matching algorithm 1524: 1053:from the original on 24 November 2021 836:Classification by matching strategies 607:Asymptotic times are expressed using 16:Searches for patterns within strings 1641:Boyer–Moore string-search algorithm 1464:Huge list of pattern matching links 1421:Carom. ACM 20, (10), 262–272(1977). 1217:from the original on 1 October 2020 686:the length of the searchable text, 663:Boyer–Moore string-search algorithm 464:= |ÎŁ| is the size of the alphabet. 431:Classification of search algorithms 251:Finite-state-automaton-based search 13: 419:Some search methods, for instance 411:from the root of the suffix tree. 336: 254: 235:is the length of the haystack and 192:use a regular expression such as: 14: 1904: 1730:Nondeterministic finite automaton 1671:Two-way string-matching algorithm 1466:Last updated: 12/27/2008 20:18:38 1457: 1417:A fast string searching algorithm 1238:Software: Practice and Experience 1229: 615: 414: 62:and other applications may use a 1492:Exact String Matching Algorithms 312: 1386: 1337: 1317: 1271:Commentz-Walter, Beate (1979). 1192:from the original on 2020-09-20 1167:from the original on 2022-05-10 1108:from the original on 2019-01-05 587:Backward Oracle Matching (BOM) 1646:Boyer–Moore–Horspool algorithm 1636:Apostolico–Giancarlo algorithm 1397:, Cambridge University Press, 1203: 1178: 1119: 1081:Combinatorial Pattern Matching 1065: 1008: 959: 900: 674:In the following compilation, 456:is the length of the pattern, 452:In the following compilation, 394: 388: 345: 339: 264:deterministic finite automaton 1: 1414:R. S. Boyer and J. S. Moore, 984:10.1093/bioinformatics/btp275 894: 209:Examples of search algorithms 161:Many writing systems involve 1651:Knuth–Morris–Pratt algorithm 1578:Damerau–Levenshtein distance 1344:Riad Mokadem; Witold Litwin 1186:"glibc/string/str-two-way.h" 31:, are an important class of 7: 1842:Compressed pattern matching 1568:Approximate string matching 1550: 879:Compressed pattern matching 857: 826:Constructed search engines 690:the number of occurrences. 650:approximate string matching 568:Backward Non-Deterministic 92: 25:string-searching algorithms 10: 1909: 1893:String matching algorithms 1847:Longest common subsequence 1758:Needleman–Wunsch algorithm 1628:String-searching algorithm 1442:Introduction to Algorithms 1211:"musl/src/string/memmem.c" 812:Patterns not preprocessed 648:Can be extended to handle 351:{\displaystyle \Theta (n)} 172:DNA sequences can involve 29:string-matching algorithms 1857:Sequential pattern mining 1824: 1771: 1738: 1705: 1697:Commentz-Walter algorithm 1685:Multiple string searching 1684: 1626: 1618:Wagner–Fischer algorithm 1558: 1017:"Two-way string-matching" 829:Signature methods : 760:Backward Oracle Matching 448:Single-pattern algorithms 1867:String rewriting systems 1852:Longest common substring 1763:Smith–Waterman algorithm 1588:Gestalt pattern matching 1294:10.1007/3-540-09510-1_10 629:search functions in the 281: 1801:Generalized suffix tree 1725:Thompson's construction 926:10.1186/gb-2004-5-2-r12 154:Many languages include 79:variable-width encoding 1753:Hirschberg's algorithm 1250:10.1002/spe.4380211105 1143:10.1109/ICICSE.2009.53 823:Patterns preprocessed 815:Elementary algorithms 804:Text not preprocessed 619:Used to implement the 401: 372: 352: 303:fuzzy string searching 259: 1608:Levenshtein automaton 1598:Jaro–Winkler distance 1036:10.1145/116825.116845 749:sublinear in average 402: 373: 353: 268:powerset construction 258: 139:In structured texts, 1656:Rabin–Karp algorithm 1613:Levenshtein distance 1429:Charles E. Leiserson 747:Θ(M * n) worst case 682:their total length, 638:C standard libraries 609:O, Ω, and Θ notation 400:{\displaystyle O(m)} 382: 362: 333: 1811:Ternary search tree 798: 702:Preprocessing time 473:Preprocessing time 276:regular expressions 214:Naive string search 165:such as accents or 130:More than one space 70:(ÎŁ = {A,C,G,T}) in 27:, sometimes called 1740:Sequence alignment 1707:Regular expression 1503:2017-03-11 at the 1330:2016-03-04 at the 1137:. pp. 56–59. 1089:10.1007/bfb0030778 1024:Journal of the ACM 884:Matching wildcards 864:Sequence alignment 807:Text preprocessed 796: 783:regular expression 719:Knuth–Morris–Pratt 519:Knuth–Morris–Pratt 490:Θ(n+m) in average, 397: 368: 348: 287:Knuth–Morris–Pratt 260: 189:regular expression 46:of elements of an 1880: 1879: 1872:String operations 1404:978-0-521-03993-2 1365:External link in 1244:(11): 1221–1248. 1152:978-1-4244-6754-9 1098:978-3-540-64739-3 978:(13): 1609–1616. 833: 832: 770: 769: 654:regular languages 599: 598: 553:Two-way algorithm 371:{\displaystyle z} 163:diacritical marks 66:(ÎŁ = {0,1}) or a 33:string algorithms 1900: 1837:Pattern matching 1791:Suffix automaton 1593:Hamming distance 1545: 1538: 1531: 1522: 1521: 1433:Ronald L. Rivest 1425:Thomas H. Cormen 1408: 1407: 1390: 1384: 1383: 1376: 1370: 1369: 1363: 1361: 1353: 1341: 1335: 1321: 1315: 1314: 1312: 1306:. Archived from 1279: 1268: 1262: 1261: 1233: 1227: 1226: 1224: 1222: 1207: 1201: 1200: 1198: 1197: 1182: 1176: 1175: 1173: 1172: 1132: 1123: 1117: 1116: 1114: 1113: 1107: 1078: 1069: 1063: 1062: 1060: 1058: 1052: 1021: 1012: 1006: 1005: 995: 963: 957: 956: 946: 928: 904: 889:Full-text search 874:Pattern matching 799: 795: 693: 692: 647: 618: 606: 572:Matching (BNDM) 508:Θ(n) in average, 484:NaĂŻve algorithm 467: 466: 425:"fuzzy" searches 406: 404: 403: 398: 377: 375: 374: 369: 357: 355: 354: 349: 321:, for example a 21:computer science 1908: 1907: 1903: 1902: 1901: 1899: 1898: 1897: 1883: 1882: 1881: 1876: 1820: 1767: 1734: 1720:Regular grammar 1701: 1680: 1661:Raita algorithm 1622: 1573:Bitap algorithm 1554: 1549: 1505:Wayback Machine 1486:StringsAndChars 1460: 1411: 1405: 1391: 1387: 1377: 1368:|surname2= 1367: 1366: 1364: 1355: 1354: 1342: 1338: 1332:Wayback Machine 1322: 1318: 1310: 1304: 1277: 1269: 1265: 1234: 1230: 1220: 1218: 1209: 1208: 1204: 1195: 1193: 1184: 1183: 1179: 1170: 1168: 1153: 1130: 1124: 1120: 1111: 1109: 1105: 1099: 1076: 1070: 1066: 1056: 1054: 1050: 1019: 1013: 1009: 964: 960: 905: 901: 897: 860: 838: 791: 779:regular grammar 775: 748: 735:Commentz-Walter 672: 580:O(mn) at worst 579: 578:Ω(n/m) at best, 544:O(mn) at worst 543: 542:Ω(n/m) at best, 510:O(mn) at worst 509: 491: 450: 438: 433: 417: 383: 380: 379: 363: 360: 359: 334: 331: 330: 319:substring index 315: 307:bitap algorithm 284: 253: 231:) steps, where 216: 211: 196: 113: 95: 64:binary alphabet 17: 12: 11: 5: 1906: 1896: 1895: 1878: 1877: 1875: 1874: 1869: 1864: 1859: 1854: 1849: 1844: 1839: 1834: 1828: 1826: 1822: 1821: 1819: 1818: 1813: 1808: 1803: 1798: 1793: 1788: 1783: 1777: 1775: 1773:Data structure 1769: 1768: 1766: 1765: 1760: 1755: 1750: 1744: 1742: 1736: 1735: 1733: 1732: 1727: 1722: 1717: 1711: 1709: 1703: 1702: 1700: 1699: 1694: 1688: 1686: 1682: 1681: 1679: 1678: 1673: 1668: 1666:Trigram search 1663: 1658: 1653: 1648: 1643: 1638: 1632: 1630: 1624: 1623: 1621: 1620: 1615: 1610: 1605: 1600: 1595: 1590: 1585: 1580: 1575: 1570: 1564: 1562: 1556: 1555: 1548: 1547: 1540: 1533: 1525: 1519: 1518: 1512: 1507: 1495: 1489: 1483: 1477: 1472: 1467: 1459: 1458:External links 1456: 1455: 1454: 1437:Clifford Stein 1422: 1410: 1409: 1403: 1385: 1336: 1316: 1313:on 2017-10-10. 1302: 1263: 1228: 1202: 1177: 1151: 1118: 1097: 1064: 1030:(3): 650–674. 1007: 972:Bioinformatics 958: 913:Genome Biology 898: 896: 893: 892: 891: 886: 881: 876: 871: 869:Graph matching 866: 859: 856: 855: 854: 851: 848: 845: 837: 834: 831: 830: 827: 824: 820: 819: 818:Index methods 816: 813: 809: 808: 805: 802: 790: 787: 774: 771: 768: 767: 765: 763: 761: 758: 754: 753: 750: 745: 742: 737: 731: 730: 727: 724: 721: 716: 710: 709: 706: 705:Matching time 703: 700: 697: 671: 668: 658: 657: 641: 612: 597: 596: 594: 591: 588: 584: 583: 581: 576: 573: 565: 564: 561: 558: 555: 549: 548: 545: 540: 537: 531: 530: 527: 524: 521: 515: 514: 511: 506: 503: 497: 496: 493: 488: 485: 481: 480: 477: 476:Matching time 474: 471: 449: 446: 437: 434: 432: 429: 421:trigram search 416: 415:Other variants 413: 396: 393: 390: 387: 367: 358:time, and all 347: 344: 341: 338: 314: 311: 283: 280: 252: 249: 215: 212: 210: 207: 194: 181: 180: 177: 170: 159: 152: 145: 144: 137: 134: 131: 111: 94: 91: 72:bioinformatics 15: 9: 6: 4: 3: 2: 1905: 1894: 1891: 1890: 1888: 1873: 1870: 1868: 1865: 1863: 1860: 1858: 1855: 1853: 1850: 1848: 1845: 1843: 1840: 1838: 1835: 1833: 1830: 1829: 1827: 1823: 1817: 1814: 1812: 1809: 1807: 1804: 1802: 1799: 1797: 1794: 1792: 1789: 1787: 1784: 1782: 1779: 1778: 1776: 1774: 1770: 1764: 1761: 1759: 1756: 1754: 1751: 1749: 1746: 1745: 1743: 1741: 1737: 1731: 1728: 1726: 1723: 1721: 1718: 1716: 1713: 1712: 1710: 1708: 1704: 1698: 1695: 1693: 1690: 1689: 1687: 1683: 1677: 1674: 1672: 1669: 1667: 1664: 1662: 1659: 1657: 1654: 1652: 1649: 1647: 1644: 1642: 1639: 1637: 1634: 1633: 1631: 1629: 1625: 1619: 1616: 1614: 1611: 1609: 1606: 1604: 1601: 1599: 1596: 1594: 1591: 1589: 1586: 1584: 1583:Edit distance 1581: 1579: 1576: 1574: 1571: 1569: 1566: 1565: 1563: 1561: 1560:String metric 1557: 1553: 1546: 1541: 1539: 1534: 1532: 1527: 1526: 1523: 1516: 1513: 1511: 1508: 1506: 1502: 1499: 1496: 1493: 1490: 1487: 1484: 1481: 1478: 1476: 1473: 1471: 1468: 1465: 1462: 1461: 1452: 1451:0-262-03293-7 1448: 1444: 1443: 1438: 1434: 1430: 1426: 1423: 1420: 1418: 1413: 1412: 1406: 1400: 1396: 1389: 1381: 1374: 1359: 1351: 1347: 1340: 1333: 1329: 1326: 1320: 1309: 1305: 1303:3-540-09510-1 1299: 1295: 1291: 1287: 1283: 1276: 1275: 1267: 1259: 1255: 1251: 1247: 1243: 1239: 1232: 1216: 1212: 1206: 1191: 1187: 1181: 1166: 1162: 1158: 1154: 1148: 1144: 1140: 1136: 1129: 1122: 1104: 1100: 1094: 1090: 1086: 1082: 1075: 1068: 1049: 1045: 1041: 1037: 1033: 1029: 1025: 1018: 1011: 1003: 999: 994: 989: 985: 981: 977: 973: 969: 962: 954: 950: 945: 940: 936: 932: 927: 922: 918: 914: 910: 903: 899: 890: 887: 885: 882: 880: 877: 875: 872: 870: 867: 865: 862: 861: 852: 849: 846: 843: 842: 841: 828: 825: 822: 821: 817: 814: 811: 810: 806: 803: 801: 800: 794: 786: 784: 780: 766: 764: 762: 759: 756: 755: 751: 746: 743: 741: 738: 736: 733: 732: 728: 725: 722: 720: 717: 715: 712: 711: 707: 704: 701: 699:Extension of 698: 695: 694: 691: 689: 685: 681: 677: 667: 665: 664: 655: 651: 646: 642: 639: 636: 632: 628: 627: 622: 617: 613: 610: 605: 601: 600: 595: 592: 589: 586: 585: 582: 577: 574: 571: 567: 566: 562: 559: 556: 554: 551: 550: 546: 541: 538: 536: 533: 532: 528: 525: 522: 520: 517: 516: 512: 507: 504: 502: 499: 498: 494: 489: 486: 483: 482: 478: 475: 472: 469: 468: 465: 463: 459: 455: 445: 443: 428: 426: 422: 412: 410: 409:DFS algorithm 391: 385: 365: 342: 328: 324: 320: 313:Index methods 310: 308: 304: 300: 296: 292: 288: 279: 277: 273: 269: 265: 257: 248: 246: 242: 238: 234: 230: 226: 222: 206: 202: 199: 193: 190: 185: 178: 175: 171: 168: 164: 160: 157: 153: 150: 149: 148: 142: 138: 135: 132: 129: 128: 127: 123: 121: 116: 110: 108: 104: 100: 90: 88: 84: 80: 75: 73: 69: 65: 61: 57: 53: 49: 45: 40: 38: 34: 30: 26: 22: 1786:Suffix array 1692:Aho–Corasick 1627: 1603:Lee distance 1440: 1415: 1394: 1388: 1349: 1339: 1319: 1308:the original 1273: 1266: 1241: 1237: 1231: 1219:. Retrieved 1205: 1194:. Retrieved 1180: 1169:. Retrieved 1134: 1121: 1110:. Retrieved 1080: 1067: 1055:. Retrieved 1027: 1023: 1010: 975: 971: 961: 916: 912: 902: 839: 792: 776: 714:Aho–Corasick 687: 683: 679: 675: 673: 661: 659: 645: 625: 620: 616: 604: 461: 457: 453: 451: 440:The various 439: 418: 327:suffix array 316: 298: 285: 261: 244: 236: 232: 228: 224: 217: 203: 200: 197: 186: 182: 167:vowel points 146: 124: 119: 117: 114: 106: 102: 98: 96: 86: 82: 76: 68:DNA alphabet 67: 63: 59: 55: 41: 28: 24: 18: 1796:Suffix tree 1221:23 November 740:Boyer-Moore 535:Boyer–Moore 323:suffix tree 295:Boyer–Moore 289:computes a 1196:2022-03-22 1171:2019-11-22 1112:2019-11-22 919:(2): R12. 895:References 696:Algorithm 563:O(log(m)) 501:Rabin–Karp 470:Algorithm 442:algorithms 174:non-coding 52:finite set 935:1465-6906 726:Θ(n + o) 539:Θ(m + k) 337:Θ 156:ligatures 1887:Category 1501:Archived 1358:citation 1348:(2007), 1328:Archived 1215:Archived 1190:Archived 1165:Archived 1103:Archived 1048:Archived 1044:15055316 1002:19389736 953:14759262 858:See also 757:Set-BOM 195:colou?r 109:within: 99:haystack 93:Overview 58:through 48:alphabet 1862:Sorting 1832:Parsing 1552:Strings 1258:5902579 1161:6073627 1057:5 April 993:2732316 37:strings 1449:  1435:, and 1401:  1300:  1256:  1159:  1149:  1095:  1042:  1000:  990:  951:  944:395750 941:  933:  708:Space 626:strstr 621:memmem 593:O(mn) 492:O(mn) 479:Space 305:. The 103:needle 44:arrays 1825:Other 1781:DAFSA 1748:BLAST 1311:(PDF) 1278:(PDF) 1254:S2CID 1157:S2CID 1131:(PDF) 1106:(PDF) 1077:(PDF) 1051:(PDF) 1040:S2CID 1020:(PDF) 752:Θ(m) 744:Θ(m) 729:Θ(m) 723:Θ(m) 631:glibc 590:O(m) 575:O(m) 560:O(n) 557:Θ(m) 547:Θ(k) 529:Θ(m) 526:Θ(n) 523:Θ(m) 513:O(1) 505:Θ(m) 495:none 487:none 282:Stubs 1816:Trie 1806:Rope 1447:ISBN 1399:ISBN 1380:link 1373:help 1298:ISBN 1286:LNCS 1223:2019 1147:ISBN 1093:ISBN 1059:2019 998:PMID 949:PMID 931:ISSN 660:The 635:musl 633:and 623:and 570:DAWG 141:tags 1290:doi 1246:doi 1139:doi 1085:doi 1032:doi 988:PMC 980:doi 939:PMC 921:doi 781:or 325:or 291:DFA 272:DFA 120:not 19:In 1889:: 1439:. 1431:, 1427:, 1362:: 1360:}} 1356:{{ 1296:. 1284:. 1280:. 1252:. 1242:21 1240:. 1213:. 1188:. 1163:. 1155:. 1145:. 1133:. 1101:. 1091:. 1079:. 1046:. 1038:. 1028:38 1026:. 1022:. 996:. 986:. 976:25 974:. 970:. 947:. 937:. 929:. 915:. 911:. 785:. 643:3. 614:2. 602:1. 427:. 278:. 247:) 245:nm 227:+ 107:to 74:. 23:, 1544:e 1537:t 1530:v 1419:, 1382:) 1375:) 1371:( 1334:. 1292:: 1260:. 1248:: 1225:. 1199:. 1174:. 1141:: 1115:. 1087:: 1061:. 1034:: 1004:. 982:: 955:. 923:: 917:5 688:o 684:n 680:m 676:M 656:. 640:. 611:. 462:k 458:n 454:m 395:) 392:m 389:( 386:O 366:z 346:) 343:n 340:( 299:j 243:( 241:O 237:m 233:n 229:m 225:n 223:( 221:O 87:N 83:N 60:Z 56:A 50:(

Index

computer science
string algorithms
strings
arrays
alphabet
finite set
bioinformatics
variable-width encoding
tags
ligatures
diacritical marks
vowel points
non-coding
regular expression
O
O

deterministic finite automaton
powerset construction
DFA
regular expressions
Knuth–Morris–Pratt
DFA
Boyer–Moore
fuzzy string searching
bitap algorithm
substring index
suffix tree
suffix array
DFS algorithm

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.

↑