983:
70:
997:
for the strings, and then finding the deepest internal nodes which have leaf nodes from all the strings in the subtree below it. The figure on the right is the suffix tree for the strings "ABAB", "BABA" and "ABBA", padded with unique string terminators, to become "ABAB$ 0", "BABA$ 1" and "ABBA$ 2".
689:
1030:
time (if the size of the alphabet is constant). If the tree is traversed from the bottom up with a bit vector telling which strings are seen below each node, the k-common substring problem can be solved in
769:
605:
77:
The picture shows two strings where the problem has multiple solutions. Although the substring occurrences always overlap, it is impossible to obtain a longer common substring by "uniting" them.
968:
860:
909:
278:
396:
328:
1296:
1450:
1061:
805:
1094:
1028:
80:
The strings "ABABC", "BABCA" and "ABCBA" have only one longest common substring, viz. "ABC" of length 3. Other common substrings are "A", "AB", "B", "BA", "BC" and "C".
364:
545:
485:
1328:
1190:
517:
464:
444:
416:
209:
189:
169:
149:
129:
109:
1380:
Charalampopoulos, Panagiotis; Kociumaka, Tomasz; Pissis, Solon P.; Radoszewski, Jakub (Aug 2021). Mutzel, Petra; Pagh, Rasmus; Herman, Grzegorz (eds.).
17:
1725:
1620:
610:
1455:
1659:
1585:
1338:
Store only non-zero values in the rows. This can be done using hash-tables instead of arrays. This is useful for large alphabets.
1384:. European Symposium on Algorithms. Leibniz International Proceedings in Informatics (LIPIcs). Vol. 204. Schloss Dagstuhl.
1590:
694:
1595:
550:
998:
The nodes representing "A", "B", "AB" and "BA" all have descendant leaves from all of the strings, numbered 0, 1 and 2.
1806:
1580:
1407:
1674:
1615:
1487:
1460:
1417:
1522:
467:
1104:
The following pseudocode finds the set of longest common substrings between two strings with dynamic programming:
918:
810:
865:
1702:
225:
1348:
1842:
1837:
1707:
1562:
1786:
1512:
1791:
1669:
1636:
1572:
31:
1801:
1697:
1641:
1542:
1496:
46:
1600:
1532:
369:
283:
1333:
The last and current row can be stored on the same 1D array by traversing the inner loop backwards
1257:
1745:
1034:
994:
986:
971:
778:
520:
1070:
1004:
1064:
333:
1750:
1692:
1552:
1480:
1142:
L > z z := L ret := {S}
530:
470:
1557:
1301:
1163:
490:
58:
53:
of all of them. There may be more than one longest common substring. Applications include
8:
1755:
912:
772:
1435:
73:
The strings "BADANAT" and "CANADAS" share the maximal-length substrings "ADA" and "ANA".
1684:
1651:
449:
429:
401:
194:
174:
154:
134:
114:
94:
54:
1409:
Algorithms on
Strings, Trees and Sequences: Computer Science and Computational Biology
1816:
1413:
1235:, which is the last character of the longest common substring (of size z) instead of
426:
One can find the lengths and starting positions of the longest common substrings of
1811:
1781:
1735:
1537:
1473:
1385:
38:
1250:
The following tricks can be used to reduce the memory usage of an implementation:
1664:
1605:
1517:
1219:
is used to hold the length of the longest common substring found so far. The set
1717:
1610:
1390:
1461:
Suffix Tree based C implementation of
Longest common substring for two strings
1831:
1527:
1504:
1440:
993:
The longest common substrings of a set of strings can be found by building a
982:
1730:
1547:
1445:
1379:
1740:
69:
1436:
Dictionary of
Algorithms and Data Structures: longest common substring
1451:
Dynamic programming implementations in various languages on wikibooks
50:
1254:
Keep only the last and current row of the DP table to save memory (
524:
1776:
1239:. Thus all the longest common substrings would be, for each i in
684:{\textstyle O\left((n+m)\log \sigma /{\sqrt {\log(n+m)}}\right)}
1456:
working AS3 implementation of the dynamic programming algorithm
1353:
1196:
stores the length of the longest common suffix of the prefixes
989:
for the strings "ABAB", "BABA" and "ABBA", numbered 0, 1 and 2.
1149:L = z ret := ret âȘ {S}
1760:
1441:
Perl/XS implementation of the dynamic programming algorithm
1465:
764:{\displaystyle O\left((n+m)\log \sigma /\log(n+m)\right)}
398:, a longest string which occurs as substring of at least
1223:
is used to hold the set of strings which are of length
1063:
time. If the suffix tree is prepared for constant time
1134:j = 1 L := 1
613:
336:
1304:
1260:
1166:
1073:
1037:
1007:
921:
868:
813:
781:
697:
600:{\displaystyle 2^{o\left({\sqrt {\log(n+m)}}\right)}}
553:
533:
493:
473:
452:
432:
404:
372:
286:
228:
197:
177:
157:
137:
117:
97:
1114:(1..r, 1..n) z := 0 ret := {}
1446:
Perl/XS implementation of the suffix tree algorithm
1231:can be saved efficiently by just storing the index
171:, find a longest string which is substring of both
1322:
1290:
1184:
1088:
1055:
1022:
962:
903:
854:
799:
763:
683:
599:
539:
511:
479:
458:
438:
410:
390:
358:
322:
272:
203:
183:
163:
143:
123:
103:
1829:
1267:
807:. The solutions to the generalized problem take
1382:Faster Algorithms for Longest Common Substring
1481:
963:{\displaystyle \Theta (n_{1}+\cdots +n_{K})}
855:{\displaystyle \Theta (n_{1}+\cdots +n_{K})}
523:. A faster algorithm can be achieved in the
267:
235:
1110:LongestCommonSubstring(S, T) L :=
1488:
1474:
904:{\displaystyle \Theta (n_{1}\cdots n_{K})}
1389:
273:{\displaystyle S=\{S_{1},\ldots ,S_{K}\}}
83:ABABC ||| BABCA ||| ABCBA
1660:Comparison of regular-expression engines
1405:
1359:, all the possible substrings of length
981:
607:. In particular, this algorithm runs in
68:
14:
1830:
1099:
1621:ZhuâTakaoka string matching algorithm
1469:
1373:
86:
1399:
45:of two or more strings is a longest
1586:BoyerâMoore string-search algorithm
1412:. USA: Cambridge University Press.
24:
1074:
1038:
1008:
922:
869:
814:
782:
474:
25:
1854:
1675:Nondeterministic finite automaton
1616:Two-way string-matching algorithm
1429:
527:model of computation if the size
1138:L := L + 1
18:Longest common substring problem
1067:retrieval, it can be solved in
1001:Building the suffix tree takes
1591:BoyerâMooreâHorspool algorithm
1581:ApostolicoâGiancarlo algorithm
1363:that are contained in a string
1317:
1308:
1285:
1282:
1270:
1264:
1179:
1170:
1083:
1077:
1050:
1041:
1017:
1011:
977:
957:
925:
898:
872:
849:
817:
794:
785:
771:space. Solving the problem by
753:
741:
718:
706:
671:
659:
634:
622:
586:
574:
506:
494:
303:
288:
13:
1:
1367:
1349:Longest palindromic substring
1215:, respectively. The variable
421:
391:{\displaystyle 2\leq k\leq K}
323:{\displaystyle |S_{i}|=n_{i}}
1596:KnuthâMorrisâPratt algorithm
1523:DamerauâLevenshtein distance
1291:{\displaystyle O(\min(r,n))}
547:of the input alphabet is in
7:
1787:Compressed pattern matching
1513:Approximate string matching
1495:
1342:
1122:j := 1..n
1056:{\displaystyle \Theta (NK)}
800:{\displaystyle \Theta (nm)}
222:. Given the set of strings
64:
10:
1859:
1792:Longest common subsequence
1703:NeedlemanâWunsch algorithm
1573:String-searching algorithm
1391:10.4230/LIPIcs.ESA.2021.30
1089:{\displaystyle \Theta (N)}
1023:{\displaystyle \Theta (N)}
32:longest common subsequence
29:
1802:Sequential pattern mining
1769:
1716:
1683:
1650:
1642:Commentz-Walter algorithm
1630:Multiple string searching
1629:
1571:
1563:WagnerâFischer algorithm
1503:
359:{\textstyle \sum n_{i}=N}
1812:String rewriting systems
1797:Longest common substring
1708:SmithâWaterman algorithm
1533:Gestalt pattern matching
1396:Here: Theorem 1, p.30:2.
519:time with the help of a
220:common substring problem
214:A generalization is the
43:longest common substring
30:Not to be confused with
27:Computer science problem
1746:Generalized suffix tree
1670:Thompson's construction
1406:Gusfield, Dan (1999) .
1160:This algorithm runs in
1118:i := 1..r
995:generalized suffix tree
987:Generalized suffix tree
972:generalized suffix tree
540:{\displaystyle \sigma }
521:generalized suffix tree
480:{\displaystyle \Theta }
1698:Hirschberg's algorithm
1324:
1292:
1186:
1126:S = T
1090:
1065:lowest common ancestor
1057:
1024:
990:
964:
905:
856:
801:
765:
685:
601:
541:
513:
481:
460:
440:
412:
392:
360:
324:
274:
205:
185:
165:
145:
125:
105:
74:
1553:Levenshtein automaton
1543:JaroâWinkler distance
1325:
1323:{\displaystyle O(nr)}
1293:
1187:
1185:{\displaystyle O(nr)}
1091:
1058:
1025:
985:
965:
906:
857:
802:
766:
686:
602:
542:
514:
512:{\displaystyle (n+m)}
482:
461:
441:
413:
393:
361:
325:
275:
206:
186:
166:
146:
126:
106:
72:
1601:RabinâKarp algorithm
1558:Levenshtein distance
1302:
1258:
1164:
1071:
1035:
1005:
919:
866:
811:
779:
695:
611:
551:
531:
491:
471:
450:
430:
402:
370:
334:
284:
226:
195:
175:
155:
135:
115:
95:
59:plagiarism detection
1843:Dynamic programming
1838:Problems on strings
1756:Ternary search tree
1100:Dynamic programming
913:dynamic programming
773:dynamic programming
91:Given two strings,
1685:Sequence alignment
1652:Regular expression
1320:
1288:
1182:
1086:
1053:
1020:
991:
960:
901:
852:
797:
761:
681:
597:
537:
509:
477:
456:
436:
408:
388:
356:
320:
270:
201:
181:
161:
141:
121:
101:
87:Problem definition
75:
55:data deduplication
1825:
1824:
1817:String operations
674:
589:
459:{\displaystyle T}
439:{\displaystyle S}
411:{\displaystyle k}
204:{\displaystyle T}
184:{\displaystyle S}
164:{\displaystyle n}
144:{\displaystyle T}
124:{\displaystyle m}
104:{\displaystyle S}
16:(Redirected from
1850:
1782:Pattern matching
1736:Suffix automaton
1538:Hamming distance
1490:
1483:
1476:
1467:
1466:
1424:
1423:
1403:
1397:
1395:
1393:
1377:
1329:
1327:
1326:
1321:
1297:
1295:
1294:
1289:
1246:
1242:
1238:
1234:
1230:
1226:
1222:
1218:
1214:
1210:
1203:
1199:
1195:
1192:time. The array
1191:
1189:
1188:
1183:
1153:L := 0
1095:
1093:
1092:
1087:
1062:
1060:
1059:
1054:
1029:
1027:
1026:
1021:
969:
967:
966:
961:
956:
955:
937:
936:
910:
908:
907:
902:
897:
896:
884:
883:
861:
859:
858:
853:
848:
847:
829:
828:
806:
804:
803:
798:
770:
768:
767:
762:
760:
756:
734:
690:
688:
687:
682:
680:
676:
675:
652:
650:
606:
604:
603:
598:
596:
595:
594:
590:
567:
546:
544:
543:
538:
518:
516:
515:
510:
486:
484:
483:
478:
465:
463:
462:
457:
445:
443:
442:
437:
417:
415:
414:
409:
397:
395:
394:
389:
366:. Find for each
365:
363:
362:
357:
349:
348:
329:
327:
326:
321:
319:
318:
306:
301:
300:
291:
279:
277:
276:
271:
266:
265:
247:
246:
210:
208:
207:
202:
190:
188:
187:
182:
170:
168:
167:
162:
150:
148:
147:
142:
130:
128:
127:
122:
110:
108:
107:
102:
39:computer science
21:
1858:
1857:
1853:
1852:
1851:
1849:
1848:
1847:
1828:
1827:
1826:
1821:
1765:
1712:
1679:
1665:Regular grammar
1646:
1625:
1606:Raita algorithm
1567:
1518:Bitap algorithm
1499:
1494:
1432:
1427:
1420:
1404:
1400:
1378:
1374:
1370:
1345:
1303:
1300:
1299:
1259:
1256:
1255:
1244:
1240:
1236:
1232:
1228:
1224:
1220:
1216:
1212:
1208:
1206:end at position
1201:
1197:
1193:
1165:
1162:
1161:
1158:
1102:
1072:
1069:
1068:
1036:
1033:
1032:
1006:
1003:
1002:
980:
951:
947:
932:
928:
920:
917:
916:
892:
888:
879:
875:
867:
864:
863:
843:
839:
824:
820:
812:
809:
808:
780:
777:
776:
730:
705:
701:
696:
693:
692:
651:
646:
621:
617:
612:
609:
608:
566:
562:
558:
554:
552:
549:
548:
532:
529:
528:
492:
489:
488:
472:
469:
468:
451:
448:
447:
431:
428:
427:
424:
403:
400:
399:
371:
368:
367:
344:
340:
335:
332:
331:
314:
310:
302:
296:
292:
287:
285:
282:
281:
261:
257:
242:
238:
227:
224:
223:
196:
193:
192:
176:
173:
172:
156:
153:
152:
136:
133:
132:
116:
113:
112:
96:
93:
92:
89:
84:
67:
35:
28:
23:
22:
15:
12:
11:
5:
1856:
1846:
1845:
1840:
1823:
1822:
1820:
1819:
1814:
1809:
1804:
1799:
1794:
1789:
1784:
1779:
1773:
1771:
1767:
1766:
1764:
1763:
1758:
1753:
1748:
1743:
1738:
1733:
1728:
1722:
1720:
1718:Data structure
1714:
1713:
1711:
1710:
1705:
1700:
1695:
1689:
1687:
1681:
1680:
1678:
1677:
1672:
1667:
1662:
1656:
1654:
1648:
1647:
1645:
1644:
1639:
1633:
1631:
1627:
1626:
1624:
1623:
1618:
1613:
1611:Trigram search
1608:
1603:
1598:
1593:
1588:
1583:
1577:
1575:
1569:
1568:
1566:
1565:
1560:
1555:
1550:
1545:
1540:
1535:
1530:
1525:
1520:
1515:
1509:
1507:
1501:
1500:
1493:
1492:
1485:
1478:
1470:
1464:
1463:
1458:
1453:
1448:
1443:
1438:
1431:
1430:External links
1428:
1426:
1425:
1418:
1398:
1371:
1369:
1366:
1365:
1364:
1351:
1344:
1341:
1340:
1339:
1336:
1335:
1334:
1319:
1316:
1313:
1310:
1307:
1287:
1284:
1281:
1278:
1275:
1272:
1269:
1266:
1263:
1181:
1178:
1175:
1172:
1169:
1106:
1101:
1098:
1085:
1082:
1079:
1076:
1052:
1049:
1046:
1043:
1040:
1019:
1016:
1013:
1010:
979:
976:
959:
954:
950:
946:
943:
940:
935:
931:
927:
924:
900:
895:
891:
887:
882:
878:
874:
871:
851:
846:
842:
838:
835:
832:
827:
823:
819:
816:
796:
793:
790:
787:
784:
759:
755:
752:
749:
746:
743:
740:
737:
733:
729:
726:
723:
720:
717:
714:
711:
708:
704:
700:
679:
673:
670:
667:
664:
661:
658:
655:
649:
645:
642:
639:
636:
633:
630:
627:
624:
620:
616:
593:
588:
585:
582:
579:
576:
573:
570:
565:
561:
557:
536:
508:
505:
502:
499:
496:
476:
455:
435:
423:
420:
407:
387:
384:
381:
378:
375:
355:
352:
347:
343:
339:
317:
313:
309:
305:
299:
295:
290:
269:
264:
260:
256:
253:
250:
245:
241:
237:
234:
231:
200:
180:
160:
140:
120:
100:
88:
85:
82:
66:
63:
26:
9:
6:
4:
3:
2:
1855:
1844:
1841:
1839:
1836:
1835:
1833:
1818:
1815:
1813:
1810:
1808:
1805:
1803:
1800:
1798:
1795:
1793:
1790:
1788:
1785:
1783:
1780:
1778:
1775:
1774:
1772:
1768:
1762:
1759:
1757:
1754:
1752:
1749:
1747:
1744:
1742:
1739:
1737:
1734:
1732:
1729:
1727:
1724:
1723:
1721:
1719:
1715:
1709:
1706:
1704:
1701:
1699:
1696:
1694:
1691:
1690:
1688:
1686:
1682:
1676:
1673:
1671:
1668:
1666:
1663:
1661:
1658:
1657:
1655:
1653:
1649:
1643:
1640:
1638:
1635:
1634:
1632:
1628:
1622:
1619:
1617:
1614:
1612:
1609:
1607:
1604:
1602:
1599:
1597:
1594:
1592:
1589:
1587:
1584:
1582:
1579:
1578:
1576:
1574:
1570:
1564:
1561:
1559:
1556:
1554:
1551:
1549:
1546:
1544:
1541:
1539:
1536:
1534:
1531:
1529:
1528:Edit distance
1526:
1524:
1521:
1519:
1516:
1514:
1511:
1510:
1508:
1506:
1505:String metric
1502:
1498:
1491:
1486:
1484:
1479:
1477:
1472:
1471:
1468:
1462:
1459:
1457:
1454:
1452:
1449:
1447:
1444:
1442:
1439:
1437:
1434:
1433:
1421:
1419:0-521-58519-8
1415:
1411:
1410:
1402:
1392:
1387:
1383:
1376:
1372:
1362:
1358:
1356:
1352:
1350:
1347:
1346:
1337:
1332:
1331:
1314:
1311:
1305:
1279:
1276:
1273:
1261:
1253:
1252:
1251:
1248:
1207:
1176:
1173:
1167:
1156:
1152:
1148:
1145:
1141:
1137:
1133:
1129:
1125:
1121:
1117:
1113:
1109:
1105:
1097:
1080:
1066:
1047:
1044:
1014:
999:
996:
988:
984:
975:
973:
952:
948:
944:
941:
938:
933:
929:
914:
893:
889:
885:
880:
876:
844:
840:
836:
833:
830:
825:
821:
791:
788:
774:
757:
750:
747:
744:
738:
735:
731:
727:
724:
721:
715:
712:
709:
702:
698:
677:
668:
665:
662:
656:
653:
647:
643:
640:
637:
631:
628:
625:
618:
614:
591:
583:
580:
577:
571:
568:
563:
559:
555:
534:
526:
522:
503:
500:
497:
487:
453:
433:
419:
405:
385:
382:
379:
376:
373:
353:
350:
345:
341:
337:
315:
311:
307:
297:
293:
262:
258:
254:
251:
248:
243:
239:
232:
229:
221:
217:
212:
198:
178:
158:
138:
118:
98:
81:
78:
71:
62:
60:
56:
52:
48:
44:
40:
33:
19:
1796:
1731:Suffix array
1637:AhoâCorasick
1548:Lee distance
1408:
1401:
1381:
1375:
1360:
1354:
1249:
1245:S-z)..(ret)]
1205:
1159:
1154:
1150:
1146:
1143:
1139:
1135:
1131:
1127:
1123:
1119:
1115:
1111:
1107:
1103:
1000:
992:
970:time with a
425:
219:
215:
213:
90:
79:
76:
42:
36:
1741:Suffix tree
1298:instead of
978:Suffix tree
691:time using
1832:Categories
1368:References
1227:. The set
911:time with
862:space and
422:Algorithms
151:of length
111:of length
49:that is a
1075:Θ
1039:Θ
1009:Θ
942:⋯
923:Θ
915:and take
886:⋯
870:Θ
834:⋯
815:Θ
783:Θ
739:
728:σ
725:
657:
644:σ
641:
572:
535:σ
475:Θ
418:strings.
383:≤
377:≤
338:∑
252:…
51:substring
1343:See also
1108:function
525:word RAM
280:, where
65:Examples
1807:Sorting
1777:Parsing
1497:Strings
1416:
1204:which
1155:return
1130:i = 1
1096:time.
775:costs
47:string
1770:Other
1726:DAFSA
1693:BLAST
1357:-gram
1112:array
1761:Trie
1751:Rope
1414:ISBN
1211:and
1200:and
1157:ret
1151:else
1144:else
1136:else
446:and
330:and
191:and
131:and
57:and
41:, a
1386:doi
1268:min
1241:ret
1229:ret
1221:ret
1120:for
1116:for
736:log
722:log
654:log
638:log
569:log
466:in
37:In
1834::
1330:)
1247:.
1243:,
1147:if
1140:if
1132:or
1128:if
1124:if
974:.
211:.
61:.
1489:e
1482:t
1475:v
1422:.
1394:.
1388::
1361:n
1355:n
1318:)
1315:r
1312:n
1309:(
1306:O
1286:)
1283:)
1280:n
1277:,
1274:r
1271:(
1265:(
1262:O
1237:S
1233:i
1225:z
1217:z
1213:j
1209:i
1202:T
1198:S
1194:L
1180:)
1177:r
1174:n
1171:(
1168:O
1084:)
1081:N
1078:(
1051:)
1048:K
1045:N
1042:(
1018:)
1015:N
1012:(
958:)
953:K
949:n
945:+
939:+
934:1
930:n
926:(
899:)
894:K
890:n
881:1
877:n
873:(
850:)
845:K
841:n
837:+
831:+
826:1
822:n
818:(
795:)
792:m
789:n
786:(
758:)
754:)
751:m
748:+
745:n
742:(
732:/
719:)
716:m
713:+
710:n
707:(
703:(
699:O
678:)
672:)
669:m
666:+
663:n
660:(
648:/
635:)
632:m
629:+
626:n
623:(
619:(
615:O
592:)
587:)
584:m
581:+
578:n
575:(
564:(
560:o
556:2
507:)
504:m
501:+
498:n
495:(
454:T
434:S
406:k
386:K
380:k
374:2
354:N
351:=
346:i
342:n
316:i
312:n
308:=
304:|
298:i
294:S
289:|
268:}
263:K
259:S
255:,
249:,
244:1
240:S
236:{
233:=
230:S
218:-
216:k
199:T
179:S
159:n
139:T
119:m
99:S
34:.
20:)
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.